Speakers' corner

Thursday: 15:30 – 17:00 (90 minutes)

The workshop organisers invited experts to obtain information and knowledge that can help organise the continuation of the results once the MIXED project has finished. The MIXED project team had a number of specific questions for the experts that might not have been addressed without asking them explicitly. The Speakers' Corner was the occasion to put these questions. Here are the questions:

To Alison Heatherington:

The MIXED framework is intended to implement the “smart migration” strategy. Because there are many file formats, priorities have to be set when selecting the formats for which plug-ins will be created. Do you see a role for a format registry (such as PRONOM or UDFR) in this file format assessment process?

- Short answer: yes. PRONOM is a technical registry with information about file formats, but also about software, hardware and compression algorithms. It records details of each format that can be used to derive a risk score, although different institutions may have different requirements. DROID is an identification tool developed in line with PRONOM, and a consultation round on DROID is currently under way. Identification is based on an internal code. Specific questions and remarks can be submitted through the site. UDFR is at a very early stage, and its requirements are not fixed yet.

- Dirk: can you elaborate on pathways?
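DROID matches files against the signatures recorded in PRONOM. The basic idea of matching on internal byte patterns can be sketched in a few lines of Python. The three entries below are well-known magic numbers chosen purely for illustration: real PRONOM signatures are richer (variable offsets, wildcards, end-of-file sequences) and map to PRONOM identifiers (PUIDs), and this sketch is not how DROID itself is implemented. The names `SIGNATURES` and `identify` are hypothetical.

```python
# Hypothetical, simplified signature table: (format label, offset, magic bytes).
# Real PRONOM signatures are far richer; these entries are illustrative only.
SIGNATURES = [
    ("PDF", 0, b"%PDF-"),
    ("PNG", 0, b"\x89PNG\r\n\x1a\n"),
    ("ZIP", 0, b"PK\x03\x04"),
]


def identify(data: bytes) -> list[str]:
    """Return the labels of all signatures whose byte pattern occurs at its offset."""
    return [
        label
        for label, offset, magic in SIGNATURES
        if data[offset:offset + len(magic)] == magic
    ]
```

For example, `identify(open("report.pdf", "rb").read(64))` would return `["PDF"]` for a well-formed PDF file. Reading only the first bytes is enough here because every sample signature sits at offset 0.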

To Amir Bernstein:

A simple and short question: To what extent can MIXED and SIARD benefit from each other?

SIARD is XML-based. We use a single file to preserve a database: an uncompressed ZIP64 container, with a separate folder for each table. BLOBs go in a different container. MIXED seems to be heading in the same direction, as are the eDavid project and a project in Portugal. The essence is to concentrate on the data and less on the presentation. SIARD is related to the mandate of the national archive: to archive official and sensitive government databases. Can we see the schema? The SIARD schema can be downloaded from the website. Possible synergies: compare the schemas and find ways to improve them, look at the metadata, and, for usability, look at the interfaces.
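To illustrate the container idea described above, here is a minimal Python sketch, assuming a database is given as a mapping from table names to rows of strings. It writes one uncompressed (`ZIP_STORED`) archive with ZIP64 enabled and one folder per table. The function name `write_container`, the folder names (`content/<table>/`) and the XML element names are hypothetical and do not reproduce the actual SIARD layout or schema, which should be taken from the SIARD documentation; BLOBs, which SIARD keeps in a separate container, are left out.

```python
import zipfile
from xml.sax.saxutils import escape


def write_container(target, tables):
    """Write a database export as one uncompressed ZIP64 archive.

    `target` is a path or a writable binary file object; `tables` maps each
    table name to a list of rows, each row a list of strings. The folder and
    element names are hypothetical, not the real SIARD layout.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_STORED,
                         allowZip64=True) as zf:
        for name, rows in tables.items():
            lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<table>"]
            for row in rows:
                # Escape &, < and > so cell values cannot break the XML.
                cells = "".join(f"<c>{escape(value)}</c>" for value in row)
                lines.append(f"  <row>{cells}</row>")
            lines.append("</table>")
            # One folder per table inside the single container file.
            zf.writestr(f"content/{name}/{name}.xml", "\n".join(lines))
```

A call such as `write_container("export.zip", {"customers": [["1", "Alice"]]})` produces a single file. Storing entries uncompressed keeps the XML directly readable with any ZIP tool, which matches the choice of an uncompressed container mentioned above.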

To Barbara Sierman:

What is your opinion on the “smart migration strategy”, especially in relation to the “common” emulation and migration strategies for digital preservation?

- Which type of migration to use depends on the type of material. At the moment most of the material is text-based (PDF); there are not many databases. Changing the strategy requires thorough preparation, and this is more an organisational than a technical issue. The Planets project provides information on preservation tools. It is important that MIXED can become part of the Plato tool and the Planets testbed.

To Ellen Kraffmiller:

The Dataverse Network project is an open-source software development community. Do you have any recommendations concerning the formation of this community and ways to keep it active? How does the open-source philosophy fit into the business model of the DVN?

We are still in the early stages of creating a developer community; at the moment only our own developers work on it. Our ingest component resembles the MIXED converter. The key is to be open about all aspects of the process: mailing lists, bug tracking, program source, etc. To get users to commit, review remarks from outside and make the contributors team members. The more stakeholders you have, the more developers you get.

To Jeroen Rombouts:

What is your experience regarding the durability of file formats used by researchers in technical disciplines?

The durability of the file formats is not so problematic; the metadata are more problematic. What is the value of the data? How many SDFPs would we need? Databases alone already raise a lot of issues: time, geography, etc. can be essential. There are currently a lot of questions about the codes and models that relate to the datasets.

To John Doove:

Can you give some recommendations concerning the way the user community (both users and developers) of the MIXED system can be extended?

1. Connect with the dataforum initiative of SURFfoundation.

2. Suggestion: use the RDF format in the MIXED project.

Barbara Sierman: the repository concept has a broad horizon, not only for universities. Some action concerning digital preservation should be included.

To Marc Kemps-Snijders:

Building on your experience as a developer of software used in scientific settings: can you provide some suggestions concerning the way the software development process can be optimised and continued?

Continuation is always difficult. In the scientific world, software often has no long-term perspective, and this has not been solved yet. So it is good to go shopping and to make contact with projects such as Planets and Clarin. Concerning development: stick to what you are good at, and do not do anything outside the scope. Which functionality should be added? First, create a quick prototype, flaws and all, in close cooperation with the researchers. The focus at MPI is archive-related; over the years a number of tools have been developed, e.g. lexicon tools. The ambition is to create a coherent set of tools. Do not reinvent the wheel.

To Nathan Adams:

To what extent can data centres be expected to actually use and contribute to the MIXED software? What requirements must be met?

I don’t care about preservation; I am a software developer. Flexibility is important. If management declares SDFP the standard for certain data types, implementing it will be no problem. There are other XML schemas that do the same thing. For example, DDI 3 does what SDFP is doing (combining data and metadata). Fedora has defined FOXML, which to a certain extent resembles SDFP (a container). There are also schemas to represent RDF triples, etc.

To Rainer Schmidt:

Can you elaborate on the way the MIXED software and outcomes of the Planets project might benefit from each other?

In general, the Planets project tries to provide an environment with as many tools as possible, in order to find the strategy that best fits the situation. It is possible to put the MIXED plug-ins behind the Planets framework; there are now about 50 tools in the Planets environment. A nice experiment would be a round-trip experiment, which would also bring public visibility to the tool. Rainer Schmidt works in the interoperability work group. On the aspect of provenance, Rainer has suggestions for cooperation.
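The round-trip experiment mentioned above can be illustrated with a small Python sketch: a table is converted to XML and back, and the experiment passes only if the result equals the original. The converters `to_xml` and `from_xml` are hypothetical stand-ins for a MIXED plug-in and its reverse conversion; they are not part of MIXED or Planets, and a real experiment would compare richer properties than plain cell values.

```python
import xml.etree.ElementTree as ET


def to_xml(rows):
    """Hypothetical 'migrate' step: serialise a table (rows of strings) to XML."""
    table = ET.Element("table")
    for row in rows:
        r = ET.SubElement(table, "row")
        for value in row:
            ET.SubElement(r, "c").text = value
    return ET.tostring(table, encoding="unicode")


def from_xml(text):
    """Hypothetical reverse step: rebuild the table from the XML above.

    An empty cell is serialised as <c />, whose parsed text is None,
    so it is mapped back to the empty string.
    """
    return [[c.text or "" for c in r] for r in ET.fromstring(text)]


def round_trip_ok(rows):
    """The round trip passes if converting there and back yields the original content."""
    return from_xml(to_xml(rows)) == rows
```

A lossy converter, for example one that dropped empty cells, would make `round_trip_ok` return `False`, which is exactly the kind of loss such an experiment is meant to expose.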

To Rob Grim:

What can be done to improve the durability of research data in the social sciences?

I like the strategic and practical approach. There is huge interest in making historical data available, including data from the pre-digital era that is only available on paper. What to preserve? That is a difficult question and still under discussion; the context is important. The infrastructure for data is underdeveloped compared with that for publications. How does MIXED relate to other digital preservation issues? How does it align with ORE (resource maps)? “May all your problems be technical.”

To Sebastian Rahtz:

Addressed as a representative of the DARIAH initiative: What is the value of the MIXED framework for the DARIAH initiative? How do you look upon the SDFP as a vehicle to …

plus: Distinction between preservation and representation

plus: Not reinventing the wheel. Building on existing knowledge

plus: Independent components

minus: worry about the subject area: databases and spreadsheets are very short on semantics. There is a risk of losing semantics by simplifying the content model

minus: Standards do not last for a long period (say about 10 years)

minus: The LaTeX format contains semantics in the document.

To Steven Krauwer:

What are the most important mechanisms to improve the durability of digital data in the scientific speech and language community?

If Clarin wants to provide access to tools, preservation is relevant. Adhering to standards is crucial, but there are too many to manage. What is the intended scope of MIXED: local or international? There is a need to preserve enhanced data, which are more complicated than simple formats. Attention is also needed for audio and other new formats.

To Vladislav Makarenko:

Is digital durability an issue in the eSciDoc project? If so, how is it implemented?

- Using standards is important, e.g. DC (Dublin Core) and ontologies. A number of standards are used for interoperability. Open source is important, so that the software can be used by others.