DOIs for data
The ILL, with the support of the European Commission, is one of the first institutes in Europe to have developed and implemented a policy for the management of experimental data.
As early as 2012, the ILL began assigning identification numbers to the datasets produced after every single experiment. This number is known as a DOI, or Digital Object Identifier, and is a persistent link. It is vital that users adopt the DOI system to reference their data.
The DOI is an identifier allowing data to be traced from production to publication. By citing DOIs in all their publications, users guarantee the traceability of all the details of their experiment. This includes the request for beamtime, the experimental parameters and conditions, the instrumentation used, the data obtained, the analysis of this data, and the names of the research team members.
Promotion, Reliability & Recognition
Data DOIs promote experiments to peers and potential funding bodies, as well as to publishers and journals. Some of these already insist on access to the data before they will validate a publication, so citing the DOI in an article can speed up the review process.
It also helps to demonstrate the reliability of the results, giving access to the experimental conditions, and makes it easier to understand the resulting findings.
Also, making data available will allow new research to be carried out on the same topic in the future, because the DOI is a permanent identifier.
What does it look like?
All our data DOIs start with '10.5291/ILL-DATA.' and simply end with the proposal number of the experiment.
For example, the data DOI 10.5291/ILL-DATA.8-01-418 resolves at https://doi.ill.fr/10.5291/ILL-DATA.8-01-418
(or http://dx.doi.org/10.5291/ILL-DATA.8-01-418 ),
whereas the DOI of a publication looks like this: 10.1107/S2052252519008285, as in http://dx.doi.org/10.1107/S2052252519008285
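As a minimal sketch of the naming rule described above, the following Python snippet builds a data DOI and its resolver URL from a proposal number. The function names `ill_data_doi` and `doi_url` are hypothetical helpers introduced here for illustration; only the prefix and the two resolver addresses come from the text.

```python
from urllib.parse import quote

# Prefix stated in the text: every ILL data DOI starts with it.
ILL_DATA_PREFIX = "10.5291/ILL-DATA."


def ill_data_doi(proposal_number: str) -> str:
    """Return the data DOI for a proposal number such as '8-01-418'.

    Hypothetical helper: it only concatenates prefix and proposal number.
    """
    proposal_number = proposal_number.strip()
    if not proposal_number:
        raise ValueError("proposal number must not be empty")
    return ILL_DATA_PREFIX + proposal_number


def doi_url(doi: str, resolver: str = "https://doi.ill.fr/") -> str:
    """Return a resolver URL for a DOI.

    The default resolver is the first address quoted in the text;
    'http://dx.doi.org/' is the alternative shown there.
    """
    # Keep '/' unescaped so the prefix/suffix separator stays readable.
    return resolver + quote(doi, safe="/")


if __name__ == "__main__":
    doi = ill_data_doi("8-01-418")
    print(doi)
    print(doi_url(doi))
    print(doi_url(doi, resolver="http://dx.doi.org/"))
```

The sketch does not check that the proposal number exists; it only reproduces the string pattern, so a resolver may still report an unknown DOI.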
Interview with Petr Čermák, one of our users and a strong advocate of Open Science
We recorded an interview with Petr Čermák, one of our users, about his motivation for endorsing and promoting good practices in open data.
Through a set of 11 questions, he explains the kind of cultural change that is required for Open Science to be a success - and what holds it back most. He details the pivotal role education plays in this change of practice. He tells us how he showed his mini-school students - enrolled in hands-on sessions - how to go about... not only analysing open data, but also making all the necessary scripts available to everyone.
Beyond his own successful experience of data management at the ILL, he shares his views on the major arguments for institutions and large-scale facilities to promote open science practices - losing less experimental data and welcoming more scientists - and more nations - into the neutron community by helping them analyse experiments they did not conduct.
Finally, he describes how better open data practices could also have an impact on the way scientific journals referee and publish research results.
ILL Data Policy
The ILL was the first international scientific user facility to publish a “Scientific Data Policy” in November 2011, just before the opening of the December 2011 proposal round. The text came into force in October 2012, and prescribed a default non-disclosure period of three years during which access to data is restricted to the experimental team; in cases where no request for data has been made, this period would be extended to five years.
Following the publication of the policy, the ILL created an interdisciplinary working group, the DPP (Data Protection and Processing). One of its first missions was to drive the development of the software tools needed to put the policy into practice, with a focus on usability (especially during experiments) and security. A new data portal (data.ill.eu) that implemented the changes was launched in 2014. It enables visitors to search all textual metadata related to an experiment and quickly retrieve all data of interest related to search criteria, while also implementing access restriction for data not yet public.
When inserted in publications, ILL's data DOIs allow readers to obtain more information about the referenced experiments, access the ILL Data Portal and even request access from the experimental team if the data are not yet publicly available.
The DPP continuously upgrades the data policy so as to reflect the evolution of the Data Management tools available at the ILL. The latest version of the ILL Data Policy was adopted in July 2017.
Consult the ILL Data Policy. (pdf - 258 Ki)
Central facilities for neutron scattering and synchrotron X-rays in Europe keep working together to develop and share infrastructure for the data they collect. Such co-operation will make it easier and more efficient for users to access and process their data, and provide more secure means of storage and retrieval. It will also increase the scientific value of the data by opening it up to a wider community for further analysis and fostering new collaborations between scientific groups.
Consult, download, share and manage your data
In order to control access to the experimental data obtained at the ILL in a coherent and secure fashion, the ILL has recently developed a single portal for consulting, downloading and managing your data.
Here “data” is understood to mean raw data (i.e. numor files), processed data, and metadata (e.g. log files or “logs”).
This web portal offers:
- Global text search on all documents related to ILL experiments (proposals, data files, reports …).
- Advanced search allowing filtering by (co-)proposer, instrument, numor, cycle, dates of experiment …
- Presentation of the list of experiments matching the search criteria (note that a single proposal can involve more than one instrument and thus more than one experiment).
- Data and information relating to these experiments can then be accessed or downloaded, depending on your access authorisation. As an ILL user you can access: data obtained from your own experiments, data from an experiment whose main proposer has specifically granted you access, or data which has been made public.
- For each experiment’s proposal a set of tabs points to data files and other information (link to the DOI, the proposal text, the logs …).
- Provided that you are the principal investigator (PI) (normally the main proposer), the “Members” tab allows you to grant another person access to the data, or even to make the data public.
- Once you have access authorization to a given proposal, the “Data folders” tab allows you to access the content of various sub-folders (pertaining to a particular experiment) containing either raw data or processed data or log files.
- Alternatively the “Data Ranges” tab allows an authorized user to select and download specific ranges of raw data files corresponding to e.g. different samples and/or temperatures.
- Be aware that a notification identifying the person who downloaded the data will be sent to the PI of the proposal concerned.
- Also be aware that the step of making data public is understandably irreversible.
Note that other data access tools coexist with this portal:
- IDA allows downloading of raw data files that predate the ILL Data Policy (i.e. before cycle 123 of Autumn 2012).
- SFTP on dt.ill.fr (dt stands for Data Transfer).
data.ill.eu, the ILL web portal, is the recommended solution for accessing, managing and downloading your experimental data. For large volumes, however, the HTTP protocol does not provide sufficient mechanisms to ensure smooth and reliable transfers; in that case you can use sftp on dt.ill.fr or opt for any solution dedicated to large data transfers.
Publications and DOI: If you publish results based on ILL data, whether your own data, data to which you were granted access, or data that were made public, the ILL expects you to cite the DOI reference using the specified format.
A testimonial from one of our users
Everyone speaks about #openscience but who really does it? OUR TEAM! We just opened our data already during the experiment, thanks to @ILLGrenoble it was easy to do! RT and collaborate with us in data analysis - see thread for details. https://t.co/GBApLEbAJe #opendata @EOSC_eu pic.twitter.com/deJYJfr6oJ
Petr Čermák (@petrscience), February 9, 2020
The SFTP service on dt.ill.fr is intended for ILL users who would like to download large volumes of experimental data.
To use this service you need SFTP client software on your local computer. We recommend OpenSSH (the standard command-line interface on Unix-based systems) or FileZilla (Windows, Linux, Mac OS X) for those who prefer a graphical environment. Any other SFTP software should work out of the box, but check that it is able to resume failed transfers.
The service is hosted on dt.ill.fr (standard SSH port, i.e. TCP 22), and you authenticate using your ILL account. Once connected, you will find the usual "MyData" folder containing the proposal data folders organised by Year or Instrument.
> sftp email@example.com
sftp> cd MyData/byProposal/
sftp> ls
exp_8-05-XXXX_in13 exp_8-05-XXXX_in5 exp_9-13-XXXX_figaro exp_TEST-XXX_d1b
sftp> cd exp_TEST-XXX_d1b
sftp> ls
histo logfiles processed rawdata
sftp> get -ra rawdata
Fetching /net4/serdon/illdata/141/d1b/exp_TEST-2368/rawdata/ to c:Temp
/net4/serdon/illdata/141/d1b/exp_TEST-23999/rawdata/222998.nxs 100% 14KB 13.9KB/s 00:00
/net4/serdon/illdata/141/d1b/exp_TEST-23999/rawdata/222999.nxs 100% 14KB 13.9KB/s 00:00
The '-a' option is important as it allows failed transfers to be resumed; the '-r' option copies the folder recursively.
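The interactive session above could also be prepared as an sftp batch file. The Python sketch below only builds the batch text and the command line; it does not open a connection. The helper names `build_batch` and `build_command` are hypothetical. The sketch also assumes that non-interactive (for example key-based) authentication is available for your account, which this page does not confirm; sftp's batch mode ('-b') cannot prompt for a password.

```python
import shlex
from typing import List

SFTP_HOST = "dt.ill.fr"  # host named in the text


def build_batch(proposal_folder: str, subfolder: str = "rawdata") -> str:
    """Return sftp batch commands mirroring the interactive session.

    Hypothetical helper: the folder layout (MyData/byProposal/...) is
    taken from the example session, not from a documented interface.
    """
    if any(ch.isspace() for ch in proposal_folder + subfolder):
        raise ValueError("folder names with whitespace are not handled here")
    lines = [
        "cd MyData/byProposal/" + proposal_folder,
        # -r: copy recursively, -a: resume partially transferred files
        "get -ra " + subfolder,
        "bye",
    ]
    return "\n".join(lines) + "\n"


def build_command(user: str, batch_path: str) -> List[str]:
    """Return the sftp command line that would run the batch file."""
    return ["sftp", "-b", batch_path, user + "@" + SFTP_HOST]


if __name__ == "__main__":
    # Print what would be run; nothing is executed or transferred here.
    print(build_batch("exp_TEST-XXX_d1b"), end="")
    print(shlex.join(build_command("your_ill_login", "download.batch")))
```

Because the same batch file can be run again after an interruption, the '-a' flag lets a rerun continue partially transferred files instead of starting over. If only password authentication is available, typing the same commands in an interactive session remains the fallback.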
In case of difficulties, please contact data(at)ill.eu
Note: This service is also useful if you want to upload reduced or analysed data into the "processed" folder of your experiment.
Rsync, GridFTP, ...
Transferring very large data volumes (multi-TB datasets) over the internet could be tedious even in 2017, so please do not hesitate to contact us if you have difficulties with the standard services provided. More specific solutions can be offered on demand.