Computing for Science

The Computing for Science (CS) group supports ILL scientists, students and visitors in a number of activities including data analysis, instrument simulation and sample simulation.

Exporting data from ILL

NEWS

The following note remains valid for moving large volumes of data from ILL. You may also use IDA - Internet Data Access from a web-browser to extract data from the raw data archives to your local system.

Summary

"zip" is proposed as a good compression and file-packaging utility for transferring data files from ILL to both Unix and non-Unix systems. Documentation on the use of zip and unzip is available here. Users are advised to prepare their transfers correctly at ILL, since only certain versions of zip (e.g. the Unix version) have all features available.

 

The firewall at ILL intercepts direct ftp transfers with the outside Internet, requiring a modified procedure which can only be used while logged on at ILL. Current executable copies of zip and unzip for most systems can be found, for example, at the Sun Archive Site.

Introduction - Means, Media and File-names

The widespread adoption of Unix workstations over the past few years has led inevitably towards the exclusive use of ASCII (text) files, both as a raw data storage format and for treated results. The added storage overhead is acceptable, and the primary advantage is that no specialised browser program is needed: the files can be typed or printed directly.

 

Different computer systems, unfortunately, have different ways of signalling the end of each line. On Unix systems the new-line character (LF, written \n) is used. On Macintoshes the carriage-return (CR) character is used. On DOS-related PCs the terminator is (CR)(LF). On OpenVMS a proprietary variable-length record format is usually used, in which each record is preceded by its length as a sixteen-bit number, though the system also seems happy to accept any of the above conventions (tested from version 5.5 onwards).
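The conventions described above can be made visible, and converted, with standard Unix tools. The following is a minimal sketch rather than an ILL-provided utility: the function names are hypothetical, and a POSIX shell with awk, tr and od is assumed.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the line-ending conventions.
# Each one reads text on standard input and writes to standard output.

# Unix (LF) -> DOS (CR LF): end every line with CR LF.
lf_to_crlf() {
    awk '{ printf "%s\r\n", $0 }'
}

# DOS (CR LF) -> Unix (LF): delete every carriage return.
crlf_to_lf() {
    tr -d '\r'
}

# Unix (LF) -> Macintosh (CR): replace each LF by CR.
lf_to_cr() {
    tr '\n' '\r'
}

# Dump the converted bytes so that the terminators can be inspected.
printf 'run 1\nrun 2\n' | lf_to_crlf | od -c
```

These helpers are only meant for text files. Applying them to binary data, including zip archives, would corrupt it.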

 

When transferring data to a non-Unix system, the simplest method for a small volume of data is to use network services. ftp, the file transfer protocol, when set to ascii mode delivers a file correctly formatted for the receiving system. Thus, even within ILL, a transfer using ftp to a Macintosh or PC will allow diskettes to be written in a readable format appropriate to each system. Users are strongly advised to avoid the unreadable confusion which results from copying text files on a Macintosh to a diskette which has been formatted for a PC.

 

When the data volumes, and also the number of files, are larger, some automation of the transfer process is desirable. Because most data are simply regular arrays of numbers, the files can also be reduced dramatically in size (by a factor of between 3 and 5) by compression techniques. The zip program has been chosen as a means to achieve these two effects while remaining compatible with a wide range of different systems.

 

A final consideration before transferring many files to remote systems concerns the compatibility of file names across heterogeneous systems. For historical reasons many ILL programs produce files which have a simple file name, a single dot separating it from a short extension, and no embedded blanks. Today the DOS 8.3 format can be taken as an acceptable base standard for all systems. The full directory path specification is in general unique to each system type. Unless one is transferring files between similar system types, it is easier to transfer the contents of individual directories as separate zip files.
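Before packing a directory it can be useful to see which file names do not fit the 8.3 pattern. The sketch below is one possible check, not an ILL tool: the function name is_dos83 is hypothetical, and the set of accepted characters (letters, digits, underscore and hyphen) is a deliberately conservative assumption that is stricter than DOS itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a file name fits the DOS 8.3 pattern,
# i.e. 1 to 8 characters, optionally followed by a dot and 1 to 3
# characters, with no embedded blanks.
is_dos83() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?$'
}

# Report the names in the current directory that would need renaming.
for f in *; do
    [ -e "$f" ] || continue
    is_dos83 "$f" || printf 'not 8.3: %s\n' "$f"
done
```

Six-digit raw data names such as 014500 pass this check, as do names such as 014500.dat.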

The simplest example

zip outfile *.dat
zip outfile *.f

The first command creates the archive outfile.zip (or adds files to an existing archive) and puts all xxxxx.dat files into it after compression (the shell ignores files starting with a period). The second command adds the xxxx.f files. If a file name is given which is already in the archive, the earlier copy is replaced. No subdirectories are included in this case.

Example for creating a PC compatible zip file

zip -jkl outfile *

This creates outfile.zip in MSDOS fashion. The switches have the following actions:

-j   Stores only the file names, not the full paths.
-k   Attempts to convert names (and paths) to MSDOS conventions, marking each file as if it were written under MSDOS. Compatible with PKUNZIP.
-l   Translates the Unix end-of-line character LF into the MSDOS convention CR LF.

Remember to transfer zip files to the remote site using ftp in BINARY mode
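Since a damaged archive is only discovered at the destination, it may be worth testing the archive before it leaves ILL. The following sketch wraps the command above and then runs unzip's integrity test. The name pack_for_pc is hypothetical, and the Info-ZIP versions of zip and unzip are assumed to be installed.

```shell
#!/bin/sh
# Sketch only: pack_for_pc is a hypothetical helper name, not an ILL tool.
# Usage: pack_for_pc archive file...
pack_for_pc() {
    archive=$1
    shift
    # -j drop paths, -k MSDOS-style names, -l convert LF to CR LF
    zip -jkl "$archive" "$@" || return 1
    # -t checks every member against its stored checksum, -q keeps it quiet
    unzip -tq "$archive"
}

# Example call, left commented out so that nothing is packed by accident:
# pack_for_pc outfile.zip *.dat
```

The function returns a non-zero status if either the packing or the test fails, so it can be used in a script that only starts the ftp transfer on success.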

Raw Data

File names were deliberately chosen as six digits to allow data-compression utilities to add an extension as a suffix indicating the compressed status, while still keeping to the DOS and OpenVMS naming standards. Raw data from the current cycle and the preceding cycle are mostly kept on-line in uncompressed format. Older data are systematically compressed using the Unix compress utility (see gnucompress, etc. for other systems).

Export of long sequences of data is most simply performed by writing a permanent archive on a CDROM. A standard volume format, ISO9660, allows the disk to be read on most systems equipped with a CDROM drive.

The operators in ILL19 will help visitors write data, and include suitable de-compression utilities on the disk to resurrect data at the destination. Further information on these procedures may be obtained from the HELP-DESK (tel. 7013) si(at)ill.fr.

 

For small quantities of data it is possible to use zip, e.g. for a short sequence of SANS data:

zip -jkl myd11.zip /usr/illdata/data/d11/0145*
zip -jkl myd11.zip /usr/illdata/data/d11/01460*


These commands compress the raw ascii files for runs 14500-14599 first, then add the files for runs 14600-14609, into the file myd11.zip. For D11, a set of one hundred standard 4k datasets occupies about 700Kb when zipped.
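Shell wildcards such as 0145* only work when the wanted runs happen to share a prefix. For an arbitrary range, the six-digit file names can be generated explicitly and passed to zip through its -@ option, which reads one file name per line from standard input. The sketch below assumes uncompressed raw data files named by the zero-padded run number, as in the example above. The function name run_files is hypothetical, and run numbers must be given without leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: print the six-digit raw data file names for a
# range of run numbers.
# Usage: run_files directory first_run last_run
run_files() {
    dir=$1
    first=$2
    last=$3
    n=$first
    while [ "$n" -le "$last" ]; do
        # Pad the run number with zeros to six digits, e.g. 14500 -> 014500
        printf '%s/%06d\n' "$dir" "$n"
        n=$((n + 1))
    done
}

# Example, commented out because it needs access to the ILL data tree:
# run_files /usr/illdata/data/d11 14500 14609 | zip -jkl myd11.zip -@
```

With the range 14500 to 14609 this lists the same 110 files as the two wildcard commands.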

Remember to transfer zip files using ftp in BINARY mode
