The Computing for Science (CS) group supports ILL scientists, students and visitors in a number of activities including data analysis, instrument simulation and sample simulation.


HOW TO USE THE ILL CLUSTER

Remy Mudingay (SI) and Mark Johnson (CS)

September 2010

 

Please read this document AND contact either of the authors before using the cluster.

 

History

Over several years, the CS investment budget, with some contributions from other groups, was used to establish a small computational cluster comprising a range of machines from 2-processor nodes through to blades with 16 cores (referred to as AMD, FS, Blades, etc.). These machines were connected via Ethernet, which allowed several nodes to be used for VASP/DFT calculations, but most other codes, such as DL_POLY and NAMD, could not be run efficiently outside one node.

In 2009, the CS budget was used to purchase a new cluster consisting of 43 8-core nodes connected by a fast InfiniBand network. At the end of 2011 and of 2012, 12-core and 16-core machines respectively were added (see table below).

There are currently 81 compute nodes giving 780 cores, which imposes the way of working described below.

 

Current set-up: disk usage, SGE queuing system and queues

The cluster can now ONLY be used via a queuing system, the Sun Grid Engine (SGE, also called Oracle Grid Engine, OGE).

Users can ONLY log in to one machine: “master.ill.fr”.
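
For example, assuming your normal ILL username (a placeholder below), a session is opened with:

    ssh username@master.ill.fr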

The majority of user files should be kept in /archive/”username”; only files for current calculations should be in /users/”username”. The /users file-system is fast and must not be saturated, to ensure good performance.
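
As a sketch of this policy (the user and directory names below are placeholders), a typical workflow is to run the calculation in /users and move the results to /archive afterwards:

    # run the current calculation on the fast file-system
    cd /users/username/my_calculation
    # ... submit the job and wait for it to finish ...
    # then move the finished results out of /users to the archive
    mv /users/username/my_calculation /archive/username/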

All jobs are submitted to the SGE from “master.ill.fr”.

Different machine architectures are grouped together in queues.


Queue     Date       # nodes   # cores/node   Total # cores
amd16     Pre 2009       3         16              48
amd16m    Pre 2009       3         16              48
intel8    End 2009      43          8             344
intel12   End 2011       4         12              48
intel16   End 2012      16         16             196
bpc                     12          8              96
Total                   81                        780

 


The bpc queue belongs to the bureau d’études and is used mainly for reactor simulations.


Policy and best practices

 

All calculations needing a large number of cores can be run in the intel queues because of the fast InfiniBand network.

A reasonable maximum number of cores per job is 32.

In the amd16 queues, the number of cores requested should not exceed the number of cores per node, 16, so that the job runs within one node. The Ethernet network connecting these nodes is not fast enough to run such calculations efficiently across several nodes.
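
Purely as an illustration, a multi-core request is made through an SGE parallel environment; the environment name (“mpi” below) is site-specific and an assumption here, so check the ILL scripts or ask the authors for the correct one:

    # request 32 slots (the recommended maximum per job) in an intel queue
    qsub -q intel8.q -pe mpi 32 MySubmitFile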

Tools for inspecting the status of the cluster

 

1/ web pages: http://master.ill.fr

The four items in the information section are useful.

“Grid queues” gives a summary of the queues, allowing one to see which queues are full/available.

“Cluster load” gives a visual display of the cluster use. This will be improved to show information per queue.

“Grid engine” shows textual information about all nodes. It is useful to see the “load” of a node e.g. 8 for an “intel8” node, which shows that it is fully loaded but not overloaded. If the load is greater than 8 there is a problem with the SGE because too many processes are running on one node.

2/ at the command line on “master.ill.fr”

qstat: shows jobs submitted or running

qstat -f: shows the full listing, including every queue and the jobs in it
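
Other standard SGE commands are also available on “master.ill.fr” (this is standard SGE behaviour rather than ILL-specific configuration):

    qstat -u username    # jobs belonging to one user
    qstat -j jobid       # detailed information about a single job
    qhost                # load and memory of every compute node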

3/ qmon (graphical monitor)

Type “qmon” at the command line. The display gives access to many aspects of the SGE.

Submitting a job to the queue: general


1/ choose which queue to use

2/ prepare a submit file e.g. MySubmitFile (a minimal sketch is given after these steps)

3/ send the job to a queue e.g. intel8.q

    qsub -q intel8.q MySubmitFile
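
A minimal submit file might look like the sketch below; the job name and the program line are placeholders, while the “#$” lines are standard SGE directives:

    #!/bin/bash
    #$ -N myjob              # name of the job as shown by qstat
    #$ -cwd                  # run the job from the directory of submission
    #$ -j y                  # merge standard output and standard error
    ./my_program > my_program.out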

Use qstat to see which node receives the job. If more than one node is used there will be a file written in the local directory by the SGE indicating the nodes. This information can also be seen at the bottom of the “grid engine” web page (http://master.ill.fr/sge/).

CHECK that the nodes used are not overloaded (see above).

To delete a job, use ‘qdel jobid’ where jobid is the number of the job given by qstat.

Submitting VASP, DL_POLY, NAMD jobs

 

Use and edit the appropriate scripts from /home/cs/model/cluster_utilities; a generic sketch of such a script is given below.
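
Those scripts are the reference. Purely as an indication of their shape (the parallel-environment name, slot count and executable below are assumptions, not the ILL settings), a parallel submit script typically looks like:

    #!/bin/bash
    #$ -N vasp_job           # job name (placeholder)
    #$ -cwd                  # run from the submission directory
    #$ -pe mpi 16            # parallel environment and slot count: check the ILL scripts
    # NSLOTS is set by the SGE to the number of slots actually granted
    mpirun -np $NSLOTS vasp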

Submitting a Materials Studio job

 

Install the Materials Studio client if necessary (Windows only).

From the “Tools/Server Console”, define a new gateway to “master.ill.fr”.

From the chosen calculation module (e.g. CASTEP or FORCITE) use the “job control” panel to select the gateway “master.ill.fr” and the appropriate queue from the next drop-down menu.

Use the tools mentioned above to check where the job is running and that it is running properly.

 

 

 

 

Current set-up: additional machines

 

Cluster: previously the head node of the cluster; license server for Materials Studio.