
ILL cluster

Description

The ILL cluster is composed of 7 queues:

  • q.2010_08
  • q.2012_16
  • q.2014_16
  • q.2015_16
  • q.2020_32 
  • q.nbpc_16
  • q.test

The cluster is managed via PBS batch jobs. Few constraints are enforced, but users should comply with the cluster etiquette, which is recalled at each connection.

1. Prepare the input files for your job. Consult the code documentation to check your memory/CPU needs
   (some codes provide utilities for this) and how well the code scales with the number of CPUs.
   If this is not possible, remember that, as a rule of thumb, our experience shows that for most of the
   software installed on the cluster, using more than 64 processors is rarely efficient. So if you plan or need
   to use a larger number of nodes, first test how your calculation scales with the number of requested CPUs.

2. Check the cluster load (with the command qload or at http://masterp.ill.fr:8080/sysmgt/mainframeset.jsp).

3. If possible, run a short test of your job on a single node. This is useful to check that everything
   works and to get a reference for the expected running time on a single node.

4. Select a queue and a number of nodes and launch your job. Please try to follow these guidelines:

  4.1 If other queues are available and your job does not require a large amount of memory, avoid using
      the queue q.2015_16. The nodes in this queue have much more memory than those in other queues, so
      it is better to reserve them for jobs that really need such large memory.
     
  4.2 Request a SINGLE node in queues q.2014_16 and q.2020_32. At present, using more than one node
      in these queues is highly inefficient or even counterproductive. If you have some examples of a
      particular software or job that scales well there, please inform us about it.
     
  4.3 Request full nodes for your job, i.e. ask for 8 processors per node when using q.2010_08, 16 for
      queues q.2012_16, q.2014_16 and q.2015_16, and 32 for q.2020_32. Pay particular attention to this
      when submitting Materials Studio jobs to the cluster.
     
  4.4 The only hard limit applied on our cluster is that a user cannot have more than 3 jobs running at the
      same time on a given queue. Additional jobs will be queued. There is no limit on the number of nodes
      per job, but take the cluster load into account before requesting a very large number of nodes,
      in particular if you are going to run long jobs (> 1 week). It is fine to have a long job running
      on several nodes. It is also possible to have more than one long, large job running at the same
      time if there are enough nodes available, but in this case you may be asked to stop the
      additional jobs if other users' jobs start to accumulate in the queue.

5. Regularly check whether your job is still running, whether it is producing useful results, and whether the
   speed of execution is as expected. Delete stalled jobs, and reduce the number of nodes used by a job if you
   realize that it scales too poorly to justify using so many nodes.

6. Regularly clean up your space on the cluster disk. The disk capacity is 19 TB, so as a rule of thumb you
   should never occupy more than 500 GB, and preferably much less.
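
Some of the numerical rules above (full-node sizes in 4.3, the scaling checks in items 1 and 5, and the 500 GB guideline in item 6) can be sketched as small shell helpers. This is only an illustration: the function names are hypothetical and are not part of the cluster tools, and queues not mentioned in 4.3 (q.nbpc_16, q.test) are deliberately left out rather than guessed.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are hypothetical
# and are not provided by the cluster.

# Print the number of processors of one full node for a queue,
# using the values listed in item 4.3. Queues not covered by 4.3
# (q.nbpc_16, q.test) are rejected instead of being guessed.
cores_per_node() {
  case "$1" in
    q.2010_08) echo 8 ;;
    q.2012_16|q.2014_16|q.2015_16) echo 16 ;;
    q.2020_32) echo 32 ;;
    *) echo "no full-node size known for queue: $1" >&2; return 1 ;;
  esac
}

# Print the parallel efficiency in percent: 100 * T(1) / (n * T(n)).
# Arguments: run time on one node, number of nodes n, run time on n nodes
# (both times in the same unit).
parallel_efficiency() {
  awk -v t1="$1" -v n="$2" -v tn="$3" \
    'BEGIN { printf "%.0f\n", 100 * t1 / (n * tn) }'
}

# Succeed if a disk usage given in whole GB exceeds the 500 GB
# rule of thumb from item 6.
over_disk_guideline() {
  [ "$1" -gt 500 ]
}
```

For example, cores_per_node q.2020_32 prints 32, and parallel_efficiency 3600 4 1200 prints 75 (a job taking 3600 s on one node and 1200 s on four nodes runs at 75% efficiency). The usage value for over_disk_guideline could be obtained with something like du -s --block-size=1G "$HOME" | cut -f1, assuming GNU du is available on the cluster.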

 

Connection

On a Unix machine, the cluster can be accessed via ssh:

ssh username@masterp.ill.fr
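
To avoid retyping the full address, an entry in ~/.ssh/config can define a short alias. This is a minimal sketch, assuming an alias named illcluster (a hypothetical name chosen for illustration) and the same placeholder username as in the command above:

```
# ~/.ssh/config
# "illcluster" is a hypothetical alias; replace username with your own login.
Host illcluster
    HostName masterp.ill.fr
    User username
```

With such an entry, typing ssh illcluster is equivalent to the full command above.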

On Windows, one can use either an ssh client or the command line.

NB: to connect to the cluster, one needs to be on the ILL network or connected through a VPN. It is also possible to connect via a VISA instance: https://visa.ill.fr/home

Installed software

Installed packages are found in /softs/applications.

Example scripts are available in /softs/common/examples

How to launch a job on the cluster

TODO