This section is meant only for LAPP and LAPTH users
MUST reference in your publications
In accordance with the principle adopted at a previous steering committee, please include the following sentence in any publication whose results rely on the computing and storage facilities offered by the méso-centre de calcul of the Université Savoie Mont Blanc: "Ce travail a été réalisé grâce aux services offerts par le méso-centre de calcul MUST de l'Université Savoie Mont Blanc".
Please do not forget to add a reference to MUST in your publications if you use the cluster computing facilities to obtain your results: "This work has been done thanks to the facilities offered by the Université Savoie Mont Blanc MUST computing center".
Your message will be handled by the LAPP computing department helpdesk (Request Tracker). Please avoid contacting a member of the IT department directly; use the following generic address instead: LAPP-LAPTh IT support
>>> LAPP IT Department documentation
Some documentation can also be found here : LAPP IT department documentation
>>> Available CPUs
You can check the load of the system on the MUST monitoring Web page.
>>> Available lapp_data disk space
This link is only accessible from the LAPP network: http://lapp-quattor/Monitoring/lapp_data-quota.html
User interface machines (UI)
As a LAPP or LAPTH user, you have access to the following user interface machines :
To log in to a machine : ssh <user_name>@lappsl6.in2p3.fr or ssh <user_name>@lapthsl6.in2p3.fr
These machines are user interfaces (UI) to which you log in in order to prepare and submit your jobs to the computation farm. Their characteristics are identical to those of the computing machines.
Technical characteristics :
- OS : SL6 - 64 bits
- compilers : C, C++, Fortran 77/90
- OPENMPI, MPICH1, MPICH2 (parallel computation)
- libraries : Blas, Lapack
In case you need any other compiler and/or library, send an e-mail to LAPP-LAPTh IT support
Developing/testing batch jobs
The environment of the lappsl6 and lapthsl6 machines is identical to that of the computing machines, so you can compile and test your jobs on these machines before submitting them to the computation farm.
Submitting batch jobs
>>> Very very very important hints
|It is not possible to access your home directory (/home1 or /home3) from the computation machines|
If you have created a symbolic link to /lapp_data from your home directory it will not be possible to use this link in your batch job.
Always submit your jobs from your /lapp_data/... working directory
If you specify an output file name in the qsub command (-e, -o options), this file must be in your /lapp_data/... working directory
>>> Batch setup
Before submitting jobs you need to initialize some Linux environment parameters:
- A basic setup file can be found here : setup file
You simply need to define the LAPP_APP_SHARED parameter at the top of the file
If you don't specify in which directory your job should work, the default one will be $PBS_O_HOME
If your job performs a lot of I/O while running, it can be more efficient to read/write your files locally on the worker node rather than in your working directory. In this case you can use the WN local directory $TMPDIR to read/write your temporary data files, as shown in the sketch below.
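A minimal sketch of this pattern inside a batch script (the /lapp_data path, input/output file names and program name are hypothetical placeholders; adapt them to your own working directory):
# --- sketch: use the worker node local directory $TMPDIR for heavy I/O ---
cp /lapp_data/<your_group>/<your_dir>/input.dat $TMPDIR/                 # copy the input file to the local disk
cd $TMPDIR                                                               # work on the local disk of the worker node
/lapp_data/<your_group>/<your_dir>/my_program input.dat output.dat      # the program reads/writes locally
cp output.dat /lapp_data/<your_group>/<your_dir>/                        # copy the result back before the job ends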
>>> How to submit a job : qsub
Use the qsub command to submit your job to the computation farm:
>> qsub -V -j oe -o sOutputFileName -M mailing_address -m aeb -l walltime=01:00:00,mem=512mb sBatchFileName.sh
where:
- sOutputFileName is the output log file name; standard output and error are redirected to sOutputFileName via the "-j oe -o ..." options
- -M <your_email_address> -m aeb (a: abort, e: end, b: begin): if you wish to receive an e-mail in case of error/end of job, you have to specify both the -M and -m options
- -l walltime is the maximum time (hours:minutes:seconds) and mem is the memory
- sBatchFileName.sh is the file you want to execute on the computing machines
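As an illustration, a minimal batch file might look like the following sketch (the setup file location, the /lapp_data path and the executable name are hypothetical placeholders):
#!/bin/bash
# sBatchFileName.sh : minimal batch job sketch
source /lapp_data/<your_group>/<your_dir>/setup.sh    # the environment setup file mentioned above (hypothetical location)
cd /lapp_data/<your_group>/<your_dir>                 # always work under /lapp_data, not under your home directory
./my_program > my_program.log 2>&1                    # run your executable; my_program is a placeholder
It would then be submitted from the same /lapp_data working directory, with the output file also located there:
>> cd /lapp_data/<your_group>/<your_dir>
>> qsub -V -j oe -o /lapp_data/<your_group>/<your_dir>/job.log -M <your_email_address> -m aeb -l walltime=01:00:00,mem=512mb sBatchFileName.sh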
By default, walltime is set to 56 hours. It can be checked using the following command:
>> qmgr -c 'list queue local' | grep walltime
If you specify a different walltime, your job will be stopped if its execution time exceeds the defined walltime.
The maximum time allocated to a job is 56 hours. Every job will be killed beyond this time.
>>> Job memory and CPU :
The information relating to job CPU/memory consumption is not appended automatically to the log file. The user has to add the options -M <your_email_address> -m aeb when submitting the job in order to receive this information by mail when the job is done.
No mail is sent to the user if a job is killed because it exceeds the walltime limit
>>> Test queue : option -q flash
A specific job queue is dedicated to tests. The maximum walltime allocated to these jobs is 5 minutes.
To use this queue, add the following option to your qsub command : -q flash
This queue is not intended for intensive use by short jobs
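For example, a quick test of the batch file sketched above could be submitted to the flash queue from your /lapp_data working directory (the file names are the same hypothetical placeholders as before):
>> qsub -V -j oe -o test.log -q flash sBatchFileName.sh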
>>> MPI jobs (multiprocessor jobs)
There are 3 MPI libraries available on the cluster, MPICH-1, MPICH-2 and OpenMPI.
The maximum number of processors allocated to a job is 32
To run an MPI job, you need to create a script including lines like the following ones :
CPU_NEEDED=8 # this parameter is the number of CPUs (from 2 to 32) to be reserved for parallel execution
$MPI_OPENMPI_PATH/bin/mpicc -o <your exe> <your source> # if you wish to compile your code before running it
$MPI_OPENMPI_PATH/bin/mpirun -np $CPU_NEEDED <your exe> # to run the executable on the number of CPUs you just asked for
: If your MPI jobs are really I/O intensive, you can use a specific set of computing nodes to improve performance. These nodes are connected within a dedicated Infiniband network, and jobs can be submitted to these nodes by adding the option '-q localIB' to the usual 'qsub -lnodes=....' command.
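Putting these pieces together, an MPI job script might look like the following sketch (the source and executable names and the /lapp_data path are hypothetical placeholders; the OpenMPI paths come from the environment setup mentioned above):
#!/bin/bash
# mpi_job.sh : sketch of an MPI batch script using OpenMPI
CPU_NEEDED=8                                                # number of CPUs to reserve (from 2 to 32)
cd /lapp_data/<your_group>/<your_dir>                       # work under /lapp_data
$MPI_OPENMPI_PATH/bin/mpicc -o my_mpi_exe my_mpi_code.c     # compile the code before running it
$MPI_OPENMPI_PATH/bin/mpirun -np $CPU_NEEDED ./my_mpi_exe   # run on the requested number of CPUs
It could then be submitted with a node request matching CPU_NEEDED (the exact -lnodes syntax follows the usual qsub command mentioned in the note above), optionally to the Infiniband nodes for I/O-intensive jobs:
>> qsub -lnodes=8 mpi_job.sh
>> qsub -q localIB -lnodes=8 mpi_job.sh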
>>> Jobs with licensed software
Several licensed software packages are currently available on the computation farm for LAPP & LAPTH users :
- Abaqus (for Meca users)
- Samcef (for Meca users)
They have been installed in the /grid_sw/software/softs_cc/ directory. The default binaries that can be used are :
An additional module to Mathematica for High Energy Physics (FeynCalc) has been installed in /grid_sw/software/softs_cc/Mathematica/6.0/AddOns/Applications/HighEnergyPhysics
: For further information on all available versions, please refer to this web page.
: If you wish to run another licensed software on MUST, please contact LAPP-LAPTh IT support and we will let you know if it is possible.
Monitoring batch jobs
>>> Job monitoring : showq
The showq command shows the list of all jobs submitted to the computation farm.
This command displays the list of running jobs, the batch farm activity and the list of idle jobs.
For each job, one can get information about the job identifier, the user name, the number of processors used by the job, and the start and remaining times.
>> showq
ACTIVE JOBS ----------------------------------------------------------------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
52800               plokiju    Running     8     9:37:03  Sat Jun 30 19:39:16
53024               lhcb023    Running     1  1:02:27:48  Sun Jul  1 12:30:01
...
121 Active Jobs     128 of 160 Processors Active (80.00%)
                     32 of  32 Nodes Active      (100.00%)
IDLE JOBS ------------------------------------------------------------------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
53203               lhcb023       Idle     1  1:12:00:00  Sun Jul  1 20:28:17
53204               lhcb023       Idle     1  1:12:00:00  Sun Jul  1 20:28:18
53205               lhcb023       Idle     1  1:12:00:00  Sun Jul  1 20:28:33
Don't worry if your jobs don't appear instantly in the job queue or if they start at the end of the idle jobs queue
: Because the scheduler cannot deal with too large a number of jobs, the size of the idle jobs queue is limited, so some jobs may appear as "blocked" when the cluster is overloaded. This does not mean that these jobs were rejected by the scheduler; they will be processed automatically (switching to idle status) as soon as some CPUs are available.
>>> How to cancel a job : canceljob
>> canceljob <job_id>
To get the job ID, use the showq command defined above.
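For example, using the sample showq output above, the last idle job would be cancelled with:
>> canceljob 53205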
>>> Check user's priority : diagnose
The user's priorities can be checked via the diagnose command
>> diagnose -f
The output of this command is not really "human readable"...
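To make it a little easier to read, one option is to filter the output for your own user name (the name below is a placeholder):
>> diagnose -f | grep <user_name>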