LHCb Grid


There are basically two ways of running jobs on the Grid:

- the main (or common) one, driven by the Ganga software with the Dirac backend

- a freer way, still using Ganga, but allowing you to send jobs wherever and however you want.

You can visit the "support-applicatif" [1] twiki site for more information


How to begin on the Grid: Ganga

First of all you have to become a "Grid user"; only then can you get into the "Grid world".

As said before, there are basically two ways to send jobs on the Grid. Both use the Ganga software [2]. Ganga is very simple to use [3] and is shared by LHCb and ATLAS. Scripts that send jobs have to be written in Python [4]. You can also submit jobs interactively, but you will reach the limits of that approach quite quickly, which is why we advise you to use scripts directly. To execute them, just write:
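For example (myScript.py here is just a placeholder name), you can pass the script directly to the ganga command, or run it from an open Ganga session with execfile(), as done later on this page:

ganga myScript.py          # from the shell
execfile("myScript.py")    # from the Ganga prompt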

You can choose the release of Ganga you want to use in /grid_sw/software/ganga/. By default, in /lapp_data/lhcb/LHCBEnv.csh, the release /grid_sw/software/ganga/slc3_ia32_gcc323/ganga-4.4.11/install/4.4.11/bin/ganga is aliased to the command "ganga":

#ganga
alias ganga /grid_sw/software/ganga/ganga-4.4.11/install/4.4.11/bin/ganga

Ganga configuration

Before using Ganga for the first time:

/grid_sw/software/ganga/ganga-<Version>/install/<Version>/bin/ganga -g

The first time you run Ganga, it creates the directory $HOME/gangadir and the whole tree with

- repository
- workspace
- gui

In repository, your whole job history is stored. If you delete everything, the next time you start Ganga the number of your first job will be 0 again.

In workspace, you can find the inputs and outputs you've asked for. There you can check:

- all your job's characteristics
- why your job crashed
- the output of the application ...

To be consistent with each other in the LHCb group, we have all created a $HOME/gangadir/lhcb directory. There we put our:

- .opts and .sh files that we send with our jobs
- .py files that we execute in Ganga

Ganga configuration is applied in the following order:

  • The default configuration file is applied first:

LHCb : /grid_sw/software/ganga/ganga_<release>/install/<release>/python/GangaLHCb/LHCb.ini

  • Configuration parameters are then overwritten by those defined in your $HOME/.gangarc file

To overwrite the default /grid_sw/software/ganga/ganga_<release>/install/<release>/python/GangaLHCb/LHCb.ini, create your own file $HOME/GangaLHCb/LHCb.ini and initialize your own parameters. This is done when you source /lapp_data/lhcb/LHCbEn.csh or LHCbEnvII.csh.

If you don't do that, Ganga doesn't know all the LHCb projects. Just check the projects with

help
and then index

Configuration needed for the tutorial

* start ganga
* print config                              <= will show you all the parameter groups
* print config.LCG                          <= will show you, for example, the LCG backend configuration

* print config.LCG.VirtualOrganisation            <= will show your VO
* print config.LCG.GLITE_ENABLE                    <= will show the LCG software configuration
* print config.defaults_LCG.middleware             <= will show the default LCG environment

If config.LCG.VirtualOrganisation is different from lhcb, or config.LCG.GLITE_ENABLE is not set to True, or config.defaults_LCG.middleware is not set to GLITE, you need to change your configuration as described below.

When you launch Ganga it reads the file $HOME/.gangarc [5]. In this file you have to specify the name of your VO for voms (~line 750) and the WMS you want to use. Around line ~207 you need to change:

 voms                   => voms = lhcb
 GLITE_ENABLE = False   => GLITE_ENABLE = True
 middleware = EDG       => middleware = GLITE    
If you're using Ganga release 5.0.4 or above, you need to run
SetupProject <Application> <Version>

Ganga releases are installed at /grid_sw/software/ganga; you can choose the one you want to use:

/grid_sw/software/ganga/ganga-<version>/install/<version>/bin/ganga

and then proceed as you would on lxplus.

Ganga Releases

Releases up to and including 5.0.3 work with .opts files, later ones with .py files [6]

The above limitation applies only when you send Grid jobs with the Dirac or Local backend. With the LCG backend, both .opts and .py work.

Before sending a job you can check the variables you can fill in with:

j=Job()
print j

Hello world example

To submit a job with Ganga, you need to create a Python script describing your job and submit it with execfile("my_script.py").

Create a Python script Hello.py:

j = Job()
j.application = Executable(exe="/bin/echo", args=["hello world"])
j.backend=LCG()
j.submit()

And run the script in the Ganga environment: execfile("Hello.py")
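Once the script has run, the job object j stays available at the Ganga prompt, so you can follow the job from there (a minimal sketch; attribute names can vary slightly between Ganga releases):

jobs                 # overview of all jobs in your repository
print j.status       # submitted / running / completed / failed
print j.outputdir    # where stdout and stderr end up under $HOME/gangadir/workspace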

The corresponding JDL file (written by hand):

 Executable = "/usr/bin/echo";
 Arguments="Hello world";
 Requirements = "";                      <= no requirement
 MyProxyServer = "myproxy.cern.ch";
 InputSandbox = {};                      <= input data for job : no input 
 OutputSandbox = {                       <= retrieve job standard and error outputs
    "stdout",
    "stderr",
 };
 StdOutput = "stdout";                   <= name of job standard and error outputs
 StdError = "stderr";

 JobType = "NORMAL";
 RetryCount = 3;

Direct job submission (without Ganga)

edg-job-submit --vo lhcb <my_jdl_file>
edg-job-list-match --vo lhcb <my_jdl_file>      <= gives the list of all CEs matching the job requirements (a useful command if the job fails at submission with no output file)


The corresponding Ganga JDL file (it can be found in $HOME/gangadir/workspace/<user_name>/local/<job_number>/input):

VirtualOrganisation = "lhcb";
Executable = "__jobscript_29684__";
MyProxyServer = "myproxy.cern.ch";
StdError = "stderr";
InputSandbox = {
   "/home1/elles/gangadir/workspace/elles/LocalAMGA/29684/input/_input_sandbox_29684.tgz",
   "/home1/elles/gangadir/workspace/elles/LocalAMGA/29684/input/_input_sandbox_29684_master.tgz",
   "/home1/elles/gangadir/workspace/elles/LocalAMGA/29684/input/./__jobscript_29684__"
};
ShallowRetryCount = 10;
Environment = {
   "GANGA_LCG_VO='lhcb'",
   "LFC_HOST='lfc-lhcb.cern.ch'"
};
RetryCount = 3;
OutputSandbox = {
   "stdout.gz",
   "stderr.gz",
   "__jobscript__.log"
};
StdOutput = "stdout";
JobType = "NORMAL";

Compiling or linking a library

When you arrive on a cluster the environment is set up by the job, so you can work in two different ways:

- compiling on this cluster

- using the existing executable

The first option can be risky because your code may not compile on that site. The other problem is that you have to compile for each job, which can take a huge amount of time.

If you want to link a library, for example LibDaVinciUser.so, you need to put it into your inputsandbox in Ganga (see the sketch below).
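A minimal sketch of what this looks like in a job script, following the job configuration used later on this page (the file names here are only examples):

j = Job()
j.application = Executable(exe="DaVinci_v19r7.sh")              # your wrapper script
j.inputsandbox = ["DaVinci_v19r7.opts", "LibDaVinciUser.so"]    # options file + user library shipped with the job
j.backend = LCG()
j.submit()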

Exercises

Send the same DaVinci job as the one run interactively, this time with the Dirac backend:

- choose the latest release of Ganga

- build a simple .py to send your job (a sketch of what such a script could look like is given below)

- here is the answer [7]
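A hedged sketch of such a submission script, assuming the DaVinci application handler and the Dirac backend provided by GangaLHCb (attribute names may differ between Ganga releases; the answer in [7] is the reference):

j = Job()
j.application = DaVinci(version="v19r7")          # assumed DaVinci handler from GangaLHCb
j.application.optsfile = "DaVinci_v19r7.opts"     # your DaVinci options file
j.backend = Dirac()
j.submit()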

"Free way" running Jobs (using the LCG backend)

This is an alternative way of running jobs, because LHCb policy only allows analysis jobs to run on Tier-0 or Tier-1 sites with the Dirac backend. But as LAPP users we have advantages in running here, which is why we created this way of sending Grid jobs using the LCG backend. Moreover, we get better data access running on the lapp-SE here than on the whole Grid: running on the whole Grid, the data-access protocols are more complicated than running on a cluster where your data already are.

First you have to copy these lines [8] into the file $HOME/gangadir/lhcb/ConfigJobOptionFile.py (which you have to create), because, as we will see, it is imported by every executable .py:

import os,commands,sys
import ConfigJobOptionFile
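The content of ConfigJobOptionFile.py itself is the code linked in [8] and is not reproduced here. As a rough idea of what it provides, here is a purely hypothetical minimal sketch of the ReplaceInstance() helper used further down this page (the file in [8] remains the reference):

class ConfigJobOptionFile:
    # Hypothetical sketch: rewrite every line starting with sKey as "<sKey><sValue>"
    def __init__(self, sFileName):
        self.sFileName = sFileName
    def ReplaceInstance(self, sKey, sValue):
        f = open(self.sFileName)
        lines = f.readlines()
        f.close()
        f = open(self.sFileName, "w")
        for line in lines:
            if line.lstrip().startswith(sKey):
                f.write(sKey + str(sValue) + "\n")
            else:
                f.write(line)
        f.close()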

Then, in your $HOME/gangadir/lhcb, we suggest you create a tmp directory where you put the "original" .opts [9] and .sh [10] files. These are modified when the .py is executed; the modified copies are stored in your $HOME/gangadir/lhcb and sent with your job in the input sandbox.

The .opts

The .opts files are the same option files used to run a normal job: on arrival on a cluster the application environment is set up, so the "#include ..." statements are recognised.

As we will see with the .py files, the .opts can (and usually has to) be modified by the .py. It is used exactly the way it would be used to launch jobs interactively on lxplus or lappsl.

An example of a complete DaVinci.opts (without any include) can be seen in [11]. Basically, the .py changes only the last line of those files.

You can find examples of .opts files for Gauss [12], Boole [13] and Brunel [14] too.

The .sh

The .sh is the bash script which is executed on arrival on the cluster. One example (still for a DaVinci job) is presented here [15].

The .sh is structured as follows:

1) Copy your executable <Application_Name>.exe, the library(ies) you want to use if needed, and your input(s).

The InstallArea produced when you compile your application holds the libraries, and your executable is in $HOME/cmtuser/<Application_Name>_<Version>/<Package_Name>/<Application_Name>/<Version>. You have to tar both of them:

tar -zcvhf InstallArea.tar.gz InstallArea
tar -zcvhf <Application_Name>.tar.gz <Application_Name>

The h option is added so that symbolic links are followed when copying those files.

You then copy and un-tar your library and your executable on the cluster:

commandCpyInstallArea="rfcp /dpm/in2p3.fr/home/lhcb/<Your_Library>.tar.gz ${PWD}/<Your_Library>.tar.gz"
${commandCpyInstallArea}
tar zxf InstallArea.tar.gz
commandCpyExecutable="rfcp /dpm/in2p3.fr/home/lhcb/<Your_executable>.tar.gz ${PWD}/<Your_executable>.tar.gz"
${commandCpyExecutable}
tar zxf DaVinci.tar.gz

2) Copy your input(s) onto the cluster. There are two ways to do it, depending on where your input(s) come from.

For this you have to export the catalogue you want to use (the CERN one is shown here).

#input on the Grid (using the LFN)
export LFC_HOST=lfc-lhcb.cern.ch
echo $LFC_HOST
commandSRM="lfn:<LFN_Name_of_the_input>"
SRM=${commandSRM}
PFN=file://$PWD/<Input_Name>
lcg-cp --vo lhcb $SRM $PFN
#input on the lapp-SE
SRM="srm://lapp-se01.in2p3.fr/dpm/in2p3.fr/home/lhcb/Gael/Bd_JPsiPi0/BdJPsiPi0_401.dst"
PFN=file://$PWD/BdJPsiPi0_401.dst
lcg-cp --vo lhcb $SRM $PFN

3) On arrival on the cluster, the environment of your application has to be set up

(good luck reading it) [16]. Most of this .sh was obtained by running a Grid job with Ganga and the Dirac backend on lxplus.

4) In order to execute your job with your library(ies)

You have to include them:

export LD_LIBRARY_PATH=$sWorkingDir/InstallArea/${PLATFORM}/lib:${LD_LIBRARY_PATH}
commandSetAppRoot="export DAVINCIROOT=$PWD/DaVinci/v19r7"
${commandSetAppRoot}

5) Link your executable

It is set and its existence is checked:

APPLICATIONEXE="$DAVINCIROOT/slc4_ia32_gcc34/DaVinci.exe"
if [ -f ${APPLICATIONEXE} ]; then
   echo 'APPLICATIONEXE set to '${APPLICATIONEXE}
else
   echo 'Application executable '${APPLICATIONEXE}' not found'
   exit
fi

6) and your Option File:

OPTFILE="$sWorkingDir/DaVinci_v19r7.opts"
if [ -f ${OPTFILE} ]; then
   echo 'OPTFILE set to '${OPTFILE}
else
   echo 'Options file '${OPTFILE}' not found'
   exit
fi

7) Go to the directory from which you want to launch your job and launch it:

cd  $sWorkingDir/${APPLICATION}/${VERSION}/cmt
${APPLICATIONEXE} ${OPTFILE}

8) At the end of the job, copy the output(s):

#If you want to copy your output file(s) on the Lapp-SE you have to  
export LFC_HOST=lfc-lhcb.in2p3.fr
sSaveFileName="<Output_name>"
commandCpyFinal="rfcp $sOutputFile /dpm/in2p3.fr/home/lhcb/<Output_directory>/$sSaveFileName"

The .py

This is the file executed in Ganga using the command:

execfile("<Python_File_Name.py>")

As for the .sh, the .py is detailed below, following its content.

One example (still for a DaVinci Job) is presented here [17]

Global variables

These global variables will be used throughout the execution of the Python script.

Most of them are needed to change variables in the .sh and the .opts; the others configure your Grid job.

sMyExecutableVersion        =   "<Version>"
sMyExecutable               =   "<Application>"
sMyProject                  =   "<PROJECT>"
sMyProjectGroup             =   "<Package>"
sMyConfig                   =   "slc4_ia32_gcc34"
sMyLFNInputFile             =   "<Your_List_Of_LFNs>"
sPrefixOut                  =   "<Prefix_Of_OutputFiles>"
sMyOutputType               =   "<OutputFile_Type>"
sMyInputType                =   "<InputFile_Type>"
sMySRMDirectory             =   "/dpm/in2p3.fr/home/lhcb/<Tar_Directory>"
sMyPFNSave                  =   "/dpm/in2p3.fr/home/lhcb/<Lapp_SE_Save_Directory>"
#if sInputFrom == "lapp" your inputs are on the Lapp-SE, otherwise they are on the Grid and accessible by LFN
sInputFrom                  =   "lapp or something else"
sMyLFCHOST                  =   "<lfc_host>"
################ Tar files to bring
sMyExecutableFileName       =    "<Exec_tar_directory>"
sMyInstallAreaFileName      =    "<Library_tar_directory>"
sMyTarFile1                 =    "<Extra_tar_files>"
################ Jobs definition
arret                       =    Number of jobs
EvtMax                      =    Number of events
MaxCPU                      =    Max CPUs
CEList                      =    {<List_Of_Countries>,"selection":"selection"}
#the CE selected tells the RB (resource broker) where you want to run your job
CESelected                  =    "lapp" 
CESelection                 =    [CESelected+'*']
CEname                      =    CEList["selection"]

Getting the input list and looping over it

As said before, input files can have several locations, so different ways to access them exist. The protocols to copy input files registered on the Grid or on the Lapp-SE are different, and the access to those lists is also different. This is why the variable "sInputFrom" has been created.

To list files from the Grid, a .opts has been created with the Bookkeeping [18]. We get the LFNs of all the files we want as input and write them into a file (sMyLFNInputFile). This file is then read and we loop over the list:

f = open(sMyLFNInputFile)
for k in f.readlines():

For input files physically on the Lapp-SE, we can just do an "ls" (dpns-ls) on the directory of the inputs (sDpnsDirectory), split the list and loop over it:

sDpnsDirectory=sMyPFNInput
sCommand="dpns-ls "+sDpnsDirectory+" | grep "+sMyInputType
sRes=commands.getstatusoutput(sCommand)
if sRes[0]!=0:   # getstatusoutput returns (status, output); stop if the dpns-ls command failed
   sys.exit()
sDstFileList=sRes[1].split('\n')
for k in sDstFileList:

Splitting input names

Input names are split in order to change variables in the .opts and the .sh, so that those files are consistent with what you want to execute on the Grid. Here as well there are two ways of splitting the input names, depending on how you made your input file list.

For input files on the Lapp-SE:

if(sInputFrom=="lapp"):
   k=k.split('/')[-1]
   terminaison = event
   datanum = event
   prodname = k

Or on the Grid :

else:
   sNumberFormat="%05d"
   evt=int(sNumberFormat %event)
   event=int(evt)
   l=k.split(' ')[-3].split(':')[-1].split("'")[-2]
   dataname=l.split('/')[-1].split('.')[-2]
   datanum=dataname.split('_')[-2]
   prodname=dataname.split('_')[0]
   terminaison=dataname.split('_')[-1]
   sLFNDirectory=str(l.split('_')[-3])


Copy and change the .opts and .sh files

The .opts and .sh have to be changed because some variables in those files depend on the input, on the way you want to execute your job, and so on. You have to specify where they come from and where to copy them:

# The directory where the modified .opts and .sh will go, and the tmp directory holding the originals
sHomeDir="/home3/rospabe/gangadir/lhcb/"+sMyJobDir+"/"
sTmpDir="/home3/rospabe/gangadir/lhcb/tmp/"
sCommand="ls "+sHomeDir
sRes=commands.getstatusoutput(sCommand)
sMyExecutableOptsFileName = sHomeDir+sMyExecutable+"_"+sMyExecutableVersion+".opts"
sMyExecutableShellFileName = sHomeDir+sMyExecutable+"_"+sMyExecutableVersion+".sh"
sCommand="cp "+sTmpDir+sMyExecutableOptsFileName+" "+sMyExecutableOptsFileName
sRes=commands.getstatusoutput(sCommand)
sCommand="cp "+sTmpDir+sMyExecutable+"_"+sMyExecutableVersion+".sh "+sMyExecutableShellFileName
sRes=commands.getstatusoutput(sCommand)

Modifying outputs and input names

The output and input file names have to be modified and written into the new .opts and .sh files. Here we still have two different ways of naming:

if(sInputFrom=="lapp"): 
   sInputFile = k
   sNewOutputFile = sPrefixOut+str(evt)+".root"
   sNewOutputFile_SE = str(prodname)+'_'+str(datanum)+'_'+str(newtreminaison)+sMyOutputType
else:
   sInputFile = m
   sNewOutputFile = sPrefixOut+str(evt)+".root"
   sNewOutputFile_SE = m
   commandSetLFCHOST="export LFC_HOST="+sMyLFCHOST

The copy of the output file depends on where you execute your job (CESelected), and you specify where to copy it (sMyPFNSave/sSaveFileName):

if(CESelected == "lapp"):
   commandCpyFinal="rfcp $sOutputFile "+sMyPFNSave+"$sSaveFileName"
else:
   commandCpyFinal="lcg-cp file:${PWD}/"+sNewOutputFile+" srm://lapp-se01.in2p3.fr/"+sMyPFNSave+sNewOutputFile

Commands to copy and untar the executable and the library(ies)

We have to specify where the tar files are and the copy commands, because the original tar files have to be registered on the Grid in <sMySRMDirectory>. So the access to those files depends on where you execute your job (CESelected == "lapp" or somewhere else):

sExecutableFileName= sMySRMDirectory + sMyExecutableFileName +".tar.gz"
sInstallAreaFileName= sMySRMDirectory + sMyInstallAreaFileName +".tar.gz"
if(CESelected == "lapp"):
  commandCpyExecutable="rfcp "+ sExecutableFileName+" ${PWD}/"+sMyExecutableFileName+".tar.gz"
  commandCpyInstallArea="rfcp "+sInstallAreaFileName+" ${PWD}/"+sMyInstallAreaFileName+".tar.gz"
else :
  commandCpyExecutable="lcg-cp --vo lhcb  srm://lapp-se01.in2p3.fr/"+sExecutableFileName+" file://${PWD}/"+sMyExecutableFileName+".tar.gz"
  commandCpyInstallArea="lcg-cp --vo lhcb srm://lapp-se01.in2p3.fr/"+sInstallAreaFileName+" file://${PWD}/"+sMyInstallAreaFileName+".tar.gz"

Once on the cluster, we have to untar those files and specify where the executable and library(ies) are:

commandUntarExecutable="tar zxf "+sMyExecutableFileName+".tar.gz"
commandUntarInstallArea="tar zxf "+sMyInstallAreaFileName+".tar.gz"
commandSetAppRoot="export "+sMyProject+"ROOT=$PWD/"+sMyExecutable+"/"+sMyExecutableVersion
sAppExe="$"+sMyProject+"ROOT/"+sMyConfig+"/"+sMyExecutable+".exe"
sMyOptionFile=sMyExecutable+"_"+sMyExecutableVersion+".opts"
sOPTFILE="$sWorkingDir/"+sMyOptionFile

You have to configure your job to send it to the Grid

To configure your job you have to specify several things:

The application and its executable:

j.application=Executable()
j.application.exe=sMyExecutable+"_"+sMyExecutableVersion+".sh"
j.application.args=[]

The backend you want to send your job with:

j.backend=LCG()
   

What you want in the input sandbox (the .opts and .sh files) and in the output sandbox (the results of your job):

j.inputsandbox=[sMyExecutableOptsFileName,sMyExecutableShellFileName]
j.outputsandbox=["stderr","stdout"]

Requirements you ask for (CPU time, CE name, your VO):

  req=LCGRequirements()
  Software='"VO-lhcb-'+sMyExecutable2+"-"+sMyExecutableVersion+'"'
  if CEname=="undefined":
      req.other= [ CPUrequirement,
                   'other.GlueHostMainMemoryRAMSize > 512','Member('+Software+',other.GlueHostApplicationSoftwareRunTimeEnvironment)']
  else:
      req.other= [ CPUrequirement,
                   'other.GlueHostMainMemoryRAMSize > 512','Member('+Software+',other.GlueHostApplicationSoftwareRunTimeEnvironment)',
                   CErequirement
               ]

Replacements in the .opts and .sh files

All the variables defined above are used to modify the .opts and .sh. You replace some lines in the .opts and .sh picked up from your $HOME/gangadir/lhcb/tmp/, and copy the final files into your $HOME/gangadir/lhcb before sending them with your job in the input sandbox. Once more, some replacements depend on where you execute your job and on where your input(s) come from.

First the .sh:

pFile=ConfigJobOptionFile.ConfigJobOptionFile(sMyExecutableShellFileName)
pFile.ReplaceInstance("PROJECT=",sMyProject)
pFile.ReplaceInstance("VERSION=",sMyExecutableVersion)
pFile.ReplaceInstance("GROUP=",sMyProjectGroup)
pFile.ReplaceInstance("APPLICATION=",sMyExecutable)
pFile.ReplaceInstance("PLATFORM=",sMyConfig)
pFile.ReplaceInstance("export CMTCONFIG=",sMyConfig)
pFile.ReplaceInstance("sInputFile=",sInputFile)
pFile.ReplaceInstance("sOutputFile=",sNewOutputFile)
pFile.ReplaceInstance("sOutputFile_SE=",sNewOutputFile_SE)
pFile.ReplaceInstance("sSaveFileName=",sNewOutputFile)
pFile.ReplaceInstance("sExecutableFileName=",sExecutableFileName)
pFile.ReplaceInstance("commandCpyExecutable=",commandCpyExecutable)
pFile.ReplaceInstance("sInstallAreaFileName=",sInstallAreaFileName)
pFile.ReplaceInstance("commandCpyInstallArea=",commandCpyInstallArea)
pFile.ReplaceInstance("commandUntarExecutable=",commandUntarExecutable)
pFile.ReplaceInstance("commandUntarInstallArea=",commandUntarInstallArea)
pFile.ReplaceInstance("commandSetAppRoot=",commandSetAppRoot)
pFile.ReplaceInstance("APPLICATIONEXE=",sAppExe)
pFile.ReplaceInstance("OPTFILE=",sOPTFILE)
pFile.ReplaceInstance("commandSetLFCHOST=",commandSetLFCHOST)
pFile.ReplaceInstance("commandCpyFinal=",commandCpyFinal)
#sSRM= l
if(sInputFrom=="lapp"):
   sSRM='"srm://lapp-se01.in2p3.fr'+sDpnsDirectory+sInputFile+'"'
   pFile.ReplaceInstance("commandSRM=",sSRM)
   sPFN="file://$PWD/"+sInputFile
   pFile.ReplaceInstance("PFN=",sPFN)
else:
   sSRM='"lfn:/grid'+l+'"'
   pFile.ReplaceInstance("commandSRM=",sSRM)
   sPFN="file://$PWD/"+sInputFile
   pFile.ReplaceInstance("PFN=",sPFN)

Then the .opts file:

pFile=ConfigJobOptionFile.ConfigJobOptionFile(sMyExecutableOptsFileName)
#sInputFile = "./"+sInputFile
sText="DATAFILE='"+sInputFile+"' TYP='POOL_ROOTTREE' OPT='READ'"
pFile.ReplaceInstance("EventSelector.Input=",'{"'+sText+'"}')
sText="FILE1 DATAFILE='./"+sNewOutputFile+"' TYP='ROOT' OPT='NEW'"
pFile.ReplaceInstance("NTupleSvc.Output=",'{"'+sText+'"}')
sEvtMax=str(EvtMax)
pFile.ReplaceInstance("ApplicationMgr.EvtMax =",sEvtMax)


Submitting and stopping the loop

After having defined your job and changed your .opts and .sh, you can submit it to the Grid and choose when you want to stop the loop:

event=event+1
j.submit()
if event==arret:
   sys.exit()

Exercises

Do the same as you did interactively to reconstruct the Pi0.

The data are on the lapp-SE:

dpns-ls -l /dpm/in2p3.fr/home/lhcb/Gael/Bd_JPsiPi0/

You need DaVinci_v19r7.opts [19], the .sh [20] and the .py [21].

Then reconstruct the Pi0 for MinBias events, with the data on your account and on the Grid using the LFN.

The answers are at /lapp_data/lhcb/Tutorial/Grid
