ROOT Analysis

From LHCB-LAPP


This page explains several ways of working with the analysis of ROOT files. The analysis is done for different configurations, depending on:

  • the location of the data,
  • where your analysis takes place (where you work),
  • and how much data you want to analyze.

This analysis has been presented three times to different audiences; a summary of these presentations can be found here: [1]

Data can be stored on your own computer, in your HOME directory on Lappsl, or on the storage element (SE) of the LAPP cluster. Your analysis can also run in any of these three locations. The case of your own computer is not covered here; we only show the cases where the data and the analysis are on Lappsl or on the Grid.

The different configurations are shown in the following diagrams.

  • Analysis on Lappsl, with data on Lappsl or on the SE:

  • Analysis on Lappsl or on the cluster, with data on the SE:

PyROOT is introduced first, followed by some tools built specifically for this analysis.


PyROOT (a short introduction)

To use PyROOT you have to set up ROOT and its Python interface, which are both part of the LHCb releases. The ROOT versions can be found at /grid_sw/lhcb/lib/lcg/external/root, and the Python versions at /grid_sw/lhcb/lib/lcg/external/python. A tutorial has been made and can be found at [2]

  • First, here is the script (csh syntax) to set up the environment :
setenv ROOTSYS /grid_sw/lhcb/lib/lcg/external/root/5.18.00d/slc4_ia32_gcc34/root
setenv PYTHONDIR /grid_sw/lhcb/lib/lcg/external/Python/2.5/slc4_ia32_gcc34/
setenv PATH ${ROOTSYS}/bin:${PYTHONDIR}/bin:${PATH}
setenv LD_LIBRARY_PATH ${ROOTSYS}/lib:${PYTHONDIR}/lib:${LD_LIBRARY_PATH}
setenv PYTHONPATH ${ROOTSYS}/lib:${PYTHONPATH}
# CASTOR library
ln -s ${LCG_LOCATION}/lib/libdpm.so libshift.so.2.1 
setenv LD_LIBRARY_PATH ${PWD}:${LD_LIBRARY_PATH}
  • Then you can run PyROOT by writing a .py file with this header :
import ROOT
from ROOT import TFile, TH1F  # import whatever classes you need

There are two ways to open the files:

  • One after another (files of no more than 30 MB) with:
for file in ListOfFiles:
    input = TFile.Open(file)
    mytree = input.Get('TreeName')
  • Chaining your files with:
 chain = TChain('n1')
 for file in ListOfFiles:
     chain.Add(file)
 entries = chain.GetEntries()
  • Another way is to merge all your files into a single one; you can do it with a ROOT macro [3], which should be compiled with this Makefile [4], and a top job script in Python [5]. The drawback of this method is that your files must be numbered (File_1.root, File_2.root, ...); it does not matter if one number (or several) is missing.
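If you use the merging method, the numbered file list (with possible holes) can be built with a small helper. This is only a sketch; buildNumberedFileList is a hypothetical name, not part of the macros referenced above:

```python
import os

def buildNumberedFileList(sDirectory, sPrefix, iMaxNumber):
    """Build the list File_1.root ... File_N.root, skipping missing numbers."""
    sFileList = []
    for i in range(1, iMaxNumber + 1):
        sName = os.path.join(sDirectory, "%s_%d.root" % (sPrefix, i))
        # a missing number is simply skipped, as the merging macro allows
        if os.path.isfile(sName):
            sFileList.append(sName)
    return sFileList
```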


A special trick has been found to create a tree containing a branch with several leaves :

#Create a new file and a new tree
fNewFile = TFile("Name.root", "RECREATE")  # option can be NEW, RECREATE, UPDATE, ...
fNewTree = TTree("Name", "Name")
#Define a C struct describing the branch and its leaves
sClusters_struct = "struct cluster_t{\
    Int_t size;\
    Float_t eX[SIZE];\
};"
#Substitute the actual array size and register the struct with ROOT
sClusters_struct = sClusters_struct.replace("SIZE", str(self.nPoints))
gROOT.ProcessLine(sClusters_struct)
#Attach the structure to the tree as a branch
clusters = ROOT.cluster_t()
fNewTree.Branch('clusters', clusters, '@size/I:clusters.eX[' + str(self.nPoints) + ']/F')
#Fill the variables of the branch
clusters.size = self.nPoints
clusters.eX = array('f', range(self.nPoints))
for j in range(0, self.nPoints):
    clusters.eX[j] = float(self.Variables[0][0][j])
fNewTree.Fill()

Tools used for the distributed analysis

The basic script

As already said in the introduction, you can either work with your ROOT ntuples on the same machine where you analyze them, or open them remotely (with rfio). Both ways are shown in the following script.

  • Here is the basic script used to open the files and to fill a simple histogram with one of the variables of the ntuple :
def defineArray(sType, iSize):
    #create an array of the wanted type, initialized with zeros
    obj = array(sType)
    for f in range(0, iSize):
        obj.append(0)
    return obj

#list of files to chain (the list was built using "ls" in the wanted directory)
chain = TChain('n1')
for File in sFileList:
    #if you work with local files
    sRootName = '/lapp_data/lhcb/rospabe/Bs_JPsiEta/Prod/' + File
    #if you open remote files
    #sRootName = 'rfio:/lapp_data/lhcb/rospabe/Bs_JPsiEta/Prod/' + File
    chain.Add(sRootName)

#get the entries
entries = chain.GetEntries()

#declaration of the variables
R = array('l', [0])
chain.SetBranchAddress("R", R)

#define your array with defineArray, with the wanted type and size
BsM = defineArray('d', 20000)
chain.SetBranchAddress("BsM", BsM)

#get the number of events
nEvt = nEvt + chain.GetEntriesFast()

#loop over the events
for jentry in xrange(0, nEvt):
    nb = chain.GetEntry(jentry)
    #loop over the reconstructed Bs
    for j in range(0, R[0]):
        nBs += 1
        mBs.Fill(BsM[j] / 1000.)

#write the mean of the histogram to the output file
file.write("Mean Histo = " + str(mBs.GetMean()) + "\n")
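The list sFileList used above was produced with the shell command "ls"; as a sketch, it can also be built directly in Python. buildFileList is a hypothetical helper name, and the directory path is an assumption:

```python
import os

def buildFileList(sDirectory):
    """Return the sorted list of ROOT files in a directory (replaces the "ls" step)."""
    return sorted(f for f in os.listdir(sDirectory) if f.endswith('.root'))
```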

Transfer (local copy) and analysis in parallel

To save time, one last trick is used: files are copied locally while the previously transferred ones are analyzed. The first transfer must be done before the analysis begins, and it is launched this way :

sFileTransfer=FileTransfer.FileTransfer(sFileList,sCopyDirectory,"rfcp",iFirstFile_trf,iLastFile_trf,False)

You have to specify the list of files, the directory to copy them to, the transfer protocol, the indices of the first and last files, and whether to wait for the transfer to finish.

The transfer itself is also parallelized: the different files of a series are copied in parallel, again to save time. Here is the script doing it :

#!/usr/bin/python
import re, sys
import commands
import os
import time
import threading
import Queue

bal = Queue.Queue (100)

#class which performs the parallel copies using threads
class myThread(threading.Thread):
    def __init__(self, sName, sTransferCommand):
        threading.Thread.__init__(self, target=self.run, name=sName, args=() )
        self.sName=sName
        self.sCommand=sTransferCommand
        self.bAllThreadDone=False
        self.start()

    def run(self):

        print "START : ",self.sCommand
        sResCommand=commands.getstatusoutput(self.sCommand)
        print "STOP : ",sResCommand

#class for the file transfer
class FileTransfer:
    #list of protocols
    sProtocolList=["cp","rfcp","lcg-cp"]
   
    def __init__(self, sFileList, sCopyDirectory, sProtocol, iFirstFile, iLastFile, bWaitTillEndOfProcess=True):
        self.sProtocol=sProtocol
        self.sFileList=sFileList
        self.sDirectory=sCopyDirectory
        self.iNbStream=1
        self.iFirstFile=iFirstFile
        self.iLastFile=iLastFile
        self.bWaitTillEndOfProcess=bWaitTillEndOfProcess

    #Retrieving Files to transfer
    def RetrieveFiles(self):

        self.threadList=[]
        # Thread creation over the file to transfer 
        for i,sFile in enumerate(self.sFileList[self.iFirstFile:self.iLastFile]):

            sDestFile=self.sDirectory+"/"+sFile.split("/")[-1]

            #checking the copy protocol
            if self.sProtocol=="cp":
                sTransferCommand="cp "+sFile+" "+sDestFile
            elif self.sProtocol=="rfcp":
                sTransferCommand="rfcp "+sFile+" "+sDestFile
            elif self.sProtocol=="lcg-cp":
                sTransferCommand="lcg-cp -l lfn:"+sFile+" file://"+sDestFile

            #Make the thread to transfer the wanted file
            sThreadName="Trf("+str(i)+")"
            self.threadList.append(myThread(sThreadName,sTransferCommand))

        # Loop to wait until all thread are done
        self.bAllThreadDone=False

        if self.bWaitTillEndOfProcess==False: return
        
        #Until thread not finished
        while not self.bAllThreadDone:
            tmp_bAllThreadDone=True
            sRunningThread="Running thread : "
            sDoneThread="Done thread : "
            for thread in self.threadList:
                if thread.isAlive():
                    tmp_bAllThreadDone=False
                    sRunningThread=sRunningThread+" "+thread.sName
                else:
                    sDoneThread=sDoneThread+" "+thread.sName
            self.bAllThreadDone=tmp_bAllThreadDone
            if self.bAllThreadDone==True : return
            time.sleep(0.5)

    #if thread is active
    def IsActive(self):

        if self.bAllThreadDone==True: return False
        tmp_bAllThreadDone=True
        sRunningThread="Running thread : "
        sDoneThread="Done thread : "
        for thread in self.threadList:
            if thread.isAlive():
                tmp_bAllThreadDone=False
                sRunningThread=sRunningThread+" "+thread.sName
            else:
                sDoneThread=sDoneThread+" "+thread.sName
        self.bAllThreadDone=tmp_bAllThreadDone
        if self.bAllThreadDone==True : return False
        return True


    #if analysis finished delete analyzed files
    def DeleteRetrievedFiles(self,iFirstFile,iLastFile):
        
        for sFile in self.sFileList[iFirstFile:iLastFile]:
            sDestFile=self.sDirectory+"/"+sFile.split("/")[-1]
            if os.path.isfile(sDestFile): os.remove(sDestFile)

    #Remove all transfered files
    def DeleteAllRetrievedFiles(self):
        
        for sFile in self.sFileList:
            sDestFile=self.sDirectory+"/"+sFile.split("/")[-1]
            if os.path.isfile(sDestFile): os.remove(sDestFile)

Once a series of files has been analyzed, it is removed.
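Putting the pieces together, the overall pattern (analyze the current series while the next one is being copied, then delete it) can be sketched with plain Python threads. This is only an illustration: copyBatch, analyse and runOverlapped are hypothetical names, and shutil.copy stands in for rfcp:

```python
import os
import shutil
import threading

def copyBatch(sFileList, sDestDir):
    """Copy a batch of files in parallel threads (shutil.copy plays the role of rfcp)."""
    threads = []
    for sFile in sFileList:
        sDest = os.path.join(sDestDir, os.path.basename(sFile))
        t = threading.Thread(target=shutil.copy, args=(sFile, sDest))
        t.start()
        threads.append(t)
    return threads

def analyse(sLocalFiles):
    """Placeholder for the real ntuple analysis."""
    return sum(os.path.getsize(f) for f in sLocalFiles)

def runOverlapped(sAllFiles, sDestDir, iBatchSize):
    """Overlap the transfer of series i+1 with the analysis of series i."""
    batches = [sAllFiles[i:i + iBatchSize] for i in range(0, len(sAllFiles), iBatchSize)]
    # the first transfer is done before the analysis begins
    threads = copyBatch(batches[0], sDestDir)
    total = 0
    for i, batch in enumerate(batches):
        for t in threads:
            t.join()                      # wait until the current series is on disk
        sLocal = [os.path.join(sDestDir, os.path.basename(f)) for f in batch]
        if i + 1 < len(batches):
            threads = copyBatch(batches[i + 1], sDestDir)  # next series in the background
        total += analyse(sLocal)          # analyse while the next series is copied
        for f in sLocal:
            os.remove(f)                  # once the series is analysed, remove it
    return total
```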

Launching the job on the grid

These jobs have been launched on the Grid in the same way as in [6]. The script is [7]; you need to use Ganga to run it.
