Local CRAB
This page describes how to use CRAB on the T3-Higgs cluster. It assumes that users are already familiar with the usual CRAB recipe.
Environment:
The environment setup specific to this cluster is:
source /cvmfs/cms.cern.ch/cmsset_default.sh # CMSSW
cmsrel RELEASE ; cd RELEASE/src ; cmsenv # Just a guideline for getting cmsenv to run (the order of these steps is important)
source /cvmfs/oasis.opensciencegrid.org/osg-software/osg-wn-client/3.2/current/el5-x86_64/setup.sh # Grid tools
source /cvmfs/cms.cern.ch/crab/crab.sh
Alternatively, if you only need the usual Grid tools that used to be under /opt/osgwn-1.2.5/, that installation was upgraded and now lives at:
/etc/osg/wn-client/setup.sh
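Putting it all together, a complete setup session could look like this (CMSSW_5_3_11 is just the release used in the examples further down this page; use whatever you need):
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
source /cvmfs/oasis.opensciencegrid.org/osg-software/osg-wn-client/3.2/current/el5-x86_64/setup.sh
source /cvmfs/cms.cern.ch/crab/crab.sh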
Configuration
Most of the configuration can be whatever you want, except for these settings:
[USER]
return_data = 1
copy_data = 0
[CRAB]
scheduler = condor
This will ensure that we end up with the right CMSSW.sh wrapper, which is what we will edit below to set the stage-out to Hadoop.
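For reference, a complete crab.cfg could look like the sketch below; the dataset path, parameter set and splitting values are placeholders for your own analysis, only the [USER] and [CRAB] entries above are the required ones:
[CRAB]
jobtype = cmssw
scheduler = condor

[CMSSW]
datasetpath = /MyPrimaryDataset/MyProcessedDataset/USER
pset = my_analysis_cfg.py
total_number_of_events = -1
events_per_job = 10000
output_file = outfile.root

[USER]
return_data = 1
copy_data = 0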
Options for output file destinations
You have three choices:
- Use the standard CRAB framework, where crab -getoutput should work. This worked in Samir's tests.
- Use the CRAB framework for T2 stage-out via SRM (copy_data = 1). See the "Running jobs with SRM stage-out" section.
- Hack the CRAB scripts to write the output files directly to Hadoop.
The first two options are standard. For the third, here is our home-grown procedure.
Local stage-out to Hadoop
Steps
After your environment is set and crab.cfg is configured, it is time to create the jobs with crab -create. This will give you a task directory:
working directory /home/samir/CMSSW_5_3_11/src/crab_0_140705_005804/
Now we set the stage-out location. You need to edit CMSSW.sh inside the task directory:
/home/samir/CMSSW_5_3_11/src/crab_0_140705_005804/job/CMSSW.sh
Towards the end of the file, search for "file_list". You will find a line like this:
file_list="$SOFTWARE_DIR/outfile_$OutUniqueID.root"
Just below that line, place the copy command:
cp $RUNTIME_AREA/outfile_* /mnt/hadoop/store/user/$USER/crabtest/
NOTE: "outfile" is, in my case, the name I specified in crab.cfg. Make sure you pick the right filename for the copy command. You will find a hint a few lines above in the same script, coming from this crab.cfg setting:
output_file = outfile.root
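If you want the copy step to be a little more robust, a variant like this sketch should also work inside CMSSW.sh (crabtest is the same example directory as above):
DEST=/mnt/hadoop/store/user/$USER/crabtest
mkdir -p $DEST # create the destination if it does not exist yet
cp $RUNTIME_AREA/outfile_* $DEST/ || echo "WARNING: local stage-out to $DEST failed"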
That should be all. Now you can submit your jobs, and if the input data is in the right place (/mnt/hadoop), they should run and copy the output files to the directory you specified.
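From here the usual CRAB commands apply, for example (using the task directory from above):
crab -submit -c crab_0_140705_005804/
crab -status -c crab_0_140705_005804/
crab -getoutput -c crab_0_140705_005804/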
Running jobs with Xrootd input
It is out of the scope of this document to teach how to use Xrootd, but if you already know it, here is how to successfully run a job with Xrootd input on our T3:
- The main point is that we need to get rid of the canonical proxy location and use one in our home directories, which are shared with the worker nodes. First, check where your proxy is:
-bash-3.2$ voms-proxy-info
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samir/CN=695732/CN=Samir Cury Siqueira/CN=proxy
issuer : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samir/CN=695732/CN=Samir Cury Siqueira
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samir/CN=695732/CN=Samir Cury Siqueira
type : proxy
strength : 1024 bits
path : /tmp/x509up_u2611
timeleft : 191:59:58
If you see it there under /tmp, delete it. It will only get in your way.
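For the proxy shown above, that would be:
rm /tmp/x509up_u2611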
- Create a proxy in your home directory. I have chosen to do it in my CMSSW project area:
-bash-3.2$ voms-proxy-init -voms cms -valid 192:00 -out $PWD/samir.proxy
export X509_USER_PROXY=$PWD/samir.proxy # You also need to tell the system about the unusual location
-bash-3.2$ pwd
/home/samir/CMSSW_5_3_11/src
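Note that new shells will not know about the unusual location. If you want it to persist, one option is to add the export to your ~/.bashrc (the path below is just my example area):
echo 'export X509_USER_PROXY=/home/samir/CMSSW_5_3_11/src/samir.proxy' >> ~/.bashrc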
- Make sure everything is fine by running voms-proxy-info:
-bash-3.2$ voms-proxy-info
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samir/CN=695732/CN=Samir Cury Siqueira/CN=proxy
issuer : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samir/CN=695732/CN=Samir Cury Siqueira
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samir/CN=695732/CN=Samir Cury Siqueira
type : proxy
strength : 1024 bits
path : /home/samir/CMSSW_5_3_11/src/samir.proxy
timeleft : 191:58:21
- Good. Since on our system the environment you submit from is also the job's environment, and your home directory is visible from the worker nodes, the proxy will be found and used. In my case the output was happily sitting in the output directory, as specified previously.
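Before submitting, you can also sanity-check that Xrootd reads work with the relocated proxy. Something like the following should do; the redirector and the file path are only examples, pick a file you know exists:
xrdcp root://cmsxrootd.fnal.gov//store/user/samir/somefile.root /tmp/xrootd_test.root # redirector and file are placeholders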
Running jobs with SRM stage-out (recommended)
If you follow the steps above for Xrootd input, you get for free the ability to copy your output to any Grid site, including of course T2_US_Caltech. The method is pretty standard, as with any CRAB task; in my file, for example:
[USER]
return_data = 0
copy_data = 1
storage_element = T2_US_Caltech
data_location_override = T2_US_Caltech
user_remote_dir = CrabXrootd2
So I get the output under /store/user/samir/CrabXrootd2. That is all you need to worry about, and you will not need to patch CMSSW.sh as described in the local stage-out method above.
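If you want to double-check that the files arrived, tools like srmls from the Grid tools above can list the destination. The SRM endpoint below is only a placeholder; substitute the actual Caltech T2 one:
srmls "srm://srm.example.edu:8443/srm/v2/server?SFN=/store/user/samir/CrabXrootd2" # endpoint is a placeholder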
-- Main.samir - 2014-07-05