Condor quick start
This page is a quick guide to start using Condor on our T3 infrastructure. It covers only the essentials; adaptations can be worked out from there and added to this page.
The submission host is our usual t3-higgs login node. Everything is already set up, so you can submit from your home area. The nodes have passed the usual checklist.
Preparing the job for submission
We recommend setting aside a dedicated directory for this. In my case, the job is a bash script that prints "alive" and sleeps for X seconds. Here is what my directory looks like:
-bash-3.2$ ll
-rw-r--r-- 1 samir users 0 May 28 12:59 sleep.err
-rw-r--r-- 1 samir users 31355 May 28 14:53 sleep.log
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.1
-rwxr-xr-x 1 samir users 35 May 28 14:42 sleep.sh
-rw-r--r-- 1 samir users 116 May 28 14:43 submit.sub
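The contents of sleep.sh are not shown on this page, so the following is only a minimal sketch of a script matching the description above, not the original file. The optional first argument and the 5 second default are assumptions made for illustration. The quoted 'EOF' keeps the shell from expanding anything inside the script while it is being written.

```shell
# Write a small test job to sleep.sh (a sketch; the original script may differ).
cat > sleep.sh <<'EOF'
#!/bin/bash
# Print a marker so the job's stdout file is not empty.
echo "alive"
# Sleep for the number of seconds given as the first argument.
# The default of 5 is a placeholder, not a value taken from this page.
sleep "${1:-5}"
EOF
# Condor needs the executable bit set on the job script.
chmod +x sleep.sh
```

You can try it by hand with ./sleep.sh 2 before submitting anything.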
Now let's look at submit.sub (you can call it anything). This is the file that tells Condor what to do:
Executable = sleep.sh
Universe = vanilla
Output = sleep.out.$(Process)
Log = sleep.log
Error = sleep.err
getenv = True
Queue
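It couldn't be simpler. Don't ever change the Universe. The rest is mostly self-explanatory: Executable is the script to run, Output and Error receive the job's stdout and stderr, Log records what Condor did with the job, and getenv = True copies your current environment to the job. The Queue statement tells Condor how many copies of this same job to submit; the default is 1. I could write: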
Queue 4
And have 4 identical jobs running. That is where the $(Process) variable makes a difference: Condor numbers the jobs of a submission starting from 0, so each job writes its own output file in the same directory:
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.0
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.1
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.2
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.3
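A possible variation, as a sketch only: the submit file above shares one sleep.err between all jobs and reuses the same output names on every submission. Adding Condor's $(Cluster) macro (the cluster number reported at submission time) next to $(Process) keeps the files of different submissions apart. The file name submit4.sub and the Arguments line are assumptions made for illustration; Arguments = 60 only has an effect if your script reads its first argument. The quoted 'EOF' matters here, because otherwise the shell would try to run $(Cluster) and $(Process) as commands.

```shell
# Write a submit file that queues 4 jobs with per-job output and error files.
# submit4.sub is a hypothetical name; any file name works.
cat > submit4.sub <<'EOF'
Executable = sleep.sh
Universe   = vanilla
# Assumption: sleep.sh takes the sleep duration in seconds as its first argument.
Arguments  = 60
# One stdout and one stderr file per job, unique across submissions.
Output     = sleep.out.$(Cluster).$(Process)
Error      = sleep.err.$(Cluster).$(Process)
Log        = sleep.log
getenv     = True
Queue 4
EOF
```

It is submitted the same way as any other submit file, by passing its name to condor_submit.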
Submitting the job(s)
Once you are familiar with how to configure your job, it is time to submit it:
-bash-3.2$ condor_submit submit.sub
Submitting job(s)....
4 job(s) submitted to cluster 32.
Then you can monitor it with:
-bash-3.2$ condor_q
-- Submitter: t3-higgs.ultralight.org : <10.4.255.253:43446> : t3-higgs.ultralight.org
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
15.0 amott 7/2 11:18 0+00:00:00 H 0 17.1 ZeeSelectorApp
32.0 samir 5/28 15:13 0+00:00:25 R 0 0.0 sleep.sh
32.1 samir 5/28 15:13 0+00:00:25 R 0 0.0 sleep.sh
32.2 samir 5/28 15:13 0+00:00:25 R 0 0.0 sleep.sh
32.3 samir 5/28 15:13 0+00:00:25 R 0 0.0 sleep.sh
5 jobs; 0 completed, 0 removed, 0 idle, 4 running, 1 held, 0 suspended
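The listing above includes jobs from every user. condor_q also accepts a user name to restrict the listing, and its -hold option lists held jobs (state H above) together with the reason they were held. The two helper functions below are hypothetical conveniences you could put in your shell startup file; they are not part of Condor.

```shell
# List only one user's jobs; defaults to the current user.
my_jobs() {
    condor_q "${1:-$USER}"
}

# List held jobs along with the reason each one was held.
held_jobs() {
    condor_q -hold
}
```

For example, my_jobs samir would show only the four sleep.sh jobs from the listing above.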
If you go off to do something else and the jobs are gone from the queue when you come back, you can confirm that they actually finished by finding them in the history:
-bash-3.2$ condor_history
ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
31.3 samir 5/28 14:43 0+00:05:05 C 5/28 14:53 /home/samir/condor-test/sleep.sh
31.2 samir 5/28 14:43 0+00:05:04 C 5/28 14:53 /home/samir/condor-test/sleep.sh
31.1 samir 5/28 14:43 0+00:05:03 C 5/28 14:48 /home/samir/condor-test/sleep.sh
31.0 samir 5/28 14:43 0+00:05:03 C 5/28 14:48 /home/samir/condor-test/sleep.sh
30.3 samir 5/28 14:42 0+00:00:09 C 5/28 14:42 /home/samir/condor-test/sleep.sh
30.2 samir 5/28 14:42 0+00:00:09 C 5/28 14:42 /home/samir/condor-test/sleep.sh
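Besides the history, you can check that every job left its output file behind. The function below is a small sketch, not a Condor tool: check_outputs is a hypothetical name, and it assumes the Output = sleep.out.$(Process) naming from the submit file, with process numbers running from 0 to N-1 as in the condor_q listing (32.0 to 32.3).

```shell
# check_outputs N [PREFIX]
# Verify that PREFIX.0 .. PREFIX.(N-1) exist and are non-empty.
# PREFIX defaults to sleep.out, matching the submit file on this page.
check_outputs() {
    n="$1"
    prefix="${2:-sleep.out}"
    missing=0
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ -s "$prefix.$i" ]; then
            echo "ok $prefix.$i"
        else
            echo "missing $prefix.$i"
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    # Succeed only if no file was missing or empty.
    [ "$missing" -eq 0 ]
}
```

After a Queue 4 submission has finished, run check_outputs 4 in the job directory; a non-zero exit status means at least one job produced no output.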
With this, you should be able to do the basics. Feel free to edit this page and add more content and tips from your own experience.
-- Main.samir - 2014-05-28