Condor quick start

This page is a quick guide to start using Condor on our T3 infrastructure. It covers only the essentials; adaptations can be worked out from there and added to this page.

The submission host is our usual t3-higgs login node. Everything is already set up, so you can submit from your home area. The nodes have passed the usual checklist.

Preparing the job for submission

We recommend setting aside a dedicated directory for this. In my case, the job is a bash script that prints "alive" and sleeps for X seconds. Here is what my directory looks like:

-bash-3.2$ ll
-rw-r--r-- 1 samir users     0 May 28 12:59 sleep.err
-rw-r--r-- 1 samir users 31355 May 28 14:53 sleep.log
-rw-r--r-- 1 samir users     5 May 28 14:42 sleep.out.1
-rwxr-xr-x 1 samir users    35 May 28 14:42 sleep.sh
-rw-r--r-- 1 samir users   116 May 28 14:43 submit.sub
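
The script itself is not reproduced on this page. A minimal sketch that behaves as described (prints "alive", then sleeps) could be created as follows. The condor-test directory name is taken from the history output further down; the 300-second default and the optional duration argument are assumptions made for illustration only.

```shell
# Create a dedicated job directory (name borrowed from the history listing)
mkdir -p condor-test

# Write the job script; the quoted 'EOF' keeps the shell from expanding ${1:-300} now
cat > condor-test/sleep.sh <<'EOF'
#!/bin/bash
# Print a marker, then sleep. Duration in seconds is the first
# argument; 300 is an assumed default, not the author's value.
echo "alive"
sleep "${1:-300}"
EOF

# Condor needs the executable bit set on the job script
chmod +x condor-test/sleep.sh
```

Running `./condor-test/sleep.sh 5` by hand before submitting is a cheap way to check that the script works at all.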

Now let's look at submit.sub (you can call it anything). This is the file that tells Condor what to do:

Executable = sleep.sh
Universe = vanilla
Output = sleep.out.$(Process)
Log = sleep.log
Error = sleep.err
getenv = True

Queue

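It couldn't be simpler. Don't ever change the Universe. The rest is mostly self-explanatory: Output and Error receive the job's stdout and stderr, Log is Condor's own event log for the job, and getenv = True copies your submission environment to the job. The Queue statement tells Condor how many copies of this same job to submit; the default is 1. I could do: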

Queue 4

That gives 4 identical jobs. This is where the $(Process) variable makes a difference: it expands to each job's process number, counting from 0, so the output files in the same directory are named:

-rw-r--r-- 1 samir users     5 May 28 14:42 sleep.out.0
-rw-r--r-- 1 samir users     5 May 28 14:42 sleep.out.1
-rw-r--r-- 1 samir users     5 May 28 14:42 sleep.out.2
-rw-r--r-- 1 samir users     5 May 28 14:42 sleep.out.3
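
As a sketch of how $(Process) can be used more widely, the hypothetical variant below also gives each job its own error file. The file name submit-per-job.sub is made up for this example. The loop only prints the names Condor would substitute, assuming process numbers 0 to 3 as in the job IDs 32.0 to 32.3 that condor_q reports for a 4-job cluster.

```shell
# Hypothetical variant of submit.sub: stdout and stderr both get a per-job suffix.
# The quoted 'EOF' stops the shell from treating $(Process) as a command substitution.
cat > submit-per-job.sub <<'EOF'
Executable = sleep.sh
Universe = vanilla
Output = sleep.out.$(Process)
Error = sleep.err.$(Process)
Log = sleep.log
getenv = True

Queue 4
EOF

# File names for 4 queued jobs: processes within a cluster are numbered from 0
for process in 0 1 2 3; do
  echo "sleep.out.${process} sleep.err.${process}"
done
```

Keeping a single shared Log file is fine, since Condor tags every event in it with the job ID.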

Submitting the job(s)

Once you are familiar with how to configure your job, it is time to submit it:

-bash-3.2$ condor_submit submit.sub 
Submitting job(s)....
4 job(s) submitted to cluster 32.
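
For scripted submissions it can be handy to keep the cluster number for later queries. The helper below is a hypothetical sketch (parse_cluster is not a Condor tool) that pulls the number out of the confirmation line, assuming the message format shown above.

```shell
# Hypothetical helper: extract the cluster number from condor_submit's
# confirmation line, e.g. "4 job(s) submitted to cluster 32."
parse_cluster() {
  sed -n 's/.*submitted to cluster \([0-9][0-9]*\).*/\1/p'
}

# Demonstration on the line captured above (no Condor needed)
echo '4 job(s) submitted to cluster 32.' | parse_cluster   # prints 32
```

In real use this would read `cluster=$(condor_submit submit.sub | parse_cluster)`.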

You can then monitor the jobs with:

-bash-3.2$ condor_q


-- Submitter: t3-higgs.ultralight.org : <10.4.255.253:43446> : t3-higgs.ultralight.org
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  15.0   amott           7/2  11:18   0+00:00:00 H  0   17.1 ZeeSelectorApp    
  32.0   samir           5/28 15:13   0+00:00:25 R  0   0.0  sleep.sh          
  32.1   samir           5/28 15:13   0+00:00:25 R  0   0.0  sleep.sh          
  32.2   samir           5/28 15:13   0+00:00:25 R  0   0.0  sleep.sh          
  32.3   samir           5/28 15:13   0+00:00:25 R  0   0.0  sleep.sh          

5 jobs; 0 completed, 0 removed, 0 idle, 4 running, 1 held, 0 suspended
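
The ST column holds the job state: I is idle, R is running, H is held, C is completed, X is removed and S is suspended. As an illustration of reading that column, the hypothetical helper below (summarise_states is not part of Condor) tallies states from condor_q-style text, assuming the column layout shown above with the state in the sixth field.

```shell
# Hypothetical helper: count jobs per state from condor_q-style output.
# Only lines whose first field looks like a job ID (cluster.process) are counted.
summarise_states() {
  awk '
    $1 ~ /^[0-9]+\.[0-9]+$/ { count[$6]++ }
    END {
      name["I"] = "idle";      name["R"] = "running"; name["H"] = "held"
      name["C"] = "completed"; name["X"] = "removed"; name["S"] = "suspended"
      for (s in count)
        printf "%s %d\n", ((s in name) ? name[s] : s), count[s]
    }' | sort
}

# Intended use on a submit host: condor_q | summarise_states
```

For the listing above this would report one held and four running jobs, matching the summary line condor_q prints itself.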

If you go off to do something else and the jobs are gone from the queue when you come back, you can confirm that they actually finished by finding them in the history:

-bash-3.2$ condor_history
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD            
  31.3   samir           5/28 14:43   0+00:05:05 C   5/28 14:53 /home/samir/condor-test/sleep.sh 
  31.2   samir           5/28 14:43   0+00:05:04 C   5/28 14:53 /home/samir/condor-test/sleep.sh 
  31.1   samir           5/28 14:43   0+00:05:03 C   5/28 14:48 /home/samir/condor-test/sleep.sh 
  31.0   samir           5/28 14:43   0+00:05:03 C   5/28 14:48 /home/samir/condor-test/sleep.sh 
  30.3   samir           5/28 14:42   0+00:00:09 C   5/28 14:42 /home/samir/condor-test/sleep.sh 
  30.2   samir           5/28 14:42   0+00:00:09 C   5/28 14:42 /home/samir/condor-test/sleep.sh 
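
A C in the ST column means the job completed, but it is still worth checking what it wrote. The sketch below is a hypothetical helper (check_outputs is not a Condor command) that assumes the outputs are named sleep.out.$(Process), that process numbers start at 0 as in the job IDs above, and that a healthy job writes "alive".

```shell
# Hypothetical helper: verify each per-job output file exists and contains "alive".
# Usage: check_outputs <number-of-jobs> [directory]
check_outputs() {
  jobs_expected="$1"
  dir="${2:-.}"
  failed=0
  p=0
  while [ "$p" -lt "$jobs_expected" ]; do
    if grep -q "alive" "$dir/sleep.out.$p" 2>/dev/null; then
      echo "sleep.out.$p ok"
    else
      echo "sleep.out.$p missing or incomplete"
      failed=1
    fi
    p=$((p + 1))
  done
  # Exit status 0 only if every expected file passed the check
  return "$failed"
}
```

For a 4-job cluster submitted from the current directory, the call would be `check_outputs 4`.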

With this, you should be able to do the basics. Feel free to edit this page and add more content and tips from your own experience.

-- Main.samir - 2014-05-28

Topic revision: r2 - 2014-05-29 - samir
 