---+ Condor quick start

This page is a guide for users starting to use Condor on our T3 infrastructure. It covers only the essentials; adaptations can be made from there and added to this page.

The submission host is our usual t3-higgs login node. Everything is already set up, so you can submit from your home area. The nodes have passed the usual checklist.

---++ Preparing the job for submission

It is recommended that you set aside a dedicated directory for the job. In my case, the job is a bash script that prints "alive" and sleeps for X seconds. Here is what my directory looks like (the sleep.err, sleep.log and sleep.out.* files are written by Condor once a job has run):

<verbatim>
-bash-3.2$ ll
-rw-r--r-- 1 samir users     0 May 28 12:59 sleep.err
-rw-r--r-- 1 samir users 31355 May 28 14:53 sleep.log
-rw-r--r-- 1 samir users     5 May 28 14:42 sleep.out.1
-rwxr-xr-x 1 samir users    35 May 28 14:42 sleep.sh
-rw-r--r-- 1 samir users   116 May 28 14:43 submit.sub
</verbatim>

Now let's look at submit.sub (you can name it anything). This is the file that tells Condor what to do:

<verbatim>
Executable = sleep.sh
Universe   = vanilla
Output     = sleep.out.$(Process)
Log        = sleep.log
Error      = sleep.err
Queue
</verbatim>

It couldn't be simpler. Don't ever change the Universe. The rest is self-explanatory. The Queue parameter tells Condor how many copies of this same job to submit; the default is 1. I could write:

<verbatim>
Queue 4
</verbatim>

and have 4 identical jobs running. That is where the $(Process) variable makes a difference: it expands to the index of each job within the cluster, counting from 0, so every job writes its own output file in the same directory:

<verbatim>
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.0
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.1
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.2
-rw-r--r-- 1 samir users 5 May 28 14:42 sleep.out.3
</verbatim>

---++ Submitting the job(s)

Once you are familiar with how to configure your job, it is time to submit it:

<verbatim>
-bash-3.2$ condor_submit submit.sub
Submitting job(s)....
4 job(s) submitted to cluster 32.
</verbatim>

Then you can monitor the jobs with:

<verbatim>
-bash-3.2$ condor_q

-- Submitter: t3-higgs.ultralight.org : <10.4.255.253:43446> : t3-higgs.ultralight.org
 ID      OWNER   SUBMITTED     RUN_TIME   ST PRI SIZE CMD
 15.0    amott   7/2  11:18    0+00:00:00 H  0   17.1 ZeeSelectorApp
 32.0    samir   5/28 15:13    0+00:00:25 R  0   0.0  sleep.sh
 32.1    samir   5/28 15:13    0+00:00:25 R  0   0.0  sleep.sh
 32.2    samir   5/28 15:13    0+00:00:25 R  0   0.0  sleep.sh
 32.3    samir   5/28 15:13    0+00:00:25 R  0   0.0  sleep.sh

5 jobs; 0 completed, 0 removed, 0 idle, 4 running, 1 held, 0 suspended
</verbatim>

If you go off to do something else and the jobs are gone from the queue when you come back, you can confirm that they actually finished by finding them in the history:

<verbatim>
-bash-3.2$ condor_history
 ID      OWNER   SUBMITTED     RUN_TIME   ST COMPLETED    CMD
 31.3    samir   5/28 14:43    0+00:05:05 C  5/28 14:53   /home/samir/condor-test/sleep.sh
 31.2    samir   5/28 14:43    0+00:05:04 C  5/28 14:53   /home/samir/condor-test/sleep.sh
 31.1    samir   5/28 14:43    0+00:05:03 C  5/28 14:48   /home/samir/condor-test/sleep.sh
 31.0    samir   5/28 14:43    0+00:05:03 C  5/28 14:48   /home/samir/condor-test/sleep.sh
 30.3    samir   5/28 14:42    0+00:00:09 C  5/28 14:42   /home/samir/condor-test/sleep.sh
 30.2    samir   5/28 14:42    0+00:00:09 C  5/28 14:42   /home/samir/condor-test/sleep.sh
</verbatim>

With this, you should be able to do the basics. Feel free to edit this page and add more content and tips from your own experience.

-- Main.samir - 2014-05-28
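
---++ Appendix: recreating the example directory

The page describes sleep.sh only in words, so here is a minimal sketch of how the example directory from "Preparing the job for submission" could be recreated. The directory name =condor-test-example=, the 10 second default sleep and the exact body of sleep.sh are assumptions for illustration, not the original files. The submit file uses the same keys as above, with =Queue 4=. The script only writes the files; submitting is left to you.

```shell
#!/bin/bash
# Sketch: write an example job directory for the Condor quick start.
# Assumed (not from the original page): the directory name, the default
# sleep duration, and the exact contents of sleep.sh.
set -euo pipefail

jobdir="${1:-condor-test-example}"   # hypothetical directory name
mkdir -p "$jobdir"

# The job: print "alive", then sleep for the number of seconds given
# as the first argument (10 if none is given).
cat > "$jobdir/sleep.sh" <<'EOF'
#!/bin/bash
echo "alive"
sleep "${1:-10}"
EOF
chmod +x "$jobdir/sleep.sh"

# The submit description file. The quoted heredoc keeps $(Process)
# literal so that Condor, not the shell, expands it.
cat > "$jobdir/submit.sub" <<'EOF'
Executable = sleep.sh
Universe   = vanilla
Output     = sleep.out.$(Process)
Log        = sleep.log
Error      = sleep.err
Queue 4
EOF

echo "Wrote $jobdir/sleep.sh and $jobdir/submit.sub"
echo "Next step: cd $jobdir && condor_submit submit.sub"
```

Running sleep.sh by hand first (for example =./sleep.sh 1=) is a cheap way to check that the job works before it goes to the queue.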
Topic revision: r1 - 2014-05-28 - samir