CE Troubleshooting

This page collects potential pitfalls and ways to debug an OSG GRAM Compute Element 3.2.

Finding the job submission rate and submission errors

This is a bit tricky because the logging, although verbose, says little about what you mostly want to see: job submissions.

Many of the logged operations (most of them, I'd say, on a production CE) are status or output requests such as "how is job X that I submitted?" or "give me the output of job Y", and much more.

However, I found that when a job submission actually happens, this is what you see in the log:

TIME: Wed May 14 18:11:03 2014
 PID: 29145 -- Notice: 0: Child 29147 started
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029146.0000000000 for /DC=ch/DC=cern/OU=computers/CN=cmspilot04/vocms0167.cern.ch on ::ffff:129.79.53.27
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029146.0000000000 mapped to uscms4257 (20707, 504)
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029146.0000000000 has GRAM_SCRIPT_JOB_ID 075.000.000 manager type condor
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029145.0000000000 for /DC=ch/DC=cern/OU=computers/CN=cmspilot05/vocms0167.cern.ch on ::ffff:129.79.53.27
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029145.0000000000 mapped to uscms4251 (20701, 504)
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029145.0000000000 has GRAM_SCRIPT_JOB_ID 076.000.000 manager type condor
TIME: Wed May 14 18:14:46 2014

So a grep like this gives a better view of the job submission rate:


[root@cithep231 ~]# grep 'has GRAM_SCRIPT_JOB_ID' /var/log/globus-gatekeeper.log   | grep condor 

# or even the rate per hour:

[root@cithep231 ~]# grep 'has GRAM_SCRIPT_JOB_ID' /var/log/globus-gatekeeper.log   | grep condor | awk -F':' '{print $1}' | sort | uniq -c
      1 JMA 2014/05/14 04
      1 JMA 2014/05/14 05
      7 JMA 2014/05/14 06
      8 JMA 2014/05/14 07
      7 JMA 2014/05/14 08
     30 JMA 2014/05/14 09
     20 JMA 2014/05/14 10
     10 JMA 2014/05/14 11
      6 JMA 2014/05/14 12
      9 JMA 2014/05/14 13
      6 JMA 2014/05/14 14
     24 JMA 2014/05/14 15
     12 JMA 2014/05/14 16
      7 JMA 2014/05/14 17
      4 JMA 2014/05/14 18
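The same log also records the local account each request was mapped to, so a per-account breakdown can be sketched along the same lines. This is only a rough sketch: it assumes the "mapped to <account> (<uid>, <gid>)" line format shown in the excerpt above and one such line per submission, and the function name submissions_per_account is made up for illustration.

```shell
# Sketch: count job submissions per mapped local account.
# Assumes lines of the form
#   ... GATEKEEPER_JM_ID <id> mapped to <account> (<uid>, <gid>)
# as in the log excerpt; submissions_per_account is a hypothetical helper name.
submissions_per_account() {
    log="${1:-/var/log/globus-gatekeeper.log}"
    awk '
        /GATEKEEPER_JM_ID/ && / mapped to / {
            # The account name is the field right after "mapped to".
            for (i = 1; i < NF - 1; i++)
                if ($i == "mapped" && $(i + 1) == "to") { count[$(i + 2)]++; break }
        }
        END { for (a in count) print count[a], a }
    ' "$log" | sort -k1,1nr -k2,2   # busiest account first, ties sorted by name
}
```

Called as `submissions_per_account /var/log/globus-gatekeeper.log`, it prints one "count account" line per local account, which helps spot a single VO or pilot identity dominating the submissions.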

For general errors, there are always the user-specific GRAM logs:

/var/log/globus/gram_$(LOGNAME).log

The log levels can be configured in:

/etc/globus/globus-gram-job-manager.conf

By default only the error-type messages are logged, so even a healthy CE will show plenty of errors in these logs. Watch out!
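Because these logs contain errors even on a healthy CE, comparing counts across users may help separate background noise from a real problem. The following is a minimal sketch, not a definitive tool: gram_log_matches is a made-up helper name, and the default pattern "ERROR" is an assumption about how error lines are marked, so adjust it to what your logs actually contain.

```shell
# Sketch: count lines matching a pattern in each per-user GRAM log.
# gram_log_matches is a hypothetical helper; the default pattern "ERROR"
# is an assumption about the log content, not a documented format.
gram_log_matches() {
    pattern="${1:-ERROR}"
    dir="${2:-/var/log/globus}"
    for f in "$dir"/gram_*.log; do
        [ -f "$f" ] || continue   # skip the literal glob when no log exists
        printf '%s %s\n' "$(grep -c -- "$pattern" "$f")" "$f"
    done | sort -k1,1nr -k2,2     # noisiest log first, ties sorted by path
}
```

Running `gram_log_matches` with no arguments scans /var/log/globus; a user whose count is far above the others is a reasonable place to start reading.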

-- Main.samir - 2014-05-15

Topic revision: r1 - 2014-05-15 - samir
 