CE Troubleshoot
This page collects potential pitfalls and ways to debug an OSG GRAM Compute Element 3.2.
Finding the job submission rate and submission errors
This is a bit tricky because the gatekeeper log, although verbose, says little about what you usually want to see: job submissions.
Most of the operations logged on a production CE are status queries ("how is job X that I submitted doing?"), output requests ("give me the output of job Y") and the like.
However, I found that when a job submission actually happens, this is what shows up in the log:
TIME: Wed May 14 18:11:03 2014
PID: 29145 -- Notice: 0: Child 29147 started
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029146.0000000000 for /DC=ch/DC=cern/OU=computers/CN=cmspilot04/vocms0167.cern.ch on ::ffff:129.79.53.27
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029146.0000000000 mapped to uscms4257 (20707, 504)
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029146.0000000000 has GRAM_SCRIPT_JOB_ID 075.000.000 manager type condor
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029145.0000000000 for /DC=ch/DC=cern/OU=computers/CN=cmspilot05/vocms0167.cern.ch on ::ffff:129.79.53.27
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029145.0000000000 mapped to uscms4251 (20701, 504)
JMA 2014/05/14 18:11:08 GATEKEEPER_JM_ID 2014-05-15.01:11:02.0000029145.0000000000 has GRAM_SCRIPT_JOB_ID 076.000.000 manager type condor
TIME: Wed May 14 18:14:46 2014
So a grep like this gives a better picture of the job submission rate:
[root@cithep231 ~]# grep 'has GRAM_SCRIPT_JOB_ID' /var/log/globus-gatekeeper.log | grep condor
# or even the rate per hour:
[root@cithep231 ~]# grep 'has GRAM_SCRIPT_JOB_ID' /var/log/globus-gatekeeper.log | grep condor | awk -F':' '{print $1}' | sort | uniq -c
1 JMA 2014/05/14 04
1 JMA 2014/05/14 05
7 JMA 2014/05/14 06
8 JMA 2014/05/14 07
7 JMA 2014/05/14 08
30 JMA 2014/05/14 09
20 JMA 2014/05/14 10
10 JMA 2014/05/14 11
6 JMA 2014/05/14 12
9 JMA 2014/05/14 13
6 JMA 2014/05/14 14
24 JMA 2014/05/14 15
12 JMA 2014/05/14 16
7 JMA 2014/05/14 17
4 JMA 2014/05/14 18
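The same log lines can also give a rough breakdown of submissions per local account. The following is only a sketch: it assumes the field layout seen in the excerpt above (the GATEKEEPER_JM_ID value in the fifth field, the mapped account in the eighth), and the function name submissions_per_user is made up for illustration. It remembers the account from each "mapped to" line and prints it only when the same ID later shows a "has GRAM_SCRIPT_JOB_ID" line, so status queries are not counted.

```shell
# Hypothetical helper: count actual job submissions per mapped local account.
# Assumes the gatekeeper log layout shown in the excerpt above.
submissions_per_user() {
    awk '
        # Remember which local account each GATEKEEPER_JM_ID was mapped to.
        / mapped to /              { user[$5] = $8 }
        # A GRAM_SCRIPT_JOB_ID line marks a real submission for that ID.
        / has GRAM_SCRIPT_JOB_ID / { if ($5 in user) print user[$5] }
    ' "$1" | sort | uniq -c
}
```

Usage would be something like submissions_per_user /var/log/globus-gatekeeper.log, which prints one "count account" line per local account.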
For general errors, there are always the user-specific GRAM logs:
/var/log/globus/gram_$(LOGNAME).log
The log levels can be configured in:
/etc/globus/globus-gram-job-manager.conf
By default only the most alarming messages are logged, so even a healthy CE will have these logs full of errors. Watch out!
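Since errors show up even on a healthy CE, comparing how much each account logs can help spot an outlier before reading individual messages. The sketch below assumes there is one gram_<account>.log file per local account under /var/log/globus, as the path pattern above suggests. The function name gram_log_volume and its optional directory argument are invented for illustration.

```shell
# Hypothetical helper: print "<line count> <file>" for every per-user GRAM log,
# sorted so that the noisiest account comes last.
gram_log_volume() {
    log_dir="${1:-/var/log/globus}"
    for f in "$log_dir"/gram_*.log; do
        # Skip the unexpanded glob when no log file exists.
        [ -f "$f" ] || continue
        # Arithmetic expansion strips any padding that wc may add.
        echo "$(( $(wc -l < "$f") )) $f"
    done | sort -n
}
```

Running gram_log_volume without arguments looks at /var/log/globus. An account whose log is far larger than the others is a reasonable place to start looking.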
-- Main.samir - 2014-05-15