CIT HEP Computing

This page collects all computing-related topics in one place.

Workplan guidelines

Our priorities, in order:

  • Uptime
  • User support (blocker issues)
  • Resource utilization -- making sure that all hardware is being used
  • User support (non-blocker important issues)
  • Backups
  • Monitoring
  • User support (potential non-issues)
  • Security
  • R&D - This probably needs extra manpower. We have ideas but no time.

Extremely handy links:

MonaLisa HepSpec table

USCMS T2 HepSpec table

Sites pledges

Related links

Upgrades (software)

USCMS Upgrades twiki

Pledges, etc

Monitoring links

We need a page that aggregates these links, plus selected plots from DashBoard, PhEDEx, and the central CMS monitoring tools.

Central CMS

OSG

Local pages/systems

Documentation

Monitoring shifts

Daily

Weekly

  • Check RAID status on servers
    • SRM
    • CEs
    • GridFTPs
    • GUMS
      • Quick check across the hosts listed in raid-check-list.txt: [root@t2-headnode-new lists]# pssh -i -h raid-check-list.txt "cat /proc/mdstat" | grep "blocks super" | grep -v chunks
    • PhEDEx
    • T3 Headnode
    • T3 Login node
    • T3 JBOD
      • Query with: storcli64 /c0/v0 show
    • Newman
      • Query with: areca_cli64 ; vsf info
    • LDAP2

  • Check that the important servers still have a working backup
  • Verify node, core, and storage counts (once the inventory is updated and the totals are defined).
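
The software RAID check above relies on reading /proc/mdstat by eye. As a rough sketch only (not the script used on these servers), the same text could be scanned automatically for degraded arrays. The function name degraded_arrays and the sample text are hypothetical; the parsing assumes the usual mdstat layout, where the status line ends in something like [2/2] [UU] and an underscore marks a missing member.

```python
import re

# An array header line looks like: "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\w+)\s*:")
# A status line contains e.g. "[2/1] [U_]": expected devices / active devices, then per-device flags
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return a list of (array_name, flags) for arrays that are missing members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            if active < expected or "_" in flags:
                degraded.append((current, flags))
            current = None  # ignore any further lines until the next array header
    return degraded


# Hypothetical sample, shaped like the output of "cat /proc/mdstat"
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0]
      488279488 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, flags in degraded_arrays(SAMPLE_MDSTAT):
        print(f"DEGRADED {name} {flags}")
```

In practice the text would come from each host (for example, captured from the pssh run above), and an empty result would mean no array reported a missing member. Hardware controllers queried with storcli64 or areca_cli64 have their own output formats and are not covered by this sketch.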

Monitoring requirements

  • RSV
    • Cert exists
    • Cert validity

  • Nodes
    • GLEXEC
    • CVMFS
    • SWAP Trigger
    • IOPS on HDFS
    • IOPS on /
    • Proc / User
    • Open Files / User

  • Servers
    • RAID states

  • NameNode/HDFS
    • Health
    • Checkpoints
    • NN - Alert when usage on any filesystem reaches its maximum threshold

  • External
    • PhEDEx data on transfers
    • Data from SAM
    • Data from RSV
    • Data from DashBoard - Blackhole nodes
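
For the "Cert validity" requirement above, one possible approach is to parse the expiry line that `openssl x509 -noout -enddate -in <cert>` prints (a line of the form notAfter=Mar 15 12:00:00 2026 GMT) and turn it into a status. This is a minimal sketch, not an existing check on this site: the function names, the 14-day default warning window, and the status strings are all assumptions, and it assumes an English or C locale for month names.

```python
from datetime import datetime, timezone

PREFIX = "notAfter="


def parse_not_after(line):
    """Parse an 'openssl x509 -enddate' output line into an aware UTC datetime."""
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError(f"unexpected openssl output: {line!r}")
    stamp = line[len(PREFIX):]
    # openssl pads single-digit days with a space; strptime accepts that form
    parsed = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def days_left(line, now=None):
    """Days until expiry (negative if the certificate has already expired)."""
    now = now or datetime.now(timezone.utc)
    return (parse_not_after(line) - now).total_seconds() / 86400.0


def cert_status(line, warn_days=14, now=None):
    """Map the remaining lifetime to a status; warn_days is an assumed threshold."""
    remaining = days_left(line, now)
    if remaining < 0:
        return "EXPIRED"
    if remaining < warn_days:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Hypothetical sample line; real input would come from the openssl command
    print(cert_status("notAfter=Mar 15 12:00:00 2026 GMT"))
```

The "Cert exists" requirement could be covered separately by checking that the certificate file is present before running openssl on it.
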
-- Main.samir - 2014-04-01
Topic revision: r24 - 2016-03-15 - dkcira
 