CIT HEP Computing

This page centralizes all computing-related subjects.

Workplan guidelines

Our priorities, in this order:

  • Uptime
  • User support (blocker issues)
  • Resource utilization -- making sure that all hardware is being used
  • User support (non-blocker important issues)
  • Backups
  • Monitoring
  • User support (potential non-issues)
  • Security
  • R&D - This probably needs extra manpower. We have ideas but no time.

Extremely handy links:

MonaLisa HepSpec table

USCMS T2 HepSpec table

Sites pledges

Related links

Upgrades (software)

USCMS Upgrades twiki

Pledges, etc

Monitoring links

We need a page that aggregates these links, plus selected plots from DashBoard, PhEDEx, and the central CMS monitoring tools.

Central CMS

OSG

Local pages/systems

Documentation

Monitoring shifts

Daily

  • Check Readiness
  • Check Site Status Board
  • Check PhEDEx Transfers
  • Check HDFS status and corrupted blocks
  • Check the job failure rate, and the failure reasons if there is a site problem
  • Check Ganglia plots
  • Check GridFTP Screen on Zabbix
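
The HDFS item in the daily list could be partly automated. The following is a minimal sketch, assuming the text report printed by `hdfs fsck /` contains a `Corrupt blocks:` counter and a closing line ending in `is HEALTHY` or `is CORRUPT`. The function name `parse_fsck_summary` is hypothetical, and the exact report wording may differ between Hadoop versions.

```python
import re


def parse_fsck_summary(report):
    """Extract the corrupt-block count and overall health from an fsck report.

    Assumption: the report contains a line like "Corrupt blocks: 0" and a
    final line ending in "is HEALTHY" or "is CORRUPT".
    """
    summary = {"corrupt_blocks": None, "healthy": None}

    # The counter is separated from its label by spaces or tabs.
    match = re.search(r"Corrupt blocks:\s*(\d+)", report)
    if match:
        summary["corrupt_blocks"] = int(match.group(1))

    # The verdict is only set when one of the two known endings is found.
    if re.search(r"is HEALTHY", report):
        summary["healthy"] = True
    elif re.search(r"is CORRUPT", report):
        summary["healthy"] = False

    return summary
```

The report text could be captured with `subprocess.run(["hdfs", "fsck", "/"], capture_output=True, text=True).stdout` on a node with the Hadoop client installed. A value of `None` means the expected line was not found and the report should be read by hand.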

Weekly

  • Check RAID status on servers
  • Check that the important servers still have a working backup
  • Check node, core, and storage counts (once the inventory is updated and the totals are defined)
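
The weekly RAID check could be scripted on servers that use Linux software RAID. This is a sketch under that assumption only: it reads the text of `/proc/mdstat` and reports arrays whose member map (for example `[U_]`) shows a missing disk. Servers with hardware RAID controllers would need the vendor tool instead. The function name `degraded_arrays` is hypothetical.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that have at least one missing member.

    Assumption: input follows the /proc/mdstat layout, where an array header
    line ("md0 : active raid1 ...") is followed by a status line containing
    a member map such as [UU] (healthy) or [U_] (one disk missing).
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Only brackets made purely of U and _ are member maps;
        # "[2/2]" or "[raid1]" do not match.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded
```

A possible use is `degraded_arrays(open("/proc/mdstat").read())`, with any non-empty result raised as an alert.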

Monitoring requirements

  • RSV
    • Cert exists
    • Cert validity

  • Nodes
    • GLEXEC
    • CVMFS
    • SWAP Trigger
    • IOPS on HDFS
    • IOPS on /
    • Proc / User
    • Open Files / User

  • Servers
    • RAID states

  • NameNode/HDFS
    • Health
    • Checkpoints
    • NN - Alert when usage on any filesystem reaches its maximum threshold

  • External
    • PhEDEx data on transfers
    • Data from SAM
    • Data from RSV
    • Data from DashBoard - Blackhole nodes
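
As an illustration of the certificate validity requirement above, here is a minimal sketch that computes the days left before a certificate expires. It assumes the expiry date comes from `openssl x509 -enddate -noout -in <cert>`, which prints a line such as `notAfter=Dec  8 12:00:00 2015 GMT`. The function name `days_until_expiry` is hypothetical, and the month parsing assumes an English (C) locale.

```python
from datetime import datetime, timezone


def days_until_expiry(not_after_line, now):
    """Return the number of days between `now` and the certificate expiry.

    Assumption: `not_after_line` looks like
    "notAfter=Dec  8 12:00:00 2015 GMT" and the time is in GMT.
    `now` must be a timezone-aware datetime.
    """
    value = not_after_line.strip().split("=", 1)[1]
    # Drop the trailing timezone label and attach UTC explicitly.
    stamp = value.rsplit(" ", 1)[0]
    expires = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    expires = expires.replace(tzinfo=timezone.utc)
    return (expires - now).total_seconds() / 86400.0
```

A check could call this with `datetime.now(timezone.utc)` and alert when the result drops below a chosen threshold. A negative result means the certificate has already expired.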
-- Main.samir - 2014-04-01
Topic revision: r20 - 2014-12-08 - samir
 