Computing Topics
This page lists the internal computing tasks so that we can agree on their priorities.
Necessary - ASAP
- Memory review/purchase -- Determine how much memory the HT-capable nodes need. Aim for 3 GB per core; 2 GB would be acceptable too.
- 2014 Deployment plan -- power / placement
- Get quotes for Nodes/Disks. Compare 6378 with 6382 SE
- Before finalizing, it would be nice to have:
- Monitoring review - test SMS alerts and fine-tune priorities of alerts and where they go
- PhEDEx migration to better hardware - will support thousands of parallel transfers instead of hundreds.
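The memory target in the first item above can be turned into a per-node figure with a quick calculation. This is only a minimal sketch: it assumes "per core" means per logical core once hyper-threading is enabled, and the node shape used (32 physical cores, 2 threads per core) is a made-up example, not one of our actual nodes.

```python
def memory_needed_gb(physical_cores, threads_per_core=2, gb_per_core=3.0):
    """Total RAM (GB) for one node, treating every logical core as a job slot.

    Assumption: with HT on, each logical core runs one job and needs
    gb_per_core of memory. Adjust threads_per_core to 1 for non-HT nodes.
    """
    logical_cores = physical_cores * threads_per_core
    return logical_cores * gb_per_core


# Hypothetical node: 32 physical cores with HT enabled.
for target in (3.0, 2.0):
    total = memory_needed_gb(32, gb_per_core=target)
    print(f"{target} GB/core -> {total} GB per node")
```

Comparing the result with the memory currently installed in each node gives the amount that would have to be purchased.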
Necessary - Long term
- Write a CMSSW benchmark suite that could be run by Condor/CRAB jobs. Compare whole-node performance to single-core performance.
- Bonus feature - Have CouchDB centralize job outputs. We would then know how many events per second (EV/s) each CPU can process. Automated systems such as WMAgent could use this information.
- Not so long term -- Figure out the HEP Cluster usage.
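For the benchmark item above, the whole-node versus 1-core comparison comes down to two numbers: the event rate of a job and how well that rate scales when all cores are busy. The following is a rough sketch of that arithmetic only; the function names are hypothetical and nothing here calls CMSSW, Condor, CRAB or CouchDB.

```python
def events_per_second(n_events, wall_seconds):
    """Event throughput (EV/s) of one benchmark run."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return n_events / wall_seconds


def scaling_efficiency(single_core_rate, whole_node_rate, n_cores):
    """Fraction of ideal throughput reached when the whole node is loaded.

    1.0 means the node delivers n_cores times the single-core rate;
    lower values indicate contention (memory, I/O, HT sharing, ...).
    """
    if single_core_rate <= 0 or n_cores <= 0:
        raise ValueError("single_core_rate and n_cores must be positive")
    return whole_node_rate / (single_core_rate * n_cores)


# Made-up numbers, for illustration only.
single = events_per_second(n_events=1000, wall_seconds=500)
node = events_per_second(n_events=32000, wall_seconds=667)
print(single, node, scaling_efficiency(single, node, n_cores=32))
```

A per-CPU-model record of these two values is the kind of information that could later be stored centrally.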
Enhancement
- Start a Cloud site with the $500 Google Cloud promotion. Needs quite a bit of testing. Setting up a minimal site would be fast, though. There might be problems in the middleware.
- CRAB submitting to the T2 through HTCondor CE.
- T3 Storage - CEPH commissioning, start testing with WN storage.
- Try registering T3-Higgs as T3_US_Caltech and mimic the T2 activities. Could observe how a "CEPH site" would behave.
- In principle no mystery, provided file reading by jobs and GridFTP work. ROOT access is already tested and works better than with Hadoop (supports merging natively).
- CEPH Validation checklist
- Works with all ROOT operations - opening, merging and updating files -- doesn't work in HDFS, works in CEPH
- Running CMSSW jobs -- test stage-out
- GridFTP transfers
- Checksum calculation with standard Linux tools
- Test Caching policy and functionality -- Cache node disks have to serve as a RAID0.
- Partial replication -- less than factor of 2 usable space.
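For the checksum item in the checklist above, one way to cross-check a file stored on CEPH is to compute its checksums independently and compare them with what the Linux tools and the transfer system report. This is a minimal sketch, assuming the file is reachable through a normal mounted path. Adler-32 is included on the assumption that it is the checksum our transfers use; the MD5 value should match the output of md5sum.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1024 * 1024):
    """Return (adler32_hex, md5_hex) for a file, reading it in chunks.

    Chunked reads keep memory use flat for multi-GB files.
    """
    adler = 1  # Adler-32 is defined to start at 1
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            adler = zlib.adler32(chunk, adler)
            md5.update(chunk)
    # Mask to 32 bits and zero-pad to 8 hex digits.
    return format(adler & 0xFFFFFFFF, "08x"), md5.hexdigest()
```

Running it on the same file before and after a GridFTP transfer, and comparing the two results, covers the basic validation case.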
-- Main.samir - 2014-07-10
Topic revision: r3 - 2014-07-24 - samir