Difference: ComputingTopics (1 vs. 3)

Revision 3 2014-07-24 - samir

Line: 1 to 1
 

Computing Topics

Line: 26 to 26
 
  • T3 Storage - CEPH commissioning, start testing with WN storage.
  • Try registering T3-Higgs as T3_US_Caltech and mimic the T2 activities. Could observe how a "CEPH site" would behave.
    • In principle no mystery, as long as file reading by jobs and GridFTP work. ROOT access has already been tested; it works better than Hadoop (it supports merging natively)
Added:
>
>
  • CEPH Validation checklist
    • Works with all ROOT operations - opening, merging, and updating files -- these don't all work on HDFS, but they do work on CEPH
    • Running CMSSW jobs -- test stage-out
    • GridFTP transfers
    • Checksum calculation with standard Linux tools -- see the sketch below
    • Test caching policy and functionality -- the cache node disks have to serve as a RAID0.
    • Partial replication -- lose less than a factor of 2 of usable space.
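
For the checksum item above, a minimal sketch, assuming the CEPH pool is POSIX-mounted: it streams a file in fixed-size chunks and computes the adler32 checksum that CMS transfer tools typically compare, so the result can be checked against the usual Linux/Xrootd utilities.

```python
# Streaming adler32 checksum for a file on a POSIX-mounted path
# (e.g. a CEPH mount). Chunked reads keep memory use constant,
# so this also works for multi-GB files.
import sys
import zlib

def adler32_of(path, chunk_size=1024 * 1024):
    """Return the adler32 checksum of `path` as zero-padded hex."""
    checksum = 1  # adler32 seed value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            checksum = zlib.adler32(chunk, checksum)
    return "%08x" % (checksum & 0xFFFFFFFF)

if __name__ == "__main__":
    print(adler32_of(sys.argv[1]))
```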
 -- Main.samir - 2014-07-10

Revision 2 2014-07-22 - samir

Line: 1 to 1
 

Computing Topics

Changed:
<
<
Intended to contain all topics to be discussed in meetings; somewhere between an agenda and a notebook.

/raidX deprecation and new homes

A fairly complex plan, described in the thread "Migration of $HOME/raidX to DFS". The idea is to get rid of all the RAIDs and turn them into Hadoop datanodes. The $HOME area will live on an actual RAID5 + HSP, with 20 TB of space.

Condor Migration

Supposed to happen on July 16th. 260 cores are already in Condor, with 80 more on their way. SGE will be decommissioned by then.

AAA Usage - tests and instructions

  • We have ComputingAAATests, which is intended to be the group's knowledge base of what does or doesn't work in terms of Xrootd access.
  • In principle, it works at any site that has a proper site-local-config.xml.
    • The basic procedure with CRAB is: tell it to ignore data location; wherever the job runs, it won't find the data locally, will fall back to Xrootd, query the redirector for the file, and in principle run fine.
  • Tested in the T3. Works as long as we have $X509_USER_PROXY pointing to a location on a shared filesystem.

CRAB 3 testing

We might get some interesting exercise suggestions for CSA14, including usage of MINIAOD. Regardless, we should evaluate/commission CRAB3 by itself.

Here is the page that contains instructions for CRAB3.

>
>
Intended to hold the internal task list and to make sure the priorities are agreed on.
 
Added:
>
>

Necessary - ASAP

  • Memory review/purchase -- how much memory do the HT-capable nodes need? Aim for 3 GB per core; 2 would be OK too.
  • 2014 Deployment plan -- power / placement
    • Get quotes for Nodes/Disks. Compare 6378 with 6382 SE
    • Before finalizing it would be nice to have
  • Monitoring review - test SMS alerts and fine-tune priorities of alerts and where they go
  • PhEDEx migration to better hardware - will support thousands of parallel transfers instead of hundreds.

Necessary - Long term

  • Write a CMSSW benchmark suite that can be run by Condor/CRAB jobs. Compare whole-node performance to 1 core (see the sketch after this list).
    • Bonus feature - have CouchDB centralize job outputs. We would then know how many events/s (EV/s) each CPU can do. Automated systems such as WMAgent could use this information.
  • Not so long term -- Figure out the HEP Cluster usage.
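
For the benchmark item above, a hedged sketch of what the reporting side could look like: time a job, derive events/s, and push a JSON document into CouchDB through its plain HTTP API. The CouchDB endpoint, database name, event count, and the placeholder command are all assumptions, not our actual setup.

```python
# Hypothetical benchmark reporter: run a command, measure wall-clock
# time, and POST the derived events/s figure to CouchDB as JSON.
import json
import socket
import subprocess
import time
import urllib.request

COUCHDB_URL = "http://couchdb.example.org:5984/benchmarks"  # assumed endpoint
N_EVENTS = 1000  # events processed per benchmark job (assumed)

def run_benchmark(cmd):
    """Run the benchmark command and return wall-clock seconds."""
    start = time.time()
    subprocess.check_call(cmd)
    return time.time() - start

def report(seconds):
    """POST one result document; CouchDB assigns the document id."""
    doc = {
        "host": socket.gethostname(),
        "events": N_EVENTS,
        "wall_seconds": seconds,
        "events_per_second": N_EVENTS / seconds,
        "timestamp": time.time(),
    }
    req = urllib.request.Request(
        COUCHDB_URL,
        data=json.dumps(doc).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    # Placeholder workload; a real run would invoke cmsRun with a
    # benchmark configuration instead.
    report(run_benchmark(["sleep", "1"]))
```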

Enhancement

  • Start a cloud site with the $500 Google Cloud promotion. Needs quite some testing, though setting up a minimalistic site would be fast. Might have problems with the middleware.
  • CRAB submitting to the T2 through HTCondor CE.
  • T3 Storage - CEPH commissioning, start testing with WN storage.
  • Try registering T3-Higgs as T3_US_Caltech and mimic the T2 activities. Could observe how a "CEPH site" would behave.
    • In principle no mystery, as long as file reading by jobs and GridFTP work. ROOT access has already been tested; it works better than Hadoop (it supports merging natively -- a quick merge-test sketch follows this list)
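
A quick merge-test sketch for the native-merging claim above, assuming ROOT's hadd merger is on $PATH and the CEPH pool is POSIX-mounted. On HDFS the same call is expected to fail, since files there cannot be rewritten in place.

```python
# Merge ROOT files on a given storage mount with hadd and report
# success/failure, e.g.:
#   python merge_test.py /mnt/ceph/out.root /mnt/ceph/in1.root /mnt/ceph/in2.root
import subprocess
import sys

def test_merge(output_path, input_paths):
    """Return True if hadd merges the inputs into output_path."""
    # -f overwrites the output file if it already exists
    return subprocess.call(["hadd", "-f", output_path] + list(input_paths)) == 0

if __name__ == "__main__":
    sys.exit(0 if test_merge(sys.argv[1], sys.argv[2:]) else 1)
```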
 -- Main.samir - 2014-07-10

Revision 1 2014-07-10 - samir

Line: 1 to 1
Added:
>
>

Computing Topics

Intended to contain all topics to be discussed in meetings; somewhere between an agenda and a notebook.

/raidX deprecation and new homes

A fairly complex plan, described in the thread "Migration of $HOME/raidX to DFS". The idea is to get rid of all the RAIDs and turn them into Hadoop datanodes. The $HOME area will live on an actual RAID5 + HSP, with 20 TB of space.

Condor Migration

Supposed to happen on July 16th. 260 cores are already in Condor, with 80 more on their way. SGE will be decommissioned by then.

AAA Usage - tests and instructions

  • We have ComputingAAATests, which is intended to be the group's knowledge base of what does or doesn't work in terms of Xrootd access.
  • In principle, it works at any site that has a proper site-local-config.xml.
    • The basic procedure with CRAB is: tell it to ignore data location; wherever the job runs, it won't find the data locally, will fall back to Xrootd, query the redirector for the file, and in principle run fine.
  • Tested in the T3. Works as long as we have $X509_USER_PROXY pointing to a location on a shared filesystem (a small pre-flight check is sketched below).
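
A small pre-flight check for the proxy requirement above; the list of mount points treated as shared is an assumption for our T3 and should be adjusted to the real layout.

```python
# Verify that $X509_USER_PROXY is set, exists, and lives under a path
# that both the submit host and the worker nodes can see.
import os
import sys

SHARED_PREFIXES = ("/data", "/home")  # assumed shared mounts

def check_proxy():
    """Return an error string, or None if the proxy location looks usable."""
    proxy = os.environ.get("X509_USER_PROXY")
    if not proxy:
        return "X509_USER_PROXY is not set"
    if not os.path.isfile(proxy):
        return "proxy file %s does not exist" % proxy
    if not proxy.startswith(SHARED_PREFIXES):
        return "proxy %s is not on a shared filesystem" % proxy
    return None

if __name__ == "__main__":
    error = check_proxy()
    if error:
        sys.exit("FAIL: " + error)
    print("OK: jobs on the worker nodes should see the proxy")
```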

CRAB 3 testing

We might get some interesting exercise suggestions for CSA14, including usage of MINIAOD. Regardless, we should evaluate/commission CRAB3 by itself.

Here is the page that contains instructions for CRAB3.
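
As a starting point for that evaluation, a hypothetical minimal crabConfig.py; the request name, pset, dataset, and site are placeholders, and the parameter names (in particular Data.ignoreLocality, which drives the Xrootd-fallback behaviour described earlier) should be double-checked against the instructions page.

```python
# Hypothetical minimal CRAB3 configuration for an Xrootd-fallback test.
from WMCore.Configuration import Configuration

config = Configuration()

config.section_("General")
config.General.requestName = "aaa_fallback_test"  # placeholder

config.section_("JobType")
config.JobType.pluginName = "Analysis"
config.JobType.psetName = "pset.py"  # placeholder CMSSW config

config.section_("Data")
config.Data.inputDataset = "/SomeDataset/SomeEra/MINIAOD"  # placeholder
config.Data.ignoreLocality = True  # run anywhere; read input via Xrootd fallback

config.section_("Site")
config.Site.storageSite = "T2_US_Caltech"
```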

-- Main.samir - 2014-07-10

 