T2 Puppet
The goal of this page is to collect notes for a Puppet deployment that is generic enough to work on any OSG CMS T2 site (and potentially T3s).
In principle there are several ways to structure a Puppet design. We could consider:
- The role/profile pattern (details here) - according to the Puppet Labs people on IRC, this is considered the "golden" design pattern out there (sketched below).
- Very autonomous modules that all come together in site.pp.
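As a rough illustration of the role/profile idea (the class names below are made up, not an agreed layout): each profile wraps a single technology, a role only composes profiles, and a node only gets a role.

  # Hypothetical role/profile layout for a CE.
  class profile::osg::base {
    include fetchcrl
    include osg32repository
  }

  class role::compute_element {
    include profile::osg::base
    include profile::osg::ce
    include profile::condor
  }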
Some notes on what would be nice to have:
- Site-specific configuration has to go through Hiera.
- It would be nice to build a model where one only has to configure Hiera and site.pp, install the modules, and be happy (see the sketch after this list).
- For worker nodes, do not assume that everyone is using Condor, or Hadoop; these pieces have to be modular. CMS and OSG software (OASIS), though, should be standard.
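The "configure Hiera and site.pp and be happy" model would then look roughly like this in site.pp (hostnames and role names are only examples); everything site-specific would live in the Hiera data behind it.

  # Hypothetical site.pp: nodes are only mapped to roles.
  node 'ce.example.edu' {
    include role::compute_element
  }

  node /^wn[0-9]+\.example\.edu$/ {
    include role::worker_node
  }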
Hiera
Where Hiera will be used. It is not in a lot of places, just here and there in different profiles (example below):
- Compute Element
  - ALLOW_WRITE network -- goes in the local Condor config for CEs
- HDFS configuration
  - namenode variable
  - rack awareness script
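For example, the profiles could fetch these values with hiera() lookups; the key names and file paths below are just placeholders.

  # Hypothetical CE profile: the ALLOW_WRITE network comes from Hiera.
  class profile::osg::ce {
    $allow_write = hiera('osg::ce::condor_allow_write')   # e.g. '*.example.edu'

    file { '/etc/condor/config.d/99-local.conf':
      content => "ALLOW_WRITE = ${allow_write}\n",
    }
  }

  # Hypothetical HDFS profile: namenode and rack awareness script from Hiera.
  class profile::hdfs {
    $namenode    = hiera('hdfs::namenode')
    $rack_script = hiera('hdfs::rack_script')
  }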
Profiles
Base (osg_base?)
This module is intended to hold the VERY BASIC building blocks needed by most of the Grid services; they are potentially:
- LCMAPS
- LCMAPS-glexec or ...?
- fetch-crl # includes osg-ca-certs
- osg32Repository
- CVMFS / OASIS? ==> maybe based on CERNOps/cvmfs?
The higher-level modules will then depend on whichever of these they need to pull in.
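A possible skeleton for the base class; the module names included and the fetch-crl service name are assumptions that would need to match whatever is actually deployed.

  # Hypothetical osg_base: only the common building blocks.
  class osg_base {
    include osg32repository            # OSG 3.2 repo module from the list above
    include cvmfs                      # e.g. based on CERNOps/cvmfs

    package { ['osg-ca-certs', 'fetch-crl']:
      ensure  => installed,
      require => Class['osg32repository'],
    }

    # fetch-crl ships a cron-driven service on EL6; the name may differ per version.
    service { 'fetch-crl-cron':
      ensure  => running,
      enable  => true,
      require => Package['fetch-crl'],
    }
  }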
Worker-Node
What it has to have:
- glexec
  - This pulls a special glexec LCMAPS setup, maybe another module for this? The GUMS URL comes from Hiera.
- cvmfs
- oasis
- HEP_OSLibs
- osg-wn-client
Batch-system-wise, this will have to be pluggable, so we are better off defining Condor/PBS/LSF profiles and importing the right one into the role in site.pp or somewhere else. The same goes for storage: it could be Hadoop or not (see the sketch below).
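One way to keep that pluggable is to let the role pick the batch/storage profile from a Hiera key; the key names and class names below are only illustrative.

  # Hypothetical worker node role: grid bits fixed, batch and storage chosen per site.
  class role::worker_node {
    $batch   = hiera('site::wn_batch', 'condor')    # 'condor', 'pbs', 'lsf', ...
    $storage = hiera('site::wn_storage', 'hadoop')  # 'hadoop' or something else

    include profile::osg::wn                        # osg-wn-client, HEP_OSLibs, cvmfs, oasis, glexec
    include "profile::batch::${batch}"
    include "profile::storage::${storage}"
  }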
gridftp-hdfs
The package list is easy, but there is a tricky part. In Caltech's setup, yum-priorities always gets installed after some Globus packages. The result is a mess, with some packages coming from EPEL (usually voms) and some from OSG. We should be careful to enforce the right ordering in the first Puppet run, not in the second as I do now (see the sketch after this list).
- gridftp-hdfs packages (replace precise list here)
- lcmaps (different from the glexec one) -- I call this lcmaps-simple (from what I remember it is the same for CE and SE).
- (not mandatory) GridFTP configuration file - it would be nice to have a way to configure, in the Puppet module, a limit on active transfers (for servers that cannot handle a lot).
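A sketch of forcing that ordering within a single Puppet run: the priorities plugin and the OSG repo go in before anything from the gridftp stack. The repo class name is the one mentioned above; the package names would need to match the real list.

  # Hypothetical ordering fix for the EPEL vs OSG mix-up.
  class profile::gridftp_hdfs {
    include osg32repository

    package { 'yum-priorities':                 # yum-plugin-priorities on EL6
      ensure => installed,
    }

    package { 'gridftp-hdfs':
      ensure  => installed,
      require => [ Package['yum-priorities'], Class['osg32repository'] ],
    }
  }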
Deployment notes for nodes/modules
CE
- Set up OSG 3.1 and the Hadoop client. Make sure that ls /mnt/hadoop works ========> YPBIND HAS TO BE OFF - OR THE HADOOP RPM INSTALL FAILS (see the sketch after this list)
- Install CVMFS through compute::cvmfs
- Set up OSG 3.2
- yum --enablerepo=osg clean all =======> only after the OSG 3.2 repository is installed !!!
- Run the Puppet module
- Copy the config.d tarball
- Configure GUMS authentication (LCMAPS) by hand (no big deal) -- USE THE GLEXEC ONE!!
- yum install lcmaps-plugins-glexec-tracking
- Look at CEMON
- Test with RSV
- Check the Gratia probe and create the missing directory by hand
- Go through the "Services" part and update the module
- Make Condor less secure and hassle-free - "GSI_SKIP_HOST_CHECK=true" in the local configuration file (see the sketch after this list)
- MAKE SURE /WNTMP/ HOME DIRECTORIES ARE CREATED AND ACCESSIBLE, CHECK ALL NFS AND AUTOMOUNT HOMEDIRS
  - If you fail to comply with this, you might end up with a CE that works for tests but does not work for real jobs. You have been warned!
- MAKE SURE that /var/lib/globus/gram_job_state/ has the correct per-user permissions:
  - Usually this does the trick:
    for i in $(ls /var/lib/globus/gram_job_state/ | grep -v condor) ; do chown -R "$i" "/var/lib/globus/gram_job_state/$i" ; done
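Some of the manual steps above could eventually be folded into the CE profile. A rough sketch; the resource names it refers to (the hadoop package, the condor service, the config file path) are assumptions and would need checking against the real setup.

  # Hypothetical snippets for puppetizing a few of the manual CE steps above.

  # ypbind must be off before the Hadoop RPM goes in
  # (assumes Package['hadoop'] is declared elsewhere in the catalog).
  service { 'ypbind':
    ensure => stopped,
    enable => false,
    before => Package['hadoop'],
  }

  # Condor hassle-free GSI: drop the knob into a local config file
  # (assumes Service['condor'] is declared elsewhere).
  file { '/etc/condor/config.d/99-gsi-skip-host-check.conf':
    content => "GSI_SKIP_HOST_CHECK = true\n",
    notify  => Service['condor'],
  }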
-- Main.samir - 2014-04-11