T2 Puppet
The goal of this page is to collect notes for a Puppet deployment that is generic enough to work on any OSG CMS T2 site (and potentially T3s).
In principle there are several ways to structure a Puppet design. We could consider:
- The role/profile pattern (details here) - according to the Puppet Labs people on IRC, this is considered the "golden" design pattern out there (sketched below).
- Very autonomous modules that all come together in site.pp.
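As a rough illustration of the role/profile idea (the class names below are made up, not an agreed layout): each profile wraps a single technology, a role only composes profiles, and a node only gets a role.

  # Hypothetical role/profile layout for a CE.
  class profile::osg::base {
    include fetchcrl
    include osg32repository
  }

  class role::compute_element {
    include profile::osg::base
    include profile::osg::ce
    include profile::condor
  }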
Some notes on what would be nice to have:
- Site-specific configuration has to go through Hiera.
- It would be nice to build a model where one only has to configure Hiera and site.pp, install the modules, and be happy (see the sketch after this list).
- For worker nodes, do not assume that everyone is using Condor, or Hadoop; these pieces have to be modular. CMS and OSG software (OASIS), though, should be standard.
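The "configure Hiera and site.pp and be happy" model would then look roughly like this in site.pp (hostnames and role names are only examples); everything site-specific would live in the Hiera data behind it.

  # Hypothetical site.pp: nodes are only mapped to roles.
  node 'ce.example.edu' {
    include role::compute_element
  }

  node /^wn[0-9]+\.example\.edu$/ {
    include role::worker_node
  }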
Hiera
Where Hiera will be used. It is not in a lot of places, just here and there in different profiles (example below):
- Compute Element
  - ALLOW_WRITE network -- goes in the local Condor config for CEs
- HDFS configuration
  - namenode variable
  - rack awareness script
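For example, the profiles could fetch these values with hiera() lookups; the key names and file paths below are just placeholders.

  # Hypothetical CE profile: the ALLOW_WRITE network comes from Hiera.
  class profile::osg::ce {
    $allow_write = hiera('osg::ce::condor_allow_write')   # e.g. '*.example.edu'

    file { '/etc/condor/config.d/99-local.conf':
      content => "ALLOW_WRITE = ${allow_write}\n",
    }
  }

  # Hypothetical HDFS profile: namenode and rack awareness script from Hiera.
  class profile::hdfs {
    $namenode    = hiera('hdfs::namenode')
    $rack_script = hiera('hdfs::rack_script')
  }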
Profiles
Base (osg_base?)
This module is intended to hold the VERY BASIC building blocks needed by most of the Grid services; they are potentially:
- LCMAPS
- LCMAPS-glexec or ...?
- fetch-crl # includes osg-ca-certs
- osg32Repository
- CVMFS / OASIS? ==> maybe based on CERNOps/cvmfs?
The higher-level modules will then depend on whichever of these they need to pull in.
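A possible skeleton for the base class; the module names included and the fetch-crl service name are assumptions that would need to match whatever is actually deployed.

  # Hypothetical osg_base: only the common building blocks.
  class osg_base {
    include osg32repository            # OSG 3.2 repo module from the list above
    include cvmfs                      # e.g. based on CERNOps/cvmfs

    package { ['osg-ca-certs', 'fetch-crl']:
      ensure  => installed,
      require => Class['osg32repository'],
    }

    # fetch-crl ships a cron-driven service on EL6; the name may differ per version.
    service { 'fetch-crl-cron':
      ensure  => running,
      enable  => true,
      require => Package['fetch-crl'],
    }
  }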
Worker-Node
What it has to have:
- glexec
  - This pulls a special glexec LCMAPS setup, maybe another module for this? The GUMS URL comes from Hiera.
- cvmfs
- oasis
- HEP_OSLibs
- osg-wn-client
Batch-system-wise, this will have to be pluggable, so we are better off defining Condor/PBS/LSF profiles and importing the right one into the role in site.pp or somewhere else. The same goes for storage: it could be Hadoop or not (see the sketch below).
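One way to keep that pluggable is to let the role pick the batch/storage profile from a Hiera key; the key names and class names below are only illustrative.

  # Hypothetical worker node role: grid bits fixed, batch and storage chosen per site.
  class role::worker_node {
    $batch   = hiera('site::wn_batch', 'condor')    # 'condor', 'pbs', 'lsf', ...
    $storage = hiera('site::wn_storage', 'hadoop')  # 'hadoop' or something else

    include profile::osg::wn                        # osg-wn-client, HEP_OSLibs, cvmfs, oasis, glexec
    include "profile::batch::${batch}"
    include "profile::storage::${storage}"
  }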
gridftp-hdfs
The package list is easy, but there is a tricky part. In Caltech's setup, yum-priorities always gets installed after some Globus packages. The result is a mess, with some packages coming from EPEL (usually voms) and some from OSG. We should be careful to enforce the right ordering in the first Puppet run, not in the second as I do now (see the sketch after this list).
- gridftp-hdfs packages (replace precise list here)
- lcmaps (different from the glexec one) -- I call this lcmaps-simple (from what I remember it is the same for CE and SE).
- (not mandatory) GridFTP configuration file - it would be nice to have a way to configure, in the Puppet module, a limit on active transfers (for servers that cannot handle a lot).
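A sketch of forcing that ordering within a single Puppet run: the priorities plugin and the OSG repo go in before anything from the gridftp stack. The repo class name is the one mentioned above; the package names would need to match the real list.

  # Hypothetical ordering fix for the EPEL vs OSG mix-up.
  class profile::gridftp_hdfs {
    include osg32repository

    package { 'yum-priorities':                 # yum-plugin-priorities on EL6
      ensure => installed,
    }

    package { 'gridftp-hdfs':
      ensure  => installed,
      require => [ Package['yum-priorities'], Class['osg32repository'] ],
    }
  }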
Deployment notes for nodes/modules
CE
- Set up OSG 3.1 and the Hadoop client. Make sure that ls /mnt/hadoop works ========> YPBIND HAS TO BE OFF - OR THE HADOOP RPM INSTALL FAILS (see the sketch after this list)
- Install CVMFS through compute::cvmfs
- Set up OSG 3.2
- yum --enablerepo=osg clean all =======> only after the OSG 3.2 repository is installed !!!
- Run the Puppet module
- Copy the config.d tarball
- Configure GUMS authentication (LCMAPS) by hand (no big deal) -- USE THE GLEXEC ONE!!
- yum install lcmaps-plugins-glexec-tracking
- Look at CEMON
- Test with RSV
- Check the Gratia probe and create the missing directory by hand
- Go through the "Services" part and update the module
- Make Condor less secure and hassle-free - "GSI_SKIP_HOST_CHECK=true" in the local configuration file (see the sketch after this list)
- MAKE SURE /WNTMP/ HOME DIRECTORIES ARE CREATED AND ACCESSIBLE, CHECK ALL NFS AND AUTOMOUNT HOMEDIRS
  - If you fail to comply with this, you might end up with a CE that works for tests but does not work for real jobs. You have been warned!
- MAKE SURE that /var/lib/globus/gram_job_state/ has the correct per-user permissions:
  - Usually this does the trick:
    for i in $(ls /var/lib/globus/gram_job_state/ | grep -v condor) ; do chown -R "$i" "/var/lib/globus/gram_job_state/$i" ; done
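Some of the manual steps above could eventually be folded into the CE profile. A rough sketch; the resource names it refers to (the hadoop package, the condor service, the config file path) are assumptions and would need checking against the real setup.

  # Hypothetical snippets for puppetizing a few of the manual CE steps above.

  # ypbind must be off before the Hadoop RPM goes in
  # (assumes Package['hadoop'] is declared elsewhere in the catalog).
  service { 'ypbind':
    ensure => stopped,
    enable => false,
    before => Package['hadoop'],
  }

  # Condor hassle-free GSI: drop the knob into a local config file
  # (assumes Service['condor'] is declared elsewhere).
  file { '/etc/condor/config.d/99-gsi-skip-host-check.conf':
    content => "GSI_SKIP_HOST_CHECK = true\n",
    notify  => Service['condor'],
  }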
-- Main.samir - 2014-04-11