2016 CMS-Caltech-CERN Summer Students
Students
Kai Chang
kchang2@caltech.edu
Caltech, Travel dates:
Project:
Machine Learning (CERN/Caltech)
Nikolaus Howe
nhh1@williams.edu
CERN, Travel dates: ?? -> June 30
Project: Photon identification and calorimeter imaging with deep learning
Supervisors: Maurizio Pierini, Jean-Roch Vlimant
Aytaj Aghabayli
agaytac14509@sabah.edu.az
CERN, Travel dates:
Project: Optimizing Data Quality Monitoring with Machine Learning algorithms
Supervisors: Jean-Roch Vlimant, Federico
Partner: Yandex/EP-CMG-CO
The Data Quality Monitoring (DQM) system of the CMS experiment is a software infrastructure that produces, in real time, histograms of sensitive quantities associated with specific detector components or high-level physics objects (e.g. jet spectra). These histograms are compared to reference plots, and the outcome of the comparison is used by dedicated online and offline shifters to judge the quality of the collected data and identify transient problems with the detector. This kind of activity is a textbook case for the application of advanced Machine Learning techniques. Not only could one expand the number of monitored quantities beyond the limit of what is humanly possible; ML algorithms are also extremely good at identifying correlations and patterns between features, which opens the possibility of defining a system capable of predicting problems. The student will investigate the possibility of building specific applications for this.
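As a rough illustration of the histogram-versus-reference comparison described above, the following Python sketch flags a monitored histogram whose shape deviates from a reference. It is not the actual CMS DQM comparison: the function names, the reduced chi-square test and the threshold of 3.0 are all assumptions chosen for the example, and the histograms are toy data.

```python
import numpy as np

def chi2_per_bin(hist, reference):
    """Chi-square per bin between a monitored histogram and a reference.

    Both inputs are arrays of bin counts with identical binning. The
    reference is scaled to the total of the monitored histogram, so only
    the shapes are compared. The statistical uncertainty of the reference
    itself is neglected (assumed to be much larger in statistics).
    """
    hist = np.asarray(hist, dtype=float)
    reference = np.asarray(reference, dtype=float)
    expected = reference * (hist.sum() / reference.sum())
    mask = expected > 0  # skip bins where the reference is empty
    chi2 = np.sum((hist[mask] - expected[mask]) ** 2 / expected[mask])
    return chi2 / mask.sum()

def flag_histogram(hist, reference, threshold=3.0):
    """Return True if the histogram deviates from the reference.

    The threshold is an arbitrary illustrative value, not a CMS setting.
    """
    return chi2_per_bin(hist, reference) > threshold

# Toy data: a falling spectrum as reference, one compatible histogram
# and one histogram drawn from a distorted spectrum.
rng = np.random.default_rng(0)
edges = np.linspace(0.0, 10.0, 21)
reference, _ = np.histogram(rng.exponential(2.0, 100_000), bins=edges)
good, _ = np.histogram(rng.exponential(2.0, 10_000), bins=edges)
bad, _ = np.histogram(rng.exponential(3.0, 10_000), bins=edges)

print("good flagged:", flag_histogram(good, reference))
print("bad flagged:", flag_histogram(bad, reference))
```

A fixed test like this looks at one histogram at a time; the ML approach proposed in the project would instead learn correlations across many monitored quantities.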
Jayesh Mahapatra
jayeshmahapatra@gmail.com
CERN, Travel dates: 20 June → 20 August
Project: Jet Identification with imaging algorithms
Supervisors: Maurizio Pierini, Jean-Roch Vlimant
Jet tagging, i.e. the identification of the nature of the particle that initiates a jet shower, is one of the most important tools for data analysis at the LHC. Traditionally, Machine Learning (ML) techniques have been applied to b-jet tagging. Recently, new kinds of jet taggers were introduced to extend the physics reach of the LHC experiments: charm tagging, top tagging, H tagging, W/Z tagging. The use of modern ML techniques could boost the performance of jet tagging algorithms and improve the quality of CMS physics analyses. The candidate will optimise some of the tagging algorithms, investigating two research lines: (i) finding the optimal set of variables to be used in the algorithm; (ii) finding the best algorithms among those available in the recent computer-science literature.
Kaustuv Datta
dattak@reed.edu
CERN, Travel dates:
Project: Self-teaching photon ID algorithm to maximise discovery chances
Supervisors: Maurizio Pierini, Jean-Roch Vlimant
We propose to train a Deep Neural Network (DNN) on a dataset of two-photon candidates. The DNN should try to cluster the samples into two categories (real photons and fake photons), using as input features the cluster shape variables normally employed for photon IDs. The DNN will learn how to optimally cluster the events by maximizing the likelihood ratio L(S+B)/L(B), where L(B) is the likelihood computed under the hypothesis that the diphoton distribution is described by a falling background function, while L(S+B) is the likelihood computed under the hypothesis that a peak of some kind (e.g., the Higgs boson at 125 GeV, or some other new-physics resonance at higher values) is also present. The ratio L(S+B)/L(B) is intended to be profiled over the signal and background yields.
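To make the figure of merit concrete, here is a minimal Python sketch of a profiled likelihood ratio for a binned diphoton mass spectrum. It only illustrates the test statistic, not the DNN training. The binned Poisson likelihood, the exponential background shape, the Gaussian peak of width 2 GeV, the yields and the coarse grid scan used for the profiling are all assumptions made for this example.

```python
import numpy as np

def poisson_log_likelihood(counts, expected):
    """Poisson log-likelihood, dropping the constant log(n!) term."""
    return np.sum(counts * np.log(expected) - expected)

def profiled_log_ratio(counts, sig_shape, bkg_shape, yields):
    """log[ max L(S+B) / max L(B) ], with the yields profiled on a grid.

    sig_shape and bkg_shape are per-bin probabilities (each sums to 1);
    `yields` is the grid of candidate yields scanned for both s and b.
    A real analysis would use a proper minimiser instead of a grid.
    """
    best_b_only = max(
        poisson_log_likelihood(counts, b * bkg_shape)
        for b in yields if b > 0
    )
    best_s_plus_b = max(
        poisson_log_likelihood(counts, s * sig_shape + b * bkg_shape)
        for s in yields for b in yields if b > 0
    )
    return best_s_plus_b - best_b_only

# Toy spectrum: falling background plus an optional peak at 125 GeV.
rng = np.random.default_rng(1)
edges = np.linspace(100.0, 180.0, 41)  # diphoton mass bins in GeV
centers = 0.5 * (edges[:-1] + edges[1:])
bkg_shape = np.exp(-centers / 40.0)
bkg_shape /= bkg_shape.sum()
sig_shape = np.exp(-0.5 * ((centers - 125.0) / 2.0) ** 2)
sig_shape /= sig_shape.sum()

yields = np.linspace(0.0, 6000.0, 121)
toy_b = rng.poisson(5000 * bkg_shape)                     # background only
toy_sb = rng.poisson(5000 * bkg_shape + 200 * sig_shape)  # with a peak
q_b = profiled_log_ratio(toy_b, sig_shape, bkg_shape, yields)
q_sb = profiled_log_ratio(toy_sb, sig_shape, bkg_shape, yields)
print("log ratio, background only:", q_b)
print("log ratio, with injected peak:", q_sb)
```

A photon selection that rejects more fakes lowers the background under the peak and so raises this ratio, which is the sense in which the ratio can steer the clustering.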
Federico Presutti
presutti@caltech.edu
CERN, Travel dates:
Project: Real time event classification in CMS
Supervisors: Maurizio Pierini, Dustin Anderson
Partners: Intel/Cloudera/EP-UCM
Olga Lyudchik
helga.lyudchik@gmail.com
Caltech, Travel dates:
Project: Unsupervised ML algorithms at the LHC
The study of the top quark at the LHC makes use of so-called supervised machine learning (ML) algorithms (e.g., the neural network trained for b-jet tagging). Unlike other particles decaying to jets (e.g. W, Z, or H), the top quark is abundantly produced at the LHC. The large signal-to-background ratio makes it an ideal case to test the discovery potential of the LHC. We propose to define a search for top quarks in 1l+jets events, based on unsupervised ML algorithms. Without teaching the algorithm what a top quark is and what it looks like, we propose to test the possibility of highlighting the existence of the top quark by clustering similar events into classes that could then be compared to the known background. The top signal should emerge as an event category that is not associated with any known background. This would be the first application of unsupervised ML algorithms at the LHC.
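The idea of an unlabeled class emerging from clustering can be sketched with a small k-means example in Python. This is only an illustration: the project text does not say which unsupervised algorithm will be used, and k-means, the two-dimensional toy features, the cluster positions and the helper names below are all assumptions.

```python
import numpy as np

def pairwise_distance(points, centers):
    """Euclidean distance of every point to every center, shape (n, k)."""
    return np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = points[:1].astype(float)
    while len(centers) < k:
        nearest = pairwise_distance(points, centers).min(axis=1)
        centers = np.vstack([centers, points[nearest.argmax()]])
    return centers

def kmeans(points, k, n_iter=20):
    """Plain Lloyd iterations: assign points, then recompute the means."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        labels = pairwise_distance(points, centers).argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center alone
                centers[j] = members.mean(axis=0)
    labels = pairwise_distance(points, centers).argmin(axis=1)
    return labels, centers

# Toy events with two features each; no labels are given to the algorithm.
rng = np.random.default_rng(0)
common = rng.normal(0.0, 1.0, size=(900, 2))  # stand-in for known background
rare = rng.normal(6.0, 1.0, size=(100, 2))    # stand-in for an unknown class
events = np.vstack([common, rare])

labels, centers = kmeans(events, k=2)
sizes = np.bincount(labels, minlength=2)
print("cluster sizes:", sizes)
print("cluster centers:", centers)
```

In this toy case the smaller cluster plays the role of the category that has no counterpart in the known background. Real events would need many more features and a less idealised separation.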
Danny Weitekamp
dannyweitekamp@gmail.com
Supervisors: Jean-Roch Vlimant
Caltech, Travel dates:
Project: LHC Event Classification with LSTM-RNN
The typical use of classifiers in high energy physics analysis is for
discrimination between two classes, either in object identification
(signal versus fakes) or in event categorisation (signal versus
background). The typical implementation uses a fixed number of
high-level features of the object or the event to be discriminated.
Decision trees, in their simplest implementation or with various types
of ensembling, are seldom substituted with less fashionable feedforward
neural nets. For a given search or measurement, the model output is used
as a discriminating variable in a cut-and-count or template-fit
analysis. A given physics process at the hard-scatter level often has a
wide range of final signatures in the particle detector, by virtue of
the multiple decay possibilities of elementary particles into their
stable counterparts. The number of observable objects (electrons, muons,
photons, jets, b-jets, …) can therefore naturally vary. Because of the
fixed size of the input and output of the trained model, a solution
often adopted is to perform the analysis in separate categories or
channels and combine the results into one final measurement. Another
solution is to use quantities in the analysis that are independent of
the number of observable objects in the event (sum of transverse
momenta, invariant masses, or other combinations). Using such high-level
combined features entails a potential information loss that is hard to
estimate, while making multiple categories essentially means duplicating
the analysis, which results in increased work and complications.
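The second solution can be sketched in Python: whatever the number of reconstructed objects, the event is reduced to the same two numbers, the scalar sum of transverse momenta and the invariant mass of all objects. The (pt, eta, phi, mass) tuple layout and the function names are assumptions made for this example.

```python
import numpy as np

def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz)."""
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return np.array([energy, px, py, pz])

def event_features(objects):
    """Fixed-size summary of an event with any number of objects.

    objects: list of (pt, eta, phi, mass) tuples.
    Returns [scalar sum of pT, invariant mass of all objects].
    """
    sum_pt = sum(obj[0] for obj in objects)
    total = sum((four_vector(*obj) for obj in objects), np.zeros(4))
    mass_squared = total[0] ** 2 - np.sum(total[1:] ** 2)
    return np.array([sum_pt, np.sqrt(max(mass_squared, 0.0))])

# Two events with different object multiplicities give same-size outputs.
two_objects = [(50.0, 0.0, 0.0, 0.0), (50.0, 0.0, np.pi, 0.0)]
three_objects = [(40.0, 0.5, 0.1, 0.0), (30.0, -1.0, 2.0, 0.0),
                 (25.0, 0.2, -2.5, 0.0)]
print(event_features(two_objects))
print(event_features(three_objects))
```

The per-object information (which object carried which momentum, and in what direction) is gone after this step, which is the information loss mentioned above.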
Natural language processing is a field of data science that has seen
great improvement in the last decades using deep learning, thanks to the
increase in computing power, which allows the training of models with a
very large number of parameters, and to the advent of long short-term
memory (LSTM) cells in recurrent neural net (RNN) models. Recurrent
neural nets are fixed-size models that are trained on sequences of
inputs, fed one at a time. This makes these models suited to
variable-size inputs such as texts made of varying numbers of words.
Such models are used to extract and learn the context and meaning of
text. The LSTM makes it possible to correlate information from inputs
that are far apart in the input sequence, and outperforms regular RNNs
in text processing.
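The following Python sketch shows why a recurrent model copes with variable-size input: the same fixed set of weights is applied to each element of the sequence in turn, so the sequence length is free. It is a forward pass only, with random untrained weights; the class name, the layer sizes and the sigmoid output are assumptions for illustration, and a real study would use a deep learning library rather than hand-written NumPy.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Untrained single-layer LSTM with a sigmoid output (forward only)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate, acting on [h_prev, x] concatenated.
        shape = (4, n_hidden, n_hidden + n_features)
        self.W = rng.normal(0.0, 0.1, size=shape)
        self.b = np.zeros((4, n_hidden))
        self.w_out = rng.normal(0.0, 0.1, size=n_hidden)
        self.n_hidden = n_hidden

    def forward(self, sequence):
        """sequence: array of shape (n_steps, n_features), any n_steps."""
        h = np.zeros(self.n_hidden)  # hidden state
        c = np.zeros(self.n_hidden)  # cell state (the long-term memory)
        for x in sequence:
            z = np.concatenate([h, x])
            i = sigmoid(self.W[0] @ z + self.b[0])  # input gate
            f = sigmoid(self.W[1] @ z + self.b[1])  # forget gate
            o = sigmoid(self.W[2] @ z + self.b[2])  # output gate
            g = np.tanh(self.W[3] @ z + self.b[3])  # candidate cell state
            c = f * c + i * g
            h = o * np.tanh(c)
        # Map the final hidden state to a score between 0 and 1.
        return sigmoid(self.w_out @ h)

# The same model scores an event with 2 objects and one with 7 objects.
model = TinyLSTM(n_features=4, n_hidden=8)
rng = np.random.default_rng(1)
short_event = rng.normal(size=(2, 4))
long_event = rng.normal(size=(7, 4))
print(model.forward(short_event), model.forward(long_event))
```

Each row of the input plays the role of one word in a text, or of one reconstructed object in an event.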
Instead of establishing various channels or high-level features in a
high energy physics analysis for the aforementioned reasons, this
technique should make it possible to perform the classification across
all signatures. We propose in this project to classify signal and
background events in a high energy physics detector using RNNs with
LSTM. This could be used in several ways, depending on what we want to
classify and the observables chosen for training the model. We detail
below a few possible angles as starting points. The event description
often used in analysis is in terms of leptons, missing energy
(neutrinos) and jets (hadrons). The jets are aggregations of multiple
particles, built in an attempt to collect all particles from the decay
chain of partons originating from the hard scatter and therefore
approximate their kinematics. Particle flow reconstruction is a method
that aims at providing an individual object for each stable particle
passing through the detector, which is therefore a more granular
representation of the event. De facto, in most CMS analyses jet objects
are constructed by aggregating particle flow candidates.
Ben Bartlett
bartlett@caltech.edu
Caltech, Travel dates:
Project: HGC Reconstruction
Timing (Caltech/FNAL)
Gillian Kopp
gkopp@caltech.edu
Travel dates:
Project:
Daniel Gawerc
dgawerc@caltech.edu
Travel dates:
Project:
BSM/DM Physics (Caltech)
Nicholas Bower
nicholas_bower@brown.edu
Travel dates:
Project:
Tutorials
CMS-Caltech-CERN tutorials
CERN summer student schedule (Anyone can attend, not just CERN students!)
CMS-Caltech-CERN Group
Profs. Maria Spiropulu and Harvey Newman
CERN
Dustin Anderson (Grad student)
Josh Bendavid (Post-Doc)
Adi Bornheim (Staff Scientist)
Jay Lawhorn (G)
Thong Nguyen (G)
Maurizio Pierini (CERN, SS)
Jean-Roch Vlimant (PD)
Zhicai Zhang (G)
Caltech
Dorian Kcira (Caltech Computing Guru)
Javier Duarte (G)
Cristian Pena (G)
FNAL
Si Xie (PD)
Practicalities
Computing Accounts
Follow these instructions carefully to register with CERN and CMS:
- Get Account
- Complete the CMS pre-registration form.
- Request a CMS Computing Account by email. AFS & NICE passwords are now the same for new users.
- Validate your accounts by passing the CERN computer security and rules course.
- Register your laptop for DHCP to gain access to the CERN domain.
- Register with HyperNews
- If you are going to CERN, please review the CERN Users' Office web page.
For a Caltech T3 account, send Dorian Kcira (dkcira@caltech.edu) your:
- full name (Name Surname)
- preferred username
- preferred shell (bash, tcsh)
- ssh public key(s)
- email address
- Certificate DN
- CERN hypernews account name
Housing at CERN
From a local university, including housing and health insurance information:
http://mygisa.ch/guide/
CERN-specific housing advice (rest of site is also very useful):
http://newcomerwelcomecenter.weebly.com/short-term-rentals.html
Preferred hostel in Geneva:
http://www.cstb.ch/
Also see the CERN Users' Office site: http://usersoffice.web.cern.ch/regional-info-geneva-france
http://www.glocals.com/
https://www.airbnb.com/
http://www.residhome.com/uk/hotel-residence-aparthotel-prevessinmoens-192.html -- a bit out of the way, but with a CERN bike or the CERN shuttle from the Prevessin site it's tolerable.
More from Ms Yasemin Uzunefe-Yazgan
Yasemin.uzunefe.yazgan@cern.ch
Relocation Assistant
US-CMS Project Office at CERN
Building 40-R-A02 E19200
CH-1211 Geneva Switzerland
tel:(+41) 76 487 5868 or 165868
Fax:(+41) 22 766 8361 or 68361
https://twiki.cern.ch/twiki/bin/view/Main/USCMSProjectOfficeCERN
Immediately after arrival at CERN
Register your laptop for DHCP to gain access to the CERN domain.
Register with the Users' Office in the CERN main building (near the main cafeteria). Before you go to the Users' Office, fill out the CERN Registration Form and Home Institution Declaration.
Twikis from previous years
Summer Student Index
2015
2014
2013
2012
2011
2009 and 2010
-- Main.jlawhorn - 2016-02-02