2016 CMS-Caltech-CERN Summer Students

Students

Kai Chang

kchang2@caltech.edu

Caltech, Travel dates:

Project:

Machine Learning (CERN/Caltech)

Nikolaus Howe

nhh1@williams.edu

CERN, Travel dates: ?? -> June 30

Project: Photon identification and calorimeter imaging with deep learning

Supervisors: Maurizio Pierini, Jean-Roch Vlimant

Aytaj Aghabayli

agaytac14509@sabah.edu.az

CERN, Travel dates:

Project: Optimizing data Quality Monitoring with Machine Learning algorithms

Supervisors: Jean-Roch Vlimant, Federico

Partner: Yandex/EP-CMG-CO

The data quality monitoring (DQM) system of the CMS experiment is a software infrastructure that produces, in real time, histograms of sensitive quantities associated with specific detector components or high-level physics objects (e.g. jet spectra). These histograms are compared to reference plots, and the outcome of the comparison is used by dedicated online and offline shifters to judge the quality of the collected data and identify transient problems with the detector. This kind of activity is a textbook case for the application of advanced Machine Learning techniques. Not only could one expand the number of monitored quantities beyond the limit of what is humanly possible; ML algorithms are also extremely good at identifying correlations and patterns between features, which opens the possibility of defining a system capable of predicting problems. The student will investigate the possibility of building specific applications for this.
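As a point of reference for the histogram-to-reference comparison described above, here is a minimal sketch of an automated check. It is not the CMS DQM implementation: the function names, the threshold of 3 and the toy bin contents are all assumptions made for illustration, and the Pearson chi2 used here ignores the statistical uncertainty of the reference.

```python
def chi2_per_dof(observed, reference):
    """Pearson chi2/ndof between a histogram and its reference shape."""
    scale = sum(observed) / sum(reference)  # compare shapes, not totals
    chi2, bins_used = 0.0, 0
    for obs, ref in zip(observed, reference):
        expected = ref * scale
        if expected > 0:  # skip bins that the reference leaves empty
            chi2 += (obs - expected) ** 2 / expected
            bins_used += 1
    return chi2 / max(bins_used - 1, 1)  # one dof is lost to the normalisation


def flag_histograms(histograms, references, threshold=3.0):
    """Return the names of histograms that disagree with their reference."""
    return sorted(
        name for name, counts in histograms.items()
        if chi2_per_dof(counts, references[name]) > threshold
    )


# Toy bin contents, invented for illustration only.
references = {"jet_pt": [100, 200, 300, 200, 100], "muon_eta": [50, 50, 50, 50]}
histograms = {"jet_pt": [100, 200, 0, 200, 100], "muon_eta": [100, 100, 100, 100]}
flagged = flag_histograms(histograms, references)
```

A fixed threshold on one histogram at a time is the kind of rule an ML approach would aim to go beyond, for example by learning correlations across many histograms.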

Jayesh Mahapatra

jayeshmahapatra@gmail.com

CERN, Travel dates: 20 June → 20 August

Project: Jet Identification with imaging algorithms

Supervisors: Maurizio Pierini, Jean-Roch Vlimant

Jet tagging, i.e. the identification of the nature of the particle that initiated a jet shower, is one of the most important tools for data analysis at the LHC. Traditionally, Machine Learning (ML) techniques have been applied to b-jet tagging. Recently, new kinds of jet taggers were introduced to extend the physics reach of the LHC experiments: charm tagging, top tagging, H tagging, W/Z tagging. The use of modern ML techniques could boost the performance of jet tagging algorithms and improve the quality of CMS physics analyses. The candidate will optimise some of the tagging algorithms, investigating two research lines: (i) finding the optimal set of variables to be used in the algorithm; (ii) finding the best algorithms among those available in the recent computer-science literature.
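One simple way to start on research line (i) is to rank candidate input variables by how well each one separates signal jets from background jets on its own. The sketch below uses the area under the ROC curve (AUC) for this. The variable names and values are made up, and a single-variable ranking ignores correlations between variables, so it is only a first look and not the project's prescribed method.

```python
def roc_auc(signal, background):
    """Probability that a random signal jet scores above a random background jet."""
    pairs = len(signal) * len(background)
    wins = sum(
        1.0 if s > b else 0.5 if s == b else 0.0
        for s in signal
        for b in background
    )
    return wins / pairs


def rank_variables(samples):
    """Sort variables by separation power, treating AUC and 1 - AUC alike."""
    scored = []
    for name, (signal, background) in samples.items():
        auc = roc_auc(signal, background)
        scored.append((name, max(auc, 1.0 - auc)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-jet values: (signal jets, background jets).
samples = {
    "n_constituents": ([10, 12, 9, 11], [11, 10, 12, 9]),
    "jet_mass": ([80, 85, 78, 90], [20, 35, 82, 15]),
}
ranking = rank_variables(samples)
```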

Kaustuv Datta

dattak@reed.edu

CERN, Travel dates:

Project: Self-teaching photon ID algorithm to maximise discovery chances

Supervisors: Maurizio Pierini, Jean-Roch Vlimant

We propose to train a Deep Neural Network (DNN) on a dataset of events with two photon candidates. The DNN should try to cluster the samples into two categories (real photons and fake photons), using as input features the cluster shape variables normally employed for photon ID. The DNN will learn how to optimally cluster the events by maximizing the likelihood ratio L(S+B)/L(B), where L(B) is the likelihood computed under the hypothesis that the diphoton mass distribution is described by a falling background function, while L(S+B) is the likelihood computed under the hypothesis that a peak of some kind (e.g., the Higgs boson at 125 GeV, or some other new-physics resonance at higher values) is also present. The ratio L(S+B)/L(B) is intended to be profiled over the signal and background yields.
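To make the figure of merit concrete, the following sketch evaluates 2 ln[L(S+B)/L(B)] for a binned diphoton mass spectrum, profiling the yields with a coarse grid scan. Everything specific here is an assumption for illustration: the exponential background shape, the Gaussian peak at 125 GeV with 2 GeV width, the binning, the yield grid and the toy spectra. It shows only how the ratio could be computed for a given selected sample, not how the DNN would be trained to maximise it.

```python
import math


def bin_fractions(edges, slope=0.02, peak=125.0, width=2.0):
    """Normalised per-bin fractions for the background and signal shapes."""
    centres = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    bkg = [math.exp(-slope * m) for m in centres]  # falling background
    sig = [math.exp(-0.5 * ((m - peak) / width) ** 2) for m in centres]
    return [x / sum(bkg) for x in bkg], [x / sum(sig) for x in sig]


def log_likelihood(counts, bkg, sig, b, s):
    """Binned Poisson log-likelihood, dropping the constant log(n!) terms."""
    total = 0.0
    for n, pb, ps in zip(counts, bkg, sig):
        mu = b * pb + s * ps  # expected events in this bin
        total += n * math.log(mu) - mu
    return total


def profiled_ratio(counts, bkg, sig, yields):
    """2 ln[L(S+B)/L(B)], each likelihood maximised over the yield grid."""
    best_b = max(
        log_likelihood(counts, bkg, sig, b, 0.0) for b in yields if b > 0
    )
    best_sb = max(
        log_likelihood(counts, bkg, sig, b, s)
        for b in yields if b > 0
        for s in yields
    )
    return 2.0 * (best_sb - best_b)


edges = list(range(100, 181))  # 1 GeV bins from 100 to 180 GeV
bkg, sig = bin_fractions(edges)
yields = range(0, 1501, 50)  # coarse grid used for the profiling

# Toy spectra set to their expected values (no statistical fluctuations).
background_only = [1000 * pb for pb in bkg]
with_peak = [1000 * pb + 100 * ps for pb, ps in zip(bkg, sig)]
q_background = profiled_ratio(background_only, bkg, sig, yields)
q_peak = profiled_ratio(with_peak, bkg, sig, yields)
```

A real analysis would replace the grid scan with a proper minimiser and use an unbinned or finely binned fit; the grid keeps the sketch dependency-free.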

Federico Presutti

presutti@caltech.edu

CERN, Travel dates:

Project: Real time event classification in CMS

Supervisors: Maurizio Pierini, Dustin Anderson

Partners: Intel/Cloudera/EP-UCM

Olga Lyudchik

helga.lyudchik@gmail.com

Caltech, Travel dates:

Project: Unsupervised ML algorithms at the LHC

The study of the top quark at the LHC makes use of so-called supervised machine learning (ML) algorithms (e.g., the neural network used for b-jet tagging). Unlike other particles decaying to jets (e.g. W, Z, or H), the top quark is abundantly produced at the LHC. The large signal-to-background ratio makes it an ideal case to test the discovery potential of the LHC. We propose to define a search for top quarks in 1l+jets events, based on unsupervised ML algorithms. Without teaching the algorithm what a top quark is and what it looks like, we propose to test the possibility of highlighting the existence of the top quark by clustering similar events into classes that could then be compared to the known background. The top signal should emerge as an unassociated event category. This would be the first application of unsupervised ML algorithms at the LHC.
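The idea of clustering events and then comparing each cluster to the known background can be sketched with plain k-means (Lloyd's algorithm). This is only one possible clustering choice, not necessarily the one the project would adopt. The two-feature toy events, the starting centroids and the assumption that the background sample is already normalised to the data are all hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm from the given starting centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(x) / len(members) for x in zip(*members)]
    return centroids


def cluster_excess(data, background, centroids):
    """Data count minus expected background count in each cluster."""
    excess = [0] * len(centroids)
    for p in data:
        excess[nearest(p, centroids)] += 1
    for p in background:
        excess[nearest(p, centroids)] -= 1
    return excess


# Toy events described by two made-up features.
background = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (1.2, 1.1)]
data = [(0.9, 1.1), (1.0, 0.8), (1.2, 1.0), (0.7, 1.0),
        (5.0, 5.1), (4.8, 5.2), (5.2, 4.9)]

centroids = kmeans(data, [data[0], data[-1]])
excess = cluster_excess(data, background, centroids)
```

In this toy, the second cluster holds data events with no background counterpart, which is the kind of unassociated category the abstract refers to.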

Danny Weitekamp

dannyweitekamp@gmail.com

Supervisors: Jean-Roch Vlimant

Caltech, Travel dates:

Project: LHC Event Classification with LSTM-RNN

Classifiers in high energy physics analyses are typically used to discriminate between two classes, be it in object identification (signal versus fakes) or event categorisation (signal versus background). The typical implementation uses a fixed number of high-level features of the object or event to be discriminated. Decision trees, in their simplest implementation or with various types of ensembling, are seldom substituted with less fashionable feedforward neural nets. For a given search or measurement, the model output is used as a discriminating variable in a cut-and-count or template-fit analysis.

A given physics process at the hard-scatter level often has a wide range of final signatures in the particle detector, because elementary particles can decay in multiple ways into stable particles. The number of observable objects (electrons, muons, photons, jets, b-jets, …) can therefore naturally vary. Because the input and output of the trained model have a fixed size, a solution often adopted is to perform the analysis in separate categories or channels and combine the results into one final measurement. Another solution is to use features that are independent of the number of observable objects in the event (sum of transverse momenta, invariant masses, or other combinations). Using such high-level combined features entails a potential information loss that is hard to estimate, while making multiple categories essentially means duplicating the analysis, which results in increased work and complications.

Natural language processing is a field of data science that has seen great improvement over the last decades using deep learning, thanks to the increase in computing power, which allows training models with a very large number of parameters, and to the advent of long short-term memory (LSTM) cells in recurrent neural net (RNN) models. Recurrent neural nets are fixed-size models that are trained on sequences of inputs, fed one at a time. This makes them well suited to variable-size input, such as texts made of varying numbers of words. Such models are used to extract and learn the context and meaning of text. The LSTM makes it possible to correlate inputs that are far apart in the input sequence, and it outperforms regular RNNs in text processing.
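The property that matters here, a fixed-size model consuming a variable number of inputs, can be illustrated with a plain Elman-style recurrent cell. Note that this is deliberately not an LSTM: it has no gates and would struggle with long-range correlations. The weights are arbitrary and untrained, and the per-object features (scaled pT, eta, phi) are made-up values chosen for illustration.

```python
import math


def rnn_encode(sequence, w_in, w_rec, bias):
    """Feed the objects one at a time and return the final hidden state.

    The size of the result is set by the weights alone, so events with
    different numbers of objects map to vectors of the same length.
    """
    hidden = [0.0] * len(bias)
    for features in sequence:
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], features))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


# Hypothetical, untrained weights: 3 input features -> 2 hidden units.
w_in = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]
bias = [0.0, 0.1]

# Toy events with different object multiplicities.
dilepton_event = [(0.8, 0.5, 1.2), (0.6, -1.1, -2.0)]
multijet_event = [(1.2, 0.1, 0.3), (0.9, -0.4, 2.5),
                  (0.5, 1.8, -1.0), (0.3, -2.0, 0.7)]

encoded = [rnn_encode(event, w_in, w_rec, bias)
           for event in (dilepton_event, multijet_event)]
```

Both events are reduced to a two-component vector that a downstream classifier layer could consume, without splitting the analysis into channels by multiplicity.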

Instead of establishing various channels or high-level features in a high energy physics analysis for the aforementioned reasons, this technique should make it possible to perform the classification across all signatures. In this project we propose to classify signal and background events in a high energy physics detector using an RNN with LSTM. This could be done in several ways, depending on what we want to classify and on the observables chosen for training the model. We detail below a few possible angles as starting points. The event description often used in analyses is in terms of leptons, missing energy (neutrinos) and jets (hadrons). Jets are aggregations of multiple particles, built in an attempt to collect all particles from the decay chain of partons originating from the hard scatter and therefore to approximate their kinematics. Particle flow reconstruction is a method that aims at providing an individual object for every stable particle passing through the detector, and is therefore a more granular representation of the event. De facto, in most CMS analyses jet objects are constructed by aggregating particle flow candidates.

Ben Bartlett

bartlett@caltech.edu

Caltech, Travel dates:

Project: HGC Reconstruction

Timing (Caltech/FNAL)

Gillian Kopp

gkopp@caltech.edu

Travel dates:

Project:

Daniel Gawerc

dgawerc@caltech.edu

Travel dates:

Project:

BSM/DM Physics (Caltech)

Nicholas Bower

nicholas_bower@brown.edu

Travel dates:

Project:

Tutorials

CMS-Caltech-CERN tutorials

CERN summer student schedule (Anyone can attend, not just CERN students!)

CMS-Caltech-CERN Group

Profs. Maria Spiropulu and Harvey Newman

CERN

Dustin Anderson (Grad student)

Josh Bendavid (Post-Doc)

Adi Bornheim (Staff Scientist)

Jay Lawhorn (G)

Thong Nguyen (G)

Maurizio Pierini (CERN, SS)

Jean-Roch Vlimant (PD)

Zhicai Zhang (G)

Caltech

Dorian Kcira (Caltech Computing Guru)

Javier Duarte (G)

Cristian Pena (G)

FNAL

Si Xie (PD)

Practicalities

Computing Accounts

Follow these instructions carefully to register with CERN and CMS: Get Account

Complete the CMS pre-registration form.

Request a CMS Computing Account by email. AFS & NICE passwords are now the same for new users.

Validate your accounts by passing the CERN computer security and rules course.

  1. Register your laptop for DHCP to gain access to the CERN domain.
  2. Register with HyperNews
  3. If you are going to CERN, please review the CERN Users' Office web page.

For a Caltech T3 account, send Dorian Kcira (dkcira@caltech.edu) your:

  • full name (Name Surname)
  • preferred username
  • preferred shell (bash, tcsh)
  • ssh public key(s)
  • email address
  • Certificate DN
  • CERN hypernews account name

Housing at CERN

From a local university, including housing and health insurance information: http://mygisa.ch/guide/

CERN-specific housing advice (rest of site is also very useful): http://newcomerwelcomecenter.weebly.com/short-term-rentals.html

Preferred hostel in Geneva: http://www.cstb.ch/

Also see the CERN User's Office site: http://usersoffice.web.cern.ch/regional-info-geneva-france

http://www.glocals.com/

https://www.airbnb.com/

http://www.residhome.com/uk/hotel-residence-aparthotel-prevessinmoens-192.html -- a bit out of the way, but with a CERN bike or the CERN shuttle from Prevessin site it's tolerable.

More from Ms Yasemin Uzunefe-Yazgan

Yasemin.uzunefe.yazgan@cern.ch

Relocation Assistant

US-CMS Project Office at CERN

Building 40-R-A02 E19200

CH-1211 Geneva Switzerland

tel:(+41) 76 487 5868 or 165868

Fax:(+41) 22 766 8361 or 68361

https://twiki.cern.ch/twiki/bin/view/Main/USCMSProjectOfficeCERN

Immediately after arrival at CERN

Register your laptop for DHCP to gain access to the CERN domain.

Register with the Users' Office in the CERN main building (near the main cafeteria). Before you go to the Users' Office, fill out the CERN Registration Form and the Home Institution Declaration.

Twikis from previous years

Summer Student Index

2015

2014

2013

2012

2011

2009 and 2010

-- Main.jlawhorn - 2016-02-02

Topic revision: r10 - 2016-05-19 - dkcira
 