Title: Nurcan Ozturk
1Data Discovery Tools, DQ2 Enduser Tools
and Physics Analysis Tools
- Nurcan Ozturk
- University of Texas at Arlington
- SCHOOL ON HEP_at_TR-GRID
- April 30 - May 2, 2008
- Turkish Atomic Energy Authority (TAEA), Ankara,
Turkey
2Outline
- Users work-flow for Data Analysis
- Data Discovery Tools
- AMI - ATLAS Metadata Interface
- TAG Browser - ELSSI
- DQ2 Enduser Tools
- ATLAS Analysis Model
- Analysis Model Forum Recommendations
- Derived Physics Data (DPD)
- Analyzing the Data (inside or outside Athena)
- AthenaRootAccess (ARA)
- EventView
3Users Work-flow for Data Analysis
Setup the analysis code
Locate the data
Setup the analysis job
Submit to the Grid
Retrieve the results
Analyze the results
4Data Discovery Tools
5ATLAS Metadata Interface (AMI)
http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/index.html
- AMI is a bookkeeping project.
- AMI is a generic cataloging system (a database
application). The majority of datasets currently
catalogued in AMI are Monte Carlo datasets. AMI
reads information from the task request system,
and correlates it with information read from the
production database.
- AMI contains the physics metadata for:
- 2008 real data
- 2008 FDR exercise
- 2007 Cosmics runs (M5 data)
- 2006/2007 service challenge datasets
- StreamTest
- Data Challenges DC1 and DC2 / Rome Production
System
- Combined Test Beam
- AMI also powers the TagCollector release
management tool.
6AMI Tutorial
- http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/Tutorial/
- Or
- http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/Tutorial/FastTrackTutorial.html
- What is AMI?
- Where does AMI get its information?
- How do I search for a dataset?
- Which information can I get from the result of an
AMI dataset search?
- What is the schema of the AMI dataset catalogue?
- Why can I sometimes not find a dataset when I can
see its existence in other catalogues?
- Can I refine the search?
- Can I simply browse all of the information in
AMI?
- Can I bookmark an AMI page?
- Why doesn't the back button of my browser work?
- Can I use AMI without going through the web
interface?
- How can I extract information from AMI?
- How do I write to AMI?
7How Do I Search For A Dataset? Simple Search
Follow the link to the simple search interface
from the tutorial page
[Screenshot: simple search interface, with a
callout marking where to type the search string]
8Results From Simple Search (1)
[Screenshot: simple search results page, with
callouts marking a pull-down menu and several links]
9Results From Simple Search (2)
Clicking the Provenance link shows which version
of the Athena software was used in making
evgen/digit/reco.
10Results From Simple Search (3)
Clicking the DQ2 link shows the DQ2 dataset
metadata, the existing replicas of the dataset,
and a link to the PanDA monitor.
11Results From Simple Search (4)
Clicking the PanDA link takes you to the
dataset browser.
12How Do I Search For A Dataset? Advanced Search
Follow the link to the Advanced search
interface from the tutorial page
13Results From Advanced Search
14TAG
- ATLAS will produce petabytes of data, so a system
of event-level metadata is needed to quickly
identify and select events that are of interest
for a given analysis. This is provided by TAG
files and the TAG database.
- TAG files are built from AOD according to offline
analysis-style code. TAG files are then loaded
into the TAG database.
- TAG files store information about the status of
each sub-detector, trigger and physics object ID.
- For instance, for FDR-1 data TAGs contain:
- Event information
- Run number, event number, luminosity block,
number of vertices and tracks, primary vertex
position. (Luminosity has an entry but is not
filled.)
- Variables such as the summed cell Et, missing Et
magnitude, and phi
- Trigger information: BitMasks encode pass, pass
after prescale for each trigger item/chain
- Physics objects
- Multiplicity of physics objects and the Pt, eta,
phi for the highest Pt objects
- A tightness criterion for e/mu/gamma is included,
as are b-tag likelihoods and tau candidate
likelihood.
- PhysWords: 32-bit TAG Word. For b-physics, for
instance:
- Bit 0: HighPtMuonPair, Bit 1: J/Psi candidate,
Bit 2: Upsilon candidate.
- See more details on FDR TAGs in a talk by
James Frost, April Exotics Working Group meeting
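The PhysWords idea above can be illustrated with a minimal Python sketch. It only shows how individual bits of a 32-bit word are tested. The bit positions are the b-physics ones quoted on the slide; the names BPHYS_BITS, is_bit_set and decode_phys_word are made up for this illustration and are not part of the ATLAS software.

```python
# Bit positions follow the b-physics example on the slide; the dictionary
# name and the helper functions are hypothetical, not ATLAS code.
BPHYS_BITS = {
    0: "HighPtMuonPair",
    1: "J/Psi candidate",
    2: "Upsilon candidate",
}


def is_bit_set(word, bit):
    """Return True if the given bit of the word is set."""
    return bool((word >> bit) & 1)


def decode_phys_word(word, bit_names=BPHYS_BITS):
    """Return the names of all known flags set in a 32-bit TAG word."""
    word &= 0xFFFFFFFF  # keep only the low 32 bits
    return [bit_names[bit] for bit in sorted(bit_names) if is_bit_set(word, bit)]


if __name__ == "__main__":
    # 0b101 has bits 0 and 2 set.
    print(decode_phys_word(0b101))
```

A selection on such a word then reduces to a bit test, for example is_bit_set(word, 1) to keep only events flagged as J/Psi candidates.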
15How Does TAG Selection Work?
- Use the TAG file as an input to EventSelector or
PoolTAGInput.
- Make sure the matching Pool file (e.g. AOD) is in
the PoolFileCatalog.
- Define your query of the TAG content.
- Run the job.
- Very flexible:
- Can use the TAG to preselect the events from an
AOD in which you are interested, passing only
those to an analysis algorithm.
- Can use the TAG to write out an AOD (or ESD, RDO)
of only the selected events.
- How to learn more? Good tutorials are already
available:
- https://twiki.cern.ch/twiki/bin/view/Atlas/FeedBackForTags
- https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection
- https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection#Building_Tags_Under_12_0_31
(create tag files)
- https://twiki.cern.ch/twiki/bin/view/Atlas/PhysicsAnalysisWorkBookTAG
- https://twiki.cern.ch/twiki/bin/view/Atlas/PhysicsAnalysisWorkBookTAGAnalysis
- https://twiki.cern.ch/twiki/bin/view/Atlas/TopFdrTag
- http://twiki.mwt2.org/bin/view/Main/TutorialTag080318
(All the above links are available from this
one.)
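The preselection step above can be pictured with a small, framework-free Python sketch. It is not Athena code: the record layout, the attribute names (RunNumber, EventNumber, NMuon, MissingET) and the helper functions are all assumptions made for illustration. The point is only that a query runs over the small TAG records, and that just the matching events are then read from the larger file.

```python
def select_keys(tag_records, query):
    """Return the (run, event) keys of the TAG records that pass the query."""
    return {(t["RunNumber"], t["EventNumber"]) for t in tag_records if query(t)}


def preselect(events, keys):
    """Yield only the events whose (run, event) key was selected from the TAGs."""
    for ev in events:
        if (ev["RunNumber"], ev["EventNumber"]) in keys:
            yield ev


# Hypothetical TAG records: small per-event summaries.
TAGS = [
    {"RunNumber": 3051, "EventNumber": 1, "NMuon": 2, "MissingET": 12.0},
    {"RunNumber": 3051, "EventNumber": 2, "NMuon": 0, "MissingET": 55.0},
    {"RunNumber": 3051, "EventNumber": 3, "NMuon": 2, "MissingET": 40.0},
]

# Hypothetical full events (stand-ins for AOD content).
EVENTS = [
    {"RunNumber": 3051, "EventNumber": n, "payload": "full event %d" % n}
    for n in (1, 2, 3)
]


def dimuon_query(tag):
    """Example query on TAG content: at least two muons."""
    return tag["NMuon"] >= 2


if __name__ == "__main__":
    keys = select_keys(TAGS, dimuon_query)
    for event in preselect(EVENTS, keys):
        print(event["EventNumber"], event["payload"])
```

Only the selected events reach the analysis loop; the same set of keys could equally drive writing out a reduced file containing just those events.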
16TAG Browser ELSSI (1)
- TAGs are accessed by users via a web interface
called ELSSI, the ATLAS Event Level Selection
Service Interface.
- For FDR-1 data (tutorial):
https://atldbdev01.cern.ch/tagservices/tutorial/index.htm
- For FDR-1 data:
https://atldbdev01.cern.ch/tagservices/fdr/index.htm
You need Firefox to see this page, as Jack
Cranshaw informed me.
17TAG Browser ELSSI (2)
- How to use ELSSI
- Define a query to select runs, streams, data
quality, trigger chains, etc.
- Review the query
- Execute the query and retrieve the TAG file (a
ROOT file)
18DQ2 Enduser Tools
19The Client Tools to Retrieve Data
- DQ2 enduser tools
- Includes dq2_xxx (dq2_ls, dq2_get, etc.) commands
- Available to download from:
- https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#Download
- The setup files are edited to accommodate local
needs (dq2.sh, setup.sh)
- Available on AFS at CERN:
- source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
- source /afs/cern.ch/atlas/offline/external/GRID/ddm/endusers/setup.sh.CERN
- gLite UI (User Interface)
- Includes lcg-cp, egee-gridftp-xxx
- Available on AFS at CERN:
- source /afs/usatlas.bnl.gov/lcg/current/etc/profile.d/grid_env.sh
- source /afs/cern.ch/project/gd/LCG-share/current/external/etc/profile.d/grid-env.sh
- Why the gLite UI may be needed in OSG:
- dq2_put/get may use some gLite commands
depending on the site they interact with
(TiersOfATLASCache.py description): lcg-lg,
lcg-rf, glite-gridftp-ls, lcg-gt
- More info:
- https://twiki.cern.ch/twiki/bin/view/Atlas/DDMEndUserTutorial
20DQ2 Enduser Tools
- dq2_ls: returns a list of datasets matching a
given pattern
- dq2_ls fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
- dq2_get: copies the files from DQ2 to a local
area
- dq2_get -rv fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
- dq2_put: registers datasets to DQ2
- dq2_poolFCjobO: creates PoolFileCatalog and
Athena job-option for DQ2 datasets
- dq2_register: uploads and registers external
generator input files to DQ2
- dq2_cleanup: deletes a dataset from a site's
catalog and storage.
- dq2_sample: copies a portion of an existing
dataset and registers it to DQ2
- More info:
- https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#DQ2_end_user_tools
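To make "datasets matching a given pattern" concrete, here is a minimal Python sketch of shell-style wildcard matching over dataset names. It does not call DQ2 and is not how dq2_ls is implemented; it only mimics the matching on a local list. The first dataset name is the one from the slide; the other two, and the names CATALOGUE and list_datasets, are hypothetical.

```python
from fnmatch import fnmatchcase

# A stand-in for a dataset catalogue. Only the first name comes from the
# slide; the other two are invented for the example.
CATALOGUE = [
    "fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1",
    "fdr08_run1.0003051.StreamMuon.merge.AOD.o1_r6_t1",
    "fdr08_run1.0003051.StreamEgamma.merge.TAG.o1_r6_t1",
]


def list_datasets(pattern, names=CATALOGUE):
    """Return the sorted dataset names that match a wildcard pattern."""
    return sorted(name for name in names if fnmatchcase(name, pattern))


if __name__ == "__main__":
    for name in list_datasets("fdr08_run1.*.StreamEgamma.*"):
        print(name)
```

A full dataset name is itself a valid pattern and matches only that dataset, which is the form used in the dq2_ls example above.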
21ATLAS Analysis Model
22Analysis Model Forum Recommendations on the
Analysis Model
[Diagram of the recommended analysis model;
annotation: includes metadata and simple UserData]
23Derived Physics Data - DnPD
- Primary D1PD: POOL-based DPD produced by the GRID
production system. There are expected to be O(10)
primary DPDs, so the contents will not be very
specific to an analysis. It is expected to be
skimmed (keeping only interesting events),
slimmed (keeping only interesting objects, for
example electrons and muons), and thinned
(keeping only the subset of information inside
objects that is relevant in future steps)
compared to the AOD.
- An example job options file: AODtoDPD.py (see CVS)
- Packages in CVS: TopDPDMaker, TauDPDMaker,
BPhysicsDPDMaker, SUSYDPDMaker
- Secondary D2PD: POOL-based DPD with more
analysis-specific information. Typically, this is
produced from Primary DPD and may be created
using an Athena tool like EventView.
- SimpleThinningExample
- HighPtViewDPDThinningTutorial
- Tertiary D3PD: Does not need to be POOL-based; it
includes flat ntuples.
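The three reduction steps named above (skimming, slimming, thinning) can be sketched in plain Python, following the definitions given on this slide. This is only a toy model, not the DPD-making code: an event is assumed to be a dictionary mapping a collection name to a list of objects, each object being a dictionary of fields. All names and values here are hypothetical.

```python
def skim(events, keep_event):
    """Skimming: keep only the interesting events."""
    return [ev for ev in events if keep_event(ev)]


def slim(events, keep_collections):
    """Slimming: keep only the interesting object collections."""
    return [
        {name: objs for name, objs in ev.items() if name in keep_collections}
        for ev in events
    ]


def thin(events, keep_fields):
    """Thinning: keep only the relevant fields inside each object."""
    return [
        {
            name: [{k: v for k, v in obj.items() if k in keep_fields} for obj in objs]
            for name, objs in ev.items()
        }
        for ev in events
    ]


def has_lepton(ev):
    """Example skim criterion: at least one electron or muon."""
    return bool(ev["electrons"] or ev["muons"])


# Hypothetical AOD-like input: two events with three collections each.
EVENTS = [
    {
        "electrons": [{"pt": 40.0, "eta": 0.5, "isolation": 0.1}],
        "muons": [],
        "jets": [{"pt": 60.0, "eta": 1.2, "isolation": 0.0}],
    },
    {
        "electrons": [],
        "muons": [],
        "jets": [{"pt": 25.0, "eta": -0.3, "isolation": 0.0}],
    },
]


def make_dpd(events):
    """Chain the three steps: skim, then slim, then thin."""
    selected = skim(events, has_lepton)
    leptons_only = slim(selected, {"electrons", "muons"})
    return thin(leptons_only, {"pt", "eta"})


if __name__ == "__main__":
    print(make_dpd(EVENTS))
```

Each step shrinks the output along a different axis: fewer events, fewer collections per event, fewer fields per object.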
24Analyzing the Data
- Inside Athena
- Interactive or batch using C++, python code.
- Needs part of Athena (how much depends on user
needs).
- Provides full access to all tools and services.
- Outside Athena: AthenaRootAccess (ARA)
- CINT, or using python, or compiled C++ code.
- Does not need full Athena installation (expected
1GB)
- Not all classes are available (for example,
calo-Cells)
- Important: both methods use the same files as
input.
25ARA - AthenaRootAccess
- Allows you to read an AOD in ROOT like you would
read a normal ntuple (without using Athena).
- The goal is to seamlessly use Athena tools.
- One can use identical code/tools to run on ESDs,
AODs, DPDs.
- The names of the variables in the AOD ROOT tree
are the same as in the AOD.
- Limitations
- However, it uses the transient classes and
converters of the ATLAS software, so a portion of
the offline software is needed: a 1GB
distribution including Athena libraries.
- Tools and data that need detector description,
conditions, B-field, etc. cannot be called in ARA.
However, this type of info can be put in UserData
in DPD.
- Gaudi-based classes (like AlgTools, Services)
don't work in ARA. Wrapping machinery is needed
to reuse the code in Athena/ARA.
26ARA Examples (1)
- CINT macros
- Easy development (change code and run)
- Run time is slow: x10 that of C++ compiled code
- C++ compiled code
- Slower development (change code, recompile,
cannot reload libs)
- Fastest runtime
- Integrates easily back into Athena
- Python scripts
- Easy development (change code, reload and run)
- Simple example shows runtime x3 that of C++
compiled code
- May be able to compile Python
- Integration of developed code into Athena?
- Examples on Twiki and in Release
- https://twiki.cern.ch/twiki/bin/view/Atlas/AthenaROOTAccess
- PhysicsAnalysis/AthenaROOTAccessExamples
27ARA Examples (2)
- Available in CVS under
PhysicsAnalysis/AthenaROOTAccessExamples
- Need python script to open file and set up
transient tree
- lxplus> get_files AthenaROOTAccess/test.py
- Compiled C++ Example
- lxplus> root
- root [0] TPython::Exec("execfile('test.py')")
- root [1] CollectionTree_trans = (TTree*)gROOT->Get("CollectionTree_trans")
- root [2] ClusterExample ce // Example class in
AthenaROOTAccessExamples
- root [3] ce.plot(CollectionTree_trans)
- root [4] TruthInfo ti
- root [5] ti.truth_info(CollectionTree_trans)
- test.py takes about 20 secs to load necessary
dictionaries
- One can recompile and then restart from the
beginning
28ARA Examples (3)
- CINT Example
- lxplus> root
- root [0] TPython::Exec("execfile('test.py')")
- root [1] CollectionTree_trans = (TTree*)gROOT->Get("CollectionTree_trans")
- root [2] gROOT->LoadMacro("AthenaROOTAccessExamples/macros/cluster_example.C")
- root [3] plot(CollectionTree_trans)
- One can now edit cluster_example.C and re-run
LoadMacro
- Python Example
- lxplus> python -i test.py
- >>> import AthenaROOTAccessExamples.cluster_example
- >>> AthenaROOTAccessExamples.cluster_example.plot(tt)
- One can now edit cluster_example.py and re-run
- >>> reload(AthenaROOTAccessExamples.cluster_example)
- >>> AthenaROOTAccessExamples.cluster_example.plot(tt)
29Analysis Frameworks EventView (1)
- This framework provides general tools for common
analysis tasks like:
- particle selection
- overlap removal
- observable calculation
- combinatorics
- recalibration
- systematics evaluation
- generating ntuples
- Users can perform a great deal of their analyses
in Athena by chaining and configuring a set of
these tools and producing an ntuple for further
analysis in ROOT.
- Twiki page:
https://twiki.cern.ch/twiki/bin/view/Atlas/EventView
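The idea of "chaining and configuring a set of tools" can be sketched in plain Python. This is not the EventView API: the tool factories, the particle layout (dictionaries with pt, eta, phi) and the cut values are assumptions chosen for illustration. Two of the listed tasks are shown, particle selection by a Pt cut and overlap removal by a Delta-R cone, which is one common way to define overlap.

```python
import math


def delta_r(a, b):
    """Angular distance in (eta, phi), with phi wrapped into [-pi, pi)."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def select_pt(min_pt):
    """Configure a particle-selection tool with a minimum Pt."""
    def tool(particles):
        return [p for p in particles if p["pt"] >= min_pt]
    return tool


def remove_overlaps(min_dr):
    """Configure an overlap-removal tool: the highest-Pt particle wins."""
    def tool(particles):
        kept = []
        for p in sorted(particles, key=lambda q: q["pt"], reverse=True):
            if all(delta_r(p, k) >= min_dr for k in kept):
                kept.append(p)
        return kept
    return tool


def run_chain(particles, tools):
    """Apply the configured tools one after another."""
    for tool in tools:
        particles = tool(particles)
    return particles


# Hypothetical input particles.
PARTICLES = [
    {"pt": 50.0, "eta": 0.00, "phi": 0.00},
    {"pt": 30.0, "eta": 0.05, "phi": 0.05},  # overlaps with the first one
    {"pt": 40.0, "eta": 1.00, "phi": 3.10},
    {"pt": 5.0, "eta": -2.00, "phi": 1.00},  # fails the Pt cut
]

if __name__ == "__main__":
    chain = [select_pt(10.0), remove_overlaps(0.2)]
    for particle in run_chain(PARTICLES, chain):
        print(particle)
```

Changing the analysis then means changing the list of tools or their configuration values rather than rewriting the loop itself.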
30Analysis Frameworks EventView (2)
- Though this style of "modular" analysis usually
does not require writing C++, the EventView
framework is completely extensible, so if
necessary users can easily develop and mix their
own C++ tools with the common EventView tools and
share their configurations and tools with other
collaborators.
- Most users are introduced to EventView through
one of the "View" packages (e.g. TopView, SusyView,
HighPtView), which for the most part collect
configurations of EventView tools for a specific
set of analyses and produce a standard ntuple
output.
- These users typically start by analyzing the View
ntuples produced by the various physics working
groups, and then continue to re-configuring and
re-running the respective View package if they
require additional tuning for their specific
analyses.
- There are also efforts to evolve (the persistent
piece of) EventView in the context of
AthenaROOTAccess.
31We will practice with the tools during the
tutorial.