ITR Proposals for 2003
  • Ruth Pordes, Fermilab
  • Joint DOE and NSF Review of
  • U.S. LHC Software and Computing
  • Lawrence Berkeley National Lab, Jan 14-17, 2003

Problems and Steps towards Solutions
  • Missing capabilities of Data Handling Systems
    with insufficient effort and organization to
    provide them
  • End to end distributed system for
    hundreds/thousands of Physicists to do Analyses
    as Community Working Groups whose Results can be
    Trusted and Validated by the Experiment
    Enterprise for Publication.
  • Recent experience with large, data-flooded
    experiments shows the need for early planning,
    development and performance proof of these
    capabilities - together with their scalability
    and flexibility to respond to changing conditions
    - essential as soon as an experiment starts to
    take data.
  • Each generation of experiments with step
    functions in requirements has faced the need to
    develop new paradigms and thus an increased need
    for attention and effort targeted to this problem
    together with a recognition that the problems are
  • Insufficient total effort available across the
    LHC technology providers and experiments to
    address all identified needs.
  • Commitment to sharing of load across the
    community through Joint Projects (e.g. between
    ATLAS and CMS) and division of the
    responsibilities (e.g. between LCG/EU and
  • The process of identifying more specific
    computing needs started mid-2002.

Berkeley W/S Nov 2002 -- The Global Picture
Development of A Science Grid Infrastructure
Lothar Bauerdick, Fermilab
identified specific missing pieces
  • Transition of current infrastructure to
    Production Quality Grids.
  • Basic functionalities of a Distributed Data
    Handling System
  • Catalogs and location services,
  • Storage management,
  • High network/end-to-end throughput for
    multi-Terabyte transfers
  • And the System and Functionalities to Enable
    Global Analysis
  • Collection, access, tracking, management of
    shared, dynamic, meta-data and bookkeeping (of
    datasets, information and code) at a Global
  • Capabilities for the physics scientific analysis
    process, where Analysis Communities focused on a
    particular topic perform research collaboratively
    across institutional and geographic boundaries
    for weeks, months or years.
  • User interfaces for controlling, accessing,
    querying the Experiment institutional and Work
    Group Private datasets, meta-data, algorithms.
  • Initial agreement that US efforts would look at
    Global Analysis in more detail while EU looked
    at Hardening of Production Grids.

Lothar Bauerdick, Ruth Pordes, Fermilab
led to Vision on Enabling Global Science
  • Proposal to focus on Joint ATLAS/CMS ITR
    proposals for 2003
  • Large ITR Globally Enabled Analysis Communities.
    Focus on Science Challenges as opposed to Grid
    use cases
  • Science Drivers - exotic physics discovery, data
    validation and trigger modifications
  • Democratization of science process..
  • Enable US Universities to be full players in LHC
  • Capabilities and services to do analysis 9 time
    zones from CERN.
  • Forward looking to facilitate new scientific
    regimes and increase discovery potential from LHC
  • Pre-Proposal submitted to NSF in November.
  • Medium ITR for Enabling Global Collaboration -
    workshop to be held at UTA Jan 28-29th. Proposal
    to be submitted in February.
  • Goals and Requirements show clear IT Challenges
    and need for Computer Science involvement to
    achieve the vision.

Typical Science Challenge

A physicist at a U.S. university presents a plot
at a videoconference of the analysis group she is
involved in. The physicist would like to verify
the source of all the data points in the plot.
  • The detector calibration has changed several
    times during the year and she would like to
    verify that all the data has a consistent
    calibration
  • The code used to create the standard cuts has
    gone through several revisions; only more recent
    versions are acceptable
  • Data from known bad detector runs must be
    excluded
  • An event is at the edge of a background
    distribution and the event needs to be visualised
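
The checks in this scenario can be sketched in code. The following is a minimal illustration only, assuming each data point in the plot carries a provenance record. The `Provenance` fields, the `audit_plot` helper and the sample values are hypothetical and are not taken from any experiment's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record attached to one data point."""
    run: int              # detector run the event came from
    calibration: str      # calibration version used in reconstruction
    cuts_version: tuple   # version of the standard-cuts code, e.g. (2, 1)


def audit_plot(points, bad_runs, min_cuts_version):
    """Return a list of human-readable problems found in the plot's inputs."""
    problems = []
    # All points should share one calibration version.
    calibrations = {p.calibration for p in points}
    if len(calibrations) > 1:
        problems.append(f"mixed calibrations: {sorted(calibrations)}")
    for p in points:
        # Only sufficiently recent revisions of the cuts code are acceptable.
        if p.cuts_version < min_cuts_version:
            problems.append(f"run {p.run}: cuts code {p.cuts_version} too old")
        # Data from known bad detector runs should not appear at all.
        if p.run in bad_runs:
            problems.append(f"run {p.run}: known bad detector run")
    return problems


# Illustrative sample values (assumptions, not real run numbers).
points = [
    Provenance(run=101, calibration="cal-v3", cuts_version=(2, 1)),
    Provenance(run=102, calibration="cal-v2", cuts_version=(1, 9)),
    Provenance(run=103, calibration="cal-v3", cuts_version=(2, 0)),
]
report = audit_plot(points, bad_runs={103}, min_cuts_version=(2, 0))
for line in report:
    print(line)
```

With these sample values the report flags the mixed calibrations, the outdated cuts code for run 102, and the bad run 103. A real system would have to obtain such records from the experiment's metadata and provenance services rather than from an in-memory list.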

Typical Science Challenge
Metadata, Data Provenance, Data Equivalence,
Collaboratory Tools, User Interfaces

The Global Environment
  • Globally Enabled Analysis Communities
  • Enabling Global Collaboration

Articulate a Compelling Vision..
  • Researchers .. get rapid response to changing
    need for resources and priorities.. work at the
    scientific frontier sure that they are using the
    most up-to-date software and the most recent data
    .. enable small groups to make new discoveries
    accepted by the experiment enterprise perform
    reliable exploratory analyses on their own ..
    conduct virtual 'table-top' experiments

Pre-Proposal Signatories..
  • Argonne National Laboratory David Malon (P),
    Lawrence Price (P), Mike Wilde (CS)
  • Boston University James Shank (P), Saul Youssef
  • California Institute of Technology Julian Bunn
    (CS), Harvey Newman (P),
  • Fermilab Lothar Bauerdick (P), Dan Green (P),
    Greg Graham (CS), Ruth Pordes (CS),
  • Harvard University John Huth (P), Margo Seltzer
  • Indiana University Frederick Luehring (P),
  • Massachusetts Institute of Technology Maarten
    Ballintijn (P), G. Roland (P), Bolek Wyslouch
    (P), J. Zhang (P)
  • Michigan University Jeremy Herr (CS), Peter
    Honeyman (CS), Shawn McKee (P), Homer Neal (P),
  • Northeastern University Stephen Reucroft (P)
  • Princeton University David Stickland (P)
  • Oklahoma University Patrick Skubic (P)
  • Rice University Pablo Yepes (P)
  • University of California, Berkeley Marjorie
    Shapiro (P),
  • LBNL Deborah Agarwal (CS), Brian Tierney (CS),
    Craig Tull (CS)
  • University of California, Davis Winston Ko (P)
  • University of California, Riverside Gail Hanson
    (P), Robert Clare (P)
  • University of California, San Diego James
    Branson (P), Ian Fisk (P)
  • University of Chicago Ian Foster (CS), Robert
    Gardner (P), Frank Merritt (P), Jennifer Schopf
  • University of Florida Paul Avery (P), Richard
    Cavanaugh (P), Sanjay Ranka (CS)

Make Relevant to Physics..
  • Topics such as from last Nov collaboration
  • Higgs observation in gluino/squark
  • progress on ttH production,
  • Higgs in ZZ to 4 muons,
  • Higgs self couplings at LHC,
  • Invisible Higgs in associated production,
  • Charged Higgs production
  • A/H to bb in bbH final states,
  • Study of WW scattering and related W+jets backgrounds,
  • tau polarisation in stau decays,
  • CP in Bd to pi-pi,
  • B0s oscillation measurements
  • And

Management Plan Directed to Experiment needs..
  • John Huth (PI), Lothar Bauerdick and Miron
    Livny (Co-PIs).
  • Will form teams that span Physicists and CS to
    address the different research tasks.
  • Requirements for developing autonomous analysis
    methodologies will be driven by the applications
    physicists and appropriate tools will be
    developed in conjunction with the collaborating
    computer scientists on the teams.
  • Communities of LHC researchers will test these
    tools through testbed activities and Data
    Challenges, feeding back results to the
    development teams in an iterative fashion.
  • The coordination .. will be achieved by close
    collaboration with the LCG (LHC Computing Grid)
  • The management team will administer funds and
    track the progress of the sub-projects using the
    existing project offices that have been set up
    for the U.S. ATLAS and U.S. CMS research
  • A steering body will be established consisting of
    the management and scientific leadership from
    both the physics and the computer science
    communities, from collaborating Grid projects and
    from the LCG.

Commitment to Outreach and Dissemination of
  • Education: work with QuarkNet and REU programs
  • Potential Wider Benefits
  • Other scientific analysis communities
  • University based researchers collaborating
    across departmental and institutional boundaries
  • Private grids for industry

Science Challenges as Technology Drivers
  • A small group of University physicists are
    searching for a specific exotic physics signal,
    as the LHC event sample increases over the years.
    Instrumental for this search is a specific
    detector component that those University groups
    have been involved in building. Out of their
    local detector expertise they develop a
    revolutionary new detector calibration method
    that significantly increases the discovery
    reach. They obtain permission to use a local
    University compute center for Monte Carlo
    generation of their exotic signal. Producing the
    required sample and tuning the new algorithm
    takes many months.
  • After analyzing 10% of the available LHC dataset
    of 10 Petabytes with the new method they indeed
    find signals suggesting a discovery! The
    collaboration asks another group of researchers
    to verify the results and to perform simulations
    to increase the confidence by a factor of three.
    There is a major conference in a few weeks: will
    they be able to publish in time?
  • access the meta-data, share the data and transfer
    the algorithms used to perform the analysis
  • quickly have access to the maximum available
    physical resources to execute the expanded
    simulations, stopping other less important
    calculations if need be
  • decide to run their analyses and simulations on
    non-collaboration physical resources to the
    extent possible depending on cost, effort and
    other overheads
  • completely track all new processing and results
  • verify and compare all details of their results
  • provide partial results to the eager researchers
    to allow them to track progress towards a result
    and/or discovery
  • provide complete and up to the minute information
    to the publication decision committee to allow
    them to quickly take the necessary decisions.
  • create and manage dynamic temporary private
    grids
  • provide complete provenance and meta-data
    tracking and management for analysis communities
  • enable community based data validation and
    comparison
  • enable rapid response to new requests
  • provide usable and complete user
    interaction and control facilities
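
The requirement above to stop less important calculations when urgent work arrives can be illustrated with a toy scheduler. This is a sketch under simplifying assumptions: a fixed number of compute slots, one integer priority per job, and no resumption logic. The `PreemptiveQueue` class and the job names are hypothetical.

```python
import heapq


class PreemptiveQueue:
    """Toy scheduler: a fixed number of slots; higher priority preempts lower."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []   # min-heap of (priority, name); lowest priority on top
        self.waiting = []   # jobs that were displaced or could not start yet

    def submit(self, name, priority):
        """Start the job if possible; return the name of any job it displaced."""
        if len(self.running) < self.slots:
            heapq.heappush(self.running, (priority, name))
            return None
        lowest_priority, lowest_name = self.running[0]
        if priority > lowest_priority:
            # Swap the lowest-priority running job for the new one.
            heapq.heapreplace(self.running, (priority, name))
            self.waiting.append((lowest_priority, lowest_name))
            return lowest_name
        # Not important enough to displace anything: wait for a free slot.
        self.waiting.append((priority, name))
        return None


queue = PreemptiveQueue(slots=2)
queue.submit("routine-reprocessing", priority=1)
queue.submit("group-ntuples", priority=2)
displaced = queue.submit("discovery-verification", priority=9)
print(displaced)
```

Here the verification job displaces `routine-reprocessing`, the lowest-priority running job. A production system would also need policy decisions the slides only hint at, such as who may assign priorities and how displaced work is checkpointed and resumed.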

Science Challenge 2
  • The data validation group is concerned at the
    decrease in efficiency of the experiment for
    collecting new physics signature events, after a
    section of the detector is broken and cannot be
    repaired until an accelerator shutdown. The
    collaboration is prepared to take a short
    downtime of data collection in order to test and
    deploy a new trigger algorithm to increase this
    ratio, where each day of downtime has an
    enormous overhead cost to the experiment.
  • The trigger group must develop an appropriate
    modification to the high-level trigger code, test
    it on a large sample of simulated events and
    carefully compare the data filter for each of the
    100 triggers in use. During the test period for
    the new algorithm the detector calibration group
    must check and optimize the calibration scheme.
  • identify and define the true configuration of the
    hundreds of thousands of components of the
    detector in the configuration database
  • store and subsequently access sufficient
    information about the previous and this new
    temporary configuration to allow the data
    collected under each condition to be correctly
  • quickly develop and check a suite of new high
    level trigger algorithms integrated with the
    remainder of the official version of the
    application code
  • quickly have access to the maximum available
    physical resources to execute the testing
  • export this information (which is likely to have
    a new metadata schema), to other communities who,
    albeit with less priority, need to adapt and test
    their analyses, and then to the entire
  • evolution and integration of meta-data schema and
    provenance data
  • arbitrarily structured meta-data
  • data equivalency

.. And..
  • The SUSY physics community of the collaboration
    consists of a thousand scientists across sixty
    organizations and looks to discover supersymmetry
    (SUSY), a centerpiece of the LHC physics program,
    which if it exists has a very small production
    cross section. The most up to date validated
    description of the detector response is vital for
    assessing the physics performance. Initially a
    small and then a large event sample corresponding
    to the first physics runs of 10 fb^-1 is decided
    upon: a dataset of up to 1 Petabyte. Most of the
    computing is associated with simulating
    background interactions, which will take months
    of processing on a fair fraction of the complete
    available computing resources. The disparate
    group must coordinate their analyses at every
    stage. They must carefully record the version of
    every data unit and application used, and track
    the calibration and ancillary information used to
    calculate both the final result and the
    systematic and statistical errors.
  • a) process each and every dataset once, with
    maximum efficiency, through the intelligent
    co-scheduling of data and computation,
  • b) track the complete history of each
    intermediate result,
  • c) management to automate reprocessing in the face
    of errors, termination (with tracking) of
    processing in face of unrecoverable problems, and
    complete accounting of failures in processing.
  • d) presentation and delivery of partial results
    and instrumented application monitoring to allow
    informed discussion within the community at
    regular or unscheduled times.
  • e) tools to support community based decisions
    such as whether to kill the complete analysis
  • f) structured data and information management
    of all components and stages of the analysis.
  • g) authorization and access control mechanisms to
    implement both the experiment wide and community
    based policies and protections.
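
Items b) and c) above (history tracking, automated reprocessing and failure accounting) can be sketched as a small retry loop. This is an illustration under assumptions: failures surface as `RuntimeError`, a fixed retry budget stands in for the distinction between recoverable and unrecoverable problems, and `process_all` and `flaky_step` are hypothetical names.

```python
def process_all(datasets, step, max_attempts=3):
    """Run `step` once per dataset, retrying on failure and recording history."""
    history = {}   # dataset -> list of (attempt number, outcome string)
    results = {}
    failed = []    # datasets abandoned after exhausting the retry budget
    for name in datasets:
        history[name] = []
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = step(name, attempt)
                history[name].append((attempt, "ok"))
                break
            except RuntimeError as err:
                history[name].append((attempt, f"error: {err}"))
        else:
            # The loop ended without a successful attempt: terminate and record.
            failed.append(name)
    return results, history, failed


def flaky_step(name, attempt):
    """Hypothetical processing step: 'run-B' fails once, 'run-C' always fails."""
    if name == "run-B" and attempt == 1:
        raise RuntimeError("transient storage error")
    if name == "run-C":
        raise RuntimeError("corrupt input")
    return f"{name}-processed"


results, history, failed = process_all(["run-A", "run-B", "run-C"], flaky_step)
print(failed)
print(history["run-B"])
```

The returned `history` gives the community a complete account of every attempt, and `failed` lists the datasets that need human attention before the analysis can be called complete.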

.. A 4th
  • A Professor wishes to work with a graduate
    student on a subsidiary dataset to research,
    develop and evaluate new parameterization
    algorithms. He wants to guide the student's
    research to design and prototype
    innovative techniques and methodologies. This
    investigation needs to be completed in a semester
    for the student's research dissertation. There
    is a range of thousands of major and minor
    variations on the algorithms to be tried, and
    each trial must be run on a dataset sample of a
    few hundred Terabytes to ensure coverage of
    sufficient event signatures.
  • a) describe and integrate specific code modules
    with the broader community versioned analysis
    and processing system.
  • b) estimate and reserve compute and storage
    needs to ensure access to adequate resources
  • c) Automated management of small scale
    submissions and analytic analysis of previous
    submissions to allow prediction of resource
  • d) instrumentation of analysis workflow with
    prioritization policies so that any collaboration
    wide high priority processing can preempt the
    local research investigation
  • e) local provenance tracking and management
    capabilities dependent upon, but loosely coupled
    to the global collaboration wide information
  • f) tools to define and manage private metadata
    schemas and data repositories for the private
  • g) tools to make these private data collections
    available to selected members of the
  • h) tools to bring together private and global
    data for analysis
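
The scale of this scenario shows why item b), estimating and reserving resources, matters. The following back-of-envelope calculation takes the lower end of the stated ranges and is only a rough sketch: 1000 trials, 100 Terabytes per trial, a 120-day semester, decimal units, and every trial reading its full sample with no sharing or caching are all assumptions made here for illustration.

```python
def sustained_read_rate(trials, terabytes_per_trial, days):
    """Return (total petabytes read, required sustained rate in GB/s)."""
    total_tb = trials * terabytes_per_trial
    seconds = days * 24 * 60 * 60
    gb_per_second = total_tb * 1000 / seconds   # 1 TB = 1000 GB (decimal units)
    return total_tb / 1000, gb_per_second


total_pb, rate = sustained_read_rate(trials=1000, terabytes_per_trial=100, days=120)
print(f"{total_pb:.0f} PB read in total, about {rate:.1f} GB/s sustained")
```

Under these assumptions the study reads about 100 Petabytes in total, which needs just under 10 GB/s sustained for the whole semester. Numbers of this size suggest why the list above asks for resource prediction and for sharing of data between trials rather than naive repetition.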

Goals of the Large ITR
  • Provide individual physicists and groups of
    scientists capabilities from the desktop that
    allow them
  • To participate as an equal in one or more
    Analysis Communities
  • Full representation in the Global Experiment
  • To on-demand receive whatever resources and
    information they need to explore their science
    interest while respecting the collaboration wide
    priorities and needs.
  • Environment for CMS (LHC) Distributed Analysis on
    the Grid
  • Dynamic Workspaces - provide capability for
    individual and community to request and receive
    expanded, contracted or otherwise modified
    resources, while maintaining the integrity and
    policies of the Global Enterprise.
  • Private Grids - provide capability for individual
    and community to request, control and use a
    heterogeneous mix of Enterprise wide and
    community specific software, data, meta-data,

Working with LCG on Architecture...
Working with LCG Architects Forum to extend SC2
endorsed Architecture Blueprint to Interfaces and
Capabilities needed for Global Analysis.
  • Analysis Groups - Communities - are of 1 to many
  • Each community is part of the Enterprise
  • Is assigned or shares the total Computation and
  • Can access and modify software, data, schema
  • is subject to the overall organization and
  • Each community has local (private) control of
  • Use of outside resources e.g. local institution
    computing centers
  • Special versions of software, datasets, schema,
  • Organization, policy and practice
  • We must be able to reliably and consistently move
    resources information in both directions
    between the Global Collaboration and the Analysis
  • Communities should be able to share among

Key Issues..
  • Enable remote analysis groups and individual
  • reliable and quick validation, trusted by the
  • demonstrate and compare methods and results
    reliably and improve the turnaround time to
    physics publications
  • quickly respond to and decide upon resource
    requests from analysis groups/physicists,
    minimizing impact to the rest of the
  • established infrastructure for evolution and
    extension for its long life time
  • lower the intellectual cost barrier for new
    physicists to contribute
  • enable small groups to perform reliable
    exploratory analyses on their own
  • increased potential for individual/small
    community analyses and discovery
  • analysis communities will be assured they are
    using a well defined set of software and data
  • Achieving these goals itself requires innovative
    developments.. Appropriate for an ITR.

Next Steps..
  • Define Collaboration and International Support
  • Further workshop at CERN on collaboration and
  • Define Scope and Boundaries within the
  • Refine vision and detail work plans.
  • Write the proposal texts