1
Preparations for CMS Milestones
  • Ian M. Fisk

2
Introduction
  • Three activities are driving most of the US-CMS
    Software and Computing efforts through the next
    twelve months
  • The CMS 2004 Data Challenge
  • The preparation for the Physics TDR
  • The roll out of the LCG distributed computing
    grid
  • In order for US-CMS to meet its obligations,
    substantial effort is being expended in both
    sub-projects, Core Application Software (CAS) and
    User Facilities (UF). We present the activities
    together because the milestones and goals are
    common to both.

3
CMS Milestones: DC04
  • Data Challenge 2004 (DC04)
  • The purpose of the milestone is to demonstrate
    the validity of the software baseline
  • Successfully cope with a sustained data-taking
    rate of 25 Hz at a luminosity of 0.2x10^34
    cm^-2 s^-1 for a period of 1 month (see the rough
    estimate below)
  • Validate the deployed grid model on a sufficient
    number of Tier-0, Tier-1, and Tier-2 sites.
  • Completing this milestone requires
  • Software Development
  • Enhancing the distributed computing
    infrastructure
  • Generating 50 million events
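As a rough cross-check of the scale of this milestone (this estimate is ours, not from the slide), a 25 Hz rate sustained for one month corresponds to tens of millions of events, which is consistent with the planned 50-million-event sample:

# Back-of-envelope estimate of the DC04 event volume (our estimate, not from the slides).
RATE_HZ = 25                        # sustained data-taking rate to be demonstrated
SECONDS_PER_MONTH = 30 * 24 * 3600

events = RATE_HZ * SECONDS_PER_MONTH
print(f"25 Hz for one month ~ {events / 1e6:.0f} million events")
# -> roughly 65 million events; reading the slide, the 50 million generated
#    events would then allow for a realistic duty cycle during the challenge.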

4
CMS Milestones: Physics TDR
  • Physics TDR
  • A physics validity test of software, computing,
    and people's knowledge and skills
  • Consists of two volumes
  • Volume 1: detector response, physics objects,
    calibration, and parameterization
  • Volume 2: high-level analysis; a small number of
    full analyses and a larger number of general
    physics topics
  • Completing this milestone requires
  • Lots of Analysis
  • More Event Production
  • Improvements in software and analysis tools

5
LCG Rollout
  • The deployment of the LCG prototypes is not by
    itself a CMS milestone, but the functionality
    expected from the releases is often tightly
    coupled to CMS milestones
  • CMS expects certain functionality from the LCG
    releases for DC04 data and process management
  • LCG can provide a consistent set of grid
    middleware
  • Their initial methods for doing this have often
    been more intrusive than is acceptable to
    participating institutions
  • The longer-term goal is individual components
    and services
  • The short-term technique is installing a
    complete environment
  • The LCG ramp-up of functionality has not been as
    fast as expected
  • US-CMS is evaluating ways of interfacing
    efficiently with the LCG
  • Trying to balance our need for distributed
    computing development and tools for producing
    events with the need to develop a worldwide
    system

6
Preparations for DC04
  • CAS Pieces
  • Preparing the Software Framework (Bill Tanenbaum,
    Vladimir Litvin)
  • Preparing the Software Support Components (Lassi
    Tuura, Zhen Xie, Michael Case, Ianna Osborne,
    Shahzad Muzaffar)
  • Preparing the Tools for Distributing Software and
    Running Jobs (Natalia Ratnikova and Greg Graham)
  • UF Pieces
  • Facilities Improvements
  • Capacity (David Fagan, Michael Ernst)
  • Capability (Yujun Wu, James Letts, Michael Ernst,
    Tanya Levshina)
  • Distributed Production (Greg Graham, Anzar Afaq,
    Joe Kaiser)
  • Analysis Environment (Hans Wenzel)

7
Software Framework Improvements
  • A little over a year ago CMS decided to formally
    switch away from Objectivity.
  • Objectivity provided us with a persistency
    mechanism, but also a file catalogue, data
    streaming tools, and failure recovery tools
  • The LCG-developed persistency mechanism, POOL, is
    our new baseline solution
  • Bill Tanenbaum worked to extract Objectivity from
    the CMS software framework COBRA
  • There was a prototype that wrote directly to
    ROOT-IO files
  • Bill is now working on interfacing POOL to COBRA
  • We are about a month delayed on milestone
    2.1.2.14, the release of the COBRA-POOL
    integration for the pre-challenge production
  • This schedule was very tight to begin with
  • We expect a release on the 14th of July

8
Software Framework Improvements
  • Vladimir Litvin is working on coordinating and
    developing the Calorimetry Framework
  • Improving navigation efficiency.
  • Interfacing the Calorimeter code with the common
    XML detector description database (DDD)
  • On schedule for the release of code for
    pre-challenge production reconstruction
    (2.1.3.1.6)

9
Software Support Tools
  • The biggest and most important is the new
    persistency mechanism POOL, which is progressing
    as planned (WBS 2.1.1.1) (Zhen Xie)
  • Version 0.5 provided a clean and generic
    interface to persistency and was released
    on time
  • References to object identifiers were added in
    V1.0 (WBS 2.1.1.1.7), also released on time
  • The POOL catalogue can be XML for local sites,
    MySQL for large installations and clusters, and
    soon an EDG Grid catalogue for distributed
    systems
  • POOL has been remarkably true to a fairly
    ambitious schedule that was set in November, but
    the interface of POOL to the Distributed Data
    Management System has been delayed
  • It should still be available in time for DC04,
    but will not be useful in the Pre-Challenge
    Production

10
Software Support Tools (2)
  • Completing a maintenance and integration phase
    of the Detector Description Database (WBS
    2.1.3.3) (Michael Case)
  • Code has been released in the CMSToolBox
  • Reconstruction now mainly reading geometry from
    DDD
  • Michael Case currently working on cleaning up and
    adding features
  • The expertise gained during this project is being
    directed toward the development of the CMS
    conditions database
  • IGUANA visualization progressing (Ianna Osborne,
    Shahzad Muzaffar)
  • Modified to read the new ROOT-IO persistency in
    COBRA without incident.
  • Impressive demonstration at CHEP
  • Common SEAL Framework (Lassi Tuura)
  • New LCG Project
  • Puts a lot of common infrastructure in a common
    library

11
Tools for Running and Distributing
  • Greg Graham has released a version of MC_RunJob
    for Pre-Challenge Production (completing WBS
    2.3.2.2.5)
  • MC_RunJob is the CMS tool which replaced IMPALA
    for specifying CMS production jobs.
  • MC_RunJob is used for local and grid production
  • Interesting additional effort from D0 and CDF
    Groups
  • Natalia Ratnikova has released proposed
    improvements for the Distribution After Release
    tool (DAR)
  • Ability to read the software configuration
    automatically from a reference database
  • Should reduce human interaction and the
    possibility of error
  • Investigating possibility of a simple database of
    installed software distributions
  • Looking at an environment for automating the
    build and install process

12
Software Status
  • The OSCAR GEANT4-based simulation has been slow
    to be validated
  • US-CMS does not currently have an OSCAR
    contribution
  • OSCAR was listed as a necessary component of
    DC04. However,
  • There were significant stability issues at the
    beginning
  • Then there were speed issues
  • OSCAR events were initially 10 times slower than
    CMSIM events. They have since improved to 2-3
    times slower, and recently to less than 2 times
    slower
  • There were issues with the amount of memory
    used, but this should be OK
  • We will start the Pre-Challenge Production with
    10 million CMSIM events. We hope to start OSCAR
    production at the beginning of August.
  • This doesn't leave a lot of time for testing the
    infrastructure, but we need to get started
    running events

13
Preparing the Facilities
  • In order to be ready for the Data Challenge, the
    User Facilities project
  • Needs to increase US-CMS capacity
  • Increase the processing and storage at the Tier1
    and Tier2 centers
  • Needs to increase the services offered
  • R&D effort to increase the efficiency with which
    we use the facilities we have
  • Networking, data serving and transfer, etc.
  • Needs to improve the automation of simulation
    processing and, later, event reconstruction
  • Improvements and extensions to the Distributed
    Processing Environment (DPE)
  • Increase in scale and robustness
  • Testing middleware and components
  • Establishing grid services

14
Increasing the US Resources
  • Predicted resources for CMS
  • The US Tier1 Center represents 10% of all the
    offsite resources

[Chart: predicted CMS computing resources in
kSI2000-months per year (log scale, 10 to 100,000),
2002-2009, with the number of boxes at FNAL growing
over the period: 136, 190, 270, 300, 700, 1100, 1500]
15
Tier1 Procurements in FY03
  • Currently the US Tier1 Center has
  • 40 dual-CPU P3 750 nodes, which will be out of
    maintenance in Oct.
  • We will keep them running until the end of the
    year on RH 6.1, for Objectivity licensing
    reasons, to access and analyze old data
  • 66 dual-CPU Athlon 1900 nodes purchased 6 months
    ago
  • Together these are less than half the SI2000
    expected from the US Tier1 center in 2003
  • Currently procuring 76 new dual CPU 2.4 GHz Xeon
    nodes
  • 60 will be used for production and reconstruction
  • 16 will be reserved for analysis users

16
Resources at Tier2s
  • The hierarchical Tier model calls for the total
    of the Tier2 resources to sum to the resources at
    the Tier1 center
  • Through iVDGL support the Tier2 centers have been
    upgrading to fulfill US-CMS obligations to CMS
  • Caltech has procured 32 dual Xeon 2.4 GHz compute
    nodes
  • UCSD has 20 and will procure an additional 10
  • Florida is in the process of upgrading with
    additional support from U-Florida
  • The resources available are about what was
    expected
  • Tier2 centers made significant contributions to
    the last official CMS production
  • Expected to make a big contribution to DC04

17
Improving our services: Networking
  • Offsite data transfer requirements have
    consistently outpaced available bandwidth
  • The upgrade by ESnet to OC12 (12/02) is already
    becoming saturated
  • FNAL is planning to obtain an optical network
    connection to StarLight in Chicago, the premier
    optical network switching center on the North
    American continent. This enables network
    research and holds promise for
  • Handling peak production loads when production
    demand exceeds what ESnet can supply
  • Acting as a backup if the ESnet link is
    unavailable
  • Potential of a single fiber pair
  • Wavelength Division Multiplexing (WDM) for
    multiple independent data links
  • Capable of supporting 66 independent 40 GB/s
    links if fully configured
  • Initial configuration is for 4 independent
    1 Gbps links across 2 wavelengths
  • Allows us to configure bandwidth to provide a
    mix of immediate service upgrades as well as
    validation of non-traditional network
    architectures
  • Immediate benefit to production bulk data
    transfers, a test bed for high-performance
    network investigations, and scalability into the
    era of LHC operations

18
Current Off-site Networking
  • All FNAL off-site traffic carried by ESnet link
  • The ESnet Chicago PoP has a 1 Gb/s StarLight link
  • Peering with CERN, SURFnet, and CA*net there
  • Also peering with Abilene there (for now)
  • ESnet peers with other networks at other places

19
Proposed Network Configuration
  • Dark fiber is an alternate path to
    StarLight-connected networks
  • Also an alternate path back into ESnet

20
End-to-End Performance / Network Performance and
Prediction
  • Need to actively pursue integration of network
    stack implementations supporting ultrascale
    networking for rapid data transactions and
    data-intensive dynamic workspaces
  • Maintain statistical multiplexing and end-to-end
    flow control
  • Maintain functional compatibility with the Reno
    TCP implementation
  • The FAST project (Caltech) has shown dramatic
    improvements over the Reno stack by moving from
    a loss-based to a delay-based congestion control
    mechanism, with standard segment size and fewer
    streams (sketched below)
  • Sender-side-only modifications
  • Fermilab/CMS is a FAST partner and well-supported
    user, with the FAST stack installed on Facility
    R&D data servers (first results look very
    promising)
  • Aiming at installations and evaluations for
    integration with the production environment at
    Fermilab, CERN, and Tier-2 sites
  • Working in collaboration with the FAST project
    team at Caltech and the Fermilab CCF Department
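To illustrate the loss-based versus delay-based distinction referred to above, here is a deliberately simplified sketch. It is not the actual FAST algorithm; the gain (gamma) and target-queueing (alpha) parameters are purely illustrative.

# Schematic contrast between loss-based (Reno-like) and delay-based (FAST-like)
# congestion control. Parameter values are illustrative only.

def reno_update(cwnd, loss):
    """Loss-based: grow until packets are dropped, then halve."""
    return cwnd / 2 if loss else cwnd + 1            # AIMD per round trip

def delay_based_update(cwnd, rtt, base_rtt, alpha=100, gamma=0.5):
    """Delay-based: steer toward keeping a small, fixed amount of data queued.

    alpha ~ target number of packets held in the queue,
    gamma ~ smoothing gain; both are illustrative, not FAST's tuned values.
    """
    target = (base_rtt / rtt) * cwnd + alpha          # window keeping ~alpha packets queued
    return (1 - gamma) * cwnd + gamma * target        # move part of the way toward the target

# With no queueing delay (rtt == base_rtt) the window grows; as queueing delay
# builds up, rtt rises and the update levels off without waiting for a loss.
cwnd = 1000.0
for rtt in (0.10, 0.11, 0.12, 0.13):                  # seconds; base RTT is 0.10 s
    cwnd = delay_based_update(cwnd, rtt, base_rtt=0.10)
    print(round(cwnd, 1))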

21
(No Transcript)
22
Improving Data Server Service
  • The Objectivity AMS had a nice set of features.
    We've been looking at dCache to replace some of
    that functionality
  • dCache is a disk caching system developed at
    DESY as a front end for a Mass Storage System
    (MSS), with significant developer support at
    FNAL
  • We are using it as a way to utilize disk space
    on the worker nodes and to efficiently supply
    data to intensive applications like simulation
    with pile-up
  • Applications access the data in dCache space
    through a POSIX-compliant interface. From the
    user's perspective the dCache directory (/pnfs)
    looks like any other cross-mounted file system
    (see the example below)
  • Since dCache was designed as a front end to an
    MSS, files cannot be appended once closed
  • Very promising set of features for load
    balancing and error recovery
  • dCache can replicate data between servers if the
    load is too high
  • If a server fails, dCache can create a new pool
    and the application can wait until the data is
    available
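A minimal sketch of what the POSIX-style access described above looks like from a user job. The /pnfs path below is hypothetical, and the retry loop simply illustrates the "wait until the data is available" behaviour mentioned on the slide.

# Minimal sketch of POSIX-style access to dCache-managed data (hypothetical path).
import time

PILEUP_FILE = "/pnfs/cms/minbias/sample_0001.root"    # hypothetical /pnfs location

def read_first_bytes(path, n=1024, retries=5, wait_s=30):
    """Open a file under /pnfs like any other file; retry while dCache
    restages or re-replicates the data (as described on the slide)."""
    for attempt in range(retries):
        try:
            with open(path, "rb") as f:               # ordinary POSIX open/read
                return f.read(n)
        except OSError:
            time.sleep(wait_s)                        # wait for the data to become available
    raise RuntimeError(f"{path} not available after {retries} attempts")

if __name__ == "__main__":
    header = read_first_bytes(PILEUP_FILE)
    print(f"read {len(header)} bytes")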

23
Data Intensive Applications
  • Simulation of the CMS detector is difficult
  • There are 17 interactions per crossing on average
  • There are 25 ns between crossings
  • The previous 5 crossings and the following 3
    influence the detector response
  • Each simulated signal event requires 170 minimum
    bias events
  • Simulating new minimum bias events for each
    signal event would take about 90 minutes
  • A large sample is created and recycled
  • The sample is sufficiently large that it doesn't
    usually fit on local disk
  • It is about 70 MB per event
  • These events are randomly sampled, so it is
    taxing on the minimum bias servers and the
    network (see the back-of-envelope estimate below)
  • We were able to saturate 100 Mbit network links
    in this application using systems that were
    three times slower than what we are proposing to
    procure
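A back-of-envelope reading of these numbers shows how quickly a modest farm saturates a 100 Mbit link. The interpretation of "70 MB per event" as the minimum bias data read per signal event, the per-event processing time, and the farm size are our assumptions, not figures from the slide.

# Rough estimate of the network load from pile-up simulation.
MB_PER_SIGNAL_EVENT = 70       # from the slide, read as data pulled in per signal event (assumption)
SECONDS_PER_EVENT = 300        # assumed time to simulate one signal event on one node
NODES = 50                     # assumed farm size

rate_mbit_s = NODES * MB_PER_SIGNAL_EVENT * 8 / SECONDS_PER_EVENT
print(f"aggregate demand ~ {rate_mbit_s:.0f} Mbit/s from the minimum bias servers")
# With these assumptions, ~50 nodes already demand ~93 Mbit/s, so a single
# 100 Mbit link to the minimum bias servers is easily saturated, as observed.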

24
(No Transcript)
25
Data Rate Into the Application
  • Performance is fairly flat across the nodes and
    is good
  • The performance in the pile-up application is
    sufficient that analysis applications should be
    well served

26
Load Balancing
27
Performance with Load Balancing
28
Improving our automation
  • At the time of the last large official CMS event
    production (Spring 2002) we used 5 large centers
    in the US, with 5 production coordinators
  • Production is not enormously difficult, but it
    does require diligence at each site to make sure
    the system is running efficiently
  • In the Fall of 2002, the first grid enabled
    production system was deployed in the US.
  • The Distributed Production Environment (DPE)
    consists of
  • VDT for grid middleware
  • MC_Runjob for job specification
  • MOP, a custom-developed application which
    submits jobs to Condor-G
  • 1.5 million events were produced, with jobs
    submitted from a single site
  • Looking this year to harden the environment and
    improve the efficiency.

29
Introduction to DPE
  • Rolling Prototypes deployed across three test
    environments
  • DGT - Development Grid Testbed
  • R&D efforts, evaluation of new components,
    shake-out testing
  • IGT - Integration Grid Testbed
  • Scalability and stability testing and development
  • PG - Production Grid
  • Stable production running
  • Working to carefully version and release DPE
    packages
  • Validating and testing versions on DGT before
    migrating to IGT and PG

30
Version 1.0
  • In Version 1.0
  • Using a few hundred CPUs at 5 sites
  • Based on VDT 1.1.4 (Condor 6.4.3, Globus 2.0) for
    primary middleware
  • MOP and Impala for CMS production tools
  • Locally installed DAR distributions for CMS
    application software
  • What we learned:
  • Found scaling issues with Condor at 200 CPUs
  • Found scaling issues in the client architectures
  • Issues with writing out too much information
    into common areas
  • Found scaling issues with the way data is
    returned to the submitting site
  • Currently globus-url-copy transfers are
    submitted at the end of each job
  • This puts a significant load on the head node

31
Recent Efforts
  • Increasing number of sites in DGT by adding three
    sites
  • MIT, Iowa and Rice
  • Some of these will be added to IGT
  • Approached by Taiwan to join IGT
  • Increases the scale at which we test the system

32
DPE 1.5
  • Created and released DPE version 1.5 on May 23
  • VDT 1.1.8 based
  • Changes the Globus version to 2.2.4
  • We support multiple queuing systems on DPE
    sites. FBSNG, used at FNAL, is not supported by
    the Globus gatekeeper; when the gatekeeper
    scripting changed from shell to Perl, this
    support had to be rewritten
  • We believe Condor scaling issues have been solved
    for this release, so we expect to be able to
    utilize more CPUs efficiently
  • Current release of MC_RunJob for job
    specification
  • All software for client sites has been packaged
    with Pacman
  • VDT has been Pacman based since the beginning
  • Also added CMS application software
  • Reduces installation of DPE client-site software
    to one command issued at the head node
  • pacman -get
    http://computing.fnal.gov/cms/software/DPE-download:DPE-client
  • The local queue is assumed to be Condor; Condor
    must be started on the nodes

33
Current Activities
  • Installing new DPE version on IGT
  • Verifying that the new installation method works
    on all DGT sites; ran 400 small-scale test jobs
  • Moving the master site software into Pacman.
  • MOP and MCRunJob
  • Should allow rapid installation of additional MOP
    masters
  • Will allow additional groups to easily evaluate
    this effort
  • We should be ready for large scale CMSIM
    production in a week
  • Environment should be well shaken out by the time
    OSCAR production starts
  • Prepare and deploy a PG version of DPE 1.x for
    PCP
  • On schedule to avoid local (non-grid) production
    in the US

34
DPE Development Activities
  • Working on a Configuration Monitoring tool based
    on MDS
  • Currently the location where the DPE software is
    installed and the space available for output are
    passed between client and master by e-mail
  • This is prone to error and doesn't give the
    client administrator sufficient control
  • Current methods of data management are
    insufficient for a large scale distributed
    production system
  • Output is written using globus-url-copy from
    headnode
  • This prevents nodes from needing external network
    access, but stresses headnode
  • At a minimum we need the ability to queue
    transfers (see the sketch below)
  • Currently transfers commence as soon as the jobs
    are finished
  • In the near future a real data management system
    is needed
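A minimal sketch of the kind of transfer queuing described above: instead of launching every globus-url-copy as soon as a job finishes, completed outputs are queued and only a few transfers run on the head node at a time. The hosts, paths, and concurrency limit are hypothetical.

# Minimal sketch of queued output transfers from the head node
# (hypothetical hosts and paths; limits are illustrative).
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_TRANSFERS = 3          # throttle the load on the head node

def transfer(local_path, remote_url):
    """Copy one finished output file with globus-url-copy."""
    cmd = ["globus-url-copy", f"file://{local_path}", remote_url]
    return subprocess.run(cmd, check=False).returncode

if __name__ == "__main__":
    # Outputs that finished jobs have queued for transfer (hypothetical).
    pending = [
        ("/data/out/run_0001.root", "gsiftp://tier1.example.org/cms/pcp/run_0001.root"),
        ("/data/out/run_0002.root", "gsiftp://tier1.example.org/cms/pcp/run_0002.root"),
        ("/data/out/run_0003.root", "gsiftp://tier1.example.org/cms/pcp/run_0003.root"),
    ]
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TRANSFERS) as pool:
        results = list(pool.map(lambda job: transfer(*job), pending))
    print("return codes:", results)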

35
Data Management Systems
  • As we increase the amount of data generated by
    our automated production system, and as we
    prepare for analysis applications, we need to
    improve the data management tools deployed.
  • We've taken a two-pronged approach
  • CMS has adopted the Storage Resource Broker
    (SRB), developed in part by PPDG, to handle our
    data transfers during Pre-Challenge Production
  • SRB is a working solution, which is well
    supported and has a global catalogue
  • The SRB architecture currently has some
    limitations
  • US-CMS and LCG have started to develop the
    Storage Element (SE) which will be used to
    provide data management services
  • Based on the Storage Resource Manager (SRM) and
    Replica Location services (RLS)

36
SRB Architecture
37
SRB Status
  • SRB tests started with a demonstration that a
    sufficient amount of data for PCP could be
    transferred from US sites to CERN using SRB
  • One server at FNAL and one server at UCSD
    transferred data to a server at CERN
  • There was no other complete, tested solution for
    this problem, so CMS decided to move ahead with a
    large scale deployment.
  • Currently there are servers in
  • 2 in Russia
  • 3 in Wisconsin
  • 1 at CERN
  • 1 in Spain
  • 1 in Ukraine
  • 1 in San Diego
  • 1 in Germany
  • 2 at FNAL (one of which writes directly to
    dCache)
  • plus 1 client installation in Pakistan
  • SRB has some limitations
  • The current implementation with a single MCAT
    requires good networking and is a single point
    of failure, but it is working for PCP

38
Storage Element Development
  • While compute resource scheduling technology is
    fairly advanced
  • Batch systems are grid-enabled; Condor-G
    allocates processing requests reasonably
    efficiently over distributed resources
  • However, mechanisms for Data Access and Storage
    Resource Management in the Grid environment are
    lacking. Issues include
  • Shared storage resource allocation and scheduling
  • Staging management - files are typically
    archived on a mass storage system (MSS)
  • Wide area networks - minimizing transfers
  • File replication and caching
  • A working group was mandated by the GDB to
    understand the Grid File Access requirements and
    present a proposal to enable applications to
    perform file access using the LCG infrastructure

39
(No Transcript)
40
(No Transcript)
41
Storage Virtualization with SRM
  • A collaboration of Fermilab, Jefferson Lab, LBNL
  • Participants also include CERN, EDG, and others
  • Supported by PPDG
  • SRM functionality (a conceptual sketch follows
    this list)
  • Manage space
  • Negotiate and assign space to users
  • Manage lifetime of spaces
  • Manage files on behalf of a user
  • Pin files in storage until they are released
  • Manage lifetime of files
  • Manage file sharing
  • Policies on what should reside on a storage
    resource at any one time
  • Get files from remote locations when necessary
  • Manage multi-file requests
  • A brokering function: queue file requests and
    pre-stage when possible
  • Provide grid access to/from mass storage systems
  • Enstore (FNAL), Castor (CERN), ATLAS (RAL),
    JASMine (JLAB),
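The sketch below is purely conceptual: it restates the functionality listed above as a small Python interface to make the roles (space reservation, pinning, multi-file brokering) concrete. It is not the real SRM protocol or API, and all names and numbers are hypothetical.

# Conceptual illustration of the SRM roles listed above; NOT the real SRM API.
from dataclasses import dataclass, field

@dataclass
class SpaceReservation:
    owner: str
    size_gb: int
    lifetime_s: int                   # intended lifetime (not enforced in this toy)

@dataclass
class StorageResourceManager:
    """Toy stand-in for an SRM sitting in front of a mass storage system."""
    capacity_gb: int
    reservations: list = field(default_factory=list)
    pinned: set = field(default_factory=set)
    request_queue: list = field(default_factory=list)

    def reserve_space(self, owner, size_gb, lifetime_s):
        used = sum(r.size_gb for r in self.reservations)
        if used + size_gb > self.capacity_gb:
            raise RuntimeError("no space available")
        reservation = SpaceReservation(owner, size_gb, lifetime_s)
        self.reservations.append(reservation)
        return reservation

    def pin(self, filename):
        self.pinned.add(filename)      # keep the file staged until released

    def release(self, filename):
        self.pinned.discard(filename)  # the space may now be reused

    def request_files(self, filenames):
        """Brokering function: queue a multi-file request for pre-staging."""
        self.request_queue.extend(filenames)

srm = StorageResourceManager(capacity_gb=1000)
srm.reserve_space("cms-production", size_gb=200, lifetime_s=86400)
srm.request_files(["minbias_001.root", "minbias_002.root"])
srm.pin("minbias_001.root")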

42
Advantages of SRM
  • Provides uniform Grid access to heterogeneous
    Mass Storage Systems
  • Synchronization between storage resources
  • Pinning files, releasing files
  • Allocating space dynamically on an as-needed
    basis
  • Insulate clients from storage and network system
    failures
  • Transient MSS failure
  • Network failures
  • Interruption of large file transfers
  • Facilitate file sharing
  • Eliminate unnecessary file transfers
  • Support streaming model
  • Space allocation policies are applied by the
    SRMs; no reservations needed
  • Use explicit release by client for reuse of space
  • Control number of concurrent file transfers
    (queuing and traffic shaping)
  • From/to MSS: avoid flooding the head/gateway
    node and the MSS, and avoid thrashing
  • From/to network: avoid flooding and packet loss

43
Status of SE Development Project
  • We have identified manpower in the US to start
    the evaluation and development of the Storage
    Element
  • SRM is relatively stable and understood
  • The local area data serving tools (dCache, rfio,
    ROOTD, nfs) are stable and well debugged
  • The Replica Location Service (RLS) and the
    Replica Location Index (RLI), and the
    functionality that provides all the data
    cataloguing services, are still prototypes
  • Several implementations exist, not all of which
    are compatible
  • The interface of the catalogue services to the
    applications is still in the development phase
  • Still trying to understand how the SE
    communicates with POOL and the applications

44
New Project for Distributed Production
  • As the DPE work shifts to stable running toward
    the latter portion of the summer, we expect to
    migrate the functionality to GRID3
  • GRID3 is a combined US-ATLAS, US-CMS, PPDG,
    GriPhyN, iVDGL effort to form a persistent and
    large scale, interoperable grid of computing
    resources.
  • We are currently in the process of determining
    the middleware required by US-CMS and US-ATLAS
  • Try to keep as close as possible to the existing
    DPE, but make it compatible with US-ATLAS
    installations
  • This will increase the scale at which we validate
    our environment and increase the resources
    available to us.
  • We expect a minimal set of services to be
    installed at the beginning of August. The
    complexity and functionality will increase
    through the fall
  • We are in the process of defining the metrics
    for success: complexity achieved, data
    transferred, concurrent users, and jobs running

45
Summary of DC04 Preparations
  • Software preparations are nearly finished
  • We are behind on POOL preparations, but this has
    not slowed down PCP preparations
  • CMS is further behind on OSCAR validation
  • Software support tools are in good shape
  • User Facility preparations are in progress
  • We have a reasonable procurement to perform
  • Service improvements are proceeding well, but
    more work is needed.
  • DPE is proceeding well and we should be able to
    avoid local production in the US.
  • We look forward to the additional resources and
    scale expected from the GRID3 effort

46
Analysis Preparation
  • After DC04 is completed, CMS will enter a year of
    intense analysis activity preparing for the
    Physics TDR. US-CMS needs to increase the
    analysis capability of the Tier1 center and the
    number of users working in the US.
  • A lot of the preparations overlap with DC04
    preparations
  • Data serving
  • Data management
  • Software distributions
  • Some are extensions of DC04 preparations
  • We need some simple VO management for DC04, but
    the multi-user environment of analysis requires
    more services
  • Extensions of the production tools to allow
    custom user simulation and distributed analysis
    applications
  • Some are new efforts
  • Load balancing for interactive users and the user
    analysis environment

47
VO Management
  • The first virtual organization infrastructure
    will be deployed for PCP, but there are only a
    few production users and the application is
    predictable and organized
  • We don't worry that production users will do
    something malicious or foolish
  • The analysis environment is much more
    complicated.
  • Many more users with diverse applications,
    abilities, and access patterns
  • The VO Project is working with US-ATLAS to
    develop the infrastructure for authenticating
    and authorizing users
  • First prototypes will concentrate on
    authentication
  • Need to satisfy experiment wide and local site
    policies
  • Authorization at the level of individual
    resources is necessary, and it quickly couples
    to auditing and usage policies
  • This needs additional work

48
The Analysis User Environment
  • US-CMS is trying to increase the number of people
    working at the Tier1 center
  • Improving the software and developer environment,
    getting a better match to CERN
  • Hans Wenzel has written a nice document
    describing the analysis farm and its use
  • Using Fermilab batch system to balance
    interactive log-ins
  • Working with the networking people at FNAL to
    evaluate the solution CERN uses for this
  • Deploying dFarm and dCache to provide data space
    to analysis users
  • Working on procuring additional analysis
    resources
  • More analysis CPUs
  • More data servers for analysis
  • A high-performance replacement for the current
    analysis home directories in AFS

49
Distributed and Batch Analysis
  • We are working to provide functionality for users
    to produce small custom simulation productions
    and run analysis jobs in a DPE-like environment.
  • This is an extension of tools like MC_RunJob
  • Also an extension of the current software
    packaging we use in DPE
  • Should allow physics groups to investigate new
    ideas and validate applications without involving
    the full production team
  • Should also allow physics users more freedom
  • The grid-enabled production through the fall
    should free human resources to work on analysis
    environment development, so that we are ready
    for the Physics TDR analysis
  • The analysis techniques used in the Physics TDR
    are intended to be as close as possible to those
    used at the start of the experiment. US-CMS
    cannot afford to have the analysis centralized at
    CERN.

50
LCG Roll-out
  • In May of 2003 the LCG rolled out the first
    prototype of their production grid (LCG-0)
  • It was based on an older version of the VDT and
    the EDG software
  • It was mainly a packaging and installation test
  • This was installed at FNAL and 5 other Tier1
    centers
  • The installation was successful, but the
    environment was fairly rigid: it only worked
    with specific queuing systems, the components
    were tightly coupled to each other, and the
    components had to be installed within the LCG
    environment
  • LCG-1 is expected shortly
  • Some functionality has been pushed back because
    it is not available
  • The VDT and many of the EDG components are
    up-to-date
  • The packaging is improved, but the components are
    still tightly coupled

51
LCG Roll-out
  • US-CMS, in our DPE development, services
    development, and GRID3 integration, is pushing
    for interoperable services
  • We are attempting to stress test the newest
    versions of the middleware
  • Develop and push for components that can be
    installed independently
  • Develop well defined and reasonably lightweight
    interfaces that can be installed on top of large
    computing centers without making too many demands
    on the centers themselves
  • This is the only way we can see to leverage
    resources we don't have complete control over,
    like the TeraGrid
  • We are aiming to be compatible with the LCG but
    not necessarily adopt all the LCG infrastructure

52
Outlook
  • We have been making reasonable progress preparing
    for DC04 and the Physics TDR but there is a lot
    of work left to do
  • Software preparations coming along well. We
    should be ready to commence all steps of
    Pre-Challenge Production by August
  • Delivery of 50 million events by Christmas is
    tight
  • User Facility preparations are proceeding
  • A lot of our activity is still ramping up
  • SE Development
  • Network RD
  • Grid3
  • Many thanks for the reasonable personnel and
    equipment profiles. They are making our current
    efforts possible.