1
The GIOD Project (Globally Interconnected Object Databases) For High Energy Physics
Harvey Newman, Julian Bunn, Koen Holtman and Richard Wilkinson
A Joint Project between Caltech (HEP and CACR), CERN and Hewlett Packard
http://pcbunn.cacr.caltech.edu/
  • CHEP2000, Padova, Italy

2
The GIOD Project - Overview
  • GIOD Project began in 1997 as a joint effort of
    Caltech and CERN, with funding from Hewlett
    Packard for two years
  • with collaboration from FNAL and SDSC
  • Leveraging existing facilities at Caltech's
    Center for Advanced Computing Research (CACR)
  • Exemplar SPP2000, HPSS system, high speed WAN,
    CACR expertise
  • Build a prototype LHC data processing and
    analysis Center using
  • Object Oriented software, tools and ODBMS
  • Large scale data storage equipment and software
  • High bandwidth LAN (campus) and WAN (regional,
    national, transoceanic) connections
  • Measure, evaluate and tune the components of the
    center for LHC data analysis and physics
  • Confirm the viability of the LHC Computing Models

3
Components of the GIOD Infrastructure
  • Supercomputer facilities at CACR
  • Large pool of fully simulated multi-jet events in
    CMS
  • Experienced large-scale systems engineers at CACR
  • Connections at T3->OC3 in the Local and Wide
    Area Networks; fiberoptic links between Caltech
    HEP and CACR
  • Strong collaborative ties with CMS, RD45,
    Fermilab and the San Diego Supercomputer
    Center; with the CERN, CALREN-2 and Internet2
    network teams

4
Generation of CMS multi-jet events
[Figure: a simple Tag class]
  • Made possible by 1998, 1999 (NSF-sponsored) NPACI
    Exemplar allocations
  • Produced 1,000,000 fully-simulated multi-jet QCD
    events since May '98, selected from 2 × 10^9
    pre-selected generated events
  • Directly study Higgs → γγ backgrounds for the
    first time
  • Computing power of the HP-Exemplar SPP 2000
    (0.2 TIPs) made this attainable
  • Events used to populate a GIOD Object Database
    system
  • Tag database implemented and kept separately;
    can be quickly replicated to client machines
    (a minimal Tag class sketch follows this list)
  • In 2000, a proposal to NPACI requesting 25% of
    the Exemplar was granted
  • Targeted at event simulation for ORCA (CMS)
  • Replicas of this database were installed at FNAL
    and Padua/INFN (Italy)
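The slide's "simple Tag class" is only named, not shown. As an illustration of the idea (the field names here are hypothetical, not the GIOD schema; in the real system the class would be persistence-capable, e.g. extending Objectivity's ooObj base class), a plain-Java sketch:

    // Hypothetical sketch of a tag: a small summary object, kept in its own
    // database so it can be replicated cheaply and scanned with simple cuts
    // without touching the full event store. Field names are made up.
    public class Tag {
        int runNumber;
        int eventNumber;
        float sumEt;        // example summary quantities; not the real schema
        float missingEt;
        int nJets;
        long eventRef;      // stand-in for a reference to the full event object

        boolean passes(float minSumEt, int minJets) {
            // A "simple cut" touches only this small object.
            return sumEt > minSumEt && nJets >= minJets;
        }
    }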

5
Scalability tests using the Exemplar
  • Caltech Exemplar used as a relatively convenient
    testbed for multiple client tests with
    Objectivity
  • Two main thrusts
  • Using simple fixed object data
  • Using simulated LHC events
  • Results gave support to the viability of the
    ODBMS system for LHC data
  • CMS 100 MB/sec milestone met (170 MB/sec achieved)

  • > 170 MB/sec writing LHC raw event data to the
    database
  • Up to 240 clients reading simple objects from the
    database
6
Java 3D Applet to view GIOD events
  • Attaches to the GIOD database and allows scanning
    of all events in the database at multiple levels
    of detail (a minimal Java 3D sketch follows)
  • Demonstrated at the Internet2 meetings in 1998
    and 1999, and at SuperComputing98 in Florida at
    the iGrid, NPACI and CACR stands

[Screenshot: Java2 GUI showing ECAL crystals, HCAL towers, tracker geometry and hitmap, reconstructed tracks and jets, and a run/event selection widget]
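The applet itself is not reproduced here. As a sketch of the Java 3D idiom such a viewer builds on (standard Java 3D API classes; the geometry and coordinates are stand-ins, not GIOD code):

    import javax.media.j3d.BranchGroup;
    import javax.media.j3d.Transform3D;
    import javax.media.j3d.TransformGroup;
    import javax.vecmath.Vector3d;
    import com.sun.j3d.utils.geometry.ColorCube;
    import com.sun.j3d.utils.universe.SimpleUniverse;

    // Stand-in event display: one "hit" drawn as a small cube at a
    // detector position. A real viewer would build crystals, towers and
    // tracks the same way, scaling and coloring by energy.
    public class EventViewSketch {
        public static void main(String[] args) {
            SimpleUniverse universe = new SimpleUniverse();
            BranchGroup scene = new BranchGroup();

            // Place a cube where a calorimeter hit would be (made-up position).
            Transform3D where = new Transform3D();
            where.setTranslation(new Vector3d(0.2, 0.1, -0.3));
            TransformGroup tg = new TransformGroup(where);
            tg.addChild(new ColorCube(0.05));
            scene.addChild(tg);

            universe.getViewingPlatform().setNominalViewingTransform();
            universe.addBranchGraph(scene);
        }
    }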
7
Other ODBMS tests
  • Tests with Versant (fallback ODBMS)
  • DRO WAN tests with CERN
  • Production on CERN's PCSF and file movement to
    Caltech
  • Objectivity/DB: creation of a 32,000-database
    federation
8
Tests with Objy/Java binding and JAS
  • Objy DIM and analysis using Java Analysis Studio
  • Java2D Tracker viewer
  • Java Track Fitter
9
WAN tests: Caltech ↔ SDSC, FNAL
  • Client tests between SDSC/CACR, CACR/FNAL and
    CACR/HEP
  • ftp, LHC event reconstruction, event analysis,
    event scanning
  • Investigated network throughput dependence on
  • TCP window size, MSS, round trip time (RTT), etc.
    (see the formulas after this list)
  • payload type (ftp, Objy, Web, telnet etc.)
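For orientation (not from the slides), the standard relations behind these dependencies; the 20 ms RTT in the worked example is an illustrative assumption:

    % A single TCP stream is limited by its window over the round trip:
    T_{\max} \le \frac{W}{\mathrm{RTT}}
    % e.g. a default 64 kB window and a 20 ms RTT give about 3.2 MB/sec.
    % With packet loss at rate p, the Mathis et al. estimate applies:
    T \approx \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \frac{C}{\sqrt{p}}

The window-limited bound is consistent with the ~3 MB/sec single-stream WAN figure on the next slide, and motivates both the buffer tuning and the parallel streams described there.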

[Plots: simple ftp traffic, flattened by staggering client startups; Objectivity schema transfer in 8 kB DB pages]
10
WAN tests: Caltech ↔ SDSC, FNAL
  • Using out-of-the-box single-stream ftp, achieved
  • 7 MB/sec over LAN (ATM @ OC3)
  • 3 MB/sec over WAN @ OC3
  • Expect to ramp up capability by use of
  • Tuned ftp (buffer, packet and window sizes)
  • Jumbo frames
  • New IP implementations or other protocols
  • Predict 1 GB/sec in the WAN by LHC startup (2005)
    using parallel streams
  • Measurements to be used as a basis for model
    parameters in further MONARC simulations

11
Using the Globus Tools
  • Tests with gsiftp, a modified ftp server/client
    that allows control of the TCP buffer size
  • Transfers of Objy database files from the
    Exemplar to
  • Itself
  • An O2K at Argonne (via CalREN2 and Abilene)
  • A Linux machine at INFN (via US-CERN
    Transatlantic link)
  • Target /dev/null in multiple streams (1 to 16
    parallel gsiftp sessions)
  • Aggregate throughput as a function of number of
    streams and send/receive buffer sizes (a toy
    harness is sketched below)
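As a toy illustration of the measurement: plain Java sockets standing in for gsiftp, with the stream count, buffer size and duration as arbitrary choices. A loop-back run measures the host stack, not the WAN.

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.atomic.AtomicLong;

    // Toy aggregate-throughput harness: N parallel TCP streams into a
    // local discard server.
    public class ParallelStreamTest {
        static final int PORT = 9000, STREAMS = 8, BUF = 256 * 1024;
        static final long DURATION_MS = 5000;
        static final AtomicLong totalBytes = new AtomicLong();

        public static void main(String[] args) throws Exception {
            Thread server = new Thread(ParallelStreamTest::discardServer);
            server.setDaemon(true);
            server.start();
            Thread.sleep(200);                    // let the server come up

            long start = System.currentTimeMillis();
            Thread[] senders = new Thread[STREAMS];
            for (int i = 0; i < STREAMS; i++) {
                senders[i] = new Thread(() -> {
                    try (Socket s = new Socket("localhost", PORT)) {
                        s.setSendBufferSize(BUF); // the TCP buffer knob gsiftp exposes
                        OutputStream out = s.getOutputStream();
                        byte[] payload = new byte[BUF];
                        while (System.currentTimeMillis() - start < DURATION_MS) {
                            out.write(payload);
                            totalBytes.addAndGet(payload.length);
                        }
                    } catch (Exception ignored) { }
                });
                senders[i].start();
            }
            for (Thread t : senders) t.join();
            double secs = (System.currentTimeMillis() - start) / 1000.0;
            System.out.printf("%d streams: %.1f MB/sec aggregate%n",
                              STREAMS, totalBytes.get() / 1e6 / secs);
        }

        // Accept connections, read and throw away the data.
        static void discardServer() {
            try (ServerSocket ss = new ServerSocket(PORT)) {
                while (true) {
                    Socket s = ss.accept();
                    Thread h = new Thread(() -> {
                        try (InputStream in = s.getInputStream()) {
                            byte[] buf = new byte[BUF];
                            while (in.read(buf) >= 0) { /* discard */ }
                        } catch (Exception ignored) { }
                    });
                    h.setDaemon(true);
                    h.start();
                }
            } catch (Exception ignored) { }
        }
    }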

  • 25 MB/sec on HiPPI loop-back
  • 4 MB/sec to Argonne by tuning the TCP window
    size, saturating the available bandwidth to
    Argonne
12
GIOD - Summary
  • GIOD investigated
  • Usability, scalability, portability of Object
    Oriented LHC codes
  • In a hierarchy of large servers and medium/small
    client machines
  • With fast LAN and WAN connections
  • Using realistic raw and reconstructed LHC event
    data
  • GIOD has
  • Constructed a large set of fully simulated events
    and used these to create a large OO database
  • Learned how to create large database federations
  • Developed prototype reconstruction and analysis
    codes that work with persistent objects
  • Deployed facilities and database federations as
    testbeds for Computing Model studies

13
Associated Projects
  • MONARC - Models Of Networked Analysis at Regional
    Centers (CERN)
  • Caltech, CERN, FNAL, Heidelberg, INFN, KEK,
    Marseilles, Munich, Orsay, Oxford, Tufts
  • Specify candidate models' performance:
    throughputs, latencies
  • Find feasible models for LHC matched to network
    capacity and data handling
  • Develop Baseline Models in the feasible
    category
  • PPDG - Particle Physics Data Grid (DoE Next
    Generation Internet)
  • Argonne Natl. Lab., Caltech, Lawrence Berkeley
    Lab., Stanford Linear Accelerator Center, Thomas
    Jefferson National Accelerator Facility,
    University of Wisconsin, Brookhaven Natl. Lab.,
    Fermi Natl. Lab., San Diego Supercomputer Center
  • Delivery of infrastructure for widely distributed
    analysis of particle
    physics data at multi-PetaByte scales by 100s to
    1000s of physicists
  • Acceleration of development of network and
    middleware infrastructure
    aimed at data-intensive collaborative science.
  • ALDAP - Accessing Large Data Archives in
    Astronomy and Particle Physics (NSF Knowledge
    Discovery Initiative)
  • Caltech, Johns Hopkins University, FNAL
  • Explore data structures, physical data storage
    hierarchies for archival of next generation
    astronomy and particle physics data
  • Develop spatial indexes, novel data
    organisations, distribution and delivery
    strategies.
  • Create prototype data query execution systems
    using autonomous agent workers

14
Future Directions: GIOD II
  • Review the advantages of ODBMS vs. (O)RDBMS for
    persistent LHC data in light of recent (e.g.
    Web-enabled) RDBMS developments, for HEP and
    other scientific fields
  • Fast traversal of complex class hierarchies?
  • Global (federation) schema and transparent
    access?
  • Impedance match between the database and the OO
    code?
  • What are the scalability and use issues
    associated with implementing a traditional RDBMS
    as a persistent object store for LHC data?
  • What benefits would the use of an RDBMS bring, if
    any?
  • Which RDBMS systems, if any, are capable of
    supporting, or projected to support, the size,
    distribution and access patterns of the LHC
    data?

15
GIOD II: Other New Investigations
  • What are the implications/benefits for the
    Globally-distributed LHC computing systems of
  • Having Web-like object caching and delivery
    mechanisms (distributed content delivery,
    distributed cache management)
  • The use of Autonomous Agent query systems
  • Organizing the data and resources in an N-tiered
    hierarchy
  • Choosing (de facto) standard Grid tools as
    middleware
  • How can data migration flexibility be built in?
  • Schema/data to XML conversion (Wisdom, Goblin)?
  • Data interchange using JDBC or ODBC (a minimal
    JDBC sketch follows this list)
  • Known format binary files for bulk data
    interchangefor simple and efficient transport
    across WANs
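As a minimal sketch of the JDBC route, the connection URL, table and column names below are invented for illustration; any JDBC driver (including the old JDBC-ODBC bridge) would slot in behind the same calls.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Read tag rows out of a relational store via JDBC.
    public class TagExport {
        public static void main(String[] args) throws Exception {
            try (Connection c = DriverManager.getConnection("jdbc:odbc:tagdb");
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT run, event, sumEt FROM tags")) {
                while (rs.next()) {
                    // Each row becomes one line of a known-format
                    // interchange file for bulk WAN transport.
                    System.out.printf("%d %d %f%n",
                            rs.getInt("run"), rs.getInt("event"),
                            rs.getFloat("sumEt"));
                }
            }
        }
    }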

16
GIOD II and ALDAP
  • Optimizing performance of Objectivity for
    LHC/SDSS data
  • Use of Self Organizing Maps (e.g. Kohonen) to
    recluster frequently accessed data into
    collections in contiguous storage (a minimal SOM
    sketch follows this list)
  • Use of Autonomous Agents to carry queries and
    data in WAN distributed database system
  • Identify known performance issues and get them
    fixed by the vendor
  • Example 1: 11,000 cycles (cf. 300 cycles) of
    overhead to open an object
  • Example 2: selection speeds with simple cuts on
    Tag objects
  • Make new performance comparisons between
    Objectivity and ER database (SQLServer)
  • on identical platforms,
  • with identical data,
  • with identical queries,
  • with all recommended tweaks,
  • with all recommended coding tricks
  • We have begun tests with SDSS sky objects, and
    with GIOD Tag objects
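A minimal sketch of the Kohonen idea as it might apply here (the features, sizes and schedules are illustrative assumptions, not GIOD code): train a small 1-D map on per-object access-pattern vectors; objects landing on the same unit become candidates for reclustering into the same contiguous storage.

    import java.util.Random;

    // Minimal 1-D Kohonen self-organizing map over toy "access pattern"
    // feature vectors, e.g. (access frequency, query depth, co-access score).
    public class KohonenSketch {
        public static void main(String[] args) {
            int units = 10, dim = 3, epochs = 200;
            Random rnd = new Random(42);
            double[][] w = new double[units][dim];          // unit weights
            for (double[] u : w)
                for (int d = 0; d < dim; d++) u[d] = rnd.nextDouble();

            double[][] data = new double[100][dim];         // toy training set
            for (double[] x : data)
                for (int d = 0; d < dim; d++) x[d] = rnd.nextDouble();

            for (int e = 0; e < epochs; e++) {
                double lr = 0.5 * (1.0 - (double) e / epochs);  // decaying rate
                int radius = Math.max(1, units / 2 - e * units / (2 * epochs));
                for (double[] x : data) {
                    int bmu = 0;                            // best matching unit
                    double best = Double.MAX_VALUE;
                    for (int u = 0; u < units; u++) {
                        double dist = 0;
                        for (int d = 0; d < dim; d++)
                            dist += (x[d] - w[u][d]) * (x[d] - w[u][d]);
                        if (dist < best) { best = dist; bmu = u; }
                    }
                    // Pull the BMU and its neighbours toward the sample.
                    for (int u = Math.max(0, bmu - radius);
                         u <= Math.min(units - 1, bmu + radius); u++)
                        for (int d = 0; d < dim; d++)
                            w[u][d] += lr * (x[d] - w[u][d]);
                }
            }
            // Objects mapping to the same unit would be written into the
            // same collection in contiguous storage.
        }
    }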

17
GIOD II and PPDG
  • Distributed Analysis of ORCA data
  • Using Grid middleware (notably gsiftp, SRB) to
    move database files across the WAN
  • Custom tools to select subset of database files
    required in local replica federations, and
    attach them once copied
  • Making compact data collections
  • Remote requests from clients for sets of DB files
  • Simple staging schemes that asynchronously make
    data available, give an ETA for delivery, and
    migrate cool files to tertiary storage
  • Marshalling of distributed resources to achieve
    production task goals
  • Complementary ORCA DB files in Caltech, FNAL and
    CERN replicas
  • Full pass analysis involves distributing task to
    all three sites
  • Move/compute cost decision (a toy cost model is
    sketched after this list)
  • Task and results carried by Autonomous Agents
    between sites (work in ALDAP)