Building a Large Location Table to - PowerPoint PPT Presentation

About This Presentation
Title:

Building a Large Location Table to

Description:

Heinz Stockinger. CERN/CMS. CHEP '2000. Feb 7-11, 2000. Replication models ... Can replicate every single object / set of objects. Transparent local / remote ... – PowerPoint PPT presentation

Number of Views:25
Avg rating:3.0/5.0
Slides: 12
Provided by: Moha150
Category:

less

Transcript and Presenter's Notes

Title: Building a Large Location Table to


1
  • Building a Large Location Table to
  • Find Replicas of Physics Objects
  • Koen Holtman
  • Heinz Stockinger
  • CERN/CMS
  • CHEP 2000
  • Feb 7-11, 2000

2
Replication models
  • Different replication models
  • File based
  • FTP run files, Objectivity DB files, ...
  • Import/attach them to local data store
  • (Objectivity DRO/FTO)
  • Linkable Objectivity AMS server tricks
  • Transparent local / remote file access
  • Page-level replication??
  • Object based (this talk)
  • Can replicate every single object / set of
    objects
  • Transparent local / remote object access
  • Requires large Object Location Table (OLT)
  • In Objectivity, go from logical OID to physical
    OID

3
The problem
  • gt1010 physics objects (event related objects),
    all potentially replicated
  • Lookup Find the best replica of the physics
    object with (logical) OID xyz
  • Automatic (grid-like)
  • Best replica is not always nearest
  • Lookup operation needs to be fast, scalable

4
Goal of this talk
  • Goal show that an OLT can be built
  • By showing one viable implementation
  • This implementation is a spin-off of the work on
    automatic reclustering
  • gt1010 scalability problem can be solved by
  • Exploiting specific properties of physics
    analysis
  • Limiting freedom of lookup operations
  • No Objectivity-style on demand navigation

5
Divide and conquer
  • Partition events into chunks, 1 subjob per chunk
  • Definition oflogical OID
  • (chunk ID,
  • event sequence number,
  • type/version ID)

6
Job and storage model
  • Job model job consists of
  • set of event IDs (chunk ID, event seq nr)
  • 1 or moretype/version IDs
  • physics code
  • code is run usingsubjobs, these get handed
    iterators
  • Storage model
  • unit of storage is collection
  • subset of horizontal bar above

7
  • Structure
  • Insert and delete collections...
  • Next slide 1 subjob, 1 type...

8
1 subjob, 1 type
  • Say subjob at CERN wants to read objects
    (1,3,1),(1,4,1),(1,7,1), ...
  • At start of subjob, global schedule is computed,
    saytry collection 1 first, thentry collection
    3.
  • Subjob iteration is in increasing
    sequencenumber, step throughindices alongside
    withiteration

9
Computing the schedule
  • Optimal schedule computed at start of subjob
  • Optimiser finds schedulewith minimal cost
  • Cost(read 3,4,7,,1,3) Ccoll(read 3,4,
    from 1) Ccoll(read 7, from 3) .
  • Value of Ccoll( ) depends on location of
    collection, network load, access method, etc.
  • Calculations use collection contents indices only
  • Finding optimal schedule is an NP-complete
    problem, exponential time complexity, but in
    practice....

10
Runtime of scheduler
  • Time to compute optimal schedule

1 sec
(micro sec)
  • Can also timeout, gives good schedule

11
Conclusions
  • Large object location tables can be built!
  • They are an important component in implementing
    the grid-like aims of the CMS CTP
  • Implementation shown is spinoff from work on
    reclustering
  • Scalability by using specific properties of
    physics analysis, and by limiting the freedom of
    lookup operations
  • No Objectivity-style on demand navigation, all
    objects needs to be requested from the start
  • Cost functions can model any access method
  • Optimisation is both generic and cheap
  • Long paper will appear in future
Write a Comment
User Comments (0)
About PowerShow.com