GLAST Charts - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: GLAST Charts

1
Peta-Cache: Electronics Discussion I
Presentation: Ryan Herbst, Mike Huffer, Leonid Saphoznikov
Gunther Haller
haller@slac.stanford.edu
(650) 926-4257
2
Content
  • Collection of information so that the baseline is common
  • A couple of slides on how it seems to be done presently
  • Part 1: Option 3 (Tuesday)
  • Alternative architecture with a skim builder and an event server
  • Part 2: Options 1 and 2 (Thursday)
  • Using Flash with minimum changes to the present architecture
  • Using Flash with more changes to the present architecture

3
Collection of Info (to be checked)
  • Total number of BaBar events: 400 million
  • Original dataflow events are ¼ to ½ Mbytes; reconstruction results in 16-kbyte raw events
  • From the raw events (16 kbytes) we generate:
  • Minis
  • Micros (2.5 kbytes) (MC: 5 kbytes)
  • Tags (512 bytes)
  • Raw (reconstructed) events: 400 million x 16 kbytes ≈ 6.4 Terabytes (see the back-of-envelope sketch below)
  • A fraction of them are needed most often
  • Estimate:
  • 200 skims total
  • Large skims use 10-20% of the total events
  • Smallest use
  • Question of how many of those are needed, and how frequently
  • Presently about 6-10 ms disk-access time
  • What is the event analysis time?
  • Very preliminary benchmark for analysis with little processing, including decompression:
  • 35 ms if the data is in memory
  • 70 ms if the data is on disk
  • What is that number as a function of the number of users?
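
As a rough cross-check of the sizes and rates quoted above, here is a minimal back-of-envelope sketch in Python; the event counts, sizes, skim fractions, and benchmark times are taken from this slide, and everything else is plain arithmetic.

    # Numbers from the slide above (values to be checked).
    N_EVENTS       = 400e6     # total BaBar events
    RAW_EVENT_SIZE = 16e3      # bytes per reconstructed raw event
    MICRO_SIZE     = 2.5e3     # bytes per micro
    TAG_SIZE       = 512       # bytes per tag

    print(f"raw events total: {N_EVENTS * RAW_EVENT_SIZE / 1e12:.1f} TB")   # ~6.4 TB
    print(f"micros total:     {N_EVENTS * MICRO_SIZE / 1e12:.1f} TB")
    print(f"tags total:       {N_EVENTS * TAG_SIZE / 1e9:.1f} GB")

    # A large skim touching 10-20% of all events:
    for frac in (0.10, 0.20):
        print(f"large skim ({frac:.0%}): {N_EVENTS * frac * RAW_EVENT_SIZE / 1e12:.2f} TB of raw data")

    # Per-client event rate implied by the preliminary analysis benchmark:
    for t_event, where in ((35e-3, "data in memory"), (70e-3, "data on disk")):
        rate = 1.0 / t_event   # events per second per client
        print(f"{where}: {rate:.0f} events/s, {rate * RAW_EVENT_SIZE / 1e6:.2f} MB/s per client")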

4
More info
  • Now each new type of skim involves linking new code into the BaBar (to zeroth order, single-image!) application, which entails a significant amount of time for creating a new build, testing, and worrying about the effect of its robustness on the rest of the operation, etc.
  • What does the Redirector do? (check) It effectively maps an event number requested by the user to a file somewhere in the storage system (the combination of all the disks plus the tape system). Basically, if the user's xrootd server does not have an event, it finds out which server has the file on its disks and asks that server to make a copy of it onto the user's server's own disks. If no server has the file, then the user's xrootd server goes to the tape (HPSS) system and fetches it (a sketch of this lookup logic follows below).
  • Xrootd server: ~$6k
  • Disk: ~$2/Gbyte (including cooling, etc.)
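
A minimal sketch of the lookup behaviour described above. The catalogue, class, and method names (Redirector, XrootdServer, stage_to, etc.) are hypothetical placeholders, not the real xrootd/redirector API.

    import shutil
    from pathlib import Path

    class Redirector:
        """Toy model of the redirector: maps an event number to the server/file that holds it."""

        def __init__(self, catalog):
            # catalog: dict of event number -> (server name, path of the file on that server)
            self.catalog = catalog

        def locate(self, event_number):
            return self.catalog.get(event_number)      # None if no disk server has it

    class XrootdServer:
        """Toy model of one xrootd server with its own local disks."""

        def __init__(self, name, local_dir, redirector, tape):
            self.name = name
            self.local_dir = Path(local_dir)
            self.redirector = redirector
            self.tape = tape                            # stand-in for the HPSS tape system

        def fetch(self, event_number):
            local = self.local_dir / f"event_{event_number}.raw"
            if local.exists():                          # already on our own disks
                return local
            hit = self.redirector.locate(event_number)
            if hit is not None:                         # another server has it: copy it over
                _server, remote_path = hit
                shutil.copy(remote_path, local)
            else:                                       # nobody has it: stage from tape (HPSS)
                self.tape.stage_to(event_number, local)
            self.redirector.catalog[event_number] = (self.name, str(local))
            return local

Every miss both populates the local disks and updates the catalogue, which is how the disks end up holding the most-requested events.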

5
Raw Events - Tags - Skims
[Diagram: raw data (12 kbytes) and micros (2.5 kbytes); per event there is tag data (512 bytes) containing interesting characteristics and a pointer to the raw data; a skim is a collection of tags, with pointers to the raw data or containing the raw data itself; collections can be all tags, all micros, etc.]
  • Tag data, and sometimes micro data, is used for skims
  • Run over the tag data with a defined algorithm and produce a skim (see the sketch after this list)
  • Pointer skim: a collection of tags with pointers to the raw data
  • Deep skim: also contains the raw data (e.g. for transfer to a remote institution)
  • About 200 different skims. The time to get a new skim can be about 9 months due to process (formalism to make sure skims are of interest to many)
  • One goal is to be able to produce new skims easily and quickly
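
A minimal sketch of the tag/skim structure described above. The field names are illustrative assumptions, not the real BaBar tag schema; the point is only to show a pointer skim versus a deep skim.

    from dataclasses import dataclass

    @dataclass
    class Tag:
        """~512-byte event summary: selection characteristics plus a pointer to the raw event."""
        event_number: int
        n_tracks: int            # illustrative characteristics, not the real tag schema
        total_energy: float
        raw_pointer: str         # e.g. "file:offset" locating the 16-kbyte raw event

    def build_pointer_skim(tags, selector):
        """Run a user-defined selection algorithm over all tags; keep the matching tags."""
        return [t for t in tags if selector(t)]

    def build_deep_skim(pointer_skim, read_raw):
        """A deep skim also carries the raw events themselves (e.g. for a remote institution)."""
        return [(t, read_raw(t.raw_pointer)) for t in pointer_skim]

    # Example selection a user might supply (hypothetical cut values):
    # skim = build_pointer_skim(all_tags, lambda t: t.n_tracks == 2 and t.total_energy > 9.0)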

6
Present Architecture (needs check)
  • Skims go through all tags (sometimes micros, but never raw events) on disk to make a collection of tags (which is a skim)
  • The client broadcasts to the xrootd servers the first event to be fetched in a specific skim
  • If no xrootd server responds (i.e. time-out), then go to tape and put the event into an xrootd server
  • So the disks are populated with events which have been requested
  • If another xrootd server has it, it is copied over
  • Still read 2-6 Tbytes/day from tape?
  • Does the xrootd server really know what it has on its disks?
  • Is the redirector involved for every event, to find where the event is from the event number the client is supplying?
  • What if a disk is full? How is new space recovered?
  • Seems an inefficient use of space, since there are many copies

[Diagram: clients (1, 2, or 4 cores each; up to 1,500 cores in 800 units?) connect through a 1-G Ethernet switch and a redirector to xrootd servers; each xrootd server has 4.8 TB of RAID-5 mass storage (3 TB usable) attached over Fibre Channel; 40 sets, to be extended to 80 sets; tape backs the disks]
7
Present Architecture (needs check), cont.
  • At the end, all tags and events which the client desires are on a single xrootd server
  • Why are all of those copied into one place? Because the client, or other clients, reuse them many times later?
  • What is the xrootd server really doing?
  • The xrootd server fetches an event from its disk and delivers it to the client
  • So select for a server those clients which use the same skims?
  • The client decompresses the event for analysis

[Same diagram as the previous slide: clients through a 1-G Ethernet switch and the redirector to xrootd servers with 4.8 TB RAID-5 mass storage (3 TB usable) over Fibre Channel; 40 sets, to be extended to 80 sets; tape backing]
8
Goals (really need requirements)
  • Reduce the time/overhead to make a new skim
  • Goal: each user can make their own skim quickly
  • Reduce the time to get data to the client
  • Presently the analysis time and the time to get the next event are not concurrent but sequential (see the sketch below)
  • To get the next event, the client queries all xrootd servers
  • If one has it, it is copied into its (for that analysis) xrootd server
  • If not, its xrootd server gets it from tape
  • The issue is the latency to get data from disk
  • Minimize duplicate storage
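
A minimal sketch of the sequential-versus-concurrent point above: prefetching the next event while the current one is analysed hides most of the fetch latency. The fetch and analysis functions, and the 10 ms fetch time, are assumed placeholders; the 35 ms analysis time comes from the earlier benchmark slide.

    import queue
    import threading
    import time

    FETCH_TIME    = 0.010    # assumed ~10 ms to get the next event from an xrootd server
    ANALYSIS_TIME = 0.035    # ~35 ms per event with data in memory (earlier slide)

    def fetch_event(n):
        time.sleep(FETCH_TIME)           # stand-in for the disk/network fetch
        return f"event-{n}"

    def analyse(event):
        time.sleep(ANALYSIS_TIME)        # stand-in for the client's analysis

    def run_sequential(n_events):
        for n in range(n_events):
            analyse(fetch_event(n))      # fetch, then analyse: the times add up

    def run_prefetched(n_events, depth=4):
        q = queue.Queue(maxsize=depth)
        def producer():
            for n in range(n_events):
                q.put(fetch_event(n))    # fetch ahead while the consumer analyses
            q.put(None)                  # sentinel: no more events
        threading.Thread(target=producer, daemon=True).start()
        while (event := q.get()) is not None:
            analyse(event)               # fetch latency is now hidden behind analysis

    for runner in (run_sequential, run_prefetched):
        t0 = time.time()
        runner(50)
        print(f"{runner.__name__}: {time.time() - t0:.2f} s for 50 events")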

9
Option with Skim Builder and Event Server (1)
  • The skim builder and the event servers have access to all disks
  • The client gives the skim builder the code to build a skim
  • The results are given to the client
  • The client broadcasts/multicasts to the event servers the request for the events in the list (name of the list and name of the client)
  • The event server which best fits the criteria is selected
  • Maybe it already has the list, maybe it is idle, maybe whichever holds most of the list is selected; that is flexible (or optionally a specific event server can be locked to a client (direct IO))
  • If an event is not in the cache, then after the initial latency of going to disk, get all the data
  • An event server may multicast its events so they are delivered to all clients which requested any events in the list
  • Clients don't have to get the events in the list in order
  • Any client can get events at 100 Mbytes/s, independent of the number of clients
  • The event server is a cache box (e.g. 16 Gbytes of memory); a sketch of such a cache follows after the diagram below
  • The pipe between the event servers and the storage is slower, but averaged out thanks to the cache
  • Many clients could consume events concurrently, e.g. 4,000 clients with 16-kbyte events, giving an average throughput of about 3.6 Gbytes/s

[Diagram: clients (1, 2, or 4 cores each; up to 1,500 cores in 800 units?) connect over Ethernet/PCI-E/etc. (optionally direct IO) to event servers and skim builder(s), which connect through the file system to disk storage and tape]
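
A minimal sketch of an event-server cache as described above, assuming a simple LRU policy bounded by the 16-Gbyte memory and a disk-read callback; the names and the eviction policy are illustrative assumptions, not a real design.

    from collections import OrderedDict

    EVENT_SIZE   = 16 * 1024       # bytes per raw event (from the earlier slide)
    CACHE_MEMORY = 16 * 1024**3    # e.g. a 16-Gbyte cache box -> ~1 million cached events

    class EventServerCache:
        """LRU cache of raw events held in an event server's memory (illustrative sketch)."""

        def __init__(self, read_from_disk, capacity_bytes=CACHE_MEMORY):
            self.read_from_disk = read_from_disk           # callback: event number -> bytes
            self.capacity = capacity_bytes // EVENT_SIZE   # number of events that fit in memory
            self.cache = OrderedDict()                     # event number -> event bytes

        def get(self, event_number):
            if event_number in self.cache:                 # cache hit: no disk latency
                self.cache.move_to_end(event_number)
                return self.cache[event_number]
            data = self.read_from_disk(event_number)       # miss: pay the disk latency once
            self.cache[event_number] = data
            if len(self.cache) > self.capacity:            # evict the least-recently-used event
                self.cache.popitem(last=False)
            return data

        def serve_list(self, event_numbers):
            """Deliver a requested event list; clients need not receive the events in order."""
            for n in event_numbers:
                yield n, self.get(n)
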
10
Event Processing Center
[Block diagram: HPSS tape and disks on a file-system fabric; pizza boxes serving as skim builders and as in/out protocol converters; switches connecting to a sea-of-cores fabric of event-processing nodes]
11
Pizza box block diagram
[Block diagram: (in) PCI Express x16 via a PLX8532 bridge, x4 links via PLX8508 switches, a Xilinx XC4VFX40 FPGA, a PPC 405, 1 Gbyte of RLDRAM II, and (out) PCI Express via a second PLX8532]