A Grid Approach to Geographically Distributed Data Analysis for Virgo

1
A Grid Approach to Geographically Distributed
Data Analysis for Virgo
  • F. Barone, M. de Rosa, R. De Rosa, R. Esposito,
    P. Mastroserio, L. Milano, F. Taurino, G. Tortone
  • INFN Napoli, Università di Napoli Federico II,
    Università di Salerno
  • L. Brocco, S. Frasca, C. Palomba, F. Ricci
  • INFN Roma1, Università di Roma La Sapienza

GWADW 2002, Isola d'Elba (Italy), May 19-26, 2002
2
Outline
  • scientific goals and requirements
  • basic concepts of GRID
  • what the Grid offers
  • layout of VIRGO Virtual Organisation
  • application to gravitational waves data analysis
  • conclusions

3
Scientific goals and requirements
  • the coalescing binaries and periodic sources
    analyses need large computing power
  • 300 Gflops for coalescing binaries search
  • 1000 Gflops for periodic sources search
  • computational grids allow the use of computing
    resources available in different
    laboratories/institutions

4
GRID: a definition
  • GRID: an infrastructure to allow the sharing and
    coordinated use of resources within large,
    dynamic and multi-institutional communities

5
Basic resources of DataGrid Middleware
  • DataGrid is a European Community project (3
    years) to develop Grid Middleware and a testbed
    infrastructure on a European scale
  • need to execute a program: Computing Element (CE)
  • need to access data: Storage Element (SE)
  • need to move data: network

6
Computing Element (CE)
  • GRID resource that provides CPU cycles
  • Examples
  • clusters of PCs
  • supercomputers
  • ...

7
Storage Element (SE)
  • GRID resource that provides disk space to store
    files
  • Examples
  • simple disk pool
  • large Mass Storage System
  • ...
  • Data is accessible to all processes running on
    CEs via multiple protocols

8
Grid resource
  • A Grid resource provides a standard interface
    (protocol and API) that is common to that type of
    resource
  • all CEs talk the same protocol (CE protocol)
    independently of the underlying batch system
  • all SEs talk the same protocol (SE protocol)
    independently of the underlying Mass Storage
    System
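
The idea of a common interface per resource type can be sketched as follows. This is a minimal illustration, not the DataGrid API: the class names, the `submit` method and the job identifiers are all hypothetical.

```python
from abc import ABC, abstractmethod


class ComputingElement(ABC):
    """Hypothetical common CE interface (not the DataGrid API)."""

    @abstractmethod
    def submit(self, executable: str) -> str:
        """Queue a job and return a job identifier."""


class PBSCluster(ComputingElement):
    """A CE whose underlying batch system is PBS (simulated here)."""

    def __init__(self) -> None:
        self._count = 0

    def submit(self, executable: str) -> str:
        self._count += 1
        return f"pbs-{self._count}"


class OtherBatchCluster(ComputingElement):
    """A CE backed by some other batch system (simulated here)."""

    def __init__(self) -> None:
        self._count = 0

    def submit(self, executable: str) -> str:
        self._count += 1
        return f"other-{self._count}"


def run_everywhere(elements, executable):
    """The caller uses one interface, whatever the batch system is."""
    return [ce.submit(executable) for ce in elements]
```

The caller never branches on the batch system; only the concrete classes know about it.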

9
What the Grid offers
  • independence from execution location
  • the user doesn't need to know where a job will
    run (which CE)
  • independence from data location
  • the user doesn't need to know where the data is
    (which SE)
  • security
  • authentication, authorization

10
Independence from execution location
11
Workload Management System
  • Resource Broker (RB): a Resource Broker tries to
    find a good match between the job requirements
    and preferences and the available resources, in
    particular CEs
  • Job Submission Service (JSS): the Job Submission
    Service then guarantees reliable job submission
    and monitoring

12
  • Scheduling criteria
  • authorization information
  • data availability
  • job requirements
  • job preferences
  • accounting
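
A toy version of the matchmaking step, covering three of the criteria above (authorization, job requirements, data availability as a preference). This is a sketch under stated assumptions, not the Resource Broker's actual algorithm: the `CE` and `Job` fields and the ranking rule are hypothetical, and accounting is not modelled.

```python
from dataclasses import dataclass


@dataclass
class CE:
    name: str
    free_cpus: int
    authorized_vos: frozenset  # VOs allowed to run on this CE
    close_ses: frozenset       # SEs considered "close" to this CE


@dataclass
class Job:
    vo: str
    min_free_cpus: int  # job requirement
    input_se: str       # SE holding the input data


def match(job, ces):
    """Return the preferred CE for the job, or None if none qualifies."""
    # Hard constraints: authorization and job requirements.
    candidates = [
        ce for ce in ces
        if job.vo in ce.authorized_vos and ce.free_cpus >= job.min_free_cpus
    ]
    if not candidates:
        return None
    # Preferences (assumed): data locality first, then free CPUs.
    return max(
        candidates,
        key=lambda ce: (job.input_se in ce.close_ses, ce.free_cpus),
    )
```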

13
Monitoring/Information System
  • The Resource Broker needs some information
  • what are the available resources?
  • what is their status?
  • The Resource Broker queries the Monitoring
    Information System to locate producers (CE,
    SE,...) and then obtains data directly from the
    producers

14
(Figure: status updates are pushed to the MIS; data are obtained from the CE)
15
Logging and bookkeeping
  • The LB service is a database of events concerning
    jobs and the other services of the Workload
    Management System (RB and JSS)
  • provides status info for jobs
  • designed to be highly reliable and available

16
Independence from data location
17
Replica Catalogue (RC)
  • With the Replica Catalogue, the same file
    (master) can exist in multiple copies (replicas)
  • LFN (Logical File Name): name for a set of
    replicas; example: lfn://virgo.org/virgofile-1.dat
  • PFN (Physical File Name): location of a replica;
    example:
    pfn://virgo-se.na.infn.it/virgo/virgofile-1.dat
  • it is up to the RB to translate an LFN into a PFN
  • to locate the SE closest to a CE
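
The LFN-to-PFN translation can be pictured as a catalogue lookup followed by a choice of replica. The sketch below assumes URL-like names of the form lfn://... and pfn://...; the second replica, the CE names and the "close SE" table are hypothetical, and this is not the Replica Catalogue API.

```python
from urllib.parse import urlparse

# Catalogue: one logical name maps to one or more physical replicas.
REPLICAS = {
    "lfn://virgo.org/virgofile-1.dat": [
        "pfn://virgo-se.na.infn.it/virgo/virgofile-1.dat",
        "pfn://se.example.org/virgo/virgofile-1.dat",  # hypothetical replica
    ],
}

# Hypothetical table of which SE is close to which CE.
CLOSE_SE = {
    "ce-napoli": "virgo-se.na.infn.it",
    "ce-elsewhere": "se.example.org",
}


def resolve(lfn, ce):
    """Return the PFN on the SE close to `ce`, else the first replica."""
    replicas = REPLICAS[lfn]
    close = CLOSE_SE.get(ce)
    for pfn in replicas:
        if urlparse(pfn).hostname == close:
            return pfn
    return replicas[0]
```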

18
GridFTP
  • GridFTP is an efficient data transfer protocol
  • Features
  • GSI security
  • multiple data channels for parallel transfers
  • partial file transfers
  • third-party (direct server-to-server) transfers
  • interrupted transfer recovery
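
Parallel data channels and partial file transfers both rest on addressing a file by byte range. The following sketch only illustrates that idea; it is not the protocol implementation, and the function name is hypothetical.

```python
def split_ranges(size, channels):
    """Split a file of `size` bytes into contiguous (offset, length)
    ranges, one per data channel, with lengths differing by at most 1."""
    base, extra = divmod(size, channels)
    ranges = []
    offset = 0
    for i in range(channels):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges
```

Each range could be sent on its own channel, and after an interruption only the missing ranges would need to be requested again.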

19
(Figure: GridFTP tests. Plot labels: standard FTP average bandwidth; saturation of lowest bandwidth; INFN Napoli 34 Mbit/s; CNAF Bologna 98 Mbit/s; GridFTP tests period)
20
Grid Approach to Geographically Distributed Data
Analysis for Virgo
21
Layout of VIRGO Virtual Organisation
(Figure: VIRGO VO layout. Diagram labels: CNAF-Bologna; Computing Element with Worker Nodes 1-3; Storage Element (three instances); GARR; Resource Broker; Information Index; Replica Catalogue)
22
Job submission mechanism
(Figure: job submission diagram. Labels: User Interface; Computing Element (three instances); Worker Nodes 1-3; IS; PBS; OS; Storage Element)
23
Job submission mechanism
  • The general scheme for distributed computation is
    the following
  • multiple job submission from the Rome UI
  • the Resource Broker interrogates the Information
    Index and submits each job to an available WN
  • the input data file is staged from the SE to the
    WN
  • the output is sent back to the UI or published on
    the SE
  • the Resource Broker automatically distributes the
    jobs among the nodes (according to the
    specifications in the JDL file) unless we decide
    to tie a given job to a particular node
  • job scheduling at the node level is done via PBS
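
For multiple job submission, one JDL file per job can be generated by a small script. The sketch below is illustrative only: the executable name, arguments and file names are hypothetical, and the attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) should be checked against the JDL documentation of the DataGrid release in use.

```python
def make_jdl(executable, arguments, tag):
    """Return the text of a JDL description for one job.

    `tag` is a hypothetical per-job label used to name the output files.
    """
    out, err = f"{tag}.out", f"{tag}.err"
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        f'StdOutput = "{out}";',
        f'StdError = "{err}";',
        f'InputSandbox = {{"{executable}"}};',
        f'OutputSandbox = {{"{out}", "{err}"}};',
    ]
    return "\n".join(lines)


# One description per job; each would then be submitted from the UI.
jdls = [make_jdl("search", f"--job {i}", f"job{i}") for i in range(3)]
```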

24
Grid tests for coalescing binaries search 1/2
  • Algorithm: standard matched filters
  • Templates: generated at PN order 2 with Taylor
    approximants
  • Data
  • VIRGO E0 run
  • start GPS time: 685112730
  • data length: 600 s
  • Conditions
  • raw data resampled at 2 kHz
  • lower frequency: 60 Hz
  • upper frequency: 1 kHz
  • search space: 2-10 solar masses
  • minimal match: 0.97
  • number of templates: 40000

25
Grid tests for coalescing binaries search 2/2
  • Step 1
  • The data were extracted from the CNAF-Bologna
    Mass Storage System. The extraction process reads
    the VIRGO standard frame format, performs a
    simple resampling and publishes the selected data
    file on the Storage Element
  • Step 2
  • The search was performed by dividing the template
    space into 200 subspaces and submitting from the
    Napoli User Interface one job for each template
    subspace. Each job reads the selected data file
    from the Storage Element (located at
    CNAF-Bologna) and runs on the Worker Nodes
    selected by the Resource Broker in the VIRGO VO.
    Finally, the output data of each job were
    retrieved from the Napoli User Interface.
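
With 40000 templates and 200 jobs, each job handles 200 templates. A minimal sketch of such a division, assuming the templates are simply indexed 0..N-1 (how the template space was actually cut is not stated in the slides):

```python
def partition(n_templates, n_jobs):
    """Return (start, stop) template index ranges, one per job.

    Sizes differ by at most one template when the division is not exact.
    """
    bounds = [i * n_templates // n_jobs for i in range(n_jobs + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


subspaces = partition(40000, 200)
```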

26
Grid tests for periodic sources search
  • The analysis for periodic sources search is
    based on a hierarchical approach in which
    coherent steps, based on FFTs, and incoherent
    ones, based on the Hough Transform, alternate.
    At each iteration a more refined analysis is done
    on the selected candidates.
  • This procedure fits very well in a
    geographically distributed computational scheme.
  • The whole problem can be divided into a number of
    independent smaller tasks, each performed by a
    given computational node. E.g. each node can
    analyze a frequency band and/or a portion of the
    sky.
  • We have performed some preliminary tests to
    evaluate the DataGrid software with respect to
    our analysis problem.
  • For the GRID tests we have used the code for the
    Hough Transform. The source spin-down is not
    taken into account. The input of the code is
    given by a peak map in the time-frequency plane.

27
Grid tests for periodic sources search 1/2
  • The tests consist of two phases
  • Production of input data on the SE
  • Distributed computation.
  • We start from raw data of engineering run E1 (5
    hours) and the steps are the following
  • channel extraction
  • decimation at 1 kHz
  • generation of periodograms by computing
    interlaced and windowed FFTs (T_FFT = 4194.304 s)
  • peak selection (above two times the average
    noise)
  • The produced time-frequency peak map covers 20
    Hz in frequency (from 480 to 500 Hz).
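
The numbers above fit together: at 1 kHz, an FFT of 4194.304 s spans 2^22 samples. The sketch below checks this arithmetic and shows a toy peak selection. Two points are assumptions, not stated in the slides: "interlaced" is taken to mean half-overlapping FFTs, and the "average noise" is estimated as the plain mean of the periodogram.

```python
SAMPLING_HZ = 1000.0     # after decimation
T_FFT = 4194.304         # seconds
DURATION = 5 * 3600.0    # about 5 hours of E1 data

n_samples = round(SAMPLING_HZ * T_FFT)   # samples per FFT
resolution_hz = 1.0 / T_FFT              # frequency bin width
# Assumption: interlaced FFTs overlap by half their length.
n_ffts = int((DURATION - T_FFT) // (T_FFT / 2)) + 1


def select_peaks(periodogram, factor=2.0):
    """Indices of local maxima above `factor` times the mean level."""
    threshold = factor * sum(periodogram) / len(periodogram)
    return [
        i for i in range(1, len(periodogram) - 1)
        if periodogram[i] > threshold
        and periodogram[i] >= periodogram[i - 1]
        and periodogram[i] >= periodogram[i + 1]
    ]
```

Applying `select_peaks` to each periodogram and stacking the results over time would give a time-frequency peak map of the kind described.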

28
Grid tests for periodic sources search 2/2
  • Each computing node processes a subset of the
    whole frequency band. Each job runs according to
    this scheme
  • reads its initial reference frequency and the
    velocity vector direction
  • migrates to a worker node
  • takes from the SE the input data corresponding to
    the frequency band associated with that job
  • calculates the current frequency band of
    interest, i.e. the Doppler band
  • calculates the Hough Transform
  • iterates on the reference frequency until the
    full band has been processed.
  • The output of each job is a set of candidates,
    which will be followed up in the next coherent
    phase.
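
The per-job loop can be outlined as below. This is a rough sketch, not the analysis code: the Hough Transform itself is omitted, the function names and the reference-frequency step are hypothetical, and the Doppler half-width is taken as f·v/c with v/c ≈ 1e-4 (roughly the Earth's orbital velocity over the speed of light).

```python
V_OVER_C = 1.0e-4  # assumed detector velocity / c (Earth orbital motion)


def job_bands(f_min, f_max, n_jobs):
    """Split [f_min, f_max] into equal sub-bands, one per job."""
    width = (f_max - f_min) / n_jobs
    return [(f_min + i * width, f_min + (i + 1) * width)
            for i in range(n_jobs)]


def doppler_band(f_ref):
    """Frequency interval a source at f_ref can be shifted into."""
    half_width = f_ref * V_OVER_C
    return (f_ref - half_width, f_ref + half_width)


def reference_frequencies(band, step):
    """Reference frequencies visited by one job across its band."""
    low, high = band
    n_steps = int(round((high - low) / step))
    return [low + k * step for k in range(n_steps)]


# Outline of one job: for each reference frequency, the Doppler band
# tells which part of the peak map must be read from the SE.
for f_ref in reference_frequencies(job_bands(480.0, 500.0, 4)[0], 1.0):
    low, high = doppler_band(f_ref)
    # ... compute the Hough Transform on peaks in [low, high] ...
```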

29
Conclusions
  • we have successfully verified that multiple jobs
    can be submitted and the output retrieved with
    small overhead time
  • computational grids seem well suited to
    performing data analysis for coalescing binaries
    and periodic sources searches
  • Future plans
  • testing MPI-job submission for coalescing
    binaries search (feature provided in next
    DataGrid release)
  • testing the whole data analysis chain for
    periodic sources search
  • first tests for network analysis among
    interferometers
