A Grid Computing Use Case: DataGrid



1
A Grid Computing Use Case: DataGrid
  • Jean-Marc Pierson

2
DataGrid: a European effort
  • 9.8 million euros
  • 200 researchers involved
  • CERN, ESA, CNRS, INFN
  • objective: share huge amounts of distributed
    data over the network infrastructure
  • built on top of the Globus Toolkit
  • (most figures and material from
    www.eu-datagrid.org)

4
Application Domains
  • High Energy Physics (HEP), led by CERN, for LHC
    data
  • Biology and Medical Image Processing, led by CNRS
    (France)
  • Earth Observation (EO), led by the European Space
    Agency

5
DataGrid middleware
  • Five work packages:
    • Workload Scheduling and Management
    • Data Management
    • Grid Monitoring Services
    • Mass Storage Management
    • Local Fabric Management

6
Workload Scheduling and Management (1)
  • The problems:
    • dynamic relocation of data
    • very large numbers of schedulable components in
      the system (computers and files)
    • large numbers of simultaneous users submitting
      work to the system
    • different access policies applied at different
      sites and in different countries

7
Workload Scheduling and Management (2)
  • A need for:
    • planning job decomposition
    • planning task distribution
  • Planning is based on knowledge of the availability
    and proximity of computational capacity and of the
    required data.
  • A need for cost estimation tools (delays, data
    migration, caching...)
  • Extension of job description languages (JSL) to
    express data dependencies.
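
The planning idea on this slide (choose where to run a job from the proximity of compute capacity and of the required data) can be illustrated with a minimal sketch. It is not DataGrid's scheduler: the `Site` and `Job` classes, the additive cost model, and all numbers below are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    queue_delay_s: float     # assumed: expected wait before a job starts here
    bandwidth_mb_s: dict     # assumed: MB/s from each remote site to this one

@dataclass
class Job:
    name: str
    cpu_time_s: float
    inputs: dict             # data dependencies: logical name -> (size in MB, holding site)

def estimated_cost(job, site):
    """Estimated completion time in seconds: queue delay + data migration + CPU time."""
    transfer_s = 0.0
    for size_mb, holder in job.inputs.values():
        if holder != site.name:          # data already at the site needs no migration
            transfer_s += size_mb / site.bandwidth_mb_s[holder]
    return site.queue_delay_s + transfer_s + job.cpu_time_s

def choose_site(job, sites):
    """Pick the site with the lowest estimated cost."""
    return min(sites, key=lambda s: estimated_cost(job, s))

# Hypothetical example: the data sits at a busy site; a nearby idle site wins.
busy = Site("site_a", queue_delay_s=7200, bandwidth_mb_s={})
idle = Site("site_b", queue_delay_s=600, bandwidth_mb_s={"site_a": 10})
job = Job("reco", cpu_time_s=3600, inputs={"run42.dat": (10000, "site_a")})
best = choose_site(job, [busy, idle])
```

Here moving 10000 MB at 10 MB/s costs 1000 s, which is still cheaper than waiting two hours in the queue next to the data. A real cost estimator would also have to account for caching and for the access policies of each site.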

8
Data management
  • Goals:
    • to permit secure access to massive amounts of
      data in a universal global name space
    • to move and replicate data at high speed from one
      geographical site to another
    • to manage the synchronisation of remote data
      copies
  • Tools:
    • dynamic, automated wide-area data caching and
      distribution
    • a generic interface to different mass storage
      systems
    • performance and reliability issues associated
      with the use of tertiary storage will be
      addressed
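
A global name space with replicated data can be pictured as a catalogue that maps one logical file name to the physical copies held at different sites. The sketch below is only an illustration under that assumption; the `ReplicaCatalog` class, its methods, the file names, and the cost table are hypothetical and are not the DataGrid API.

```python
class ReplicaCatalog:
    """Maps a global logical file name to the physical copies held at sites."""

    def __init__(self):
        self._copies = {}    # logical name -> {site: local path}

    def register(self, logical_name, site, path):
        """Record that `site` holds a copy of `logical_name` at `path`."""
        self._copies.setdefault(logical_name, {})[site] = path

    def locate(self, logical_name):
        """Return all known copies as a {site: path} dictionary."""
        return dict(self._copies.get(logical_name, {}))

    def best_replica(self, logical_name, cost_from):
        """Pick the copy with the lowest access cost; cost_from maps site -> cost."""
        copies = self.locate(logical_name)
        if not copies:
            raise KeyError(logical_name)
        site = min(copies, key=lambda s: cost_from.get(s, float("inf")))
        return site, copies[site]

# Hypothetical example: two copies of the same logical file.
catalog = ReplicaCatalog()
catalog.register("lfn:/hep/run42.dat", "site_a", "/store/a/run42.dat")
catalog.register("lfn:/hep/run42.dat", "site_b", "/store/b/run42.dat")
choice = catalog.best_replica("lfn:/hep/run42.dat", {"site_a": 50, "site_b": 5})
```

Users only ever name the logical file; which physical copy is read is decided at access time. Keeping the copies synchronised, the third goal above, is deliberately left out of this sketch.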

9
Monitoring the DataGrid
  • Goals:
    • to enable transparent monitoring of the use of
      distributed resources at a large scale
    • to assess in detail the interplay between computer
      fabrics, networking and mass storage
  • Tools:
    • local monitoring of other middleware
    • local monitoring of the applications themselves
    • developing short-term and long-term monitoring
      information (real time and archiving)
    • developing effective means of visual
      presentation of the multivariate data
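
The split between short-term and long-term monitoring information can be sketched as a small store that keeps the latest samples for real-time display and archives one summary value per full window. This is a minimal sketch only; the `MetricStore` class and the choice of averaging as the archived summary are assumptions, not part of the DataGrid monitoring services.

```python
from collections import deque
from statistics import mean

class MetricStore:
    """Keeps recent samples for real-time views and averaged values for the archive."""

    def __init__(self, window=5):
        self.recent = deque(maxlen=window)   # short-term: the last `window` samples
        self.archive = []                    # long-term: one average per full window
        self._batch = []                     # samples collected since the last archive entry

    def record(self, value):
        self.recent.append(value)
        self._batch.append(value)
        if len(self._batch) == self.recent.maxlen:
            self.archive.append(mean(self._batch))
            self._batch.clear()

# Hypothetical example: seven CPU-load samples with a window of three.
store = MetricStore(window=3)
for sample in [10, 20, 30, 40, 50, 60, 70]:
    store.record(sample)
```

After these seven samples the real-time view holds the last three values, while the archive holds the averages of the first two complete windows.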

10
Local fabric management
  • Goals:
    • publication of information concerning resource
      availability and performance
    • mapping of authentication and resource allocation
      mechanisms to the local environment
    • self-healing: dynamic configuration changes and
      error recovery strategies
  • Difficulty: scaling well to tens of thousands of
    components
  • Tools:
    • automatic fault detection and isolation,
      automatic reconfiguration of the fabric and
      re-running of the tasks
    • automatic incorporation of new or updated
      components
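
Automatic fault detection followed by re-running tasks can be sketched with heartbeats: a node that has been silent for too long is isolated, and its tasks are moved to the remaining nodes. The function names, the timeout rule, and the round-robin policy below are assumptions chosen for illustration, not the mechanism DataGrid implemented.

```python
def detect_failed(last_heartbeat, now, timeout_s):
    """Return the set of nodes whose last heartbeat is older than timeout_s."""
    return {node for node, t in last_heartbeat.items() if now - t > timeout_s}

def reschedule(assignments, nodes, failed):
    """Move tasks away from failed nodes, round-robin over the healthy ones."""
    healthy = sorted(set(nodes) - failed)
    if not healthy:
        raise RuntimeError("no healthy node left")
    new_assignments = {}
    moved = 0
    for task, node in sorted(assignments.items()):
        if node in failed:
            new_assignments[task] = healthy[moved % len(healthy)]
            moved += 1
        else:
            new_assignments[task] = node     # tasks on healthy nodes stay put
    return new_assignments

# Hypothetical example: node n2 stopped sending heartbeats.
heartbeats = {"n1": 100, "n2": 40, "n3": 98}
failed = detect_failed(heartbeats, now=105, timeout_s=30)
plan = reschedule({"t1": "n1", "t2": "n2", "t3": "n2", "t4": "n3"}, heartbeats, failed)
```

With tens of thousands of components a single central loop like this would become a bottleneck, which is the scaling difficulty the slide points out.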

11
Mass Storage Management
  • Goals:
    • to introduce standards for handling LHC data so
      that they can be exchanged
    • to extend this work to other application fields
  • Tools:
    • a uniform interface to the very different systems
      used at different sites
    • interchange of data and meta-data between
      sites
    • appropriate resource allocation and
      information publishing functions
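
A uniform interface over very different storage systems is commonly expressed as one abstract interface with one adapter per backend. The sketch below assumes that pattern; the `MassStorage` interface, the two in-memory backends, and the `exchange` helper are hypothetical stand-ins, not real DataGrid or mass-storage APIs.

```python
from abc import ABC, abstractmethod

class MassStorage(ABC):
    """Uniform interface that every site-specific storage adapter implements."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def metadata(self, name: str) -> dict: ...

class DiskStore(MassStorage):
    """Stand-in for a disk-based system: random access by name."""
    def __init__(self):
        self._files = {}
    def put(self, name, data):
        self._files[name] = data
    def get(self, name):
        return self._files[name]
    def metadata(self, name):
        return {"size": len(self._files[name]), "backend": "disk"}

class TapeStore(MassStorage):
    """Stand-in for a tape-based system: records are appended and scanned in order."""
    def __init__(self):
        self._tape = []                      # list of (name, data) records
    def put(self, name, data):
        self._tape.append((name, data))
    def get(self, name):
        for record_name, data in reversed(self._tape):   # latest copy wins
            if record_name == name:
                return data
        raise KeyError(name)
    def metadata(self, name):
        return {"size": len(self.get(name)), "backend": "tape"}

def exchange(name, source: MassStorage, target: MassStorage):
    """Copy a file between sites using only the uniform interface."""
    target.put(name, source.get(name))
    return target.metadata(name)

# Hypothetical example: move one file from a disk site to a tape site.
disk, tape = DiskStore(), TapeStore()
disk.put("run42.dat", b"abc")
meta = exchange("run42.dat", disk, tape)
```

Because `exchange` only relies on the abstract interface, supporting a new storage system means writing one more adapter rather than changing the code that moves data between sites.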

12
Conclusion
  • Globus, and all its services, had to be extended!
  • DataGrid: a first effort for handling huge
    amounts of data
  • Collaborative work!
  • Some key issues are not really treated:
    • data security is basic
    • cache management does not use data semantics
  • Useful for raw data-intensive computation and
    management, not for semantically strong data: the
    Medigrid project!