Talk for NeSC Review - PowerPoint PPT Presentation

About This Presentation
Title:

Talk for NeSC Review

Slides: 18
Provided by: MalcolmA1
Learn more at: http://www.nesc.ac.uk
Transcript and Presenter's Notes

1
UCISA: How do I Grid-enable my University?
Prepare for the Data Deluge
Prof. Malcolm Atkinson, Director
www.nesc.ac.uk
23rd October 2003
2
Outline
  • Aspects of the Data Deluge
  • Our approach to Data Access and Integration

3
Sloan Digital Sky Survey Production System
Slide from Ian Foster's SSDBM '03 keynote
4
Global Knowledge Communities Often Driven by
Data: E.g., Astronomy
  • No. & sizes of data sets as of mid-2002,
    grouped by wavelength
  • 12 waveband coverage of large areas of the
    sky
  • Total about 200 TB data
  • Doubling every 12 months
  • Largest catalogues near 1B objects

Data and images courtesy
Alex Szalay, Johns Hopkins
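
The growth figures on this slide describe a simple exponential. The following is a minimal sketch of that arithmetic, assuming the 200 TB total and the 12-month doubling time continue to hold (an extrapolation for illustration, not a claim made on the slide). The function names are hypothetical, and 1 PB is taken as 1000 TB.

```python
import math


def projected_size_tb(start_tb: float, months: float,
                      doubling_months: float = 12.0) -> float:
    """Size after `months`, assuming steady doubling every `doubling_months`."""
    return start_tb * 2 ** (months / doubling_months)


def months_to_reach(start_tb: float, target_tb: float,
                    doubling_months: float = 12.0) -> float:
    """Months needed to grow from `start_tb` to `target_tb` at the same rate."""
    return doubling_months * math.log2(target_tb / start_tb)


# Slide figures: about 200 TB in mid-2002, doubling every 12 months.
size_after_three_years = projected_size_tb(200, 36)
months_to_one_petabyte = months_to_reach(200, 1000)

print(f"After 3 years: {size_after_three_years:.0f} TB")
print(f"Months to reach 1 PB: {months_to_one_petabyte:.1f}")
```

Under these assumptions the collection passes one petabyte a little over two years after mid-2002.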

5
Database Growth
PDB Content Growth
Bases: 45,356,382,990
6
It's Easy to Forget How Different 2003 is From
1993
  • Enormous quantities of data: Petabytes
      • For an increasing number of communities,
        gating step is not collection but analysis
  • Ubiquitous Internet: >100 million hosts
      • Collaboration & resource sharing the norm
      • Security and Trust are crucial issues
  • Ultra-high-speed networks: >10 Gb/s
      • Global optical networks
      • Bottlenecks: last kilometre & firewalls
  • Huge quantities of computing: >100 Top/s
      • Moore's law gives us all supercomputers
      • Organising their effective use is the challenge
  • Moore's law everywhere
      • Instruments, detectors, sensors, scanners, ...
      • Organising their effective use is the challenge

Derived from Ian Foster's slide at SSDBM, July 03
7
Tera → Peta Bytes
                      Terabyte                 Petabyte
RAM time to move      15 minutes               2 months
1Gb WAN move time     10 hours ($1000)         14 months ($1 million)
Disk Cost             7 disks = $5000 (SCSI)   6800 disks + 490 units + 32 racks = $7 million
Disk Power            100 Watts                100 Kilowatts
Disk Weight           5.6 Kg                   33 Tonnes
Disk Footprint        Inside machine           60 m²

Now make it secure & reliable!
May 2003: Approximately Correct. See also:
Distributed Computing Economics, Jim Gray,
Microsoft Research, MSR-TR-2003-24
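
The WAN move times on this slide can be checked with back-of-envelope arithmetic. The sketch below computes the ideal transfer time at full line rate and then the sustained throughput implied by the slide's own figures. It assumes decimal units (1 TB = 10^12 bytes) and a 30-day month; the slide does not state which throughput it assumed, so the implied rate is an inference, and the function names are hypothetical.

```python
TB = 10**12          # bytes, decimal terabyte (assumption)
PB = 10**15          # bytes, decimal petabyte (assumption)
HOUR = 3600          # seconds
MONTH = 30 * 24 * HOUR  # seconds, assuming a 30-day month


def transfer_seconds(num_bytes: float, bits_per_second: float) -> float:
    """Ideal time to move `num_bytes` over a link running at full line rate."""
    return num_bytes * 8 / bits_per_second


def implied_rate_bps(num_bytes: float, seconds: float) -> float:
    """Sustained bit rate needed to move `num_bytes` in `seconds`."""
    return num_bytes * 8 / seconds


# Ideal case: 1 TB over a 1 Gb/s link with no overhead.
ideal_tb_hours = transfer_seconds(TB, 1e9) / HOUR

# Rates implied by the slide's "10 hours" and "14 months".
slide_tb_rate = implied_rate_bps(TB, 10 * HOUR)
slide_pb_rate = implied_rate_bps(PB, 14 * MONTH)

print(f"Ideal 1 TB move: {ideal_tb_hours:.1f} hours")
print(f"Implied rate for 1 TB in 10 hours: {slide_tb_rate / 1e6:.0f} Mb/s")
print(f"Implied rate for 1 PB in 14 months: {slide_pb_rate / 1e6:.0f} Mb/s")
```

Both slide figures correspond to roughly 220 Mb/s sustained, which suggests the table assumes that only a fraction of the nominal 1 Gb/s is achieved in practice.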
8
Infrastructure Architecture
Data Intensive Users
Data Intensive Applications for Application area X
Simulation, Analysis & Integration Technology for Application area X
Generic Virtual Data Access and Integration Layer
OGSA
OGSI: Interface to Grid Infrastructure
Compute, Data & Storage Resources
Distributed
Virtual Integration Architecture
9
Integrating DBs into the Grid
  • We want to build on existing DBs, not replace
    them.
  • Could produce a Grid-enabled version of JDBC/ODBC
  • Need something more for metadata-driven access to
    data
  • Service-based approach should be better
  • Provide a uniform framework for access to
    databases on the Grid

10
Data as Service: OGSA Data Access & Integration
  • Service-oriented treatment of data appears to
    have significant advantages
      • Leverage OGSI introspection, lifetime, etc.
      • Compatibility with Web services
  • Standard service interfaces being defined
      • Service data: e.g., schema
      • Derive new data services from old (views)
      • Externalize to e.g. file/database format
      • Perform queries or other operations

11
Data Services
  • GGF Data Access and Integration Services (DAIS)
      • OGSI-compliant interfaces to access relational
        and XML databases
      • Needs to be generalized to encompass other data
        sources (see next slide)
  • Generalized DAIS becomes the foundation for:
      • Replication: Data located in multiple locations
      • Federation: Composition of multiple sources
      • Provenance: How was data generated?

12
OGSA Data Services (Foster, Tuecke, Unger, eds.)
  • Describes conceptual model for representing all
    manner of data sources as Web services
      • Databases, filesystems, devices, programs, ...
  • Integrates WS-Agreement
  • Data service is an OGSI-compliant Web service
    that implements one or more of the base data
    interfaces:
      • DataDescription, DataAccess, DataFactory,
        DataManagement
  • These would be extended and combined for specific
    domains (including DAIS)
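
The idea of a data service implementing "one or more" of the four base interfaces can be illustrated with abstract base classes. This is only a sketch: the four interface names come from the slide, but every method name and the toy `InMemoryTable` class are hypothetical and are not taken from the OGSA Data Services document.

```python
from abc import ABC, abstractmethod


# Interface names are from the slide; all method names are hypothetical.
class DataDescription(ABC):
    @abstractmethod
    def describe(self) -> dict:
        """Return metadata about the data, e.g. its schema."""


class DataAccess(ABC):
    @abstractmethod
    def perform(self, query: str) -> list:
        """Run a query or other operation against the data."""


class DataFactory(ABC):
    @abstractmethod
    def derive(self, query: str) -> "DataAccess":
        """Create a new data service (a view) from this one."""


class DataManagement(ABC):
    @abstractmethod
    def set_option(self, name: str, value: object) -> None:
        """Adjust how the underlying data source is managed."""


BASE_INTERFACES = (DataDescription, DataAccess, DataFactory, DataManagement)


class InMemoryTable(DataDescription, DataAccess):
    """Toy data service implementing two of the four base interfaces."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def describe(self) -> dict:
        return {"columns": self.columns, "row_count": len(self.rows)}

    def perform(self, query: str) -> list:
        # Toy "query language": the name of a single column to project.
        idx = self.columns.index(query)
        return [row[idx] for row in self.rows]


def implemented_interfaces(service) -> list:
    """List which base interfaces a service implements."""
    return [i.__name__ for i in BASE_INTERFACES if isinstance(service, i)]


stars = InMemoryTable(["name", "magnitude"],
                      [("Sirius", -1.46), ("Vega", 0.03)])
print(implemented_interfaces(stars))
print(stars.describe())
print(stars.perform("name"))
```

A domain-specific service would extend these classes, which mirrors the slide's point that the base interfaces are meant to be extended and combined.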

13
OGSA-DAI Approach
  • Reuse existing technologies and standards
      • OGSA, query languages, Java, transport
  • Build portTypes and services which will enable:
      • controlled exposure of heterogeneous data
        resources on an OGSI-compliant grid
      • access to these resources via common interfaces
        using existing underlying query mechanisms
      • (ultimately) data integration across distributed
        data resources
  • OGSA-DAI (the software) seeks to be a reference
    implementation of the GGF DAIS WG standard
      • Can't keep up with frequent standard changes, so
        software releases track specific drafts
  • See http://www.ogsadai.org.uk/ for details.
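
"Common interfaces using existing underlying query mechanisms" can be sketched as one small interface with two implementations, each passing the query through to its native engine. This is not the OGSA-DAI API: the class names and the `perform` method are hypothetical, SQLite stands in for a relational database, and Python's ElementTree (which supports only a subset of XPath) stands in for an XML database.

```python
import sqlite3
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class DataResource(ABC):
    """Hypothetical common interface; not the OGSA-DAI API."""

    @abstractmethod
    def perform(self, query: str) -> list:
        """Run a query in the resource's own query language."""


class RelationalResource(DataResource):
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def perform(self, query: str) -> list:
        # SQL is handed to the database unchanged; first column is returned.
        return [row[0] for row in self.conn.execute(query)]


class XmlResource(DataResource):
    def __init__(self, root: ET.Element):
        self.root = root

    def perform(self, query: str) -> list:
        # ElementTree evaluates a limited XPath subset.
        return [el.text for el in self.root.findall(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (name TEXT, band TEXT)")
conn.executemany("INSERT INTO survey VALUES (?, ?)",
                 [("SDSS", "optical"), ("2MASS", "infrared")])
root = ET.fromstring("<surveys><survey><name>ROSAT</name></survey></surveys>")

resources = [RelationalResource(conn), XmlResource(root)]
queries = ["SELECT name FROM survey ORDER BY name", "./survey/name"]

# The caller uses one method regardless of the underlying data model.
results = [r.perform(q) for r, q in zip(resources, queries)]
print(results)
```

The client code is identical for both resources; only the query text differs, which is the sense in which existing query mechanisms are reused rather than replaced.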

14
Data Access & Integration Services
15
Third Party Delivery
16
Future DAI Services?

[Diagram labels: Client, Analyst, Data Registry, Data Access & Integration master, GDS, GDTS, XML database, Relational database, Sx, Sy]

1a. Request to Registry for sources of data about x & y
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits sequence of scripts, each has a set of queries to GDS with XPath, SQL, etc
3b. Client tells analyst
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
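
The numbered steps on this slide can be traced in a small in-process sketch. All class and method names here are hypothetical stand-ins, not OGSA-DAI classes, and in-memory SQLite databases play the role of resources Sx and Sy. The sketch simplifies the slide: the factory creates one service per resource rather than an integrating network, and the GDTS and analyst delivery steps (3b and the formatted binary of 3c) are left out.

```python
import sqlite3


class GridDataService:
    """Stand-in for a GDS: wraps one data resource and runs queries on it."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def perform(self, script: list) -> list:
        # Steps 3a/3c: run each query, return one result set per query.
        return [self._conn.execute(q).fetchall() for q in script]


class Factory:
    """Stand-in for the factory that creates data services on request."""

    def __init__(self, resources: dict):
        self._resources = resources  # resource name -> connection

    def create_service(self, resource_name: str) -> GridDataService:
        # Steps 2b/2c: create a service and hand it back to the client.
        return GridDataService(self._resources[resource_name])


class Registry:
    """Stand-in for the registry mapping topics to factories."""

    def __init__(self):
        self._entries = {}  # topic -> (factory, resource name)

    def publish(self, topic: str, factory: Factory, resource_name: str) -> None:
        self._entries[topic] = (factory, resource_name)

    def lookup(self, topic: str) -> tuple:
        # Steps 1a/1b: answer a request with a factory handle.
        return self._entries[topic]


def make_resource(rows: list) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (id INTEGER, value REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?)", rows)
    return conn


factory = Factory({"Sx": make_resource([(1, 0.5), (2, 1.5)]),
                   "Sy": make_resource([(1, 10.0)])})
registry = Registry()
registry.publish("x", factory, "Sx")
registry.publish("y", factory, "Sy")

script = ["SELECT COUNT(*) FROM obs",
          "SELECT id, value FROM obs ORDER BY id"]
results = {}
for topic in ("x", "y"):
    handle, resource_name = registry.lookup(topic)   # steps 1a, 1b
    gds = handle.create_service(resource_name)       # steps 2a to 2c
    results[topic] = gds.perform(script)             # steps 3a, 3c

print(results)
```

The client never opens a database connection itself; it only holds handles obtained from the registry and the factory, which is the indirection the diagram describes.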

17
Take Home Message
  • Information Grids
  • Support for collaboration
  • Integrated support for computation and data grids
  • Structured data fundamental
  • Relations, XML, semi-structured, files, ...
  • Integrated strategies & technologies needed
  • OGSA-DAI is here now
  • See http://www.ogsadai.org.uk/ for details.
  • A first step: Try it
  • Tell us what is needed to make it better
  • Managing Scientific Data is a Major Requirement
  • The Grid is 30% of the software stack needed for
    e-Science