RGMA A Data Integration System for Grid Monitoring 7112003 - PowerPoint PPT Presentation

Description:

Institutions own a wealth of computing resources. computers, storage devices, network bandwidth ... Brian Coghlan, Stuart Kenny, David O'Callaghan. CoopIS - 7/11/2003 ... – PowerPoint PPT presentation

1
R-GMA: A Data Integration System for Grid Monitoring
7/11/2003
Werner Nutt (Heriot-Watt University, Edinburgh)
2
Research Context: Grids
  • Institutions own a wealth of computing
    resources
  • computers, storage devices, network bandwidth
  • databases
  • specialised equipment (e.g., supercomputers)
  • The Grid idea
  • combine the resources
  • so that they behave as a virtual computer
  • Grid research since the mid-1990s
  • Our work is part of the EU project DataGrid

3
R-GMA: Relational Grid Monitoring Architecture
  • Within DataGrid (work package 3) we build the
    Grid Monitoring and Information System R-GMA
  • based on the relational data model
  • refines the Grid Monitoring Architecture of the Global Grid Forum
  • Code is open source and freely available
  • Homepage: www.r-gma.org

4
Contributors
  • Heriot-Watt, Edinburgh
  • Andrew Cooke, Alasdair Gray, Lisha Ma, Werner
    Nutt
  • IBM-UK
  • James Magowan, Manfred Oevers, Paul Taylor
  • Queen Mary, University of London
  • Roney Cordenonsi
  • CCLRC/PPARC
  • Rob Byrom, Laurence Field, Steve Hicks, Jason
    Leake, Manish Soni, Antony Wilson
  • Linda Cornwall, Abdeslem Djaoui, Steve Fisher
  • SZTAKI, Hungary
  • Norbert Podhorszki
  • Trinity College Dublin
  • Brian Coghlan, Stuart Kenny, David O'Callaghan

5
Overview
  • Grid monitoring: requirements
  • The R-GMA approach: a virtual monitoring database
  • Components of R-GMA
  • Schema
  • Producers, Consumers and their Agents
  • Registry
  • Republishers
  • Query Planning for Republisher Hierarchies

6
Grid Components Mimic a Computer's Operating System
  • DataGrid consists of
  • Computing elements
  • Storage elements
  • Network nodes and connections
  • Replica Catalogues
  • Jobs
  • Resource Brokers
  • Logging and Bookkeeping
  • User interfaces

  • Other Grids have similar components

7
A Bird's-Eye View of DataGrid
[Diagram labels: Job Submission, Resource Broker, User Interface, Status Information, Logging and Bookkeeping, Replica Catalogue, Computing Element, Storage Element, Computer (several), Data Transfer]
8
Grid Monitoring
  • Grid components and users need to know
  • What is going on in the Grid?
  • In particular
  • What is the current state of the Grid?
  • How did the Grid behave in the past?
  • These questions are answered by a Grid Monitoring and Information System

9
A Bird's-Eye View of DataGrid (with Monitoring)
[Diagram labels: Job Submission, Resource Broker, User Interface, Status Information, R-GMA Monitoring System, Logging and Bookkeeping, Replica Catalogue, Computing Element, Storage Element, Computer (several), Data Transfer]
10
Monitoring Data Come in Two Kinds
  • A Grid monitoring system should make available
    two kinds of data
  • static data pools, e.g., databases on
  • network topology
  • applications available (versions, licences, ...)
  • streams of data, e.g.,
  • sensor data (CPU load, network traffic, ...)
  • Data streams may give rise to data pools if they
    are archived

11
Examples of Monitoring Queries (1)
  • Where is there currently a computing element CE and a
    storage element SE such that
  • user U is authorised to use CE and SE
  • CE has 5 CPUs available, each with at least 200
    MB of memory
  • CE has software S1, S2, S3 installed
  • SE holds copies of files F1, F2
  • throughput between CE and SE is at least 500 Mbps?

  • Resource Broker

12
Examples of Monitoring Queries (2)
  • What is the progress of the jobs of user U? How
    does their status change?

  • Visualisation Tool
  • Between which nodes was the average transportation
    time for 1 MB packets yesterday higher than
    0. seconds?
  • Network Administrator

13
Grid Monitoring Requirements
  • Support for publishing data pools and streams
  • Support for locating data sources (automatic, if possible)
  • Queries with different temporal interpretations
    (latest state, continuous, history)
  • Flexibility (we don't know which queries will be posed)
  • Scalability (there may be thousands of data sources)
  • Resilience to failure (data sources may become unavailable)

14
Monitoring Data can be Captured by Relations
  • Monitoring data can be represented in terms of
  • relations with keys and timestamps, e.g.
  • CPULoad(country, site, facility, load, timestamp)
  • NTP(src, dest, method, pcktSize, time, timestamp)
  • and tuples, e.g.
  • CPULoad(UK, HW, ATLAS, 0.3, 19055707112003)
  • NTP(HW, RAL, Pinger, 10, 0.01, 18053707112003)
  • Monitoring queries can be expressed as SQL queries
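
As a minimal sketch of this idea, the CPULoad relation can be loaded into an in-memory SQLite database and queried with SQL. The choice of (country, site, facility) as the key, the integer timestamps, and all readings other than the first are assumptions made for illustration; none of this is R-GMA's own storage layer.

```python
import sqlite3

# In-memory database standing in for one monitoring data pool (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE CPULoad (
        country     TEXT,
        site        TEXT,
        facility    TEXT,
        "load"      REAL,
        "timestamp" INTEGER,
        -- Assumed key: (country, site, facility); the timestamp separates readings.
        PRIMARY KEY (country, site, facility, "timestamp")
    )
    """
)

# Hypothetical readings; timestamps are simple counters, not real clock values.
rows = [
    ("UK", "HW", "ATLAS", 0.3, 1),
    ("UK", "HW", "ATLAS", 0.5, 2),
    ("UK", "RAL", "CMS", 0.7, 3),
]
conn.executemany("INSERT INTO CPULoad VALUES (?, ?, ?, ?, ?)", rows)


def latest_state(connection):
    """Return the most recent load reading for each (country, site, facility)."""
    query = """
        SELECT c.country, c.site, c.facility, c."load"
        FROM CPULoad AS c
        WHERE c."timestamp" = (
            SELECT MAX(d."timestamp")
            FROM CPULoad AS d
            WHERE d.country = c.country
              AND d.site = c.site
              AND d.facility = c.facility
        )
        ORDER BY c.site
    """
    return connection.execute(query).fetchall()


print(latest_state(conn))
```

The same table also answers a history query: dropping the correlated subquery returns every stored reading.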

15
Architecture Approach 1: A Monitoring Data Warehouse
  • Idea
  • store all data about the Grid status into a huge
    database
  • and query it

16
A Monitoring Data Warehouse Is Not Realistic
  • Loading takes time
  • Data occupy space
  • Connections to the warehouse may fail
  • The warehouse itself may break down
  • Often monitoring data flow as data streams, and
    queries ask for data streams as output

17
A Consumer-Producer Architecture May Scale Better
  • Components
  • play the roles of
  • Consumers and
  • Producers
  • of Information
  • The Registry can be replicated
  • The Grid Monitoring Architecture of the Global
    Grid Forum

18
Questions about the Grid Monitoring Architecture
  • How should consumers find relevant producers
  • a human browses the registry?
  • an API supports a fixed set of queries?
  • consumer poses a query
    and a mediator does the job?
  • How should producers describe
  • their data?
  • their query answering capabilities?

19
Refined Approach: A Virtual Monitoring Database
  • Global relational schema: vocabulary for
    consumers and producers
  • Consumer poses query over the global schema
    (flagged as continuous, latest, or history)
  • Producer
  • has a type (stream producer, database producer)
  • publishes relations R1, ..., Rk
  • registers for each Ri a simple view
    Vi on the global schema (currently,
    a selection on a relation)

20
Example Producer Registrations
Stream Producer 1 publishes and registers
SELECT * FROM CPULoad WHERE country = 'UK' AND site = 'RAL'
Stream Producer 2 publishes and registers
SELECT * FROM CPULoad WHERE country = 'UK' AND site = 'HW'
21
Producers Contribute to Global Relation
22
R-GMA: A Virtual Monitoring Data Warehouse
[Diagram label: Registry]
23
Matchmaking Between Producers and Consumers
  • Suppose P1, P2, P3 have registered for relation
    NTP (Network Throughput)
  • P1: src = 'HW'
  • P2: src = 'RAL' AND pcktSize > 20
  • P3: src = 'RAL' AND method = 'PINGER'
  • Consumer asks query
  • Q: SELECT * FROM NTP
  • WHERE src = 'RAL' AND method = 'PINGER'
  • We see P1 is not suitable for Q, but P2 and P3
    are. Why?
  • src = 'HW' AND src = 'RAL' AND method = 'PINGER'
    is unsatisfiable
  • src = 'RAL' AND pcktSize > 20 AND ...
    is satisfiable
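
The matchmaking test can be sketched as a satisfiability check on the conjunction of a producer's view condition and the query condition. This is an illustration, not R-GMA's actual mediator code: the `(attribute, operator, value)` encoding and the function names are hypothetical, only `=`, `<` and `>` are handled, and bounds are assumed to range over a dense domain such as the reals.

```python
def satisfiable(atoms):
    """Check whether a conjunction of (attribute, op, value) atoms can hold.

    Supported ops: "=", "<", ">". Bounds are treated as strict and the
    domain as dense (an assumption), so lo < x < hi is satisfiable iff lo < hi.
    """
    equal = {}   # attribute -> value it must equal
    lower = {}   # attribute -> largest strict lower bound seen
    upper = {}   # attribute -> smallest strict upper bound seen
    for attr, op, value in atoms:
        if op == "=":
            if attr in equal and equal[attr] != value:
                return False  # two different required values
            equal[attr] = value
        elif op == ">":
            lower[attr] = max(value, lower.get(attr, value))
        elif op == "<":
            upper[attr] = min(value, upper.get(attr, value))
        else:
            raise ValueError(f"unsupported operator: {op}")
    for attr in set(lower) | set(upper):
        lo, hi = lower.get(attr), upper.get(attr)
        if attr in equal:
            v = equal[attr]
            if (lo is not None and not v > lo) or (hi is not None and not v < hi):
                return False
        elif lo is not None and hi is not None and not lo < hi:
            return False
    return True


def relevant(view, query):
    """A producer is relevant if its view and the query can hold together."""
    return satisfiable(view + query)


# The three registrations and the consumer query from the slide.
P1 = [("src", "=", "HW")]
P2 = [("src", "=", "RAL"), ("pcktSize", ">", 20)]
P3 = [("src", "=", "RAL"), ("method", "=", "PINGER")]
Q = [("src", "=", "RAL"), ("method", "=", "PINGER")]

for name, view in [("P1", P1), ("P2", P2), ("P3", P3)]:
    print(name, relevant(view, Q))
```

P2 stays relevant because its view says nothing about `method`, so it may hold PINGER tuples; the consumer still has to filter what P2 sends.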

24
R-GMA's Components Need to be Smart
  • The Registry has to
  • find relevant producers for a query
  • notify consumers of new relevant producers
  • A Consumer has to
  • choose among equivalent producers/query plans
  • contact producers
  • combine their output to yield a query answer
  • A Producer has to
  • send its consumers the right output

25
Consumers and Producers Are Helped by Agents
  • R-GMA clients: Grid components or Grid
    applications
  • Clients can play the roles of producers or
    consumers
  • A client would need special capabilities for a
    role
  • Clients are supported in their roles by agents
  • Implementation
  • APIs for client roles: new StreamProducer()
  • Agents are objects on a Web server

26
Refinement: Clients and Their Agents
27
R-GMA's Components (So Far)
  • Schema: fixes relations, attributes, keys
  • Producers: publish local relations,
    described by views on schema
  • Consumers: defined by query and type
    (continuous, latest state, history)
  • Agents: make query plans, send and retrieve data
  • Registry: finds suitable producers and
    consumers
  • How can we answer latest-state and
    history queries?

28
Latest and History Queries Refer to Views over a Stream
SELECT * FROM NetworkThroughput WHERE src = 'HW' AND dest = 'RAL'
  • A history query refers to all past tuples
  • A latest state query refers to the latest tuples
    for each key
  • These tuples are answers to simple
    queries
  • (sliding windows of length ∞ or 1)
  • A stream producer forwards its tuples ... and forgets them
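
One way to picture the two query types is as per-key windows over the stream: an unbounded window gives the history view, and a window of length 1 gives the latest-state view. The sketch below is illustrative only. The tuple layout (src, dest, time, timestamp), the key (src, dest), and the function name `window_view` are assumptions, and tuples are assumed to arrive in timestamp order for each key.

```python
from collections import defaultdict, deque


def window_view(stream, key, length=None):
    """Keep the last `length` tuples per key; length=None keeps all of them."""
    # deque(maxlen=None) is unbounded; deque(maxlen=1) retains only the newest tuple.
    windows = defaultdict(lambda: deque(maxlen=length))
    for tup in stream:
        windows[key(tup)].append(tup)
    return {k: list(w) for k, w in windows.items()}


# Hypothetical throughput readings: (src, dest, time, timestamp).
stream = [
    ("HW", "RAL", 0.01, 1),
    ("HW", "RAL", 0.02, 2),
    ("RAL", "HW", 0.03, 3),
]


def by_pair(tup):
    """Assumed key: the (src, dest) pair."""
    return (tup[0], tup[1])


history = window_view(stream, by_pair)           # unbounded window
latest = window_view(stream, by_pair, length=1)  # window of length 1
print(history[("HW", "RAL")])
print(latest[("HW", "RAL")])
```

Because a stream producer forgets its tuples after forwarding them, some other component has to maintain these windows.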

29
Republishers Publish Views Over Streams
  • A republisher
  • consumes answers to a simple continuous query
    (selections currently)
  • publishes
  • the answer stream (stream republisher)
  • a db with the latest answer for each key
    (latest-state republisher)
  • a db with all answers (history republisher)
  • Consumer agents have to choose the right
    republishers to answer latest-state
    and history queries
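
A republisher along these lines could be sketched as an object that filters incoming tuples by its selection condition, forwards them to subscribers, and keeps a latest-state store and a history store. The class below is a toy model under stated assumptions: the `Republisher` name, its methods, the dictionary tuple format, the key attributes, and the sample readings are all hypothetical. It combines all three publishing modes in one object for brevity, whereas the slide lists them as separate kinds of republisher.

```python
class Republisher:
    """Toy republisher over dict-shaped tuples with a 'timestamp' field."""

    def __init__(self, condition, key_attrs):
        self.condition = condition  # equality selection: attribute -> value
        self.key_attrs = key_attrs  # attributes forming the key
        self.subscribers = []       # callbacks that receive the answer stream
        self.latest = {}            # key -> most recent tuple
        self.history = []           # every accepted tuple, in arrival order

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def consume(self, tup):
        # Drop tuples that do not satisfy this republisher's selection.
        if not all(tup.get(a) == v for a, v in self.condition.items()):
            return
        self.history.append(tup)
        key = tuple(tup[a] for a in self.key_attrs)
        current = self.latest.get(key)
        if current is None or tup["timestamp"] >= current["timestamp"]:
            self.latest[key] = tup
        for callback in self.subscribers:
            callback(tup)  # forward on the answer stream


KEY = ("country", "site", "facility")  # assumed key of CPULoad
site_hw = Republisher({"site": "HW"}, KEY)
site_ral = Republisher({"site": "RAL"}, KEY)
national = Republisher({"country": "UK"}, KEY)

# Two-level hierarchy: site republishers feed the national one.
site_hw.subscribe(national.consume)
site_ral.subscribe(national.consume)

readings = [  # made-up sample data
    {"country": "UK", "site": "HW", "facility": "ATLAS", "load": 0.3, "timestamp": 1},
    {"country": "UK", "site": "HW", "facility": "ATLAS", "load": 0.5, "timestamp": 2},
    {"country": "UK", "site": "RAL", "facility": "CMS", "load": 0.7, "timestamp": 3},
]
for reading in readings:
    site_hw.consume(reading)
    site_ral.consume(reading)

print(len(national.history), national.latest[("UK", "HW", "ATLAS")]["load"])
```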

30
Stream Republishers Can Form Hierarchies
[Diagram: republisher hierarchy for the stream relation CPULoad(country, site, facility, load, timestamp)]
  • National Republisher: country = 'UK'
  • Local/site Republishers: site = 'HW', site = 'RAL'
  • Stream Producers: RAL, HW
31
Query Planning in the Presence of Republishers
  • How can we plan the execution of a continuous
    query, i.e.,
  • which publishers (producers and republishers)
    should we access?
  • which query should we pose over each publisher?
  • We have studied these questions for the
    simple case of selection queries

32
Properties of Streams and Query Plans
  • A stream can be
  • duplicate free
  • sound and/or complete w.r.t. a view
  • weakly ordered (if the tuples belonging to a
    fixed set of key values occur in the order
    of their timestamps)
  • A plan
  • has these properties if it always
    produces a stream with these properties
  • is irreducible if no proper subset of its
    publishers is enough
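
Two of the stream properties above can be checked on a finite prefix of a stream, as in the following sketch. The function names and the tuple layout (site, load, timestamp) are assumptions for illustration. Soundness and completeness are left out because they are defined relative to a view.

```python
def is_duplicate_free(stream):
    """True if no tuple occurs twice in the stream."""
    seen = set()
    for tup in stream:
        if tup in seen:
            return False
        seen.add(tup)
    return True


def is_weakly_ordered(stream, key, timestamp):
    """True if, for each key value, tuples arrive in timestamp order.

    Tuples with different keys may interleave in any order.
    """
    last_seen = {}  # key value -> timestamp of the most recent tuple
    for tup in stream:
        k = key(tup)
        if k in last_seen and timestamp(tup) < last_seen[k]:
            return False
        last_seen[k] = timestamp(tup)
    return True


def site(tup):
    return tup[0]


def stamp(tup):
    return tup[2]


# Hypothetical tuples: (site, load, timestamp).
interleaved = [("HW", 0.3, 1), ("RAL", 0.5, 3), ("HW", 0.4, 2)]
out_of_order = [("HW", 0.3, 2), ("HW", 0.4, 1)]

print(is_weakly_ordered(interleaved, site, stamp))   # per-key order holds
print(is_weakly_ordered(out_of_order, site, stamp))  # HW tuples arrive backwards
print(is_duplicate_free(interleaved + interleaved[:1]))
```

The `interleaved` stream is not globally sorted by timestamp, yet it is weakly ordered, which is the weaker guarantee a merged stream from several publishers can realistically offer.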

33
Query Planning Results
  • Plans that are duplicate free, weakly ordered,
    sound and complete
  • can be computed in PTIME
  • if selection conditions are conjunctive
  • Our plans
  • involve a minimal number of publishers
  • if conditions are Horn (e.g., < and ?
    not together)
  • Irreducibility is NP-hard
  • if both < and ? can occur

34
Conclusion
  • R-GMA
  • used by various components within DataGrid
    and other European Grid projects
  • currently evaluated on various testbeds
  • is a candidate component for the EU's EGEE Grid
    and for the UK Grid
  • Next steps
  • turn the system into a collection of Grid
    services
  • support for more elaborate continuous queries
  • distributed query processing