
Title: Searching within Large Grid Infrastructures


1
Searching within Large Grid Infrastructures
  • Marios D. Dikaiakos
  • University of Cyprus CoreGRID

2
Acknowledgements
  • Wei Xing, University of Cyprus
  • Rizos Sakellariou, U. Manchester, UK
  • Yannis Ioannidis, U. Athens, GR
  • Salvatore Orlando, ISTI-CNR, IT
  • Domenico Laforenza, ISTI-CNR, IT

3
Outline
  • Context and Motivation
  • Limitations of Grid Information Services
  • Semantic Grid and Ontologies
  • A Core Grid Ontology
  • Conclusions and Future Work

4
The Grid
  • A wide-scale, distributed computing
    infrastructure to support resource sharing and
    coordinated problem solving in dynamic,
    multi-institutional Virtual Organizations.
  • Computational Grid: provides the raw computing
    power, high-speed bandwidth interconnection and
    associated data storage.
  • Data / Information Grid: allows easily accessible
    connections to major sources of information and
    tools for its analysis and visualisation.
  • Knowledge / Semantic Grid: gives added value to
    the information; provides intelligent guidance
    for decision-makers; facilitates the generation,
    diffusion and support of knowledge.

5
Near-future Scenarios for the Grid
6
Near-future Scenarios for the Grid
  • The Grid as a Wide-Scale Distributed System
  • Millions of resources of different kinds.
  • Services and Policies in place.
  • Relationships (permanent and transient) between
    organizations, software, data, services,
    applications
  • Different middleware platforms.
  • Common (?) protocols, standards and APIs.
  • The hope is that Grid will grow larger and will
    reach an acceptance as wide as the Web.

8
Problem Statement: Searching the Grid
  • How are individuals and organizations going to
    harness the capabilities of a fully deployed
    Grid, with a massive and ever-expanding base of
    computing and storage nodes, network resources,
    and a huge corpus of available programs,
    services, and data?
  • To this end, users need to identify resources
    that are:
  • Interesting (discovery)
  • Relevant (classification)
  • Accessible and available under known policies of
    use, cost (inquiry)
  • Emphasis on summary information, in terms of
    granularity and timing.

9
Searching the Grid
  • Software and Data-sets
  • Policies
  • Relationships
  • Best-practices
  • Computing, Storage, Network Resources

10
Examples of search queries
  • Hardware resources on the Grid, their attributes,
    and applicable policies of their use
  • Find a VO providing exclusive access to a
    shared-memory multiprocessor system with at
    least 16 processors, 8 GB of main memory, and
    a usage charge of not more than 100 euros per
    unit of CPU time.
  • Application services, software, and data-sets
  • Find services running Quantum Chromo-Dynamics
    calculations (QCD) using F90 and MPI.
  • Hardware-software combinations, Grid usage and
    best-practices
  • Find the pricing and prior clientele of Grid
    services that provide access to the XYZ
    workflow for high-performance oil refinery
    simulations.
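
The first query above can be read as a conjunction of attribute constraints over resource descriptions. The following is a minimal Python sketch of that reading, assuming a hypothetical in-memory catalogue: the `Resource` fields, the VO names and the prices are invented for illustration and do not come from any Grid information service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of one hardware resource offered by a VO."""
    vo: str
    architecture: str
    processors: int
    memory_gb: int
    exclusive_access: bool
    charge_eur: float  # assumed usage charge per unit of CPU time


# Invented sample data, for illustration only.
CATALOGUE = [
    Resource("vo-alpha", "shared-memory", 32, 16, True, 80.0),
    Resource("vo-beta", "shared-memory", 8, 4, True, 20.0),
    Resource("vo-gamma", "cluster", 64, 128, False, 50.0),
    Resource("vo-delta", "shared-memory", 16, 8, True, 120.0),
]


def find_vos(catalogue, min_processors=16, min_memory_gb=8, max_charge_eur=100.0):
    """Return the VOs offering exclusive access to a matching shared-memory system."""
    return sorted({
        r.vo
        for r in catalogue
        if r.architecture == "shared-memory"
        and r.exclusive_access
        and r.processors >= min_processors
        and r.memory_gb >= min_memory_gb
        and r.charge_eur <= max_charge_eur
    })


print(find_vos(CATALOGUE))
```

The sketch only covers the inquiry step; discovering which catalogues exist and keeping them current is the harder problem the talk is concerned with.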

11
Outline
  • Context and Motivation
  • Grid Information Services and Limitations
  • Semantic Grid and Ontologies
  • A Core Grid Ontology
  • Conclusions and Future Work

12
Grid Information Services
  • Established to help users answer questions on the
    status of individual resources and the Grid.
  • Support the discovery and ongoing monitoring of
    the existence and characteristics of resources,
    services, computations and other entities of
    value to the Grid.
  • Examples
  • Globus, EDG: Metacomputing Directory Service
    (MDS)
  • UNICORE: Gateway and Network Job Supervisor (NJS)
  • EGEE: Relational Grid Monitoring Architecture
    (R-GMA), GridICE
  • Condor: Matchmaker

13
MDS: Grid Info Services in Globus
[Figure: layered diagram. Users reach GIIS servers via GRIP (discovery, inquiry, retrieval); GIIS and GRIS servers are linked via GRRP; GRIS servers retrieve information in LDIF from the info. providers on the resources.]
14
Relational Grid Monitoring Architecture
[Figure: R-GMA components: Application, Consumer API, Consumer Servlet, Registry API, Registry Service, Producer API, Sensor Code.]
15
What information is out there?
  • Virtual Organizations
  • Resources
  • Policies
  • People
  • Resource Specifications
  • Descriptions, Types
  • Names
  • Capacity
  • Configuration
  • Resource status
  • Resource use.
  • Availability.
  • Monitoring data.
  • Summary Statistics
  • Logs.
  • Associations.
  • Statistics of use.
  • Software
  • Codes
  • Specs
  • Location
  • Data-sets
  • Data
  • Metadata
  • Replicas
  • Services
  • Interface
  • Metadata
  • Applications
  • Descriptions.
  • I/O requirements.
  • Meta-Data
  • Workflows

16
Resource Specification info. (examples)
Source | Information provided | Schema | System
Info. Provider (Unix sys-call) | Mds-computer-platform, Mds-Cpu-model, Mds-Host-hn | Hierarchical | MDS-Globus, LDAP
Info. Provider (Unix sys-call), static info. | GlueCEName, GlueHostName, GlueHostArchitecture, GlueHostProcessorClockSpeed, GlueSEAccessProtocolType, GlueCESEBindGroup, GlueHostFileLatency | Hierarchical | MDS-EDG, LDAP
Sensors (Unix sys-call) | StorageElementProtocol, NetworkTCPThroughput, NetworkRTT | Relational | RGMA-EDG, HTTP
17
Resource status information (examples)
Source | Information provided | Schema | System
Info. Provider (Unix sys-call) | Mds-Memory-Ram-freeMB, Mds-FS-Total-freeMB, cpuload5 | Hierarchical | MDS-Globus, LDAP
Info. Provider (Unix sys-call) | GlueCEStateRunningJobs, GlueCEJobLocalID, GlueHostProcessorLoadLast1Min | Hierarchical | MDS-EDG, LDAP
Sensors (Unix sys-call) | StorageElementStatus, NetworkUDPPacketLoss, NetworkFileTransferThroughput | Relational | RGMA-EDG, HTTP
Condor's sensor modules | DiskSpace, MemoryUsed, SystemLoad | ClassAds | Hawkeye-Condor
NWS probes, Traceroute | End-to-end bandwidth, end-to-end latency, end-to-end path | XML | GridLab's TopoMon, GMA arch.
18
VO information (examples)
Source | Information provided | Schema | System
Static info. | Cert (info. about local certificate policy), MdsHostContact | Hierarchical | MDS-Globus, LDAP
Static info. | GlueCEPolicyMaxWallClockTime, GlueCEPolicyMaxCPUTime, GlueSAPolicyMaxFileSize | Hierarchical | MDS-EDG, LDAP
19
Software and Dataset information (examples)
Source | Information provided | Schema | System
Info. Provider | Mds-Application-Group-config, Mds-Application-name, Mds-Application-location, Mds-Application-info | Hierarchical | MDS-Globus, LDAP
Info. Provider | GlueSLFileName, GlueSLFileSize, GlueSLFilePath | Hierarchical | MDS-EDG, LDAP
GDMP producer | ExportCatalogue | RGMA | Replica Catalogue Service, GDMP-EDG
20
Application Logging Information
Source | Information provided | Schema | System
TRIANA | Workflow information, metadata | XML | TRIANA - GridLab
Condor submission | DAGMan input file (DAG specification and metadata) | Condor-specific | Condor meta-scheduler
Workload Management System | BrokerInfo file | Hierarchical | Resource Broker (EDG), LDAP
LDAP queries to JSS, RB | Logging information; bookkeeping information (transient): UserID, JobID, Job State, JobDescription, etc. | Attribute-value | LB Server (EDG): events, exported API for queries
21
Limitations of Current Approaches
  • Remarks extracted from the description of a
    Grid-application development effort
  • Jobs typically need to access hundreds of files,
    and each site has a different subset of the
    files.
  • Our data system knows what portion of a user's
    data may be at each site, but does not know how
    to submit grid jobs.
  • Our job submission system required users to
    choose grid sites and gave them no assistance in
    choosing.
  • jobs requesting thousands of files and sites
    having hundreds of thousands of files are not
    uncommon in production.
  • it would not be scalable to explicitly publish
    all the properties of jobs and resources in ...

22
Limitations and Challenges
  • Scalability in the context of Millions of
    Resources
  • Infrastructure intrusiveness.
  • Resource Discovery, Retrieval and Classification.
  • Expressiveness of Data Models in terms of:
  • Types of captured information.
  • Expressing semantic relationships between
    represented entities.
  • Amenability to Indexing, Query Optimization.
  • Complexity
  • Different protocols for discovery, inquiry,
    registration, invocation.
  • Lack of interoperability between different
    platforms.
  • Information Standardization.
  • Missing Functionalities
  • Transient and Historical information.
  • Policies.
  • Complex Queries.

23
Revisiting the problem
  • Very large number of sources.
  • Independent.
  • No common schema.
  • Various, partly unknown semantics.
  • Subject to change, birth, or silence.

24
Revisiting the problem
  • A federated warehouse approach
  • Wrap the various sources to extract their
    information.
  • Store data in a warehouse.
  • Monitor sources and propagate updates to the
    warehouse.
  • Ask queries to the warehouse.
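
The four steps above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a design from the talk: the `Warehouse` class, the two wrapper functions and the common record layout (`host`, `free_ram_mb`) are hypothetical, and only the attribute names `Mds-Host-hn` and `Mds-Memory-Ram-freeMB` are taken from the MDS examples in the earlier tables.

```python
class Warehouse:
    """Toy federated warehouse: per-source wrappers feed one common store."""

    def __init__(self):
        self._wrappers = {}  # source name -> function turning raw data into a record
        self._records = {}   # (source name, host) -> latest record for that host

    def register(self, source, wrapper):
        """Wrap a source: remember how to translate its native format."""
        self._wrappers[source] = wrapper

    def propagate(self, source, raw):
        """Store new data or overwrite stale data when a monitored source changes."""
        record = self._wrappers[source](raw)
        self._records[(source, record["host"])] = record

    def query(self, predicate):
        """Ask a query against the warehouse only; sources are not contacted."""
        return [r for r in self._records.values() if predicate(r)]


def wrap_mds(entry):
    # Hierarchical, LDAP-style entry given as a dict of attribute strings.
    return {"host": entry["Mds-Host-hn"],
            "free_ram_mb": int(entry["Mds-Memory-Ram-freeMB"])}


def wrap_relational(row):
    # Hypothetical relational row: (hostname, free memory in MB).
    host, free_mb = row
    return {"host": host, "free_ram_mb": int(free_mb)}


wh = Warehouse()
wh.register("mds", wrap_mds)
wh.register("relational", wrap_relational)
wh.propagate("mds", {"Mds-Host-hn": "node1.example.org", "Mds-Memory-Ram-freeMB": "512"})
wh.propagate("relational", ("node2.example.org", 2048))
# A later update from the same source replaces the stale record.
wh.propagate("mds", {"Mds-Host-hn": "node1.example.org", "Mds-Memory-Ram-freeMB": "1536"})

hosts = sorted(r["host"] for r in wh.query(lambda r: r["free_ram_mb"] >= 1024))
print(hosts)
```

The wrappers are where the "no common schema" problem shows up: each new source needs its own translation into the warehouse's record layout.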

25
Requirements for Searching the Grid
  • Global/Common naming scheme for Grid entities.
  • Resolution mechanism for discovery and retrieval
    of entity-related information/meta-data.
  • Type and representation of retrieved
    entity-related information.
  • Mining and representation of relationships and
    summary data.
  • Complexity of queries and query interpretation.

26
Research Issues
  • Metadata Consolidation
  • Definition and local creation of metadata about
    Grid entities.
  • Information Source Discovery
  • Algorithms for Search and Discovery, Management
    of Updates.
  • Metadata Retrieval and Integration
  • Protocols for retrieval; data structures and
    algorithms for integration.
  • Management of meta-data
  • Analysis to build proper indexes; extrapolation
    of semantic relationships.
  • Query mechanisms and interface.
  • Query language definition. Intelligent-agent
    interface to help users formulate queries.

27
Outline
  • Context and Motivation
  • Limitations of Grid Information Services
  • Semantic Grid and Ontologies
  • A Core Grid Ontology
  • Conclusions and Future Work

28
Looking for answers: Semantic Grid
An extension of the current Grid in which
information and services are given well-defined
and explicitly represented meaning, so that it
can be shared and used by humans and machines,
better enabling them to work in cooperation.
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
29
Ontologies and the Semantic Grid
  • Ontologies are among the key building blocks of
    the Semantic Grid.
  • They define the concepts/terms of Grid entities,
    resources and capabilities, and the
    relationships between them.
  • We develop Grid ontologies to:
  • Merge the information from different sources
  • Build a knowledge base for Grid infrastructures
  • Construct a Grid information system
  • Support co-operation with semantically enabled
    Grid services, such as Resource Broker,
    Information Service, etc.

30
Ontologies in Computer Science
  • An ontology is an engineering artifact
  • It is constituted by a specific vocabulary used
    to describe a certain reality, plus
  • a set of explicit assumptions regarding the
    intended meaning of the vocabulary.
  • Almost always including how concepts should be
    classified
  • Thus, an ontology describes a formal
    specification of a certain domain
  • Shared understanding of a domain of interest
  • Formal and machine manipulable model of a domain
    of interest

Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
31
Languages
  • Work on Semantic Web has concentrated on the
    definition of a collection or stack of
    languages.
  • These languages are then used to support the
    representation and use of metadata.
  • The languages provide basic machinery that can be
    used to represent the extra semantic information
    needed for the Semantic Web
  • XML
  • RDF
  • RDF(S)
  • OWL

Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
32
W3C Stack
  • XML provides a surface syntax for structured
    documents
  • XML Schema is a language for restricting the
    structure of XML documents.
  • RDF is a data-model for objects ("resources") and
    relations between them; it provides a simple
    semantics for this data-model.
  • RDF Schema is a vocabulary for describing
    properties and classes of RDF resources, with
    semantics for generalization and hierarchies of
    such properties and classes.
  • OWL adds more vocabulary for describing
    properties and classes.

33
Outline
  • Context and Motivation
  • Limitations of Grid Information Services
  • Semantic Grid and Ontologies
  • A Core Grid Ontology
  • Conclusions and Future Work

34
Towards a general Ontology for Grids
  • Currently, there are several Grid architectures
    and Grid implementations.
  • Different views of Grid entities and their
    properties.
  • It is practically impossible that one ontology
    can include all aspects of Grids or of many types
    of Grid entities.
  • A Core Grid Ontology (CGO)
  • A core framework for representing a Grid.
  • Open and extensible for all kinds of Grid
    architectures and Grid implementations.

35
Building a Core Ontology
  • The most difficult task in developing an
    ontology:
  • Capturing the right model for the Grid
  • Our view of a Grid:
  • Users, Applications, Middleware/Services,
    Resources within VOs
  • A layer-structured model consisting of three
    layers:
  • Users/Applications
  • Middleware/Services
  • Resources

36
A Grid Model
37
CGO Classes Overview
38
Defining properties
Based on the Constraints of the CGO Classes.
39
Representing a Grid Entity
40
Representing a Grid Entity using OWL
    <owl:Class rdf:ID="ComputingElement">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:someValuesFrom>
            <owl:Class>
              <owl:unionOf rdf:parseType="Collection">
                <owl:Class rdf:about="Jobmanager"/>
                <owl:Class rdf:about="JobScheduler"/>
              </owl:unionOf>
            </owl:Class>
          </owl:someValuesFrom>
          <owl:onProperty rdf:resource="runningSevice"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>
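
A definition like the one above can be inspected with ordinary XML tooling. The sketch below uses Python's standard `xml.etree.ElementTree` on a cleaned-up, namespace-qualified reading of the slide's snippet. The enclosing `rdf:RDF` element, the namespace declarations and the closing `owl:Class` tag are assumptions added to make the fragment well-formed, and the helper name `some_values_from` is hypothetical. A real application would more likely use an RDF/OWL library than raw XML parsing.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed wrapper and namespaces around the class definition from the slide.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="ComputingElement">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:someValuesFrom>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <owl:Class rdf:about="Jobmanager"/>
              <owl:Class rdf:about="JobScheduler"/>
            </owl:unionOf>
          </owl:Class>
        </owl:someValuesFrom>
        <owl:onProperty rdf:resource="runningSevice"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def some_values_from(doc, class_id):
    """List (property, allowed filler classes) for each restriction on class_id."""
    root = ET.fromstring(doc)
    results = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        if cls.get(f"{{{RDF}}}ID") != class_id:
            continue
        for restriction in cls.findall(f"{{{RDFS}}}subClassOf/{{{OWL}}}Restriction"):
            prop = restriction.find(f"{{{OWL}}}onProperty").get(f"{{{RDF}}}resource")
            fillers = [
                c.get(f"{{{RDF}}}about")
                for c in restriction.findall(
                    f"{{{OWL}}}someValuesFrom/{{{OWL}}}Class/"
                    f"{{{OWL}}}unionOf/{{{OWL}}}Class"
                )
            ]
            results.append((prop, fillers))
    return results


print(some_values_from(DOC, "ComputingElement"))
```

Read this way, the restriction says that every ComputingElement has at least one value for the property that is a Jobmanager or a JobScheduler.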
41
Generating Instances
42
Conclusions
  • The CGO can be used as a common, extensible
    language for
  • Expressing the basic concepts of a Grid
    infrastructure and the relationships thereof.
  • Encoding and storing Grid metadata.
  • Integrating grid-related information extracted
    from different sources.
  • Expressing queries.

43
Next steps
  • Automate the knowledge-base construction and
    maintenance process
  • Information-source discovery
  • Metadata wrapping
  • Metadata integration
  • Consistency updates
  • Investigate mechanisms for efficient
    knowledge-base query implementation.

44
  • Thank you for your attention!
  • Questions?
  • Comments?

45
References
  • "A Core Grid Ontology for the Semantic Grid." Wei
    Xing, M. D. Dikaiakos, and R. Sakellariou. 6th
    IEEE International Symposium on Cluster Computing
    and the Grid (CCGrid 2006), Singapore, May 2006
    (to appear).
  • "Information Services for Large-scale Grids: A
    Case for a Grid Search Engine." M. D. Dikaiakos,
    R. Sakellariou, and Y. Ioannidis. In Engineering
    the Grid: Status and Perspectives, Jack Dongarra,
    Hans Zima, Adolfy Hoisie, Laurence Yang,
    Beniamino DiMartino (Editors), American
    Scientific Publishers, January 2006, ISBN
    1-58883-038-1.
  • "Building a Distributed Digital Library for
    Natural Disasters Metadata with Grid Services and
    RDF." W. Xing, M. D. Dikaiakos, Hua Yang, A.
    Sphyris, G. Eftychidis. Library Management
    Journal (Special Issue on Digital Libraries in
    the Knowledge Era: Knowledge Management and
    Semantic Web Technology), Vol. 26, No. 4-5, May
    2005.
  • "Search Engines for the Grid: A Research Agenda."
    M. D. Dikaiakos, Y. Ioannidis, R. Sakellariou. In
    Grid Computing: First European AcrossGrids
    Conference, Santiago de Compostela, Spain,
    February 2003, Revised Papers, Lecture Notes in
    Computer Science series, vol. 2970, pages 49-58,
    Springer, 2004.

46
The RDF Data Model
  • Statements are <subject, predicate, object>
    triples
  • <Sean, hasColleague, Ian>
  • Can be represented as a graph
  • Statements describe properties of resources
  • A resource is any object that can be pointed to
    by a URI
  • URIs: the generic set of all names/addresses that
    are short strings that refer to resources
  • a document, a picture, a paragraph on the Web,
    http://www.cs.man.ac.uk/index.html, a book in the
    library, a real person (?), isbn://0141184280
  • Properties themselves are also resources (URIs)

[Figure: graph with one edge labelled hasColleague from Sean to Ian.]
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
47
Linking Statements
  • The subject of one statement can be the object of
    another
  • Such collections of statements form a directed,
    labeled graph
  • The object of a triple can also be a literal (a
    string)

[Figure: graph of linked statements. Sean hasName "Sean K. Bechhofer"; Sean hasColleague Ian; Carole hasColleague Ian; Ian hasHomePage http://www.cs.man.ac.uk/horrocks.]
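
The statements about Sean, Ian and Carole used in these slides can be held as a plain set of triples and queried by pattern. This is a minimal Python sketch rather than an RDF library: the helper names `match` and `colleagues_home_pages` are hypothetical, and resources are abbreviated to short names instead of full URIs.

```python
# Each statement is a (subject, predicate, object) tuple.
TRIPLES = {
    ("Sean", "hasName", "Sean K. Bechhofer"),  # object is a literal
    ("Sean", "hasColleague", "Ian"),
    ("Carole", "hasColleague", "Ian"),
    ("Ian", "hasHomePage", "http://www.cs.man.ac.uk/horrocks"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def colleagues_home_pages(triples, person):
    """Follow two linked statements: person -> colleague -> home page."""
    pages = []
    for _, _, colleague in match(triples, s=person, p="hasColleague"):
        pages.extend(o for _, _, o in match(triples, s=colleague, p="hasHomePage"))
    return pages


print(match(TRIPLES, p="hasColleague", o="Ian"))
print(colleagues_home_pages(TRIPLES, "Sean"))
```

The second function works only because the object of one statement (Ian) is the subject of another, which is what makes the collection a graph rather than a flat table.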
48
RDF Syntax
  • RDF has an XML syntax that has a specific
    meaning
  • Every Description element describes a resource
  • Every attribute or nested element inside a
    Description is a property of that Resource
  • We can refer to resources by URIs

    <rdf:Description rdf:about="some.uri/person/sean_bechhofer">
      <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
      <o:hasName rdf:datatype="&xsd;string">Sean K. Bechhofer</o:hasName>
    </rdf:Description>
    <rdf:Description rdf:about="some.uri/person/ian_horrocks">
      <o:hasHomePage>http://www.cs.man.ac.uk/horrocks</o:hasHomePage>
    </rdf:Description>
    <rdf:Description rdf:about="some.uri/person/carole_goble">
      <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
    </rdf:Description>
49
What does RDF give us?
  • A mechanism for annotating data and resources.
  • Single (simple) data model.
  • Syntactic consistency between names (URIs).
  • Low level integration of data.

Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
50
RDF(S): RDF Schema
  • RDF gives a formalism for meta data annotation,
    and a way to write it down in XML, but it does
    not give any special meaning to vocabulary such
    as subClassOf or type (supporting OO-style
    modelling)
  • Interpretation is an arbitrary binary relation
  • RDF Schema extends RDF with a schema vocabulary
    that allows you to define basic vocabulary terms
    and the relations between those terms
  • Class, type, subClassOf
  • Property, subPropertyOf, range, domain
  • It gives extra meaning to particular RDF
    predicates and resources
  • This extra meaning, or semantics, specifies how
    a term should be interpreted

Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
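
To make the "extra meaning" of subClassOf concrete, the sketch below derives the types an instance belongs to by following subClassOf links transitively. It is a small Python illustration, not an RDFS reasoner: the class hierarchy, the instance name `ce01` and the function names are all hypothetical.

```python
# Hypothetical schema: (subclass, superclass) pairs.
SUBCLASS_OF = {
    ("ComputingElement", "Resource"),
    ("StorageElement", "Resource"),
    ("Resource", "GridEntity"),
}

# Hypothetical data: (instance, class) pairs stated explicitly.
TYPE = {("ce01", "ComputingElement")}


def superclasses(cls, subclass_of):
    """All classes reachable from cls by following subClassOf links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def inferred_types(instance, types, subclass_of):
    """Stated types plus those implied by the class hierarchy."""
    direct = {c for i, c in types if i == instance}
    result = set(direct)
    for c in direct:
        result |= superclasses(c, subclass_of)
    return result


print(sorted(inferred_types("ce01", TYPE, SUBCLASS_OF)))
```

In plain RDF the three pairs in `SUBCLASS_OF` would be uninterpreted binary relations; it is the schema semantics that licenses concluding `ce01` is also a `Resource` and a `GridEntity`.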
51
Problems with RDFS
  • RDFS is too weak to describe resources in
    sufficient detail
  • No localised range and domain constraints
  • Can't say that the range of hasChild is person
    when applied to persons and elephant when applied
    to elephants
  • No existence/cardinality constraints
  • Can't say that all instances of person have a
    mother that is also a person, or that persons
    have exactly 2 parents
  • No transitive, inverse or symmetrical properties
  • Can't say that isPartOf is a transitive property,
    that hasPart is the inverse of isPartOf or that
    touches is symmetrical
  • It can be difficult to provide reasoning support
  • No native reasoners for non-standard semantics
  • May be possible to reason via FO axiomatisation

Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
52
Web Ontology Language Requirements
  • Desirable features identified for Web Ontology
    Language
  • Extends existing Web standards
  • Such as XML, RDF, RDFS
  • Easy to understand and use
  • Should be based on familiar KR idioms (e.g.
    OO-style, frames etc).
  • Formally specified
  • Of adequate expressive power
  • Possible to provide automated reasoning support

53
OWL
  • W3C Recommendation (February 2004)
  • Well defined RDF/XML serializations
  • A family of Languages
  • OWL Full
  • OWL DL
  • OWL Lite
  • Formal semantics
  • First Order (DL/Lite)
  • Relationship with RDF
  • Comprehensive test cases for tools/implementations
  • Growing industrial takeup.

54
OWL Basics
  • Set of constructors for concept expressions
  • Booleans: and/or/not
  • Quantification: some/all
  • Axioms for expressing constraints
  • Necessary and Sufficient conditions on classes
  • Disjointness
  • Property characteristics: transitivity, inverse
  • Facts
  • Assertions about individuals
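
The property characteristics listed above can be illustrated with the isPartOf / hasPart example from the earlier RDFS slide. The Python sketch below shows what declaring isPartOf transitive and hasPart its inverse would let a reasoner conclude. It is a naive fixed-point computation for illustration, not how OWL reasoners are implemented, and the part names are invented.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def inverse(pairs):
    """Swap subject and object of every pair."""
    return {(b, a) for a, b in pairs}


# Invented facts: only direct containment is stated.
IS_PART_OF = {
    ("cpu0", "node1"),
    ("node1", "cluster-a"),
    ("cluster-a", "site-x"),
}

is_part_of = transitive_closure(IS_PART_OF)  # isPartOf declared transitive
has_part = inverse(is_part_of)               # hasPart declared inverse of isPartOf

print(("cpu0", "site-x") in is_part_of)
print(("site-x", "cpu0") in has_part)
```

Three stated facts yield six isPartOf pairs and six hasPart pairs, none of which RDFS alone would entail.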

55
Metacomputing Directory Service (MDS)
  • Distributed directory approach: a collection of
    LDAP servers.
  • Simple LDAP Information Schemas describe resource
    information.
  • Servers:
  • Grid Resource Information Server (GRIS): runs
    on each resource and supplies information about
    it. Supports multiple resources as well.
  • Grid Index Information Server (GIIS): collects
    information from multiple GRIS servers. Supports
    particular queries for information spread across
    multiple GRIS servers.
  • Protocols (LDAP-based) for:
  • Discovery and Inquiry (GRIP).
  • Soft-state Registration (GRRP).
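
Soft-state registration means that an index keeps a registration only while it is being refreshed; entries that are not renewed simply expire. The Python sketch below illustrates that idea with explicit timestamps. The `IndexServer` class, its method names and the 30-second lifetime are assumptions made for illustration, and nothing here reproduces the actual GRRP message format or the Globus implementation.

```python
class IndexServer:
    """Toy stand-in for an index server that tracks registered resource servers."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._expiry = {}  # registrant name -> time at which its registration lapses

    def register(self, name, now):
        """Create or refresh a registration; it stays valid for ttl seconds."""
        self._expiry[name] = now + self.ttl

    def live(self, now):
        """Drop lapsed registrations and return the names still registered."""
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


giis = IndexServer(ttl_seconds=30.0)
giis.register("gris-a", now=0.0)
giis.register("gris-b", now=0.0)
giis.register("gris-a", now=20.0)  # gris-a refreshes; gris-b goes silent

print(giis.live(now=10.0))
print(giis.live(now=40.0))
print(giis.live(now=60.0))
```

No explicit deregistration is needed: a resource that fails or leaves the Grid disappears from the index once its last registration times out.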