Title: Searching within Large Grid Infrastructures
1Searching within Large Grid Infrastructures
- Marios D. Dikaiakos
- University of Cyprus CoreGRID
2Acknowledgements
- Wei Xing, University of Cyprus
- Rizos Sakellariou, U. Manchester, UK
- Yannis Ioannidis, U. Athens, GR
- Salvatore Orlando, ISTI-CNR, IT
- Domenico Laforenza, ISTI-CNR, IT
3Outline
- Context and Motivation
- Limitations of Grid Information Services
- Semantic Grid and Ontologies
- A Core Grid Ontology
- Conclusions and Future Work
4The Grid
- A wide-scale, distributed computing
infrastructure to support resource sharing and
coordinated problem solving in dynamic,
multi-institutional Virtual Organizations.
- Computational Grid: provides the raw computing
power, high-speed bandwidth interconnection and
associated data storage.
- Data and Information Grid: allows easily
accessible connections to major sources of
information and tools for its analysis and
visualisation.
- Knowledge and Semantic Grid: gives added value to
the information; provides intelligent guidance
for decision-makers; facilitates the generation,
diffusion and support of knowledge.
5Near-future Scenarios for the Grid
6Near-future Scenarios for the Grid
- The Grid as a Wide-Scale Distributed System
- Millions of resources of different kinds.
- Services and Policies in place.
- Relationships (permanent and transient) between
organizations, software, data, services,
applications.
- Different middleware platforms.
- Common (?) protocols, standards and APIs.
- The hope is that the Grid will grow larger and
become as widely accepted as the Web.
8Problem Statement: Searching the Grid
- How are individuals and organizations going to
harness the capabilities of a fully deployed
Grid, with a massive and ever-expanding base of
computing and storage nodes, network resources,
and a huge corpus of available programs,
services, and data?
- To this end, users need to identify resources
that are:
- Interesting (discovery)
- Relevant (classification)
- Accessible and available under known policies of
use and cost (inquiry)
- Emphasis on summary information, in terms of
granularity and timing.
9Searching the Grid
- Software and Data-sets
- Policies
- Relationships
- Best-practices
- Computing, Storage, Network Resources
10Examples of search queries
- Hardware resources on the Grid, their attributes,
and applicable policies of their use:
- Find a VO providing exclusive access to a
shared-memory multiprocessor system with at
least 16 processors, 8 GB of main memory, and
a usage charge of not more than 100 euros per CPU
time.
- Application services, software, and data-sets:
- Find services running Quantum Chromo-Dynamics
(QCD) calculations using F90 and MPI.
- Hardware-software combinations, Grid usage and
best-practices:
- Find the pricing and prior clientele of Grid
services that provide access to the XYZ
workflow for high-performance oil refinery
simulations.
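The first query above can be read as a conjunction of attribute constraints over resource descriptions. The following is a minimal sketch, assuming a hypothetical in-memory catalogue: the record fields, the VO names and the sample values are invented for illustration and do not correspond to any Grid information schema, and the unit of the usage charge is left unspecified, as in the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    """Hypothetical description of one hardware resource offered by a VO."""
    vo: str
    architecture: str    # e.g. "shared-memory" or "cluster"
    processors: int
    memory_gb: int
    exclusive: bool      # True if the VO grants exclusive access
    charge_euros: float  # usage charge per unit of CPU time (unit assumed)


# Invented sample catalogue; a real one would be harvested from information services.
CATALOGUE = [
    ResourceRecord("vo-alpha", "shared-memory", 32, 16, True, 80.0),
    ResourceRecord("vo-beta", "shared-memory", 8, 4, True, 20.0),
    ResourceRecord("vo-gamma", "cluster", 128, 256, False, 50.0),
    ResourceRecord("vo-delta", "shared-memory", 16, 8, True, 120.0),
]


def find_vos(catalogue, min_processors=16, min_memory_gb=8, max_charge=100.0):
    """Return the VOs offering exclusive shared-memory systems within the limits."""
    return sorted({
        r.vo for r in catalogue
        if r.architecture == "shared-memory"
        and r.exclusive
        and r.processors >= min_processors
        and r.memory_gb >= min_memory_gb
        and r.charge_euros <= max_charge
    })


matching_vos = find_vos(CATALOGUE)
```

The harder part of the problem, which this sketch sidesteps, is obtaining such uniform records from independent sources in the first place.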
11Outline
- Context and Motivation
- Grid Information Services and Limitations
- Semantic Grid and Ontologies
- A Core Grid Ontology
- Conclusions and Future Work
12Grid Information Services
- Established to help users answer questions on the
status of individual resources and the Grid.
- Support the discovery and ongoing monitoring of
the existence and characteristics of resources,
services, computations and other entities of
value to the Grid.
- Examples:
- GLOBUS, EDG: Metacomputing Directory Service
(MDS)
- UNICORE: Gateway and Network Job Supervisor (NJS)
- EGEE: Relational Grid Monitoring Architecture
(R-GMA), GridICE
- Condor: Matchmaker
13MDS Grid Info Services in Globus
[Figure: MDS architecture diagram. Labels: Users; GIIS servers; GRIS servers; GRIP (discovery / inquiry / retrieval); GRRP; LDIF information retrieval; information providers; resources.]
14Relational Grid Monitoring Architecture
[Figure: R-GMA architecture diagram. Labels: Application; Consumer API; Consumer Servlet; Registry API; Registry Service; Producer API; Sensor Code.]
15What information is out there?
- Virtual Organizations
- Resources
- Policies
- People
- Resource Specifications
- Descriptions Types
- Names
- Capacity
- Configuration
- Resource status
- Resource use.
- Availability.
- Monitoring data.
- Summary Statistics
- Logs.
- Associations.
- Statistics of use.
- Software
- Codes
- Specs
- Location
- Data-sets
- Data
- Metadata
- Replicas
- Services
- Interface
- Metadata
- Applications
- Descriptions.
- I/O requirements.
- Meta-Data
- Workflows
16Resource Specification info. (examples)
Source | Information provided | Schema | System
Info. Provider (Unix sys-call) | Mds-computer-platform, Mds-Cpu-model, Mds-Host-hn | Hierarchical | MDS-Globus, LDAP
Info. Provider (Unix sys-call), static info. | GlueCEName, GlueHostName, GlueHostArchitecture, GlueHostProcessorClockSpeed, GlueSEAccessProtocolType, GlueCESEBindGroup, GlueHostFileLatency | Hierarchical | MDS-EDG, LDAP
Sensors (Unix sys-call) | StorageElementProtocol, NetworkTCPThroughput, NetworkRTT | Relational | RGMA-EDG, HTTP
17Resource status information (examples)
Source | Information provided | Schema | System
Info. Provider (Unix sys-call) | Mds-Memory-Ram-freeMB, Mds-FS-Total-freeMB, cpuload5 | Hierarchical | MDS-Globus, LDAP
Info. Provider (Unix sys-call) | GlueCEStateRunningJobs, GlueCEJobLocalID, GlueHostProcessorLoadLast1Min | Hierarchical | MDS-EDG, LDAP
Sensors (Unix sys-call) | StorageElementStatus, NetworkUDPPacketLoss, NetworkFileTransferThroughput | Relational | RGMA-EDG, HTTP
Condor's sensor modules | DiskSpace, MemoryUsed, SystemLoad | ClassAds | Hawkeye, Condor
NWS probes, Traceroute | End-to-end bandwidth, end-to-end latency, end-to-end path | XML | GridLab's TopoMon, GMA arch.
18VO information (examples)
Source | Information provided | Schema | System
Static info. | Cert (info. about local certificate policy), MdsHostContact | Hierarchical | MDS-Globus, LDAP
Static info. | GlueCEPolicyMaxWallClockTime, GlueCEPolicyMaxCPUTime, GlueSAPolicyMaxFileSize | Hierarchical | MDS-EDG, LDAP
19Software Dataset information (examples)
Source | Information provided | Schema | System
Info. Provider | Mds-Application-Group-config, Mds-Application-name, Mds-Application-location, Mds-Application-info | Hierarchical | MDS-Globus, LDAP
Info. Provider | GlueSLFileName, GlueSLFileSize, GlueSLFilePath | Hierarchical | MDS-EDG, LDAP
GDMP producer | ExportCatalogue | RGMA | Replica Catalogue Service, GDMP-EDG
20Application Logging Information
Source | Information provided | Schema | System
TRIANA | Workflow information, metadata | XML | TRIANA - GridLab
Condor submission | DAGMan input file (DAG specification and metadata) | Condor-specific | Condor meta-scheduler
Workload Management System | BrokerInfo file | Hierarchical | Resource Broker (EDG), LDAP
LDAP queries to JSS, RB | Logging information; bookkeeping information (transient): UserID, JobID, Job State, JobDescription, etc. | Attribute-value | LB Server (EDG): events, exported API for queries
21Limitations of Current Approaches
- Remarks extracted from the description of a
Grid-application development effort:
- Jobs typically need to access hundreds of files,
and each site has a different subset of the
files.
- Our data system knows what portion of a user's
data may be at each site, but does not know how
to submit grid jobs.
- Our job submission system required users to
choose grid sites and gave them no assistance in
choosing.
- jobs requesting thousands of files and sites
having hundreds of thousands of files are not
uncommon in production.
- it would not be scalable to explicitly publish
all the properties of jobs and resources in ...
22Limitations and Challenges
- Scalability in the context of Millions of
Resources
- Infrastructure intrusiveness.
- Resource Discovery, Retrieval and Classification.
- Expressiveness of Data Models in terms of
- Types of captured information.
- Expressing semantic relationships between
represented entities.
- Amenability to Indexing, Query Optimization.
- Complexity
- Different protocols for discovery, inquiry,
registration, invocation.
- Lack of interoperability between different
platforms.
- Information Standardization.
- Missing Functionalities
- Transient and Historical information.
- Policies.
- Complex Queries.
23Revisiting the problem
- Very large number of sources.
- Independent.
- No common schema.
- Various, partly unknown semantics.
- Subject to change, birth, or silence.
24Revisiting the problem
- A federated warehouse approach
- Wrap the various sources to extract their
information.
- Store data in a warehouse.
- Monitor sources and propagate updates to the
warehouse.
- Pose queries to the warehouse.
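The four steps of the warehouse approach can be sketched as follows. This is a toy illustration in Python, not a proposed design: the `Warehouse` class, the common record fields (`host`, `free_mb`), the host names and the values are all assumptions made for the example. Only the attribute name `Mds-Memory-Ram-freeMB` and `Mds-Host-hn` are taken from the MDS examples earlier in the talk.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class Warehouse:
    """Toy warehouse: wraps sources, stores their records, answers queries."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, Callable[[], List[Record]]] = {}
        self._store: Dict[str, List[Record]] = {}

    def register(self, name: str, wrapper: Callable[[], List[Record]]) -> None:
        """Wrap a source: the callable converts its native data to common records."""
        self._wrappers[name] = wrapper
        self.refresh(name)

    def refresh(self, name: str) -> None:
        """Re-extract one source and overwrite its slice of the warehouse."""
        self._store[name] = self._wrappers[name]()

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        """Answer a query from stored data only; sources are not contacted."""
        return [rec for recs in self._store.values() for rec in recs if predicate(rec)]


# Two invented sources with different native formats.
mds_entries = [{"Mds-Host-hn": "node1.example.org", "Mds-Memory-Ram-freeMB": "512"}]
rgma_rows = [("node2.example.org", 2048)]  # (host, free memory in MB)


def wrap_mds() -> List[Record]:
    return [{"host": e["Mds-Host-hn"], "free_mb": int(e["Mds-Memory-Ram-freeMB"])}
            for e in mds_entries]


def wrap_rgma() -> List[Record]:
    return [{"host": host, "free_mb": free} for host, free in rgma_rows]


warehouse = Warehouse()
warehouse.register("mds", wrap_mds)
warehouse.register("rgma", wrap_rgma)
before = [r["host"] for r in warehouse.query(lambda r: r["free_mb"] >= 1024)]

# A source changes; the monitoring step propagates the update to the warehouse.
mds_entries[0]["Mds-Memory-Ram-freeMB"] = "4096"
warehouse.refresh("mds")
after = [r["host"] for r in warehouse.query(lambda r: r["free_mb"] >= 1024)]
```

Here the refresh is triggered by hand; how to detect changes across a very large number of independent sources is exactly the open issue the slide raises.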
25Requirements for Searching the Grid
- Global/Common naming scheme for Grid entities.
- Resolution mechanism for discovery and retrieval
of entity-related information/meta-data.
- Type and representation of retrieved
entity-related information.
- Mining and representation of relationships and
summary data.
- Complexity of queries and query interpretation.
26Research Issues
- Metadata Consolidation
- Definition and local creation of metadata about
Grid entities.
- Information Source Discovery
- Algorithms for Search and Discovery, Management
of Updates.
- Metadata Retrieval and Integration
- Protocols for retrieval; data structures and
algorithms for integration.
- Management of meta-data
- Analysis to build proper indexes; extrapolation
of semantic relationships.
- Query mechanisms and interface.
- Query language definition. Intelligent-agent
interface to help users formulate queries.
27Outline
- Context and Motivation
- Limitations of Grid Information Services
- Semantic Grid and Ontologies
- A Core Grid Ontology
- Conclusions and Future Work
28Looking for answers: Semantic Grid
An extension of the current Grid in which
information and services are given well-defined
and explicitly represented meaning, so that it
can be shared and used by humans and machines,
better enabling them to work in cooperation.
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
29Ontologies and the Semantic Grid
- Ontologies are among the key building blocks of
the Semantic Grid.
- They capture the concepts/terms of Grid entities,
resources, capabilities and the relationships
between them.
- We develop Grid ontologies to
- Merge the information from different sources
- Build a knowledge base for Grid infrastructures
- Construct a Grid information system
- Support co-operation with semantics-enabled Grid
services, such as Resource Broker, Information
Service, etc.
30Ontologies in Computer Science
- An ontology is an engineering artifact
- It is constituted by a specific vocabulary used
to describe a certain reality, plus
- a set of explicit assumptions regarding the
intended meaning of the vocabulary.
- Almost always including how concepts should be
classified
- Thus, an ontology describes a formal
specification of a certain domain
- Shared understanding of a domain of interest
- Formal and machine manipulable model of a domain
of interest
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
31Languages
- Work on the Semantic Web has concentrated on the
definition of a collection or stack of
languages.
- These languages are then used to support the
representation and use of metadata.
- The languages provide basic machinery that can be
used to represent the extra semantic information
needed for the Semantic Web
- XML
- RDF
- RDF(S)
- OWL
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
32W3C Stack
- XML provides a surface syntax for structured
documents.
- XML Schema is a language for restricting the
structure of XML documents.
- RDF is a data-model for objects ("resources") and
relations between them; it provides simple
semantics for this data-model.
- RDF Schema is a vocabulary for describing
properties and classes of RDF resources, with
semantics for generalization and hierarchies of
such properties and classes.
- OWL adds more vocabulary for describing
properties and classes.
33Outline
- Context and Motivation
- Limitations of Grid Information Services
- Semantic Grid and Ontologies
- A Core Grid Ontology
- Conclusions and Future Work
34Towards a general Ontology for Grids
- Currently, there are several Grid architectures
and Grid implementations.
- Different views of Grid entities and their
properties.
- It is practically impossible for one ontology
to cover all aspects of Grids or all types
of Grid entities.
- A Core Grid Ontology (CGO)
- A core framework for representing a Grid.
- Open and extensible for all kinds of Grid
architectures and Grid implementations.
35Building a Core Ontology
- The most difficult task in developing an
ontology:
- Capturing the right model for the Grid
- Our view of a Grid
- Users, Applications, Middleware/Services,
Resources within VOs
- A layer-structured model consisting of three
layers:
- Users/Applications
- Middleware/services
- Resources.
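As a rough illustration of the three-layer view, the sketch below models one VO as a plain Python data structure. It is only a reading aid under stated assumptions: the class name `VirtualOrganization`, its fields and the sample entities are invented here and are not the classes of the Core Grid Ontology.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VirtualOrganization:
    """Hypothetical container for the three layers of entities within one VO."""
    name: str
    users_applications: List[str] = field(default_factory=list)
    middleware_services: List[str] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)

    def layers(self) -> List[Tuple[str, List[str]]]:
        """Return the layers top-down as (layer name, entities) pairs."""
        return [
            ("Users/Applications", self.users_applications),
            ("Middleware/Services", self.middleware_services),
            ("Resources", self.resources),
        ]


# Invented sample VO.
vo = VirtualOrganization(
    name="example-vo",
    users_applications=["alice", "qcd-simulation"],
    middleware_services=["resource-broker", "information-service"],
    resources=["ce01", "se01"],
)
layer_names = [layer for layer, _ in vo.layers()]
```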
36A Grid Model
37CGO Classes Overview
38Defining properties
Based on the Constraints of the CGO Classes.
39Representing a Grid Entity
40Representing a Grid Entity using OWL
<owl:Class rdf:ID="ComputingElement">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:someValuesFrom>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Jobmanager"/>
            <owl:Class rdf:about="#JobScheduler"/>
          </owl:unionOf>
        </owl:Class>
      </owl:someValuesFrom>
      <owl:onProperty rdf:resource="#runningService"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
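Read informally, the restriction above states a necessary condition: every ComputingElement runs at least one service that is a Jobmanager or a JobScheduler. The sketch below checks that condition over plain Python data. It is an assumption-laden illustration: the individuals (`ce01`, `pbs-1`, and so on), the `FileTransfer` class and the dictionary layout are invented, and the check is closed-world, whereas an OWL reasoner works under the open-world assumption and would not treat missing data as a violation.

```python
# Classes named in the union of the restriction.
ALLOWED_SERVICE_CLASSES = {"Jobmanager", "JobScheduler"}


def satisfies_restriction(element, class_of):
    """True if some service run by `element` belongs to an allowed class.

    Mirrors an existential (someValuesFrom) restriction over a union of
    classes: a single qualifying value is enough.
    """
    return any(
        class_of.get(service) in ALLOWED_SERVICE_CLASSES
        for service in element["running_services"]
    )


# Invented individuals: service name -> class name.
class_of = {"pbs-1": "JobScheduler", "gridftp-1": "FileTransfer"}

ce_ok = {"name": "ce01", "running_services": ["gridftp-1", "pbs-1"]}
ce_bad = {"name": "ce02", "running_services": ["gridftp-1"]}

ok_result = satisfies_restriction(ce_ok, class_of)
bad_result = satisfies_restriction(ce_bad, class_of)
```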
41Generating Instances
42Conclusions
- The CGO can be used as a common, extensible
language for
- Expressing the basic concepts of a Grid
infrastructure and the relationships thereof.
- Encoding and storing Grid metadata.
- Integrating grid-related information extracted
from different sources.
- Expressing queries.
43Next steps
- Automate the knowledge-base construction and
maintenance process
- Information-source discovery
- Metadata wrapping
- Metadata integration
- Consistency updates
- Investigate mechanisms for efficient
knowledge-base query implementation.
44- Thank you for your attention!
- Questions?
- Comments?
45References
- "A Core Grid Ontology for the Semantic Grid." Wei
Xing, M. D. Dikaiakos, and R. Sakellariou. 6th
IEEE International Symposium on Cluster Computing
and the Grid (CCGrid 2006), Singapore, May 2006
(to appear).
- "Information Services for Large-scale Grids: A
Case for a Grid Search Engine." M. D. Dikaiakos,
R. Sakellariou, and Y. Ioannidis. In Engineering
the Grid: Status and Perspectives, Jack Dongarra,
Hans Zima, Adolfy Hoisie, Laurence Yang,
Beniamino DiMartino (Editors), American
Scientific Publishers, January 2006, ISBN
1-58883-038-1.
- "Building a Distributed Digital Library for
Natural Disasters Metadata with Grid Services and
RDF." W. Xing, M. D. Dikaiakos, Hua Yang, A.
Sphyris, G. Eftychidis. Library Management
Journal (Special Issue on Digital Libraries in
the Knowledge Era: Knowledge Management and
Semantic Web Technology). Vol. 26, No. 4-5, May
2005.
- "Search Engines for the Grid: A Research Agenda."
M. D. Dikaiakos, Y. Ioannidis, R. Sakellariou. In
Grid Computing. First European AcrossGrids
Conference, Santiago de Compostela, Spain,
February 2003, Revised Papers, Lecture Notes in
Computer Science series, vol. 2970, pages 49-58,
Springer, 2004.
46The RDF Data Model
- Statements are <subject, predicate, object>
triples
- <Sean, hasColleague, Ian>
- Can be represented as a graph
- Statements describe properties of resources
- A resource is any object that can be pointed to
by a URI
- The generic set of all names/addresses that are
short strings that refer to resources
- a document, a picture, a paragraph on the Web,
http://www.cs.man.ac.uk/index.html, a book in the
library, a real person (?), isbn://0141184280
- Properties themselves are also resources (URIs)
[Figure: graph with an edge labeled hasColleague from node Sean to node Ian.]
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
47Linking Statements
- The subject of one statement can be the object of
another
- Such collections of statements form a directed,
labeled graph
- The object of a triple can also be a literal (a
string)
[Figure: RDF graph. Nodes: Sean, Ian, Carole, the literal "Sean K. Bechhofer", and http://www.cs.man.ac.uk/horrocks. Edge labels: hasName, hasColleague (twice), hasHomePage.]
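A directed, labeled graph of this kind can be held as a list of triples and traversed by chaining statements. The sketch below is a minimal Python illustration using the names from the slides; plain strings stand in for URIs, which is a simplification, and the helper functions are invented for the example.

```python
from collections import defaultdict

# Statements from the example; plain strings stand in for URIs.
TRIPLES = [
    ("Sean", "hasColleague", "Ian"),
    ("Sean", "hasName", "Sean K. Bechhofer"),  # the object is a literal
    ("Ian", "hasHomePage", "http://www.cs.man.ac.uk/horrocks"),
    ("Carole", "hasColleague", "Ian"),
]


def build_graph(triples):
    """Index triples as subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """All objects reachable from `subject` over edges labeled `predicate`."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


def colleague_home_pages(graph, person):
    """Chain two statements: the object of hasColleague is the subject of hasHomePage."""
    return [
        page
        for colleague in objects(graph, person, "hasColleague")
        for page in objects(graph, colleague, "hasHomePage")
    ]


graph = build_graph(TRIPLES)
pages_for_sean = colleague_home_pages(graph, "Sean")
```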
48RDF Syntax
- RDF has an XML syntax that has a specific
meaning
- Every Description element describes a resource
- Every attribute or nested element inside a
Description is a property of that Resource
- We can refer to resources by URIs
<rdf:Description rdf:about="some.uri/person/sean_bechhofer">
  <o:hasColleague resource="some.uri/person/ian_horrocks"/>
  <o:hasName rdf:datatype="&xsd;string">Sean K. Bechhofer</o:hasName>
</rdf:Description>
<rdf:Description rdf:about="some.uri/person/ian_horrocks">
  <o:hasHomePage>http://www.cs.man.ac.uk/horrocks</o:hasHomePage>
</rdf:Description>
<rdf:Description rdf:about="some.uri/person/carole_goble">
  <o:hasColleague resource="some.uri/person/ian_horrocks"/>
</rdf:Description>
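To show how Description elements map back to triples, the sketch below parses a small RDF/XML document with Python's standard `xml.etree.ElementTree`. Several things here are assumptions: the enclosing `rdf:RDF` element, the namespace URI bound to the `o` prefix, the use of `rdf:resource` for resource-valued properties, and the omission of the datatype attribute. It handles only flat Description blocks and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The "o" namespace URI is a placeholder chosen for this example.
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:o="http://example.org/ontology#">
  <rdf:Description rdf:about="some.uri/person/sean_bechhofer">
    <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
    <o:hasName>Sean K. Bechhofer</o:hasName>
  </rdf:Description>
  <rdf:Description rdf:about="some.uri/person/carole_goble">
    <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from flat rdf:Description blocks."""
    triples = []
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            predicate = prop.tag.split("}", 1)[1]  # keep the local name only
            target = prop.get(f"{{{RDF_NS}}}resource")
            # A resource-valued property carries rdf:resource; otherwise use the text literal.
            obj = target if target is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


triples = extract_triples(DOC)
```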
49What does RDF give us?
- A mechanism for annotating data and resources.
- Single (simple) data model.
- Syntactic consistency between names (URIs).
- Low level integration of data.
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
50RDF(S): RDF Schema
- RDF gives a formalism for metadata annotation,
and a way to write it down in XML, but it does
not give any special meaning to vocabulary such
as subClassOf or type (supporting OO-style
modelling)
- Interpretation is an arbitrary binary relation
- RDF Schema extends RDF with a schema vocabulary
that allows you to define basic vocabulary terms
and the relations between those terms
- Class, type, subClassOf,
- Property, subPropertyOf, range, domain
- it gives extra meaning to particular RDF
predicates and resources
- this extra meaning, or semantics, specifies how
a term should be interpreted
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
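The "extra meaning" RDF Schema attaches to subClassOf and type can be illustrated with two inference rules: subClassOf is transitive, and an instance of a class is also an instance of its superclasses. The sketch below applies only these two rules over plain tuples; it is a simplified illustration, not a full RDFS entailment procedure. `ComputingElement` appears earlier in the talk, while `Resource`, `GridEntity` and the individual `ce01` are invented for the example.

```python
def rdfs_closure(triples):
    """Apply two RDFS-style rules until no new triples appear.

    Rule 1: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type A),       (A subClassOf B)  =>  (x type B)
    """
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p not in ("subClassOf", "type"):
                continue
            for s2, p2, o2 in inferred:
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, p, o2))
        if new <= inferred:  # fixed point reached
            return inferred
        inferred |= new


# Invented class hierarchy and one individual.
FACTS = [
    ("ComputingElement", "subClassOf", "Resource"),
    ("Resource", "subClassOf", "GridEntity"),
    ("ce01", "type", "ComputingElement"),
]

closure = rdfs_closure(FACTS)
```

Under plain RDF, with subClassOf treated as an arbitrary binary relation, none of the additional triples would follow.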
51Problems with RDFS
- RDFS is too weak to describe resources in
sufficient detail
- No localised range and domain constraints
- Can't say that the range of hasChild is person
when applied to persons and elephant when applied
to elephants
- No existence/cardinality constraints
- Can't say that all instances of person have a
mother that is also a person, or that persons
have exactly 2 parents
- No transitive, inverse or symmetrical properties
- Can't say that isPartOf is a transitive property,
that hasPart is the inverse of isPartOf or that
touches is symmetrical
- It can be difficult to provide reasoning support
- No native reasoners for non-standard semantics
- May be possible to reason via first-order (FO)
axiomatisation
Source: Goble, Bechhofer, DeRoure, Semantic Grid
101, GGF16, Athens, 2/2005
52Web Ontology Language Requirements
- Desirable features identified for Web Ontology
Language
- Extends existing Web standards
- Such as XML, RDF, RDFS
- Easy to understand and use
- Should be based on familiar KR idioms (e.g.
OO-style, frames etc).
- Formally specified
- Of adequate expressive power
- Possible to provide automated reasoning support
53OWL
- W3C Recommendation (February 2004)
- Well defined RDF/XML serializations
- A family of Languages
- OWL Full
- OWL DL
- OWL Lite
- Formal semantics
- First Order (DL/Lite)
- Relationship with RDF
- Comprehensive test cases for tools/implementations
- Growing industrial takeup.
54OWL Basics
- Set of constructors for concept expressions
- Booleans: and/or/not
- Quantification: some/all
- Axioms for expressing constraints
- Necessary and Sufficient conditions on classes
- Disjointness
- Property characteristics: transitivity, inverse
- Facts
- Assertions about individuals
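Property characteristics let a reasoner derive facts that were never asserted. The sketch below illustrates this for the isPartOf / hasPart example mentioned earlier in the talk, treating isPartOf as transitive and hasPart as its inverse. It is a hand-rolled fixed-point computation over tuples, offered as an illustration rather than as how an OWL reasoner is implemented; the individuals `cpu0`, `node1` and `cluster1` are invented.

```python
def apply_characteristics(facts, transitive=(), inverse_of=None):
    """Close a set of (subject, property, object) facts under declared
    transitivity and inverse-property characteristics."""
    inverses = dict(inverse_of or {})
    # Make the inverse mapping usable in both directions.
    inverses.update({v: k for k, v in list(inverses.items())})
    closed = set(facts)
    while True:
        new = set()
        for s, p, o in closed:
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in transitive:
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= closed:  # nothing further can be derived
            return closed
        closed |= new


# Invented assertions about individuals.
FACTS = [
    ("cpu0", "isPartOf", "node1"),
    ("node1", "isPartOf", "cluster1"),
]

derived = apply_characteristics(
    FACTS,
    transitive={"isPartOf"},
    inverse_of={"hasPart": "isPartOf"},
)
```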
55Metacomputing Directory Service (MDS)
- Distributed Directory approach: a collection of
LDAP servers.
- Simple LDAP Information Schemas describe resource
information.
- Servers
- Grid Resource Information Server (GRIS): runs
on each resource and supplies information about
it. Supports multiple resources as well.
- Grid Index Information Server (GIIS): collects
information from multiple GRIS servers. Supports
particular queries for information spread across
multiple GRIS servers.
- Protocols (LDAP based) for
- Discovery and Inquiry (GRIP).
- Soft-state Registration (GRRP).
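The interplay between the two server roles and soft-state registration can be sketched with a toy model. The Python classes below are stand-ins written for this illustration only: they are not the Globus API, they do not speak LDAP, and the method names, the time-to-live value and the attribute values are assumptions. Time is passed in explicitly to keep the example deterministic.

```python
class GRIS:
    """Toy stand-in for a resource information server on one resource."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = dict(attributes)

    def inquire(self):
        """Return the current attributes of the resource."""
        return dict(self.attributes)


class GIIS:
    """Toy index server holding soft-state registrations that expire."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._registered = {}  # GRIS name -> (server, expiry time)

    def register(self, gris, now):
        """Soft-state registration: it lapses unless renewed before expiry."""
        self._registered[gris.name] = (gris, now + self.ttl)

    def search(self, predicate, now):
        """Discovery across all servers whose registration is still live."""
        self._registered = {
            name: (gris, expiry)
            for name, (gris, expiry) in self._registered.items()
            if expiry > now
        }
        return sorted(
            name
            for name, (gris, _) in self._registered.items()
            if predicate(gris.inquire())
        )


# Invented resources; the attribute name follows the MDS examples in the talk.
gris_a = GRIS("gris-a", {"Mds-Memory-Ram-freeMB": 2048})
gris_b = GRIS("gris-b", {"Mds-Memory-Ram-freeMB": 4096})

giis = GIIS(ttl=60)
giis.register(gris_a, now=0)
giis.register(gris_b, now=0)


def has_memory(attributes):
    return attributes["Mds-Memory-Ram-freeMB"] >= 1024


found_early = giis.search(has_memory, now=30)

giis.register(gris_a, now=50)  # only gris-a renews its registration
found_late = giis.search(has_memory, now=70)
```

Because a registration that is not renewed simply disappears, the index needs no explicit deregistration step when a resource fails or leaves.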