Title: Data and Metadata Architectures in a Robust Semantic Grid
1 Data and Metadata Architectures in a Robust Semantic Grid
- Chinese Academy of Sciences
- July 28, 2006
- Geoffrey Fox
- Computer Science, Informatics, Physics
- Pervasive Technology Laboratories
- Indiana University, Bloomington IN 47401
- http://grids.ucs.indiana.edu/ptliupages/presentations/
- gcf_at_indiana.edu http://www.infomall.org
2 Status of Grids and Standards I
- It is interesting to examine Grid architectures, both to see how to build great new systems and to look at linking Grids together and making them (or parts of them) interoperable
- There is agreement that one should use Web Services with WSDL and SOAP, and not so much agreement after that
- But use non-SOAP transport like GridFTP
- Can divide Service areas into:
- General Infrastructure
- Compute Grids
- Data and Information Grids
- Other ...
3 Status of Grids and Standards II
- General Infrastructure covers the area where Industry, OASIS and W3C are building the pervasive Web service environment
- There are important areas of debate and vigorous technical evolution, but these are within confined areas
- Relatively clear how to adapt between different choices
- Examples of areas of some controversy:
- Security is critical, but commercial, academic institution and Grid project solutions are still evolving
- Workflow has many choices, and BPEL is not clearly a consensus standard; differences between control and data flow
- Architecture of Service discovery is understood, but there is skepticism that UDDI is appropriate; it keeps getting improved
- In Management, Notification and Reliable Messaging there are multiple standards, but it is rather trivial to map between them
- WSRF symbolizes disagreements on state (which is roughly the metadata area); roughly this is the question of whether metadata lives in messages, a context service, or hidden in the application
- Data transport model unclear: GridFTP vs. BitTorrent vs. Fast XML
4 The Ten areas covered by the 60 core WS-Specifications
5 Activities in Open Grid Forum Working Groups
6 The NCES/WS-/GS- Features/Service Areas I
7 Grids of Grids of Simple Services
- Grids are managed collections of one or more services
- A simple service is the smallest Grid
- Services and Grids are linked by messages
- Internally to a service, functionalities are linked by methods
- Link services via methods → messages → streams
- We are familiar with the method-linked hierarchy: Lines of Code → Methods → Objects → Programs → Packages
8 Mediation and Transformation in a Grid of Grids and Simple Services
- Mediation and Transformation Services: Distributed Brokers between distributed ports
- Mediation and Transformation Services: Listen, Queue, Transform, Send
- External facing Interfaces
- Mediation and Transformation Services: 1-10 ms Overhead; Use OGSA to Federate?
9 The NCES/WS-/GS- Features/Service Areas II
10 Interoperability etc. for FS11-14
- The higher level services are harder, as the systems are more complicated and there is less agreement on where standards should be defined
- OGF has JSDL and BES (Basic Execution Services), but it might be better to set standards at a different level
- i.e. users might prefer to view Condor or GT4 as collections of services as the interface
- The idea is that maybe we should consider high level capabilities as Grids (an EGEE or Condor compute Grid, for example, whose internals are black boxes for users), and then you need two types of interfaces:
- Internal interfaces like JSDL, defining how the Condor Grid interacts internally with a computer
- External interfaces, defining how one sets up a complex problem (maybe with lots of individual jobs, as in SETI_at_Home) for a Compute Grid
11 gLite Grid Middleware Services
(Diagram of gLite service groups)
- Access: API, CLI
- Security Services: Authentication, Authorization, Auditing
- Information & Monitoring Services: Information & Monitoring, Application Monitoring
- Data Management: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
- Workload Mgmt Services: Accounting, Job Provenance, Package Manager, Computing Element, Workload Management
- Connectivity
12 DIRAC Architecture
13 Old AliEn Framework
(Diagram: User; Central services, 100% perl5; SOAP; Local Site elements)
14 Raw Data → Data → Information → Knowledge → Wisdom
(Diagram: "Grids of Grids Architecture". Grids and AnotherGrids are composed of services exchanging SOAP Messages. Legend: SS = Sensor Service, FS = Filter Service, OS = Other Service, MD = MetaData service. Portals, Decisions, and AnotherService are also shown.)
15 Data-Information-Knowledge-Wisdom Pipeline
- DIKWi represent different forms of DIKW, with different terminology in different fields.
- Each DIKWi has a resource view describing its physical instantiation (different distributed media with file, database, memory, stream etc.) and
- an access view describing its query model (dir or ls, SQL, XPATH, Custom etc.).
- The different forms DIKWi are linked by filtering steps F. This could be a simple format translation, a complex calculation as in the running of an LHC event processing code, a proprietary analysis as in a search engine's processing of harvested web pages, or the addition of a metadata catalog to a collection of files.
16 DIKW Pipeline II
- Each DIKW can be a complete data grid
- The resource view is typified by standards like ODBC, JDBC, OGSA-DAI and is internal to a DIKW Grid
- A name-value resource view is exemplified by Javaspaces (tuple model) and WS-Context
- The access (user) view is the external view of a data grid and does not have such a clear model, but rather:
- Systems like SRB (Storage Resource Broker) that virtualize file collections
- WebDAV supports the distributed file access view
- VOSpace from the astronomy community is viewed by some as an abstraction of SRB
- WFS (Web Feature Service) from the Open Geospatial Consortium is an important example
17 WMS uses WFS that uses data sources
<gml:featureMember>
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J. </author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates> -118.72,34.243 -118.591,34.176 </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
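A response like the fault feature above can be consumed with ordinary XML tooling. This is a minimal sketch using Python's standard library; the element layout follows the fragment above, not any particular WFS server's full response schema.

```python
# Parse a GML fault feature into a Python dict (illustrative sketch).
import xml.etree.ElementTree as ET

GML = """<gml:featureMember xmlns:gml="http://www.opengis.net/gml">
  <fault>
    <name>Northridge2</name>
    <segment>Northridge2</segment>
    <author>Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>-118.72,34.243 -118.591,34.176</gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>"""

def parse_fault(xml_text):
    ns = {"gml": "http://www.opengis.net/gml"}
    fault = ET.fromstring(xml_text).find("fault")
    coords_text = fault.find(".//gml:coordinates", ns).text
    # "lon,lat lon,lat ..." -> list of (lon, lat) float pairs
    coords = [tuple(float(v) for v in pair.split(","))
              for pair in coords_text.split()]
    return {"name": fault.findtext("name"), "coordinates": coords}
```

A WMS would render such features onto a map; here we only recover the fault name and its coordinate pairs.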
18 Managed Data
- Most grids have a managed data component (which we call a Managed Data Grid)
- Managed data can consist of the data and one or more metadata catalogs
- Metadata catalogs can contain semantic information enabling more precise access to the data
- Replica catalogs (managing multiple file copies) are another metadata catalog
- SRB and Digital Libraries have this architecture, with mechanisms to keep multiple metadata copies coherent
- RDF has clear relevance
- However there is no clear consensus as to how to build a Managed Data Grid
19 Resource and User Views
- Federation implies we integrate (virtualize) N data systems, which could be heterogeneous
- Sometimes you can choose where to federate, but sometimes you can only federate at the user view
- In Astronomy Grids there are several (20) different data sources (collections) corresponding to different telescopes. These are built on traditional databases but expose an astronomy query interface (VOQL etc.), and one cannot federate at the database level
- Geographical Information Systems (GIS) are built on possibly spatially enhanced databases but expose WFS or WMS OGC interfaces
- To make a map of Indiana you need to combine the GIS of 92 separate counties; this cannot be done at the database level
- More generally, when we link black box data repositories to the Grid, we can only federate at the interfaces exposed by the black box
20 Metadata Systems I: Applications
- Semantic description of data (for each application)
- Replica Catalog
- UDDI or other service registry
- VOMS or equivalent (PERMIS) authorization catalog
- Compute Grid static resource metadata
- Compute Grid dynamic events
- And implicitly metadata defining workflow, state etc., which can be stored in messages and/or catalogs (databases)
- Why not unify the resource view of these?
21 Metadata Systems II: Implementations
- There are also many WS-* specifications addressing metadata defined broadly:
- WS-MetadataExchange
- WS-RF
- UDDI
- WS-ManagementCatalog
- WS-Context
- ASAP
- WBEM
- WS-GAF
- And many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker
- And of course representations including RDF and OWL
- Further there is system metadata (such as UDDI for core services) and metadata catalogs for each application domain
- They have different scope and different QoS trade-offs
- e.g. Distributed Hash Tables (Chord) to achieve scalability in large scale networks
22 Different Trade-offs
- It has never been clear to me how a poor lonely service is meant to know where to look up metadata, and whether metadata is meant to be thought of as a database (UDDI, WS-Context) or as the contents of a message (WS-RF, WS-MetadataExchange)
- We identified two very distinct QoS trade-offs:
- 1) Large scale, relatively static metadata, as in a (UDDI) catalog of all the world's services
- 2) Small scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration
- Fault-tolerance and ability to support dynamic changes with few-millisecond delay
- But only a modest number of involved services (up to 1000s in a session)
- Need Session, NOT Service/Resource, metadata, so don't use WS-RF
23 XML Databases of Importance
- We choose a message based interface to a backend database
- We built two pieces of technology with different trade-offs; each could store any metadata but with different QoS
- WS-Context, designed for controlling a dynamic workflow
- (Extended) UDDI, exemplified by semantic service discovery
- WFS provides a general application specific XML data/metadata repository built on top of a hybrid system supported by UDDI and WS-Context
- These have different performance, scalability and data unit size requirements
- In our implementation, each is currently just an Oracle/MySQL database (with a Javaspaces cache in WS-Context) front-ended by filters that convert between XML (GML for WFS) and an object-relational schema
- Example of semantics (XML) versus representation (SQL)
- OGSA-DAI offers a Grid interface to databases; we could use this internally but don't, as we only need to expose the external interface (WFS, not MySQL) to the Grid
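The filter idea above, converting between XML and an object-relational schema, can be sketched very simply: XML metadata is flattened into (name, value) rows that a relational backend could store. All names here are invented for illustration; the real system maps GML onto an Oracle/MySQL schema.

```python
# Toy XML-to-relational filter: flatten leaf elements into rows.
import xml.etree.ElementTree as ET

def xml_to_rows(xml_text, context_id):
    """Return (context_id, element_name, text) rows for each leaf element."""
    rows = []
    for elem in ET.fromstring(xml_text).iter():
        if len(elem) == 0 and elem.text and elem.text.strip():
            rows.append((context_id, elem.tag, elem.text.strip()))
    return rows

rows = xml_to_rows(
    "<context><session>s-42</session>"
    "<coordinator>http://host/wf</coordinator></context>",
    "ctx-1")
```

Rows in this shape could then be inserted with ordinary SQL, keeping the XML semantics at the service interface and SQL purely as the representation.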
24 WFS: Geographical Information System compatible XML Metadata Services
- Extended UDDI XML Metadata Service (alternative to OGC Web Registry Services) supports the WFS GIS Metadata Catalog (functional metadata), user-defined metadata ((name, value) pairs), up-to-date service information (leasing), and dynamically updated registry entries.
- Our approach enables advanced query capabilities:
- geo-spatial and temporal queries,
- metadata oriented queries,
- domain independent queries such as XPATH, XQuery on the metadata catalog.
- http://www.opengrids.org/extendeduddi/index.html
25 Context as Service Metadata
- We define all metadata (static, semi-static, dynamic) relevant to a service as Context.
- Context can be associated with a single service, a session (service activity), or both.
- Context can be independent of any interaction:
- slowly varying, quasi-static context
- Ex: type or endpoint of a service, less likely to change
- Context can be generated as a result of service interactions:
- dynamic, highly updated context
- information associated with an activity or session
- Ex: session-id, URI of the coordinator of a workflow session
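The distinction above can be sketched as a small data structure: context keyed by service and/or session, split into a quasi-static part and a dynamic part. Field names and example values are assumptions for illustration, not the actual WS-Context schema.

```python
# Minimal sketch of a Context record (illustrative field names).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Context:
    service_id: Optional[str] = None   # bound to a single service...
    session_id: Optional[str] = None   # ...to a session (service activity), or both
    static: dict = field(default_factory=dict)   # quasi-static: e.g. service type, endpoint
    dynamic: dict = field(default_factory=dict)  # dynamic: e.g. session-id, coordinator URI

ctx = Context(service_id="wfs-1", session_id="sess-7",
              static={"endpoint": "http://host/wfs"},
              dynamic={"coordinator": "http://host/wf-engine"})
```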
26 Hybrid XML Metadata Services: WS-Context + extendedUDDI
- We combine the functionalities of these two services, WS-Context AND extendedUDDI, in one hybrid service to manage Context (service metadata):
- WS-Context, controlling a workflow
- (Extended) UDDI, supporting semantic service discovery
- This approach enables uniform query capabilities on the service metadata catalog.
- http://www.opengrids.org/wscontext/index.html
27 (Architecture diagram)
(IS Clients connect via WSDL over HTTP(S) to an Information Service with two backends: an Extended WS-Context Service (WS-Context Ver 1.0, ws-context.wsdl), optimized for performance and holding dynamic metadata; and an Extended UDDI Registry Service (UDDI Version 3.0 WSDL Service Interface Descriptions, uddi_api_v3_portType.wsdl), optimized for scalability and holding interaction-independent, relatively static metadata. Both connect to databases via JDBC.)
28 Generalizing a GIS
- Geographical Information Systems (GIS) have been hugely successful in all fields that study the earth and related worlds
- They define Geography Syntax (GML) and ways to store, access, query, manipulate and display geographical features
- In SOA, GIS corresponds to a domain specific XML language and a suite of services for the different functions above
- However such a universal information model has not been developed in other areas, even though there are many fields in which it appears possible:
- BIS: Biological Information System
- MIS: Military Information System
- IRIS: Information Retrieval Information System
- PAIS: Physics Analysis Information System
- SIIS: Service Infrastructure Information System
29 ASIS: Application Specific Information System I
- a) Discovery capabilities, best done using WS-* standards
- b) Domain specific metadata and data, including a search/store/access interface (cf. WFS). Let's call the generalization ASFS (Application Specific Feature Service)
- A language to express domain specific features (cf. GML). Let's call this ASL (Application Specific Language)
- Tools to manipulate information expressed in the language and key data of the application (cf. coordinate transformations). Let's call this ASTT (Application Specific Tools and Transformations)
- ASL must support data sources such as sensors (cf. OGC metadata and data sensor standards) and repositories. Sensors need (common across applications) support for streams of data
- Queries need to support archived data (find all relevant data in the past) and streaming data (find all data in the future with given properties)
- Note all AS Services behave like sensors, and all sensors are wrapped as services
- Any domain will have raw data (binary) and that which has been filtered to ASL. Let's call the former ASBD (Application Specific Binary Data)
30 ASIS: Application Specific Information System II
- c) Let's call this ASVS (Application Specific Visualization Services), generalizing WMS for GIS
- The ASVS should both visualize information and provide a way of navigating (cf. GetFeatureInfo) the database (the ASFS)
- The ASVS can itself be federated, and presents an ASFS output interface
- d) There should be an application service interface for ASIS from which all ASIS services inherit
- e) There will be other user services interfacing to ASIS
- All user and system services will input and output data in ASL, using filters to cope with ASBD
31 Application: Context Store usage in communication of mobile Web Services
- Handheld Flexible Representation (HHFR) is open source software for fast communication in mobile Web Services. HHFR supports streaming messages, separation of message contents, and usage of a context store.
- http://www.opengrids.org/hhfr/index.html
- We use the WS-Context service as a context-store for the redundant parts of SOAP messages.
- Redundant data is static XML fragments encoded in every SOAP message
- Redundant metadata is stored as context associated with the service conversation in its place
- The empirical results show that we gain 83% in message size and on average 41% in transit time by using the WS-Context service.
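The optimization above can be sketched in a few lines: unchanging SOAP fragments are stored once in a context store and replaced in subsequent messages by a short reference. The strings and keys below are illustrative, not HHFR's actual wire format.

```python
# Toy context-store optimization: replace redundant fragments with references.
context_store = {}

def optimize(message, redundant_parts):
    """Store each redundant fragment once; replace it in the message by a reference."""
    for key, fragment in redundant_parts.items():
        context_store[key] = fragment
        message = message.replace(fragment, "{ctx:" + key + "}")
    return message

def restore(message):
    """Expand context references back into the full message."""
    for key, fragment in context_store.items():
        message = message.replace("{ctx:" + key + "}", fragment)
    return message

envelope = ("<Envelope><Header>LONG-STATIC-HEADER</Header>"
            "<Body>payload-1</Body></Envelope>")
small = optimize(envelope, {"hdr": "<Header>LONG-STATIC-HEADER</Header>"})
```

Every message in the conversation then carries only the short reference, which is where the measured size and transit-time savings come from.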
32 Optimizing Grid/Web Service Messaging Performance
- The performance and efficiency of Web Services can be greatly increased in conversational and streaming message exchanges by removing the redundant parts of the SOAP message.
33 Performance with and without Context-store
- Experiments ran over HHFR
- Optimized messages exchanged over HHFR after saving redundant/unchanging parts to the Context-store
- Save on average:
- 83% of message size, 41% of transit time
- Summary of the Round Trip Time (TRTT)
34 System Parameters
- Taccess: time to access a Context-store (i.e. save a context to, or retrieve a context from, the Context-store) from a mobile client
- TRTT: Round Trip Time to exchange a message through an HHFR channel
- N: number of simultaneous streams supported, summed over ALL mobile clients
- Twsctx: time to process the setContext operation
- Taxis: time consumed by Axis processing
- Ttrans: transmission time through the network
- Tstream: stream length
35 Context-store System Parameters
36 Summary of Taxis and Twsctx measurements
- Taccess = Twsctx + Taxis + Ttrans
- Data binding overhead at the Web Service Container is the dominant factor in message processing
37 Performance Model and Measurements
- Chhfr = n·thhfr + Oa + Ob
- Csoap = n·tsoap
- Breakeven point:
- nbe·thhfr + Oa + Ob = nbe·tsoap
- Oa(WS) is roughly 20 milliseconds
- Oa: overhead for accessing the Context-store Service; Ob: overhead for negotiation
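The break-even model above can be solved directly: HHFR pays a one-time overhead Oa + Ob plus thhfr per message, while plain SOAP pays tsoap per message. The sample values below are illustrative except Oa ≈ 20 ms, which the slide states.

```python
# Break-even message count for HHFR vs. plain SOAP, from the cost model above.
import math

def breakeven(t_hhfr, t_soap, Oa, Ob):
    """Smallest n with n*t_hhfr + Oa + Ob <= n*t_soap (all times in ms)."""
    assert t_soap > t_hhfr, "HHFR must be cheaper per message to ever pay off"
    return math.ceil((Oa + Ob) / (t_soap - t_hhfr))

# Illustrative: 2 ms/message over HHFR, 6 ms over SOAP, 20 + 10 ms overhead.
n_be = breakeven(t_hhfr=2.0, t_soap=6.0, Oa=20.0, Ob=10.0)
```

Beyond n_be messages in a conversation, the context-store approach wins; below it, the setup overhead dominates.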
38 Core Features of Management Architecture
- Remote Management
- Allow management irrespective of the location of the resource (as long as that resource is reachable via some means)
- Traverse firewalls and NATs
- Firewalls complicate management by disabling access to some transports and access to internal resources
- Utilize tunneling capabilities and multi-protocol support of the messaging infrastructure
- Extensible
- Management capabilities evolve with time. We use a service oriented architecture to provide extensibility and interoperability
- Scalable
- Management architecture should scale as the number of managees increases
- Fault-tolerant
- Management itself must be fault-tolerant. Failure of transports OR management components should not cause the management architecture to fail.
39 Management System built in terms of
- Bootstrap System: itself made robust by replication
- Registry for metadata (distributed database): robust by standard database techniques, and by our system itself for service interfaces
- NaradaBrokering for robust tunneled messages: NB itself made robust using our system
- Managers: easy to make robust using our system; these are essentially agents
- Managees: what you are managing. Our system makes these robust. There is NO assumption that the managed system uses NB
40 Basic Management Architecture I
- Registry
- Stores system state
- Fault-tolerant through replication
- Could be a global registry OR separate registries for each domain (later slide)
- Current implementation uses a simple in-memory system
- Will use our WS-Context service as our registry (Service/Message interface to an in-memory JavaSpaces cache and MySQL)
- Note metadata is transported by messages, but we use a distributed database to implement it
- Messaging Nodes
- NaradaBrokering nodes that form a scalable messaging substrate
- Main purpose is to serve as a message delivery mechanism between Managers and Service Adapters (Managees) in the presence of varying network conditions
41 Basic Management Architecture II
- Resources to Manage (Managees)
- If the resources DO NOT have a Web Service interface, we create a Service Adapter (a proxy that provides the Web Service interface as a wrapper over the basic management functionality of the resource).
- The Service Adapters connect to existing messaging nodes. This mainly leverages the multi-protocol transport support in the messaging substrate. Thus, alternate protocols may be used when network policies cause connection failures
- Managers
- Active entities that manage the resources
- May be multi-threaded to improve scalability (currently under further investigation)
42 Architecture: Use of Messaging Nodes
- Service adapters and Managers communicate through messaging nodes
- Direct connection is possible, however
- This assumes that the service adapters are appropriately accessible from the machines where managers would run
- May require special configuration in routers / firewalls
- Typically managers, messaging nodes and registries are always in the same domain OR a higher level network domain with respect to service adapters
- Messaging Nodes (NaradaBrokering Brokers) provide:
- A scalable messaging substrate
- Robust delivery of messages
- Secure end-to-end delivery
43 Architecture: Bootstrapping Process
- The architecture is arranged hierarchically
- Resources in different domains can be managed with separate policies for each domain
- A Bootstrapping service is run in every domain where the management architecture exists
- Serves to ensure that the child domain bootstrap processes are always up and running
- Periodic heartbeats convey the status of the bootstrap service
- The Bootstrap service periodically spawns a health-check manager that checks the health of the system (ensures that the registry and messaging nodes are up and running and that there are enough managers for managees)
- Currently 1 manager per managee
- (Diagram: Hierarchical Bootstrap Nodes: /ROOT, /ROOT/FSU, /ROOT/CGL, each with a Registry)
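The health-check pass described above can be sketched as a pure function over observed state: verify the registry and messaging node, and enforce the current 1-manager-per-managee invariant. All names and the in-memory state are assumptions for illustration; the real checks run against remote services.

```python
# Illustrative health-check pass for one domain of the management architecture.
def health_check(registry, messaging_up, managers, managees):
    """Return the corrective actions needed, given observed component state.

    registry: dict with an "up" flag; managers: managee -> manager map;
    managees: list of resources that must each have a manager.
    """
    actions = []
    if not registry.get("up"):
        actions.append("restart-registry")
    if not messaging_up:
        actions.append("restart-messaging-node")
    for m in managees:
        if m not in managers:          # currently 1 manager per managee
            actions.append("spawn-manager:" + m)
    return actions

acts = health_check({"up": True}, False,
                    {"broker-1": "mgr-1"}, ["broker-1", "broker-2"])
```

A periodic heartbeat from the bootstrap service would trigger this pass in each domain of the hierarchy.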
44 Architecture: User Component
- Application-specific specification of the characteristics that the resources/services being managed should maintain
- Impacts the Managee interface, registry and Manager
- Generic and application specific policies are written to the registry, where they will be picked up by a manager process
- Updates to the characteristics (WS-Policy in future) are determined by the user
- Events generated by the Managees are handled by the manager
- Event processing is determined by policy (future work),
- e.g. wait for the user's decision on handling specific conditions,
- or the event can be processed locally, so execute a default policy, etc.
- Note Managers will set up services if the registry indicates that is appropriate, so writing information to the registry can be used to start up a set of services
45 Architecture: Structure of Managers
- The Manager process starts the appropriate manager thread for the manageable resource in question
- A Heartbeat thread periodically registers the Manager in the registry
- The SAM (Service Adapter Manager) Module Thread starts a Service/Resource Specific Resource Manager that handles the actual management task
- The management system can be extended by writing ResourceManagers for each type of Managee
- (Diagram: Manager with Heartbeat Generator Thread)
46 Prototype
- We illustrate the architecture by managing the distributed messaging middleware, NaradaBrokering
- This example is motivated by the presence of a large number of dynamic peers (brokers) that need configuration and deployment in specific topologies
- Use WS-Management (June 2005) parts (WS-Transfer Sep 2004, WS-Enumeration Sep 2004 and WS-Eventing) (could use WS-DM)
- WS-Enumeration implemented, but we do not foresee any immediate use in managing the brokering system
- WS-Transfer provides verbs (GET / PUT / CREATE / DELETE) which allow us to model setting and querying broker configuration, instantiating brokers and creating links between them, and finally deleting brokers (tearing down the broker network) to re-deploy with a possibly different configuration and topology
- WS-Eventing (will be leveraged from the WS-Eventing capability implemented in OMII)
- WS-Addressing Aug 2004 and SOAP v1.2 used (needed for WS-Management)
- Used XmlBeans 2.0.0 for manipulating XML in a custom container
- WS-Context will replace the current registry
47 Prototype Components
- Broker Service Adapter
- Note NB illustrates an electronic entity that didn't start off with an administrative Service interface
- So we add a wrapper over the basic NB BrokerNode object that provides a WS-Management front-end
- Also provides a buffering service to buffer undeliverable responses
- These will be retrieved later by a separate Request-Response message exchange
- Broker Network Manager
- WS-Management client component that is used to configure a broker object through the Broker Service Adapter
- Contains Request-Response as well as asynchronous messaging style capabilities
- Contains a topology generator component that determines the wiring between brokers (links that form a specific topology)
- For the purpose of the prototype we simply create a CHAIN topology where the ith broker is connected to the (i-1)st broker
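The topology generator's CHAIN rule stated above (broker i links to broker i-1) fits in one line; this sketch is purely illustrative.

```python
# Generate the (from, to) links that wire brokers into a chain.
def chain_links(broker_ids):
    """Broker i is connected to broker i-1, per the CHAIN topology rule."""
    return [(broker_ids[i], broker_ids[i - 1])
            for i in range(1, len(broker_ids))]

links = chain_links(["b0", "b1", "b2", "b3"])
# links == [("b1", "b0"), ("b2", "b1"), ("b3", "b2")]
```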
48 Prototype: Resources/Properties Modeled (very specific to NaradaBrokering)
49 Response Time: Handling Events (WS-Eventing)
- Test Resource which does not do any work other than responding to events
- This base model shows that up to 200 resources can be managed per manager process, beyond which response time increases rapidly
- This number is resource dependent and this result is illustrative
- Equally dividing management between 2 processes, response time still increases, although slowly
50 Amount of Management Infrastructure Required
- N = Number of resources to manage
- NMP = Number of Manager processes
- If a manager process can manage 200 resources simultaneously, then NMP = N/200
- NMN = Number of Messaging Nodes
- If a messaging node can support 800 simultaneous connections, then
- NMN = (N + N/200 + 1) / 800
- The +1 connection is for the registry
51 Amount of Management Infrastructure Required
- Management Infrastructure Required
- = N/200 + (N + N/200 + 1)/800
- ≈ N/160 (approximately)
- Thus, for N > 160, management is doable by adding
- (N/160) × 100 / (N + N/160)
- i.e. about 0.625% more processes
- Thus the Management architecture is scalable, and the approach is feasible
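The sizing arithmetic on the two slides above can be checked directly, using the slides' stated capacities (200 managees per manager process, 800 connections per messaging node, one extra connection for the registry); rounding up to whole processes is my addition.

```python
# Infrastructure sizing from the slides' capacity figures.
import math

def infrastructure(n):
    """Return (manager processes, messaging nodes, overhead %) for n resources."""
    nmp = math.ceil(n / 200)                 # NMP = N/200
    nmn = math.ceil((n + nmp + 1) / 800)     # NMN = (N + NMP + 1)/800
    overhead_pct = 100 * (nmp + nmn) / (n + nmp + nmn)
    return nmp, nmn, overhead_pct

nmp, nmn, pct = infrastructure(16000)
# nmp == 80, nmn == 21, pct close to the slide's ~0.625%
```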
52 Prototype: Recovery costs (Individual Resources = Brokers)
- Time for Create Broker depends on the number and type of transports opened by the broker. E.g. an SSL transport requires negotiation of keys and would require more time than simply opening a TCP port
53 Recovery times I
- Use 5 msec read time per object from the registry and consider two different topologies
- Ring Topology
- N nodes, N links (1 outgoing link per node)
- Each Resource Management thread loads 2 objects (read from) and writes their corresponding state (2 objects) (write to) the REGISTRY
- Time to load (theoretical) per broker:
- 10 + 1110 + 734 + 94 ms
- ≈ 1.9 sec
- Time to load (observed): 2.4 to 2.7 secs (total)
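The ring-topology estimate above is a straight sum of the per-broker component times; the four numbers are taken from the slide's arithmetic (my reading: registry I/O plus broker and link creation costs, in ms).

```python
# Theoretical per-broker recovery time for the ring topology (ms).
components_ms = [10, 1110, 734, 94]   # registry reads/writes + create costs
theoretical_s = sum(components_ms) / 1000
# 1.948 s, i.e. about 1.9 s, vs. the observed 2.4 to 2.7 s
```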
54 Recovery times II
- Cluster Topology
- N nodes; links per broker vary from 0 to 3 (depending on what level the broker is present at)
- At Cluster level, we maintain a chain (hence num links is 1, and 0 for the last node)
- At Super-cluster level, we again maintain a chain, so all nodes except the last node will have an additional link
- At Super-Super-Cluster level, we maintain a chain of super-cluster level nodes, so an additional link per node except for the last one in the chain
- Each Resource Management thread loads from 1 to 4 objects (read from) and writes their corresponding state (1 to 4 objects) (write to) the REGISTRY
- 1 object for the Broker Node, others for links
- Thus, time to load (theoretical) per broker:
- 5 to 20 + 1110 + 734 + 0 to (94·1 + 43·2) ms
- ≈ 1.8 to 2.0 sec, similar to the ring topology