Integrated support for data integration and science portals


1
Integrated support for data integration and
science portals
  • Amarnath Gupta
  • University of California San Diego

2
Overview
  • We will first
  • Discuss what cyberinfrastructure for science
    means
  • Situate the business of data integration within
    the cyberinfrastructure setting
  • Then we will briefly describe a few
    cyberinfrastructure projects in different science
    disciplines
  • Biomedical sciences, geo-sciences, environmental
    sciences, marine biology, physical oceanography
  • We will examine some dimensions of the data
    integration problem
  • Discuss how they are approached in different
    projects from a CS/Data Management perspective
  • Discuss common and complementary themes across
    these approaches

3
Cyberinfrastructure
  • Cyberinfrastructure is the organized aggregate of
    technologies enabling access and coordination of
    information technology resources to facilitate
    science, engineering, and societal goals.
  • Data access from distributed systems
  • Data inter-operability and assimilation
  • Computation: grid-based and workflows
  • Visualization
  • Tools
  • Information integration (highlighted today)
  • National Science Foundation's Cyberinfrastructure

NSF Blue Ribbon Panel (Atkins) Report provided a
compelling and comprehensive vision of an
integrated Cyberinfrastructure
Modified from Berman, SDSC, 2005
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
A.K.Sinha, Virginia Tech, 2005
4
Source: Mark Ellisman
5
Source: Mark Ellisman
6
  • We are here
  • Making more general-purpose data integration
    infrastructure over distributed resources
  • Extending to accommodate various scientific
    applications with stored and streaming data

Source: Mark Ellisman
7
GEONgrid Software Layers
Portal (login, myGEON)
Registration
GEONsearch
Core Grid Services: GT3, OGSA-DAI, GSI, CAS,
GridFTP, SRB, PostGIS, MySQL, DB2
Physical Grid: RedHat Linux, ROCKS, Internet, I2,
OptIPuter (planned)
GEON Space
8
BIRN Major System Components
Collaborating Groups of Biomedical Researchers
Registered BIRN Data
9
BIRN Specific Implementations
Mouse, Function, Morphometry ( New Areas and
Users )
Pegasus, Kepler, Loni Pipeline, etc.
e.g., AFNI, Air, 3DSlicer, LONI, ..
BIRN Data Integration Suite
Registered BIRN Data
10
The OntoGrid View
Third-party tools
Taverna e-Science workbench
Applications
Haystack
LSID Launchpad
Web portals
Utopia
e-Science process patterns
LSID support
myGrid information model
e-Science mediator
e-Science coordination
Metadata Management
Data Management
e-Science events
KAVE metadata store
Service workflow discovery
mIR: myGrid information repository
Feta semantic discovery
KAVE provenance capture
Core Services
Pedro semantic publication
Workflow enactment
Pedro semantic publication
Freefluo workflow engine
GRIMOIRES federated UDDI registry
Notification service
myGrid ontology
Web Service (Grid Service) communication fabric
External Services
Java applications
Soaplab
AMBIT text extraction service
OGSA-DAI DQP service
Executable codes with an IDL
Gowlab
Legacy applications
Web Services
OGSA-DAI databases
Web Sites
Courtesy Carole Goble
11
A Word about Data in Science: Excerpts from a
Report by NSF's Office of Cyberinfrastructure
  • Data. Data are any and all complex data
    entities from observations, experiments,
    simulations, models, and higher order assemblies,
    along with the associated documentation needed to
    describe and interpret the data.
  • Metadata. Metadata are a subset of data, and are
    data about data. Metadata summarize data content,
    context, structure, inter-relationships, and
    provenance (information on history and origins).
    They add relevance and purpose to data, and
    enable the identification of similar data in
    different data collections.
  • Ontology. An ontology is the systematic
    description of a given phenomenon. It often includes
    a controlled vocabulary and relationships,
    captures nuances in meaning and enables knowledge
    sharing and reuse.

12
What is data integration?
  • For applications where there are a number of data
    sources (recall previous slide)
  • Geographically distributed
  • Having data on different platforms
  • (may be) on systems with different query
    capabilities (e.g., different DBMSs, files,
    spreadsheets)
  • Perhaps even having different data models
  • Having different schema
  • BUT about one common, general theme
  • One may want to construct
  • A general-purpose information system such that
  • All these data sources can be co-accessed as if
    they belong to a single data source
  • It can produce combined information objects
    on-demand for ad hoc queries to facilitate
    problem-specific analyses performed through other
    software products (workflows, atlases,
    statistical packages)
  • Data integration refers to a body of techniques
    to produce such an information system

13
Data Integration vis-à-vis Data Grid
  • A different aspect of data management

Inter-organizational Information Storage
Management
Semantic data Organization (with behavior)
Virtual Data Transparency
Data Replica Transparency
Data Identifier Transparency
Storage Location Transparency
Storage Resource Transparency
Courtesy Reagan Moore and Arun Jagatheesan
14
Data Integration in Science Starts with Science
Questions
  • GeoScience (GEON)
  • What is the geologic and geophysical record of
    Super-Continent assembly and dispersal?
  • What are the architectures of terrain boundaries
    at depth?
  • How do composition, temperature and strain
    fabrics vary within the lithosphere and
    asthenosphere? Are lithospheric and
    asthenospheric strain coupled?
  • Neuroscience (BIRN)
  • Find volumetric data/metadata from MRIs of humans
    with specific diagnosis(es)
  • Which structures are decreased/increased in size
    relative to normal controls
  • Which structures show structural differences
    across a variety of diagnoses
  • Given a structure which shows structural
    differences
  • Which other structures are associated with it
  • Do any of these associated structures show
    structural differences
  • Do these other changed structures have
    commonalities (i.e. cell types,
    neurotransmitters, other afferent/efferent
    connections)
  • Environmental Science (PAKT, CAMERA)
  • Explain biodiversity by correlating distribution
    of a taxonomic group with spatial (temporal)
    distribution of temperature, dissolved oxygen,
    salinity.
  • What accounts for large-scale genetic variation
    in microbial genomes that share a very recent
    common ancestry among coral reef habitats?

DATA NEEDED TO ADDRESS THESE QUESTIONS ARE
DISTRIBUTED ACROSS THE WORLD
15
A Science Question can be Complex
Q1. What is the geologic and geophysical record
of Super-Continent assembly and dispersal?
Needs complex integration of geophysical data
with those associated with sub-crustal
lithosphere ages, its composition and physical
properties (seismic, thermal etc), surface
geology and associated events chronology
Adapted from D.Seber, SDSC
A.K.Sinha, Virginia Tech, 2005
16
Converting Questions to Queries
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
A.K.Sinha, Virginia Tech, 2005
17
(Some) Dimensions of Information Integration in
Cyberinfrastructure Projects
  • Source Information Model
  • Integration Engine's Information Model
  • Specification of semantic correspondences across
    sources
  • The 3-party power play among global schema,
    local schema, ontology
  • Query paradigms over integrated data
  • The mechanics of
  • query planning
  • query execution

18
About Semantic Correspondences
  • The general problem
  • For any data integration across multiple sources
    there needs to be a way to
  • Specify how two objects from different data
    sources may correspond
  • Specify whether joining these two objects
    would create a composite data object
  • What's the big deal?
  • Identical object versus equivalent objects
  • Complete objects versus partial objects
  • Multi-scale representations of the same object
  • Handling definitional differences
  • Taking into account natural variability
  • Contextual correspondence

Are these always specifiable through ontological
standards like OWL? Do we need to have
correspondence-checking services? Listen to
Oscar and Carol's session tomorrow for a
different angle.
19
About the 3-party Power Play
  • While we want to create a single (cyber-)
    infrastructure with a data integration component,
    different applications have different integration
    scenarios
  • Is there a single global schema?
  • Do new applications (and hence global schema) get
    added all the time over existing sources and
    ontologies?
  • Are the sources fixed? Do new sources get added
    all the time? Do sources come and go?
  • Are sources added dynamically as data sets that
    users want to integrate on the fly?
  • Do local schemata come with their own ontologies?
    Is there a global ontology that all local
    ontologies must map to?
  • How does the global schema (if one exists) relate
    to the global and local ontologies?
  • Do new (or modified) ontologies get added all the
    time?
  • Do the local schemata evolve all the time?

Is there a general way to manage this? Do we need
to architect any cyberinfrastructure components
differently?
20
Source Information Models
  • BIRN
  • Data Sources
  • Relational DBMS
  • Standard data types
  • Semantic data types (attribute-domain references
    to ontologies)
  • Some data and computation sources expose a set of
    functions
  • Key constraints
  • Ontology Sources
  • Simplifying assumptions
  • Ontologies can be approximated by edge-labeled
    directed graphs stored in relational systems
  • Graph traversal functions can be mimicked as
    database functions
  • BONFIRE
  • Glue ontology for simple inter-ontology mappings
    and extensions
  • Image and Spatial Data Sources
  • Discussed later

21
Source Information Models
  • GEON
  • Data Sources
  • Assumption: all data are in GEONSpace
  • Items and Item details
  • Any relational JDBC data source (e.g., Excel
    files) is admitted
  • Standard relational data types, shapefiles for
    spatial data
  • Semantic Data types by connecting to ontology
  • Ontology Sources
  • Any OWL-specified ontology
  • Registration in GEON
  • Level 1: Federation-Based Integration
  • Users should know the component database
    schemata
  • Level 2: View-Based Integration
  • Same as in BIRN
  • Level 3: Ontology-Based Integration
  • Preferred Method

22
Source Information Models
  • PAKT (marine biogeography)
  • Data Sources
  • Relational
  • Spatial (vectors) supported by GIS and Spatial
    DBMS
  • Spatial (raster: continuously partitionable
    arrays)
  • ArcGIS (map algebra),
  • Nested, non-aligned, multiple resolution
  • Spatially-indexed time series
  • Function-exposing sources (WSDL)
  • Parameter and result data types are interpretable
    or BLOBS
  • Ontology Sources
  • Any ontology specified in a subset of OWL
  • Any DAG-structured data source

23
Source Information Models
  • CAMERA
  • PAKT
  • Data sources that export annotated sequences as a
    base data type
  • Phylogenetic trees
  • XML repositories with XPath/XQuery Processor
  • RDBMS with XML processing capabilities
  • Graphs such as molecular interaction networks
    (e.g., biological pathways), chemical reaction
    networks

24
Integration Engine's Information Model
  • BIRN
  • Sources from the mediator's view
  • Base relations may have binding patterns
  • Distinction between data and metadata is not
    strictly observed
  • SRB metadata catalog is treated as a relational
    source with some special functions
  • Files are accessed by reference to data-grid URIs
    (SRB ids)
  • Integration Model
  • Essentially Global-as-view (GAV) mediation
  • Semantic aspect of the mediation is executed
    through opaque functions over ontology sources
  • Key constraints not used during standard query
    processing but are used for keyword queries

25
Integration Engine's Information Model
  • BIRN (contd.)
  • The 3-party power-play
  • Many integrated views used by several global
    schemata on a relatively fixed set of sources
  • Ontologies are used in two ways
  • A global view may be defined using ontology
    functions
  • Keyword queries use simple ontological
    relationships
  • Some terms in the global schema mapped to
    ontologies through semantic typing
  • Otherwise the global schema and integrated views
    are independent from the ontology
  • Some data are warped to a common atlas coordinate
    system to enable atlas queries
  • Atlas mapping spatial annotation

26
Integration Engine's Information Model
  • BIRN Integration architecture
  • Gateway
  • Has an XML API for source registration and source
    schema update
  • Has XML API for queries
  • Can be accessed as web service
  • Registry
  • API-based access to schema elements and view
    definitions
  • Implemented over MySQL for portability
  • Spatial registry for image data
  • Planner and Executor
  • Described later
  • Wrappers
  • Local and remote
  • OTIS
  • Inverted index for ontological terms

Atlas Client
Onto Client
Query Client
Ontological Query Processor
Atlas Query Processor
OTIS
Spatial Registry
Mediator
Data Grid Access
Wrapper Access
27
BIRN Tool Source Registration
28
Integration Engine's Information Model
  • GEON
  • Sources from the Integration Engine's Viewpoint
  • Metadata (Item-level information) maintained in a
    GEON standard called ADN (Alexandria-DLESE-NASA)
  • Item-detail level information is either any
    relationalizable data or shapefiles
  • Any WMS, WFS service is a valid source for map
    information management
  • Does not permit an external ontology source, all
    ontologies have to be defined in the GEON
    framework
  • Integration Model
  • Every source schema is registered to an ontology

29
Integration Engine's Information Model
  • 3-party power play
  • Several global schemata can be defined
  • A global schema IS the OWL-DL compliant ontology
  • A couple of consequences
  • All transitive closure information is
    pre-computed after registration
  • If a concept class has key constraints,
    subsumption is NEXPTIME-hard, and undecidable if
    the key constraint has a complex domain
  • Does not matter much in practice because
    subsumption is rarely computed
  • Pragmatics
  • As new sources join, or new applications are
    attempted, the ontology needs to evolve

30
GEON Data Registration
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
A.K.Sinha, Virginia Tech, 2005
31
Registration of Item Detail
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
A.K.Sinha, Virginia Tech, 2005
32
ODAL (Ontological Database Annotation Language)
  • Create a partial model of ontologies from
    database
  • Independent of any GUI
  • Independent of any concrete implementation
  • Reusable

The values in the column ssID of the tables
Samples, RockTexture, RockGeoChemistry,
ModalData, MineralChemistry and Images represent
instances of RockSample
33
ODAL Import Ontologies
  • The Ontologies used for annotating a database can
    be imported as follows

<?xml version="1.0"?>
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:odal="http://www.sdsc.edu/odal">
  <odal:Ontology>
    <odal:Imports rdf:resource="http://www.library.org/Book.owl"/>
    <odal:Imports rdf:resource="http://www.writer.org/Writer.owl"/>
  </odal:Ontology>
</odal:ODAL>
34
ODAL Database Connection Declaration
  • The target database for making annotation is
    declared as follows

<?xml version="1.0"?>
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:odal="http://www.sdsc.edu/odal">
  <odal:Database odal:id="PublicationDatabase">
    <odal:DatabaseProductName>Oracle</odal:DatabaseProductName>
    <odal:DatabaseProductVersion>9.1.21</odal:DatabaseProductVersion>
    <odal:Host>oracle.sdsc.edu</odal:Host>
    <odal:Port>3456</odal:Port>
    <odal:DatabaseName>Publications</odal:DatabaseName>
  </odal:Database>
</odal:ODAL>
35
ODAL Simple Named Individuals
Suppose the book ontology contains a class Book
and the schema Collections contains a table
book-price with a column ISBN.

<odal:NamedIndividuals odal:id="BookInTableBookPrice"
                       odal:database="PublicationDatabase">
  <odal:Class odal:resource="http://www.amazon.com/Book.owl#Book"/>
  <odal:Schema>Collections</odal:Schema>
  <odal:Table>book-price</odal:Table>
  <odal:Column>ISBN</odal:Column>
</odal:NamedIndividuals>

The statement says that each value in the column
ISBN represents a book individual.
odal:id gives a name to the declaration, and
represents the set of the individuals generated
by the statement.
36
ODAL The Names of Individuals
<odal:NamedIndividuals odal:id="BookInTableBookPrice"
                       odal:database="PublicationDatabase">
  <odal:Class odal:resource="http://www.amazon.com/Book.owl#Book"/>
  <odal:Schema>Collections</odal:Schema>
  <odal:Table>book-price</odal:Table>
  <odal:Column>ISBN</odal:Column>
</odal:NamedIndividuals>

ISBN: 0817313478

Individual Name:
(BookInTableBookPrice, PublicationDatabase.Collections.book-price.ISBN0817313478)
37
ODAL Named Individuals from Multiple Columns
Suppose an ontology contains a class Location and
a database table Rock-Sample with two columns
Latitude and Longitude.
<odal:NamedIndividuals odal:id="LocationInTableRockSample">
  <odal:Class odal:resource="http://www.usgs.org/Space.owl#Location"/>
  <odal:Schema>California</odal:Schema>
  <odal:Table>Rock-Sample</odal:Table>
  <odal:Column>Latitude</odal:Column>
  <odal:Column>Longitude</odal:Column>
</odal:NamedIndividuals>

The statement says that a pair of latitude and
longitude gives a location
38
ODAL Named Individuals with Conditions
<odal:NamedIndividuals odal:id="MaleEmployeeInTableEmployee">
  <odal:Class odal:resource="http://www.abc.com/Employee.owl#MaleEmployee"/>
  <odal:Table>employee</odal:Table>
  <odal:Column>EmployeeId</odal:Column>
  <odal:Condition><![CDATA[ Gender = 'M' ]]></odal:Condition>
</odal:NamedIndividuals>
<odal:NamedIndividuals odal:id="FemaleEmployeeInTableEmployee">
  <odal:Class odal:resource="http://www.abc.com/Employee#FemaleEmployee"/>
  <odal:Table>employee</odal:Table>
  <odal:Column>EmployeeId</odal:Column>
  <odal:Condition><![CDATA[ Gender = 'F' ]]></odal:Condition>
</odal:NamedIndividuals>

A condition in an odal:Condition element should
be a Boolean expression that is valid in the
WHERE clause of an SQL query.
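
To make the effect of a named-individuals declaration concrete, here is a minimal sketch of how a processor might turn one declaration (table, columns, optional condition) into a set of individuals. It is not the ODAL processor: the function `named_individuals`, the database name `HRDatabase`, the schema name `public`, and the `column=value` naming format are assumptions for illustration, and SQLite stands in for the annotated database.

```python
import sqlite3

def named_individuals(conn, decl_id, database, schema, table, columns, condition=None):
    """Yield (declaration id, individual name) pairs for one declaration.

    The 'column=value' part of the name is an assumed format; the exact
    separators used by ODAL are not recoverable from the slides.
    """
    col_list = ", ".join(f'"{c}"' for c in columns)
    sql = f'SELECT DISTINCT {col_list} FROM "{table}"'
    if condition:
        # The condition is annotation text written by the schema owner,
        # i.e. any Boolean expression valid in a WHERE clause.
        sql += f" WHERE {condition}"
    sql += f" ORDER BY {col_list}"
    for row in conn.execute(sql):
        key = ",".join(f"{c}={v}" for c, v in zip(columns, row))
        yield decl_id, f"{database}.{schema}.{table}.{key}"

# Hypothetical employee table matching the declaration on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmployeeId INTEGER, Gender TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "M"), (2, "F"), (3, "M")])

males = list(named_individuals(
    conn, "MaleEmployeeInTableEmployee", "HRDatabase", "public",
    "employee", ["EmployeeId"], "Gender = 'M'"))
for decl, name in males:
    print(decl, name)
```

Passing several column names (for example Latitude and Longitude) yields one individual per distinct combination of values, which mirrors the multi-column declaration shown earlier.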
39
ODAL Data Type Property Declaration
[Figure: table Person with columns age and SSN (example row: 8, 123-56-7890); the property hasAge has range posInt]

<odal:NamedIndividuals odal:id="PersonInTablePerson">
  <odal:Class odal:resource="http://www.foo.org/Person.owl#Person"/>
  <odal:Table>Person</odal:Table>
  <odal:Column>ssn</odal:Column>
</odal:NamedIndividuals>
<odal:OntologyProperty>
  <odal:DatatypeProperty odal:resource="http://www.foo.org/Person.owl#hasAge"/>
  <odal:Table>person</odal:Table>
  <odal:Domain odal:resource="PersonInTablePerson"/>
  <odal:Range odal:resource="age"/>
</odal:OntologyProperty>
40
Conditions for Joining Individuals from Different
Resources
  • Usually we don't join individuals across
    different resources
  • A set of datatype properties can be declared as a
    key for a class in the ontology. We do join across
    multiple resources based on keys.
  • e.g., hasLatitude and hasLongitude can
    be declared as a key of Location
  • Two locations from different resources are the
    same if they have the same latitude and longitude

[Figure: class Rock; one resource holds RockSampleID 10001, another holds RockID 10001]

We don't know whether 10001 represents the same
rock in the two resources. By default, we assume
it does not.
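
The key-based rule on this slide can be sketched as a small identity test. This is an illustrative assumption about how such a check might look, not the GEON mediator's code; the function `same_individual`, the `KEYS` table, and the sample values are hypothetical.

```python
def same_individual(a, b, key_properties):
    """Decide whether two individuals from different resources coincide.

    They match only when a key is declared for their class and every key
    property is present and equal in both. Without a declared key the
    default is 'not the same'.
    """
    if not key_properties:
        return False
    return all(p in a and p in b and a[p] == b[p] for p in key_properties)

# Declared keys per ontology class (hypothetical). Rock has no key.
KEYS = {"Location": ("hasLatitude", "hasLongitude")}

loc_a = {"hasLatitude": 37.2, "hasLongitude": -80.4, "resource": "A"}
loc_b = {"hasLatitude": 37.2, "hasLongitude": -80.4, "resource": "B"}
rock_a = {"RockSampleID": 10001}
rock_b = {"RockID": 10001}

print(same_individual(loc_a, loc_b, KEYS.get("Location", ())))  # True
print(same_individual(rock_a, rock_b, KEYS.get("Rock", ())))    # False
```

The two rocks share the value 10001, but since no key is declared for Rock the check refuses to equate them, which is the conservative default the slide describes.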
41
The Architecture of GEON Semantic Mediator
Oracle
DB2
MySQL
SQL Server
PostgreSQL
PostGIS
Query Execution
Query Optimization
Query Planning
Internal Database
SQL Parser
Spatial SQL against federated schemas
Mediator JDBC Driver
SOQL Parser
Semantic Query Rewriter
SOQL
Ontology Reasoner
ODAL Processor
GUI
Portal or Application
OWL
ODAL
SOQL Processor
42
The Map Integration Architecture
43
Map Integration
44
Integration Engine's Information Model
  • PAKT (briefly)
  • Type extensibility of the mediator
  • Nested relational query language extended by tree
    and a restricted set of graph pattern operations
  • Construction operations important
  • Passive extensibility
  • Source more powerful than the mediator
  • Source exports a set of type-based optimization
    rules to the mediator
  • Active extensibility
  • Mediator extends its set of interpreted types
  • Ontology management
  • Ontological queries processed by a separate
    co-processor that interoperates with mediator
  • Query planner partitions the query into
    ontological and mediated query processors

45
Query Paradigms
  • What are the different kinds of queries
    scientists and applications pose to an integrated
    system?
  • Metadata-based file access
  • 21,038 raw image files per subject
  • 2.4 GB of raw image data per subject
  • 25 GB to 40 GB of processed image data per
    subject
  • 10 million slices of functional imaging data in
    Phase II
  • 7 Terabytes of image data for all of the Phase II
    analyses
  • (conservative estimate of 25 GB/subject)
  • Ontologically supported mediated queries
  • Find the most recent fMRI data of all patients with
    low scores in working memory tasks having
    volumetric changes of hippocampus over 10% in 2
    years
  • Keyword queries
  • fMRI working memory task hippocampus
  • Ontologically supported keyword queries
  • Associative searches

46
GEON SOQL (Simple Ontology Query Language)
  • Query single or integrated resources
  • via ontologies (i.e., high level logical views)
  • independent of any physical representation (i.e.,
    schemas)

47
Question: Find all seismic stations within 1
mile of railroads
SELECT X2.stationcode, X2.lat, X2.lon
FROM stationdatatable X2
WHERE <bounding box condition>
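
The slide leaves the bounding box condition unspecified. One plausible reading is that the rewriter first narrows candidate stations with a cheap rectangular filter around the railroad geometry and applies an exact distance test afterwards. The sketch below shows only that prefilter, under stated assumptions: the helper names, the 69 miles-per-degree approximation, and the sample coordinates are all hypothetical, and only the column names `lat` and `lon` and the alias `X2` come from the slide.

```python
import math

MILES_PER_DEG_LAT = 69.0  # rough approximation, adequate for a prefilter

def expanded_bbox(points, miles):
    """Bounding box of (lat, lon) points, grown by `miles` on each side."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    dlat = miles / MILES_PER_DEG_LAT
    # A degree of longitude shrinks toward the poles, so use the largest
    # |latitude| among the input points to keep the box generous.
    max_abs_lat = max(abs(min(lats)), abs(max(lats)))
    dlon = miles / (MILES_PER_DEG_LAT * math.cos(math.radians(max_abs_lat)))
    return (min(lats) - dlat, min(lons) - dlon, max(lats) + dlat, max(lons) + dlon)

def bbox_where_clause(alias, bbox):
    """Render the box as a parameterized SQL predicate and its parameters."""
    min_lat, min_lon, max_lat, max_lon = bbox
    sql = f"{alias}.lat BETWEEN ? AND ? AND {alias}.lon BETWEEN ? AND ?"
    return sql, (min_lat, max_lat, min_lon, max_lon)

# Hypothetical railroad segment given as (lat, lon) vertices.
rail = [(34.0, -118.0), (34.1, -117.9)]
bbox = expanded_bbox(rail, miles=1.0)
where_sql, params = bbox_where_clause("X2", bbox)
print("SELECT X2.stationcode, X2.lat, X2.lon FROM stationdatatable X2 WHERE " + where_sql)
```

Stations that pass this filter are only candidates; a station near a corner of the box can still be farther than 1 mile from the track, so an exact spatial test would follow.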

48
BIRN A Functional View of the Mediation Process
Planner Execution Engine
Query Expression (UCQ Nesting Grouping
Aggregate)
Pre-Executable Plan
Executable Plan
Flattening of Nested Queries
Post-processing aggregate
Execution Control
View Unfolding
Normalization to DNF
Result Building
Predicate Reordering (binding patterns maximal
chunk)
Result Reporting
Maximal Feasible Plan
Algebraic Plan
Cost/Selectivity-based Optimization
Pre-Executable Plan
49
View Definition and Query Language
  • Union of conjunctive queries
  • May contain function term
  • Expressed in XML Datalog with aggregated
    functions
  • Query: q(X, F(Y)) :- r1(X,Z), r2(Z,Y), where F(Y)
    is an aggregate function applied to the set of Y
    values and X are the group-by variables.
  • Planner and Executor translate this to
  • q(X,Y) :- r1(X,Z), r2(Z,Y)
  • q(X,W) :- F(gb(q(X,Y)))
  • Here gb is the group-by function; gb and the
    aggregate function F are pushed to the data source
    whenever possible, or evaluated at the mediator.
  • The query language allows nested queries: inner
    queries are assigned to intermediate variables
    that are used by the main query
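
The two-step translation above (first the plain conjunctive query, then group-by with the aggregate) can be sketched over in-memory relations. This is a toy evaluation at the mediator, not the BIRN planner; the function names and sample tuples are assumptions, and the set-based result follows Datalog's set semantics, so duplicate (X, Y) pairs are collapsed before aggregation.

```python
from collections import defaultdict

def conjunctive_query(r1, r2):
    """q(X,Y) :- r1(X,Z), r2(Z,Y): join the relations on the shared Z."""
    by_z = defaultdict(list)
    for z, y in r2:
        by_z[z].append(y)
    return {(x, y) for x, z in r1 for y in by_z.get(z, [])}

def group_aggregate(q_rows, agg):
    """q(X,W) :- F(gb(q(X,Y))): group by X, then apply F to each group's Y values."""
    groups = defaultdict(list)
    for x, y in q_rows:
        groups[x].append(y)
    return {x: agg(ys) for x, ys in groups.items()}

# Hypothetical relations: r1(subject, scan) and r2(scan, volume).
r1 = [("s1", "a"), ("s1", "b"), ("s2", "b")]
r2 = [("a", 10), ("b", 20), ("b", 30)]

q = conjunctive_query(r1, r2)
print(group_aggregate(q, max))
print(group_aggregate(q, len))
```

When a source can evaluate the grouping itself, the second step would be shipped to it instead of being run here, as the slide notes.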

50
BIRN Mapping Relations
  • Ontology Mapping: maps data values from a source
    to an ontology term of a known ontology (e.g., UMLS)
  • Joinable: pairs attributes from different
    relations that can be joined
  • Value-Map: maps a mediator-supported data value to
    the source-supported value (for example, gender 0/1 at
    some source is male/female for the mediator)
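
A value map of the kind described above can be pictured as a per-source, per-attribute lookup that is applied in one direction when a query is sent to a source and in the other when results come back. The sketch below is an assumption about the shape of such a mapping, not BIRN's data structure; the source name `site_a` and the choice that 0 means male and 1 means female are hypothetical.

```python
# (source, attribute) -> {mediator value: source value}; contents are illustrative.
VALUE_MAP = {
    ("site_a", "gender"): {"male": 0, "female": 1},
}

def to_source(source, attribute, mediator_value):
    """Translate a mediator-level constant before pushing a query to a source."""
    mapping = VALUE_MAP.get((source, attribute), {})
    return mapping.get(mediator_value, mediator_value)

def to_mediator(source, attribute, source_value):
    """Translate a value returned by a source back to the mediator's vocabulary."""
    mapping = VALUE_MAP.get((source, attribute), {})
    inverse = {v: k for k, v in mapping.items()}
    return inverse.get(source_value, source_value)

print(to_source("site_a", "gender", "female"))   # 1
print(to_mediator("site_a", "gender", 0))        # male
print(to_source("site_b", "gender", "female"))   # female (no map registered)
```

Values without a registered mapping pass through unchanged, so sources that already use the mediator's vocabulary need no entry.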

51
Processing Ontological Queries
Courtesy Vadim Astakhov
52
PAKT Spatial and Taxonomic Queries
53
Example Queries
OBIS
OBIS
WOA
Geo-Spatial
Biological
Geo-Spatial
Biological
Physiochemical
Q1: Where is species X found?
OBIS(scientific_name,lat,long)
Q2: For a given polygon, what species are found?
OBIS(scientific_name,m_lat,m_long,m_lat,m_long)
Q3: Where is species X found given a certain
physical parameter?
OBIS(scientific_name,lat,long) WOA(physio,lat,long)
Q4: What are the aggregated physical properties
of species X?
OBIS(scientific_name,lat,long) WOA(physio,lat,long)
Italics: input. Underline: output.
OBIS
WOA
extended
Geo-Spatial
Geo-Spatial
Biological
Physiochemical
Benth_Hab
Habitat
Benth_Hab
Habitat
Q5 where is habitat X found?
Q7 where is habitat X found given certain
physical parameter?
CMECS(habitat,physio)
BH(habitat_grp,shape)
BH(habitat_grp,shape)
WOA(physio,lat,long)
CMECS(habitat,physio)
Q6 for a given polygon A, what habitats are
found?
CMECS(habitat,physio)
BH(habitat_grp,shape)
PolygonA
Q8 what are the aggregated physical properties
of habitat X?
BH(habitat_grp,shape)
WOA(physio,lat,long)
CMECS(habitat,physio)
Q9 what species can be found at habitat X?
CMECS(habitat,physio)
BH(habitat_grp,shape)
OBIS(scientific_name,lat,long)
Q10 what habitats is a species X found at ?
CMECS(habitat,physio)
BH(habitat_grp,shape)
OBIS(scientific_name,lat,long)
54
Frequent Query Patterns
  • Example queries are joins of
  • Left query patterns: habitat-spatial, and
  • Right query patterns: spatial-environmental/species
    distribution

BH(..,shape)
WOA(physio,lat,long)
(
)
PolygonA
BH(..,shape)
OBIS(scientific_name,lat,long)
BH(..,shape)
WOA(physio,lat,long)
BH(..,shape)
OBIS(scientific_name,lat,long)
Mediators queries
Onto-modules queries
API
55
The Resource Management Aspect of Query
Evaluation
node 5
DQP
  • Primarily done by the Manchester group (Watson et
    al)
  • Polar
  • Based on OQL (internally monoid comprehension)
  • Multi-node planning
  • Plan partitioning
  • Exchange operator
  • Attribute sensitivity
  • Data index repartitioning
  • Plan scheduling
  • Query execution

reduce
node 4
node 3
DQP
DQP
join (A1,B1)
join (A2,B2)
node 1
node 2
DQP
DQP
scan (A)
scan (B)
OGSA-DAI
OGSA-DAI
DBMS
DBMS
data
data
From Amy Krause
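
The plan in the figure (scans of A and B, two partitioned joins, and a final reduce) can be mimicked in a single process to show the role of the exchange operator. This is a simplified sketch and not the OGSA-DAI DQP or Polar code: the function names are assumptions, the "nodes" are just list partitions, and the join key is assumed to be the first field of each tuple.

```python
def exchange(rows, key_index, n_partitions):
    """Simplified exchange operator: route each row by hashing its join key."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row[key_index]) % n_partitions].append(row)
    return parts

def hash_join(left, right, left_key, right_key):
    """Local equi-join run independently on each partition pair."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    return [l + r for l in left for r in index.get(l[left_key], [])]

def partitioned_join(a_rows, b_rows, n_partitions=2):
    # scan(A) and scan(B) feed the exchange, which repartitions on the key.
    a_parts = exchange(a_rows, 0, n_partitions)
    b_parts = exchange(b_rows, 0, n_partitions)
    # join(A1,B1), join(A2,B2), ... would run on separate nodes.
    partial = [hash_join(a, b, 0, 0) for a, b in zip(a_parts, b_parts)]
    # reduce: merge the partial results (sorted here for a stable output).
    return sorted(row for part in partial for row in part)

A = [(1, "a1"), (2, "a2"), (3, "a3")]
B = [(1, "b1"), (3, "b3"), (3, "b3x"), (4, "b4")]
result = partitioned_join(A, B)
print(result)
```

Because both inputs are partitioned with the same hash function, matching rows always land in the same partition, so the union of the local joins equals the full join.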
56
The Adaptivity Issue in DQP on a Grid
  • Monitoring-Assessment-Response framework of
    adaptive query processing in a grid (by Gounaris)
  • Monitoring
  • a separate module that keeps track of information
    like
  • Has a resource (e.g., memory availability)
    changed more than 10%?
  • Has the data volume changed recently?
  • Occurs between operators or within an operator's
    execution process
  • Other modules subscribe to this notification
  • Assessment
  • Diagnosis is carried out for suboptimal
    execution, resource shortage, resource idleness,
    unmet performance requirements, unmet user needs
  • Response
  • Operator replacement or rescheduling, machine
    rescheduling, plan re-optimization
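
The monitoring-assessment-response loop can be sketched as a monitor that notifies subscribers when a resource reading drifts past a threshold, followed by a diagnosis and a chosen response. This is an illustration only and not the framework's implementation: the class and function names, the 10% default threshold, and in particular the mapping from each diagnosis to a response are assumptions.

```python
class Monitor:
    """Tracks resource readings and notifies subscribers on large changes."""

    def __init__(self, threshold=0.10):
        self.threshold = threshold
        self.baseline = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def observe(self, resource, value):
        base = self.baseline.get(resource)
        if base is None:
            self.baseline[resource] = value  # first reading sets the baseline
            return False
        if base == 0:
            changed = value != 0
        else:
            changed = abs(value - base) / abs(base) > self.threshold
        if changed:
            self.baseline[resource] = value
            for callback in self.subscribers:
                callback(resource, base, value)
        return changed

def assess(resource, old, new):
    """Toy diagnosis: less of a resource is a shortage, more is idleness."""
    return "resource shortage" if new < old else "resource idleness"

# Hypothetical policy linking a diagnosis to one of the listed responses.
RESPONSES = {
    "resource shortage": "plan re-optimization",
    "resource idleness": "operator rescheduling",
}

log = []

def on_change(resource, old, new):
    diagnosis = assess(resource, old, new)
    log.append((resource, diagnosis, RESPONSES[diagnosis]))

monitor = Monitor(threshold=0.10)
monitor.subscribe(on_change)
for reading in (1000, 950, 880, 870):
    monitor.observe("memory_mb", reading)
print(log)
```

Only the drop from 1000 to 880 exceeds the threshold relative to the baseline, so a single notification is produced for the four readings.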

57
Commonalities and Complementarities
  • Common themes
  • Overall architectural similarity of
    cyberinfrastructure projects
  • Service orientation
  • The data integration task is part of a larger
    scientific computing, exploration and analysis
    process
  • Has impact on integration setting, design
    decisions and performance expectations
  • Mediation with semantic mapping and reasoning
    seems to be winning
  • Complementary approaches
  • Details of the architecture
  • Relationship with workflows
  • Styles of mediation
  • Extensibility of mediator
  • Adaptivity of query planning and evaluation

58
Thank you!
  • Questions? Comments? Integrated Queries?