A System to Integrate Distributed Sources of Heterogeneous Scientific Information

SC2000, Dallas, TX.

1
A System to Integrate Distributed Sources of
Heterogeneous Scientific Information
  • Amarnath Gupta, SDSC
  • Bertram Ludaescher, SDSC
  • Maryann E. Martone, NCMIR
  • Ilya Zaslavsky, SDSC
  • University of California, San Diego

2
A Standard Mediator Architecture (MIX --
Mediation of Information using XML)
[Diagram, top to bottom: USER-Query <-> XML Q/A <->
INTEGRATED VIEW on the MIX MEDIATOR (driven by an XML
Integrated View Definition) <-> XML Q/A <-> three
Wrappers <-> Data Sources (Files, Lab1, Lab2, Lab3)]
3
Integration Issues
4
Integration Issues: Mediating across
Multiple-Worlds
  • Structural Integration
  • => common semistructured data model (XML)
  • => XML queries & transformations to resolve
    schema conflicts
  • Limited Query Capabilities
  • => mediator is aware of the query capabilities
    (QCs) exported by wrappers
  • ...
  • Semantic Integration
  • most work deals with issues for one-world
    scenarios (e.g., amazon.com and bn.com)
  • what if data comes from a multiple-world
    scenario (like Neuroscience), where data objects
    from different sources are not even similar, and
    only the hidden semantics (known to the domain
    expert) provides the semantic link?

5
A Neuroscience Question
What is the cerebellar distribution of rat
proteins with more than 70% homology with human
NCS-1? Any structure specificity? How about other
rodents?
[Diagram: Integrated View, Integrated View Definition,
and Mediator are each marked "???"; four Wrappers sit
over sources labeled Web, protein localization,
morphometry, neurotransmission, and CaBP, Expasy]
6
Hidden Semantics: Protein Localization
<protein_localization>
  <neuron type="purkinje cell" />
  <protein channel="red">
    <name>RyR</>
    ...
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</>
        <amount name="RyR">0</>
      </>
      <structure fraction="0.2">
        <name>branchlet</>
        <amount name="RyR">30</>
      </>
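
To make the "hidden semantics" point concrete, here is a minimal Python sketch that reads a fragment like the one above with the standard library. The closing tags that the slide abbreviates or omits have been filled in so that the parser accepts the text; that completed fragment, and the function name `amounts_by_structure`, are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Completed, well-formed version of the slide's fragment. The closing tags
# (and the single <region>) are assumptions added so the parser accepts it.
FRAGMENT = """
<protein_localization>
  <neuron type="purkinje cell"/>
  <protein channel="red"><name>RyR</name></protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>
    </density>
  </region>
</protein_localization>
"""


def amounts_by_structure(xml_text, protein):
    """Return {structure name: amount} for one protein label."""
    root = ET.fromstring(xml_text)
    result = {}
    for structure in root.iter("structure"):
        name = structure.findtext("name")
        for amount in structure.findall("amount"):
            if amount.get("name") == protein:
                result[name] = float(amount.text)
    return result


ryr = amounts_by_structure(FRAGMENT, "RyR")
print(ryr)
```

The parser recovers the structure, but "spine" and "branchlet" come back as plain strings. Nothing in the XML says how they relate to a Purkinje cell or to terms used by another source, which is the gap the rest of the talk addresses.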

7
Hidden Semantics: Morphometry
<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7" />
      <length>12.348</>
      <min_section>1.93</>
      <max_section>4.47</>
      <surface_area>9.884</>
      <volume>7.930</>
      <head>
        <width>4.47</>
        <length>1.79</>
      </head>
    </spine>

8
The Problem
  • Multiple Worlds Integration
  • compatible terms not directly joinable
  • complex, indirect associations among schema
    elements
  • unstated integrity constraints
  • Why not just use Ontologies?
  • typical ontologies associate terms along limited
    number of dimensions
  • What's needed?
  • a theory under which non-identical terms can be
    semantically joined
  • => lift mediation to the level of conceptual
    models (CMs)
  • => domain knowledge, ICs become rules over CMs
  • => Model-Based Mediation

9
XML-Based vs. Model-Based Mediation
XML Models
10
Extended Mediator Architecture
  • => Wrappers export Conceptual Models (CMs), i.e.,
    facts and rules for classes, relationships,
    ICs, ...
  • => Mediator imports CMs from sources, auxiliary
    knowledge bases, and domain maps (DMs)
  • => a generic conceptual model (GCM, a subset of
    F-logic), extensible via rules: common target CM
    language
  • => new CMs can be plugged in by specifying them
    in GCM F-logic rules
  • => prototype implementation in FLORA
  • global-as-view approach
  • compiler: F-logic => XSB-Prolog
  • top-down evaluation => virtual (demand-driven)
    views
  • external interfaces (XML, RDBs, DM
    visualization, ...)
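
The idea of a virtual, demand-driven view in a global-as-view setting can be illustrated with a small Python sketch. It is not the FLORA/XSB prototype: Python generators stand in for top-down evaluation, and the `Wrapper`, `Mediator`, and `spine_view` names, as well as the toy facts, are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Fact = Tuple[str, str, str]  # (object id, attribute, value)


class Wrapper:
    """Hypothetical source wrapper that counts how often it is contacted."""

    def __init__(self, name: str, fetch: Callable[[], Iterable[Fact]]):
        self.name = name
        self._fetch = fetch
        self.calls = 0

    def facts(self) -> Iterator[Fact]:
        # Generator body: nothing runs until the first fact is requested.
        self.calls += 1
        yield from self._fetch()


View = Callable[[Dict[str, Wrapper]], Iterator[tuple]]


class Mediator:
    def __init__(self) -> None:
        self.wrappers: Dict[str, Wrapper] = {}
        self.views: Dict[str, View] = {}

    def register(self, wrapper: Wrapper) -> None:
        self.wrappers[wrapper.name] = wrapper

    def define_view(self, name: str, rule: View) -> None:
        # Global-as-view: the integrated class is defined over the sources.
        self.views[name] = rule

    def query(self, name: str) -> Iterator[tuple]:
        return self.views[name](self.wrappers)


def spine_view(wrappers: Dict[str, Wrapper]) -> Iterator[tuple]:
    """Integrated 'spine' class: union of spine objects from all sources."""
    for wrapper in wrappers.values():
        for obj, attr, value in wrapper.facts():
            if attr == "kind" and value == "spine":
                yield (wrapper.name, obj)


lab1 = Wrapper("lab1", lambda: [("s1", "kind", "spine")])
lab2 = Wrapper("lab2", lambda: [("s2", "kind", "spine"),
                                ("b1", "kind", "branchlet")])

mediator = Mediator()
mediator.register(lab1)
mediator.register(lab2)
mediator.define_view("spine", spine_view)

calls_before = (lab1.calls, lab2.calls)   # no source contacted yet
answers = list(mediator.query("spine"))   # evaluation happens on demand
calls_after = (lab1.calls, lab2.calls)
print(calls_before, answers, calls_after)
```

Defining the view costs nothing; the sources are only contacted when a client actually consumes answers, which is the sense in which the view is virtual rather than materialized.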

11
Model-Based Mediator Architecture
[Diagram labels: USER/Client; CM (Integrated View);
Domain Map (DM); Integrated View Definition (IVD);
CM Plug-In; CM Queries & Results (exchanged in XML);
Logic API (capabilities)]
12
Definition of Integrated Views ...
  • XML-2-FL and CM-2-FL Translators

<!ELEMENT Studies (Study)>
<!ELEMENT Study (study_id, animal, experiments, experimenters)>
<!ELEMENT experiments (experiment)>
<!ELEMENT experiment (description, instrument, parameters)>

studyDB[studies =>> study].
study[study_id => string; animal => animal;
      experiments =>> experiment;
      experimenters =>> string].
  • Specification of Domain Knowledge
  • Subclasses
  • Rules
  • Integrity Constraints
  • Integrated View Definition

mushroom_spine :: spine.
S : mushroom_spine IF S : spine[head -> _; neck -> _].
ic1(S) : alert[type -> "invalid spine"; object -> S]
    IF S : spine[undef ->> {head, neck}].
protein_distribution(Protein, Organism, Brain_region,
                     Feature_name, Anatom, Value) IF
    I : protein_label_image[proteins ->> Protein;
        organism -> Organism;
        anatomical_structures ->>
            AS : anatomical_structure[name -> Anatom]],
    NAE : neuro_anatomic_entity[name -> Anatom;
        loccated_in ->> Brain_region],
    AS..segments..features[name -> Feature_name;
        value -> Value].
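
As a rough illustration of how a derived subclass and an integrity constraint behave as rules over a conceptual model, the following Python sketch mimics the first two kinds of rule on plain dictionaries. It assumes one reading of the slide: a spine with both a head and a neck counts as a mushroom spine, and a spine with neither defined raises an alert. The function names and the sample objects are hypothetical.

```python
def classify(spine):
    """Derived class membership for one spine object (a dict of attributes).

    Mirrors the subclass rule: every mushroom_spine is a spine, and a spine
    with both 'head' and 'neck' defined is classified as a mushroom_spine.
    """
    classes = {"spine"}
    if "head" in spine and "neck" in spine:
        classes.add("mushroom_spine")
    return classes


def ic1(spine_id, spine):
    """Integrity constraint: return an alert if head and neck are both undefined.

    Returning an alert object (instead of raising) follows the slide, where a
    violated constraint derives an 'alert' fact.
    """
    if "head" not in spine and "neck" not in spine:
        return {"type": "invalid spine", "object": spine_id}
    return None


# Hypothetical spine objects as a wrapper might export them.
spines = {
    "s1": {"head": 4.47, "neck": 1.2},
    "s2": {"head": 3.0},
    "s3": {},
}

classes = {sid: classify(s) for sid, s in spines.items()}
alerts = [a for sid, s in spines.items() if (a := ic1(sid, s)) is not None]
print(classes)
print(alerts)
```

Here only `s1` is classified as a mushroom spine and only `s3` triggers the alert; `s2` is a valid spine that is simply not a mushroom spine.
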
13
... Definition of Integrated Views (Multiple
Sources)
  • Creating Mediated Classes
  • Reasoning with Schema

animal[M => R] IF S : source, S.animal[M => R].
X[taxon -> T] IF X : PROLAB.animal[name -> N],
    words(N, [W1, W2 | _]),
    T : TAXON.taxon[genus -> W1; species -> W2].
union over all classes
At Mediator
subspecies :: species :: genus :: kingdom :: superkingdom
T : TR, TR :: TR1 IF T : TAXON.taxon[Taxon_Rank -> TR;
    Taxon_Rank1 -> TR1], Taxon_Rank :: Taxon_Rank1.
Class creation by schema reasoning
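
The two steps on this slide, linking a source animal to a taxon by the words of its name and then turning rank values into a class hierarchy, can be sketched in Python as follows. The rank list follows the slide; the taxon record, the function names, and the choice to represent derived facts as sets of pairs are assumptions for illustration.

```python
# Rank order as listed on the slide, lowest rank first.
RANK_ORDER = ["subspecies", "species", "genus", "kingdom", "superkingdom"]


def find_taxon(animal_name, taxa):
    """Link an animal to a taxon via the first two words of its name."""
    words = animal_name.split()
    if len(words) < 2:
        return None
    genus, species = words[0], words[1]
    for taxon in taxa:
        if taxon.get("genus") == genus and taxon.get("species") == species:
            return taxon
    return None


def derive_classes(taxon):
    """Schema reasoning: rank values become classes ordered by rank.

    Returns (instance_of, subclass_of): the taxon is an instance of every
    rank value it carries, and a lower-rank value is a subclass of every
    higher-rank value.
    """
    present = [taxon[rank] for rank in RANK_ORDER if rank in taxon]
    instance_of = set(present)
    subclass_of = {
        (lower, higher)
        for i, lower in enumerate(present)
        for higher in present[i + 1:]
    }
    return instance_of, subclass_of


# Illustrative taxon record; a real TAXON source would supply these.
taxa = [{"species": "norvegicus", "genus": "Rattus",
         "kingdom": "Metazoa", "superkingdom": "Eukaryota"}]

rat = find_taxon("Rattus norvegicus albino", taxa)
instance_of, subclass_of = derive_classes(rat)
print(sorted(subclass_of))
```

The point of the sketch is that the class hierarchy is not stored anywhere: it is computed from attribute names and values, so a query about "other rodents" can later be answered by walking classes rather than matching strings.
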
14
Model-Based Mediation with DOMAIN MAPS (DMs)
  • Semantic Road Maps for situating source data
  • => navigational aid (browsing source classes at
    the conceptual level)
  • => basis for integrated views across multiple
    worlds
  • => link points (concepts) and labeled arcs
    (roles)
  • => formal semantics (in FL and/or DLs)
  • Example: ANATOM DM
  • anatomical entities (concepts); is_a, has_a,
    overlaps, ... (roles)
  • => from syntactic equality to semantic joins

LINK(X,Y): X.zip = Y.zip
           X.addr in Y.zip
           X.zip overlaps Y.county
           ...
Integrated-CM(Z1,...) =
    get X1,... from Src1; get X2,... from Src2;
    LINK(Xi, Yj); Zj = CM-QL(X1,...,Y1,...)
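
A minimal Python sketch of the step from syntactic equality to a semantic join: two sources use different anatomical terms, so an equality join finds nothing, while a LINK predicate that consults a domain map does. The tiny `HAS_A` map, the source records, and the restriction of LINK to equality or a direct has_a arc are all assumptions made for this example.

```python
# Toy domain map: direct has_a arcs between anatomical concepts.
HAS_A = {
    ("purkinje cell", "dendrite"),
    ("dendrite", "branchlet"),
    ("branchlet", "spine"),
}

# Hypothetical records from two sources, each anchored at a concept.
morphometry = [{"id": "m1", "term": "branchlet"},
               {"id": "m2", "term": "purkinje cell"}]
localization = [{"id": "p1", "term": "spine"},
                {"id": "p2", "term": "dendrite"}]


def equality_join(src1, src2):
    """Classic join: objects match only if their terms are identical."""
    return [(x["id"], y["id"])
            for x in src1 for y in src2 if x["term"] == y["term"]]


def link(x, y, has_a):
    """LINK(X, Y): same term, or Y's concept is a direct part of X's."""
    return x["term"] == y["term"] or (x["term"], y["term"]) in has_a


def semantic_join(src1, src2, has_a):
    """Join driven by the domain map instead of string equality."""
    return [(x["id"], y["id"])
            for x in src1 for y in src2 if link(x, y, has_a)]


syntactic = equality_join(morphometry, localization)
semantic = semantic_join(morphometry, localization, HAS_A)
print(syntactic, semantic)
```

Swapping in a different LINK predicate (for example, one based on an overlaps role) changes which objects are joined without touching either source.
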
15
ANATOM Domain Map
ANATOM
16
ANATOM Domain Map with Registered Data
ANATOM DATA
17
Deductive Closure of has_a with tc(is_a)
ANATOM CLOSURE
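
One plausible reading of this closure, sketched in Python under stated assumptions: first take the transitive closure of is_a, let each concept inherit the has_a arcs of its ancestors, and then close has_a transitively. The slide does not spell out the exact rules, so both this reading and the toy arcs below are assumptions.

```python
def transitive_closure(pairs):
    """Naive fixpoint computation of the transitive closure of a relation."""
    closure = set(pairs)
    while True:
        new = {(a, d)
               for (a, b) in closure
               for (c, d) in closure
               if b == c} - closure
        if not new:
            return closure
        closure |= new


def has_a_closure(is_a, has_a):
    """Close has_a under inheritance along tc(is_a), then transitively."""
    is_a_star = transitive_closure(is_a)
    inherited = set(has_a) | {
        (sub, part)
        for (sub, sup) in is_a_star
        for (whole, part) in has_a
        if sup == whole
    }
    return transitive_closure(inherited)


# Toy arcs, not the actual ANATOM domain map.
IS_A = {("purkinje cell", "neuron")}
HAS_A = {("neuron", "dendrite"),
         ("dendrite", "spine"),
         ("cerebellum", "purkinje cell")}

closure = has_a_closure(IS_A, HAS_A)
print(sorted(closure))
```

With these arcs the closure contains derived facts such as cerebellum has_a spine, even though no source or map states it directly.
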
18
Example Query Evaluation (I)
  • Example: protein_distribution
  • given: organism, protein, brain_region
  • ANATOM DM:
  • recursively traverse the has_a_star paths under
    brain_region; collect all anatomical_entities
  • Source PROLAB:
  • join with anatomical structures and collect the
    value of attribute
    image.segments.features.feature.protein_amount
    where
    image.segments.features.feature.protein_name = protein
    and study_db.study.animal.name = organism
  • Mediator:
  • aggregate over all parents up to brain_region
  • report distribution
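
The three stages above (traverse the domain map, join with source data, aggregate upward) can be sketched in Python as follows. The domain map, the records, and the use of a plain sum as the aggregate are assumptions; the slide does not say which aggregate the mediator applies.

```python
from collections import defaultdict

# Toy domain map: concept -> direct parts (has_a).
HAS_A = {
    "cerebellum": ["purkinje cell"],
    "purkinje cell": ["dendrite"],
    "dendrite": ["branchlet"],
    "branchlet": ["spine"],
}

# Made-up records standing in for the PROLAB source.
RECORDS = [
    {"protein": "RyR", "organism": "rat", "anatom": "spine", "amount": 0.0},
    {"protein": "RyR", "organism": "rat", "anatom": "branchlet", "amount": 30.0},
    {"protein": "RyR", "organism": "mouse", "anatom": "branchlet", "amount": 12.0},
    {"protein": "other", "organism": "rat", "anatom": "spine", "amount": 5.0},
]


def descendants(region, has_a):
    """Domain map step: the region plus everything reachable via has_a."""
    seen, stack = [], [region]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(has_a.get(node, []))
    return seen


def protein_distribution(protein, organism, region, has_a, records):
    entities = descendants(region, has_a)

    # Source step: keep matching records located under the region.
    direct = defaultdict(float)
    for rec in records:
        if (rec["protein"] == protein and rec["organism"] == organism
                and rec["anatom"] in entities):
            direct[rec["anatom"]] += rec["amount"]

    # Mediator step: roll each value up to every enclosing entity.
    return {entity: sum(direct[d] for d in descendants(entity, has_a))
            for entity in entities}


distribution = protein_distribution("RyR", "rat", "cerebellum", HAS_A, RECORDS)
print(distribution)
```

Records for other organisms or proteins are filtered out in the source step, so only the rat RyR amounts contribute to the totals reported for each entity up to the cerebellum.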

19
Interactive Queries (I)
KIND
20
Example Query Evaluation (II)
"How does the parallel fiber output
(Yale/SENSELAB) relate to the distribution of
Ryanodine Receptors (UCSD/NCMIR)?"
  • @SENSELAB: X1 := select output from parallel
    fiber
  • @MEDIATOR: X2 := hang off X1 from Domain Map
  • @MEDIATOR: X3 := subregion-closure(X2)
  • @NCMIR: X4 := select PROT-data(X3,
    Ryanodine Receptors)
  • @MEDIATOR: X5 := compute aggregate(X4)

21
Interactive Queries (II)
KIND01
22
Resulting Sub DOMAIN MAP Browser
PROTLOC
23
Computed Protein Localization Data
PROTLOC
24
Client-Side Result Visualization (using AxioMap
Viewer: Ilya Zaslavsky)
PROTLOC-AxioMap
25
Summary & Outlook: Federation of Brain Data
PROTLOC
Result (XML/XSLT)
Result (VML)
ANATOM