Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Interoperability in Federated Information Systems

Description:

Mermaid. DDTS. Multibase, MRDSM, ADDS, IISS, Omnibase, ... Early 80s. Infoscopes, ... Query only, little attention to updates; extensive use of IR techniques ...

Slides: 57
Provided by: sue8168
1
Semantic Interoperability in Infocosm: Beyond
Infrastructural and Data Interoperability in
Federated Information Systems
Keynote Talk, International Conference on Interoperating
Geographic Systems (Interop97), Santa Barbara,
December 3-4, 1997
Amit Sheth
Large Scale Distributed Information Systems Lab, University of Georgia
http://lsdis.cs.uga.edu
Thanks: Vipul Kashyap, Kshitij Shah
2
Three perspectives
  • Information Integration Perspective: Distribution,
    Heterogeneity, Autonomy
  • Information Brokering Perspective: Data,
    Metadata, Semantic (Terminological, Contextual)
  • Vision Perspective: Connectivity, Computation,
    Information, Knowledge

3
Evolving targets and approaches in integrating
data and information: a personal perspective
Infocosm

Generation 3

Generation 2
Generation 1
4
Generation I
  • Data recognized as corporate resource -- leverage
    it!
  • Most data in structured databases (and the rest
    in files), different data models, transitioning
    from Network and Hierarchical to Relational DBMSs
  • Connectivity/access -- a major issue
  • Heterogeneity (system, modeling and schematic) as
    well as need to support autonomy posed main
    challenges
  • Support for corporate IS applications as the
    primary objective, update often required, data
    integrity important

5
Generation II
  • Significant improvements in computing and
    connectivity (standardization of protocols, public
    network, Internet/Web); remote data access taken
    as a given
  • Increasing diversity in data formats, with focus
    on variety of textual data and semi-structured
    documents (and lesser focus on structured data)
  • Many more data sources, diverse domains, but not
    necessarily better understanding of data
  • Use of data beyond traditional business
    applications -- mining, warehousing, marketing,
    commerce

6
Generation II
  • Query only, little attention to updates;
    extensive use of IR techniques
  • Focus shift from data to metadata: earlier,
    distribution applied to data only; now it also
    applies to metadata
  • Wrapper part of Mediator Architecture, Metadata
    component of Information Brokering Architecture
  • Early work on ontology support

Gio Wiederhold
7
Generation III
  • Increasing information overload
  • Changes in Web architecture: push, ...
  • Broader variety of content with increasing amount
    of visual information
  • Continued standardization related to Web for
    representational and metadata issues (MCF, RDF,
    XML) and distributed computing (CORBA, Java)
  • Not just metadata, logical correlation
  • Users demand simplicity, but complexities
    continue to rise

8
Generation III (contd)
  • Broader variety of users and applications, well
    beyond business and scientific uses (e.g.,
    focused marketing -- more than information on the
    web)
  • Not just data access, but decision support
    through data mining and information discovery,
    information fusion, information dissemination,
    knowledge creation and management, information
    management complemented by cooperation between
    the information system and humans

9
Generation I and Lessons from the Federated
Database Systems Research
10
Dimensions for interoperability and integration
Perspective used for Federated Databases
11
FDBS Schema Architecture
  • Model Heterogeneity: Common/Canonical Data
    Model, Schema Translation
  • Information Sharing while preserving Autonomy

schema integration
schema translation
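
A minimal sketch of the two steps named on this slide, assuming a toy canonical model: each component schema is first translated into common attribute records (schema translation), and declared correspondences then merge them into a federated schema (schema integration). The schemas, the type map, and the names `translate_relational`, `translate_hierarchical`, and `integrate` are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalAttribute:
    """One attribute expressed in the (assumed) common data model."""
    entity: str
    name: str
    type: str


# Hypothetical component schemas in two different data models.
RELATIONAL_SCHEMA = {"EMP": {"emp_no": "INTEGER", "emp_name": "VARCHAR"}}
HIERARCHICAL_SCHEMA = {"segment": "STAFF", "fields": [("ID", "NUM"), ("NAME", "CHAR")]}

# Assumed mapping from local type names to canonical type names.
TYPE_MAP = {"INTEGER": "int", "NUM": "int", "VARCHAR": "str", "CHAR": "str"}


def translate_relational(schema):
    """Schema translation: relational tables -> canonical attributes."""
    return [CanonicalAttribute(table, column, TYPE_MAP[sql_type])
            for table, columns in schema.items()
            for column, sql_type in columns.items()]


def translate_hierarchical(schema):
    """Schema translation: one hierarchical segment -> canonical attributes."""
    return [CanonicalAttribute(schema["segment"], name, TYPE_MAP[field_type])
            for name, field_type in schema["fields"]]


def integrate(component_schemas, correspondences):
    """Schema integration: group canonical attributes under federated names.

    Attributes without a declared correspondence are not exported, which is
    one simple way to let a component keep part of its schema private.
    """
    federated = {}
    for source, attributes in component_schemas.items():
        for attribute in attributes:
            target = correspondences.get((source, attribute.entity, attribute.name))
            if target is not None:
                federated.setdefault(target, []).append((source, attribute))
    return federated


components = {
    "db1": translate_relational(RELATIONAL_SCHEMA),
    "db2": translate_hierarchical(HIERARCHICAL_SCHEMA),
}
correspondences = {
    ("db1", "EMP", "emp_no"): "Employee.id",
    ("db2", "STAFF", "ID"): "Employee.id",
    ("db1", "EMP", "emp_name"): "Employee.name",
    ("db2", "STAFF", "NAME"): "Employee.name",
}
federated = integrate(components, correspondences)
```

The correspondences are written by hand here; deciding them is the hard, semantic part of integration, and the sketch does not attempt to automate it.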
12
Heterogeneity in FDBMSs
  • Database System
  • Semantic Heterogeneity
  • Differences in DBMS
  • data models (abstractions, constraints, query
    languages)
  • System level support (concurrency control,
    commit, recovery)

1980s
  • Operating System
  • file system
  • naming, file types, operation
  • transaction support
  • IPC

Communication
1970s
  • Hardware/System
  • instruction set
  • data representation/coding
  • configuration

13
Characterization of Schematic Conflicts in
Multidatabase Systems
14
Observations and Lessons Learnt
  • tightly coupled vs loosely coupled debate
  • good common data model debate
  • tightly coupled: harder to build, but can give
    better control over data sharing, provide more
    transparent access, and can possibly support
    update; lessons learned in schema integration can
    be reapplied in newer situations
  • loosely coupled: more flexible, but generally
    requires more user involvement

15
Retracing the path without learning from past
expeditions
  • Steps for transitioning from Data Marts to
    Warehouses
  • Create consistent dimensions in the data marts
  • Create a data warehouse data model and convert
    data marts to it
  • Go back and build an enterprise data warehouse,
    then convert data marts to the new common data
    model and architectures
  • The above is doomed to repeat past mistakes.
    Integrating metadata is not easy!

PC Week, November 24, 1997
16
(No Transcript)
17
Generation 1 concern: So far (schematically),
yet so near (semantically)!
Generation 3 concern: So near (schematically),
yet so far (semantically)!
18
Generation II and Generation III
19
Information Brokering: A Three-Level Approach
[Figure: three levels, read top down and bottom up]
  • Semantic (domain, application specific): Ontology
  • Metadata (content descriptions, intensional): Content
  • Data (heterogeneous types, media): Representation
Links between adjacent levels are labeled "used-by" and "abstracted-into".
Emphasis: from Gen. I to Gen. III
20
An Architecture for Information Brokering
INFORMATION BROKERING
Data Brokering (CORBA, HTTP, IIOP)
Information System 1
Information System N
21
Generation 2: Limited Types of Metadata, Extractors,
Mappers, Wrappers
22
Global/Enterprise Web Repositories
Generation 2
23
Gen.2
Junglee
24
  • Find Marketing Manager positions in a company
    that is within 15 miles of San Francisco and
    whose stock price has been growing at a rate of
    at least 25% per year over the last three years

Junglee, SIGMOD Record, Dec. 1997
25
Extractors
  • can automatically identify data/media type
  • can be extended at any time (pre-specified
    or parameterized routines)
  • can run at data source, metadata storage site or
    at IQ server
  • can run at pre-specified times or events, or on
    demand
  • can route metadata to appropriate metadatabase
    repositories
  • Extractors use agent and network computing (NC)
    technologies and are implemented in Perl/Java
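
The extractor behavior listed on this slide could be sketched roughly as follows: identify the media type, run whichever routines are registered for it, and route the resulting metadata to a matching repository. This is an illustration in Python rather than the Perl/Java of the original system. The registry, the routine names, and the in-memory `REPOSITORIES` are assumptions made for the example.

```python
import mimetypes

EXTRACTORS = {}  # media family ("text", "image", ...) -> list of routines
REPOSITORIES = {"text": [], "image": [], "other": []}  # stand-in metadatabases


def extractor(family):
    """Register a routine for a media family; new ones can be added at any time."""
    def register(routine):
        EXTRACTORS.setdefault(family, []).append(routine)
        return routine
    return register


@extractor("text")
def text_statistics(data):
    # Hypothetical routine: a trivial content-dependent property of text.
    return {"word-count": len(data.split())}


@extractor("image")
def image_statistics(data):
    # Hypothetical routine: only the raw size, since no image decoding is done here.
    return {"size-bytes": len(data)}


def identify_media_type(location):
    """Guess the media type from the location's file extension."""
    media_type, _encoding = mimetypes.guess_type(location)
    return media_type or "application/octet-stream"


def run_extractors(location, data):
    """Extract metadata for one object and route it to a repository."""
    media_type = identify_media_type(location)
    family = media_type.split("/")[0]
    metadata = {"location": location, "media-type": media_type}
    for routine in EXTRACTORS.get(family, []):
        metadata.update(routine(data))
    REPOSITORIES.get(family, REPOSITORIES["other"]).append(metadata)
    return metadata


text_record = run_extractors("lake_tahoe.txt", b"Lake Tahoe is a popular tourist spot")
```

Scheduling (pre-specified times, events, or on demand) and the choice of where the extractor runs are left out; `run_extractors` is simply the unit of work such a scheduler would call.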

26
A Classification of Metadata
  • Content Independent Metadata, e.g., creation-date,
    location, ...
  • Content Dependent Metadata, e.g., size, number of
    colors in an image
  • Content-(directly)based Metadata, e.g., inverted
    lists, doc vectors
  • Content-descriptive Metadata
  • Domain Independent (structural) Metadata,
    e.g., parse tree of a C program,
    HTML/SGML DTDs
  • Domain Specific Metadata, e.g., scale, coordinate,
    land-cover, relief (GIS Domain), area,
    population (Census Domain), concept descriptions
    from Domain Specific Ontologies

Move in this direction to tackle information
overload !!
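
To make the first three categories concrete, here is a small sketch that derives them for plain-text documents: content-independent values are supplied from outside, a content-dependent value is measured from the bytes, and content-based metadata (inverted lists, document vectors) is computed directly from the text. The sample documents, the `repo://` locations, and the function names are hypothetical. Domain-specific metadata is not shown, since it would need an ontology to draw terms from.

```python
from collections import Counter, defaultdict

# Hypothetical two-document collection.
DOCS = {
    "d1": "Lake Tahoe relief map",
    "d2": "Urban land cover near Lake Tahoe",
}


def tokenize(text):
    """Lowercase whitespace tokenization, deliberately simplistic."""
    return text.lower().split()


def describe(text, location, creation_date):
    """Group one document's metadata by the categories on the slide."""
    return {
        # Known without looking at the content at all.
        "content-independent": {"location": location, "creation-date": creation_date},
        # Depends on the content but does not describe what it says.
        "content-dependent": {"size": len(text.encode("utf-8"))},
        # Computed directly from the content.
        "content-based": {"doc-vector": Counter(tokenize(text))},
    }


def build_inverted_lists(docs):
    """Map each term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


inverted = build_inverted_lists(DOCS)
d1_metadata = describe(DOCS["d1"], "repo://text/d1", "1997-12-03")
```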
27
Query Processing and Information Requests
  • traditional queries based on keywords
  • attribute-based queries
  • content-based queries
  • 'high-level' information requests involving
    ontology-based, iconic, mixed-media, and
    media-independent information requests
  • user selected ontology, use of profile

Generation 2
Generation 3
E.g., Kabila's political activities (in all
media)
28
VisualHarness
29
Metadata Brokering in VisualHarness
30
VisualHarness - An Example
31
What else can Information Brokering do?
32
  • WWW
  • A confusing heterogeneity of media, formats
    (Tower of Babel)
  • Information correlation using physical (HREF)
    links at the extensional data level
  • Location dependent browsing of information using
    physical (HREF) links => User has to keep track
    of information content !!
  • WWW + Information Brokering
  • Domain Specific Ontologies as semantic
    conceptual views
  • Information correlation using concept mappings at
    the intensional concept level
  • Browsing of information using terminological
    relationships across ontologies => Higher level
    of abstraction, closer to user view of
    information !!

33
Ontologies for semantic interchange
  • Need for transcending local subject
    areas/domains => Design adaptable systems which
    adapt/adjust themselves in the face of
    vocabularies from different domains
  • Coordination and interrelation of models across
    domains. One approach => utilize terminological
    relationships across concepts in ontologies
  • Specification languages for ontologies
  • Description Logics, Rule-based Languages
  • Support for mechanisms for Coordination and
    Correlation, viz., representation and reasoning
    with terminological relationships
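
One way to picture "terminological relationships across concepts in ontologies" is a table of links between terms of different ontologies, which a broker consults to restate a term in another vocabulary. The sketch below assumes three relationship kinds (synonym, hyponym, hypernym). The ontology prefixes, the terms, and the function `related_terms` are invented for illustration and do not come from the talk.

```python
# Each entry reads: <left term> is a <relation> of <right term>.
# "hyponym" means narrower than; "hypernym" means broader than.
RELATIONSHIPS = [
    ("census:block", "synonym", "gis:region"),
    ("gis:urban-area", "hyponym", "landuse:developed-land"),
]

# How a relationship reads when followed in the opposite direction.
INVERSE = {"synonym": "synonym", "hyponym": "hypernym", "hypernym": "hyponym"}


def related_terms(term, target_ontology):
    """Find terms in target_ontology related to term.

    Returns (other_term, relation) pairs, where relation states how `term`
    relates to `other_term`.
    """
    prefix = target_ontology + ":"
    results = []
    for left, relation, right in RELATIONSHIPS:
        if left == term and right.startswith(prefix):
            results.append((right, relation))
        elif right == term and left.startswith(prefix):
            results.append((left, INVERSE[relation]))
    return results


exact = related_terms("census:block", "gis")
broader = related_terms("landuse:developed-land", "gis")
```

Only a synonym lets the broker substitute one term for another without changing the meaning of a request; substituting along a hyponym or hypernym link returns narrower or broader results than the user asked for.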

34
The InfoQuilt Project
http://lsdis.cs.uga.edu/infoquilt
35
Correlating Data on the Web today
  • <TITLE> A Scenic Sunset at Lake Tahoe </TITLE>
  • <p>
  • Lake Tahoe is a popular tourist spot and <A
    HREF="http://www1.server.edu/lake_tahoe.txt">some
    interesting facts</A> are available here. The
    scenic beauty of Lake Tahoe can be viewed in this
    photograph <center><IMG
    SRC="http://www2.server.edu/lake_tahoe.img"></center>

Correlation achieved by using physical links. Done
manually by user publishing the HTML document
36
MREF: Metadata Reference Link -- complementing HREF
  • Creating logical web through
  • Media Independent Metadata based Correlation

37
Metadata Reference Link (<A MREF >)
  • <A HREF=URL>Document Description</A>
  • physical link between document (components)
  • <A MREF KEYWORDS=<list-of-keywords>
    THRESH=<real>>Document Description</A>
  • <A MREF ATTRIBUTES(<list-of-attribute-value-pairs>
    )>Document Description</A>
  • <A MREF(<parameterized_routine(.)> Document
    Descriptionlt/Agt
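
A rough sketch of how an MREF anchor might be turned into a structured request. It assumes the syntax `KEYWORDS=... THRESH=... ATTRIBUTES(name op value ...)`, with keywords and attribute conditions separated by whitespace. The transcript has lost punctuation, so the exact original syntax is uncertain and this grammar is an assumption. The class `MetadataReference` and the function `parse_mref` are hypothetical names, and the parameterized-routine form is not handled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MetadataReference:
    keywords: list = field(default_factory=list)
    threshold: float = 0.0
    attributes: list = field(default_factory=list)  # (name, operator, value) triples
    description: str = ""


# The greedy spec group runs to the last ">" before the link text, so that
# comparison operators such as "population > 500" stay inside the spec.
# This handles one anchor per input string.
_ANCHOR = re.compile(r"<A\s+MREF\s+(?P<spec>.*)>(?P<text>[^<>]*)</A>",
                     re.IGNORECASE | re.DOTALL)
_KEYWORDS = re.compile(r"KEYWORDS=(.*?)(?=\s+(?:THRESH|ATTRIBUTES)\b|$)",
                       re.IGNORECASE | re.DOTALL)
_THRESH = re.compile(r"THRESH=([0-9]*\.?[0-9]+)", re.IGNORECASE)
_ATTRIBUTES = re.compile(r"ATTRIBUTES\s*\((.*)\)", re.IGNORECASE | re.DOTALL)
_TRIPLE = re.compile(r"([\w-]+)\s*(>=|<=|=|>|<)\s*([\w.-]+)")


def parse_mref(anchor):
    """Parse a single MREF anchor into a MetadataReference."""
    match = _ANCHOR.search(anchor)
    if match is None:
        raise ValueError("not an MREF anchor")
    spec = match.group("spec")
    reference = MetadataReference(description=match.group("text").strip())
    keywords = _KEYWORDS.search(spec)
    if keywords:
        reference.keywords = keywords.group(1).split()
    threshold = _THRESH.search(spec)
    if threshold:
        reference.threshold = float(threshold.group(1))
    attributes = _ATTRIBUTES.search(spec)
    if attributes:
        reference.attributes = _TRIPLE.findall(attributes.group(1))
    return reference


by_keyword = parse_mref(
    "<A MREF KEYWORDS=scenic waterfall mountain THRESH=0.9>"
    "information on scenic waterfalls</A>")
by_attribute = parse_mref(
    "<A MREF ATTRIBUTES(population > 500 area > 50 region-type=block)>"
    "can be viewed here</A>")
```

Unlike an HREF, the parsed request names no location. Resolving it against stored metadata is a separate step that can return objects of any media type.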

38
Correlation based on
Content-descriptive Metadata
Some interesting <A MREF KEYWORDS=scenic
waterfall mountain THRESH=0.9>information on
scenic waterfalls</A> is available here.
39
Correlation based on Content-based Metadata
Some interesting <A MREF KEYWORDS=scenic
waterfalls THRESH=0.9 ATTRIBUTES
(major-color=blue)> information on scenic
waterfalls</A> is available here.
height, width and size
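
A sketch of how a keyword-plus-attribute reference like the ones above might be resolved against stored metadata. The slides do not define THRESH, so the sketch assumes it is a minimum relevance score between 0 and 1, computed here as the fraction of requested keywords found in an object's keyword metadata. The records, the `repo://` locations, and the functions `relevance` and `resolve` are hypothetical.

```python
# Hypothetical metadata records for objects of different media types.
METADATA = [
    {"location": "repo://text/falls-guide", "media": "text",
     "keywords": {"scenic", "waterfall", "mountain"}, "attributes": {}},
    {"location": "repo://image/falls-photo", "media": "image",
     "keywords": {"scenic", "waterfall", "mountain"},
     "attributes": {"major-color": "blue"}},
    {"location": "repo://image/skyline", "media": "image",
     "keywords": {"urban", "skyline"}, "attributes": {"major-color": "grey"}},
]


def relevance(requested, available):
    """Assumed scoring: fraction of requested keywords present in the metadata."""
    requested = set(requested)
    if not requested:
        return 0.0
    return len(requested & available) / len(requested)


def resolve(keywords, threshold, attributes=None):
    """Return locations whose metadata satisfies the keyword and attribute parts."""
    attributes = attributes or {}
    hits = []
    for record in METADATA:
        if relevance(keywords, record["keywords"]) < threshold:
            continue
        if all(record["attributes"].get(name) == value
               for name, value in attributes.items()):
            hits.append(record["location"])
    return hits


any_media = resolve(["scenic", "waterfall", "mountain"], 0.9)
blue_only = resolve(["scenic", "waterfall", "mountain"], 0.9, {"major-color": "blue"})
```

The keyword-only request matches both the text and the image object, which is the media-independent correlation the MREF is meant to provide. Adding the `major-color` condition narrows the result to objects that carry that image-derived attribute.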
40
Metadata, Domain Specific Ontologies
Get the titles, authors, documents, maps
published by the United States Geological
Survey (USGS) about regions having a population
greater than 5000, area greater than 1000
acres, and a low density urban area land cover
Domain specific metadata: terms chosen from
domain specific ontologies
What is Metadata?
  • data/information about data
  • useful/derived properties of media
  • properties/relationships between objects
What are Ontologies?
  • collection of terms, definitions and their
    interrelationships
  • specification of a representational vocabulary
    for a shared domain of discourse
41
Repositories and the Media Types
Population Area
Boundaries
Land cover Relief
Image Features (image processing routines)
Regions (SQL)
Boundaries
TIGER/Line DB
Image/Map DB
Census DB
42
Domain Specific Correlation
  • Potential locations for a future shopping mall
    identified by all regions having a population
    greater than 500 and area greater than 50 sq ft
    having an urban land cover and moderate relief <A
    MREF ATTRIBUTES(population > 500 area > 50
    region-type = block land-cover = urban
    relief = moderate)>can be viewed here</A>
  • => media-independent relationships between domain
    specific metadata: population, area, land cover,
    relief
  • => correlation between image and structured data
    at a higher domain specific level as opposed to
    physical link-chasing in the WWW
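
The shopping-mall request can be sketched as a correlation across two repositories: structured census data queried with SQL, and image-derived features looked up per region. Here the census repository is an in-memory SQLite table and the image features are a precomputed dictionary standing in for image processing routines. The table layout, the region ids, the values, and the function `candidate_regions` are all assumptions.

```python
import sqlite3

# Stand-in for the structured (Census) repository.
census = sqlite3.connect(":memory:")
census.execute(
    "CREATE TABLE region (region_id TEXT PRIMARY KEY, region_type TEXT, "
    "population INTEGER, area REAL)")
census.executemany("INSERT INTO region VALUES (?, ?, ?, ?)", [
    ("r1", "block", 800, 120.0),
    ("r2", "block", 300, 90.0),
    ("r3", "block", 950, 75.0),
])

# Stand-in for metadata that image processing routines would extract from maps.
IMAGE_FEATURES = {
    "r1": {"land-cover": "urban", "relief": "moderate", "image": "maps/r1.img"},
    "r2": {"land-cover": "urban", "relief": "moderate", "image": "maps/r2.img"},
    "r3": {"land-cover": "forest", "relief": "steep", "image": "maps/r3.img"},
}


def candidate_regions(min_population, min_area, region_type, land_cover, relief):
    """Correlate structured and image-derived metadata on the region id."""
    rows = census.execute(
        "SELECT region_id FROM region "
        "WHERE population > ? AND area > ? AND region_type = ? "
        "ORDER BY region_id",
        (min_population, min_area, region_type))
    results = []
    for (region_id,) in rows:
        features = IMAGE_FEATURES.get(region_id, {})
        if features.get("land-cover") == land_cover and features.get("relief") == relief:
            results.append((region_id, features["image"]))
    return results


matches = candidate_regions(500, 50, "block", "urban", "moderate")
```

The request mentions only domain terms (population, area, land cover, relief). Which repository and media type holds each term is decided inside `candidate_regions`, not by the author of the link.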

43
(No Transcript)
44
(No Transcript)
45
InfoQuilt Architecture (partial)
Media Independent Information Requests: Browsing
Collections, Keyword-based queries, Attribute-based
queries

Domain Knowledge
IQR Metadata Domain Knowledge Repository and
Registry
Correlation Server
KnowledgeBase
Parameterized Routines
Attr. Metadata
loc, type, author
Indices
InfoQuilt Server
Media and Domain specific Extractor Agents
Other InfoQuilt Servers
...
Wrapper
Wrapper
Wrapper
Text, Image, Audio, Video media repositories
46
What next (after comprehensive use of metadata) ?
  • Context, context, context
  • Semantic Proximity
  • domain
  • context
  • modeling/abstraction/representation
  • state
  • Characterizing Loss of Information incurred due
    to differences in vocabulary

BIG challenge: identifying relationship
or similarity between objects of different media,
developed and managed by different persons and
systems
47
A Semantic Taxonomy
Semantic Proximity
Semantic Incompatibility
Semantic Resemblance
Semantic Relevance
Semantic Relationship
Semantic Equivalence
48
Tools to support semantics
profiles
ontologies
context
domain-specific metadata
49
Decision
Knowledge
Information
Cooperation
Data
Interoperability
Computing
Communication
Connectivity and Data Access
50
Interoperability in the 80s
Decision
System level interoperability like TCP/IP.
Standard communication channels, data exchange
formats, etc. Basic infrastructural work for
higher level interoperability.
Knowledge
Information
Data
Cooperation
Computing
Interoperability
Communication
Connectivity
HTTP, IIOP, TCP/IP
51
Interoperability in the 90s
Decision
Information level interoperability. Standards
evolve that go beyond connectivity and define
information standards. Systems start exchanging
metadata (MCF, RDF, ...).
Knowledge
Information
Data
Cooperation
Business Objects, CORBA, DCOM, EDI
Computing
Interoperability
Communication
Connectivity
52
Where we are headed
Semantic interoperability where systems share
ontologies and knowledge.
Knowledge
Information
Systems and humans can cooperate in decision
making and can generate new knowledge as a
collective entity.
Cooperation
Data
Computing
Interoperability
Communication
Connectivity
53
Cognition
Learning
Introspection
Heuristics
Deduction
Semantics
KNOWLEDGE
54
Cooperative Information Systems
Collaboration
Coordination
  • Video Conferencing
  • Whiteboarding
  • Application sharing
  • Scheduling
  • Workflow

Collective exploitation of complementary
technologies
Information Management
55
Infocosm
Knowledge
Cooperating Information Systems
Information Interoperability
Data
Computing
Communication
56
Summary
  • We have addressed many data level (schematic,
    representational, ...) issues so far
  • We are in a good position to solve additional
    issues at the metadata level: need to support
    domain-specific metadata and media-independent
    information requests, qualified by use of
    ontologies
  • some challenges remain, e.g., consistency of
    metadata

57
Agenda for Research
  • Interoperation not at systems level, but at
    informational and possibly knowledge level
  • traditional database and information retrieval
    solutions do not suffice
  • need to understand context; measures of
    similarities
  • Need to increase impetus on semantic level issues
    involving terminological and contextual
    differences, possibly perceptual or cognitive
    differences in future
  • information systems and humans need to cooperate,
    possibly involving coordination and
    collaborative processes

58
http://lsdis.cs.uga.edu
See publications on Metadata, Semantics, InfoHarness/InfoQuilt
amit@cs.uga.edu