Title: Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Interoperability in Federated
1 Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Interoperability in Federated Information Systems
Keynote Talk, International Conference on Interoperating Geographic Systems (Interop97), Santa Barbara, December 3-4, 1997
Amit Sheth, Large Scale Distributed Information Systems Lab, University of Georgia
http://lsdis.cs.uga.edu
Thanks: Vipul Kashyap, Kshitij Shah
2 Three perspectives
- Information Integration Perspective: Distribution, Heterogeneity, Autonomy
- Information Brokering Perspective: Data, Metadata, Semantic (Terminological, Contextual)
- Vision Perspective: Connectivity, Computation, Information, Knowledge
3 Evolving targets and approaches in integrating data and information: a personal perspective
Infocosm
Generation 3
Generation 2
Generation 1
4 Generation I
- Data recognized as a corporate resource -- leverage it!
- Most data in structured databases (and the rest in files), different data models, transitioning from Network and Hierarchical to Relational DBMSs
- Connectivity/access -- a major issue
- Heterogeneity (system, modeling and schematic) as well as the need to support autonomy posed the main challenges
- Support for corporate IS applications as the primary objective, update often required, data integrity important
5 Generation II
- Significant improvements in computing and connectivity (standardization of protocols, public network, Internet/Web); remote data access as a given
- Increasing diversity in data formats, with focus on a variety of textual data and semi-structured documents (and lesser focus on structured data)
- Many more data sources, diverse domains, but not necessarily better understanding of data
- Use of data beyond traditional business applications -- mining, warehousing, marketing, commerce
6 Generation II
- Query only, little attention to updates; extensive use of IR techniques
- Focus shift from data to metadata: earlier, distribution applied to data only; now it also applies to metadata
- Wrapper part of the Mediator Architecture, Metadata component of the Information Brokering Architecture
- Early work on ontology support
Gio Wiederhold
7 Generation III
- Increasing information overload
- Changes in Web architecture: push, ...
- Broader variety of content with increasing amount of visual information
- Continued standardization related to the Web for representational and metadata issues (MCF, RDF, XML) and distributed computing (CORBA, Java)
- Not just metadata, logical correlation
- Users demand simplicity, but complexities continue to rise
8 Generation III (contd)
- Broader variety of users and applications well beyond business and scientific uses (e.g., focused marketing -- more than information on the web)
- Not just data access, but decision support through data mining and information discovery, information fusion, information dissemination, knowledge creation and management, information management complemented by cooperation between the information system and humans
9 Generation I and Lessons from the Federated Database Systems Research
10 Dimensions for interoperability and integration
Perspective used for Federated Databases
11 FDBS Schema Architecture
- Model Heterogeneity: Common/Canonical Data Model, Schema Translation
- Information Sharing while preserving Autonomy
schema integration
schema translation
12 Heterogeneity in FDBMSs
- Database System
- Semantic Heterogeneity
- Differences in DBMS
- data models (abstractions, constraints, query languages)
- System level support (concurrency control, commit, recovery)
1980s
- Operating System
- file system
- naming, file types, operation
- transaction support
- IPC
Communication
1970s
- Hardware/System
- instruction set
- data representation/coding
- configuration
13 Characterization of Schematic Conflicts in Multidatabase Systems
14 Observations and Lessons Learnt
- tightly coupled vs loosely coupled debate
- good common data model debate
- tightly coupled: harder to build, but can give better control over data sharing, provide more transparent access, and can possibly support update; lessons learned in schema integration can be reapplied in newer situations
- loosely coupled: more flexible, but generally requires more user involvement
15 Retracing the path without learning from past expeditions
- Steps for transitioning from Data Marts to Warehouses
- Create consistent dimensions in the data marts
- Create a data warehouse data model and convert data marts to it
- Go back and build an enterprise data warehouse, then convert data marts to the new common data model and architectures
- The above is doomed to repeat past mistakes. Integrating metadata is not easy!
PC Week, November 24, 1997
17 Generation 1 concern: So far (schematically), yet so near (semantically)!
Generation 3 concern: So near (schematically), yet so far (semantically)!
18 Generation II and Generation III
19 Information Brokering: A Three-Level Approach
Top Down
Semantic (Domain, Application specific): Ontology
used-by
Metadata: Content (content descriptions, intensional)
abstracted-into
Data: Representation (heterogeneous types, media)
Bottom Up
Emphasis from Gen. I to Gen. III
20 An Architecture for Information Brokering
INFORMATION BROKERING
Data Brokering (CORBA, HTTP, IIOP)
Information System 1
Information System N
21 Generation 2: Limited Types of Metadata, Extractors, Mappers, Wrappers
22 Global/Enterprise Web Repositories
Generation 2
23 Gen. 2
Junglee
24
- Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years
Junglee, SIGMOD Record, Dec. 1997
25 Extractors
- can automatically identify data/media type
- can be extended at any time (pre-specified or parameterized routines)
- can run at data source, metadata storage site or at IQ server
- can run at pre-specified times or events, or on demand
- can route metadata to appropriate metadatabase repositories
- Extractors use agent and network computing (NC) technologies and are implemented in PERL/Java
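The extractor behavior listed above (identify the media type, run an extraction routine, route the metadata to a repository) could be sketched roughly as follows. This is only an illustration in Python, not the PERL/Java implementation the slide mentions: `register_extractor`, `extract_and_route`, and the in-memory `repositories` dictionary are hypothetical names, and detecting the media type from the file name with the standard `mimetypes` module is an assumption.

```python
import mimetypes
from collections import defaultdict

# Hypothetical registry: major media type -> extraction routine.
EXTRACTORS = {}

def register_extractor(media_type):
    """Decorator so that new extractors can be added at any time."""
    def wrap(fn):
        EXTRACTORS[media_type] = fn
        return fn
    return wrap

@register_extractor("text")
def extract_text(path, content):
    return {"size": len(content), "words": len(content.split())}

@register_extractor("image")
def extract_image(path, content):
    return {"size": len(content)}

# Hypothetical stand-in for metadatabase repositories, keyed by media type.
repositories = defaultdict(list)

def extract_and_route(path, content):
    """Identify the media type from the file name, extract, and route."""
    guessed, _ = mimetypes.guess_type(path)
    major = guessed.split("/")[0] if guessed else "unknown"
    extractor = EXTRACTORS.get(major)
    if extractor is None:
        return None  # no extractor registered for this media type
    metadata = {"location": path, "type": guessed, **extractor(path, content)}
    repositories[major].append(metadata)
    return metadata

record = extract_and_route("lake_tahoe.txt", "Lake Tahoe is a popular tourist spot")
```

Calling `extract_and_route` on demand corresponds to the "on demand" bullet; running it from a scheduler or an event handler would cover the "pre-specified times or events" case.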
26 A Classification of Metadata
- Content Independent Metadata: e.g. creation-date, location, ...
- Content Dependent Metadata: e.g. size, number of colors in an image
- Content-(directly)based Metadata: e.g. inverted lists, doc vectors
- Content-descriptive Metadata
  - Domain Independent (structural) Metadata: e.g. parse tree of a C program, HTML/SGML DTDs
  - Domain Specific Metadata: scale, coordinate, land-cover, relief (GIS Domain), area, population (Census Domain), concept descriptions from Domain Specific Ontologies
Move in this direction to tackle information overload!!
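One way to make this classification concrete is to tag each metadata item with its class, so that a broker can select, say, only the domain-specific items. The sketch below is an illustration under assumptions: `MetadataClass`, `MetadataItem`, `select`, and the sample values are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from enum import Enum

class MetadataClass(Enum):
    CONTENT_INDEPENDENT = "content-independent"   # e.g. creation-date, location
    CONTENT_DEPENDENT = "content-dependent"       # e.g. size, number of colors
    CONTENT_BASED = "content-(directly)based"     # e.g. inverted lists, doc vectors
    DOMAIN_INDEPENDENT = "content-descriptive, domain-independent"  # e.g. DTDs
    DOMAIN_SPECIFIC = "content-descriptive, domain-specific"        # e.g. land-cover

@dataclass(frozen=True)
class MetadataItem:
    name: str
    value: object
    kind: MetadataClass

def select(items, kind):
    """Return the metadata items that belong to one class."""
    return [m for m in items if m.kind is kind]

# Hypothetical metadata for one map image.
items = [
    MetadataItem("creation-date", "1997-12-03", MetadataClass.CONTENT_INDEPENDENT),
    MetadataItem("size", 48213, MetadataClass.CONTENT_DEPENDENT),
    MetadataItem("land-cover", "urban", MetadataClass.DOMAIN_SPECIFIC),
    MetadataItem("relief", "moderate", MetadataClass.DOMAIN_SPECIFIC),
]
domain_specific = select(items, MetadataClass.DOMAIN_SPECIFIC)
```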
27 Query Processing and Information Requests
- traditional queries based on keywords
- attribute-based queries
- content-based queries
- 'high-level' information requests involving ontology-based, iconic, mixed-media, and media-independent information requests
- user selected ontology, use of profile
Generation 2
Generation 3
E.g., Kabila's political activities (in all media)
28 VisualHarness
29 Metadata Brokering in VisualHarness
30 VisualHarness - An Example
31 What else can Information Brokering do?
32
- WWW
  - A confusing heterogeneity of media, formats (Tower of Babel)
  - Information correlation using physical (HREF) links at the extensional data level
  - Location dependent browsing of information using physical (HREF) links => User has to keep track of information content!!
- WWW + Information Brokering
  - Domain Specific Ontologies as semantic conceptual views
  - Information correlation using concept mappings at the intensional concept level
  - Browsing of information using terminological relationships across ontologies => Higher level of abstraction, closer to user view of information!!
33 Ontologies for semantic interchange
- Need for transcending local subject areas/domains => Design adaptable systems which adapt/adjust themselves in the face of vocabularies from different domains
- Coordination and interrelation of models across domains. One approach => utilize terminological relationships across concepts in ontologies
- Specification languages for ontologies
- Description Logics, Rule-based Languages
- Support for mechanisms for Coordination and Correlation, viz., representation and reasoning with terminological relationships
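The idea of using terminological relationships across ontologies might be sketched as a simple term rewrite from a user's vocabulary into a repository's vocabulary. Everything here is hypothetical: the `RELATIONSHIPS` table, the relationship names, and the rule that only a synonym counts as an exact rewrite are assumptions made for illustration, not a description of any system named in the talk.

```python
# Hypothetical terminological relationships from a user ontology to a
# repository ontology: source term -> list of (relationship, target term).
RELATIONSHIPS = {
    "urban-area": [("synonym", "urban")],
    "water-body": [("hypernym-of", "lake"), ("hypernym-of", "river")],
    "forest": [("hyponym-of", "vegetation")],
}

def translate(term):
    """Rewrite a term into the target vocabulary.

    Returns (target_terms, exact). A synonym is treated as an exact rewrite;
    replacing a term by narrower or broader terms is treated as approximate,
    which is an assumption of this sketch.
    """
    links = RELATIONSHIPS.get(term, [])
    synonyms = [target for rel, target in links if rel == "synonym"]
    if synonyms:
        return synonyms, True
    return [target for _, target in links], False

exact_terms, exact = translate("urban-area")
approx_terms, approx_exact = translate("water-body")
```

The `exact` flag is a crude placeholder for tracking how much meaning is lost when a request crosses vocabularies.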
34 The InfoQuilt Project
http://lsdis.cs.uga.edu/infoquilt
35 Correlating Data on the Web today
<TITLE> A Scenic Sunset at Lake Tahoe </TITLE>
<p>
Lake Tahoe is a popular tourist spot and <A HREF="http://www1.server.edu/lake_tahoe.txt">some interesting facts</A> are available here. The scenic beauty of Lake Tahoe can be viewed in this photograph <center><IMG SRC="http://www2.server.edu/lake_tahoe.img"></center>
Correlation achieved by using physical links. Done manually by user publishing the HTML document.
36 MREF: Metadata Reference Link -- complementing HREF
- Creating a logical web through media-independent, metadata-based correlation
37 Metadata Reference Link (<A MREF >)
- <A HREF="URL">Document Description</A>
- physical link between document (components)
- <A MREF KEYWORDS=<list-of-keywords> THRESH=<real>>Document Description</A>
- <A MREF ATTRIBUTES(<list-of-attribute-value-pairs>)>Document Description</A>
- <A MREF(<parameterized_routine(.)>)>Document Description</A>
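To illustrate how an MREF anchor could be turned into a structured information request, here is a minimal Python parser. The concrete syntax it accepts is an assumption of this sketch (the slide text lost its punctuation in extraction): `KEYWORDS=` followed by whitespace-separated keywords, `THRESH=` followed by a real number, and `ATTRIBUTES(...)` holding whitespace-separated `name operator value` triples. `MRef` and `parse_mref` are hypothetical names, and the parser does not handle a `<` operator inside the attribute list.

```python
import re
from dataclasses import dataclass, field

@dataclass
class MRef:
    keywords: list = field(default_factory=list)
    thresh: float = None
    attributes: list = field(default_factory=list)  # (name, operator, value)
    description: str = ""

# The specification runs up to the last '>' before the link text, so that a
# '>' used as a comparison operator inside ATTRIBUTES(...) is kept.
ANCHOR = re.compile(r"<A\s+MREF\s+(?P<spec>[^<]*)>(?P<text>[^<>]*)</A>", re.I)

def parse_mref(html):
    """Parse the first MREF anchor in html, or return None if there is none."""
    m = ANCHOR.search(html)
    if m is None:
        return None
    spec = m.group("spec")
    ref = MRef(description=m.group("text").strip())
    kw = re.search(r"KEYWORDS\s*=\s*(.*?)(?=\s*(?:THRESH|ATTRIBUTES|$))", spec, re.S)
    if kw:
        ref.keywords = kw.group(1).split()
    th = re.search(r"THRESH\s*=\s*([0-9]*\.?[0-9]+)", spec)
    if th:
        ref.thresh = float(th.group(1))
    at = re.search(r"ATTRIBUTES\s*\((.*)\)", spec, re.S)
    if at:
        tokens = at.group(1).split()
        ref.attributes = [tuple(tokens[i:i + 3]) for i in range(0, len(tokens) - 2, 3)]
    return ref

ref = parse_mref(
    "Some interesting <A MREF KEYWORDS=scenic waterfalls THRESH=0.9 "
    "ATTRIBUTES(major-color = blue)>information on scenic waterfalls</A> "
    "is available here."
)
```

An ordinary `<A HREF=...>` anchor is ignored by this parser, which matches the idea that MREF complements HREF rather than replacing it.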
38 Correlation based on Content-descriptive Metadata
Some interesting <A MREF KEYWORDS=scenic waterfall mountain THRESH=0.9>information on scenic waterfalls</A> is available here.
39 Correlation based on Content-based Metadata
Some interesting <A MREF KEYWORDS=scenic waterfalls THRESH=0.9 ATTRIBUTES(major-color = blue)>information on scenic waterfalls</A> is available here.
height, width and size
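The slides do not say how a keyword list and its THRESH value are evaluated. One simple reading, shown here purely as an assumption, scores each document by the fraction of requested keywords found in its keyword metadata and keeps the documents whose score reaches the threshold. `keyword_score`, `resolve`, and the sample `store` are hypothetical.

```python
def keyword_score(requested, doc_keywords):
    """Fraction of requested keywords found in a document's keyword metadata."""
    requested = {k.lower() for k in requested}
    if not requested:
        return 0.0
    found = requested & {k.lower() for k in doc_keywords}
    return len(found) / len(requested)

def resolve(requested, thresh, metadata_store):
    """Return locations whose score meets the threshold, best first."""
    scored = [(keyword_score(requested, kws), loc)
              for loc, kws in metadata_store.items()]
    return [loc for score, loc in sorted(scored, reverse=True) if score >= thresh]

# Hypothetical keyword metadata, independent of the media type of each item.
store = {
    "yosemite_falls.img": ["scenic", "waterfall", "mountain", "california"],
    "niagara.txt": ["waterfall", "scenic"],
    "downtown.img": ["urban", "skyline"],
}
hits = resolve(["scenic", "waterfall", "mountain"], 0.9, store)
```

Because the match is made against metadata rather than against a URL, the same link can resolve to text, images, or other media, and to new items as they are registered.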
40 Metadata, Domain Specific Ontologies
Get the titles, authors, documents, maps published by the United States Geological Survey (USGS) about regions having a population greater than 5000, area greater than 1000 acres, having a low density urban area land cover
domain specific metadata: terms chosen from domain specific ontologies
What is Metadata?
- data/information about data
- useful/derived properties of media
- properties/relationships between objects
What are Ontologies?
- collection of terms, definitions and their interrelationships
- specification of a representational vocabulary for a shared domain of discourse
41 Repositories and the Media Types
Population Area
Boundaries
Land cover Relief
Image Features (image processing routines)
Regions (SQL)
Boundaries
TIGER/Line DB
Image/Map DB
Census DB
42 Domain Specific Correlation
- Potential locations for a future shopping mall identified by all regions having a population greater than 500 and area greater than 50 sq ft having an urban land cover and moderate relief <A MREF ATTRIBUTES(population > 500 area > 50 region-type = block land-cover = urban relief = moderate)>can be viewed here</A>
- => media-independent relationships between domain specific metadata: population, area, land cover, relief
- => correlation between image and structured data at a higher domain specific level as opposed to physical link-chasing in the WWW
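A rough sketch of how such an attribute request might be resolved across repositories of different media: metadata for each region is merged from a census-style source and an image-derived source, and every condition is checked against the merged record. The repository contents, region ids, and the functions `parse_conditions` and `correlate` are hypothetical; the whitespace-separated `name operator value` form of the conditions is also an assumption.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

# Hypothetical metadata extracted from two repositories, keyed by region id.
census_db = {
    "block-17": {"population": 820, "area": 64, "region-type": "block"},
    "block-42": {"population": 310, "area": 90, "region-type": "block"},
}
image_db = {
    "block-17": {"land-cover": "urban", "relief": "moderate"},
    "block-42": {"land-cover": "urban", "relief": "steep"},
}

def parse_conditions(text):
    """Split 'name op value ...' into (name, op, value) triples."""
    tokens = text.split()
    triples = []
    for i in range(0, len(tokens) - 2, 3):
        name, op, raw = tokens[i:i + 3]
        value = float(raw) if raw.replace(".", "", 1).isdigit() else raw
        triples.append((name, op, value))
    return triples

def correlate(conditions, *repositories):
    """Return region ids whose merged metadata satisfies every condition."""
    region_ids = set().union(*(repo.keys() for repo in repositories))
    matches = []
    for rid in sorted(region_ids):
        merged = {}
        for repo in repositories:
            merged.update(repo.get(rid, {}))
        if all(name in merged and OPS[op](merged[name], value)
               for name, op, value in conditions):
            matches.append(rid)
    return matches

conditions = parse_conditions(
    "population > 500 area > 50 region-type = block "
    "land-cover = urban relief = moderate"
)
regions = correlate(conditions, census_db, image_db)
```

The request never names a file or a media type; the correlation between the structured data and the image data happens at the level of the domain-specific metadata terms.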
45 InfoQuilt Architecture (partial)
Media Independent Information Requests: Browsing Collections, Keyword-based queries, Attribute-based queries
Domain Knowledge
IQR Metadata Domain Knowledge Repository and
Registry
Correlation Server
KnowledgeBase
Parameterized Routines
Attr. Metadata
loc, type, author
Indices
InfoQuilt Server
Media and Domain specific Extractor Agents
Other InfoQuilt Servers
...
Wrapper
Wrapper
Wrapper
Text, Image, Audio, Video media repositories
46 What next (after comprehensive use of metadata)?
- Context, context, context
- Semantic Proximity
  - domain
  - context
  - modeling/abstraction/representation
  - state
- Characterizing Loss of Information incurred due to differences in vocabulary
BIG challenge: identifying relationship or similarity between objects of different media, developed and managed by different persons and systems
47 A Semantic Taxonomy
Semantic Proximity
Semantic Incompatibility
Semantic Resemblance
Semantic Relevance
Semantic Relationship
Semantic Equivalence
48 Tools to support semantics
profiles
ontologies
context
domain-specific metadata
49 Decision
Knowledge
Information
Cooperation
Data
Interoperability
Computing
Communication
Connectivity and Data Access
50 Interoperability in the 80s
Decision
System level interoperability like TCP/IP.
Standard communication channels, data exchange
formats, etc. Basic infrastructural work for
higher level interoperability.
Knowledge
Information
Data
Cooperation
Computing
Interoperability
Communication
Connectivity
HTTP, IIOP, TCP/IP
51 Interoperability in the 90s
Decision
Information level interoperability. Standards evolve that go beyond connectivity and define information standards. Systems start exchanging metadata (MCF, RDF, ...).
Knowledge
Information
Data
Cooperation
Business Objects, CORBA, DCOM, EDI
Computing
Interoperability
Communication
Connectivity
52 Where we are headed
Semantic interoperability where systems share ontologies and knowledge.
Knowledge
Information
Systems and humans can cooperate in decision making and can generate new knowledge as a collective entity.
Cooperation
Data
Computing
Interoperability
Communication
Connectivity
53 Cognition
Learning
Introspection
Heuristics
Deduction
Semantics
KNOWLEDGE
54 Cooperative Information Systems
Collaboration
Coordination
- Video Conferencing
- Whiteboarding
- Application sharing
Collective exploitation of complementary
technologies
Information Management
55 Infocosm
Knowledge
Cooperating Information Systems
Information Interoperability
Data
Computing
Communication
56 Summary
- We have addressed many data level (schematic, representational, ...) issues so far
- We are in a good position to solve additional issues using the metadata level: need to support domain-specific metadata and media-independent information requests, qualified by use of ontologies
- Some challenges remain, e.g., consistency of metadata
57 Agenda for Research
- Interoperation not at systems level, but at informational and possibly knowledge level
- traditional database and information retrieval solutions do not suffice
- need to understand context; measures of similarities
- Need to increase impetus on semantic level issues involving terminological and contextual differences, possibly perceptual or cognitive differences in future
- information systems and humans need to cooperate, possibly involving coordination and collaborative processes
58 http://lsdis.cs.uga.edu
See publications on Metadata, Semantics, InfoHarness/InfoQuilt
amit@cs.uga.edu