Title: Reflections on "The Challenge of Environmental Data Interoperability on the Global Information Grid"
1. Reflections on "The Challenge of Environmental Data Interoperability on the Global Information Grid" by Dobey and Eirich (05S-SIW-133)
- Dr. Dale D. Miller
- Geo-Spatial Technologies, Inc.
- Seattle, WA
- dmiller_at_gsti3d.com
- Dr. Paul A. Birkel
- The MITRE Corporation
- McLean, VA
- pbirkel_at_mitre.org
2. Dobey / Eirich Tenets
- Paradigm shift: TPED -> TPPU
- Interposed mediating software layer and LDM for data conversion between schemas
  - LDM to consist of atomic concepts
- Three-schema architecture
TPED: Task-Process-Exploit-Disseminate
TPPU: Task-Post-Process-Use
LDM: Logical Data Model
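The mediating-layer tenet can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Dobey / Eirich design: the schema names (SCHEMA_A, SCHEMA_B), their field names, and the LDM concept names are all hypothetical. Each external schema supplies only a mapping to the LDM, and any schema-to-schema conversion passes through that hub.

```python
# Hypothetical external schemas; each maps its own field names to LDM concept names.
TO_LDM = {
    "SCHEMA_A": {"riv_width_m": "WIDTH", "riv_name": "NAME"},
    "SCHEMA_B": {"breadth": "WIDTH", "label": "NAME"},
}


def to_ldm(schema, record):
    """Translate a record from an external schema into LDM terms."""
    mapping = TO_LDM[schema]
    return {mapping[field]: value for field, value in record.items()}


def from_ldm(schema, ldm_record):
    """Translate an LDM record into an external schema by inverting its mapping."""
    inverse = {concept: field for field, concept in TO_LDM[schema].items()}
    return {inverse[concept]: value for concept, value in ldm_record.items()}


def convert(source, target, record):
    """Convert between two external schemas via the mediating layer."""
    return from_ldm(target, to_ldm(source, record))


def mapping_counts(n_schemas):
    """Directed mappings needed: pairwise n*(n-1) versus 2*n through a hub."""
    return n_schemas * (n_schemas - 1), 2 * n_schemas
```

With ten schemas the pairwise approach needs 90 directed mappings and the hub approach needs 20, which is the practical appeal of the interposed layer. The sketch assumes every field has an exact LDM counterpart, which is the assumption the rest of this presentation questions.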
3. Outline
- Recent Relevant Efforts
- Mappings and Semantic Nuances
- Losslessness and Correctness
- Normalization: the Silver Bullet?
- Dynamic Semantic Content Update Implications of the GIG
- Conclusions and Recommendations
4. Recent Relevant Efforts
- U.S. Army Geospatial Data Integrated Master Plan (AGDIMP)
- Geospatial Intelligence Database Integration (GIDI)
- Multilateral Interoperability Programme (MIP) C2IEDM
- GDI FACT REDM, DREDM and the Dobey / Eirich Mediating Layer Environmental Data Model (MLEDM)
5. U.S. Army Geospatial Data Integrated Master Plan (AGDIMP)
- Developed in preparation for the FCS and the Joint Geospatial Enterprise Services (J-GES) Initial Capabilities Document (ICD)
- Vision
  - Soldier as a sensor with a two-way data flow between the area of operations (AO) and geospatial databases at data repositories
- Key recommendations
  - Joint geospatial data dictionary
  - Standard ontology
  - Integrated, end-to-end geospatial process
  - Real-time, on-site geospatial data updates broadly and rapidly disseminated
  - Brilliant push dissemination
  - Machine readable metadata
  - Procedures and standards for fusion, conflation, filtering, transformation, etc.
6. Similarities and Differences: AGDIMP vs. Dobey / Eirich
- Both utilize TPPU paradigm
- Both utilize 3-schema architecture and mediating layer
  - Implicit in AGDIMP
- AGDIMP envisions geospatial data repositories based upon emerging NSG standards
  - The AGDIMP is the Army's implementation of the National System for Geospatial-Intelligence Geospatial Transition Plan (GTP). The GTP provides an overarching vision, concept of operations, and an implementation plan to assist the NSG in providing for global geospatial readiness and responsiveness that is current, accurate, relevant, and interoperable, and fully supports the Common Operational Picture.
- AGDIMP does not reduce environmental concepts to atomic forms
NSG: The National System for Geospatial-Intelligence
7. Geospatial Intelligence Database Integration (GIDI)
- Three-schema architecture to integrate existing NGA databases and production tools
- Fielded in 2002
- Continuous use by GIG-based Operations community via NGA Gateway
- Evolved to meet Homeland Security and other requirements
- Being integrated into the Geospatial-Intelligence Knowledge Base (GKB) in early-FY06
- The Geospatial Intelligence Feature Database (GIFD) serves the role of a MLEDM
- Supports data exchange between ESRI-based and Intergraph-based geospatial data environments through mappings and a common datastore
- Data dictionary is FACC Ed. 2.1 with US National extensions
- No atomic data elements or canonical forms for environmental concepts
8. Multilateral Interoperability Programme (MIP) C2IEDM
- Actual implementation of a three-schema architecture
- Here, the C2IEDM serves the role of a MLEDM
- Producers and consumers are national C2 Information Systems (C2IS)
- No attempt at atomistic reduction
- Semantic interoperability
- Proposal for a data access stack with multiple abstraction layers, access control and notification services
"In order to ensure true semantic interoperability, far-reaching modifications to the core of national C2ISs are necessary rather than just the addition of mapping adapters as new interfaces to the existing systems." -- M. Schmitt, Integration of the MIP C2IEDM into National Systems
9. GDI FACT REDM, DREDM and the MLEDM
- DREDM: an adjudicated union of EDMs
  - Losslessly supports representation of concepts of all constituents
  - Uses a common data dictionary
  - Approach taken by NGA GIDI
- REDM: an adjudicated intersection of EDMs
  - Intended to express the common semantics to support the closely-coupled deep interoperation of multiple systems
- Advantages of the atomic concept MLEDM over the DREDM
  - MLEDM very powerful if uniqueness of representation can be attained (an open question)
  - Issues remain with mapping composite concepts
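The union / intersection distinction between the DREDM and the REDM can be made concrete with plain sets. This is a minimal sketch under a strong simplifying assumption: each constituent EDM is reduced to a set of concept names that have already been adjudicated to a common data dictionary, so identical names mean identical concepts. The three EDMs and their contents are hypothetical.

```python
# Hypothetical constituent EDMs, each reduced to a set of adjudicated concept names.
EDM_A = {"RIVER", "BRIDGE", "ROAD"}
EDM_B = {"RIVER", "ROAD", "SWAMP"}
EDM_C = {"RIVER", "ROAD", "BRIDGE", "TUNNEL"}


def dredm(*edms):
    """Union: every concept of every constituent is representable."""
    return set().union(*edms)


def redm(*edms):
    """Intersection: only the concepts shared by all constituents."""
    return set.intersection(*edms)


def is_lossless_for(edm, model):
    """A model represents an EDM losslessly only if it contains all of its concepts."""
    return edm <= model
```

Under these assumptions the union is lossless for every constituent by construction, while the intersection is lossless only for a constituent that contributes nothing beyond the shared core. The adjudication step itself, deciding when two names denote the same concept, is the hard part and is not modeled here.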
10. Lessons Learned from Previous Efforts
- Army has established a technical vision and recommended policies to foster the interchange and interoperability of geospatial data in the context of the GIG and GES (i.e., the AGDIMP)
  - While advocating a Joint geospatial data dictionary and data model, it does not envision success as predicated on atomic decomposition
- The three-schema architecture has been implemented previously in the environmental domain and has aided in the interchange and interoperability of environmental data (e.g., the GIDI)
  - Successful while operating above the atomic level
- Rigorous normalization of a rich data model is a complex undertaking (e.g., the C2IEDM)
- No existing implementation (of which we are aware) has attempted to develop or leverage an environmental logical data model composed entirely of irreducible (atomic) conceptual elements
GIG: Global Information Grid
GES: GIG Enterprise Services
11. Mappings and Semantic Nuances
- Dobey and Eirich state:
  - "Another potential tradeoff exists in cases where use of a mediating layer does not provide for a lossless representational match for a source data item. In this case, there may be nuances of semantics that are lost in the translation."
- On mapping complexity, Schmitt states (C2IEDM context):
  - "The required mapping rules can be very complex in practice. In particular, this holds in cases in which there is no clear 1:1 mapping of concepts. For instance, n attributes of the ODB [operational data base] might have to be mapped onto m attributes in the C2IEDM where the attributes may be distributed over several entities."
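Schmitt's n-to-m case can be illustrated with a small sketch. All names here are hypothetical: the source attributes (position, observed) stand in for ODB attributes, and the target entities (LOCATION, REPORT) are illustrative placeholders, not actual C2IEDM entities. Two source attributes are split into four target attributes distributed over two entities.

```python
def map_record(source):
    """Map 2 source attributes onto 4 target attributes in 2 target entities.

    Assumes 'position' is 'lat,lon' in decimal degrees and 'observed' is an
    ISO-style 'date' + 'T' + 'time' string; both formats are assumptions.
    """
    lat_text, lon_text = source["position"].split(",")
    date_text, time_text = source["observed"].split("T")
    return {
        "LOCATION": {"latitude": float(lat_text), "longitude": float(lon_text)},
        "REPORT": {"date": date_text, "time": time_text},
    }
```

Even in this toy case the mapping rule embeds knowledge about encodings that appears in neither schema's attribute list, which is one reason such rules become complex in practice.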
12. Nuance Examples Abound (especially for aggregate concepts)
13. Losslessness and Correctness
- Assertion: All lossless mappings are correct, but the converse is false
  - RIVER -> WATERCOURSE is correct but lossy
  - RECYCLING_SITE -> AB010 Wrecking Yard/Scrap Yard is incorrect
- Dobey / Eirich analogy
  - "Another analogy for this transfer might be a chemical reaction, where atoms contained in molecules from one or more substances are exchanged and reassembled into molecules of one or more different substances. The common interchange hub provides a mechanism wherein the disassembly and reassembly of data objects can take place."
- The concepts of losslessness and correctness elucidate flaws in the analogy
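One way to make the assertion precise is to model each concept by its extension, the set of real-world kinds it covers. This formalization is an assumption introduced for illustration, not a definition from Dobey and Eirich: a mapping is treated as correct when the source extension is contained in the target extension, and lossless when the two are equal. The extensions listed are hypothetical.

```python
# Hypothetical extensions: the kinds of real-world thing each concept covers.
EXTENSION = {
    "RIVER": {"river"},
    "WATERCOURSE": {"river", "stream", "canal"},
    "RECYCLING_SITE": {"recycling site"},
    "AB010": {"wrecking yard", "scrap yard"},
}


def is_correct(source, target):
    """Correct: everything the source concept covers is also covered by the target."""
    return EXTENSION[source] <= EXTENSION[target]


def is_lossless(source, target):
    """Lossless: the target covers exactly what the source covers, no more and no less."""
    return EXTENSION[source] == EXTENSION[target]
```

Under this reading, lossless implies correct because set equality implies containment, while RIVER -> WATERCOURSE shows that a correct mapping can still discard the distinction between a river and a canal.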
14. Mapping Aggregate to Atomic Concepts
[Figure: FACC Composite Concept -> ? -> MLEDM Atomic Concepts]
- How does instance data actually map?
- Chemical reaction analogy fallacy
  - Molecules are the conjunction of their elements while aggregate environmental features are the disjunction of possible specific types
  - When two hydrogen atoms and one oxygen atom combine to form one H2O molecule, both sides of the equation still have three atoms: two hydrogen and one oxygen
  - But an aggregate feature instance is only one of the possible feature types comprising the feature type (concept) definition
  - There are simply no building blocks to disassemble and recombine
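The conjunction / disjunction contrast can be shown side by side. This is a minimal sketch: the subtype names WRECKING_YARD and SCRAP_YARD are hypothetical labels derived from the FACC AB010 name, and the functions are illustrative only.

```python
from collections import Counter


def disassemble_molecule(formula):
    """Conjunction: every listed atom is present in every molecule, so
    disassembly yields real parts that can be recombined."""
    return Counter(formula)


# Hypothetical subtype labels for the aggregate FACC concept AB010.
AGGREGATE_SUBTYPES = {"AB010": ("WRECKING_YARD", "SCRAP_YARD")}


def disassemble_feature(feature_code):
    """Disjunction: the result is a list of alternatives, not parts. A given
    instance is exactly one of them, and the code alone does not say which."""
    return AGGREGATE_SUBTYPES[feature_code]
```

Disassembling water always yields three atoms that are all really there. "Disassembling" AB010 yields two candidates of which only one applies to any given instance, so there is nothing to recombine on the other side of the mapping.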
15. Normalization: the Silver Bullet?
- While there are rigorous definitions of nth normal forms in relational database theory, an intuitive description is:
  - Every expressible semantic can be reduced to a unique canonical form
- This is a stated goal of the C2IEDM; however, Schmitt states:
  - "The MIP community is continually improving the model but there will always be some unresolved problems."
- Is normalizing a logical data model of fine grained, atomic environmental concepts tractable?
  - Take, e.g., the atomic concept of RIVER
16. John Sowa's Example: river and stream vs. fleuve and rivière
- In English, size is the feature that distinguishes river from stream
- In French, a fleuve is a river that flows into the sea, and a rivière is either a river or a stream that flows into another river
- Life experiences color our interpretation of words, and, no matter how precise and rigorous the definitions in an environmental data dictionary, not everyone will agree on all nuances of their meanings
- Reducing a rich environmental data model to normal form is a monumental task which may not even be possible
- Objective determination of the atomicity of environmental concepts may not be possible
Many issues will require thought and adjudication
by thoughtful people to arrive at a good solution
for a particular context. But there is probably
no perfect one.
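The river / fleuve example can be worked through by treating each language's term as a function of the watercourse's properties. This is a sketch under stated assumptions: size is reduced to a boolean, and a small watercourse that flows into the sea is left unclassified in French (None) because the definitions quoted above do not cover it.

```python
from itertools import product
from typing import Optional


def english_term(is_large: bool) -> str:
    """English distinguishes by size."""
    return "river" if is_large else "stream"


def french_term(is_large: bool, flows_into_sea: bool) -> Optional[str]:
    """French distinguishes by destination; None where the quoted definitions are silent."""
    if flows_into_sea:
        return "fleuve" if is_large else None
    return "rivière"


CASES = list(product((True, False), repeat=2))  # (is_large, flows_into_sea)


def french_candidates(english: str) -> set:
    """French terms consistent with an English term."""
    return {
        french_term(large, sea)
        for large, sea in CASES
        if english_term(large) == english and french_term(large, sea) is not None
    }


def english_candidates(french: str) -> set:
    """English terms consistent with a French term."""
    return {english_term(large) for large, sea in CASES if french_term(large, sea) == french}
```

"river" maps to both French terms and "rivière" maps to both English terms, so neither vocabulary can serve as the canonical form for the other without adding attributes (size, destination) that the original term never recorded.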
17. Dynamic Semantic Content Update Implications of the GIG
- NCOW tenet: Allow communication of all information of interest to all interested parties all the time
  - What about a new semantic concept?
  - Must be incorporated in MLEDM
- Example
  - Consumer declares interest in geospatial feature SHOE_FOOTPRINT with attributes MANUFACTURER and MODEL and respective enumerants BRUNO_MAGLI and LORENZO
  - If not in MLEDM, how does consumer specify requirement?
- Autonomic updates to MLEDM judged infeasible for foreseeable future
  - Human in the loop at a central hub of the data interchange design process
  - Contrary to design philosophy of the GIG, but no available alternatives
Autonomic: "An approach to self-managed computing systems with a minimum of human interference. The term derives from the body's autonomic nervous system, which controls key functions without conscious awareness or involvement." IBM Corp, Autonomic Computing Glossary, http://www.research.ibm.com/autonomic/glossary.html
http://www.cnn.com/US/9701/23/shoe.sales/
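The human-in-the-loop consequence can be sketched as a concept registry in which unknown concepts are queued rather than accepted automatically. The class, its method names, and the status strings are hypothetical and do not describe any GIG or GES service. Only the SHOE_FOOTPRINT example comes from the slide.

```python
class ConceptRegistry:
    """Toy stand-in for an MLEDM: approved concepts plus a queue awaiting review."""

    def __init__(self, concepts):
        self.approved = dict(concepts)  # concept name -> attribute definitions
        self.pending = {}               # proposals awaiting a human analyst

    def declare_interest(self, concept, attributes):
        """A consumer asks for a concept; unknown concepts cannot be used yet."""
        if concept in self.approved:
            return "subscribed"
        self.pending[concept] = attributes
        return "pending human review"

    def adjudicate(self, concept, accept):
        """A human analyst accepts or rejects a pending proposal."""
        attributes = self.pending.pop(concept)
        if accept:
            self.approved[concept] = attributes
        return accept
```

A usage example: declaring interest in SHOE_FOOTPRINT with MANUFACTURER and MODEL returns "pending human review" until an analyst calls adjudicate, after which the same declaration returns "subscribed". The delay between those two calls is the departure from the "all the time" tenet.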
18. Conclusions
- Three-schema architecture
  - Tried-and-true and also well established in the environmental domain
- Mappings from aggregate concepts to aggregate concepts
  - Inherently problematic, usually lossy and often incorrect
- Analogy between chemical recombinations and aggregate-level data model mappings
  - Fundamentally flawed
  - Chemical recombinations are represented by conjunctions of atomic elements
  - Aggregate-level environmental concepts are represented by disjunctions of their atomic building blocks (subtypes)
- Logical data model reduction to a maximally normalized (nth normal) form
  - Feasible in constrained (one might say artificial) domains
    - E.g., banking, inventory control, and shipping and receiving
  - Problematic for a real-world environmental data model
    - Ultimately suffers the ambiguities, redundancies and nuances of natural language
  - Significant programs (e.g., the MIP C2IEDM) have not yet demonstrated success
    - Unable to eliminate all ambiguities and redundancies
    - Unable (yet, anyway) to guarantee the mapping of an arbitrary real-world concept to a (unique) canonical form
- GIG tenet (all information of interest, to all interested parties, all the time)
  - When a producer or consumer needs a new semantic concept, can it autonomically publish its description and interested parties make immediate use of it?
  - We think not: human analysts will be employed in this regard for some time
19. Recommendations
- Basic tenets sound
  - Three-schema architecture, mediating layer LDM in normal form, and well constructed ontology
  - But no substitute for hard work by SMEs to work out and document the semantics of the domains of interoperating COIs
  - GES mediation services can implement the results, but hard work to reach cross-COI agreement remains the necessary precondition
- Aggregate concepts should continue to be included in EDMs
  - But with explicit, instance-based relationships to their more specific subtypes
- Further research and development investment is required to meet the environmental data interoperability challenge
  - Environmental data model atomization
  - COI-driven EDM mappings
  - Computational aspects of linguistics modeling
  - Data fusion and conflation
COI: Community of Interest
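The recommendation on aggregate concepts can be sketched as a feature instance that keeps its aggregate code and optionally records which specific subtype it is. This is one possible reading of "instance-based relationships", offered as an assumption. The class, field names, and subtype labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical subtype labels for the aggregate FACC concept AB010.
AGGREGATE_SUBTYPES = {"AB010": ("WRECKING_YARD", "SCRAP_YARD")}


@dataclass
class FeatureInstance:
    aggregate_code: str
    subtype: Optional[str] = None  # instance-level link to the specific subtype, if known

    def __post_init__(self):
        allowed = AGGREGATE_SUBTYPES[self.aggregate_code]
        if self.subtype is not None and self.subtype not in allowed:
            raise ValueError(f"{self.subtype} is not a subtype of {self.aggregate_code}")


def map_to_specific(instance):
    """Use the recorded subtype when there is one; otherwise keep the aggregate
    rather than guess among the alternatives."""
    return instance.subtype if instance.subtype is not None else instance.aggregate_code
```

When the producer records the subtype, a consumer that needs the specific concept gets it directly. When it does not, the mapping falls back to the aggregate instead of choosing one of the alternatives arbitrarily.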