Title: The Challenge of Environmental Data Interoperability on the Global Information Grid GIG
1 The Challenge of Environmental Data Interoperability on the Global Information Grid (GIG)
Virginia T. Dobey, SAIC/DMSO, Virginia.Dobey.CTR_at_dmso.mil, (703) 824-3411
Peter L. Eirich, JHU/APL, Peter.Eirich_at_jhuapl.edu, (240) 228-7264
2 The Emerging GIG Data Environment (Task, Post, Process and Use - TPPU)
3 GIG Policy: The TPPU Paradigm
(diagram obtained from http://ges.dod.mil/tppu.htm)
4 What are Warfighter Issues? Shifting Paradigms
- The adoption of a Net-Centric Data Enterprise
- It's not just a producer / user world anymore (now EVERYONE's a producer!)
- Consumers want access to data / information / knowledge immediately
- Consumers want input into how the data is manipulated/filtered
- Moving from a Collector / Product focus: Task, Process, Exploit and Disseminate
- To an Analyst / Data focus: Task, Post, Process and Use (share)

- Reliance on "Factory"
- Resource-intensive data download
- One (producer) to many (consumers)
- Bandwidth utilization / availability: not a consideration

- Moving to many-to-many topology
- Smart data ordering agents
- Sharing of information
- Immediate access to "Through-the-Sensor" data
- Bandwidth: critical to warfighters
5 GIG: Increasing the Interoperability Challenge
- Everyone is a potential producer
- Multiple legacy environmental data sources and user systems exist
  - Significant investment in existing production and user hardware and software
  - Data in multiple (often system-specific) formats need updating
  - Few data resources are reliably compatible, even those produced by the Government
  - Example: OAML product-specific formats
- "Power to the Edge" concept empowers user to identify other sources of required data
  - No requirement for common data syntax/semantics
  - Increases the challenge of data fusion
6 GIG Assumptions in Assessing Environmental Data Interoperability
- Traditional data producers will continue to provide data in producer-specific and product-specific formats following existing production guidelines, since those products and formats meet the general needs of most customers (users). Formats will continue to leverage producer standards such as the Joint METOC Conceptual Data Model and the Feature and Attribute Coding Catalog. Tailoring data to user requirements will remain a user responsibility.
- Users will need a data mediation capability that can access not only these traditional data sources but also non-traditional and often unknown data sources, such as commercial products (sometimes having proprietary formats) and streaming data from in-situ sensors (anticipated development using future technology), which can be identified and obtained over the GIG.
7 Barriers to Data Interoperability
- Data sources, models, and operational systems developed independently of each other
- Simulations not traditionally designed to interface with operational systems (and sometimes with each other!)
- Tailored (both in format and in content) datasets that are optimized for a specific system support only specific uses
Result: syntactically and semantically different forms of data representation are in use
8 Developing Interoperable Data
- "A data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data ... An implementation of a given data model is a physical realization on a real machine of the components of the abstract machine that together constitute that model ... the familiar distinction between logical and physical" [emphasis in the original] (C.J. Date [1])
- Logical Data Model: "A model of data that represents the inherent structure of that data and is independent of the individual applications of the data and also of the software or hardware mechanisms which are employed in representing and using the data." (DoD 8320.1-M [2])
- "Normalization leads to an exact definition of entities and data attributes, with a clear identification of homonyms (the same name used to represent different data) and synonyms (different names used to represent the same data). It promotes a clearer understanding and knowledge of what each data entity and data attribute means." (C. Finkelstein [3])

[1] Colleague of E.F. Codd, originator and developer of relational database theory
[2] DoD authority on information engineering
[3] Originator and main architect of the Information Engineering methodology
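
The homonym and synonym definitions quoted above can be illustrated with a minimal sketch. The element names, source systems, and definitions below are hypothetical and are not taken from any DoD data dictionary; matching definitions by exact string equality is a simplifying assumption, since real definitions would need human or semantic comparison.

```python
from collections import defaultdict

# Hypothetical data-element entries: (source system, element name, definition).
ELEMENTS = [
    ("system_a", "elevation", "height of terrain above mean sea level"),
    ("system_b", "elevation", "angle of a sensor above the horizon"),
    ("system_b", "terrain_height", "height of terrain above mean sea level"),
    ("system_c", "wind_speed", "speed of wind at 10 m above ground"),
]


def find_homonyms(elements):
    """Return names that are used with more than one distinct definition."""
    by_name = defaultdict(set)
    for _source, name, definition in elements:
        by_name[name].add(definition)
    return {name: sorted(defs) for name, defs in by_name.items() if len(defs) > 1}


def find_synonyms(elements):
    """Return definitions that are reached through more than one distinct name."""
    by_definition = defaultdict(set)
    for _source, name, definition in elements:
        by_definition[definition].add(name)
    return {d: sorted(names) for d, names in by_definition.items() if len(names) > 1}
```

In this sample, `elevation` is a homonym (two meanings) and `elevation` / `terrain_height` are synonyms (one meaning, two names); both conditions are what normalization is meant to expose.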
9 Normalization Challenges
- Users are familiar with non-normalized physical data elements. Tendency is to call these "logical" and stop there.
- In any large data model, normalization is difficult. It is often ignored (benign neglect).
- Complete data models incorporate business rules (how the entities relate to each other).
  - May not be needed for an implementation-independent model used to develop a data dictionary (of interoperable concepts), but
10 Achieving Data Interoperability: The Three-Schema Architecture
Internal schema
External schema
- User application views
- Converting user-specific data requirements into conceptual building blocks for data integration
Conceptual schema
- Logical data model building blocks are the basis for application data structures
- Also facilitates ingest of other source data
Normalized logical data model serves as conceptual design bridge from the external schema to and from the internal schema
11 The Three-Schema Architecture Applied to Environmental Data
Fusion of normalized data internal to the system
- User (production) applications
- CBRN,
- Weather effects,
- Terrain trafficability,
Normalized logical data model serves as
conceptual bridge
- Producer product formats
- METOC producer-specific formats,
- NGA product formats,
- JMCDM,
- FACC,
Allows for ingest of other source data
Implementation-independent middle layer can be
placed at the producer interface, user
interface or somewhere in between
12 Creating a Reusable Implementation-Independent Middle Layer
- Such an architectural layer must be:
  - Independent of source products
  - Independent of optimized system implementation
  - Capable of providing for the FULL SPECIFICATION of all source product data as well as all system data requirements
  - Developed as an implementation-independent (LOGICAL) relational data model, as required by DoDAF OV-7 Product view
13 A Reusable Middle Layer for Environmental Data
- Requires standardized terms in all environmental domains
  - Should leverage existing International/DoD standards
- Requires a concise, well-organized, non-redundant data structure
  - Must extend from a normalized logical data model
- Requires highly granular, independent data elements
  - Atomic level concepts
  - To support the many formats required by users (precise rendering of translations to and from the hub)
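
Translation "to and from the hub" can be sketched as follows. This is a minimal illustration, not the middle layer described in this briefing: the format names `fmt_a` and `fmt_b`, their field names, and the hub attribute `air_temperature_kelvin` are all hypothetical. Each format supplies one translator into the hub representation and one out of it, and any format-to-format translation is composed from those two steps.

```python
# Registries of translators: one into the hub and one out of it per format.
TO_HUB = {}
FROM_HUB = {}


def register(fmt, to_hub, from_hub):
    """Register the pair of translators for one (hypothetical) format."""
    TO_HUB[fmt] = to_hub
    FROM_HUB[fmt] = from_hub


def translate(record, src, dst):
    """Translate a record from format `src` to format `dst` by way of the hub."""
    hub_record = TO_HUB[src](record)
    return FROM_HUB[dst](hub_record)


# Hypothetical format A stores air temperature in degrees Fahrenheit as "temp_f".
register(
    "fmt_a",
    to_hub=lambda r: {"air_temperature_kelvin": (r["temp_f"] - 32.0) * 5.0 / 9.0 + 273.15},
    from_hub=lambda h: {"temp_f": (h["air_temperature_kelvin"] - 273.15) * 9.0 / 5.0 + 32.0},
)

# Hypothetical format B stores air temperature in degrees Celsius as "T".
register(
    "fmt_b",
    to_hub=lambda r: {"air_temperature_kelvin": r["T"] + 273.15},
    from_hub=lambda h: {"T": h["air_temperature_kelvin"] - 273.15},
)

# Neither format knows about the other; the hub attribute is the only shared term.
result = translate({"temp_f": 212.0}, "fmt_a", "fmt_b")
```

Adding a third format under this scheme means writing two translators for that format alone, with no change to `fmt_a` or `fmt_b`.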
14 A Complete Representation: All Environmental Domains
15 A Concise Non-Redundant Data Structure
- Must address format as well as content
- Format
  - Must handle the large number of required data representation formats while preserving consistency of data (the "fair fight" across the federation)
- Content
  - Must be based on atomic data elements from a normalized logical data model (support for data fusion)
16 Many Formats of Environmental Data
Tabular data
17 And More Formats: Algorithmic/Model Support and Output Data
18 And Even More: Five-D Data Visualization
19 Some M&S Additions to the Set of Environmental Data Formats
- Compact Terrain Data Base (proprietary)
- DTED (product)
- ES GDF (proprietary)
- ES S1000 (proprietary)
- GeoTIFF
- Gridded raster
- MultiGen (proprietary)
- Shapefile (proprietary)
- Terrex DART, Terra Vista (proprietary)
- Vector Product Format (product)
20 Atomic Level Concepts
- To facilitate precise rendering of translations to and from the hub
- Producers use their own coding systems, each of which captures specific desired information, some of which may be captured by others, and some of which may be unique. Almost always each producer carries information not available from other sources. Extracting information embedded in definitions through explicit statement of atomic attributes assists in adding attributes without overwriting the object.
21 The Value of Atomic-Level Attributes: An Example
- Entity: Bridge over river
- Entity: Suspension bridge
- Entity: Bridge for two-way traffic
- Decomposed:
  - Bridge: located over water body = river
  - Bridge: bridge type = suspension
  - Bridge: traffic carried = vehicular; number of traffic directions = 2
- Results in:
  - Bridge: located over water body = river
  - bridge type = suspension
  - traffic carried = vehicular
  - number of traffic directions = 2
  - (each of these attributes can be changed/updated as new information is acquired)
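
The bridge example can be sketched in code. This is an illustrative assumption about representation only: the attribute keys below are hypothetical spellings of the attributes named on this slide, and a plain dictionary stands in for the entity.

```python
def merge_atomic(reports):
    """Combine atomic attribute reports about one object into a single record.

    Later reports add new attributes or update existing ones; attributes
    they do not mention are left untouched.
    """
    merged = {}
    for report in reports:
        merged.update(report)
    return merged


# Three producers each describe the same bridge with their own atomic attributes.
reports = [
    {"located_over_water_body": "river"},
    {"bridge_type": "suspension"},
    {"traffic_carried": "vehicular", "number_of_traffic_directions": 2},
]
bridge = merge_atomic(reports)

# New information arrives about one attribute only (hypothetical update).
updated = merge_atomic([bridge, {"number_of_traffic_directions": 1}])
```

Because each attribute is independent, the update changes `number_of_traffic_directions` without overwriting the bridge type, the water body, or the traffic carried. With compound entities such as "suspension bridge" or "bridge over river", the same update would require replacing the whole object.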
22 Complete and Accurate: Does That Mean Data Fusion?
- Is the COP affected by METOC conditions? If so, can those effects be reflected in actual changes to the COP on the user system? This can be handled internally to the system without requiring data fusion capability.
- Does the user need to derive useful or critical information from the interaction of METOC/terrain data and information in the COP and provide it to other systems? The answer to this question determines whether data fusion is required by the user.
- Will the warfighter integrate environmental data into operational problems or will he use them as map or other overlays? The answer to this question determines whether data fusion is required by the user and allowed by the producer.
- Does the user need to have the ability to update METOC conditions and effects as reported by data from other (e.g., intel, foreign forces, etc.) battlefield sources? The answer to this question determines whether data fusion is required by the user.
23 The Result of Improper Data Fusion: An M&S Example
What works for one system creates unusual behaviors in another
24 The Challenge of Data Fusion
- What is the total set of requirements?
  - There are many processes and products involved (some of which, as in ArcInfo/ArcView terrain products, may be proprietary), but the exchange mechanism must be independent of these. While we may know all of the currently available sources, will there ever be new ones available to the warfighter?
- Different views of the environment
  - Air, land, sea, space
  - Spatial location and orientation (coordinate system and datum)
- Lack of underlying environmental framework
  - No integrated reference model available
  - Representation (how the concept will be depicted on the user's system: a visual object? 2D or 3D? A data point? Background data for algorithm use?)
- Naming/semantics
  - Existing data models are conceptual, future models which are non-integrated and don't address current data repositories and data interchange requirements
Business
Technical
25 THE TRADITIONAL SOLUTION: Direct Mapping
- PROs
  - Makes each data user organization responsible for its own data integration
  - Reduces cross-organizational funding debates
  - Simplifies justification/approval
- CONs
  - Places burden of integration on data users
  - Replication of efforts among different users
RESULT: A BIAS AGAINST TRANSLATION SOFTWARE
26 A GIG-ORIENTED SOLUTION: The Interoperable Middle Layer
- PROs
  - Facilitates reuse of data by multiple users
  - Facilitates integration of data from multiple sources
  - Increases sources of data available to each user
- CONs
  - Requires additional common infrastructure
  - Requires adherence to data standards
27 Why Not Let the Producers Handle it All?
This is the genesis of SEDRIS
28 Summary: The Middle-Layer Architecture Advantage
Direct mapping:
- Expensive and time consuming
- Often unreliable and non-interoperable
- Unique conversion needed for each source
- Increase in sources geometrically increases number of conversions
Middle layer:
- Significant reduction in conversion cost
- Higher reliability, interoperability, integration, and reduction of correlation error
- Common and open standards, tools, and software reuse
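
The difference in conversion counts can be made concrete with a small calculation. It rests on a simplifying assumption not stated in the briefing: that each of n formats must be convertible to every other format, with one one-way converter per ordered pair.

```python
def direct_converters(n):
    """One-way converters needed when every format maps directly to every other."""
    return n * (n - 1)


def hub_converters(n):
    """One-way converters needed when every format maps only to and from a hub."""
    return 2 * n


# Compare the two approaches as the number of formats grows.
for n in (3, 5, 10, 20):
    print(f"{n:>2} formats: direct={direct_converters(n):>3}  hub={hub_converters(n):>2}")
```

Under this assumption, adding one format to a set of n requires 2n new direct converters but only 2 new hub converters, which is where the cost reduction claimed for the middle layer comes from.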
29 For Further Information
- Virginia T. Dobey, SAIC/DMSO
- Virginia.Dobey.CTR_at_dmso.mil
- (703) 824-3411
- Peter L. Eirich, JHU/APL
- Peter.Eirich_at_jhuapl.edu
- (240) 228-7264
30 BACKUP SLIDES
31 Formal Definitions of the Normal Forms (1 of 2)
- 1st Normal Form (1NF)
  - Def: A table (relation) is in 1NF if:
    - 1. There are no duplicated rows in the table.
    - 2. Each cell is single-valued (i.e., there are no repeating groups or arrays).
    - 3. Entries in a column (attribute, field) are of the same kind.
  - Note: The order of the rows is immaterial; the order of the columns is immaterial.
  - Note: The requirement that there be no duplicated rows in the table means that the table has a key (although the key might be made up of more than one column, even, possibly, of all the columns).
- 2nd Normal Form (2NF)
  - Def: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key.
  - Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial dependencies."
- 3rd Normal Form (3NF)
  - Def: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.
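
The notion of a partial dependency in the 2NF definition can be illustrated with a minimal sketch. The observation table, its column names, and its composite key (`station_id`, `obs_time`) are hypothetical. Note the limitation: a dependency is a statement about the schema, so sample rows can refute a dependency but can only suggest, never prove, that one holds.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if, in these rows, equal values of the `lhs` columns always give the same `rhs` value."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key subset, column) pairs where a proper subset of the key determines a non-key column."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for column in non_key:
                if holds(rows, subset, column):
                    found.append((subset, column))
    return found


# Hypothetical observation table keyed on (station_id, obs_time).
rows = [
    {"station_id": "S1", "obs_time": "0600", "station_name": "Alpha", "temp_c": 11.0},
    {"station_id": "S1", "obs_time": "1200", "station_name": "Alpha", "temp_c": 15.5},
    {"station_id": "S2", "obs_time": "0600", "station_name": "Bravo", "temp_c": 9.0},
    {"station_id": "S2", "obs_time": "1200", "station_name": "Bravo", "temp_c": 11.0},
]
violations = partial_dependencies(
    rows, key=("station_id", "obs_time"), non_key=("station_name", "temp_c")
)
```

Here `station_name` depends on `station_id` alone, so the table is not in 2NF; moving station names into a separate table keyed on `station_id` removes the partial dependency.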
32 Formal Definitions of the Normal Forms (2 of 2)
- Boyce-Codd Normal Form (BCNF)
  - Def: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.
- 4th Normal Form (4NF)
  - Def: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.
- 5th Normal Form (5NF)
  - Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every join dependency in the table is a consequence of the candidate keys of the table.
- Domain-Key Normal Form (DKNF)
  - Def: A table is in DKNF if every constraint on the table is a logical consequence of the definition of keys and domains.
- Source: DATABASE-MANAGEMENT PRINCIPLES AND APPLICATIONS, Dr. Ronald E. Wyllys, The University of Texas at Austin, Austin, Texas, 78712-1276, http://www.gslis.utexas.edu/l384k11w/normover.html