The Challenge of Environmental Data Interoperability on the Global Information Grid (GIG)

1
The Challenge of Environmental Data
Interoperability on the Global Information Grid
(GIG)
Virginia T. Dobey, SAIC/DMSO
Virginia.Dobey.CTR_at_dmso.mil (703) 824-3411
Peter L. Eirich, JHU/APL
Peter.Eirich_at_jhuapl.edu (240) 228-7264
2
The Emerging GIG Data Environment (Task, Post,
Process and Use - TPPU)
3
GIG Policy: The TPPU Paradigm
(diagram obtained from http://ges.dod.mil/tppu.htm)
4
What are Warfighter Issues? Shifting Paradigms
  • The adoption of a Net-Centric Data Enterprise
  • It's not just a producer / user world anymore
    (now EVERYONE's a producer!)
  • Consumers want access to data / information /
    knowledge immediately
  • Consumers want input into how the data is
    manipulated/filtered
  • Moving from a Collector / Product focus: Task,
    Process, Exploit and Disseminate
  • To a ... Analyst / Data focus: Task, Post,
    Process and Use (share)

Reliance on Factory
Resource intensive data download
One (producer) to many (consumers)
Bandwidth utilization / availability - not a consideration
Moving to many-to-many topology
Smart data ordering agents
Sharing of information
Immediate access to Through-the-Sensor data
Bandwidth - critical to warfighters
5
GIG: Increasing the Interoperability Challenge
  • Everyone is a potential producer
  • Multiple legacy environmental data sources and
    user systems exist
  • Significant investment in existing production and
    user hardware and software
  • Data in multiple (often system-specific) formats
    need updating
  • Few data resources are reliably compatible, even
    those produced by the Government
  • Example: OAML product-specific formats
  • "Power to the Edge" concept empowers the user to
    identify other sources of required data
  • No requirement for common data syntax/semantics
  • Increases the challenge of data fusion

6
GIG Assumptions in Assessing Environmental Data
Interoperability
  • Traditional data producers will continue to
    provide data in producer-specific and
    product-specific formats following existing
    production guidelines, since those products and
    formats meet the general needs of most customers
    (users). Formats will continue to leverage
    producer standards such as the Joint METOC
    Conceptual Data Model and the Feature and
    Attribute Coding Catalog. Tailoring data to user
    requirements will remain a user responsibility.
  • Users will need a data mediation capability that
    can access not only these traditional data
    sources but also non-traditional and often
    unknown data sources such as commercial products
    (sometimes having proprietary formats) and
    streaming data from in-situ sensors (anticipated
    development using future technology) which can be
    identified and obtained over the GIG.

7
Barriers to Data Interoperability
  • Data sources, models, and operational systems
    developed independently of each other
  • Simulations not traditionally designed to
    interface with operational systems (and sometimes
    with each other!)
  • Tailored (both in format and in content) datasets
    that are optimized for a specific system support
    only specific uses

Result: syntactically and semantically different
forms of data representation are in use
8
Developing Interoperable Data
  • A data model is an abstract, self-contained,
    logical definition of the objects, operators, and
    so forth, that together constitute the abstract
    machine with which users interact. The objects
    allow us to model the structure of data ... An
    implementation of a given data model is a
    physical realization on a real machine of the
    components of the abstract machine that together
    constitute that model ... the familiar distinction
    between logical and physical (emphasis in the
    original)
    C.J. Date [1]
  • Logical Data Model: A model of data that
    represents the inherent structure of that data
    and is independent of the individual applications
    of the data and also of the software or hardware
    mechanisms which are employed in representing and
    using the data. DoD 8320.1-M [2]
  • Normalization leads to an exact definition of
    entities and data attributes, with a clear
    identification of homonyms (the same name used to
    represent different data) and synonyms (different
    names used to represent the same data). It
    promotes a clearer understanding and knowledge of
    what each data entity and data attribute means.
    C. Finkelstein [3]

[1] Colleague of E.F. Codd, originator and developer of relational database theory
[2] DoD authority on information engineering
[3] Originator and main architect of the Information Engineering methodology
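
The homonym and synonym distinction in the normalization quote above can be illustrated with a minimal Python sketch. The system names, element names, and definitions are hypothetical, and matching definitions by exact string is a simplifying assumption; real data dictionaries would need a much more careful comparison.

```python
from collections import defaultdict

# Hypothetical data-dictionary entries: (system, element name, definition).
ENTRIES = [
    ("weather_system", "temp", "air temperature in kelvin"),
    ("terrain_system", "temp", "temporary structure flag"),
    ("weather_system", "wind_spd", "wind speed in metres per second"),
    ("ocean_system", "wind_velocity", "wind speed in metres per second"),
]


def find_homonyms(entries):
    """Return names that are used with more than one definition."""
    by_name = defaultdict(set)
    for _system, name, definition in entries:
        by_name[name].add(definition)
    return {n: sorted(defs) for n, defs in by_name.items() if len(defs) > 1}


def find_synonyms(entries):
    """Return definitions that appear under more than one name."""
    by_definition = defaultdict(set)
    for _system, name, definition in entries:
        by_definition[definition].add(name)
    return {d: sorted(names) for d, names in by_definition.items() if len(names) > 1}


if __name__ == "__main__":
    print("homonyms:", find_homonyms(ENTRIES))
    print("synonyms:", find_synonyms(ENTRIES))
```

In this sketch "temp" is flagged as a homonym (one name, two meanings), while "wind_spd" and "wind_velocity" are flagged as synonyms (two names, one meaning).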
9
Normalization Challenges
  • Users are familiar with non-normalized physical
    data elements. The tendency is to call these
    "logical" and stop there.
  • In any large data model, normalization is
    difficult. It is often ignored (benign neglect).
  • Complete data models incorporate business rules
    (how the entities relate to each other).
  • May not be needed for an implementation-independent
    model used to develop a data dictionary (of
    interoperable concepts), but ...

10
Achieving Data Interoperability: The Three-Schema
Architecture
Internal schema
External schema
User application views
Converting user-specific data requirements into
conceptual building blocks for data integration
Conceptual schema
Logical data model building blocks are the basis
for application data structures
Also facilitates ingest of other source data
Normalized logical data model serves as
conceptual design bridge from the external
schema to and from the internal schema
11
The Three-Schema Architecture Applied to
Environmental Data
Fusion of normalized data internal to the system
  • User (production) applications
  • CBRN,
  • Weather effects,
  • Terrain trafficability,

Normalized logical data model serves as
conceptual bridge
  • Producer product formats
  • METOC producer-specific formats,
  • NGA product formats,
  • JMCDM,
  • FACC,

Allows for ingest of other source data
Implementation-independent middle layer can be
placed at the producer interface, user
interface or somewhere in between
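
The bridging role of the middle layer described above might be sketched as follows. This is an illustration under stated assumptions, not the presenters' design: the two producer record layouts, the hub attribute names, and the user view are all invented, and none of the field names come from JMCDM, FACC, or any NGA product.

```python
# Hypothetical producer-specific records (stand-ins for producer formats).
producer_a_record = {"TMP_K": 288.15, "WSPD_MS": 5.0}
producer_b_record = {"air_temp_c": 15.0, "wind_kts": 9.7}


def from_producer_a(rec):
    """Translate producer A's layout into the neutral hub representation."""
    return {
        "air_temperature_kelvin": rec["TMP_K"],
        "wind_speed_m_per_s": rec["WSPD_MS"],
    }


def from_producer_b(rec):
    """Translate producer B's layout (Celsius, knots) into the hub representation."""
    return {
        "air_temperature_kelvin": rec["air_temp_c"] + 273.15,
        "wind_speed_m_per_s": rec["wind_kts"] * 0.514444,
    }


def to_user_view(hub):
    """Render hub data in one user application's preferred units."""
    return {
        "temp_f": round((hub["air_temperature_kelvin"] - 273.15) * 9 / 5 + 32, 1),
        "wind_mph": round(hub["wind_speed_m_per_s"] * 2.236936, 1),
    }


if __name__ == "__main__":
    for record, translate in ((producer_a_record, from_producer_a),
                              (producer_b_record, from_producer_b)):
        print(to_user_view(translate(record)))
```

The user view is written once against the hub attributes, so adding a third producer in this sketch would only require one more `from_...` translator.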
12
Creating a Reusable Implementation-Independent
Middle Layer
  • Such an architectural layer must be:
  • Independent of source products
  • Independent of optimized system implementation
  • Capable of providing for the FULL SPECIFICATION
    of all source product data as well as all system
    data requirements
  • Developed as an implementation-independent
    (LOGICAL) relational data model, as required by
    DoDAF OV-7 Product view

13
A Reusable Middle Layer for Environmental Data
  • Requires standardized terms in all environmental
    domains
  • Should leverage existing International/DoD
    standards
  • Requires a concise, well-organized, non-redundant
    data structure
  • Must extend from a normalized logical data model
  • Requires highly granular, independent data
    elements
  • Atomic level concepts
  • To support the many formats required by users
    (precise rendering of translations to and from
    the hub)

14
A Complete Representation: All Environmental
Domains
15
A Concise Non-Redundant Data Structure
  • Must address format as well as content
  • Format
  • Must handle the large number of required data
    representation formats while preserving
    consistency of data (the fair fight across the
    federation)
  • Content
  • Must be based on atomic data elements from a
    normalized logical data model (support for data
    fusion)

16
Many Formats of Environmental Data
Tabular data
17
And More Formats: Algorithmic/Model Support and
Output Data
18
And Even More: Five-D Data Visualization
19
Some M&S Additions to the Set of Environmental
Data Formats
  • Compact Terrain Data Base (proprietary)
  • DTED (product)
  • E&S GDF (proprietary)
  • E&S S1000 (proprietary)
  • GeoTIFF
  • Gridded raster
  • MultiGen (proprietary)
  • Shapefile (proprietary)
  • Terrex DART, Terra Vista (proprietary)
  • Vector Product Format (product)

20
Atomic Level Concepts
  • To facilitate precise rendering of translations
    to and from the hub
  • Producers use their own coding systems, each of
    which captures specific desired information, some
    of which may be captured by others, and some of
    which may be unique. Almost always each producer
    carries information not available from other
    sources. Extracting information embedded in
    definitions through explicit statement of atomic
    attributes assists in adding attributes without
    overwriting the object.

21
The Value of Atomic-Level Attributes: An Example
  • Entity: Bridge over river
  • Entity: Suspension bridge
  • Entity: Bridge for two-way traffic
  • Decomposed:
  • Bridge: located over water body = river
  • Bridge: bridge type = suspension
  • Bridge: traffic carried = vehicular, number of
    traffic directions = 2
  • Results in:
  • Bridge: located over water body = river
  • bridge type = suspension
  • traffic carried = vehicular
  • number of traffic directions = 2
  • (each of these attributes can be changed/updated
    as new information is acquired)
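
The bridge example above can be sketched in a few lines of Python. The dictionary layout and the `merge_attributes` helper are assumptions made for illustration; only the attribute names and values come from the slide.

```python
# Three source descriptions of the same bridge, each already decomposed
# into atomic attribute/value pairs (names follow the slide).
observations = [
    {"located_over_water_body": "river"},
    {"bridge_type": "suspension"},
    {"traffic_carried": "vehicular", "number_of_traffic_directions": 2},
]


def merge_attributes(base, update):
    """Return a copy of base with the attributes in update added or replaced."""
    merged = dict(base)
    merged.update(update)
    return merged


# Fuse the three descriptions into a single Bridge object.
bridge = {}
for obs in observations:
    bridge = merge_attributes(bridge, obs)

# A later (hypothetical) report changes one attribute; the others are untouched.
bridge_updated = merge_attributes(bridge, {"number_of_traffic_directions": 1})

if __name__ == "__main__":
    print(bridge)
    print(bridge_updated)
```

Because each attribute is atomic, the update replaces a single value instead of overwriting a composite entity such as "Bridge for two-way traffic".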

22
Complete and Accurate: Does That Mean Data
Fusion?
  • Is the COP affected by METOC conditions? If so,
    can those effects be reflected in actual changes
    to the COP on the user system? This can be
    handled internally to the system without
    requiring data fusion capability.
  • Does the user need to derive useful or critical
    information from the interaction of METOC/terrain
    data and information in the COP and provide it to
    other systems? The answer to this question
    determines whether data fusion is required by the
    user.
  • Will the warfighter integrate environmental data
    into operational problems or will he use them as
    map or other overlays? The answer to this
    question determines whether data fusion is
    required by the user and allowed by the producer.
  • Does the user need to have the ability to update
    METOC conditions and effects as reported by data
    from other (e.g., intel, foreign forces, etc.)
    battlefield sources? The answer to this question
    determines whether data fusion is required by the
    user.

23
The Result of Improper Data Fusion: An M&S
Example
What works for one system creates unusual
behaviors in another
24
The Challenge of Data Fusion
  • What is the total set of requirements?
  • There are many processes and products involved
    (some of which, as in ArcInfo/ArcView terrain
    products, may be proprietary), but the exchange
    mechanism must be independent of these. While we
    may know all of the currently available sources,
    will there ever be new ones available to the
    warfighter?
  • Different views of the environment
  • Air, land, sea, space
  • Spatial location and orientation (coordinate
    system and datum)
  • Lack of underlying environmental framework
  • No integrated reference model available
  • Representation (how the concept will be depicted
    on the user's system: a visual object? 2D or 3D?
    A data point? Background data for algorithm
    use?)
  • Naming/semantics
  • Existing data models are conceptual, future
    models which are non-integrated and don't address
    current data repositories and data interchange
    requirements

Business
Technical
25
THE TRADITIONAL SOLUTION: Direct Mapping
  • PROs
  • Makes each data user organization responsible for
    its own data integration
  • Reduces cross-organizational funding debates
  • Simplifies justification/approval
  • CONs
  • Places burden of integration on data users
  • Replication of efforts among different users

RESULT: A BIAS AGAINST TRANSLATION SOFTWARE
26
A GIG-ORIENTED SOLUTION: The Interoperable
Middle Layer
  • PROs
  • Facilitates reuse of data by multiple users
  • Facilitates integration of data from multiple
    sources
  • Increases sources of data available to each user
  • CONs
  • Requires additional common infrastructure
  • Requires adherence to data standards

27
Why Not Let the Producers Handle it All?
This is the genesis of SEDRIS
28
Summary: The Middle-Layer Architecture Advantage
  • Expensive and time consuming
  • Often unreliable and non-interoperable
  • Unique conversion needed for each source
  • Increase in sources geometrically increases
    number of conversions
  • Significant reduction in conversion cost
  • Higher reliability, interoperability,
    integration, and reduction of correlation error
  • Common and open standards, tools, and software
    reuse
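
The conversion-count argument above can be made concrete with a rough calculation. The sketch assumes that direct mapping needs one converter per (source, user format) pair, while a middle layer needs one translator per source plus one per user format; both function names are hypothetical, and the slide itself does not state these formulas.

```python
def direct_converters(sources: int, users: int) -> int:
    """Converters needed when every source is mapped directly to every user format."""
    return sources * users


def hub_converters(sources: int, users: int) -> int:
    """Translators needed when everything passes through one neutral middle layer."""
    return sources + users


if __name__ == "__main__":
    print(f"{'sources':>7} {'users':>5} {'direct':>6} {'hub':>4}")
    for sources, users in [(2, 2), (5, 4), (10, 8), (20, 15)]:
        print(f"{sources:>7} {users:>5} "
              f"{direct_converters(sources, users):>6} "
              f"{hub_converters(sources, users):>4}")
```

Under these assumptions, 10 sources and 8 user formats need 80 direct converters but only 18 hub translators, and each newly discovered source adds one translator instead of one per user format.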

29
For Further Information
  • Virginia T. Dobey, SAIC/DMSO
  • Virginia.Dobey.CTR_at_dmso.mil
  • (703) 824-3411
  • Peter L. Eirich, JHU/APL
  • Peter.Eirich_at_jhuapl.edu
  • (240) 228-7264

30
BACKUP SLIDES
31
Formal Definitions of the Normal Forms (1 of 2)
  • 1st Normal Form (1NF)
  • Def: A table (relation) is in 1NF if:
  • 1. There are no duplicated rows in the table.
  • 2. Each cell is single-valued (i.e., there are no
    repeating groups or arrays).
  • 3. Entries in a column (attribute, field) are of
    the same kind.
  • Note: The order of the rows is immaterial; the
    order of the columns is immaterial.
  • Note: The requirement that there be no duplicated
    rows in the table means that the table has a key
    (although the key might be made up of more than
    one column, even, possibly, of all the columns).
  • 2nd Normal Form (2NF)
  • Def: A table is in 2NF if it is in 1NF and if all
    non-key attributes are dependent on all of the
    key.
  • Note: Since a partial dependency occurs when a
    non-key attribute is dependent on only a part of
    the (composite) key, the definition of 2NF is
    sometimes phrased as, "A table is in 2NF if it is
    in 1NF and if it has no partial dependencies."
  • 3rd Normal Form (3NF)
  • Def: A table is in 3NF if it is in 2NF and if it
    has no transitive dependencies.
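
The 2NF definition above can be explored with a small Python sketch that looks for partial dependencies in sample rows. The table, column names, and helper functions are hypothetical. Note the limitation: a dependency that holds in sample data is only evidence, not proof, that it holds in the schema.

```python
from itertools import combinations

# Hypothetical observation table with composite key (station_id, obs_time).
KEY = ("station_id", "obs_time")
NON_KEY = ("station_name", "temp_k")
ROWS = [
    {"station_id": "S1", "obs_time": "0600", "station_name": "Alpha", "temp_k": 280.0},
    {"station_id": "S1", "obs_time": "1200", "station_name": "Alpha", "temp_k": 285.0},
    {"station_id": "S2", "obs_time": "0600", "station_name": "Bravo", "temp_k": 279.0},
    {"station_id": "S2", "obs_time": "1200", "station_name": "Bravo", "temp_k": 285.0},
]


def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows."""
    seen = {}
    for row in rows:
        lhs = tuple(row[c] for c in determinant)
        rhs = tuple(row[c] for c in dependent)
        # setdefault stores the first rhs seen for lhs and returns it afterwards.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key subset, attribute) pairs where a non-key attribute depends
    on only a proper, non-empty part of the composite key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if holds(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


if __name__ == "__main__":
    print(partial_dependencies(ROWS, KEY, NON_KEY))
```

In this sample, station_name depends on station_id alone, so the table is not in 2NF; moving station_name into a separate station table keyed by station_id would remove the partial dependency.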

32
Formal Definitions of the Normal Forms (2 of 2)
  • Boyce-Codd Normal Form (BCNF)
  • Def: A table is in BCNF if it is in 3NF and if
    every determinant is a candidate key.
  • 4th Normal Form (4NF)
  • Def: A table is in 4NF if it is in BCNF and if it
    has no multi-valued dependencies.
  • 5th Normal Form (5NF)
  • Def: A table is in 5NF, also called
    "Projection-Join Normal Form" (PJNF), if it is in
    4NF and if every join dependency in the table is
    a consequence of the candidate keys of the table.
  • Domain-Key Normal Form (DKNF)
  • Def: A table is in DKNF if every constraint on
    the table is a logical consequence of the
    definition of keys and domains.
  • Source: DATABASE-MANAGEMENT PRINCIPLES AND
    APPLICATIONS, Dr. Ronald E. Wyllys, The University
    of Texas at Austin, Austin, Texas, 78712-1276
    http://www.gslis.utexas.edu/l384k11w/normover.html