1
Information Ontologies for the Intelligence
Communities: A Survey of DCGS-A Ontology Work
  • Ron Rudnicki
  • November 12, 2013

2
Topics
  • The DCGS-A ontology suite
  • Standard operating procedures and ontology
    quality assurance
  • Annotation vs. Explication
  • How the DCGS-A ontologies are being used for the
    explication of data models

3
The DCGS-A Ontology Suite
4
Motives for Ontology Development
  • Part of a Big Data solution
  • Multiple formats including free text,
    semi-structured and structured
  • Some surprise data sets are made available a
    short time prior to system testing
  • Data sets will change along with domain of
    interest
  • Data cannot be collected into a single store
  • Provide cross-source searching and analytics
  • Need to maintain the provenance of data

5
Contribution of the Ontologies
  • Design choices affect the outcome
  • Common Upper Level Ontology - The ontologies
    extend from a common upper level ontology
  • Delineated Content - Each ontology has a clearly
    specified and delineated content that does not
    overlap with any other ontology
  • Composable Content - Classes in the ontologies
    represent entities at a level of granularity that
    can be composed in various ways to map to terms
    in sources

6
Integration Through a Common Upper Level Ontology
  • Encourages uniform representations of domains

[Figure labels: classes Entity, Object, Quality, Organization, Physical Artifact, Quality of Organization, Quality of Physical Artifact; relations bearer_of, has_quality]
  • Provides common patterns within the target
    ontology for mappings from the sources
  • Easier to include new sources of data
  • Enables more uniformity between queries
  • Easier to transition to domains of interest

CUBRC - Proprietary
7
Integration Through Delineated Content
  • Each class in the target ontologies is defined in
    one place

[Figure labels: classes Entity, Object, Organization, Physical Artifact, Spatial Location; relation located_at]
  • Facilitates locating a class within the target
    ontologies
  • Provides better recall in queries
  • Less likely to overlook relevant data

8
Integration Through Composition of Classes
  • Granular classes better accommodate mappings
    from various perspectives on the same domain
    without loss of information

[Figure labels: Data Source 2; classes Car, Model, Manufacturer, Length of Wheelbase, Full Size, Mid Size, Compact; relations prescribes, has quality, manufactures, is nominally measured by]
9
High Level Depiction of Domain
  • Provides Coverage of Domain of Human Activity

[Figure labels: People and Organizations, Artifacts, Actions, Natural and Artificial Environments, Time, Attributes; connecting phrases: use, to perform, that take place in, are distinguished by]
10
Developed Using a Top-Down Bottom-Up Strategy
  • Partial List of Data Sources Used
  • Treasury Office of Foreign Assets Control
    Specially Designated Nationals and Blocked
    Persons
  • NCTC Worldwide Incidents Tracking System
  • UMD Global Terrorism Database
  • RAND Database of Worldwide Terrorism Incidents
  • LDM version .60 (TED)
  • VMF PLI
  • DCGS-A Global Graph
  • DCGS-A Event Reporting
  • BFT Report (CCRi test data)
  • Cidne Sigact (CCRi test data)
  • Long War Journal
  • Harmony Documents from CTC at West Point
  • Threats Open Source Intelligence Gateway

11
Based Upon Standards
  • Partial List of Doctrine and Standards Used
  • DOD Dictionary of Military and Associated Terms
    (JP 1-02)
  • JC3IEDM
  • Counterinsurgency (FM 3-24)
  • Operations (FM 3-0)
  • Multinational Operations (JP 3-16)
  • International Standard Industrial Classification
    of all Economic Activities Rev.4 (ISIC4)
  • Universal Joint Task List (CJCSM 3500.04C)
  • Weapon Technical Intelligence (WTI) Improvised
    Explosive Device (IED) Lexicon
  • Information Artifact Ontology (IAO)
  • Phenotype and Trait Ontology (PATO)
  • Foundational Model of Anatomy (FMA)
  • Region Connection Calculus (RCC-8)
  • Allen Time Calculus
  • Wikipedia

12
Current DCGS-A Ontology Architecture
13
Ontology Metrics
Ontology Name                Number of Classes  Number of Relations  Equivalent Class Axioms  Subclass Of Axioms
Agent Ontology               986                71                   378                      1004
AIRS Emotion Ontology        73                 -                    -                        88
AIRS Mid-Level Ontology      516                8                    221                      641
Artifact Ontology            298                -                    3                        310
Event Ontology               409                -                    2                        423
Extended Relation Ontology   -                  45                   -                        -
Geospatial Ontology          297                14                   13                       316
Information Entity Ontology  83                 29                   21                       83
Quality Ontology             681                -                    2                        681
Relation Ontology            -                  20                   -                        -
Time Ontology                16                 22                   -                        30
Totals                       3359               209                  640 (19)                 3576 (106)
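
The placement of values in the sparse rows of the metrics table can be checked against the Totals row. The following is a minimal sketch, assuming that a missing cell counts as zero and that each short row is read so that the column sums match the stated totals; the dictionary layout and function name are illustrative only.

```python
# Per-ontology counts transcribed from the metrics table.
# Columns: (classes, relations, equivalent-class axioms, subclass-of axioms).
# Assumption: an empty cell in the table is treated as 0.
METRICS = {
    "Agent Ontology": (986, 71, 378, 1004),
    "AIRS Emotion Ontology": (73, 0, 0, 88),
    "AIRS Mid-Level Ontology": (516, 8, 221, 641),
    "Artifact Ontology": (298, 0, 3, 310),
    "Event Ontology": (409, 0, 2, 423),
    "Extended Relation Ontology": (0, 45, 0, 0),
    "Geospatial Ontology": (297, 14, 13, 316),
    "Information Entity Ontology": (83, 29, 21, 83),
    "Quality Ontology": (681, 0, 2, 681),
    "Relation Ontology": (0, 20, 0, 0),
    "Time Ontology": (16, 22, 0, 30),
}


def column_totals(metrics):
    """Sum each of the four columns across all ontologies."""
    return tuple(sum(column) for column in zip(*metrics.values()))


totals = column_totals(METRICS)
print(totals)  # (3359, 209, 640, 3576)
```

Under this reading the sums agree with the Totals row; the parenthesized figures (19) and (106) are not explained on the slide and are not modeled here.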
14
Standard Operating Procedures and Ontology
Quality Assurance
15
Semantic Conformance Testing
  • Semantic Smuggling
  • An importing ontology reuses a term from another
    and adds to its content in some way
  • adds an axiom to some upper-level term.
  • the imported class inherits content from parent
    classes of the importing ontology
  • Corrective action
  • request that the curators of the ontology that is
    the source of the class add the content
  • If not possible, then plan for revision of import
    architecture
  • the importing ontology should introduce a subtype
    of the term to which the content could then be
    added.

16
Semantic Conformance Testing
  • Multiple Inheritance
  • Defining a class to be a subtype of more than one
    superclass
  • Corrective action
  • remove any subclass assertions that are false
    (e.g. Bank subClassOf Organization, Bank
    subClassOf Facility)
  • refactor superclasses into disjoint classes
  • write axiom so that the multiple inheritance
    exists in the inferred hierarchy rather than the
    asserted hierarchy
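
A check for asserted multiple inheritance might look like the sketch below. It assumes the asserted hierarchy is available as plain (subclass, superclass) pairs; the function name and the sample axioms other than the Bank example are hypothetical.

```python
from collections import defaultdict


def asserted_multiple_inheritance(subclass_axioms):
    """Return {class: sorted superclasses} for classes with more than one asserted superclass."""
    parents = defaultdict(set)
    for child, parent in subclass_axioms:
        parents[child].add(parent)
    return {cls: sorted(sups) for cls, sups in parents.items() if len(sups) > 1}


# Bank is the example from the slide; the other two axioms are illustrative.
axioms = [
    ("Bank", "Organization"),
    ("Bank", "Facility"),
    ("Organization", "Object"),
    ("Facility", "Physical Artifact"),
]
flagged = asserted_multiple_inheritance(axioms)
print(flagged)  # {'Bank': ['Facility', 'Organization']}
```

Each flagged class is then a candidate for one of the corrective actions listed above; the check itself cannot decide which assertion is false.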

17
Semantic Conformance Testing
  • Taxonomy Overloading
  • Extending an ontology by introducing terms as
    child terms of a higher-level ontology using
    another relation (e.g. part of, is narrower in
    meaning than)
  • Corrective action
  • Place the terms into their appropriate place in
    the taxonomy

18
Semantic Conformance Testing
  • Containment
  • a term from a lower level is not a subclass of
    any class of the ontologies it imports
  • containment requires that the domain covered by a
    lower-level ontology be circumscribed by the
    domain covered by the higher-level ontology from
    which it extends.
  • Corrective action
  • Add the class (or an appropriate superclass) to
    the appropriate higher-level ontology
  • Import a higher-level ontology that does provide
    a superclass
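
The containment test can be sketched as a reachability check: every class introduced by the lower-level ontology should reach some class of an imported ontology by following subclass assertions. This is a minimal illustration, assuming hierarchies are given as (subclass, superclass) pairs; all class names in the example are hypothetical.

```python
def containment_violations(local_classes, subclass_axioms, imported_classes):
    """Return local classes that have no superclass path into the imported ontologies."""
    parents = {}
    for child, parent in subclass_axioms:
        parents.setdefault(child, set()).add(parent)

    def reaches_import(cls, seen):
        # Depth-first walk up the asserted hierarchy; `seen` guards against cycles.
        for parent in parents.get(cls, ()):
            if parent in imported_classes:
                return True
            if parent not in seen:
                seen.add(parent)
                if reaches_import(parent, seen):
                    return True
        return False

    return sorted(c for c in local_classes if not reaches_import(c, {c}))


imported = {"Object", "Organization"}
local = {"Bank", "Central Bank", "Currency"}
axioms = [("Bank", "Organization"), ("Central Bank", "Bank")]
violations = containment_violations(local, axioms, imported)
print(violations)  # ['Currency']
```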

19
Semantic Conformance Testing
  • Conflation
  • an ontology includes information model assertions
    that are not true of the domain
  • e.g. carrying over a NOT NULL constraint, as in
    "every person must have an email address"
  • Corrective action
  • Make needed modifications to axiom (generally the
    source of such violations) so that it conforms to
    the domain
  • e.g. every person that has purchased from
    amazon.com must have an email address

20
Semantic Conformance Testing
  • Logic of Terms
  • a class is a set-theoretic combination of other
    classes
  • Corrective action
  • Add the class as a new type (College or
    University -> Higher Education Organization)

21
Calculating Value of Ontology Terms
  • Provide some basis for class inclusion/exclusion
  • The content of ontologies used in an enterprise
    will be the subject of debate and possibly,
    disagreement
  • Having one or more metrics that are proven
    measures of value would help resolve such
    disagreements
  • Current methods are often applied to ontologies
    in their entirety (e.g. Swoogle), fewer are
    designed to evaluate value of ontology classes
    and properties

22
Calculating Value of Ontology Terms
  • Statistical Methods Supplemented by Weightings
  • A purely statistical method applied to an
    ontology as a graph will undervalue isolated
    terms that are of importance in a domain
  • Importance is at least a function of amount of
    use and criticality
  • Usage is tractable to definition, criticality
    less so
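
One way to picture "statistical methods supplemented by weightings" is a score that blends normalized usage with a criticality weight assigned by hand. The formula, the `alpha` parameter, and all numbers below are assumptions made for illustration; the slides do not specify a scoring function.

```python
def rank_terms(usage, criticality, alpha=0.5):
    """Rank terms by alpha * normalized usage + (1 - alpha) * criticality (highest first)."""
    max_usage = max(usage.values())
    scores = {
        term: alpha * (count / max_usage) + (1 - alpha) * criticality.get(term, 0.0)
        for term, count in usage.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical usage counts and criticality weights in [0, 1].
usage = {"Person": 100, "Color": 40, "Improvised Explosive Device": 4}
criticality = {"Person": 0.5, "Color": 0.1, "Improvised Explosive Device": 1.0}

by_usage = sorted(usage, key=usage.get, reverse=True)
ranked = rank_terms(usage, criticality)
print(by_usage)  # ['Person', 'Color', 'Improvised Explosive Device']
print(ranked)    # ['Person', 'Improvised Explosive Device', 'Color']
```

The rarely used but critical term moves up once the weighting is applied, which is the effect the slide argues a purely statistical measure would miss.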

23
Annotation vs. Explication
24
Mappings
  • Value and Assessment
  • Many of the purposes for which ontologies are
    built will be realized only to the degree to
    which they are linked to data
  • One component of mapping is an act of translation
    and should be assessed on the degree of
    equivalence between source and target
  • Another component of mapping is an implementation
    and should be assessed on performance criteria
    such as costs and scalability
  • Techniques and technologies vary

An introductory overview can be found at
http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport_01082009.pdf
25
Mappings
  • Subtypes
  • Hashtags - the subjective assignment of uncurated
    keywords to a source
  • Annotations - rule based assignment of curated
    terms to a source
  • Machine maps - automated, structure-based
    translation of source into target vocabulary
  • Definitions - rule based expansion of source
    terms into types and differentiating attributes
  • Explications - rule based translation of all
    semantic content (including that which is
    implicit) of a source using terms and relations
    of the ontology

[Figure labels: Term Mappings, Assertion Mappings]
26
Mappings
  • Pros and Cons
  • Term mappings
  • Can be automated
  • Enable faceted queries (Select JFK as type
    Airport)
  • Can result in significant loss of information
  • Not reusable
  • Assertion mappings
  • Manual process that does not scale
  • Requires extensive knowledge of the target
    ontology
  • Enables navigational queries
  • Improves integration of data sources
  • Can result in significant carry over of source
    information
  • Not reusable

27
Assessing Current Mapping Methods
[Chart: Hashtags, Annotations, Machine Maps, Definitions and Explications placed along a Time/Money axis (Low to High) and a Translation axis (Lossy to Lossless); annotation: No Ideal Instances]
28
Examples of Mappings
A Source of Data About Cities
CityId  Name         State          IncorporationDate  Area           Coordinates
1       Tampa        Florida        July 15, 1887      170.6 sq. mi.  27°56′50″N 82°27′31″W
2       Boston       Massachusetts  March 4, 1822      89.63 sq. mi.  42°21′29″N 71°03′49″W
3       Dallas       Texas          February 2, 1856   385.8 sq. mi.  32°46′58″N 96°48′14″W
4       Los Angeles  California     April 4, 1850      503 sq. mi.    34°03′N 118°15′W
29
Explication of the Source as an End Point
[Figure labels: classes City, City Name, Coordinates, Area, State, State Name, City Government, Act Of Incorporation, Incorporation Date; relations designated_by, part_of, delimits, has_quality, participates_in, occurs_on]
30
Explication Implementation Example
  • A Portion of a D2RQ File Mapping Birth Place and
    Date

    map:PersonBirth
        rdf:type d2rq:ClassMap ;
        rdfs:label "Person Birth" ;
        d2rq:class event:Birth ;
        d2rq:classDefinitionLabel "Treasury OFAC Person Birth" ;
        d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
        d2rq:uriPattern "treasurydata_PersonBirth/@@TreasuryPerson.id|urlify@@" .

    map:PersonBirthTemporalInterval
        rdf:type d2rq:ClassMap ;
        rdfs:label "Person Birth Temporal Interval" ;
        d2rq:class span:TemporalRegion ;
        d2rq:classDefinitionLabel "Treasury OFAC Person Birth Temporal Interval" ;
        d2rq:dataStorage map:KDD-02-B-Treasury-SDN ;
        d2rq:uriPattern "treasurydata_PersonBirthTemporalIdentifier/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.dateofbirthlist_uid|urlify@@" .

    map:PersonBirthTemporalIntervalIdentifier
        rdf:type d2rq:ClassMap ;
        rdfs:label "Person Birth Temporal Interval Identifier"

31
Explication Current Method
  • The full mapping of birth place and date consists
    of 16 such blocks
  • The full mapping of the entire table consists of
    150 such blocks
  • If the ontologies change, so must the mappings
  • Common patterns in the ontologies make some
    re-use possible by adding placeholders to
    portions of maps and replacing them with specific
    values for the source at hand.
  • Applications exist or are under development to
    auto-generate initial mappings that a human can
    then edit
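
The placeholder approach to reuse can be illustrated with a small template. This is a sketch only: the placeholder names are hypothetical, the `map:` and `event:` prefixes are assumed readings of the excerpt on the previous slide, and only `d2rq:` properties that appear in that excerpt are used.

```python
from string import Template

# A reusable class-map pattern; $-prefixed names are placeholders (hypothetical)
# to be replaced with values for the source at hand.
CLASS_MAP = Template(
    "map:$map_name\n"
    "    rdf:type d2rq:ClassMap ;\n"
    '    rdfs:label "$label" ;\n'
    "    d2rq:class $target_class ;\n"
    '    d2rq:classDefinitionLabel "$source_name $label" ;\n'
    "    d2rq:dataStorage map:$data_storage ;\n"
    '    d2rq:uriPattern "$uri_prefix/@@$key_column|urlify@@" .\n'
)

# Fill the pattern with the values from the Person Birth example.
block_text = CLASS_MAP.substitute(
    map_name="PersonBirth",
    label="Person Birth",
    target_class="event:Birth",
    source_name="Treasury OFAC",
    data_storage="KDD-02-B-Treasury-SDN",
    uri_prefix="treasurydata_PersonBirth",
    key_column="TreasuryPerson.id",
)
print(block_text)
```

Each new source then needs only a new set of substitution values, although the pattern itself remains tied to D2RQ.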

32
Explication Current Method
  • The improvements are source and implementation
    specific
  • What works for structured sources mapped in D2RQ
    can't be reused for structured sources mapped in
    other languages (R2RML, EDOAL)
  • Separate mappings would be needed for sources
    expressed in XML, HTML or free text
  • Another solution is needed

33
How the DCGS-A Ontologies are Being Used for the
Explications of Data Models
34
Start with Machine Made Assertion Mappings
  • Type to Type mapping (e.g. table column to class)
  • Relationships between types expressed using a
    default generic object property
  • Meta-data about the source entity (e.g. table
    name, column name, element name) is mapped to
    annotation properties (rdfs:label)
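
A machine-made assertion mapping of this kind can be sketched in a few lines. The function name, the `ex:` prefix, and the generic property `ex:related_to` are assumptions for illustration; the table and column names come from the city example.

```python
def machine_map(table, columns, generic_property="ex:related_to"):
    """Return class-level triples linking a table class to one class per column."""
    table_class = "ex:" + table
    # Source metadata (table and column names) is kept as rdfs:label annotations.
    triples = [(table_class, "rdfs:label", table)]
    for column in columns:
        column_class = "ex:" + column
        # The container is linked to each component with one generic property.
        triples.append((table_class, generic_property, column_class))
        triples.append((column_class, "rdfs:label", column))
    return triples


triples = machine_map(
    "City", ["Name", "State", "IncorporationDate", "Area", "Coordinates"]
)
for triple in triples:
    print(triple)
```

Nothing here requires knowledge of the target ontology, which is why this step can be automated; the semantic refinement is deferred to later rules.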

35
Machine Made Assertion Mapping as a Starting Point
[Figure: City linked to Name, Coordinates, Area, State and Incorporation Date by has_name, has_coordinates, has_area, has_state and has_incorporation_date. Caption: Class mappings created by associating the container with the components with a generic property]
36
Current Content of Ontologies is not Well Used
  • Ontologists are trained to associate subclass and
    equivalence axioms to classes
  • OWL reasoners don't expand the graph by creating
    instances based upon these axioms
  • OWL reasoners are resource expensive and often
    result in unimpressive output
  • Not much control can be exerted upon which
    inferences an OWL reasoner performs

37
Create a Library of Rules
  • Change the relationship and type of the name of a
    city

    CONSTRUCT {
      ?city ex:designated_by ?cityname .
      ?cityname rdf:type ex:CityName .
    }
    WHERE {
      ?city rdf:type ex:City .
      ?cityname rdf:type ex:Name .
      ?city ?related_to ?cityname .
      FILTER NOT EXISTS { ?city ex:designated_by ?cityname . }
    }

38
Create a Library of Rules
  • Delete the original relationship and type

    DELETE {
      ?city ?related_to ?name .
      ?name rdf:type ex:Name .
    }
    WHERE {
      ?city ?related_to ?name .
      ?name rdf:type ex:Name .
      ?city ex:designated_by ?name .
      ?name rdf:type ex:CityName .
      # Do not match, and so delete, the new designated_by link itself
      FILTER (?related_to != ex:designated_by)
    }
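
The combined effect of the two rules (add the specific relationship and type, then remove the generic ones) can be imitated on a small set of triples in plain Python. This is a sketch, not a SPARQL engine: the function name and the instance identifiers are hypothetical, and the edge cases of the two queries are not reproduced exactly.

```python
RDF_TYPE = "rdf:type"


def apply_city_name_rule(triples):
    """Imitate the CONSTRUCT and DELETE rules for city names on a set of triples."""
    graph = set(triples)
    cities = {s for s, p, o in graph if p == RDF_TYPE and o == "ex:City"}
    names = {s for s, p, o in graph if p == RDF_TYPE and o == "ex:Name"}
    # Generic links from a city to a name, excluding an existing designated_by link.
    links = {
        (s, p, o)
        for s, p, o in graph
        if s in cities and o in names and p != "ex:designated_by"
    }
    for city, prop, name in links:
        # CONSTRUCT step: add the specific relationship and type.
        graph.add((city, "ex:designated_by", name))
        graph.add((name, RDF_TYPE, "ex:CityName"))
        # DELETE step: drop the generic relationship and type.
        graph.discard((city, prop, name))
        graph.discard((name, RDF_TYPE, "ex:Name"))
    return graph


before = {
    ("ex:city_1", RDF_TYPE, "ex:City"),
    ("ex:name_1", RDF_TYPE, "ex:Name"),
    ("ex:city_1", "ex:related_to", "ex:name_1"),
    ("ex:name_1", "ex:has_text_value", "Tampa"),
}
after = apply_city_name_rule(before)
for triple in sorted(after):
    print(triple)
```

Because the rule operates on the translated graph rather than on the source, the same rule applies whatever format the city data originally arrived in.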

39
The Effect of Such Rules on Translated Data
[Figure labels: instances city_1, city_name_1, coordinates_1, area_1, state_1, state_name_1, city_government_1, act_of_incorporation_1, act_of_incorporation_2; values Tampa, 27°56′50″N 82°27′31″W, 170.6 sq. mi., Florida, July 15, 1887; relations designated_by, has_text_value, has_value, part_of, has_quality, delimits, is_output_of, has_incorporation_date, occurs_on]
40
Benefits of Rule Library
  • No need to write different rules for different
    source formats
  • Changes to the ontology affect a single rule
    rather than some (possibly large) number of
    mappings
  • Allows mappings from source to target to be
    simple and possibly fully automated
  • Writing of rules can be performed by SMEs
  • Fine-grained control of which rules are executed
  • by user group
  • above a stated level of priority (weighting)
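
Selecting rules by user group and priority might be sketched as follows. The `Rule` record, its fields, the rule names, and the convention that a higher number means higher priority are all assumptions; the slides do not describe how the library is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int           # assumed convention: higher means more important
    user_groups: frozenset  # groups for which the rule should run


def select_rules(library, user_group, min_priority):
    """Return the group's rules at or above min_priority, highest priority first."""
    chosen = [
        rule
        for rule in library
        if user_group in rule.user_groups and rule.priority >= min_priority
    ]
    return sorted(chosen, key=lambda rule: (-rule.priority, rule.name))


# Hypothetical rule library.
library = [
    Rule("city_name", 5, frozenset({"analyst", "engineer"})),
    Rule("state_name", 3, frozenset({"analyst"})),
    Rule("debug_dump", 1, frozenset({"engineer"})),
]
selected = [rule.name for rule in select_rules(library, "analyst", 3)]
print(selected)  # ['city_name', 'state_name']
```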