Title: Information Ontologies for the Intelligence Communities: A Survey of DCGS-A Ontology Work
1Information Ontologies for the Intelligence Communities: A Survey of DCGS-A Ontology Work
- Ron Rudnicki
- November 12, 2013
2Topics
- The DCGS-A ontology suite
- Standard operating procedures and ontology quality assurance
- Annotation vs. Explication
- How the DCGS-A ontologies are being used for the explication of data models
3The DCGS-A Ontology Suite
4Motives for Ontology Development
- Part of a Big Data solution
- Multiple formats including free text, semi-structured and structured
- Some surprise data sets are made available a short time prior to system testing
- Data sets will change along with domain of interest
- Data cannot be collected into a single store
- Provide cross-source searching and analytics
- Need to maintain the provenance of data
5Contribution of the Ontologies
- Design choices affect the outcome
- Common Upper Level Ontology: The ontologies extend from a common upper level ontology
- Delineated Content: Each ontology has a clearly specified and delineated content that does not overlap with any other ontology
- Composable Content: Classes in the ontologies represent entities at a level of granularity that can be composed in various ways to map to terms in sources
6Integration Through a Common Upper Level Ontology
- Encourages uniform representations of domains
[Figure: class diagram with nodes Entity, Object, Quality, Organization, Physical Artifact, Quality of Organization, and Quality of Physical Artifact, linked by the relations bearer_of and has_quality]
- Provides common patterns within the target ontology for mappings from the sources
- Easier to include new sources of data
- Enables more uniformity between queries
- Easier to transition to domains of interest
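The point about uniform queries can be illustrated with a small sketch. It assumes the subclass layout suggested by the slide's diagram (Organization and Physical Artifact under Object, the two quality classes under Quality); the instance names and the helper functions are hypothetical and are not part of the DCGS-A suite.

```python
# Toy triple store of (subject, predicate, object).
# The class layout is an assumed reading of the slide's diagram;
# every instance name below is hypothetical.
TRIPLES = {
    ("Object", "subClassOf", "Entity"),
    ("Quality", "subClassOf", "Entity"),
    ("Organization", "subClassOf", "Object"),
    ("PhysicalArtifact", "subClassOf", "Object"),
    ("QualityOfOrganization", "subClassOf", "Quality"),
    ("QualityOfPhysicalArtifact", "subClassOf", "Quality"),
    ("acme_corp", "type", "Organization"),
    ("acme_size", "type", "QualityOfOrganization"),
    ("acme_corp", "has_quality", "acme_size"),
    ("truck_7", "type", "PhysicalArtifact"),
    ("truck_7_color", "type", "QualityOfPhysicalArtifact"),
    ("truck_7", "has_quality", "truck_7_color"),
}


def superclasses(cls, triples):
    """Return cls together with every class reachable through subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples if s == current and p == "subClassOf")
    return seen


def qualities_of(root_class, triples):
    """Map each bearer typed under root_class to the qualities it has.

    The same has_quality pattern serves every domain because all of the
    classes extend from the same upper level.
    """
    result = {}
    for bearer, predicate, quality in triples:
        if predicate != "has_quality":
            continue
        types = {o for s, p, o in triples if s == bearer and p == "type"}
        if any(root_class in superclasses(t, triples) for t in types):
            result.setdefault(bearer, set()).add(quality)
    return result
```

Calling `qualities_of("Object", TRIPLES)` returns the qualities of both the organization and the artifact, while `qualities_of("Organization", TRIPLES)` narrows the same query shape to one domain.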
CUBRC - Proprietary
7Integration Through Delineated Content
- Each class in the target ontologies is defined in
one place
[Figure: class diagram with nodes Entity, Object, Organization, Physical Artifact, and Spatial Location, with located_at relations pointing to Spatial Location]
- Facilitates locating a class within the target ontologies
- Provides better recall in queries
- Less likely to overlook relevant data
8Integration Through Composition of Classes
- Granular classes better accommodate mappings from various perspectives on the same domain without loss of information
[Figure: diagram labeled "Data Source 2" with nodes Car, Model, Manufacturer, Length of Wheelbase, Full Size, Mid Size, and Compact, linked by the relations prescribes, has quality, manufactures, and is nominally measured by]
9High Level Depiction of Domain
- Provides Coverage of Domain of Human Activity
[Figure: diagram with nodes People/Organizations, Artifacts, Actions, Natural/Artificial Environments, Attributes, and Time, linked by the labels "use", "to perform", "that take place in", and "are distinguished by"]
10Developed Using a Top-Down and Bottom-Up Strategy
- Partial List of Data Sources Used
- Treasury Office of Foreign Assets Control Specially Designated Nationals and Blocked Persons
- NCTC Worldwide Incidents Tracking System
- UMD Global Terrorism Database
- RAND Database of Worldwide Terrorism Incidents
- LDM version .60 (TED)
- VMF PLI
- DCGS-A Global Graph
- DCGS-A Event Reporting
- BFT Report (CCRi test data)
- CIDNE SIGACT (CCRi test data)
- Long War Journal
- Harmony Documents from CTC at West Point
- Threats Open Source Intelligence Gateway
11Based Upon Standards
- Partial List of Doctrine and Standards Used
- DOD Dictionary of Military and Associated Terms (JP 1-02)
- JC3IEDM
- Counterinsurgency (FM 3-24)
- Operations (FM 3-0)
- Multinational Operations (JP 3-16)
- International Standard Industrial Classification of All Economic Activities Rev.4 (ISIC4)
- Universal Joint Task List (CJCSM 3500.04C)
- Weapons Technical Intelligence (WTI) Improvised Explosive Device (IED) Lexicon
- Information Artifact Ontology (IAO)
- Phenotype and Trait Ontology (PATO)
- Foundational Model of Anatomy (FMA)
- Regional Connection Calculus (RCC-8)
- Allen Time Calculus
- Wikipedia
12Current DCGS-A Ontology Architecture
13Ontology Metrics
Ontology Name | Number of Classes | Number of Relations | Equivalent Class Axioms | Subclass Of Axioms
Agent Ontology | 986 | 71 | 378 | 1004
AIRS Emotion Ontology | 73 | - | - | 88
AIRS Mid-Level Ontology | 516 | 8 | 221 | 641
Artifact Ontology | 298 | - | 3 | 310
Event Ontology | 409 | - | 2 | 423
Extended Relation Ontology | - | 45 | - | -
Geospatial Ontology | 297 | 14 | 13 | 316
Information Entity Ontology | 83 | 29 | 21 | 83
Quality Ontology | 681 | - | 2 | 681
Relation Ontology | - | 20 | - | -
Time Ontology | 16 | 22 | - | 30
Totals | 3359 | 209 | 640 (19) | 3576 (106)
14Standard Operating Procedures and Ontology
Quality Assurance
15Semantic Conformance Testing
- An importing ontology reuses a term from another and adds to its content in some way
- Adds an axiom to some upper-level term
- The imported class inherits content from parent classes of the importing ontology
- Corrective action:
- Request that the curators of the ontology that is the source of the class add the content
- If not possible, then plan for revision of the import architecture
- The importing ontology should introduce a subtype of the term, to which the content could then be added
16Semantic Conformance Testing
- Defining a class to be a subtype of more than one superclass
- Corrective action:
- Remove any subclass assertions that are false (e.g. Bank subClassOf Organization, Bank subClassOf Facility)
- Refactor superclasses into disjoint classes
- Write axioms so that the multiple inheritance exists in the inferred hierarchy rather than the asserted hierarchy
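A check for this condition can be sketched as follows. This is a minimal illustration, not the DCGS-A test itself: it assumes the asserted hierarchy is available as a list of (subclass, superclass) pairs, and the function name and the Hospital and Agent entries are hypothetical.

```python
from collections import defaultdict

# Asserted subClassOf pairs; the Bank pairs come from the slide's example,
# the remaining pairs are hypothetical filler.
ASSERTED = [
    ("Bank", "Organization"),
    ("Bank", "Facility"),
    ("Hospital", "Organization"),
    ("Organization", "Agent"),
]


def multiply_asserted(subclass_pairs):
    """Return classes with more than one asserted direct superclass."""
    parents = defaultdict(set)
    for child, parent in subclass_pairs:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}
```

Each class reported by `multiply_asserted(ASSERTED)` is a candidate for one of the corrective actions listed above; the check only flags the pattern and does not decide which assertion is false.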
17Semantic Conformance Testing
- Extending an ontology by introducing terms as child terms of a higher-level ontology using another relation (e.g. part of, is narrower in meaning than)
- Corrective action:
- Place the terms into their appropriate place in the taxonomy
18Semantic Conformance Testing
- A term from a lower level is not a subclass of any class of the ontologies it imports
- Containment requires that the domain covered by a lower-level ontology be circumscribed by the domain covered by the higher-level ontology from which it extends
- Corrective action:
- Add the class (or an appropriate superclass) to the appropriate higher-level ontology
- Import a higher-level ontology that does provide a superclass
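The containment requirement lends itself to a simple reachability check. The sketch below assumes the lower-level ontology is given as a dictionary from each local class to its asserted direct superclasses, and the imported ontologies as a set of class names; all names, including Checkpoint and Savings Bank, are hypothetical.

```python
def reaches_imported(cls, local_subclass_of, imported_classes):
    """True if some chain of asserted superclasses leads into an imported ontology."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for parent in local_subclass_of.get(current, ()):
            if parent in imported_classes:
                return True
            stack.append(parent)
    return False


def uncontained_classes(local_subclass_of, imported_classes):
    """List local classes that are not subclasses of any imported class."""
    return sorted(
        cls
        for cls in local_subclass_of
        if not reaches_imported(cls, local_subclass_of, imported_classes)
    )


# Hypothetical lower-level ontology and import closure.
IMPORTED = {"Organization", "Physical Artifact"}
LOCAL = {
    "Bank": {"Organization"},
    "Savings Bank": {"Bank"},
    "Checkpoint": set(),
}
```

Here `uncontained_classes(LOCAL, IMPORTED)` reports only `Checkpoint`, which would then be handled by one of the two corrective actions.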
19Semantic Conformance Testing
- An ontology includes information model assertions that are not true of the domain
- e.g. carrying over a not-null constraint, as in "every person must have an email address"
- Corrective action:
- Make needed modifications to the axiom (generally the source of such violations) so that it conforms to the domain
- e.g. "every person that has purchased from amazon.com must have an email address"
20Semantic Conformance Testing
- A class is a set-theoretic combination of other classes
- Corrective action:
- Add the class as a new type (e.g. "College or University" becomes "Higher Education Organization")
21Calculating Value of Ontology Terms
- Provide some basis for class inclusion/exclusion
- The content of ontologies used in an enterprise will be the subject of debate and, possibly, disagreement
- Having one or more metrics that are proven measures of value would help resolve such disagreements
- Current methods are often applied to ontologies in their entirety (e.g. Swoogle); fewer are designed to evaluate the value of ontology classes and properties
22Calculating Value of Ontology Terms
- Statistical Methods Supplemented by Weightings
- A purely statistical method applied to an ontology as a graph will undervalue isolated terms that are of importance in a domain
- Importance is at least a function of amount of use and criticality
- Usage is tractable to definition; criticality less so
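One way to picture a statistical measure supplemented by a weighting is a blend of usage share and an analyst-assigned criticality. The formula, the `alpha` parameter, and the example numbers below are assumptions made for illustration; the slides do not specify a scoring function.

```python
def term_value(usage_count, max_usage, criticality, alpha=0.5):
    """Hypothetical value score for an ontology term.

    usage_count / max_usage is the statistical part (share of the most-used
    term's usage); criticality is a weighting in [0, 1] assigned by an
    analyst; alpha sets how much the weighting counts.
    """
    if max_usage <= 0:
        raise ValueError("max_usage must be positive")
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    usage_share = usage_count / max_usage
    return (1 - alpha) * usage_share + alpha * criticality


# A rarely used but critical term versus a heavily used generic one
# (counts and weightings are invented).
rare_critical = term_value(usage_count=2, max_usage=100, criticality=0.9)
common_generic = term_value(usage_count=100, max_usage=100, criticality=0.3)
purely_statistical = term_value(usage_count=2, max_usage=100, criticality=0.9, alpha=0.0)
```

With `alpha=0.0` the rare term scores only its usage share, which is the undervaluing described above; with the weighting included its score rises well above that, although how criticality itself should be assigned remains the open question.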
23Annotation vs. Explication
24Mappings
- Many of the purposes for which ontologies are built will be realized only to the degree to which they are linked to data
- One component of mapping is an act of translation and should be assessed on the degree of equivalence between source and target
- Another component of mapping is an implementation and should be assessed on performance criteria such as costs and scalability
- Techniques and technologies vary
An introductory overview can be found at
http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport_01082009.pdf
25Mappings
- Hashtags: the subjective assignment of uncurated keywords to a source
- Annotations: rule-based assignment of curated terms to a source
- Machine maps: automated, structure-based translation of source into target vocabulary
- Definitions: rule-based expansion of source terms into types and differentiating attributes
- Explications: rule-based translation of all semantic content (including that which is implicit) of a source using terms and relations of the ontology
[Slide labels grouping the list above: Term Mappings, Assertion Mappings]
26Mappings
- Term mappings
- Can be automated
- Enable faceted queries (Select JFK as type Airport)
- Can result in significant loss of information
- Not reusable
- Assertion mappings
- Manual process that does not scale
- Requires extensive knowledge of the target ontology
- Enables navigational queries
- Improves integration of data sources
- Can result in significant carry over of source information
- Not reusable
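The difference between the two query styles can be sketched with toy data. The structures and function names below are hypothetical: a term mapping is modeled as a flat term-to-type dictionary, and an assertion mapping as a set of triples using relation names that appear elsewhere in these slides.

```python
# Term mapping: each source term is tagged with a single type.
TERM_MAP = {"JFK": "Airport", "LGA": "Airport", "Tampa": "City"}

# Assertion mapping: relationships from the source are carried over as triples.
ASSERTIONS = {
    ("JFK", "type", "Airport"),
    ("JFK", "located_at", "New York City"),
    ("New York City", "part_of", "New York"),
}


def facet(term_map, wanted_type):
    """Faceted query: select the terms tagged with a given type."""
    return sorted(term for term, kind in term_map.items() if kind == wanted_type)


def navigate(assertions, start, path):
    """Navigational query: follow a sequence of relations from a start node."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s, p, o in assertions if s in frontier and p == predicate}
    return frontier
```

`facet(TERM_MAP, "Airport")` can say that JFK is an airport, but only the assertion mapping can answer `navigate(ASSERTIONS, "JFK", ["located_at", "part_of"])`, since the term mapping has discarded the relationships.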
27Assessing Current Mapping Methods
[Figure: chart placing the mapping methods (Hashtags, Annotations, Machine Maps, Definitions, Explications) along a Time/Money axis (Low to High) and a Translation axis (Lossy to Lossless); annotation: "No Ideal Instances"]
28Examples of Mappings
A Source of Data About Cities
CityId | Name | State | IncorporationDate | Area | Coordinates
1 | Tampa | Florida | July 15, 1887 | 170.6 sq. mi. | 27°56′50″N 82°27′31″W
2 | Boston | Massachusetts | March 4, 1822 | 89.63 sq. mi. | 42°21′29″N 71°03′49″W
3 | Dallas | Texas | February 2, 1856 | 385.8 sq. mi. | 32°46′58″N 96°48′14″W
4 | Los Angeles | California | April 4, 1850 | 503 sq. mi. | 34°03′N 118°15′W
29Explication of the Source as an End Point
[Figure: explication of the city source, with nodes City, City Name, Coordinates, Area, State, State Name, City Government, Act Of Incorporation, and Incorporation Date, linked by the relations designated_by, part_of, delimits, has_quality, participates_in, and occurs_on]
30Explication Implementation Example
- A Portion of a D2RQ File Mapping Birth Place and Date

    map:PersonBirth
        rdf:type d2rq:ClassMap;
        rdfs:label "Person Birth";
        d2rq:class event:Birth;
        d2rq:classDefinitionLabel "Treasury OFAC Person Birth";
        d2rq:dataStorage map:KDD-02-B-Treasury-SDN;
        d2rq:uriPattern "treasurydata_PersonBirth/@@TreasuryPerson.id|urlify@@" .

    map:PersonBirthTemporalInterval
        rdf:type d2rq:ClassMap;
        rdfs:label "Person Birth Temporal Interval";
        d2rq:class span:TemporalRegion;
        d2rq:classDefinitionLabel "Treasury OFAC Person Birth Temporal Interval";
        d2rq:dataStorage map:KDD-02-B-Treasury-SDN;
        d2rq:uriPattern "treasurydata_PersonBirthTemporalIdentifier/@@TreasuryPerson.id|urlify@@_@@TreasuryPerson.dateofbirthlist_uid|urlify@@" .

    map:PersonBirthTemporalIntervalIdentifier
        rdf:type d2rq:ClassMap;
        rdfs:label "Person Birth Temporal Interval Identifier"
31Explication Current Method
- The full mapping of birth place and date consists of 16 such blocks
- The full mapping of the entire table consists of 150 such blocks
- If the ontologies change, so must the mappings
- Common patterns in the ontologies make some re-use possible by adding placeholders to portions of maps and replacing them with specific values for the source at hand
- Applications exist or are under development to auto-generate initial mappings that a human can then edit
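The placeholder approach can be sketched with Python's standard `string.Template`. The template text follows the general shape of a D2RQ ClassMap block; the placeholder names, the `render_class_map` helper, and the prefix spellings (`map:`, `event:`) are assumptions made for illustration rather than the project's actual tooling.

```python
from string import Template

# A reusable ClassMap block with $-placeholders for the source-specific parts.
CLASS_MAP_TEMPLATE = Template(
    "map:$map_name a d2rq:ClassMap;\n"
    '    rdfs:label "$label";\n'
    "    d2rq:class $target_class;\n"
    "    d2rq:dataStorage map:$storage;\n"
    '    d2rq:uriPattern "$uri_prefix/@@$table.$key_column|urlify@@" .\n'
)


def render_class_map(**values):
    """Fill the template; raises KeyError if a placeholder is left unfilled."""
    return CLASS_MAP_TEMPLATE.substitute(**values)


# Values for one source (assumed spellings, based on the Treasury example).
person_birth = render_class_map(
    map_name="PersonBirth",
    label="Person Birth",
    target_class="event:Birth",
    storage="KDD-02-B-Treasury-SDN",
    uri_prefix="treasurydata_PersonBirth",
    table="TreasuryPerson",
    key_column="id",
)
```

Templates of this kind cut down on typing, but each one is still tied to a single mapping language, so the reuse stays limited to sources mapped with that language.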
32Explication Current Method
- The improvements are source and implementation specific
- What works for structured sources mapped in D2RQ can't be reused in structured sources mapped in other languages (R2RML, EDOAL)
- Separate mappings would be needed for sources expressed in XML, HTML or free text
- Another solution is needed
33How the DCGS-A Ontologies are Being Used for the
Explications of Data Models
34Start with Machine Made Assertion Mappings
- Type to Type mapping (e.g. table column to class)
- Relationships between types expressed using a default generic object property
- Meta-data about the source entity (e.g. table name, column name, element name) is mapped to annotation properties (rdfs:label)
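A machine-made assertion mapping of this kind can be sketched for one row of the city table. The node naming scheme, the `has_<column>` property names, and the `has_value` link are assumptions chosen for illustration; only the general recipe (type per column, generic linking property, source names as labels) comes from the bullets above.

```python
import re


def snake_case(name):
    """Turn a column name such as IncorporationDate into incorporation_date."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def machine_map(table, key_column, row):
    """Translate one table row into triples without any domain knowledge."""
    key = row[key_column]
    subject = f"{snake_case(table)}_{key}"
    # The table becomes a class; its source name is kept as an annotation.
    triples = {(subject, "rdf:type", table), (table, "rdfs:label", table)}
    for column, value in row.items():
        if column == key_column:
            continue
        node = f"{snake_case(column)}_{key}"
        triples.add((node, "rdf:type", column))        # column -> class
        triples.add((column, "rdfs:label", column))    # column name as metadata
        triples.add((subject, f"has_{snake_case(column)}", node))  # generic link
        triples.add((node, "has_value", value))
    return triples


ROW = {
    "CityId": 1,
    "Name": "Tampa",
    "State": "Florida",
    "IncorporationDate": "July 15, 1887",
    "Area": "170.6 sq. mi.",
}
TAMPA_TRIPLES = machine_map("City", "CityId", ROW)
```

The output is deliberately shallow: every component hangs off the container through a generic property, which is the starting point that later rules refine.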
35Machine Made Assertion Mapping as a Starting Point
[Figure: City linked to Name, Coordinates, Area, State, and Incorporation Date by the properties has_name, has_coordinates, has_area, has_state, and has_incorporation_date]
Class mappings created by associating the container with the components with a generic property
36Current Content of Ontologies is not Well Used
- Ontologists are trained to associate subclass and equivalence axioms to classes
- OWL reasoners don't expand the graph by creating instances based upon these axioms
- OWL reasoners are resource-intensive and often produce unimpressive output
- Not much control can be exerted upon which inferences an OWL reasoner performs
37Create a Library of Rules
- Change the relationship and type of the name of a city

    CONSTRUCT {
        ?city ex:designated_by ?cityname .
        ?cityname rdf:type ex:CityName .
    }
    WHERE {
        ?city rdf:type ex:City .
        ?cityname rdf:type ex:Name .
        ?city ?related_to ?cityname .
        FILTER NOT EXISTS { ?city ex:designated_by ?cityname . }
    }
38Create a Library of Rules
- Delete the original relationship and type

    DELETE {
        ?city ?related_to ?name .
        ?name rdf:type ex:Name .
    }
    WHERE {
        ?city ?related_to ?name .
        ?name rdf:type ex:Name .
        ?city ex:designated_by ?name .
        ?name rdf:type ex:CityName .
        # Keep the new designated_by triple itself
        FILTER (?related_to != ex:designated_by)
    }
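The combined effect of the two rules above can be traced on a handful of triples. This is a plain-Python stand-in for a SPARQL engine, written under the assumption that the rules use an `ex:` prefix; the function name and the `ex:has_name` property are hypothetical.

```python
def refine_city_names(triples):
    """Apply the CONSTRUCT rule, then the DELETE rule, to a set of triples."""
    result = set(triples)
    cities = {s for s, p, o in result if p == "rdf:type" and o == "ex:City"}
    names = {s for s, p, o in result if p == "rdf:type" and o == "ex:Name"}
    # Any property linking a City to a Name, other than the target property.
    matches = [
        (s, p, o)
        for s, p, o in result
        if s in cities and o in names and p != "ex:designated_by"
    ]
    for city, related_to, name in matches:
        # CONSTRUCT step: add the specific relationship and type.
        result.add((city, "ex:designated_by", name))
        result.add((name, "rdf:type", "ex:CityName"))
        # DELETE step: drop the generic relationship and type.
        result.discard((city, related_to, name))
        result.discard((name, "rdf:type", "ex:Name"))
    return result


BEFORE = {
    ("ex:city_1", "rdf:type", "ex:City"),
    ("ex:name_1", "rdf:type", "ex:Name"),
    ("ex:city_1", "ex:has_name", "ex:name_1"),
    ("ex:name_1", "ex:has_value", "Tampa"),
}
AFTER = refine_city_names(BEFORE)
```

Running the function a second time on `AFTER` changes nothing, because no node is typed `ex:Name` any longer; this mirrors the guard in the CONSTRUCT rule.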
39The Effect of Such Rules on Translated Data
[Figure: instance graph for the Tampa row after the rules are applied. Instance nodes: city_1, city_name_1, coordinates_1, area_1, state_1, state_name_1, city_government_1, act_of_incorporation_1, act_of_incorporation_2. Literal values: Tampa; 27°56′50″N 82°27′31″W; 170.6 sq. mi.; Florida; July 15, 1887. Relations: designated_by, has_text_value, has_value, part_of, has_quality, delimits, is_output_of, has_incorporation_date, occurs_on. Additional label: act_of_incorporation]
40Benefits of Rule Library
- No need to write different rules for different source formats
- Changes to the ontology affect a single rule rather than some (possibly large) number of mappings
- Allows mappings from source to target to be simple and possibly fully automated
- Writing of rules can be performed by SMEs
- Fine-grained control of which rules are executed
- by user group
- above a stated level of priority (weighting)
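The fine-grained control described in the last three bullets could look roughly like this. The `Rule` record, the group names, the rule names, and the choice to treat "above" as "at or above" are all assumptions; the slides do not describe how the rule library is stored or filtered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str                # identifier of a stored rule (hypothetical)
    priority: int            # weighting; higher means more important
    user_groups: frozenset   # groups for which the rule is enabled


LIBRARY = [
    Rule("city_name_to_designated_by", 8, frozenset({"analysts", "engineers"})),
    Rule("incorporation_date_to_act", 5, frozenset({"engineers"})),
    Rule("area_to_quality", 3, frozenset({"analysts"})),
]


def select_rules(library, user_group, min_priority):
    """Names of rules enabled for a group at or above a priority, highest first."""
    chosen = [
        rule
        for rule in library
        if user_group in rule.user_groups and rule.priority >= min_priority
    ]
    return [rule.name for rule in sorted(chosen, key=lambda rule: -rule.priority)]
```

For example, `select_rules(LIBRARY, "analysts", 5)` would run only the city-name rule, while lowering the threshold to 3 would add the area rule for the same group.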