Data Fusion and Semantic Web: Meta-Models of Distributed Data and Decision Fusion. Project Report

1
Data Fusion and Semantic Web: Meta-Models of
Distributed Data and Decision Fusion. Project
Report
  • Vladimir Gorodetski,
  • Oleg Karsaev,
  • Vladimir Samoilov
  • Intelligent System Laboratory of the
    St. Petersburg Institute for Informatics and
    Automation
  • E-mail: gor, ok, samovl_at_mail.iias.spb.su
  • http://space.iias.spb.su/ai/english/gorodetski.htm

2
Title of the Project
  • Autonomous Information Collection, Knowledge
    Discovery Techniques and Software Tool Prototype
    for Knowledge-Based Data Fusion
  • Project from the European Office of Aerospace
    Research and Development (EOARD), AFRL/IF (USA)
  • (December 2000 - December 2003)

3
Outline of the Project Presentation

1. Outline of the Data and Information Fusion problems
2. Project research objectives
3. Examples of case studies and applications used
4. Ontology-centered meta-model of data sources
5. Meta-model of decision fusion
6. Multi-agent architecture
7. Conclusion

4
Tasks and Applications of Data and Information
Fusion
  • Application Fields
  • Critical areas of human society: security,
    life support, security of critical state
    infrastructures, large-scale logistics, natural
    and man-made disasters, etc.
  • Examples of Applications
  • Assessment and prediction of situations,
  • Resource management and rescue operation planning
    in large-scale natural and man-made disasters,
  • Decision making and planning of rescue operations
    in systems like US 911,
  • Situational awareness and prediction of terrorist
    intents, and anti-terrorist activity planning,
  • Military situation assessment,
  • Safeguarding of critical plants such as nuclear
    power stations, electrical power grids, etc.

5
Information Fusion: Definition
  • "Data fusion is a formal framework in which
    means and tools for the alliance of data
    originating from different sources are expressed.
    It aims at obtaining information of greater
    quality; the exact definition of 'greater
    quality' will depend on the application."
    (JDL: Joint Directors of Laboratories model, USAF)

[Figure: JDL data fusion model. Inputs: distributed data sources (Sensor 1, Sensor 2, ..., Sensor N) and distributed information sources. Other blocks: Human-Computer interface; sensor management, resource management.]
Level 0: Pre-processing of sensor data
Level 1: Object assessment
Level 2: Situation assessment
Level 3: Impact assessment
Level 4: Process refinement
Level 5: User refinement
Areas of the current and future research projects
are highlighted in yellow on the original slide.
(Erik Blasch, Fusion-2002, July 2002, Annapolis,
USA)
6
Project Research Objectives
  • Development of a Data Fusion (DF) software tool
    that supports the design (first of all,
    learning!) and implementation of a broad
    spectrum of DF applications. In particular:
  • Development of ontology-based meta-models of data
    sources, a meta-model of decision fusion, and a
    conceptual model of the DF software tool,
  • Development of a multi-agent architecture, and
  • Design and implementation of a broad spectrum of
    applications.

7
Examples of Case Studies and Applications Used in
the Project
  • Case studies
  • KDD Cup 99 dataset: preprocessed relational
    data specifying an intrusion detection task
    http://kdd.ics.uci.edu/databases/kddcup99.html
  • Landsat Multi-Spectral Scanner image dataset
    http://www.dfc-grss.org/data/grss_dfc_0010.zip
  • STULONG dataset: Longitudinal Study of
    Atherosclerosis Risk Factors
    http://euromise.vse.cz/challenge/en/projekt/index.php
  • Application to be used in debugging and
    validation of MAS DK-DF: an intrusion detection
    learning system (project also funded by
    EOARD/AFRL)

8
Subtasks of the Project Matching the Semantic Web
Mining Area
  • 1. Design and implementation of a meta-model of
    data sources, motivated by the heterogeneity and
    distribution of the data to be fused.
  • 2. Design and implementation of a meta-model of
    distributed learning.

9
Multiplicity of Data Sources Presenting Users'
Activity in an Intrusion Detection System
10
Interrelation of Semantic Web and
Ontology-oriented Research within the Project
The Semantic Web considers the development and
standardization of ontology specification
languages (XML, RDF, DAML+OIL), ontology-based
query languages, ontology editors, etc.
Semantic Web Mining considers specific problems
of ontology design technology for (Web-based)
Data Mining systems. Any DF system technology
presupposes (Web-based) distributed Data Mining
and KDD, which is why it is a sub-area of
Semantic Web Mining.
Ontology-based Data and Information Fusion system
design poses a number of specific technological
problems. The most important of them is a
technology for the distributed design of a
distributed ontology.
11
What Is Distributed Design of a Distributed
Ontology? Data Sources Meta-model
Meta-model: ontology and data source models at
meta-level supporting a unified view of the data
of particular sources
12
DF system ontology
13
Distributed Ontology and Protocols for
Distributed Ontology Design
[Figure: Agent 1, Agent 2, Agent 3, ..., Agent k, a KDD Master Agent, and a Meta-level KDD Agent, with "Protocols, Functions" labels attached to the agents.]
14
Particular Tasks to Be Solved on the Basis of
Meta-model of Data Sources
  • Providing for monosemantic (unambiguous)
    understanding of the terminology used in data
    specification by distributed analysts
  • Solution of the entity identification problem
  • Providing consistency of data representation (in
    case the same attributes are presented
    differently in different data sources)
  • Providing a gateway between the ontology and the
    distributed databases, which makes interaction
    between them possible, and several other tasks.

15
Meta-model of Data Sources: Ontology, Protocols
> Monosemantic Understanding of Terminology

Monosemantic understanding of terminology among
DF system components is provided by a shared
vocabulary that the distributed entities of the
DF system use for communication. This excludes
different naming of the same entities and their
properties in different sources, as well as equal
naming of different entities within different
data sources, thus providing integrity and
consistency of the shared vocabulary.
Protocols: support distributed collaborative
design of a coherent ontology by distributed
analysts.
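
The two naming conflicts that the shared vocabulary is meant to exclude can be checked mechanically. The following is a minimal sketch, assuming each source publishes a mapping from its local attribute names to shared vocabulary terms; the source names, attribute names, and both function names are hypothetical and not taken from the project.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source, local name) -> shared vocabulary term. All names are illustrative.
LOCAL_TO_SHARED: Dict[Tuple[str, str], str] = {
    ("source_1", "dur"): "connection_duration",
    ("source_2", "duration"): "connection_duration",
    ("source_1", "service"): "network_service",
    ("source_2", "service"): "os_service",
}


def synonyms(mapping: Dict[Tuple[str, str], str]) -> Dict[str, List[str]]:
    """Shared terms that carry different local names in different sources."""
    names = defaultdict(set)
    for (_source, local), shared in mapping.items():
        names[shared].add(local)
    return {term: sorted(n) for term, n in names.items() if len(n) > 1}


def homonyms(mapping: Dict[Tuple[str, str], str]) -> Dict[str, List[str]]:
    """Local names that denote different shared terms in different sources."""
    terms = defaultdict(set)
    for (_source, local), shared in mapping.items():
        terms[local].add(shared)
    return {name: sorted(t) for name, t in terms.items() if len(t) > 1}
```

Both reports are candidates for discussion among the analysts rather than automatic fixes: a synonym is resolved by always communicating in the shared term, while a homonym signals that two sources use one word for different notions.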
16
Example of Application Ontology: High-level Part
of Intrusion Detection Domain Ontology
[Figure: ontology graph rooted at Network attack (A). Edges denote "Part of" and "Subclass of" relationships; the lowest nodes are marked as notions of micro-layer and notions of lower levels.]
Notions shown, with node labels where the pairing is clear:
  • Network attack (A)
  • Reconnaissance (R)
  • Implantation and threat realization (I)
  • Collection of Information (CI)
  • Applications and Banners Enumeration (ABE)
  • Identification of services (IS)
  • Users and Groups Enumeration (UE)
  • Identification of OS (IO)
  • Identification of hosts (IH)
  • Resource Enumeration (RE)
  • Creating Back Doors (CBD)
  • Getting Access to Resources (GAR)
  • Gaining Additional Data (GAD)
  • Covering Tracks (CT)
  • Threat Realization (TR)
  • Denial of Service (DOS)
  • Confidentiality destruction (CD)
  • Integrity destruction (ID)
  • Port Scanning (PS)
  • TCP connect scan (ST)
  • TCP SYN scan (SS)
  • TCP FIN scan (SF)
  • TCP Null scan (SN)
  • TCP Xmas Tree scan (SX)
  • UDP scan (SU)
  • Half scan (HS)
  • Dumb host scan (DHS)
  • Scanning 'FTP Bounce' (SFB)
  • Escalating Privilege, Network Ping Sweeps,
    Proxy scanning
  • Further node labels: ER, SPIH, DC
17
The Simplest ("top-down") Meta-protocol for
Collaborative Ontology Design
18
Ontology Synchronization Protocol Represented in
Terms of a UML Sequence Diagram
Legend:
1. Local source expert
2. Local source data managing agent
3. Local source ontology
4. Local source buffer of temporary changes
5. KDD master (meta-data description agent)
6. Shared ontology
7. Meta-level agent buffer of temporary changes
8. Application expert (meta-level)
9. Local source determining the modified ontology part
19
Meta-model of Data Sources: Entity
Identification Problem
Explanation of Entity Identification Problem
[Figure: three tables, one per data source, each with the columns "case number" and "attributes of the data source". Numbers listed per table:]
Data Source 1: 1, 3, 4, 7, 9, 11, 15, 19
Data Source 2: 1, 4, 5, 9, 11, 12, 14, 15, 17, 19
Data Source 3: 1, 2, 4, 8, 9, 11, 14, 15

20
Demonstration of Entity Identification Problem:
Intrusion Detection Application
21
A Technique for Entity Identification Problem
  • In the DF problem ontology, the notion of an
    entity identifier ("ID entity") is introduced
    for each instance of an object to be classified.
    This entity identifier plays the role of the
    primary key of the instance (by analogy with the
    primary key of a table).
  • For each such identifier, a rule is defined as a
    component of the shared part of the application
    ontology; it can be used to calculate the value
    of the instance key. A rule is a function whose
    arguments are chosen from the set of attributes
    of this entity. A rule is defined for each local
    data source to uniquely connect the entity
    identifier and the local primary key in this
    source. This rule specifies
  • how to derive the local primary key of an
    instance from the entity identifier value
  • how to derive the entity identifier value from
    the value of the local primary key of an instance
    of the source.
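
The two directions of such a rule can be pictured as a pair of functions per source. The following is a minimal sketch, assuming a hypothetical entity identifier made of a host name and a start time; the `IdRule` class, the source names, and the local key formats are illustrative assumptions, not the project's actual rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical entity identifier: (host, start_time). In the approach above,
# its attributes would be fixed in the shared part of the application ontology.
EntityId = Tuple[str, int]


@dataclass(frozen=True)
class IdRule:
    """Per-source rule linking the entity identifier and the local primary key."""
    to_local: Callable[[EntityId], str]    # entity identifier -> local key
    to_entity: Callable[[str], EntityId]   # local key -> entity identifier


def _s1_to_local(e: EntityId) -> str:
    return f"{e[0]}#{e[1]}"


def _s1_to_entity(key: str) -> EntityId:
    host, start = key.rsplit("#", 1)
    return host, int(start)


def _s2_to_local(e: EntityId) -> str:
    return f"{e[1]}@{e[0]}"


def _s2_to_entity(key: str) -> EntityId:
    start, host = key.split("@", 1)
    return host, int(start)


# Two illustrative sources that encode the same entity in different local keys.
RULES: Dict[str, IdRule] = {
    "source_1": IdRule(_s1_to_local, _s1_to_entity),
    "source_2": IdRule(_s2_to_local, _s2_to_entity),
}


def same_entity(src_a: str, key_a: str, src_b: str, key_b: str) -> bool:
    """Two local records describe one entity if their identifiers coincide."""
    return RULES[src_a].to_entity(key_a) == RULES[src_b].to_entity(key_b)
```

With these rules, a meta-level component could join fragments of one case from several sources by converting every local key to the entity identifier first.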

22
Meta-model of Data Sources: Diversity of
Measurement Scales of the Same Attributes in
Different Data Sources
  • Let X be an attribute in the application ontology
    that is measured differently in different
    sources.
  • In the shared component of the application
    ontology, the type and the measurement unit of
    attribute X are determined. The specification of
    attribute X within the shared part of the
    application ontology is selected by experts
    during negotiations according to a
    synchronization protocol.
  • In every source where X is present, an
    expression is defined for this attribute that
    converts it into the same scale in all the
    sources.
  • This allows using the values of attributes
    on the meta-level regardless of the data source
    from which they originated.
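
The per-source conversion expressions can be sketched as a small lookup of functions. This is an illustrative sketch only: the attribute (a duration), the shared unit, the source names, and the storage units per source are all assumptions introduced for the example.

```python
from typing import Callable, Dict

# Hypothetical shared specification of attribute X: a duration in seconds.
SHARED_UNIT = "seconds"

# Per-source expressions converting the local value into the shared scale.
# Source names and local units are illustrative.
TO_SHARED: Dict[str, Callable[[float], float]] = {
    "source_1": lambda v: v,            # already stored in seconds
    "source_2": lambda v: v / 1000.0,   # stored in milliseconds
    "source_3": lambda v: v * 60.0,     # stored in minutes
}


def to_shared_scale(source: str, value: float) -> float:
    """Convert a local value of X into the scale agreed in the shared ontology."""
    return TO_SHARED[source](value)
```

After conversion, a meta-level component can compare `to_shared_scale("source_2", 1500)` and `to_shared_scale("source_1", 1.5)` directly, without knowing where either value came from.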

23
Meta-model of Data Sources: Interaction of
Ontology and Databases of Sources
  • The task arises because application ontology
    entities are specified in terms of ontology
    notions, but their instances are represented in
    terms of a database language.
  • To provide interaction between the ontology and
    the databases of sources (accessibility of data
    requested in ontology terms), a special gateway
    is developed.

[Figure: three-level hierarchy of access to the database objects. Blocks shown: Application (DF problem ontology, DF application ontology); Client-gateway (DF problem ontology, DF application ontology, local source data properties); access via VIEW objects; database objects of the local data source.]
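
A request in ontology terms that is answered through a VIEW object can be sketched with an in-memory SQLite database. This is a minimal sketch under stated assumptions: the notion name `Connection`, the view and table names, the column names, and the `fetch` helper are hypothetical, and the project's actual gateway is not described at this level of detail.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical gateway metadata:
# ontology notion -> (view name, {ontology attribute: view column}).
ONTOLOGY_TO_VIEW: Dict[str, Tuple[str, Dict[str, str]]] = {
    "Connection": ("v_connection", {"entity_id": "id", "duration": "dur_s"}),
}


def build_source() -> sqlite3.Connection:
    """Create an illustrative local source: a base table plus a VIEW over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE conn_log (conn_id INTEGER PRIMARY KEY, dur_ms INTEGER)"
    )
    conn.executemany("INSERT INTO conn_log VALUES (?, ?)", [(1, 1500), (2, 250)])
    # The VIEW hides local column names and the local unit (milliseconds).
    conn.execute(
        "CREATE VIEW v_connection AS "
        "SELECT conn_id AS id, dur_ms / 1000.0 AS dur_s FROM conn_log"
    )
    return conn


def fetch(conn: sqlite3.Connection, notion: str, attributes: List[str]) -> list:
    """Answer a request phrased in ontology terms by querying the mapped VIEW."""
    view, columns = ONTOLOGY_TO_VIEW[notion]
    # Identifiers come only from the fixed mapping above, never from user input.
    select = ", ".join(f"{columns[a]} AS {a}" for a in attributes)
    return conn.execute(f"SELECT {select} FROM {view} ORDER BY 1").fetchall()
```

The three levels of the figure correspond here to the ontology-term request, the VIEW, and the underlying table; only the mapping and the VIEW definition need to change when a source's schema changes.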
24
Meta-model of Distributed Learning
  • Components of the meta-model of distributed
    learning
  • Meta-model of decision making and combining
    decisions of multiple base-level classifiers
  • Model of distributed data management (allocation
    of training and testing data sets for learning
    particular classifiers, management of the
    computation of meta-data for upper-level
    example-based learning, etc.)
  • Approaches and formal techniques used for
    combining decisions.

25
Meta-model of Data Fusion: Hierarchy of
Classifiers and Combining Decisions
[Figure, Variant 2. Elements shown: local database (database of source); Base classifier 1, Base classifier 2, ..., Base classifier k; meta-level classifier of source; outputs labeled "To DF system meta-level classifier".]
26
Meta-model of Data Fusion: Distributed Data
Management
  • Distributed data management covers the allocation
    of training and testing data sets for learning
    particular classifiers, management of the
    computation of meta-data for upper-level
    example-based learning, etc. These tasks are
    solved by special agents of the DF system that
    operate on the source-located components and the
    meta-level component of the DF system.
  • These agents solve the task in question through a
    special negotiation protocol under the management
    of local source and meta-level analysts.
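
One property any such allocation needs is that all sources place the same entity into the same data set. The sketch below illustrates that property with a deterministic hash of the entity identifier; this is a swapped-in technique chosen for brevity, not the negotiation protocol the slide refers to, and the partition names and weights are assumptions.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence

# Illustrative partitions: data for base classifiers, data for computing
# meta-data (meta-level training), and testing data.
PARTITIONS = ("base_training", "meta_training", "testing")


def partition_of(entity_id: str, weights: Sequence[int] = (2, 1, 1)) -> str:
    """Map an entity identifier to a partition, identically on every source."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % sum(weights)
    for name, weight in zip(PARTITIONS, weights):
        if bucket < weight:
            return name
        bucket -= weight
    raise ValueError("weights must be positive, one per partition")


def allocate(entity_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group entity identifiers by the partition they fall into."""
    result: Dict[str, List[str]] = {name: [] for name in PARTITIONS}
    for entity_id in entity_ids:
        result[partition_of(entity_id)].append(entity_id)
    return result
```

Because the assignment depends only on the identifier, two sources holding different attributes of the same case agree on its partition without exchanging messages; a negotiated allocation would replace `partition_of` with the agreed assignment.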

27
Meta-models of Training and Testing Data: An
Example
Results of distributed data management for the
case study KDDCup-99 for two data sources
28
Meta-model of Data Fusion: Approaches for
Combining Decisions-1
  • 1. Meta-classification scheme for combining
    decisions
  • (based on stacked generalization)

[Figure: stacked generalization scheme. Elements shown: data; algorithms for base classifier learning; Base Classifier 1, Base Classifier 2, ..., Base Classifier k to be learned; testing data; meta-classifier's training and testing data ("meta-data"); algorithm for learning the meta-classifier; result: meta-classifier (meta-learning level). Legend: KDD algorithms; resulting classifiers.]
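
The scheme can be illustrated with a deliberately small, standard-library sketch of stacked generalization. The base learner (a one-feature threshold rule) and the meta-learner (a majority-vote lookup table over base decisions) are simplifications chosen for brevity, not the learning methods used in the project; all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]          # (features, binary label)
Classifier = Callable[[Sequence[float]], int]


def learn_stump(data: List[Example], feature: int) -> Classifier:
    """Base learner: threshold on one feature with the fewest training errors."""
    best_t, best_err = 0.0, len(data) + 1
    for x, _ in data:
        t = x[feature]
        err = sum((xi[feature] >= t) != bool(y) for xi, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: int(x[feature] >= best_t)


def learn_meta(bases: List[Classifier], meta_data: List[Example]) -> Classifier:
    """Meta-learner: majority label for each combination of base decisions."""
    votes = defaultdict(Counter)
    for x, y in meta_data:
        votes[tuple(b(x) for b in bases)][y] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    default = Counter(y for _, y in meta_data).most_common(1)[0][0]
    return lambda x: table.get(tuple(b(x) for b in bases), default)


def learn_stacked(base_part: List[Example], meta_part: List[Example],
                  n_features: int) -> Classifier:
    """Train base classifiers on one partition, the meta-classifier on another."""
    bases = [learn_stump(base_part, f) for f in range(n_features)]
    return learn_meta(bases, meta_part)
```

The essential point is the data flow: the meta-data are the base classifiers' decisions on examples they were not trained on, so the meta-classifier learns how far each base decision can be trusted.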
29
Meta-model of Data Fusion: Approaches for
Combining Decisions-2
Competence-based Approach
[Figure: competence-based scheme. Elements shown: training and testing data; partition of learning data for classifier training; partition of learning data for referee training; correctly classified examples; erroneously classified examples; Referee 1, Referee 2, ..., Referee K; selection of the most competent classifier and its decision; output: decision of the most competent classifier.]
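
The competence-based idea can be sketched as follows: each classifier gets a referee trained on examples marked as correctly or erroneously classified by that classifier, and the decision of the classifier whose referee reports the highest competence is returned. The k-nearest-neighbour referee below is an assumption made for the sake of a short example; the slides do not say how referees are learned, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]
Classifier = Callable[[Sequence[float]], int]
Referee = Callable[[Sequence[float]], float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def learn_referee(clf: Classifier, referee_data: List[Example],
                  k: int = 3) -> Referee:
    """Referee: share of correctly classified examples among the k nearest.

    Assumes referee_data is non-empty.
    """
    # Mark each referee-training example as correctly (1) or erroneously (0)
    # classified by the classifier this referee is responsible for.
    marked = [(x, int(clf(x) == y)) for x, y in referee_data]

    def competence(x: Sequence[float]) -> float:
        nearest = sorted(marked, key=lambda m: _dist2(m[0], x))[:k]
        return sum(ok for _, ok in nearest) / len(nearest)

    return competence


def most_competent_decision(classifiers: List[Classifier],
                            referees: List[Referee],
                            x: Sequence[float]) -> int:
    """Return the decision of the classifier rated most competent for x."""
    best = max(range(len(classifiers)), key=lambda i: referees[i](x))
    return classifiers[best](x)
```

Ties between referees are resolved in favour of the first classifier in the list; a real system would need an explicit tie-breaking policy.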
30
Architecture of DF Software Tool
  • Architecture of the source-based component of DF
    software tool

[Figure: source-based component. Elements shown: local data source; user interface; data source managing agent; KDD agent (training, testing); local classification agents of the DF system (base classifiers, meta-classifier, referee); server (library) of learning methods; links labeled "To the Meta-classification agent" and "To the KDD Master".]
31
Architecture of DF System
  • Architecture of the meta-level component of DF
    software tool

[Figure: meta-level component. Elements shown: agent-classifier of meta-level (meta-classifiers, referee, inference engine); KDD Master agent; meta-level KDD agent; user interface; server of learning methods; links labeled "To the Data source managing agent" and "To the KDD agent"; local classification agents.]
32
Conclusion: Future Work
  • 1. Development of a sophisticated ontology editor
    supporting distributed design of a distributed
    ontology.
  • 2. Further design and implementation of the Data
    Fusion System software tool for development and
    implementation of particular distributed
    applications in the Data Fusion area.

33
Thank you!
  • For more information and related publications
    please contact
  • E-mail: gor_at_mail.iias.spb.su
  • http://space.iias.spb.su/ai/english/gorodetski.htm

Acknowledgement: This research is funded by
AFRL/IF (EOARD), 1999-2003