A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

1
A Domain Ontology Engineering Tool with General
Ontologies and Text Corpus
  • Naoki Sugiura,
  • Masaki Kurematsu,
  • Naoki Fukuta,
  • Noriaki Izumi,
  • Takahira Yamaguchi

2
DODDLE and DODDLE II
  • Domain Ontology rapiD DeveLopment Environment
  • Builds taxonomic and non-taxonomic relationships
  • Uses a machine-readable dictionary and a domain-specific text
    corpus to build relationships

3
DODDLE DODDLE II
  • Large Ontologies are difficult to build by hand
  • Locates relationships between words based on
    context similarities even if separated
  • Disadvantages
      • Human interaction is still required
      • Low success rate

4
DODDLE vs DODDLE II
  • DODDLE only works on taxonomic relationships
  • DODDLE II
      • Extension of DODDLE
      • Also finds non-taxonomic relationships

5
Outline
  • Overview
  • Taxonomic Relationships
  • Non-Taxonomic Relationships
  • Case Studies
  • Problems/Future Work
  • Conclusion
  • Assessment

6
Overview
[Diagram labels: Domain Terms; Domain-Specific Text Corpus; Concept Extraction Module; NTRL Module; TRA Module]

7
Overview: TRA Module
[Diagram labels: Matched Result Analysis; MRD (WordNet); Trimmed Result Analysis; Modification using syntactic strategies; Taxonomic Relationship]

8
Overview: NTRL Module
[Diagram labels: Extraction of frequent words; WordSpace creation; Domain-Specific Text Corpus; Extraction of similar concept pairs; Concept specification templates; Non-Taxonomic Relationship]

9
Overview
[Diagram labels: Taxonomic Relationship; Non-Taxonomic Relationship; Interaction Module]

10
TRA Module
[Diagram labels: Matched Result Analysis; MRD (WordNet); Trimmed Result Analysis; Modification using syntactic strategies; Taxonomic Relationship]

11
TRA
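  • Matched Result Analysis
      • Constructs PABs (PAths including only Best spell-match nodes)
        and STMs (SubTrees that include best spell-match nodes and
        other nodes, and so can be Moved)
  • Trimmed Result Analysis
      • Removes unnecessary nodes
  • Modification using syntactic strategies
      • Allows for human input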
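
The "remove unnecessary nodes" step can be pictured with a small sketch. This is only one simplified reading of trimming, not the tool's actual procedure: it assumes the general-ontology hierarchy is available as a child-to-parent map, keeps only the nodes matched to domain terms, and re-attaches each one to its nearest kept ancestor while counting how many nodes were skipped. The function name, the node names, and the data are hypothetical.

```python
def trim_taxonomy(parent, matched, root):
    """Re-attach every matched node to its nearest kept ancestor.

    parent:  dict mapping each node to its parent (the root maps to None)
    matched: set of nodes that matched a domain term
    Returns {node: (kept_ancestor, number_of_trimmed_nodes_in_between)}.
    """
    trimmed = {}
    for node in matched:
        skipped = 0
        ancestor = parent[node]
        # Climb until we reach another matched node or the root.
        while ancestor is not None and ancestor != root and ancestor not in matched:
            skipped += 1
            ancestor = parent[ancestor]
        trimmed[node] = (ancestor, skipped)
    return trimmed


# Hypothetical fragment of a general hierarchy (not taken from the slides).
parent = {
    "entity": None,
    "object": "entity",
    "artifact": "object",
    "document": "artifact",
    "contract": "document",
    "goods": "artifact",
}
matched = {"contract", "goods"}
result = trim_taxonomy(parent, matched, root="entity")
for node in sorted(result):
    print(node, result[node])
```

The skipped-node count is kept because a large difference between siblings is the kind of signal a user might inspect when deciding whether a subtree should be moved; that use is an assumption here, not something the slide states.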

12
PAB and STM
13
TRA
14
NTRL Module
[Diagram labels: Extraction of frequent words; WordSpace creation; Domain-Specific Text Corpus; Extraction of similar concept pairs; Concept specification templates; Non-Taxonomic Relationship]

15
NTRL
  • Extraction of key words
  • Primitive unit: 4-grams (sequences of 4 words)
  • Collocation matrix
      • $a_{i,j}$ = number of times $f_i$ appears before $f_j$

Example 4-gram sequence: f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5
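
A minimal sketch of how the collocation counts could be computed from the example sequence on this slide. It assumes "before" means "immediately before", which the slide does not spell out; the function name is illustrative.

```python
from collections import Counter


def collocation_counts(sequence):
    """Count how often item f_i occurs immediately before item f_j."""
    return Counter(zip(sequence, sequence[1:]))


# The 4-gram sequence shown on the slide.
sequence = "f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5".split()
counts = collocation_counts(sequence)

# Dense matrix form: rows are f_i, columns are f_j, in sorted label order.
labels = sorted(set(sequence))
matrix = [[counts[(fi, fj)] for fj in labels] for fi in labels]

print(counts[("f8", "f4")])  # f8 is followed by f4 twice in the sequence
```

Pairs that never occur simply count as zero, so the dense matrix is mostly zeros for a sequence this short.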
16
NTRL
  • WordSpace creation
  • Context vectors
  • Word vectors
      • Sum of context vectors
      • $\tau(w) = \sum_{i \in C(w)} \Bigl( \sum_{f \text{ close to } i} \varphi(f) \Bigr)$

$\varphi(f)$: a 4-gram vector of a 4-gram $f$
$\tau(w)$: a vector representation of a word or phrase $w$
$C(w)$: appearance places of a word or phrase $w$
WordSpace is a collection of $\tau(w)$

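
The word-vector sum on this slide can be sketched as follows. The notion of "close to" is assumed here to be a fixed window of positions around each appearance place, and the 4-gram vectors are toy values; both are illustrative assumptions rather than details from the slides.

```python
def word_vector(positions, gram_at, gram_vector, window=1):
    """Sum the 4-gram vectors found near every appearance place of a word.

    positions:   appearance places of the word, i.e. C(w)
    gram_at:     the 4-gram label found at each text position
    gram_vector: the vector assigned to each 4-gram label
    window:      how many positions on each side count as "close" (assumed)
    """
    dim = len(next(iter(gram_vector.values())))
    total = [0] * dim
    for i in positions:
        lo = max(0, i - window)
        hi = min(len(gram_at), i + window + 1)
        for pos in range(lo, hi):  # every 4-gram close to place i
            vec = gram_vector[gram_at[pos]]
            total = [a + b for a, b in zip(total, vec)]
    return total


# Toy data: four text positions and three distinct 4-grams.
gram_at = ["f1", "f2", "f1", "f3"]
gram_vector = {"f1": [1, 0, 0], "f2": [0, 1, 0], "f3": [0, 0, 1]}
tau = word_vector(positions=[1, 3], gram_at=gram_at, gram_vector=gram_vector)
print(tau)
```

The inner loop produces the context vector for one appearance place, and the outer loop adds those context vectors into the word vector.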
17
NTRL
  • Extraction of concept pairs
  • Each input term has a best-matched synset
  • Synset: a collection of word vectors
  • The sum of the word vectors in the synset is assigned to the
    concept that corresponds to each input term
  • The inner product is computed for all combinations of concept
    pairs
  • A match is determined by a user-set threshold
      • Case study: 0.87
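
The pairing step can be sketched as below. The slide only says "inner product"; this sketch assumes the concept vectors are normalized to unit length first, so the score behaves like a cosine similarity and a threshold such as 0.87 is meaningful. The concept names and vectors are made up for illustration.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def similar_pairs(concept_vectors, threshold):
    """Return (concept_a, concept_b, score) for pairs at or above the threshold."""
    unit = {c: normalize(v) for c, v in concept_vectors.items()}
    pairs = []
    for a, b in combinations(sorted(unit), 2):
        score = sum(x * y for x, y in zip(unit[a], unit[b]))
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical concept vectors (each would be a sum of word vectors).
concept_vectors = {
    "buyer": [3.0, 1.0, 0.0],
    "seller": [3.0, 1.2, 0.1],
    "goods": [0.0, 1.0, 4.0],
}
pairs = similar_pairs(concept_vectors, threshold=0.87)
for a, b, score in pairs:
    print(a, b, round(score, 3))
```

Raising or lowering the threshold directly changes how many candidate pairs reach the user, which is why the deck later flags threshold selection as a problem.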

18
NTRL
  • Finding association rules
  • Locates rules of the form $X \Rightarrow Y$, filtered by support
    and confidence
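
A minimal sketch of support and confidence for single-term rules. It assumes each sentence of the corpus is treated as one transaction (a set of input terms), which the slides do not state; the transactions and threshold values below are illustrative only.

```python
from itertools import permutations


def association_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for rules x => y that pass both thresholds."""
    n = len(transactions)
    terms = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(terms, 2):
        n_x = sum(1 for t in transactions if x in t)
        n_xy = sum(1 for t in transactions if x in t and y in t)
        support = n_xy / n                       # share of transactions with both terms
        confidence = n_xy / n_x if n_x else 0.0  # share of x-transactions that also contain y
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Hypothetical transactions: the input terms found in each sentence.
transactions = [
    {"buyer", "goods"},
    {"buyer", "goods", "price"},
    {"seller", "goods"},
    {"buyer", "price"},
    {"seller"},
]
rules = association_rules(transactions, min_support=0.4, min_confidence=0.8)
for x, y, support, confidence in rules:
    print(f"{x} => {y}  support={support:.2f}  confidence={confidence:.2f}")
```

Only term pairs that survive both thresholds become candidate non-taxonomic relationships, so the rule direction matters: a rule can pass in one direction and fail in the other.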

19
NTRL
  • Constructing concept specification templates
  • Built from the set of similar concept pairs and the association
    rules
  • DODDLE II sets priorities between concept pairs
      • Based on the TRA Module and co-occurrence information

20
Case Study
  • Law: Contracts for the International Sale of Goods
  • Business: XML Common Business Library
  • Association rule thresholds
      • Support: 0.4
      • Confidence: 80%

21
Law Case Study
  • Input: 46 concepts
  • WordSpace: 77 concept pairs
  • Association rules between input terms: 55 pairs of terms
  • Templates

22
Business Case Study
  • Input: 57 terms
  • WordSpace: 40 concept pairs
  • Association rules between input terms: 39 pairs of terms

23
Taxonomic Results
Bus.              Precision   Recall per path   Recall per subtree
Matched Result    0.2         0.29              0.71
Trimmed Result    0.22        0.13              0.5

Law               Precision   Recall per path   Recall per subtree
Matched Result    0.25        0.23              0.19
Trimmed Result    0.3         0.3               0.15

24
Non-taxonomic Results
WS = WordSpace, AR = association rules

Law                        WS     AR     Join of WS and AR
Extracted Concept Pairs    77     55     117
Accepted Concept Pairs     18     13     27
Precision                  0.23   0.24   0.23
Recall                     0.38   0.27   0.56

Bus.                       WS     AR     Join of WS and AR
Extracted Concept Pairs    40     39     66
Accepted Concept Pairs     30     20     39
Precision                  0.75   0.51   0.59
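
The precision rows on this slide follow from the two count rows above them (accepted pairs divided by extracted pairs). A short check, using only the counts shown on the slide; the variable names are illustrative:

```python
def precision(accepted, extracted):
    """Share of extracted concept pairs that the user accepted."""
    return accepted / extracted


# (accepted, extracted) counts copied from the slide.
law = {"WS": (18, 77), "AR": (13, 55), "Join of WS and AR": (27, 117)}
business = {"WS": (30, 40), "AR": (20, 39), "Join of WS and AR": (39, 66)}

for name, table in (("Law", law), ("Bus.", business)):
    for method, (accepted, extracted) in table.items():
        print(f"{name:5s} {method:18s} {precision(accepted, extracted):.2f}")
```

Recall cannot be recomputed the same way because the slide does not give the number of correct pairs in each domain.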
25
Problems/ Future Work
  • Threshold
      • Changes with each domain
  • Specification of a concept relation
      • Still need to specify relationships
  • Ambiguity of multiple terminology
      • Example: "transmission"
      • Semantic specialization of multi-definition words needed
  • DODDLE-R
      • Uses RDF tags

26
Conclusion
  • Uses an MRD and a text corpus
  • Two strategies for taxonomic relationships: matched result
    analysis and trimmed result analysis
  • Non-taxonomic relationships are extracted from co-occurrence
    information in the text corpus
  • Concept specification templates are a way to narrow down
    concept pairs when building an ontology

27
Assessment
  • Designed to be a tool
  • No timing results reported
  • Determining thresholds is plug-and-guess.

28
Questions ?