Mining for Lexons: Applying Unsupervised Learning Methods to create Ontology Bases

Slides: 27
Provided by: Vero171
Transcript and Presenter's Notes

1
Mining for Lexons: Applying Unsupervised Learning
Methods to create Ontology Bases
  • Marie-Laure Reinberger, Peter Spyns, Walter
    Daelemans, Robert Meersman
  • CNTS - University of Antwerp
  • STARLab - Vrije Universiteit Brussel

2
Project OntoBasis
  • Elaboration and adaptation of Text Analysis
    tools for the building of specific domain
    Ontologies
  • The role of Ontologies in Database and Web
    Semantics

3
Outline
  • DOGMA
  • Text Mining
    - Syntactic analysis
    - Clustering
    - Evaluation
    - Results
  • Conclusion

4
A DOGMA inspired ontology
  • Developing Ontology-Guided Mediation for Agents:
    an ontology engineering approach
  • Ontology Base consisting of binary relations or
    lexons extracted automatically
  • Layer of Ontological Commitments (rules)

5
First step: Ontology Base
  • Unsupervised extraction of relevant terms
  • Grouping or clustering of those terms
  • Medical domain
  • 4M corpus - Medline abstracts

6
Outline
  • DOGMA
  • Text Mining
    - Syntactic analysis
    - Clustering
    - Evaluation
    - Results
  • Conclusion

7
Syntactic analysis
  • NP1Subject The/DT patients/NNS NP1Subject
    VP1 followed/VBD VP1
    NP1Object a/DT '/'' healthy/JJ '/'' diet/NN NP1Object
    and/CC
    NP2Subject 20/CD /NN NP2Subject
    VP2 took/VBD VP2
    NP1Object a/DT high/JJ level/NN NP3Object
    PNP P of/IN P NP physical/JJ exercise/NN NP PNP
    ./.

8
Data selection
  • Subject-Verb-Object structure: selectional
    restriction, functional relation
  • Selection of structures according to frequency
  • Building of initial classes
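
The selection step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the triples, the `MIN_FREQ` threshold, and the helper name `build_initial_classes` are all assumptions, since the slides give neither the data format nor a frequency cut-off.

```python
from collections import Counter, defaultdict

# Hypothetical (subject, verb, object) triples, standing in for parser output.
triples = [
    ("patients", "follow", "diet"),
    ("patients", "follow", "diet"),
    ("patients", "receive", "treatment"),
    ("children", "receive", "treatment"),
    ("children", "receive", "vaccine"),
    ("study", "show", "prevalence"),
]

MIN_FREQ = 2  # assumed threshold; the slides do not state a value


def build_initial_classes(triples, min_freq=MIN_FREQ):
    """Keep verb-object structures seen at least min_freq times,
    then group the surviving object nouns by verb."""
    counts = Counter((verb, obj) for _, verb, obj in triples)
    classes = defaultdict(set)
    for (verb, obj), n in counts.items():
        if n >= min_freq:
            classes[verb].add(obj)
    return dict(classes)


initial_classes = build_initial_classes(triples)
print(initial_classes)
```

Only the verb-object part of each triple is counted here; the same pattern would apply to subject-verb structures.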

9
Clustering algorithms
  • Soft clustering
  • Hard clustering
  • Cluster merging

10
Soft clustering
  • each verb is associated with its class of
    co-occurring nouns
  • nouns are clustered according to the similarity
    between classes of nouns
  • the algorithm runs as long as a cluster is modified
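
One possible reading of this procedure is sketched below. The Jaccard similarity, the fixed `threshold`, the merge-by-union rule, and the toy verb classes are assumptions made for illustration; the slides do not specify the similarity measure or the merge rule.

```python
def jaccard(a, b):
    """Overlap between two sets of nouns (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def soft_cluster(verb_classes, threshold=0.5):
    """Merge noun classes that are similar enough, repeating until no
    cluster changes. A noun may end up in several clusters."""
    clusters = list({frozenset(nouns) for nouns in verb_classes.values()})
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if jaccard(clusters[i], clusters[j]) >= threshold:
                    merged = clusters[i] | clusters[j]
                    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                    if merged not in clusters:
                        clusters.append(merged)
                    changed = True
                    break
            if changed:
                break
    return set(clusters)


# Hypothetical verb -> co-occurring nouns classes.
verb_classes = {
    "treat": {"hepatitis", "infection", "disease"},
    "diagnose": {"hepatitis", "infection", "syndrome"},
    "inject": {"vaccine", "drug"},
    "administer": {"vaccine", "drug", "chemotherapy"},
    "receive": {"chemotherapy", "treatment"},
}

soft_clusters = soft_cluster(verb_classes)
for cluster in soft_clusters:
    print(sorted(cluster))
```

In this toy run "chemotherapy" stays in two clusters, which is the behaviour that distinguishes soft from hard clustering.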

11
Soft clusters
  • hepatitis infection disease cases syndrome
  • liver cirrhosis disease carcinoma HCC HBV virus
    method model
  • liver transplantation chemotherapy treatment
  • chemotherapy treatment vaccine injection drug
  • immunization vaccine vaccination

12
Hard clustering
  • each noun is associated with its class of
    co-occurring verbs
  • nouns are clustered according to the similarity
    between classes of verbs
  • at each step, the similarity measure is lowered
  • cut according to the percentage of nouns clustered
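
A rough sketch of how such a procedure could look is given below. The Jaccard similarity, the threshold schedule, the `target` coverage, the union of verb profiles on merge, and the toy data are all assumptions; the slides only state that the measure is lowered step by step and that the cut depends on the percentage of nouns clustered.

```python
def jaccard(a, b):
    """Overlap between two sets of verbs (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hard_cluster(noun_verbs, thresholds=(0.9, 0.7, 0.5), target=0.8):
    """Agglomerative sketch: each noun starts alone; the similarity
    threshold is lowered step by step until the share of nouns that sit
    in a multi-noun cluster reaches the target."""
    clusters = [({noun}, set(verbs)) for noun, verbs in sorted(noun_verbs.items())]
    for threshold in thresholds:
        merged = True
        while merged:
            merged = False
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    if jaccard(clusters[i][1], clusters[j][1]) >= threshold:
                        nouns = clusters[i][0] | clusters[j][0]
                        verbs = clusters[i][1] | clusters[j][1]
                        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
                        clusters = rest + [(nouns, verbs)]
                        merged = True
                        break
                if merged:
                    break
        clustered = sum(len(nouns) for nouns, _ in clusters if len(nouns) > 1)
        if clustered / len(noun_verbs) >= target:
            break  # cut: enough nouns have been clustered
    return {frozenset(nouns) for nouns, _ in clusters if len(nouns) > 1}


# Hypothetical noun -> co-occurring verbs classes.
noun_verbs = {
    "month": {"last", "follow"},
    "year": {"last", "follow"},
    "children": {"vaccinate", "infect"},
    "infant": {"vaccinate", "infect", "feed"},
    "therapy": {"receive", "start"},
    "treatment": {"receive", "start", "stop"},
    "liver": {"transplant"},
}

hard_clusters = hard_cluster(noun_verbs)
for cluster in hard_clusters:
    print(sorted(cluster))
```

Each noun belongs to at most one cluster here, and "liver" is left unclustered because its verb class overlaps with no other.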

13
Hard clusters
  • month year
  • children infant
  • concentration number incidence use prevalence
    level rate
  • course therapy transplantation treatment
    immunization

14
Evaluation
  • Use of WordNet
  • Building of a set of WordNet pairs of nouns
  • Considering the set of WordNet pairs:
    - recall = WN pairs in clusters / WN pairs
    - minimum precision (mP) = WN pairs in clusters /
      pairs in clusters composed of WN words
    - extrapolated precision (eP) = ext. WN pairs in
      clusters / ext. pairs in clusters composed of
      WN words
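
The recall and minimum precision ratios can be computed as in the sketch below, which uses the toy clusters from the Example slide near the end of the deck. The function names and the representation of pairs as `frozenset` objects are assumptions; extrapolated precision is left out because the slides do not say how the extrapolation is done.

```python
from itertools import combinations


def pairs(words):
    """All unordered pairs of distinct words."""
    return {frozenset(p) for p in combinations(sorted(words), 2)}


def evaluate(clusters, wn_words, wn_pairs):
    """Return (recall, minimum precision).
    Assumes wn_pairs is non-empty and at least one cluster holds two WN words."""
    cluster_pairs = set()
    for cluster in clusters:
        # Only words known to WordNet take part in the evaluation.
        cluster_pairs |= pairs(set(cluster) & wn_words)
    hits = cluster_pairs & wn_pairs
    recall = len(hits) / len(wn_pairs)
    min_precision = len(hits) / len(cluster_pairs)
    return recall, min_precision


clusters = [{"buy", "pay", "inc"}, {"month", "year", "president"}]
wn_words = {"buy", "pay", "month", "year", "president"}
wn_pairs = {frozenset({"buy", "pay"}), frozenset({"month", "year"})}

recall, min_precision = evaluate(clusters, wn_words, wn_pairs)
print(recall, min_precision)  # 1.0 0.5
```

The word "inc" is ignored because it is not in WordNet, so four clustered pairs remain, two of which are WordNet pairs.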

15
Comparison raw text vs parsed text: soft clustering
  • Parsed1: clusters of more than 15 elements dismissed
  • Parsed2: clusters of more than 15 and fewer than 3
    elements dismissed

16
Comparison raw text vs parsed text: hard clustering

17
Raw text or parsed text?
  • Hard: comparable results
  • Soft: better for parsed text
  • N-grams: gain in processing time
  • Parsed text: possible labeling of the relation

18
Soft clustering vs Hard
  • Soft1: clusters of more than 15 elements dismissed
  • Soft2: clusters of more than 15 and fewer than 3
    elements dismissed

19
Soft or Hard?
  • Soft
    - Different semantic dimensions considered
    - Problem: too many clusters, too many
      associations
  • Hard
    - Problem: only one semantic dimension of the
      word is considered

20
Merging hard and soft clustering: on disease
  • Hard clustering: 1 cluster
    - disease transmission
  • Soft clustering: 6 clusters
    - antigen hepatitis virus protein disease
    - prevalence infection correlation disease
  • Combination: 2 clusters
    - hepatitis infection disease cases syndrome
    - liver cirrhosis disease carcinoma vaccine HCC
      HBV virus history method model

21
on chemotherapy
  • Hard clustering:
    - (none) transplantation treatment
  • Soft clustering:
    - hepatitis blood factor HBV doses chemotherapy
      treatment vaccine vaccines vaccination
      injection drug immunization
    - liver chemotherapy treatment transplantation
  • Combination:
    - liver transplantation chemotherapy treatment

22
Evaluation results (150-200 words)

23
Turning to terms
  • Catch terminology items we have missed so far
  • Hard clustering on terms would allow a noun or
    part of a term to appear in different clusters
    - face mask, mask, glove, protective eyewear
    - immunoadsorbent, immunoassay, immunospot

24
Prospects
  • Focus on terms and on the verb-object dependency
  • Filter the terminology
  • Use prepositional structures to establish links
    and build a network

25
Example
  • Cluster 1: buy pay inc
  • Cluster 2: month year president
  • Words found in WordNet: buy pay month year
    president
  • WordNet pairs: buy-pay, month-year
  • Clustered pairs (WN words): buy-pay, month-year,
    year-president, month-president
  • Recall = 2/2 = 100%
  • Precision = 2/4 = 50%

26
Turning to terms