Title: Ontology Learning


1
Ontology Learning
  • David Salz
  • david.salz_at_snafu.de

2
Motivation (1)
Motivation
  • Modeling / maintaining ontologies by hand is
  • Slow
  • Expensive
  • The problem with the Semantic Web is not using the
    metadata but providing it!
  • Automated help is necessary
  • There is plenty of knowledge in the form of
    natural language text available on the web

3
Motivation (2)
Motivation
  • Are hand-made ontologies really better than
    machine-made ones?
  • Modeling and maintaining ontologies by hand is
    also
  • Biased
  • Error prone

4
Motivation (3)
Motivation
  • Goal
  • Discover ontologies by analyzing natural language
    documents
  • With or without the help of human experts
  • Save time and work

5
Machine Learning
Machine Learning
6
Learning
Machine Learning
  • Learning
  • the alteration of behavior as a result of
    individual experience. When an organism can
    perceive and change its behavior, it is said to
    learn.
  • Learning. Encyclopædia Britannica.
  • Retrieved June 28, 2003, from Encyclopædia
    Britannica Premium Service. http://www.britannica.com/eb/article?eu=48642

7
Machine Learning
Machine Learning
  • Problem: x → f(x)
  • Function f is too difficult to compute or unknown
  • Solution: learn f (or an approximation of f)
  • Hypothesis is updated with experience

(Diagram: x → Hypothesis → f(x))
8
Machine Learning
Machine Learning
  • System is fed with training data
  • Supervised learning
  • A human teacher gives feedback
  • Or the training data is pre-classified
  • Unsupervised learning
  • Training data is raw; the system must discover
    patterns
  • We will concentrate on unsupervised learning

9
Problems: What does the system learn?
Machine Learning
(Figure: training data, what you want, and what the system really learns)

10
Problems: Overfitting (1)
Machine Learning
(Figure: training data, what you want, and what the system really learns in the overfitting case)

11
Problems: Overfitting (2)
Machine Learning
  • Learner adapts too well to the training data
  • System won't work on new examples
  • Remember: the goal of learning is generalization of
    the training data
    training data
  • Reasons
  • Learning time too long
  • Training data too special or inconsistent

12
Evaluation
Machine Learning
  • It is important to evaluate the quality of
    discovered knowledge
  • Evaluation must be done with independent test
    data, NOT with the training data
  • If necessary, split the available data into training
    data and test data beforehand
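Here is a minimal Python sketch of such a split. It assumes the examples are held in a list. The function name `train_test_split`, the 20% test fraction and the fixed seed are illustrative choices, not part of the original method.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into disjoint training and test sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training data, test data)


# Hypothetical labelled examples: (sentence id, is the extracted relation correct?)
data = [(i, i % 2 == 0) for i in range(10)]
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

The learner only ever sees `train`. Quality is then measured on `test`, which the learner has never seen.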

13
Machine Learning
14
Text Processing
Text Processing
15
Text Processing
Text Processing
(Figure: text processing example; recoverable labels: "to offer", "to wish")
16
Text Processing
Text Processing
  • Challenges
  • Recognize different grammatical forms
  • Identify names
  • Identify compounds
  • Identify important keywords / unimportant fillers
  • Solutions
  • Existing ontologies
  • Dictionaries
  • Grammars
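This toy sketch shows three of these steps. Word forms are mapped to base forms, known two-word compounds are joined and filler words are dropped. The lookup tables `LEMMAS`, `COMPOUNDS` and `STOPWORDS` are hypothetical. They stand in for the dictionaries, grammars or existing ontologies listed above, and name recognition is not covered.

```python
import re

# Hypothetical resources; a real system would use full dictionaries,
# grammars or an existing ontology instead of these toy tables.
LEMMAS = {"offers": "offer", "offered": "offer", "wishes": "wish",
          "wished": "wish", "systems": "system"}
COMPOUNDS = {("operating", "system"): "operating_system",
             ("bell", "labs"): "bell_labs"}
STOPWORDS = {"the", "a", "an", "of", "and", "in", "at", "is", "are"}


def preprocess(text):
    """Tokenize, map word forms to base forms, join known compounds, drop fillers."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [LEMMAS.get(t, t) for t in tokens]       # grammatical forms -> base form
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:                         # known two-word compound
            merged.append(COMPOUNDS[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return [t for t in merged if t not in STOPWORDS]  # remove unimportant fillers


print(preprocess("Bell Labs offered the Unix operating systems"))
# ['bell_labs', 'offer', 'unix', 'operating_system']
```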

17
Learning Taxonomies
Learning Taxonomies
18
Taxonomies
Learning Taxonomies
  • structures that provide a way of classifying
    things - living organisms, products, books - into
    a series of hierarchical groups to make them
    easier to identify, study, or locate.
  • Jean Graef
  • 'Managing taxonomies strategically'

19
Lexico-Syntactic Patterns (1)
Learning Taxonomies
  • Use regular expressions to find patterns that
    describe a semantic relation (here: X is a more
    specific concept, Y a more general one)
  • X is a(n) Y
  • An apple is a fruit
  • X1, X2, ... and/or other Y
  • Apples, pears, cherries and other fruit
  • Y, in particular X
  • Fruit, in particular apples
  • X1, X2, ... and Xn are examples of Y
  • Apples, pears and plums are examples of fruit
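The patterns above can be approximated with Python's `re` module. This is a sketch only: it assumes every concept is a single word. Real systems instead match noun phrases produced by a part-of-speech tagger or chunker. The function returns (more specific, more general) pairs, and the names `PATTERNS`, `split_list` and `extract_isa` are hypothetical.

```python
import re

# Simplification: every concept is a single word (\w+); real systems match
# noun phrases found by a part-of-speech tagger or chunker.
# Each entry: (pattern, group holding the specific concept(s), group holding the general concept)
PATTERNS = [
    (re.compile(r"\b(\w+) is an? (\w+)", re.I), 1, 2),                           # X is a(n) Y
    (re.compile(r"((?:\w+, )*\w+) (?:and|or) other (\w+)", re.I), 1, 2),          # X1, X2 and/or other Y
    (re.compile(r"(\w+), in particular (\w+)", re.I), 2, 1),                      # Y, in particular X
    (re.compile(r"((?:\w+, )*\w+ and \w+) are examples of (\w+)", re.I), 1, 2),   # X1, X2 and Xn are examples of Y
]


def split_list(phrase):
    """Split an enumeration such as 'apples, pears and plums' into its items."""
    return [item for item in re.split(r",\s*|\s+(?:and|or)\s+", phrase) if item]


def extract_isa(sentence):
    """Return (specific, general) concept pairs matched by any pattern."""
    pairs = []
    for pattern, specific_group, general_group in PATTERNS:
        for match in pattern.finditer(sentence):
            general = match.group(general_group).lower()
            for specific in split_list(match.group(specific_group)):
                pairs.append((specific.lower(), general))
    return pairs


print(extract_isa("Apples, pears, cherries and other fruit"))
# [('apples', 'fruit'), ('pears', 'fruit'), ('cherries', 'fruit')]
```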

20
Lexico-Syntactic Patterns (2)
Learning Taxonomies
(Figure: pattern example relating "Unix" to "Operating Systems")
21
Lexico-Syntactic Patterns (3)
Learning Taxonomies
  • No method is ever 100% accurate
  • Possible solutions
  • Verifying by hand
  • Statistical approaches (don't believe everything
    you've seen only once or twice)
  • Better patterns / better preprocessing
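A simple statistical filter could work as follows. It assumes the extracted pairs are collected in a list. The filter counts how often each pair was found and keeps only pairs seen at least `min_count` times. The function name and the threshold of 3 are arbitrary illustrative choices.

```python
from collections import Counter


def filter_by_frequency(pairs, min_count=3):
    """Keep (specific, general) pairs that were extracted at least min_count times."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical extraction results from a corpus
extracted = ([("unix", "operating system")] * 4
             + [("apple", "fruit")] * 3
             + [("price", "fruit")])          # seen only once: probably an error
print(filter_by_frequency(extracted))
# {('unix', 'operating system'): 4, ('apple', 'fruit'): 3}
```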

22
Lexico-Syntactic Patterns (4)
Learning Taxonomies
  • Advantages and Disadvantages
  • (+) quite accurate
  • (+) works on small sets of data
  • (-) bad scaling
  • (-) patterns have to be created somehow
  • - by hand
  • - through other automatic techniques

23
Statistical Clustering (1)
Learning Taxonomies
  • Form clusters of similar words
  • Split or merge the clusters to form a hierarchy

(Figure: example clusters {Windows, Unix, Solaris, OS, Operating System} and {Apple, Cherry, Banana, Fruit, price}; the first cluster is split into {Windows, Unix, Solaris} and {OS, Operating System})
24
Statistical Clustering (2): Forming Clusters
Learning Taxonomies
  • Similarity measure: a mathematical way to express
    how similar two words are
  • Possible similarity measures between words
  • Words that often appear in the same sentence /
    paragraph / text
  • Exploit existing ontologies, if possible!
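This sketch implements the first measure. It assumes sentences are already tokenized and preprocessed. Each word is represented by the set of sentences it occurs in. The similarity of two words is the Jaccard coefficient of these sets. The Jaccard coefficient is one possible choice among many. A verb-based measure could be built the same way by indexing words by the verbs they occur with.

```python
from collections import defaultdict


def sentence_index(sentences):
    """Map each word to the set of sentence numbers it occurs in."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in sentence:
            index[word].add(i)
    return index


def similarity(index, w1, w2):
    """Jaccard coefficient: sentences containing both words divided by
    sentences containing either word (0.0 if neither word occurs)."""
    s1, s2 = index.get(w1, set()), index.get(w2, set())
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


# Hypothetical, already preprocessed sentences
sentences = [
    ["windows", "unix", "operating_system"],
    ["unix", "solaris", "operating_system"],
    ["apple", "cherry", "fruit"],
    ["cherry", "banana", "fruit", "price"],
]
index = sentence_index(sentences)
print(similarity(index, "unix", "operating_system"))  # 1.0
print(similarity(index, "unix", "solaris"))           # 0.5
print(similarity(index, "unix", "fruit"))             # 0.0
```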

25
Statistical Clustering (3)
Learning Taxonomies
  • Building a hierarchy from clusters
  • Top-down: start with one big cluster and split it
  • Bottom-up: start with one-word clusters and join
    them

26
Statistical Clustering (4): Splitting and Joining Clusters
Learning Taxonomies
  • Use the similarity measure between words to
    calculate
  • the similarity of two clusters: join the two most
    similar clusters
  • or
  • the coherence of a cluster: split the most
    incoherent cluster
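This sketch covers the coherence side and rests on two assumptions. First, word similarities come from a hypothetical table `SIM`; in practice they would come from a measure like the co-occurrence measure described earlier. Second, coherence is taken as the average pairwise similarity inside a cluster, which is only one of several possible definitions. A top-down method would split the cluster this returns next.

```python
from itertools import combinations

# Hypothetical word similarities; pairs not listed count as 0.0.
SIM = {
    frozenset({"windows", "unix"}): 0.8,
    frozenset({"unix", "solaris"}): 0.9,
    frozenset({"windows", "solaris"}): 0.7,
    frozenset({"apple", "cherry"}): 0.6,
    frozenset({"apple", "price"}): 0.1,
    frozenset({"cherry", "price"}): 0.2,
}


def sim(a, b):
    return SIM.get(frozenset({a, b}), 0.0)


def coherence(cluster):
    """Average pairwise similarity inside a cluster; a single word counts as fully coherent."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def most_incoherent(clusters):
    """The cluster a top-down method would split next."""
    return min(clusters, key=coherence)


clusters = [["windows", "unix", "solaris"], ["apple", "cherry", "price"]]
print(most_incoherent(clusters))  # ['apple', 'cherry', 'price']
```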

27
Statistical Clustering (5): Splitting and Joining Clusters
Learning Taxonomies
  • Computing similarity / coherence of two clusters
  • Single linkage: the similarity of the two most
    similar objects counts
  • Complete linkage: the similarity of the two least
    similar objects counts
  • Group average: the average similarity of all pairs
    of objects is calculated
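This is a bottom-up (agglomerative) sketch that supports the three linkage variants. It again uses a hypothetical similarity table `SIM`. It records each merge, and the sequence of merges is the resulting hierarchy. With these toy numbers, single and complete linkage make different second merges.

```python
from itertools import combinations

# Hypothetical word similarities; pairs not listed count as 0.0.
SIM = {
    frozenset({"unix", "solaris"}): 0.9,
    frozenset({"windows", "unix"}): 0.8,
    frozenset({"windows", "solaris"}): 0.5,
    frozenset({"apple", "cherry"}): 0.7,
    frozenset({"unix", "apple"}): 0.1,
}
WORDS = ["windows", "unix", "solaris", "apple", "cherry"]

LINKAGE = {
    "single": max,                                  # two most similar objects count
    "complete": min,                                # two least similar objects count
    "average": lambda values: sum(values) / len(values),  # average over all pairs
}


def sim(a, b):
    return SIM.get(frozenset({a, b}), 0.0)


def cluster_similarity(c1, c2, linkage):
    return LINKAGE[linkage]([sim(a, b) for a in c1 for b in c2])


def agglomerate(words, linkage="single"):
    """Start with one-word clusters and repeatedly join the two most similar
    clusters; return the list of merges (the hierarchy, bottom-up)."""
    clusters = [(w,) for w in words]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: cluster_similarity(pair[0], pair[1], linkage))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


print(agglomerate(WORDS, "single")[1])    # (('windows',), ('unix', 'solaris'))
print(agglomerate(WORDS, "complete")[1])  # (('apple',), ('cherry',))
```

Single linkage joins "windows" to {unix, solaris} through its strong link to "unix". Complete linkage is held back by the weaker windows/solaris similarity.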

28
Statistical Clustering (6)
Learning Taxonomies
  • Advantages and Disadvantages
  • (+) good scaling
  • (-) less precise than the symbolic approach
  • (-) finds relations between words, but cannot
    name the relation; this must be done by hand

29
Learning Relations
Learning Relations
30
Relation Learning
Learning Relations
  • Assume we have a taxonomy
  • We want to use that taxonomy to discover other
    relations between the concepts

31
Transactions (1)
Learning Relations
  • A transaction is a set of concepts that occurred
    together

Example: "Unix development started in the late 1960s at
Bell Labs." → T = {Unix, development, Bell Labs}
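This sketch builds one transaction per sentence. It assumes a hypothetical lexicon `LEXICON` that maps terms in the text to concepts of the taxonomy. A term counts as found when it occurs as a whole word or phrase, ignoring case.

```python
import re

# Hypothetical lexicon: surface term (lower case) -> concept
LEXICON = {"unix": "Unix", "development": "development",
           "bell labs": "Bell Labs", "company": "Company"}


def transaction(sentence):
    """Return the set of concepts mentioned in one sentence."""
    text = sentence.lower()
    return {concept for term, concept in LEXICON.items()
            if re.search(r"\b" + re.escape(term) + r"\b", text)}


print(sorted(transaction("Unix development started in the late 1960s at Bell Labs.")))
# ['Bell Labs', 'Unix', 'development']
```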
32
Transactions (2)
Learning Relations
  • Extend transaction to include all parent
    concepts

(Figure: taxonomy in which Unix is a child of Operating System and Bell Labs is a child of Company)
T = {Unix, development, Bell Labs, Operating System, Company}
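This sketch shows the extension step. It assumes the taxonomy is stored as a hypothetical child-to-parent map `PARENT`. The map is taken to be a tree, so every concept has at most one parent and there are no cycles.

```python
# Hypothetical taxonomy (child -> parent), acyclic
PARENT = {"Unix": "Operating System", "Bell Labs": "Company"}


def ancestors(concept):
    """All parent concepts of `concept`, following the taxonomy upward."""
    result = []
    while concept in PARENT:
        concept = PARENT[concept]
        result.append(concept)
    return result


def extend(transaction):
    """Add every parent concept of every concept in the transaction."""
    extended = set(transaction)
    for concept in transaction:
        extended.update(ancestors(concept))
    return extended


print(sorted(extend({"Unix", "development", "Bell Labs"})))
# ['Bell Labs', 'Company', 'Operating System', 'Unix', 'development']
```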
33
Association Rules (1)
Learning Relations
  • An association rule is an expression of the form
    X → Y
  • For example, Bell Labs → Unix
  • At first we don't know what exactly the relation
    is; we only want to find out whether there is a
    relation

34
Association Rules (2)
Learning Relations
  • We consider all pairs of concepts from our
    transaction as association rules
  • Unix → Bell Labs
  • Bell Labs → Unix
  • Company → Unix
  • Company → Operating System
  • ...
  • Which of these rules are the best?
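Generating these candidates amounts to taking every ordered pair of distinct concepts within each transaction. Here is a minimal sketch; the function name is hypothetical.

```python
from itertools import permutations


def candidate_rules(transactions):
    """Every ordered pair (X, Y) of distinct concepts that occur together
    in some transaction becomes a candidate rule X -> Y."""
    rules = set()
    for t in transactions:
        rules.update(permutations(t, 2))
    return rules


T = {"Unix", "development", "Bell Labs", "Operating System", "Company"}
rules = candidate_rules([T])
print(len(rules))                      # 20
print(("Bell Labs", "Unix") in rules)  # True
```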

35
Support and Confidence
Learning Relations
  • Support of an association rule X → Y
  • Percentage of transactions that contain both X
    and Y
  • Confidence of a rule X → Y
  • Percentage of transactions in which Y appears out
    of those in which X appears
  • You can use confidence and support to compare
    rules and select the best candidates
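Both measures are easy to compute once the extended transactions are available. The sketch below returns fractions rather than percentages, so multiply by 100 to match the wording above. The five transactions are invented for illustration.

```python
def support(rule, transactions):
    """Fraction of all transactions that contain both X and Y."""
    x, y = rule
    return sum(1 for t in transactions if x in t and y in t) / len(transactions)


def confidence(rule, transactions):
    """Fraction of the transactions containing X that also contain Y."""
    x, y = rule
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)


# Hypothetical extended transactions
TRANSACTIONS = [
    {"Unix", "development", "Bell Labs", "Operating System", "Company"},
    {"Unix", "Solaris", "Operating System"},
    {"Bell Labs", "Unix", "Operating System", "Company"},
    {"Bell Labs", "research", "Company"},
    {"Unix", "Operating System"},
]
print(support(("Bell Labs", "Unix"), TRANSACTIONS))     # 0.4
print(confidence(("Bell Labs", "Unix"), TRANSACTIONS))  # 0.6666666666666666
print(confidence(("Unix", "Bell Labs"), TRANSACTIONS))  # 0.5
```

Support is symmetric in X and Y, while confidence is not. This is why Bell Labs → Unix and Unix → Bell Labs can be ranked differently.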

36
Selecting the Results
Learning Relations
  • Select the rules with the best support or confidence
  • E.g., a user-defined threshold for support
  • Present the results to the user
  • The user gives names to useful rules
  • Bell Labs developed Unix
  • Prune rules if you have more general rules with
    better confidence
  • More general: Bell Labs developed Operating
    System
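This sketch of the selection step works under stated assumptions:

  • The support and confidence of each rule are already computed; the `SCORES` values below are invented.
  • The taxonomy is a hypothetical child-to-parent map.
  • A rule is pruned when a more general rule that also passed the support threshold has strictly higher confidence.

Naming the surviving rules is left to the user, as described above.

```python
# Hypothetical taxonomy (child -> parent), acyclic
PARENT = {"Unix": "Operating System", "Bell Labs": "Company"}

# Hypothetical (support, confidence) values per rule (X, Y)
SCORES = {
    ("Bell Labs", "Unix"): (0.10, 0.60),
    ("Bell Labs", "Operating System"): (0.15, 0.80),
    ("Company", "Unix"): (0.12, 0.30),
    ("Unix", "development"): (0.02, 0.40),
}


def self_and_ancestors(concept):
    """The concept itself followed by all of its parents."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def select_rules(scores, min_support):
    """Keep rules whose support reaches min_support, then prune every rule
    for which a more general kept rule has strictly higher confidence."""
    kept = {rule: s for rule, s in scores.items() if s[0] >= min_support}
    result = {}
    for (x, y), (sup, conf) in kept.items():
        more_general = [(gx, gy)
                        for gx in self_and_ancestors(x)
                        for gy in self_and_ancestors(y)
                        if (gx, gy) != (x, y) and (gx, gy) in kept]
        if not any(kept[rule][1] > conf for rule in more_general):
            result[(x, y)] = (sup, conf)
    return result


print(sorted(select_rules(SCORES, min_support=0.05)))
# [('Bell Labs', 'Operating System'), ('Company', 'Unix')]
```

Here Bell Labs → Unix is pruned in favour of the more general Bell Labs → Operating System, and Unix → development falls below the support threshold.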

37
References
  • Maedche, Pekar, Staab: Ontology Learning Part One:
    On Discovering Taxonomic Relations from the Web.
    http://www.ontoprise.de/documents/web-intelligence-book.pdf
  • Maedche, Staab: Discovering Conceptual Relations
    from Text
  • Berendt, Hotho, Stumme: Towards Semantic Web
    Mining
  • Maedche: Development and Application of
    Ontologies (Tutorial)

38
The End
Thank you for your attention! Questions /
Comments?