Ontology Learning - PowerPoint PPT Presentation

Provided by: cs6

Transcript and Presenter's Notes

Title: Ontology Learning


1
Ontology Learning
  • For the Semantic Web

2
The Paper Itself
  • Based around two products: OntoEdit and
    Text-to-Onto.
  • A rather foundational approach to the problems
    surrounding information extraction.
  • Occasionally, some really weak sentence
    structure.
  • Problems with inconsistent example use.
  • A frustrating exercise in presenting questions
    and not the answers.

3
The Web (our semantic battleground)
  • The Web was created as a free-form information
    space.
  • Made for human comprehension, not machine
    understanding.
  • From experience with the web there seems to be
    some inherent aversion to correct speeling or
    gRamar.

4
Machine Semantics
  • Computers, while originally designed to
    understand a series of electrical pulses, have
    had that same vocabulary expanded to be able to
    also evaluate letters, booleans, and numbers.
  • -Brian Goodrich

5
Ontologies
  • Ontologies are metadata schemas.
  • Controlled vocabulary of concepts
  • Machine understandable semantics
  • Define shared domain conceptualizations
  • (e.g. website to website, people to machines,
    etc.)

6
The Assumption
  • If every internet webpage had an associated
    perfect ontology that was just as accessible as
    the selfsame webpage, the creation of the
    semantic web would be only as far away as the
    creation of a browser that can find and interpret
    those ontologies and extract information based
    upon those models.
  • -Brian Goodrich

7
The Knowledge Bottleneck
  • manual acquisition of ontologies still remains
    a tedious cumbersome task resulting in a
    knowledge acquisition bottleneck.
  • -Steffen Staab

8
The Challenge
  • In overcoming this knowledge acquisition
    bottleneck, the authors took a three-fold
    approach:
  • Time (Can you develop an ontology fast?)
  • Difficulty (Is it difficult to build an
    ontology?)
  • Confidence (How do you know that you've got the
    ontology right?)

9
OntoEdit
  • OntoEdit supports the development and maintenance
    of ontologies using graphical means. It supports
    RDF-Schema, DAML-ONT, OIL and F-Logic.
  • Has many of the same features as our Ontology
    Editor.
  • Cardinality Restrictions
  • Keyword Associations
  • Value Phrase Restrictions

10
Assaulting the walls of Jericho
  • Multi-disciplinary approach
  • Machine learning (human assisted)
  • Five Phase approach
  • Import and Re-use existing ontologies
  • Data Extraction uses machine learning to sculpt
    major sections of the target ontology.
  • Target ontology is pruned
  • Refinement(?) (automatically and incrementally
    maintained by evaluating the quality of
    proposals; Hahn and Schnattinger)
  • Validation using prime target application as a
    measure for success of the ontology.

11
Another Wonderful Graph
  • Import/ReUse
  • Extract
  • Prune
  • Refine
  • Validate
  • Legacy data reference to archaic databasing
    techniques
  • Text-to-Onto

12
Components for Learning Ontologies by Staab and Maedche
  • Management Component
  • Resource Processing Component
  • Algorithm Library
  • GUI for Manual Engineering

13
Ontology Primitives
  • a set of strings that describe lexical entries L
    for concepts and relations
  • a set of concepts C
  • a taxonomy of concepts with multiple inheritance
    (heterarchy) HC
  • a set of non-taxonomic relations R described
    by their domain and range restrictions
  • a heterarchy of relations, i.e. a set of
    taxonomic relations HR
  • relations F and G that relate concepts and
    relations with their lexical entries,
    respectively
  • a set of axioms A that describe additional
    constraints on the ontology and allow to make
    implicit facts explicit
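
The primitives listed above can be collected into one structure. The following is a minimal Python sketch, not the representation OntoEdit or Text-to-Onto actually uses; the class name `Ontology`, its field names, the `ancestors` helper, and the tourism-style concepts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Ontology:
    """Hypothetical container for the primitives L, C, HC, R, HR, F, G, A."""
    lexicon: Set[str] = field(default_factory=set)                        # L
    concepts: Set[str] = field(default_factory=set)                       # C
    # HC: concept -> direct superconcepts (a set, so multiple inheritance works)
    concept_parents: Dict[str, Set[str]] = field(default_factory=dict)
    # R: relation name -> (domain concept, range concept)
    relations: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # HR: relation -> direct super-relations
    relation_parents: Dict[str, Set[str]] = field(default_factory=dict)
    # F and G: (lexical entry, concept) and (lexical entry, relation) pairs
    concept_lexicon: Set[Tuple[str, str]] = field(default_factory=set)
    relation_lexicon: Set[Tuple[str, str]] = field(default_factory=set)
    axioms: List[str] = field(default_factory=list)                       # A

    def ancestors(self, concept: str) -> Set[str]:
        """Return every superconcept reachable through HC."""
        seen: Set[str] = set()
        stack = list(self.concept_parents.get(concept, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self.concept_parents.get(parent, ()))
        return seen


# Illustrative data only: a hotel is both an accommodation and a business.
onto = Ontology()
onto.concepts |= {"Thing", "Accommodation", "Business", "Hotel"}
onto.concept_parents = {
    "Accommodation": {"Thing"},
    "Business": {"Thing"},
    "Hotel": {"Accommodation", "Business"},
}
onto.lexicon |= {"hotel", "inn"}
onto.concept_lexicon |= {("hotel", "Hotel"), ("inn", "Hotel")}
onto.relations["locatedIn"] = ("Hotel", "Thing")

hotel_ancestors = onto.ancestors("Hotel")
```

Storing HC as a mapping to sets of parents is what makes it a heterarchy rather than a tree: "Hotel" has two direct superconcepts here, and `ancestors` still visits the shared root only once.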

14
Management Component
  • The OntoEngineer uses it to select the desired
    XML/HTML pages, document type definitions,
    databases, or pre-existing ontologies
  • Selects methods for the Resource Processing
    Component and Algorithms for the Library
    Component
  • Also includes a crawler that can find legacy data
    relevant to creation of the ontology on the web.
    (used for training data)

15
Resource Processing Component
  • HTML documents may be indexed and reduced to free
    text.
  • Semi-structured documents, like dictionaries,
    may be transformed into a predefined relational
    structure.
  • Semi-structured and structured schema data (like
    DTDs, structured database schemata, and existing
    ontologies) are handled following different
    strategies that may (or may not) be discussed
    later.
  • For processing free natural text our system
    accesses the natural language processing system
    SMES (Saarbrucken Message Extraction System), a
    shallow text processor for German. SMES comprises
    a tokenizer based on regular expressions, a
    lexical analysis component including various word
    lexicons, a morphological analysis module, a
    named entity recognizer, a part-of-speech tagger
    and a chunk parser.

16
Algorithm Library Component
  • This is the actual ontology builder and where we
    revisit our previous model
  • Import/ReUse
  • Extraction
  • Pruning
  • Refining
  • And then we almost introduce some of the actual
    algorithms these phases use.

17
Import/Reuse
  • Recovering Conceptualizations
  • First, schema structures are identified and
    imported separately. This may be done manually
    or using reverse engineering tools.
  • Second, merging and aligning.
  • This is a HUGE body of research that is largely
    ignored by this document
  • While the general research issue concerning
    merging and aligning is still an open problem,
    recent proposals (e.g., [8]) have shown how to
    improve the manual process of merging/aligning.

18
Extraction
  • Lexical Entry Concept Extraction
  • Hierarchical Concept Clustering
  • Dictionary Parsing
  • Association Rules

19
Lexical Entry Concept Extraction
  • Uses a statistical technique (N-grams), similar to
    the product from Cui's presentation on the
    BioMedicine data extractor, to group multi-word
    nouns together and associate them with their
    corresponding verbs
  • Every time a new lexical entry is introduced to L,
    the OntoEngineer must decide whether to include
    the entry in an existing concept domain or to
    introduce a new one.
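
The N-gram idea can be illustrated with a few lines of Python. This is only a sketch of the counting step under simple assumptions (whitespace tokenization, a tiny hand-made stop-word list, a frequency threshold); it does not reproduce Text-to-Onto's actual extraction, and it leaves out the association of terms with verbs. The function name `frequent_ngrams` and its parameters are hypothetical.

```python
from collections import Counter

# Illustrative stop-word list; a real system would use a proper lexicon.
STOP_WORDS = {"the", "and", "a", "of"}


def frequent_ngrams(tokens, n=2, min_count=2):
    """Count n-grams and keep those that recur and contain no stop word."""
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(g for g in grams if not STOP_WORDS.intersection(g))
    return {" ".join(g): c for g, c in counts.items() if c >= min_count}


text = "the semantic web uses ontologies and the semantic web reuses ontologies"
candidates = frequent_ngrams(text.split())
```

Each surviving n-gram is only a candidate lexical entry for L. Deciding whether it names a new concept or belongs to an existing one remains the OntoEngineer's job, as the slide says.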

20
Hierarchical Concept Clustering
  • A useful way of creating a taxonomic
    classification of concepts.
  • Done automatically: Text-to-Onto clusters concepts
    by adjacency of terms and syntactical
    relationships.
  • Done by a cooperative machine learning system,
    ASIUM, presented by Faure and Nedellec. Uses the
    verb-to-noun and noun-to-verb association method.
  • Thus, they cooperatively extend the lexicon, the
    set of concepts, and the concept heterarchy. (L,
    C, HC)
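
To make the clustering idea concrete, here is a generic bottom-up (agglomerative) sketch in Python that groups nouns by the verbs they occur with. It is not ASIUM's algorithm; the Jaccard similarity measure, the threshold, the function names, and the sample data are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two verb sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_nouns(verb_contexts, threshold=0.3):
    """Repeatedly merge the two most similar clusters; return merges in order."""
    clusters = [(frozenset([noun]), set(verbs))
                for noun, verbs in verb_contexts.items()]
    merges = []
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]))
        if jaccard(clusters[i][1], clusters[j][1]) < threshold:
            break  # nothing left that is similar enough to merge
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        merges.append(merged[0])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical noun -> verbs observed with that noun.
contexts = {
    "hotel": {"book", "stay"},
    "hostel": {"book", "stay"},
    "car": {"rent", "drive"},
    "bike": {"rent", "ride"},
}
merges = cluster_nouns(contexts)
```

Each merge is a proposal for a new superconcept in HC. In a cooperative setting, a person would accept, rename, or reject each proposed group before it enters the ontology.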

21
Dictionary Parsing
  • This is really only one step further than what we
    are doing with the lexicons in our own Ontology
    Editor. The identified dictionary words are used
    with the concept clustered verb and noun
    associations to infer relationships between
    lexical entries.

22
Association Rules
  • These algorithms are usually used for data
    mining.
  • Works by using the taxonomy heterarchy to
    generalize the lexical entries and thereby draw
    conclusions about their use.
  • "Snacks are purchased together with drinks"
    instead of
    "Lays chips are purchased with Sprite."
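
The effect of generalizing through the taxonomy can be shown with the standard support and confidence measures from association-rule mining. The Python below is a minimal sketch: the basket data, the one-level `taxonomy` mapping, and the function names are invented for illustration and are not taken from the paper.

```python
# Hypothetical one-level taxonomy: item -> superconcept.
taxonomy = {
    "Lays chips": "Snacks",
    "pretzels": "Snacks",
    "Sprite": "Drinks",
    "cola": "Drinks",
}


def generalize(transaction):
    """Replace each item by its superconcept when the taxonomy has one."""
    return {taxonomy.get(item, item) for item in transaction}


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    with_a = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_a if consequent in t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


baskets = [
    {"Lays chips", "Sprite"},
    {"pretzels", "cola"},
    {"Lays chips", "cola"},
    {"pretzels"},
]

specific = rule_stats(baskets, "Lays chips", "Sprite")
general = rule_stats([generalize(b) for b in baskets], "Snacks", "Drinks")
```

On this toy data the specific rule has support 0.25 and confidence 0.5, while the generalized rule has support 0.75 and confidence 0.75, which is why the rule stated at the concept level is the one worth reporting.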

23
Example Output from Text-to-Onto

24
Completeness vs. Scarcity (Pruning)
  • Pruning the Ontology
  • It is a widely held belief that targeting
    completeness for the domain model on the one hand
    appears to be practically unmanageable and
    computationally intractable, and targeting the
    scarcest model on the other hand is overly
    limiting with regard to expressiveness. Hence,
    what we strive for is the balance between these
    two, which is really working.
  • -Staab and Maedche

25
  • Import and ReUse, as well as the different
    Extraction methods we've discussed, all tend to
    introduce unfocused elements into the ontology,
    as more general rules satisfy the conditional
    statements much more often.
  • Pruning is the art of diminishing the ontology to
    more specific rules.
  • First, one must evaluate how removing an item
    from C (the set of concepts) will affect the rest
    of the ontology. (Petersen [9]: no dangling or
    broken links)
  • Second, based on absolute or relative frequency
    counts, determine which ontology items are to
    be either kept or pruned. (Kietz [13])
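
Both pruning steps can be sketched together in Python: drop concepts whose frequency falls below a threshold, and re-attach their subconcepts to the removed concept's own parents so that no link is left dangling. This is an illustration under simple assumptions (an absolute count threshold and an acyclic heterarchy), not the procedure from the cited work; `prune`, its parameters, and the sample data are hypothetical.

```python
def prune(parents, counts, min_count):
    """Remove rare concepts from a heterarchy without leaving broken links.

    parents: concept -> set of direct superconcepts (the heterarchy HC)
    counts:  concept -> how often the concept was observed in the corpus
    """
    pruned = {concept: set(ps) for concept, ps in parents.items()}
    rare = [c for c in parents if counts.get(c, 0) < min_count]
    for concept in rare:
        grandparents = pruned.pop(concept)
        for ps in pruned.values():
            if concept in ps:
                # Re-link the child to whatever the removed concept pointed at.
                ps.discard(concept)
                ps |= grandparents
    return pruned


# Illustrative data: "Lodging" never occurs in the corpus.
parents = {
    "Thing": set(),
    "Accommodation": {"Thing"},
    "Lodging": {"Accommodation"},
    "Hotel": {"Lodging"},
}
counts = {"Thing": 10, "Accommodation": 8, "Lodging": 0, "Hotel": 5}

pruned = prune(parents, counts, min_count=1)
```

After pruning, "Hotel" hangs directly under "Accommodation". A relative threshold (for example, a fraction of the total count) could replace `min_count` without changing the re-linking logic.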

26
Refine
  • Hahn and Schnattinger
  • Incremental approach to updating an ontology,
    centered around the linguistic and conceptual
    quality of various forms of evidence, i.e.,
    conflicting and analogous semantic structures,
    underlying the generation and refinement of
    concept hypotheses.

27
Conclusions
  • Ontology learning adds significant leverage to
    the Semantic Web.
  • Propels propagation of ontologies
  • Multi-disciplinary approach to the problem

28
Further Challenges in Learning
  • XML namespace mechanisms will turn the web into
    an amoeba-like structure, with ontologies
    supporting and referring to each other (ReUse and
    Import). It is not yet clear what the semantic
    result of this evolution will be.
  • This examination has been restricted almost
    entirely to RDF-Schema. Additional layers of RDF
    (future OIL or DAML-ONT) will require new means
    for improved Ontology engineering.

29
Questions?
Dabu?