Information Extraction and Ontology Learning Guided by Web Directory - PowerPoint PPT Presentation

1
Information Extraction and Ontology Learning
Guided by Web Directory
  • Authors: Martin Kavalec, Vojtěch Svátek
  • Presenter: Mark Vickers

2
Outline
  • Introduction
  • Mining Indicator Terms
  • Integrating Rainbow
  • Ontological Analysis of Web Directories
  • IE and Ontology Learning
  • Future Work
  • Related Work
  • Assessment

3
Introduction
  • Goal: to extract information about (mostly
    generic) products, services and areas of
    competence of companies from the free-text chunks
    embedded in web presentations
  • Taking advantage of
    • Collections of extraction patterns
    • Ontologies of problem domains
  • Approach: combine information extraction with
    ontologies
    • Ontologies can improve the quality of IE
    • Extracted information can improve/extend
      ontologies
    • Bootstrapping

4
Introduction
  • Uses the Open Directory (http://dmoz.org)
    • to obtain labeled training data
    • as lightweight ontologies

The Open Directory Project is the largest, most
comprehensive human-edited directory of the Web.
5
(No Transcript)
6
Mining Indicator Terms
  • Informative terms: generic names of products
  • Indicator terms: terms situated near informative
    terms
    • Examples: "our assortment includes", "in our
      shop you can buy"
  • Assumption: directory headings coincide with
    informative terms
  • Purpose: generate extraction patterns based on
    indicator terms
  • They use deeper linguistic techniques (syntactic
    analysis)
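As a rough illustration of how indicator terms can become extraction patterns, the sketch below turns the two example phrases from this slide into regular expressions that capture the text following them. Treating everything up to the next punctuation mark as the informative term is a simplifying assumption of this sketch; the authors rely on syntactic analysis, not regular expressions, and the function names are hypothetical.

```python
import re

# Indicator phrases taken from the slide's examples; the list is illustrative only.
INDICATORS = ["our assortment includes", "in our shop you can buy"]

def build_patterns(indicators):
    """Turn each indicator phrase into a regex capturing the text after it.

    Assumption: the informative term is whatever follows the indicator up to
    the next punctuation mark (a crude stand-in for parser-based analysis).
    """
    return [
        re.compile(re.escape(phrase) + r"\s+([^.;:!?]+)", re.IGNORECASE)
        for phrase in indicators
    ]

def extract_candidates(text, patterns):
    """Return candidate informative terms found after any indicator phrase."""
    candidates = []
    for pattern in patterns:
        for match in pattern.finditer(text):
            candidates.append(match.group(1).strip())
    return candidates

text = ("Welcome! Our assortment includes steel pipes and fittings. "
        "In our shop you can buy welding supplies.")
print(extract_candidates(text, build_patterns(INDICATORS)))
# ['steel pipes and fittings', 'welding supplies']
```

Regular expressions keep the sketch short but cannot tell a product name from any other phrase that happens to follow an indicator, which is one reason the presentation turns to a parser.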

7
Mining Indicator Terms
  • Example: /Manufacturing/Materials/Metals/Steel/
    (the heading terms serve as informative terms)
  • Match headings with text pages to find sentences
    containing informative terms
  • Grab nearby words as indicator terms
  • Generate extraction patterns from indicator terms
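The first two steps above (find sentences containing heading terms, then grab nearby words) can be sketched in Python, assuming a fixed token window around the heading term. The window size, the sentence splitter and the function names are assumptions of this sketch; as the next slide explains, the authors actually pick words by distance in a parse tree.

```python
import re
from collections import Counter

def informative_terms(heading_path):
    """Split a heading path like '/Manufacturing/Materials/Metals/Steel/' into
    lower-cased heading terms (underscores become spaces)."""
    return [part.replace("_", " ").lower()
            for part in heading_path.strip("/").split("/") if part]

def nearby_words(text, terms, window=3):
    """Count words within `window` tokens of any informative term.

    Assumptions: sentences are split on '.', '!' and '?'; tokens are runs of
    letters; only single-word terms can match a token.
    """
    term_set = set(terms)
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in term_set:
                lo, hi = max(0, i - window), i + window + 1
                # Record surrounding words, skipping the heading terms themselves.
                counts.update(t for t in tokens[lo:hi] if t not in term_set)
    return counts

terms = informative_terms("/Manufacturing/Materials/Metals/Steel/")
text = ("We supply steel for construction. Our company was founded in 1990. "
        "We produce steel tubes.")
print(nearby_words(text, terms).most_common(3))
```

On this toy input the most frequent neighbour is the function word "we", which shows why raw counts alone are not enough and why the next slide ranks candidates by relative frequency.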

8
Mining Indicator Terms
  • Choosing indicator terms
    • Syntactic analysis: Link Grammar Parser
    • Chose the verbs occurring closest to the
      informative word in the parse tree
    • Arranged the verbs into a frequency table
    • Ordered them by the ratio of their frequency near
      informative terms to their overall frequency
    • Chose the 8 most promising verbs
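The ranking step can be written as a score per verb, score(v) = (occurrences of v nearest an informative term) / (occurrences of v in the whole corpus), keeping the top k. The minimal sketch below assumes the two frequency tables already exist; the `min_count` cut-off and the sample counts are hypothetical additions, not figures from the presentation.

```python
from collections import Counter

def rank_indicators(near_counts, corpus_counts, k=8, min_count=2):
    """Rank candidate verbs by how concentrated they are near informative terms.

    near_counts:   verb -> occurrences closest to an informative term
    corpus_counts: verb -> occurrences anywhere in the corpus
    k:             number of verbs to keep (the presentation keeps 8)
    min_count:     hypothetical cut-off that drops very rare verbs, whose
                   ratio would otherwise be unreliable
    """
    scores = {}
    for verb, near in near_counts.items():
        total = corpus_counts.get(verb, 0)
        if total >= min_count and near >= min_count:
            scores[verb] = near / total
    # Highest ratio first; ties broken alphabetically for reproducibility.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

near = Counter({"offer": 30, "produce": 24, "be": 40, "have": 10, "supply": 1})
corpus = Counter({"offer": 40, "produce": 30, "be": 2000, "have": 500, "supply": 3})
print(rank_indicators(near, corpus, k=3))
# [('produce', 0.8), ('offer', 0.75), ('be', 0.02)]
```

Dividing by overall frequency is what pushes common verbs such as "be" and "have" down the list even though they appear near informative terms more often in absolute terms.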

9
Mining Indicator Terms
  • Preliminary testing
    • Sampled 14,500 sentences containing heading terms
    • Randomly chose 130 sentences containing
      indicator terms
    • Labeled them manually to determine whether an
      informative term was present
    • Example: "We are equipped to run any grade of
      corrugated from E-flute to Triplewall, including
      all government grades."

10
Mining Indicator Terms
  • Preliminary Test Results

Coverage
  • Non-filtered: 10-20
  • Pre-filtered: 70-80
11
Integration into Rainbow
  • RAINBOW (Reusable Architecture for INtelligent
    Brokering Of Web information access)
  • Web analysis tasks
    • Sentence extraction
    • Explicit metadata
    • HTML structure
    • Inline images
    • Link topology structure
    • Page similarity
  • Internal communication based on SOAP
  • Will use ontologies to verify the semantic
    consistency of the web services provided within
    the distributed system

12
Integration into Rainbow
  • Rainbow will help solve the coverage problem
    caused by directory links pointing to barren pages
  • The metadata extractor will be navigated towards
    promising pages using analysis of
    • keywords and HTML structure on start-up pages
    • URLs of embedded links
  • For example, it can look for "about us" or
    "profile" pages to find more syntactically
    correct text
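The URL part of this navigation can be sketched with the standard library, assuming a simple keyword heuristic over link paths. The hint words and the function name are hypothetical; only the "about us" and "profile" examples come from the slide, and the keyword and HTML-structure analysis of start-up pages is not covered here.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical URL hints based on the slide's "about us" / "profile" examples.
HINTS = ("about", "profile")

def promising_links(base_url, hrefs):
    """Return absolute same-site links whose path contains one of HINTS,
    ordered by the number of hints matched, then alphabetically."""
    site = urlparse(base_url).netloc
    scored = []
    for href in hrefs:
        url = urljoin(base_url, href)
        parsed = urlparse(url)
        if parsed.netloc != site:
            continue  # stay on the company's own site
        path = parsed.path.lower()
        score = sum(hint in path for hint in HINTS)
        if score:
            scored.append((score, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]

links = ["/about-us.html", "/news/2002.html", "profile/company.html",
         "http://other.example.org/about"]
print(promising_links("http://www.example.com/index.html", links))
```

Restricting the search to the same host keeps the extractor on the company's own pages, where the descriptive free text is most likely to be found.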

13
Ontological Analysis of Web Directories
Example heading path:
/Industries/Construction_and_Maintenance/Materials_and_supplies/Masonry_and_Stone/Natural_Stone/International_Sources/Mexico/
  • Terms and phrases in a single heading belong to
    a small set of classes
  • Parent-child relations between headings belong
    to particular classes corresponding to deep
    ontological relations
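To make the idea concrete, the sketch below splits the example path into parent-child pairs and applies a toy interpretation rule based on the classes of the two headings. The class names, the lexicon and the rule itself are invented for illustration; the presentation does not list its classes or interpretation rules.

```python
# Hypothetical class lexicon: the presentation says heading terms fall into a
# small set of classes but does not list them, so these labels are invented.
HEADING_CLASS = {
    "Materials_and_supplies": "Product",
    "Masonry_and_Stone": "Product",
    "Natural_Stone": "Product",
    "International_Sources": "Region",
    "Mexico": "Region",
}

def heading_pairs(path):
    """Return the (parent, child) pairs along a directory heading path."""
    parts = [p for p in path.strip("/").split("/") if p]
    return list(zip(parts, parts[1:]))

def interpret(parent, child):
    """Toy interpretation rule for one parent-child pair:
    same class -> 'subclass-of'; different classes -> a relation named after
    both classes; an unclassified heading -> 'unknown'."""
    pc, cc = HEADING_CLASS.get(parent), HEADING_CLASS.get(child)
    if pc is None or cc is None:
        return "unknown"
    if pc == cc:
        return "subclass-of"
    return f"related({pc}, {cc})"

path = ("/Industries/Construction_and_Maintenance/Materials_and_supplies/"
        "Masonry_and_Stone/Natural_Stone/International_Sources/Mexico/")
for parent, child in heading_pairs(path):
    print(f"{parent} -> {child}: {interpret(parent, child)}")
```

Even this crude rule separates the taxonomic part of the path (stone products) from the step where the heading switches to geography, which is the kind of distinction the meta-ontology is meant to capture.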

14
Ontological Analysis of Web Directories
  • Meta-ontology of directory headings (diagram):
    classes, class-subclass relations, named
    relations, reflexive binary relations

15
Ontological Analysis of Web Directories
  • Interpretation Rules

16
IE and Ontology Learning
  • Extraction with plain indicator terms and simple
    heuristics already works
  • But, even better:
    • Learn indicators for each class
    • Use ontological analysis to classify the
      indicators found
    • Fill in database templates (true IE)
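A minimal sketch of template filling with class-specific indicators follows. The class names, indicator phrases, company name and template layout are hypothetical; in the approach described, the per-class indicators would be learned rather than written by hand, and extracted values would be checked against the ontology, which this sketch does not do.

```python
import re

# Hypothetical class-specific indicator phrases, one list per ontology class.
CLASS_INDICATORS = {
    "product": ["we manufacture", "we produce"],
    "service": ["we provide", "we offer"],
}

def fill_template(company, text):
    """Fill a simple per-company template.

    The fragment following a class-specific indicator phrase (up to the next
    punctuation mark) goes into that class's slot.
    """
    template = {"company": company}
    for cls, phrases in CLASS_INDICATORS.items():
        values = []
        for phrase in phrases:
            pattern = re.escape(phrase) + r"\s+([^.;:!?]+)"
            values += [m.strip() for m in re.findall(pattern, text, re.IGNORECASE)]
        template[cls] = values
    return template

page = "We manufacture steel tubes. We offer custom cutting and delivery."
print(fill_template("ACME Steel", page))
```

Keeping one slot per ontology class is what turns plain extraction into the "true IE" of this slide: each value arrives already typed as a product or a service.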

17
IE and Ontology Learning
Closed-loop strategy (diagram):
  • Learn class-specific indicators
  • Classify headings
  • Human classifies directory headings (WordNet)
18
Future Work
  • Complete the information extraction / ontology
    learning loop
  • With respect to the Semantic Web, they want to
    adapt the technique to the usual standards for
    explicit metadata
  • Example: the extracted information can be
    converted into RDF triples, with the indicator
    collections made accessible over the web
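As an illustration of the RDF idea, the sketch below serializes an extracted template as N-Triples using plain string formatting rather than an RDF library. The vocabulary namespace, the company URI and the slot names are hypothetical; the presentation does not specify a vocabulary.

```python
# Hypothetical vocabulary namespace; the presentation does not name one.
NS = "http://example.org/company#"

def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))

def to_ntriples(company_uri, template):
    """Serialize extracted slot values as N-Triples lines.

    Each slot name becomes a predicate in NS; each value a plain literal.
    """
    lines = []
    for slot, values in template.items():
        for value in values:
            lines.append(f'<{company_uri}> <{NS}{slot}> "{escape_literal(value)}" .')
    return lines

extracted = {"product": ["steel tubes"], "service": ["custom cutting"]}
for line in to_ntriples("http://www.example.com/", extracted):
    print(line)
```

N-Triples is used here only because it is the simplest RDF serialization to emit by hand; any RDF syntax would carry the same triples.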

19
Related Work
  • Combining IE and ontologies (without use of web
    directories)
    • Bootstrapping an Ontology-Based Information
      Extraction System
  • Advantages of using the Link Grammar Parser
    • Learning to Generate Semantic Annotation for
      Domain Specific Sentences
  • Using Yahoo to classify whole documents
    • Turning Yahoo into an Automatic Web-Page
      Classifier
  • Similar work aimed at more structured information,
    using search engines
    • Extracting Patterns and Relations from the World
      Wide Web
  • Bootstrapping and other statistical methods for
    IE
    • Text Classification by Bootstrapping with
      Keywords
    • Learning Dictionaries for Information Extraction
      by Multi-Level Bootstrapping

20
Assessment
  • I don't think indicator-term learning is finished
    (even though they say it is)
  • Counts on ontology learning techniques that have
    not yet been decided
  • Need to develop an official directory