Title: Hidden Markov Models Applied to Information Extraction
1. Hidden Markov Models Applied to Information Extraction
- Part I: Concept
  - HMM Tutorial
- Part II: Sample Application
  - AutoBib web information extraction
Larry Reeve, INFO629 Artificial Intelligence
Dr. Weber, Fall 2004
2. Part I Concept: HMM Motivation
- The real world has structures and processes which have (or produce) observable outputs
  - Usually sequential (the process unfolds over time)
  - The events producing the output cannot be seen
  - Example: speech signals
- Problem: how to construct a model of the structure or process given only observations
3. HMM Background
- Basic theory developed and published in the 1960s and 70s
- No widespread understanding and application until the late 80s
- Why?
  - Theory was published in mathematics journals not widely read by practicing engineers
  - Insufficient tutorial material existed for readers to understand and apply the concepts
4. HMM Uses
- Speech recognition
  - Recognizing spoken words and phrases
- Text processing
  - Parsing raw records into structured records
- Bioinformatics
  - Protein sequence prediction
- Finance
  - Stock market forecasts (price pattern prediction)
- Comparison shopping services
5. HMM Overview
- Machine learning method
- Makes use of state machines
- Based on probabilistic models
- Useful in problems having sequential steps
- Can only observe the output from states, not the states themselves
- Example: speech recognition
  - Observed: acoustic signals
  - Hidden states: phonemes (the distinctive sounds of a language)
[State machine diagram]
6. Observable Markov Model Example
- Weather
  - Once each day the weather is observed
  - State 1: rainy
  - State 2: cloudy
  - State 3: sunny
- Each state corresponds to a physically observable event
- What is the probability that the weather for the next 7 days will be sun, sun, rain, rain, sun, cloudy, sun? (computed in the sketch below)

State transition matrix (row = today, column = tomorrow):

         Rainy  Cloudy  Sunny
Rainy     0.4    0.3     0.3
Cloudy    0.2    0.6     0.2
Sunny     0.1    0.1     0.8
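A minimal sketch in Python of that computation: the probability of an observed state sequence is the product of the corresponding transition probabilities. The starting state (today is sunny) is an assumption for illustration; the slide does not state it.

A = {
    "rainy":  {"rainy": 0.4, "cloudy": 0.3, "sunny": 0.3},
    "cloudy": {"rainy": 0.2, "cloudy": 0.6, "sunny": 0.2},
    "sunny":  {"rainy": 0.1, "cloudy": 0.1, "sunny": 0.8},
}

def sequence_probability(states, start):
    # P(sequence) = product of one-step transition probabilities
    prob, prev = 1.0, start
    for s in states:
        prob *= A[prev][s]
        prev = s
    return prob

# sun, sun, rain, rain, sun, cloudy, sun, given today is sunny
seq = ["sunny", "sunny", "rainy", "rainy", "sunny", "cloudy", "sunny"]
print(sequence_probability(seq, start="sunny"))  # ~1.54e-04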
7. Observable Markov Model
8. Hidden Markov Model Example
- Coin toss: a heads/tails sequence produced with 2 coins
- You are in a room with a wall
- A person behind the wall flips a coin and tells you the result
- Coin selection and tossing are hidden
- You cannot observe the events, only their output (heads, tails)
- The problem is then to build a model that explains the observed sequence of heads and tails (simulated in the sketch below)
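A minimal generative sketch of this setup in Python: two biased coins are the hidden states, and only the heads/tails outputs are visible. The coin biases and the switching probability are made-up values for illustration.

import random

coins = {"coin1": 0.5, "coin2": 0.8}   # assumed P(heads) per hidden coin
switch = 0.3                           # assumed P(switching coins)

def generate(n, state="coin1"):
    observed = []
    for _ in range(n):
        observed.append("H" if random.random() < coins[state] else "T")
        if random.random() < switch:   # hidden coin selection
            state = "coin2" if state == "coin1" else "coin1"
    return observed                    # only this sequence is visible

print("".join(generate(20)))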
9. HMM Components
- A set of states (x's)
- A set of possible output symbols (y's)
- A state transition matrix (a's)
  - probability of making a transition from one state to the next
- An output emission matrix (b's)
  - probability of emitting/observing a symbol at a particular state
- An initial probability vector
  - probability of starting at a particular state
  - Not always shown; sometimes assumed to be 1 for a designated start state
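These components map directly onto simple data structures. A minimal sketch with illustrative values (the numbers are not from the slides):

states  = ["x1", "x2"]        # hidden states (x's)
symbols = ["y1", "y2", "y3"]  # output symbols (y's)

A  = [[0.7, 0.3],             # transition matrix (a's):
      [0.4, 0.6]]             #   A[i][j] = P(next state j | state i)

B  = [[0.5, 0.4, 0.1],        # emission matrix (b's):
      [0.1, 0.3, 0.6]]        #   B[i][k] = P(symbol k | state i)

pi = [1.0, 0.0]               # initial vector: always start in x1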
10. HMM Components
11. Common HMM Types
- Ergodic (fully connected)
  - Every state of the model can be reached in a single step from every other state (see the sketch below)
- Bakis (left-right)
  - As time increases, states proceed from left to right
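The two types differ only in which transition-matrix entries may be non-zero. An illustrative pair (the values are made up):

ergodic = [[0.3, 0.3, 0.4],   # no zero entries: every state reachable
           [0.2, 0.5, 0.3],   # in one step from every other state
           [0.4, 0.2, 0.4]]

bakis   = [[0.5, 0.3, 0.2],   # upper-triangular: stay or move right,
           [0.0, 0.6, 0.4],   # never back to an earlier state
           [0.0, 0.0, 1.0]]   # final state loops on itself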
12. HMM Core Problems
- Three problems must be solved for HMMs to be useful in real-world applications:
  - 1) Evaluation
  - 2) Decoding
  - 3) Learning
13. HMM Evaluation Problem
- Purpose: score how well a given model matches a given observation sequence
- Example (speech recognition):
  - Assume HMMs (models) have been built for the words home and work
  - Given a speech signal, evaluation can determine the probability that each model represents the utterance (see the sketch below)
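Evaluation is classically solved with the forward algorithm, which sums over all hidden-state paths in O(N^2 T) time. A minimal sketch reusing the A/B/pi layout from the components sketch; observations are symbol indices, and a speech system would run this once per word model and pick the higher score.

def forward(obs, A, B, pi):
    # alpha[i] = P(observations so far, current state = i)
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
    return sum(alpha)            # P(observation sequence | model)

A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [1.0, 0.0]
print(forward([0, 2, 1], A, B, pi))  # ~0.0436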
14. HMM Decoding Problem
- Given a model and a set of observations, what hidden states are most likely to have generated the observations? (see the sketch below)
- Useful for learning about internal model structure, determining state statistics, and so forth
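Decoding is classically solved with the Viterbi algorithm: the same recursion as the forward algorithm, with the sum replaced by a max and back-pointers kept to recover the best path. A minimal sketch:

def viterbi(obs, A, B, pi):
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]   # best path scores
    back = []                                          # back-pointers
    for t in range(1, len(obs)):
        prev = [max(range(n), key=lambda i: delta[i] * A[i][j])
                for j in range(n)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][obs[t]]
                 for j in range(n)]
        back.append(prev)
    best = max(range(n), key=lambda i: delta[i])
    path = [best]
    for prev in reversed(back):                        # walk pointers back
        path.append(prev[path[-1]])
    return list(reversed(path))                        # most likely states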
15. HMM Learning Problem
- Goal is to learn the HMM parameters (training):
  - State transition probabilities
  - Observation probabilities at each state
- Training is crucial
  - it allows optimal adaptation of model parameters to observed training data from real-world phenomena
- No known method obtains optimal parameters from data; only approximations exist (see below)
- Can be a bottleneck in HMM usage
16. HMM Concept Summary
- Build models representing the hidden states of a process or structure using only observations
- Use the models to evaluate the probability that a model represents a particular observation sequence
- Use the evaluation information in an application to recognize speech, parse addresses, and perform many other tasks
17. Part II Application: AutoBib System
- Provides a uniform view of several computer science bibliographic web data sources
- An automated web information extraction system that requires little human input
  - Web pages are designed differently from site to site
  - IE requires training samples
- HMMs are used to parse unstructured bibliographic records into a structured format (an NLP task)
18. Web Information Extraction: Converting Raw Records
19. Approach
- 1) Provide a seed database of structured records
- 2) Extract raw records from relevant Web pages
- 3) Match structured records to raw records
  - to build training samples
- 4) Train the HMM-based parser
- 5) Parse unmatched raw records into structured records
- 6) Merge the new structured records into the database
20. AutoBib Architecture
21. Step 1 - Seeding
- Provide a seed database of structured records
  - Take a small collection of BibTeX-format records and insert them into the database
- A cleaning step normalizes record fields (sketched below)
  - Examples:
    - Proc. → Proceedings
    - Jan → January
- Manual step, executed once only
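A minimal sketch of such a normalizer; only the two substitutions above come from the slide, the rest of the design is an assumption:

ABBREVIATIONS = {"Proc.": "Proceedings", "Jan": "January"}

def normalize(field):
    # replace known abbreviations, leave all other tokens unchanged
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in field.split())

print(normalize("Proc. of the 8th Symposium, Jan 2004"))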
22. Step 2 - Extract Raw Records
- Extract raw records from relevant Web pages
- The user specifies:
  - Web pages to extract from
  - How to follow "next page" links across multiple pages
- Raw records are extracted using record-boundary discovery techniques
  - Subtree of Interest: the largest subtree of HTML tags
  - Record separators: frequent HTML tags
23. Tokenized Records
- Replace all HTML tags with placeholder tokens (sketched below)
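A minimal sketch of the idea: substitute a single placeholder for every HTML tag so tag positions can still act as field delimiters. The <tag> token name is an assumption; the slide does not show the exact replacement.

import re

def tokenize(raw_record):
    # replace each HTML tag with a placeholder token, then split
    return re.sub(r"<[^>]+>", " <tag> ", raw_record).split()

print(tokenize("<li><b>J. Geng</b>, Automatic Extraction, 2004</li>"))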
24. Step 3 - Matching
- Match raw records R to structured records S
- Apply 4 heuristic tests (sketched below):
  - At least one author in R must match an author in S
  - S.year must appear in R
  - If S.pages exists, R must contain it
  - S.title must be approximately contained in R
    - Levenshtein edit distance for approximate string matching
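A minimal sketch of the four tests. The Levenshtein routine is the standard dynamic program; the approximate-containment threshold (10% of the title length) is an assumption, not stated on the slide.

def levenshtein(a, b):
    # classic edit-distance DP, one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def matches(r_text, s):
    # s: structured record with authors, year, pages, title fields
    if not any(a in r_text for a in s["authors"]):
        return False                      # test 1: some author appears
    if s["year"] not in r_text:
        return False                      # test 2: year appears
    if s.get("pages") and s["pages"] not in r_text:
        return False                      # test 3: pages appear if known
    t = s["title"]                        # test 4: approximate containment
    best = min(levenshtein(t, r_text[i:i + len(t)])
               for i in range(max(1, len(r_text) - len(t) + 1)))
    return best <= len(t) // 10           # assumed tolerance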
25. Step 4 - Parser Training
- Train the HMM-based parser
- For each matching pair of R and S, annotate the tokens in the raw record with field names
- Annotated raw records are fed into the HMM parser in order to learn:
  - State transition probabilities
  - Symbol probabilities at each state
26. Parser Training, continued
- A key consideration is the HMM structure for navigating record fields (fields, delimiters)
- Special states: start, end
- Normal states: author, title, year, etc.
- Best structure found: multiple delimiter and tag states, one of each per normal state (enumerated in the sketch below)
  - Example: author-delimiter, author-tag
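A minimal sketch of how that state inventory could be enumerated; the field list here is illustrative, not the full AutoBib set:

FIELDS = ["author", "title", "year"]

states = ["start", "end"]
for f in FIELDS:
    # each normal state gets companion delimiter and tag states
    states += [f, f + "-delimiter", f + "-tag"]

print(states)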
27. Sample HMM (Method 3)
Source: http://www.cs.duke.edu/geng/autobib/web/hmm.jpg
28. Step 5 - Conversion
- Parse unmatched raw records into structured records using the HMM parser
- Matched raw records can be converted directly, without parsing, because they were annotated in the matching step
29. Step 6 - Merging
- Merge the new structured records into the database
- The initial seed database has now grown
- The new records will be used for improved matching on the next run
30. Evaluation
- Success rate = (# of tokens labeled by the HMM) / (# of tokens labeled by a person)
- DBLP (Computer Science Bibliography): 98.9%
- CSWD (CompuScience WWW-Database): 93.4%
31. HMM Advantages / Disadvantages
- Advantages
  - Effective
  - Can handle variations in record structure
    - Optional fields
    - Varying field ordering
- Disadvantages
  - Requires training on annotated data
    - Not completely automatic
    - May require manual markup
  - Size of the training data may be an issue
32. Other Methods
- Wrappers
  - Specification of the areas of interest on a Web page
  - Hand-crafted
  - Wrapper induction
    - Requires manual training
  - Not always accommodating to changing page structure
  - Syntax-based: no semantic labeling
33. Application to Other Domains
- E-commerce: comparison shopping sites
  - Extract product/pricing information from many sites
  - Convert the information into a structured format and store it
  - Provide an interface to look up product information and then display pricing information gathered from many sites
- Saves users time
  - Rather than navigating to and searching many sites, users can consult a single site
34. References
- Concept
  - Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257-285.
- Application
  - Geng, J. and Yang, J. (2004). Automatic Extraction of Bibliographic Information on the Web. Proceedings of the 8th International Database Engineering and Applications Symposium (IDEAS '04), 193-204.