1
Incrementally Learning Parameters of Stochastic
CFG using Summary Stats
  • Written by Brent Heeringa
  • Tim Oates

2
Goals
  • To learn the syntax of utterances
  • Approach
  • SCFG (Stochastic Context-Free Grammar)
  • M = ⟨V, E, R, S⟩ (see the sketch after this list)
  • V: finite set of non-terminals
  • E: finite set of terminals
  • R: finite set of rules; each rule r has a
    probability p(r)
  • The p(r) of rules sharing the same left-hand
    side sum to 1
  • S: start symbol
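To make the definition concrete, here is a minimal Python sketch of an SCFG container following M = ⟨V, E, R, S⟩; the class and method names are illustrative assumptions, not from the paper.

```python
from collections import defaultdict

class SCFG:
    """Minimal SCFG container mirroring M = <V, E, R, S> (names are illustrative)."""

    def __init__(self, nonterminals, terminals, start):
        self.V = set(nonterminals)   # finite set of non-terminals
        self.E = set(terminals)      # finite set of terminals
        self.S = start               # start symbol
        self.R = {}                  # (lhs, rhs) -> probability p(r)

    def add_rule(self, lhs, rhs, prob):
        self.R[(lhs, tuple(rhs))] = prob

    def check_normalized(self, tol=1e-9):
        # p(r) of rules sharing a left-hand side must sum to 1
        totals = defaultdict(float)
        for (lhs, _), p in self.R.items():
            totals[lhs] += p
        return all(abs(t - 1.0) < tol for t in totals.values())

g = SCFG({"S", "NP", "VP"}, {"the", "dog", "barks"}, "S")
g.add_rule("S", ["NP", "VP"], 1.0)
g.add_rule("NP", ["the", "dog"], 1.0)
g.add_rule("VP", ["barks"], 1.0)
assert g.check_normalized()
```

Storing rules as (lhs, rhs) → p(r) keeps the per-left-hand-side normalization check a one-pass scan.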

3
Problems with most SCFG Learning Algorithms
  • 1) Expensive storage: they need to store a corpus
    of complete sentences
  • 2) Time-consuming: the algorithms need to repeat
    passes through all of the data

4
Learning SCFG
  • Inducing context-free structure from a corpus
    (sentences)
  • Learning the production (rule) probabilities

5
Learning SCFG (cont.)
  • General method: the Inside/Outside algorithm
  • Expectation-Maximization (EM)
  • Find the expected counts of the rules
  • Maximize the likelihood given both the expected
    counts and the corpus
  • Disadvantage of the Inside/Outside algorithm
    (see the inside-pass sketch below):
  • The entire sentence corpus must be stored in some
    representation (e.g., a chart parse)
  • Expensive storage (unrealistic for a human agent!)
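For background, here is a compact sketch of the inside pass that gives Inside/Outside its O(n³) cost, assuming a grammar in Chomsky normal form; the dictionary encodings of the rules are assumptions of this sketch, and the outside pass and EM re-estimation are omitted.

```python
from collections import defaultdict

def inside_probability(words, unary, binary, start="S"):
    """Inside pass for a CNF SCFG (a sketch, not the paper's code).

    unary:  (A, terminal) -> p(A -> terminal)
    binary: (A, B, C)     -> p(A -> B C)
    Returns P(sentence | grammar); O(n^3) in sentence length.
    """
    n = len(words)
    beta = defaultdict(float)  # beta[(i, j, A)] = inside prob of A over words[i..j]
    for i, w in enumerate(words):
        for (A, term), p in unary.items():
            if term == w:
                beta[(i, i, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two children
                for (A, B, C), p in binary.items():
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k + 1, j, C)]
    return beta[(0, n - 1, start)]

unary = {("N", "dog"): 1.0, ("V", "barks"): 1.0}
binary = {("S", "N", "V"): 1.0}
print(inside_probability(["dog", "barks"], unary, binary))  # 1.0
```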

6
Proposed Algorithm
  • Use Unique Normal Form (UNF)
  • Replace each terminal rule A → z with 2 new rules
    (sketched below):
  • A → D, with p(A → D) = p(A → z)
  • D → z, with p(D → z) = 1
  • No two productions have the same right-hand side
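A sketch of the UNF rewrite just described; identifying terminals by lowercase strings and the naming scheme for the fresh non-terminal D are assumptions of this sketch, not the paper's conventions.

```python
def to_unique_normal_form(rules):
    """Rewrite each terminal rule A -> z into A -> D and D -> z (UNF).

    rules: dict mapping (lhs, rhs_tuple) -> probability p(r).
    Assumptions of this sketch: terminals are lowercase strings,
    and the fresh non-terminal is named D_<lhs>_<z>.
    """
    new_rules = {}
    for (lhs, rhs), p in rules.items():
        if len(rhs) == 1 and rhs[0].islower():   # terminal rule A -> z
            d = f"D_{lhs}_{rhs[0]}"              # fresh non-terminal D
            new_rules[(lhs, (d,))] = p           # A -> D keeps p(A -> z)
            new_rules[(d, (rhs[0],))] = 1.0      # D -> z with probability 1
        else:
            new_rules[(lhs, rhs)] = p
    return new_rules
```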

7
Learning SCFG: Proposed Algorithm (cont.)
  • Use histograms
  • Each rule r has 2 histograms: H_O(r) and H_L(r)

8
Proposed Algorithm (cont.)
  • H_O(r): constructed when parsing the sentences in O
  • H_L(r): continually updated throughout the
    learning process
  • H_L(r) is rescaled to a fixed size h (one possible
    scheme is sketched below)
  • Why?
  • Recently used rules have more impact on the
    histogram
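One plausible reading of the rescaling step, sketched below: once H_L(r)'s total mass exceeds h, scale all bins down, so counts added afterwards carry relatively more weight. The exact scheme is an assumption of this sketch, not the paper's.

```python
def rescale_histogram(hist, h):
    """Shrink a rule-usage histogram so its total mass is at most h.

    hist: dict mapping bins to counts; h: fixed target size.
    New counts added after a rescale enter at full weight, so
    recently used rules dominate the histogram.
    """
    total = sum(hist.values())
    if total <= h:
        return dict(hist)
    scale = h / total
    return {k: v * scale for k, v in hist.items()}
```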

9
Comparing H_L(r) and H_O(r)
  • Compute the relative entropy T between the two
    histograms (sketched below)
  • T decreases: increase the probability of the
    rules used
  • (if s is large, increase the probability of the
    rules used when parsing the last sentence)
  • T increases: decrease the probability of the
    rules used
  • (e.g., p_t+1(r) = 0.01 < p_t(r))
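A sketch of the comparison step: relative entropy here is the standard KL divergence T = D(P‖Q) = Σ_x P(x) log(P(x)/Q(x)). The direction of the divergence, the multiplicative nudge, and the step size are illustrative assumptions, not taken from the slides.

```python
import math

def relative_entropy(p_hist, q_hist, eps=1e-12):
    """KL divergence D(P || Q) between two histograms after normalizing.

    Which histogram plays P (H_L(r) here) is an assumption of this sketch.
    """
    keys = set(p_hist) | set(q_hist)
    p_tot = sum(p_hist.values()) or 1.0
    q_tot = sum(q_hist.values()) or 1.0
    t = 0.0
    for k in keys:
        p = p_hist.get(k, 0.0) / p_tot
        q = q_hist.get(k, 0.0) / q_tot + eps   # smooth zero bins in Q
        if p > 0:
            t += p * math.log(p / q)
    return t

def update_rule_probs(probs, used_rules, t_old, t_new, step=0.01):
    """Nudge probabilities of rules used in the last parse, up if T fell,
    down if T rose. The multiplicative update and step size are
    illustrative choices, not the paper's."""
    factor = 1.0 + step if t_new < t_old else 1.0 - step
    return {r: (p * factor if r in used_rules else p)
            for r, p in probs.items()}
```

After a nudge, the p(r) sharing a left-hand side would need renormalizing so they still sum to 1.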

10
Comparing the Inside/Outside Algorithm with the
Proposed Algorithm
  • Inside/Outside
  • O(n³)
  • Good: 3-5 iterations
  • Bad: needs to store the complete sentence corpus
  • Proposed algorithm
  • O(n³)
  • Bad: 500-1000 iterations
  • Good: memory requirements are constant!