1
CYK Parser
  • By Carla and Cornelia Kempa

2
Overview
Non-directional methods:
  • Top-down: Unger Parser
  • Bottom-up: CYK Parser
3
Cocke-Younger-Kasami method
4
Recognition phase
5
Example grammar
  • Number_S → Integer | Real
  • Integer → Digit | Integer Digit
  • Real → Integer Fraction Scale
  • Fraction → . Integer
  • Scale → e Sign Integer | Empty
  • Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  • Empty → ε
  • Sign → + | -

6
Example sentence: 32.5e+1
  • Step 1: concentrate on the substrings of the input sentence

7
Building the recognition table
8-15
(No Transcript)
16
32.5e+1 is in the language
  • What problems can we already see in this example?

17
Another complication: ε-rules
  • Input: 43.1

18
(No Transcript)
19
The ε-problem
  • The shortest substrings of any input sentence
  • are the ε-substrings

We must compute $R_\varepsilon$, the set of non-terminals that derive ε:
$R_\varepsilon = \{\text{Empty}, \text{Scale}\}$
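A minimal Python sketch of computing $R_\varepsilon$ for the example grammar: keep passing over the rules and add a non-terminal as soon as one of its right-hand sides consists only of symbols already known to derive ε, until a pass adds nothing. The dict encoding (`GRAMMAR`, lists of symbols, `[]` for an ε right-hand side) and the name `nullable_set` are assumptions for illustration, not part of the slides.

```python
# Hypothetical encoding of the example grammar:
# non-terminal -> list of right-hand sides; [] stands for an ε right-hand side.
GRAMMAR = {
    "Number": [["Integer"], ["Real"]],
    "Integer": [["Digit"], ["Integer", "Digit"]],
    "Real": [["Integer", "Fraction", "Scale"]],
    "Fraction": [[".", "Integer"]],
    "Scale": [["e", "Sign", "Integer"], ["Empty"]],
    "Digit": [[d] for d in "0123456789"],
    "Empty": [[]],
    "Sign": [["+"], ["-"]],
}


def nullable_set(grammar):
    """Compute R_eps: the non-terminals that derive the empty string."""
    r_eps = set()
    changed = True
    while changed:  # repeat until no new non-terminal is added
        changed = False
        for lhs, rhss in grammar.items():
            if lhs in r_eps:
                continue
            # lhs derives ε if some RHS contains only nullable symbols;
            # terminals are never in r_eps, so they block this test.
            if any(all(sym in r_eps for sym in rhs) for rhs in rhss):
                r_eps.add(lhs)
                changed = True
    return r_eps


print(sorted(nullable_set(GRAMMAR)))  # ['Empty', 'Scale']
```

Scale needs a second pass here because it is visited before Empty has been added, which is exactly why the loop repeats until nothing changes.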
20
Non-empty substrings of the input sentence
  • Input $z = z_1 z_2 z_3 z_4 \ldots z_n$
  • Compute the set of non-terminals that derive the substring of z starting at
    position i, of length l.

21
Terminology (also on the handout)
  • i: index we are starting at
  • l: length of this substring
  • $R_{s_{i,l}}$: set of non-terminals deriving the substring $s_{i,l}$
  • $s_{i,0} = \varepsilon$
  • Set of non-terminals that derive ε: $R_\varepsilon$
  • $R_{s_{i,0}} = R_\varepsilon$

22
$s_{i,l} = z_i z_{i+1} \ldots z_{i+l-1}$
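The slides count positions from 1, while Python strings are 0-based. A small hypothetical helper (`substring` is an illustrative name) makes the mapping from $s_{i,l}$ to a slice explicit:

```python
def substring(z, i, l):
    """Return s_{i,l} = z_i z_{i+1} ... z_{i+l-1}, using 1-based positions."""
    if i < 1 or l < 0 or i - 1 + l > len(z):
        raise ValueError("s_{i,l} lies outside the input")
    # Python slices are 0-based and exclusive at the end.
    return z[i - 1 : i - 1 + l]


z = "32.5e+1"
print(substring(z, 2, 4))        # 2.5e
print(repr(substring(z, 3, 0)))  # '' (s_{i,0} is the empty string)
```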
23
The set of non-terminals deriving the substring
$s_{i,l}$: $R_{s_{i,l}}$
  • 1.) Substrings of length 0:
  • $s_{i,0} = \varepsilon$ and $R_{s_{i,0}} = R_\varepsilon$
  • 2.) Short substrings
  • 3.) Longer substrings (say $l = j$):
  • all the information on substrings with $l < j$ is available

24
Check each RHS (right-hand side) in the grammar to see if it derives $s_{i,l}$
  • $L \to A_1 \ldots A_m$

$s_{i,l}$ is divided into m segments (possibly empty):
$A_1 \Rightarrow^*$ first segment of $s_{i,l}$, $A_2 \Rightarrow^*$ second segment of $s_{i,l}$, …
25
$A_1 \ldots A_m \Rightarrow^* s_{i,l}$
  • So $A_1 \Rightarrow^*$ first part of $s_{i,l}$
  • (let's say $A_1$ has to derive a first part of $s_{i,l}$ of length k)
  • $A_1 \Rightarrow^* s_{i,k}$
  • $A_1$ is in the set $R_{s_{i,k}}$

26
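$A_1 \ldots A_m \Rightarrow^* s_{i,l}$
  • Assuming this, $A_2 \ldots A_m$ has to derive the rest:
  • $A_2 \ldots A_m \Rightarrow^* s_{i+k,\,l-k}$

This is attempted for every k ($0 \le k \le l$, since segments may be empty).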
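The check "does $A_1 \ldots A_m$ derive $s_{i,l}$?" can be sketched as a recursive Python function that tries every length k for the first segment and then asks the same question for the rest. The function name `derives_seq`, the table layout (a dict from `(i, l)` to a set of non-terminals) and the partial, hand-filled table for the input "2.5" are all hypothetical.

```python
NONTERMINALS = {"Number", "Integer", "Real", "Fraction", "Scale", "Digit", "Empty", "Sign"}


def derives_seq(symbols, i, l, z, table, nonterminals):
    """Check whether A_1 ... A_m derives s_{i,l} by trying every length k for A_1."""
    if not symbols:
        return l == 0  # the empty sequence derives only the empty string
    first, rest = symbols[0], symbols[1:]
    for k in range(l + 1):  # A_1 takes s_{i,k}; segments may be empty
        if first in nonterminals:
            ok = first in table.get((i, k), set())  # is A_1 in R_{s_{i,k}}?
        else:
            ok = k == 1 and z[i - 1] == first  # a terminal matches one symbol
        # The remaining symbols must derive the rest, s_{i+k, l-k}.
        if ok and derives_seq(rest, i + k, l - k, z, table, nonterminals):
            return True
    return False


# Hypothetical, hand-filled part of the table for z = "2.5".
z = "2.5"
table = {(i, 0): {"Empty", "Scale"} for i in range(1, len(z) + 2)}  # R_eps everywhere
table[(1, 1)] = {"Digit", "Integer", "Number"}
table[(3, 1)] = {"Digit", "Integer", "Number"}
table[(2, 2)] = {"Fraction"}

print(derives_seq(["Integer", "Fraction", "Scale"], 1, 3, z, table, NONTERMINALS))  # True
```

Here Scale takes the empty segment $s_{4,0}$, which is allowed because Scale is in $R_\varepsilon$.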
27
Problems with this approach
  • 1) Consider $A_2 \ldots A_m$:
  • m could be 1 and $A_1$ a non-terminal
  • → We are dealing with a unit rule
  • $A_1$ must derive the whole substring $s_{i,l}$ and thus be a member of
    $R_{s_{i,l}}$
  • But that's the set we are computing right now

28
Solution to this problem
  • $A_1 \Rightarrow^* s_{i,l}$
  • Somewhere along the derivation there must be a first step not using a unit rule
  • $A_1 \to B \to \cdots \to C \Rightarrow^* s_{i,l}$
  • C is the first non-terminal using a non-unit rule in the derivation

29
Solution cont.
  • At some stage C is added to $R_{s_{i,l}}$
  • If we repeat the process again and again,
    at some point B will be added, and in the next step $A_1$ will be added
  • → We have to repeat the process again and again
    until no new non-terminals are added to $R_{s_{i,l}}$
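The repeat-until-stable step can be isolated in a short Python sketch: starting from the non-terminals already found through non-unit rules (here Digit, for a one-digit substring), keep following unit rules A → B until nothing new is added. The name `unit_closure` and the `(lhs, rhs)` pair encoding are assumptions; only two of the example grammar's unit rules are listed.

```python
def unit_closure(found, unit_rules):
    """Extend a set of non-terminals along unit rules A -> B until nothing changes."""
    result = set(found)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in unit_rules:
            # If B derives the substring, a unit rule A -> B lets A derive it too.
            if rhs in result and lhs not in result:
                result.add(lhs)
                changed = True
    return result


# Two unit rules of the example grammar, in an order that needs several passes.
UNIT_RULES = [("Number", "Integer"), ("Integer", "Digit")]
print(sorted(unit_closure({"Digit"}, UNIT_RULES)))  # ['Digit', 'Integer', 'Number']
```

Because Number → Integer is checked before Integer has been added, a single pass is not enough; the loop plays the role of "in the next step $A_1$ will be added".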

30
Problem 2
  • ε-rules
  • Consider the case where all but one of the $A_t$ derive ε:
  • $B \to A_1 A_2 A_3 A_4 A_5 \ldots A_t$
  • B and $A_1 \ldots A_t$ are non-terminals
  • $A_2 \ldots A_t$ derive ε
  • So what remains is $B \to A_1$
  • A unit rule
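Given $R_\varepsilon$, one can list which rules behave like unit rules because every other symbol on their right-hand side derives ε. A Python sketch on a hypothetical toy grammar (the names `epsilon_unit_rules`, `TOY`, `A1`…`A3` are illustrative):

```python
def epsilon_unit_rules(grammar, nullable):
    """Pairs (B, A_j) where B -> A_1 ... A_t acts as the unit rule B -> A_j
    because all other symbols on the right-hand side derive the empty string."""
    pairs = set()
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            for j, sym in enumerate(rhs):
                others = rhs[:j] + rhs[j + 1:]
                # sym must be a non-terminal; the others must all be nullable.
                if sym in grammar and all(o in nullable for o in others):
                    pairs.add((lhs, sym))
    return pairs


# Hypothetical toy grammar: B -> A1 A2 A3, where A2 and A3 derive ε.
TOY = {
    "B": [["A1", "A2", "A3"]],
    "A1": [["a"]],
    "A2": [[]],
    "A3": [["A2"]],
}
print(sorted(epsilon_unit_rules(TOY, {"A2", "A3"})))  # [('A3', 'A2'), ('B', 'A1')]
```

The pair ('A3', 'A2') is an ordinary unit rule (no other symbols), while ('B', 'A1') only becomes a unit rule because A2 and A3 vanish.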

31
We have computed all the $R_{s_{i,l}}$
  • If S is a member of $R_{s_{1,n}}$, the start symbol
    derives z ($= s_{1,n}$), the input string
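Putting the pieces together, the general recognizer can be sketched in Python: fill $R_{s_{i,0}} = R_\varepsilon$, then process substrings from shortest to longest, and for each one repeat the "try every k" check over all rules until the set stops growing (which handles both unit rules and rules that shrink to unit rules through ε). The dict encoding and all function names are assumptions; this is a sketch of the method described above, not a tuned implementation.

```python
# Example grammar in a hypothetical encoding: [] stands for an ε right-hand side.
GRAMMAR = {
    "Number": [["Integer"], ["Real"]],
    "Integer": [["Digit"], ["Integer", "Digit"]],
    "Real": [["Integer", "Fraction", "Scale"]],
    "Fraction": [[".", "Integer"]],
    "Scale": [["e", "Sign", "Integer"], ["Empty"]],
    "Digit": [[d] for d in "0123456789"],
    "Empty": [[]],
    "Sign": [["+"], ["-"]],
}


def nullable_set(grammar):
    """R_eps: repeat until no new non-terminal is found to derive ε."""
    r_eps, changed = set(), True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            if lhs not in r_eps and any(all(s in r_eps for s in rhs) for rhs in rhss):
                r_eps.add(lhs)
                changed = True
    return r_eps


def derives_seq(symbols, i, l, z, table, nonterminals):
    """Does A_1 ... A_m derive s_{i,l}? Try every length k for A_1's segment."""
    if not symbols:
        return l == 0
    first, rest = symbols[0], symbols[1:]
    for k in range(l + 1):
        if first in nonterminals:
            ok = first in table.get((i, k), set())
        else:
            ok = k == 1 and z[i - 1] == first
        if ok and derives_seq(rest, i + k, l - k, z, table, nonterminals):
            return True
    return False


def recognize(grammar, start, z):
    """Fill R_{s_{i,l}} shortest first, then test whether start is in R_{s_{1,n}}."""
    nonterminals, n = set(grammar), len(z)
    r_eps = nullable_set(grammar)
    # R_{s_{i,0}} = R_eps at every position, including i = n + 1.
    table = {(i, 0): set(r_eps) for i in range(1, n + 2)}
    for l in range(1, n + 1):
        for i in range(1, n - l + 2):
            cell = table[(i, l)] = set()
            changed = True
            while changed:  # unit rules and ε-rules: repeat until stable
                changed = False
                for lhs, rhss in grammar.items():
                    if lhs not in cell and any(
                        derives_seq(rhs, i, l, z, table, nonterminals) for rhs in rhss
                    ):
                        cell.add(lhs)
                        changed = True
    return start in table[(1, n)]


for word in ("32.5e+1", "43.1", "32.5e1"):
    print(word, recognize(GRAMMAR, "Number", word))
```

With this grammar, "32.5e1" is rejected because Scale → e Sign Integer requires a sign.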

32
CYK recognition with a grammar in Chomsky Normal Form
  • What are the restrictions we want to have on our grammar?

33
Useful restrictions
  • No ε-rules
  • No unit rules
  • Limit the length of the right-hand side of each rule, say to two
  • What we get out of this:
  • A → a
  • A → BC
  • where a is a terminal and A, B, C are non-terminals
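These restrictions are easy to check mechanically. A Python sketch, assuming the same dict encoding as before (`is_cnf` is an illustrative name); it follows the slides in allowing no ε-rules at all, so it rejects the textbook variant that permits S → ε:

```python
def is_cnf(grammar):
    """True if every rule has the form A -> a or A -> B C."""
    nonterminals = set(grammar)
    for rhss in grammar.values():
        for rhs in rhss:
            if len(rhs) == 1 and rhs[0] not in nonterminals:
                continue  # A -> a
            if len(rhs) == 2 and all(sym in nonterminals for sym in rhs):
                continue  # A -> B C
            return False  # ε-rule, unit rule, or another shape
    return True


print(is_cnf({"S": [["A", "B"]], "A": [["a"]], "B": [["b"]]}))  # True
print(is_cnf({"S": [["A"]], "A": [["a"]]}))  # False: unit rule
```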

34
Chomsky Normal Form
  • (not only to annoy students)
  • Perfect grammar for CYK

35
How CYK works for a grammar in CNF
  • $R_\varepsilon$ is empty
  • $R_{s_{i,1}}$ can be read directly from the rules (A → a)
  • A rule A → BC can never derive a single terminal
    (B and C each derive at least one symbol)
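Reading the bottom row directly from the rules A → a is a one-line lookup per input symbol. A Python sketch using the terminal rules of the number grammar in CNF from these slides, encoded as `(A, a)` pairs (the encoding and `bottom_row` are assumptions; "Number" stands for Number_S):

```python
DIGITS = "0123456789"
# Rules of the form A -> a from the number grammar in CNF, as (A, a) pairs.
TERMINAL_RULES = [(lhs, d) for lhs in ("Number", "Integer", "Digit") for d in DIGITS]
TERMINAL_RULES += [("T1", "."), ("T2", "e"), ("Sign", "+"), ("Sign", "-")]


def bottom_row(terminal_rules, z):
    """R_{s_{i,1}} for i = 1..n, read directly from the rules A -> a."""
    return {
        i: {lhs for lhs, a in terminal_rules if a == symbol}
        for i, symbol in enumerate(z, start=1)
    }


row = bottom_row(TERMINAL_RULES, "32.5e+1")
print(sorted(row[1]))  # ['Digit', 'Integer', 'Number']
print(row[3], row[5])  # {'T1'} {'T2'}
```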

36
Procedure
  • Iteratively (as before):
  • 1) Fill the sets $R_{s_{i,1}}$ directly
  • 2) Process all substrings of length 2
  • 3) Process all substrings of length 3
  • 4) … and so on, up to length n
  • For the first step we use the rules of the form A → a
  • For all the following steps we have to use the
    rules of the form A → BC
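The procedure maps onto three nested loops (length, start position, split point). A minimal Python sketch, assuming rules are given as `(A, a)` and `(A, B, C)` tuples; the toy grammar for $a^n b^n$ ($n \ge 1$) is hypothetical and only there to exercise the code:

```python
def cyk_cnf(terminal_rules, binary_rules, start, z):
    """CYK recognition for a grammar in CNF; returns (accepted, table)."""
    n = len(z)
    table = {}
    # Step 1: fill R_{s_{i,1}} directly from the rules A -> a.
    for i in range(1, n + 1):
        table[(i, 1)] = {lhs for lhs, a in terminal_rules if a == z[i - 1]}
    # Following steps: lengths 2, 3, ..., n, using the rules A -> B C.
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            cell = set()
            for k in range(1, l):  # split s_{i,l} after its first k symbols
                left, right = table[(i, k)], table[(i + k, l - k)]
                for lhs, b, c in binary_rules:
                    if b in left and c in right:
                        cell.add(lhs)
            table[(i, l)] = cell
    accepted = n > 0 and start in table[(1, n)]
    return accepted, table


# Hypothetical toy grammar in CNF for a^n b^n (n >= 1):
# S -> A B | A C,  C -> S B,  A -> a,  B -> b
TERMINALS = [("A", "a"), ("B", "b")]
BINARIES = [("S", "A", "B"), ("S", "A", "C"), ("C", "S", "B")]
for word in ("ab", "aabb", "abab"):
    print(word, cyk_cnf(TERMINALS, BINARIES, "S", word)[0])
```

Every cell of length l only reads cells of length less than l, so one pass per cell suffices.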

37
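CYK and CNF
  • A rule A → BC puts A into $R_{s_{i,l}}$ if there is a k with
    $B \in R_{s_{i,k}}$ and $C \in R_{s_{i+k,\,l-k}}$

The question the CYK parser has to answer is: does such a k exist?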
38
Answering this question is easy
  • Just try all possibilities
  • (no problem, since you are a computer :-) )
  • k ranges from 1 to l-1
  • All the sets $R_{s_{i,k}}$ and $R_{s_{i+k,\,l-k}}$
    have already been computed at this point
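"Just try all possibilities" for one rule A → BC looks like this in a Python sketch; `find_split` and the hand-filled cells for the substring "2.5" (as positions 1 to 3) are hypothetical:

```python
def find_split(b, c, i, l, table):
    """Return a k in 1..l-1 with B in R_{s_{i,k}} and C in R_{s_{i+k,l-k}}, else None."""
    for k in range(1, l):  # k ranges from 1 to l-1
        if b in table.get((i, k), set()) and c in table.get((i + k, l - k), set()):
            return k
    return None


# Hypothetical, hand-filled cells for the substring "2.5" at positions 1..3.
table = {
    (1, 1): {"Number", "Integer", "Digit"},
    (2, 1): {"T1"},
    (3, 1): {"Number", "Integer", "Digit"},
    (2, 2): {"Fraction"},
}
print(find_split("Integer", "Fraction", 1, 3, table))  # 1
print(find_split("N1", "Scale", 1, 3, table))  # None
```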

39
Transform our sample CF grammar into Chomsky
Normal Form
  • Overview
  • 1) Eliminate ε-rules
  • 2) Eliminate unit rules
  • 3) Remove non-productive non-terminals
  • 4) Remove non-reachable non-terminals
  • 5) Modify the rest until all grammar rules
    are of the form A → a, A → BC
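Step 5 can be sketched in Python for a grammar where steps 1 to 4 are already done: replace each terminal inside a longer right-hand side by a new non-terminal T_k → a, then fold long right-hand sides from the left with new non-terminals N_k. The function `binarize` and the fresh names are assumptions; on the toy input below they happen to coincide with the names used in the CNF grammar on the next slide, but in general the numbering depends on rule order.

```python
def binarize(grammar):
    """Sketch of step 5: turn A -> X1 ... Xm (m >= 2) into rules A -> B C.

    Assumes no ε-rules and no unit rules remain. The fresh names T1, T2, ...
    and N1, N2, ... are hypothetical and must not clash with existing names.
    """
    nonterminals = set(grammar)
    result = {lhs: [] for lhs in grammar}
    t_name = {}  # terminal -> the T non-terminal standing in for it
    counter = {"T": 0, "N": 0}

    def fresh(prefix):
        counter[prefix] += 1
        return f"{prefix}{counter[prefix]}"

    def as_nonterminal(sym):
        if sym in nonterminals:
            return sym
        if sym not in t_name:  # add T_k -> sym once per terminal
            t_name[sym] = fresh("T")
            result[t_name[sym]] = [[sym]]
        return t_name[sym]

    for lhs, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) == 1:  # A -> a is already allowed
                result[lhs].append(list(rhs))
                continue
            syms = [as_nonterminal(s) for s in rhs]
            left = syms[0]
            for sym in syms[1:-1]:  # fold from the left: N -> X1 X2, N' -> N X3, ...
                name = fresh("N")
                result[name] = [[left, sym]]
                left = name
            result[lhs].append([left, syms[-1]])
    return result


# Hypothetical toy input shaped like part of the number grammar.
G = {
    "Real": [["Integer", "Fraction", "Scale"]],
    "Fraction": [[".", "Integer"]],
    "Scale": [["e", "Sign", "Integer"]],
    "Integer": [["1"]],
    "Sign": [["+"], ["-"]],
}
for lhs, rhss in binarize(G).items():
    print(lhs, "->", " | ".join(" ".join(r) for r in rhss))
```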

40
Our number grammar in CNF
  • Number_S → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  • Number_S → Integer Digit
  • Number_S → N1 Scale | Integer Fraction
  • N1 → Integer Fraction
  • Integer → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  • Integer → Integer Digit
  • Fraction → T1 Integer
  • T1 → .
  • Scale → N2 Integer
  • N2 → T2 Sign
  • T2 → e
  • Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  • Sign → + | -

41
Building the recognition table
  • Input:
  • our example grammar in CNF
  • input sentence 32.5e+1
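For this input, the whole recognition table can be built with a short Python sketch of CNF CYK over the number grammar in CNF, encoded as `(A, a)` and `(A, B, C)` tuples ("Number" stands for Number_S; the encoding and `cyk_table` are assumptions). It reproduces the sets used in the worked example for $s_{2,4}$:

```python
DIGITS = "0123456789"
TERMINAL_RULES = [(lhs, d) for lhs in ("Number", "Integer", "Digit") for d in DIGITS]
TERMINAL_RULES += [("T1", "."), ("T2", "e"), ("Sign", "+"), ("Sign", "-")]
BINARY_RULES = [
    ("Number", "Integer", "Digit"),
    ("Number", "N1", "Scale"),
    ("Number", "Integer", "Fraction"),
    ("N1", "Integer", "Fraction"),
    ("Integer", "Integer", "Digit"),
    ("Fraction", "T1", "Integer"),
    ("Scale", "N2", "Integer"),
    ("N2", "T2", "Sign"),
]


def cyk_table(z):
    """Return the table {(i, l): R_{s_{i,l}}} for the number grammar in CNF."""
    n = len(z)
    table = {(i, 1): {a_lhs for a_lhs, a in TERMINAL_RULES if a == z[i - 1]}
             for i in range(1, n + 1)}
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            table[(i, l)] = {
                lhs
                for k in range(1, l)
                for lhs, b, c in BINARY_RULES
                if b in table[(i, k)] and c in table[(i + k, l - k)]
            }
    return table


table = cyk_table("32.5e+1")
print(sorted(table[(2, 3)]))      # ['N1', 'Number']  ("2.5")
print(table[(2, 4)])              # set()             ("2.5e")
print("Number" in table[(1, 7)])  # True: 32.5e+1 is in the language
```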

42
Building the recognition table
  • 1) Bottom row: read directly from the
    grammar (rules of the form A → a)
  • 2) Check each RHS in the grammar

43
Check each RHS of the grammar
  • Two ways; example: 2.5e ($s_{2,4}$)
  • 1) Check each RHS, e.g. N1 Scale
  • 2) Compute the possible right-hand sides from the recognition
    table

44
How this is done
  • 1) N1 is not in $R_{s_{2,1}}$ or $R_{s_{2,2}}$
  • N1 is a member of $R_{s_{2,3}}$
  • But Scale is not a member of $R_{s_{5,1}}$
  • 2) $R_{s_{2,4}}$ is the set of non-terminals that
    have a RHS AB where either
  • A in $R_{s_{2,1}}$ and B in $R_{s_{3,3}}$, or
  • A in $R_{s_{2,2}}$ and B in $R_{s_{4,2}}$, or
  • A in $R_{s_{2,3}}$ and B in $R_{s_{5,1}}$
  • Possible combinations: N1 T2 or Number_S T2
  • Our grammar has no such RHS, so
    nothing is added to $R_{s_{2,4}}$.

45
Recognition table
46
Recognition table (well-formed substring table)
47
Computing $R_{s_{i,l}}$: follow the arrows V and W
simultaneously
  • A → BC,
  • B a member of a set on the V arrow,
  • C a member of a set on the W arrow
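Following both arrows at once amounts to pairing the cells for the left part B with the cells for the right part C. A Python sketch, assuming that V covers $R_{s_{i,1}}, \ldots, R_{s_{i,l-1}}$ (candidates for B) and W covers $R_{s_{i+1,l-1}}, \ldots, R_{s_{i+l-1,1}}$ (candidates for C); the function name is hypothetical:

```python
def arrow_cells(i, l):
    """Pairs (V cell, W cell) visited together when computing R_{s_{i,l}}."""
    # The k-th cell on V is (i, k); the k-th cell on W is (i + k, l - k).
    return [((i, k), (i + k, l - k)) for k in range(1, l)]


for v, w in arrow_cells(2, 4):
    print("V:", v, " W:", w)
```

For $s_{2,4}$ this yields exactly the three combinations listed in the worked example: (2,1) with (3,3), (2,2) with (4,2), and (2,3) with (5,1).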

48
Comparison
  • This process is much less complicated than the
    one we saw before
  • Why?

49
Conclusion
  • This process is much less complicated
  • Reasons:
  • 1) We do not have to repeat the process again and
    again until no new non-terminals are added to $R_{s_{i,l}}$
  • (With A → BC and no ε-rules, B and C each derive a proper, strictly
    shorter substring, so neither can be equal to the substring we start with)

50
Reasons cont.
  • 2) We only have to find one place where the
    substring must be split into two:
  • A → B C (the split lies between B and C)
51
Result of the algorithm we have seen so far
  • Complete collection of sets $R_{s_{i,l}}$
  • These sets can be organized in a triangular table

52
Cost of the CYK algorithm
  • Operations dependent on n, the number of input symbols:
  • $n(n+1)/2$ substrings to be examined
  • For each substring, n-1 different k-positions in
    the worst case

53
Cost of CYK algorithm cont.
  • All other operations are independent of n
  • → The algorithm works in time at most
    proportional to $n^3$
  • → That's far more efficient than exhaustive
    search (time exponential in the length of the
    input sentence)
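The cubic bound can be checked by counting the (substring, k) pairs that CNF CYK actually examines. A Python sketch (`count_split_checks` is an illustrative name); the exact count is $\sum_{l=2}^{n} (n-l+1)(l-1) = (n^3-n)/6$, which stays below the slide's upper estimate $\frac{n(n+1)}{2}(n-1)$:

```python
def count_split_checks(n):
    """Count (substring, k) pairs examined by CNF CYK for an input of length n."""
    checks = 0
    for l in range(2, n + 1):          # lengths that need splitting
        for i in range(1, n - l + 2):  # start positions of s_{i,l}
            checks += l - 1            # k = 1 .. l-1
    return checks


for n in (10, 20, 40):
    print(n, count_split_checks(n))  # 165, 1330, 10660
```

Doubling n multiplies the count by roughly 8, as expected for growth proportional to $n^3$; the work per pair depends only on the grammar, not on n.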