LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing

1
LING 138/238 SYMBSYS 138 Intro to Computer Speech
and Language Processing
  • Lecture 13: Grammar and Parsing (I)
  • November 9, 2004
  • Dan Jurafsky

Thanks to Jim Martin for many of these slides!
2
Outline for Grammar/Parsing Week
  • Context-Free Grammars and Constituency
  • Some common CFG phenomena for English
  • Sentence-level constructions
  • NP, PP, VP
  • Coordination
  • Subcategorization
  • Top-down and Bottom-up Parsing
  • Earley Parsing
  • Quick sketch of advanced stuff

3
Review
  • Parts of Speech
  • Basic syntactic/morphological categories that
    words belong to
  • Part of Speech tagging
  • Assigning parts of speech to all the words in a
    sentence

4
Syntax
  • Syntax: from Greek syntaxis, "setting out
    together, arrangement"
  • Refers to the way words are arranged together,
    and the relationship between them.
  • Distinction
  • Prescriptive grammar: how people ought to talk
  • Descriptive grammar: how they do talk
  • Goal of syntax is to model the knowledge that
    people unconsciously have about the grammar of
    their native language

5
Syntax
  • Why should you care?
  • Grammar checkers
  • Question answering
  • Information extraction
  • Machine translation

6
4 key ideas of syntax
  • Constituency (we'll spend most of our time on
    this)
  • Grammatical relations
  • Subcategorization
  • Lexical dependencies
  • Plus one part we won't have time for
  • Movement/long-distance dependency

7
Context-Free Grammars
  • Capture constituency and ordering
  • Ordering
  • What are the rules that govern the ordering of
    words and bigger units in the language?
  • Constituency
  • How words group into units and how the various
    kinds of units behave

8
Constituency
  • Noun phrases (NPs)
  • Three parties from Brooklyn
  • A high-class spot such as Mindy's
  • The Broadway coppers
  • They
  • Harry the Horse
  • The reason he comes into the Hot Box
  • How do we know these form a constituent?
  • They can all appear before a verb
  • Three parties from Brooklyn arrive
  • A high-class spot such as Mindy's attracts
  • The Broadway coppers love
  • They sit

9
Constituency (II)
  • They can all appear before a verb
  • Three parties from Brooklyn arrive
  • A high-class spot such as Mindy's attracts
  • The Broadway coppers love
  • They sit
  • But individual words can't always appear before
    verbs
  • *from arrive
  • *as attracts
  • *the is
  • *spot is
  • Must be able to state generalizations like
  • Noun phrases occur before verbs

10
Constituency (III)
  • Preposing and postposing
  • On September 17th, I'd like to fly from Atlanta
    to Denver
  • I'd like to fly on September 17th from Atlanta to
    Denver
  • I'd like to fly from Atlanta to Denver on
    September 17th.
  • But not
  • *On September, I'd like to fly 17th from Atlanta
    to Denver
  • *On I'd like to fly September 17th from Atlanta
    to Denver

11
CFG Examples
  • S -> NP VP
  • NP -> Det NOMINAL
  • NOMINAL -> Noun
  • VP -> Verb
  • Det -> a
  • Noun -> flight
  • Verb -> left
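
One way to make these rules concrete is to store them as data and expand them mechanically. The sketch below is only an illustration: the dictionary layout and the `generate` helper are assumptions made for this example, not part of the lecture.

```python
# Toy grammar from this slide: each non-terminal maps to a list of expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def generate(symbol):
    """Expand symbol into a list of words, always using its first rule."""
    if symbol not in GRAMMAR:          # terminal: an actual word
        return [symbol]
    words = []
    for child in GRAMMAR[symbol][0]:   # expand children left to right
        words.extend(generate(child))
    return words

print(" ".join(generate("S")))  # a flight left
```

Because every non-terminal here has exactly one expansion, the grammar generates a single sentence; adding alternatives to any list would enlarge the language.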

12
CFGs
  • S -> NP VP
  • This says that there are units called S, NP, and
    VP in this language
  • That an S consists of an NP followed immediately
    by a VP
  • Doesn't say that that's the only kind of S
  • Nor does it say that this is the only place that
    NPs and VPs occur

13
Generativity
  • As with FSAs and FSTs you can view these rules as
    either analysis or synthesis machines
  • Generate strings in the language
  • Reject strings not in the language
  • Impose structures (trees) on strings in the
    language

14
Derivations
  • A derivation is a sequence of rules applied to a
    string that accounts for that string
  • Covers all the elements in the string
  • Covers only the elements in the string
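
As a rough illustration, a leftmost derivation of "a flight left" can be traced with the toy grammar from the CFG Examples slide. The function name `leftmost_derivation` and the dictionary encoding are assumptions made for this sketch.

```python
# Toy grammar from the "CFG Examples" slide (one expansion per non-terminal).
GRAMMAR = {
    "S": ["NP", "VP"],
    "NP": ["Det", "NOMINAL"],
    "NOMINAL": ["Noun"],
    "VP": ["Verb"],
    "Det": ["a"],
    "Noun": ["flight"],
    "Verb": ["left"],
}

def leftmost_derivation(start="S"):
    """Return every sentential form from the start symbol down to the words."""
    form = [start]
    steps = [list(form)]
    while any(sym in GRAMMAR for sym in form):
        # Rewrite the leftmost non-terminal that is still unexpanded.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]] + form[i + 1:]
        steps.append(list(form))
    return steps

for step in leftmost_derivation():
    print(" ".join(step))
```

The last form printed contains all and only the words of the string, which is the sense in which the derivation "accounts for" it.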

15
Derivations as Trees
16
Parsing
  • Parsing is the process of taking a string and a
    grammar and returning a (many?) parse tree(s) for
    that string

17
Context?
  • The notion of context in CFGs has nothing to do
    with the ordinary meaning of the word context in
    language.
  • All it really means is that the non-terminal on
    the left-hand side of a rule is out there all by
    itself (free of context)
  • A -> B C
  • Means that I can rewrite an A as a B followed by
    a C regardless of the context in which A is found

18
Key Constituents (English)
  • Sentences
  • Noun phrases
  • Verb phrases
  • Prepositional phrases

19
Sentence-Types
  • Declaratives: A plane left
  • S -> NP VP
  • Imperatives: Leave!
  • S -> VP
  • Yes-No Questions: Did the plane leave?
  • S -> Aux NP VP
  • WH Questions: When did the plane leave?
  • S -> WH Aux NP VP

20
NPs
  • NP -> Pronoun
  • I came, you saw it, they conquered
  • NP -> Proper-Noun
  • Los Angeles is west of Texas
  • John Hennessy is the president of Stanford
  • NP -> Det Noun
  • The president
  • NP -> Nominal
  • Nominal -> Noun Noun
  • A morning flight to Denver

21
PPs
  • PP -> Preposition NP
  • From LA
  • To Boston
  • On Tuesday
  • With lunch

22
Recursion
  • We'll have to deal with rules such as the
    following, where the non-terminal on the left also
    appears somewhere on the right (directly).
  • NP -> NP PP    [[The flight] [to Boston]]
  • VP -> VP PP    [[departed Miami] [at noon]]

23
Recursion
  • Of course, this is what makes syntax interesting
  • flights from Denver
  • Flights from Denver to Miami
  • Flights from Denver to Miami in February
  • Flights from Denver to Miami in February on a
    Friday
  • Flights from Denver to Miami in February on a
    Friday under $300
  • Flights from Denver to Miami in February on a
    Friday under $300 with lunch
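
The growing list above can be reproduced by applying NP -> NP PP once per prepositional phrase. The sketch below assumes a simple tuple encoding of trees; `attach_pps` and `words` are hypothetical helper names.

```python
def attach_pps(head, pps):
    """Apply NP -> NP PP once per PP, nesting the previous NP inside the new one."""
    np = ("NP", head)
    for pp in pps:
        np = ("NP", np, ("PP", pp))
    return np

def words(tree):
    """Flatten a tree back into the string of words it covers."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:     # tree[0] is the label, the rest are children
        out.extend(words(child))
    return out

pps = ["from Denver", "to Miami", "in February", "on a Friday"]
for n in range(1, len(pps) + 1):
    print(" ".join(words(attach_pps("flights", pps[:n]))))
```

Nothing in the rule limits how many times it can apply, so the list of PPs could be extended indefinitely.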

24
Recursion
  • Of course, this is what makes syntax interesting
  • flights from Denver
  • Flights from Denver to Miami
  • Flights from Denver to Miami in February
  • Flights from Denver to Miami in February on a
    Friday
  • Etc.

25
Implications of recursion and context-freeness
  • If you have a rule like
  • VP -> V NP
  • It only cares that the thing after the verb is an
    NP. It doesn't have to know about the internal
    affairs of that NP

26
The Point
  • VP -> V NP
  • I hate
  • flights from Denver
  • Flights from Denver to Miami
  • Flights from Denver to Miami in February
  • Flights from Denver to Miami in February on a
    Friday
  • Flights from Denver to Miami in February on a
    Friday under $300
  • Flights from Denver to Miami in February on a
    Friday under $300 with lunch

27
Bracketed Notation
  • [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom
    [N morning] [N flight]]]]]
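
Bracketed notation is a linearized tree, so it can be printed from a nested data structure. This is a minimal sketch assuming trees are nested tuples of the form (label, child, ...), with a single NP node over "a morning flight"; the `bracket` function is a name chosen for this example.

```python
def bracket(tree):
    """Render a nested (label, child, ...) tuple in bracketed notation."""
    if isinstance(tree, str):          # a leaf is just the word itself
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"

tree = ("S",
        ("NP", ("PRO", "I")),
        ("VP", ("V", "prefer"),
               ("NP", ("Det", "a"),
                      ("Nom", ("N", "morning"), ("N", "flight")))))

print(bracket(tree))
# [S [NP [PRO I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
```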

28
Coordination Constructions
  • S -> S and S
  • John went to NY and Mary followed him
  • NP -> NP and NP
  • VP -> VP and VP
  • In fact the right rule for English is
  • X -> X and X

29
Problems
  • Agreement
  • Subcategorization
  • Movement (for want of a better term)

30
Agreement
  • This dog
  • Those dogs
  • This dog eats
  • Those dogs eat
  • *This dogs
  • *Those dog
  • *This dog eat
  • *Those dogs eats

31
Possible CFG Solution
  • S -> NP VP
  • NP -> Det Nominal
  • VP -> V NP
  • SgS -> SgNP SgVP
  • PlS -> PlNP PlVP
  • SgNP -> SgDet SgNom
  • PlNP -> PlDet PlNom
  • PlVP -> PlV NP
  • SgVP -> SgV NP

32
CFG Solution for Agreement
  • It works and stays within the power of CFGs
  • But it's ugly
  • And it doesn't scale all that well
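
To see why this does not scale, the rule splitting can be done mechanically. The sketch below is an illustration only: the triple format, which marks the right-hand-side positions that must agree with the left-hand side, and the `split_by_number` helper are assumptions made for this example.

```python
# Base rules as (lhs, rhs, positions in rhs that agree in number with lhs).
BASE_RULES = [
    ("S", ["NP", "VP"], {0, 1}),
    ("NP", ["Det", "Nom"], {0, 1}),
    ("VP", ["V", "NP"], {0}),   # the object NP does not agree with the verb
]
NUMBERS = ["Sg", "Pl"]

def split_by_number(rules, values=NUMBERS):
    """Return one copy of each rule per number value."""
    out = []
    for lhs, rhs, agreeing in rules:
        for value in values:
            new_rhs = [value + sym if i in agreeing else sym
                       for i, sym in enumerate(rhs)]
            out.append((value + lhs, new_rhs))
    return out

for lhs, rhs in split_by_number(BASE_RULES):
    print(lhs, "->", " ".join(rhs))
print(len(BASE_RULES), "rules became", len(split_by_number(BASE_RULES)))
```

Each additional feature handled this way multiplies the rule count again by its number of values, which is the scaling problem the slide points to.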

33
Subcategorization
  • Sneeze: John sneezed
  • Find: Please find [a flight to NY]NP
  • Give: Give [me]NP [a cheaper fare]NP
  • Help: Can you help [me]NP [with a flight]PP
  • Prefer: I prefer [to leave earlier]TO-VP
  • Said: You said [United has a flight]S

34
Subcategorization
  • *John sneezed the book
  • *I prefer United has a flight
  • *Give with a flight
  • Subcat expresses the constraints that a predicate
    (verb for now) places on the number and syntactic
    types of arguments it wants to take (occur with).

35
So?
  • So the various rules for VPs overgenerate.
  • They permit the presence of strings containing
    verbs and arguments that don't go together
  • For example
  • VP -> V NP, therefore
  • "Sneezed the book" is a VP, since "sneeze" is a
    verb and "the book" is a valid NP
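
A tiny recognizer makes the overgeneration visible. This is a sketch under strong simplifying assumptions: the word lists and the `is_vp` check are invented for illustration and stand in for the single rule VP -> V NP.

```python
# Hypothetical mini-lexicon: any verb may combine with any NP.
VERBS = {"sneezed", "found"}
NPS = {("the", "book"), ("a", "flight")}

def is_vp(words):
    """VP -> V NP, with no check on which verb is involved."""
    return len(words) == 3 and words[0] in VERBS and tuple(words[1:]) in NPS

print(is_vp("found a flight".split()))    # True
print(is_vp("sneezed the book".split()))  # True, although the string is bad English
```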

36
Subcategorization
  • Sneeze: John sneezed
  • Find: Please find [a flight to NY]NP
  • Give: Give [me]NP [a cheaper fare]NP
  • Help: Can you help [me]NP [with a flight]PP
  • Prefer: I prefer [to leave earlier]TO-VP
  • Told: I was told [United has a flight]S

37
Forward Pointer
  • It turns out that verb subcategorization facts
    will provide a key element for semantic analysis
    (determining who did what to who in an event).

38
Possible CFG Solution
  • VP -> V
  • VP -> V NP
  • VP -> V NP PP
  • VP -> IntransV
  • VP -> TransV NP
  • VP -> TransPP NP PP
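
Splitting the verb category this way amounts to giving every verb a subcategorization frame. The sketch below assumes a lookup-table encoding; the assignment of sneeze, find, and help to the three classes follows the examples on the Subcategorization slides, while the `licensed` function is a hypothetical name.

```python
# Complement sequence required by each verb class from this slide.
FRAMES = {"IntransV": [], "TransV": ["NP"], "TransPP": ["NP", "PP"]}

# Assumed class membership, based on the earlier example sentences.
VERB_CLASS = {"sneeze": "IntransV", "find": "TransV", "help": "TransPP"}

def licensed(verb, complements):
    """True if the verb's class allows exactly this sequence of complement types."""
    return FRAMES[VERB_CLASS[verb]] == list(complements)

print(licensed("sneeze", []))           # True:  John sneezed
print(licensed("sneeze", ["NP"]))       # False: *John sneezed the book
print(licensed("help", ["NP", "PP"]))   # True:  help me with a flight
```

As with agreement, this stays within a CFG, but every new frame needs its own verb class and its own VP rule.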

39
Movement
  • Core example
  • My travel agent booked the flight

40
Movement
  • Core example
  • [[My travel agent]NP [booked [the flight]NP]VP]S
  • I.e., "book" is a straightforward transitive verb.
    It expects a single NP within the VP as an
    argument, and a single NP as the subject.

41
Movement
  • What about?
  • Which flight do you want me to have the travel
    agent book?
  • The direct object argument to "book" isn't
    appearing in the right place. It is in fact a
    long way from where it's supposed to appear.
  • And note that it's separated from its verb by 2
    other verbs.

42
CFGs: a summary
  • CFGs appear to be just about what we need to
    account for a lot of basic syntactic structure in
    English.
  • But there are problems
  • That can be dealt with adequately, although not
    elegantly, by staying within the CFG framework.
  • There are simpler, more elegant, solutions that
    take us out of the CFG framework (beyond its
    formal power)
  • Syntactic theories: HPSG, LFG, CCG, Minimalism,
    etc.

43
Other Syntactic stuff
  • Grammatical Relations
  • Subject
  • I booked a flight to New York
  • The flight was booked by my agent.
  • Object
  • I booked a flight to New York
  • Complement
  • I said that I wanted to leave

44
Dependency Parsing
  • Word to word links instead of constituency
  • Based on the European rather than American
    traditions
  • But dates back to the Greeks
  • The original notions of Subject, Object and the
    progenitor of subcategorization (called
    valence) came out of Dependency theory.
  • Dependency parsing is quite popular as a
    computational model
  • Since relationships between words are quite useful

45
Parsing
  • Parsing: assigning correct trees to input strings
  • Correct tree: a tree that covers all and only the
    elements of the input and has an S at the top
  • For now: enumerate all possible trees
  • A further task: disambiguation means choosing
    the correct tree from among all the possible
    trees.
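
Enumerating all possible trees can be sketched by brute force over spans. The example below assumes a small binary grammar and lexicon invented for illustration (loosely based on the earlier "flights from Denver to Miami" examples); `all_trees` is a hypothetical helper, not an algorithm from the lecture.

```python
from functools import lru_cache

# Assumed binary rules, keyed by (left child, right child) -> parent.
BINARY = {("NP", "VP"): "S", ("V", "NP"): "VP",
          ("NP", "PP"): "NP", ("P", "NP"): "PP"}
LEXICON = {"I": "NP", "hate": "V", "flights": "NP", "Denver": "NP",
           "Miami": "NP", "from": "P", "to": "P"}

def all_trees(words):
    """Return every tree rooted in S that covers all and only the words."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(i, j):
        """All (label, tree) pairs spanning words[i:j]."""
        if j - i == 1:
            label = LEXICON[words[i]]
            return ((label, (label, words[i])),)
        found = []
        for k in range(i + 1, j):                  # every split point
            for left_label, left in trees(i, k):
                for right_label, right in trees(k, j):
                    parent = BINARY.get((left_label, right_label))
                    if parent:
                        found.append((parent, (parent, left, right)))
        return tuple(found)

    return [tree for label, tree in trees(0, len(words)) if label == "S"]

parses = all_trees("I hate flights from Denver to Miami".split())
print(len(parses))  # 2
```

The two trees differ in whether "to Miami" attaches to "flights from Denver" or to "Denver"; choosing between them is the disambiguation task.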

46
Parsing
  • The Link Grammar parser
  • http://www.link.cs.cmu.edu/cgi-bin/link/construct-page-4.cgi#submit
  • The Connexor dependency parser
  • http://www.connexor.com/demos/syntax_en.html

47
Treebanks
  • Parsed corpora in the form of trees
  • Examples

48
Parsed Corpora: Treebanks
  • The Penn Treebank
  • The Brown corpus
  • The WSJ corpus
  • Tgrep
  • http://www.ldc.upenn.edu/ldc/online/treebank/

49
Parsing
  • As with everything of interest, parsing involves
    a search which involves the making of choices
  • We'll start with some basic (meaning bad) methods
    before moving on to the one or two that you need
    to know

50
For Now
  • Assume
  • You have all the words already in some buffer
  • The input isn't POS-tagged
  • We won't worry about morphological analysis
  • All the words are known

51
Top-Down Parsing
  • Since we're trying to find trees rooted with an S
    (Sentences), start with the rules that give us an
    S.
  • Then work your way down from there to the words.
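
A minimal top-down recognizer might look like the sketch below. The grammar, the word lists, and the names `expand` and `recognize` are assumptions made for illustration; the search is depth-first and left-to-right with backtracking.

```python
# Assumed toy grammar; words are terminals that appear only on right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Noun"], ["Pronoun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"], ["plane"]],
    "Pronoun": [["I"]],
    "Verb": [["left"], ["prefer"]],
}

def expand(symbol, words, pos):
    """Yield every position where a `symbol` starting at pos could end."""
    if symbol not in GRAMMAR:                      # terminal: must match the input
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:                    # try each rule for this symbol
        ends = [pos]
        for child in rhs:                          # match the children left to right
            ends = [e2 for e in ends for e2 in expand(child, words, e)]
        yield from ends

def recognize(sentence):
    """True if some expansion of S covers the whole input."""
    words = sentence.split()
    return any(end == len(words) for end in expand("S", words, 0))

print(recognize("the plane left"))     # True
print(recognize("I prefer a flight"))  # True
print(recognize("plane the left"))     # False
```

Note that this naive strategy would recurse forever on a left-recursive rule such as NP -> NP PP, which is why the toy grammar here avoids one.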

52
Top Down Space
53
Bottom-Up Parsing
  • Of course, we also want trees that cover the
    input words. So start with trees that link up
    with the words in the right way.
  • Then work your way up from there.
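
A bottom-up recognizer can be sketched as a search over reductions: start from the words and repeatedly replace a right-hand side with its left-hand side until only S remains. The rule list and the function names below are assumptions for illustration.

```python
# Assumed toy rules as (lhs, rhs) pairs; the last rows act as the lexicon.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb", "NP")),
    ("VP", ("Verb",)),
    ("Det", ("the",)), ("Det", ("a",)),
    ("Noun", ("plane",)), ("Noun", ("flight",)),
    ("Verb", ("left",)), ("Verb", ("booked",)),
]

def reductions(form):
    """Yield every form reachable by reducing one right-hand side to its left-hand side."""
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                yield form[:i] + (lhs,) + form[i + n:]

def recognize(sentence):
    """Depth-first search from the words up towards a single S."""
    start = tuple(sentence.split())
    agenda, seen = [start], {start}
    while agenda:
        form = agenda.pop()
        if form == ("S",):
            return True
        for nxt in reductions(form):
            if nxt not in seen:        # avoid revisiting the same form
                seen.add(nxt)
                agenda.append(nxt)
    return False

print(recognize("the plane booked a flight"))  # True
print(recognize("plane the left"))             # False
```

The search explores dead ends such as reducing "booked" to a VP on its own, which is an instance of building structure that makes no sense for the sentence as a whole.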

54
Bottom-Up Space
55
Control
  • Of course, in both cases we left out how to keep
    track of the search space and how to make choices
  • Which node to try to expand next
  • Which grammar rule to use to expand a node

56
Top-Down, Depth-First, Left-to-Right Search
57
Example
58
Example
59
Example
60
Control
  • Does this sequence make any sense?

61
Top-Down and Bottom-Up
  • Top-down
  • Only searches for trees that can be answers (i.e.,
    S's)
  • But also suggests trees that are not consistent
    with the words
  • Bottom-up
  • Only forms trees consistent with the words
  • But suggests trees that make no sense globally

62
So Combine Them
  • There are a million ways to combine top-down
    expectations with bottom-up data to get more
    efficient searches
  • Most use one kind as the control and the other as
    a filter
  • As in top-down parsing with bottom-up filtering
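
One common way to realize bottom-up filtering is a left-corner table: before a top-down parser expands a category, it checks whether the next word's part of speech could begin that category. The grammar and the helper names below are assumptions for this sketch and are not taken from the slides.

```python
# Assumed toy grammar; parts of speech (Det, Noun, ...) are left unexpanded.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}

def left_corners(grammar):
    """Map each non-terminal to every category that can begin one of its derivations."""
    table = {lhs: {rhs[0] for rhs in rules} for lhs, rules in grammar.items()}
    changed = True
    while changed:                      # repeat until no set grows (transitive closure)
        changed = False
        for corners in table.values():
            for c in list(corners):
                new = table.get(c, set()) - corners
                if new:
                    corners |= new
                    changed = True
    return table

TABLE = left_corners(GRAMMAR)

def worth_expanding(category, next_pos):
    """Filter: expand category top-down only if next_pos can start it."""
    return next_pos == category or next_pos in TABLE.get(category, set())

print(worth_expanding("S", "Det"))    # True
print(worth_expanding("NP", "Verb"))  # False
```

Here top-down expansion stays in control, and the table acts as the filter that rules out expansions the input could never support.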

63
Bottom-Up Filtering