Provided by: DanJur1
Title: Syntax


1
Syntax
  • Sudeshna Sarkar
  • 25 Aug 2008

2
Some Fundamental Questions
  • What is Language?
  • How to define a Language?
  • What makes a language different from another?
  • Is there anything common to all languages?

3
Syntax
  • Syntax: from Greek syntaxis, "setting out
    together, arrangement"
  • Refers to the way words are arranged together,
    and the relationships between them.
  • Distinction:
  • Prescriptive grammar: how people ought to talk
  • Descriptive grammar: how they do talk
  • The goal of syntax is to model the knowledge that
    people unconsciously have about the grammar of
    their native language

4
The Two Schools
  • Rationalists
  • It's all hard-coded in our brains
  • Principles and Parameters theory
  • Poverty of the Stimulus
  • Recursion
  • Empiricists
  • Just a special kind of pattern recognition
  • No different from other cognitive abilities like
    vision
  • Language is a stochastic phenomenon

5
The Generative Grammar
  • The grammatical principles underlying
    languages are innate and fixed, and the
    differences among the world's languages can be
    characterized in terms of parameter settings in
    the brain
  • - www.wikipedia.org

Noam Chomsky 1928- Courtesy www.chomsky.info
6
I- and E-Languages
  • I-Language: Mentally represented system of
    rules (I = internal)
  • E-Language: Observable external products of
    I-language (written text, utterances)
  • Language: Collective E-language of a very large
    group of speakers
  • Syntax: Study of the I-language from E-language

7
The Chomsky Hierarchy
Grammar | Languages              | Automaton                                       | Production rules
Type-0  | Recursively enumerable | Turing machine                                  | No restrictions
Type-1  | Context-sensitive      | Linear-bounded non-deterministic Turing machine | αAβ → αγβ
Type-2  | Context-free           | Non-deterministic pushdown automaton            | A → γ
Type-3  | Regular                | Finite state automaton                          | A → aB, A → a
8
From Formal to Natural Languages
Organizational Unit | Complexity
Word                | Regular
Sounds              | Regular
Sentence            | Context-free
Discourse           | ??
9
Some Observations on NLs
  • Constituency: A group of words acts as a single
    unit (phrases, clauses, etc.)
  • Grammatical Relations: Different words/phrases
    are related to the main verb of the sentence
    (object, subject, instrument)
  • Subcategorization and Dependency Relations: Not
    all verbs can take all types of arguments
    (transitive, intransitive, etc.)

10
Syntax
  • Why should you care?
  • Grammar checkers
  • Question answering
  • Information extraction
  • Machine translation

11
Why NLP is difficult: Newspaper headlines
  • Iraqi Head Seeks Arms
  • Juvenile Court to Try Shooting Defendant
  • Teacher Strikes Idle Kids
  • Stolen Painting Found by Tree
  • Local High School Dropouts Cut in Half
  • Red Tape Holds Up New Bridges
  • Clinton Wins on Budget, but More Lies Ahead
  • Hospitals Are Sued by 7 Foot Doctors
  • Kids Make Nutritious Snacks

12
Why is NLU difficult? The hidden structure of
language is hugely ambiguous
  • Tree for "Fed raises interest rates 0.5% in
    effort to control inflation" (NYT headline
    5/17/00)

13
Where are the ambiguities?
14
The bad effects of V/N ambiguities
15
Context-Free Grammars
  • Capture constituency and ordering
  • Ordering is easy
  • What are the rules that govern the ordering of
    words and bigger units in the language?
  • What's constituency?
  • How words group into units and how the various
    kinds of units behave with respect to one another

16
Constituency
  • We have NLP classes from 5:30 to 6:30 pm on
    Tuesday.
  • On Tuesday we have NLP classes from 5:30 to 6:30
    pm.
  • From 5:30 to 6:30 pm on Tuesday we have NLP
    classes.
  • We have NLP on Tuesday from 5:30 to 6:30 pm
    classes.
  • On we have NLP classes from Tuesday 5:30 to 6:30
    pm.
  • From 5:30 we have to 6:30 pm on Tuesday NLP
    classes.

18
Phrases
  • Phrase: A group of words that acts as a unit
  • Noun Phrase (NP)
  • A Midsummer Night's Dream, My Experiments with
    Truth, The Man Who Knew Infinity
  • Verb Phrase (VP)
  • Gone with the Wind, Saving Private Ryan
  • Prepositional Phrase (PP)
  • Of Sons and Lovers, To Sir with Love, Beyond the
    Blue Mountains, Into the Heart of the Mind

19
Modelling the Syntax of English
  • Let us try CFGs
  • S → NP VP (I love India.)
  • S → VP (Love your country.)
  • S → Aux NP VP (Do you love your country?)
  • S → Wh-NP VP (Who loves his country?)
  • S → Wh-NP Aux NP VP
  • (Which country do you live in?)

20
Phrase Structure Grammar
  • Context Free Grammars are also called phrase
    structure grammars
  • Phrases are the building blocks of any PSG (i.e.
    CFG)
  • Phrases in turn are defined by CFG (PSG)

21
Is CFG Necessary?
  • Can we model the syntax of English using a Regular
    Grammar (RG)?
  • NO! A regular grammar cannot model general
    recursion (e.g. center-embedding)
  • S → NP VP
  • VP → Verb S
  • I think that Einstein thought that Newton said ...

22
CFG Examples
  • S -> NP VP
  • NP -> Det NOMINAL
  • NOMINAL -> Noun
  • VP -> Verb
  • Det -> a
  • Noun -> flight
  • Verb -> left
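
The toy grammar above can be written down as data and expanded mechanically. The following is a minimal sketch in Python; the dictionary layout and the helper name `expand` are illustrative assumptions, not part of the slides.

```python
# The seven rules from the slide, stored as: non-terminal -> list of right-hand sides.
# (This dict layout is an assumption made for illustration.)
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}


def expand(symbol):
    """Expand a symbol into a list of words, always using its first rule."""
    if symbol not in GRAMMAR:  # anything without a rule is a terminal (a word)
        return [symbol]
    words = []
    for child in GRAMMAR[symbol][0]:
        words.extend(expand(child))
    return words


sentence = " ".join(expand("S"))
print(sentence)  # prints: a flight left
```

Because every non-terminal here has exactly one rule, this grammar generates exactly one sentence.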

23
CFGs
  • S -> NP VP
  • This says that there are units called S, NP, and
    VP in this language
  • That an S consists of an NP followed immediately
    by a VP
  • Doesn't say that that's the only kind of S
  • Nor does it say that this is the only place that
    NPs and VPs occur

24
Context Free Grammars
  • A CFG is a 4-tuple (N, T, S, P)
  • N is a finite set of non-terminal symbols
  • T is a finite set of terminal symbols
  • S ∈ N is the start symbol
  • P is a finite set of rules of the form X → γ,
    where X ∈ N and γ ∈ (N ∪ T)*
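
The tuple definition maps directly onto a small data structure. The sketch below is one possible Python encoding; the class name `CFG`, its field names, and the `is_well_formed` check are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar as the tuple (N, T, S, P)."""
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    start: str               # S
    rules: tuple             # P: pairs (X, gamma), gamma a tuple of symbols

    def is_well_formed(self):
        """Check S in N, N and T disjoint, and each rule X -> gamma with
        X in N and every symbol of gamma in N or T."""
        if self.start not in self.nonterminals:
            return False
        if self.nonterminals & self.terminals:
            return False
        symbols = self.nonterminals | self.terminals
        return all(
            lhs in self.nonterminals and all(s in symbols for s in rhs)
            for lhs, rhs in self.rules
        )


# A two-rule fragment used only as an example.
toy = CFG(
    nonterminals=frozenset({"S", "NP", "VP"}),
    terminals=frozenset({"boys", "play"}),
    start="S",
    rules=(("S", ("NP", "VP")), ("NP", ("boys",)), ("VP", ("play",))),
)
print(toy.is_well_formed())  # prints: True
```

A rule whose left-hand side is a terminal, or whose right-hand side mentions an undeclared symbol, makes the check return False.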

25
Phrase Structure Parsing
  • Phrase structure organizes words into phrases,
    often called constituents
  • This organization is hierarchical
  • For a given string there is often ambiguity about
    the correct phrase structure
  • This ambiguity often corresponds to semantic
    ambiguity

27
Simple examples of a CFG
  • Take the non-terminals S, NP, VP, V
  • And the terminals {boys, study, play, books,
    cricket}
  • Let the start symbol be S
  • Let the rule set be
  • S → NP VP
  • VP → V
  • VP → V NP
  • NP → boys
  • NP → books
  • NP → cricket
  • V → study
  • V → play

This CFG licenses only a finite number of
sentences (and trees)
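
Since this grammar has no recursive rule, its language can be listed exhaustively. The sketch below enumerates it in Python; the `RULES` layout and the function name `yields` are illustrative assumptions.

```python
from itertools import product

# The rule set from the slide: non-terminal -> list of right-hand sides.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V",), ("V", "NP")],
    "NP": [("boys",), ("books",), ("cricket",)],
    "V": [("study",), ("play",)],
}


def yields(symbol):
    """Return every word sequence derivable from `symbol`.

    Terminates only because the grammar has no recursive rules.
    """
    if symbol not in RULES:  # terminal
        return [(symbol,)]
    results = []
    for rhs in RULES[symbol]:
        # Combine every choice for each right-hand-side symbol.
        for parts in product(*(yields(s) for s in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


sentences = [" ".join(words) for words in yields("S")]
print(len(sentences))  # prints: 24
```

The 24 sentences are the 6 of the form NP V plus the 18 of the form NP V NP. The list includes strings such as "cricket study boys": the grammar constrains word order and grouping, not meaning.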
28
Generativity
  • As with FSAs and FSTs you can view these rules as
    either analysis or synthesis machines
  • Generate strings in the language
  • Reject strings not in the language
  • Impose structures (trees) on strings in the
    language

29
Derivations
  • A derivation is a sequence of rules applied to a
    string that accounts for that string
  • Covers all the elements in the string
  • Covers only the elements in the string
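
A derivation can be made concrete by recording each intermediate string of symbols. The sketch below replays a leftmost derivation of "a flight left" with the toy flight grammar; the function name `leftmost_derivation` and the choice to always rewrite the leftmost non-terminal are assumptions made for illustration.

```python
def leftmost_derivation(start, rule_sequence, nonterminals):
    """Apply rules in order, each time rewriting the leftmost non-terminal.

    Returns the list of intermediate forms, starting with [start].
    """
    form = [start]
    steps = [list(form)]
    for lhs, rhs in rule_sequence:
        # Locate the leftmost non-terminal in the current form.
        index = next((k for k, s in enumerate(form) if s in nonterminals), None)
        if index is None or form[index] != lhs:
            raise ValueError(f"rule {lhs} -> {rhs} does not apply to {form}")
        form[index:index + 1] = rhs
        steps.append(list(form))
    return steps


NONTERMINALS = {"S", "NP", "VP", "Det", "NOMINAL", "Noun", "Verb"}
RULE_SEQUENCE = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "NOMINAL"]),
    ("Det", ["a"]),
    ("NOMINAL", ["Noun"]),
    ("Noun", ["flight"]),
    ("VP", ["Verb"]),
    ("Verb", ["left"]),
]

steps = leftmost_derivation("S", RULE_SEQUENCE, NONTERMINALS)
for step in steps:
    print(" ".join(step))
```

The final form contains only words, and it contains every word of the string and nothing else, which is what "covers all and only the elements" requires.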

30
Derivations as Trees
31
Two views of linguistic structure: 1.
Constituency (phrase structure)
  • Phrase structure organizes words into nested
    constituents.
  • How do we know what is a constituent? (Not that
    linguists don't argue about some cases.)
  • Distribution: a constituent behaves as a unit
    that can appear in different places
  • John talked to the children about drugs.
  • John talked about drugs to the children.
  • *John talked drugs to the children about
  • Substitution/expansion/pro-forms
  • I sat on the box/right on top of the box/there.
  • Coordination, regular internal structure, no
    intrusion, fragments, semantics, ...

32
Two views of linguistic structure: 2. Dependency
structure
  • Dependency structure shows which words depend on
    (modify or are arguments of) which other words.

[Figure: dependency diagram for the sentence "The boy put the tortoise on the rug"]
33
Parsing
  • Parsing is the process of taking a string and a
    grammar and returning a (many?) parse tree(s) for
    that string
  • It is completely analogous to running a
    finite-state transducer with a tape
  • It's just more powerful
  • Remember this means that there are languages we
    can capture with CFGs that we can't capture with
    finite-state methods
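
The "(many?)" above can be seen with a small parser that returns every tree for a string. This is a minimal sketch, not a production algorithm. It reuses the boys/books/cricket grammar and, as a labeled assumption, adds a preposition "with" plus the rules PP -> P NP, NP -> NP PP and VP -> VP PP, which are not in the slides, so that one string gets two trees.

```python
from functools import lru_cache

# Binary and unary rules; the three PP-related rules are hypothetical additions.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # assumption: PP attaches to the verb phrase
    ("NP", "NP", "PP"),  # assumption: PP attaches to the noun phrase
    ("PP", "P", "NP"),   # assumption
]
UNARY = [("VP", "V")]
LEXICON = {
    "boys": ["NP"], "books": ["NP"], "cricket": ["NP"],
    "study": ["V"], "play": ["V"],
    "with": ["P"],       # assumption
}


def parse(words, start="S"):
    """Return all parse trees (nested tuples) for `words` rooted at `start`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        """All trees with root `symbol` covering words[i:j]."""
        found = []
        if j - i == 1 and symbol in LEXICON.get(words[i], []):
            found.append((symbol, words[i]))
        for lhs, child in UNARY:
            if lhs == symbol:
                found.extend((symbol, t) for t in trees(child, i, j))
        for lhs, left, right in BINARY:
            if lhs == symbol:
                for k in range(i + 1, j):  # every split point of the span
                    for lt in trees(left, i, k):
                        for rt in trees(right, k, j):
                            found.append((symbol, lt, rt))
        return tuple(found)

    return list(trees(start, 0, len(words)))


print(len(parse("boys play cricket".split())))             # prints: 1
print(len(parse("boys play cricket with books".split())))  # prints: 2
```

The two trees for the longer string differ in where "with books" attaches: to "cricket" (NP -> NP PP) or to "play cricket" (VP -> VP PP). A string outside the language, such as "play boys", yields an empty list.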

34
Other Options
  • Regular languages (expressions)
  • Too weak
  • Context-sensitive or Turing-equivalent
  • Too powerful (maybe)

35
Context?
  • The notion of context in CFGs has nothing to do
    with the ordinary meaning of the word context in
    language.
  • All it really means is that the non-terminal on
    the left-hand side of a rule is out there all by
    itself (free of context)
  • A -> B C
  • Means that
  • I can rewrite an A as a B followed by a C
    regardless of the context in which A is found
  • Or when I see a B followed by a C I can infer an
    A regardless of the surrounding context
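
Both readings of the rule A -> B C can be sketched as list operations that never inspect the neighbouring symbols. The function names `rewrite_once` and `reduce_once` are hypothetical, chosen for illustration.

```python
def rewrite_once(symbols, lhs, rhs):
    """Top-down reading: replace the leftmost `lhs` with `rhs`,
    whatever surrounds it."""
    for i, symbol in enumerate(symbols):
        if symbol == lhs:
            return symbols[:i] + list(rhs) + symbols[i + 1:]
    return list(symbols)


def reduce_once(symbols, lhs, rhs):
    """Bottom-up reading: replace the leftmost occurrence of `rhs`
    with `lhs`, whatever surrounds it."""
    n = len(rhs)
    for i in range(len(symbols) - n + 1):
        if list(symbols[i:i + n]) == list(rhs):
            return symbols[:i] + [lhs] + symbols[i + n:]
    return list(symbols)


# The neighbours "x", "y", "p", "q" play no role in either operation.
print(rewrite_once(["x", "A", "y"], "A", ["B", "C"]))      # prints: ['x', 'B', 'C', 'y']
print(reduce_once(["p", "B", "C", "q"], "A", ["B", "C"]))  # prints: ['p', 'A', 'q']
```

A context-sensitive rule would differ by also testing the symbols on either side of A before rewriting.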

36
Key Constituents (English)
  • Sentences
  • Noun phrases
  • Verb phrases
  • Prepositional phrases