Title: How do linguists study grammar?
1How do linguists study grammar?
- Lori Levin
- 11-721 Grammars and Lexicons
- August 29, 2007
2Outline
- Views of language
  - Prescriptive
  - Artistic
  - Descriptive
- Claims about knowledge of a language
  - Unconscious
  - Complex
  - Systematic
  - Can be studied scientifically
- A research tool: grammaticality judgments
  - What is grammaticality?
  - Problems with grammaticality
  - Rationalism vs. empiricism
- Why should language technologists care about grammaticality?
3Prescriptive and Descriptive Linguistics
- Natural phenomena cannot be legislated, just described.
- You can't declare the value of π to be 3.
- Sag, Wasow, and Bender, page 1
- Social phenomena can be legislated.
- Language use can be legislated as a social phenomenon, but it can also be studied as a natural phenomenon.
4Prescriptive view of language
- Rules about how language should be used
- Don't say "Me and him went to the movies."
- It doesn't make sense because you can't say "Me went to the movies."
- Focus on isolated phenomena that are thought to be corruptions of the language.
- "Everybody should do their homework."
- Some people speak correctly and others don't.
- Rules are something that you are aware of.
5Artistic View of Language
- Language can be used creatively to make literature and poetry.
- Some people are better at it than others.
- Language is not systematic and rule governed.
6Descriptive view of language
- Study language as a natural phenomenon
- People say "Me and him went to the movies."
- That's interesting because they don't say "Me went to the movies."
- Focus on all aspects of language, even very normal sentences.
- Every native speaker of a language speaks equally well.
- Unless there is an injury or an illness that affects certain parts of the brain or the speech-producing organs.
- Language consists of systematic knowledge that the speakers are not aware of.
8Knowledge of Language
- Every normal speaker of any natural language has acquired an immensely rich and systematic body of unconscious knowledge, which can be investigated by consulting speakers' intuitive judgments.
- Languages are objects of considerable complexity, which can be studied scientifically. That is, we can formulate hypotheses about linguistic structure and test them against the facts of particular languages.
- Sag et al., page 2
- Claims 1-4: this knowledge is unconscious, complex, systematic, and can be studied scientifically.
9Chomsky, 1957 on testable hypotheses
- "The search for rigorous formulation in linguistics has a much more serious motivation than mere concern for logical niceties or the desire to purify well-established methods of linguistic analysis. Precisely constructed models for linguistic structure can play an important role, both negative and positive, in the process of discovery itself. By pushing a precise but inadequate formulation to an unacceptable conclusion, we can often expose the exact source of the inadequacy and, consequently, gain a deeper understanding of the linguistic data. More positively, a formalized theory may automatically provide solutions for many problems other than those for which it was explicitly designed. Obscure and intuition-bound notions can neither lead to absurd conclusions nor provide new and correct ones, and hence they fail to be useful in two important respects."
- (Noam Chomsky has been the most influential linguist in many parts of the world since 1957. You may have also heard his name associated with politics.)
10Immensely rich and systematic body of unconscious knowledge
- They saw Pat and Chris.
- They saw Pat with Chris.
- Who did they see Pat with?
- *Who did they see Pat and?
- Has anyone ever had to tell you not to say this?
11Testable hypotheses about linguistic knowledge
- *We like us.
- We like ourselves.
- She likes her. (she ≠ her)
- She likes herself.
- Nobody likes us.
- *Leslie likes ourselves.
- *Ourselves like us.
- *Ourselves like ourselves.
12Testable hypotheses
- Use a reflexive pronoun only when
- Use a regular pronoun only when
13Counter-examples
- We think that Leslie likes us.
- *We think that Leslie likes ourselves.
- *We think that ourselves like Leslie.
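The cycle of proposing a hypothesis and then hunting for counter-examples can be sketched in code. This is a minimal Python sketch, not the course's method. The slides leave the wording of the hypothesis blank, so `hypothesis_one` is a stand-in assumption: a reflexive needs an earlier word with the same person and number, and a plain pronoun must not have one. The feature table and the True/False judgments are also assumptions, following the usual textbook judgments for these sentences.

```python
# Hypothetical person/number features for the few words used here.
FEATURES = {"we": "1pl", "us": "1pl", "ourselves": "1pl",
            "leslie": "3sg", "nobody": "3sg"}
REFLEXIVES = {"ourselves"}
PRONOUNS = {"we", "us"}

def hypothesis_one(sentence):
    """Predict grammaticality under the stand-in hypothesis (an assumption)."""
    words = sentence.lower().rstrip(".").split()
    for i, word in enumerate(words):
        # Features of every earlier word that has features at all.
        earlier = {FEATURES[w] for w in words[:i] if w in FEATURES}
        has_antecedent = FEATURES.get(word) in earlier
        if word in REFLEXIVES and not has_antecedent:
            return False
        if word in PRONOUNS and has_antecedent:
            return False
    return True

def counterexamples(hypothesis, judged):
    """Return the sentences where the prediction disagrees with the judgment."""
    return [s for s, ok in judged if hypothesis(s) != ok]

# (sentence, judged grammatical?) pairs; the judgments are assumed.
FIRST_DATA = [
    ("We like us.", False),
    ("We like ourselves.", True),
    ("Nobody likes us.", True),
    ("Leslie likes ourselves.", False),
    ("Ourselves like us.", False),
    ("Ourselves like ourselves.", False),
]
EMBEDDED_DATA = [
    ("We think that Leslie likes us.", True),
    ("We think that Leslie likes ourselves.", False),
    ("We think that ourselves like Leslie.", False),
]

print(counterexamples(hypothesis_one, FIRST_DATA))
print(counterexamples(hypothesis_one, EMBEDDED_DATA))
```

The stand-in hypothesis fits the simple sentences but mispredicts all three embedded ones, which is the kind of failure that motivates a revised hypothesis.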
14New Hypothesis
- Use a reflexive pronoun only when
- Use a regular pronoun only when
- (This is an English rule. Many languages do not
follow it.)
15Support for the new hypothesis
- We think that she voted for her. (she ≠ her)
- We think that she voted for herself.
- *We think that herself voted for her.
- *We think that herself voted for herself.
16Counter-examples
- Our friends like us.
- *Our friends like ourselves.
- Those pictures of us offended us.
- *Those pictures of us offended ourselves.
17New Hypothesis
- Use a reflexive pronoun only when
- Use a regular pronoun only when
18Counter-examples
- Vote for us.
- *Vote for ourselves.
- *Vote for you.
- Vote for yourselves.
19Counter-examples
- We appealed to them to vote for themselves.
- We appealed to them to vote for them.
- (them ≠ them)
- We appealed to them to vote for us.
- *We appealed to them to vote for ourselves.
- *We appeared to them to vote for themselves.
- We appeared to them to vote for them.
- (them = them)
- *We appeared to them to vote for us.
- We appeared to them to vote for ourselves.
- The theoretical machinery required for a viable grammatical analysis could be quite abstract. (Sag et al., page 6)
20Knowledge of Language
- Every normal speaker of any natural language has acquired an immensely rich and systematic body of unconscious knowledge, which can be investigated by consulting speakers' intuitive judgments.
- Languages are objects of considerable complexity, which can be studied scientifically. That is, we can formulate hypotheses about linguistic structure and test them against the facts of particular languages.
- Sag et al., page 2
- Claims 1-4: this knowledge is unconscious, complex, systematic, and can be studied scientifically.
21Grammaticality Judgments as a scientific tool for collecting data
- What is grammaticality?
- What are some problems in using it as a tool for collecting data?
- Grammaticality vs. corpus analysis
22One more claim
- It is also possible to make testable hypotheses
about how languages differ and what they have in
common.
24Investigate hypotheses by consulting native speakers' intuitions
- Many linguists (probably a majority) assume that people can distinguish strings of words that are sentences of their language from strings of words that are not sentences of their language.
- So imagine that you are a machine or a classifier that takes a sentence as input and returns "accept" or "reject" as output.
25Native speakers as automata that accept and reject strings of words
- The student read a book.
- *Student the a read book.
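The accept/reject view can be made concrete with a toy recognizer. This is only a sketch under stated assumptions: the three phrase-structure rules and the five-word lexicon are invented for illustration and are nowhere near a grammar of English. `judge` is a hypothetical name.

```python
# Toy grammar and lexicon (assumptions made for this illustration only).
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"the": "Det", "a": "Det", "student": "N", "book": "N", "read": "V"}

def parse(symbol, tags, start):
    """Return every position where `symbol` can end if it begins at `start`."""
    if symbol not in RULES:  # a word category such as Det, N, or V
        if start < len(tags) and tags[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for right_hand_side in RULES[symbol]:
        positions = {start}
        for child in right_hand_side:
            positions = {e for p in positions for e in parse(child, tags, p)}
        ends |= positions
    return ends

def judge(sentence):
    """Return "accept" if the toy grammar covers the whole string, else "reject"."""
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON.get(w) for w in words]
    if None in tags:  # unknown word
        return "reject"
    return "accept" if len(tags) in parse("S", tags, 0) else "reject"

print(judge("The student read a book."))
print(judge("Student the a read book."))
```

A real speaker's judgments are far richer than this, but the interface is the same: a string of words goes in, and accept or reject comes out.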
26Grammaticality
- A string of words that you recognize as a sentence in your native language is grammatical.
- A string of words that you do not recognize as a sentence in your native language is ungrammatical.
- When you decide whether a sentence is grammatical or ungrammatical, this is called giving a grammaticality judgment.
- Ungrammatical sentences are preceded by an asterisk or star (*). Sometimes they are called starred sentences.
- If native speakers can't decide whether the sentence is grammatical or ungrammatical, it is preceded by a combination of stars and question marks.
27Grammaticality: Descriptive
- When you give a grammaticality judgment, you are not supposed to judge whether the sentence is the most elegant or appropriate, just whether it is a sentence of your language or not.
- You may have a stylistic preference for one of these, but they are all grammatical:
- These are things you never want to hear.
- These are things you want never to hear.
- These are things you want to never hear.
28Grammatical ≠ meaningful
- It is unlikely that Pat will succeed.
- It is improbable that Pat will succeed.
- Pat is unlikely to succeed.
- *Pat is improbable to succeed.
- This could be meaningful, but most people consider it to be ungrammatical.
- They saw Pat with Chris.
- They saw Pat and Chris.
- Who did they see Pat with?
- *Who did they see Pat and?
- Again, this could be meaningful, but it is ungrammatical.
29Syntactically well-formed vs. semantically well-formed
- Colorless green ideas sleep furiously.
- Syntactically well-formed
- Chomsky, 1957
- *Colorless sleep green furiously ideas.
- Not syntactically well-formed
30Grammaticality: Where to draw the line?
- Sentences that are understandable but sound like mistakes are probably not grammatical.
- *These are things that I don't know anyone who says.
31Where to draw the line?
- Sentences of bad poetry are not grammatical.
- Strange word order is used to make lines rhyme.
- Fame to our alma mater
- Thousands of voices ring
- Telling of love we bear her
- To her we laurels bring.
- From my high school song. Don't ask how I could remember something like that.
- Order in the song: indirect-object subject direct-object verb
- Normal order: We bring laurels to her. (subject verb direct-object indirect-object)
32Grammaticality
- More bad poetry: not grammatical
- Shout on high the ringing praises, loyal strong and true
- Bring we to our alma mater trust and honor due.
- Order in the song: verb subject indirect-object direct-object
- Normal order: We bring trust and honor (that are) due to our alma mater. (subject verb direct-object indirect-object)
33Where to draw the line?
- However, many types of sentences that are found in writing, or are restricted to special contexts, are considered to be grammatical and even have names.
- Locative Inversion: In this village live many people.
- Topicalization: Sam, I like.
- Heavy NP Shift: I presented to the students many examples of strange and unusual constructions. (The indirect object comes before the direct object because the direct object is too long.)
- These are grammatical.
34Grammaticality
- Grammatical
  - In this village live many people.
  - I presented to the students many examples of strange and unusual constructions.
  - Sam, I like.
- Not grammatical
  - *To her we laurels bring.
  - *Bring we to our alma mater trust and honor due.
  - *These are things that I don't know anyone who says.
  - *Who did they see Pat and?
  - *We told them to vote for ourselves.
35Problems with Grammaticality
- Dialect differences
- The car needs washed.
- (The car needs to be washed.)
- We go to the movies a lot anymore.
- (We go to the movies a lot these days.)
- I gave it her.
- (I gave it to her.)
- It were me what told her.
- (It was me that told her.)
- Mine is bigger than what yours is.
- (Mine is bigger than yours is.)
- Ain't no chicken can't get into no coop.
- (No chicken can get into a coop.)
- (There isn't a chicken that can get into a coop.)
36Problems with grammaticality
- Changes over time
- (From Kroeger, Chapter 1)
- With two things hath God men's soul endowed.
- Normal word order in English before 1100 AD
- I know not what course others may take,
- Patrick Henry, 1775
37Grammaticality: Discrete or Continuous?
- Manning (2003), Probabilistic Syntax
- *We regard Kim to be an acceptable candidate.
- Source: consulting native speakers' judgments.
- Conservatives argue that the Bible regards homosexuality to be a sin.
- Attested example.
- *Kim turned out doing all the work.
- Source: consulting native speakers' judgments.
- But it turned out having a greater impact than any of us dreamed.
- Attested example.
- Better to ask "How likely?" than to ask "Possible or not?"
38Philosophy Lesson: Rationalism and Empiricism
- Rationalism: the source of knowledge is reason
- Empiricism: the source of knowledge is data
39Rationalist view of linguistic data
- Language is something in people's minds: a set of rules and principles that allows them to make grammaticality judgments and to produce and understand sentences that they have never heard before.
- This is i-language, or internal language.
- We study i-language by asking people to give grammaticality judgments.
- A corpus (a collection of texts or speech) is e-language, or external language. It is not the object of study.
40Empiricist view of linguistic data
- Corpora are the objects of study.
- We study language by examining patterns in
corpora (collections of texts or speech).
41Why do we need the philosophy lesson?
- In the second half of the 20th century, linguistics was heavily dominated by rationalism.
- Computational linguistics was also initially dominated by rationalism.
- Rationalism vs. empiricism was heavily debated in computational linguistics in the 1990s.
- Rationalism: people writing grammar rules for a parser
- Empiricism: statistical, corpus-based models
- In current Language Technologies research, rationalism and empiricism are often combined.
- Combination: a person chooses linguistic features as input to a machine learning algorithm, which then learns from the distribution of the features in a corpus.
- Combination: syntax-based statistical machine translation.
- Empiricism is gaining ground in linguistics (Manning 2003).
- Linguistics textbooks are still mainly rationalist.
- Empiricism is mentioned only in one footnote in Chapter 1 of the Sag et al. book.
- But a few years earlier, it would not have been mentioned at all!
42Strong points of rationalism
- Infinite, creative capacity: People can produce and understand sentences that have never been uttered before. They are not repeating memorized patterns, but applying productive rules.
- Leads people to wonder about things that don't exist in a corpus: *Who did you see Pat and?
- Probability is not grammaticality: grammatical sentences may have very low probability.
- Probability reflects facts about the world, but grammaticality is independent of context.
- Clyde is an African elephant.
- Clyde is a pink elephant.
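The gap between probability and grammaticality in the two Clyde sentences can be illustrated with a toy language model. This is a rough Python sketch under stated assumptions: the four-sentence corpus is invented, and the model is a plain bigram model with add-one smoothing, chosen only because it is short. `probability` is a hypothetical helper name.

```python
from collections import Counter

# Invented toy corpus (an assumption); it talks about African elephants
# but never about pink ones.
CORPUS = [
    "Clyde is an African elephant.",
    "Jumbo is an African elephant.",
    "Clyde is a big elephant.",
    "The rose is pink.",
]

def bigrams(sentence):
    """Split a sentence into (previous word, word) pairs with boundary markers."""
    tokens = ["<s>"] + sentence.lower().rstrip(".").split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))

bigram_counts = Counter(pair for s in CORPUS for pair in bigrams(s))
context_counts = Counter(prev for s in CORPUS for prev, _ in bigrams(s))
vocabulary = {w for s in CORPUS for pair in bigrams(s) for w in pair}

def probability(sentence):
    """Bigram probability of a sentence with add-one smoothing."""
    p = 1.0
    for prev, word in bigrams(sentence):
        p *= (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocabulary))
    return p

african = probability("Clyde is an African elephant.")
pink = probability("Clyde is a pink elephant.")
print(african > pink > 0)
```

Both sentences have the same structure and are equally grammatical, yet the model rates the pink elephant as much less likely, because the corpus reflects what people talk about rather than what the grammar allows.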
43Strong points of empiricism
- Frequency of occurrence in a corpus is easier to measure reliably than a grammaticality judgment.
- Many ungrammatical sentences turn out to be acceptable in the right context.
- Identifying the right context turns out to be an interesting question that does not arise in the rationalist approach.
- Bresnan et al., 2005, 2007
- I gave her the book.
- I gave the book to her.
44Grammaticality in language technologies
- Real input (especially spoken input) is not always well-formed, so you should not build a program that accepts only grammatical sentences.
- Can we do away with grammar in language technologies?
45Grammaticality in Language Technologies
- You cannot extract the meaning of a sentence without processing the grammar.
- Sue interviewed Sam.
- Sam interviewed Sue.
- LT output has to be comprehensible, and therefore mostly grammatical.
- Synthesized speech
- An automatically produced translation
- An automatically produced summary
- Error detection programs for computer-assisted language instruction or for word processing must distinguish grammatical from ungrammatical sentences.
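The Sue and Sam pair shows why word content alone is not enough. The short Python sketch below is an illustration under strong assumptions: `who_did_what` is a hypothetical helper that only handles three-word subject-verb-object sentences, and the role names are made up for this example.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count the words in a sentence, ignoring their order."""
    return Counter(sentence.lower().rstrip(".").split())

def who_did_what(sentence):
    """Read roles off word order; assumes exactly subject, verb, object."""
    subject, verb, obj = sentence.rstrip(".").split()
    return {"interviewer": subject, "action": verb, "interviewee": obj}

first = "Sue interviewed Sam."
second = "Sam interviewed Sue."

# Same words, so an order-free representation cannot tell them apart.
print(bag_of_words(first) == bag_of_words(second))
# Different structure, so the roles come out reversed.
print(who_did_what(first))
print(who_did_what(second))
```

Even this crude use of word order recovers who interviewed whom, which the bag of words cannot do.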
46In favor of grammaticality
- Probability is not grammaticality: grammatical sentences may have very low probability.
- Probability reflects facts about the world, but grammaticality is independent of context.
- Clyde is an African elephant.
- Clyde is a pink elephant.