Title: LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing
1. LING 138/238 SYMBSYS 138: Intro to Computer Speech and Language Processing
2. Today 9/28 Week 1
- Overview and history of the field
- Knowledge of language
- The role of ambiguity
- Models and Algorithms
- Eliza, Turing, and conversational agents
- History of speech and language processing
- Administration
- Overview of course topics
- 1 week on each course in NLP/Speech/Dialog!
- Regular expressions
- Start of finite automata
3. Computer Speech and Language Processing
- What is it?
- Getting computers to perform useful tasks involving human languages, whether for
- Enabling human-machine communication
- Improving human-human communication
- Doing stuff with language objects
- Examples
- Question Answering
- Machine Translation
- Spoken Conversational Agents
4. Kinds of knowledge needed?
- Consider the following interaction with HAL the computer from 2001: A Space Odyssey
- Dave: Open the pod bay doors, Hal.
- HAL: I'm sorry Dave, I'm afraid I can't do that.
5. Knowledge needed to build HAL?
- Speech recognition and synthesis
- Dictionaries (how words are pronounced)
- Phonetics (how to recognize/produce each sound of English)
- Natural language understanding
- Knowledge of the English words involved
- What they mean
- How they combine (what is a "pod bay door"?)
- Knowledge of syntactic structure
- I'm I do, Sorry that afraid Dave I'm can't
6. What's needed?
- Dialog and pragmatic knowledge
- "Open the door" is a REQUEST (as opposed to a STATEMENT or information-question)
- It is polite to respond, even if you're planning to kill someone.
- It is polite to pretend to want to be cooperative ("I'm afraid, I can't")
- What is "that" in "I can't do that"?
- Even a system to book airline flights needs much of this kind of knowledge
7. Question Answering
- What does "door" mean?
- What year was Abraham Lincoln born?
- How many states were in the United States when Lincoln was born?
- Was there a military draft during the Hoover administration?
- What do US scientists think about whether human cloning should be legal?
8. Machine Translation
- Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come
- As she lay there alone, Dai-yu's thoughts turned to Bao-chai. Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.
9. Machine Translation
- The Story of the Stone
- The Dream of the Red Chamber (Cao Xueqin 1792)
- Issues
- Breaking up into sentences
- Zero-anaphora
- Penetrate -> penetrated
- Bamboo tip plantain leaf -> bamboos and plantains
- Curtain -> curtains of her bed
10. Ambiguity
- Find at least 5 meanings of this sentence
- I made her duck
11. Ambiguity
- Find at least 5 meanings of this sentence
- I made her duck
- I cooked waterfowl for her benefit (to eat)
- I cooked waterfowl belonging to her
- I created the (plaster?) duck she owns
- I caused her to quickly lower her head or body
- I waved my magic wand and turned her into undifferentiated waterfowl
- At least one other meaning that's inappropriate for gentle company.
12. Ambiguity is Pervasive
- I caused her to quickly lower her head or body
- Grammar: "duck" can be a noun (waterfowl) or a verb (move body)
- I cooked waterfowl belonging to her.
- Grammar: "her" can be a possessive ("of her") or dative ("for her") pronoun
- I made the (plaster) duck statue she owns
- Meaning: "make" can mean "create" or "cook"
13. Ambiguity is Pervasive
- Grammar: "make" can be
- Transitive (verb has a noun direct object)
- I cooked waterfowl belonging to her
- Ditransitive (verb has 2 noun objects)
- I made her (into) undifferentiated waterfowl
- Action-transitive (verb has a direct object and another verb)
- I caused her to move her body
14. Ambiguity is Pervasive
- Phonetics!
- I mate or duck
- I'm eight or duck
- Eye maid her duck
- Aye mate, her duck
- I maid her duck
- I'm aid her duck
- I mate her duck
- I'm ate her duck
- I'm ate or duck
- I mate or duck
15. Models and Algorithms
- Models: formalisms used to capture the various kinds of linguistic structure.
- State machines (FSAs, transducers, Markov models)
- Formal rule systems (context-free grammars, feature systems)
- Logic (predicate calculus, inference)
- Probabilistic versions of all of these, plus others (Gaussian mixture models, probabilistic relational models, etc.)
- Algorithms: used to manipulate representations to create structure.
- Search (A*, dynamic programming)
- Supervised learning, etc.
16. Language, Thought, Understanding
- A Gedanken experiment: the Turing Test
- The question "can a machine think?" is not operational.
- Operational version
- 2 people and a computer
- Interrogator talks to contestant and computer via teletype
- Task of machine is to convince interrogator it is human
- Task of contestant is to convince interrogator she and not machine is human.
17. Eliza: Weizenbaum (1966)
- User: You are like my father in some ways.
- ELIZA: What resemblance do you see
- User: You are not very aggressive but I think you don't want me to notice that.
- ELIZA: What makes you think I am not aggressive
- User: You don't argue with me
- ELIZA: Why do you think I don't argue with you
- User: You are afraid of me
- ELIZA: Does it please you to believe I am afraid of you
18. History: foundational insights 1940s-1950s
- Automaton
- Turing 1936
- McCulloch-Pitts neuron (1943)
- http://marr.bsee.swin.edu.au/dtl/het704/lecture10/ann/node1.html
- http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/
- Kleene (1951/1956)
- Shannon (1948): link between automata and Markov models
- Chomsky (1956)/Backus (1959)/Naur (1960): CFG
- Probabilistic/Information-theoretic models
- Shannon (1948)
- Bell Labs speech recognition (1952)
19. History: the two camps 1957-1970
- Symbolic
- Zellig Harris 1958 TDAP: first parser?
- Cascade of finite-state transducers
- Chomsky
- AI workshop at Dartmouth (McCarthy, Minsky, Shannon, Rochester)
- Newell and Simon: Logic Theorist, General Problem Solver
- Statistical
- Bledsoe and Browning (1959): Bayesian OCR
- Mosteller and Wallace (1964): Bayesian authorship attribution
- Denes (1959): ASR combining grammar and acoustic probability
20. Four paradigms 1970-1983
- Stochastic
- Hidden Markov Model 1972
- Independent application of Baker (CMU) and Jelinek/Bahl/Mercer lab (IBM) following work of Baum and colleagues at IDA
- Logic-based
- Colmerauer (1970, 1975): Q-systems
- Definite Clause Grammars (Pereira and Warren 1980)
- Kay (1979): functional grammar; Bresnan and Kaplan (1982): unification
- Natural language understanding
- Winograd (1972): SHRDLU
- Schank and Abelson (1977): scripts, story understanding
- Influence of case-role work of Fillmore (1968) via Simmons (1973), Schank.
- Discourse Modeling
- Grosz and colleagues: discourse structure and focus
- Perrault and Allen (1980): BDI model
21. Empiricism and Finite State Redux 1983-1993
- Finite State Models
- Kaplan and Kay (1981): Phonology/Morphology
- Church (1980): Syntax
- Return of Empiricism
- Probabilistic models return to language processing
- Corpora created for language tasks
- Early statistical versions of NLP applications (parsing, tagging, machine translation)
- Training sets and test sets
22. The field comes together 1994-2004
- Statistical models standard
- ACL conference
- 1990: 39 articles, 1 statistical
- 2003: 62 articles, 48 statistical
- Machine learning techniques key
- Information retrieval meets NLP
- Unified field
- IR, NLP, MT, ASR, TTS, Dialog
23. How this course fits in
- This is our new introductory course in natural language, speech, and dialog processing
- Other courses
- http://www.stanford.edu/jurafsky/nlpcourses.html
- This course will cover 1 week each on material from these other courses!
24. Requirements and Grading
- Readings
- Selected chapters from Speech and Language Processing by Jurafsky and Martin, Prentice-Hall 2000
- We are writing the 2nd edition, so you get to be the guinea-pigs!
- A few conference and journal papers
- Best 7 of 8 assignments
- Grading
- Homework 84%
- Participation 16%
25. Overview of the course
- http://www.stanford.edu/class/linguist238
26. Some brief demos
- Machine Translation
- http://translate.google.com/translate_t
- TTS
- http://www.rhetorical.com/cgi-bin/demo.cgi
- QA
- http://www.languagecomputer.com/scripts/question.cgi
27. Regular Expressions and Text Searching
- Emacs, vi, perl, grep, etc.
- // search delimiter
- [] character disjunction
- [a-f] character range disjunction
- [^a] character negation
- ? zero or one instance of previous
- * Kleene star, zero or more instances of prev.
- ^ anchors start of line
- \b anchors word boundary
- | disjunction
- () grouping, precedence
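The operators listed above can be tried out one at a time. The following is a minimal sketch, assuming Python's `re` module (whose syntax for these operators follows the same Perl-style notation); the sample strings and the helper name `show_matches` are invented for illustration and are not from the course materials.

```python
import re

# (description, pattern, sample text). All sample texts are made up.
OPERATOR_EXAMPLES = [
    ("character disjunction", r"[tT]he", "The cat saw the dog"),
    ("character range", r"[a-f]", "big cafe"),
    ("character negation", r"[^a-z ]", "Room 42b!"),
    ("zero or one of previous", r"colou?r", "color and colour"),
    ("Kleene star", r"baa*!", "ba! baaa! b!"),
    ("start-of-line anchor", r"^The", "The end. The start."),
    ("word boundary", r"\bthe\b", "the other theology"),
    ("disjunction", r"cat|dog", "cat, dog, bird"),
    ("grouping", r"gupp(y|ies)", "guppy and guppies"),
]


def show_matches(pattern, text):
    """Return the full text of every non-overlapping match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for description, pattern, text in OPERATOR_EXAMPLES:
    print(f"{description:26} /{pattern}/ -> {show_matches(pattern, text)}")
```

The helper uses `re.finditer` rather than `re.findall` so that patterns containing parentheses still report the whole matched string instead of only the grouped part.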
28. Example
- Find me all instances of the word "the" in a text.
- /the/
- Misses capitalized examples
- /[tT]he/
- Returns "other" or "theology"
- /\b[tT]he\b/
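The three refinement steps of this example can be replayed on a short string. This is a sketch assuming Python's `re` module; the sample sentence is invented so that it contains a capitalized "The" as well as "other" and "theology".

```python
import re

# Made-up sample sentence for illustration.
text = "The other day the theology student said: the end."

# The three patterns from the slide, in the order they are introduced.
patterns = [r"the", r"[tT]he", r"\b[tT]he\b"]

# Map each pattern to the list of substrings it matches.
results = {p: re.findall(p, text) for p in patterns}

for p in patterns:
    print(f"/{p}/ finds {len(results[p])} matches: {results[p]}")
```

On this sentence the first pattern misses the sentence-initial "The" and also fires inside "other" and "theology"; the second recovers "The" but keeps the spurious hits; only the third returns exactly the three standalone occurrences.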
29. Errors
- The process we just went through was based on fixing two kinds of errors
- Matching strings that we should not have matched (there, then, other)
- False positives
- Not matching things that we should have matched (The)
- False negatives
30. Errors cont.
- We'll be telling the same story for many tasks, all quarter. Reducing the error rate for an application often involves two antagonistic efforts
- Increasing accuracy (minimizing false positives)
- Increasing coverage (minimizing false negatives).
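To make the two error types concrete, they can be counted for each version of the "the" pattern against a small hand-labeled list. This is only a sketch: the token list, its labels, and the function name `count_errors` are assumptions made up for illustration, and Python's `re` module stands in for whatever search tool is actually used.

```python
import re

# Hand-labeled toy data: (token, True if the token really is the word "the").
labeled_tokens = [
    ("The", True), ("other", False), ("day", False), ("the", True),
    ("theology", False), ("student", False), ("then", False), ("there", False),
]


def count_errors(pattern, labeled):
    """Return (false_positives, false_negatives) when searching each token."""
    regex = re.compile(pattern)
    false_pos = sum(1 for tok, gold in labeled if regex.search(tok) and not gold)
    false_neg = sum(1 for tok, gold in labeled if not regex.search(tok) and gold)
    return false_pos, false_neg


for pattern in [r"the", r"[tT]he", r"\b[tT]he\b"]:
    fp, fn = count_errors(pattern, labeled_tokens)
    print(f"/{pattern}/: {fp} false positives, {fn} false negatives")
```

On this toy list the first revision removes the false negative and the second removes the false positives, which mirrors the two separate efforts named on the slide.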
31. More complex RE example
- Regular expressions for prices
- /\$[0-9]+/
- Doesn't deal with fractions of dollars
- /\$[0-9]+\.[0-9][0-9]/
- Doesn't allow $199, not word-aligned
- /\b\$[0-9]+(\.[0-9][0-9])?\b/
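The price patterns can be checked in the same way. The sketch below assumes Python's `re` module and an invented sample sentence, and it escapes the dollar sign as `\$` so that it is read literally. One caveat it demonstrates: in Python, a `\b` placed directly before `\$` only matches when the preceding character is a word character, so the final pattern as written finds nothing after a space. The `adjusted` pattern, which swaps the leading `\b` for the lookbehind `(?<!\S)`, is a substitution made here for illustration and is not from the slides.

```python
import re


def find_all(pattern, text):
    """Return the full text of each match (re.findall would return only the group)."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Made-up sample sentence for illustration.
text = "Tickets cost $199 and snacks are $4.50 each."

dollars_only = r"\$[0-9]+"                        # whole dollars, cents ignored
with_cents = r"\$[0-9]+\.[0-9][0-9]"              # cents are mandatory
slide_final = r"\b\$[0-9]+(\.[0-9][0-9])?\b"      # final pattern with a leading \b
adjusted = r"(?<!\S)\$[0-9]+(\.[0-9][0-9])?\b"    # assumption: lookbehind replaces leading \b

for name, pattern in [
    ("dollars_only", dollars_only),
    ("with_cents", with_cents),
    ("slide_final", slide_final),
    ("adjusted", adjusted),
]:
    print(f"{name:13} {find_all(pattern, text)}")
```

The lookbehind `(?<!\S)` requires that the dollar sign be at the start of the string or preceded by whitespace, which is one way to get the "word-aligned" behaviour the slide asks for; other formulations are possible.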
32. RE substitution, memory, ELIZA
- s/.* you are (depressed|sad) .*/I am sorry to hear you are \1/
- s/.* you are (depressed|sad) .*/Why do you think you are \1/
- s/.* all .*/In what way/
- s/.* always .*/Can you think of a specific example/
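These substitution rules can be strung together into a very small ELIZA-style responder. This is a rough sketch, not Weizenbaum's program: it assumes Python's `re.subn` as the equivalent of `s/.../.../`, tries the rules in order and stops at the first that applies, lowercases and space-pads the input so the spaces in the patterns can match at the edges, and falls back to a made-up default reply ("Please go on"). Only one of the two "depressed/sad" responses is used, since both share the same pattern.

```python
import re

# (pattern, replacement) pairs; \1 refers back to the parenthesized group.
RULES = [
    (r".* you are (depressed|sad) .*", r"I am sorry to hear you are \1"),
    (r".* all .*", "In what way"),
    (r".* always .*", "Can you think of a specific example"),
]


def respond(utterance):
    """Apply the first matching substitution rule to the utterance."""
    # Assumption: normalize case, drop end punctuation, and pad with spaces.
    padded = " " + utterance.lower().strip(".!?") + " "
    for pattern, replacement in RULES:
        reply, count = re.subn(pattern, replacement, padded, count=1)
        if count:
            return reply
    return "Please go on"  # hypothetical default, not from the slides


for line in ["I think you are sad today", "They are all alike", "Hello"]:
    print(f"User:  {line}")
    print(f"ELIZA: {respond(line)}")
```

Because each pattern begins and ends with `.*`, a successful match spans the whole input, so the substitution replaces the entire utterance with the response.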