LING 138/238, SYMBSYS 138: Intro to Computer Speech and Language Processing


Description: Overview and history of the field. Knowledge of language. The role of ambiguity. LING 138/238, Autumn 2004.

Provided by: DanJur6
Learn more at: https://web.stanford.edu

Transcript and Presenter's Notes



1
LING 138/238, SYMBSYS 138: Intro to Computer Speech
and Language Processing
  • Dan Jurafsky

2
Today 9/28 Week 1
  • Overview and history of the field
  • Knowledge of language
  • The role of ambiguity
  • Models and Algorithms
  • Eliza, Turing, and conversational agents
  • History of speech and language processing
  • Administration
  • Overview of course topics
  • 1 week on each course in NLP/Speech/Dialog!
  • Regular expressions
  • Start of finite automata

3
Computer Speech and Language Processing
  • What is it?
  • Getting computers to perform useful tasks
    involving human languages, whether for:
  • Enabling human-machine communication
  • Improving human-human communication
  • Doing stuff with language objects
  • Examples
  • Question Answering
  • Machine Translation
  • Spoken Conversational Agents

4
Kinds of knowledge needed?
  • Consider the following interaction with HAL, the
    computer from 2001: A Space Odyssey
  • Dave: Open the pod bay doors, HAL.
  • HAL: I'm sorry, Dave. I'm afraid I can't do that.

5
Knowledge needed to build HAL?
  • Speech recognition and synthesis
  • Dictionaries (how words are pronounced)
  • Phonetics (how to recognize/produce each sound of
    English)
  • Natural language understanding
  • Knowledge of the English words involved
  • What they mean
  • How they combine (what is a pod bay door?)
  • Knowledge of syntactic structure
  • I'm I do, sorry that afraid Dave I'm can't

6
What's needed?
  • Dialog and pragmatic knowledge
  • "Open the door" is a REQUEST (as opposed to a
    STATEMENT or information-question)
  • It is polite to respond, even if you're planning
    to kill someone.
  • It is polite to pretend to want to be cooperative
    (I'm afraid, I can't)
  • What is "that" in "I can't do that"?
  • Even a system to book airline flights needs much
    of this kind of knowledge

7
Question Answering
  • What does "door" mean?
  • What year was Abraham Lincoln born?
  • How many states were in the United States when
    Lincoln was born?
  • Was there a military draft during the Hoover
    administration?
  • What do US scientists think about whether human
    cloning should be legal?

8
Machine Translation
  • Dai-yu alone on bed top think-of-with-gratitude
    Bao-chai again listen to window outside bamboo
    tip plantain leaf of on-top rain sound sigh drop
    clear cold penetrate curtain not feeling again
    fall down tears come
  • As she lay there alone, Dai-yu's thoughts turned
    to Bao-chai. Then she listened to the insistent
    rustle of the rain on the bamboos and plantains
    outside her window. The coldness penetrated the
    curtains of her bed. Almost without noticing it
    she had begun to cry.

9
Machine Translation
  • The Story of the Stone
  • The Dream of the Red Chamber (Cao Xueqin 1792)
  • Issues
  • Breaking up into sentences
  • Zero-anaphora
  • Penetrate -> penetrated
  • Bamboo tip plantain leaf -> bamboos and
    plantains
  • Curtain -> curtains of her bed

10
Ambiguity
  • Find at least 5 meanings of this sentence
  • I made her duck

11
Ambiguity
  • Find at least 5 meanings of this sentence
  • I made her duck
  • I cooked waterfowl for her benefit (to eat)
  • I cooked waterfowl belonging to her
  • I created the (plaster?) duck she owns
  • I caused her to quickly lower her head or body
  • I waved my magic wand and turned her into
    undifferentiated waterfowl
  • At least one other meaning that's inappropriate
    for gentle company.

12
Ambiguity is Pervasive
  • I caused her to quickly lower her head or body
  • Grammar: "duck" can be a noun (waterfowl) or a
    verb (move body)
  • I cooked waterfowl belonging to her.
  • Grammar: "her" can be a possessive (of her) or
    dative (for her) pronoun
  • I made the (plaster) duck statue she owns
  • Meaning: "make" can mean "create" or "cook"

13
Ambiguity is Pervasive
  • Grammar: "make" can be
  • Transitive (verb has a noun direct object)
  • I cooked waterfowl belonging to her
  • Ditransitive (verb has 2 noun objects)
  • I made her (into) undifferentiated waterfowl
  • Action-transitive (verb has a direct object and
    another verb)
  • I caused her to move her body

14
Ambiguity is Pervasive
  • Phonetics!
  • I mate or duck
  • I'm eight or duck
  • Eye maid her duck
  • Aye mate, her duck
  • I maid her duck
  • I'm aid her duck
  • I mate her duck
  • I'm ate her duck
  • I'm ate or duck

15
Models and Algorithms
  • Models: formalisms used to capture the various
    kinds of linguistic structure.
  • State machines (FSAs, transducers, Markov models)
  • Formal rule systems (context-free grammars,
    feature systems)
  • Logic (predicate calculus, inference)
  • Probabilistic versions of all of these
    (Gaussian mixture models, probabilistic
    relational models, etc.)
  • Algorithms: used to manipulate representations to
    create structure.
  • Search (A*, dynamic programming)
  • Supervised learning, etc.

16
Language, Thought, Understanding
  • A Gedanken experiment: the Turing Test
  • Question: "Can a machine think?" is not
    operational.
  • Operational version:
  • 2 people and a computer
  • Interrogator talks to contestant and computer via
    teletype
  • Task of machine is to convince interrogator it is
    human
  • Task of contestant is to convince interrogator
    she, and not the machine, is human.

17
ELIZA: Weizenbaum (1966)
  • User: You are like my father in some ways.
  • ELIZA: What resemblance do you see
  • User: You are not very aggressive but I think
    you don't want me to notice that.
  • ELIZA: What makes you think I am not aggressive
  • User: You don't argue with me.
  • ELIZA: Why do you think I don't argue with you
  • User: You are afraid of me.
  • ELIZA: Does it please you to believe I am afraid
    of you

18
History: foundational insights, 1940s-1950s
  • Automaton
  • Turing 1936
  • McCulloch-Pitts neuron (1943)
  • http://marr.bsee.swin.edu.au/dtl/het704/lecture10/ann/node1.html
  • http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/
  • Kleene (1951/1956)
  • Shannon (1948): link between automata and Markov
    models
  • Chomsky (1956)/Backus (1959)/Naur (1960): CFG
  • Probabilistic/Information-theoretic models
  • Shannon (1948)
  • Bell Labs speech recognition (1952)

19
History: the two camps, 1957-1970
  • Symbolic
  • Zellig Harris 1958: TDAP, first parser?
  • Cascade of finite-state transducers
  • Chomsky
  • AI workshop at Dartmouth (McCarthy, Minsky,
    Shannon, Rochester)
  • Newell and Simon: Logic Theorist, General Problem
    Solver
  • Statistical
  • Bledsoe and Browning (1959): Bayesian OCR
  • Mosteller and Wallace (1964): Bayesian authorship
    attribution
  • Denes (1959): ASR combining grammar and acoustic
    probability

20
Four paradigms, 1970-1983
  • Stochastic
  • Hidden Markov Model (1972)
  • Independent application of Baker (CMU) and
    Jelinek/Bahl/Mercer lab (IBM) following work of
    Baum and colleagues at IDA
  • Logic-based
  • Colmerauer (1970, 1975): Q-systems
  • Definite Clause Grammars (Pereira and Warren
    1980)
  • Kay (1979): functional grammar; Bresnan and Kaplan
    (1982): unification
  • Natural language understanding
  • Winograd (1972): SHRDLU
  • Schank and Abelson (1977): scripts, story
    understanding
  • Influence of case-role work of Fillmore (1968)
    via Simmons (1973), Schank.
  • Discourse Modeling
  • Grosz and colleagues: discourse structure and
    focus
  • Perrault and Allen (1980): BDI model

21
Empiricism and Finite State Redux, 1983-1993
  • Finite State Models
  • Kaplan and Kay (1981): Phonology/Morphology
  • Church (1980): Syntax
  • Return of Empiricism
  • Probabilistic models return to language
    processing
  • Corpora created for language tasks
  • Early statistical versions of NLP applications
    (parsing, tagging, machine translation)
  • Training sets and test sets

22
The field comes together, 1994-2004
  • Statistical models standard
  • ACL conference
  • 1990: 39 articles, 1 statistical
  • 2003: 62 articles, 48 statistical
  • Machine learning techniques key
  • Information retrieval meets NLP
  • Unified field
  • IR, NLP, MT, ASR, TTS, Dialog

23
How this course fits in
  • This is our new introductory course in natural
    language, speech, and dialog processing
  • Other courses
  • http://www.stanford.edu/jurafsky/nlpcourses.html
  • This course will cover 1 week each on material
    from these other courses!

24
Requirements and Grading
  • Readings
  • Selected chapters from Speech and Language
    Processing by Jurafsky and Martin, Prentice-Hall
    2000
  • We are writing the 2nd edition, so you get to be
    the guinea-pigs!
  • A few conference and journal papers
  • Best 7 of 8 assignments
  • Grading
  • Homework: 84%
  • Participation: 16%

25
Overview of the course
  • http://www.stanford.edu/class/linguist238

26
Some brief demos
  • Machine Translation
  • http://translate.google.com/translate_t
  • TTS
  • http://www.rhetorical.com/cgi-bin/demo.cgi
  • QA
  • http://www.languagecomputer.com/scripts/question.cgi

27
Regular Expressions and Text Searching
  • Emacs, vi, Perl, grep, etc.
  • // search delimiter
  • [] character disjunction
  • [a-f] character range disjunction
  • [^a] character negation
  • ? zero or one instance of previous
  • * Kleene star, zero or more instances of previous
  • ^ anchors start of line
  • \b anchors word boundary
  • | disjunction
  • () grouping, precedence
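The operators above behave the same way in most modern regex engines. As a quick check, here is a small sketch using Python's standard `re` module. The sample strings are invented for illustration, and because a Python pattern is just a string, the Perl-style `/.../` delimiters are dropped. The helper `all_matches` is hypothetical; it returns whole matches so that grouping with `()` does not change what is reported.

```python
import re

def all_matches(pattern, text):
    """Return every non-overlapping whole match of pattern in text (ignores groups)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# (pattern, invented sample text) for each operator on the slide
EXAMPLES = [
    (r"[wW]oodchuck", "Woodchuck and woodchuck"),  # [] character disjunction
    (r"[a-f]", "xyzabc"),                          # [a-f] character range
    (r"[^a-z]", "abc1"),                           # [^...] negation
    (r"colou?r", "color colour"),                  # ? zero or one of previous
    (r"ba*", "b ba baaa"),                         # * Kleene star
    (r"^The", "The end. The"),                     # ^ start of line only
    (r"\bthe\b", "other the theology"),            # \b word boundary
    (r"cat|dog", "hotdog catalog"),                # | disjunction
    (r"gupp(y|ies)", "guppy guppies"),             # () grouping
]

for pattern, text in EXAMPLES:
    print(f"{pattern:14} {all_matches(pattern, text)}")
```

For instance, `ba*` yields `['b', 'ba', 'baaa']`, and `^The` matches only the first "The" because `^` anchors to the start of the line.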

28
Example
  • Find me all instances of the word "the" in a
    text.
  • /the/
  • Misses capitalized examples
  • /[tT]he/
  • Incorrectly returns "other" or "theology"
  • /\b[tT]he\b/
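Here is a minimal Python sketch of the three successive patterns, run over an invented sentence. The helper `tokens_matching` is hypothetical. It reports which whitespace-separated tokens contain a match, which makes the false hits inside "other", "theology", and "there" easy to see.

```python
import re

def tokens_matching(pattern: str, text: str) -> list[str]:
    """Return the whitespace-separated tokens of text that contain a match."""
    return [tok for tok in text.split() if re.search(pattern, tok)]

text = "The man saw the other theology book over there"

for pattern in (r"the", r"[tT]he", r"\b[tT]he\b"):
    print(f"{pattern:14} -> {tokens_matching(pattern, text)}")
# the            -> ['the', 'other', 'theology', 'there']
# [tT]he         -> ['The', 'the', 'other', 'theology', 'there']
# \b[tT]he\b     -> ['The', 'the']
```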

29
Errors
  • The process we just went through was based on fixing
    two kinds of errors
  • Matching strings that we should not have matched
    (there, then, other)
  • False positives
  • Not matching things that we should have matched
    (The)
  • False negatives
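To make the two error types concrete, the sketch below scores each pattern against a tiny hand-labelled token list. The list is invented for illustration, and its gold label is simply whether a token is "the" in any capitalization. `error_counts` is a hypothetical helper that returns the false positives and false negatives in token order.

```python
import re

# Invented, hand-labelled sample: a token is correct iff it is the word "the".
tokens = ["The", "man", "saw", "the", "other", "theology", "book", "over", "there", "then"]
gold = {i for i, tok in enumerate(tokens) if tok.lower() == "the"}

def error_counts(pattern):
    """Return (false_positives, false_negatives) for pattern over tokens."""
    predicted = {i for i, tok in enumerate(tokens) if re.search(pattern, tok)}
    false_pos = [tokens[i] for i in sorted(predicted - gold)]  # matched, but shouldn't be
    false_neg = [tokens[i] for i in sorted(gold - predicted)]  # should match, but didn't
    return false_pos, false_neg

for pattern in (r"the", r"[tT]he", r"\b[tT]he\b"):
    print(f"{pattern:14} {error_counts(pattern)}")
```

On this toy sample, going from /the/ to /[tT]he/ removes the false negative ("The") without touching the false positives, and adding \b then removes the false positives.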

30
Errors cont.
  • We'll be telling the same story for many tasks,
    all quarter. Reducing the error rate for an
    application often involves two antagonistic
    efforts
  • Increasing accuracy (minimizing false positives)
  • Increasing coverage (minimizing false negatives).

31
More complex RE example
  • Regular expressions for prices
  • /\$[0-9]+/
  • Doesn't deal with fractions of dollars
  • /\$[0-9]+\.[0-9][0-9]/
  • Doesn't allow $199, not word-aligned
  • /(^|\W)\$[0-9]+(\.[0-9][0-9])?\b/
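Here is the same progression in Python, as a sketch under two assumptions. First, the dollar sign is written `\$` so that it is read as a literal character rather than the end-of-line anchor. Second, the word-aligned version puts `(?:^|\W)` (a non-capturing form of `(^|\W)`) in front instead of `\b`, because `\b` between a space and `$` never matches: both are non-word characters. The test sentence and the helper `find_prices` are invented for illustration.

```python
import re

DOLLARS = r"\$[0-9]+"                        # misses fractions of dollars
DOLLARS_AND_CENTS = r"\$[0-9]+\.[0-9][0-9]"  # requires cents, so misses $199
# Start of string or a non-word character must precede "$"; group 1 is the price.
PRICE = r"(?:^|\W)(\$[0-9]+(?:\.[0-9][0-9])?)\b"

def find_prices(text):
    """Return the word-aligned prices in text, with optional cents."""
    return [m.group(1) for m in re.finditer(PRICE, text)]

text = "$5 now, then $199 or $19.99, but not a$10."
print(re.findall(DOLLARS, text))            # ['$5', '$199', '$19', '$10']
print(re.findall(DOLLARS_AND_CENTS, text))  # ['$19.99']
print(find_prices(text))                    # ['$5', '$199', '$19.99']
```

The first pattern truncates "$19.99" to "$19". The second finds only the price with cents. The last accepts both forms but rejects "a$10", where the dollar sign is glued to a preceding letter.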

32
RE substitution, memory, ELIZA
  • s/.* you are (depressed|sad) .*/I am sorry to
    hear you are \1/
  • s/.* you are (depressed|sad) .*/Why do you think
    you are \1/
  • s/.* all .*/In what way/
  • s/.* always .*/Can you think of a specific
    example/
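Here is a minimal Python sketch of this substitution cascade. Several details are assumptions rather than things taken from the slide. An earlier rule is assumed to have already rewritten "I'm" as "you are", which would explain why the patterns look for "you are". The first matching rule wins. A hypothetical `turn` counter chooses between the two alternative responses to "depressed|sad". "Please go on" is a made-up fallback.

```python
import re

# Ordered rules: (pattern, alternative responses), following the s/// rules above.
RULES = [
    (r".* you are (depressed|sad) .*",
     [r"I am sorry to hear you are \1", r"Why do you think you are \1"]),
    (r".* all .*", ["In what way"]),
    (r".* always .*", ["Can you think of a specific example"]),
]

def respond(utterance, turn=0):
    """Return an ELIZA-style reply; `turn` selects among alternative responses."""
    # Assumed earlier step in the cascade: first person -> second person.
    text = re.sub(r"\bI'm\b", "you are", utterance)
    for pattern, responses in RULES:
        match = re.match(pattern, text)
        if match:
            # expand() fills in \1 with the captured word, like \1 in s///.
            return match.expand(responses[turn % len(responses)])
    return "Please go on"  # hypothetical fallback, not on the slide

print(respond("Men are all alike."))                          # In what way
print(respond("They're always bugging us about something."))  # Can you think of a specific example
print(respond("He says I'm depressed much of the time."))     # I am sorry to hear you are depressed
```

Because each pattern begins and ends with `.*`, a match covers the whole utterance, so returning the expanded template has the same effect as the slide's line-replacing `s///`.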