438/538 Computational Linguistics

1
438/538 Computational Linguistics
  • Sandiway Fong
  • Lecture 1 8/22

2
Part 1
  • Administrivia

3
Administrivia
  • Where
  • BIO W 212
  • When
  • TR 12:30-1:45PM
  • No Class
  • Thursday September 14th
  • Thursday September 28th
  • Thursday November 23rd (Thanksgiving)
  • Office Hours
  • catch me after class, or
  • by appointment
  • Location: Douglass 311

4
Administrivia
  • Map
  • Office (Douglass)
  • Classroom (FCS)

5
Administrivia
  • Email
  • sandiway_at_email.arizona.edu
  • Homepage
  • http://dingo.sbs.arizona.edu/sandiway
  • Lecture slides
  • available on homepage after each class
  • in both PowerPoint (.ppt) and Adobe PDF formats
  • animation in powerpoint
  • last year's slides are available
  • (new material for this year will be rotated in)

6
Administrivia
  • Reference Textbook
  • Speech and Language Processing, Jurafsky &
    Martin, Prentice-Hall 2000
  • 21 chapters (900 pages)
  • Concepts, algorithms, heuristics
  • Sound/speech side
  • N. Warner: Speech Tech, LING 578 (this semester)
  • Y. Lin: Statistical NLP, LING 539 (Spring 2007)
  • Intersection with research areas
  • Parsing and Linguistic Theory (Sentence
    Processing)
  • Computational Morphology
  • Machine Translation, WordNet

7
Administrivia
  • Course Objectives
  • Theoretical
  • Introduction to a broad selection of natural
    language processing techniques
  • Survey course
  • Relevance to linguistic theory
  • Practical
  • Acquire some expertise
  • Parsing algorithms
  • Write grammars and machines
  • Build a toy machine translation system

8
Administrivia
  • Laboratory Exercises
  • To run tools and write grammars
  • you need access to computational facilities
  • use your PC (Windows, Linux) or Mac
  • Homework exercises

9
Administrivia
  • Homeworks and Grading
  • 6-8 homeworks
  • no final or mid-terms
  • mix of theoretical and practical exercises
  • there will be mandatory and extra credit
    questions
  • extra credit questions matter
  • make up points lost on other questions in the
    homework
  • may bump you up a grade at the end of the
    semester in borderline situations
  • some simple programming is involved (no
    prerequisite)
  • use of a spreadsheet (Excel) for numerical
    calculation

10
Administrivia
  • Homeworks and Grading
  • Homeworks will be presented/explained in class
  • (good chance to ask questions)
  • Please attempt homeworks early
  • (then you can ask questions before the deadline)
  • Unless otherwise specified, you have one week to
    do the homework
  • (midnight deadline)
  • (email submission to me)
  • e.g. homework comes out on Thursday, it is due in
    my mailbox by next Thursday midnight
  • Look for acknowledgement email from me

11
Administrivia
  • Homework Ethics
  • you may discuss homework with your classmates
  • however, you must do the work and write it up
    independently
  • sources must be acknowledged,
  • e.g. if you borrow program code off the internet
  • discovered cheaters will be sanctioned
  • Late Policy
  • all homework is mandatory
  • you can't get an A skipping a homework
  • some homeworks may depend on earlier homeworks
  • deductions if late
  • if you know you are going to be late, notify me
    ahead of time
  • e.g. upcoming emergencies

12
Administrivia
  • 438 vs. 538
  • 538
  • 438
  • 1 classroom presentation of a selected chapter
  • 438 extra credit homework questions are
    obligatory

13
Administrivia
  • There is a laptop being passed around
  • Fill out spreadsheet entry
  • Name
  • Email
  • Year/Major
  • 438 or 538
  • Relevant background

14
Administrivia
  • Class demographics (8/20 classlist)

15
Part 2
  • Introduction

16
Human Language Technology (HLT)
  • ... is everywhere
  • information is organized and accessed using
    language

17
Human Language Technology (HLT)
  • Beginnings
  • c. 1950 (just after WWII)
  • Electronic computers invented for
  • numerical analysis
  • code breaking
  • Killer Apps
  • Language comprehension tasks and Machine
    Translation (MT)
  • Reference
  • Readings in Machine Translation
  • Eds. Nirenburg, S. et al. MIT Press 2003.
  • (Part 1 Historical Perspective)

18
Human Language Technology (HLT)
  • Cryptanalysis Basis
  • early optimism
  • Translation. Weaver, W.
  • Citing Shannon's work, he asks:
  • "If we have useful methods for solving almost any
    cryptographic problem, may it not be that with
    proper interpretation we already have useful
    methods for translation?"

19
Human Language Technology (HLT)
  • Popular in the early days and has undergone a
    modern revival
  • The Present Status of Automatic Translation of
    Languages (Bar-Hillel, 1960)
  • "I believe this overestimation is a remnant of
    the time, seven or eight years ago, when many
    people thought that the statistical theory of
    communication would solve many, if not all, of
    the problems of communication"
  • Much valuable time spent on gathering statistics
  • perhaps no longer a bottleneck

20
Human Language Technology (HLT)
  • uneasy relationship between linguistics and
    statistical analysis
  • Statistical Methods and Linguistics (Abney, 1996)
  • Chomsky vs. Shannon
  • Statistics and low (zero) frequency items
  • Smoothing
  • No relation between order of approximation and
    grammaticality
  • Parameter estimation problem is intractable (for
    humans)
  • IBM (17 million parameters)
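The zero-frequency problem and the smoothing remedy listed above can be illustrated with a small sketch. It uses add-one (Laplace) smoothing, which is only one simple choice and is not necessarily the method the lecture has in mind. The toy corpus, the extra vocabulary word "report", and the function name are all invented for illustration.

```python
from collections import Counter


def add_one_probability(word, counts, vocabulary_size):
    """Add-one (Laplace) estimate: (count + 1) / (total + vocabulary size)."""
    total = sum(counts.values())
    return (counts[word] + 1) / (total + vocabulary_size)


# Toy corpus, invented for illustration only.
corpus = "the girls read the paper".split()
counts = Counter(corpus)

# Assumed vocabulary: the corpus words plus one unseen word, "report".
vocabulary = set(corpus) | {"report"}

# Relative frequency gives an unseen word probability zero.
unsmoothed_unseen = counts["report"] / len(corpus)

# Add-one smoothing reserves a little probability mass for it.
smoothed_unseen = add_one_probability("report", counts, len(vocabulary))
smoothed_seen = add_one_probability("the", counts, len(vocabulary))

print("unsmoothed P(report):", unsmoothed_unseen)
print("smoothed   P(report):", smoothed_unseen)
print("smoothed   P(the):   ", smoothed_seen)
```

The smoothed estimates still sum to one over the assumed vocabulary, so the unseen word gains probability at the expense of the words that were observed.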

21
Human Language Technology (HLT)
  • recent exciting developments in HLT
  • precipitated by progress in
  • computers: stochastic machine learning methods
  • storage: large amounts of training data
  • recent improvements in stochastic models from
    incorporating linguistic knowledge
  • (Hovy, MT Summit 2003)

22
Human Language Technology (HLT)
  • Killer Application?

23
Natural Language Processing (NLP) & Computational
Linguistics
  • Question
  • How to process natural languages on a computer
  • Intersects with
  • Computer science (CS)
  • Mathematics/Statistics
  • Artificial intelligence (AI)
  • Linguistic Theory
  • Psychology & Psycholinguistics
  • e.g. the human sentence processor

24
Natural Language Properties
  • which properties are going to be difficult for
    computers to deal with?
  • Grammar (Rules for putting words together into
    sentences)
  • How many rules are there?
  • 100, 1000, 10000, more …
  • Portions learnt or innate
  • Do we have all the rules written down somewhere?
  • Lexicon (Dictionary)
  • How many words do we need to know?
  • 1000, 10000, 100000 …

25
Computers vs. Humans
  • Knowledge of language
  • Computers are way faster than humans
  • They kill us at arithmetic and chess
  • But human beings are so good at language, we
    often take our ability for granted
  • Processed without conscious thought
  • Exhibit complex behavior

IBM's Deep Blue
26
Examples
  • Innate Knowledge?
  • Which report did you file without reading?
  • (Parasitic gap sentence)
  • file(x,y)
  • read(u,v)

the report was filed without reading
x = you, y = report, u = x = you, v = y = report, and
there are no other possible interpretations
27
Examples
  • Changes in interpretation
  • John is too stubborn to talk to
  • John is too stubborn to talk to Bill

talk_to(x,y): (1) x = arbitrary person, y = John;
(2) x = John, y = Bill
28
Examples
  • Ambiguity
  • Where can I see the bus stop?
  • stop: verb or part of the noun-noun compound bus
    stop
  • Context (Discourse or situation)
  • Where can I see the [NN bus stop]?
  • Where can I see the bus [V stop]?

29
Examples
  • Ungrammaticality
  • Which book did you file the report without
    reading?
  • ungrammatical
  • relative
  • ungrammatical vs. incomprehensible

30
Example
  • The human parser has quirks
  • Ian told the man that he hired a story
  • Ian told the man that he hired a secretary
  • Garden-pathing
  • Temporary ambiguity
  • tell: multiple syntactic frames for the verb
  • Ian told the man that he hired a story
  • Ian told the man that he hired a secretary

31
Examples
  • More subtle differences
  • The reporter who the senator attacked admitted
    the error
  • The reporter who attacked the senator admitted
    the error
  • Processing time differences
  • Subject vs. object relative clauses
  • Q: Do we want to mimic the human parser
    completely?

32
Frequently Asked Questions from the Linguistic
Society of America (LSA)
  • http://www.lsadc.org/info/ling-faqs.cfm

33
  • LSA (Linguistic Society of America) pamphlet
  • by Ray Jackendoff
  • A Linguist's Perspective on What's Hard for
    Computers to Do …
  • is he right?

34
If computers are so smart, why can't they use
simple English?
  • Consider, for instance, the four letters read:
    they can be pronounced as either reed or red. How
    does the machine know in each case which is the
    correct pronunciation? Suppose it comes across
    the following sentences:
  • (1) The girls will read the paper. (reed)
  • (2) The girls have read the paper. (red)
  • We might program the machine to pronounce read as
    reed if it comes right after will, and red if it
    comes right after have. But then sentences (3)
    through (5) would cause trouble.
  • (3) Will the girls read the paper? (reed)
  • (4) Have any men of good will read the paper?
    (red)
  • (5) Have the executors of the will read the
    paper? (red)
  • How can we program the machine to make this come
    out right?
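The adjacent-word rule described on this slide can be sketched in a few lines. This is a minimal illustration rather than a proposed solution: the function name and the "unknown" fallback are assumptions added here, and the sentences and target pronunciations are the ones listed above.

```python
def pronounce_read(sentence):
    """Naive rule from the slide: look only at the word right before 'read'."""
    words = sentence.lower().replace("?", "").replace(".", "").split()
    index = words.index("read")
    previous = words[index - 1] if index > 0 else ""
    if previous == "will":
        return "reed"
    if previous == "have":
        return "red"
    return "unknown"  # the rule says nothing about any other context


# Sentences (1)-(5) with the pronunciation each one requires.
examples = [
    ("The girls will read the paper.", "reed"),
    ("The girls have read the paper.", "red"),
    ("Will the girls read the paper?", "reed"),
    ("Have any men of good will read the paper?", "red"),
    ("Have the executors of the will read the paper?", "red"),
]

for sentence, expected in examples:
    guess = pronounce_read(sentence)
    status = "ok" if guess == expected else "WRONG"
    print(f"{status:5} rule={guess:8} expected={expected:5} {sentence}")
```

The rule gets (1) and (2) right, has no answer for (3) because the auxiliary is not adjacent to read, and picks the wrong pronunciation for (4) and (5), where will is a noun.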

35
If computers are so smart, why can't they use
simple English?
  • (6) Have the girls who will be on vacation next
    week read the paper yet? (red)
  • (7) Please have the girls read the paper. (reed)
  • (8) Have the girls read the paper? (red)
  • Sentence (6) contains both have and will before
    read, and both of them are auxiliary verbs. But
    will modifies be, and have modifies read. In
    order to match up the verbs with their
    auxiliaries, the machine needs to know that the
    girls who will be on vacation next week is a
    separate phrase inside the sentence.
  • In sentence (7), have is not an auxiliary verb at
    all, but a main verb that means something like
    'cause' or 'bring about'. To get the
    pronunciation right, the machine would have to be
    able to recognize the difference between a
    command like (7) and the very similar question in
    (8), which requires the pronunciation red.
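To see why word-level patterns are not enough for sentences (6) through (8), consider a slightly more flexible heuristic: pick the closest will or have anywhere before read. This heuristic is hypothetical and does not come from the pamphlet; it is sketched only to show that, without phrase structure, it still fails on (6) and (7).

```python
def pronounce_read_nearest(sentence):
    """Hypothetical heuristic: use the closest 'will' or 'have' before 'read'."""
    words = sentence.lower().replace("?", "").replace(".", "").split()
    index = words.index("read")
    # Scan leftwards from 'read' and stop at the first auxiliary-looking word.
    for word in reversed(words[:index]):
        if word == "will":
            return "reed"
        if word == "have":
            return "red"
    return "unknown"


# Sentences (6)-(8) with the pronunciation each one requires.
examples = [
    ("Have the girls who will be on vacation next week read the paper yet?", "red"),
    ("Please have the girls read the paper.", "reed"),
    ("Have the girls read the paper?", "red"),
]

for sentence, expected in examples:
    guess = pronounce_read_nearest(sentence)
    status = "ok" if guess == expected else "WRONG"
    print(f"{status:5} heuristic={guess:5} expected={expected:5} {sentence}")
```

In (6) the heuristic latches onto will inside the relative clause, and in (7) it treats the main verb have as an auxiliary. Only (8) comes out right, which is the point of the passage: the machine needs to know which words group into phrases.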

36
Next time …
  • We will begin by introducing you to a programming
    language you will become familiar with
  • Two introductory lectures
  • Name: PROLOG (Programming in Logic)
  • Variant: SWI-PROLOG (free software from
    University of Amsterdam)
  • Download: http://www.swi-prolog.org/
  • Install it on your PC or Mac
  • Based on mathematical logic
  • logic and inference are useful tools
  • Contains built-in grammar rules
  • programming language was originally designed for
    NLP

37
Prolog Resources
  • Some background in programming?
  • Useful Online Tutorials
  • An introduction to Prolog
  • (Michel Loiseleur & Nicolas Vigier)
  • http://invaders.mars-attacks.org/boklm/prolog/
  • Learn Prolog Now!
  • (Patrick Blackburn, Johan Bos & Kristina
    Striegnitz)
  • http://www.coli.uni-saarland.de/kris/learn-prolog-now/lpnpage.php?pageid=online

38
Prolog Resources
  • No background at all?
  • Audit
  • LING 388 Computers and Language
  • (also taught by me)
  • first couple of weeks
  • introduces Prolog at a more gentle pace
  • uses lab classes for practice
  • Lectures: TR 3:30-4:45pm
  • Harvill 313
  • Hands-on Lab Class this Thursday
  • Social Sciences 224