LING 406 Intro to Computational Linguistics: Course Overview and Introduction

Slides: 39
Provided by: richar781


1
LING 406: Intro to Computational Linguistics
Course Overview and Introduction
  • Richard Sproat
  • URL: http://catarina.ai.uiuc.edu/L406_08/

2
This Lecture
  • Overview of course
  • What is NLP?
  • Some NLP problems
  • A fun application
  • Text and document encoding

3
Format of the course
  • Lectures
  • Homeworks
  • Roughly one homework a week
  • Some of the homeworks will require a lot of work
  • You will need to know how to program
  • A midterm and in-class final
  • The final may be substituted with a project

4
Format of the course
  • Grades
  • 70%: homeworks
  • 15% each: midterm and final
  • There will be one homework that will require a
    (brief) in-class presentation

5
Final projects
  • If you elect this option (and if you do a good
    job) you will get extra credit: I expect it to be
    more work than taking the final.
  • But you will probably learn more!
  • The project can be on any topic related to the
    course, e.g.
  • Implement a parsing algorithm
  • Design a morphological analyzer for a non-trivial
    amount of morphology for a language
  • Build a sense-disambiguation system

6
Final projects
  • If you elect to do a project you must:
  • Present a proposal to me no later than the end of
    the 7th week.
  • Present a progress report no later than the 11th
    week (so that I know you are on track).
  • Do a brief (15-20 minute) presentation on your
    work during the last week.
  • Turn in, by the end of the 15th week, a short
    (5-10 page) writeup describing your work.

7
Readings for the course
  • Textbooks
  • Jurafsky & Martin, Speech and Language Processing
    (Optional)
  • Manning & Schütze, Foundations of Statistical
    Natural Language Processing (Optional)
  • Roark & Sproat, Computational Approaches to
    Morphology and Syntax (Online, free)
  • This is a draft: if you find significant errors
    in the book we will acknowledge you by name
  • Various assigned readings from online sources

8
Prerequisites for this course
  • Not many really
  • Helpful if you have taken an intro to
    linguistics, but not essential
  • YOU MUST KNOW HOW TO PROGRAM
  • You will need access to a computer (duh).
  • Some of the tools will require a linux or
    linux-like programming environment.
  • For Windows users I recommend cygwin
    (http://www.cygwin.com)

9
Goals of Computational Linguistics / Natural
Language Processing
  • To get computers to deal with language the way
    humans do
  • They should be able to understand language and
    respond appropriately in language
  • They should be able to learn human language the
    way children do
  • They should be able to perform linguistic tasks
    that skilled humans can do, such as translation
  • Yeah, right

10
A well-worn example
Astronauts Poole (Gary Lockwood) and Bowman (Keir
Dullea) trying to elude the HAL 9000 computer.
11
The HAL 9000
  • Perfect speech recognition
  • Perfect language understanding
  • Perfect synthesis. (Here's the current
    reality)
  • Perfect modeling of discourse
  • (Vision)
  • (World knowledge)
  • And experts in the 1960s thought this would
    all be possible

12
Another example
The Gorn uses the Universal Translator in the
Star Trek episode 'Metamorphosis'
13
Are these even reasonable goals?
  • These are nice goals but they have more to do
    with science fiction than with science fact
  • Realistically we don't have to go this far to
    have stuff that is useful
  • Spelling correctors, grammar checkers, MT
    systems, tools for linguistic analysis,
  • Limited speech interaction systems
  • Early systems like AT&T's VRCP (Voice Recognition
    Call Processing)
  • "Please say collect, third party or calling card"
  • More recent examples: 777-FILM, United Airlines
    flight info

14
Goals of this course
  • Become familiar with a wide range of areas of NLP
  • What are the basic tools and techniques?
  • What kinds of things is it possible to do?
  • Know enough to be able to read research papers in
    the field and understand what the issues are
  • Know enough to be able to evaluate critically
    some of the claims one sees in the popular press
  • Hopefully you'll agree that one or more of these
    areas is sufficiently interesting that you'll
    want to pursue it yourself!

15
Basic Structure of Course
  • First half: traditional symbolic methods
  • Second half: statistical (stochastic) methods,
    which may be viewed, to some extent, as
    traditional methods with probabilities added.

16
Some NLP Tasks
  • Create a system that can interact intelligently
    with a user for a specialized domain
  • Build a system that can read your email aloud.
  • Build a system that can answer questions such as
    "What is the height of Mount Everest?"

17
Named Entity Recognition
  • Build a system that can find the names in a text
  • Israeli Leader Suffers Serious Stroke
  • By STEVEN ERLANGER
  • JERUSALEM, Thursday, Jan. 5 - Israeli Prime
    Minister Ariel Sharon suffered a serious stroke
    Wednesday night after being taken to the hospital
    from his ranch in the Negev desert, and he
    underwent brain surgery early today to stop
    cerebral bleeding, a hospital official said.
  • Mr. Sharon's powers as prime minister were
    transferred to Vice Premier Ehud Olmert, said the
    cabinet secretary, Yisrael Maimon.
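To make the task concrete, here is a deliberately naive sketch, not the approach any real system uses. It assumes that a name is simply a run of capitalized words; the function name `find_capitalized_runs` and the pattern are illustrative choices, not something from the lecture.

```python
import re

# Heuristic assumption: a "name" is a run of one or more capitalized words.
# Real named entity recognizers use far richer evidence than capitalization.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")

def find_capitalized_runs(text):
    """Return maximal runs of capitalized words found in text."""
    return NAME_PATTERN.findall(text)

sentence = ("Mr. Sharon's powers as prime minister were transferred to "
            "Vice Premier Ehud Olmert, said the cabinet secretary, Yisrael Maimon.")
candidates = find_capitalized_runs(sentence)
print(candidates)
```

The heuristic finds "Yisrael Maimon" but also returns the title "Vice Premier" glued to "Ehud Olmert" and treats "Mr" as a candidate on its own, which hints at why the problem needs more than surface patterns.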

18
Name Transliteration
  • Handle cross-language transliteration

19
Abbreviation Expansion
  • Recover the underlying words in cases such as

20
Machine Translation
21
Machine Translation
22
Interpret Text into Scenes
the very huge fried egg is on the table-vp23846.
the very large american party hat is six inches
above the egg. the chinstrap of the hat is
invisible. the table is on the white tile floor.
the french door is behind the table. the tall
white wall is behind the french door. a white
wooden chair is to the right of the table. it is
facing left. it is sunrise. the impi-61
photograph is on the wall. it is three inches
left of the door. it is three feet above the
ground. the photograph is eighteen inches wide. a
white table-vp23846 is one foot to the right of
the chair. the big white teapot is on the table.
23
Interpret Text into Scenes
the humongous silver eye ball is on the shiny
marble ground. the sky is chartreuse.
24
Interpret Text into Scenes
the very humongous light stone sphere is fifty
feet above the sea. the stone castle is in the
sphere. the castle is 80 feet wide. the sun is
black.
25
Interpret Text into Scenes
the glass bowling ball is behind the bowling pin.
the ground is silver. a goldfish is inside the
bowling ball.
26
Interpret Text into Scenes
the humongous blue transparent ice cube is on the
silver mountain range. the humongous green
transparent ice cube is next to the blue ice
cube. the humongous red transparent ice cube is
on top of the green ice cube. the humongous
yellow transparent ice cube is to the left of the
green ice cube. the tiny santa claus is inside
the red ice cube. the tiny christmas tree is
inside the blue ice cube. the four tiny reindeer
are inside the green ice cube. the tiny blue
sleigh is inside the yellow ice cube. the small
snowman-vp21048 is three feet in front of the
green ice cube. the sky is pink.
27
Interpret Text into Scenes
the donut shop is on the dirty ground. the donut
of the donut shop is silver. a green a tarmac
road is to the right of the donut shop. the road
is 1000 feet long and 50 feet wide. a yellow
volkswagen bus is eight feet to the right of the
donut shop. it is on the road. a restaurant
waiter is in front of the donut shop. a red
volkswagen beetle is eight feet in front of the
volkswagen bus. the taxi is ten feet behind the
volkswagen bus. the convertible is to the left of
the donut shop. it is facing right. the shoulder
of the road has a dirt texture. the grass of the
road has a dirt texture.
28
Interpret Text into Scenes
The shiny blue goldfish is on the watery ground.
The shiny red colorful-vp3982 is six inches away
from the shiny blue goldfish. The polka dot
colorful-vp3982 is to the right of the shiny blue
goldfish. The polka dot colorful-vp3982 is five
inches away from the shiny blue goldfish. The
transparent orange colorful-vp3982 is above the
shiny blue goldfish. The striped colorful-vp3982
is one foot away from the transparent orange
colorful-vp3982. The huge silver wall is facing
the shiny blue goldfish. The shiny blue goldfish
is facing the silver wall. The silver wall is
five feet away from the shiny blue goldfish.
29
How does the NLP in WordsEye work?
  • Statistical part-of-speech tagger
  • Simple morphological analyzer
  • Statistical parser
  • Reference resolution based on a world model
  • Semantic hierarchy (similar to WordNet)

30
Representation of Text: ASCII
Characters are 7 bits
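A small sketch of what "7 bits" means in practice, using only built-in Python functions. The sample strings are arbitrary choices for illustration.

```python
# Every ASCII character has a code below 128, so it fits in 7 bits.
text = "NLP"
codes = [ord(ch) for ch in text]
bits = [format(code, "07b") for code in codes]
print(codes)
print(bits)

# Characters outside the ASCII range cannot be encoded as ASCII at all.
try:
    "Schütze".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```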
31
ISO-8859-x
Characters are 8 bits
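The "x" in ISO-8859-x matters: the lower 128 values agree with ASCII, but the upper 128 mean different things in each part. The following sketch decodes one byte under three parts of the standard; the byte value 0xE9 is an arbitrary example.

```python
# One byte value, decoded under three different parts of ISO-8859.
raw = bytes([0xE9])
parts = ("iso-8859-1", "iso-8859-5", "iso-8859-7")  # Latin-1, Cyrillic, Greek
decoded = {name: raw.decode(name) for name in parts}
for name, ch in decoded.items():
    print(name, ch)

# The lower half (0-127) is shared with ASCII in every part.
shared = {bytes([0x41]).decode(name) for name in parts}
print(shared)
```

Without knowing which part was used to write a file, a byte above 127 is ambiguous, which is the problem Unicode was designed to remove.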
32
Two-Byte Codes: Chinese Big5
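As a rough illustration, Python's built-in `big5` codec shows the mixed widths involved: Chinese characters take two bytes each, while ASCII characters still take one. The sample string is an arbitrary choice.

```python
text = "中文 NLP"
encoded = text.encode("big5")

# Per-character byte counts under Big5.
sizes = [(ch, len(ch.encode("big5"))) for ch in text]
print(sizes)
print(len(text), "characters ->", len(encoded), "bytes")

# Decoding with the same codec restores the original string.
assert encoded.decode("big5") == text
```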
33
Grand Unification: Unicode
  • Code points are arranged into planes
  • A plane consists of 65,536 (10000 in hexadecimal) code points
  • There are 17 planes (0-16) with Plane 0 being the
    Basic Multilingual Plane
  • Texts are encoded in logical order, which is
    more abstract than the presentation order

34
Example: Devanagari Code Points
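The original slide showed a chart that is not reproduced here. As a stand-in, this sketch lists a few code points from the Devanagari block using Python's standard `unicodedata` module; the range of five consonants is an arbitrary selection.

```python
import unicodedata

# The Devanagari block occupies U+0900 through U+097F.
# Print the first five consonants with their official character names.
for code_point in range(0x0915, 0x091A):
    ch = chr(code_point)
    print(f"U+{code_point:04X}  {ch}  {unicodedata.name(ch)}")

first_name = unicodedata.name(chr(0x0915))
```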
35
Example of Logical Ordering: Tamil /hoo/
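The slide's image is not available, so the following is a sketch of the same idea. It assumes /hoo/ is written as the letter HA followed by the vowel sign OO. When rendered, part of that vowel sign appears to the left of the consonant, yet the stored (logical) order is consonant first.

```python
import unicodedata

# Tamil /hoo/ as stored: the consonant first, then the dependent vowel sign.
hoo = "\u0BB9\u0BCB"
names = [unicodedata.name(ch) for ch in hoo]
for ch, name in zip(hoo, names):
    print(f"U+{ord(ch):04X}  {name}")

# The vowel sign decomposes into two pieces, which a renderer places on
# either side of the consonant even though both follow it in memory.
pieces = [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", "\u0BCB")]
print(pieces)
```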
36
UTF-8
  • Common encoding of Unicode.
  • Variable length depending upon which code points
    one is dealing with
  • Programming languages have libraries that make
    dealing with UTF-8 strings easy.
  • Makes it easy to mix-and-match text from various
    sources
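A short sketch of the variable-length property, using Python's built-in `utf-8` codec. The four sample characters are arbitrary picks that happen to need one, two, three and four bytes.

```python
samples = ["A", "é", "क", "\U0001D11E"]  # Latin, accented Latin, Devanagari, musical symbol
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

lengths = [len(ch.encode("utf-8")) for ch in samples]

# Text from several scripts can sit in one string and round-trip safely.
mixed = "Schütze, 中文, தமிழ்"
round_trip = mixed.encode("utf-8").decode("utf-8")
print(round_trip == mixed)
```

Plain ASCII text is unchanged by UTF-8, which is one reason it mixes so easily with existing sources.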

37
Other Issues: Document Encoding
  • Standard Generalized Markup Language (SGML)
  • Hypertext Markup Language (HTML)
  • Text Encoding Initiative (TEI)
  • Extensible Markup Language (XML)
  • Again, programming languages have libraries for
    dealing with XML documents
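As one example of such a library, Python ships `xml.etree.ElementTree`. The sketch below parses a made-up fragment; the element and attribute names (`text`, `s`, `name`, `colour`) are invented for illustration and do not follow TEI or any other real schema.

```python
import xml.etree.ElementTree as ET

# A made-up fragment; the element and attribute names are illustrative only.
document = """<text lang="en">
  <s n="1"><name type="person">Ehud Olmert</name> is the vice premier.</s>
  <s n="2">The sky is <colour>chartreuse</colour>.</s>
</text>"""

root = ET.fromstring(document)

# Recover the plain text of each sentence, ignoring the inline markup.
sentences = ["".join(s.itertext()) for s in root.iter("s")]
# Pull out just the marked-up names.
names = [el.text for el in root.iter("name")]

print(sentences)
print(names)
```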

38
Reading
  • J&M chapter 1
  • Coyne & Sproat paper about WordsEye (online)