Title: LING 406 Intro to Computational Linguistics Course Overview and Introduction
1. LING 406 Intro to Computational Linguistics: Course Overview and Introduction
- Richard Sproat
- URL: http://catarina.ai.uiuc.edu/L406_08/
2. This Lecture
- Overview of course
- What is NLP?
- Some NLP problems
- A fun application
- Text and document encoding
3. Format of the course
- Lectures
- Homeworks
- Roughly one homework a week
- Some of the homeworks will require a lot of work
- You will need to know how to program
- A midterm and in-class final
- The final may be substituted with a project
4. Format of the course
- Grades
- 70% homeworks
- 15% each for the midterm and the final
- There will be one homework that will require a
(brief) in-class presentation
5. Final projects
- If you elect this option (and if you do a good job), you will get extra credit: I expect it to be more work than taking the final.
- But you will probably learn more!
- The project can be on any topic related to the course, e.g.:
- Implement a parsing algorithm
- Design a morphological analyzer covering a non-trivial portion of a language's morphology
- Build a sense-disambiguation system
6. Final projects
- If you elect to do a project, you must:
- Present a proposal to me no later than the end of the 7th week.
- Present a progress report no later than the 11th week (so that I know you are on track).
- Give a brief (15-20 minute) presentation on your work during the last week.
- Turn in, by the end of the 15th week, a short (5-10 page) writeup describing your work.
7. Readings for the course
- Textbooks
- Jurafsky & Martin, Speech and Language Processing (optional)
- Manning & Schütze, Foundations of Statistical Natural Language Processing (optional)
- Roark & Sproat, Computational Approaches to Syntax and Morphology (free online)
- This is a draft: if you find significant errors in the book, we will acknowledge you by name
- Various assigned readings from online sources
8. Prerequisites for this course
- Not many, really
- Helpful if you have taken an intro to linguistics, but not essential
- YOU MUST KNOW HOW TO PROGRAM
- You will need access to a computer (duh).
- Some of the tools will require a Linux or Linux-like programming environment.
- For Windows users I recommend Cygwin (http://www.cygwin.com)
9. Goals of Computational Linguistics / Natural Language Processing
- To get computers to deal with language the way humans do
- They should be able to understand language and respond appropriately in language
- They should be able to learn human language the way children do
- They should be able to perform linguistic tasks that skilled humans can do, such as translation
- Yeah, right
10. A well-worn example
Astronauts Poole (Gary Lockwood) and Bowman (Keir
Dullea) trying to elude the HAL 9000 computer.
11. The HAL 9000
- Perfect speech recognition
- Perfect language understanding
- Perfect synthesis (here's the current reality)
- Perfect modeling of discourse
- (Vision)
- (World knowledge)
- And experts in the 1960s thought this would all be possible
12. Another example
(The Gorn uses the Universal Translator in the Star Trek episode "Metamorphosis")
13. Are these even reasonable goals?
- These are nice goals, but they have more to do with science fiction than with science fact
- Realistically, we don't have to go this far to have stuff that is useful
- Spelling correctors, grammar checkers, MT systems, tools for linguistic analysis, ...
- Limited speech interaction systems
- Early systems like AT&T's VRCP (Voice Recognition Call Processing)
- "Please say collect, third party, or calling card"
- More recent examples: 777-FILM, United Airlines flight info
14. Goals of this course
- Become familiar with a wide range of areas of NLP
- What are the basic tools and techniques?
- What kinds of things is it possible to do?
- Know enough to be able to read research papers in the field and understand what the issues are
- Know enough to be able to critically evaluate some of the claims one sees in the popular press
- Hopefully you'll agree that one or more of these areas is sufficiently interesting that you'll want to pursue it yourself!
15. Basic Structure of Course
- First half: traditional symbolic methods
- Second half: statistical (stochastic) methods, which can to some extent be viewed as traditional methods with probabilities added
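To make "traditional methods with probabilities added" concrete, here is a minimal sketch of a toy grammar in two guises. The rules and their probabilities are invented for illustration. The symbolic view only asks whether every rule in a derivation is licensed by the grammar; the stochastic view (a probabilistic context-free grammar) attaches a probability to each rule and scores a derivation as the product of the probabilities of the rules it uses.

```python
import math

# Toy grammar (hypothetical): each (lhs, rhs) rule carries a probability.
# Probabilities of all rules sharing a left-hand side should sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("Name",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
}

def is_well_formed(grammar):
    """Check that rule probabilities for each left-hand side sum to 1."""
    totals = {}
    for (lhs, _rhs), p in grammar.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return all(math.isclose(t, 1.0) for t in totals.values())

def licensed(derivation, grammar):
    """Symbolic view: is every rule in the derivation part of the grammar?"""
    return all(rule in grammar for rule in derivation)

def probability(derivation, grammar):
    """Stochastic view: product of the probabilities of the rules used."""
    return math.prod(grammar.get(rule, 0.0) for rule in derivation)

# Derivation for a sentence like "Kim saw the dog" (lexical rules omitted).
derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("Name",)),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
]

print(licensed(derivation, PCFG))     # True
print(probability(derivation, PCFG))  # roughly 0.168 (1.0 * 0.4 * 0.7 * 0.6)
```

A rule the grammar does not contain makes the derivation unlicensed in the symbolic view and gives it probability 0 in the stochastic view, so the probabilistic version strictly extends the symbolic one.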
16. Some NLP Tasks
- Create a system that can interact intelligently with a user for a specialized domain
- Build a system that can read your email aloud
- Build a system that can answer questions such as "What is the height of Mount Everest?"
17. Named Entity Recognition
- Build a system that can find the names in a text
- Israeli Leader Suffers Serious Stroke
- By STEVEN ERLANGER
- JERUSALEM, Thursday, Jan. 5 - Israeli Prime Minister Ariel Sharon suffered a serious stroke Wednesday night after being taken to the hospital from his ranch in the Negev desert, and he underwent brain surgery early today to stop cerebral bleeding, a hospital official said.
- Mr. Sharon's powers as prime minister were transferred to Vice Premier Ehud Olmert, said the cabinet secretary, Yisrael Maimon.
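A deliberately naive baseline helps show why this task is harder than it looks. The sketch below treats every run of capitalized words as a candidate name, then strips a hypothetical list of leading title words (`TITLES` is an assumption for illustration, not a real resource). Practical NER systems are usually trained statistically rather than built on a rule like this.

```python
import re

# Hypothetical title words to strip from the front of a candidate.
TITLES = {"Mr", "Mrs", "Ms", "Dr", "Vice", "Premier", "Prime", "Minister", "President"}

# One or more capitalized words (capital letter followed by lowercase letters).
CAP_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")

def candidate_names(text):
    """Return capitalized word runs with any leading title words removed."""
    names = []
    for run in CAP_RUN.findall(text):
        words = run.split()
        while words and words[0] in TITLES:
            words.pop(0)
        if words:  # a run made only of titles (e.g. "Mr") is dropped
            names.append(" ".join(words))
    return names

sentence = ("Mr. Sharon's powers as prime minister were transferred to "
            "Vice Premier Ehud Olmert, said the cabinet secretary, Yisrael Maimon.")
print(candidate_names(sentence))  # ['Sharon', 'Ehud Olmert', 'Yisrael Maimon']
```

On the dateline of the first paragraph the same heuristic returns day names and fragments such as "Thursday" and "Jan", keeps "Israeli Prime Minister Ariel Sharon" as one candidate because "Israeli" is not a title, and misses the all-caps "JERUSALEM" entirely.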
18. Name Transliteration
- Handle cross-language transliteration
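The simplest form of transliteration is a character-by-character lookup table. The sketch below uses a small, simplified Cyrillic-to-Latin table that is an assumption for illustration (it is not ISO 9, GOST, or BGN/PCGN, and covers only the letters the demo needs). It maps Russian spellings of the names from the previous slide back into Latin script.

```python
# Simplified, hypothetical Cyrillic-to-Latin table (demo letters only).
CYR2LAT = {
    "а": "a", "е": "e", "и": "i", "л": "l", "м": "m", "н": "n",
    "о": "o", "р": "r", "т": "t", "ш": "sh", "э": "e", "ь": "",
}

def transliterate(word):
    """Map each character through the table, preserving capitalization."""
    out = []
    for ch in word:
        latin = CYR2LAT.get(ch.lower(), ch)  # unknown characters pass through
        if ch.isupper() and latin:
            latin = latin[0].upper() + latin[1:]  # "Ш" -> "Sh", not "SH"
        out.append(latin)
    return "".join(out)

for name in ["Ариэль", "Шарон", "Ольмерт"]:
    print(name, "->", transliterate(name))
```

A lookup table ignores context and one-to-many correspondences (the same sound can have several conventional spellings), which is where most of the real difficulty lies.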
19. Abbreviation Expansion
- Recover the underlying words in cases such as
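A familiar kind of ambiguity is an abbreviation like "St.", which may stand for "Saint" or "Street". As a minimal sketch, assuming a hypothetical two-entry table and a single contextual rule, the code below picks the title reading when the next word is capitalized and the street reading otherwise.

```python
# Hypothetical table of ambiguous abbreviations: (title reading, street reading).
AMBIGUOUS = {"St.": ("Saint", "Street"), "Dr.": ("Doctor", "Drive")}

def expand(text):
    """Expand ambiguous abbreviations with one crude contextual rule:
    a following capitalized word suggests the title reading ("St. Paul"),
    otherwise the street reading is chosen ("Elm Dr. near")."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok in AMBIGUOUS:
            title, street = AMBIGUOUS[tok]
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            out.append(title if nxt[:1].isupper() else street)
        else:
            out.append(tok)
    return " ".join(out)

print(expand("Dr. Smith lives on Elm Dr. near St. Paul."))
# Doctor Smith lives on Elm Drive near Saint Paul.
```

The rule breaks as soon as a street abbreviation ends a sentence and the next sentence begins with a capital letter, which is one reason realistic text normalization needs richer context than a single neighboring word.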
20. Machine Translation
21. Machine Translation
22. Interpret Text into Scenes
the very huge fried egg is on the table-vp23846.
the very large american party hat is six inches
above the egg. the chinstrap of the hat is
invisible. the table is on the white tile floor.
the french door is behind the table. the tall
white wall is behind the french door. a white
wooden chair is to the right of the table. it is
facing left. it is sunrise. the impi-61
photograph is on the wall. it is three inches
left of the door. it is three feet above the
ground. the photograph is eighteen inches wide. a
white table-vp23846 is one foot to the right of
the chair. the big white teapot is on the table.
23. Interpret Text into Scenes
the humongous silver eye ball is on the shiny
marble ground. the sky is chartreuse.
24. Interpret Text into Scenes
the very humongous light stone sphere is fifty
feet above the sea. the stone castle is in the
sphere. the castle is 80 feet wide the sun is
black.
25. Interpret Text into Scenes
the glass bowling ball is behind the bowling pin.
the ground is silver. a goldfish is inside the
bowling ball.
26. Interpret Text into Scenes
the humongous blue transparent ice cube is on the
silver mountain range. the humongous green
transparent ice cube is next to the blue ice
cube. the humongous red transparent ice cube is
on top of the green ice cube. the humongous
yellow transparent ice cube is to the left of the
green ice cube. the tiny santa claus is inside
the red ice cube. the tiny christmas tree is
inside the blue ice cube. the four tiny reindeer
are inside the green ice cube. the tiny blue
sleigh is inside the yellow ice cube. the small
snowman-vp21048 is three feet in front of the
green ice cube. the sky is pink.
27. Interpret Text into Scenes
the donut shop is on the dirty ground. the donut
of the donut shop is silver. a green a tarmac
road is to the right of the donut shop. the road
is 1000 feet long and 50 feet wide. a yellow
volkswagen bus is eight feet to the right of the
donut shop. it is on the road. a restaurant
waiter is in front of the donut shop. a red
volkswagen beetle is eight feet in front of the
volkswagen bus. the taxi is ten feet behind the
volkswagen bus. the convertible is to the left of
the donut shop. it is facing right. the shoulder
of the road has a dirt texture. the grass of the
road has a dirt texture.
28. Interpret Text into Scenes
The shiny blue goldfish is on the watery ground.
The shiny red colorful-vp3982 is six inches away
from the shiny blue goldfish. The polka dot
colorful-vp3982 is to the right of the shiny blue
goldfish. The polka dot colorful-vp3982 is five
inches away from the shiny blue goldfish. The
transparent orange colorful-vp3982 is above the
shiny blue goldfish.The striped colorful-vp3982
is one foot away from the transparent orange
colorful-vp3982. The huge silver wall is facing
the shiny blue goldfish. The shiny blue goldfish
is facing the silver wall. The silver wall is
five feet away from the shiny blue goldfish.
29. How does the NLP in WordsEye work?
- Statistical part-of-speech tagger
- Simple morphological analyzer
- Statistical parser
- Reference resolution based on a world model
- Semantic hierarchy (similar to WordNet)
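WordsEye itself uses the components listed above; the sketch below is not its method. It replaces the whole pipeline with one regular expression, purely to show the kind of (figure, relation, ground) structure a text-to-scene system has to recover from scene descriptions like those above. The pattern, the preposition list, and the `Relation` type are all hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Relation(NamedTuple):
    figure: str              # the object being placed
    relation: str            # spatial preposition
    ground: str              # the reference object
    distance: Optional[str]  # e.g. "three feet", if given

# Longer prepositions come first so that "inside" wins over "in", etc.
PREPS = ["on top of", "in front of", "to the right of", "to the left of",
         "next to", "behind", "above", "inside", "on", "in"]

PATTERN = re.compile(
    r"^(?:the|a|an)\s+(?P<figure>.+?)\s+(?:is|are)\s+"
    r"(?:(?P<distance>.+?)\s+)?"
    r"(?P<rel>" + "|".join(map(re.escape, PREPS)) + r")\s+"
    r"(?:the|a|an)\s+(?P<ground>.+?)\.?$"
)

def parse_relation(sentence):
    """Return a Relation for 'the X is [distance] PREP the Y.', else None."""
    m = PATTERN.match(sentence.strip().lower())
    if m is None:
        return None
    return Relation(m["figure"], m["rel"], m["ground"], m["distance"])

scene = ("the big white teapot is on the table. "
         "the small snowman is three feet in front of the green ice cube. "
         "the sky is pink.")
for s in scene.split(". "):
    print(parse_relation(s))
```

Sentences that describe properties rather than positions ("the sky is pink") return `None`; a real system would also need reference resolution ("it is facing left") and a world model to turn the relations into a 3D scene.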
30. Representation of Text: ASCII
Characters are 7 bits (128 possible values, 0-127)
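Since 2^7 = 128, every ASCII character fits in 7 bits, and it is conventionally stored one character per byte. A quick check with Python's built-in `ascii` codec:

```python
text = "Hello, world"
data = text.encode("ascii")           # one byte per character
print(list(data[:5]))                 # [72, 101, 108, 108, 111]
print(max(data).bit_length() <= 7)    # True: every value fits in 7 bits

try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    # err.object is the input string; err.start indexes the offending character
    print("not ASCII:", err.object[err.start])   # not ASCII: é
```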
31. ISO-8859-x
Characters are 8 bits (256 possible values); values 0-127 are the same as ASCII
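The catch with the ISO-8859 family is that the upper 128 values mean different things in different parts: ISO-8859-1 (Latin-1) for Western European languages, ISO-8859-5 for Cyrillic, ISO-8859-7 for Greek, and so on. A reader has to know which part was used. This sketch decodes the same byte under three parts using Python's standard codecs:

```python
byte = b"\xe9"  # a single 8-bit value, 233

# The same byte decodes to a different letter in each ISO-8859 part.
for codec in ["iso-8859-1", "iso-8859-5", "iso-8859-7"]:
    print(codec, byte.decode(codec))

# Values below 128 are plain ASCII in every part.
print(b"A".decode("iso-8859-5"))  # A
```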
32. Two-Byte Codes: Chinese Big5
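Big5 stores each Chinese character in two bytes, while ASCII characters keep their single byte. A decoder relies on the lead byte having its high bit set. The trail byte, however, can fall in the ASCII range (0x40-0x7E), a well-known source of bugs in software that scans Big5 text byte by byte. A small comparison using Python's built-in `big5` codec:

```python
text = "中文"  # "Chinese (language)"
big5 = text.encode("big5")
utf8 = text.encode("utf-8")

print(len(text), len(big5), len(utf8))  # 2 4 6
print(big5.decode("big5") == text)      # True: round trip

# This string has no ASCII, so every other byte is a lead byte;
# each lead byte has its high bit set (>= 0x80).
print(all(big5[i] >= 0x80 for i in range(0, len(big5), 2)))  # True
```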
33. Grand Unification: Unicode
- The Unicode code space is divided into planes
- A plane consists of 65,536 (hexadecimal 10000) code points
- There are 17 planes (0-16), with Plane 0 being the Basic Multilingual Plane (BMP)
- Texts are encoded in logical order, which is more abstract than the presentation order: characters are stored in the order they are read, not necessarily the order in which they are drawn
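Because a plane holds 65,536 = 2^16 code points, the plane of a character is simply its code point divided by 0x10000 (equivalently, shifted right by 16 bits). A minimal sketch:

```python
PLANE_SIZE = 0x10000          # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF     # last code point of plane 16

def plane(ch):
    """Return the Unicode plane (0-16) that a character belongs to."""
    return ord(ch) // PLANE_SIZE   # same as ord(ch) >> 16

print((MAX_CODE_POINT + 1) // PLANE_SIZE)        # 17
print(plane("A"), plane("中"), plane("\U0001F600"))  # 0 0 1
```

Latin letters and common CJK characters sit in the BMP (plane 0), while many emoji, such as U+1F600, are in plane 1.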
34. Example: Devanagari Code Points
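Devanagari occupies the block U+0900-U+097F. Python's standard `unicodedata` module can list the code points behind a word. The example word नमस्ते (namaste) is chosen for illustration; note that the visual cluster स्ते is stored as four code points (SA, VIRAMA, TA, VOWEL SIGN E).

```python
import unicodedata

# नमस्ते (namaste), spelled out as escapes so the code points are explicit
word = "\u0928\u092e\u0938\u094d\u0924\u0947"

for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```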
35. Example of Logical Ordering: Tamil /hoo/
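Assuming /hoo/ is written ஹோ (HA followed by VOWEL SIGN OO), it shows the gap between logical and presentation order: the vowel sign has two parts, one drawn to the left of the consonant and one to the right, yet the stored sequence is consonant first, vowel sign second. Unicode canonical decomposition (NFD) splits the vowel sign into those two parts, still in logical order.

```python
import unicodedata

# Assumption: /hoo/ spelled as HA + VOWEL SIGN OO (ஹோ).
syllable = "\u0bb9\u0bcb"

# Logical order: consonant first, then the vowel sign, even though part of
# the vowel sign is drawn to the LEFT of the consonant.
for ch in syllable:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# NFD splits the two-part vowel sign into its left and right pieces.
for ch in unicodedata.normalize("NFD", syllable):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```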
36. UTF-8
- A common encoding of Unicode
- Variable length (1 to 4 bytes per code point), depending upon which code points one is dealing with
- Programming languages have libraries that make dealing with UTF-8 strings easy
- Makes it easy to mix and match text from various sources
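The variable-length scheme is easy to see by encoding a code point by hand. The sketch below follows the standard UTF-8 bit layout (the prefix bits of the first byte give the length, continuation bytes start with `10`) and checks itself against Python's built-in codec; in practice you would just call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point with the UTF-8 bit layout (1-4 bytes)."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not encodable: U+{cp:04X}")  # surrogates / out of range
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 11110xxx followed by three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "aé中\U0001F600":
    mine = utf8_encode(ord(ch))
    assert mine == ch.encode("utf-8")   # agrees with the built-in codec
    print(f"U+{ord(ch):04X} -> {mine.hex(' ')} ({len(mine)} bytes)")
```

ASCII text is unchanged under UTF-8, which is a large part of why mixing text from different sources and scripts is straightforward.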
37. Other Issues: Document Encoding
- Standard Generalized Markup Language (SGML)
- Hypertext Markup Language (HTML)
- Text Encoding Initiative (TEI)
- Extensible Markup Language (XML)
- Again, programming languages have libraries for
dealing with XML documents
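As an example of such a library, Python's standard `xml.etree.ElementTree` can pull marked-up entities out of a document. The tiny document below reuses text from the named entity slide and is loosely modeled on TEI's `<persName>` and `<placeName>` elements; the TEI namespace and header are omitted, so treat it as an illustration rather than valid TEI.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-document; TEI namespace omitted for brevity.
DOC = """<text>
  <p><placeName>JERUSALEM</placeName>, Thursday, Jan. 5 - Israeli Prime
     Minister <persName>Ariel Sharon</persName> suffered a serious stroke</p>
  <p>Mr. <persName>Sharon</persName>'s powers as prime minister were
     transferred to Vice Premier <persName>Ehud Olmert</persName></p>
</text>"""

root = ET.fromstring(DOC)
people = [el.text for el in root.iter("persName")]   # document order
places = [el.text for el in root.iter("placeName")]
print(people)  # ['Ariel Sharon', 'Sharon', 'Ehud Olmert']
print(places)  # ['JERUSALEM']
```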
38. Reading
- J&M chapter 1
- Coyne & Sproat paper about WordsEye (online)