Introduction to Natural Language Processing - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Natural Language Processing

Description:

Introduction to Natural Language Processing. ChengXiang Zhai, Department of Computer Science, University of Illinois, Urbana-Champaign

Slides: 22
Provided by: alex272
Learn more at: http://times.cs.uiuc.edu
Transcript and Presenter's Notes


1
Introduction to Natural Language Processing
  • ChengXiang Zhai
  • Department of Computer Science
  • University of Illinois, Urbana-Champaign

2
Lecture Plan
  • What is NLP?
  • A brief history of NLP
  • The current state of the art
  • NLP and text information systems

3
What is NLP?
??? ???? ??????
Thai
How can a computer make sense out of this string?
  • What are the basic units of meaning (words)?
  • What is the meaning of each word?
  • How are words related with each other?
  • What is the combined meaning of words?
  • What is the meta-meaning? (speech act)
  • Handling a large chunk of text
  • Making sense of everything
Levels of analysis involved: Morphology, Syntax, Semantics, Pragmatics, Discourse, Inference

4
An Example of NLP
A dog is chasing a boy on the playground
  • Lexical analysis (part-of-speech tagging)
  • Syntactic analysis (parsing)

5
If we can do this for all the sentences, then
  • BAD NEWS
  • Unfortunately, we can't.
  • General NLP is AI-complete

6
NLP is Difficult!!
  • Natural language is designed to make human
    communication efficient. As a result,
  • we omit a lot of common sense knowledge, which
    we assume the hearer/reader possesses
  • we keep a lot of ambiguities, which we assume the
    hearer/reader knows how to resolve
  • This makes EVERY step in NLP hard
  • Ambiguity is a killer!
  • Common sense reasoning is a prerequisite

7
Examples of Challenges
  • Word-level ambiguity, e.g.,
  • "design" can be a noun or a verb (ambiguous POS)
  • "root" has multiple meanings (ambiguous sense)
  • Syntactic ambiguity, e.g.,
  • "natural language processing" (modification)
  • "A man saw a boy with a telescope." (PP
    attachment)
  • Anaphora resolution: "John persuaded Bill to buy
    a TV for himself." (himself = John or Bill?)
  • Presupposition: "He has quit smoking." implies
    that he smoked before.

8
Despite all the challenges, research in NLP has
also made a lot of progress
9
High-level History of NLP
  • Early enthusiasm (1950s): Machine Translation
  • Too ambitious
  • Bar-Hillel report (1960) concluded that
    fully-automatic high-quality translation could
    not be accomplished without knowledge
    (Dictionary + Encyclopedia)
  • Less ambitious applications (late 1960s to early
    1970s): Limited success, failed to scale up
  • Speech recognition
  • Dialogue (ELIZA)
  • Inference and domain knowledge (SHRDLU, blocks
    world)
  • Real world evaluation (late 1970s to now)
  • Story understanding (late 1970s to early 1980s)
  • Large scale evaluation of speech recognition,
    text retrieval, information extraction (1980 to
    now)
  • Statistical approaches enjoy more success (first
    in speech recognition and retrieval, later others)
  • Current trend
  • Heavy use of machine learning techniques
  • Boundary between statistical and symbolic
    approaches is disappearing.
  • We need to use all the available knowledge
  • Application-driven NLP research (bioinformatics,
    Web, question answering)

Annotations on the timeline: deep understanding in limited domain; shallow understanding; knowledge representation; robust component techniques; stat. language models; learning-based NLP; applications

10
The State of the Art
A dog is chasing a boy on the playground
  • POS tagging: 97%
    A/Det dog/Noun is/Aux chasing/Verb a/Det boy/Noun on/Prep the/Det playground/Noun
  • Parsing: partial, >90% (?)
    Parse tree nodes: Noun Phrase (x3), Complex Verb, Prep Phrase, Verb Phrase (x2), Sentence
  • Semantics: some aspects
    - Entity/relation extraction
    - Word sense disambiguation
    - Anaphora resolution
  • Speech act analysis: ???
  • Inference: ???

11
Technique Showcase: POS Tagging
Training data (annotated text):
This/Det sentence/N serves/V1 as/P an/Det example/N of/P annotated/V2 text/N
The annotated text is used to train a POS tagger.
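
As a rough illustration of training a tagger from annotated text, here is a minimal sketch of a most-frequent-tag baseline. The slide does not say which tagging method is used, so this baseline, the function names `train` and `tag`, and the default tag "N" for unseen words are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Annotated text in word/tag form, taken from the slide's example sentence.
TRAINING = ("This/Det sentence/N serves/V1 as/P an/Det example/N "
            "of/P annotated/V2 text/N")

def train(tagged_text):
    """Count how often each (lowercased) word carries each tag."""
    counts = defaultdict(Counter)
    for token in tagged_text.split():
        word, tag_label = token.rsplit("/", 1)
        counts[word.lower()][tag_label] += 1
    return counts

def tag(counts, sentence, default="N"):
    """Give each word its most frequent training tag; unseen words get `default` (an assumption)."""
    result = []
    for word in sentence.split():
        seen = counts.get(word.lower())
        result.append((word, seen.most_common(1)[0][0] if seen else default))
    return result

model = train(TRAINING)
print(tag(model, "This example serves as an annotated sentence"))
```

A baseline like this cannot resolve words whose tag depends on context, which is one reason taggers in practice also look at neighboring words and tags.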
12
Technique Showcase: Parsing
Grammar:
  S → NP VP
  NP → Det BNP
  NP → BNP
  NP → NP PP
  BNP → N
  VP → V
  VP → Aux V NP
  VP → VP PP
  PP → P NP
Lexicon:
  V → chasing
  Aux → is
  N → dog
  N → boy
  N → playground
  Det → the
  Det → a
  P → on
[Figure: two candidate parse trees generated for "A dog is chasing a boy on the playground". In one, the PP "on the playground" attaches to the VP; in the other, it attaches to the NP "a boy".]
Generate candidate trees, then choose the tree with the highest probability.
Can also be treated as a classification/decision problem
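
The idea of choosing the highest-probability tree can be sketched with a small memoized parser over the slide's grammar. The rules and lexicon come from the slide; the rule probabilities are not in the transcript, so the numbers below are invented for illustration, and the function name `best_parse` is hypothetical.

```python
from functools import lru_cache

# Rules from the slide; every probability here is an assumed, made-up value.
GRAMMAR = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("Det", "BNP"), 0.5), (("BNP",), 0.3), (("NP", "PP"), 0.2)],
    "BNP": [(("N",), 1.0)],
    "VP":  [(("V",), 0.2), (("Aux", "V", "NP"), 0.5), (("VP", "PP"), 0.3)],
    "PP":  [(("P", "NP"), 1.0)],
}
LEXICON = {
    "V": {"chasing": 1.0}, "Aux": {"is": 1.0},
    "N": {"dog": 0.4, "boy": 0.3, "playground": 0.3},
    "Det": {"the": 0.5, "a": 0.5}, "P": {"on": 1.0},
}

def best_parse(words):
    """Return (probability, tree) of the most probable parse, or (0.0, None)."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def best(sym, i, j):
        # Best derivation of words[i:j] from symbol `sym`.
        if sym in LEXICON:
            if j - i == 1 and words[i] in LEXICON[sym]:
                return LEXICON[sym][words[i]], (sym, words[i])
            return 0.0, None
        top = (0.0, None)
        for rhs, rule_prob in GRAMMAR.get(sym, []):
            for prob, kids in expand(rhs, i, j):
                if rule_prob * prob > top[0]:
                    top = (rule_prob * prob, (sym,) + kids)
        return top

    def expand(rhs, i, j):
        # Yield (prob, subtrees) for each way to split words[i:j] among rhs symbols.
        if len(rhs) == 1:
            prob, tree = best(rhs[0], i, j)
            if tree is not None:
                yield prob, (tree,)
            return
        # Leave at least one word for every remaining symbol.
        for k in range(i + 1, j - len(rhs) + 2):
            prob, tree = best(rhs[0], i, k)
            if tree is None:
                continue
            for rest_prob, rest in expand(rhs[1:], k, j):
                yield prob * rest_prob, (tree,) + rest

    return best("S", 0, len(words))

prob, tree = best_parse("A dog is chasing a boy on the playground".split())
print(prob)
print(tree)
```

With these assumed probabilities, VP → VP PP outweighs NP → NP PP, so the parser prefers attaching "on the playground" to the verb phrase. Different probabilities would flip that choice, which is why they are normally estimated from annotated data.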
13
Semantic Analysis Techniques
  • Only successful for VERY limited domains or for
    SOME aspects of semantics
  • E.g.,
  • Entity extraction (e.g., recognizing a person's
    name): use rules and/or machine learning
  • Word sense disambiguation: addressed as a
    classification problem with supervised learning
  • Sentiment tagging
  • Anaphora resolution

In general, these techniques exploit machine learning and
statistical language models
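
To make "word sense disambiguation as a classification problem" concrete, here is a minimal sketch using a Naive Bayes classifier with add-one smoothing. The slide does not name a classifier, so Naive Bayes is an assumed choice, and the labeled contexts for "root" (the ambiguous word from an earlier slide) are invented toy data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labeled contexts for the ambiguous word "root".
LABELED = [
    ("plant", "the root of the tree absorbs water from the soil"),
    ("plant", "the plant has a deep root in the ground"),
    ("math", "the square root of nine is three"),
    ("math", "compute the root of the equation"),
]

def train(examples):
    """Collect per-sense example counts and per-sense context word counts."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    for sense, context in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(context.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return sense_counts, word_counts, vocab

def classify(model, context):
    """Pick the sense with the highest Naive Bayes log score for the context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    scores = {}
    for sense, n in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(n / total)
        for w in context.split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        scores[sense] = score
    return max(scores, key=scores.get)

wsd_model = train(LABELED)
print(classify(wsd_model, "the root of the plant needs water"))
print(classify(wsd_model, "the square root of four"))
```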
14
What We Can't Do
  • 100% POS tagging
  • "He turned off the highway." vs. "He turned off
    the fan."
  • General complete parsing
  • "A man saw a boy with a telescope."
  • Precise deep semantic analysis
  • Will we ever be able to precisely define the
    meaning of "own" in "John owns a restaurant."?

Robust general NLP tends to be
shallow, while deep understanding doesn't
scale up
15
Major NLP Applications
  • Speech recognition, e.g., automatic telephone call
    routing
  • Text management and analysis
  • Text retrieval/filtering
  • Text classification
  • Text summarization
  • Text mining
  • Query answering
  • Language tutoring
  • Spelling/grammar correction
  • Machine translation
  • Cross-language retrieval
  • Restricted natural language
  • Natural language user interface

16
NLP and Text Information Systems
  • Better NLP => Better Text Information Systems
  • Bad NLP => Bad Text Information Systems?

Robust shallow NLP tends to be more useful than
fragile deep NLP. Errors in NLP can hurt a text
information system
17
How Much NLP is Really Needed?
Tasks, in order of increasing dependency on NLP (those listed first are easier and have more workarounds):
  • Classification/Retrieval
  • Summarization/Extraction/Topic Mining
  • Translation/Dialogue
  • Question Answering

18
Workaround Example I: ELIZA
ELIZA plays the role of a therapist
Person: Men are all alike.
ELIZA: In what way?
Person: They're always bugging us about something or other.
ELIZA: Can you think of a specific example?
Person: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here?
Person: He says I'm depressed much of the time.
ELIZA: I am sorry to hear that you are depressed.
Person: It's true. I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
Person: I need some help, that much seems certain.
ELIZA: What would it mean to you if you got some help?
Person: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.
No NLP, but useful. Perhaps we should call this NLP?
Statistical NLP often has a similar flavor, with SOFT rules LEARNED from data
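
The exchange above can be approximated with a handful of hand-written pattern rules, which shows how little language analysis is involved. The sketch below is not ELIZA's original script: the rules, the `REFLECT` table, and the fallback reply are hypothetical and were written only to cover this one dialogue.

```python
import re

# Word swaps that turn the speaker's point of view around.
REFLECT = {"i": "you", "me": "you", "my": "your", "am": "are"}

# (pattern, response template) pairs tried in order; all are toy rules.
RULES = [
    (re.compile(r"\bi'm (\w+)", re.I), "I am sorry to hear that you are {0}."),
    (re.compile(r"\bi am (\w+)", re.I),
     "Do you think coming here will help you not to be {0}?"),
    (re.compile(r"\bi need ([^,.!?]+)", re.I),
     "What would it mean to you if you got {0}?"),
    (re.compile(r"\bmy (mother|father|family)\b", re.I),
     "Tell me more about your family."),
    (re.compile(r"\bmy (.+)", re.I), "Your {0}?"),
    (re.compile(r"\balways\b", re.I), "Can you think of a specific example?"),
    (re.compile(r"\ball\b", re.I), "In what way?"),
]

def reflect(fragment):
    """Lowercase a captured fragment, drop final punctuation, swap pronouns."""
    words = fragment.lower().rstrip(".!?").split()
    return " ".join(REFLECT.get(w, w) for w in words)

def respond(utterance):
    """Return the response of the first matching rule, or a generic prompt."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # str.format ignores captured groups that the template does not use.
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."

for line in ["Men are all alike.",
             "Well, my boyfriend made me come here.",
             "He says I'm depressed much of the time."]:
    print(respond(line))
```

Rule order matters here: the specific family rule has to come before the generic "my ..." rule, otherwise the last turn of the dialogue would be echoed back instead of redirected.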
19
Workaround Example II: Statistical Translation
  • Learn how to translate Chinese to English from
    many example translations
  • Intuitions
  • If we have seen all possible translations, then
    we simply look them up
  • If we have seen a similar translation, then we
    can adapt it
  • If we haven't seen any example that's similar,
    we try to generalize what we've seen
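
The three intuitions (look up, adapt, generalize) can be sketched as a toy example-based translator. This is not a statistical translation model: the example pairs, the word dictionary `WORDS`, and the one-word-difference notion of "similar" are all assumptions chosen to keep the illustration short, and a real system would learn word correspondences from data.

```python
# Hypothetical toy data: space-segmented Chinese paired with English.
EXAMPLES = {
    "我们 喜欢 猫": "we like cats",
    "他们 喜欢 狗": "they like dogs",
}
# Hypothetical word dictionary; a real system would learn this from examples.
WORDS = {"我们": "we", "他们": "they", "喜欢": "like",
         "猫": "cats", "狗": "dogs", "茶": "tea"}

def translate(source):
    """Return (translation, strategy) using lookup, then adapt, then generalize."""
    # 1. Seen before: simply look it up.
    if source in EXAMPLES:
        return EXAMPLES[source], "lookup"
    src = source.split()
    # 2. A stored example differs in exactly one word: adapt its translation.
    for ex_src, ex_tgt in EXAMPLES.items():
        ex = ex_src.split()
        if len(ex) != len(src):
            continue
        diffs = [i for i, (a, b) in enumerate(zip(src, ex)) if a != b]
        if len(diffs) == 1:
            old, new = ex[diffs[0]], src[diffs[0]]
            if old in WORDS and new in WORDS:
                adapted = [WORDS[new] if t == WORDS[old] else t
                           for t in ex_tgt.split()]
                return " ".join(adapted), "adapt"
    # 3. Nothing similar: generalize word by word, keeping unknown words as-is.
    return " ".join(WORDS.get(w, w) for w in src), "generalize"

print(translate("我们 喜欢 猫"))
print(translate("我们 喜欢 狗"))
print(translate("猫 喜欢 茶"))
```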

20
So, what NLP techniques are most useful for text
information systems?
  • Statistical NLP in general, and
  • statistical language models in particular
  • The need for high robustness and efficiency
    implies the dominant use of
  • simple models (i.e., unigram models)
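
As an illustration of how a unigram language model can serve a text information system, the sketch below ranks documents by the likelihood of a query under each document's unigram model. The add-alpha smoothing, the toy documents, and the function names are assumptions for the example; the slides do not specify a smoothing method.

```python
import math
from collections import Counter

def unigram_model(text):
    """Word counts and total length for one document."""
    words = text.lower().split()
    return Counter(words), len(words)

def log_likelihood(query, doc_model, vocab_size, alpha=1.0):
    """log P(query | document) under a unigram model with add-alpha smoothing."""
    counts, total = doc_model
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab_size))
        for w in query.lower().split()
    )

# Hypothetical two-document collection.
DOCS = {
    "d1": "a dog is chasing a boy on the playground",
    "d2": "statistical language models are simple and robust",
}
vocab = {w for text in DOCS.values() for w in text.split()}
models = {name: unigram_model(text) for name, text in DOCS.items()}

def rank(query):
    """Order document names from most to least likely to generate the query."""
    return sorted(
        DOCS,
        key=lambda d: log_likelihood(query, models[d], len(vocab)),
        reverse=True,
    )

print(rank("language models"))
```

The model ignores word order entirely, which is what makes it cheap and robust enough for large collections.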

21
What You Should Know
  • NLP is the foundation of text information systems
  • Better NLP enables better text management
  • Better NLP is necessary for sophisticated tasks
  • But
  • Bad NLP doesn't mean bad text information systems
  • There are often workarounds for a task
  • Inaccurate NLP can hurt the performance of a task
  • The most effective NLP techniques are often
    statistical with the help of linguistic knowledge
  • The challenge is to bridge the gap between
    imperfect NLP and useful application functions