Title: Natural Language Processing (highlights)
Fall 2012 Chambers
Early NLP
- Dave: Open the pod bay doors, HAL.
- HAL: I'm sorry, Dave. I'm afraid I can't do that.
Commercial NLP
NLP is hard. (news headlines)
- Minister Accused Of Having 8 Wives In Jail
- Juvenile Court to Try Shooting Defendant
- Teacher Strikes Idle Kids
- Miners refuse to work after death
- Local High School Dropouts Cut in Half
- Red Tape Holds Up New Bridges
- Clinton Wins on Budget, but More Lies Ahead
- Hospitals Are Sued by 7 Foot Doctors
- Police Crack Found in Man's Buttocks
NLP needs to adapt.
http://xkcd.com/1083/
NLP is also a Knowledge Problem
Language Models
- Language Modeling
- Build probabilities of words and phrases
- Author Detection
- Who wrote this email? (Is it spam?)
- Historical analysis: who was the author of this book?
- Intelligence community: who wrote this incendiary blog?
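Here is a minimal sketch of "building probabilities of words and phrases". It is a bigram model estimated by maximum likelihood: P(word | previous word) is the count of the bigram divided by the count of the previous word. The toy corpus and the function names are hypothetical. The sketch also does no smoothing, so any bigram it never saw gets probability 0.

```python
from collections import Counter


def pad(sentence):
    """Add sentence-boundary markers so the first and last words get bigrams too."""
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: P(word | prev) = count(prev, word) / count(prev)."""
    bigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        tokens = pad(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            bigram_counts[(prev, word)] += 1
            context_counts[prev] += 1

    def prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0  # context never seen in training
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


def sentence_prob(prob, sentence):
    """Probability of a whole sentence: the product of its bigram probabilities."""
    tokens = pad(sentence)
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= prob(word, prev)
    return p


corpus = ["I saw a lizard", "I saw a dog", "you saw a lizard"]  # hypothetical toy data
prob = train_bigram_lm(corpus)
print(prob("lizard", "a"))                    # 2 of the 3 bigrams starting with "a"
print(sentence_prob(prob, "I saw a lizard"))  # (2/3) * 1 * 1 * (2/3) * 1
```

Real language models add smoothing, such as add-one or interpolation with lower-order n-grams, so that unseen phrases do not zero out a whole sentence.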
Language Models: Author ID
It was the year of Our Lord one thousand seven
hundred and seventy-five. Spiritual revelations
were conceded to England at that favoured period,
as at this. Mrs. Southcott had recently attained
her five-and-twentieth blessed birthday.
- Charles Dickens
Mr. Bennet was among the earliest of those who
waited on Mr. Bingley. He had always intended to
visit him, though to the last always assuring his
wife that he should not go; and till the evening
after the visit was paid she had no knowledge of
it.
- Jane Austen
Baby, baby, baby oooh Like baby, baby, baby
nooo Like baby, baby, baby oooh I thought you'd
always be mine
- Justin Bieber
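One way to use language models for author ID is to train one model per author and pick the author whose model gives the new text the highest probability. The sketch below assumes add-one-smoothed unigram models. The training "data" consists of short hypothetical excerpts of the three passages above, and every function name is illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def train_author_models(training):
    """Build one add-one-smoothed unigram model per author.

    training maps an author name to a list of example texts.
    """
    vocab = {w for texts in training.values() for t in texts for w in tokenize(t)}
    models = {}
    for author, texts in training.items():
        counts = Counter(w for t in texts for w in tokenize(t))
        total = sum(counts.values())
        # the extra +1 on the vocabulary reserves probability mass for unseen words
        denom = total + len(vocab) + 1
        models[author] = (counts, denom)
    return models


def log_prob(model, text):
    """Log-probability of text under one author's smoothed unigram model."""
    counts, denom = model
    return sum(math.log((counts[w] + 1) / denom) for w in tokenize(text))


def identify_author(models, text):
    """Pick the author whose model assigns the text the highest probability."""
    return max(models, key=lambda author: log_prob(models[author], text))


# Hypothetical training data: short excerpts of the passages above
training = {
    "Dickens": ["It was the year of Our Lord one thousand seven hundred and seventy-five."],
    "Austen": ["He had always intended to visit him, though to the last always "
               "assuring his wife that he should not go"],
    "Bieber": ["Baby, baby, baby oooh Like baby, baby, baby nooo"],
}
models = train_author_models(training)
print(identify_author(models, "baby baby oooh"))                       # -> Bieber
print(identify_author(models, "he had always intended to visit him"))  # -> Austen
```

The sketch sums log-probabilities rather than multiplying raw probabilities, which avoids floating-point underflow on longer texts.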
Motivation
- We want to predict something.
- We have some text related to this something.
- something -> target label Y
- text -> text features X
- Given X, what is the most probable Y?
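Applying Bayes' rule to "the most probable Y given X" gives the decision rule below. P(X) is the same for every candidate label, so it drops out of the argmax. This is the standard derivation behind classifiers such as Naive Bayes; it is not a formula specific to these slides.

```latex
\hat{Y} = \arg\max_{Y} P(Y \mid X)
        = \arg\max_{Y} \frac{P(X \mid Y)\, P(Y)}{P(X)}
        = \arg\max_{Y} P(X \mid Y)\, P(Y)
```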
Motivation: Author Detection
X (the text):
Alas the day! take heed of him; he stabbed me in mine own house, and that most beastly in good faith, he cares not what mischief he does. If his weapon be out he will foin like any devil; he will spare neither man, woman, nor child.
Y (candidate authors):
Charles Dickens, William Shakespeare, Herman Melville, Jane Austen, Homer, Leo Tolstoy
N-gram Terminology
- Unigrams: single words
- Bigrams: pairs of words
- Trigrams: three-word phrases
- 4-grams, 5-grams, 6-grams, etc.
I saw a lizard yesterday
Unigrams: I, saw, a, lizard, yesterday, </s>
Bigrams: <s> I, I saw, saw a, a lizard, lizard yesterday, yesterday </s>
Trigrams: <s> <s> I, <s> I saw, I saw a, saw a lizard, a lizard yesterday, lizard yesterday </s>
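The n-grams listed above can be generated mechanically. Pad the sentence with n-1 start markers <s> and a single end marker </s>, then slide a window of length n across it. The function name is hypothetical, and the padding convention is inferred from the slide's own example.

```python
def ngrams(sentence, n):
    """List the n-grams of a sentence, padded with n-1 <s> markers and one </s>."""
    tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
    # Slide a window of width n over the padded tokens
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


for n in (1, 2, 3):
    print(n, ngrams("I saw a lizard yesterday", n))
```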
Sentiment Analysis
It's about finding out what people think...
Online social media sentiment apps
- Several sentiment sites:
- Twitter Sentiment: http://twittersentiment.appspot.com/
- Twendz: http://twendz.waggeneredstrom.com/
- Twitrratr: http://twitrratr.com/
Or was she?
Twitter for Stock Market Prediction
Hey Jon, Derek in Atlanta is having a bacon and
egg, er, sandwich. Is that good for wheat
futures?
Sometimes science is hype
- The Bollen paper has since been strongly questioned by others in the field.
- It overused statistical significance tests, which could have overestimated how well sentiment actually aligned with market movements.
- Nobody has been able to recreate their findings.
Monitor Real-World Events
Learn a Lexicon
- Find some data that is labeled
- Movie reviews have star ratings
- Manually label data yourself
- Use a noisy label, such as angry on tweets
- Learn a model from the labeled data
- Naïve Bayes Classifier
- MaxEnt Model (you have not yet learned)
- Decision Trees
- etc.
Try it now!
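Here is a minimal sketch of the "learn a model from labeled data" step, using the first model listed: a multinomial Naive Bayes classifier with add-one smoothing. The four labeled reviews are hypothetical. The `polarity` helper is an assumption rather than the slide's method. It shows one way to read a lexicon off the trained model: take the log-ratio of a word's smoothed probability under the positive and negative labels.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over the training vocabulary."""

    def fit(self, examples):
        """examples: list of (text, label) pairs."""
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.totals = {y: sum(self.word_counts[y].values()) for y in self.label_counts}
        return self

    def log_likelihood(self, word, label):
        """Smoothed log P(word | label)."""
        count = self.word_counts[label][word]
        return math.log((count + 1) / (self.totals[label] + len(self.vocab)))

    def predict(self, text):
        """Return argmax over labels of log P(y) + sum of log P(word | y); unseen words are skipped."""
        words = [w for w in text.lower().split() if w in self.vocab]
        n_docs = sum(self.label_counts.values())

        def score(label):
            prior = math.log(self.label_counts[label] / n_docs)
            return prior + sum(self.log_likelihood(w, label) for w in words)

        return max(self.label_counts, key=score)

    def polarity(self, word, pos="pos", neg="neg"):
        """Lexicon-style score: > 0 leans positive, < 0 leans negative."""
        return self.log_likelihood(word, pos) - self.log_likelihood(word, neg)


# Hypothetical labeled data, e.g. derived from movie-review star ratings
data = [
    ("a wonderful moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("boring and far too long", "neg"),
    ("a terrible waste of time", "neg"),
]
model = NaiveBayes().fit(data)
print(model.predict("great story"))   # -> pos
print(model.predict("boring waste"))  # -> neg
print(model.polarity("great") > 0, model.polarity("boring") < 0)
```

MaxEnt models or decision trees could replace Naive Bayes in the same fit/predict pattern. Those models are not shown here.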
Track Population Moods
Information Extraction
http://www.youtube.com/watch?v=YLR1byL0U8M
Current Examples
- Fact extraction about people: instant biographies
- Search "tom hanks" on Google
- Never-Ending Language Learning
- http://rtw.ml.cmu.edu/rtw/
Where is the Naval Academy?
- The United States Naval Academy (also known as USNA, Annapolis, or Navy) is a four-year coeducational federal service academy located in Annapolis.
- Start your tour at the Armel-Leftwich Visitor Center of the United States Naval Academy, Annapolis, Md.
- this is a great place to walk around, whether you are a 1st time or frequent visitor to annapolis. the academy's campus is situated along the creek, thus offering beautiful views of the water and horizons.
P(annapolis | sentence) ≈ P(annapolis | features/n-grams/etc.)
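Here is a sketch of what "P(annapolis | features)" can look like in practice. The sketch describes a candidate answer by the n-grams just before it, then turns a weighted sum of those features into a probability with the logistic function, as in a MaxEnt-style model. The feature names and weights are hypothetical and hand-set for illustration; a real system would learn the weights from labeled sentences.

```python
import math

# Hypothetical, hand-set feature weights; a trained model would learn these from labeled data.
WEIGHTS = {"bias": -1.0, "prev1=in": 0.5, "prev2=located_in": 2.0, "prev2=visitor_to": 1.0}


def context_features(tokens, index):
    """Binary n-gram features for the words just before the candidate at tokens[index]."""
    feats = {"bias"}
    if index >= 1:
        feats.add("prev1=" + tokens[index - 1])
    if index >= 2:
        feats.add("prev2=" + tokens[index - 2] + "_" + tokens[index - 1])
    return feats


def answer_prob(tokens, index):
    """Logistic estimate of P(candidate is the answer | features)."""
    score = sum(WEIGHTS.get(f, 0.0) for f in context_features(tokens, index))
    return 1.0 / (1.0 + math.exp(-score))


s1 = "a federal service academy located in annapolis".split()
s2 = "annapolis is a great place to walk around".split()
print(answer_prob(s1, s1.index("annapolis")))  # bias + prev1=in + prev2=located_in fire
print(answer_prob(s2, 0))                       # only the bias feature fires
```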
Extracting structured knowledge
Each article can contain hundreds or thousands of items of knowledge...
The Lawrence Livermore National Laboratory (LLNL) in Livermore, California is a scientific research laboratory founded by the University of California in 1952.
LLNL EQ Lawrence Livermore National Laboratory
LLNL LOC-IN California
Livermore LOC-IN California
LLNL IS-A scientific research laboratory
LLNL FOUNDED-BY University of California
LLNL FOUNDED-IN 1952
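Once extracted, knowledge items like these are commonly stored as (subject, relation, object) triples. The sketch below assumes that the six LLNL facts above read as listed and queries them by pattern matching, where any field left as None acts as a wildcard. The `Triple` type and the `query` function are hypothetical illustrations, not a particular knowledge-base API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# The LLNL facts from the slide, one triple per item of knowledge
FACTS = [
    Triple("LLNL", "EQ", "Lawrence Livermore National Laboratory"),
    Triple("LLNL", "LOC-IN", "California"),
    Triple("Livermore", "LOC-IN", "California"),
    Triple("LLNL", "IS-A", "scientific research laboratory"),
    Triple("LLNL", "FOUNDED-BY", "University of California"),
    Triple("LLNL", "FOUNDED-IN", "1952"),
]


def query(facts, subject=None, relation=None, obj=None):
    """Return facts matching every field that is not None (None is a wildcard)."""
    return [
        t for t in facts
        if (subject is None or t.subject == subject)
        and (relation is None or t.relation == relation)
        and (obj is None or t.obj == obj)
    ]


print(query(FACTS, subject="LLNL", relation="FOUNDED-IN"))
print([t.subject for t in query(FACTS, relation="LOC-IN", obj="California")])
```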
Sentence Parsing
- Fed raises interest rates
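Why is "Fed raises interest rates" hard? Each of "raises", "interest", and "rates" can be read as a noun or as a verb. The CKY algorithm below counts the distinct parse trees by dynamic programming over word spans. The grammar and lexicon are a hypothetical toy fragment, not a real treebank grammar. Under these assumptions the parser finds 3 parses, one for each choice of main verb ("raises", "interest", or "rates").

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child)
RULES = [
    ("S", "NP", "VP"),   # sentence -> subject + predicate
    ("NP", "N", "NP"),   # noun compound, e.g. "interest rates"
    ("VP", "V", "NP"),   # transitive verb + object
]

# Hypothetical lexicon. A word tagged NP can stand alone as a noun phrase,
# and a word tagged VP can stand alone as an intransitive verb phrase.
LEXICON = {
    "Fed": {"N", "NP"},
    "raises": {"N", "NP", "V", "VP"},
    "interest": {"N", "NP", "V", "VP"},
    "rates": {"N", "NP", "V", "VP"},
}


def count_parses(words, lexicon=LEXICON, rules=RULES, start="S"):
    """Count distinct parse trees rooted in `start` with the CKY algorithm."""
    n = len(words)
    # chart[i][j][cat] = number of ways `cat` can span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in lexicon.get(word, ()):
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):        # build longer spans from shorter ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):     # every split point inside the span
                left, right = chart[i][k], chart[k][j]
                for parent, lhs, rhs in rules:
                    ways = left.get(lhs, 0) * right.get(rhs, 0)
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n].get(start, 0)


print(count_parses("Fed raises interest rates".split()))  # -> 3
```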
Example 2
- I saw the man on the hill with a telescope.
Words barely affect structure.
[Figure: two parse trees, one for a sentence with "telescopes" and one with "planets"; labels "Incorrect" and "Correct!!!"]
Machine Translation
- Start 6 minutes in.
- http://www.youtube.com/watch?v=Nu-nlQqFCKg
Machine Translation
- Commercial-grade translation
- translate.google.com
Machine Translation
- How to model translations?
- Words: P(casa | house)
- Spurious words: P(a | null)
- Fertility: Pn(1 | house)
- English word translates to one Spanish word
- Distortion: Pd(5 | 2)
- The 2nd English word maps to the 5th Spanish word
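The slide's three parameter types combine multiplicatively when scoring one candidate word alignment. The sketch below is simplified in the spirit of IBM Model 3: it leaves out spurious (NULL-generated) words and the model's combinatorial factors. All probability tables are hypothetical numbers chosen for illustration. Following the slide, the distortion table is keyed as (Spanish position, English position).

```python
# Hypothetical parameter tables; a real system estimates these from aligned text.
TRANSLATION = {("la", "the"): 0.7, ("casa", "house"): 0.8}  # t(spanish | english)
FERTILITY = {(1, "the"): 0.9, (1, "house"): 0.9}           # n(count | english)
DISTORTION = {(1, 1): 0.6, (2, 2): 0.6}                    # d(spanish_pos | english_pos)


def alignment_score(english, spanish, alignment):
    """Score one alignment as a product of fertility, translation and distortion terms.

    alignment[j - 1] is the 1-based English position that produced spanish[j - 1].
    Spurious (NULL-generated) words are not modelled in this sketch.
    """
    score = 1.0
    for i, e_word in enumerate(english, start=1):
        fertility = sum(1 for a in alignment if a == i)  # Spanish words produced by e_word
        score *= FERTILITY.get((fertility, e_word), 0.0)
    for j, (f_word, i) in enumerate(zip(spanish, alignment), start=1):
        score *= TRANSLATION.get((f_word, english[i - 1]), 0.0)
        score *= DISTORTION.get((j, i), 0.0)
    return score


straight = alignment_score(["the", "house"], ["la", "casa"], [1, 2])
crossed = alignment_score(["the", "house"], ["la", "casa"], [2, 1])
print(straight, crossed)  # the crossed alignment uses unseen pairs, so it scores 0
```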
Distortion
- Encourage translations to follow the diagonal
- e.g., high values for P(4 | 4) and P(5 | 5)
Learning Translations
- Huge corpus of aligned sentences.
- Europarl
- Corpus of European Parliament proceedings
- The EU is mandated to translate into all of its official languages
- 21 languages, (semi-)aligned to each other
- P(casa | house): count all casa/house pairs!
- Pd(5 | 2): count all sentences where the 2nd word went to the 5th word
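The counting idea can be sketched as relative-frequency estimation over word-aligned sentence pairs. One caveat applies: a corpus like Europarl is aligned only at the sentence level, so the word links are not observed directly. Systems such as the IBM models use EM to estimate expected counts instead. This sketch assumes the links are already given, and the four-sentence corpus is hypothetical.

```python
from collections import Counter


def estimate_tables(aligned_corpus):
    """Relative-frequency estimates of t(spanish | english) and d(spanish_pos | english_pos).

    Each item is (english_tokens, spanish_tokens, links), where links holds
    1-based (english_pos, spanish_pos) word-alignment pairs.
    """
    pair_counts = Counter()        # (spanish word, english word) -> count
    english_counts = Counter()     # english word -> number of links it takes part in
    dist_counts = Counter()        # (spanish_pos, english_pos) -> count
    source_pos_counts = Counter()  # english_pos -> number of links from that position
    for english, spanish, links in aligned_corpus:
        for i, j in links:
            e_word, f_word = english[i - 1], spanish[j - 1]
            pair_counts[(f_word, e_word)] += 1
            english_counts[e_word] += 1
            dist_counts[(j, i)] += 1
            source_pos_counts[i] += 1
    t = {(f, e): c / english_counts[e] for (f, e), c in pair_counts.items()}
    d = {(j, i): c / source_pos_counts[i] for (j, i), c in dist_counts.items()}
    return t, d


# Hypothetical hand-aligned toy corpus: (English, Spanish, links)
corpus = [
    ("the house".split(), "la casa".split(), [(1, 1), (2, 2)]),
    ("the green house".split(), "la casa verde".split(), [(1, 1), (2, 3), (3, 2)]),
    ("my house".split(), "mi casa".split(), [(1, 1), (2, 2)]),
    ("the house".split(), "la vivienda".split(), [(1, 1), (2, 2)]),
]
t, d = estimate_tables(corpus)
print(t[("casa", "house")])  # 3 of the 4 links from "house" go to "casa"
print(d[(3, 2)])             # 1 of the 4 links from English position 2 lands at Spanish position 3
```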
Machine Translation Technology
- Hand-held devices for the military
- Speak English -> recognition -> translation -> generate Urdu
- Translate web documents
- Education technology?
- Doesn't yet receive much focus
Text Influence
- Can text style influence people?
- Can a computer learn to adapt language to accomplish a goal?
- Obama 2012 campaign
- Sent emails to people every day asking for donations
- Sent variations of each email and learned which features caused more donations
- http://www.businessweek.com/articles/2012-11-29/the-science-behind-those-obama-campaign-e-mails
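To learn "which features caused more donations", you first have to decide whether one email variant really beat another or only got lucky. A two-proportion z-test is one standard way to do that. This is an assumption made for illustration; the slide does not say which method the campaign used. The recipient and donation counts below are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants have the same success rate."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # success rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Hypothetical numbers: 10,000 recipients per email variant
z, p = two_proportion_z_test(successes_a=200, n_a=10_000, successes_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests that the difference in donation rate is unlikely to be chance alone. Testing many variants at once calls for a correction for multiple comparisons.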
Mobile Devices
- Keystroke prediction has been around for a while now.
- New idea: learn individual user preferences
- New idea: use a user's social media text to train on
- http://www.youtube.com/watch?v=3hQT-o8ch0o
- http://www.youtube.com/watch?v=kA5Horw_SOE
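Here is a sketch of how a user's own text might personalize next-word prediction. It builds a bigram model from the user's messages, builds another from general text, and linearly interpolates the two. The mixing weight, the function names, and both toy corpora are hypothetical. Real keyboards may combine models quite differently.

```python
from collections import Counter, defaultdict


def bigram_counts(texts):
    """Map each word to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts


def next_word_dist(counts, prev):
    """Relative-frequency distribution over words seen after prev (empty if prev unseen)."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


def suggest(prev, general, personal, weight=0.7, k=3):
    """Top-k next words from a linear interpolation of a user's model and a general one.

    weight is a hypothetical mixing weight given to the user's own text.
    """
    p_user = next_word_dist(personal, prev)
    p_gen = next_word_dist(general, prev)
    candidates = set(p_user) | set(p_gen)
    scores = {w: weight * p_user.get(w, 0.0) + (1 - weight) * p_gen.get(w, 0.0)
              for w in candidates}
    # Sort by descending score, breaking ties alphabetically for stable output
    return sorted(candidates, key=lambda w: (-scores[w], w))[:k]


general = bigram_counts(["see you later", "see you soon", "see you later"])  # hypothetical
personal = bigram_counts(["see you at practice", "see you at the game"])     # hypothetical
print(suggest("you", general, personal))            # -> ['at', 'later', 'soon']
print(suggest("you", general, bigram_counts([])))   # -> ['later', 'soon']
```

With no personal history, the interpolation falls back on the general model, so new users still get reasonable suggestions.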