Natural Language Processing (highlights) - PowerPoint PPT Presentation

Provided by: usn46
Learn more at: http://www.usna.edu
1
Natural Language Processing (highlights)
Fall 2012 Chambers
2
Early NLP
  • Dave: Open the pod bay doors, HAL.
  • HAL: I'm sorry, Dave. I'm afraid I can't do that.

3
Commercial NLP
4
NLP is hard. (news headlines)
  • Minister Accused Of Having 8 Wives In Jail
  • Juvenile Court to Try Shooting Defendant
  • Teacher Strikes Idle Kids
  • Miners refuse to work after death
  • Local High School Dropouts Cut in Half
  • Red Tape Holds Up New Bridges
  • Clinton Wins on Budget, but More Lies Ahead
  • Hospitals Are Sued by 7 Foot Doctors
  • Police Crack Found in Man's Buttocks

5
NLP needs to adapt.
6
NLP needs to adapt.
http://xkcd.com/1083/
7
NLP is also a Knowledge Problem
8
Language Models
  • Language Modeling
  • Build probabilities of words and phrases
  • Author Detection
  • Who wrote this email? (Is it spam?)
  • Historical analysis: who was the author of this
    book?
  • Intelligence community: who wrote this incendiary
    blog?

9
Language Models: Author ID
It was the year of Our Lord one thousand seven
hundred and seventy-five. Spiritual revelations
were conceded to England at that favoured period,
as at this. Mrs. Southcott had recently attained
her five-and-twentieth blessed birthday.
- Charles Dickens
Mr. Bennet was among the earliest of those who
waited on Mr. Bingley. He had always intended to
visit him, though to the last always assuring his
wife that he should not go and till the evening
after the visit was paid she had no knowledge of
it.
- Jane Austen
Baby, baby, baby oooh Like baby, baby, baby
nooo Like baby, baby, baby oooh I thought you'd
always be mine
- Justin Bieber
10
Motivation
  • We want to predict something.
  • We have some text related to this something.
  • something = target label Y
  • text = text features X
  • Given X, what is the most probable Y?
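
In symbols, the question above is usually written as follows. The Bayes-rule expansion is a standard identity, not something the slide itself states; P(X) can be dropped because it is the same for every candidate Y.

```latex
\hat{Y} = \arg\max_{Y} P(Y \mid X)
        = \arg\max_{Y} \frac{P(X \mid Y)\, P(Y)}{P(X)}
        = \arg\max_{Y} P(X \mid Y)\, P(Y)
```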

11
Motivation: Author Detection
  • Alas the day! take heed of him; he stabbed me in
    mine own house, and that most beastly: in good
    faith, he cares not what mischief he does. If his
    weapon be out: he will foin like any devil; he
    will spare neither man, woman, nor child.

X
Charles Dickens, William Shakespeare, Herman
Melville, Jane Austen, Homer, Leo Tolstoy
Y
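
One way to make "given X, pick the most probable Y" concrete is to train a small language model per author and choose the author whose model scores the text highest. The sketch below is only an illustration: it assumes a unigram model with add-one smoothing and equal prior probability for every author, and the function names and the two miniature training samples are invented for this example.

```python
import math
from collections import Counter

def train_unigram_model(text):
    """Count word frequencies in one author's sample text."""
    return Counter(text.lower().split())

def log_prob(text, counts, vocab_size):
    """Log P(text | author) under a unigram model with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in text.lower().split()
    )

def most_probable_author(text, samples):
    """Return the author Y whose model gives the text X the highest score.

    Assumes every author is equally likely a priori, so P(Y) is ignored.
    """
    models = {author: train_unigram_model(s) for author, s in samples.items()}
    vocab = set(text.lower().split())
    for counts in models.values():
        vocab.update(counts)
    return max(models, key=lambda a: log_prob(text, models[a], len(vocab)))

# Hypothetical miniature training samples, invented for illustration only.
samples = {
    "author_a": "thou art a knave and thou shalt answer for thy mischief",
    "author_b": "it is a truth that a single man must want a wife",
}
print(most_probable_author("thy mischief shall answer", samples))
```

With realistic amounts of text per author, the same argmax applies; only the models get larger.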
12
N-gram Terminology
  • Unigrams: single words
  • Bigrams: pairs of words
  • Trigrams: three-word phrases
  • 4-grams, 5-grams, 6-grams, etc.

I saw a lizard yesterday
Unigrams: I, saw, a, lizard, yesterday, </s>
Bigrams: <s> I, I saw, saw a, a lizard, lizard yesterday,
yesterday </s>
Trigrams: <s> <s> I, <s> I saw, I saw a, saw a lizard,
a lizard yesterday, lizard yesterday </s>
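
The n-gram lists for the example sentence can be generated with a few lines of code. This is a minimal sketch that assumes whitespace tokenization and the padding convention used in the example: n-1 start markers <s> and a single end marker </s>. The function name is made up for illustration.

```python
def ngrams(sentence, n):
    """Return the n-grams of a sentence as strings.

    Pads with n-1 start markers and one end marker, so unigrams get
    only the end marker and trigrams get two start markers.
    """
    tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

for n in (1, 2, 3):
    print(n, ngrams("I saw a lizard yesterday", n))
```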
13
Sentiment Analysis
14
It's about finding out what people think...
15
Online social media sentiment apps
  • Several Sentiment Sites
  • Twitter sentiment: http://twittersentiment.appspot.com/
  • Twendz: http://twendz.waggeneredstrom.com/
  • Twitrratr: http://twitrratr.com/

16
Or was she?
17
Twitter for Stock Market Prediction
Hey Jon, Derek in Atlanta is having a bacon and
egg, er, sandwich. Is that good for wheat
futures?
19
Sometimes science is hype
  • The Bollen paper has since been strongly
    questioned by others in the field.
  • It contained some overuse of statistical
    significance tests that could have overestimated
    how well sentiment actually aligned with market
    movements.
  • Nobody has been able to recreate their findings.

20
Monitor Real-World Events
21
Learn a Lexicon
  • Find some data that is labeled
  • Movie reviews have star ratings
  • Manually label data yourself
  • Use a noisy label, such as an #angry hashtag on tweets
  • Learn a model from the labeled data
  • Naïve Bayes Classifier
  • MaxEnt Model (you have not yet learned)
  • Decision Trees
  • etc.

Try it now!
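
As a rough illustration of learning a model from labeled data, here is a minimal Naïve Bayes classifier. It is a sketch under stated assumptions: whitespace tokenization, add-one smoothing, and four invented reviews standing in for real star-rated data. All function and variable names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Collect per-label word counts, per-label document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log P(label) + sum of log P(word | label)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy reviews; real training data would be far larger.
reviews = [
    ("a great and moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a dull and boring film", "negative"),
    ("boring story and awful acting", "negative"),
]
model = train_naive_bayes(reviews)
print(classify("a great story", model))
print(classify("awful and dull", model))
```

The per-label word counts double as a crude lexicon: words counted far more often under one label than the other are the ones that carry sentiment.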
22
Track Population Moods
23
Information Extraction
http://www.youtube.com/watch?v=YLR1byL0U8M
24
Current Examples
  • Fact extraction about people. Instant
    biographies.
  • Search "tom hanks" on Google
  • Never-ending Language Learning
  • http://rtw.ml.cmu.edu/rtw/

25
Where is the Naval Academy?
  • The United States Naval Academy (also known as
    USNA, Annapolis, or Navy) is a four-year
    coeducational federal service academy located in
    Annapolis.
  • Start your tour at the Armel-Leftwich Visitor
    Center of the United States Naval Academy,
    Annapolis, Md.
  • this is a great place to walk around, whether you
    are a 1st time or frequent visitor to annapolis.
    the academy's campus is situated along the creek,
    thus offering beautiful views of the water and
    horizons.

P(annapolis | sentence) = P(annapolis |
features/ngrams/etc.)
26
Extracting structured knowledge
Each article can contain hundreds or thousands of
items of knowledge...
The Lawrence Livermore National Laboratory
(LLNL) in Livermore, California is a scientific
research laboratory founded by the University of
California in 1952.
LLNL EQ Lawrence Livermore National Laboratory
LLNL LOC-IN California
Livermore LOC-IN California
LLNL IS-A scientific research laboratory
LLNL FOUNDED-BY University of California
LLNL FOUNDED-IN 1952
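
To show what "structured" means here, the facts can be stored as (subject, relation, object) triples. The sketch below pulls them out of the example sentence with one hand-written regular expression. That pattern is an assumption made for illustration: it fits this single sentence shape, and real extraction systems are typically learned from data rather than written by hand.

```python
import re

# Hand-written pattern for one sentence shape; purely illustrative.
PATTERN = re.compile(
    r"The (?P<name>[A-Z][\w ]+?) \((?P<abbr>[A-Z]+)\) "
    r"in (?P<city>[A-Z]\w+), (?P<region>[A-Z]\w+) "
    r"is an? (?P<kind>[a-z ]+?) "
    r"founded by (?:the )?(?P<founder>[A-Z][\w ]+?) in (?P<year>\d{4})"
)

def extract_relations(sentence):
    """Return (subject, relation, object) triples, or [] if nothing matches."""
    m = PATTERN.search(sentence)
    if m is None:
        return []
    abbr = m["abbr"]
    return [
        (abbr, "EQ", m["name"]),
        (abbr, "LOC-IN", m["region"]),
        (m["city"], "LOC-IN", m["region"]),
        (abbr, "IS-A", m["kind"]),
        (abbr, "FOUNDED-BY", m["founder"]),
        (abbr, "FOUNDED-IN", m["year"]),
    ]

sentence = (
    "The Lawrence Livermore National Laboratory (LLNL) in Livermore, "
    "California is a scientific research laboratory founded by the "
    "University of California in 1952."
)
for triple in extract_relations(sentence):
    print(triple)
```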
27
Sentence Parsing
28
Sentence Parsing
  • Fed raises interest rates

29
Example 2
  • I saw the man on the hill with a telescope.

30
Words barely affect structure.
(Figure: parses of the sentence ending in "telescopes"
and in "planets", one labeled Incorrect and one labeled
Correct.)
31
Machine Translation
  • Start at 6min in.
  • http://www.youtube.com/watch?v=Nu-nlQqFCKg

32
Machine Translation
  • Commercial-grade translation
  • translate.google.com

33
Machine Translation
  • How to model translations?
  • Words: P( casa | house )
  • Spurious words: P( a | null )
  • Fertility: Pn( 1 | house )
  • English word translates to one Spanish word
  • Distortion: Pd( 5 | 2 )
  • The 2nd English word maps to the 5th Spanish word

34
Distortion
  • Encourage translations to follow the diagonal
  • P( 4 | 4 ), P( 5 | 5 )

35
Learning Translations
  • Huge corpus of aligned sentences.
  • Europarl
  • Corpus of European Parliament proceedings
  • The EU is mandated to translate into all 21
    official languages
  • 21 languages, (semi-) aligned to each other
  • P( casa | house ) (count all casa/house pairs!)
  • Pd( 5 | 2 ) (count all sentences where 2nd word
    went to 5th word)
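
The counting idea above amounts to a relative-frequency estimate. The sketch below assumes the word alignments are already known, which is a simplification: in practice they are not given and are usually estimated iteratively from the sentence pairs. The toy pairs and the function name are invented for illustration.

```python
from collections import Counter, defaultdict

def conditional_probs(pairs):
    """Estimate P(outcome | condition) by relative frequency.

    `pairs` is an iterable of (condition, outcome) tuples.
    """
    counts = defaultdict(Counter)
    for condition, outcome in pairs:
        counts[condition][outcome] += 1
    return {
        condition: {o: c / sum(outcomes.values()) for o, c in outcomes.items()}
        for condition, outcomes in counts.items()
    }

# Hypothetical aligned (English word, Spanish word) pairs.
word_pairs = [("house", "casa"), ("house", "casa"), ("house", "hogar"), ("the", "la")]
word_probs = conditional_probs(word_pairs)
print(word_probs["house"]["casa"])   # estimate of P( casa | house )

# Hypothetical aligned (English position, Spanish position) pairs.
position_pairs = [(2, 5), (2, 2), (2, 2), (2, 2)]
distortion_probs = conditional_probs(position_pairs)
print(distortion_probs[2][5])        # estimate of Pd( 5 | 2 )
```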

36
Machine Translation Technology
  • Hand-held devices for military
  • Speak English -> recognition -> translation ->
    generate Urdu
  • Translate web documents
  • Education technology?
  • Doesn't yet receive much of a focus

37
Text Influence
38
Text Influence
  • Can text style influence people?
  • Can a computer learn to adapt language to
    accomplish a goal?
  • Obama 2012 campaign
  • Sent emails to people every day asking for
    donations
  • Sent variations of email, and learned what
    features caused more donations
  • http://www.businessweek.com/articles/2012-11-29/the-science-behind-those-obama-campaign-e-mails

39
Mobile Devices
40
Mobile Devices
  • Keystroke prediction has been around for a while
    now.
  • New idea: learn individual user preferences
  • New idea: use a user's social media text to train
    on
  • http://www.youtube.com/watch?v=3hQT-o8ch0o
  • http://www.youtube.com/watch?v=kA5Horw_SOE
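
A minimal sketch of keystroke prediction personalized to one user: count which word follows which in that user's own text, then suggest the most frequent followers. The bigram approach, the function names, and the sample text are assumptions made for illustration, not a description of any shipping keyboard.

```python
from collections import Counter, defaultdict

def train_bigram_predictor(text):
    """Map each word to a Counter of the words that follow it in the text."""
    following = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(following, prev_word, k=3):
    """Return up to k most frequent next words after prev_word."""
    counts = following.get(prev_word.lower(), Counter())
    return [word for word, _ in counts.most_common(k)]

# Hypothetical text standing in for one user's social media posts.
user_text = "see you at the gym see you at the game see you at the gym tonight"
predictor = train_bigram_predictor(user_text)
print(predict_next(predictor, "the"))
print(predict_next(predictor, "you"))
```

Training on a different user's text yields different suggestions for the same previous word, which is the point of learning individual preferences.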