Transcript and Presenter's Notes

Title: A COMPARISON OF HAND-CRAFTED SEMANTIC GRAMMARS VERSUS STATISTICAL NATURAL LANGUAGE PARSING IN DOMAIN-SPECIFIC VOICE TRANSCRIPTION


1
A COMPARISON OF HAND-CRAFTED SEMANTIC GRAMMARS
VERSUS STATISTICAL NATURAL LANGUAGE PARSING IN
DOMAIN-SPECIFIC VOICE TRANSCRIPTION
  • Curry Guinn
  • Dave Crist
  • Haley Werth

2
Outline
  • Probabilistic language models
  • N-grams
  • The EPA project
  • Experiments

3
Probabilistic Language Processing: What is it?
  • Assume a note is given to a bank teller, which the teller reads as
    "I have a gub." (cf. Woody Allen)
  • NLP to the rescue.
  • "gub" is not a word
  • "gun", "gum", "Gus", and "gull" are words, but "gun" has a higher
    probability in the context of a bank (see the sketch below)
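
A minimal sketch of that idea (not from the presentation): score each real-word candidate for "gub" by an assumed probability of seeing it after "I have a" in bank-note language, and pick the most likely one. The candidate set comes from the slide; the numbers are invented for illustration.

  # Hand-assumed P(word | context) values in a bank-robbery setting.
  # These numbers are illustrative only; in practice they would be
  # estimated from a corpus.
  context_probs = {
      "gun": 0.020,
      "gum": 0.008,
      "gull": 0.001,
      "Gus": 0.0005,
  }

  def best_candidate(candidates):
      """Return the candidate word with the highest context probability."""
      return max(candidates, key=candidates.get)

  print(best_candidate(context_probs))   # -> gun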

4
Real Word Spelling Errors
  • They are leaving in about fifteen minuets to go
    to her house.
  • The study was conducted mainly be John Black.
  • Hopefully, all with continue smoothly in my
    absence.
  • Can they lave him my messages?
  • I need to notified the bank of.
  • He is trying to fine out.

5
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter

6
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • W

7
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • Wh

8
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • Wha

9
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What

10
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What d

11
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do

12
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?

13
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?
  • Guess the next word

14
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?
  • Guess the next word
  • What

15
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?
  • Guess the next word
  • What do

16
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?
  • Guess the next word
  • What do you

17
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?
  • Guess the next word
  • What do you think

18
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?
  • Guess the next word
  • What do you think the

19
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?
  • Guess the next word
  • What do you think the next

20
Letter-based Language Models
  • Shannon's Game
  • Guess the next letter
  • What do you think the next letter is?
  • Guess the next word
  • What do you think the next word is? (see the sketch below)
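
A small illustrative sketch of the letter-guessing game (not code from the presentation), assuming a toy training string: it counts letter bigrams and guesses the most frequent letter that follows whatever letter was typed last.

  from collections import Counter, defaultdict

  training_text = "what do you think the next letter is"

  # Count how often each character follows each other character.
  following = defaultdict(Counter)
  for prev, nxt in zip(training_text, training_text[1:]):
      following[prev][nxt] += 1

  def guess_next_letter(prefix: str) -> str:
      """Guess the most frequent character seen after the prefix's last character."""
      counts = following.get(prefix[-1])
      return counts.most_common(1)[0][0] if counts else "?"

  print(guess_next_letter("Wha"))   # guesses 't' on this toy text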

21
Word-based Language Models
  • A model that enables one to compute the
    probability, or likelihood, of a sentence S,
    P(S).
  • Simple: every word follows every other word with equal probability
    (0-gram)
  • Assume V is the size of the vocabulary
  • Likelihood of a sentence S of length n is 1/V × 1/V × … × 1/V = (1/V)^n
  • If English has 100,000 words, the probability of each next word is
    1/100,000 = 0.00001 (see the sketch below)
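
A tiny sketch of the 0-gram model above (every next word equally likely), reproducing the slide's 1/100,000 arithmetic; the example sentence is arbitrary.

  V = 100_000                       # assumed vocabulary size from the slide
  sentence = "they are leaving in about fifteen minutes".split()

  # Each of the n words contributes a factor of 1/V, so P(S) = (1/V) ** n.
  prob = (1 / V) ** len(sentence)
  print(f"P(S) = {prob:.1e}")       # (1e-05) ** 7 = 1e-35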

22
Word Prediction: Simple vs. Smart
  • Smarter: the probability of each next word is related to its word
    frequency (unigram)
  • Likelihood of sentence S: P(S) = P(w1) × P(w2) × … × P(wn)
  • Assumes the probability of each word is independent of the
    probabilities of the other words.
  • Even smarter: look at the probability of each word given the
    previous words (N-gram)
  • Likelihood of sentence S: P(S) = P(w1) × P(w2|w1) × … × P(wn|wn-1)
  • Assumes the probability of each word depends on the preceding
    words (see the sketch below).
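
A sketch of the unigram and bigram likelihoods above, using maximum-likelihood counts from a toy corpus; the corpus and test sentence are invented for illustration, not taken from the presentation.

  from collections import Counter

  corpus = "i am in the kitchen i am cooking dinner in the kitchen".split()
  unigram = Counter(corpus)
  bigram = Counter(zip(corpus, corpus[1:]))
  N = len(corpus)

  def p_uni(w):                       # P(w) = count(w) / N
      return unigram[w] / N

  def p_bi(w, prev):                  # P(w | prev) = count(prev w) / count(prev)
      return bigram[(prev, w)] / unigram[prev] if unigram[prev] else 0.0

  sentence = "i am in the kitchen".split()

  # Unigram model: P(S) = P(w1) * P(w2) * ... * P(wn)
  p_unigram_model = 1.0
  for w in sentence:
      p_unigram_model *= p_uni(w)

  # Bigram model: P(S) = P(w1) * P(w2|w1) * ... * P(wn|wn-1)
  p_bigram_model = p_uni(sentence[0])
  for prev, w in zip(sentence, sentence[1:]):
      p_bigram_model *= p_bi(w, prev)

  print(f"unigram P(S) = {p_unigram_model:.6f}")
  print(f"bigram  P(S) = {p_bigram_model:.6f}")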

23
Training and Testing
  • Probabilities come from a training corpus, which
    is used to design the model.
  • Overly narrow corpus: probabilities don't generalize
  • Overly general corpus: probabilities don't reflect the task or domain
  • A separate test corpus is used to evaluate the model, typically
    using standard metrics
  • Held-out test set (see the sketch below)
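
A brief sketch of the held-out evaluation idea above: train a unigram model on one portion of a toy corpus, hold out the rest, and report perplexity as one standard metric. The corpus, the 80/20 split, and the add-one smoothing are illustrative choices, not details from the presentation.

  import math
  from collections import Counter

  corpus = ("the cat sat on the mat the dog sat on the rug "
            "the cat ate the food").split()

  split = int(0.8 * len(corpus))
  train, test = corpus[:split], corpus[split:]    # held-out test set

  counts = Counter(train)
  V, N = len(counts), len(train)

  def p_unigram(w):
      # Add-one smoothing so unseen test words don't get zero probability.
      return (counts[w] + 1) / (N + V)

  # Perplexity of the held-out set: lower means the model fits it better.
  log_prob = sum(math.log(p_unigram(w)) for w in test)
  print(f"held-out perplexity = {math.exp(-log_prob / len(test)):.2f}")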

24
Simple N-Grams
  • An N-gram model uses the previous N-1 words to predict the next one:
  • P(wn | wn-N+1, wn-N+2, …, wn-1)
  • unigrams: P(dog)
  • bigrams: P(dog | big)
  • trigrams: P(dog | the big)
  • quadrigrams: P(dog | chasing the big) (see the sketch below)
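
A minimal sketch of estimating these conditional probabilities by counting, P(wn | history) = count(history wn) / count(history); the toy corpus is invented for illustration.

  from collections import Counter

  corpus = "the big dog chased the big cat past the big dog".split()

  def ngram_counts(tokens, n):
      """Count every n-word sequence in the token list."""
      return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

  def p_ngram(word, history, tokens):
      """Maximum-likelihood estimate of P(word | history)."""
      if not history:                             # unigram: relative frequency
          return tokens.count(word) / len(tokens)
      num = ngram_counts(tokens, len(history) + 1)[tuple(history) + (word,)]
      den = ngram_counts(tokens, len(history))[tuple(history)]
      return num / den if den else 0.0

  print(p_ngram("dog", [], corpus))               # unigram  P(dog)
  print(p_ngram("dog", ["big"], corpus))          # bigram   P(dog | big)
  print(p_ngram("dog", ["the", "big"], corpus))   # trigram  P(dog | the big)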

25
The EPA task
  • Detailed diary of a single individual's daily activity and location
  • Methods of collecting the data
  • External Observer
  • Camera
  • Self-reporting
  • Paper diary
  • Handheld menu-driven diary
  • Spoken diary

26
Spoken Diary
  • From an utterance like "I am in the kitchen cooking spaghetti",
    map that utterance into
  • Activity(cooking)
  • Location(kitchen)
  • Text abstraction
  • Technique
  • Build a grammar
  • Example

27
Sample Semantic Grammar
  • ACTIVITY_LOCATION -> ACTIVITY' LOCATION' CHAD(ACTIVITY',LOCATION') .
  • ACTIVITY_LOCATION -> LOCATION' ACTIVITY' CHAD(ACTIVITY',LOCATION') .
  • ACTIVITY_LOCATION -> ACTIVITY' CHAD(ACTIVITY', null) .
  • ACTIVITY_LOCATION -> LOCATION' CHAD(null,LOCATION') .
  • LOCATION -> IAM LOCx' LOCx' .
  • LOCATION -> LOCx' LOCx' .
  • IAM -> IAM1 .
  • IAM -> IAM1 just .
  • IAM -> IAM1 going to .
  • IAM -> IAM1 getting ready to .
  • IAM -> IAM1 still .
  • LOC2 -> HOUSE_LOC' HOUSE_LOC' .
  • LOC2 -> OUTSIDE_LOC' OUTSIDE_LOC' .
  • LOC2 -> WORK_LOC' WORK_LOC' .
  • LOC2 -> OTHER_LOC' OTHER_LOC' .
  • HOUSE_LOC -> kitchen kitchen_code .
  • HOUSE_LOC -> bedroom bedroom_code .
  • HOUSE_LOC -> living room living_room_code .
  • HOUSE_LOC -> house house_code .
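
A loose sketch (not the authors' implementation) of how such a hand-crafted semantic grammar gets applied: location and activity phrases map to semantic codes, and an utterance is reduced to a CHAD(activity, location) pair. Only the HOUSE_LOC phrases and codes come from the rules above; the activity table and its codes are illustrative stand-ins.

  HOUSE_LOC = {                      # phrase -> code rules from the slide
      "kitchen": "kitchen_code",
      "bedroom": "bedroom_code",
      "living room": "living_room_code",
      "house": "house_code",
  }
  ACTIVITY = {                       # assumed activity rules, same pattern
      "cooking": "cooking_code",
      "sleeping": "sleeping_code",
  }

  def parse_chad(utterance):
      """Return the (activity, location) codes found in the utterance, or None."""
      text = utterance.lower()
      activity = next((code for phrase, code in ACTIVITY.items() if phrase in text), None)
      location = next((code for phrase, code in HOUSE_LOC.items() if phrase in text), None)
      return activity, location

  print(parse_chad("I am in the kitchen cooking spaghetti"))
  # -> ('cooking_code', 'kitchen_code')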

28
Statistical Natural Language Parsing
  • Use unigram, bigram, and trigram probabilities
  • Use Bayes' rule to obtain these probabilities:
    P(A|B) = P(B|A) P(A) / P(B)
  • The formula P(kitchen | 30121 Kitchen) is computed by determining
    the percentage of times the word kitchen appears in diary entries
    that have been transcribed in the category 30121 Kitchen.
  • P(30121 Kitchen) is the probability that a diary entry is of the
    semantic category 30121 Kitchen.
  • P(kitchen) is the probability that kitchen appears in any diary
    entry.
  • Bayes' rule can be extended to take into account each word in the
    input string (see the sketch below).
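
A small sketch of that word-by-word extension (essentially a naive-Bayes classifier): score each category c by P(c) times the product of P(w | c) over the words w in the entry, with probabilities estimated from labeled training entries. The tiny training set and the Bedroom category are invented; only the 30121 Kitchen label comes from the slide.

  import math
  from collections import Counter, defaultdict

  # (category, transcribed diary entry) training pairs (invented examples).
  training = [
      ("30121 Kitchen", "i am in the kitchen cooking dinner"),
      ("30121 Kitchen", "still in the kitchen washing dishes"),
      ("Bedroom",       "i am in the bedroom getting ready to sleep"),
  ]

  word_counts = defaultdict(Counter)
  cat_counts = Counter()
  for cat, entry in training:
      cat_counts[cat] += 1
      word_counts[cat].update(entry.split())

  vocab = {w for c in word_counts for w in word_counts[c]}

  def log_posterior(entry, cat):
      """log P(cat) + sum over words of log P(word | cat), add-one smoothed."""
      score = math.log(cat_counts[cat] / sum(cat_counts.values()))
      total = sum(word_counts[cat].values())
      for w in entry.split():
          score += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
      return score

  entry = "i am cooking in the kitchen"
  print(max(cat_counts, key=lambda c: log_posterior(entry, c)))   # -> 30121 Kitchen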

29
The Experiment
  • Digital Voice Recorder and Heart Rate Monitor
  • Heart rate monitor will beep if the rate changes
    by more than 15 beats per minute between
    measurements (every 2 minutes)

30
Subjects
31
Recordings Per Day
32
Heart Rate Change Indicator Tones and Subject
Compliance
33
Per Word Speech Recognition
34
Semantic Grammar Location/Activity Encoding
Precision and Recall
35
Word Recognition Accuracy's Effect on Semantic
Grammar Precision and Recall
36
Statistical Processing Accuracy
37
Word Recognition Affects Statistical Semantic
Categorization
38
Per Word Recognition Rate Versus Statistical
Semantic Encoding Accuracy
39
Time, Activity, Location, Exertion Data Gathering
Platform
40
Research Topics
  • Currently, guesses for the current activity and
    location are computed independently of each other
  • They are not independent!
  • Currently, guesses are based on the current
    utterance.
  • However, the current activity/location is not independent of
    previous activity/locations.
  • How do we fuse data from other sources (GPS, beacons, heart rate
    monitor, etc.)?