Supervised Categorization for Habitual versus Episodic Sentences

1
Supervised Categorization for Habitual versus
Episodic Sentences
  • Thomas Mathew
  • tam52_at_georgetown.edu
  • Graham Katz
  • egk7_at_georgetown.edu
  • Department of Linguistics
  • Georgetown University

2
Introduction
  • Habitual sentences state general facts
  • Describe properties of a class
  • Bears eat blackberries
  • Characteristic of specific individual
  • Angus Young wears school uniforms on stage
  • Stative, although the main verb can be dynamic
  • Episodic sentences report on a finite number of
    specific events
  • Mary ate a steak
  • Angus Young wore a school uniform twice this
    week
  • Why the distinction matters
  • Event extraction
  • Document summarization

3
Scope
  • Determine automatically whether a sentence is
    habitual or episodic on the basis of sentence
    internal information
  • John smoked cigarettes when he was young (habitual)
  • John smoked a cigarette this morning (episodic)
  • Note: lexically stative predicates are excluded
  • Italians like wine
  • They do not exhibit the habitual/specific ambiguity

4
Related Work
  • Sangweon Suh (2006)
  • Distinguish generic from specific NP reference in
    context
  • Cats like tuna
  • A cat ate the tuna
  • Eric Siegel (1995), Michael Brent (1990)
  • Determine whether verb is stative or eventive
  • He called his father
  • He resembles his father
  • On basis of distribution of verbs with overt
    features
  • Siegel (1995) uses co-occurrence frequencies of
    14 features

5
Approach
  • Supervised Classification
  • Built training corpus
  • Selected features for machine learning
  • Evaluated features
  • Applied Machine Learning algorithms

6
Annotation of Corpus
  • Generated set of 1,816 sentences with 72 verb
    types by
  • Randomly selecting sentences from Penn Treebank
    (WSJ & Brown)
  • Ignoring sentences with a lexically stative
    predicate
  • Adding all sentences in Penn Treebank whose main
    verb was a morphological variant of a verb from
    initial set

7
Annotation of Corpus
  • Annotated each sentence as habitual/episodic by
  • Checking for explicit attribution
  • Frequency adverbs (usually, often): habitual
  • Quantificational temporals (every night): habitual
  • Habitual past (used to): habitual
  • Definite temporals (yesterday): episodic
  • Testing whether the sentence meaning changed when the
    modifier "usually" was added
  • No change in meaning indicated habitual
  • Examining discourse context
  • Assumed bunching of categories in a discourse
  • Applying intuitive semantic judgment
  • Single event or habit
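
The explicit-attribution check above can be sketched as a small cue matcher. This is only an illustration, not the authors' annotation tool: the cue lists contain just the examples named on the slide, and the function name `explicit_attribution` is hypothetical.

```python
import re

# Cue patterns taken from the examples on the slide; a real list would be longer.
HABITUAL_CUES = [
    r"\b(usually|often)\b",  # frequency adverbs
    r"\bevery night\b",      # quantificational temporal
    r"\bused to\b",          # habitual past
]
EPISODIC_CUES = [
    r"\byesterday\b",        # definite temporal
]

def explicit_attribution(sentence):
    """Return 'habitual', 'episodic', or None when no overt cue is found."""
    text = sentence.lower()
    if any(re.search(pattern, text) for pattern in HABITUAL_CUES):
        return "habitual"
    if any(re.search(pattern, text) for pattern in EPISODIC_CUES):
        return "episodic"
    # No overt cue: the annotator falls back to the other checks
    # (the "usually" test, discourse context, semantic judgment).
    return None

print(explicit_attribution("John used to smoke cigarettes"))  # habitual
print(explicit_attribution("Mary ate a steak yesterday"))     # episodic
print(explicit_attribution("Mickie laughed"))                 # None
```

A surface matcher like this only covers the first annotation step; the remaining steps rely on human judgment and are not modeled here.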

8
Data
  • Verbs varied significantly in lexical bias
  • "report" is almost only episodic; "require" is almost only
    habitual
  • Final step
  • Eliminated highly biased lexical verbs
  • Final data set
  • 1,052 sentences
  • 57 verb forms
  • Baseline distribution

9
Features
  • Selected 14 sentence internal features
  • Features that can be derived from annotation
    scheme of Penn Treebank
  • Evaluated each feature's relevance to classification
  • Compare feature distribution by category against
    baseline
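
One way to read "compare feature distribution by category against baseline" is to compare the share of habitual sentences among those carrying a feature value with the share in the whole data set. The sketch below assumes that reading; the rows are invented toy data, and `habitual_share` is a hypothetical helper.

```python
def habitual_share(rows, feature=None, value=None):
    """Fraction of habitual rows among those where feature == value.

    With feature=None, all rows are used, which gives the baseline share.
    """
    selected = [r for r in rows if feature is None or r[feature] == value]
    if not selected:
        return None
    return sum(r["label"] == "habitual" for r in selected) / len(selected)

# Toy rows, invented for illustration only (not the annotated corpus).
rows = [
    {"tense": "present", "label": "habitual"},
    {"tense": "present", "label": "habitual"},
    {"tense": "present", "label": "episodic"},
    {"tense": "past", "label": "episodic"},
    {"tense": "past", "label": "episodic"},
    {"tense": "past", "label": "habitual"},
    {"tense": "past", "label": "episodic"},
    {"tense": "past", "label": "episodic"},
]

baseline = habitual_share(rows)
present = habitual_share(rows, "tense", "present")
print(f"baseline habitual share:      {baseline:.3f}")
print(f"present-tense habitual share: {present:.3f}")
```

A feature value whose share departs clearly from the baseline is a candidate indicator for one category.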

10
Tense
  • Hungarian Radio saves its most politically outspoken
    broadcasts for around midnight (habitual)
  • Mickie laughed (episodic)

11
Aspect
  • Everyone else was running (episodic)
  • The school has received letters from parents (episodic)

12
Temporals
  • Every time I closed my eyes, I saw gray eyes rushing at
    me with a knife (habitual)
  • On Tuesday, Trellborg's directors announced plans to spin
    off two big divisions as separately quoted companies on
    Stockholm's stock exchange (episodic)

13
Subject Features
  • Commands go only from an office to the man of nearest
    lower rank (habitual)
  • The women indicated which family member usually did
    household chores (episodic)

14
Object Features
  • Not surprisingly, he sometimes bites (habitual)
  • In Los Angeles, in our lean years, we gave parties
    (habitual)
  • Robert Bernstein, chairman and president of Random House
    Inc., announced his resignation from the publishing house
    he has run for 23 years (episodic)

15
Conditionals
  • After all, gold prices soar when inflation is high
    (habitual)

16
Prepositional Features
  • Anheuser-Busch announced its plan at the same time it
    reported third quarter net income rose a
    lower-than-anticipated 5.2% to $238.3 million (episodic)
  • Treasury prices ended mixed in light trading (episodic)
  • You've got blood on your cheek (episodic)

17
Feature Analysis Summary
  • Reliable features for episodicity
  • Less reliable features for habituality

18
Feature Limitations
  • Problem areas
  • Semantics of predicate arguments
  • She was moving like a ballet dancer
  • She was moving in café society as Lady Diana
    Harrington
  • Semantics of predicate
  • He is meeting a girl from Brooklyn
  • He is seeing a girl from Brooklyn
  • Sentence-external factors (discourse)
  • John rarely ate fruit. He just ate oranges
  • John didn't eat much at breakfast. He just ate
    oranges
  • Dual-category sentences
  • Too rare to analyze statistically
  • After all, in all five recessions since 1960,
    stocks declined

19
Machine Learning
  • Considered three classifiers
  • Rule-based
  • Association Rule Classifier
  • Decision Tree (J48) Classifier
  • Probabilistic
  • Naïve Bayes
  • Evaluated against a baseline in which all sentences are
    blindly labeled with the majority class (episodic)
  • 73.1% overall precision
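
A majority-class baseline of this kind can be sketched in a few lines. The labels below are toy data, not the annotated corpus, and `majority_baseline` is a hypothetical name.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the majority class and the accuracy of always predicting it."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)

# Toy labels, invented for illustration only.
labels = ["episodic", "episodic", "episodic", "habitual"]
print(majority_baseline(labels))  # ('episodic', 0.75)
```

Any classifier worth keeping has to beat this figure, since the baseline uses no features at all.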

20
Association Rule Classifier
  • Applied Predictive Apriori algorithm (Scheffer
    2004) for multivariate analysis
  • Algorithm generates n-best feature patterns
    predicting a category
  • Manually pruned results
  • Only patterns selecting for episodicity > 85%
  • Only patterns selecting for habituality > 80%
  • If R1 ⊂ R2, discard R2
  • If the sorted list R1, R2, ..., Rn has the same coverage
    as R1, R2, ..., Rn+1 for a category, discard Rn+1
  • Model
  • 4 patterns (213) are habitual 173 times
  • 11 patterns (882) are episodic 735 times
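
The two structural pruning conditions can be sketched as follows. This is an interpretation, not the authors' procedure: the first condition is read as a proper-subset test on rule conditions within a category, and "coverage" as the set of sentences a rule matches. The rule representation, feature names, and data are all hypothetical.

```python
def matches(conditions, sentence):
    """True when the sentence has every (feature, value) pair in the rule."""
    return all(sentence.get(feature) == value for feature, value in conditions)

def prune_rules(rules, sentences):
    """rules: list of (conditions, category), sorted best-first.

    conditions is a frozenset of (feature, value) pairs.
    """
    # Step 1: discard a rule if a more general rule for the same category
    # exists (its conditions are a proper subset of this rule's conditions).
    general = [
        (cond, cat) for cond, cat in rules
        if not any(other < cond for other, other_cat in rules if other_cat == cat)
    ]
    # Step 2: walk the sorted list and discard any rule that adds no newly
    # covered sentence for its category.
    kept, covered = [], {}
    for cond, cat in general:
        hits = {i for i, s in enumerate(sentences) if matches(cond, s)}
        seen = covered.setdefault(cat, set())
        if hits - seen:
            kept.append((cond, cat))
            seen |= hits
    return kept

# Toy feature vectors and rules, invented for illustration only.
sentences = [
    {"tense": "past", "temporal": "definite"},
    {"tense": "past", "temporal": "none"},
    {"tense": "present", "temporal": "quantificational"},
]
rules = [
    (frozenset({("tense", "past")}), "episodic"),
    (frozenset({("tense", "past"), ("temporal", "definite")}), "episodic"),
    (frozenset({("temporal", "definite")}), "episodic"),
    (frozenset({("temporal", "quantificational")}), "habitual"),
]

kept = prune_rules(rules, sentences)
print([cat for _, cat in kept])
```

In this toy run the second rule is dropped as a specialization of the first, and the third is dropped because it covers nothing the first rule did not already cover.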

21
Association Rule based Classifier
22
Decision Tree (J48) Classifier
  • Weka's implementation of C4.5
  • Used ten-fold cross validation for evaluation
  • Model
  • 2 patterns (184) are habitual 161 times
  • 2 patterns (829) are episodic 727 times
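
Ten-fold cross validation itself can be sketched without any library. The code below does not reproduce J48 or Weka's evaluation: it uses a simple unstratified split and a stand-in majority-class learner, and every name and data value is hypothetical.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds (no stratification)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(rows, labels, train_fn, k=10):
    """Train on k-1 folds, test on the held-out fold, return overall accuracy."""
    correct = 0
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train_x = [rows[i] for i in range(len(rows)) if i not in held]
        train_y = [labels[i] for i in range(len(rows)) if i not in held]
        predict = train_fn(train_x, train_y)
        correct += sum(predict(rows[i]) == labels[i] for i in held_out)
    return correct / len(rows)

def train_majority(train_x, train_y):
    """Stand-in learner: always predicts the majority class of the training folds."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda row: majority

# Toy data, invented for illustration only.
rows = [{"tense": "past"}] * 20
labels = ["episodic"] * 15 + ["habitual"] * 5

accuracy = cross_validate(rows, labels, train_majority)
print(accuracy)  # 0.75
```

Swapping `train_majority` for a real decision-tree learner would leave the evaluation loop unchanged; each sentence is scored exactly once, by a model that never saw it during training.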

23
Decision Tree (J48) Classifier
  • Impact of feature groups (J48)
  • All select roughly the same number of episodic
    sentences
  • Variation is more on habitual/incorrect sentences

24
Results
  • Classifier Performance
  • [1] Not evaluated using an independent validation
    set
  • Habituality Recall
  • Tense and presence of a quantificational temporal
    are best indicators of habituality
  • However, neither provides sufficient coverage of
    habitual examples by itself

25
Conclusion
  • Using syntactic features is a viable method for
    category disambiguation
  • Identification of episodic sentences outperforms
    identification of habitual sentences
  • There are more overt markers of habituality;
    however, more features show a bias for episodicity
  • Performance
  • Impact of lexical verb and sentence external
    features
  • Feature extraction is in some cases an
    approximation
  • Annotation errors/consistency in corpus

26
Future Work
  • Impact of discourse
  • Independently annotate sentence, predecessor,
    successor in isolated context
  • Weighting factor for ambiguous situations
  • Annotate sentence, predecessor, successor
    conscious of context

27
Questions?