Bootstrapping Toponym Classifiers - PowerPoint PPT Presentation

About This Presentation
Title:

Bootstrapping Toponym Classifiers

Slides: 16
Provided by: researchM2

Transcript and Presenter's Notes

1
Bootstrapping Toponym Classifiers
  • David Smith and Gideon Mann
  • Center for Language and Speech Processing
  • Johns Hopkins University

2
Exploiting Labeled Text
  • Writers often insert explicit disambiguating
    labels after place names
  • A sideshow juggler in Virginia Beach, Va., and a
    roller-coaster operator at Hershey Park in
    Hershey, Pa., have also been victims of laser
    harassment.
  • Texts with these labels provide training and
    testing examples for geographic name
    disambiguation
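The labeling pattern above can be harvested mechanically. The following is a minimal sketch, not the authors' extraction code: the abbreviation table, the regular expression, the five-token context window, and the function name `extract_labeled_toponyms` are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny abbreviation table; a real system would
# need every state and country label it expects to encounter.
STATE_ABBREVIATIONS = {
    "Va.": "Virginia",
    "Pa.": "Pennsylvania",
    "Tenn.": "Tennessee",
    "Ore.": "Oregon",
}

# One or more capitalized words, a comma, then a known abbreviation.
LABEL_PATTERN = re.compile(
    r"\b([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*), ("
    + "|".join(re.escape(abbr) for abbr in STATE_ABBREVIATIONS)
    + r")"
)


def extract_labeled_toponyms(text, window=5):
    """Return one training example per explicitly labeled place name."""
    examples = []
    for match in LABEL_PATTERN.finditer(text):
        name, abbreviation = match.group(1), match.group(2)
        # Context is the `window` whitespace-separated tokens on each side.
        before = text[:match.start()].split()[-window:]
        after = text[match.end():].split()[:window]
        examples.append({
            "toponym": name,
            "label": STATE_ABBREVIATIONS[abbreviation],
            "context": before + after,
        })
    return examples


sentence = (
    "A sideshow juggler in Virginia Beach, Va., and a roller-coaster "
    "operator at Hershey Park in Hershey, Pa., have also been victims "
    "of laser harassment."
)
examples = extract_labeled_toponyms(sentence)
```

Each extracted example pairs a toponym with the state its label names, so the label can serve as the class and the surrounding tokens as features.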

3
Classifier Construction
  • Some lexical cues help predict a particular place

NASHVILLE, Tenn. - The home of country music is
singing the blues after the sale of its last
locally owned music publishing company to CBS
Records. Tree International Publishing, ranked as
Billboard magazine's No. 1 country music
publisher for the last 16 years, is being sold to
New York-based CBS for a reported $45 million to
$50 million, The Tennessean reported today.
4
Classifier Construction
  • Other contexts provide cues as to general area,
    e.g. state

GRANTS PASS, Ore. - ... As more and more federal
lands are set aside for spotted owls and other
types of wildlife and recreation areas, the land
available for perpetual commercial timber
management decreases...
PORTLAND, Ore. - Environmentalists trying to
protect the northern spotted owl cheered a
federal judge's decision halting logging on five
timber tracts...
5
Classifier Construction
  • Other disambiguation cues are more tenuous

Researchers at Harvard Medical School, and
Canada's London Regional Cancer in London,
Ontario correlated genetic changes found in these
tumors with their sensitivity to chemotherapy.
He accomplished that feat when he knocked down
two Luftwaffe fighters near Brunswick, Germany,
on May 8, 1944, on his final mission.
A sideshow juggler in Virginia Beach, Va., and a
roller-coaster operator at Hershey Park in
Hershey, Pa., have also been victims of laser
harassment.
6
How Does This Help?
  • Readily available, growing corpus
  • Easy to apply WSD methods
  • Easy to adapt to many genres
  • The crucial questions:
  • Realism: Do contexts with labels represent
    contexts without labels well?
  • Generality: Does the method perform well in
    different genres?

7
Annotated Text
  • Hand tag all toponyms to test coverage

All supplies for Rosecrans had to be brought from
<name key="tgn,7013959" reg="Nashville-Davidson
(inhabited place), Davidson, Tennessee, United
States, North and Central America"
type="place">Nashville</name>. The railroad
between this base and the army was in possession
of the government up to <name>Bridgeport</name>,
the point at which the road crosses to the south
side of the <name>Tennessee River</name> but
Bragg, holding Lookout and <name>Raccoon
mountains</name> west of <name>Chattanooga</name>,
commanded the railroad, the river and the
shortest and best wagon-roads, both south and
north of the <name>Tennessee</name>, between
<name>Chattanooga</name> and <name>Bridgeport</name>.
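Reading the hand-tagged toponyms back out of such a passage might look like the sketch below. It assumes the annotations are well-formed XML `name` elements with an optional `key` attribute, as the excerpt suggests; the wrapper element `passage` and the function name `tagged_toponyms` are hypothetical.

```python
import xml.etree.ElementTree as ET


def tagged_toponyms(fragment):
    """Return (surface form, gazetteer key or None) for each <name> tag."""
    # Wrap the fragment in a single root element so it parses as one document.
    root = ET.fromstring("<passage>" + fragment + "</passage>")
    return [
        ((element.text or "").strip(), element.get("key"))
        for element in root.iter("name")
    ]


sample = (
    'All supplies had to be brought from '
    '<name key="tgn,7013959" type="place">Nashville</name>. '
    'The railroad ran up to <name>Bridgeport</name>.'
)
toponyms = tagged_toponyms(sample)
```

Names without a `key` come back with `None`, which makes it easy to separate toponyms that were resolved to a gazetteer entry from those that were only marked.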
8
Corpus Statistics
9
How Hard is the Task?
  • Ambiguity's greatest hits
  • 41 Oxfords
  • 73 Springfields
  • 91 Washingtons
  • 97 Georgetowns
  • Anything with Creek
  • But not everywhere is this hard
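Counts like these come from tallying how many distinct places share a name. A minimal sketch follows, assuming the gazetteer is available as (name, referent) pairs; the toy entries and the function name `referents_per_name` are illustrative and are not the gazetteer used in the talk.

```python
from collections import defaultdict


def referents_per_name(gazetteer):
    """Map each place name to the number of distinct places that bear it."""
    referents = defaultdict(set)
    for name, referent in gazetteer:
        referents[name].add(referent)
    return {name: len(places) for name, places in referents.items()}


# Toy gazetteer for illustration only.
toy_gazetteer = [
    ("Springfield", "Springfield, Illinois"),
    ("Springfield", "Springfield, Massachusetts"),
    ("Springfield", "Springfield, Missouri"),
    ("Oxford", "Oxford, England"),
    ("Oxford", "Oxford, Mississippi"),
    ("Chattanooga", "Chattanooga, Tennessee"),
]
ambiguity = referents_per_name(toy_gazetteer)
```

Names that map to a single referent need no disambiguation at all, which is one way to see that not every toponym is equally hard.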

10
How Hard is the Task?
11
How Hard is the Task?
12
Experimental Setup
  • Use labeled toponyms for training contexts
  • Strip stopwords and build naïve Bayes classifiers
  • In untagged text, test on labeled toponyms
  • In tagged text, test on all toponyms
  • Back-off to state or country models
  • N.B. Baseline is most frequent referent for
    toponym
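The setup above can be sketched as a small naïve Bayes classifier with a most-frequent-referent baseline. This is an illustration under stated assumptions, not the authors' implementation: the stopword list, add-one smoothing, the class name `ToponymClassifier`, and the toy training examples are all hypothetical, and back-off to state or country models is left out for brevity.

```python
import math
from collections import Counter, defaultdict

# Illustrative stopword list only; the talk does not say which list was used.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on", "at"}


def content_words(tokens):
    """Lowercase the tokens and drop stopwords."""
    return [t.lower() for t in tokens if t.lower() not in STOPWORDS]


class ToponymClassifier:
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing  # add-one smoothing is an assumption
        self.referent_counts = defaultdict(Counter)  # toponym -> referent -> n
        self.word_counts = defaultdict(Counter)      # referent -> word -> n
        self.vocabulary = set()

    def train(self, examples):
        """examples: iterable of (toponym, referent, context tokens)."""
        for toponym, referent, context in examples:
            self.referent_counts[toponym][referent] += 1
            for word in content_words(context):
                self.word_counts[referent][word] += 1
                self.vocabulary.add(word)

    def baseline(self, toponym):
        """Most frequent referent seen in training, ignoring context."""
        counts = self.referent_counts.get(toponym)
        return counts.most_common(1)[0][0] if counts else None

    def classify(self, toponym, context):
        """Referent with the highest naive Bayes log posterior."""
        candidates = self.referent_counts.get(toponym)
        if not candidates:
            return None  # unseen toponym; a back-off model would go here
        words = content_words(context)
        total_examples = sum(candidates.values())
        vocab_size = max(len(self.vocabulary), 1)

        def log_posterior(referent):
            score = math.log(candidates[referent] / total_examples)
            total_words = sum(self.word_counts[referent].values())
            for word in words:
                score += math.log(
                    (self.word_counts[referent][word] + self.smoothing)
                    / (total_words + self.smoothing * vocab_size)
                )
            return score

        return max(candidates, key=log_posterior)


# Toy training data for illustration only.
classifier = ToponymClassifier()
classifier.train([
    ("Springfield", "Springfield, Illinois", ["Lincoln", "lived", "in"]),
    ("Springfield", "Springfield, Illinois", ["the", "state", "capitol"]),
    ("Springfield", "Springfield, Massachusetts",
     ["basketball", "hall", "of", "fame"]),
])
guess = classifier.classify("Springfield", ["basketball", "fame"])
```

With no usable context words the classifier falls back on the prior and agrees with the baseline, so any gain over the baseline has to come from the lexical cues.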

13
Evaluation
14
Discussion
  • Improvements from naïve Bayes are small
  • 1.2% relative improvement on untagged data
  • But handling unseen toponyms with back-off could improve results
  • Degraded performance on tagged data
  • Lack of many context clues
  • Broader range of toponym references
  • Strong locality effects not modeled (cf. Smith &
    Crane 2001)

15
Future Work
  • Apply other methods from WSD
  • Incorporate proximity along with hierarchical
    information
  • Hand tag data to confirm conclusions in news
    genre
  • Disambiguation for extending gazetteers, personal
    name disambiguation, and event detection