Title: Nearly-Automated Metadata Hierarchy Creation


1
Nearly-Automated Metadata Hierarchy
Creation
  • Emilia Stoica and Marti Hearst, SIMS, University of
    California, Berkeley

2
Motivation
  • Want to assign items labels from multiple
    hierarchies

3
Motivation
  • Description: 19th c. paint horse; saddle and
    hackamore; spurs; bandana on rider; old time
    cowboy hat; underchin thong; flying off.

4
Use in Browsing Interfaces like Flamenco

5
Use in Browsing Interfaces like Flamenco

6
How to Obtain the Hierarchies?
  • Goal
  • Help an information architect get started
  • Currently they do it all by hand!
  • Assume they will do some editing
  • Nearly automated
  • Multiple hierarchies (facets)
  • Automatically assign items to multiple hierarchies

7
Related Work
  • Automated text categorization
  • LOTS of work on this
  • Assumes that a set of categories is already
    created
  • To be intuitive, a categorization should contain
    sets of IS-A relations (hierarchical)
  • Rosenfeld and Morville (2002)
  • Pratt, Hearst, and Fagan (1999)
  • Current automated approaches contain only
    associative relations

8
Examples of Associative Relations
  • Hofmann 1999
  • Collection: machine learning abstracts
  • Top-level categories:
  • learn, paper, base, model, new, train
  • Problem:
  • These are not intuitive categories for machine
    learning
  • Sanderson and Croft 1999
  • Collection: medical texts
  • Top-level categories:
  • disease, post polio, serious disease, dengue,
    infection control, immunology,
  • Problem:
  • These are at different levels of generality

9
Examples of Associative Relations
  • Schuetze 1993
  • Collection: arts descriptions
  • Sample groupings:
  • carriage cart horse ride walk passing horseback
    wagon men chicken rider
  • bald balding head facing hand faced arm hat
    haired glove long
  • Problem:
  • Terms are associated with one another, but are
    not organized into hierarchies that can be
    navigated.

10
Our Approach
  • Leverage the structure of WordNet

11
1. Select Terms
  • Select well-distributed terms from the collection
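
A minimal sketch of one way to pick "well-distributed" terms, assuming that distribution means the number of documents a term occurs in. The function name select_terms and both thresholds (min_df, max_df_ratio) are illustrative assumptions, not values from the slides.

```python
from collections import Counter

def select_terms(docs, min_df=2, max_df_ratio=0.5):
    """Keep terms found in at least min_df documents and in at most
    max_df_ratio of the collection (both thresholds are assumptions)."""
    df = Counter()
    for text in docs:
        # Count each term once per document (document frequency).
        df.update(set(text.lower().split()))
    limit = max_df_ratio * len(docs)
    return sorted(t for t, n in df.items() if min_df <= n <= limit)

docs = [
    "red saddle on a horse",
    "blue bandana on a rider",
    "red hat on a rider",
    "a green horse blanket",
]
selected = select_terms(docs)
print(selected)
```

Very common words ("a", "on") and words seen only once are both dropped here; a real collection would likely need tokenization and stop-word handling as well.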

[Figure: pipeline diagram. Documents → Select terms → Get hypernym paths (from WordNet) → Build tree → Compress tree]
12
2. Get Hypernym Path
  • Get hypernym path for each term

[Figure: example terms red and blue]
13
3. Build Tree
  • Merge hypernym paths to build a tree
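
Steps 2 and 3 can be sketched as follows. To stay self-contained, the sketch uses a small hand-written table (HYPERNYM) as a stand-in for WordNet; the table, the helper names, and the nested-dict tree representation are all assumptions made for illustration.

```python
# Toy stand-in for WordNet: each node maps to its single hypernym.
# This table is hypothetical data, not a WordNet lookup.
HYPERNYM = {
    "red": "red, redness",
    "red, redness": "chromatic color",
    "blue": "blue, blueness",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": None,
}

def hypernym_path(term):
    """Return the path from the root down to the term."""
    path = []
    node = term
    while node is not None:
        path.append(node)
        node = HYPERNYM[node]
    return path[::-1]

def build_tree(terms):
    """Merge root-to-term paths into a nested dict keyed by node name."""
    tree = {}
    for term in terms:
        level = tree
        for node in hypernym_path(term):
            # Shared prefixes reuse the same sub-dict, which merges the paths.
            level = level.setdefault(node, {})
    return tree

tree = build_tree(["red", "blue"])
print(tree)
```

Because both paths start with color and chromatic color, the two terms end up as separate branches under one shared parent.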

[Figure: pipeline diagram, with the example terms red and blue]
14
4. Compress Tree
  • Eliminate a parent with fewer than n children
    unless it is the root or its distribution is
    larger than 0.1 × maxdist
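
The rule above can be sketched as a bottom-up pass. This assumes that eliminating a parent means attaching its children to the grandparent, and that leaves are never removed because they are not parents; the Node class, the dist field, and the item counts in the example are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    dist: int = 0  # assumed meaning: number of items the node's term occurs in
    children: list = field(default_factory=list)

def compress(node, n, maxdist, is_root=True):
    """Return the nodes that replace `node` after compression."""
    kept = []
    for child in node.children:
        kept.extend(compress(child, n, maxdist, is_root=False))
    node.children = kept
    is_parent = bool(node.children)
    protected = is_root or node.dist > 0.1 * maxdist
    if is_parent and not protected and len(node.children) < n:
        return node.children  # splice out: children move up one level
    return [node]

def chain(synset, term, dist):
    return Node(synset, children=[Node(term, dist=dist)])

# Hypothetical counts for the color example.
root = Node("color", children=[Node("chromatic color", children=[
    chain("red, redness", "red", 30),
    chain("blue, blueness", "blue", 20),
    chain("green, greenness", "green", 10),
])])
compress(root, n=2, maxdist=30)
print([c.name for c in root.children[0].children])
```

With n = 2, each single-child synset node is removed, while chromatic color keeps its three children and the root is kept unconditionally.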

[Figure: color example with the nodes color; chromatic color; red, redness; blue, blueness; green, greenness; and the terms red, blue, green]
15
4. Compress Tree (cont.)
  • Eliminate a child whose name appears within
    its parent's name
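
A sketch of this step under one reading of the rule: a child is removed when it shares a whole word with its parent's name (as with chromatic color under color), and its own children are then attached to that parent. The word-overlap test and the nested-dict tree are assumptions; the slides do not spell out the exact matching.

```python
def words(name):
    """Split a node name such as 'food, nutrient' into individual words."""
    return {w for part in name.split(",") for w in part.split()}

def shares_word(child_name, parent_name):
    return bool(words(child_name) & words(parent_name))

def drop_redundant_children(tree, parent_name=None):
    """tree: nested dict mapping node name -> subtree."""
    result = {}
    for name, sub in tree.items():
        if parent_name is not None and shares_word(name, parent_name):
            # Splice out the child; its children attach to the same parent.
            result.update(drop_redundant_children(sub, parent_name))
        else:
            result[name] = drop_redundant_children(sub, name)
    return result

before = {"color": {"chromatic color": {"red": {}, "blue": {}, "green": {}}}}
after = drop_redundant_children(before)
print(after)
```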

[Figure: color example before and after this step; nodes shown: color, chromatic color, red, blue, green]
16
5. Remove Top Levels
  • Top levels of WordNet are too general, e.g.
  • Entity
  • Substance, matter
  • Abstraction
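
One possible way to strip the overly general top levels, assuming the tree is a nested dict and that the three names listed above form the removal set (the full set used by the authors is not given here). What remains is a forest, which fits the goal of producing multiple hierarchies.

```python
# Removal set taken from the examples above; the real list may be longer.
TOO_GENERAL = {"entity", "substance, matter", "abstraction"}

def remove_top_levels(tree):
    """Drop too-general nodes from the top, promoting their children."""
    forest = {}
    for name, sub in tree.items():
        if name in TOO_GENERAL:
            forest.update(remove_top_levels(sub))
        else:
            forest[name] = sub  # stop descending at the first kept node
    return forest

tree = {
    "entity": {
        "substance, matter": {"food, nutrient": {"nutriment": {}}},
        "object": {"artifact": {}},
    }
}
forest = remove_top_levels(tree)
print(list(forest))
```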

17
Disambiguation
  • Ambiguity in
  • Word senses
  • Paths up the hypernym tree

18
How to Select the Right Senses and Paths?
  • (This part is not in the paper.)
  • Solution: modify the algorithm
  • First build core tree
  • (1) Create paths for words with only one sense
  • (2) Use Domains
  • WordNet has 212 Domains
  • medicine, mathematics, biology, chemistry,
    linguistics, soccer, etc.
  • Automatically scan the collection to see which
    domains apply
  • The user selects which of the suggested domains
    to use, or may add their own
  • Paths for terms that match the selected domains
    are added to the core tree
  • Then add remaining terms to the core tree.

19
Using Domains
  • dip glosses
  • Sense 1: A depression in an otherwise level surface
  • Sense 2: The angle that a magnet needle makes with
    horizon
  • Sense 3: Tasty mixture into which bite-size foods
    are dipped
  • dip hypernyms (most general first)
  • Sense 1: solid > concave shape > depression
  • Sense 2: shape, form > space > angle
  • Sense 3: food > ingredient, fixings > flavorer
  • Given domain food, choose sense 3
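
The dip example can be sketched as a lookup. It assumes that a sense matches a domain when the domain name appears on that sense's hypernym path; how the authors actually match WordNet domains to senses is not stated on the slide. DIP_SENSES and pick_sense are hypothetical names, and the paths are transcribed from the example above.

```python
# Hypernym paths per sense, most general node first.
DIP_SENSES = {
    1: ["solid", "concave shape", "depression"],
    2: ["shape, form", "space", "angle"],
    3: ["food", "ingredient, fixings", "flavorer"],
}

def pick_sense(senses, domains):
    """Return the lowest-numbered sense whose path mentions a selected domain."""
    for sense, path in sorted(senses.items()):
        names = {part.strip() for node in path for part in node.split(",")}
        if names & domains:
            return sense
    return None  # no domain applies; the term is left for a later step

print(pick_sense(DIP_SENSES, {"food"}))
```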
20
Enrich Core Tree
  • For each new term t:
  • Q(t) ← ∅ // set of candidate paths
  • for each path p of t:
  • compute the fraction fp(t) of nodes in p that are
    shared with a path in the core tree
  • if (fp(t) > thresh)
  • Q(t) ← Q(t) ∪ {p}
  • if (Q(t) = ∅)
  • choose first sense of t
  • else
  • among all p's in Q(t), choose path in core tree
    with most items assigned
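
The procedure above might look like this in Python. Several details are assumptions: shared nodes are counted by membership in the core tree, a candidate path is scored by the item count of its deepest shared node, and thresh defaults to 0.5 (the slides give no value). The three item counts are the ones that appear in the chip example in this deck; nodes without a count default to 0.

```python
def choose_path(paths, core_nodes, item_counts, thresh=0.5):
    """paths: root-to-term paths for one term, first sense first."""
    candidates = []
    for p in paths:
        shared = [node for node in p if node in core_nodes]
        if len(shared) / len(p) > thresh:
            # Score by the items assigned at the deepest shared node.
            candidates.append((item_counts.get(shared[-1], 0), p))
    if not candidates:
        return paths[0]  # fall back to the first sense
    return max(candidates, key=lambda c: c[0])[1]

CORE_NODES = {
    "entity", "substance, matter", "food, nutrient", "nutriment", "dish",
    "fondue, fondu", "object", "artifact", "instrumentality", "device",
    "conductor", "semiconductor", "diode", "light-emitting diode (led)",
}
ITEM_COUNTS = {"dish": 1699, "fondue, fondu": 40, "semiconductor": 45}

p1 = ["entity", "substance, matter", "food, nutrient", "nutriment",
      "dish", "snack food", "chip"]
p2 = ["entity", "object", "artifact", "instrumentality", "device",
      "conductor", "semiconductor", "chip"]
best = choose_path([p1, p2], CORE_NODES, ITEM_COUNTS)
print(best[-2:])
```

Both chip paths clear the threshold, so the choice falls to the item counts, and the food branch wins because dish has far more items than semiconductor.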

21
Enrich Core Tree
  • Core tree
  • entity > substance, matter > food, nutrient >
    nutriment > dish > fondue, fondu
  • entity > object > artifact > instrumentality >
    device > conductor > semiconductor > diode >
    light-emitting diode (led)
  • Example item: Toaster with led indicators
22
Enrich Core Tree
  • Core tree
  • entity > substance, matter > food, nutrient >
    nutriment > dish > fondue, fondu
  • entity > object > artifact > instrumentality >
    device > conductor > semiconductor > diode >
    light-emitting diode (led)
  • Chip (p1)
  • entity > substance, matter > food, nutrient >
    nutriment > dish > snack food > chip
  • Chip (p2)
  • entity > object > artifact > instrumentality >
    device > conductor > semiconductor > chip
23
Enrich Core Tree
24
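Enrich Core Tree (cont.)
  • Core tree (numbers in parentheses as shown on the
    slide)
  • entity > substance, matter > food, nutrient >
    nutriment > dish (1699) > fondue, fondu (40)
  • entity > object > artifact > instrumentality >
    device > conductor > semiconductor (45) > diode >
    light-emitting diode (led)
  • Candidate positions for chip
  • dish > snack food > chip
  • semiconductor > chip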
25
Results on a Recipes/Kitchen Appliances Data Set
26
Results on a Recipes/Kitchen Appliances Data Set
27
Discussion
  • This is very simple, but works very well
  • Why hasn't this been done before?
  • Because WordNet did not have enough coverage?

28
Conclusions
  • Can nearly-automatically build a set of
    hierarchies by finding IS-A relations between
    terms using WordNet
  • The method has been tested on various domains
  • medicine, mathematics, recipes, news, arts
  • User study in progress
  • Limitations
  • The ontology has to be appropriate for the target
    domain
  • No disambiguation between nouns, verbs, and
    adjectives