University of Sheffield CIIR, University of Massachusetts - PowerPoint PPT Presentation

1
University of Sheffield; CIIR, University of Massachusetts
Deriving concept hierarchies from text
Mark Sanderson, Bruce Croft
2
The question is...
  • What paper already presented at this SIGIR is
    most like the one you're about to see?
  • We'll have the answer, right after this!

3
Concept hierarchies from documents?
  • Hierarchy of concepts, e.g. Yahoo
  • General down to specific
  • Child under one or more parents
  • No training data
  • Why?
  • Understandable

4
Current methods
  • Polythetic clustering

5
An alternative?
  • Monothetic clustering
  • Clusters based on a single feature
  • More Yahoo/Dewey decimal like?
  • Easier to understand?
  • Preferable to users?
  • What about hierarchies of clusters?
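As a minimal sketch of what "monothetic" means here: each cluster is defined by exactly one term, so a document belongs to a cluster simply because it contains that term, and it can belong to several clusters at once. The function `monothetic_clusters` and the toy documents are hypothetical, for illustration only.

```python
from collections import defaultdict

def monothetic_clusters(docs):
    """Map each term to the set of document ids that contain it.

    Every cluster is defined by a single feature (the term), so one
    document can appear in many clusters.
    """
    clusters = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            clusters[term].add(doc_id)
    return dict(clusters)

# Hypothetical toy collection: document id -> set of terms it contains.
docs = {
    1: {"polio", "vaccine"},
    2: {"polio", "post-polio syndrome"},
    3: {"vaccine", "measles"},
}
clusters = monothetic_clusters(docs)
print(sorted(clusters["polio"]))
print(sorted(clusters["vaccine"]))
```

Unlike a polythetic cluster, each of these is easy to label: its name is the single term that defines it.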

6
How to arrange cluster terms?
  • Existing techniques
  • WordNet
  • earthquake, volcano (eruption?)
  • Key phrases (Hearst 1998)
  • such as, especially
  • Phrase classification (Grefenstette 1997)
  • NP head or modifier: distinguishes types of
    research from research things
  • Hierarchical phrase analysis (Woods 1997)
  • Head/modifier again: "car washing" goes under
    "washing", not "car"
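A rough sketch of two of these ideas, under simplifying assumptions. The regular expression approximates Hearst-style "such as" / "especially" patterns on raw text, taking only the single word before the cue phrase as the broader term (a real system would match full noun phrases from a parser). `head_parent` assumes English noun compounds are head-final, so "car washing" is filed under "washing". Both function names are hypothetical.

```python
import re

# Hearst-style cue patterns: "<broader> such as <list>" and
# "<broader>, especially <list>". Simplification: <broader> is one word.
PATTERN = re.compile(r"(\w+),? (?:such as|especially) ([\w ,]+)")
# Separators inside the list: "a, b and c", "a, b, or c", etc.
LIST_SEP = re.compile(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")

def hearst_pairs(sentence):
    """Return (broader, narrower) term pairs matched in one sentence."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        broader, tail = match.group(1), match.group(2)
        for item in LIST_SEP.split(tail):
            if item.strip():
                pairs.append((broader, item.strip()))
    return pairs

def head_parent(phrase):
    """Assume the last word of a noun compound is its head, so the
    compound is placed under the head ("car washing" -> "washing")."""
    words = phrase.split()
    return words[-1] if len(words) > 1 else None

print(hearst_pairs("natural disasters such as earthquakes, floods and eruptions."))
print(head_parent("car washing"))
```

Note that the first example yields "disasters" rather than "natural disasters" as the broader term, which is exactly the kind of error a noun-phrase chunker would fix.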

7
WordNet (aside)
  • 1 sense of earthquake, sense 1
  • earthquake, quake, temblor, seism -- (shaking and
    vibration at the surface of the earth resulting
    from underground movement along a fault plane or
    from volcanic activity)
  • geological phenomenon -- (a natural phenomenon
    involving the structure or composition of the
    earth)
  • natural phenomenon, nature -- (all non-artificial
    phenomena)
  • phenomenon -- (any state or process known through
    the senses rather than by intuition or reasoning)

8
WordNet (aside)
  • 5 senses of eruption, sense 1
  • volcanic eruption, eruption -- (the sudden
    occurrence of a violent discharge of steam and
    volcanic material)
  • discharge -- (the sudden giving off of energy)
  • happening, occurrence, natural event -- (an event
    that happens)
  • event -- (something that happens at a given place
    and time)
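The point of these two asides can be checked mechanically. The sketch below does not query WordNet itself; it hard-codes only the hypernym links quoted on the two slides (sense 1 of each word) and walks each chain upward. Within these excerpts the two chains never meet, which illustrates why WordNet is not an obvious way to group earthquake with eruption. `HYPERNYM` and `ancestors` are hypothetical names.

```python
# Hypernym links copied from the WordNet output on the slides
# (sense 1 of "earthquake" and sense 1 of "eruption" only).
HYPERNYM = {
    "earthquake": "geological phenomenon",
    "geological phenomenon": "natural phenomenon",
    "natural phenomenon": "phenomenon",
    "eruption": "discharge",
    "discharge": "happening",
    "happening": "event",
}

def ancestors(term):
    """Return the chain of hypernyms above `term`, nearest first."""
    chain = []
    while term in HYPERNYM:
        term = HYPERNYM[term]
        chain.append(term)
    return chain

quake = ancestors("earthquake")
erupt = ancestors("eruption")
print(quake)
print(erupt)
print(set(quake) & set(erupt))  # shared ancestors within these excerpts
```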

9
Start with something simpler?
  • Term clustering?
  • simple monothetic clusters
  • No ordering.

10
Use subsumption
  • Initially using subsumption.
  • Finds related terms
  • Decides which is more general, which is more
    specific (idf?)
  • Strict interpretation
  • X subsumes Y iff $P(x \mid y) = 1$ and $P(y \mid x) < 1$
  • In practice
  • X subsumes Y iff $P(x \mid y) > 0.8$ and $P(y \mid x) < 1$
  • or: $P(x \mid y) > 0.8$ and $P(y \mid x) < P(x \mid y)$
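A minimal sketch of the practical subsumption test. It assumes (the slide does not spell this out) that the probabilities are estimated from document co-occurrence: $P(x \mid y)$ is the fraction of documents containing y that also contain x. The 0.8 threshold and the two variants of the second condition come from the slide; `cond_prob`, `subsumes` and the toy index are hypothetical.

```python
def cond_prob(index, x, y):
    """Estimate P(x | y) as the share of documents containing y that also
    contain x (assumption: document co-occurrence estimates)."""
    docs_y = index.get(y, set())
    if not docs_y:
        return 0.0
    return len(index.get(x, set()) & docs_y) / len(docs_y)

def subsumes(index, x, y, threshold=0.8, relaxed=False):
    """True if term x subsumes (is more general than) term y.

    Default:       P(x|y) > threshold and P(y|x) < 1
    relaxed=True:  P(x|y) > threshold and P(y|x) < P(x|y)
    """
    p_xy = cond_prob(index, x, y)
    p_yx = cond_prob(index, y, x)
    if p_xy <= threshold:
        return False
    return p_yx < p_xy if relaxed else p_yx < 1.0

# Hypothetical inverted index: term -> ids of documents containing it.
index = {
    "polio": {1, 2, 3, 4, 5},
    "post-polio syndrome": {2, 3},
    "vaccine": {1, 4, 6},
}
print(subsumes(index, "polio", "post-polio syndrome"))
print(subsumes(index, "post-polio syndrome", "polio"))
```

Here every document mentioning "post-polio syndrome" also mentions "polio" ($P = 1.0$), but only 2 of 5 "polio" documents mention the syndrome ($P = 0.4$), so "polio" is treated as the more general term.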

11
How to build a hierarchy
  • X subsumes Y
  • X subsumes Z
  • X subsumes M
  • X subsumes N
  • Y subsumes Z
  • A subsumes B
  • A subsumes Z
  • B subsumes Z

[Figure: hierarchy built from these pairs, over the terms X, A, Y, M, N, B and Z]
Really it's a DAG
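The slide lists pairwise relations but not the exact rule for turning them into the pictured structure. One plausible reading, sketched here as an assumption, is to drop every link already implied by a longer path (a transitive reduction), so each term hangs only under its most specific subsumers. For the eight pairs above this removes X-Z and A-Z and leaves Z with two parents, Y and B, which is why the result is a DAG rather than a tree. `reachable` and `build_hierarchy` are hypothetical names.

```python
from collections import defaultdict

def reachable(children, start, target):
    """Depth-first search: is `target` reachable from `start`?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return False

def build_hierarchy(pairs):
    """Keep (parent, child) only if the child is not also reachable through
    another child of the same parent. Assumes the pairs contain no cycles."""
    children = defaultdict(set)
    for parent, child in pairs:
        children[parent].add(child)
    kept = set()
    for parent, child in pairs:
        others = children[parent] - {child}
        if not any(reachable(children, other, child) for other in others):
            kept.add((parent, child))
    return kept

# The subsumption pairs from the slide, as (general, specific).
pairs = [("X", "Y"), ("X", "Z"), ("X", "M"), ("X", "N"),
         ("Y", "Z"), ("A", "B"), ("A", "Z"), ("B", "Z")]
for edge in sorted(build_hierarchy(pairs)):
    print(edge)
```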
12
How to display it?
  • DAGs were big
  • Unlikely to get all on screen
  • Only want to see the current focus plus the route
    taken to get there?
  • Use a method users are familiar with
  • Hierarchical menus

[Figure: the hierarchy as hierarchical menus over X, A, Y, M, N, B and Z, with Z appearing twice]
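Hierarchical menus are trees, so a term with two parents has to be listed once under each, which is presumably why Z appears twice in the menu figure. A minimal sketch of that unfolding, assuming the hierarchy is stored as a parent-to-children mapping and that it matches the example on the "How to build a hierarchy" slide with the redundant X-Z and A-Z links dropped. `HIERARCHY`, `roots` and `menu_lines` are hypothetical.

```python
# Hypothetical DAG: parent -> children. Z has two parents, Y and B.
HIERARCHY = {
    "X": ["Y", "M", "N"],
    "Y": ["Z"],
    "A": ["B"],
    "B": ["Z"],
}

def roots(hierarchy):
    """Terms that never appear as a child become top-level menu entries."""
    children = {c for kids in hierarchy.values() for c in kids}
    return sorted(p for p in hierarchy if p not in children)

def menu_lines(hierarchy, node, depth=0):
    """Unfold the DAG below `node` into indented menu lines.

    A node reachable by several routes is emitted once per route, so the
    menu is a tree even though the underlying structure is not.
    """
    lines = ["  " * depth + node]
    for child in hierarchy.get(node, []):
        lines.extend(menu_lines(hierarchy, child, depth + 1))
    return lines

menu = [line for root in roots(HIERARCHY) for line in menu_lines(HIERARCHY, root)]
print("\n".join(menu))
```

Duplicating shared nodes keeps the familiar menu interaction at the cost of some repetition, which matters little when only the current focus and its route are on screen.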
13
What about ambiguity?
  • Monothetic clusters of ambiguous terms?
  • Derive hierarchy from retrieved documents
  • Take a query and retrieve on it,
  • take top 500 documents,
  • build hierarchy from them.
  • Topics/concepts are words/phrases taken from
  • Query
  • Retrieved documents
  • Comparison of frequencies
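The "comparison of frequencies" step can be sketched as follows, assuming (not spelled out on the slide) that a candidate term is kept when it occurs in a much larger share of the retrieved documents than of the whole collection. The ratio test, `min_ratio=2.0`, `min_df=2`, `select_terms` and the counts in the example are all hypothetical choices for illustration; query terms would be added to the candidate set separately.

```python
def select_terms(retrieved_df, n_retrieved, collection_df, n_collection,
                 min_ratio=2.0, min_df=2):
    """Keep terms relatively more frequent in the retrieved set than overall.

    retrieved_df / collection_df map term -> number of documents containing
    it. min_ratio and min_df are hypothetical thresholds.
    """
    selected = []
    for term, df in retrieved_df.items():
        if df < min_df or collection_df.get(term, 0) == 0:
            continue
        share_retrieved = df / n_retrieved
        share_collection = collection_df[term] / n_collection
        if share_retrieved / share_collection >= min_ratio:
            selected.append(term)
    return sorted(selected)

# Hypothetical counts: top 500 retrieved documents, 500,000 in the collection.
retrieved_df = {"polio": 450, "vaccine": 120, "said": 400}
collection_df = {"polio": 900, "vaccine": 3000, "said": 350000}
selected = select_terms(retrieved_df, 500, collection_df, 500000)
print(selected)
```

A ratio rather than a raw count filters out words such as "said" that are frequent in the retrieved set only because they are frequent everywhere.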

14-34
Poliomyelitis and Post-Polio (TREC topic 302)
35
Did you guess the paper?
  • A bit like Peter Anick's work?

36
Experiment
  • Test properties of hierarchy
  • Does it mimic (in some way) Yahoo-like
    categories?
  • Parent related to child?
  • Parent more general than child?

37
Experimental set-up
  • Gathered eight subjects
  • Presented subsumption categories and random
    categories
  • Asked whether each parent/child pair was interesting
  • If yes, asked what type of relationship it was
    (types taken roughly from WordNet):
  • Aspect of
  • Type of
  • Same as
  • Opposite of
  • Don't know

38
Results
  • Is the parent/child pairing interesting or
    not?
  • Random: 51%
  • Subsumption: 67%
  • Difference significant (t-test, p < 0.002)
  • If interesting, what type is the parent/child relationship?

Odd?
39
Yahoo categories?
40
Results and conclusions
  • Interesting AND (aspect of OR type of)
  • Random: $28\% \approx 51\% \times (47\% + 8\%)$
  • Subsumption: $48\% \approx 67\% \times (49\% + 23\%)$
  • It appears that subsumption, combined with an ordering
    based on document frequency, does a reasonable job.
  • For related work on term frequency and specificity, see:
  • Sparck Jones, K. (1972) A statistical
    interpretation of term specificity and its
    application in retrieval. Journal of
    Documentation, 28(1), 11-21.
  • Caraballo, S.A., Charniak, E. (1999) Determining
    the specificity of nouns from text. In
    Proceedings of the Conference on Empirical
    Methods in Natural Language Processing (EMNLP).

41
Future work?
  • More user studies.
  • Incorporate other term relationship techniques
  • Other visualisations
  • Application of techniques to whole document
    collections.
  • Presentation of Cross Language IR results?