Compact Encodings for All Local Path Information in Web Taxonomies with Application to WordNet

1
Compact Encodings for All Local Path Information
in Web Taxonomies with Application to WordNet
  • Svetlana Strunjas-Yoshikawa
  • Joint with Fred Annexstein and Kenneth Berman
  • {strunjs, annexste, berman}@ececs.uc.edu
  • University of Cincinnati

2
Introduction
  • Consider the Lowest Common Ancestor Query Problem:
  • Find the most specific common generalization (or
    least common subsumer) among 2 or more terms or
    attributes in large hierarchical/classification
    data sets
  • Constraint: Evaluate queries without indirection
  • Goal: Compact labeling schemes for taxonomies

3
Introduction (contd)
  • Applications:
  • Fast classification of sets and similarity, e.g.
    predicting sets as Google Sets does (given
    "Bush" and "Clinton" it predicts all other US
    presidents)
  • Fast answers to ancestor queries in XML search,
    e.g., test if 2 terms share a parent node without
    loading the XML file (see [1, 2])
  • Fast navigation through voluminous web taxonomies
    (see [3])
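
As a rough illustration of answering such queries from labels alone, the sketch below assumes each node carries a prefix-based (Dewey-style) label, written here as a tuple of edge labels from the root. The names `common_prefix` and `share_parent` and the sample labels are hypothetical, not taken from the slides, and only the tree case is covered.

```python
from typing import Tuple

# A label is the sequence of edge labels on the path from the root.
Label = Tuple[str, ...]


def common_prefix(a: Label, b: Label) -> Label:
    """Longest shared prefix of two labels: the label of their tree LCA."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return tuple(out)


def share_parent(a: Label, b: Label) -> bool:
    """True if two distinct non-root nodes have the same tree parent."""
    return bool(a) and bool(b) and a != b and a[:-1] == b[:-1]


# Hypothetical labels for three nodes of a small tree.
bush = ("0", "1", "0")
clinton = ("0", "1", "1")
other = ("1", "0")

print(common_prefix(bush, clinton))  # ('0', '1')
print(share_parent(bush, clinton))   # True
print(share_parent(bush, other))     # False
```

Both checks read only the two labels, so no lookup into the taxonomy file is needed.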

4
Data Model
  • Structural properties found in well-known web
    taxonomies
  • large variance in out-degree (?), i.e., some nodes
    have many subclasses
  • small in-degree (d) range and variance
  • small depth (s) (logarithmic)
  • small number (> 1) of paths from root
  • See paper for table of statistical values for
    Wordnet, ODP, and Math taxonomies

5
Our Approach
  • Given: large, rooted web taxonomies represented
    abstractly as a Directed Acyclic Graph (DAG) with
    the above statistics
  • Problem: Label each node of the DAG so that all
    local path information for each taxonomy element
    is preserved in the encoding
  • Our labeling scheme is a variable-length,
    prefix-based scheme, built up in two stages

6
Our Approach (contd)
  • 1. Greedy Dewey Labeling for Trees (TGDL)
  • - Identifies a Breadth-First tree T in a DAG
  • - Encodes path information for the paths in T
  • - Labels nodes with the concatenation of edge labels
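
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not say how children are ranked, so the "greedy" step here (larger subtrees get shorter edge labels) is an assumption, as is the edge-label sequence 0, 1, 00, 01, ... The function names and the toy DAG are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def gdl_edge_label(rank: int) -> str:
    """rank-th shortest bit string: 1 -> '0', 2 -> '1', 3 -> '00', 4 -> '01'."""
    return bin(rank + 1)[3:]  # binary of rank+1 without '0b' and the leading 1


def tgdl(dag: Dict[str, List[str]], root: str) -> Dict[str, Tuple[str, ...]]:
    # 1. Breadth-first tree: the first parent to reach a node keeps it.
    children: Dict[str, List[str]] = {root: []}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in dag.get(u, []):
            if v not in children:
                children[v] = []
                children[u].append(v)
                order.append(v)
                queue.append(v)
    # 2. Subtree sizes, bottom-up over the reversed BFS order.
    size = {v: 1 for v in order}
    for v in reversed(order):
        for c in children[v]:
            size[v] += size[c]
    # 3. Assumed greedy step: larger subtrees get shorter edge labels.
    #    A node label is the concatenation of edge labels from the root.
    labels: Dict[str, Tuple[str, ...]] = {root: ()}
    for u in order:
        ranked = sorted(children[u], key=lambda c: -size[c])
        for rank, c in enumerate(ranked, start=1):
            labels[c] = labels[u] + (gdl_edge_label(rank),)
    return labels


# Toy DAG: "puppy" has two parents, so one edge falls outside the BFS tree.
dag = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat", "pet"],
    "plant": ["tree"],
    "dog": ["puppy"],
    "pet": ["puppy"],
}
labels = tgdl(dag, "entity")
for node, label in labels.items():
    print(node, "".join("." + e for e in label))
```

Labels are kept as tuples of edge labels here; turning them into a single bit string requires a delimiting scheme for the edge labels.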

7
GDL example

8
TGDL example

9
Analysis of the Length for TGDL Labels
  • Performed in 2 steps:
  • First step: assume that delimiting labels are
    empty -- each node v is labeled with
    bits at most
  • Second step: using different edge delimiting
    schemes, estimate an upper bound on node label length

10
Delimiting schemes
  • They encode length of each tree-edge label
  • Two approaches tested
  • Unary Length Encoding
  • Fixed Binary Length Encoding

11
Unary Length Encoding (ULE)
  • Comparable to Elias Gamma Code
  •  n    Gamma      ULE
     1    1          10
     2    010        11
     3    011        0100
     4    00100      0101
     5    00101      0110
     6    00110      0111
     7    00111      001000
     8    0001000    001001
  • ULE prepends a zero prefix of e-1 bits to the label
    of an edge whose GDL label has length e
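
The comparison with the Elias gamma code can be checked with a short sketch. It rests on one reading of the codes listed on this slide, which is an assumption: the n-th GDL edge label is the n-th shortest bit string (0, 1, 00, 01, ...), and the zero prefix is followed by a single 1 that terminates the unary length before the label itself. The function names are hypothetical.

```python
from typing import List


def gdl_edge_label(rank: int) -> str:
    """Assumed GDL edge labels: 1 -> '0', 2 -> '1', 3 -> '00', 4 -> '01'."""
    return bin(rank + 1)[3:]


def elias_gamma(n: int) -> str:
    """Standard Elias gamma code of a positive integer."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b


def ule_encode(label: str) -> str:
    """e-1 zeros, a terminating 1 (assumed), then the e-bit label."""
    return "0" * (len(label) - 1) + "1" + label


def ule_decode(bits: str) -> List[str]:
    """Split a concatenation of ULE-encoded edge labels back into labels."""
    labels, pos = [], 0
    while pos < len(bits):
        zeros = 0
        while bits[pos + zeros] == "0":
            zeros += 1
        start = pos + zeros + 1   # skip the zero prefix and the 1
        length = zeros + 1        # label length e
        labels.append(bits[start:start + length])
        pos = start + length
    return labels


for n in range(1, 9):
    print(n, elias_gamma(n), ule_encode(gdl_edge_label(n)))

print(ule_decode(ule_encode("00") + ule_encode("0")))  # ['00', '0']
```

Under this reading an edge label of length e costs 2e bits in total, and the ULE code of the n-th label equals the gamma code of n+1 with one leading zero removed.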

12
Unary Length Encoding (ULE) Analysis
  • Theorem
  • Upper bound on TGDL label length with
  • ULE of delimiters is
  • bits, for an arbitrary node v in a tree T
  • - is the depth of v in T
  • - n is the number of nodes in T

13
Fixed Binary Length Encoding (FBLE)
  • For an edge e, this encoding is the binary
    representation of the length for GDL(e)
  • Encoded with a fixed number of bits
  • - is the maximum node out-degree in T
  • - uses 4 bits in our application

14
FBLE example
  • - 4 bits will encode delimiters for any T with
    maximum out-degree < 2^16
  • - Let e be an edge in T with a given GDL
    label, e.g. GDL(e) = 0000111111
  • Then FBLE produces the delimiter 1010,
    so the label for e is 10100000111111
15
Fixed Binary Length Encoding (FBLE) Analysis
  • Upper bound on TGDL label length with FBLE of
    delimiters is
  • bits, for an arbitrary node v in a tree T

16
Our Approach (contd2)
  • 2. Extended Greedy Dewey Labeling for DAGs (EGDL)
  • - Augments codes generated from step 1
  • - Used for inferring paths not part of the
    Breadth-First tree
  • - Adds TGDL node label pairs of non-tree edges
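
One way to picture the augmentation is sketched below. It assumes TGDL labels are already available (the dictionary here is a hypothetical toy example) and records each non-tree edge as a pair of TGDL labels. How the pairs are attached to individual node codes is not specified in the slides, so this sketch simply keeps them in a list; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Label = Tuple[str, ...]

# Hypothetical DAG and its TGDL labels (tuples of edge labels from the root).
dag: Dict[str, List[str]] = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat", "pet"],
    "plant": ["tree"],
    "dog": ["puppy"],
    "pet": ["puppy"],
}
tgdl_labels: Dict[str, Label] = {
    "entity": (),
    "animal": ("0",),
    "plant": ("1",),
    "dog": ("0", "0"),
    "cat": ("0", "1"),
    "pet": ("0", "00"),
    "tree": ("1", "0"),
    "puppy": ("0", "0", "0"),
}


def non_tree_pairs(dag: Dict[str, List[str]],
                   labels: Dict[str, Label]) -> List[Tuple[Label, Label]]:
    """Label pairs (parent, child) for every edge outside the labeled tree."""
    pairs = []
    for u, succs in dag.items():
        for v in succs:
            # A tree edge extends the parent's label by exactly one edge label.
            if labels[v][:-1] != labels[u]:
                pairs.append((labels[u], labels[v]))
    return pairs


def parents(label: Label, pairs: List[Tuple[Label, Label]]) -> List[Label]:
    """Tree parent (label minus its last edge) plus any non-tree parents."""
    found = [label[:-1]] if label else []
    found += [src for src, dst in pairs if dst == label]
    return found


def dotted(label: Label) -> str:
    return "".join("." + e for e in label)


pairs = non_tree_pairs(dag, tgdl_labels)
print([(dotted(a), dotted(b)) for a, b in pairs])   # [('.0.00', '.0.0.0')]
print([dotted(p) for p in parents(tgdl_labels["puppy"], pairs)])
```

With the pairs available, paths that leave the Breadth-First tree can be followed by switching from the child's label to the label of its non-tree parent.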

17
EGDL Labeling - Example
  • Example node labels from the figure: .01.0.01, .01.0.0, .0.01.0.01

18
Experimental Results for Wordnet taxonomy (n 80K)

19
Experimental Results-Label Lengths
  • Encoding Length
  • Wordnet 2.1 Statistics

20
References
  • [1] Budanitsky, A., Hirst, G. Semantic distance in WordNet: An
    experimental, application-oriented evaluation of five measures.
    Workshop on WordNet and Other Lexical Resources, Second Meeting of
    the North American Chapter of the Association for Computational
    Linguistics, Pittsburgh, PA, 2001.
  • [2] Resnik, P. Using Information Content to Evaluate Semantic
    Similarity in a Taxonomy. In Proceedings of the 14th International
    Joint Conference on Artificial Intelligence (IJCAI), pages 448-453,
    1995.
  • [3] Christophides, V., Plexousakis, D. On Labeling Schemes for the
    Semantic Web. In Proceedings of the 12th International Conference
    on World Wide Web, pages 544-555, Budapest, Hungary, 2003.
  • [4] Abiteboul, S., Kaplan, H., Milo, T. Compact labeling schemes for
    ancestor queries. In Proceedings of the Twelfth Annual ACM-SIAM
    Symposium on Discrete Algorithms, pages 547-556, Washington, D.C.,
    2001.
  • [5] Strunjas-Yoshikawa, S., Annexstein, F., Berman, K. Compact
    Encodings for All Local Path Information in Web Taxonomies with
    Application to WordNet. In Proceedings of the 32nd International
    Conference on Current Trends in Theory and Practice of Computer
    Science, Merin, Czech Republic, January 21-27, 2006.