A Hybrid Approach to Faceted Classification Based on Analysis of Descriptor Suffixes

1
A Hybrid Approach to Faceted Classification
Based on Analysis of Descriptor Suffixes
  • Aaron Loehrlein, Dr. Elin K. Jacob,
  • Dr. Kiduk Yang, Seungmin Lee, Ning Yu

Classification-based Search and Knowledge Discovery Research Group
Indiana University Bloomington
ASIST, Charlotte, NC, November 1, 2005
2
Objective
  • Develop heuristics for a hybrid approach to
    faceted classification
    • Part manual
    • Part automated
  • Combine the best aspects of both approaches
  • Reduce the man-hours required to create a
    classification scheme

3
Faceted classification
  • Identify aspects of the domain.
  • Each aspect consists of a separate hierarchy of
    categories.
    • For example, aspects of the domain "automobiles"
      might include model, year, and color.
  • Classes are created by combining values from
    multiple aspects.
    • For example, one class might be "Camry, 1996, tan".
  • (Ranganathan, 1944, 1945; Jacob & Priss, 2001;
    Priss & Jacob, 1998)
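
As a minimal sketch of the idea on this slide, the automobile example can be modelled as a set of independent facets whose values combine into classes. The facet values other than Camry, 1996, and tan are invented for illustration, and each facet is simplified to a flat list rather than a hierarchy.

```python
from itertools import product

# Each facet is an independent aspect of the domain "automobiles".
# Values other than "Camry", 1996, and "tan" are made up for illustration.
facets = {
    "model": ["Camry", "Civic"],
    "year": [1996, 1997],
    "color": ["tan", "blue"],
}

def all_classes(facets):
    """Return every class formed by picking one value from each facet."""
    names = list(facets)
    return [dict(zip(names, combo)) for combo in product(*facets.values())]

classes = all_classes(facets)
print(len(classes))   # 2 * 2 * 2 = 8
print(classes[0])     # {'model': 'Camry', 'year': 1996, 'color': 'tan'}
```

Because the facets are kept separate, adding a value to one facet does not require touching the others.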

4
Suffixes
  • Automated approach organizes terms according to
    their suffixes.

5
Existing research involving suffixes
  • Suffix stemming in order to conflate terms
    (Harman, 1991; Savoy, 1993)
  • Identification of syntagmatic relationships
    (Okada, Ando, Lee, Hayashi, and Aoe, 2001)
  • Utilizing suffix trees to store strings more
    efficiently (Stephen, 1994; Zamir and Etzioni, 1999;
    Park, Chu, Yoon, and Won, 2003)
  • Analyzing relationships between morphological
    similarity and semantic similarity (Bybee, 1988)

6
Process
  • Generate a list of terms to be organized.
  • Generate a list of suffixes (and other
    ending-strings).
  • Assign meanings to each suffix.
  • Organize the suffix meanings.
  • Organize the terms according to their suffixes.
  • Manually revise and complete the faceted
    classification.

7
Step 1 of 6: Generate a list of terms to be organized
Compiled two lexicon bases, each representing the
intellectual content of the domain
8
Step 1 of 6: Generate a list of terms to be organized
  • Small Lexicon Base examples

9
Step 1 of 6: Generate a list of terms to be organized
  • Large Lexicon Base example
  • Sample Document
    • BRAC 1995 Quick Reference Guide Community and
      Environmental Activities
  • Author-extracted keywords
    • Community
    • Environmental Activities
    • Base Realignment
    • (etc.)

10
Step 2 of 6: Generate a list of suffixes (and other ending-strings)
  • Types of ending strings
  • Suffixes (1571 from Merriam-Webster online)
  • Pseudo-suffixes
  • Final cluster phonesthemes (174)
  • Any other ending-string (81)
  • The vast majority of the work was done by the
    suffixes from Merriam-Webster

11
Step 2 of 6: Generate a list of suffixes (and other ending-strings)
12
Step 3 of 6: Assign meanings to each suffix
The unit of analysis is the suffix/meaning pair
Entry: -ment / state, quality, or condition
13
Step 3 of 6: Assign meanings to each suffix
  • Manually standardize certain meanings
    • -al: of, relating to, or characterized by
    • -ative: of, relating to, or connected with
  • becomes
    • -al: characteristics
    • -ative: characteristics
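
The standardization above can be sketched as a lookup table that records the manual decisions. This is only an illustration under stated assumptions: the suffixes and meanings are the ones quoted on the slides, while the names `raw_meanings`, `standard_labels`, and `standardize` are hypothetical and not taken from the presentation.

```python
# Dictionary-style meanings keyed by suffix, as quoted on the slides.
raw_meanings = {
    "-al": "of, relating to, or characterized by",
    "-ative": "of, relating to, or connected with",
    "-ment": "state, quality, or condition",
}

# Hand-built table mapping near-synonymous definitions to one standard label.
# The step itself is manual; this table only records the resulting decisions.
standard_labels = {
    "of, relating to, or characterized by": "characteristics",
    "of, relating to, or connected with": "characteristics",
}

def standardize(raw, labels):
    """Return (suffix, meaning) pairs, using the standard label where one exists."""
    return [(suffix, labels.get(meaning, meaning)) for suffix, meaning in raw.items()]

pairs = standardize(raw_meanings, standard_labels)
for suffix, meaning in pairs:
    print(suffix, "/", meaning)
```

Meanings without an entry in the table, such as the one for -ment, pass through unchanged.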

14
Step 4 of 6: Organize the suffix meanings
15
Step 5 of 6: Organize the terms according to their suffixes
Manually validate each grouping
Threshold: 50
16
Step 5 of 6: Organize the terms according to their suffixes
  • The same suffix/meaning pairs were valid for both
    lexicon bases (mostly).
  • Once suffix/meaning pairs are selected, organize
    the terms again.
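
One way the automated grouping in this step might look is sketched below. The longest-match rule, the function names, and the term "Environmental" are assumptions made for the example; the slides do not say how competing suffixes are resolved.

```python
from collections import defaultdict

# Suffix/meaning pairs; "-ment" and "-al" appear in the slides.
# The longest-match rule below is an assumption made for this sketch.
suffix_meanings = {
    "ment": "state, quality, or condition",
    "al": "characteristics",
}

def find_suffix(term, suffixes):
    """Return the longest suffix that ends the term, or None if none matches."""
    lowered = term.lower()
    matches = [s for s in suffixes if lowered.endswith(s) and len(lowered) > len(s)]
    return max(matches, key=len) if matches else None

def group_terms(terms, suffix_meanings):
    """Group terms under the meaning of their matched suffix."""
    groups = defaultdict(list)
    for term in terms:
        suffix = find_suffix(term, suffix_meanings)
        if suffix is not None:
            groups[suffix_meanings[suffix]].append(term)
    return dict(groups)

terms = ["Measurement", "Treatment", "Environmental", "Washing"]
groups = group_terms(terms, suffix_meanings)
for meaning, members in groups.items():
    print(meaning, "->", members)
```

Terms with no matching suffix, such as "Washing" here, are left out of the automatic output and would have to be placed by hand.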

17
Step 6 of 6: Manually revise and complete the faceted classification
18
Step 6 of 6: Manually revise and complete the faceted classification
  • Manual grouping
    • Disinfection
    • Pretreatment
    • Washing
  • Automatic output
    • Measurement
    • Reimbursement
    • Requirement
    • Treatment

19
Results
20
Large Lexicon Base - distribution of terms
21
Large Lexicon Base
  • Hypothesis 1
    • Results will improve if we filter out of the
      lexicon base all terms that appear only once.
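
A minimal sketch of the filter in Hypothesis 1, assuming the lexicon base is available as a flat list of term occurrences. The sample data and the function name `drop_singletons` are hypothetical.

```python
from collections import Counter

def drop_singletons(term_occurrences):
    """Keep only terms that occur more than once in the lexicon base."""
    counts = Counter(term_occurrences)
    return {term: n for term, n in counts.items() if n > 1}

# Made-up occurrences; "washing" appears once and is filtered out.
occurrences = ["treatment", "community", "treatment",
               "washing", "community", "treatment"]
kept = drop_singletons(occurrences)
print(kept)  # {'treatment': 3, 'community': 2}
```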

22
Large Lexicon Base - multiple terms only
23
Large Lexicon Base
  • Hypothesis 2
    • Results will improve if we take every group with
      more than 30 terms and truncate it to its 30 most
      frequent terms.
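
The truncation in Hypothesis 2 could be sketched as follows, assuming each group is stored as a mapping from term to frequency. The function name and the sample group are hypothetical, and the slides do not say how ties are broken; here tied terms keep their original order.

```python
def truncate_group(term_counts, limit=30):
    """Return the `limit` most frequent terms; smaller groups are unchanged."""
    if len(term_counts) <= limit:
        return dict(term_counts)
    # sorted() is stable, so terms with equal counts keep their original order.
    ranked = sorted(term_counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:limit])

# Hypothetical group of 40 terms whose frequencies are 1..40.
group = {f"term{i}": i for i in range(1, 41)}
trimmed = truncate_group(group)
print(len(trimmed))            # 30
print(min(trimmed.values()))   # 11
```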

25
Large Lexicon Base - truncating big groups down to 30
26
Small Lexicon Base - distribution of terms
27
Small Lexicon Base - multiple terms only
28
Many groups were retained
29
Conclusions and future work
  • Conclusions
    • Suffixes alone are not adequate.
    • However, they are useful, particularly for
      faceted classification.
    • When combined with other approaches, the suffix
      heuristic may become more acceptable.
  • Future work
    • Combination with other methods
    • Implementation in other domains, including
      physics, economics, and biology

30
References
  • Bybee, J.L. (1988). Morphology as lexical
    organization. In Theoretical morphology:
    Approaches in modern linguistics (pp. 119-142).
    San Diego: Academic Press, Inc.
  • Harman, D. (1991). How effective is suffixing?
    Journal of the American Society for Information
    Science, 42(1), 7-15.
  • Jacob, E.K., and Priss, U. (2001).
    Non-traditional indexing structures for the
    management of electronic resources. In Advances
    in classification research, vol. 10 (pp. 73-90).
    Medford, NJ: Information Today for the American
    Society for Information Science.
  • Okada, M., Ando, K., Lee, S.S., Hayashi, Y., and
    Aoe, J. (2001). An efficient substring search
    method by using delayed keyword extraction.
    Information Processing & Management, 37, 741-761.
  • Priss, U., and Jacob, E.K. (1998). A graphical
    interface for faceted thesaurus design. In
    Proceedings of the 9th ASIS SIG/CR Classification
    Research Workshop (Pittsburgh, PA, October 25,
    1998) (pp. 107-118). Silver Spring, MD: American
    Society for Information Science.
  • Ranganathan, S.R. (1944). Library classification:
    Fundamentals and procedures with 1008 graded
    examples and exercises. Madras: Madras Library
    Association.
  • Ranganathan, S.R. (1945). Elements of library
    classification: Based on lectures delivered at
    the University of Bombay in December 1944.
    Poona: N.K. Publishing House.
  • Savoy, J. (1993). Stemming of French words based
    on grammatical categories. Journal of the
    American Society for Information Science, 44(1),
    1-9.
  • Shisler, B.K. (1997). The Dictionary of English
    Phonaesthemes. Retrieved June 28, 2005 from
    http://www.geocities.com/SoHo/Studios/9783/phonpap2.html#glossary
  • Stephen, G.A. (1994). String Searching
    Algorithms. New Jersey: World Scientific.

31
Thank you
  • URL
    • http://ella.slis.indiana.edu/aloehrle/cskdASIST2005.ppt
  • Contact Info
    • Aaron Loehrlein
    • aloehrle_at_indiana.edu
    • Indiana University Bloomington