Title: A Hybrid Approach to Faceted Classification Based on Analysis of Descriptor Suffixes
1 A Hybrid Approach to Faceted Classification Based on Analysis of Descriptor Suffixes
- Aaron Loehrlein, Dr. Elin K. Jacob, Dr. Kiduk Yang, Seungmin Lee, Ning Yu
Classification-based Search and Knowledge Discovery Research Group, Indiana University Bloomington
ASIST, Charlotte, NC, November 1, 2005
2 Objective
- Develop heuristics for a hybrid approach to faceted classification
  - Part manual
  - Part automated
- Combine the best aspects of both approaches
- Reduce the man-hours required to create a classification scheme
3 Faceted classification
- Identify aspects of the domain.
- Each aspect consists of a separate hierarchy of categories.
  - For example, aspects of the domain "automobiles" might include model, year, and color.
- Classes are created by combining values from multiple aspects.
  - For example, one class might be "Camry, 1996, tan".
- (Ranganathan, 1944, 1945; Jacob and Priss, 2001; Priss and Jacob, 1998)
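The automobile example can be sketched in a few lines of Python. This is only an illustration: the facet values other than "Camry", 1996, and "tan" are made up, and the function name `enumerate_classes` is hypothetical. It shows that a class is simply one value chosen from each facet.

```python
from itertools import product

# Hypothetical facets for the "automobiles" domain.
# Only "Camry", 1996, and "tan" come from the slide; the rest are illustrative.
facets = {
    "model": ["Camry", "Civic"],
    "year": [1996, 1997],
    "color": ["tan", "blue"],
}

def enumerate_classes(facets):
    """Return every class formed by picking one value from each facet."""
    names = list(facets)
    value_lists = [facets[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

classes = enumerate_classes(facets)
```

In practice a faceted scheme would not enumerate every combination up front; the point is only that each class is addressable by its facet values.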
4 Suffixes
- The automated approach organizes terms according to their suffixes.
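A minimal sketch of organizing terms by suffix follows. The slides do not say how a term that matches several ending-strings is resolved, so the longest-match rule used here is an assumption, as are the function name `group_by_suffix` and the short suffix list.

```python
from collections import defaultdict

def group_by_suffix(terms, suffixes):
    """Group terms under the longest suffix each one ends with (assumed rule)."""
    # Try longer suffixes first so "-tion" wins over "-ion".
    ordered = sorted(suffixes, key=len, reverse=True)
    groups = defaultdict(list)
    for term in terms:
        lowered = term.lower()
        for suffix in ordered:
            # Require a non-empty stem so a term never equals its own suffix.
            if lowered.endswith(suffix) and len(lowered) > len(suffix):
                groups[suffix].append(term)
                break
    return dict(groups)

# Illustrative input; the suffix list is a tiny hypothetical sample.
terms = ["Measurement", "Treatment", "Washing", "Disinfection"]
suffixes = ["ment", "ing", "ion", "tion"]
groups = group_by_suffix(terms, suffixes)
```

Terms that match no suffix are simply left out of the result, which mirrors the need for a later manual pass.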
5 Existing research involving suffixes
- Suffix stemming in order to conflate terms (Harman, 1991; Savoy, 1993)
- Identification of syntagmatic relationships (Okada, Ando, Lee, Hayashi, and Aoe, 2001)
- Utilizing suffix trees to store strings more efficiently (Stephen, 1994; Zamir and Etzioni, 1999; Park, Chu, Yoon, and Won, 2003)
- Analyzing relationships between morphological similarity and semantic similarity (Bybee, 1988)
6 Process
- Generate a list of terms to be organized.
- Generate a list of suffixes (and other ending-strings).
- Assign meanings to each suffix.
- Organize the suffix meanings.
- Organize the terms according to their suffixes.
- Manually revise and complete the faceted classification.
7 Step 1 of 6: Generate a list of terms to be organized
Compiled two lexicon bases, each representing the intellectual content of the domain
8 Step 1 of 6: Generate a list of terms to be organized
- Small Lexicon Base examples
9 Step 1 of 6: Generate a list of terms to be organized
- Large Lexicon Base example
- Sample Document
  - BRAC 1995 Quick Reference Guide Community and Environmental Activities
- Author-extracted keywords
  - Community
  - Environmental Activities
  - Base Realignment
  - (etc.)
10 Step 2 of 6: Generate a list of suffixes (and other ending-strings)
- Types of ending strings
- Suffixes (1571 from Merriam-Webster online)
- Pseudo-suffixes
- Final cluster phonesthemes (174)
- Any other ending-string (81)
- The vast majority of the work was done by the
suffixes from Merriam-Webster
11 Step 2 of 6: Generate a list of suffixes (and other ending-strings)
12 Step 3 of 6: Assign meanings to each suffix
The unit of analysis is the suffix/meaning pair
Entry: -ment / state, quality, or condition
13 Step 3 of 6: Assign meanings to each suffix
- Manually standardize certain meanings
  - -al: of, relating to, or characterized by
  - -ative: of, relating to, or connected with
- becomes
  - -al: characteristics
  - -ative: characteristics
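The standardization step can be pictured as a lookup table that collapses near-synonymous dictionary definitions into one label. The sketch below is hypothetical: in the talk the mapping was built manually, and the names `STANDARD_MEANINGS` and `standardize` are introduced here only for illustration.

```python
# Hypothetical hand-built table mapping dictionary definitions
# to standardized meaning labels.
STANDARD_MEANINGS = {
    "of, relating to, or characterized by": "characteristics",
    "of, relating to, or connected with": "characteristics",
}

def standardize(pairs):
    """Replace each definition with its standardized label when one exists."""
    return [
        (suffix, STANDARD_MEANINGS.get(meaning, meaning))
        for suffix, meaning in pairs
    ]

pairs = [
    ("-al", "of, relating to, or characterized by"),
    ("-ative", "of, relating to, or connected with"),
    ("-ment", "state, quality, or condition"),
]
standardized = standardize(pairs)
```

Definitions without an entry in the table pass through unchanged, so unstandardized suffix/meaning pairs remain visible for manual review.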
14 Step 4 of 6: Organize the suffix meanings
15 Step 5 of 6: Organize the terms according to their suffixes
Manually validate each grouping
Threshold: 50
16 Step 5 of 6: Organize the terms according to their suffixes
- The same suffix/meaning pairs were (mostly) valid for both lexicon bases.
- Once suffix/meaning pairs are selected, organize the terms again.
17 Step 6 of 6: Manually revise and complete the faceted classification
18 Step 6 of 6: Manually revise and complete the faceted classification
- Manual grouping
  - Disinfection
  - Pretreatment
  - Washing
- Automatic output
  - Measurement
  - Reimbursement
  - Requirement
  - Treatment
19 Results
20 Large Lexicon Base - distribution of terms
21 Large Lexicon Base
- Hypothesis 1
  - Results will improve if we filter out of the lexicon base all terms that appear only once.
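Hypothesis 1 amounts to a frequency filter. The sketch below assumes the lexicon base is available as a flat list of keyword occurrences across documents; that representation and the name `drop_singletons` are assumptions, not details from the talk.

```python
from collections import Counter

def drop_singletons(occurrences):
    """Count keyword occurrences and keep only terms seen more than once."""
    counts = Counter(occurrences)
    return {term: n for term, n in counts.items() if n > 1}

# Illustrative input: one entry per keyword occurrence across documents.
occurrences = ["treatment", "washing", "treatment", "community", "community"]
kept = drop_singletons(occurrences)
```

The returned counts can be reused later when terms need to be ranked by frequency.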
22 Large Lexicon Base - multiple terms only
23 Large Lexicon Base
- Hypothesis 2
  - Results will improve if we take every group with more than 30 terms and truncate it to its 30 most frequent terms.
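Hypothesis 2 can be sketched as a per-group truncation. The data shapes are assumptions: `groups` maps a suffix/meaning label to its terms, and `frequencies` maps each term to its count in the lexicon base. The function name `truncate_groups` is hypothetical, and ties are broken by original order, which the slides do not specify.

```python
def truncate_groups(groups, frequencies, limit=30):
    """Cut every group larger than `limit` down to its most frequent terms."""
    truncated = {}
    for label, terms in groups.items():
        if len(terms) > limit:
            # sorted() is stable, so equally frequent terms keep their order.
            ranked = sorted(terms, key=lambda t: frequencies.get(t, 0), reverse=True)
            truncated[label] = ranked[:limit]
        else:
            truncated[label] = list(terms)
    return truncated

# Small illustrative run with limit=2 instead of 30.
groups = {"-ment": ["requirement", "treatment", "measurement"], "-ing": ["washing"]}
frequencies = {"requirement": 1, "treatment": 5, "measurement": 3, "washing": 2}
result = truncate_groups(groups, frequencies, limit=2)
```

Groups at or below the limit are copied unchanged, so only the oversized groups are affected.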
25 Large Lexicon Base - truncating big groups down to 30
26 Small Lexicon Base - distribution of terms
27 Small Lexicon Base - multiple terms only
28 Many groups were retained
29 Conclusions and future work
- Conclusions
  - Suffixes alone are not adequate.
  - However, they are useful, particularly for faceted classification.
  - When combined with other approaches, the suffix heuristic may become more acceptable.
- Future work
  - Combination with other methods
  - Implementation in other domains, including physics, economics, and biology
30 References
- Bybee, J. L. (1988). Morphology as lexical organization. In Theoretical morphology: Approaches in modern linguistics (pp. 119-142). San Diego: Academic Press, Inc.
- Harman, D. (1991). How effective is suffixing? Journal of the American Society for Information Science, 42(1), 7-15.
- Jacob, E. K., and Priss, U. (2001). Non-traditional indexing structures for the management of electronic resources. In Advances in classification research, vol. 10 (pp. 73-90). Medford, NJ: Information Today for the American Society for Information Science.
- Okada, M., Ando, K., Lee, S. S., Hayashi, Y., and Aoe, J. (2001). An efficient substring search method by using delayed keyword extraction. Information Processing & Management, 37, 741-761.
- Priss, U., and Jacob, E. K. (1998). A graphical interface for faceted thesaurus design. In Proceedings of the 9th ASIS SIG/CR Classification Research Workshop (Pittsburgh, PA, October 25, 1998) (pp. 107-118). Silver Spring, MD: American Society for Information Science.
- Ranganathan, S. R. (1944). Library classification: Fundamentals and procedures with 1008 graded examples and exercises. Madras: Madras Library Association.
- Ranganathan, S. R. (1945). Elements of library classification: Based on lectures delivered at the University of Bombay in December 1944. Poona: N.K. Publishing House.
- Savoy, J. (1993). Stemming of French words based on grammatical categories. Journal of the American Society for Information Science, 44(1), 1-9.
- Shisler, B. K. (1997). The Dictionary of English Phonaesthemes. Retrieved June 28, 2005 from http://www.geocities.com/SoHo/Studios/9783/phonpap2.html#glossary
- Stephen, G. A. (1994). String Searching Algorithms. New Jersey: World Scientific.
31 Thank you
- URL
  - http://ella.slis.indiana.edu/aloehrle/cskdASIST2005.ppt
- Contact Info
  - Aaron Loehrlein
  - aloehrle_at_indiana.edu
  - Indiana University Bloomington