Module 6a: Intro to Controlled Vocabularies, Taxonomies and Classification - PowerPoint PPT Presentation

1
Module 6a: Intro to Controlled Vocabularies,
Taxonomies and Classification
  • IMT530 Organization of Information Resources
  • Winter 2007
  • Michael Crandall

2
Module 6a Outline
  • Where we are
  • Controlled vocabularies
  • Types of controlled vocabularies
  • Tagging
  • Overview of building vocabularies

3
Recap
  • We looked at the indexing process to see how
    controlled vocabularies can be used to enhance
    access to information
  • Different methods of indexing provide different
    results
  • Need to decide on your approach based on an
    analysis of your business objectives, the user
    needs, and the domain
  • A combination of automatic and human indexing is
    often the best solution

4
Overview of Subject Representation
  • Subject analysis: a technique used to determine
    the subject(s) and disciplinary context
    exemplified by an object
  • Subject indexing: a technique through which
    subject terms (words, taxonomic categories, or
    notation) are added to an object representation
    to describe the subject content of the object
  • Controlled vocabularies: standards containing
    controlled subject terms (words, taxonomic
    categories, or notation) used in the indexing
    process

5
Controlled Vocabulary Definition
  • A controlled vocabulary is a list of terms (words
    or phrases) or codes (notation) used for indexing
  • Almost always, controlled vocabularies show
    relationships among terms

6
Purpose of Controlled Vocabularies
  • Specific purposes
      • Provide access to content by subject, by
        providing hierarchical and associative
        relationships and synonym control for the
        terms used in the domain
      • Increase precision in retrieval and display by
        controlling homographs (words that are spelled
        the same but have different meanings)
  • General purposes
      • Assist users by conveying meaning, orientation,
        and structure in a subject area
      • Assist users by providing rich relationships
        among concepts and terms
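
A minimal sketch of how synonym control and homograph control can work in practice. The vocabulary entries, the parenthetical-qualifier convention, and the function names are illustrative assumptions, not taken from any particular standard.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a hypothetical controlled vocabulary."""
    label: str
    broader: list = field(default_factory=list)   # hierarchical relationships
    related: list = field(default_factory=list)   # associative relationships
    synonyms: list = field(default_factory=list)  # non-preferred entry terms


# Illustrative entries only; parenthetical qualifiers separate the homographs.
VOCABULARY = [
    Term("Mercury (planet)", broader=["Planets"], related=["Solar system"]),
    Term("Mercury (element)", broader=["Metals"], synonyms=["quicksilver", "Hg"]),
    Term("Automobiles", broader=["Vehicles"], synonyms=["cars", "motor cars"]),
]


def build_entry_index(terms):
    """Map every lowercased entry word to the preferred term(s) it leads to."""
    index = {}
    for term in terms:
        # A qualified label such as "Mercury (planet)" is also reachable
        # through its bare word, "Mercury".
        bare = term.label.split(" (")[0]
        for entry in {term.label, bare, *term.synonyms}:
            index.setdefault(entry.lower(), []).append(term.label)
    return index


def lookup(index, query):
    """Return preferred terms for a query; more than one signals a homograph."""
    return sorted(index.get(query.strip().lower(), []))


INDEX = build_entry_index(VOCABULARY)
print(lookup(INDEX, "cars"))     # synonym control: a single preferred term
print(lookup(INDEX, "Mercury"))  # homograph: the searcher must pick a meaning
```

A query that returns two preferred terms is exactly the case where a system would ask the searcher to disambiguate, which is how homograph control raises precision.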

7
Buckland
  • Proposes five different vocabularies in any
    system
      • Authors
      • Indexers
      • Syndetic structure
      • Searchers
      • Formulated queries
  • Formal tradition vs. document tradition

8
Types of Controlled Vocabularies
Zeng, M.L. (2005). Construction of controlled
vocabularies: A primer.
  • Subject Heading List
  • Taxonomy
  • Thesauri
  • Classification Scheme
  • More terminology on Leonard Will's site
  • http://www.willpowerinfo.co.uk/glossary.htm

9
Subject Heading Lists
  • General list of terms (words and phrases), not
    limited by discipline or subject area
  • Terms are called subject headings
  • The distinction between thesauri and subject
    heading lists is largely historical (subject
    heading lists are older); there are very few
    subject heading lists because they are so
    expensive to maintain
  • Terms are mainly subject attributes, but there
    are many exemplified attributes used in
    subdivisions
  • Example: Library of Congress Subject Headings
    (LCSH), used in library catalogs
  • Sample terms:
    France -- Colonies -- History -- 18th century;
    Time and space -- Juvenile fiction; Frogs
    (notice the use of subdivisions, marked here by
    dashes; thesauri seldom use subdivisions)
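
A small sketch of how a subdivided subject heading can be split into its main heading and subdivisions. It assumes the double-dash separator commonly used when LCSH strings are written out; the function name is hypothetical.

```python
def parse_heading(heading, separator="--"):
    """Split a subject heading string into main heading and subdivisions."""
    parts = [part.strip() for part in heading.split(separator)]
    return {"main": parts[0], "subdivisions": parts[1:]}


for text in ("France -- Colonies -- History -- 18th century", "Frogs"):
    print(parse_heading(text))
```

A heading with no subdivisions simply yields an empty list, so both kinds of heading can be stored in the same structure.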

10
Taxonomies
  • List of terms (words and phrases) that may be
    general or subject/discipline/domain specific
  • Terms are called taxons or (simply) terms
  • Terms represent subjects, disciplines/domains,
    and exemplified attributes
  • Used in digital environment only
  • Examples: Microsoft Corporation intranet
    taxonomies; Yahoo taxonomy used in the Yahoo
    directory
  • Sample terms from the Yahoo taxonomy (in Yahoo,
    you'll find these at the top of the screen as you
    browse through the directory): Education;
    Science > Agriculture > Research > Government
    Agencies; Health > Nursing; Health > Education
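
As a rough illustration, breadcrumb-style paths like the sample terms can be folded into a nested tree. This sketch assumes ">" is the level separator; the function names are hypothetical.

```python
def build_tree(paths, separator=">"):
    """Fold breadcrumb paths into nested dictionaries, one level per term."""
    tree = {}
    for path in paths:
        node = tree
        for label in (part.strip() for part in path.split(separator)):
            node = node.setdefault(label, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, children sorted alphabetically."""
    lines = []
    for label in sorted(tree):
        lines.append("  " * depth + label)
        lines.extend(render(tree[label], depth + 1))
    return lines


PATHS = [
    "Education",
    "Science > Agriculture > Research > Government Agencies",
    "Health > Nursing",
    "Health > Education",
]
TREE = build_tree(PATHS)
print("\n".join(render(TREE)))
```

Note that "Education" appears both at the top level and under "Health": in this sketch they are separate nodes, since a term's place in the tree is part of its identity.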

11
Thesauri
  • Thesauri (pl.) / Thesaurus (s.)
  • List of terms (words and phrases) that are
    usually limited to a specific subject or
    disciplinary area
  • Terms listed in a thesaurus are often called
    descriptors
  • Thesauri were mostly defined and developed after
    the advent of the computer and were created for
    use in a computerized environment (or with
    computers in mind)
  • Terms are usually subject (about) attributes, but
    some thesauri also contain exemplified (example
    of) attributes:
    http://www.e-government.govt.nz/nzgls/thesauri
  • Example: ERIC Thesaurus (education)
  • Sample terms from the ERIC Thesaurus: School
    community relationship; College entrance
    exams; Age grade placement

12
Classification Schemes
  • Chart of subject categories contextualized by a
    hierarchical structure
  • Terms are lists of codes (notation)
  • Terms are called classes and class numbers
  • Classification schemes make use of disciplinary,
    subject, and (sometimes) exemplified attributes
  • Often used to arrange physical documents;
    sometimes used in online environments

13
Classification Example
  • Examples: Dewey Decimal Classification (DDC);
    Universal Decimal Classification (UDC); Colon
    Classification
  • Sample entries (DDC)
      • 510 (meaning: Mathematics (a discipline and a
        subject))
      • 512.57 (meaning: Mathematics / Linear,
        multilinear, multidimensional algebras / Factor
        algebras)
      • 362.582 (meaning: Social problems and services
        / Problems of and services to the poor /
        Financial assistance)
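
Because decimal notation is largely hierarchical, broader classes can often be approximated by shortening the number. The sketch below assumes that every truncation of a class number is itself a valid broader class, which real DDC schedules do not always guarantee; the function name is hypothetical.

```python
def broader_classes(notation):
    """Approximate the broader classes implied by a DDC-style class number."""
    whole, _, decimals = notation.partition(".")
    chain = []
    # Drop decimal digits one at a time: 512.57 -> 512.5
    for cut in range(len(decimals) - 1, 0, -1):
        chain.append(f"{whole}.{decimals[:cut]}")
    # Then drop the decimal part entirely: 512.5 -> 512
    if decimals:
        chain.append(whole)
    # Finally zero out the three-digit base from the right: 512 -> 510 -> 500
    for keep in (2, 1):
        candidate = whole[:keep] + "0" * (3 - keep)
        if candidate != whole and candidate not in chain:
            chain.append(candidate)
    return chain


for number in ("512.57", "362.582", "510"):
    print(number, "->", broader_classes(number))
```

This is what makes notation useful for shelving and browsing: items with nearby numbers are about nearby subjects.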

14
Four Types of Classification
  • Kwasnik describes four classification systems
      • Hierarchies
      • Trees
      • Paradigms
      • Facets
  • Paradigms are useful primarily for analysis of
    subject gaps and relationships in a constrained
    space
  • Trees are a poor form of hierarchy with limited
    relationships
  • We'll look at the other two in some detail over
    the next two weeks

15
Hierarchies
  • Good for representation of knowledge in mature
    domains where the nature of the entities and
    relationships is well known
  • You'll see examples of these in the thesauri that
    we will look at in today's exercise
  • Require a model that describes what entities are
    included, with rules of association and
    distinction
  • Tend to be monolithic and cumbersome for large
    domains

16
Facets
  • Actually a different approach rather than a
    different structure
  • May use hierarchies or trees as part of the
    structure
  • Originated in the work of S.R. Ranganathan
  • Proposed that any object could be viewed in five
    ways: personality, matter, energy, space, and
    time (PMEST)
  • Being used more and more in modern information
    systems because of flexibility in meeting
    multiple needs
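
A sketch of why facets are flexible: each item is described independently along every facet, so users can combine facets in any order. The records and function names below are invented for illustration.

```python
from collections import Counter

# The five PMEST facets named above.
FACETS = ("personality", "matter", "energy", "space", "time")

# Invented records; each value describes one facet of the item.
RECORDS = [
    {"title": "Doc 1", "personality": "Rice", "matter": "Seed",
     "energy": "Harvesting", "space": "India", "time": "1990s"},
    {"title": "Doc 2", "personality": "Rice", "matter": "Soil",
     "energy": "Irrigation", "space": "Japan", "time": "1990s"},
    {"title": "Doc 3", "personality": "Wheat", "matter": "Seed",
     "energy": "Harvesting", "space": "India", "time": "2000s"},
]


def filter_by_facets(records, **criteria):
    """Keep records that match every requested facet value."""
    unknown = sorted(set(criteria) - set(FACETS))
    if unknown:
        raise ValueError(f"Unknown facet(s): {unknown}")
    return [
        record for record in records
        if all(record.get(facet) == value for facet, value in criteria.items())
    ]


def facet_counts(records, facet):
    """Count records per value of one facet, as a browse menu would show."""
    return Counter(record[facet] for record in records if facet in record)


matches = filter_by_facets(RECORDS, personality="Rice", time="1990s")
print([record["title"] for record in matches])
print(facet_counts(RECORDS, "space"))
```

Unlike a single hierarchy, no facet has to come first: the same records answer "rice in the 1990s" and "harvesting in India" equally well.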

17
Collaborative Tagging
  • Points out issues of basic level and
    collective sensemaking
  • Tug of war between personal storage
      • Identifying qualities
      • Self reference
      • Task organizing
  • and public nature of access
      • What or who it is about
      • What it is
      • Who owns it
      • Categories
  • Stability emerges from imitation and shared
    experience

18
Trees vs. Tags
  • Weinberger's article postulates three types of
    vocabularies
      • Trees (hierarchies)
      • Facets
      • Tags
  • Golder/Huberman and Weinberger both point out
    that each approach can be useful in particular
    situations
  • Choosing your approach is part of the process of
    subject and domain analysis

19
Steps in Constructing CVs
  • Define your domain
  • Gather concepts
      • From user interviews, search logs, content
        analysis, preexisting vocabularies
  • Select your approach
  • Extract terminology
  • Control your terms
  • Organize your terms
  • Maintain, maintain, maintain
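
A minimal sketch of the "gather concepts" and "control your terms" steps, using a search log as the source. The log entries, the variant-to-preferred mapping, and the naive plural stripping are all assumptions made for illustration.

```python
from collections import Counter

# Invented search-log queries standing in for real user input.
SEARCH_LOG = [
    "Laptops", "laptop", "notebook computer", "Notebook computers",
    "printers", "laptop ", "printer", "scanners",
]

# Hypothetical editorial decisions: variant form -> preferred term.
PREFERRED = {
    "laptop": "Laptop computers",
    "notebook computer": "Laptop computers",
    "printer": "Printers",
}


def normalize(query):
    """Lowercase, collapse whitespace, and strip a trailing plural 's'."""
    text = " ".join(query.lower().split())
    return text[:-1] if text.endswith("s") else text


def gather_candidates(log):
    """Count normalized queries to surface candidate concepts."""
    return Counter(normalize(query) for query in log)


def control(candidates, preferred):
    """Roll variant counts up to preferred terms; report what is left over."""
    controlled, uncontrolled = Counter(), Counter()
    for term, count in candidates.items():
        if term in preferred:
            controlled[preferred[term]] += count
        else:
            uncontrolled[term] += count
    return controlled, uncontrolled


CANDIDATES = gather_candidates(SEARCH_LOG)
CONTROLLED, UNCONTROLLED = control(CANDIDATES, PREFERRED)
print(CONTROLLED.most_common())
print(UNCONTROLLED.most_common())
```

The leftover terms are the maintenance queue: each one needs a decision to add it as a preferred term, map it as a variant, or ignore it.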

20
Questions?
  • If not, take a break!!!

21
Exercise 6a
  • Purpose is to explore some existing controlled
    vocabularies to investigate their differences and
    similarities, how useful they might be for
    subject access, and to become familiar with the
    structure of controlled vocabularies in general
  • Spend the next 45 minutes on Exercise 6a
  • Ask questions and talk!!!
  • Be sure to hand in completed work at the end of
    class for credit!!!

22
Thursday
  • We'll start to look at ways to build controlled
    vocabularies and the rules associated with them
  • Remember to read assignments BEFORE class