Guidelines and Principles for Developing Search and Browse Vocabularies, May 31, 2003, Rice University, Houston, TX

1
Guidelines and Principles for Developing Search
and Browse Vocabularies
May 31, 2003, Rice University, Houston, TX
  • Amy J. Warner, PhD
  • awarner_at_lexonomy.com

2
Epicurious.com
3
Navigation/Taxonomy
Vehicle Brands
  • Cars: MR2 Spider, Celica, Matrix, Avalon, Camry
    Solara, Camry, Prius, Corolla, ECHO
  • SUVs/Vans: Land Cruiser, Sequoia, 4Runner,
    Sienna, Highlander, RAV4
  • Trucks: Tundra, Tacoma
Vehicle Parts
  • Vehicle Accessories: Carriers (Bicycle Carriers,
    Ski Carriers), Roof Racks, Splash Guards,
    Security Systems
  • Tires
  • Engines
  • Transmissions
Celica Brochure
Camry Brochure
4
Synonym Rings
Cholesterol, Blood Cholesterol, Serum Cholesterol,
Good Cholesterol, Bad Cholesterol, LDL, . . .
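
A synonym ring treats all of its members as equivalent for searching, with no preferred term. The following is a minimal Python sketch of that idea, assuming a hypothetical `expand_query` helper and an in-memory list of rings; the member terms are the ones on the slide, while the names and data structure are illustrative only.

```python
# Hypothetical synonym ring. The member terms come from the slide;
# the data structure and function names are assumptions for illustration.
CHOLESTEROL_RING = frozenset({
    "cholesterol",
    "blood cholesterol",
    "serum cholesterol",
    "good cholesterol",
    "bad cholesterol",
    "ldl",
})

RINGS = [CHOLESTEROL_RING]


def expand_query(term, rings=RINGS):
    """Return every term in the ring containing `term`, or just the term itself."""
    key = term.strip().lower()
    for ring in rings:
        if key in ring:
            # No member is preferred: a search on one member searches all of them.
            return set(ring)
    return {key}
```

A search for "LDL" would then be run against all six terms, while a term that belongs to no ring is searched as entered.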
5
Medline
6
MeSH UMLS
7
Controlled Vocabulary Defined
  • A subset of natural language.
  • A list of preferred and (sometimes) variant terms.
  • With semantic relationships (hierarchical and
    associative) (sometimes) defined.
  • Used to tag document attributes (describe
    facets): Topic / Subtopic, Audience, Language,
    Form.
  • Or can be used to create labeling scheme for
    navigation.

8
Cornerstones of Vocabulary Control
  • Use unambiguous labels/search terms.
  • Make distinctions among labels/search terms
    clear.
  • Make choices about wording and specificity of
    labels/search terms based on user testing and on
    size of collection.
  • Use other semantic relationships (hierarchical,
    associative) if necessary to organize large lists
    of labels/search terms.

9
Continuum of Vocabulary Control
Less → More
  • Synonym Control
      USE/Used for relationship:
        Vehicle crashes USE Vehicle collisions
        Vehicle collisions UF Vehicle crashes
      Synonym Rings:
        Vehicle collisions, Vehicle crashes, Crashes,
        Collisions
  • Hierarchical Relationships
      Broader/Narrower Terms:
        Vehicle collisions NT Truck collisions
        Truck collisions BT Vehicle collisions
      Browse Categories:
        Vehicle safety, Truck safety, Truck collisions,
        Vehicle safety
      Site Index
      Taxonomies
  • Associative Relationships
      Part/Whole, Cause/Effect, etc.:
        Vehicle parts RT Vehicles
        Vehicles RT Vehicle parts
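
The relationship pairs on this slide are reciprocal: every USE implies a UF, every NT implies a BT, and RT holds in both directions. The following is a rough Python sketch of a structure that keeps the reciprocals in step. The `Thesaurus` class and its method names are hypothetical, not taken from the standard or from any particular thesaurus software; only the example terms come from the slide.

```python
from collections import defaultdict


class Thesaurus:
    """Illustrative in-memory thesaurus; all names here are assumptions."""

    def __init__(self):
        self.use = {}                 # variant term -> preferred term (USE)
        self.uf = defaultdict(set)    # preferred term -> variants (UF)
        self.bt = defaultdict(set)    # term -> broader terms (BT)
        self.nt = defaultdict(set)    # term -> narrower terms (NT)
        self.rt = defaultdict(set)    # term -> related terms (RT)

    def add_equivalence(self, variant, preferred):
        # Record USE and its reciprocal UF together.
        self.use[variant] = preferred
        self.uf[preferred].add(variant)

    def add_hierarchy(self, broader, narrower):
        # Record NT and its reciprocal BT together.
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_related(self, a, b):
        # RT is symmetric, so store it in both directions.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def preferred(self, term):
        """Map a variant to its preferred term; other terms map to themselves."""
        return self.use.get(term, term)

    def narrower_closure(self, term):
        """Return all terms below `term`, following NT links transitively."""
        seen = set()
        stack = [self.preferred(term)]
        while stack:
            current = stack.pop()
            for child in self.nt.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


t = Thesaurus()
t.add_equivalence("Vehicle crashes", "Vehicle collisions")
t.add_hierarchy("Vehicle collisions", "Truck collisions")
t.add_related("Vehicle parts", "Vehicles")
```

With this structure, a search on the variant "Vehicle crashes" can be redirected to "Vehicle collisions" and then widened to its narrower terms with `narrower_closure`.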

10
Steps in Controlled Vocabulary Construction
  • Group terms by subject (facet analysis).
  • Link synonyms and variants (Synonym Rings):
    Vehicle collisions, Vehicle crashes, Crashes,
    Collisions.
  • Identify broader and narrower terms (Taxonomies
    / Hierarchies).
  • Identify related terms (Thesauri).

11
Purposes of Standard
  • Base choices on best practice.
  • Base choices on known principles.
  • Foster interoperability.

12
Current NISO Thesaurus Standard
  • Guidelines for the Construction, Format, and
    Management of Monolingual Thesauri (ANSI/NISO
    Z39.19-1993).
  • Not a technical standard, but a set of
    guidelines.
  • Emphasizes search thesauri.
  • Emphasizes postcoordinate retrieval.
  • Used mainly for abstracting and indexing
    services.
  • Does not put the standard in context.

13
Why Revise
  • Not revised since 1993.
  • Number of downloads high, reflecting interest.
  • Does not take the web environment into account.
  • Navigation schemes are controlled vocabularies
    too.
  • Is out of date in terms of computing technology
    in general.
  • Software for managing thesauri has advanced.
  • Software for leveraging thesauri through an
    interface has advanced.
  • Currently little attention paid to user testing.

14
Term forms
  • Currently
  • Emphasizes rigid rules for grammatical form.
  • Emphasizes short phrases as terms.
  • Suggested revision
  • Loosen rules on grammatical form.
  • Allow for longer, more complex phrases.
  • Rationale
  • Software can perform automatic stemming.
  • Navigation schemes are more precoordinate.
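
To illustrate the stemming rationale: if software conflates grammatical variants at search time, the vocabulary itself need not enforce a single grammatical form. The sketch below is a deliberately crude plural-stripping stemmer written for illustration; it is not the Porter algorithm or any production stemmer, and the function names are hypothetical.

```python
def light_stem(word):
    """Strip common English plural endings (toy rules, far from complete)."""
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"              # categories -> category
    if w.endswith(("sses", "shes", "ches", "xes", "zes")):
        return w[:-2]                    # crashes -> crash
    if w.endswith("s") and not w.endswith(("ss", "us")):
        return w[:-1]                    # collisions -> collision
    return w


def normalize(term):
    """Stem each word of a multiword term."""
    return " ".join(light_stem(token) for token in term.split())


def matches(query, vocabulary_term):
    """True when the query and the vocabulary term share the same stems."""
    return normalize(query) == normalize(vocabulary_term)
```

Under these toy rules, a user query of "truck collision" matches the vocabulary term "Truck collisions" without a rule dictating singular or plural form.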

15
Semantic Relationships
  • Current standard
  • Only accounts for explicit equivalence
    relationships.
  • Hierarchical relationship only allowed for
    genus-species relationship, with a few
    exceptions.
  • Associative relationship only allowed across
    categories.
  • Proposed revision
  • Provide guidelines for choosing unambiguous
    labels.
  • Provide guidelines for loose browse categories.
  • Rationale
  • Labeling schemes and pick lists often do not
    account for explicit synonymy relationships.
  • Hierarchical navigation schemes need to be less
    rigid.

16
Browse Categories
17
Usability Testing
  • Current standard
  • Discusses users but does not include guidelines
    for testing with users.
  • Proposed revision
  • Provide guidelines for open card sort testing of
    high level categories.
  • Provide guidelines for closed card sorting of
    term groups under high level categories.
  • Rationale
  • User testing is an important consideration when
    choosing terms and term relationships.

18
Display
  • Current standard
  • Emphasizes print copies of thesauri.
  • Screen display section oriented toward display of
    print copy.
  • Proposed revision
  • Oriented more toward displays of vocabularies
    that only exist in digital format.
  • Rationale
  • Most web vocabularies do not have print
    counterparts.

19
Interoperability
  • Current standard
  • Does not address issues associated with
    interoperability.
  • Proposed revision
  • Will address major issues and problems associated
    with interoperability, including multiple
    languages.
  • Rationale
  • Organizations need to be able to share
    information within and among themselves.

20
Construction and Maintenance
  • Current standard
  • Emphasizes maintenance problems in print
    vocabularies.
  • Discusses software that manages stand-alone
    vocabularies.
  • Proposed revision
  • Advance standards for changing, adding, deleting
    terms automatically.
  • Provide guidance for software that is connected
    to information retrieval systems.
  • Rationale
  • Software has advanced significantly.

21
Process for Revising Standard
  • Appoint editor.
  • Appoint advisory group.
  • Draft revision.
  • Discuss drafts with advisory group.
  • Vote on final draft by NISO board.

22
Editor and Advisory Group
  • Amy Warner, lexonomy.com
  • Vivian Bliss, Microsoft
  • Carol Brent, ProQuest
  • John Dickert, U.S. DoD
  • Lynn El-Hoshy, Library of Congress
  • Emily Fayen, SDC liaison
  • Patricia Harpring, Getty
  • Stephen Hearn, American Library Association
  • Sabine Kuhn, American Chemical Society/Chemical
    Abstracts
  • Pat Kuhr, H.W. Wilson
  • Diane McKerlie, Design Strategy
  • Peter Morville, Semantic Studios
  • Stuart Nelson, National Library of Medicine
  • Diane Vizine-Goetz, OCLC
  • Marcia Lei Zeng, Special Libraries Association

23
Progress to Date
  • Agreement on scope of revision.
  • Agreement that guidelines should be placed in
    context.
  • Agreement that guidelines should be educational
    as well as prescribing best practice.
  • Agreement that guidelines should be
    forward-looking in terms of new technologies.
  • Agreement to write guidelines for elements and
    features that all vocabularies have in common,
    then consider their differences.
  • Survey conducted to determine use of standard,
    other standards, software.

24
Other Players
  • Communication with editor of British Standard.
  • Communication and work with W3C to address issues
    of implementation of controlled vocabularies.

25
Relationship with Semantic Web and OWL
  • Semantic Web is an ontological framework.
  • Both terms in the ontology and the relationships
    between them are standardized using OWL (Web
    Ontology Language).
  • Both the terms and the relationships are
    semantically deep.
  • This is a structure into which shallower terms
    provided by using Z39.19 could be inserted.
  • This would enhance interoperability because
    although we would not have complete agreement on
    vocabularies, we would have agreement on an
    effective structure for exchanging them.

26
Contact Me
Amy J. Warner
awarner_at_lexonomy.com
www.lexonomy.com