Controlled Vocabularies: Name Authority Control

1
Controlled Vocabularies: Name Authority Control
  • University of California, Berkeley
  • School of Information Management and Systems
  • SIMS 202: Information Organization and Retrieval

Slide authors: Ray R. Larson, Marti Hearst
2
Review
  • Dublin Core
  • Other Metadata Systems
  • Cognitive basis of categorization

3
Dublin Core Elements
  • Title
  • Creator
  • Subject
  • Description
  • Publisher
  • Other Contributors
  • Date
  • Resource Type
  • Format
  • Resource Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights Management

4
Issues in Dublin Core
  • Lack of guidance on what to put into each element
  • How should content be structured or organized at
    the element level?
  • How can consistency be ensured across descriptions
    for the same persons, places, things, etc.?

5
More Metadata Systems
  • The following is a sample of metadata systems
    for a variety of special types of
    data/documents/objects.

6
Types of Metadata Systems and Standards
  • Naming and ID systems: URLs, ISBNs
  • Bibliographic description: MARC, Dublin Core,
    TEI, etc.
  • Music: SMDL
  • Images and objects: CIMI, VRA Core Categories
  • Numeric data: DDI, SDSM
  • Geospatial data: FGDC
  • Collections: EAD

7
Metadata Resources
  • Check the Links section from the class home page
  • The best site is the Digital Library Metadata
    Resources page from IFLA at
    http://www.ifla.org/II/metadata.htm

8
Today
  • More on Controlled vocabularies
  • Choice of names
  • Form of names
  • Name Authority files
  • Types of Controlled Vocabularies
  • Faceted vs. hierarchical organization of
    vocabularies

9
Controlled Vocabularies
  • Vocabulary control is the attempt to provide a
    standardized and consistent set of terms (such as
    subject headings, names, classifications, etc.)
    with the intent of aiding the searcher in finding
    information.

10
Controlled Vocabularies
  • Names and name authorities; other types of
    controlled vocabulary (Today)
  • Design of controlled vocabularies for subject
    access -- Thesaurus design (Thursday)

11
Names
  • Cutter's objectives of bibliographic description:
    • To enable a person to find a document of which
      the author is known
    • To show what the library has by a given author
  • The first serves access
  • The second serves collocation

12
Problems with Names
  • How many names should be associated with a
    document?
  • Which of these should be the main entry?
  • What form should each of the names take?
  • What references should be made from other
    possible forms of names that haven't been used?

13
The problem
  • Proliferation of the forms of names
  • Different names for the same person
  • Different people with the same names
  • Examples:
    • Books in Print (semi-controlled but not
      consistent)
    • ERIC author index (not controlled)

14
Rules for description
  • AACR II and other sets of descriptive cataloging
    rules provide guidelines for:
    • Determining the number of name entries
    • Choosing a main entry
    • Deciding on the form of name to be used
    • Deciding when to make references

15
Authority control
  • Authority control is concerned with the creation
    and maintenance of a set of terms that have been
    chosen as the standard representatives (also known
    as established headings), based on some set of rules.
  • If you have rules, why do you need to keep track
    of all of the headings? Can't you just infer the
    headings from the rules?

16
Conditions of Authorship?
  • Single person or single corporate entity
  • Unknown or anonymous authors
  • Fictitiously ascribed works
  • Shared responsibility
  • Collections or editorially assembled works
  • Works of mixed responsibility (e.g. translations)
  • Related Works

17
Added Entries
  • Personal names
    • Collaborators
    • Editors, compilers, writers
    • Translators (in some cases)
    • Illustrators (in some cases)
    • Other persons associated with the work (such as
      the honoree in a Festschrift)
  • Corporate names
    • Any prominently named corporate body that has
      involvement in the work beyond publication,
      distribution, etc.

18
Choice of Name
  • AACR II says that the predominant form of the
    name used in a particular author's writings
    should be chosen as the form of name.
  • References should be made from the other forms of
    the name.

19
Form of the Name
  • When names appear in multiple forms, one form
    needs to be chosen. Criteria for choice are:
    • Fullness (e.g. full names vs. initials only)
    • Language of the name
    • Spelling (choose predominant form)
    • Entry element
      • John Smith or Smith, John?
      • Mao Zedong or Zedong, Mao? (Mao Tse Tung?)
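
The entry-element question can be made concrete with a naive inversion rule. This is only a sketch: the function name `invert_western` is hypothetical, and "move the last word to the front" is an assumed rule for illustration, not a rule taken from AACR II. It handles the first example and gets the second one wrong, which is the point of the slide.

```python
def invert_western(name: str) -> str:
    """Naively treat the last word as the surname and move it to the front."""
    parts = name.split()
    if len(parts) < 2:
        return name  # a single-word name has nothing to invert
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(invert_western("John Smith"))  # Smith, John
print(invert_western("Mao Zedong"))  # Zedong, Mao (wrong: Mao is the family name)
```

A mechanical rule cannot know which element of a name is the family name, so the chosen entry element has to be recorded rather than computed.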

20
Name Authority Files
ID:NAFL8057230 ST:p EL:n STH:a MS:c UIP:a TD:19910821174242
KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:05-14-80
RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-21-91
Other Versions: earlier
040    DLC $c DLC $d DLC $d OCoLC
053    PR6005.R517
100 10 Creasey, John
400 10 Cooke, M. E.
400 10 Cooke, Margaret, $d 1908-1973
400 10 Cooper, Henry St. John, $d 1908-1973
400 00 Credo, $d 1908-1973
400 10 Fecamps, Elise
400 10 Gill, Patrick, $d 1908-1973
400 10 Hope, Brian, $d 1908-1973
400 10 Hughes, Colin, $d 1908-1973
400 10 Marsden, James
400 10 Matheson, Rodney
400 10 Ranger, Ken
400 20 St. John, Henry, $d 1908-1973
400 10 Wilde, Jimmy
500 10 $w nnnc $a Ashe, Gordon, $d 1908-1973
Different names for the same person
21
Name Authority Files
ID:NAFO9114111 ST:p EL:n STH:a MS:n UIP:a TD:19910817053048
KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:06-03-91
RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-19-91
040    OCoLC $c OCoLC
100 10 Marric, J. J., $d 1908-1973
500 10 $w nnnc $a Creasey, John
663    Works by this author are entered under the name used in the
       item. For a listing of other names used by this author,
       search also under $b Creasey, John
670    OCLC 13441825 His Gideon's day, 1955 $b (hdg.: Creasey, John;
       usage: J.J. Marric)
670    LC data base, 6/10/91 $b (hdg.: Creasey, John; usage: J.J. Marric)
670    Pseuds. and nicknames dict., c1987 $b (Creasey, John,
       1908-1973; British author; pseud.: Marric, J. J.)
22
Name authority files
ID:NAFL8166762 ST:p EL:n STH:a MS:c UIP:a TD:19910604053124
KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:08-20-81
RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 06-06-91
Other Versions: earlier
040    DLC $c DLC $d DLC $d OCoLC
100 10 Butler, William Vivian, $d 1927-
400 10 Butler, W. V. $q (William Vivian), $d 1927-
400 10 Marric, J. J., $d 1927-
670    His The durable desperadoes, 1973.
670    His The young detective's handbook, c1981 $b t.p. (W.V. Butler)
670    His Gideon's way, 1986 $b CIP t.p. (William Vivian Butler
       writing as J.J. Marric)
Different people writing with the same name
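
The two problems shown in these records (one person with many names, several people sharing one name) can be sketched as a lookup table. The headings and a few of the see-from references are copied from the records above; the `AUTHORITY` structure and the helpers `strip_dates` and `build_index` are hypothetical names chosen for this illustration, not part of any cataloging system. Dates are stripped from the lookup keys on the assumption that a searcher usually does not know them.

```python
from collections import defaultdict

# Established headings (field 100) with some of their see-from
# references (field 400), taken in part from the records above.
AUTHORITY = {
    "Creasey, John": ["Cooke, M. E.", "Gill, Patrick, 1908-1973", "Ranger, Ken"],
    "Marric, J. J., 1908-1973": [],
    "Butler, William Vivian, 1927-": [
        "Butler, W. V. (William Vivian), 1927-",
        "Marric, J. J., 1927-",
    ],
}


def strip_dates(name: str) -> str:
    """Drop a trailing date qualifier such as ', 1908-1973' or ', 1927-'."""
    head, _, tail = name.rpartition(", ")
    return head if head and tail[:1].isdigit() else name


def build_index(authority):
    """Map each undated form of a name to the set of headings it may refer to."""
    index = defaultdict(set)
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index[strip_dates(form)].add(heading)
    return index


INDEX = build_index(AUTHORITY)
print(sorted(INDEX["Ranger, Ken"]))    # one heading: a pseudonym of Creasey
print(sorted(INDEX["Marric, J. J."]))  # two headings: the name is ambiguous
```

Without the date qualifier, "Marric, J. J." leads to two different people, which is why the rules alone are not enough and the established headings have to be kept on file.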
23
Other Types of Controlled Vocabularies
  • Gazetteers (Geographic Names)
  • Code lists (e.g. LC Language Codes)
  • Subject Heading Lists
  • Classification Schemes
  • Thesauri

24
Structure of an IR System
[Figure: Structure of an IR system, adapted from Soergel, p. 19]
  • Search line: interest profiles and queries ->
    formulating query in terms of descriptors ->
    storage of profiles -> Store 1: profiles/search
    requests
  • Storage line: documents and data -> indexing
    (descriptive and subject) -> storage of documents
    -> Store 2: document representations
  • Rules of the game: rules for subject indexing,
    plus a thesaurus (which consists of lead-in
    vocabulary and indexing language)
  • Comparison/matching of Store 1 and Store 2 yields
    potentially relevant documents
25
Uses of Controlled Vocabularies
  • Library Subject Headings, Classification and
    Authority Files.
  • Commercial Journal Indexing Services and
    databases
  • Yahoo, and other Web classification schemes
  • Online and manual systems within organizations
    • SunSolve
    • MacArthur

26
Types of Indexing Languages
  • Uncontrolled Keyword Indexing
  • Indexing Languages
    • Controlled, but not structured
  • Thesauri
    • Controlled and structured
  • Classification Systems
    • Controlled, structured, and coded
  • Faceted Classification Systems

27
Indexing Languages
  • An index is a systematic guide designed to
    indicate topics or features of documents in order
    to facilitate retrieval of documents or parts of
    documents.
  • An indexing language is the set of terms used in
    an index to represent topics or features of
    documents, together with the rules for combining
    or using those terms.

28
Indexing Languages
  • Library of Congress Subject Headings
  • Yellow Pages Topics
  • Wilson Indexes (Readers' Guide)

29
Thesauri
  • A Thesaurus is a collection of selected
    vocabulary (preferred terms or descriptors) with
    links among Synonymous, Equivalent, Broader,
    Narrower and other Related Terms
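
The definition above maps naturally onto a small data structure. The following is a minimal sketch, assuming the conventional link labels UF (used for), BT (broader term), NT (narrower term) and RT (related term); the class names `Term` and `Thesaurus` and the sample terms are hypothetical and do not come from any published thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term (descriptor) and its links to other terms."""
    name: str
    used_for: set = field(default_factory=set)  # UF: non-preferred synonyms
    broader: set = field(default_factory=set)   # BT
    narrower: set = field(default_factory=set)  # NT
    related: set = field(default_factory=set)   # RT


class Thesaurus:
    def __init__(self):
        self.terms = {}    # preferred term -> Term
        self.lead_in = {}  # non-preferred synonym -> preferred term

    def add(self, name, used_for=()):
        term = self.terms.setdefault(name, Term(name))
        for synonym in used_for:
            term.used_for.add(synonym)
            self.lead_in[synonym] = name
        return term

    def link_broader(self, narrow, broad):
        """Record BT and its reciprocal NT in one step."""
        self.add(narrow).broader.add(broad)
        self.add(broad).narrower.add(narrow)

    def preferred(self, word):
        """Return the descriptor to use for a word, or None if unknown."""
        if word in self.terms:
            return word
        return self.lead_in.get(word)


# Made-up sample entries, for illustration only.
t = Thesaurus()
t.add("Cars", used_for=["Automobiles"])
t.link_broader("Sports cars", "Cars")
print(t.preferred("Automobiles"))  # Cars
print(t.terms["Cars"].narrower)    # {'Sports cars'}
```

Keeping the reciprocal link (BT on one term, NT on the other) inside one method is a design choice that prevents the two directions from drifting apart.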

30
Thesauri (cont.)
  • National and International Standards for Thesauri
    • ANSI/NISO Z39.19-1994 -- American National
      Standard Guidelines for the Construction, Format
      and Management of Monolingual Thesauri
    • ANSI/NISO Draft Standard Z39.4-199x -- American
      National Standard Guidelines for Indexes in
      Information Retrieval
    • ISO 2788 -- Documentation -- Guidelines for the
      establishment and development of monolingual
      thesauri
    • ISO 5964 -- Documentation -- Guidelines for the
      establishment and development of multilingual
      thesauri

31
Thesauri (cont.)
  • Examples:
    • The ERIC Thesaurus of Descriptors
    • The Art and Architecture Thesaurus
    • The Medical Subject Headings (MeSH) of the
      National Library of Medicine

32
Hierarchical vs. Faceted (Subject Heading vs.
Descriptor) Category Systems
Slide author: Marti Hearst
33
Controlled Vocabulary (the following slides
follow Bates 88)
  • Start with the text of the document
  • Attempt to control or regularize:
    • The concepts expressed within
      • mutually exclusive
      • exhaustive
    • The language used to express those concepts
      • limit the normal linguistic variations
      • regulate word order and structure of phrases
      • reduce the number of synonyms or near-synonyms
  • Also, provide cross-references between concepts
    and their expression.

Slide author: Marti Hearst
34
Classification Schemes
  • Classify possible concepts.
  • Goals:
    • Completely distinct conceptual categories
      (mutually exclusive)
    • Complete coverage of conceptual categories
      (exhaustive)

Slide author: Marti Hearst
35
Assigning Headings vs. Descriptors
  • Subject headings
    • Assign one (or a few) complex heading(s) to the
      document
  • Descriptors
    • Mix and match

How would we describe recipes using each
technique?
Slide author: Marti Hearst
36
Subject Heading vs. Descriptor
  • WILSONLINE
    • Athletes
    • Athletes -- Health & Hygiene
    • Athletes -- Nutrition
    • Athletes -- Physical Exams
    • Athletics
    • Athletics -- Administration
    • Athletics -- Equipment -- Catalogs
    • Sports -- Accidents and injuries
    • Sports -- Accidents and injuries -- prevention
  • ERIC
    • Athletes
    • Athletic Coaches
    • Athletic Equipment
    • Athletic Fields
    • Athletics
    • Sports psychology
    • Sportsmanship

Slide author: Marti Hearst
37
Subject Headings vs. Descriptors
  • Descriptors
    • Describe one concept within a document
    • Designed to be used in Boolean searching
    • Combine to describe the desired document
    • Many (5-25) descriptors per document
  • Subject headings
    • Describe the contents of an entire document
    • Designed to be looked up in an alphabetical index
    • Look up document under its heading
    • Few (1-5) headings per document

Slide author: Marti Hearst
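
The contrast between the two approaches can be sketched in a few lines. The documents, headings and descriptors below are made up for illustration (they only imitate the style of the WILSONLINE and ERIC lists), and the function names `boolean_and` and `browse_headings` are hypothetical.

```python
# Hypothetical documents; headings and descriptors are invented examples.
DOCS = {
    "doc1": {"heading": "Athletes -- Nutrition",
             "descriptors": {"Athletes", "Nutrition", "Diet"}},
    "doc2": {"heading": "Sports -- Accidents and injuries",
             "descriptors": {"Athletes", "Injuries", "Prevention"}},
    "doc3": {"heading": "Athletics -- Equipment -- Catalogs",
             "descriptors": {"Athletic Equipment", "Catalogs"}},
}


def boolean_and(docs, *wanted):
    """Descriptor search: documents carrying ALL of the requested descriptors."""
    return sorted(d for d, rec in docs.items() if set(wanted) <= rec["descriptors"])


def browse_headings(docs, prefix):
    """Subject-heading lookup: scan the alphabetical heading list by prefix."""
    entries = sorted((rec["heading"], d) for d, rec in docs.items())
    return [(h, d) for h, d in entries if h.startswith(prefix)]


print(boolean_and(DOCS, "Athletes", "Injuries"))  # ['doc2']
print(browse_headings(DOCS, "Athlet"))
```

With descriptors the searcher combines simple terms at query time; with subject headings the combination was made by the indexer, and the searcher has to find the place in the list where it was filed.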
38
Hierarchical Classification
  • Each category is successively broken down into
    smaller and smaller subdivisions
  • No item occurs in more than one subdivision
  • Each level is divided out by a characteristic of
    division (also known as a feature).
  • Example: distinguish literature based on:
    • Language
    • Genre
    • Time Period

Slide author: Marti Hearst
39
Hierarchical Classification
Slide author: Marti Hearst
40
Labeled Categories for Hierarchical Classification
  • LITERATURE
    • 100 English Literature
      • 110 English Prose
        • 111 English Prose 16th Century
        • 112 English Prose 17th Century
        • 113 English Prose 18th Century
        • ...
      • 120 English Poetry
        • 121 English Poetry 16th Century
        • 122 English Poetry 17th Century
        • ...
      • 130 English Drama
        • 131 English Drama 16th Century
    • 200 French Literature

Slide author: Marti Hearst
41
Faceted Classification
  • Create a separate, free-standing list for each
    characteristic of division (feature).
  • Combine features to create a classification.

Slide author: Marti Hearst
42
Faceted Classification along with Labeled
Categories
  • A Language
    • a English
    • b French
    • c Spanish
  • B Genre
    • a Prose
    • b Poetry
    • c Drama
  • C Period
    • a 16th Century
    • b 17th Century
    • c 18th Century
    • d 19th Century
  • Combined notations:
    • Aa English Literature
    • AaBa English Prose
    • AaBaCa English Prose 16th Century
    • AbBbCd French Poetry 19th Century
    • BcCd Drama 19th Century

Slide author: Marti Hearst
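
Because each facet is a free-standing list, a notation can be built or read mechanically. The sketch below copies the three facet tables from the slide; the function names `decode` and `encode` are hypothetical, and the assumption that every notation is a run of upper-case/lower-case letter pairs is inferred from the examples rather than stated in the source.

```python
import re

# Facet tables from the slide: upper-case letter = facet, lower-case = value.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation: str) -> dict:
    """Split a notation such as 'AaBaCa' into {facet name: value}."""
    pairs = re.findall(r"([A-Z])([a-z])", notation)
    if "".join(f + v for f, v in pairs) != notation:
        raise ValueError(f"malformed notation: {notation!r}")
    return {FACETS[f][0]: FACETS[f][1][v] for f, v in pairs}


def encode(**values) -> str:
    """Build a notation from facet values, always in facet order A, B, C."""
    out = []
    for code, (name, table) in FACETS.items():
        if name in values:
            reverse = {label: key for key, label in table.items()}
            out.append(code + reverse[values[name]])
    return "".join(out)


print(decode("AaBaCa"))
print(encode(Genre="Poetry", Language="French", Period="19th Century"))  # AbBbCd
```

Any facet can be left out, so the same three short lists cover combinations that a hierarchical scheme would have to enumerate one by one.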
43
Important Question: How to use both types
of classification structures?
  • How to look through them?
  • How to use them in search?

Slide author: Marti Hearst
44
Classification Systems
  • A classification system is an indexing language
    often based on a broad ordering of topical areas.
    Thesauri and classification systems both use this
    broad ordering and maintain a structure of
    broader, narrower, and related topics.
    Classification schemes commonly use a coded
    notation for representing a topic and its place
    in relation to other terms.

45
Classification Systems (cont.)
  • Examples:
    • The Library of Congress Classification System
    • The Dewey Decimal Classification System
    • The ACM Computing Reviews Categories
    • The American Mathematical Society Classification
      System

46
Automatic Indexing and Classification
  • Automatic indexing is typically the simple
    deriving of keywords from a document and
    providing access to all of those words.
  • More complex Automatic Indexing Systems attempt
    to select controlled vocabulary terms based on
    terms in the document.
  • Automatic classification attempts to
    automatically group similar documents using
    either:
    • A fully automatic clustering method, or
    • An established classification scheme and a set of
      documents already indexed by that scheme.
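
The first two kinds of automatic indexing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the stopword list and the `LEAD_IN` table (free-text word to controlled term) are made up, and the function names `keywords` and `assign_terms` are hypothetical. Real systems use far larger vocabularies and more careful matching.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "for", "is", "are"}

# Hypothetical lead-in vocabulary: free-text word -> controlled term.
LEAD_IN = {
    "athlete": "Athletes", "athletes": "Athletes",
    "coach": "Athletic Coaches", "coaches": "Athletic Coaches",
    "sportsmanship": "Sportsmanship",
}


def keywords(text):
    """Simple derived indexing: every non-stopword token, with its frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def assign_terms(text, lead_in=LEAD_IN):
    """Assigned indexing: controlled terms triggered by words in the text."""
    return sorted({lead_in[w] for w in keywords(text) if w in lead_in})


text = "The coach praised the athletes for their sportsmanship."
print(keywords(text))
print(assign_terms(text))  # ['Athletes', 'Athletic Coaches', 'Sportsmanship']
```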

47
Clustering
Agglomerative methods: polythetic, exclusive or
overlapping, unordered; clusters are
order-dependent.
[Figure: documents (Doc) grouped into clusters]
Rocchio's method:
  1. Select initial centers (i.e., seed the space)
  2. Assign docs to highest matching centers and
     compute centroids
  3. Reassign all documents to centroid(s)
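
The three steps can be sketched as a single pass over a small collection. This is a simplified illustration of the steps as listed, not Rocchio's full algorithm: cosine similarity over term-weight dictionaries, exclusive (non-overlapping) assignment, and the sample documents are all assumptions made for the example, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term -> weight dictionaries."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average the term weights of a group of document vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def assign(docs, centers):
    """Put each document in the cluster of its best-matching center."""
    clusters = [[] for _ in centers]
    for name, vec in docs.items():
        best = max(range(len(centers)), key=lambda i: cosine(vec, centers[i]))
        clusters[best].append(name)
    return clusters


def one_pass_cluster(docs, seeds):
    centers = [docs[s] for s in seeds]                  # 1. seed the space
    clusters = assign(docs, centers)                    # 2. assign to best center
    centers = [centroid([docs[n] for n in c]) for c in clusters]  # 2. centroids
    return assign(docs, centers)                        # 3. reassign to centroids


# Made-up term weights for four documents.
docs = {
    "d1": {"thesaurus": 2, "descriptor": 1},
    "d2": {"thesaurus": 1, "descriptor": 2},
    "d3": {"cluster": 2, "centroid": 1},
    "d4": {"cluster": 1, "centroid": 1, "descriptor": 1},
}
print(one_pass_cluster(docs, seeds=["d1", "d3"]))  # [['d1', 'd2'], ['d3', 'd4']]
```

Choosing different seeds can produce different clusters, which is the order-dependence noted on the slide.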
48
Automatic Class Assignment
Automatic class assignment: polythetic, exclusive
or overlapping, usually ordered; clusters are
order-independent and usually based on an
intellectually derived scheme.
[Figure: documents (Doc) passed through a search engine]
  1. Create pseudo-documents representing
     intellectually derived classes.
  2. Search using document contents.
  3. Obtain ranked list.
  4. Assign document to N categories ranked over
     threshold, OR assign to top-ranked category.
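
The four steps can be sketched with a toy "search engine" that ranks classes by cosine similarity. Everything concrete here is an assumption made for illustration: the pseudo-document texts, the class names, the default threshold of 0.3, and the function names `vectorize`, `rank_classes` and `assign`.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# 1. Pseudo-documents: made-up text standing in for each class.
CLASSES = {
    "English Poetry": vectorize("english poetry poem verse sonnet"),
    "English Drama": vectorize("english drama play stage theatre"),
    "French Poetry": vectorize("french poetry poem verse"),
}


def rank_classes(text, classes=CLASSES):
    """2-3. Use the document contents as a query and rank the classes."""
    query = vectorize(text)
    scored = [(cosine(query, vec), name) for name, vec in classes.items()]
    return sorted(scored, reverse=True)


def assign(text, threshold=0.3, top_only=False):
    """4. Keep every class over the threshold, or only the top-ranked one."""
    ranked = rank_classes(text)
    if top_only:
        return [ranked[0][1]]
    return [name for score, name in ranked if score >= threshold]


doc = "A sonnet is an English poem in verse"
print(assign(doc))                 # ['English Poetry', 'French Poetry']
print(assign(doc, top_only=True))  # ['English Poetry']
```

The threshold variant gives overlapping assignment and the top-ranked variant gives exclusive assignment, matching the two options in step 4.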