Title: An Inquiry and Analysis of Metadata Utilization: A Case Study of MARC
1. An Inquiry and Analysis of Metadata Utilization:
A Case Study of MARC
2005 ASIST Annual Meeting, November 1, 2005,
Charlotte, North Carolina
- William E. Moen <wemoen_at_unt.edu>
  School of Library and Information Sciences
  Texas Center for Digital Knowledge
  University of North Texas
  Denton, TX 72603
2. Two quality criteria
- Fullness/completeness
- Usefulness
3. Context for the initial analysis
- Z39.50 Interoperability Testbed project
- An Institute of Museum and Library Services National Leadership Grant
- Goal: Improve Z39.50 semantic interoperability among libraries for information access and resource sharing
- Interoperability across library online catalogs
- Indexing of MARC records to support searching
- Richness of MARC content designation available
- Inform indexing guidelines and policies
4. Indexing MARC
- Indexing Guidelines to Support Z39.50 Profile Searches (available on Z-Interop website)
- Identified all MARC 21 fields/subfields that can contain author, title, or subject data
- Author-related fields/subfields: 119
- AuthorTitle-related fields/subfields: 21
- Title-related fields/subfields: 253
- Subject-related fields/subfields: 144
5. Z-Interop test dataset
- Approximately 1% sample of MARC records from OCLC's WorldCat database
- Weighted sampling based on number of libraries holding the object represented by the record
- 419,657 total MARC records
- 89% of records: full-level cataloging
- Formats represented in test dataset
  - Books: 91%
  - Cartographic Materials: < 1%
  - Electronic resources: < 1%
  - Archival/Mixed Materials: < 1%
  - Sound recordings: 4%
  - Visual Materials: 1%
  - Serials: 3%
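The slides do not say how the weighted sample was drawn. As one possible reading, the sketch below draws records without replacement, with selection chances that grow with the number of holding libraries. It uses the Efraimidis-Spirakis key method, which is a swapped-in technique and not necessarily what the project used. The record identifiers, the holdings counts, and the function name `weighted_sample` are all hypothetical.

```python
import heapq
import random


def weighted_sample(holdings, k, seed=None):
    """Draw up to k distinct record ids, favoring widely held records.

    holdings: dict mapping record id -> number of holding libraries.
    Efraimidis-Spirakis method: each record gets the key u ** (1 / weight)
    for a uniform random u, and the k records with the largest keys win.
    Records with a weight of zero or less are never selected.
    """
    rng = random.Random(seed)
    keyed = (
        (rng.random() ** (1.0 / weight), record_id)
        for record_id, weight in holdings.items()
        if weight > 0
    )
    return [record_id for _, record_id in heapq.nlargest(k, keyed)]


# Hypothetical holdings counts, for illustration only (not project data).
holdings = {"rec-a": 1200, "rec-b": 3, "rec-c": 450, "rec-d": 1, "rec-e": 75}
sample = weighted_sample(holdings, k=2, seed=42)
print(sample)
```

Passing a fixed `seed` makes the draw repeatable, which matters if a test dataset has to be regenerated later.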
6. MARC 21 content designation
MARC 21 Field Groups | Currently Defined | Obsolete | Total | MARC 1972 (Books Format Only)
00x                  |                 6 |        1 |     7 |                             3
0xx                  |               238 |        7 |   245 |                            28
1xx                  |                66 |        1 |    67 |                            40
2xx                  |               137 |       32 |   169 |                            15
3xx                  |               109 |       32 |   141 |                             4
4xx                  |                69 |        0 |    69 |                            37
5xx                  |               323 |       38 |   361 |                             8
6xx                  |               184 |        5 |   189 |                            66
7xx                  |               452 |       47 |   499 |                            41
8xx                  |               141 |       20 |   161 |                            36
TOTAL                |              1725 |      183 |  1908 |                           278
7. Content designation in dataset
MARC 21 Field Groups | Currently Defined | Obsolete | Unlikely Used | Total
00x                  |                 6 |        0 |             0 |     6
0xx                  |                96 |        1 |            33 |   130
1xx                  |                49 |        0 |             2 |    51
2xx                  |                81 |        0 |            19 |   100
3xx                  |                23 |        6 |             0 |    29
4xx                  |                10 |        0 |            30 |    40
5xx                  |               128 |        1 |             3 |   132
6xx                  |               104 |        1 |             7 |   112
7xx                  |               205 |        0 |             5 |   210
8xx                  |               105 |        3 |             8 |   116
TOTAL                |               807 |       12 |           107 |   926
8. Summary frequency results
Total number of fields/subfields occurring in dataset: 13,849,499
Frequency         | Number of Fields/Subfields | % of All Occurrences
> 600,000         |                          1 |                  4.4
500,000 - 599,999 |                          0 |                    0
400,000 - 499,999 |                         13 |                 39.9
300,000 - 399,999 |                          6 |                 14.3
200,000 - 299,999 |                          6 |                 10.6
100,000 - 199,999 |                         10 |                 10.3
TOTAL             |                         36 |                 79.5
Only 4% of all fields/subfields account for 80% of all occurrences, or 96% of all fields/subfields account for 20% of all occurrences
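The concentration claim can be checked directly from the frequency bands above. The sketch below re-adds the table rows and assumes the "4" figure is taken relative to the 926 distinct fields/subfields found in the dataset (the TOTAL of the preceding slide). The slides do not state that denominator, so treat it as an assumption.

```python
# Rows from the summary frequency table: (frequency band, number of
# fields/subfields in the band, percent of all occurrences).
bands = [
    ("> 600,000", 1, 4.4),
    ("500,000 - 599,999", 0, 0.0),
    ("400,000 - 499,999", 13, 39.9),
    ("300,000 - 399,999", 6, 14.3),
    ("200,000 - 299,999", 6, 10.6),
    ("100,000 - 199,999", 10, 10.3),
]

# Assumption: 926 distinct fields/subfields occur in the dataset
# (TOTAL row of the "Content designation in dataset" table).
distinct_in_dataset = 926

top_count = sum(count for _, count, _ in bands)
top_share_of_occurrences = round(sum(pct for _, _, pct in bands), 1)
top_share_of_fields = 100 * top_count / distinct_in_dataset

print(f"{top_count} fields/subfields = {top_share_of_fields:.1f}% of those used")
print(f"They account for {top_share_of_occurrences}% of all occurrences")
```

Under that assumption the 36 fields/subfields come to about 3.9% of those in use and 79.5% of occurrences, which the slide rounds to 4 and 80.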
9. Characteristics of top 36
- Most frequently occurring: 650 a (Subject data)
- 2nd most frequently occurring: 040 d (Cataloging source)
- 3rd and 4th most frequently occurring: 260 a and b (Publication information)
- 5th most frequently occurring: 245 a (Title)
- Contain data useful to end users: 28
- Contain control numbers, etc.: 5
- Contain data useful to catalogers: 3
- Top 36 fields/subfields
10. Implications for indexing
- 537 fields/subfields contain author, title, subject data
- 381 of these actually occur in Z-Interop dataset
- Total occurrences of the 381: 4,397,712
- 19 of the 381 (5%) account for 80% of all occurrences
- 9 of 19 are subject-related
- 5 of 19 are author-related
- 5 of 19 are title-related
- Preliminary testing using only 19 indexed fields
- 95-100% of correct records retrieved!
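Picking a small set of fields/subfields to index, as in the 19-of-381 result, amounts to ranking by occurrence count and stopping once a target share is reached. The following is a minimal sketch of that selection. The function name `fields_covering`, the 0.80 default, and the sample counts are hypothetical, and the slides do not describe how the 19 were actually chosen.

```python
def fields_covering(counts, threshold=0.80):
    """Return the most frequent fields/subfields whose combined share of
    all occurrences first reaches `threshold` (a fraction from 0 to 1)."""
    total = sum(counts.values())
    selected, running = [], 0
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for tag, count in ranked:
        # Stop as soon as the fields already selected reach the target share.
        if total == 0 or running / total >= threshold:
            break
        selected.append(tag)
        running += count
    return selected


# Hypothetical occurrence counts, for illustration only (not project data).
counts = {
    "650 a": 500,
    "245 a": 300,
    "100 a": 120,
    "700 a": 50,
    "240 a": 20,
    "630 a": 10,
}
print(fields_covering(counts))
```

With these made-up counts, the first two entries already cover 80% of the 1,000 occurrences, so only those two would be indexed.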
11. The MCDU Project
- The MARC Content Designation Utilization Project
- What is the extent of catalogers' use of content designation available in MARC 21?
- Develop and implement systematic methods, procedures, and software tools to produce reliable and valid analysis of MARC 21 content designation use
- MARC record as artifact of cataloging enterprise
FOR MORE INFORMATION, VISIT THE PROJECT WEBSITE
http://www.mcdu.unt.edu/
12. The MCDU dataset analysis
- 56 million MARC records: all WorldCat bibliographic records
- Parsed and stored in MySQL
- 20 databases
- LC and Non-LC created records
- 10 databases each, based on type of record/format
- Frequency counts of all fields/subfields
- Non-LC Book Format field occurrence results
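The slides say only that records were parsed, stored in MySQL, and counted. To illustrate what a field/subfield frequency count involves, the sketch below tallies occurrences in memory with `collections.Counter`. The record shape (a list of `(tag, subfield_codes)` tuples) and the function name `count_field_subfields` are hypothetical simplifications. The code performs no real MARC parsing and makes no database calls.

```python
from collections import Counter


def count_field_subfields(records):
    """Count occurrences of each (tag, subfield code) pair across records.

    Each record is a list of (tag, subfield_codes) tuples, a simplified,
    hypothetical stand-in for a parsed MARC record. Fields listed with no
    subfield codes (such as control fields) are counted under (tag, None).
    """
    counts = Counter()
    for record in records:
        for tag, subfield_codes in record:
            if not subfield_codes:
                counts[(tag, None)] += 1
            for code in subfield_codes:
                counts[(tag, code)] += 1
    return counts


# Two made-up records, for illustration only.
records = [
    [("001", ""), ("245", "ab"), ("260", "abc"), ("650", "a"), ("650", "ax")],
    [("001", ""), ("245", "a"), ("260", "ab"), ("650", "a")],
]
counts = count_field_subfields(records)
for (tag, code), n in counts.most_common(3):
    print(tag, code, n)
```

Repeated fields count once per occurrence, so a record with two 650 fields contributes two to the `("650", "a")` tally.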
13. Making sense of the numbers
- The numbers don't stand on their own: contextualizing, qualifying, exploring, understanding
- Metadata quality: Fullness/completeness
- Identify core elements of bibliographic records based on the analysis of format-specific samples and compare with existing recommendations for core records
- Metadata quality: Usefulness
- Comparing the FRBR conceptual framework's user tasks, MARC content designation supporting those tasks, and utilization of that content designation in the records
14. References
- MARC Content Designation Utilization Project: http://www.mcdu.unt.edu
- Z39.50 Interoperability Testbed: http://www.unt.edu/zinterop/