Title: Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management
1. Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management
- Constance Malpas
- Program Officer
- RLG Programs
2. This presentation
- Summarizes recent data-mining efforts by OCLC Programs and Research
  - System-wide sample (Summer 2007 to Spring 2008)
  - ARL unique print books (Autumn 2007)
- Suggests implications for collection managers
- Outlines next steps for RLG Programs
- An opportunity to discuss what additional evidence and analysis is needed
3. What we mean by "last copy"
- Monographic title uniquely held by a single WorldCat contributor
  - Cf. single-copy repositories, where "last copy" is relative to local/group holdings
- May represent a last manifestation, expression or work
  - Bibliographic records describe manifestations, not copies; unique manifestations are the point of departure for analysis
- Some are intrinsically unique; others are rendered unique by erosion of system-wide holdings
  - Historical data may help document increased copy- or work-level availability, but weren't included in the studies presented here
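The definition above can be illustrated with a minimal sketch. It assumes holdings are available as flat (record id, holding symbol) pairs; that layout, the function name `uniquely_held` and the sample identifiers are hypothetical and are not a WorldCat export format. Because bibliographic records describe manifestations, a count of one here signals a unique manifestation, not necessarily a unique work.

```python
from collections import defaultdict

def uniquely_held(holdings):
    """Return {record_id: symbol} for records held under exactly one symbol.

    `holdings` is an iterable of (record_id, holding_symbol) pairs; this
    flat layout is an assumption for illustration only.
    """
    symbols_by_record = defaultdict(set)
    for record_id, symbol in holdings:
        symbols_by_record[record_id].add(symbol)
    return {
        rec: next(iter(symbols))
        for rec, symbols in symbols_by_record.items()
        if len(symbols) == 1
    }

# Hypothetical sample data.
sample = [
    ("rec-001", "AAA"),
    ("rec-001", "BBB"),
    ("rec-002", "AAA"),
    ("rec-003", "CCC"),
    ("rec-003", "CCC"),  # a repeated pair still counts as one holder
]
print(uniquely_held(sample))
```

An institution may contribute under more than one symbol, so a real analysis would likely map symbols to institutions before counting.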
4. Distribution of wealth: ARL unique books
20% of the population holds >75% of unique titles
A classic Pareto distribution
institutional excellence?
(or) a network effect?
Median institutional holdings: 19K titles
N = 6.95M titles
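As a rough illustration of how a concentration figure like the one on this slide could be computed, the sketch below ranks institutions by their count of unique titles and reports the share held by the top fifth. The function name `top_share` and the counts are invented for the example and are not figures from the study.

```python
def top_share(counts, top_fraction=0.2):
    """Share of all unique titles held by the top `top_fraction` of institutions."""
    if not counts:
        raise ValueError("counts must not be empty")
    ranked = sorted(counts, reverse=True)
    # Always include at least one institution in the top group.
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)

# Made-up counts of unique titles for ten institutions.
made_up_counts = [600, 150, 50, 40, 40, 30, 30, 25, 20, 15]
print(f"{top_share(made_up_counts):.0%}")  # prints 75%
```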
5. Why focus on uniquely-held titles?
- Scarcity is common
  - Limited redundancy in holdings means a limited preservation guarantee and limited opportunity to create economies of scale by aggregating supply
- Research institutions bear the brunt of responsibility for long-term preservation and access of unique titles
  - Academic and independent research libraries hold up to 70% of aggregate unique print book collection
- Continuing costs of managing (storing, providing access to) print collections are high; use is generally declining
  - Space pressure on physical plant (on-campus, remote) is high; understanding distribution and characteristics of unique holdings can inform decisions about disposition of physical collection
- Increased attention to stewardship of special collections
  - ARL SCWG, CLIR, LC Task Force on Bibliographic Control: new attention to what constitutes special collections, appropriate standards of care, modes and metrics of use
6. Challenges
- Identification requires group / network view of holdings
  - → WorldCat provides a reasonable proxy for system-wide collection
- Some materials (MSS, theses and dissertations, etc.) are intrinsically unique; not all can be algorithmically identified in MARC records
  - → hybrid approach combines computational and manual analysis of bibliographic data
- Sparse bibliographic records impede efficient work/title matching, may introduce spurious measure of uniqueness
  - → external sources (including Google) sometimes helpful in filling gaps
- Non-English titles (especially transliterated non-roman scripts) are especially difficult to match
  - → we resisted the temptation to exclude these
7. Study I: System-wide Sampling
- 250 randomly selected, uniquely-held titles
  - Limited to printed books (including theses) published before 2005
  - English-language cataloging only
  - Iterative re-sampling required to fill gaps
- Independently reviewed by three project staff
  - Level of uniqueness
  - Material type
- Results periodically collated for group analysis
  - Compare results of individual analysis for consistency
  - Seek consensus on difficult cases; relatively few of these
  - Re-sample as necessary to fill gaps
- White paper anticipated March 2008
8. Study II: ARL uniquely-held books
- Ad hoc analysis by RLG Programs, prompted by IMLS Connecting to Collections grant announcement
  - How might the existing evidence base be used to focus regional preservation investments?
- Based on January 2007 snapshot of WorldCat database: 13M records for titles (6.95M print books) uniquely held by ARL institutions; 300 OCLC symbols, 123 institutions
- Iterative analysis examined relative impact of theses/dissertations and recent imprints on system-wide uniqueness; regional and institutional distribution of holdings
- Findings shared with ARL Special Collections Working Group (October 2007) and selected RLG partner institutions (UC, CIC, ReCAP, Harvard, ASU, NYU)
- Heritage Preservation willing to share Heritage Health survey data for cross-tabulation on as-needed basis
9. Limitations
- Current studies limited to printed books; excludes serials, special collections; only a partial measure of uniqueness in system-wide collection
- Incomplete representation of world book collection; for non-English titles especially, uniqueness of North American holdings is only relative
- Cataloging backlogs of up to 5 years mean that holdings for recent acquisitions are imperfectly reflected
- Incomplete coverage of rare books and special collections prior to (ongoing) integration of RLG Union Catalog
10. Our findings: distribution of unique titles
- Research and academic libraries hold >70% of aggregate unique print book collection
  - while value and utility of these holdings may be widely distributed across the library community, holdings are concentrated at institutions with a research / teaching / learning mandate
  - limited data on aggregate use, sources of demand
- Institutional distribution of unique holdings is highly skewed, with a handful of libraries holding a majority share of collective assets
  - ARL unique print book holdings range from 400 to 600K titles per institution; median holdings 19K titles
  - generally, institutions with large collections hold more unique materials, but absolute size of collection is not an indicator of relative uniqueness
11. Based on a randomly selected sample of 250 uniquely-held print book titles in WorldCat (Jan. 2007)
12. Unique Print Books in ARL Institutions
CRL's focus on theses and dissertations is evident; most uniqueness is attributable to these holdings
Institutions with younger collections, actively seeking to increase scope of coverage (NCSU, Temple), are building uniqueness in new titles
13. Content-type Distributions: CRL and ARL
Intrinsically unique content, only copies
May include first copies in cataloging queue; uniqueness subject to rapid erosion
14. Our findings: levels of uniqueness
- 60% of titles represent unique works
  - Ex. Report and recommendation on a proposed loan equivalent to US$70 million to the Islamic Republic of Pakistan for a power plant efficiency improvement project (1987); World Bank report held by George Washington University
- 15% of titles represent unique manifestations
  - Ex. Gallipolis: an account of the French five hundred and of the town they established, compiled by Workers of the Writers' program of the Work projects administration (1940); microform pamphlet held by Yale University; related manifestations at 40 libraries
- 5% of titles represent unique expressions
  - Ex. E.J. Luck. A pedigree of the families Luck, Lock and Lee (1908); book held by Massanutten Regional Library, VA; similar title (Luck, Lock) by same author, pub'd in 1900, held at LC
- 20% of titles not unambiguously unique; duplicate or near-duplicate records can be found in WorldCat
  - Ex. K. Kimura. Edo no akebono (1956); book held by Harvard Yenching; apparent duplicate (cataloged with original scripts) held by Waseda, Yale
15. Our findings: content characterization
- Material types
  - 35% are books (>50 pp.)
    - most appear to be non-fiction titles, less likely to have additional manifestations
  - 20% theses and dissertations
    - many at Masters level; unlikely to be held beyond issuing institution
  - 15% government documents
    - mostly federal and state; may be duplicated in depositories
  - 10% pamphlets
    - unique content, but rarely useful in isolation
  - 10% analytics: single articles or issues bound as a separate volume
    - non-unique content
  - <5% early imprints
    - lost treasures?
  - Small numbers of by-laws, scripts, legal briefs, minutes, etc.
16. Implications
- Institutions with significant unique holdings may benefit from splitting the difference between unique works and manifestations
  - unique manifestations and analytics should be judged with an eye to provenance history; unless they contribute to local distinctiveness, immediate action may not be warranted
- A preliminary sort by material type may help guide local decision-making regarding the physical disposition of unique holdings
  - pamphlets and technical reports may be candidates for cataloging enhancement and storage transfer; books may be short-listed for digitization and/or transfer to special collections
- Institutions with smaller unique print book collections may benefit from collective action to aggregate supply (through effective disclosure) and demand (through special resource-sharing and digitization initiatives) around specific topical and disciplinary interests
  - local collections gain in significance when presented in context with related holdings
17. Recommendations
- Adopt a nuanced understanding of relative uniqueness when assessing local holdings
  - Unique manifestations may not represent unique intellectual content, but may have other value
    - As artifacts → special collections
    - As a networked resource → increased availability
  - Unique works may gain relevance and value when presented as part of a larger disciplinary or topical collection
    - Theses and dissertations may benefit from special discovery tools, integration in local scholarly communications initiatives
    - Pamphlets and technical reports may be virtually aggregated for specific communities of use
- Maximize disclosure of unique holdings to increase their impact and value
- Focus on use and utility of unique holdings to ensure long-term preservation, enduring value to parent institution
18. What's Next . . .
- Holdings validation study will examine a sample of scarcely-held (<5 copies) US imprints in North American research libraries
  - Compare current WorldCat holdings to historical holdings, looking for signs of collection erosion, elimination of local backlogs (diminishing uniqueness)
  - Compare local holdings to current WorldCat holdings: location changes/storage transfers, withdrawals
  - Assess impact of local preservation actions on system-wide holdings (availability, condition) and potential value of full disclosure
- Collaborative effort with RLG partner institutions anticipated Spring/Summer 2008
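The snapshot comparison proposed here could look something like the sketch below. It assumes each snapshot is reduced to a mapping from record id to the number of holding libraries, and it uses holdings counts as a stand-in for copies; the function names, labels and data are hypothetical and do not describe the planned study's actual method.

```python
def compare_snapshots(historical, current):
    """Label the change in holdings count for each record id.

    Both arguments map record_id -> number of holding libraries.
    """
    changes = {}
    for rec in sorted(set(historical) | set(current)):
        before = historical.get(rec, 0)
        after = current.get(rec, 0)
        if after < before:
            changes[rec] = "erosion"           # fewer holders than before
        elif after > before:
            changes[rec] = "more widely held"  # e.g. backlog cataloged
        else:
            changes[rec] = "unchanged"
    return changes

def scarcely_held(current, max_copies=5):
    """Record ids with at least one but fewer than `max_copies` holders."""
    return sorted(rec for rec, n in current.items() if 0 < n < max_copies)

# Hypothetical holdings counts in two snapshots.
historical = {"a": 4, "b": 1, "c": 2, "e": 7}
current = {"a": 2, "b": 3, "c": 2, "d": 1, "e": 7}
print(compare_snapshots(historical, current))
print(scarcely_held(current))
```

A record that moves from one holder to several (like "b" above) is an instance of diminishing uniqueness; a drop in holders (like "a") is a possible sign of erosion that would still need local validation.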
19. Some closing observations
- Opportunities
  - Large research libraries hold a wealth of unique materials: long tail resources with broad potential audience
  - Aggregated bibliographic data supports programmatic analysis and enrichment: work-level clustering, identification of duplicates
  - Largest institutions, with enduring commitments to retention and access, hold majority of potential at-risk titles
- Challenges
  - Libraries ill-equipped to measure potential demand for unique holdings
  - Technical and social infrastructure for aggregating supply is lacking
  - University presses are potential distribution partners, but alliances are weak
20. Questions, Comments?
- Managing the Collective Collection work agenda
  - Data-mining for management intelligence
  - Shared print collections
  - http://www.oclc.org/programs/ourwork/collectivecoll
- Midwinter RLG Update Session
  - 1:30-3:30
  - Marriott 302-304
- Contact
  - Constance Malpas
  - Program Officer
  - malpasc_at_oclc.org
21. Median institutional holdings: 96K unique titles
N = 5.9M titles