Title: Using OAI-PMH to Aggregate Metadata Describing Cultural Heritage Resources
1. Using OAI-PMH to Aggregate Metadata Describing Cultural Heritage Resources
ALA/CLA Annual Meeting, 22 June 2003, Toronto, CA
- Timothy W. Cole (t-cole3_at_uiuc.edu), University of Illinois at Urbana-Champaign
- http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/
2. Order of Presentation
- Perspectives on OAI-PMH
- Illinois OAI metadata harvesting project
  - Goals and objectives
  - Findings regarding metadata
  - Findings regarding search and discovery
- New OAI projects at Illinois
  - IMLS digital collections content
  - CIC OAI metadata harvesting project
3. OAI Protocol for Metadata Harvesting
- Harvesting approach to interoperability at metadata level
- Divides world into Metadata Providers and Service Providers
- Builds on HTTP, XML, Dublin Core
- http://www.openarchives.org/
4. OAI Antecedents
- Call to other E-Print archives (July 1999)
  - Paul Ginsparg, Rick Luce, Herbert Van de Sompel
  - mobilize core group to work towards achieving a universal service for author self-archived scholarly literature.
- Santa Fe Mtgs. (Oct. 1999, June 2000)
- OAI-PMH version history
  - First Alpha Release, Sept. 2000
  - 1.0 (Beta) Release, January 2001
  - 1.1 (Beta 2) Release, July 2001
  - 2.0 (Production) Release, June 2002
5. Original OAI Organization
- OAI Executive
  - Carl Lagoze and Herbert Van de Sompel
- OAI Steering Committee
  - Co-Chairs: Dan Greenstein, Cliff Lynch
- OAI Technical Committee
- Funded by NSF, DLF, and CNI
- Seeks to be user community driven
6. OAI-PMH as a tool
- All about moving metadata around
- Designed to be a building block, usable by many different communities
- Can facilitate (in some cases enable) services and functions
- Assumes widely distributed content, but centralized indexing(!) services
- Build once, use for many applications
- Focus of OAI is interoperability
7. Harvesting vs. Broadcast
- Competing approaches to interoperability
  - Distributed/Broadcast searching: search and discovery over remote services and data
  - Harvesting: data/metadata is transferred from the remote source to the destination where search and discovery services are located (e.g., union catalogs)
- OAI-PMH is a harvesting protocol
8. As Compared to Z39.50
                        Z39.50            OAI
Content (Objects)       Distributed       Distributed
World View              Bibliographic     Bibliographic
Object Presentation     Data provider     Data provider
Searching is            Distributed       Centralized
Search done by          Data provider     Service provider
Metadata searched is    Up to date        Stale
Semantic Mapping        When searching    Metadata delivery
9. Metadata vs. Resources
- Resource refers to information objects or digital representations of information objects
- Metadata item is a collection of properties about a resource (e.g., title, author)
- Metadata record is a metadata item expressed in a specific syntax according to an XSD
- OAI focuses on metadata, with the implicit understanding that metadata contains useful links to the source information object(s)
10. Data and Service Providers
- Data Providers (Repositories) are entities who possess resources and metadata and are willing to share metadata with others via well-defined OAI protocols
- Service Providers (Harvesters) are entities who harvest metadata from Data Providers in order to supply higher-level services to users (e.g., search and discovery)
- OAI uses these designations for its client/server model (data = server, service = client)
11. Reliance on HTTP and XML
- OAI-PMH is a REpresentational State Transfer (REST) protocol (unlike RPC, SOAP)
- OAI requests and responses are sent via HTTP
- OAI requests are encoded as HTTP GET or POST operations
- OAI responses are valid XML documents
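As a rough illustration of the request encoding described on this slide, the sketch below builds an OAI-PMH request as an HTTP GET URL. The base URL and the helper name `build_request_url` are hypothetical; `verb`, `metadataPrefix`, and `from` are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **params):
    """Encode an OAI-PMH request as an HTTP GET URL (illustrative helper)."""
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            # `from` is a Python keyword, so callers pass `from_` instead.
            query[key.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

url = build_request_url(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", from_="2003-01-01"
)
print(url)
```

The same key/value pairs could instead be sent as the body of an HTTP POST; either way the repository answers with an XML document.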
12. XML Namespaces and Schema
- Consistency and data quality are ensured by using XML Schema Definitions (XSD) for all responses
- XML Namespaces are used where necessary to clearly define which parts of the responses are actual metadata and which support the Metadata Harvesting Protocol
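A minimal sketch of how a harvester might use namespaces to tell protocol elements from metadata elements. The sample response is invented and abbreviated for illustration (a real response also carries schema location attributes); the three namespace URIs are the ones used by OAI-PMH 2.0 and unqualified Dublin Core.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",            # protocol elements
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",  # DC container
    "dc": "http://purl.org/dc/elements/1.1/",                 # DC elements
}

# Invented, abbreviated GetRecord response (not taken from a real repository).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-06-22T12:00:00Z</responseDate>
  <request verb="GetRecord">http://repository.example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:repository.example.org:coverlet-1</identifier>
        <datestamp>2001-11-07</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Cotton coverlet</dc:title>
          <dc:type>Image</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

root = ET.fromstring(SAMPLE)
record = root.find("oai:GetRecord/oai:record", NS)

# Protocol-level information lives in the OAI-PMH namespace.
identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)

# Descriptive metadata lives in the Dublin Core namespace.
metadata = {}
for element in record.find("oai:metadata/oai_dc:dc", NS):
    if element.tag.startswith("{" + NS["dc"] + "}"):
        name = element.tag.split("}", 1)[1]
        metadata.setdefault(name, []).append(element.text)

print(identifier)
print(metadata)
```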
13. OAI-PMH Use of Dublin Core
- DC is OAI's lowest common denominator
- OAI supports and encourages use of other, community-driven metadata schemas
- Typically, metadata provider stores metadata in the best schema as dictated by material and resources
- Crosswalk (semantic mapping) to simpler schemas
- Semantic mapping at metadata delivery (rather than at time of search)
- As with Z39.50, can't search for what's not there
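A crosswalk can be pictured as a field-to-field mapping applied when metadata is delivered. The sketch below is only illustrative: the local field names and the mapping itself are hypothetical, not those of any provider in the project. It also shows why detail that has no target in the simpler schema cannot be searched afterwards.

```python
# Hypothetical mapping from a richer local schema to unqualified Dublin Core.
CROSSWALK = {
    "object_name": "title",
    "maker": "creator",
    "materials": "format",
    "dimensions": "format",
    "date_made": "date",
}


def crosswalk(record):
    """Map a local record to simple DC; return (dc_record, dropped_fields)."""
    dc_record, dropped = {}, []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            # No DC target: this information is lost at delivery time.
            dropped.append(field)
        else:
            dc_record.setdefault(target, []).append(value)
    return dc_record, dropped


local_record = {
    "object_name": "Cotton coverlet",
    "maker": "Anna F. Ginsberg Hayutin",
    "materials": "cotton and embroidery floss",
    "dimensions": "71 in. x 86 in.",
    "stitches": "running stitch, chain stitch, French knot",
}

dc_record, dropped = crosswalk(local_record)
print(dc_record)
print(dropped)
```

Note that two distinct local fields collapse into one repeated DC element here, which is the kind of loss of precision a simpler target schema implies.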
14. When to use OAI-PMH
- Metadata is sufficient for services desired
- Normalization, deduping, metadata augmentation desired
- Content is widely distributed across small, non-Z39.50 enabled repositories
- OAI-PMH is more lightweight than Z39.50
- Portals can use BOTH Z39.50 and OAI-PMH
15. What OAI-PMH Is Not
- Not search and discovery on its own
- Not a database management system
- Not a single metadata schema
- Not OAIS
16. How OAI Works
- OAI VERBS
  - Identify
  - ListMetadataFormats
  - ListSets
  - ListIdentifiers
  - ListRecords
  - GetRecord
[Diagram: the Service Provider (harvester) sends an HTTP request carrying an OAI verb to the Metadata Provider (repository), which returns an HTTP response containing valid XML.]
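The request/response cycle can be sketched as a small harvesting loop. This is a simplified illustration, not a production harvester: error responses, sets, and date ranges are ignored, and `harvest`, `make_page`, and `fake_fetch` are hypothetical names. It issues `ListRecords` and follows the `resumptionToken` that OAI-PMH 2.0 uses to page through large result lists. A stand-in fetch function replaces the network so the example runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlparse

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield record identifiers from ListRecords, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(f"{OAI}ListRecords/{OAI}resumptionToken")
        if not token:  # missing or empty token: the list is complete
            break
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}


def make_page(identifiers, token):
    """Build an invented, abbreviated ListRecords response."""
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>"
        for i in identifiers
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


PAGES = {
    None: make_page(["oai:example:1", "oai:example:2"], "page2"),
    "page2": make_page(["oai:example:3"], ""),
}
requested = []


def fake_fetch(url):
    """Stand-in for an HTTP GET; returns canned XML keyed by resumption token."""
    requested.append(url)
    token = parse_qs(urlparse(url).query).get("resumptionToken", [None])[0]
    return PAGES[token]


identifiers = list(harvest("http://repository.example.org/oai", fake_fetch))
print(identifiers)
```

Against a live repository, `fake_fetch` could be replaced by a function that performs the GET, for example with `urllib.request.urlopen(url).read()`.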
17. OAI Provider Architectures
[Diagram: descriptive metadata, together with OAI administrative metadata (e.g., IDs, datestamps, sets, formats), is exposed to OAI harvesters.]
18. A few projects using OAI-PMH
- Basic building block of the National Science Digital Library
- Large-scale implementations in E-Prints, OLAC, NDLTD
- Built into ENCompass, ContentDM, Michigan's DLXS, D-Space, and other products
- Open Archives Forum in Europe will be part of federation activities in the UK and EU
19. Univ. of Illinois OAI Metadata Harvesting Project
- Funded by Andrew W. Mellon Foundation (July 2001 to May 2003)
- Primary objectives
  - Develop and make available OAI harvesting tools
  - Build search services for aggregated metadata in the domain of cultural heritage
  - Examine metadata aggregation issues, including use of EAD in OAI context
  - Investigate utility of aggregated metadata, including preliminary testing with end-users
20. Type of resources
- 39 data providers
  - academic libraries
  - museums / cultural orgs
  - digital libraries
  - public library
- 1.1 million original DC records
- 1.5 million derived from EAD
21. Variations in DC element usage
- Records containing subject and description elements (percent of records)
  Provider type                                                SUBJECT   DESCRIPTION
  Digital libraries (10 total, 122,719 records)                78        36
  Museums, hist. societies, etc. (6 total, 255,800 records)    93        93
  Academic libraries (7 total, 235,294 records)                15        13
- Many different controlled and local vocabularies in use
- Granularity: a record may describe a collection of coins or one coin
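Figures like those above can be obtained by counting, per provider group, how many harvested records carry a non-empty value for a given DC element. A minimal sketch follows; the record structure (a dict of element name to list of values) and the sample records are assumptions made for illustration, not project data.

```python
def element_usage(records, elements=("subject", "description")):
    """Return the percentage of records with a non-empty value per element."""
    total = len(records)
    usage = {}
    for element in elements:
        count = sum(
            1
            for record in records
            if any(value.strip() for value in record.get(element, []))
        )
        usage[element] = round(100 * count / total) if total else 0
    return usage


# Invented sample records; an empty string does not count as usage.
records = [
    {"title": ["Coverlet"], "subject": ["Textiles"], "description": ["Blue wool coverlet"]},
    {"title": ["Coin"], "subject": ["Numismatics"]},
    {"title": ["Letter"], "description": [""]},
    {"title": ["Map"]},
]

usage = element_usage(records)
print(usage)
```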
22. Excerpt of a metadata record describing a cotton coverlet
- Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
- Source: Materials cotton and embroidery floss. Dimensions 71 in. x 86 in. Markings top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board fabric is woven in a variation of a rib weave color each of yellow and gray hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread stitches used for embroidery running stitch, chain stitch, French knot and back stitches selvage edges left unfinished lower edges turned under and finished with large gray running stitches made with embroidery floss.
- Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5 300 dpi 21-53K bytes. Available via the World Wide Web.
- Coverage:
- Date: Created 2001-09-19 094518 Updated 20011107162451 Created 2001-04-05 Created 1912-1920?
- Type: Image
23. Excerpt of a metadata record describing "American woven coverlet"
- Description: Materials Textile--Multi, PigmentDye Manufacturing Process Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973
- Source:
- Format: 228 x 169 x 1.2 cm (1,629 g)
- Coverage: Euro-American America, North United States Indiana? Illinois?
- Date: Early 19th c. CE
- Type: cultural physical object original
24. Implications
- Service providers
  - Automatically normalize metadata encoding where possible (e.g., dates)
  - Normalize for and co-locate by type / format where possible
- Metadata providers
  - Create metadata for interoperability
  - Consider more expressive schema, e.g., Qualified DC, MARC
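Date normalization at the service provider can be sketched as trying a few known encodings and leaving the rest untouched. The example below uses date strings resembling those in the coverlet excerpts. The set of accepted formats and the `start/end` output for year ranges are assumptions chosen for illustration, not the project's actual rules.

```python
import re
from datetime import datetime

# Timestamp layouts assumed for illustration; a real harvester would need more.
KNOWN_FORMATS = (
    "%Y-%m-%d",
    "%Y%m%d%H%M%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H%M%S",
)


def normalize_date(raw):
    """Return YYYY-MM-DD, a 'start/end' year range, or None if unrecognized."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    # Year ranges such as "1912-1920?"; the uncertainty marker is dropped.
    match = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})\??", text)
    if match:
        return f"{match.group(1)}/{match.group(2)}"
    return None  # free text is left for manual review


samples = [
    "2001-04-05",
    "20011107162451",
    "2001-09-19 094518",
    "1912-1920?",
    "Early 19th c. CE",
]
normalized = [normalize_date(value) for value in samples]
print(normalized)
```

Values that cannot be normalized automatically, such as free-text periods, come back as `None`, so they can be kept in their original form or flagged for manual review.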
25. Original interface
- Portal had two search pages: simple (keyword) and advanced.
27. Pilot study with student teachers
- 23 users in honors-level CI class
- Assignment: Use the site in preparing a lesson plan (high school social studies)
- Introduced to aggregated metadata concept
- Focus group interviews conducted
- Students' papers examined
- Transaction logs analyzed
28. Results of initial user testing
- 1. Users expected all links to point to digital objects
  - Some records pointed to finding aids
  - Some records pointed to a collection's web site
  - Some records described analog objects
- 2. Users unable to make use of search results
  - Simple searches produced 1000s of unranked results
  - Advanced search (with limits) rarely used
- 3. Distinction between portal and data providers unimportant to users
29. What does online access mean?
30. Response to test results
- EAD-derived records segregated
- Analog-only collections excluded
- Categories of resource types reduced to 3
  - Images and Video
  - Text, Sheet Music, and Websites
  - Museums and Archival Collections
31. Revised interface
- Simple keyword and advanced search put on one page
- Clarify online access
- Natural language in Boolean operators
32. Revised search results
- Link goes to finding aid or collection page? "Learn more."
- Link displays object? "View item."
- Subj/Desc expanded
33. IMLS Digital Collections Content
- Build a registry of all National Leadership Grant (NLG) collections with digital content.
- Assist and guide NLG projects in making item-level metadata sharable using OAI.
- Build a repository and search and discovery tools for integrated access to the content of NLG collections (unique metadata schema?).
- Research best practices for sharing metadata about diverse digital content and for supporting the interests of diverse user communities.
34. http://imlsdcc.grainger.uiuc.edu/
35. CIC OAI metadata harvesting
- Univ. of Illinois at UC will host an OAI-PMH metadata harvesting service for 10 CIC libraries
- Project Goals (3 year experimentation phase)
  - Improve access to selected resources at CIC libraries
  - Advertise these resources (internally and externally)
  - Prepare member institutions for future grant-mandated OAI-based resource sharing
  - Serve as a useful testbed for experimentation with OAI-PMH, development of metadata best practices, usability and user needs testing, etc.
36. Using OAI-PMH to Aggregate Metadata Describing Cultural Heritage Resources
- http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/
- Timothy W. Cole (t-cole3_at_uiuc.edu), University of Illinois at Urbana-Champaign