Using OAI-PMH to Aggregate Metadata Describing Cultural Heritage Resources

1
Using OAI-PMH to Aggregate Metadata Describing
Cultural Heritage Resources
ALA/CLA Annual Meeting, 22 June 2003, Toronto, CA
  • Timothy W. Cole (t-cole3_at_uiuc.edu), University of
    Illinois at Urbana-Champaign
  • http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/

2
Order of Presentation
  • Perspectives on OAI-PMH
  • Illinois OAI metadata harvesting project
  • Goals & objectives
  • Findings regarding metadata
  • Findings regarding search & discovery
  • New OAI projects at Illinois
  • IMLS digital collections content
  • CIC OAI metadata harvesting project

3
OAI Protocol for Metadata Harvesting
  • Harvesting approach to interoperability at
    metadata level
  • Divides world into Metadata Providers & Service
    Providers
  • Builds on HTTP, XML, Dublin Core
  • http://www.openarchives.org/

4
OAI Antecedents
  • Call to other E-Print archives (July 1999)
  • Paul Ginsparg, Rick Luce, Herbert Van de
    Sompel
  • mobilize core group to work towards achieving
    a universal service for author self-archived
    scholarly literature.
  • Santa Fe Mtgs. (Oct. 1999, June 2000)
  • OAI PMH version history
  • First Alpha Release, Sept. 2000
  • 1.0 (Beta) Release January 2001
  • 1.1 (Beta 2) Release July 2001
  • 2.0 (Production) Release June 2002

5
Original OAI Organization
  • OAI Executive
  • Carl Lagoze & Herbert Van de Sompel
  • OAI Steering Committee
  • Co-Chairs: Dan Greenstein, Cliff Lynch
  • OAI Technical Committee
  • Funded by NSF, DLF & CNI
  • Seeks to be user community driven

6
OAI-PMH as a tool
  • All about moving metadata around
  • Designed to be a building block, useable by many
    different communities
  • Can facilitate (in some cases enable) services &
    functions
  • Assumes widely distributed content,
    but centralized indexing(!) services
  • Build once, use for many applications
  • Focus of OAI is interoperability

7
Harvesting vs. Broadcast
  • Competing approaches to interoperability
  • Distributed/Broadcast searching: search and
    discovery over remote services and data
  • Harvesting: data/metadata is transferred
    from the remote source to the destination where
    search & discovery services are located (e.g.
    Union catalogs)
  • OAI-PMH is a harvesting protocol

8
As Compared to Z39.50
                       Z39.50          OAI
Content (Objects)      Distributed     Distributed
World View             Bibliographic   Bibliographic
Object Presentation    Data provider   Data provider
Searching is           Distributed     Centralized
Search done by         Data provider   Service provider
Metadata searched is   Up to date      Stale
Semantic Mapping       When searching  Metadata delivery

9
Metadata vs. Resources
  • Resource refers to information objects or digital
    representations of information objects
  • Metadata item is a collection of properties about
    a resource (e.g. title, author, etc.)
  • Metadata record is a metadata item expressed in a
    specific syntax according to an XSD
  • OAI focuses on metadata, with the implicit
    understanding that metadata contains useful links
    to the source information object(s)

10
Data and Service Providers
  • Data Providers (Repositories) are entities
    that possess resources & metadata and are willing
    to share metadata with others via well-defined
    OAI protocols
  • Service Providers (Harvesters) are entities that
    harvest metadata from Data Providers in order to
    supply higher-level services to users (e.g.
    search & discovery)
  • OAI uses these terms for its client/server
    model (data = server, service = client)

11
Reliance on HTTP & XML
  • OAI-PMH is a REpresentational State Transfer
    (REST) protocol (unlike RPC, SOAP)
  • OAI requests and responses are sent via the HTTP
    protocol
  • OAI Requests are encoded as HTTP GET or POST
    operations
  • OAI Responses are valid XML documents
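
As a minimal sketch of the GET encoding described above, the following builds a request URL from a verb and its arguments. The repository address and the helper name build_request_url are assumptions made for illustration; only the verb and argument names come from the protocol.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Encode an OAI-PMH request as the URL for an HTTP GET."""
    # The verb goes first; arguments that were not supplied are dropped.
    params = {"verb": verb}
    params.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

list_url = build_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# "from" is a Python keyword, so it has to be passed through a dict.
since_url = build_request_url(
    BASE_URL, "ListIdentifiers", metadataPrefix="oai_dc", **{"from": "2003-01-01"}
)

print(list_url)
print(since_url)
```

For a POST, the same urlencode(params) string would be sent as the request body instead of being appended to the URL.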

12
XML Namespaces and Schema
  • Consistency and data 'quality' are ensured by
    using XML Schema Definitions (XSD) for all
    responses
  • XML Namespaces are used where necessary to
    clearly define which parts of the responses are
    actual metadata and which support the Metadata
    Harvesting Protocol
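
The role of namespaces can be sketched with Python's standard library: elements in the OAI-PMH namespace belong to the protocol envelope, while elements in the Dublin Core namespace are the actual metadata. The sample response, its identifier and the function name split_record are invented for illustration, and the sketch does not validate against the XSD (the standard library has no schema validator).

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented GetRecord response, trimmed to the parts the sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-06-22T12:00:00Z</responseDate>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:coverlet-1</identifier>
        <datestamp>2001-11-07</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Cotton coverlet</dc:title>
          <dc:type>Image</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def split_record(xml_text):
    """Return (protocol fields, Dublin Core fields) for the first record."""
    root = ET.fromstring(xml_text)
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    protocol = {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
    }
    metadata = {}
    dc_prefix = "{" + NS["dc"] + "}"
    for element in record.find("oai:metadata/oai_dc:dc", NS):
        # Keep only elements that live in the Dublin Core namespace.
        if element.tag.startswith(dc_prefix):
            name = element.tag[len(dc_prefix):]
            metadata.setdefault(name, []).append(element.text)
    return protocol, metadata


protocol, metadata = split_record(SAMPLE_RESPONSE)
print(protocol)
print(metadata)
```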

13
OAI-PMH Use of Dublin Core
  • DC is OAI's lowest common denominator
  • OAI supports & encourages use of other,
    community-driven metadata schemas
  • Typically, metadata provider stores metadata in
    best schema as dictated by material & resources
  • Crosswalk (semantic mapping) to simpler schemas
  • Semantic mapping at metadata delivery (rather
    than at time of search)
  • As with Z39.50, can't search for what's not there
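
A crosswalk of this kind can be sketched as a lookup table from local fields to simple DC elements. The local field names and the mapping itself are hypothetical, not taken from any provider in the project; the sample values echo the coverlet record shown later in the deck, except the accession note, which is invented.

```python
# Hypothetical mapping from a richer local schema to simple Dublin Core.
CROSSWALK = {
    "object_name": "title",
    "maker": "creator",
    "materials": "description",
    "technique": "description",
    "dimensions": "format",
}


def to_simple_dc(local_record):
    """Map a local record to simple DC; return (dc_record, dropped_fields)."""
    dc = {}
    dropped = []
    for field, value in local_record.items():
        target = CROSSWALK.get(field)
        if target is None:
            # No DC equivalent: the value never reaches the harvester.
            dropped.append(field)
            continue
        dc.setdefault(target, []).append(value)
    return dc, dropped


local = {
    "object_name": "American woven coverlet",
    "materials": "blue wool and white linen",
    "technique": "overshot weave",
    "dimensions": "228 x 169 x 1.2 cm",
    "accession_note": "internal use only",
}

dc, dropped = to_simple_dc(local)
print(dc)
print(dropped)
```

Two distinct local fields collapse into one description element and one field is dropped entirely, which is the sense in which a service provider cannot search for what is not there.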

14
When to use OAI-PMH
  • Metadata is sufficient for services desired
  • Normalization, deduping, metadata augmentation
    desired
  • Content is widely distributed across small,
    non-Z39.50 enabled repositories
  • OAI-PMH is more lightweight than Z39.50
  • Portals can use BOTH Z39.50 & OAI-PMH

15
What OAI-PMH Is Not
  • Not search & discovery on its own
  • Not a database management system
  • Not a single metadata schema
  • Not OAIS

16
How OAI Works
  • OAI VERBS
  • Identify
  • ListMetadataFormats
  • ListSets
  • ListIdentifiers
  • ListRecords
  • GetRecord
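
A harvester's use of one of these verbs can be sketched as a loop: issue ListIdentifiers, read the record headers from the XML response, and repeat while the repository returns a resumption token. The resumptionToken mechanism comes from the OAI-PMH 2.0 specification rather than from these slides; the function names, the fake_fetch stand-in and the repository address are assumptions made so the sketch runs without a network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, base_url, metadata_prefix="oai_dc"):
    """Yield record identifiers, following resumption tokens.

    `fetch` is any callable that takes a URL and returns response XML as text.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(".//" + OAI + "resumptionToken")
        if not token:
            break  # a missing or empty token means the list is complete
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


def fake_fetch(url):
    """Stand-in for an HTTP GET: serves two invented pages of results."""
    if "resumptionToken=page2" in url:
        ids, token = ["oai:example.org:3"], ""
    else:
        ids, token = ["oai:example.org:1", "oai:example.org:2"], "page2"
    headers = "".join(
        "<header><identifier>%s</identifier></header>" % i for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        + headers
        + "<resumptionToken>%s</resumptionToken>" % token
        + "</ListIdentifiers></OAI-PMH>"
    )


identifiers = list(
    harvest_identifiers(fake_fetch, "http://repository.example.org/oai")
)
print(identifiers)
```

Against a live repository, fetch could be a thin wrapper around urllib.request.urlopen that returns the decoded response body.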

[Diagram: the Service Provider (harvester) sends an HTTP Request (OAI verb) to the Metadata Provider (repository), which returns an HTTP Response (valid XML)]

17
OAI Provider Architectures
[Diagram: descriptive metadata, plus OAI administrative metadata (e.g., IDs, datestamps, sets, formats), made available to OAI harvesters]

18
A few projects using OAI-PMH
  • Basic building block of the National Science
    Digital Library
  • Large-scale implementations in E-Prints, OLAC,
    NDLTD,
  • Built into ENCompass, ContentDM, Michigan's DLXS,
    D-Space, and other products
  • Open Archives Forum in Europe will be part of
    federation activities in the UK and EU

19
Univ. of Illinois OAI Metadata Harvesting Project
  • Funded by Andrew W. Mellon Foundation (July 2001
    to May 2003)
  • Primary objectives
  • Develop & make available OAI harvesting tools
  • Build search services for aggregated metadata in
    the domain of cultural heritage
  • Examine metadata aggregation issues, including
    use of EAD in OAI context
  • Investigate utility of aggregated metadata,
    including preliminary testing with end-users

20
Type of resources
  • 39 data providers
  • academic libraries
  • Museums / cultural orgs
  • digital libraries
  • public library
  • 1.1 million original DC records
  • 1.5 million derived from EAD

21
Variations in DC element usage
  • Records containing subject & description elements

                                                            SUBJECT  DESCRIPTION
Digital libraries (10 total, 122,719 records)               78       36
Museums, hist. societies, etc. (6 total, 255,800 records)   93       93
Academic libraries (7 total, 235,294 records)               15       13
  • Many different controlled and local vocabularies
    in use
  • Granularity: a record may describe a collection
    of coins or one coin
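
Assuming the figures in the table are the percentage of each group's records that contain the element, a usage figure of that kind could be computed from harvested records roughly as follows. The sample records and the function name element_usage are invented for illustration.

```python
def element_usage(records, element):
    """Percentage of records with at least one non-empty value for `element`."""
    if not records:
        return 0.0
    used = sum(
        1
        for record in records
        if any(value and value.strip() for value in record.get(element, []))
    )
    return 100.0 * used / len(records)


# Invented records; each DC element maps to a list of values.
records = [
    {"title": ["Coin"], "subject": ["Numismatics"], "description": ["One coin"]},
    {"title": ["Coin collection"], "subject": ["Numismatics"]},
    {"title": ["Coverlet"], "description": [""]},
    {"title": ["Letter"]},
]

subject_share = element_usage(records, "subject")
description_share = element_usage(records, "description")
print(subject_share, description_share)
```

An element that is present but empty is counted as unused here, which is a design choice rather than something the slides specify.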

22
Excerpt of a metadata record describing a cotton
coverlet
  • Description: Digital image of a single-sized
    cotton coverlet for a bed with embroidered
    butterfly design. Handmade by Anna F. Ginsberg
    Hayutin.
  • Source: Materials: cotton and embroidery floss.
    Dimensions: 71 in. x 86 in. Markings: top right
    hand corner has 1 1/2 in. x 1/2 in. label cut
    outs at upper left and right hand side for head
    board fabric is woven in a variation of a rib
    weave color each of yellow and gray
    hand-embroidered cotton butterflies and flowers
    from two shades of each color of embroidery floss
    - blue, pink, green and purple and single top 20
    in. bordered with blue and black cotton
    embroidery thread stitches used for embroidery
    running stitch, chain stitch, French knot and
    back stitches selvage edges left unfinished
    lower edges turned under and finished with large
    gray running stitches made with embroidery floss.
  • Format: Epson Expression 836 XL Scanner with
    Adobe Photoshop version 5.5 300 dpi 21-53K
    bytes. Available via the World Wide Web.
  • Coverage:
  • Date: Created 2001-09-19 094518 Updated
    20011107162451 Created 2001-04-05 Created
    1912-1920?
  • Type: Image

23
Excerpt of a metadata record describing "American
woven coverlet"
  • Description: Materials: Textile--Multi,
    Pigment--Dye Manufacturing Process:
    Weaving--Hand, Spinning, Dyeing, Hand-loomed blue
    wool and white linen coverlet, worked in overshot
    weave in plain geometric variant of a
    checkerboard pattern. Coverlet is constructed from
    finely spun, indigo-dyed wool and undyed linen,
    woven with considerable skill. Although the
    pattern is simpler, the overall craftsmanship is
    higher than 1934.01.0094A. - D. Schrishuhn,
    11/19/99 This coverlet is an example of early
    "overshot" weaving construction, probably dating
    to the 1820's and is not attributable to any
    particular weaver. -- Georgette Meredith,
    10/9/1973
  • Source:
  • Format: 228 x 169 x 1.2 cm (1,629 g)
  • Coverage: Euro-American America, North United
    States Indiana? Illinois?
  • Date: Early 19th c. CE
  • Type: cultural physical object original

24
Implications
  • Service providers
  • Automatically normalize metadata encoding where
    possible (e.g., dates)
  • Normalize for and co-locate by type / format
    where possible
  • Metadata providers
  • Create metadata for interoperability
  • Consider more expressive schema e.g., Qualified
    DC, MARC
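
Automatic date normalization by a service provider can be sketched with the date values that appear in the two coverlet records earlier in the deck. The function name normalize_date, the recognized patterns and the "start/end" form used for year ranges are assumptions for illustration, not the project's actual rules.

```python
import re
from datetime import datetime


def normalize_date(raw):
    """Return an ISO date, a "start/end" year range, or None if unrecognized."""
    text = raw.strip()
    # Compact timestamp such as 20011107162451 (YYYYMMDDhhmmss).
    match = re.fullmatch(r"(\d{4})(\d{2})(\d{2})\d{6}", text)
    if match is None:
        # Hyphenated date, optionally followed by a time that is ignored.
        match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})(?:\s+\S+)?", text)
    if match:
        year, month, day = (int(group) for group in match.groups())
        try:
            return datetime(year, month, day).date().isoformat()
        except ValueError:
            return None  # looked like a date but is not a real one
    # Year range such as 1912-1920?; the uncertainty marker is discarded.
    match = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})\??", text)
    if match:
        return "%s/%s" % match.groups()
    return None


for value in [
    "2001-09-19 094518",
    "20011107162451",
    "2001-04-05",
    "1912-1920?",
    "Early 19th c. CE",
]:
    print(repr(value), "->", normalize_date(value))
```

Free-text values such as "Early 19th c. CE" are left unresolved here, which illustrates the "where possible" qualifier: some of the work has to be done by metadata providers at creation time.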

25
Original interface
  • Portal had two search pages: simple (keyword) and
    advanced.

27
Pilot study with student teachers
  • 23 users in honors-level CI class
  • Assignment: Use the site in preparing a lesson
    plan (high school social studies)
  • Introduced to aggregated metadata concept
  • Focus group interviews conducted
  • Students' papers examined
  • Transaction logs analyzed

28
Results of initial user testing
  • 1. Users expected all links to point to digital
    objects
  • Some records pointed to finding aids
  • Some records pointed to collection web sites
  • Some records described analog objects
  • 2. Users unable to make use of search results
  • Simple searches produced 1000s of unranked
    results
  • Advanced search (with limits) rarely used
  • 3. Distinction between portal and data providers
    unimportant to users

29
What does "online access" mean?
  • To librarian & curator
  • To student teacher

30
Response to test results
  • EAD-derived records segregated
  • Analog-only collections excluded
  • Categories of resource types reduced to 3
  • Images and Video
  • Text, Sheet Music, and Websites
  • Museums and Archival Collections

31
Revised interface
  • Simple keyword & advanced search put on one page
  • Clarify "online access"
  • Natural language in Boolean operators

32
Revised search results
  • Link goes to finding aid or collection page?
    "Learn more."
  • Link displays object? "View item."
  • Subj/Desc expanded

33
IMLS Digital Collections Content
  • Build a registry of all National Leadership Grant
    collections with digital content.
  • Assist and guide NLG projects in making
    item-level metadata sharable using OAI.
  • Build a repository and search & discovery tools
    for integrated access to the content of NLG
    collections (unique metadata schema?).
  • Research best practices for sharing metadata
    about diverse digital content and for supporting
    the interests of diverse user communities.

34
http://imlsdcc.grainger.uiuc.edu/

35
CIC OAI metadata harvesting
  • Univ. of Illinois at UC will host an OAI-PMH
    metadata harvesting service for 10 CIC libraries
  • Project Goals (3 year experimentation phase)
  • Improve access to selected resources at CIC
    libraries
  • Advertise these resources (internally &
    externally)
  • Prepare member institutions for future
    grant-mandated OAI-based resource sharing
  • Serve as a useful testbed for experimentation
    with OAI-PMH, development of metadata best
    practices, usability and user needs testing, etc.

36
Using OAI-PMH to Aggregate Metadata Describing
Cultural Heritage Resources
  • http://dli.grainger.uiuc.edu/Publications/TWCole/ALA2003OAI/
  • Timothy W. Cole (t-cole3_at_uiuc.edu), University of
    Illinois at Urbana-Champaign