Using the OAI Protocols with EAD. JCDL '02, Portland, Oregon, July 16, 2002

1
Using the OAI Protocols with EAD
JCDL '02, Portland, Oregon, July 16, 2002
  • Christopher J. Prom
  • Assistant University Archivist
  • University of Illinois at Urbana-Champaign
  • prom@uiuc.edu
  • Thomas G. Habing
  • Research Programmer
  • University of Illinois Engineering Library
  • thabing@uiuc.edu

2
Why OAI for Cultural Heritage?
  • Plethora of DL projects for archives,
    manuscripts, photos, artifacts, objects
  • Encoded in different metadata standards
  • Research difficulties for humanities scholars and
    students
  • An interoperability protocol based on metadata
    harvesting, not distributed searching
  • Allows metasearches across projects and data
    types
  • OAI use growing, well supported, simple
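Harvesting, as opposed to distributed searching, means that a service provider periodically pulls metadata from each data provider and then searches its own copy. The Python sketch below shows one harvesting step. It assumes OAI-PMH 2.0, because the slides do not name a protocol version, so the version and its namespace URIs are assumptions. The function names are hypothetical. The code builds a `ListRecords` request for simple Dublin Core (`oai_dc`) and pulls record identifiers, titles, and the `resumptionToken` out of one response page. The network fetch is deliberately left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OAI-PMH 2.0 and Dublin Core namespace URIs (the protocol version is an assumption).
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

def list_records_url(base_url, token=None):
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return ([(identifier, [titles]), ...], resumptionToken or None) for one page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((ident, titles))
    # An empty or missing resumptionToken means the list is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None
```

A harvester would fetch `list_records_url(base)` and parse the response. It would then repeat the request with the returned token until the token is `None`, and index the collected records locally.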

3
UIUC Mellon Project Goals
  • Test feasibility of harvesting, searching
    cultural heritage with OAI
  • Develop data provider tools that produce usable
    OAI records from disparate sources
  • Build service provider tools for storage and
    retrieval of cultural heritage metadata
  • Regarding EAD:
      • assess structural problems in mapping to OAI
      • develop an effective crossmapping
      • allow basic searching in an OAI environment
      • test effectiveness of the search
      • provide proof of concept

4
EAD Background
  • For finding aids, not individual documents (but
    can link to them)
  • Collective description: <eadheader>, <archdesc>
  • Can describe one document or millions; large file
    sizes possible
  • Very complex metadata structure raises mapping
    questions

ARCHDESC TOP-LEVEL SUMMARY
<archdesc>
    <did>
    <add>
    <admininfo>
    <arrangement>
    <bioghist>
    <controlaccess>
    <dao> and <daogrp>
    <note>
    <odd>
    <organization>
    <scopecontent>
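Before choosing a mapping, it helps to know which of these top-level elements a given finding aid actually uses. The Python sketch below reports them. It assumes DTD-based EAD with no XML namespaces, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Top-level <archdesc> elements named on the slide, in the slide's order.
TOP_LEVEL = ["did", "add", "admininfo", "arrangement", "bioghist", "controlaccess",
             "dao", "daogrp", "note", "odd", "organization", "scopecontent"]

def archdesc_summary(ead_xml):
    """List which of the slide's top-level elements occur directly under <archdesc>."""
    root = ET.fromstring(ead_xml)
    archdesc = next(root.iter("archdesc"), None)  # iter() also checks the root itself
    if archdesc is None:
        return []
    present = {child.tag for child in archdesc}
    return [name for name in TOP_LEVEL if name in present]
```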
5
EAD Background
  • Multilevel description
      • <dsc>: description of subordinate components
      • hierarchical <c01>, <c02> . . . <c12>
      • can include many tags in varied nesting; very
        flexible DTD
  • Wide range of possible tagging practices: encoding
    standards vary by institution

<c02>
  <did>
    <container type="box">23</container>
    <unittitle>Correspondence to J. R. R. Tolkien, <unitdate>1945</unitdate></unittitle>
    <physdesc>includes 21 letters</physdesc>
  </did>
</c02>
vs.
<c02 level="subseries">
  <did>
    <container type="box">23</container>
    <unittitle>Tolkien, J. R. R. (John Ronald Reuel), <unitdate>1945</unitdate></unittitle>
  </did>
  <c03 level="file">
    <did>
      <physdesc>21 letters.</physdesc>
    </did>
  </c03>
</c02>
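The two encodings above describe the same 21 letters at different depths and with different `level` values. The Python sketch below lists the numbered components of a `<dsc>` fragment so that the difference becomes visible. It assumes un-namespaced EAD and ignores unnumbered `<c>` components. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Numbered EAD components <c01> through <c12>; unnumbered <c> is not handled here.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])$")

def text_of(elem):
    """Whitespace-normalized text of an element, including nested tags such as <unitdate>."""
    return " ".join(" ".join(elem.itertext()).split())

def components(elem, depth=0):
    """Yield (depth, tag, level attribute, unittitle) for each nested numbered component."""
    for child in elem:
        if COMPONENT.match(child.tag):
            title = child.find("did/unittitle")
            yield (depth + 1, child.tag, child.get("level", "(none)"),
                   text_of(title) if title is not None else "")
            yield from components(child, depth + 1)
```

The first encoding yields a single row. The second yields a `subseries` row and a `file` row one level deeper, which is exactly the inconsistency a crosswalk has to absorb.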
6
EAD/OAI Problems and Issues
  • Providing full context: mining the <dsc>
  • Hierarchical inheritance in <dsc>: what to
    preserve?

<archdesc type="inventory" level="collection">
  <did id="a1">
    <unittitle>Irene Gomez-Bethke papers</unittitle>
    <unitdate>1970-1993.</unitdate>
  </did>
  <c01>
    <did><unittitle>Hispanic Organizations in Minnesota</unittitle></did>
    <!-- other c02 levels not shown -->
    <c02>
      <did>
        <physloc>151.H.1.1B</physloc>
        <container>1</container>
        <unittitle>Archdiocesan Office of Hispanic Ministry</unittitle>
      </did>
      <!-- other c03 levels not shown -->
      <c03>
        <did><unittitle>Hispanic Ministry Advisory Board</unittitle></did>
        <scopecontent><p>Advised the archbishop.</p></scopecontent>
        <!-- other c04 levels not shown -->
        <c04>
          <did><unittitle>Minutes of Board Meetings, 1986-1989.</unittitle></did>
        </c04>
      </c03>
    </c02>
  </c01>
</archdesc>
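One way to decide what to preserve is to carry ancestor context down to each component when that component becomes its own record. The Python sketch below implements a hypothetical policy, not the presenters' rules. It attaches every ancestor `<unittitle>` and inherits the nearest `<physloc>` and `<container>` values, so the c04 "Minutes" component in the example keeps its location and box. It accepts `<c01>` either directly under `<archdesc>` (as on the slide) or inside `<dsc>`. All names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

COMPONENT = re.compile(r"c(0[1-9]|1[0-2])$")  # numbered components <c01>..<c12>
INHERITED = ("physloc", "container")         # hypothetical choice of <did> fields to carry down

def title_of(elem):
    """Whitespace-normalized <did><unittitle> text, or '' if absent."""
    title = elem.find("did/unittitle")
    return " ".join(" ".join(title.itertext()).split()) if title is not None else ""

def with_context(archdesc):
    """Yield (title, ancestor titles, inherited fields) for every numbered component."""
    def walk(elem, trail, inherited):
        for child in elem:
            if child.tag == "dsc":  # components may sit inside <dsc> or directly below
                yield from walk(child, trail, inherited)
            elif COMPONENT.match(child.tag):
                fields = dict(inherited)  # copy so siblings do not share state
                for name in INHERITED:
                    value = child.findtext("did/" + name)
                    if value and value.strip():
                        fields[name] = value.strip()  # a component's own value overrides
                yield title_of(child), list(trail), fields
                yield from walk(child, trail + [title_of(child)], fields)
    yield from walk(archdesc, [title_of(archdesc)], {})
```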
7
More Problems/Issues
  • What level of materials is being described?
    Inconsistent use of the level attribute
  • Flexibility in DTD and encoding practices
  • Lack of standardization: little use of content
    standards such as APPM, LCSH, and LCNAF; inconsistent
    date styles and name conventions
  • Everything is permissible; not everything is
    beneficial

8
Our Approach
  • Assumption: dumbing down metadata has benefits
  • Examine encoding standards
  • Mitigate inconsistent encoding practices
  • Generate multiple OAI records for one EAD
      • <archdesc>: top-level record
      • mini records from <dsc>, with relation to the top
        level
  • Preserve context for hits by linking the user to the
    finding aid in the search/retrieval mechanism
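The "multiple records per EAD" approach can be sketched as follows. The sketch produces one top-level record for `<archdesc>` and one mini record per numbered component, and each mini record points back to the top level. Identifiers are the document URL plus an XPointer fragment in the positional form shown on the XPointers slide. The URL, the function names, and the record layout (plain dicts) are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

COMPONENT = re.compile(r"c(0[1-9]|1[0-2])$")  # numbered components <c01>..<c12>

def text_of(elem, path):
    """Whitespace-normalized text of the first match of `path`, or '' if absent."""
    found = elem.find(path)
    return " ".join(" ".join(found.itertext()).split()) if found is not None else ""

def split_ead(ead_xml, doc_url):
    """Split one EAD into a top-level record plus one mini record per numbered component."""
    root = ET.fromstring(ead_xml)
    archdesc = next(root.iter("archdesc"))  # assumes the document has an <archdesc>
    records = [{"identifier": doc_url,
                "title": text_of(archdesc, "did/unittitle"),
                "relation": None}]

    def walk(elem, path):
        counts = {}
        for child in elem:
            if not COMPONENT.match(child.tag):
                continue
            # XPath positions are 1-based and counted among siblings with the same name.
            counts[child.tag] = counts.get(child.tag, 0) + 1
            child_path = f"{path}/{child.tag}[{counts[child.tag]}]"
            records.append({"identifier": f"{doc_url}#xpointer({child_path})",
                            "title": text_of(child, "did/unittitle"),
                            "relation": doc_url})  # mini records relate to the top level
            walk(child, child_path)

    for n, dsc in enumerate(archdesc.findall("dsc"), start=1):
        walk(dsc, f"//dsc[{n}]")
    return records
```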

9
Recommended Crosswalk
  • To Simple Dublin Core
  • Top level
      • flexible mapping, draws from <eadheader> and
        (mainly) <archdesc>
      • key fields: identifier, title, date, type,
        description, subject, relation
  • <dsc>
      • provides metadata for the box listing
      • OAI records separate but related to top level
      • use of XPointer to provide context
      • replicates hierarchical structure of EAD
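A top-level crosswalk can be expressed as a table of EAD paths for each simple Dublin Core element. The paths below are a hypothetical mapping. They are not the presenters' recommended crosswalk, which the slide does not spell out. The fixed `type` values and the use of the `level` attribute as a type are borrowed from the sample record on slide 11, and the `<eadheader>` title fallback is an assumption.

```python
import xml.etree.ElementTree as ET

def text_of(elem):
    """Whitespace-normalized text of an element and its descendants."""
    return " ".join(" ".join(elem.itertext()).split())

# Hypothetical EAD-to-simple-DC paths for the top-level record (relative to <archdesc>).
CROSSWALK = {
    "title": ["did/unittitle"],
    "date": ["did/unitdate"],
    "description": ["scopecontent", "did/abstract"],
    "subject": ["controlaccess//subject", "controlaccess//persname",
                "controlaccess//corpname", "controlaccess//geogname"],
}

def top_level_dc(ead_xml, doc_url):
    """Map one finding aid to a dict of simple DC element -> list of values."""
    root = ET.fromstring(ead_xml)
    archdesc = next(root.iter("archdesc"))  # assumes the document has an <archdesc>
    dc = {"identifier": [doc_url], "type": ["text", "archives or manuscripts"]}
    if archdesc.get("level"):
        dc["type"].append(archdesc.get("level"))
    for element, paths in CROSSWALK.items():
        values = [text_of(e) for p in paths for e in archdesc.findall(p)]
        if values:
            dc[element] = values
    # Fall back to the <eadheader> title when <archdesc> has no unittitle.
    if "title" not in dc:
        proper = root.find(".//eadheader//titleproper")
        if proper is not None:
            dc["title"] = [text_of(proper)]
    return dc
```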

10
XPointers
  • W3C Candidate Recommendation (2001-09-11)
  • Can identify XML fragments using a superset of
    the XPath syntax
  • Example: xpointer(//dsc[1]/c01[2]/c02[3]/c03[10])
  • When EADs are split into their subordinate
    components, XPointers are used to identify the
    individual parts and link them together
  • Server-side scripts use the XPointers for
    rendering and linking
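On the server side, a script has to turn an identifier back into the fragment it names. Python's `xml.etree.ElementTree` supports only a subset of XPath. That subset does include `//` and positional predicates such as `c01[2]`, which is enough for the positional pointers used here. The sketch below is an assumption about how such a lookup could work, not the presenters' server code. It handles only the `#xpointer(//...)` form shown above, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches identifiers of the form ...#xpointer(//dsc[1]/c01[2]/c02[3]/c03[10])
XPOINTER = re.compile(r"#xpointer\((//[\w\[\]/]+)\)$")

def resolve(identifier, root):
    """Return the element selected by the identifier's xpointer() fragment, or None."""
    match = XPOINTER.search(identifier)
    if match is None:
        return None
    try:
        # ElementTree rejects absolute paths on an element, so "//dsc..." becomes ".//dsc...".
        return root.find("." + match.group(1))
    except SyntaxError:  # e.g. a position of 0, which XPath does not allow
        return None
```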

11
<rdf:RDF>
  <rdf:Description>
    <dc:identifier>http://.../test.xml#xpointer(//dsc[1]/c01[8]/c02[5]/c03[244])</dc:identifier>
    <dc:title>Toensing, Richard</dc:title>
    <dc:type>text</dc:type>
    <dc:type>archives or manuscripts</dc:type>
    <dc:type>file</dc:type>
    <dcterms:isPartOf>
      <rdf:Description>
        <dc:identifier>http://.../test.xml#xpointer(//dsc[1]/c01[8]/c02[5])</dc:identifier>
        <dc:title>Various Composers</dc:title>
      </rdf:Description>
    </dcterms:isPartOf>
  </rdf:Description>
</rdf:RDF>
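A record with this shape can be generated with ElementTree. The sketch below assumes the standard namespace URIs for RDF, the Dublin Core element set, and DCMI terms. The helper names are hypothetical. The `dc:type` values and the `dcterms:isPartOf` nesting mirror the sample record above.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, the Dublin Core element set, and DCMI terms.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
for prefix, uri in (("rdf", RDF), ("dc", DC), ("dcterms", DCTERMS)):
    ET.register_namespace(prefix, uri)  # so the output uses rdf:, dc:, dcterms: prefixes

def add_description(parent, identifier, title, types=()):
    """Append an rdf:Description with dc:identifier, dc:title, and any dc:type values."""
    desc = ET.SubElement(parent, f"{{{RDF}}}Description")
    ET.SubElement(desc, f"{{{DC}}}identifier").text = identifier
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    for value in types:
        ET.SubElement(desc, f"{{{DC}}}type").text = value
    return desc

def component_record(identifier, title, level, parent_id, parent_title):
    """Serialize a component record that points to its parent via dcterms:isPartOf."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = add_description(root, identifier, title, ["text", "archives or manuscripts", level])
    part_of = ET.SubElement(desc, f"{{{DCTERMS}}}isPartOf")
    add_description(part_of, parent_id, parent_title)
    return ET.tostring(root, encoding="unicode")
```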

12
Indexing and Retrieval Issues
  • Variations in consistency and quality of source
    EAD markup (currently being tested)
  • Size of EAD finding aids: one EAD can result in
    thousands of DC records (many of marginal
    usefulness, but some very useful)
  • A frequently occurring search term can overwhelm
    the results list with many parts of one EAD
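One possible mitigation for a flood of hits from a single finding aid is to collapse results by source document before display. This is an illustrative sketch, not something the slides describe. It assumes that hits arrive already ranked as `(identifier, score)` pairs and that a component identifier differs from its finding aid's URL only by a `#` fragment. The function name and the `per_ead` cap are hypothetical.

```python
def group_hits(hits, per_ead=3):
    """Group ranked (identifier, score) hits by finding aid, keeping at most per_ead each.

    Returns (document URL, shown hits, total hit count) tuples, ordered by each
    finding aid's best-ranked hit, because dicts preserve insertion order."""
    groups = {}
    for identifier, score in hits:
        doc = identifier.split("#", 1)[0]  # drop the #xpointer(...) fragment
        groups.setdefault(doc, []).append((identifier, score))
    return [(doc, items[:per_ead], len(items)) for doc, items in groups.items()]
```

The total count lets a results page show something like "4 matches in this finding aid" and link to the rest, rather than listing every component.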

13
Performance Issues
  • Splitting many EADs can be a time-consuming batch
    process
  • Many marginally useful DC files can affect search
    performance; logic is needed to discard these
    records
  • Disk space requirements can be large
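The slides call for logic to discard marginal records but do not specify it. As a purely hypothetical example, a filter might drop component records whose title is empty or is only a container label such as "Box 23", unless the record also carries a description. The pattern and the function names are assumptions and would need tuning against real finding aids.

```python
import re

# Hypothetical pattern for titles that are only container labels, e.g. "Box 23" or "Folder 3-4".
LABEL_ONLY = re.compile(r"^(box|folder|item|misc(ellaneous)?|untitled)\b[\s\d.,-]*$", re.IGNORECASE)

def is_marginal(record):
    """True if a record has no description and an empty or label-only title."""
    if record.get("description"):
        return False  # anything with descriptive text is kept
    title = record.get("title", "").strip()
    return not title or bool(LABEL_ONLY.match(title))

def keep_useful(records):
    """Drop records flagged by is_marginal before indexing."""
    return [r for r in records if not is_marginal(r)]
```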

14
Simple Search: http://oai.grainger.uiuc.edu/oai/search
15
Advanced Search
16
Search Results
17
Full Record
18
Hit in Context of Finding Aid
19
Outstanding Issues
  • Improve display of multiple search hits within a
    single EAD
      • summary and detail views, or a hierarchical display
  • Improve display of hit in context of finding aid
  • Filter out superfluous subordinate components
  • Normalize various EAD data elements
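Normalizing `<unitdate>` values is one concrete case, given the inconsistent date styles noted earlier. The sketch below takes a crude, hypothetical approach. It pulls four-digit years, assumed to fall between 1500 and 2099, out of free text and reduces them to a single year or an ISO 8601-style `start/end` interval. It does not handle abbreviated ranges such as "1970-93" or decade forms such as "1900s".

```python
import re

# Four-digit years from 1500 to 2099 (an assumed range for archival dates).
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")

def normalize_date(text):
    """Reduce free-text <unitdate> content to 'YYYY' or 'YYYY/YYYY', or None if no year is found."""
    years = [int(y) for y in YEAR.findall(text)]
    if not years:
        return None
    start, end = min(years), max(years)
    return str(start) if start == end else f"{start}/{end}"
```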
