Title: Using the OAI Protocols with EAD JCDL 02 Portland, Oregon July 16, 2002
1 Using the OAI Protocols with EAD
JCDL 02, Portland, Oregon, July 16, 2002
- Christopher J. Prom
- Assistant University Archivist
- University of Illinois at Urbana-Champaign
- prom_at_uiuc.edu
- Thomas G. Habing
- Research Programmer
- University of Illinois Engineering Library
- thabing_at_uiuc.edu
2 Why OAI for Cultural Heritage?
- Plethora of DL projects for archives, manuscripts, photos, artifacts, objects
- Encoded in different metadata standards
- Research difficulties of humanities scholars, students
- An interoperability protocol based on metadata harvesting, not distributed searching
- Allows metasearches across projects and data types
- OAI use growing, well supported, simple
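To make "metadata harvesting" concrete: a harvester issues plain HTTP requests to a data provider and receives XML records back. The sketch below builds an OAI-PMH `ListRecords` request URL. The base URL `http://example.org/oai` and the function name `list_records_url` are placeholders for illustration, not the project's actual endpoint or tooling.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces every
        # other argument except the verb when continuing a harvest.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Placeholder base URL; a real harvester would use the data provider's own.
url = list_records_url("http://example.org/oai")
print(url)
```

The `oai_dc` prefix requests unqualified Dublin Core, the one format every OAI-PMH data provider is required to support.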
3 UIUC Mellon Project Goals
- Test feasibility of harvesting, searching cultural heritage with OAI
- Develop data provider tools that produce usable OAI records from disparate sources
- Build service provider tools for storage and retrieval of cultural heritage metadata
- Re: EAD
  - assess structural problems in mapping to OAI
  - develop an effective crossmapping
  - allow basic searching in an OAI environment
  - test effectiveness of the search
  - provide proof of concept
4 EAD Background
- For finding aids, not individual documents (but can link to them)
- Collective description: <eadheader>, <archdesc>
- Can describe one document or millions; large file sizes possible
- Very complex metadata structure, raises mapping questions
ARCHDESC TOP LEVEL SUMMARY: <archdesc> <did> <add> <admininfo> <arrangement> <bioghist> <controlaccess> <dao> and <daogrp> <note> <odd> <organization> <scopecontent>
5 EAD Background
- Multilevel Description
  - <dsc>: description of subordinate components
  - hierarchical: <c01>, <c02>, . . . <c12>
  - can include many tags in varied nesting; very flexible DTD
- Wide range of possible tagging practices; encoding standards vary by institution
<c02>
  <did>
    <container type="box">23</container>
    <unittitle>Correspondence to J. R. R. Tolkien, <unitdate>1945</unitdate></unittitle>
    <physdesc>includes 21 letters</physdesc>
  </did>
</c02>

vs.

<c02 level="subseries">
  <did>
    <container type="box">23</container>
    <unittitle>Tolkien, J. R. R. (John Ronald Reuel),<unitdate>1945</unitdate></unittitle>
  </did>
  <c03 level="file"><did><physdesc>21 letters.</physdesc></did></c03>
</c02>
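The two encodings above carry the same facts at different depths, which is what a data provider has to cope with. A minimal sketch, using Python's standard `xml.etree.ElementTree`, of pulling title, date, and extent from either form. The function name `summarize_component` is hypothetical and is not part of the project's tools.

```python
import xml.etree.ElementTree as ET

FLAT = """<c02><did><container type="box">23</container>
<unittitle>Correspondence to J. R. R. Tolkien, <unitdate>1945</unitdate></unittitle>
<physdesc>includes 21 letters</physdesc></did></c02>"""

NESTED = """<c02 level="subseries"><did><container type="box">23</container>
<unittitle>Tolkien, J. R. R. (John Ronald Reuel),<unitdate>1945</unitdate></unittitle></did>
<c03 level="file"><did><physdesc>21 letters.</physdesc></did></c03></c02>"""

def summarize_component(xml_text):
    """Collect title, date, and extent wherever they occur below the component."""
    root = ET.fromstring(xml_text)
    title = root.find(".//unittitle")
    date = root.find(".//unitdate")
    extent = root.find(".//physdesc")
    return {
        # .text holds only the text before the first child element,
        # so the embedded <unitdate> is not repeated in the title.
        "title": (title.text or "").strip(" ,") if title is not None else None,
        "date": date.text.strip() if date is not None and date.text else None,
        "extent": extent.text.strip() if extent is not None and extent.text else None,
    }

flat = summarize_component(FLAT)
nested = summarize_component(NESTED)
print(flat)
print(nested)
```

Searching with `.//` ignores nesting depth, which is convenient here but would merge values from several child components in a larger subtree; a real mapping would need to decide which level owns each field.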
6 EAD/OAI Problems and Issues
- Providing full context: mining the <dsc>
- Hierarchical inheritance in <dsc>, what to preserve
<archdesc type="inventory" level="collection">
  <did id="a1"><unittitle>Irene Gomez-Bethke papers</unittitle><unitdate>1970-1993.</unitdate></did>
  <c01>
    <did><unittitle>Hispanic Organizations in Minnesota</unittitle></did>
    [other c02 levels not shown]
    <c02>
      <did><physloc>151.H.1.1B</physloc><container>1</container><unittitle>Archdiocesan Office of Hispanic Ministry</unittitle></did>
      [other c03 levels not shown]
      <c03>
        <did><unittitle>Hispanic Ministry Advisory Board</unittitle></did>
        <scopecontent><p>Advised the archbishop.</p></scopecontent>
        [other c04 levels not shown]
        <c04>
          <did><unittitle>Minutes of Board Meetings, 1986-1989.</unittitle></did>
        </c04>
      </c03>
    </c02>
  </c01>
</archdesc>
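The inheritance problem can be seen by walking this example: "Minutes of Board Meetings" only makes sense together with the titles of its ancestors. A minimal sketch, assuming the simplified markup above (the "not shown" placeholders omitted), that yields each component with the titles it inherits. The function name `walk` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

EAD = """<archdesc type="inventory" level="collection">
<did id="a1"><unittitle>Irene Gomez-Bethke papers</unittitle><unitdate>1970-1993.</unitdate></did>
<c01><did><unittitle>Hispanic Organizations in Minnesota</unittitle></did>
<c02><did><physloc>151.H.1.1B</physloc><container>1</container>
<unittitle>Archdiocesan Office of Hispanic Ministry</unittitle></did>
<c03><did><unittitle>Hispanic Ministry Advisory Board</unittitle></did>
<scopecontent><p>Advised the archbishop.</p></scopecontent>
<c04><did><unittitle>Minutes of Board Meetings, 1986-1989.</unittitle></did></c04>
</c03></c02></c01></archdesc>"""

# Numbered component tags run from c01 to c12.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])$")

def walk(node, inherited=()):
    """Yield (own title, titles inherited from ancestors) for each component."""
    title = node.findtext("did/unittitle", default="").strip()
    if COMPONENT.match(node.tag):
        yield title, inherited
    for child in node:
        if COMPONENT.match(child.tag):
            yield from walk(child, inherited + (title,))

records = list(walk(ET.fromstring(EAD)))
for title, context in records:
    print(" > ".join(context + (title,)))
```

Only titles are carried down here; which other inherited values (dates, containers, scope notes) are worth preserving is exactly the open question the slide raises.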
7 More Problems/Issues
- What level of materials is being described? Inconsistent use of level attribute
- Flexibility in DTD and encoding practices
- Lack of standardization: little use of content standards like APPM, LCSH, LCNAF; inconsistent date styles, name conventions
- Everything is permissible, not everything is beneficial
8 Our Approach
- Assumption: Dumbing down metadata has benefits
- Examine Encoding Standards
- Mitigate inconsistent encoding practices
- Generate multiple OAI records for one EAD
  - <archdesc>: top level record
  - mini records from <dsc>, with relation to top level
- Preserve context for hits by linking user to finding aid in the search/retrieval mechanism
9 Recommended Crosswalk
- To Simple Dublin Core
- Top Level
  - flexible mapping, draws from <eadheader> and (mainly) <archdesc>
  - Key fields: identifier, title, date, type, description, subjects, relation
- <dsc>
  - provides metadata for the box listing
  - OAI records separate but related to top level
  - use of XPointer to provide context
  - replicates hierarchical structure of EAD
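The "one EAD, many records" idea can be sketched as follows: one Dublin Core record for the top level, plus one mini record per component, each carrying a relation back to the top level. The field choices, the function name `crosswalk`, the sample markup, and the `#cN` identifier scheme are all illustrative assumptions; the slides specify only the key fields and that XPointers serve as the real component identifiers.

```python
import xml.etree.ElementTree as ET

EAD = """<ead><eadheader><eadid>test</eadid></eadheader>
<archdesc level="collection">
<did><unittitle>Irene Gomez-Bethke papers</unittitle><unitdate>1970-1993.</unitdate></did>
<dsc><c01 level="series"><did><unittitle>Hispanic Organizations in Minnesota</unittitle></did>
<c02><did><unittitle>Archdiocesan Office of Hispanic Ministry</unittitle></did></c02>
</c01></dsc></archdesc></ead>"""

def is_component(tag):
    """True for numbered component tags such as c01 or c12."""
    return tag[:1] == "c" and tag[1:].isdigit()

def crosswalk(xml_text, base_id):
    """Return one top-level DC record plus one mini record per component."""
    arch = ET.fromstring(xml_text).find("archdesc")
    records = [{
        "identifier": base_id,
        "title": arch.findtext("did/unittitle", default="").strip(),
        "date": arch.findtext("did/unitdate", default="").strip(),
        "type": arch.get("level", ""),
    }]
    components = [el for el in arch.iter() if is_component(el.tag)]
    for n, comp in enumerate(components, start=1):
        records.append({
            "identifier": f"{base_id}#c{n}",  # placeholder scheme, not the XPointer form
            "title": comp.findtext("did/unittitle", default="").strip(),
            "type": comp.get("level", ""),
            "relation": base_id,              # link back to the top-level record
        })
    return records

records = crosswalk(EAD, "test.xml")
for record in records:
    print(record)
```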
10 XPointers
- W3C Candidate Recommendation (2001-09-11)
- Can identify XML fragments using a superset of the XPath syntax
  - xpointer(//dsc[1]/c01[2]/c02[3]/c03[10])
- When EADs are split into their subordinate components, XPointers are used to identify the individual parts and link them together
- Server-side scripts use the XPointers for rendering and linking
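A positional pointer of this kind can be generated by counting same-named siblings while descending the component tree. The sketch below is an assumption about how such pointers might be produced and resolved, not the project's server-side scripts; the sample markup and the names `component_pointers` and `resolve` are hypothetical. Resolution relies on the limited XPath subset in Python's `xml.etree.ElementTree`, which supports positional predicates.

```python
import xml.etree.ElementTree as ET

DSC = """<archdesc><dsc>
<c01><did><unittitle>Series 1</unittitle></did></c01>
<c01><did><unittitle>Series 2</unittitle></did>
  <c02><did><unittitle>Folder A</unittitle></did></c02>
  <c02><did><unittitle>Folder B</unittitle></did></c02>
</c01></dsc></archdesc>"""

def is_component(tag):
    """True for numbered component tags such as c01 or c12."""
    return tag[:1] == "c" and tag[1:].isdigit()

def component_pointers(parent, prefix):
    """Yield (xpointer, element) for every component below parent."""
    counts = {}
    for child in parent:
        if not is_component(child.tag):
            continue
        # XPath positions are 1-based and count siblings with the same name.
        counts[child.tag] = counts.get(child.tag, 0) + 1
        path = f"{prefix}/{child.tag}[{counts[child.tag]}]"
        yield f"xpointer({path})", child
        yield from component_pointers(child, path)

def resolve(root, pointer):
    """Turn a pointer back into an element (ElementTree XPath subset only)."""
    path = pointer[len("xpointer("):-1]
    return root.find("." + path)

root = ET.fromstring(DSC)
pointers = {
    ptr: el.findtext("did/unittitle")
    for ptr, el in component_pointers(root.find("dsc"), "//dsc[1]")
}
for ptr, title in pointers.items():
    print(ptr, "->", title)
```

Because the pointers are purely positional, they stay valid only as long as the source finding aid is not re-ordered; re-encoding a finding aid would require re-harvesting its component records.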
11
<rdf:RDF>
  <rdf:Description>
    <dc:identifier>http:////test.xml#xpointer(//dsc[1]/c01[8]/c02[5]/c03[244])</dc:identifier>
    <dc:title>Toensing, Richard</dc:title>
    <dc:type>text</dc:type>
    <dc:type>archives or manuscripts</dc:type>
    <dc:type>file</dc:type>
    <dcterms:isPartOf>
      <rdf:Description>
        <dc:identifier>http:////test.xml#xpointer(//dsc[1]/c01[8]/c02[5])</dc:identifier>
        <dc:title>Various Composers</dc:title>
      </rdf:Description>
    </dcterms:isPartOf>
  </rdf:Description>
</rdf:RDF>
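A service provider consuming a record like this needs the component's own fields plus the parent named in `isPartOf`. A minimal sketch of reading both, assuming the standard RDF, Dublin Core, and DCMI Terms namespace URIs (the slide does not show the namespace declarations) and using a relative `test.xml` identifier because the host and path are not given in the slide. The function name `read_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/">
<rdf:Description>
  <dc:identifier>test.xml#xpointer(//dsc[1]/c01[8]/c02[5]/c03[244])</dc:identifier>
  <dc:title>Toensing, Richard</dc:title>
  <dc:type>text</dc:type><dc:type>archives or manuscripts</dc:type><dc:type>file</dc:type>
  <dcterms:isPartOf><rdf:Description>
    <dc:identifier>test.xml#xpointer(//dsc[1]/c01[8]/c02[5])</dc:identifier>
    <dc:title>Various Composers</dc:title>
  </rdf:Description></dcterms:isPartOf>
</rdf:Description></rdf:RDF>"""

def read_record(xml_text):
    """Extract the component's fields and the parent it is part of."""
    desc = ET.fromstring(xml_text).find("rdf:Description", NS)
    parent = desc.find("dcterms:isPartOf/rdf:Description", NS)
    return {
        # Paths without "//" match direct children only, so the nested
        # parent description does not leak into the component's fields.
        "identifier": desc.findtext("dc:identifier", namespaces=NS).strip(),
        "title": desc.findtext("dc:title", namespaces=NS).strip(),
        "types": [t.text for t in desc.findall("dc:type", NS)],
        "parent_id": parent.findtext("dc:identifier", namespaces=NS).strip(),
        "parent_title": parent.findtext("dc:title", namespaces=NS).strip(),
    }

rec = read_record(RECORD)
print(rec["title"], "is part of", rec["parent_title"])
```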
12 Indexing and Retrieval Issues
- Variations in consistency and quality of source EAD markup (currently being tested)
- Size of EAD finding aids: one EAD can result in thousands of DC records (many of marginal usefulness, but some very useful)
- A frequently occurring search term can overwhelm the results list with many parts of one EAD
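One way to keep a single finding aid from flooding the results list is to collapse hits by source document before display. The sketch below is an illustrative assumption, not the project's search code: it assumes each hit identifier has the form `document#fragment`, and the file names and the function name `group_hits` are hypothetical.

```python
def group_hits(hit_ids):
    """Group hit identifiers by the finding aid they came from, keeping hit order."""
    groups = {}
    for hit in hit_ids:
        # Everything before '#' names the source EAD; the rest is the fragment.
        source, _, fragment = hit.partition("#")
        groups.setdefault(source, []).append(fragment)
    return groups

hits = [
    "a.xml#xpointer(//dsc[1]/c01[1])",
    "a.xml#xpointer(//dsc[1]/c01[1]/c02[4])",
    "b.xml#xpointer(//dsc[1]/c01[3])",
    "a.xml#xpointer(//dsc[1]/c01[2])",
]
groups = group_hits(hits)
for source, fragments in groups.items():
    print(f"{source}: {len(fragments)} matching components")
```

A results page could then show one line per finding aid with a hit count, and expand to the individual components on request.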
13 Performance Issues
- Splitting many EADs can be a time-consuming batch process
- Many marginally useful DC files can affect search performance; need a logic to discard these records
- Disk space requirements can be large
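The slides leave the discard logic open. Purely as an illustration of what such a rule might look like, the sketch below drops records that have no description or subject and only a very short title. The rule, the threshold, the sample records, and the name `is_marginal` are all assumptions, not the project's method.

```python
def is_marginal(record, min_title_words=2):
    """Hypothetical rule: no description or subject, and a very short title."""
    title_words = record.get("title", "").split()
    has_detail = bool(record.get("description") or record.get("subject"))
    return not has_detail and len(title_words) < min_title_words

records = [
    {"identifier": "a.xml#1", "title": "Correspondence"},
    {"identifier": "a.xml#2", "title": "Minutes of Board Meetings, 1986-1989."},
    {"identifier": "a.xml#3", "title": "Minutes", "description": "Advised the archbishop."},
]
kept = [r for r in records if not is_marginal(r)]
print([r["identifier"] for r in kept])
```

Any real rule would have to be tuned against actual finding aids, since a one-word title can still be the only access point to useful material.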
14 Simple Search
http://oai.grainger.uiuc.edu/oai/search
15 Advanced Search
16 Search Results
17 Full Record
18 Hit in Context of Finding Aid
19 Outstanding Issues
- Improved display of multiple search hits within a single EAD
  - Summary and detail views or hierarchical
- Improve display of hit in context of finding aid
- Filter out superfluous subordinate components
- Normalization of various EAD data elements
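Dates are one element where normalization is easy to picture, given the inconsistent date styles noted earlier. The sketch below reduces a free-text date to a (first year, last year) pair. It is an assumption about what normalization could mean here, it handles only four-digit years from 1500 to 2099, and the name `year_range` is hypothetical.

```python
import re

# Four-digit years from 1500 through 2099, bounded by non-word characters.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")

def year_range(text):
    """Return (earliest, latest) year found in text, or None if there is none."""
    years = [int(y) for y in YEAR.findall(text)]
    return (min(years), max(years)) if years else None

for raw in ["1970-1993.", "1945", "Minutes of Board Meetings, 1986-1989.", "undated"]:
    print(repr(raw), "->", year_range(raw))
```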