A Digital Library Repository Utilizing the Open Archives Initiative - PowerPoint PPT Presentation

Provided by: jody63
Transcript and Presenter's Notes



1
A Digital Library Repository Utilizing the Open Archives Initiative
  • Developed to meet the needs of UTK Library
    Special Collections

2
The Problem
Tremendous quantities of valuable information held in
museums, libraries, and research centers are not
available in a standardized format through
centralized search engines.
How do we make the connection?
Musical scores and sound tracks
Historical Documents
Theses and Dissertations
Photos and videos
Scientific records
Mathematical findings
3
The Open Archives Solution
  • Translation of records into a common format and language
    • Format: XML
    • Language: Unqualified Dublin Core
  • Storage of these translations
  • Response to a standardized set of queries
  • Gather document descriptions from repositories
    into large databases, using OAI harvesters
  • Set up search engines to offer up information in
    these databases
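The harvesting step above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the UTK implementation: it assumes an OAI-PMH 2.0 endpoint (namespace http://www.openarchives.org/OAI/2.0/), requests ListRecords in oai_dc, and follows each resumptionToken until the repository returns an empty or missing one. The `fetch` parameter is an assumption added so the loop can be exercised without a network connection.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Download one OAI-PMH response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url, fetch=http_fetch):
    """Yield every <record> element from a ListRecords harvest in oai_dc."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        for record in root.iter(OAI_NS + "record"):
            yield record
        token = root.find(".//" + OAI_NS + "resumptionToken")
        # A missing or empty resumptionToken means the list is complete.
        if token is None or not (token.text or "").strip():
            return
        # In OAI-PMH 2.0 the token replaces all other arguments on the next request.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A harvester would then load each yielded record into its own database. Error responses such as noRecordsMatch simply end the loop in this sketch and would need explicit handling in practice.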

4
Required for Translation
  • Understanding of XML and XML schemas
  • Determining the correct mapping of information to
    Unqualified Dublin Core elements, in order to
    translate legacy files into a metadata format
    supported by the Open Archives Initiative
  • Scripts to reduce the labor of translation

5
A Common Language: Dublin Core
The 15 elements of Unqualified Dublin Core
  Content: Title, Description, Coverage, Relation, Source, Subject, Type
  Intellectual Property: Contributor, Creator, Publisher, Rights
  Instantiation: Date, Format, Identifier, Language
6
A Common Framework: XML Schemas
The XML schema constrains each element of the document,
providing the rules and framework for parsing:
    <complexType name="dublincoreType">
      <choice minOccurs="0" maxOccurs="unbounded">
        <element name="subject" minOccurs="0" maxOccurs="unbounded" type="string"/>
      </choice>
    </complexType>
  </schema>
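Python's standard library cannot validate a document against an XML schema, so the following is only a rough approximation of what a schema like the one above enforces. It is a hypothetical sketch that checks whether every child of a record is one of the 15 Unqualified Dublin Core elements in the Dublin Core namespace (http://purl.org/dc/elements/1.1/); the names `DC_ELEMENTS` and `unknown_elements` are illustrative only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 Unqualified Dublin Core elements, grouped into
# Content, Intellectual Property, and Instantiation.
DC_ELEMENTS = {
    "content": ["title", "description", "coverage", "relation",
                "source", "subject", "type"],
    "intellectual property": ["contributor", "creator", "publisher", "rights"],
    "instantiation": ["date", "format", "identifier", "language"],
}
ALLOWED = {"{%s}%s" % (DC_NS, name)
           for names in DC_ELEMENTS.values() for name in names}


def unknown_elements(record_xml):
    """Return the tags of child elements that are not Unqualified DC elements."""
    root = ET.fromstring(record_xml)
    return [child.tag for child in root if child.tag not in ALLOWED]
```

A real repository would still validate against the published oai_dc schema with a schema-aware parser; this check only catches misspelled or non-DC element names.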
7
A Common Format: XML
From a TEI Lite SGML file segment:
  <PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>
    <ITEM>Letters</ITEM>
    <ITEM>Cherokee Indians--Claims against</ITEM>
    <ITEM>Tennessee</ITEM>
  </LIST></KEYWORDS></TEXTCLASS></PROFILEDESC></TEIHEADER>
To an Unqualified Dublin Core XML file segment:
  <subject>Letters</subject>
  <subject>Cherokee Indians--Claims against</subject>
  <subject>Tennessee</subject>
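The mapping on this slide is mechanical enough to script. A minimal sketch in Python, assuming the TEI Lite fragment has already been normalized to well-formed XML (raw SGML may omit end tags and would first need an SGML-aware parser): each ITEM under an LCSH KEYWORDS list becomes one subject element. The function name `keywords_to_subjects` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def keywords_to_subjects(profiledesc_xml):
    """Map each ITEM in an LCSH KEYWORDS list to an Unqualified DC <subject>."""
    root = ET.fromstring(profiledesc_xml)
    subjects = []
    for keywords in root.iter("KEYWORDS"):
        if keywords.get("SCHEME") != "LCSH":
            continue  # only Library of Congress Subject Headings are mapped here
        for item in keywords.iter("ITEM"):
            text = " ".join((item.text or "").split())  # collapse whitespace
            if text:
                subjects.append("<subject>%s</subject>" % escape(text))
    return subjects


tei = """<PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>
  <ITEM>Letters</ITEM>
  <ITEM>Cherokee Indians--Claims against</ITEM>
  <ITEM>Tennessee</ITEM>
</LIST></KEYWORDS></TEXTCLASS></PROFILEDESC>"""

print("\n".join(keywords_to_subjects(tei)))
```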
8
Selected Portions of a TEI-Lite SGML Record
  <TEIHEADER> <FILEDESC> <TITLESTMT>
  <TITLE>Letter July 8, 1839, Washington City DC,
    to HP King, Qualla Town / William Holland
    Thomas a machine-readable transcription of an
    image</TITLE>
  <AUTHOR>Thomas, William Holland</AUTHOR>
  <PUBLISHER>The University of Tennessee
    Libraries</PUBLISHER>
  <IDNO>wt025</IDNO>
  <AVAILABILITY><P>This work is the property of the
    Special Collections Library,
    University of Tennessee, Knoxville, TN. It
    may be used freely by individuals for research,
    teaching, and personal use as long as this
    statement of availability is included in the
    text.</P></AVAILABILITY></PUBLICATIONSTMT>
  <SOURCEDESC><BIBL>
  <DATE VALUE="1839-07-08">July 8, 1839</DATE>
  <NOTE TYPE="summary">This document is a letter
    dated July 8, 1839 to H.P. King from William
    Holland Thomas with instructions for running the
    Indian Store.
  </NOTE>
  <PROFILEDESC> <TEXTCLASS> <KEYWORDS
    SCHEME="LCSH"><LIST>
  <ITEM>Cherokee Indians</ITEM>
  <ITEM>Government relations</ITEM>
  </LIST></KEYWORDS></TEXTCLASS></PROFILEDESC>
  <TEXT><BODY><DIV1 TYPE="letter">

9
Translated to XML Unqualified Dublin Core
  <title>Letter July 8, 1839, Washington City
    DC, to HP King, Qualla Town</title>
  <contributor>The University of Tennessee
    Libraries, Knoxville</contributor>
  <contributor>Southeastern Native American
    Documents Collection (GALILEO
    (Georgia statewide project)) GAGAL</contributor>
  <creator>Thomas, William Holland</creator>
  <publisher>The University of Tennessee
    Libraries</publisher>
  <date>July 8, 1839</date>
  <description>This document is a letter dated
    July 8, 1839 to H.P. King from William Holland
    Thomas with instructions for running the Indian
    Store.</description>
  <identifier>Document ID wt025</identifier>
  <identifier>http://www.helios.dii.utk.edu/oai/sgm/00178.html</identifier>
  <subject>Cherokee Indians</subject>
  <subject>Government relations</subject>
  <rights>This work is the property of
    the Special Collections Library,
    University of Tennessee, Knoxville, TN. It may be
    used freely by individuals for research,
    teaching, and personal use as long as this
    statement of availability is included in
    the text.</rights>
  <type>letter</type>
  <type>computer file</type>
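A crosswalk table plus a short script can produce most of this record automatically. The following is a minimal sketch under stated assumptions: the TEI header is well-formed XML, the hypothetical `CROSSWALK` table covers only the fields shown on these slides, and the output uses the oai_dc wrapper and Dublin Core namespaces defined for OAI-PMH 2.0. The function name is also hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical crosswalk: TEI Lite path (relative to TEIHEADER) -> Unqualified DC element.
CROSSWALK = [
    ("FILEDESC/TITLESTMT/TITLE", "title"),
    ("FILEDESC/TITLESTMT/AUTHOR", "creator"),
    ("FILEDESC/PUBLICATIONSTMT/PUBLISHER", "publisher"),
    ("FILEDESC/PUBLICATIONSTMT/IDNO", "identifier"),
    ("FILEDESC/PUBLICATIONSTMT/AVAILABILITY/P", "rights"),
    ("FILEDESC/SOURCEDESC/BIBL/DATE", "date"),
    ("FILEDESC/SOURCEDESC/BIBL/NOTE[@TYPE='summary']", "description"),
    ("PROFILEDESC/TEXTCLASS/KEYWORDS/LIST/ITEM", "subject"),
]


def tei_header_to_oai_dc(tei_header_xml):
    """Translate a well-formed TEI Lite header into an oai_dc record string."""
    header = ET.fromstring(tei_header_xml)
    record = ET.Element("{%s}dc" % OAI_DC)
    for path, dc_name in CROSSWALK:
        for source in header.findall(path):
            # Collapse the line breaks and indentation carried over from the SGML.
            text = " ".join("".join(source.itertext()).split())
            if text:
                ET.SubElement(record, "{%s}%s" % (DC, dc_name)).text = text
    return ET.tostring(record, encoding="unicode")
```

Values that do not come from the TEI header, such as the contributor collection, the URL identifier, and the type, would need crosswalk rules of their own.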
10
Translation Tools
Crosswalks available: MARC to DC
  http://www.loc.gov/marc/dccross.html
  Shown in action at http://alcme.oclc.org/marc2dc/index.html
Others:
  http://www.sinica.edu.tw/metadata/tool/mapping-foreign.html
  http://www.lub.lu.se/tk/metadata/MDin9612.html
  http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
12
Storage of OAI Records

MySQL: small, fast, and free (http://www.mysql.com). Use scripts to load the
database and retrieve information.
  • Store entire records, already marked up in Unqualified Dublin Core,
    for quick response, or
  • Store fields untagged, with multiple values for a field separated by
    tags, and retag upon request, for flexibility. This structure allows a
    record to be entered once and retrieved in various formats upon request.
For local search engines, also store hardcoded XML files in a directory.

  # Perl DBI: $set must be a known set name, because table names
  # cannot be bound as placeholders
  $sth = $dbh->prepare("select listit from $set
                        where date <= ? and date >= ?
                        order by id");
  $sth->execute($until, $from);

  mysql> create table gsm(
      -> id char(10) not null,
      -> primary key (id),
      -> date char(10),
      -> path char(80),
      -> listit text);
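The second storage option (untagged fields, retagged on request) can be sketched with Python's built-in sqlite3 module standing in for MySQL. The table layout copies the gsm table above; the line format inside `listit` (one `element<TAB>value|value` line per element) is an assumption for illustration, not the repository's actual encoding, and it presumes values never contain a tab or `|`. Because table names cannot be bound as query parameters, set names are checked against the known set codes before being placed in SQL. All function names are hypothetical.

```python
import sqlite3
from xml.sax.saxutils import escape

KNOWN_SETS = {"har", "che", "civ", "etd", "emn", "ead", "gsm", "ldr", "rth", "tdh", "vid"}


def table_for(set_name):
    """Set names become table names, which cannot be bound as SQL parameters."""
    if set_name not in KNOWN_SETS:
        raise ValueError("unknown set: %r" % set_name)
    return set_name


def create_set_table(conn, set_name):
    conn.execute(
        "create table %s (id char(10) not null primary key, "
        "date char(10), path char(80), listit text)" % table_for(set_name)
    )


def store(conn, set_name, rec_id, date, path, fields):
    """Store DC values untagged: one 'element<TAB>value|value' line per element (assumed format)."""
    listit = "\n".join(name + "\t" + "|".join(values) for name, values in fields.items())
    conn.execute(
        "insert into %s values (?, ?, ?, ?)" % table_for(set_name),
        (rec_id, date, path, listit),
    )


def retag(listit):
    """Rebuild Unqualified DC elements from the untagged listit text."""
    elements = []
    for line in listit.splitlines():
        name, values = line.split("\t", 1)
        for value in values.split("|"):
            elements.append("<%s>%s</%s>" % (name, escape(value), name))
    return "\n".join(elements)


def records_between(conn, set_name, from_date, until_date):
    """Select one set's records in an inclusive from/until window (ISO dates sort as text)."""
    rows = conn.execute(
        "select listit from %s where date >= ? and date <= ? order by id"
        % table_for(set_name),
        (from_date, until_date),
    )
    return [retag(listit) for (listit,) in rows]
```

Retagging at query time is what lets one stored row be served in more than one format; the first option would instead store the finished XML and return it unchanged.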
14
Response
  • Offer up document descriptions via a standardized set of queries and
    responses: the Open Archives Initiative Protocol
    1) 6 verbs, with 5 required and/or optional arguments
    2) Unique identifiers, optional sets, and metadata prefixes
    3) Flow control: resumption tokens
    4) Error codes

15
Verbs and Arguments: The Open Archives Protocol
  • Identify
  • ListSets
  • ListMetadataFormats: optional identifier
  • ListIdentifiers: required metadata prefix (oai_dc);
    optional from, until, set, resumption token
  • ListRecords: required metadata prefix (oai_dc);
    optional from, until, set, resumption token
  • GetRecord: required identifier and metadata prefix
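A harvester can check the verb and argument rules above before sending a request. A minimal sketch in Python, assuming OAI-PMH 2.0 semantics, where resumptionToken is an exclusive argument accepted only by the list verbs; the function name `oai_request_url` and the base URL are placeholders. Arguments are passed as a dict because `from` is a reserved word in Python.

```python
from urllib.parse import urlencode

# verb -> (required arguments, optional arguments); resumptionToken handled separately
VERBS = {
    "Identify": (set(), set()),
    "ListSets": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def oai_request_url(base_url, verb, args=None):
    """Build an OAI-PMH GET URL, rejecting arguments the verb does not accept."""
    args = dict(args or {})
    if verb not in VERBS:
        raise ValueError("badVerb: %s" % verb)
    if "resumptionToken" in args:
        # OAI-PMH 2.0: resumptionToken is exclusive and only valid for list requests.
        if verb not in RESUMABLE or len(args) > 1:
            raise ValueError("badArgument: resumptionToken must be used alone")
    else:
        required, optional = VERBS[verb]
        given = set(args)
        missing, illegal = required - given, given - required - optional
        if missing or illegal:
            raise ValueError("badArgument: missing %s, illegal %s"
                             % (sorted(missing), sorted(illegal)))
    return base_url + "?" + urlencode({"verb": verb, **args})


print(oai_request_url("http://example.org/oai", "ListRecords",
                      {"metadataPrefix": "oai_dc", "set": "gsm", "from": "1999-02-02"}))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=gsm&from=1999-02-02
```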

16
Identifiers, Sets, and Metadata Prefixes

Current sets (input as "set"), with sample identifiers:
  Set  Collection                           Sample identifier
  har  Bessie Harvey Collection             oai:tkn:har/har0001
  che  Cherokee                             oai:tkn:che/che0003
  civ  Civil War Collection                 oai:tkn:civ/civ0001
  etd  Electronic Theses and Dissertations  oai:tkn:etd/etd0002
  emn  Emancipator                          oai:tkn:emn/emn0001
  ead  Encoded Archival Description         oai:tkn:ead/ead0003
  gsm  Great Smoky Mountains                oai:tkn:gsm/gsm0045
  ldr  Library Development Review           oai:tkn:ldr/ldr0002
  rth  Roth Photography Collection          oai:tkn:rth/rth0034
  tdh  Tennessee Documentary History        oai:tkn:tdh/tdh0005
  vid  Videos                               oai:tkn:vid/vid0001
Supported metadata prefix: oai_dc
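The sample identifiers all follow the pattern oai:tkn:, then the set code, a slash, and the set code again followed by four digits. The sketch below assumes that pattern holds for every record, which is inferred only from the samples above; it splits an incoming identifier into its set and local ID so that a GetRecord request can be routed to the right table. The names are hypothetical.

```python
import re

SETS = {"har", "che", "civ", "etd", "emn", "ead", "gsm", "ldr", "rth", "tdh", "vid"}
# Assumed pattern: oai:tkn:<set>/<set><4 digits>, inferred from the sample identifiers only.
ID_PATTERN = re.compile(r"oai:tkn:([a-z]{3})/(\1\d{4})")


def parse_identifier(identifier):
    """Return (set, local id), or None when the repository should answer idDoesNotExist."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None or match.group(1) not in SETS:
        return None
    return match.group(1), match.group(2)


print(parse_identifier("oai:tkn:gsm/gsm0045"))  # ('gsm', 'gsm0045')
```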
17
Flow Control and Resumption Tokens

For ListIdentifiers, ListSets, and ListRecords:
  <resumptionToken>LRrtdc20f19990202u20020101</resumptionToken>
    LR          ListRecords (LI for ListIdentifiers)
    rt          number or letter combination indicating which set is next
    dc          metadata format
    20          record number to start with this time
    f19990202   from date 1999-02-02
    u20020101   until date 2002-01-01
When this resumption token is returned, it specifies the next call to the database.
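The token layout above can be written and read back with a regular expression. This Python sketch is hypothetical and rests on assumptions: fields appear in the order of the example token, the metadata part is always `dc` (oai_dc being the only supported prefix), the set part never contains `dc` itself, and the from and until parts are optional.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: verb, next set, "dc", start record, optional f<from>, optional u<until>.
TOKEN = re.compile(r"(LR|LI)([A-Za-z0-9]+?)dc(\d+)(?:f(\d{8}))?(?:u(\d{8}))?")


class ResumeState(NamedTuple):
    verb: str                  # "LR" = ListRecords, "LI" = ListIdentifiers
    next_set: str              # which set to continue with
    start: int                 # record number to start with this time
    from_date: Optional[str]   # YYYYMMDD, or None
    until_date: Optional[str]  # YYYYMMDD, or None


def encode(state):
    token = "%s%sdc%d" % (state.verb, state.next_set, state.start)
    if state.from_date:
        token += "f" + state.from_date
    if state.until_date:
        token += "u" + state.until_date
    return token


def decode(token):
    """Parse a returned token; None means the repository should answer badResumptionToken."""
    match = TOKEN.fullmatch(token)
    if match is None:
        return None
    verb, next_set, start, from_date, until_date = match.groups()
    return ResumeState(verb, next_set, int(start), from_date, until_date)


print(decode("LRrtdc20f19990202u20020101"))
# ResumeState(verb='LR', next_set='rt', start=20, from_date='19990202', until_date='20020101')
```

Keeping the whole query state inside the token means the repository needs no per-harvester session storage: each returned token is enough to rebuild the next database call.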
18
Error Codes (version 2.0)
  • badResumptionToken
  • badVerb
  • badArgument
  • idDoesNotExist
  • cannotDisseminateFormat
  • noMetadataFormats
  • noRecordsMatch
  • noSetHierarchy
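On the repository side, each code is returned in an error element of the OAI-PMH response. The following is a minimal Python sketch that covers only the badVerb, badArgument, and cannotDisseminateFormat checks for a repository that, like this one, supports only oai_dc. The surrounding response envelope (responseDate, request) is omitted, and the function names and messages are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

ERROR_CODES = {
    "badArgument", "badResumptionToken", "badVerb", "cannotDisseminateFormat",
    "idDoesNotExist", "noRecordsMatch", "noMetadataFormats", "noSetHierarchy",
}
VERBS = {"Identify", "ListSets", "ListMetadataFormats",
         "ListIdentifiers", "ListRecords", "GetRecord"}
SUPPORTED_PREFIXES = {"oai_dc"}


def error_element(code, message):
    """Render one <error> element for an OAI-PMH 2.0 response."""
    if code not in ERROR_CODES:
        raise ValueError("not an OAI-PMH 2.0 error code: %s" % code)
    return "<error code=%s>%s</error>" % (quoteattr(code), escape(message))


def check_request(args):
    """Return the <error> elements a request triggers (an empty list if none)."""
    verb = args.get("verb")
    if verb not in VERBS:
        return [error_element("badVerb", "Illegal or missing verb")]
    prefix = args.get("metadataPrefix")
    needs_prefix = verb == "GetRecord" or (
        verb in ("ListIdentifiers", "ListRecords") and "resumptionToken" not in args)
    if needs_prefix and prefix is None:
        return [error_element("badArgument", "metadataPrefix is required")]
    if prefix is not None and prefix not in SUPPORTED_PREFIXES:
        return [error_element("cannotDisseminateFormat", "Only oai_dc is supported")]
    return []


print(check_request({"verb": "ListRecords", "metadataPrefix": "marc"}))
# ['<error code="cannotDisseminateFormat">Only oai_dc is supported</error>']
```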

19
OAI 1.1 Test Interface and Local Search Engine
http://oai.sunsite.utk.edu/1.1.html
  • Search by word or phrase
  • Search all or any field, and by set
  • Sort by date or set
  • Return lists of identifiers or short file descriptions, each with links
    to the full file in HTML, XML, and the online document
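The local search features listed above can be approximated over stored Dublin Core records. A minimal Python sketch, assuming each record is a dict with `id`, `set`, an ISO `date`, and a `dc` mapping from element name to a list of values; this layout and the function name are hypothetical, and a real engine would query the database rather than scan a list.

```python
def search(records, phrase, field=None, set_code=None, sort_by="date"):
    """Return identifiers of records whose chosen field (or any field) contains the phrase.

    Hypothetical record layout: {"id", "set", "date" (ISO), "dc": {element: [values]}}.
    sort_by is "date" or "set".
    """
    needle = phrase.lower()
    hits = []
    for rec in records:
        if set_code is not None and rec["set"] != set_code:
            continue
        names = [field] if field else list(rec["dc"])
        if any(needle in value.lower()
               for name in names for value in rec["dc"].get(name, [])):
            hits.append(rec)
    hits.sort(key=lambda rec: rec[sort_by])
    return [rec["id"] for rec in hits]
```

Returning identifiers keeps the result list short; each identifier can then be linked to the full record in HTML, XML, or the online document.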
21
More Information: www.openarchives.org
Crosswalks:
  http://www.sinica.edu.tw/metadata/tool/mapping-foreign.html
  http://www.lub.lu.se/tk/metadata/MDin9612.html
  http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
Pre-developed repositories, harvesters, search engines, and more:
  http://www.openarchives.org/tools/tools.html
Current service providers, who can harvest your repository's responses and
offer searches of your records:
  http://www.openarchives.org/service/listproviders.html