Title: A Digital Library Repository Utilizing the Open Archives Initiative
1 A Digital Library Repository Utilizing the Open Archives Initiative
- Developed to meet the needs of UTK Library Special Collections
2 The Problem
Tremendous quantities of valuable information exist in museums, libraries, and research centers, but are not available in a standardized format via centralized search engines.
How to make the connection?
Musical scores and sound tracks
Historical Documents
Theses and Dissertations
Photos and videos
Scientific records
Mathematical findings
3 The Open Archives Solution
- Translation of records into a common format and language
  - XML
  - Unqualified Dublin Core
- Storage of these translations
- Response to a standardized set of queries
- Gather document descriptions from repositories into large databases, using OAI harvesters
- Set up search engines to offer up information in these databases
4 Required for Translation
- Understanding of XML and XML schemas
- Determining the correct mapping of information to Unqualified Dublin Core elements, in order to translate legacy files into a metadata format supported by the Open Archives Initiative
- Scripts to reduce the labor of translation
5 The 15 elements of Unqualified Dublin Core
Content: Title, Description, Coverage, Relation, Source, Subject, Type
Intellectual Property: Contributor, Creator, Publisher, Rights
Instantiation: Date, Format, Identifier, Language
A Common Language: Dublin Core
6 The XML schema constrains each element of the document, providing rules and a framework for parsing
    <complexType name="dublincoreType">
      <choice minOccurs="0" maxOccurs="unbounded">
        <element name="subject" minOccurs="0" maxOccurs="unbounded" type="string"/>
      </choice>
    </complexType>
    </schema>
A Common Framework: XML schemas
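As a rough illustration of what such a constraint buys, the sketch below checks that a record uses only the 15 Unqualified Dublin Core element names. It is a simplified stand-in for real schema validation, not the repository's actual check: the `<dc>` wrapper element, the absence of XML namespaces, and the function name `unknown_elements` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The 15 Unqualified Dublin Core elements, in the three groups shown on slide 5.
DC_ELEMENTS = {
    "content": ["title", "description", "coverage", "relation",
                "source", "subject", "type"],
    "intellectual_property": ["contributor", "creator", "publisher", "rights"],
    "instantiation": ["date", "format", "identifier", "language"],
}
ALLOWED = {name for group in DC_ELEMENTS.values() for name in group}


def unknown_elements(record_xml):
    """Return the names of child elements that are not Dublin Core elements.

    Assumes a flat record: one wrapper element (hypothetically <dc>) whose
    children are un-namespaced Dublin Core elements, each repeatable.
    """
    root = ET.fromstring(record_xml)
    return [child.tag for child in root if child.tag not in ALLOWED]


if __name__ == "__main__":
    record = "<dc><subject>Letters</subject><subject>Tennessee</subject></dc>"
    print(unknown_elements(record))
```

A full validator would also check the schema's occurrence and type rules; the name check alone is enough to catch a mistyped or unmapped element during translation.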
7 From a TEI Lite SGML file segment
    <PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>
    <ITEM>Letters</ITEM>
    <ITEM>Cherokee Indians--Claims against</ITEM>
    <ITEM>Tennessee</ITEM>
    </LIST></KEYWORDS></TEXTCLASS></PROFILEDESC></TEIHEADER>
To an Unqualified Dublin Core XML file segment
    <subject>Letters</subject>
    <subject>Cherokee Indians--Claims against</subject>
    <subject>Tennessee</subject>
A Common Format: XML
8 Selected Portions of a TEI-Lite SGML record
    <TEIHEADER><FILEDESC><TITLESTMT>
    <TITLE>Letter July 8, 1839, Washington City DC, to HP King, Qualla Town / William Holland Thomas a machine-readable transcription of an image</TITLE>
    <AUTHOR>Thomas, William Holland</AUTHOR>
    <PUBLISHER>The University of Tennessee Libraries</PUBLISHER>
    <IDNO>wt025</IDNO>
    <AVAILABILITY><P>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</P></AVAILABILITY></PUBLICATIONSTMT>
    <SOURCEDESC><BIBL>
    <DATE VALUE="1839-07-08">July 8, 1839</DATE>
    <NOTE TYPE="summary">This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</NOTE>
    <PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>
    <ITEM>Cherokee Indians</ITEM>
    <ITEM>Government relations</ITEM>
    </LIST></KEYWORDS></TEXTCLASS></PROFILEDESC>
    <TEXT><BODY><DIV1 TYPE="letter">
9 Translated to XML Unqualified Dublin Core
    <title>Letter July 8, 1839, Washington City DC, to HP King, Qualla Town</title>
    <contributor>The University of Tennessee Libraries, Knoxville</contributor>
    <contributor>Southeastern Native American Documents Collection (GALILEO (Georgia statewide project)) GAGAL</contributor>
    <creator>Thomas, William Holland</creator>
    <publisher>The University of Tennessee Libraries</publisher>
    <date>July 8, 1839</date>
    <description>This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</description>
    <identifier>Document ID wt025</identifier>
    <identifier>http://www.helios.dii.utk.edu/oai/sgm/00178.html</identifier>
    <subject>Cherokee Indians</subject>
    <subject>Government relations</subject>
    <rights>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</rights>
    <type>letter</type>
    <type>computer file</type>
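The translation from the TEI header to Dublin Core can be scripted. The following is a minimal sketch, not the script used for the UTK collections: it pulls values out of the SGML with regular expressions (SGML is not guaranteed to be well-formed XML) and writes one Dublin Core element per value. The tag pairs in `CROSSWALK` are the ones visible in the sample record; the `<dc>` wrapper and the function name `tei_to_dc` are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# TEI header tag -> Dublin Core element, following the pairs seen in the sample.
CROSSWALK = [
    ("TITLE", "title"),
    ("AUTHOR", "creator"),
    ("PUBLISHER", "publisher"),
    ("DATE", "date"),
    ("NOTE", "description"),
    ("IDNO", "identifier"),
    ("AVAILABILITY", "rights"),
    ("ITEM", "subject"),
]


def tei_to_dc(tei_text):
    """Build a flat Dublin Core record from a TEI Lite header (sketch only)."""
    root = ET.Element("dc")  # wrapper name is a placeholder, not from the slides
    for tei_tag, dc_name in CROSSWALK:
        # \b keeps TITLE from matching TITLESTMT; [^>]* skips attributes.
        pattern = re.compile(
            r"<{0}\b[^>]*>(.*?)</{0}>".format(tei_tag),
            re.DOTALL | re.IGNORECASE,
        )
        for value in pattern.findall(tei_text):
            value = re.sub(r"<[^>]+>", " ", value)  # drop nested tags such as <P>
            ET.SubElement(root, dc_name).text = " ".join(value.split())
    return root


if __name__ == "__main__":
    sample = (
        "<TITLESTMT><TITLE>Letter July 8, 1839</TITLE>"
        "<AUTHOR>Thomas, William Holland</AUTHOR></TITLESTMT>"
        "<IDNO>wt025</IDNO>"
        '<DATE VALUE="1839-07-08">July 8, 1839</DATE>'
        "<ITEM>Cherokee Indians</ITEM><ITEM>Government relations</ITEM>"
    )
    print(ET.tostring(tei_to_dc(sample), encoding="unicode"))
```

Values that the sample record adds by hand, such as the second contributor, the "Document ID" prefix, the URL identifier, and the type elements, are not derived from the header and would still need per-collection rules.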
10 Translation Tools
Crosswalks available: MARC to DC
http://www.loc.gov/marc/dccross.html
Shown in action at http://alcme.oclc.org/marc2dc/index.html
Others:
http://www.sinica.edu.tw/metadata/tool/mapping-foreign.html
http://www.lub.lu.se/tk/metadata/MDin9612.html
http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
12 MySQL: small, fast, and free
http://www.mysql.com
Use scripts to load the database and retrieve information.
Either store entire records, already marked up in Unqualified Dublin Core, for quick response, or store fields untagged, with multiple values for a field separated by tags, and retag upon request for flexibility. The untagged structure allows a record to be entered once and retrieved in various formats upon request. For local search engines, also store hardcoded XML files in a directory.
    $sth = $dbh->prepare("select listit from $set
        where date < '$until'
        and date > '$from'
        order by id");
    mysql> create table gsm(
        -> id char(10) not null,
        -> primary key (id),
        -> date char(10),
        -> path char(80),
        -> listit text);
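The storage and retrieval steps above can be sketched in Python. This is an illustration under stated assumptions, not the repository's code: it uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the function names are invented for the example. The table layout and the strict date comparison mirror the statements on the slide; dates are assumed to be stored as `YYYY-MM-DD` strings, which sort correctly as text.

```python
import sqlite3

# Set codes taken from the repository's current sets; one table per set.
KNOWN_SETS = {"har", "che", "civ", "etd", "emn", "ead",
              "gsm", "ldr", "rth", "tdh", "vid"}


def _check_set(set_name):
    # Table names cannot be bound as query parameters, so only accept
    # names from the fixed list instead of pasting user input into SQL.
    if set_name not in KNOWN_SETS:
        raise ValueError("unknown set: %r" % (set_name,))


def create_set_table(conn, set_name):
    """Create one table per set, with the columns shown on the slide."""
    _check_set(set_name)
    conn.execute(
        "create table %s (id char(10) not null primary key, "
        "date char(10), path char(80), listit text)" % set_name
    )


def list_between(conn, set_name, from_date, until_date):
    """Return stored records dated strictly between from_date and until_date."""
    _check_set(set_name)
    rows = conn.execute(
        "select listit from %s where date < ? and date > ? order by id" % set_name,
        (until_date, from_date),  # values are bound, never interpolated
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_set_table(conn, "gsm")
    conn.execute("insert into gsm values (?, ?, ?, ?)",
                 ("gsm0045", "2001-05-01", "gsm/gsm0045.xml", "<dc>...</dc>"))
    print(list_between(conn, "gsm", "1999-02-02", "2002-01-01"))
```

Binding the dates as parameters avoids the quoting problems that come with building the query string by hand, which matters because `from` and `until` arrive from outside requests.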
14 Response
- Offer up document descriptions via a standardized set of queries and responses: the Open Archives Initiative Protocol
- 1) 6 verbs, with 5 required and/or optional arguments
- 2) Unique identifiers, optional sets, and metadata prefixes
- 3) Flow control: resumption tokens
- 4) Error codes
15 Verbs and arguments: The Open Archives Protocol
- Identify
- ListSets
- ListMetadataFormats: optional identifier
- ListIdentifiers: required metadata prefix (oai_dc); optional from, until, set, resumption token
- ListRecords: required metadata prefix (oai_dc); optional from, until, set, resumption token
- GetRecord: required identifier and metadata prefix
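The verb and argument table above translates directly into a lookup that a harvester or a test client could use to check a request before sending it. The sketch below is one way to write that, with assumptions stated: the argument spellings (`metadataPrefix`, `resumptionToken`) are the protocol's parameter names, the base URL in the usage example is a placeholder, and the function name is invented. It follows the slide's simplified table; the full protocol has extra rules, for example around when a resumption token may be combined with other arguments.

```python
from urllib.parse import urlencode

# Required and optional arguments per verb, as listed on the slide.
VERB_ARGS = {
    "Identify": {"required": [], "optional": []},
    "ListSets": {"required": [], "optional": []},
    "ListMetadataFormats": {"required": [], "optional": ["identifier"]},
    "ListIdentifiers": {
        "required": ["metadataPrefix"],
        "optional": ["from", "until", "set", "resumptionToken"],
    },
    "ListRecords": {
        "required": ["metadataPrefix"],
        "optional": ["from", "until", "set", "resumptionToken"],
    },
    "GetRecord": {"required": ["identifier", "metadataPrefix"], "optional": []},
}


def build_request(base_url, verb, args=None):
    """Return a request URL, or raise ValueError named after the error code."""
    args = dict(args or {})
    if verb not in VERB_ARGS:
        raise ValueError("badVerb")
    spec = VERB_ARGS[verb]
    allowed = set(spec["required"]) | set(spec["optional"])
    missing = [name for name in spec["required"] if name not in args]
    unknown = [name for name in args if name not in allowed]
    if missing or unknown:
        raise ValueError("badArgument")
    # Sort arguments so the same request always yields the same URL.
    return base_url + "?" + urlencode([("verb", verb)] + sorted(args.items()))


if __name__ == "__main__":
    # "http://example.org/oai" is a placeholder, not the repository's address.
    print(build_request("http://example.org/oai", "ListRecords",
                        {"metadataPrefix": "oai_dc", "set": "gsm"}))
```

Arguments are passed as a dictionary rather than keyword arguments because `from` is a reserved word in Python.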
16 Identifiers, Sets, and Metadata Prefixes
Current Sets                            Input as "Set"   Sample Identifiers
Bessie Harvey Collection                har              oai:tkn:har/har0001
Cherokee                                che              oai:tkn:che/che0003
Civil War Collection                    civ              oai:tkn:civ/civ0001
Electronic Theses and Dissertations     etd              oai:tkn:etd/etd0002
Emancipator                             emn              oai:tkn:emn/emn0001
Encoded Archival Description            ead              oai:tkn:ead/ead0003
Great Smoky Mountains                   gsm              oai:tkn:gsm/gsm0045
Library Development Review              ldr              oai:tkn:ldr/ldr0002
Roth Photography Collection             rth              oai:tkn:rth/rth0034
Tennessee Documentary History           tdh              oai:tkn:tdh/tdh0005
Videos                                  vid              oai:tkn:vid/vid0001
Supported metadata prefix: oai_dc
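Because each sample identifier carries its set code, a repository script can recover the set (and so the database table) from the identifier alone. The sketch below assumes the identifiers have the form `oai:tkn:<set>/<item>`, that is, that the colons were lost when the slide text was extracted; both that form and the function name `parse_identifier` are assumptions.

```python
def parse_identifier(identifier):
    """Split an identifier such as oai:tkn:gsm/gsm0045 into its parts.

    Raises ValueError if the identifier does not have the assumed
    oai:<repository>:<set>/<item> shape.
    """
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai":
        raise ValueError("idDoesNotExist")
    repository, local = parts[1], parts[2]
    set_code, slash, item = local.partition("/")
    if not slash or not set_code or not item:
        raise ValueError("idDoesNotExist")
    return {"repository": repository, "set": set_code, "item": item}


if __name__ == "__main__":
    print(parse_identifier("oai:tkn:gsm/gsm0045"))
```

A GetRecord handler could use the returned set code to pick the table and the item value as the lookup key, answering with idDoesNotExist when parsing fails.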
17 Flow Control and Resumption Tokens
For ListIdentifiers, ListSets, and ListRecords
    <resumptionToken>LRrtdc20f19990202u20020101</resumptionToken>
LR or LI: ListRecords or ListIdentifiers
rt: number or letter combination indicating which set is next
dc: metadata format
20: which record number to start with this time
f19990202: from date 1999-02-02
u20020101: until date 2002-01-01
The token specifies the call to make to the database when this resumption token is returned.
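Since the token encodes the whole state of the request, the server does not need to remember anything between calls; it only has to decode the token. The following is a hedged sketch of such a decoder. The field widths are assumptions read off the single example token (a two-character set marker, the literal format code `dc`, and eight-digit dates), and the function name is invented.

```python
import re

# Layout assumed from the one example token: verb code, two-character set
# marker, metadata format, start record, f<from date>, u<until date>.
TOKEN_PATTERN = re.compile(
    r"^(LR|LI)([A-Za-z0-9]{2})(dc)(\d+)f(\d{8})u(\d{8})$"
)
VERBS = {"LR": "ListRecords", "LI": "ListIdentifiers"}


def _iso(yyyymmdd):
    """Turn 19990202 into 1999-02-02."""
    return "%s-%s-%s" % (yyyymmdd[:4], yyyymmdd[4:6], yyyymmdd[6:])


def parse_resumption_token(token):
    """Decode a token into the arguments of the next database call."""
    match = TOKEN_PATTERN.match(token.strip())
    if match is None:
        raise ValueError("badResumptionToken")
    verb, set_marker, fmt, start, from_date, until_date = match.groups()
    return {
        "verb": VERBS[verb],
        "set_marker": set_marker,
        "metadata_format": fmt,
        "start": int(start),
        "from": _iso(from_date),
        "until": _iso(until_date),
    }


if __name__ == "__main__":
    print(parse_resumption_token("LRrtdc20f19990202u20020101"))
```

The decoded `from`, `until`, and `start` values are what a handler would feed into the date-range query before returning the next batch of records.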
18 Error Codes
- badResumptionToken
- badVerb
- badArgument
- idDoesNotExist
- cannotDisseminateFormat
- noMetadataFormats
- noRecordsMatch
- noSetHierarchy
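In OAI-PMH 2.0 responses, an error is reported as an `error` element whose `code` attribute holds one of these values and whose text is a human-readable message. A minimal sketch of producing that element follows; the function name and the example message are illustrative, and the rest of the response envelope (response date, request echo, namespaces) is left out.

```python
import xml.etree.ElementTree as ET

# The error codes listed on the slide.
ERROR_CODES = {
    "badResumptionToken",
    "badVerb",
    "badArgument",
    "idDoesNotExist",
    "cannotDisseminateFormat",
    "noMetadataFormats",
    "noRecordsMatch",
    "noSetHierarchy",
}


def error_xml(code, message):
    """Serialize one error element, refusing codes outside the known list."""
    if code not in ERROR_CODES:
        raise ValueError("unknown error code: %r" % (code,))
    element = ET.Element("error", {"code": code})
    element.text = message  # ElementTree escapes &, < and > on output
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(error_xml("badVerb", "Illegal verb"))
```

Keeping the codes in one set means a typo in a handler fails loudly instead of sending harvesters a code they cannot interpret.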
19 OAI 1.1 Test Interface and Local Search Engine
http://oai.sunsite.utk.edu/1.1.html
- Search by word or phrase
- Searching by all or any field and set
- Sorting by date or set
- Returning lists of identifiers or short file descriptions, each with links to the full file in HTML, XML, and the online document
21 More Information
www.openarchives.org
Crosswalks:
http://www.sinica.edu.tw/metadata/tool/mapping-foreign.html
http://www.lub.lu.se/tk/metadata/MDin9612.html
http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
Pre-developed repositories, harvesters, search engines, and more:
http://www.openarchives.org/tools/tools.html
Current service providers, who can offer searches of your records from your repository responses:
http://www.openarchives.org/service/listproviders.html