OAI-PMH and METS for automated export of items from DSpace


1
OAI-PMH and METS for automated export of items
from DSpace
Stuart Lewis
University of Wales Aberystwyth
2
Contents
  • The problem
  • The bigger picture
  • Solution overview
  • How this worked with DSpace
  • The finished solution

3
The problem
  • The federal University of Wales
  • National Library of Wales
  • Thesis collection agreement
  • Now moving towards electronic theses

4
The problem
  • Harvest electronic theses from repositories
    across Wales
  • DSpace
  • GNU EPrints
  • Import items into NLW repository
  • Fedora

5
Repository Bridge
  • JISC funded
  • 1 year project
  • Collaboration between
  • University of Wales Aberystwyth
  • University of Wales Swansea
  • National Library of Wales

6
The bigger picture
  • EThOS
  • Electronic Theses Online Service
  • UK Database of Theses
  • They also want to import our theses
  • Welsh consortium

7
The solution
  • OAI-PMH
  • Open Archives Initiative
  • Protocol for Metadata Harvesting
  • Harvest daily any new items
  • Ingest in Fedora
  • Available for harvesting by EThOS

8
The solution
  • Decide which sets (collections) to harvest from
  • request?verb=ListSets
  • Find new items in those sets
  • request?verb=ListIdentifiers&set=hdl_2160_20&from=2006-04-20

<set>
  <setSpec>hdl_2160_20</setSpec>
  <setName>Advanced Reasoning Group Theses</setName>
</set>

<header>
  <identifier>oai:cadair.aber.ac.uk:2160/64</identifier>
  <datestamp>2006-03-21T11:14:52Z</datestamp>
  <setSpec>hdl_2160_21</setSpec>
</header>
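
The two requests on this slide can be assembled programmatically. The following is a minimal sketch, not the Repository Bridge code: the class name OaiRequestBuilder and the base URL http://example.org/oai/request are hypothetical. It also passes a metadataPrefix argument, which OAI-PMH 2.0 requires for ListIdentifiers even though the slide leaves it out; the value oai_dc used here is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds OAI-PMH request URLs for a repository base URL.
class OaiRequestBuilder {

    // Appends URL-encoded query parameters to the base URL, in insertion order.
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(p.getKey()).append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // verb=ListSets: discover which sets (collections) exist.
    static String listSets(String baseUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("verb", "ListSets");
        return build(baseUrl, params);
    }

    // verb=ListIdentifiers: find items in one set changed since a given date.
    static String listIdentifiers(String baseUrl, String metadataPrefix, String set, String from) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("verb", "ListIdentifiers");
        params.put("metadataPrefix", metadataPrefix);
        params.put("set", set);
        params.put("from", from);
        return build(baseUrl, params);
    }

    public static void main(String[] args) {
        String base = "http://example.org/oai/request"; // hypothetical base URL
        System.out.println(listSets(base));
        System.out.println(listIdentifiers(base, "oai_dc", "hdl_2160_20", "2006-04-20"));
    }
}
```

Keeping the parameters in a LinkedHashMap keeps the query string in a predictable order, which makes the generated URLs easier to log and compare.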
9
The solution
  • Harvest each item
  • request?verb=GetRecord&identifier=oai:cadair.aber.ac.uk:2160/64
  • Ingest into Fedora
  • Parse metadata
  • Download items (files)
  • Ingest into Fedora
  • Data available for EThOS
  • Ready to be queried via OAI-PMH
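
Before each item can be fetched with GetRecord, its identifier has to be read out of the header elements returned by ListIdentifiers. The sketch below shows one way to do that with the JDK's DOM parser. The class name HeaderParser is hypothetical, and skipping headers marked status="deleted" is an assumption about what a harvester would want, not something the slides state.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts item identifiers from a ListIdentifiers response.
class HeaderParser {

    static List<String> identifiers(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<String> ids = new ArrayList<>();
            // "*" matches the element in any namespace, including the OAI-PMH one.
            NodeList headers = doc.getElementsByTagNameNS("*", "header");
            for (int i = 0; i < headers.getLength(); i++) {
                Element header = (Element) headers.item(i);
                // Assumption: deleted records are not worth harvesting.
                if ("deleted".equals(header.getAttribute("status"))) {
                    continue;
                }
                NodeList idNodes = header.getElementsByTagNameNS("*", "identifier");
                if (idNodes.getLength() > 0) {
                    ids.add(idNodes.item(0).getTextContent().trim());
                }
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OAI-PMH response", e);
        }
    }
}
```

Each identifier returned can then be substituted into the GetRecord request shown on this slide.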

10
The solution
  • Web-based control panel

11
The solution
  • Web-based control panel

12
The solution
  • Web-based control panel

13
Which metadata format?
  • EThOS defines its own metadata set
  • UKETD (United Kingdom Electronic Thesis and
    Dissertation)
  • qualified Dublin Core
  • E.g. <dc:creator>Bell, Jonathan</dc:creator>
  • Additional schema
  • DSpace crosswalk file
  • metadataPrefix=uketd_dc

<uketd_dc:uketddc
  xmlns:uketd_dc="http://naca.central.cranfield.ac.uk/ethos-oai/2.0/"
  xmlns:uketdterms="http://naca.central.cranfield.ac.uk/ethos-oai/terms/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://naca.central.cranfield.ac.uk/ethos-oai/2.0/ http://naca.central.cranfield.ac.uk/ethos-oai/2.0/uketd_dc.xsd">
14
Which metadata format?
  • Fedora (and the NLW)
  • Prefer METS (Metadata Encoding and Transmission
    Standard)
  • and MODS (Metadata Object Description Schema)

<mods:name>
  <mods:role>
    <mods:roleTerm type="text">author</mods:roleTerm>
  </mods:role>
  <mods:namePart>Bell, Jonathan</mods:namePart>
</mods:name>
15
Which metadata format?
  • DSpace support for METS and MODS
  • Patch for DSpace version 1.3.2
  • Built into DSpace version 1.4
  • metadataPrefix=mets
  • dc2mods.cfg

contributor.author = <mods:name><mods:role><mods:roleTerm type="text">author</mods:roleTerm></mods:role><mods:namePart>%s</mods:namePart></mods:name>
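
Each dc2mods.cfg entry pairs a Dublin Core field with a MODS template in which %s stands for the field value. The sketch below illustrates that substitution only; it is not DSpace's own crosswalk code. The class name ModsTemplate is hypothetical, and escaping the value for XML is an assumption about what a careful crosswalk should do.

```java
// Hypothetical helper: fills a dc2mods-style template with one metadata value.
class ModsTemplate {

    // Escapes the characters that would otherwise break the surrounding XML.
    static String escapeXml(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Replaces every %s placeholder in the template with the escaped value.
    static String apply(String template, String value) {
        return template.replace("%s", escapeXml(value));
    }

    public static void main(String[] args) {
        String template = "<mods:name><mods:role>"
                + "<mods:roleTerm type=\"text\">author</mods:roleTerm></mods:role>"
                + "<mods:namePart>%s</mods:namePart></mods:name>";
        System.out.println(apply(template, "Bell, Jonathan"));
    }
}
```

String.replace is used rather than String.format so that a stray percent sign elsewhere in a template cannot cause a formatting error.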
16
But which one?
  • metadataPrefix=uketd_dc
  • Or
  • metadataPrefix=mets

17
But which one?
  • Combine the two!
  • metadataPrefix=uketd_mets

18
Two dmdSecs
  • METS holds metadata within Descriptive MetaData
    Sections
  • Let's use two of them!
  • <dmdSec ID="DMD_hdl_2160_24_mods">
  • <dmdSec ID="DMD_hdl_2160_24_uketd">
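
A METS document carrying two descriptive metadata sections can be sketched with the JDK's DOM API. This is an illustration under stated assumptions, not the code behind the uketd_mets format: the class name TwoDmdSecs is hypothetical, the xmlData wrappers are left empty, and the ID pattern simply follows the two IDs shown on this slide.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds a METS skeleton with one MODS and one UKETD_DC dmdSec.
class TwoDmdSecs {

    static final String METS_NS = "http://www.loc.gov/METS/";

    static Document build(String handleId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element mets = doc.createElementNS(METS_NS, "mets");
            doc.appendChild(mets);
            // First copy of the metadata: MODS, a type METS knows natively.
            mets.appendChild(dmdSec(doc, "DMD_" + handleId + "_mods", "MODS", null));
            // Second copy: UKETD_DC, declared through MDTYPE="OTHER".
            mets.appendChild(dmdSec(doc, "DMD_" + handleId + "_uketd", "OTHER", "UKETD_DC"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Element dmdSec(Document doc, String id, String mdType, String otherMdType) {
        Element section = doc.createElementNS(METS_NS, "dmdSec");
        section.setAttribute("ID", id);
        Element wrap = doc.createElementNS(METS_NS, "mdWrap");
        wrap.setAttribute("MDTYPE", mdType);
        if (otherMdType != null) {
            wrap.setAttribute("OTHERMDTYPE", otherMdType);
        }
        // The actual MODS or UKETD_DC elements would be appended inside xmlData.
        wrap.appendChild(doc.createElementNS(METS_NS, "xmlData"));
        section.appendChild(wrap);
        return section;
    }
}
```

Calling TwoDmdSecs.build("hdl_2160_24") yields sections with the two IDs listed on this slide.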

19
Licence encoding
  • METS holds rights within Administrative MetaData
    Sections
  • <amdSec>
  • Use the licence bitstream text

Bundle[] bundles = item.getBundles();
for (int i = 0; i < bundles.length; i++)
{
    // Assume license will be in its own bundle
    Bitstream[] bitstreams = bundles[i].getBitstreams();
    if (bitstreams[0].getFormat().getID() == licenseFormat)
    {
        // Return the license
        return bitstreams[0].retrieve();
    }
}
20
Bitstreams (files)
  • METS describes files within File Sections
  • <fileSec>
  • All bitstreams except licence
  • Includes filtered text

<fileSec>
  <fileGrp USE="ORIGINAL">
    <file ID="f2160_24_1" MIMETYPE="application/pdf"
          SIZE="1728984" CHECKSUM="eb5d0b9d5104212aed5f0cd76e99cf11"
          CHECKSUMTYPE="MD5"
          OWNERID="http://cadair.aber.ac.uk/dspace/bitstream/2160/24/1/Thesis.pdf"
          GROUPID="GROUP_f2160_24_1">
      <FLocat LOCTYPE="URL" xlink:type="simple"
              xlink:href="http://cadair.aber.ac.uk/dspace/bitstream/2160/24/1/Thesis.pdf" />
    </file>
  </fileGrp>
  <fileGrp USE="TEXT">
    <file ID="f2160_24_3" MIMETYPE="text/plain"
          SIZE="616933" CHECKSUM="f0bc8e3529302854ea9bedbac30ec0dd"
          CHECKSUMTYPE="MD5"
          OWNERID="http://cadair.aber.ac.uk/dspace/bitstream/2160/24/3/Thesis.pdf.txt"
          GROUPID="GROUP_f2160_24_1">
      <FLocat LOCTYPE="URL" xlink:type="simple"
              xlink:href="http://cadair.aber.ac.uk/dspace/bitstream/2160/24/3/Thesis.pdf.txt" />
    </file>
  </fileGrp>
</fileSec>
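
Because each file element carries a CHECKSUM and CHECKSUMTYPE, a harvester can verify a downloaded bitstream before ingesting it. The slides do not say whether the Repository Bridge did this, so the following is only a sketch of how such a check might look; the class name ChecksumCheck is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares downloaded bytes with the MD5 value from the fileSec.
class ChecksumCheck {

    // Returns the MD5 digest of the data as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is a standard JDK algorithm", e);
        }
    }

    // True when the data hashes to the CHECKSUM attribute value.
    static boolean matches(byte[] data, String expectedChecksum) {
        return md5Hex(data).equalsIgnoreCase(expectedChecksum.trim());
    }

    public static void main(String[] args) {
        byte[] downloaded = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(downloaded, "5d41402abc4b2a76b9719d911017c592"));
    }
}
```

For large theses it would be better to stream the file through the digest rather than hold it all in memory; the byte array here keeps the sketch short.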
21
File structure map
  • METS supports file structure relationships
  • <structMap>
  •   <div>
  •     <div></div>
  •   </div>
  • </structMap>
  • Useful for file hierarchy, e.g. sections and
    subsections, or pages of a book
  • DSpace does not hold structural information

22
File structure map
  • Can't do it alphabetically
  • Appendix A
  • Appendix B
  • Chapter 1
  • Chapter 2
  • Chapter 3
  • Chapter 4

23
File structure map
  • Decided upon order of item upload
  • Not perfect, but does the job!
  • Future requirement for DSpace?

<structMap>
  <div DMDID="DMD_hdl_123456789_24_uketd DMD_hdl_2160_24_mods"
       ADMID="rights_123456789_24_mods TMD_hdl_2160_24">
    <fptr FILEID="f2160_24_1" />
    <fptr FILEID="f2160_24_3" />
  </div>
</structMap>
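
The difference between the two orderings is easy to see in code. In this sketch the class FileOrder and its Entry type are hypothetical, and a numeric sequenceId stands in for the order in which the files were uploaded; how DSpace exposes that order is not covered by the slides.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: alphabetical order versus upload order for a thesis.
class FileOrder {

    static final class Entry {
        final int sequenceId; // assumed to grow with each upload
        final String name;

        Entry(int sequenceId, String name) {
            this.sequenceId = sequenceId;
            this.name = name;
        }
    }

    // Returns the file names sorted by the given comparator, leaving the input untouched.
    static List<String> names(List<Entry> files, Comparator<Entry> order) {
        List<Entry> sorted = new ArrayList<>(files);
        sorted.sort(order);
        List<String> result = new ArrayList<>();
        for (Entry e : sorted) {
            result.add(e.name);
        }
        return result;
    }

    // Puts appendices before chapters, which is the problem described above.
    static List<String> alphabetical(List<Entry> files) {
        return names(files, Comparator.comparing((Entry e) -> e.name));
    }

    // Follows the order in which the depositor uploaded the files.
    static List<String> byUploadOrder(List<Entry> files) {
        return names(files, Comparator.comparingInt((Entry e) -> e.sequenceId));
    }
}
```

Upload order only gives the right result when the depositor uploads the chapters in reading order, which is why the slide calls it "not perfect".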
24
The final solution
  • Fedora ingests two copies of the metadata
  • It uses MODS for itself
  • It provides qDC for EThOS
  • MODS from
  • org.dspace.app.mets.METSExport
  • public static void writeMETS
  • qDC from
  • uk.bl.ethos.UKETDDCCrosswalk.java
  • public String writeMetadataWithSchema

25
Final METS document
Header
GetRecord
Metadata
dmdSec MDTYPE="MODS"
dmdSec MDTYPE="OTHER" OTHERMDTYPE="UKETD_DC"
amdSec <mods:useAndReproduction>
fileSec
structMap
26
The end!
  • Stuart Lewis
  • Stuart.Lewis_at_aber.ac.uk
  • Repository Bridge
  • http://www.inf.aber.ac.uk/bridge/
  • Any questions?