Excellent XML systems interoperability at the Wellcome Library - PowerPoint PPT Presentation

1
Excellent XML systems interoperability at the
Wellcome Library
  • EIUG 11th Conference, Stirling University
  • 1-2 September 2005
  • Margaret Savage-Jones
  • m.savage-jones@wellcome.ac.uk

2
Wellcome Library Systems
  • Millennium - Innovative Interfaces Inc.
    http://catalogue.wellcome.ac.uk
  • Includes online requesting from closed stack since mid 2003
  • Calm - Archive system, DS Ltd
    http://archives.wellcome.ac.uk
  • Online access to archive and mss holdings
  • Miro/MedPhoto image system - System Simulation Ltd
    http://medphoto.wellcome.ac.uk
  • Online access to over 100,000 images, image retrieval
    and delivery

3
Underlying protocol: OAI-PMH
  • Open Archives Initiative Protocol for Metadata Harvesting:
    a protocol for sharing and harvesting metadata between
    different OAI-compliant systems
  • Based on XML and HTTP
  • One system (CALM or MedPhoto) exposes metadata via an OAI
    repository. This metadata is harvested by the other system
    (Millennium) and then loaded
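
The expose-then-harvest exchange described on this slide can be sketched in a few lines of Python. This is only an illustration of the protocol, not the XML Harvester's own code: the base URL, the record identifier and the sample response are hypothetical, and only the `verb` and `metadataPrefix` arguments and the response namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix):
    """Build the HTTP GET request a harvester sends to a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def record_headers(response_xml):
    """Return (identifier, datestamp) for each record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [
        (header.findtext("oai:identifier", namespaces=OAI_NS),
         header.findtext("oai:datestamp", namespaces=OAI_NS))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]

# Hypothetical, heavily trimmed response from a repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:MS4385/4404</identifier>
        <datestamp>2004-08-20</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai", "marc21"))
print(record_headers(SAMPLE_RESPONSE))
```

In a real harvest the response body would come from an HTTP request to the repository rather than from a string constant.
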
4
Motivation
  • With a MARC21 system, an ISAD(G) system and a bespoke image
    repository, it was a strategic objective to make these
    systems interoperate
  • Phase II of the Closed Stack project: Western Manuscripts
    and Archives had to be requestable online by summer 2004
  • XML Harvester development by Innovative with Michigan State
    University, 2001-02
  • Wellcome placed an order for XML Harvester in January 2003
  • With CALM ver 4 it was possible to export EAD XML
5
Benefits
  • Online requesting - Western MSS & Archives collections
  • One circulation system to manage and one set of
    circ stats
  • Same interface for all online requests from stack
  • Archives and manuscripts treated like other collections
  • Image sets for library objects displayed in Web
    OPAC
  • User can jump from one system to another
  • No need to rekey user search in other system
  • Selective harvesting for onward record updating

6
Example archive record (from Crick Coll.)
7
Harvested archive record in Web OPAC
8
Image of the archive item
9
Encoded Archival Description (EAD)
  • Initially XML Harvester dealt only with EAD and needed
    encodinganalogs for parsing
  • Developed with Michigan State University (MSU), whose EAD
    finding aids had MARC encodinganalogs. The Harvester parser
    read these tags
  • Encodinganalogs are attributes in XML records indicating
    the field, subfield, indicators etc. in another descriptive
    encoding system, e.g. the MARC21 equivalent of an EAD tagged
    element
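
As a rough illustration of how a parser can read encodinganalogs, the Python sketch below walks an EAD-like fragment and collects a (tag, subfield, value) triple for every element that carries the attribute. The fragment, its values and the `tag$subfield` notation are assumptions made for the example; the exact convention used in the MSU finding aids is not shown on the slides.

```python
import xml.etree.ElementTree as ET

# Illustrative EAD-like fragment; values and analog notation are made up.
EAD_SAMPLE = """<archdesc>
  <did>
    <unittitle encodinganalog="245$a">Notes on chemistry</unittitle>
    <unitdate encodinganalog="260$c">c. 1865</unitdate>
    <physdesc><extent encodinganalog="300$a">1 volume</extent></physdesc>
  </did>
</archdesc>"""

def marc_fields(ead_xml):
    """Collect (MARC tag, subfield code, text) from encodinganalog attributes."""
    fields = []
    for element in ET.fromstring(ead_xml).iter():
        analog = element.get("encodinganalog")
        if analog and "$" in analog:
            tag, code = analog.split("$", 1)
            fields.append((tag, code, (element.text or "").strip()))
    return fields

for field in marc_fields(EAD_SAMPLE):
    print(field)
```

Elements without the attribute are simply skipped, which is why records lacking encodinganalogs could not be parsed this way.
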

10
Archive system metadata
  • Hierarchical tree structure with collection and component
    item level records catalogued in General International
    Standard Archival Description, ISAD(G)
  • Field export from CALM as default subset EAD DTD had some
    empty fields, so we had to export as DServe Natural XML,
    which includes field tags. Catalog.xml output with
    catalog.DTD

11
Pilot used Haddad catalogue XML
  • Used a small set of 87 XML Arabic records (a local variant
    of the MASTER XML DTD) as a pilot to test XML Harvester
  • Used stylesheets to filter unwanted fields, add
    encodinganalogs and put 87 .xml files in a web server
    directory ready to be harvested

12
(No Transcript)
13
Web crawler
  • Harvester reaches the XML files through port 80
  • We added a page to the Millennium screens directory listing
    files with redirections to the web server folder
  • Harvester opened the page and scanned for HREF strings,
    which directed it to the XML records (file.xml)
  • The XML Harvester parser read tags from encodinganalogs to
    create MARC21 records, writing to a file for loading

14
Redirection screen
  • <html>
  • <head>
  • <title> Harvester Test</title>
  • </head>
  • <body>
  • <em>Mss Files</em><br>
  • <strong> Sample Screen 2</strong>
  • <PRE>
  • Test to confirm if harvester can crawl files deposited on wtcalm01
  • </pre>
  • <A HREF=http://wtcalm01.wellcome.ac.uk/xml/002.xml >002</A>
  • <A HREF=http://wtcalm01.wellcome.ac.uk/xml/83.xml>83</A>
  • <A HREF=http://wtcalm01.wellcome.ac.uk/xml/82.xml>82</A>
  • </body>
  • </html>
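
The crawl step (open the page, scan for HREF strings, follow those that point at XML records) can be approximated with Python's standard `html.parser`. This is a sketch of the idea only; how the XML Harvester itself scans the page is not documented on the slides, and the `notes.html` link is a hypothetical extra used to show filtering.

```python
from html.parser import HTMLParser

class XmlLinkFinder(HTMLParser):
    """Collect HREF values that point at .xml files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".xml"):
                self.links.append(value)

def find_xml_links(page_html):
    finder = XmlLinkFinder()
    finder.feed(page_html)
    finder.close()
    return finder.links

PAGE = """<html><body>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/002.xml >002</A>
<A HREF="http://wtcalm01.wellcome.ac.uk/xml/83.xml">83</A>
<A HREF=notes.html>notes</A>
</body></html>"""

print(find_xml_links(PAGE))
```

Each returned URL would then be fetched and passed to the record parser.
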

15
Example encodinganalogs for 856
  • <hyperlink>
  • <url ENCODINGANALOG="85607u">
  • <xsl:text>http://http://wisdom.welcome.ac.uk/xml/</xsl:text>
  • <xsl:value-of select="substring-after(/?idno,'WMS Arabic')"/>
  • <xsl:text>.html</xsl:text>
  • </url>
  • <text ENCODINGANALOG="85607z">View full manuscript record</text>
  • </hyperlink>

16
Harvested MARC21 Haddad record
17
Links to PDF and Request button
18
Lessons
  • Arabic records would be loaded only once, but records from
    CALM would need regular reharvesting/overlay
  • Need a more sophisticated approach than crawling a web
    directory
  • XML Harvester can harvest from an OAI repository and use
    datestamps in OAI to harvest records created or modified in
    a specified date range
  • XSLT could be used to transform records to MARC21 OAI
    without using encodinganalogs

19
Archives OAI repository
  • Built on CALM server using freeware University of Illinois
    Provider service tool (runs under Windows IIS)
  • Other Requirements
  • Microsoft 2000 server
  • Microsoft IIS ver 4 or higher
  • Microsoft ASP
  • Microsoft XML Parser (MSXML) 4.0
  • Microsoft ActiveX Data Objects and ODBC-compliant
    datasource, i.e. MS Access 97 database
  • Firewall access on port 80

20
Key decisions
  • Metadata export: chose full CALM record XML DTD (not EAD)
  • Matchpoint: decided to load contents of Calm RefNo field to
    Millennium 001, indexed in o
  • Also had to consider
  • Hierarchical record level to harvest
  • Navigation between the two systems
  • Millennium parameters

21
Decision: Record level to harvest
  • A Collection could consist of more than 40 boxes. Must have
    a 1:1 record relationship to make requesting and retrieval
    work
  • Decision to exclude archives Collection records and use
    Component level records. Each of these represents 1 item
    (box, folder, piece) and links to a single bib record with
    attached item for circulation in Millennium

22
Decision: Navigation
  • Archivists wanted the archives (CALM) interface to offer
    the main search route for Western Archives & MSS
  • User is taken from CALM record into Millennium to place
    their request, then back to their CALM record to continue
    browsing their hit list, so two links were needed
  • Forward: runs cgi script to search Millennium for
    corresponding bib record
  • Back: 856 with URL link (can be inserted by Harvester)

23
Example Links
  • Forward: cgi script runs search of Millennium o index for
    match on CALM RefNo value
  • http://catalogue.wellcome.ac.uk/search/o?SEARCH=PP%2FCRI%2FA%2F1%2F2%2F8
  • Back: RefNo PP/CRI/A/1/2/8 built into OAI record URL linking
    to CALM web front end, with the RefNo value built into the
    search string
  • http://archives.wellcome.ac.uk/DServe/dserve.exe?dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=show.tcl&dsqDb=Catalog&dsqPos=0&dsqSearch=((text)='PP/CRI/A/1/2/8')
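
Both links are derived from the same RefNo, so they can be generated rather than keyed. The Python sketch below shows one way to do that. The host names and parameter names are taken from the example URLs on this slide; the `=` and `&` separators and the percent-encoding of the slashes are assumptions, since the transcript lost those characters.

```python
from urllib.parse import quote

OPAC_SEARCH = "http://catalogue.wellcome.ac.uk/search/o"
CALM_CGI = "http://archives.wellcome.ac.uk/DServe/dserve.exe"

def forward_link(ref_no):
    """CALM -> Millennium: search the o index for the RefNo."""
    # safe="" forces "/" to be encoded as %2F.
    return f"{OPAC_SEARCH}?SEARCH={quote(ref_no, safe='')}"

def back_link(ref_no):
    """Millennium -> CALM: URL suitable for loading into an 856 field."""
    params = [
        ("dsqIni", "Dserve.ini"),
        ("dsqApp", "Archive"),
        ("dsqCmd", "show.tcl"),
        ("dsqDb", "Catalog"),
        ("dsqPos", "0"),
    ]
    query = "&".join(f"{key}={value}" for key, value in params)
    return f"{CALM_CGI}?{query}&dsqSearch=((text)='{ref_no}')"

print(forward_link("PP/CRI/A/1/2/8"))
print(back_link("PP/CRI/A/1/2/8"))
```
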

24
Calm XML export file
  • <?xml version="1.0" encoding="utf-8" ?>
  • <record>
  • <DScribeRecord>
  •   <RecordType>Component</RecordType>
  •   <IDENTITY />
  •   <RefNo>MS4385/4404</RefNo>
  •   <AltRefNo>MS.4404</AltRefNo>
  •   <PreviousNumbers />
  •   <Title>Notes and extracts on Chemistry, Volumetric Analysis, (etc.)</Title>
  •   <Date>c. 1865</Date>
  •   <Level>Item</Level>
  •   <Extent>1 volume</Extent>
  •   <UserText5>Bentley House</UserText5>
  •   <Location />
  •   <UserText3>Western MSS series 3 - Requestable</UserText3>
  •   <UserWrapped9 />
  •   <UserText6 />
  •   <UserText7 />

25
Mapping Calm XML to Marc21
  • Field tags used: 001, 008, 245, 260, 500, 506, 655, 856,
    and 949 to make the item
  • Harvester inserts a 99x tag with load identification code,
    e.g. CALM20040820225128
  • Found that Component records do not have author, which is
    only held at Collection level, but this was not a problem
  • Mock bib and item records keyed to Millennium to
  • - demonstrate navigation and agree content with team
  • - act as a benchmark when harvested records loaded
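
To make the mapping idea concrete, here is a small Python sketch that turns a Calm record into a list of (MARC tag, value) pairs and appends a load identification field. The slides list the tags used but not which Calm element feeds which tag, so `FIELD_MAP` is a hypothetical assignment. The load code format (database name followed by a YYYYMMDDHHMMSS timestamp) is inferred from the single example CALM20040820225128, and the 991 tag is borrowed from the harvester configuration shown later.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Hypothetical element-to-tag assignments, for illustration only.
FIELD_MAP = {"RefNo": "001", "Title": "245", "Date": "260", "Extent": "500"}

def load_id(db_name, when):
    """Build a load identification code such as CALM20040820225128."""
    return db_name + when.strftime("%Y%m%d%H%M%S")

def to_marc_fields(record_xml, when):
    """Map non-empty Calm elements to (tag, value) pairs, then add the load code."""
    record = ET.fromstring(record_xml)
    fields = [
        (FIELD_MAP[element.tag], element.text.strip())
        for element in record
        if element.tag in FIELD_MAP and element.text and element.text.strip()
    ]
    fields.append(("991", load_id("CALM", when)))
    return fields

SAMPLE = (
    "<DScribeRecord><RefNo>MS4385/4404</RefNo>"
    "<Title>Notes and extracts on Chemistry</Title>"
    "<Date>c. 1865</Date><Location /></DScribeRecord>"
)

for field in to_marc_fields(SAMPLE, datetime(2004, 8, 20, 22, 51, 28)):
    print(field)
```

Empty elements such as `<Location />` produce no field, which mirrors the need to filter unwanted or empty fields before loading.
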

26
XSLT: eXtensible Stylesheet Language Transformations
  • Used XSLT to split the single XML output file into 48,000
    component .xml records, using the <DScribeRecord> as record
    delimiter, and then transform them to MARC21 OAI records
    listed to XML Harvester by our OAI repository
  • The OAI repository installed on the CALM staging server
    uses the University of Illinois Provider service tool
    (freeware)
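
The splitting step can be illustrated in Python instead of the XSLT and scripting actually used in the project. The sketch treats each DScribeRecord element as one record, keeps only Component-level records and keys them by RefNo. The `<records>` wrapper and the Collection-level sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

def split_components(export_xml):
    """Return {RefNo: record XML} for each Component-level DScribeRecord."""
    records = {}
    for record in ET.fromstring(export_xml).iter("DScribeRecord"):
        if record.findtext("RecordType") != "Component":
            continue  # Collection-level records are excluded from the harvest
        records[record.findtext("RefNo")] = ET.tostring(record, encoding="unicode")
    return records

EXPORT = """<records>
  <DScribeRecord><RecordType>Collection</RecordType><RefNo>MS4385</RefNo></DScribeRecord>
  <DScribeRecord><RecordType>Component</RecordType><RefNo>MS4385/4404</RefNo><Level>Item</Level></DScribeRecord>
</records>"""

for ref_no, xml_text in split_components(EXPORT).items():
    print(ref_no, len(xml_text))
```

For an export of this size, `ET.iterparse` would avoid holding the whole file in memory; the version above favours brevity.
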

27
Millennium parameters
  • To cope with open v closed archive collections, new codes
    were added to archives records and mapped to new Millennium
    branch codes which would trigger Millcirc rules
  • New branch codes added to Request Rules, Determiner Table,
    WWWOPTIONS, Locations served
  • New MATTYPE to exclude Western Mss and archives from the
    Asian Mss scope

28
Config file for archives record harvest
  • @LOGLEVEL=CONFIG
  • @DBNAME=CALM
  • @URL=http://wtcalm02/oai/oai.asp
  • @CREATEOVERLAYFROMURI=true
  • @9XXMARCTAG=991
  • @USEOAI=true
  • @DATE=20000606000000
  • @OAIFROMEMAIL=m.savage-jones@wellcome.ac.uk
  • @SHOWMETADATA=true
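
For checking a configuration before a harvest run, the settings can be read into a dictionary. The Python sketch below assumes each setting is written as an `@KEY=value` line, which is a reading of the slide rather than documented XML Harvester syntax.

```python
def parse_harvester_config(text):
    """Read '@KEY=value' lines into a dict, ignoring anything else."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("@") or "=" not in line:
            continue
        key, value = line[1:].split("=", 1)
        settings[key] = value
    return settings

CONFIG = """@LOGLEVEL=CONFIG
@DBNAME=CALM
@URL=http://wtcalm02/oai/oai.asp
@9XXMARCTAG=991
@USEOAI=true
@OAIFROMEMAIL=m.savage-jones@wellcome.ac.uk
"""

settings = parse_harvester_config(CONFIG)
print(settings["DBNAME"], settings["URL"])
```

Splitting on the first `=` only keeps values that themselves contain `=` or `@` intact.
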

29
Management interface for XML Harvester
30
Archive record Request link to Web OPAC
31
Harvested archive record in Millennium
32
Patron login screen to place request
33
Confirmation of request
34
Interoperation sought with image system
  • To integrate MedPhoto, a bespoke photo library system, and
    Millennium for seamless display and ordering of images
  • MedPhoto holds images and records for more than 60,000
    items catalogued in Millennium: Iconographic collection,
    archives & manuscripts, rare books etc.
  • Specific need for Millennium user to see images associated
    with library objects

35
Media management interface
36
Config file for image URL harvest
  • @LOGLEVEL=CONFIG
  • @DBNAME=MEDPHOTO
  • @URL=http://aquarius.wellcome.ac.uk:6969/ixbin/hixserv
  • @RECID_MARCTAG=001
  • @CREATEOVERLAYFROMURI=true
  • @9XXMARCTAG=991
  • @USEOAI=true
  • @REQUIRE_EADID=false
  • @DATE=20000606000000
  • @OAIFROMDATE=20050701000000
  • @OAIUNTILDATE=20050731000000
  • @OAIFROMEMAIL=m.savage-jones@wellcome.ac.uk
  • @OAISET=bib

37
Selective Harvesting images
  • Harvest full bib set and load to Millennium populating
    962s, then each month request list of all new image URLs
    created since the last harvest with a Millennium .b number
    in their record
  • <http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv?verb=ListRecords&metadataPrefix=marc21&set=bib&from=2005-05-01&until=2005-05-31>
  • (for records in May)
  • <http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv?verb=ListRecords&metadataPrefix=marc21&set=bib&from=2005-06-01&until=2005-06-30>
  • (for records in June and so on)
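
The monthly request differs only in its date window, so it is easy to generate. The Python sketch below builds the selective-harvest URL for any month. The base URL, set name and metadata prefix are taken from the examples on this slide; the function name is hypothetical, and the `from` and `until` arguments are the standard OAI-PMH selective harvesting parameters.

```python
import calendar
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv"

def monthly_harvest_url(year, month, base_url=BASE_URL):
    """ListRecords request covering one calendar month of the 'bib' set."""
    last_day = calendar.monthrange(year, month)[1]
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "marc21",
        "set": "bib",
        "from": date(year, month, 1).isoformat(),
        "until": date(year, month, last_day).isoformat(),
    }
    return base_url + "?" + urlencode(params)

print(monthly_harvest_url(2005, 5))
print(monthly_harvest_url(2005, 6))
```

Using `calendar.monthrange` avoids hard-coding month lengths and handles leap years.
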

38
Harvesting Image OAI repository
  • OAI repository built by SSL on MedPhoto server
  • Metadata matchpoint: .b bib record no. is the common element
    between Millennium and MedPhoto
  • XML Harvester selectively requests record set bib, in which
    all records have .b nos, parses the returned list of MARC21
    OAI records and creates a file of MARC records for loading
  • Matches on .b and overlays, inserting a 962 for each image
  • 962 u holds URL for thumbnail and e holds launchpad URL
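
The match-and-overlay logic can be sketched with plain dictionaries. This is an illustration of the idea, not Millennium's loader: the record layout, the function name and the example URLs are hypothetical, while the use of the .b number as matchpoint and of 962 subfields u and e follows the slide.

```python
def overlay_images(catalogue, harvested):
    """Add a 962 field to the matching bib record for each harvested image.

    catalogue: {bib_number: record dict}
    harvested: list of dicts with 'bib_no', 'thumbnail' and 'launchpad'
    Returns the bib numbers that found no match.
    """
    unmatched = []
    for image in harvested:
        bib = catalogue.get(image["bib_no"])
        if bib is None:
            unmatched.append(image["bib_no"])
            continue
        # One 962 per image: u = thumbnail URL, e = launch pad URL.
        bib.setdefault("962", []).append(
            {"u": image["thumbnail"], "e": image["launchpad"]}
        )
    return unmatched

catalogue = {".b12857890": {"245": "Example title"}}
harvested = [
    {"bib_no": ".b12857890", "thumbnail": "http://example.org/t/1", "launchpad": "http://example.org/l/1"},
    {"bib_no": ".b12857890", "thumbnail": "http://example.org/t/2", "launchpad": "http://example.org/l/2"},
    {"bib_no": ".b99999999", "thumbnail": "http://example.org/t/3", "launchpad": "http://example.org/l/3"},
]

print(overlay_images(catalogue, harvested))
print(len(catalogue[".b12857890"]["962"]))
```
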

39
MARC21 record ready to load
  • File Name: DONE-MEDPHOTO_20050601192747.marc (411,392 bytes) Offset 256 Blocks 1 - 2
  • LEADER 00403nam a2200085uu 4500
  • DIRECTORY
  • 001000900000 035001500009 856008000024 962018500104 991002800289
  • TAGS
  • 1 000 00403nam a2200085uu 4500@
  • 2 001 L0027751@
  • 3 035 a.b12857890@
  • 4 856 4 1 uhttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIDMIROL0027751zView image@
  • 5 962 a000000URLb0000000000000000000tImagev nuhttp://medphoto.wellcome.ac.uk/ixbin/hixclient.exe?MIROPACL0027751ehttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIROL0027751@
  • 6 991 aMEDPHOTO22820050601192747@
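
The leader and directory in this dump can be checked by hand or with a few lines of Python. In the MARC 21 record structure each directory entry is 12 characters (3 for the tag, 4 for the field length, 5 for the starting position), leader positions 0-4 hold the record length and positions 12-16 the base address of data. The helper names below are hypothetical.

```python
LEADER = "00403nam a2200085uu 4500"
DIRECTORY = "001000900000035001500009856008000024962018500104991002800289"

def parse_directory(directory):
    """Split a MARC directory into (tag, field length, start position) entries."""
    entries = []
    for offset in range(0, len(directory), 12):
        entry = directory[offset:offset + 12]
        entries.append((entry[:3], int(entry[3:7]), int(entry[7:12])))
    return entries

def computed_record_length(leader, directory):
    """Base address + end of last field + 1 byte for the record terminator."""
    base_address = int(leader[12:17])
    _tag, length, start = parse_directory(directory)[-1]
    return base_address + start + length + 1

for entry in parse_directory(DIRECTORY):
    print(entry)
print(computed_record_length(LEADER, DIRECTORY), int(LEADER[:5]))
```

The base address (85) equals the 24-byte leader plus the 60-byte directory plus one field terminator, and the computed length matches the 403 stated in the leader, so the dump is internally consistent.
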

40
(No Transcript)
41
Example with t default
42
(No Transcript)
43
Launch pad
  • We saw an opportunity for further integration and used an
    intermediate screen URL delivered by the MedPhoto
    repository and loaded to 962 e
  • User can hotlink from this launch pad into the image system
    to register, use a light box, email, download or order the
    image online from the image system before returning to Web
    OPAC

44
(No Transcript)
45
(No Transcript)
46
What we used
  • XML Harvester product (III)
  • OAI repository software
  • VBScript for file splitting operation
  • Instant Saxon (command line XSLT processor)
  • Microsoft MSXML core services (e.g. ver 5)
  • Media Management for 962 (or load URLs to 856)
  • Three OAI-PMH compliant library systems
  • Shared Record IDs as matchpoints
  • Some experience of working with stylesheets
  • Some experience of load tables and record loading

47
Work in progress
  • Harvesting legacy catalogues/XML for other Asian MSS, e.g.
    Iskander and Jain project (with Oxford University)
  • Complete testing and batch loading of 60,000 thumbnail and
    launchpad URLs to 962s
  • Establish routines to manage updates for new, deleted or
    amended records; utilise OAI-PMH selective harvesting
  • Further automation of routines where practicable

48
Wish List/Enhancements
  • Global edit for 962 tag
  • More documentation for XML Harvester
  • Access to underlying harvester parameters, e.g. for XSLT
    processor and XML parser
  • Automation of selective harvesting for maintenance

49
Useful links
  • XML: http://www.w3.org/XML
  • EAD: http://www.loc.gov/ead/
  • OAI software: http://oai.grainger.uiuc.edu/projectinfo.htm
  • XSLT: http://saxon.sourceforge.net/saxon6.4.3/instant.html
  • http://www.openarchives.org/OAI/openarchivesprotocol.html
  • http://www.openarchives.org/OAI/2.0/guidelines-marcxml.htm
  • OAI tutorial: http://www.oaiforum.org/tutorial
  • OAI repository testing: http://re.cs.uct.ac.za/

50
Some example records
  • http://catalogue.wellcome.ac.uk/record=b1465521
  • http://catalogue.wellcome.ac.uk/record=b1580232
  • http://catalogue.wellcome.ac.uk/record=b1313568
  • http://catalogue.wellcome.ac.uk/record=b1613633
  • http://catalogue.wellcome.ac.uk/search/o?SEARCH=PP%2FCRI%2FA%2F1%2F2%2F8

51
Excellent XML systems interoperability at the
Wellcome Library
  • Thanks for your attention
  • Margaret Savage-Jones
  • Library Systems Administrator
  • m.savage-jones@wellcome.ac.uk