Extracting XML from Unicorn with OAI and SRU - PowerPoint PPT Presentation

1
Extracting XML from Unicorn with OAI and SRU
  • European Unicorn User Group Conference
  • Glasgow Caledonian University
  • September 7th-8th, 2006

Benoit PAUWELS, Université Libre de Bruxelles
(ULB), Brussels
2
Agenda
  • Introduction: Unicorn interfaces
  • Part 1: An OAI frontend for Unicorn
  • Part 2: An SRU frontend for Unicorn
  • Short description of OAI and SRU protocols
  • Overview of technical implementation
  • Use cases and demos

3
Introduction
  • OAI and SRU are open protocols that permit
    exchange of metadata between information systems
  • Well-known Unicorn interfaces:
  • Unicorn API server
  • Unicorn Webcat/iBistro/iLink server
  • Unicorn Z39.50 server
  • All comply with the philosophy of
    request/response sequences

4
Unicorn interfaces: API server
  • Unicorn server: catalogue database (records and
    indexes) behind the API server
  • Client system: SirsiDynix clients (character
    client, C Workflows client, Java Themes client)
  • Request: API request over a TCP/IP socket
  • Response: API response (API datacodes/values)
    over a TCP/IP socket
  • Communication protocol: TCP/IP socket
  • Information exchange protocol: proprietary
    SirsiDynix API requests/responses
  • Returned record structure: proprietary SirsiDynix
    format (data codes and values)
5
Unicorn interfaces: iLink
  • Unicorn server: catalogue database (records and
    indexes) behind the iLink Web server
  • Client system: any Web browser
  • Request: iLink request (URL) over HTTP
  • Response: HTML page over HTTP
  • Communication protocol: HTTP
  • Information exchange protocol: URL requests /
    HTML responses
  • Returned record structure: HTML
6
Unicorn interfaces: Z39.50
  • Unicorn server: catalogue database (records and
    indexes) behind the Z39.50 server
  • Client system: any Z39.50 client
  • Request: Z39.50 request
  • Response: Z39.50 response (MARC21)
  • Communication protocol: Z39.50 specific
  • Information exchange protocol: Z39.50 specific
  • Returned record structure: typically MARC21
7
Unicorn interfaces
  • API: proprietary
  • low interoperability level
  • HTML: record data not well structured
  • low reusability level
  • Z39.50: protocol specific
  • more difficult to implement (steep learning curve)
  • Z39.50 is stateful
  • → Difficult to integrate into today's web services
    environments
  • → communication: use HTTP
  • → information exchange: use open protocols (like
    OAI and SRU)
  • → record data structure: use XML (according to a
    well-defined XML Schema)

8
2 new Unicorn interfaces
  • HTTP / Open / XML
  • OAI-PMH: Open Archives Initiative Protocol for
    Metadata Harvesting
  • SRU: Search and Retrieve via URL

9
OAI-PMH: the protocol
  • Data Provider: document archive behind an OAI
    frontend on a Web server
  • Service Provider: sends OAI requests embedded in
    HTTP to the Data Provider
  • Data Provider: returns OAI responses embedded in
    HTTP
10
OAI-PMH: the protocol
  • Harvester collects metadata from archives
  • Stateless protocol: sequence of OAI
    requests/responses over HTTP
  • Just harvesting -- NOT searching

11
OAI-PMH: the protocol
  • OAI requests
  • HTTP GET/POST requests
  • Syntax
  • BASE URL
  • host, port, path of OAI request handler
  • key=value pairs
  • Examples
  • http://www.cible.ulb.ac.be:80/cgi-bin/OAI20/catalog?verb=Identify
  • http://www.biomedcentral.com/oai/1.1/bmcoai.asp?verb=GetRecord&identifier=oai:bmc:1471-2105-1-1&metadataPrefix=oai_dc
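
The request syntax above (a base URL followed by key=value pairs) can be sketched in Python as follows. This is only an illustration: the helper name `oai_request_url` is hypothetical, and the identifier `oai:ulbcat:245000` assumes the `oai:repository:local-id` form for ULB identifiers.

```python
from urllib.parse import urlencode

# Base URL of the ULB OAI request handler, as quoted on the slides.
BASE_URL = "http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request: base URL + '?' + key=value pairs.

    The verb is always sent first; the remaining pairs keep the order
    in which they were passed. Hypothetical helper, for illustration.
    """
    pairs = [("verb", verb)] + list(arguments.items())
    return f"{base_url}?{urlencode(pairs)}"


if __name__ == "__main__":
    print(oai_request_url(BASE_URL, "Identify"))
    # The identifier form below is an assumption about ULB identifiers.
    print(oai_request_url(BASE_URL, "GetRecord",
                          identifier="oai:ulbcat:245000",
                          metadataPrefix="oai_dc"))
```

Note that `urlencode` percent-encodes the colons of the identifier (`oai%3Aulbcat%3A245000`), which is the safe form for a query string.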

12
OAI-PMH: the protocol
  • OAI responses
  • XML-encoded byte streams, containing the records
  • Record: triplet
  • header (unique OAI identifier)
  • metadata
  • about
  • Metadata schemes
  • XML Schema
  • Minimum: unqualified Dublin Core
  • Community specific
  • Example of a record (catkey 450000 from ULB
    catalogue)
  • oai_dc, marc21, umods
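
To make the header / metadata / about triplet concrete, here is a minimal Python sketch that reads one oai_dc record. The sample record is invented for illustration: the identifier form, the datestamp, and the placeholder title and creator are assumptions, not data from the ULB catalogue. Only the namespace URIs are the standard OAI-PMH 2.0 and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample record (placeholder values, no <about> part).
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:ulbcat:450000</identifier>
    <datestamp>2006-08-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Placeholder title</dc:title>
      <dc:creator>Placeholder, Author</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
"""


def summarize_record(xml_text):
    """Return the header fields and Dublin Core values of one record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "titles": [e.text for e in dc.findall("dc:title", NS)],
        "creators": [e.text for e in dc.findall("dc:creator", NS)],
        # The "about" part of the triplet is optional.
        "has_about": record.find("oai:about", NS) is not None,
    }


if __name__ == "__main__":
    print(summarize_record(SAMPLE_RECORD))
```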

13
OAI-PMH: the protocol
  • Simple: 6 OAI requests/responses
  • Identify
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=Identify
  • ListMetadataFormats [identifier]
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListMetadataFormats
  • ListSets
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListSets
  • GetRecord identifier, metadataPrefix
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=GetRecord&identifier=oai:ulbcat:245000&metadataPrefix=marc21

14
OAI-PMH: the protocol
  • Simple: 6 OAI requests/responses
  • ListRecords metadataPrefix [, from, until, set]
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=oai_dc
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=mhld21&set=elper
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=marc21&from=2006-08-01
  • ListIdentifiers metadataPrefix [, from, until, set]
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListIdentifiers&metadataPrefix=oai_dc
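
The six verbs and their arguments can be captured in a small lookup table. The Python sketch below checks a request against it and returns the standard OAI-PMH error codes `badVerb` and `badArgument`. It is a simplified illustration (the `resumptionToken` argument is left out) and says nothing about how the ULB frontend itself validates requests; the function name `check_request` is hypothetical.

```python
# verb -> (required arguments, optional arguments)
VERB_ARGUMENTS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
}


def check_request(params):
    """Return an OAI-PMH error code, or None if the request looks valid.

    params is the dict of key=value pairs of one request.
    """
    verb = params.get("verb")
    if verb not in VERB_ARGUMENTS:
        return "badVerb"
    required, optional = VERB_ARGUMENTS[verb]
    given = set(params) - {"verb"}
    missing = required - given
    unknown = given - required - optional
    if missing or unknown:
        return "badArgument"
    return None


if __name__ == "__main__":
    print(check_request({"verb": "ListRecords",
                         "metadataPrefix": "marc21",
                         "from": "2006-08-01"}))
    print(check_request({"verb": "GetRecord",
                         "identifier": "oai:ulbcat:245000"}))
```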

15
OAI frontend for Unicorn
  • Implementation of the data provider functionality
    (2001)
  • http://www.openarchives.org/tools/tools.html:
    pick a template and interface with Unicorn
    through Unicorn database tools
  • Our choice: Object Oriented Perl frontend (H.
    Suleman, Virginia Tech)

16
OAI frontend for Unicorn
17
OAI frontend for Unicorn
  • Example: implementation of the GetRecord request
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=GetRecord&identifier=oai:ulbcat:245000&metadataPrefix=oai_dc
  • 1. Get metadata from Unicorn for catkey 245000
  • $record = `echo $catkey | catalogdump -of filtermarc -iALL -od -Ds`
  • @dates = split(/\|/, `echo $catkey | selcatalog -iK -opr`)
  • 2. Convert ANSEL character set into ISO-LATIN-1
  • 3. Map from MARC to oai_dc
  • 4. Format into XML
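
Steps 3 and 4 above can be sketched in Python as follows. This is not the ULB Perl code: the input is assumed to be a list of already decoded (tag, subfield, value) triples, the function name is hypothetical, and the mapping table is a small illustrative subset of a MARC to Dublin Core crosswalk. Step 2 (ANSEL conversion) is left out.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative subset only: (MARC tag, subfield) -> Dublin Core element.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}


def marc_to_oai_dc(fields):
    """Map MARC subfields to oai_dc and serialize the result as XML.

    fields: iterable of (tag, subfield code, value) triples.
    Subfields without an entry in MARC_TO_DC are skipped.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    # ElementTree escapes characters such as & and < in the values.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder field values, not a real catalogue record.
    print(marc_to_oai_dc([
        ("100", "a", "Placeholder, Author"),
        ("245", "a", "Placeholder title"),
        ("650", "a", "Unmapped subject"),
    ]))
```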

18
OAI frontend for Unicorn
  • Example: implementation of the set parameter of
    the ListRecords request
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=oai_dc&set=elper
  • Precompile set as a file of catkeys
  • name of file = name of set + "_catkeys"
  • einstein_albert_catkeys
  • elper_catkeys
  • sd_catkeys
  • all_catkeys
  • through periodic execution of the "mkoaisets"
    custom report
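
A minimal Python sketch of the lookup this naming convention allows. The directory argument, the function names, and the one-catkey-per-line file layout are assumptions made for illustration.

```python
from pathlib import Path


def set_file(sets_dir, set_name):
    """Path of the precompiled file for a set: <set name>_catkeys."""
    return Path(sets_dir) / f"{set_name}_catkeys"


def parse_catkeys(text):
    """Split file content into catkeys, assuming one catkey per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def catkeys_for_set(sets_dir, set_name):
    """Return the catkeys of a set, or None if no such set file exists."""
    path = set_file(sets_dir, set_name)
    if not path.is_file():
        return None
    return parse_catkeys(path.read_text())
```

With this layout, answering `set=elper` only requires reading `elper_catkeys`; no catalogue search is needed at request time.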

19
OAI frontend for Unicorn
  • Example: implementation of the from/until
    parameters of the ListRecords request
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=oai_dc&from=2006-08-01&until=2006-08-31
  • BRS index on creation/modification date?
  • Every Unicorn record that gets created or
    modified is touched in the textedit and
    browsedit directories
  • Custom report "cadutext"
  • saves catkeys to <ud>/Savedkeys/adutext/rptid
  • adds line rptid|date|status to
    <ud>/Lastruns/cadutext
  • Example: from=2006-08-01&until=2006-08-31
  • obtain report ids for all runs of cadutext after
    2006-08-01 and before 2006-08-31 from the file
    <ud>/Lastruns/cadutext
  • for each of these report ids obtain catkeys from
    <ud>/Savedkeys/adutext/rptid and save them to
    randomnumber_catkeys file
  • sort and uniq the randomnumber_catkeys file
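
The date selection described above can be sketched in Python. Several things here are assumptions: the Lastruns lines are taken to be pipe-delimited with an ISO date in the second field, the saved catkeys are passed in as a mapping from report id to catkey list (standing in for the Savedkeys files), and the date bounds are treated as inclusive. A set plus one sort replaces the temporary file and the sort/uniq step.

```python
from datetime import date


def catkeys_between(lastruns_lines, saved_keys, from_date, until_date):
    """Collect the catkeys touched by report runs within a date range.

    lastruns_lines: lines assumed to look like 'rptid|YYYY-MM-DD|status'.
    saved_keys: mapping of report id -> list of catkeys (stand-in for
                the per-report Savedkeys files).
    Returns the distinct catkeys in numeric order.
    """
    selected = set()
    for line in lastruns_lines:
        line = line.strip()
        if not line:
            continue
        rptid, run_date = line.split("|")[:2]
        if from_date <= date.fromisoformat(run_date) <= until_date:
            selected.update(saved_keys.get(rptid, []))
    return sorted(selected, key=int)


if __name__ == "__main__":
    # Invented report ids and run dates, for illustration only.
    lines = ["r1|2006-07-30|OK", "r2|2006-08-02|OK", "r3|2006-08-20|OK"]
    keys = {"r1": ["1"], "r2": ["245000", "617924"], "r3": ["617924"]}
    print(catkeys_between(lines, keys, date(2006, 8, 1), date(2006, 8, 31)))
```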

20
OAI frontend for Unicorn
  • Limitations of implementation
  • ListRecords/ListIdentifiers
  • The from and until parameters are not permitted
    if the set parameter is given on the request
  • The from and until parameters are permitted if
    the set parameter is not given on the request,
    but their values should fall within a certain
    date range (at this moment arbitrarily set to
    today - 2 months and today)
  • Deleted records
  • Complete source code and documentation available
    on the API Repository (http://sirsiapi.org)
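
The two from/until restrictions listed above can be expressed as a small check. This Python sketch is an illustration under assumptions: "today - 2 months" is computed by calendar month with the day clamped to the month length, both bounds are inclusive, and the function returns a plain message because the slides do not say which OAI error the frontend reports.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped, e.g. 30 April minus 2 months gives
    the last day of February.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def check_selective_harvest(params, today):
    """Return a refusal message, or None if the request is acceptable."""
    if "from" not in params and "until" not in params:
        return None
    if "set" in params:
        return "from/until are not permitted together with set"
    earliest = months_before(today, 2)
    for name in ("from", "until"):
        if name in params:
            value = date.fromisoformat(params[name])
            if not earliest <= value <= today:
                return f"{name} must fall between {earliest} and {today}"
    return None


if __name__ == "__main__":
    print(check_selective_harvest(
        {"from": "2006-08-01", "until": "2006-08-31"}, date(2006, 9, 7)))
```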

21
OAI frontend - use cases @ ULB
Use case 1: Vlink - OpenURL resolver
system, joint project with Vrije Universiteit
Brussel (VUB)
22
(No Transcript)
23
OAI frontend - use cases @ ULB
  • Use case 1: Vlink - OpenURL resolver system
  • OpenURL sent from iLink
  • http://bibdev.vub.ac.be/cgi-bin/openurlulb?sid=ULB:Webcat&id=oai:ulbcat:617924
  • This OpenURL does not contain enough metadata for
    the specific item, so Vlink does a fetch back to
    Unicorn through an OAI GetRecord request to
    obtain a full MARC21 bibliographic description
  • http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=GetRecord&identifier=oai:ulbcat:617924&metadataPrefix=marc21
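
The fetch-back step can be sketched in Python: take the `id` key of the incoming OpenURL and turn it into an OAI GetRecord request. The function name is hypothetical, and the `sid=...&id=...` layout of the OpenURL is an assumption about its form. This is not Vlink's code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

OAI_BASE_URL = "http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog"


def getrecord_url_for_openurl(openurl, metadata_prefix="marc21"):
    """Build the GetRecord request for the item named in an OpenURL.

    Assumes the OpenURL carries the OAI identifier in its 'id' key.
    Raises KeyError if that key is absent.
    """
    query = parse_qs(urlsplit(openurl).query)
    identifier = query["id"][0]
    return OAI_BASE_URL + "?" + urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })


if __name__ == "__main__":
    print(getrecord_url_for_openurl(
        "http://bibdev.vub.ac.be/cgi-bin/openurlulb"
        "?sid=ULB:Webcat&id=oai:ulbcat:617924"))
```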

24
OAI frontend - use cases @ ULB
  • Use case 1: Vlink - OpenURL resolver system
  • Feed Vlink Knowledge Base through OAI harvesting

25
OAI frontend - use cases @ ULB
  • Use case 2: Unicat - Virtual Union Catalog of
    Belgium

26
SRU: the protocol
  • Unicorn server: catalogue database (records and
    indexes) behind an SRU frontend on a Web server
  • Client system: sends SRU requests over HTTP
  • Unicorn server: returns SRU responses (XML) over
    HTTP
  • Communication protocol: HTTP
  • Information exchange protocol: SRU
  • Returned record structure: XML
27
SRU: the protocol
  • Client searches and retrieves metadata records
    from an archive
  • Stateless protocol: sequence of SRU
    requests/responses over HTTP
  • Search and Retrieve (<-> OAI: harvesting)

28
SRU: the protocol
  • SRU requests
  • HTTP GET requests
  • Syntax
  • BASE URL
  • host, port, path of SRU request handler
  • key=value pairs
  • 3 possible requests (operations)
  • explain
  • serves to record the facilities available at an
    SRU server
  • used by clients to self-configure
  • returned explain record is in XML and follows the
    ZeeRex Schema
  • Example: http://z3950.loc.gov:7090/voyager?version=1.1&operation=explain
  • scan
  • allows the client to request a range of the
    available terms at a given point within a list of
    indexed terms
  • enables clients to present an ordered list of
    values and, if supported, how many hits there
    would be for a search on that term
  • searchRetrieve

29
SRU: the protocol
  • searchRetrieve operation
  • searchRetrieve (principal) parameters
  • version (of the request): current protocol
    version 1.1
  • query: query expressed in CQL
  • startRecord: position within the sequence of
    matched records of the first record to be
    returned
  • maximumRecords: number of records requested to be
    returned
  • recordSchema: schema requested for the records to
    be returned
  • stylesheet: URL for an XML stylesheet. The client
    requests that the server simply return this URL
    in the response.
  • CQL
  • "Traditionally, query languages have fallen
    into two camps: powerful, expressive languages,
    not easily readable nor writable by non-experts
    (e.g. SQL, PQF, and XQuery); or simple and
    intuitive languages not powerful enough to
    express complex concepts (e.g. CCL and google).
    CQL tries to combine simplicity and intuitiveness
    of expression for simple, every day queries, with
    the richness of more expressive languages to
    accommodate complex concepts when necessary."
    (http://www.loc.gov/standards/sru/cql)

30
SRU: the protocol
  • searchRetrieve operation
  • Examples of CQL queries
  • dinosaur
  • title = "complete dinosaur"
  • title exact "the complete dinosaur"
  • dinosaur not reptile
  • dinosaur and bird or dinobird
  • publicationYear < 1980
  • title all "complete dinosaur"
  • title contains all of the words: complete, and
    dinosaur
  • title any "dinosaur bird reptile"
  • title contains any of the words: dinosaur,
    bird, or reptile
  • ribs prox/distance<=5 chevrons
  • a more specific proximity query: ribs within 5
    words of chevrons
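
Queries of the `index relation term` shape shown above are easy to assemble programmatically. The Python helpers below are hypothetical names for illustration; they quote a term whenever it contains whitespace or characters that are special in CQL, and they cover only simple clauses and boolean combinations, not the full CQL grammar.

```python
def cql_term(term):
    """Quote a term if it is empty or contains whitespace or special chars."""
    if term == "" or any(ch in term for ch in ' \t"()=<>/'):
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return term


def cql_clause(index, relation, term):
    """Build one search clause, e.g. title all "complete dinosaur"."""
    return f"{index} {relation} {cql_term(term)}"


def cql_bool(left, operator, right):
    """Combine two clauses with and / or / not."""
    return f"{left} {operator} {right}"


if __name__ == "__main__":
    print(cql_clause("title", "all", "complete dinosaur"))
    print(cql_bool(cql_term("dinosaur"), "not", cql_term("reptile")))
    print(cql_clause("publicationYear", "<", "1980"))
```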

31
SRU: the protocol
  • searchRetrieve operation -- examples
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&query=author=einstein
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=author=einstein
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=author=einstein&recordSchema=dc
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=author all "einstein albert"
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleCanevas.xsl
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl
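
Requests like the examples above can be assembled from the searchRetrieve parameters. The Python sketch below uses a hypothetical helper name; it percent-encodes the values, so the spaces and quotes that appear literally on the slide come out as `%20` and `%22`.

```python
from urllib.parse import quote, urlencode


def sru_search_url(base_url, query, start_record=None, maximum_records=None,
                   record_schema=None, stylesheet=None, version="1.1"):
    """Build an SRU searchRetrieve GET request.

    Only version, operation and query are always sent; the other
    parameters are added when a value is given.
    """
    params = {"version": version, "operation": "searchRetrieve",
              "query": query}
    optional = {
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
        "stylesheet": stylesheet,
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base_url}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(sru_search_url("http://bib49.ulb.ac.be:9000/Cible",
                         'title all "einstein albert"',
                         start_record=1, maximum_records=10))
```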

32
SRU frontend for Unicorn
  • Unicorn server: catalogue database (records and
    indexes) behind an SRU frontend on a Web server
  • Client system: sends SRU requests over HTTP
  • Unicorn server: returns SRU responses (XML) over
    HTTP
33
SRU frontend for Unicorn
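  • Client system: sends SRU requests over HTTP to
    the SRU/Z39.50 gateway (Web server)
  • SRU/Z39.50 gateway: forwards each request as a
    Z39.50 request to the Z39.50 frontend of the
    Unicorn server (catalogue database: records and
    indexes)
  • Unicorn server: returns a Z39.50 response to the
    gateway
  • SRU/Z39.50 gateway: returns an SRU response (XML)
    over HTTP to the client system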
34
SRU frontend for Unicorn
  • SRU/Z39.50 Gateway: YAZ Proxy (Index Data)
  • Implemented at ULB 7/2006 (2 days)
  • config.xml
  • <target name="cible" default="1">
  • <url>bib7.ulb.ac.be:2200</url>
  • <xi:include href="explain.xml"/>
  • <cql2rpn>pqf.properties</cql2rpn>
  • </target>
  • <target name="slavko" default="1">
  • <url>velma.library.mun.ca:2200</url>
  • <xi:include href="explain.slavko.xml"/>
  • <cql2rpn>pqf.slavko.properties</cql2rpn>
  • </target>
  • explain.xml
  • ZeeRex XML record as response to explain
    operation
  • pqf.properties
  • specifies the mapping of various CQL indexes,
    relations, etc. into Type-1 query attributes
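
To illustrate the kind of mapping pqf.properties describes, the Python sketch below translates one simple CQL clause into a PQF string with Bib-1 use and relation attributes. It is not the YAZ Proxy implementation and does not reproduce the pqf.properties file syntax; the tables are a small subset, the index names are illustrative, and the function names are hypothetical.

```python
# Bib-1 use attributes (type 1) for a few illustrative index names.
USE_ATTRIBUTES = {"title": 4, "author": 1003, "any": 1016}

# Bib-1 relation attributes (type 2).
RELATION_ATTRIBUTES = {"<": 1, "<=": 2, "=": 3, ">=": 4, ">": 5}


def cql_clause_to_pqf(index, relation, term):
    """Translate one 'index relation term' clause into PQF.

    Raises KeyError for an index or relation missing from the tables.
    Terms containing double quotes are not handled in this sketch.
    """
    use = USE_ATTRIBUTES[index]
    rel = RELATION_ATTRIBUTES[relation]
    return f'@attr 1={use} @attr 2={rel} "{term}"'


def pqf_and(left, right):
    """PQF is a prefix notation: the operator precedes both operands."""
    return f"@and {left} {right}"


if __name__ == "__main__":
    author = cql_clause_to_pqf("author", "=", "einstein")
    title = cql_clause_to_pqf("title", "=", "relativity")
    print(pqf_and(author, title))
```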

35
SRU frontend for Unicorn
  • YAZ Proxy
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl
  • http://bib49.ulb.ac.be:9000/Slavko?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl

36
SRU frontend use case @ ULB
  • Seamless integration of catalog searches in CMS
  • Typo3
  • Example
  • HTML page containing biography of famous Belgian
    historian Henri Pirenne
  • frame pointing to the following URL
  • http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=pirenne%20and%20epub-dnu-&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl
  • Project
  • Unicorn contains descriptions of databases,
    websites, etc. with local thematic classification
    codes in field 653
  • create thematic websites within our CMS,
    containing frames that list available databases
    per theme