Qualified Dublin Core Using RDF for SciTech Journal Articles DC2001 International Conference on Dubl - PowerPoint PPT Presentation

1
Qualified Dublin Core Using RDF for Sci-Tech Journal Articles
DC-2001 International Conference on Dublin Core and Metadata
Applications, October 22-26, 2001
National Institute of Informatics, Tokyo, Japan
http://dli.grainger.uiuc.edu/Publications/DC2001/
  • Thomas G. Habing (thabing@uiuc.edu)
  • Timothy W. Cole (t-cole3@uiuc.edu)
  • William H. Mischo (w-mischo@uiuc.edu)
  • University of Illinois at Urbana-Champaign

2
History and Objectives of the Testbed
  • Funded 1994-98 under DLI-I (NSF/NASA/DARPA).
  • Continued 1998-2001 under CNRI's D-Lib Test
    Suite.
  • Construct large-scale, multi-publisher,
    markup-based full-text journal testbed.
  • Investigate processing, indexing, normalization,
    retrieval, rendering and linking.
  • Study end-user searching behavior and needs.

3
Description of Testbed
  • Testbed contains 65,000 articles from 50
    journals.
  • Received from publishers as SGML (various DTDs).
  • Converted to well-formed XML.
  • Content support from AIP, APS, ASCE, IEE, ASM,
    ACM, Elsevier.
  • Additional support from IEEE, NRL, NTT Learning
    Systems.

4
Usage of Metadata in Illinois Testbed
  • Facilitate resource discovery across
    heterogeneous sources through normalization.
  • Common, easily displayable search results.
  • Add value to the original object: reference
    linking, links to alternate formats and A&I
    services.
  • Data exchange, as with the Open Archives Initiative
    Protocol for Metadata Harvesting (OAI-PMH).

7
Metadata Extraction Process
  • Metadata is derived from full-text using XSLT.
  • One-to-one mappings:
      • select="//titlegrp/title" maps to <dc:title>.
  • Complex mappings:
      • Tables of Contents, Literal Markup such as
        MathML.
  • Advanced XSLT techniques:
      • JavaScript functions are used for some
        formatting.
      • The document(url) function is used to merge XML
        from other sources, such as CrossRef, into the
        metadata.
  • See paper for sample XSLT code.
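
The one-to-one mapping on this slide can be illustrated outside XSLT. The following is a minimal Python sketch, not the authors' stylesheet: it assumes a simplified article structure containing titlegrp/title, and the function name extract_title and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_title(article_xml: str) -> ET.Element:
    """Map the first titlegrp/title found in the article to a dc:title element."""
    root = ET.fromstring(article_xml)
    source = root.find(".//titlegrp/title")
    dc_title = ET.Element(f"{{{DC_NS}}}title")
    if source is not None:
        # itertext() flattens inline markup (italics, etc.) inside the title.
        dc_title.text = "".join(source.itertext()).strip()
    return dc_title


# Hypothetical, heavily simplified publisher markup.
sample = (
    "<article><front><titlegrp>"
    "<title>A <i>Sample</i> Title</title>"
    "</titlegrp></front></article>"
)
title = extract_title(sample)
```

Flattening inline markup is the simplest choice here; the slide notes that literal markup such as MathML needs a more complex mapping.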

8
Other uses of XSLT
  • Dumb-down to unqualified DC.
  • Transform metadata to HTML for display.
  • Generate RDF triples for use in an RDBMS.
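
The slides do not say how the triples were stored. As a rough illustration only, the sketch below assumes a single three-column table and uses SQLite in memory; the table name, the sample triples, and the helper objects_of are all hypothetical.

```python
import sqlite3

# Hypothetical triples, as a stylesheet might emit them from one record.
triples = [
    ("#article-1", "dc:title", "A Title"),
    ("#article-1", "dcq:isPartOf", "#JournalIssue"),
    ("#JournalIssue", "dc:title", "Some Journal"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)


def objects_of(subject: str, predicate: str) -> list:
    """Return every object stored for the given subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM triple WHERE subject = ? AND predicate = ? "
        "ORDER BY rowid",
        (subject, predicate),
    )
    return [row[0] for row in rows]


# Follow dcq:isPartOf from the article to the issue, then read its title.
issue = objects_of("#article-1", "dcq:isPartOf")[0]
journal_titles = objects_of(issue, "dc:title")
```

A flat triple table keeps the schema independent of any particular metadata vocabulary, at the cost of one lookup per hop when following links such as isPartOf.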

9
Dumb-down XSLT
  • <xsl:variable name="DCQ" select="document('dcq.rdfs')"/>
  • <xsl:template name="dumb_down_dcq">
      <xsl:variable name="SubPropertyOf"
        select="$DCQ//rdf:Property[@rdf:ID=local-name(current())]/rdfs:subPropertyOf/@rdf:resource"/>
      <xsl:variable name="DCTag"
        select="substring-after($SubPropertyOf,'&xmlns_dc;')"/>
      <xsl:if test="$DCTag">
        <xsl:call-template name="dumb_down">
          <xsl:with-param name="Tag" select="$DCTag"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:template>
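
The dumb-down step on this slide resolves each qualified element to the Dublin Core element it refines by reading rdfs:subPropertyOf from the schema. The Python sketch below shows the same lookup under stated assumptions: SCHEMA is a hypothetical two-property stand-in for dcq.rdfs, and the function names are illustrative, not taken from the authors' code.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical stand-in for dcq.rdfs, reduced to two properties.
SCHEMA = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdf:Property rdf:ID="isPartOf">
    <rdfs:subPropertyOf rdf:resource="{DC}relation"/>
  </rdf:Property>
  <rdf:Property rdf:ID="abstract">
    <rdfs:subPropertyOf rdf:resource="{DC}description"/>
  </rdf:Property>
</rdf:RDF>
""".strip()


def build_parent_map(schema_xml: str) -> dict:
    """Map each qualified property's local name to the DC element it refines."""
    parents = {}
    for prop in ET.fromstring(schema_xml).iter(f"{{{RDF}}}Property"):
        sub = prop.find(f"{{{RDFS}}}subPropertyOf")
        if sub is None:
            continue
        resource = sub.get(f"{{{RDF}}}resource", "")
        if resource.startswith(DC):
            # Same idea as substring-after(): keep what follows the DC namespace.
            parents[prop.get(f"{{{RDF}}}ID")] = resource[len(DC):]
    return parents


def dumb_down(local_name: str, parents: dict):
    """Return the unqualified DC tag for a qualified element, or None if unknown."""
    return parents.get(local_name)


parents = build_parent_map(SCHEMA)
```

With this map, dumb_down("isPartOf", parents) gives "relation", while a local extension that the schema does not describe returns None, which is why the schema has to be available at dumb-down time.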

10
Local Extensions to DCQ
  • Qualified DC was not adequate for our needs.
  • Various DC working groups provided some guidance.
  • We extended DCQ in three areas:
      • Citation-related extensions.
      • Agent-related (creator) extensions.
      • Type and encoding scheme extensions.

11
Citation-related Extensions
  • <uiLib:citation>A. Author. "A Title" Some Jrnl.
  • <dc:identifier>
      <uiLib:OpenURL-OBJECT-METADATA-ZONE>
        <rdf:value>genre=article&amp;aulast=Author
  • <dcq:isPartOf>
      <rdf:Description rdf:ID="JournalIssue">
        <dc:identifier><uiLib:ISSN>
          <rdf:value>1234-5678</rdf:value>
        </uiLib:ISSN></dc:identifier>
        <dc:title>Some Journal</dc:title>
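
The OpenURL metadata value on this slide is a query string embedded in XML, so the ampersand between fields has to be escaped. A small Python sketch of that encoding follows; the two fields come from the slide, while the variable names are illustrative.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Citation fields from the slide's example.
fields = {"genre": "article", "aulast": "Author"}

# Build the query string, then escape it for use as XML character data.
query = urlencode(fields)
xml_value = escape(query)
```

Here query is "genre=article&aulast=Author" and xml_value is "genre=article&amp;aulast=Author", the form that can sit inside an rdf:value element.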

12
Agent-related Extensions
  • Based on DC Agent Qualifiers, Working Draft,
    1999.
  • <dc:creator><rdf:Seq><rdf:li>
      <dca:Person rdf:ID="AUTHOR-1">
        <dca:agentname><dca:FNF>
          <rdf:value>Author, A. N.</rdf:value>
        </dca:FNF></dca:agentname>
        <dca:agentaffiliation>Big
          University</dca:agentaffiliation>
        <dca:agentidentifier
          rdf:resource="mailto:ana@big.edu"/>

13
Type and Encoding Extensions
  • Extensions to DCMI Type Vocabulary:
      • <dc:type rdf:resource="http://dli.grainger.uiuc.edu/uiLib#book"/>
      • <dc:type rdf:resource="http://dli.grainger.uiuc.edu/uiLib#injrnl"/>
  • Additional Encoding Schemes:
      • PACS, ACMCCS, ISSN, CODEN, ACM_JRNL_CODE.

14
Conclusions
  • Using DCQ/RDF for sci-tech journal articles is
    viable
  • Steep learning curve for RDF
  • Dumbing-down DCQ/RDF is complex
  • Cannot ignore non-DC tags; RDF Schema is required
  • DCQ is missing many properties and types required
    for complete serials descriptions
  • Utility of RDF remains uncertain