XML Technologies and Scholarly Communication

1
XML Technologies and Scholarly Communication
  • William H. Mischo
  • w-mischo_at_uiuc.edu
  • Grainger Engineering Library Information Center
  • University of Illinois at Urbana-Champaign
  • The XML Workshop for Electronic Journals
  • March 9, 2001

2
Outline
  • The Digital Library.
  • Scholarly Information Environment.
  • Distributed Information Environment.
  • Illinois Collaborative projects.
  • XML Technologies.
  • Metadata and Linking Technologies.
  • Publishing Trends.
  • Future Issues.

3
Topics
  • Enabling technologies for scholarly
    communication.
  • Vision(s) of scholarly communication (roles of
    intermediaries).
  • Relationships between scholarly communication
    trends and XML technologies.
  • Middleware tools: DOI, OAI, XML.
  • Portal and gateway tools, innovative search and
    discovery.

4
Overview
  • We now have the tools to pursue the grand
    challenges of Information Retrieval:
  • Standard retrieval environment (Web) and
    interface/client (Browser).
  • Ability to perform standardized searching (HTTP
    Post/Get, SQL, Z39.50).
  • Standard language for describing and transforming
    content and metadata (XML, XSLT, DC, RDF,
    Schemas).
  • Standard transport mechanisms to connect
    heterogeneous content (HTTP, B2B, SOAP).

5
XML and Publishers
  • InfoWorld article on the Seybold Seminars
    Publishing 2000, Boston, February 2, 2000.
  • Tim Gill of Quark: the use of XML could lead to
    a drop in the cost of Web publishing by 30 to
    50% and a significant reduction in the time it
    takes to produce sites.
  • Gill: "I don't believe that there is any
    innovation in print that is going to save us even
    10% in costs."
  • Issues and Challenges remain.

6
The Digital Library
  • Digital, Virtual, Electronic Library as
    network-based library without regard to place and
    time.
  • Tendency to apply term to collections and
    resources.
  • Digital Collections vs. Digital Library.
  • Emphasis on the integration of collections and
    services.
  • Application of standards and protocols is
    important.

7
Scholarly Communication Overview
  • Web-based and publisher-centric.
  • Heterogeneous Distributed Repositories.
  • Value-added services and branding of journals.
  • Reciprocal relationships between publishers.
  • Cooperation on linking standards (DOI, CrossRef).
  • Alternative publishing models - Academia,
    Preprint Servers, disintermediation.

8
Full-Text Technologies
  • Continuum of Web-Enabled technologies -- all
    presently being utilized.
  • Evolving technologies and standards.
  • Role and history of markup.
  • XML: its role and importance.
  • The Smart Document.

9
(No Transcript)
10
Distributed Information Model
  • Diverse information environment in which we
    operate.
  • Multiple elements, relationships and nodes.
  • Need for gateway, interface, and navigation
    tools.
  • Need for document representation, transmission,
    linking, and retrieval middleware tools and
    standards.
  • Role of A&I Services.

11
(No Transcript)
12
Distributed Repository Issues
  • Integration of discrete publisher repositories,
    locally loaded full-text, local and remote A&I
    services, OPAC, Web resources, and local data.
  • Issues for user access:
  • Users need to identify the appropriate publisher
    repository, but at present the interfaces differ
    and full-text and controlled-vocabulary
    searching are often not offered.
  • A&I services are not full-text but offer
    controlled vocabulary; no links to full-text
    repositories.

13
Distributed Repository - Needs
  • Integration of discrete publisher repositories,
    locally loaded full-text, local and remote A&I
    services, OPAC, Web resources, and local data.
  • Support simultaneous searching of A&I Services,
    Distributed Repositories, OPACs, Web search
    engines, local files. Integrate TOC, full-text.
  • Remote Reference 24 X 7.
  • Metadata harvesting, archiving.
  • Local Resolver services for locally loaded or
    Aggregator Resources.

14
Illinois Testbed Project
  • Funded under DLI-I by NSF, DARPA, and NASA,
    1994--1998. Awards made to 6 universities.
  • Large-scale Testbed, Distributed Repository
    models, evaluation, Web software.
  • Funded under CNRI D-Lib Test Suite Program,
    1998--2001.
  • Collaborating Partners Program: AIP, APS, ASCE,
    IEE, NRL, ASM, ACM, NTT Learning Systems,
    Elsevier.

15
Illinois Testbed
  • American Institute of Physics--APL, JAP, RSI
  • 16,000 articles, 1995--.
  • American Physical Society--PRL
  • 10,000 articles, 1995--, weekly updates.
  • ASCE Journals (25 titles)
  • 9,000 articles, 1995--.
  • IEE Proceedings and Electronics Letters
  • 8,500 articles, 1993--.
  • ASM International (formerly the American Society
    for Metals) Handbook.
  • ACM (Association for Computing Machinery)
    Transactions.
  • Elsevier Science.

16
Project Issues
  • Evolution of the Document.
  • Distributed information environment.
  • Use of Metalanguages and Transformations (SGML,
    XML).
  • Searching over full-text of journals vs. document
    surrogates in A&I format.
  • Rendering and styling (SGML, XML, MathML).
  • Dynamic metadata for normalization, linking.
  • Breadth and depth of collections.
  • User needs.

17
Accomplishments
  • Process and retrieve from multiple publishers'
    heterogeneous DTDs.
  • Cross-repository searching (Testbed and D-Lib
    Test Suite).
  • SGML to XML Conversion.
  • Use of XML, XSLT, CSS, Metadata Schemas.
  • Transformation and rendering, including
    Mathematics.
  • Dynamic Linking: Forward/Backward, from/to A&I
    Services. Local Resolver.

18
Ongoing Investigations
  • Support simultaneous searching of A&I Services,
    Distributed Repositories, enhanced navigation,
    expanded gateway functions.
  • Metadata harvesting: replicative or distributed
    approaches.
  • Z39.50 protocols, HTTP harvesting, Spider
    technology.
  • Archiving.
  • Local Resolution of resources.

19
XML (eXtensible Markup Language)
  • Like SGML, a Data Description Language
    (Metalanguage).
  • Subset/version of SGML.
  • Allows fine-granularity markup of content and
    structure. Authors can create their own elements
    (extensible).
  • Tags define the structure of the document, not
    its presentation format.
  • All XML must be well-formed; it may additionally
    be validated against a DTD or Schema.
  • Compatible with relational DBs.
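
The distinction between well-formed and valid XML can be illustrated with a minimal Python sketch. It checks well-formedness only, using the standard library parser; validation against a DTD or Schema would need an additional tool. The sample documents and the helper name `is_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
good = "<article><title>Sample title</title></article>"
bad = "<article><title>Sample title</article>"  # <title> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A parser rejects the second document outright, whereas a document that is well-formed but breaks the rules of its DTD is still parseable.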

20
XML Features
  • The milestones in document description and
    transmission: ASCII, TCP/IP, HTTP and HTML, XML.
    Web Programmability.
  • DTD not required with XML. Needed if internal
    entities are used.
  • Use of Document Object Model (DOM).
  • Technology approach from a Web developer's
    standpoint: XML data, CSS presentation layer,
    XSLT to transform the structure (view) of the
    data/document.

21
Role of XML
  • "If you ask 20 people in the industry, 'What is
    XML?' you'll get 20 different answers." -- Dale
    Fuller, CEO, Inprise Corporation.
  • Vendor-Neutral, platform-independent structured
    information standard.
  • Document representation and interchange Standard.
  • Applications can externalize their data as XML.

22
XML Parser APIs: Tree-Based and Event-Based
  • DOM (Document Object Model).
  • DOM Level 1 and Level 2 W3C recommendation.
    Widely implemented, Tree-Based. Hierarchy of
    nodes. Loads entire document into memory. Level 2
    adds namespace support, traversal, stylesheets,
    events, triggers. Level 3 working draft. DOM HTML
    candidate. Parsers allow developers to iterate
    through documents, change document content.
  • SAX (Simple API for XML).
  • Open-source, XML-DEV, not W3C. Event-based, fires
    events as it reads document, need not load entire
    document into memory. Good for single-pass
    processing. Xerces, XML4C, Sun Project X
    (Crimson).
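
The difference between the two API styles can be sketched with Python's standard library, which ships both a DOM implementation (`xml.dom.minidom`) and a SAX driver (`xml.sax`). This is only an illustration, not one of the parsers named on the slide; the sample document and the `ArticleCounter` class are hypothetical.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document.
DOC = "<issue><article>A</article><article>B</article><article>C</article></issue>"

# Tree-based (DOM): the whole document is loaded into memory as a
# hierarchy of nodes that can then be queried or modified.
dom = xml.dom.minidom.parseString(DOC)
dom_count = len(dom.getElementsByTagName("article"))


# Event-based (SAX): the parser fires callbacks as it reads the
# document; nothing is retained unless the handler keeps it.
class ArticleCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "article":
            self.count += 1


counter = ArticleCounter()
xml.sax.parseString(DOC.encode("utf-8"), counter)
sax_count = counter.count

print(dom_count, sax_count)
```

Both approaches count the same elements; the SAX version never holds more than the current event, which is why it suits single-pass processing of large files.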

23
XML Linking
  • XML Base: http://www.w3.org/TR/xmlbase
  • Permits use of relative URI path prefixes. Can
    then shorten references.
  • XLink: http://www.w3.org/TR/xlink/
  • Method for specifying navigational links. Allows
    enforcement of specific path order through links.
    xlink:type="simple" corresponds to HTML <a> or
    <img> tags.
  • XInclude: http://www.w3.org/TR/xinclude
  • Copies entire XML documents or selected portions
    into current document. Candidate recommendation.
    Uses XPath and XPointer to specify document
    elements to include.
  • XPointer: http://www.w3.org/TR/xptr
  • Uses XPath to identify portion of a document.
    Permits string searches and range specifiers.

24
XML Schema and Structure
  • DTD
  • Original schema representation, defines
    structural rules for a class of XML documents.
  • XML Schema: http://www.w3.org/XML/Schema
  • Also sets out standardized structure for a class
    of XML documents. Is coded in XML, can be parsed
    and edited with standard software. Two separate
    parts: structures and datatypes.
  • Namespaces: http://www.w3.org/TR/REC-xml-names/
  • Allows developers to qualify element and
    attribute names with unique URIs, avoids
    recognition errors.
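
How namespaces avoid recognition errors can be shown with a small Python sketch: two elements share the local name `title` but are told apart by their namespace URIs. The `pub` namespace URI and the sample record are hypothetical; the Dublin Core URI shown is assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed / hypothetical namespace URIs for illustration.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "pub": "http://example.org/publisher",
}

DOC = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:pub="http://example.org/publisher">'
    "<dc:title>Article title</dc:title>"
    "<pub:title>Journal title</pub:title>"
    "</record>"
)

root = ET.fromstring(DOC)

# The prefix is only shorthand; the parser stores the full URI with
# the element name, so the two "title" elements never collide.
dc_title = root.find("dc:title", NS).text
pub_title = root.find("pub:title", NS).text
qualified_tag = root[0].tag

print(dc_title, "|", pub_title, "|", qualified_tag)
```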

25
XML Implementations
  • XHTML, SVG (Scalable Vector Graphics), XForms
    (similar to HTML forms).
  • MathML: http://www.w3.org/Math/
  • Markup language for describing mathematics, both
    presentation and content.
  • RDF: http://www.w3.org/RDF/
  • Resource Description Framework. Defines
    structure for encoding object metadata.
    Facilitates metadata interchange and harvesting.
  • Others: DocBook, XML ISO 12083, Open eBook,
    WAP/WML.

26
Searching and Transformation
  • XPath: http://www.w3.org/TR/xpath
  • Defines pattern-matching syntax used by XSLT and
    XPointer. Method for selecting data in a
    document. MSXML 3.0 supports XPath. Supersedes
    XPatterns. Example:
    /descendant-or-self::node()/child::name
  • XSL
  • Includes transformations and FO (formatting
    objects). FO will replace CSS for document
    formatting.
  • XSLT: http://www.w3.org/TR/xslt
  • Mechanism for encoding style rules, ensures
    consistent rendering of XML documents of the same
    type.
  • XML Query: http://www.w3.org/XML/Query
  • Response to limitations of XPath. Would bring
    database-style queries to XML documents.
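
XPath selection can be tried out with Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath (abbreviated syntax, no full axis names). In that subset, `.//name` plays the role of the slide's descendant-or-self expression, applied relative to the root element. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = (
    "<article>"
    "<front><author><name>Smith</name></author></front>"
    "<back><ref><name>Jones</name></ref></back>"
    "</article>"
)
root = ET.fromstring(DOC)

# Every <name> element at any depth below the root.
all_names = [e.text for e in root.findall(".//name")]

# Only the <name> elements reached through an explicit path.
author_names = [e.text for e in root.findall("./front/author/name")]

print(all_names, author_names)
```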

27
Remote Object Access
  • SOAP (Simple Object Access Protocol)
  • Microsoft, IBM, Sun. Allows applications to
    invoke objects or functions residing on remote
    servers. Creates request block in XML.
  • XML-RPC: http://www.xmlrpc.com/
  • Remote procedure calling using HTTP as the
    transport and XML as the encoding. Open, but not
    a formal standard; widely adopted.
  • Web Services.
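
The "request block in XML" idea can be seen with Python's standard `xmlrpc.client` module, which encodes a call as XML without sending it anywhere. The method name `resolver.lookup` and its argument are hypothetical; a real call would post this block over HTTP to a server.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method as an XML request block.
request_xml = xmlrpc.client.dumps(("10.1063/333",), methodname="resolver.lookup")
print(request_xml)

# Decoding recovers the parameters and the method name, which is what
# the receiving server does before invoking the function.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```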

28
Remote Object Access
  • Web Services
  • Based on XML, SOAP, UDDI (Universal Description,
    Discovery, and Integration), and WSDL (Web
    Services Description Language). Applications are
    assembled on the fly in XML and accessed via the
    Web from different devices.
  • Supported by Microsoft .NET, IBM WebSphere, Sun
    ONE.

29
XML, XSLT, and CSS
  • Use XML full-text articles as ordered hierarchy
    of content objects.
  • Generate item-level metadata in XML, using RDF
    and Dublin Core syntax and semantics.
  • XSLT and CSS used to present metadata and
    articles in either XML or HTML format depending
    on Browser.
  • Mathematics rendering using XML and CSS (MathML
    conversion beginning).
  • Real-time transformation between XML and HTML
    using XSLT (scalability issues).
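
The XML-to-HTML step can be approximated in plain Python. This sketch is not XSLT: it swaps in a simple recursive tree rewrite with `xml.etree.ElementTree` to show the kind of structural mapping a stylesheet performs. The element names, the `TAG_MAP` table, and the sample article are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup.
ARTICLE = (
    "<article><title>Sample title</title>"
    "<para>First paragraph.</para><para>Second paragraph.</para></article>"
)

# Illustrative mapping from source element names to HTML element names.
TAG_MAP = {"article": "div", "title": "h1", "para": "p"}


def to_html(elem):
    """Recursively copy an element, renaming tags according to TAG_MAP."""
    out = ET.Element(TAG_MAP.get(elem.tag, "span"))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_html(child))
    return out


html = ET.tostring(to_html(ET.fromstring(ARTICLE)), encoding="unicode", method="html")
print(html)
```

Running a transformation like this on every request is where the scalability concern arises; caching the transformed output is one common mitigation.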

30
XML Issues and Problems
  • Variation in interpretation of parsing technology
    and varying implementations of schemas and need
    for standardized Schemas from DTDs.
  • You don't know whether your data will be viewed
    in a Web browser, cell phone, or read by a parser
    on a household appliance
    (<toaster_setting>light brown</toaster_setting>);
    the developer needs to comply with a common
    standard.

31
Schemas vs. DTDs
  • Both are systems for representing a data model
    that defines the data's elements and attributes,
    and the relationships among elements.
  • Schema addresses limitations of DTDs and the
    increasingly data-oriented role of XML.
  • Initial proposal from Arbortext, DataChannel,
    Inso, Microsoft, and the Univ. of Edinburgh:
    XML-Data.
  • W3C XML Schema Working Group: two documents,
    structures and datatypes.

32
Schema Justification
  • Description of a document type's structure should
    be in an XML document instead of written in
    special syntax (DTD).
  • Schemas are in XML: easier to edit and process
    using standard XML DOM manipulation tools.
  • DTD notation doesn't allow schema designers the
    power to impose strong data typing -- for
    example, the ability to say that a certain
    element type must always have a positive integer
    value, that it may not be empty, or that it must
    be one of a list of possible choices.
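
The three constraints named above (positive integer, non-empty, one of a list of choices) can be made concrete with a hand-written Python check. This is not an XML Schema validator, only a sketch of the rules a schema would let a designer declare; the element names, the `ALLOWED_GENRES` list, and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of permitted values.
ALLOWED_GENRES = {"journal article", "conference paper"}


def is_positive_integer(text):
    """True if text is an integer greater than zero."""
    try:
        return int(text) > 0
    except ValueError:
        return False


def check_record(xml_text):
    """Return a list of data-typing errors found in a record."""
    root = ET.fromstring(xml_text)
    errors = []
    if not is_positive_integer(root.findtext("volume") or ""):
        errors.append("volume must be a positive integer")
    if not (root.findtext("journal_title") or "").strip():
        errors.append("journal_title may not be empty")
    if (root.findtext("genre") or "") not in ALLOWED_GENRES:
        errors.append("genre must be one of the allowed values")
    return errors


GOOD = (
    "<publication><journal_title>Applied Physics Letters</journal_title>"
    "<volume>70</volume><genre>journal article</genre></publication>"
)
BAD = (
    "<publication><journal_title></journal_title>"
    "<volume>-3</volume><genre>pamphlet</genre></publication>"
)

print(check_record(GOOD))
print(check_record(BAD))
```

Both sample records are well-formed, and a DTD could accept both; only the typed rules tell them apart.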

33
Metadata and Linking Standards
  • Digital Object Identifier (DOI) and Persistent
    Object Identifiers.
  • OpenURL and Value-Added Service Components (SFX).
  • Open Archives Initiative (OAI), Dublin Core and
    Qualifiers.
  • Local Resolver Servers.

34
Metadata in DLI
  • As Document (not Object) surrogate.
  • To normalize and augment presentation.
  • To normalize searching (e.g. Names).
  • To store dynamic links.
  • Types of links:
  • Articles referenced by item (Backward).
  • Articles that reference the item (Forward).
  • A&I Records for references and items.
  • Other relationships (TOC, Other items by Author,
    Collaborative Data).
  • Known item and presumptive linking.

35
DLI Metadata Schema
  • Maintained as XML files using RDF and Dublin Core
    syntax and semantics.
  • Example:
  • <dc:Source>
  • <idli:publication type="journal article">
  • <idli:journal_title>Applied Physics
    Letters</idli:journal_title>
  • <idli:volume>70</idli:volume>
  • <idli:issue>11</idli:issue>
  • <idli:pagination>1372-1374</idli:pagination>
  • </idli:publication>
  • </dc:Source>
  • Application of XML DOM for processing at DC or
    idli level.
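
Processing such a record "at DC or idli level" with the DOM might look like the following Python sketch. The namespace URIs are assumptions (the slide does not give them, and the `idli` URI in particular is a placeholder), the issue value is read as 11, and the helper `idli_text` is hypothetical.

```python
import xml.dom.minidom

DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed Dublin Core URI
IDLI_NS = "http://example.org/idli"         # placeholder, not the project's URI

RECORD = f"""<dc:Source xmlns:dc="{DC_NS}" xmlns:idli="{IDLI_NS}">
  <idli:publication type="journal article">
    <idli:journal_title>Applied Physics Letters</idli:journal_title>
    <idli:volume>70</idli:volume>
    <idli:issue>11</idli:issue>
    <idli:pagination>1372-1374</idli:pagination>
  </idli:publication>
</dc:Source>"""

dom = xml.dom.minidom.parseString(RECORD)


def idli_text(local_name):
    """Text content of the first idli element with the given local name."""
    node = dom.getElementsByTagNameNS(IDLI_NS, local_name)[0]
    return node.firstChild.data


# idli level: pull out the individual citation fields.
citation = (
    f"{idli_text('journal_title')} {idli_text('volume')}"
    f"({idli_text('issue')}), {idli_text('pagination')}"
)
pub_type = dom.getElementsByTagNameNS(IDLI_NS, "publication")[0].getAttribute("type")

# DC level: treat the whole dc:Source element as one unit.
source_count = len(dom.getElementsByTagNameNS(DC_NS, "Source"))

print(citation, "|", pub_type, "|", source_count)
```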

36
(No Transcript)
37
Digital Object Identifier (DOI)
  • DOI is both a unique identifier of a piece of
    digital content AND a system to access that
    content digitally. Persistent object identifier.
  • "The ISBN for the 21st Century" -- Norman Paskin.
  • DOI system has two main parts (the identifier
    and a directory system) and a third logical
    component, a database.
  • Developed by AAP (Association of American
    Publishers), now managed by International DOI
    Foundation.

38
DOI Construction
  • First real open standard for content
    identification.
  • DOI is a number that identifies a digital object:
  • 10.1063/S000369519903216
  • 10: Registration Agency Prefix
  • 1063: Publisher Prefix
  • S000369519903216: Suffix (Publisher-assigned
    ID)
  • Suffix can be SICI or PII.
  • The DOI, together with the URL pointing to the
    digital object, is registered with the
    International DOI Foundation, e.g.
  • 10.1063/333 -> http://www.pubsite.org/apr99/artl1.pdf
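
The construction described on this slide can be expressed as a small Python function that splits a DOI into its parts. The function name `split_doi` and the dictionary keys are hypothetical labels following the slide's terminology.

```python
def split_doi(doi):
    """Split a DOI into the parts named on the slide."""
    # Everything before the first "/" is the prefix; the rest is the suffix.
    prefix, suffix = doi.split("/", 1)
    # The prefix itself has two dot-separated parts.
    agency, publisher = prefix.split(".", 1)
    return {
        "registration_agency_prefix": agency,
        "publisher_prefix": publisher,
        "suffix": suffix,
    }


parts = split_doi("10.1063/S000369519903216")
print(parts)
```

Splitting on the first slash only matters because a publisher-assigned suffix, such as a SICI, may itself contain slashes.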

39
Using a DOI
  • DOIs are resolved using the Handle System
    technology from CNRI (Corporation for National
    Research Initiatives).
  • Retrieval of the object is a two-step process: the
    link is sent to a central directory where the
    current Web address is stored; the location is
    sent back to the browser with a special message
    to redirect to that address, e.g.
  • dx.doi.org/10.1063/333 redirects to
    www.pubsite.org/apr99/artl1.pdf
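
The two-step retrieval can be simulated in Python without any network access. The in-memory `DIRECTORY` stands in for the central directory, and the status codes (302 for the redirect, 404 for an unknown DOI) are assumptions about how the "special message" is expressed; this is a sketch, not the Handle System's actual implementation.

```python
# Stand-in for the central directory: DOI -> current Web address.
DIRECTORY = {"10.1063/333": "http://www.pubsite.org/apr99/artl1.pdf"}


def resolve(link):
    """Simulate resolution of a dx.doi.org link.

    Step 1: extract the DOI and look it up in the directory.
    Step 2: answer with a redirect to the stored location.
    """
    _, _, doi = link.partition("dx.doi.org/")
    location = DIRECTORY.get(doi)
    if location is None:
        return 404, None
    return 302, location


status, location = resolve("dx.doi.org/10.1063/333")
print(status, location)
```

Because the publisher updates only the directory entry when an article moves, links that cite the DOI keep working.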

40
Reference Linking
  • Alternatives to DOI:
  • PubMed/PubRef (National Library of Medicine)
  • PubSCIENCE (DOE/OSTI)
  • Proprietary Link Managers (AIP, APS)
  • CrossRef Project: major Sci-Tech professional
    societies and commercial publishers.
  • System design calls for one URL for each DOI;
    the underlying technology can handle multiple
    URLs, however.

41
Local Resolver
  • Issue: directing users to locally held or
    licensed version of Digital Object (locally
    loaded or from Aggregator).
  • Harvard problem, Appropriate Copy problem.
  • Additional desire to direct users to local
    value-added services: local print holdings,
    interlibrary borrowing, other articles in A&I
    Services.

42
Local Resolver
  • Local Resolver Servers
  • OpenURL Protocol, CookiePusher vs. IP Addresses.
  • Demonstration Project at Illinois, OhioLink (Ex
    Libris SFX), Los Alamos.
  • Localizing Name Resolution for AIP, ASCE,
    Elsevier, other publishers.
  • Use of CrossRef Metadata Database for identifying
    Publisher from DOI and linking to Local Copy,
    A&I Services, Library Assistance.
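
An OpenURL aimed at a local resolver is essentially a base address plus citation metadata in the query string. The Python sketch below builds one; the resolver address is hypothetical, and the key names (`genre`, `volume`, `issue`, `spage`, `id=doi:...`) are assumed from the draft OpenURL syntax and should be checked against the specification in use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of an institution's local resolver.
LOCAL_RESOLVER = "http://resolver.example.edu/openurl"


def make_openurl(base, doi=None, **citation):
    """Build a resolver link from citation fields and an optional DOI."""
    params = dict(citation)
    if doi:
        params["id"] = f"doi:{doi}"
    return f"{base}?{urlencode(params)}"


url = make_openurl(
    LOCAL_RESOLVER,
    doi="10.1063/333",
    genre="article",
    volume="70",
    issue="11",
    spage="1372",
)
print(url)

# The resolver decodes the same fields to decide which copy to offer.
fields = parse_qs(urlsplit(url).query)
```

The same citation yields a different link at each institution because only the base address changes, which is how the appropriate-copy problem is addressed.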

43
Open Archives Initiative (OAI)
  • Released version 1.0 of metadata harvesting
    protocols. Frozen through second quarter 2001.
  • Mechanism for data providers to expose their
    metadata through an HTTP protocol and a mechanism
    for harvesting records containing metadata from
    repositories.
  • Roots in e-print archives.
  • Lightweight, low-barrier. Easy to implement a Web
    server to handle OAI protocol requests; need to
    develop procedures to access and extract your
    metadata.

44
OAI Continued
  • Requires repositories to support the Dublin Core
    elements.
  • Allows communities to expose metadata in other
    formats as long as records are structured as XML
    data with corresponding XML schema.
  • Registration mechanism provides publicly
    accessible list of OAI conformants.
  • Alpha testing phase completed.
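
A harvesting request is an ordinary HTTP GET with the operation and its arguments in the query string. The Python sketch below only builds such a URL. The repository address is hypothetical, and the verb and parameter names (`ListRecords`, `metadataPrefix`, `from`, `until`) follow the OAI harvesting protocol as commonly documented, so details may differ in the version 1.0 release discussed here.

```python
from urllib.parse import urlencode

# Hypothetical repository address.
BASE_URL = "http://archive.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, until_date=None):
    """Build a request for records, optionally limited to a date range."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"


request_url = list_records_url(BASE_URL, from_date="2001-01-01")
print(request_url)
```

The `oai_dc` prefix asks for the Dublin Core form that every repository must support; a community-specific format would be requested by changing that one argument.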

45
Computer Technologies
  • XML Appliances: Intel XML Accelerator.
  • Thin Desktops
  • Legacy-free PCs
  • Network appliances (Sun Rays).
  • Ubiquitous Computing
  • Pocket PCs, PalmPilots, appliance devices.
  • Wireless technologies.
  • Peer to Peer Computing.

46
Publishing Trends
  • Publishers will continue to add value to online
    journal articles.
  • Digital version will become version of record.
  • Virtual journals (both publisher-based and
    cross-publisher) will become common.
  • Next-generation knowledge environments will
    evolve. Multimedia, data exposed, live
    equations with in-place calculations.

47
Publishing Trends (Continued)
  • Personalized services will be available -- agent
    technology, alerting services.
  • Different economic and subscription models will
    be introduced.
  • Deconstruction of the Journal (Bob Kelly, APS):
    article-at-a-time publishing.
  • Journal branding or perhaps publisher branding.
  • Academia issues: publishing, tenure.

48
Closing Issues
  • Role of Authors, Academic Institutions,
    Libraries, Publishers, Abstracting & Indexing
    Services.
  • Disintermediation may affect both Libraries and
    Publishers.
  • Information as Function not Place.
  • Provide a Digital Library out of digital
    collections.
  • Role of XML technology.
  • Service mechanisms: processing, archiving,
    search and discovery, presentation, linking.