Proposals for a new flexible and extensible XML-model for exchange of research information

By Jens Vindvad, National Office for Research Documentation, Academic and Special Libraries, Norway, and Erlend Øverby, Conduct AS, Oslo, Norway

Content of presentation
  • Assumptions, objectives, differences in structure
    and mapping between structures.
  • Validation and MicroSchema
  • Working model (vocabulary, internal structure)
  • Namespace
  • Testing of model

3 assumptions and 1 observation
  • The Internet is the driving force and the preferred
    medium for international information exchange.
  • Today and in the near future, XML is the basic
    Internet standard for information exchange.
  • Research information systems are based on
    relational database technology for storage.
  • Validation of exchanged data against structure and
    allowed values is often sacrificed.

Objectives
  • We want to exchange research information between
    different CRIS systems, and other systems as
    well, using the Internet and XML technology.
  • We want to be able to validate the exchanged data
    against a defined structure and allowed values.
  • Agreeing upon a common XML exchange model can
    help to achieve these objectives.

How should the exchange model work?
  • To send information, the existing data must be
    transformed/mapped into the structure of the
    exchange model. To receive information, data in
    the exchange model must be transformed/mapped into
    the receiver's data model. This eases information
    exchange when sender and receiver do not share the
    same data model, and also eases data exchange
    between different communities.
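A minimal Python sketch of the idea, assuming two systems with invented field names (`tittel`, `aar`, `pub_year` and so on are hypothetical): each system needs only one mapping to or from the exchange model, rather than one mapping per partner system. The hypothetical helper `mappings_needed` makes the arithmetic explicit.

```python
from typing import Dict

Record = Dict[str, str]

def sender_to_exchange(rec: Record) -> Record:
    """Export: map the (hypothetical) sender's field names onto the shared vocabulary."""
    return {"Title": rec["tittel"], "ISSN": rec["issn"], "PublishingYear": rec["aar"]}

def exchange_to_receiver(rec: Record) -> Record:
    """Import: map the shared vocabulary onto the (hypothetical) receiver's field names."""
    return {"title": rec["Title"], "issn": rec["ISSN"], "pub_year": rec["PublishingYear"]}

def mappings_needed(n_systems: int, via_exchange_model: bool) -> int:
    """One-directional mappings needed so every system can send to every other."""
    if via_exchange_model:
        return 2 * n_systems              # one export and one import per system
    return n_systems * (n_systems - 1)    # one mapping per ordered pair of systems

source = {"tittel": "Example title", "issn": "0023-852X", "aar": "2000"}
received = exchange_to_receiver(sender_to_exchange(source))
print(received)
print(mappings_needed(10, False), mappings_needed(10, True))  # 90 20
```

With ten systems, direct pairwise translation would need 90 mappings against 20 via the exchange model. The shared model pays off once more than three systems take part.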

Different structures for CRIS and the exchange model
  • The CERIF standard is based on relational
    database technology, which is table oriented.
  • The exchange model is based on the XML standard
    by W3C, which has a hierarchical tree structure.

[Figure: exchange between a table structure (relational databases) and a hierarchical tree structure (XML document)]

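The contrast between the two structures can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The table, column and element names below are hypothetical; the point is that a foreign key in the relational model (author rows pointing at a publication row) becomes containment in the XML tree.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational schema: one table per entity, linked by a key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publication (id TEXT PRIMARY KEY, title TEXT, issn TEXT);
    CREATE TABLE author (pub_id TEXT REFERENCES publication(id),
                         seq INTEGER, family_name TEXT);
""")
conn.execute("INSERT INTO publication VALUES (?, ?, ?)",
             ("r00015557", "Example title", "0023-852X"))
conn.executemany("INSERT INTO author VALUES (?, ?, ?)",
                 [("r00015557", 1, "Heimdal"),
                  ("r00015557", 2, "Aarstad"),
                  ("r00015557", 3, "Olofsson")])

def publication_to_tree(pub_id: str) -> ET.Element:
    """Collect rows from both tables into one nested XML element.

    The foreign key author.pub_id becomes containment: the authors are
    written inside the publication instead of being joined by key."""
    title, issn = conn.execute(
        "SELECT title, issn FROM publication WHERE id = ?", (pub_id,)).fetchone()
    pub = ET.Element("Publication")
    ET.SubElement(pub, "IdNumber").text = pub_id
    ET.SubElement(pub, "Title").text = title
    authors = ET.SubElement(pub, "Authors")
    rows = conn.execute(
        "SELECT family_name FROM author WHERE pub_id = ? ORDER BY seq", (pub_id,))
    for (family_name,) in rows:
        ET.SubElement(authors, "FamilyName").text = family_name
    ET.SubElement(pub, "ISSN").text = issn
    return pub

xml_text = ET.tostring(publication_to_tree("r00015557"), encoding="unicode")
print(xml_text)
```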
Information mapping between the exchange model and CRIS (I)
  • When mapping information between different
    structures, there is no one-to-one solution. We
    have to make some choices when defining the
    exchange model, and no final answer or perfect
    solution exists.
  • When building the exchange model we have used the
    following guidelines:
  • Data should be placed in elements, not attributes.

Information mapping between the exchange model and CRIS (II)
  • Characteristics and properties of data should be
    placed in attributes.
  • Be as explicit as possible, not implicit. The
    implicit path can lead to situations where users
    of the model have to make assumptions about how
    to handle the information in the model.
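The element/attribute guidelines can be illustrated with a small Python sketch using `xml.etree.ElementTree`. The element names are borrowed from the test example later in the presentation (`MainTitle`, `Title`, `Language`); the helper `main_title` is hypothetical and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

def main_title(title: str, language: str) -> ET.Element:
    """Build a MainTitle element following the guidelines: the title text
    (data) goes in a child element, while its language (a property of that
    data) goes in an attribute, stated explicitly rather than left implied."""
    main = ET.Element("MainTitle", Language=language)
    ET.SubElement(main, "Title").text = title
    return main

print(ET.tostring(main_title("Example title", "eng"), encoding="unicode"))
# <MainTitle Language="eng"><Title>Example title</Title></MainTitle>
```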

The validation problem
  • Normally two alternatives exist to describe and
    define the information structure or model of an
    XML document: a DTD (ISO 8879) or an XML Schema.
    Both approaches currently share a disadvantage: to
    validate and check the structure of the
    information, the whole structure, with all its
    possibilities and constraints, must be described
    in one large and inflexible model. This makes it
    harder to establish efficient validation of data
    exchanged between different systems.

  • The idea of a MicroSchema is that it should
    describe only a very small piece of information,
    and only the information relevant to that specific
    description. Information that is not relevant in
    that context is described in another MicroSchema.
  • To express the relevance of and connections
    between MicroSchemas, we need to develop a
    standard method of enhancing the schema
    specification so that it addresses the valid
    elements in a specific context. This can be done
    with namespaces, by introducing the term
    "Allow-schema-namespaces".
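The proposal does not fix a concrete syntax for "Allow-schema-namespaces", so the following is only a hypothetical Python model of the idea: each MicroSchema knows its own local elements plus the set of foreign namespaces it allows inside them, and validation delegates every child element to the MicroSchema registered for that child's namespace. The `MicroSchema` class, the `urn:example:...` namespace names and the `validate` function are invented for illustration; attributes are not checked.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class MicroSchema:
    """Hypothetical stand-in for a MicroSchema: it describes only its own
    small set of elements, plus the foreign namespaces allowed inside them."""
    local_elements: Set[str]
    allow_schema_namespaces: Set[str] = field(default_factory=set)

def split_tag(tag: str) -> Tuple[str, str]:
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag

def validate(elem: ET.Element, registry: Dict[str, MicroSchema],
             errors: List[str]) -> List[str]:
    """Check elem with its own namespace's MicroSchema, then delegate each
    child to the MicroSchema registered for the child's namespace."""
    ns, local = split_tag(elem.tag)
    schema = registry.get(ns)
    if schema is None:
        errors.append(f"no MicroSchema registered for {ns!r}")
        return errors
    if local not in schema.local_elements:
        errors.append(f"{local!r} is not described by {ns!r}")
    for child in elem:
        child_ns, _ = split_tag(child.tag)
        if child_ns != ns and child_ns not in schema.allow_schema_namespaces:
            errors.append(f"namespace {child_ns!r} not allowed inside {local!r}")
            continue
        validate(child, registry, errors)
    return errors

AIJ, TI, PERS = "urn:example:aij", "urn:example:ti", "urn:example:pers"
registry = {
    AIJ: MicroSchema({"ArticleInJournal", "TitleInfo", "Author"}, {TI, PERS}),
    TI: MicroSchema({"MainTitle", "Title"}),
    PERS: MicroSchema({"Person", "FamilyNames"}),
}
doc = ET.fromstring(f"""
<aij:ArticleInJournal xmlns:aij="{AIJ}" xmlns:ti="{TI}" xmlns:pers="{PERS}">
  <aij:TitleInfo><ti:MainTitle Language="eng"><ti:Title>Example title</ti:Title></ti:MainTitle></aij:TitleInfo>
  <aij:Author><pers:Person><pers:FamilyNames>Heimdal</pers:FamilyNames></pers:Person></aij:Author>
</aij:ArticleInJournal>""".strip())
print(validate(doc, registry, []))  # []
```

No MicroSchema here has to describe the whole document: the TitleInfo schema knows nothing about persons, and a `pers:Person` placed inside `ti:MainTitle` would be rejected because the TitleInfo schema does not list that namespace as allowed.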

Working model
  • A working model, limited to documentation
    produced by researchers, has been built.
  • Successful communication requires common terms
    and definitions; to achieve this, a vocabulary
    has been defined.
  • With the vocabulary and the guidelines for
    building an exchange model in place, an internal
    structure has been established.
  • Based on the vocabulary and internal structure,
    the XML exchange model is proposed in terms of
    MicroSchemas and namespaces.

Vocabulary - output and results
  • The collection of all types of information
    produced by researchers is called output.
    Output is divided into four subgroups: results,
    communication, documentation and art.
  • With the exception of art, results are taken to
    mean the results of research produced by the
    researcher in person. Examples of results are
    publications, patents and products.

Vocabulary - communication
  • By communication we mean the forms of
    communication that researchers use in their work.
    Researchers often need or wish to discuss their
    ideas and views. This communication is not itself
    a result of their work, but represents
    interesting and important steps in the process of
    producing results. Examples of communication are
    conference presentations, workshops, broadcasting
    and interviews in the …

Vocabulary - documentation
  • A researcher has to carry out administrative
    tasks and produce documentation that cannot be
    classified as results or forms of communication.
    This can range from pure administration to highly
    professional work. Examples are reports to the
    funding institution, computer programs and
    manuscripts (theses).

Vocabulary - art
  • Art is not necessarily an output of a researcher's
    work, but it can be. Art can be seen as a result in
    itself, a form of communication or a type of
    documentation, or all of these. Art needs and
    deserves a classification based on standards used
    and accepted in the art community. Examples of
    art are works of art, exhibitions and …

Vocabulary - publication and the five-point test
  • Publication is a commonly used word which, in
    daily use, does not have a precise and distinct
    definition. To establish a vocabulary and a
    namespace, we need the word publication and
    have to give it a precise and distinct
    definition. To do this we have established a
    five-point test involving addressee, copies,
    location, readability and time. The test must
    be applied in the following order: first
    against publication, then communication and
    finally documentation.
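The slides name the five aspects of the test but not the criteria behind them, so the predicates in this Python sketch (`is_publication`, `is_communication`, `is_documentation` and the `Item` fields) are invented placeholders. What the sketch does show is the prescribed order: the tests run as publication, then communication, then documentation, and the first one that passes decides the category.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Item:
    """The five aspects named by the test; the field types are assumptions."""
    addressee_is_public: bool      # addressee
    copies: int                    # copies
    location_is_accessible: bool   # location
    readable_by_others: bool       # readability
    is_permanent: bool             # time: does the item persist after the event?

# Placeholder criteria: the presentation does not define the thresholds.
def is_publication(i: Item) -> bool:
    return (i.addressee_is_public and i.copies > 1 and i.location_is_accessible
            and i.readable_by_others and i.is_permanent)

def is_communication(i: Item) -> bool:
    return i.addressee_is_public and not i.is_permanent

def is_documentation(i: Item) -> bool:
    return i.readable_by_others

ORDERED_TESTS: List[Tuple[str, Callable[[Item], bool]]] = [
    ("publication", is_publication),
    ("communication", is_communication),
    ("documentation", is_documentation),
]

def classify(item: Item) -> Optional[str]:
    """Apply the tests in the prescribed order; the first one that passes wins."""
    for label, test in ORDERED_TESTS:
        if test(item):
            return label
    return None

article = Item(True, 500, True, True, True)   # e.g. a journal article
talk = Item(True, 0, True, False, False)      # e.g. a conference presentation
report = Item(False, 1, False, True, True)    # e.g. a report to a funder
print(classify(article), classify(talk), classify(report))
# publication communication documentation
```

The journal article would also pass the placeholder documentation test; only the fixed order keeps it classified as a publication, which is why the order of the tests matters.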

Internal structure (I)
  • Based on the vocabulary and the guidelines for
    building an exchange model, an internal structure
    has been established.
  • The basic elements of the model are HEAD,
    content and EXTENSIONS. The core of the model is
    the content; HEAD and EXTENSIONS can be left out.

Internal structure - HEAD
  • Each schema can contain at most one HEAD, and
    HEAD can be left out. All administrative data
    should be placed in HEAD, such as when the
    information object was created, by whom, and who
    revised or edited it. In transactions between
    systems, all transactional administration data
    should be placed in HEAD. HEAD may also contain
    EXTENSIONS.

Internal structure - content
  • The content elements make up the core on which
    the model is built. All the content elements could
    have been put into one element, e.g. BODY. This is
    not necessary, since all elements that are not
    HEAD or EXTENSIONS are, by definition, part of the
    core model.

Internal structure - EXTENSIONS
  • The elements in the basic model should be
    understood and managed by all who want to
    exchange information. A basic model with a high
    degree of certainty will not satisfy all needs,
    so the model is made extensible. With this
    construction, everybody can easily see what is
    part of the core model and what belongs to a
    specific extension. Anyone who makes use of an
    extension must supply a working namespace for it.
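A hypothetical Python sketch of how a receiver might separate the three parts of a record with `xml.etree.ElementTree`: everything that is neither HEAD nor EXTENSIONS counts as core content (so no BODY wrapper is needed), at most one HEAD is accepted, and every element inside EXTENSIONS must carry a namespace. The function names, the `x:InternalCode` element and its namespace are invented for the example, and EXTENSIONS nested inside HEAD are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

def local_name(tag: str) -> str:
    """Drop ElementTree's '{uri}' prefix, keeping the local element name."""
    return tag.rsplit("}", 1)[-1]

def has_namespace(tag: str) -> bool:
    """ElementTree writes namespaced tags as '{uri}local'."""
    return tag.startswith("{")

def partition(record: ET.Element) -> Dict[str, List[ET.Element]]:
    """Split a record's children into HEAD, core content and EXTENSIONS.

    Raises ValueError for more than one HEAD, or for an extension element
    that does not declare a namespace."""
    parts: Dict[str, List[ET.Element]] = {"HEAD": [], "core": [], "EXTENSIONS": []}
    for child in record:
        name = local_name(child.tag)
        parts[name if name in ("HEAD", "EXTENSIONS") else "core"].append(child)
    if len(parts["HEAD"]) > 1:
        raise ValueError("a record may contain at most one HEAD")
    for extensions in parts["EXTENSIONS"]:
        for item in extensions:
            if not has_namespace(item.tag):
                raise ValueError(f"extension element {item.tag!r} has no namespace")
    return parts

record = ET.fromstring(
    '<ArticleInJournal xmlns:x="urn:example:local-extension">'
    "<HEAD><Created>2000-04-03</Created></HEAD>"
    "<TitleInfo/><Author/><RefInToJournal/>"
    "<EXTENSIONS><x:InternalCode>example</x:InternalCode></EXTENSIONS>"
    "</ArticleInJournal>"
)
parts = partition(record)
print([local_name(e.tag) for e in parts["core"]])
# ['TitleInfo', 'Author', 'RefInToJournal']
```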

Example MicroSchema: ArticleInJournal

| General identifier | Occurrence | Content model |
|---|---|---|
| HEAD | Zero or one | mSchema HEAD |
| TitleInfo | One | mSchema TitleInfo |
| Author | One or more | mSchema Person, mSchema OrgUnit |
| RefInToJournal | One | mSchema RefInToJournal |
| URI | Zero or one | Uri |
| Abstract | Zero or more | Text |

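The Occurrence column maps directly onto minimum/maximum counts. This Python sketch checks only those counts for the direct children of an ArticleInJournal; validating each child's content model would be delegated to the MicroSchema named in the third column. The names `ARTICLE_IN_JOURNAL` and `check_occurrences` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Occurrence constraints transcribed from the ArticleInJournal table:
# (minimum, maximum), with None meaning "unbounded".
ARTICLE_IN_JOURNAL: Dict[str, Tuple[int, Optional[int]]] = {
    "HEAD": (0, 1),            # zero or one
    "TitleInfo": (1, 1),       # one
    "Author": (1, None),       # one or more
    "RefInToJournal": (1, 1),  # one
    "URI": (0, 1),             # zero or one
    "Abstract": (0, None),     # zero or more
}

def check_occurrences(child_names: List[str],
                      rules: Dict[str, Tuple[int, Optional[int]]]) -> List[str]:
    """Return readable violations; an empty list means the counts are valid."""
    counts = Counter(child_names)
    problems = [f"unexpected element {n!r}" for n in counts if n not in rules]
    for name, (low, high) in rules.items():
        n = counts.get(name, 0)
        if n < low:
            problems.append(f"{name}: expected at least {low}, found {n}")
        if high is not None and n > high:
            problems.append(f"{name}: expected at most {high}, found {n}")
    return problems

print(check_occurrences(["HEAD", "TitleInfo", "Author", "RefInToJournal"],
                        ARTICLE_IN_JOURNAL))  # []
print(check_occurrences(["TitleInfo", "TitleInfo", "Author"], ARTICLE_IN_JOURNAL))
```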
Namespace - examples

| Element name | NS abbr. | NS URI |
|---|---|---|
| Output | out | root/Outputs.msc |
| Results | res | root/output/Results.msc |
| Publications | pub | root/output/results/Publications.msc |
| Journal | jour | root/output/results/publications/Journal.msc |
| ArticleInJournal | aij | root/output/results/publications/ArticleInJournal.msc |
| Person | pers | root/level1/Person.msc |
| OrgUnit | org | root/level1/OrgUnit.msc |
| HEAD | HEAD | root/misc/HEAD.msc |
| TitleInfo | ti | root/output/TitleInfo.msc |

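A sketch of putting the namespace table to work with `xml.etree.ElementTree.register_namespace`, so that serialized output uses the table's prefixes. The `root/...` values appear to be placeholders rather than real URIs (the W3C Namespaces in XML recommendation deprecates relative URI references in namespace declarations), so a deployment would substitute absolute URIs. The helper `q` is hypothetical, and note that `register_namespace` changes a process-wide registry.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace name, copied from the table (placeholder values).
NAMESPACES = {
    "out": "root/Outputs.msc",
    "res": "root/output/Results.msc",
    "pub": "root/output/results/Publications.msc",
    "jour": "root/output/results/publications/Journal.msc",
    "aij": "root/output/results/publications/ArticleInJournal.msc",
    "pers": "root/level1/Person.msc",
    "org": "root/level1/OrgUnit.msc",
    "HEAD": "root/misc/HEAD.msc",
    "ti": "root/output/TitleInfo.msc",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)  # serializer will use these prefixes

def q(prefix: str, local: str) -> str:
    """Build ElementTree's '{uri}local' name from a table prefix."""
    return f"{{{NAMESPACES[prefix]}}}{local}"

article = ET.Element(q("aij", "ArticleInJournal"))
ET.SubElement(article, q("aij", "HEAD"))
title_info = ET.SubElement(article, q("aij", "TitleInfo"))
ET.SubElement(title_info, q("ti", "MainTitle"), Language="eng")
print(ET.tostring(article, encoding="unicode"))
```

ElementTree declares only the namespaces actually used in the tree, here `aij` and `ti`, on the root element.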
Test of exchange model
  • So far, we have tested the model against CRIS data
    from BIBSYS FORSKDOK and data from the library
    system BIBSYS.
  • To perform the tests, we have developed two XSLT
    programs (XSL stylesheets) that map/transform the
    input data into the proposed exchange model.

Test example Input (I)
    <publikasjon>
      <f001>
        <f001b>r00015557</f001b>
        <f001d>A12</f001d>
        <f001i>f</f001i>
        <f001j>FO02RBRU</f001j>
        <f001n>2000-04-03</f001n>
        <f001o>2000-04-03</f001o>
      </f001>
      <f008>
        <f008c>eng</f008c>
      </f008>
      <f020>
      </f020>
      <f022>
      </f022>
      <f100>
        <f100a>Heimdal, …</f100a>
        <f100b>02013300</f100b>
        <f100a>Aarstad, …</f100a>
        <f100b>02013300</f100b>
        <f100a>Olofsson, …</f100a>
        <f100b>02013300</f100b>
      </f100>
      <f245>
        <f245a>Peripheral Blood T-Lymphocyte and Monocyte Function and Survival in Patients with Head and Neck …</f245a>
      </f245>
      <f260>
      </f260>

Test example Input (II)
      <f300>
        <f300a>402 - 407</f300a>
      </f300>
      <f507>
      </f507>
      <f509>
        <f509a>Laryngocope</f509a>
        <f509c>2000</f509c>
        <f509f>110</f509f>
        <f509h>3</f509h>
        <f509x>0023-852X</f509x>
      </f509>
    </publikasjon>

Test example Output (I)
    <out:Results xmlns:res="http://…">
      <res:Publications xmlns:pub="http://…">
        <pub:ArticleInJournal xmlns:aij="http://…">
          <aij:HEAD xmlns:HEAD="http://…">
            <HEAD:SourceName>BIBSYS</HEAD:SourceName>
            <HEAD:IdNumber>r00015557</HEAD:IdNumber>
            <HEAD:ClassificationCode>A12</HEAD:ClassificationCode>
            <!-- "Article in international scientific journal without …" -->
            <HEAD:Description>Artikkel i internasjonalt vit. tidsskrift uten …</HEAD:Description>
            <HEAD:Created>2000-04-03</HEAD:Created>
            <HEAD:Updated>2000-04-03</HEAD:Updated>
          </aij:HEAD>

Test example Output (II)
          <aij:TitleInfo xmlns:ti="http://…">
            <ti:MainTitle Language="eng">
              <ti:Title>Peripheral Blood T-Lymphocyte and Monocyte Function and Survival in Patients with Head and Neck …</ti:Title>
            </ti:MainTitle>
          </aij:TitleInfo>

Test example Output (III)
          <aij:Author xmlns:pers="http://…">
            <pers:Person>
              <pers:FamilyNames>Heimdal</pers:FamilyNames>
            </pers:Person>
            <pers:Person>
              <pers:FamilyNames>Aarstad</pers:FamilyNames>
            </pers:Person>
            <pers:Person>
              <pers:FamilyNames>Olofsson</pers:FamilyNames>
            </pers:Person>
          </aij:Author>

Test example Output (IV)
          <aij:RefInToJournal xmlns:RIn2J="http://…">
            <RIn2J:Journal xmlns:jour="http://www.rb…">
              <jour:TitleInfo xmlns:ti="http://ww…">
                <ti:MainTitle Language="eng">
                  <ti:Title>Laryngocope</ti:Title>
                </ti:MainTitle>
              </jour:TitleInfo>
              <jour:ISSN>0023-852X</jour:ISSN>
            </RIn2J:Journal>
            <RIn2J:PublishingYear>2000</RIn2J:PublishingYear>
            <RIn2J:Volum>110</RIn2J:Volum>
            <RIn2J:Issue>3</RIn2J:Issue>
            <RIn2J:Pages>402 - 407</RIn2J:Pages>
          </aij:RefInToJournal>
        </pub:ArticleInJournal>
      </res:Publications>
    </out:Results>
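Comparing the input record with the resulting exchange-model document suggests which source field feeds which target element. The authors performed this mapping with XSLT; as a hedged stand-in, the Python sketch below does the same for a subset of fields with `xml.etree.ElementTree`. The field correspondences are inferred from this single example, not taken from the stylesheets; namespaces are left out, and the f001n/f001o dates are skipped because the example does not show which one becomes Created and which becomes Updated.

```python
import xml.etree.ElementTree as ET

def _main_title(parent: ET.Element, title: str, language: str) -> None:
    """Append TitleInfo/MainTitle/Title, as in the output example."""
    main = ET.SubElement(ET.SubElement(parent, "TitleInfo"), "MainTitle",
                         Language=language)
    ET.SubElement(main, "Title").text = title

def transform(pub: ET.Element) -> ET.Element:
    """Map a <publikasjon> record onto an ArticleInJournal tree.

    Field correspondences are inferred from the input/output example;
    this is not the authors' XSLT."""
    lang = pub.findtext("f008/f008c", default="")
    aij = ET.Element("ArticleInJournal")

    head = ET.SubElement(aij, "HEAD")
    ET.SubElement(head, "IdNumber").text = pub.findtext("f001/f001b")
    ET.SubElement(head, "ClassificationCode").text = pub.findtext("f001/f001d")

    _main_title(aij, pub.findtext("f245/f245a", default=""), lang)

    author = ET.SubElement(aij, "Author")
    for name in pub.iterfind("f100/f100a"):
        person = ET.SubElement(author, "Person")
        # Assumption: the input holds "Family, Given"; keep the part before the comma.
        ET.SubElement(person, "FamilyNames").text = (name.text or "").split(",")[0].strip()

    ref = ET.SubElement(aij, "RefInToJournal")
    journal = ET.SubElement(ref, "Journal")
    _main_title(journal, pub.findtext("f509/f509a", default=""), lang)
    ET.SubElement(journal, "ISSN").text = pub.findtext("f509/f509x")
    for src, dst in (("f509/f509c", "PublishingYear"), ("f509/f509f", "Volum"),
                     ("f509/f509h", "Issue"), ("f300/f300a", "Pages")):
        ET.SubElement(ref, dst).text = pub.findtext(src)
    return aij

record = ET.fromstring(
    "<publikasjon>"
    "<f001><f001b>r00015557</f001b><f001d>A12</f001d></f001>"
    "<f008><f008c>eng</f008c></f008>"
    "<f100><f100a>Heimdal, …</f100a><f100a>Aarstad, …</f100a></f100>"
    "<f245><f245a>Example title</f245a></f245>"
    "<f300><f300a>402 - 407</f300a></f300>"
    "<f509><f509a>Laryngocope</f509a><f509c>2000</f509c><f509f>110</f509f>"
    "<f509h>3</f509h><f509x>0023-852X</f509x></f509>"
    "</publikasjon>")
out = transform(record)
print(ET.tostring(out, encoding="unicode"))
```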