PANGAEA - PowerPoint PPT Presentation

Learn more at: https://www.panfmp.org
Transcript and Presenter's Notes

Title: PANGAEA


1
Data portal based on Open Archives Initiative
Protocols and Apache Lucene
Uwe Schindler, uschindler_at_wdc-mare.org
Michael Diepenbroek, mdiepenbroek_at_wdc-mare.org
MARUM, University of Bremen, Germany
EGU 2006, Vienna, 2006-04-03
2
Data Portals
  • WDC-MARE, with its information system PANGAEA,
    provides data portals for several EU and
    international projects:
  • CARBOOCEAN, EUR-OCEANS, IODP
  • Problem:
  • Not all data are stored centrally, so the
    datasets offered in a portal must be consolidated
    from different sources.

3
Example: CARBOOCEAN data portal
  • Data stays at the data providers
  • Metadata is harvested by the portal
  • Search queries are handled by the centralized
    catalogue
  • Scientist gets link to data at the provider
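
The division of labour on this slide can be sketched in a few lines of Python. This is an illustration only, not the portal's code: the `Record` and `Catalogue` classes and the example URLs are hypothetical, and the title match stands in for a real search engine.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Harvested metadata only; the dataset itself stays at the provider."""
    identifier: str
    title: str
    provider_url: str  # hypothetical link back to the data at the provider


class Catalogue:
    """Central catalogue that answers search queries from harvested metadata."""

    def __init__(self):
        self._records = {}

    def harvest(self, records):
        # Re-harvesting the same identifier overwrites the older metadata.
        for record in records:
            self._records[record.identifier] = record

    def search(self, term):
        # Naive case-insensitive title match, standing in for a search engine.
        term = term.lower()
        return [r.provider_url for r in self._records.values()
                if term in r.title.lower()]


catalogue = Catalogue()
catalogue.harvest([
    Record("a:1", "Surface water pCO2 in the North Atlantic",
           "https://provider-a.example/data/1"),
    Record("b:7", "Sediment core description",
           "https://provider-b.example/data/7"),
])
links = catalogue.search("pco2")
```

The search returns only links, so the scientist fetches the data from the provider rather than from the portal.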

4
Open Archives Protocol
  • The Open Archives Initiative Protocol for
    Metadata Harvesting (OAI-PMH) is a protocol
    developed by the Open Archives Initiative.
  • Google uses it during web crawling (Google
    Scholar)
  • Almost all digital libraries support it (the
    best-known being arXiv and the CERN Document
    Server)
  • Very simple to implement (based on XML over HTTP)
  • Repository software for databases or file system
    metadata providers is widely available
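
To make "XML over HTTP" concrete, here is a minimal Python sketch of one harvesting step: building a `ListRecords` request and reading the record headers from a response. The verb, the `metadataPrefix`, `set` and `resumptionToken` arguments, and the namespace come from the OAI-PMH specification. The repository URL, the set name and the sample response are made up for illustration, and no network request is sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", oai_set=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # A resumptionToken must be the only argument besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if oai_set:
            params["set"] = oai_set
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (headers, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.findall("oai:ListRecords/oai:record/oai:header", OAI_NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken",
                          namespaces=OAI_NS)
    return headers, (token or None)


# Made-up response, shortened to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
        <datestamp>2006-04-03</datestamp>
      </header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example/oai", oai_set="CARBOOCEAN")
headers, token = parse_list_records(SAMPLE)
```

A harvester would repeat the request with the returned token until the repository stops sending one.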

5
Current OAI-PMH software
  1. Limited to Dublin Core metadata (libraries)!
  2. Limited full text search functionality due to
    relational databases in the background!
  3. No geographic retrievals (because of Dublin Core
    limitation)!
  4. The end-user interface is part of the software,
    which limits its use inside a CMS

???
6
Requirements for portal software
  1. Open for any XML metadata format
  2. Any mappings to document fields should be done by
    XPath
  3. Possibility to map incompatible XML schemas
    during harvesting by XSL
  4. No relational database, only a full-text search
    engine that contains everything needed for
    operation
  5. Range queries for specific fields (date/time or
    numeric)
  6. Web service interface for the end user software
    that is accessible from any language (Java/JSP,
    PHP, Perl,...)
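
Requirements 2 and 5 can be illustrated with a short Python sketch. It is not the portal software: the `FIELD_PATHS` configuration, the helper names and the sample record are hypothetical. Python's standard library supports only a subset of XPath, so the paths are written relative to the record's root element.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical field configuration: index field name -> path inside the record.
FIELD_PATHS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "date": "dc:date",
}


def extract_fields(xml_text, field_paths=FIELD_PATHS):
    """Evaluate every configured path and collect the text values per field."""
    root = ET.fromstring(xml_text)
    return {
        name: [el.text.strip() for el in root.findall(path, NS) if el.text]
        for name, path in field_paths.items()
    }


def in_range(values, low, high):
    """Range query on one field: true if any value lies in [low, high]."""
    return any(low <= v <= high for v in values)


# Made-up Dublin Core record.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Carbon dioxide measurements</dc:title>
  <dc:creator>Doe, J</dc:creator>
  <dc:creator>Roe, R</dc:creator>
  <dc:date>2005-08-14</dc:date>
</oai_dc:dc>"""

fields = extract_fields(RECORD)
# ISO 8601 dates sort correctly as plain strings, so string comparison suffices.
hit = in_range(fields["date"], "2005-01-01", "2005-12-31")
```

Because the mapping is plain configuration, a different metadata format needs only a different set of paths, not different code.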

7
MetadataPortal Java Package
[Architecture diagram; labels recovered from the slide]
  • Index 1: OAI-PMH repository, OAI-Harvester (OAI
    protocol in HTTP), Lucene
  • Index 2: OAI-PMH repository, OAI-Harvester (OAI
    protocol in HTTP, specific set), XSL, Lucene
  • Index 3: XML files, Filesystem-Harvester
    (filesystem directory, FTP), XSL, Lucene
  • Virtual Index, Apache Axis
  • Servers: Mini PanHTTP Server, Jetty HTTP Server,
    Tomcat
  • Clients: Portal 1 (Webserver, PHP), Portal 2
    (Webserver, JSP)
  • Stored: XML data (same format everywhere, XSL
    before indexing), identifier, lastModified, sets
  • Searchable:
    field1: /oai_dc:dc/dc:author
    field2: /oai_dc:dc/dc:title
    field3: java:org.test.LatLon.parse(/oai_dc:dc/dc:coverage)
    default . )
    xmlns:java="http://xml.apache.org/xalan/java"
!!!
8
CARBOOCEAN Data Portal
  • Metadata standard harvested for search: DIF v9.4
  • Searchable fields: bounding box, date/time,
    parameters, authors, investigators, title
  • Data centers:
  • World Data Center for Marine Environmental
    Sciences (WDC-MARE), University of Bremen and
    Alfred-Wegener-Institute in Bremerhaven, Germany
  • French National Oceanographic Data Centre, SISMER
    (Systèmes d'Informations Scientifiques pour la
    Mer) at the Ifremer in Brest, France
  • Carbon Dioxide Information Analysis Center
    (CDIAC), Environmental Sciences Division at Oak
    Ridge National Laboratory, USA
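
A search over the bounding box and date/time fields listed above could look like the following Python sketch. It is a hedged illustration, not the portal's query code: the `BoundingBox` class, the `search` function and the two sample records are hypothetical, and boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        """True if the two boxes overlap (antimeridian crossing not handled)."""
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


def search(records, query_box, start, end):
    """Titles of records whose box overlaps query_box and whose date is in [start, end]."""
    return [r["title"] for r in records
            if r["box"].intersects(query_box) and start <= r["date"] <= end]


# Made-up records; dates are ISO 8601 strings, which compare correctly as text.
RECORDS = [
    {"title": "North Atlantic cruise",
     "box": BoundingBox(-40.0, 40.0, -10.0, 60.0), "date": "2005-06-01"},
    {"title": "Southern Ocean mooring",
     "box": BoundingBox(0.0, -70.0, 20.0, -50.0), "date": "2005-02-15"},
]

titles = search(RECORDS, BoundingBox(-30.0, 45.0, -20.0, 50.0),
                "2005-01-01", "2005-12-31")
```

In a full-text engine the same test would be expressed as range queries on the four box coordinates and the date field.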
9
Thank you!