1
A DISTRIBUTED WORKFLOW DATABASE DESIGNED FOR
COREWALL APPLICATIONS
  • Bill Kamp, Limnological Research Center, University of Minnesota

2
The Corewall
3
Overview
  • The data required for a core interpretation
    session can be very large.
  • An individual IODP core's data can be in the 10
    to 100 gigabyte range.
  • To compound this problem, many users will be
    interpreting at locations with slow internet
    connections.
  • Finally, users may be interpreting data from databases that are often designed as read-only archives, not to hold investigators' works in progress.
  • Our goal is to provide a very smart clipboard.

4
The Data Requirements Demand a Database
  • Workflow Oriented
  • Large Throughput
  • Internet Aware
  • Accept all data types
  • Locally and Remotely Connect to Geowall
  • Integrate with legacy tools
  • And, most importantly, transparent:
  • Little or no CWD work by the researcher
  • Automatic, automatic, automatic

5
Legacy Tools
  • Core Log Integration Platform from Lamont-Doherty
    Earth Observatory (LDEO)
  • Splicer: provides interactive depth-shifting of multiple holes of core data to build composite sections
  • Sagan: allows the composite sections output by Splicer to be mapped to their true stratigraphic depths, unifying core and log records

6
Sample Plot
7
Interfaces
  • We will provide interfaces that enable the CWD (Computer Workflow Database) to retrieve user-selected data from established databases such as JANUS, LacCore Vault, dbSEABED, and PaleoStrat.
  • We also hope to pull data through emerging portals such as CHRONOS.
  • The result is fast, cached access to multiple data sources (a retrieval sketch follows below).
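
A minimal sketch of what such a cached retrieval interface could look like, assuming a hypothetical source registry and cache layout (the names SOURCES, CACHE_DIR, and cached_fetch are illustrative, not part of the actual CWD code):

    import hashlib
    import os
    import urllib.request

    CACHE_DIR = "bin_cache"               # assumed local cache directory
    SOURCES = {                           # illustrative registry of remote archives
        "JANUS": "http://example.org/janus",
        "LacCoreVault": "http://example.org/laccore",
    }

    def cached_fetch(source, query):
        """Fetch a data set from a remote archive, caching it locally.

        The URL scheme and cache layout are assumptions for illustration;
        the point is that the network is only hit on the first access.
        """
        url = f"{SOURCES[source]}/{query}"
        key = hashlib.sha1(url.encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key)
        if not os.path.exists(path):      # only fetch once
            os.makedirs(CACHE_DIR, exist_ok=True)
            urllib.request.urlretrieve(url, path)
        return path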

8
Features
  • The CWD captures the results of analyses and
    interpretations.
  • As the workflow is captured, it can be accessed by other collaborators locally or remotely.
  • In a high bandwidth environment, such as a core
    lab or a university office, a group of
    collaborators could track one another's work as they work on the same cores.
  • In a low-bandwidth environment we will cache the
    data locally upon first access.
  • In a zero-bandwidth environment, the CWD can be copied to a portable mass storage device. All pointers are relative to the location of the CWD (see the sketch below).
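
A minimal sketch of what "all pointers are relative to the location of the CWD" means in practice; the function name and example paths are assumptions for illustration:

    import os

    def resolve_pointer(cwd_root, relative_pointer):
        """Turn a pointer stored in the CWD into an absolute path.

        Because pointers are stored relative to the CWD itself, the database
        plus bin cache can be copied to a portable drive and the same
        pointers keep working at the new location.
        """
        return os.path.normpath(os.path.join(cwd_root, relative_pointer))

    # The same stored pointer resolves correctly on a server and on a USB copy.
    print(resolve_pointer("/srv/cwd", "bin_cache/1.148.kamp_1218c_021x_07.jpg"))
    print(resolve_pointer("/media/usb/cwd", "bin_cache/1.148.kamp_1218c_021x_07.jpg"))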

9
Coordinate Systems
  • Co-registration across coordinate systems, e.g. wire length, geologic boundary, and/or geologic age.
  • We use the standard algorithms from SAGAN and SPLICER for this purpose (a simplified depth-mapping example is sketched below).
  • We intend to take advantage of existing technologies such as the Storage Resource Broker and Meta-data Catalog (SRBMDC) to facilitate locating replicated data sets.
  • We will use SESAR identifiers to uniquely and automatically identify the sample, the author, and the experiment when the data is loaded.
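
The real co-registration uses the Splicer/Sagan algorithms; the sketch below only illustrates the general idea of mapping a depth from one coordinate system to another through tie points (the tie-point values and names are made up):

    def map_depth(depth, tie_points):
        """Piecewise-linear mapping between two coordinate systems.

        tie_points is a list of (source_depth, target_depth) pairs, e.g.
        wire length -> composite depth.  Values here are illustrative only.
        """
        pts = sorted(tie_points)
        for (d0, t0), (d1, t1) in zip(pts, pts[1:]):
            if d0 <= depth <= d1:
                frac = (depth - d0) / (d1 - d0)
                return t0 + frac * (t1 - t0)
        raise ValueError("depth outside tie-point range")

    # Hypothetical tie points between two depth scales
    ties = [(0.0, 0.0), (10.0, 10.4), (20.0, 21.1)]
    print(map_depth(15.0, ties))   # -> 15.75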

10
Database Design
  • The paradigm for the metadata is:
  • Author
  • Experiment
  • Raw Data
  • Presentation
  • Data type is missing: we support all MIME data types
  • XML and text are stored in the database
  • All other data is stored in the Bin Cache (a schema sketch follows below)
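
A minimal sketch of the Author / Experiment / Raw Data / Presentation paradigm as relational tables, run here against SQLite as a stand-in for MySQL; every table and column name is an assumption for illustration, not the published CWD schema:

    import sqlite3

    SCHEMA = """
    CREATE TABLE author       (author_id INTEGER PRIMARY KEY, name TEXT, sesar_id TEXT);
    CREATE TABLE experiment   (experiment_id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE raw_data     (data_id INTEGER PRIMARY KEY, experiment_id INTEGER,
                               mime_type TEXT,        -- any MIME type is accepted
                               xml_or_text TEXT,      -- XML and text stored in the database
                               bin_cache_path TEXT);  -- everything else lives in the Bin Cache
    CREATE TABLE presentation (presentation_id INTEGER PRIMARY KEY, data_id INTEGER, html TEXT);
    """

    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)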

11
The Data Diagram
12
Caches
  • Uploading requires a caching system
  • Upload Cache: accessed directly, via FTP, or via HTTP upload
  • Archive Cache: all data is stored in raw form in a permanent archive
  • Staging: a temporary holding place for data while it is examined and transformed
  • Bin Cache: the location of the binary data managed by the database
  • The complete uploading process, including automatic recognition of the data type, is available as a single script called ForceUpload (see the sketch below).
  • It is the best approach when you have multiple data sets of the same data type.
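
The actual ForceUpload script is not reproduced here; the sketch below only illustrates the staged flow (upload, then staging, then archive cache plus bin cache) with automatic MIME-type recognition. All paths and function names are assumptions:

    import mimetypes
    import shutil
    from pathlib import Path

    ARCHIVE = Path("archive_cache")   # permanent, raw copies
    BIN     = Path("bin_cache")       # binary data managed by the database
    STAGING = Path("staging")         # temporary holding place

    def force_upload(uploaded_file):
        """Move one file through staging into the archive and bin caches."""
        src = Path(uploaded_file)
        mime, _ = mimetypes.guess_type(src.name)     # automatic type recognition
        for d in (ARCHIVE, BIN, STAGING):
            d.mkdir(exist_ok=True)
        staged = STAGING / src.name
        shutil.copy2(src, staged)                    # hold here while examined/transformed
        shutil.copy2(staged, ARCHIVE / src.name)     # permanent raw copy
        shutil.move(str(staged), BIN / src.name)     # managed binary copy
        return mime or "application/octet-stream"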

13
Data Access
  • All raw data is available via URLs.
  • The author has the option of refining the
    automatically generated presentation, i.e. the
    HTML page that shows the data.
  • Presentations can be dynamically built using
    database data. Tools are provided.
  • If data is not local, it is transferred to the local bin cache, and the CWD is updated (sketched below).
  • If you are not on the internet, you need to bring the database (which is small) and the bin cache with you.
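
A minimal sketch of the fetch-on-first-access behaviour described above; the record fields (url, local_path) are assumptions standing in for the real CWD pointer columns:

    import os
    import urllib.request

    def ensure_local(record, bin_cache="bin_cache"):
        """Return a local path for a raw-data record, fetching it if needed."""
        if record.get("local_path") and os.path.exists(record["local_path"]):
            return record["local_path"]                     # already cached locally
        os.makedirs(bin_cache, exist_ok=True)
        local = os.path.join(bin_cache, os.path.basename(record["url"]))
        urllib.request.urlretrieve(record["url"], local)    # pull via URL
        record["local_path"] = local                        # update the CWD pointer
        return local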

14
Sample Presentations
  • 9.134.readme.txt.html
  • 9.137.cwilocs.zip.html
  • 1.195.logo.bmp.html
  • 1.148.kamp_1218c_021x_07.jpg.html
  • 1.7.MOLE-JUAN03-1A.Geotek.and.L-a-b.data.xls.html
  • 7.122.GLAD4-HVT03-4B-9H-1.BMP.html
  • 7.123.GLAD4-HVT03-4C-1H-1.BMP.html
  • 7.93.GLAD4-HVT03-4B-1H-1.BMP.html

15
Replication
  • The database is replicated to multiple sites on the internet automatically via TCP/IP; this is a MySQL feature.
  • The URL of the data is sent to the replicated database.
  • Upon first access, if the data is not local, it is fetched to the bin cache via its URL, and the pointers in the local CWD are updated.
  • Currently we have a parent-child relationship: all data is first uploaded to the main CWD.
  • When we complete the integration of SESAR
    identifiers, the design will support peer-to-peer
    relationships.

16
Database Access
  • Data is uploaded via a web site
  • Data is pulled out of the CWD via Corewall
  • Data will automatically cross-load to other databases such as CHRONOS when there is a metadata match
  • The latter will be enforced via XSLTs (see the sketch below)
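
A minimal sketch of applying an XSLT-driven metadata transform from Python using the third-party lxml library; the stylesheet name cwd_to_chronos.xsl is a placeholder, not an actual CWD artifact:

    from lxml import etree

    def crossload(metadata_xml_path, stylesheet_path="cwd_to_chronos.xsl"):
        """Transform CWD metadata into another database's schema via XSLT."""
        transform = etree.XSLT(etree.parse(stylesheet_path))
        result = transform(etree.parse(metadata_xml_path))
        return bytes(result)    # serialized XML ready to push to the other database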

17
Current State
  • Test versions are on the web
  • Currently at http://www.iagp.net/LRC/LrcVault
  • Soon to be at http://burnout.geo.umn.edu
  • Documented at http://mm/html/iagp/LRC/LrcVault/
  • Currently holds 10 GB of test data