Data Management Practices and Challenges in Geosciences Today - PowerPoint PPT Presentation

Slides: 25
Provided by: chai58
Transcript and Presenter's Notes

Title: Data Management Practices and Challenges in Geosciences Today


1
Data Management Practices and Challenges in
Geosciences Today
  • Chaitan Baru
  • San Diego Supercomputer Center (SDSC)
  • California Institute for Telecommunications and
    Information Technology (Calit2)

2
Data Management
DATA COLLECTION
DATA PUBLICATION
DATA ACCESS
3
Geosciences Data Management
  • Atmospheric Sciences
  • Meteorological data provides community focus:
    real-time as well as archived data
  • Common field of interest (e.g. weather):
    continental scale
  • Ocean Sciences
  • Ship cruises and real-time data from moorings.
    Increasingly, integration with more diverse data
    (including biological)
  • Field of interest is in regions (e.g. extent of
    cruises)
  • Earth Sciences
  • Broad range of data types: sensor data (e.g.
    seismic, GPS), field data collections (e.g.
    geologic data), remote sensing (e.g. LIDAR),
    analytical data (e.g. geochem, geochronology)
  • Broad coverage: study areas range from a small region
    (e.g. watershed) to continental and tectonic
    settings
  • Also, managing model outputs
  • Need to manipulate and visualize very large
    outputs from models

4
GEON: A Platform for Data Integration (Example)
GEONsearch
www.geongrid.org
5
GEONsearch and GEONworkbench
Search Condition(s): spatial, temporal, concept
GEON Catalog
GEONsearch
Log
Gazetteer
Geologic Age
Web services
extracted information/indexes
GEON Datasets
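
The three search conditions named on this slide (spatial, temporal, concept) can be pictured as filters applied to catalog records. The sketch below is only an illustration under stated assumptions and is not GEON's implementation: the `CatalogEntry` fields, the bounding-box convention, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record; the real GEON schema is not given in the slides."""
    name: str
    bbox: tuple          # (west, south, east, north) in degrees
    years: tuple         # (start_year, end_year), inclusive
    concepts: frozenset  # controlled-vocabulary terms


def bbox_overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalog, bbox=None, years=None, concept=None):
    """Return the entries that satisfy every condition that was supplied."""
    hits = []
    for entry in catalog:
        if bbox is not None and not bbox_overlaps(entry.bbox, bbox):
            continue
        if years is not None and not (
            entry.years[0] <= years[1] and years[0] <= entry.years[1]
        ):
            continue
        if concept is not None and concept not in entry.concepts:
            continue
        hits.append(entry)
    return hits


# Invented sample records, for illustration only.
CATALOG = [
    CatalogEntry("gravity_survey", (-120.0, 32.0, -114.0, 36.0), (1990, 2000),
                 frozenset({"gravity"})),
    CatalogEntry("lidar_strip", (-118.0, 33.0, -117.0, 34.0), (2005, 2005),
                 frozenset({"lidar", "topography"})),
    CatalogEntry("geochem_samples", (-80.0, 35.0, -75.0, 40.0), (1995, 2003),
                 frozenset({"geochemistry"})),
]

results = search(CATALOG, bbox=(-119.0, 33.5, -117.5, 35.0), years=(1995, 2006))
print([entry.name for entry in results])
```

Leaving a condition as `None` skips that filter, so a concept-only or spatial-only search uses the same function.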
6
GEON Registration
Ontology Registration
Dataset Registration (hosted)
Data Item (Schema) Registration (hosted /
non-hosted)
Data Item Detail Registration (values)
Service Registration
Resource Registration
7
CUAHSI Hydrologic Information System
cuahsi.sdsc.edu
  • Integrated access to federal data sources
  • Web services for accessing each source
  • Need to map to common metadata (ontology)
  • Private workspace
  • Ability to store data and derived products in
    personal digital library
  • Integrated search
  • Ability to search federal data sources as well as
    digital library, with a single search command
  • Scientific workflows
  • Access to modeling and analysis tools via
    scientific workflow software, e.g. Kepler,
    ModelBuilder, D2K
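
The points above about per-source web services, a common metadata mapping, and a single search command can be sketched roughly as follows. This is a minimal illustration, not the CUAHSI HIS design: the source names, variable codes, and record layout are hypothetical, and plain functions stand in for the web services.

```python
# Hypothetical per-source vocabularies mapped onto shared concept names.
VOCAB_MAP = {
    "source_a": {"Q": "streamflow", "T_w": "water_temperature"},
    "source_b": {"discharge": "streamflow", "precip": "precipitation"},
}


# Stand-ins for the per-source web services; each answers in its own vocabulary.
def fetch_source_a():
    return [{"site": "A1", "var": "Q", "value": 12.5},
            {"site": "A1", "var": "T_w", "value": 8.1}]


def fetch_source_b():
    return [{"site": "B7", "var": "discharge", "value": 40.0},
            {"site": "B7", "var": "precip", "value": 2.3}]


SERVICES = {"source_a": fetch_source_a, "source_b": fetch_source_b}


def integrated_search(concept, personal_library=()):
    """Search every source plus a personal library with one call."""
    hits = []
    for source, fetch in SERVICES.items():
        for rec in fetch():
            common = VOCAB_MAP[source].get(rec["var"])
            if common == concept:
                hits.append({"source": source, "site": rec["site"],
                             "concept": common, "value": rec["value"]})
    # Assumption: library records are already stored under common concept names.
    hits.extend(r for r in personal_library if r["concept"] == concept)
    return hits


library = [{"source": "my_library", "site": "derived-1",
            "concept": "streamflow", "value": 26.25}]
for hit in integrated_search("streamflow", library):
    print(hit)
```

Keeping the vocabulary mapping in one table means a new source needs only a fetch function and a mapping entry; the search call itself does not change.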

8
Data Integration in CUAHSI HIS
From Chapter 4, "System Architecture," by
Chaitan Baru, Ilya Zaslavsky, Reza
Wahadj, in Hydrologic Information Systems: A Status
Report, edited by David Maidment,
http://www.ce.utexas.edu/prof/maidment/CUAHSI/HISStatusSept15.pdf
9
ROADNet: Real-time Observatories, Applications, and
Data management Network (courtesy John Orcutt,
Frank Vernon, SIO)
10
SDSC Storage Resource Broker
11
(No Transcript)
12
USArray Background
  • Overview
  • 12-year project, part of EarthScope
  • Continental-scale seismic observatory for
    lithosphere and deep Earth structure
  • Record local, regional and teleseismic
    earthquakes
  • Major Components
  • A transportable array of 400 portable, unmanned
    three-component broadband seismometers deployed
    on a uniform grid that will systematically cover
    the US
  • A flexible component of 400 portable,
    three-component, short-period and broadband
    seismographs and 2000 single-channel high
    frequency recorders
  • A permanent array of high-quality,
    three-component seismic stations, coordinated as
    part of the US Geological Survey's Advanced
    National Seismic System (ANSS), to provide a
    reference array spanning the contiguous United
    States and Alaska.
  • URLs
  • http://www.earthscope.org/usarray/
  • http://anf.ucsd.edu/index.html
  • http://www.earthscope.org/usarray/usarray_assets/USArray6.mov

Courtesy Frank Vernon, SIO, Tony Fountain, SDSC
13
USArray Existing Infrastructure
  • Infrastructure / Data Flow
  • Seismic sensors connected to dataloggers
  • Dataloggers stream data to central collection
    facility at SIO via IP (and other)-based
    networking
  • New sites initially stream data into Prelim ORB
    (object ring buffer) for QA/QC
  • Operational sites stream data into Production ORB
  • Production data streams sent from SIO to IRIS for
    archiving and dissemination (www.iris.edu)
  • Uses BRTT's Antelope sensornet middleware
    throughout
  • Scale
  • Up to 400 sites deployed at any given time
  • Thousands of channels of real-time streaming data
  • Status
  • Currently in first-wave deployment (100 sites)
  • Between 5 and 20 new sites (physically) installed per
    week
  • Transportable array sites will move every 18
    months

Courtesy Frank Vernon, SIO, Tony Fountain, SDSC
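
The data flow on this slide (new sites stream into a preliminary ring buffer for QA/QC, operational sites stream into a production ring buffer) can be sketched as below. This is a toy model under stated assumptions and does not use or imitate the Antelope API: the `RingBuffer` class, the QA/QC rule, the promotion threshold, and the station code `TA01` are all hypothetical.

```python
from collections import deque


class RingBuffer:
    """Fixed-capacity packet buffer: the oldest packet is dropped when full."""

    def __init__(self, capacity):
        self._packets = deque(maxlen=capacity)

    def put(self, packet):
        self._packets.append(packet)

    def latest(self, n):
        """Return the n most recent packets, oldest first."""
        return list(self._packets)[-n:]

    def __len__(self):
        return len(self._packets)


prelim_orb = RingBuffer(capacity=1000)
production_orb = RingBuffer(capacity=1000)
operational_sites = set()
good_counts = {}
PROMOTE_AFTER = 3  # assumed number of consecutive good packets before promotion


def passes_qc(packet):
    """Hypothetical QA/QC rule: samples are present and within a plausible range."""
    return bool(packet["samples"]) and all(abs(s) < 1e6 for s in packet["samples"])


def ingest(packet):
    """Route a packet to the production or preliminary buffer; return the route."""
    site = packet["site"]
    if site in operational_sites:
        production_orb.put(packet)
        return "production"
    prelim_orb.put(packet)
    good_counts[site] = good_counts.get(site, 0) + 1 if passes_qc(packet) else 0
    if good_counts[site] >= PROMOTE_AFTER:
        operational_sites.add(site)  # later packets go to the production buffer
    return "prelim"


routes = [ingest({"site": "TA01", "seq": i, "samples": [1.0, -2.0, 0.5]})
          for i in range(5)]
print(routes)
```

A bounded buffer suits streaming data because memory use stays fixed: consumers that fall too far behind lose the oldest packets rather than stalling the producers.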
14
SOA Architecture Instantiated for USArray
Courtesy Tony Fountain, Neil Cotofana, Cyberinfrastructure
Lab for Environmental Observing Systems
(CLEOS), SDSC
15
KEPLER ROADNet Real-Time Scientific Workflows
Architecture
  • Straightforward example: laser strainmeter channels in,
    scientific workflow, Earth-tide signal out
  • Data sources: seismic waveforms, images, other types of data
  • Components: ORBserver, real-time packet buffer,
    near-real-time database, scientific workflow
Courtesy John Orcutt, SIO
16
LOOKING: Laboratory for Ocean Observatory Knowledge
INtegration Grid
NSF ITR Grant
Cyberinfrastructure for Ocean Observatories
Initiative
Courtesy John Orcutt, SIO
17
CHRONOS Federated Databases
  • Create a dynamic, interactive and time-calibrated
    framework for Earth history
  • Develop a network of chronostratigraphy databases
  • Federated Database Design
  • The following databases are part of the CHRONOS
    Federated Database at SDSC, based on IBM's DB2
    Information Integrator:
  • Neptune
  • PaleoStrat
  • PaleoBiology
  • Janus
  • TimeScale
  • FAUNMAP
  • MIOMAP

Courtesy Doug Greer, SDSC
18
Top-Level View of a Federated Database
Applications
Federated Database
Data Source A
Data Source D
Data Source C
Data Source B
19
Federated Data Sources
  • Geographically Distributed
  • Heterogeneous
  • Relational Databases (most common)
  • Spreadsheets
  • Non-relational Sources
  • Web Pages / Web Services
  • Flat Files
  • Global Views
  • Views may be virtual, or contain data
    (materialized views)
  • Views define data in a uniform way across the
    data sources
  • Applications can access data through these global
    views, using SQL

20
Example: CHRONOS Hole_Desc View
  • Uniform global view for hole/taxa description for
    Age Depth Plots application
  • CHRONOS Hole_Desc
  • Database Name
  • Hole_ID
  • Elevation
  • Meters_of_Section
  • Taxa_Count

Courtesy Doug Greer, SDSC
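
A global view of this kind can be illustrated with a small SQL example. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in; it is not DB2 Information Integrator and not the CHRONOS schema. The two source tables, their column names, and all sample rows are invented, and `Database_Name` is an assumed spelling of the "Database Name" field listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two tables stand in for two remote sources with different layouts (hypothetical).
    CREATE TABLE neptune_holes (hole TEXT, elev_m REAL, section_m REAL, n_taxa INTEGER);
    CREATE TABLE janus_holes (hole_id TEXT, elevation REAL,
                              meters_of_section REAL, taxa_count INTEGER);
    INSERT INTO neptune_holes VALUES ('N-001', -3200.0, 150.0, 42);
    INSERT INTO janus_holes VALUES ('J-17A', -2100.5, 310.0, 18);
    INSERT INTO janus_holes VALUES ('J-17B', -2105.0, 95.5, 7);

    -- Virtual (non-materialized) global view: one uniform schema over both sources.
    CREATE VIEW Hole_Desc AS
        SELECT 'Neptune' AS Database_Name, hole AS Hole_ID, elev_m AS Elevation,
               section_m AS Meters_of_Section, n_taxa AS Taxa_Count
        FROM neptune_holes
        UNION ALL
        SELECT 'Janus', hole_id, elevation, meters_of_section, taxa_count
        FROM janus_holes;
""")

# The application queries only the view and never sees the source layouts.
rows = conn.execute(
    "SELECT Database_Name, Hole_ID, Taxa_Count FROM Hole_Desc "
    "WHERE Meters_of_Section > ? ORDER BY Hole_ID",
    (100.0,),
).fetchall()
print(rows)
```

In a real federation the two `SELECT` branches would reference remote tables through the federation layer; the application-facing SQL would look the same.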
21
Challenges
  • Efficient access to remote data
  • Service interfaces to allow subsetting of data at
    remote end
  • Efficient access to very large data
  • Parallel I/O, manipulation of tens of TB of visualization
    output, long-term storage of hundreds of TB of model
    output
  • Versioning of data and metadata, and providing
    provenance
  • Managing access for regular users vs. power
    users (or privileged users)
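
The first challenge, subsetting data at the remote end, amounts to filtering rows and fields at the data source so that only the requested slice crosses the network. A minimal sketch, assuming a hypothetical `subset` function and an invented in-memory dataset in place of a real service:

```python
import json

# Hypothetical server-side dataset: one record per station observation.
DATASET = [
    {"lat": 32.7, "lon": -117.2, "time": "2006-01-01",
     "temp_c": 15.2, "pressure_hpa": 1013.0},
    {"lat": 34.1, "lon": -118.3, "time": "2006-01-01",
     "temp_c": 13.9, "pressure_hpa": 1011.5},
    {"lat": 40.7, "lon": -74.0, "time": "2006-01-01",
     "temp_c": 1.4, "pressure_hpa": 1020.2},
]


def subset(records, south, north, west, east, variables):
    """Runs at the data source: keep only rows in the box and only requested fields."""
    keep = ("lat", "lon", "time") + tuple(variables)
    for rec in records:
        if south <= rec["lat"] <= north and west <= rec["lon"] <= east:
            yield {key: rec[key] for key in keep}


full_size = len(json.dumps(DATASET))
reply = list(subset(DATASET, south=32.0, north=35.0,
                    west=-120.0, east=-116.0, variables=["temp_c"]))
reply_size = len(json.dumps(reply))
print(len(reply), "records;", reply_size, "bytes instead of", full_size)
```

The saving here is trivial, but the same pattern matters when the full dataset is terabytes and the client needs one region or one variable.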

22
More Challenges
  • Distributed versus centralized storage
  • Warehousing vs federation
  • Or should it really be:
  • Distributed curation and centralized storage?
  • Long-term preservation of digital data

23
Opportunities
  • Standardize on Web service interfaces for tools,
    applications, and data
  • E.g. Web Mapping Services for map image services,
    services for accessing geologic maps, gravity
    data, sensor data, ...
  • Develop community standards for knowledge
    representation
  • Schemas, controlled vocabularies, ontologies
  • Choose common representation system, e.g. OWL
  • Meta-workflow frameworks
  • Support inter-operation among different
    scientific workflow systems
  • There may be an opportunity to work through new
    GSA Division on Geoinformatics and AGU working
    group on IT

24
Thank You!
  • Chaitan Baru
  • baru_at_sdsc.edu