OPeNDAP in the Cloud: Optimizing the Use of Storage Systems Provided by Cloud Computing Environments

1
OPeNDAP in the Cloud: Optimizing the Use of
Storage Systems Provided by Cloud Computing
Environments

OPeNDAP
James Gallagher, Nathan Potter
and
NOAA/NODC
Deirdre Byrne, Jefferson Ogata, John Relph
26 June 2013

2
Cloud Systems Now

Providers include IBM, Microsoft, Amazon, Google,
and Rackspace
Microsoft Azure: handles 100 petabytes of data
a day
Amazon: hundreds of thousands of users
Netflix: stopped building its own data centers
in 2008; all in Amazon by 2012
Snapchat: 4000 pictures per second; never owned
a computer server (Google cloud)

Quentin Hardy, "Google Joins a Heavyweight
Competition in Cloud Computing," NY Times, 3
December 2013
3
Why use OPeNDAP?

The OPeNDAP request is smaller and returns just
the data the person wants
In cloud systems, cost is a function of data
transfer in addition to data stored, so
smaller, targeted requests reduce costs
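
To make the "smaller, targeted request" idea concrete, here is a minimal sketch of building a DAP2-style constraint-expression URL and comparing the size of a subset with the full array. The server address, file name, variable name, and array shape are all hypothetical; only the `[start:stride:stop]` hyperslab syntax and the `.dods` suffix come from DAP2 itself.

```python
from math import prod

def dap_subset_url(base_url, variable, slices):
    """Build a DAP2-style constraint-expression URL for a hyperslab.

    slices is a list of (start, stride, stop) index triples, one per
    dimension; stop is inclusive, as in DAP2 array subsetting.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{base_url}.dods?{variable}{hyperslab}"

def element_count(slices):
    """Number of array elements selected by inclusive (start, stride, stop) triples."""
    return prod((stop - start) // stride + 1 for start, stride, stop in slices)

# Hypothetical dataset: one variable with shape (time=365, lat=180, lon=360).
base = "http://example.org/opendap/data/sst.nc"    # assumed server and path
full = [(0, 1, 364), (0, 1, 179), (0, 1, 359)]     # the whole array
subset = [(0, 1, 0), (40, 1, 59), (100, 1, 139)]   # one day, 20 x 40 grid cells

url = dap_subset_url(base, "sst", subset)
fraction = element_count(subset) / element_count(full)
print(url)
print(f"subset is {fraction:.6%} of the full array")
```

Because transfer is billed by volume, the ratio of selected elements to total elements is a rough proxy for the share of the transfer cost that a subset request incurs.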

4
NOAA Environmental Data Management Conceptual
Cloud Architecture

Adapted from NOAA Environmental Data Management
Framework Draft v0.3
Appendix C - Dr. Jeff de La Beaujardière, NOAA
Data Management Architect

Potential locations of cloud-enabled OPeNDAP
instances
5
Constraints

No vendor lock-in!
No Stovepipes! - flexible storage method
What will be the client of 2020?
Hierarchical/human browsable

6
Data stores: S3 and Glacier

S3
Spinning disk with a flat file system
Designed to make web-scale computing easier
Glacier
Near-line device with 4-hour (or longer) access times
Secure and durable storage
EC2
EC2 was used to run the OPeNDAP data server
Linux

7
Using S3 as a Data Store
HTTP GET and HEAD requests
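
As an illustration of treating S3 as a plain web data store, the sketch below prepares HEAD and GET requests with Python's standard library. The bucket and key are hypothetical, the object is assumed to be publicly readable, and the optional byte-range GET is an addition for illustration rather than something the slide describes.

```python
import urllib.request

def s3_object_url(bucket, key):
    """Virtual-hosted-style URL for a publicly readable S3 object."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"

def head_request(url):
    """HEAD: ask for headers only (e.g. Content-Length, Last-Modified)."""
    return urllib.request.Request(url, method="HEAD")

def get_request(url, byte_range=None):
    """GET: fetch the object, optionally only an inclusive byte range."""
    req = urllib.request.Request(url, method="GET")
    if byte_range is not None:
        first, last = byte_range
        req.add_header("Range", f"bytes={first}-{last}")
    return req

def send(req, timeout=30):
    """Send a prepared request; returns (status, headers, body)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, dict(resp.headers), resp.read()

# Hypothetical bucket and key; nothing is sent until send() is called.
url = s3_object_url("example-bucket", "catalog/index.xml")
head = head_request(url)
get = get_request(url, byte_range=(0, 1023))
```

A HEAD request is enough to learn an object's size and modification time without paying to transfer its body.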
8
Web requests
Client sends a catalog or data request to S3
S3 returns an XML or data file
9
OPeNDAP Catalog requests
Diagram: a user catalog request goes to the OPeNDAP
Server running on EC2, which keeps a catalog cache
and a data cache; catalog access reads an XML file
from S3; the response is a THREDDS catalog or HTML.
To enhance performance, data were accessed from
S3 only when not already cached.
10
OPeNDAP Data requests
Diagram: a user data request goes to the OPeNDAP
Server running on EC2, which keeps a catalog cache
and a data cache; data access reads a data file
from S3; the response is a data slice.
To enhance performance, data were accessed from
S3 only when not already cached.
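
The "access S3 only when not already cached" behavior can be sketched as a small cache-aside wrapper. This is an illustrative sketch, not the server's actual code: the class name, the file-naming scheme, and the stand-in fetch function are all assumptions.

```python
import os
import tempfile

class CachingStore:
    """Serve objects from a local cache directory, fetching from the
    remote store only on a cache miss."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch            # callable: key -> bytes (e.g. an S3 GET)
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        # Flatten the key into one file name; adequate for a sketch, but
        # keys such as "a/b" and "a_b" would collide.
        return os.path.join(self.cache_dir, key.replace("/", "_"))

    def get(self, key):
        path = self._path(key)
        if os.path.exists(path):              # cache hit: no remote access
            with open(path, "rb") as f:
                return f.read()
        data = self.fetch(key)                # cache miss: go to the remote store
        with open(path, "wb") as f:
            f.write(data)
        return data

# Stand-in for S3 so the sketch runs without network access.
remote_calls = []
def fake_s3_fetch(key):
    remote_calls.append(key)
    return f"contents of {key}".encode()

store = CachingStore(tempfile.mkdtemp(), fake_s3_fetch)
first = store.get("data/sst_2013.nc")
second = store.get("data/sst_2013.nc")   # served from the local cache
```

The second call never reaches the remote store, which is where both the latency and the per-request transfer charge are saved.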
11
Observations

S3FS and Amazon's APIs: vendor lock-in
XML catalogs were flexible
Support both direct web and
subsetting-server access
Likely adaptable to other use cases
Easily support hierarchical structure
Catalogs didn't need to be stored in S3
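
To show how an XML catalog can carry a browsable hierarchy over a flat object store, here is a small sketch that walks nested `dataset` elements. The XML is a simplified, namespace-free imitation of a THREDDS-style catalog written for this example; the element layout and the names in it are assumptions, not the catalogs used in the project.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical catalog: nesting expresses the hierarchy,
# urlPath names the flat object that holds the data.
CATALOG_XML = """
<catalog name="root">
  <dataset name="sst">
    <dataset name="2012">
      <dataset name="sst_2012.nc" urlPath="sst/2012/sst_2012.nc"/>
    </dataset>
    <dataset name="2013">
      <dataset name="sst_2013.nc" urlPath="sst/2013/sst_2013.nc"/>
    </dataset>
  </dataset>
</catalog>
"""

def list_leaves(element, prefix=""):
    """Return (browse path, urlPath) pairs for every dataset with a urlPath."""
    found = []
    for child in element.findall("dataset"):
        path = f"{prefix}/{child.get('name')}"
        if child.get("urlPath") is not None:
            found.append((path, child.get("urlPath")))
        found.extend(list_leaves(child, path))
    return found

leaves = list_leaves(ET.fromstring(CATALOG_XML))
for browse_path, url_path in leaves:
    print(browse_path, "->", url_path)
```

The same document serves a person browsing the hierarchy and a program that only wants the list of object paths.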

12
Glacier and Asynchronous Responses

To use Glacier, a web service protocol must
support asynchronous access! Glacier is a
near-line device, not a spinning disk.
Support via protocol is not enough: typical use
cases cannot be met without caching metadata
To support web interfaces/clients, DAP metadata
objects should be cached
To support smart clients, may need domain data in
cache
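
The combination of asynchronous data access and cached metadata can be sketched with a toy in-memory model. Everything here is illustrative: the class and method names are invented, and the use of status 202 to mean "accepted, try again later" is an assumption about how such a response might be signaled, not a description of the actual protocol extension.

```python
class NearLineStore:
    """Toy model of a near-line archive fronted by a metadata cache."""

    def __init__(self, archive, metadata):
        self.archive = archive        # key -> bytes, slow to retrieve
        self.metadata = metadata      # key -> dict, cached and always fast
        self.pending = set()          # retrievals that have been requested
        self.staged = {}              # retrieved data, ready to serve

    def get_metadata(self, key):
        # Metadata is answered from cache, so browsing never waits.
        return 200, self.metadata[key]

    def get_data(self, key):
        if key in self.staged:
            return 200, self.staged[key]
        self.pending.add(key)         # start (or keep waiting on) retrieval
        return 202, None              # assumed "accepted, come back later"

    def finish_retrievals(self):
        # Stand-in for the hours-long archive retrieval completing.
        for key in self.pending:
            self.staged[key] = self.archive[key]
        self.pending.clear()

store = NearLineStore(
    archive={"sst_2013.nc": b"binary data"},
    metadata={"sst_2013.nc": {"variables": ["sst"], "shape": [365, 180, 360]}},
)
meta_status, meta = store.get_metadata("sst_2013.nc")   # immediate
first_status, _ = store.get_data("sst_2013.nc")         # not ready yet
store.finish_retrievals()
second_status, data = store.get_data("sst_2013.nc")     # now available
```

A client can browse and inspect metadata right away, then poll for the data once the archive has staged it.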

13
Glacier Implementation

Caching
Catalog
DAP metadata
Support for programmatic and web clients
Web clients are the primary users of the DAP
metadata because of their click-and-browse
behavior
XML with an embedded XSL style sheet
Single response (XML)
Multiple target clients: smart and browser
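
One way a single XML response can serve both browsers and programs is to attach a style sheet with the standard `xml-stylesheet` processing instruction: a browser applies the XSL and renders HTML, while a program ignores it and parses the XML. The sketch below references an external style sheet, which may differ from how the presentation embeds its XSL; the element names and the style sheet path are hypothetical.

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr

def catalog_response(dataset_names, stylesheet_href):
    """Return one XML document usable by both browsers and programs."""
    items = "\n".join(f"  <dataset name={quoteattr(n)}/>" for n in dataset_names)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href={quoteattr(stylesheet_href)}?>\n'
        f"<catalog>\n{items}\n</catalog>\n"
    )

# Hypothetical dataset names and style sheet location.
xml_text = catalog_response(["sst_2012.nc", "sst_2013.nc"], "/xsl/catalog.xsl")

# A programmatic ("smart") client parses the XML and skips the style sheet.
doc = minidom.parseString(xml_text.encode("utf-8"))
pi = doc.firstChild   # the xml-stylesheet processing instruction
names = [d.getAttribute("name") for d in doc.getElementsByTagName("dataset")]
print(names)
```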

14-20
(No Transcript)
21
Comparison: S3 and Glacier

Glacier provides secure and durable storage
S3 is designed to make web-scale computing
easier
These graphs show a tiny part of a complex cost
model. They do not include the cost to move data
out of the Amazon cloud, EC2 instances, etc.

http://calculator.s3.amazonaws.com/calc5.html
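
As a rough illustration of why the omitted transfer term matters, the sketch below adds a storage term and an outbound-transfer term. The rates are placeholders invented for this example and are not real prices for any service; the actual cost model has many more components.

```python
def monthly_cost(stored_gb, transferred_out_gb, storage_rate, transfer_rate):
    """Storage plus outbound-transfer cost for one month, in the rates' currency."""
    return stored_gb * storage_rate + transferred_out_gb * transfer_rate

# Placeholder rates for illustration only; they are NOT real prices.
STORAGE_RATE = 0.10    # currency units per GB-month (made up)
TRANSFER_RATE = 0.20   # currency units per GB transferred out (made up)

# Same 1000 GB stored; whole-file downloads versus small subset requests.
whole_files = monthly_cost(1000, 500, STORAGE_RATE, TRANSFER_RATE)
subsets = monthly_cost(1000, 5, STORAGE_RATE, TRANSFER_RATE)
print(whole_files, subsets)
```

With storage held fixed, the difference between the two totals comes entirely from the transfer term, which is the part that targeted requests reduce.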
22
Summary