Title: OPeNDAP in the Cloud: Optimizing the Use of Storage Systems Provided by Cloud Computing Environments
1. OPeNDAP in the Cloud: Optimizing the Use of Storage Systems Provided by Cloud Computing Environments
- OPeNDAP: James Gallagher, Nathan Potter
- NOAA/NODC: Deirdre Byrne, Jefferson Ogata, John Relph
- 26 June 2013
2. Cloud Systems Now
- Providers: IBM, Microsoft, Amazon, Google, Rackspace, ...
- Microsoft Azure handles 100 petabytes of data a day
- Amazon: hundreds of thousands of users
- Netflix stopped building its own data centers in 2008; all in Amazon by 2012
- Snapchat: 4000 pictures per second; never owned a computer server (Google cloud)
Source: Quentin Hardy, "Google Joins a Heavyweight Competition in Cloud Computing," NY Times, 3 December 2013
3. Why use OPeNDAP?
- The OPeNDAP request is smaller and is just the data the person wants
- In cloud systems cost is a function of data transfer, in addition to data stored, so smaller targeted requests reduce costs
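A minimal sketch of what a "smaller targeted request" can look like, assuming a DAP2-style constraint expression in which each array dimension is subset as `[start:stride:stop]`. The server URL, the variable name `sst`, and the helper names are hypothetical and only illustrate the idea:

```python
from typing import Sequence, Tuple

# One range per array dimension: (start, stop) or (start, stop, stride).
Range = Tuple[int, ...]


def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """Format one dimension of an array subset as [start:stride:stop]."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url: str, variable: str, ranges: Sequence[Range]) -> str:
    """Build a binary-data request URL for one variable and an index subset."""
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{dataset_url}.dods?{constraint}"


def selected_count(ranges: Sequence[Range]) -> int:
    """Number of array elements the subset selects (stop index is inclusive)."""
    count = 1
    for r in ranges:
        start, stop = r[0], r[1]
        stride = r[2] if len(r) > 2 else 1
        count *= (stop - start) // stride + 1
    return count


if __name__ == "__main__":
    ranges = [(0, 0), (10, 20), (30, 40)]  # one time step, an 11 x 11 window
    print(subset_url("http://example.org/opendap/sst.nc", "sst", ranges))
    print(selected_count(ranges), "elements requested")
```

Only the selected elements cross the network, which is the quantity that drives the transfer part of the cost.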
4. NOAA Environmental Data Management Conceptual Cloud Architecture
- Adapted from NOAA Environmental Data Management Framework Draft v0.3, Appendix C
- Dr. Jeff de La Beaujardière, NOAA Data Management Architect
Potential locations of cloud-enabled OPeNDAP instances
5. Constraints
- No vendor lock-in!
- No stovepipes! Flexible storage method
- What will be the client of 2020?
- Hierarchical/human browsable
6. Data stores: S3 and Glacier
- S3
  - Spinning disk with a flat file system
  - Designed to make web-scale computing easier
- Glacier
  - Near-line device with 4-hour (or greater) access times
  - Secure and durable storage
- EC2
  - EC2 was used to run the OPeNDAP data server
  - Linux
7. Using S3 as a Data Store
HTTP GET and HEAD requests
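As a rough illustration of treating S3 as a plain web data store, the sketch below uses only HEAD and GET through Python's standard library. The object URL passed in is hypothetical, and the slides do not say how the server actually issued these requests:

```python
import urllib.request


def head_request(url: str) -> urllib.request.Request:
    """A HEAD request: ask for an object's headers without its body."""
    return urllib.request.Request(url, method="HEAD")


def get_request(url: str) -> urllib.request.Request:
    """A GET request: ask for the object itself."""
    return urllib.request.Request(url, method="GET")


def object_size(url: str, timeout: float = 30.0) -> int:
    """Return the size in bytes reported by the Content-Length header."""
    with urllib.request.urlopen(head_request(url), timeout=timeout) as resp:
        return int(resp.headers["Content-Length"])


def fetch_object(url: str, timeout: float = 30.0) -> bytes:
    """Download the whole object."""
    with urllib.request.urlopen(get_request(url), timeout=timeout) as resp:
        return resp.read()
```

A HEAD request is a cheap way to check that an object exists, and how large it is, before paying to transfer it with GET.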
8. Web requests
[Diagram: a catalog or data request goes to S3; S3 returns an XML or data file]
9. OPeNDAP Catalog requests
[Diagram labels: User catalog request; EC2; OPeNDAP Server; Catalog Access; catalog cache; data cache; S3; XML File; THREDDS catalog or HTML]
To enhance performance, data were accessed from S3 only when not already cached.
10. OPeNDAP Data requests
[Diagram labels: User data request; EC2; OPeNDAP Server; Data Access; catalog cache; data cache; S3; Data File; Data Slice]
To enhance performance, data were accessed from S3 only when not already cached.
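The "access S3 only when not already cached" behavior can be sketched as a read-through cache. This is an illustration under stated assumptions, not the server's actual code: the class name, the on-disk layout, and the injected `fetch` callable (which stands in for an S3 GET) are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


class ReadThroughCache:
    """Serve objects from a local directory, fetching remotely only on a miss."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch  # stands in for a GET against the remote store
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so any object name maps to a safe local file name.
        return self.cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def get(self, key: str) -> bytes:
        path = self._path(key)
        if path.exists():          # cache hit: no remote access
            return path.read_bytes()
        data = self.fetch(key)     # cache miss: one remote access
        path.write_bytes(data)
        return data
```

The same structure would apply to both caches in the diagrams (catalog and data); a real deployment would also need an eviction policy, which this sketch omits.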
11. Observations
- S3FS and Amazon's APIs: vendor lock-in
- XML catalogs were flexible
  - Support both direct web and subsetting-server access
  - Likely adaptable to other use-cases
  - Easily support hierarchical structure
  - Catalogs didn't need to be stored in S3
12. Glacier and Asynchronous Responses
- To use Glacier, a web service protocol must support asynchronous access! Glacier is a near-line device, not a spinning disk.
- Support via protocol is not enough: typical use cases cannot be met without caching metadata
- To support web interfaces/clients, DAP metadata objects should be cached
- To support smart clients, may need domain data in cache
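One way to picture asynchronous access is a store that answers immediately from cache when it can and otherwise tells the client to come back later. The sketch below is a toy model under that assumption; the class and field names are hypothetical and it does not reproduce the DAP protocol's actual asynchronous response. The 4-hour default reflects the access time quoted earlier for Glacier.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Response:
    status: str                          # "ready" or "pending"
    data: Optional[bytes] = None         # set when status == "ready"
    retry_after_s: Optional[int] = None  # set when status == "pending"


class NearLineStore:
    """Toy model of a server in front of a slow, near-line archive."""

    def __init__(self, retrieval_delay_s: int = 4 * 3600) -> None:
        self.retrieval_delay_s = retrieval_delay_s
        self._cache: Dict[str, bytes] = {}
        self._pending: Set[str] = set()

    def request(self, key: str) -> Response:
        if key in self._cache:
            return Response("ready", data=self._cache[key])
        # Not cached: start (or keep waiting on) a retrieval and answer at once.
        self._pending.add(key)
        return Response("pending", retry_after_s=self.retrieval_delay_s)

    def retrieval_finished(self, key: str, data: bytes) -> None:
        """Called when the archive delivers the object; it is now cached."""
        self._pending.discard(key)
        self._cache[key] = data
```

Metadata that is kept permanently in the cache would always take the "ready" path, which is why browsing clients can work even though the data itself is hours away.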
13. Glacier Implementation
- Caching
  - Catalog
  - DAP metadata
- Support for programmatic and web clients
  - Web clients are the primary user of the DAP metadata because of their click and browse behavior
- XML with an embedded XSL style sheet
  - Single response (XML)
  - Multiple target clients: smart and browser
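The "single XML response for both smart clients and browsers" idea can be sketched with an `xml-stylesheet` processing instruction: a program parses the XML directly, while a browser applies the XSL and renders a page. Note the assumptions: this sketch references an external style sheet rather than embedding one, and the element names and the style sheet path are hypothetical, not the server's actual response schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def async_response_xml(dataset: str, status: str, wait_seconds: int,
                       stylesheet_href: str) -> str:
    """Build one XML document usable by both programs and web browsers."""
    root = ET.Element("AsyncResponse", {"status": status})
    ET.SubElement(root, "Dataset").text = dataset
    ET.SubElement(root, "ExpectedDelay", {"units": "seconds"}).text = str(wait_seconds)
    body = ET.tostring(root, encoding="unicode")
    # The processing instruction tells a browser which XSL to apply;
    # programmatic clients simply ignore it and read the elements.
    return (
        '<?xml version="1.0"?>\n'
        f'<?xml-stylesheet type="text/xsl" href={quoteattr(stylesheet_href)}?>\n'
        + body
    )


if __name__ == "__main__":
    print(async_response_xml("archive/2012.nc", "pending", 4 * 3600,
                             "/xsl/async.xsl"))
```

Sending one document avoids maintaining separate HTML and machine-readable responses for the same state.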
21. Comparison: S3 and Glacier
- Glacier provides secure and durable storage
- S3 is designed to make web-scale computing easier
- These graphs are a tiny part of a complex cost model. They do not include the cost to move data out of the Amazon cloud, EC2 instances, etc.
http://calculator.s3.amazonaws.com/calc5.html
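To make the shape of the cost model concrete, here is a deliberately small sketch that combines only two terms: storage and outbound transfer. All rates in the usage example are made-up placeholders in arbitrary currency units, not Amazon prices, and the function names are hypothetical. Real pricing has further terms (EC2 instances, request fees, retrieval fees) that this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    storage_per_gb_month: float  # placeholder rate, not a real price
    transfer_out_per_gb: float   # placeholder rate, not a real price


def monthly_cost(stored_gb: float, transferred_out_gb: float, rates: Rates) -> float:
    """Storage cost plus outbound-transfer cost for one month."""
    return (stored_gb * rates.storage_per_gb_month
            + transferred_out_gb * rates.transfer_out_per_gb)


def mixed_monthly_cost(total_gb: float, fast_fraction: float,
                       transferred_out_gb: float,
                       fast: Rates, slow: Rates) -> float:
    """Cost when a fraction of the data sits in the fast tier, the rest in the slow one.

    Assumes, for simplicity, that all outbound transfer is served from the fast tier.
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be between 0 and 1")
    fast_gb = total_gb * fast_fraction
    slow_gb = total_gb - fast_gb
    return (monthly_cost(fast_gb, transferred_out_gb, fast)
            + monthly_cost(slow_gb, 0.0, slow))


if __name__ == "__main__":
    fast = Rates(storage_per_gb_month=1.0, transfer_out_per_gb=2.0)   # placeholders
    slow = Rates(storage_per_gb_month=0.25, transfer_out_per_gb=2.0)  # placeholders
    for fraction in (1.0, 0.5, 0.0):
        print(fraction, mixed_monthly_cost(1000, fraction, 50, fast, slow))
```

Even this reduced model shows why usage has to be monitored: the best split between tiers depends on how much data is actually requested.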
22. Summary
- OPeNDAP server with minimal changes
- Data stored in S3 and Glacier
- Solution widely applicable: Web and smart clients
- Complexity of the cost model: a combination of both S3 and Glacier is likely
- Modeling and monitoring of use required