P1253814649GaZAp - PowerPoint PPT Presentation

About This Presentation
Title:

P1253814649GaZAp

Description:

The SDSS telescope array is systematically mapping ¼ of the entire sky ... Earthscope/Incorporated Research Institutions for Seismology (IRIS) ...

Slides: 13
Provided by: willia265

Transcript and Presenter's Notes


1
source Alex Szalay
2
Example: Sloan Digital Sky Survey
  • The SDSS telescope array is systematically
    mapping ¼ of the entire sky
  • Discoveries are made by querying the database,
    not through a zero-sum wrestling match for
    telescope time
  • Managed by an RDBMS (MS SQL Server), equipped with
    a hierarchical triangular mesh index, among other
    customizations
  • 15 TB in the final release in 2007
  • 818 GB in the RDBMS (13.6B tuples)
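
As a rough illustration of "discovery by querying the database", the sketch below uses Python's built-in sqlite3 module as a stand-in for MS SQL Server. The table `photo_obj`, its columns, and the sample rows are hypothetical and are not the SDSS schema, and the query uses a plain bounding-box predicate rather than the hierarchical triangular mesh index.

```python
import sqlite3


def build_demo_catalog():
    """Create a tiny in-memory catalog (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo_obj ("
        "obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, r_mag REAL)"
    )
    conn.executemany(
        "INSERT INTO photo_obj VALUES (?, ?, ?, ?)",
        [
            (1, 180.1, 0.5, 17.2),
            (2, 180.4, 0.7, 21.9),
            (3, 195.0, 2.0, 16.0),
            (4, 180.3, 0.6, 18.5),
        ],
    )
    # An ordinary B-tree index on position; a spatial index would replace this.
    conn.execute("CREATE INDEX idx_pos ON photo_obj (ra_deg, dec_deg)")
    return conn


def find_bright_objects(conn, ra_min, ra_max, dec_min, dec_max, max_mag):
    """Return ids of objects inside a sky box that are brighter than max_mag.

    Smaller magnitude means brighter, hence the `<` comparison.
    """
    rows = conn.execute(
        "SELECT obj_id FROM photo_obj "
        "WHERE ra_deg BETWEEN ? AND ? "
        "AND dec_deg BETWEEN ? AND ? "
        "AND r_mag < ? "
        "ORDER BY obj_id",
        (ra_min, ra_max, dec_min, dec_max, max_mag),
    ).fetchall()
    return [row[0] for row in rows]
```

The point of the sketch is that the question is stated declaratively against already-collected data; no new observation is needed to answer it.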

3
source Alex Szalay
4
Drowning in data; starving for information
Empirical X → Analytical X → Computational X
→ X-informatics
  • Acquisition eventually outpaces analysis
  • Medicine: Online publishing, digital charts
  • Astronomy: Big telescopes (more in a bit)
  • Genetics: PCR, Shotgun Sequencing
  • Oceanography: ??
  • Marine Microbiology: ??

Increase Data Collection Exponentially in Less
Time, with FlowCAM
5
Cyber-Observatories
  • Arctic Observing Network (AON)
  • Ocean Observing Initiative (OOI)
  • National Ecological Observatory Network (NEON)
  • The Waters Network
  • The Long-Term Ecological Research (LTER) network
  • The Geosciences Network (GEON)
  • Earthscope/Incorporated Research Institutions for
    Seismology (IRIS)
  • Virtual Solar-Terrestrial Observatory (VSTO)
  • Linked Environments for Atmospheric Discovery
    (LEAD)

6
source Alex Szalay
7
source Jim Gray
8
Relational Databases (In Codd we Trust)
At IBM Almaden in the 1960s and 70s, Ted Codd worked
out a formal basis for tabular data
representation, organization, and access [1]. The
early systems were buggy and slow (and sometimes
reviled), but programmers only had to write 5% of
the code they previously did.
Key Idea: Programs that manipulate tabular data
exhibit an algebraic structure. Codd proposed a
relational algebra to manipulate these data sets in
their logical form, independently of their physical
representation.
  • physical data independence
  • logical data independence
[1] E. F. Codd, A Relational Model of Data for
Large Shared Data Banks, Communications of the
ACM 13(6), pp. 377-387, 1970
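
To make the "algebraic structure" concrete, here is a minimal sketch of three relational algebra operators over relations stored as lists of dicts. The function names and the `stations`/`readings` sample data are illustrative assumptions, not anything from the slides. The operators see only the logical form (attribute names and values), so the storage underneath could change without touching a query.

```python
def select(relation, predicate):
    """Selection: keep the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: pair rows that agree on all shared attribute names."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in shared)
    ]


# Hypothetical sample relations.
stations = [
    {"station": "S1", "region": "estuary"},
    {"station": "S2", "region": "plume"},
]
readings = [
    {"station": "S1", "salinity": 12.0},
    {"station": "S2", "salinity": 31.5},
    {"station": "S1", "salinity": 14.5},
]


def is_fresh(row):
    return row["salinity"] < 20


# Two algebraically equivalent plans: filter after the join, or before it.
plan_a = project(select(natural_join(readings, stations), is_fresh), ["region"])
plan_b = project(natural_join(select(readings, is_fresh), stations), ["region"])
```

Both plans produce the same relation. Equivalences of this kind are what allow a database system to reorder a query for speed without changing its answer.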
9
source Raghu Ramakrishnan
10
Characteristics of Cloud Computing
  • Virtual: Physical location and underlying
    infrastructure details are transparent to users
  • Scalable: Able to break complex workloads into
    pieces to be served across an incrementally
    expandable infrastructure
  • Efficient: Service-Oriented Architecture for
    dynamic provisioning of shared compute resources
  • Flexible: Can serve a variety of workload types,
    both consumer and commercial

11
Cloud Computing as Hosted Data Management Services
  • Yahoo
    • Yahoo Distributed Hash Tables: Key/value pairs
    • Yahoo Distributed Ordered Tables: Ordered ranges
    • PNUTS: Relational-style storage, indexing and
      query
  • Amazon
    • S3: Simple Storage
    • SimpleDB: Quasi-Relational features
  • Google
    • APIs for Storage, Visualization, Document
      processing, Images, Mail
  • Microsoft
    • CloudDB: Relational-style features
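
The difference between a key/value table and an ordered table can be sketched in a few lines. This is a single-process toy under stated assumptions: the class names `HashTable` and `OrderedTable` are hypothetical, nothing here is distributed, and it does not reflect the API of any of the services listed above. It only shows which access patterns each model supports.

```python
import bisect


class HashTable:
    """Key/value pairs: point lookups only, no meaningful key order."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class OrderedTable:
    """Keys kept sorted, so contiguous key ranges can be scanned."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

A hash table spreads keys evenly but cannot answer "all keys between A and B" without visiting everything; the ordered table supports that range scan at the cost of maintaining sort order on every insert.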

12
Workflow at CMOP
[Diagram; labels recovered in extraction order]
Sites: Washington University, Cloud, PNNL, OHSU
Steps and data products:
  • Cloning / cDNA / Sequencing (plates)
  • FASTA files
  • Inspection (FASTA files; e.g., trim bad reads at
    the end)
  • Cleaning
  • BLAST
  • Post processing (hit tables)
  • Link
  • Analyze (hit tables, metadata, synopsis)
  • Shared Knowledge
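
The inspection/cleaning step ("trim bad reads at the end") might look roughly like the sketch below. The slides do not say how a bad read end is detected; this example assumes, purely for illustration, that it means a trailing run of ambiguous `N` bases. The function names and the `min_length` parameter are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_trailing_ambiguous(sequence, ambiguous="Nn"):
    """Strip ambiguous bases from the end of a sequence only."""
    return sequence.rstrip(ambiguous)


def clean(records, min_length=1):
    """Trim each read and drop reads shorter than min_length after trimming."""
    cleaned = []
    for header, sequence in records:
        trimmed = trim_trailing_ambiguous(sequence)
        if len(trimmed) >= min_length:
            cleaned.append((header, trimmed))
    return cleaned
```

Interior ambiguous bases are left alone here; whether those should also be masked or removed before BLAST is a separate decision.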