A Petabyte in Your Pocket (David Maier, Oregon Graduate Institute, with help from D. DeWitt, J. Naughton, and others)


Learn more at: http://archive.dimacs.rutgers.edu


Transcript and Presenter's Notes

1
A Petabyte in Your Pocket
David Maier, Oregon Graduate Institute
with help from D. DeWitt, J. Naughton, L. Delcambre, K. Tufte, V. Papadimos, P. Tucker
2
Your PetDB

It's 2015.
For $300 a year, you can have a personal petabyte database (PetDB).
You can talk to it from anywhere.
Organizes any kind of digital data.
Doesn't lose structure, can restructure
Queryable
Handles streams
Organized by type, content, associations,
multiple categorizations and groupings
Locate items by
How or where you encountered them
What you've done with them
Where you were when you accessed them

3
What Would I Put in a Petabyte?

A lot.
Fill my office floor to ceiling with books ≈ 100 GB
What do I do with 10,000× as much?
Many possibilities
Contents of every book and magazine I read
Every web page I visit
All email I send or receive
Every TV program I watch
Every version of every piece of software I use
Maps of everywhere I go
Notes from every class or seminar I attend
All the telephone calls I make
My Lifestream (Freeman and Gelernter)
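The sizing at the top of this slide can be checked with a little arithmetic. This is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes), which the slides do not specify; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units; the slides do not say which convention is meant.
GB = 10**9
PB = 10**15

office_of_books = 100 * GB           # the slide's rough figure for one office full of books
scaled = 10_000 * office_of_books    # "10,000 times as much"

petabytes = scaled / PB              # size expressed in petabytes
print(petabytes)
```

Under these assumptions, ten thousand offices of books come to exactly one petabyte.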

4
Streams and Restructuring

Can incorporate streamed data on the fly.
MD: Vital signs from patients in ICU
Factory supervisor: status, output rate of all machines, finished products, rejects
Can restructure data if desired.
Combined list of conferences in my area
Info sheets on autos I'm considering buying
Comparable salaries of faculty at my rank in
similar departments

5
Anything I Might Want to Refer Back to

Personally indexed for me.
Can be located in a thousand different ways.
What is the company in Massachusetts I read about
in the article on factory tours when I was on the
plane to the sales meeting in Atlanta last
spring?
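The question above mixes content ("company in Massachusetts") with context (on the plane, Atlanta trip, last spring). A minimal sketch of that kind of lookup follows, assuming each item carries a small dictionary of context attributes. The `Item` class, the `find` function, the context keys, and the sample items are all hypothetical and made up for illustration; the slides do not describe an actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    title: str
    content: str
    # Context of the encounter; the keys used below are hypothetical.
    context: dict = field(default_factory=dict)

def find(items, text=None, **context):
    """Return items whose content mentions `text` and whose context matches every given key."""
    hits = []
    for item in items:
        if text is not None and text.lower() not in item.content.lower():
            continue
        if all(item.context.get(k) == v for k, v in context.items()):
            hits.append(item)
    return hits

# Made-up sample items, for illustration only.
items = [
    Item("Factory tours", "A company in Massachusetts opens its plant to visitors.",
         {"where": "plane", "trip": "Atlanta sales meeting", "season": "spring"}),
    Item("Factory tours", "A company in Massachusetts opens its plant to visitors.",
         {"where": "office", "trip": None, "season": "fall"}),
    Item("Stock report", "Quarterly results for a Massachusetts firm.",
         {"where": "plane", "trip": "Atlanta sales meeting", "season": "spring"}),
]

matches = find(items, text="company in Massachusetts",
               where="plane", trip="Atlanta sales meeting")
print([m.title for m in matches])
```

Only the first item satisfies both the content test and the context test; the second fails on context and the third on content.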

6
Or Things I Might Want in the Future

Histories of news groups and mailing lists
Parts of the web I might want to browse,
including past snapshots
Descriptions and prices for any item I might want
to buy
Papers I've been meaning to read
Historical data on stocks I'm interested in
Functions as a personal web portal

7
Database Not Completely Apt

Didn't have to define a scheme for it
Doesn't need to know the datatypes I want to store in advance
Doesn't chop data into rows and columns
Unless I ask
Can query over information streams
Don't need to write and run applications to add data
Anything I've touched is there
Or expressed an interest in
Not on a particular computer
Doesn't have an outside

8
My PetDB is Good to Me

I don't move data between environments
I'm never on the wrong machine
Never go back to my office to grab a paper, never have the wrong folder at a meeting
Don't worry a lot about filing systems: PetDB organizes itself by ways I like to look for information
Anticipates what data I'll be using

9
How to Do This?

On $300/year
Plan A: Pack my office floor to ceiling with disk drives.
About $1 million.
Plan B: Be clever.
Share
Stage
Reconstitute

10
Share

Most of the information in my PetDB isn't unique to me: magazine article, web page, stock quote.
Store one copy.
Information Paradox: What's too expensive for one may be affordable for all.

[Diagram labels: Others' PetDBs; My PetDB]
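The "store one copy" idea on this slide can be illustrated with a content-addressed store, where identical content from different users maps to the same key. This is only a sketch of one way sharing could work; the `SharedStore` class and its methods are hypothetical, and the slides do not say how PetDB would actually detect shared content.

```python
import hashlib

class SharedStore:
    """Hypothetical shared store: one copy per distinct content, many references."""

    def __init__(self):
        self.blobs = {}   # content hash -> bytes (stored once)
        self.owners = {}  # content hash -> set of user names referencing it

    def put(self, user, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)              # keep only the first copy
        self.owners.setdefault(key, set()).add(user)  # record who refers to it
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]

store = SharedStore()
article = b"magazine article text"
k1 = store.put("me", article)
k2 = store.put("someone_else", article)   # same content, no second copy
k3 = store.put("me", b"my own notes")     # unique content

print(k1 == k2, len(store.blobs), sorted(store.owners[k1]))
```

Two users hold the article, yet the store keeps a single copy of it alongside the one piece of unique content.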
11
Stage

Not all data has to be at my current point of
connection.
Mainly resides in shared and private servers on
the Internet.
Staged to me on a series of data managers.
Access time depends on context, likely use
Current itinerary: 1 second
Upcoming trips: 5 seconds
Past trips: 30 seconds
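The access-time targets above can be read as a mapping from context to a latency budget. A minimal sketch, assuming a trip is classified purely by comparing its dates with today's date; the classification rule and the function names are hypothetical, and only the three time values come from the slide.

```python
from datetime import date

# Target access times from the slide, in seconds.
TARGET_SECONDS = {"current": 1, "upcoming": 5, "past": 30}

def classify_trip(start: date, end: date, today: date) -> str:
    """Classify an itinerary relative to `today` (hypothetical rule)."""
    if end < today:
        return "past"
    if start > today:
        return "upcoming"
    return "current"

def target_access_time(start: date, end: date, today: date) -> int:
    """Look up the latency budget for a trip's data."""
    return TARGET_SECONDS[classify_trip(start, end, today)]

today = date(2015, 6, 10)
in_progress = target_access_time(date(2015, 6, 8), date(2015, 6, 12), today)
future = target_access_time(date(2015, 7, 1), date(2015, 7, 3), today)
finished = target_access_time(date(2015, 3, 1), date(2015, 3, 4), today)
print(in_progress, future, finished)
```

A stager could use such a budget to decide how close to the current point of connection each piece of data needs to be kept.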

12
Reconstitute

If I found it once, PetDB can find it again
Remember what procedure or search constructed or
located data originally.
Use the same method to get it again.
Need to ensure base data is archived.
Plus a small amount of unique content
Stuff I've created
Foreground information that superimposes my personal perspective: selections, annotations, responses, manipulations, groupings
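Reconstitution amounts to keeping the recipe rather than the result. A minimal sketch, assuming the "procedure or search" can be captured as a function plus its arguments and re-run against archived base data. The `Reconstitutable` class, the `search` function, and the sample archive are hypothetical and made up for illustration.

```python
class Reconstitutable:
    """Keep the recipe (a callable plus its arguments), not the result."""

    def __init__(self, procedure, *args):
        self.procedure = procedure
        self.args = args

    def get(self, archive):
        # Re-run the original procedure against the archived base data.
        return self.procedure(archive, *self.args)

def search(archive, keyword):
    """Hypothetical search: titles of archived documents containing `keyword`."""
    return sorted(title for title, text in archive.items() if keyword in text)

# Made-up archived base data, for illustration only.
archive = {
    "factory-tours": "article about factory tours in Massachusetts",
    "stock-quotes": "closing prices for the week",
}

recipe = Reconstitutable(search, "factory")
first = recipe.get(archive)
again = recipe.get(archive)   # nothing was stored locally; the result is rebuilt
print(first, first == again)
```

The sketch only works because the base data is still in the archive, which is the point of the "need to ensure base data is archived" bullet.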

13
What Infrastructure Do I Need?

Net Data Managers
Network-centric vs. disk-centric
Data movement vs. data storage
Work on live streams as well as stored data
Deal with data of arbitrary types
Run queries over thousands of sites
Locate data by external contexts as well as
internal content
Large-scale monitoring

14
Data Management Space
15
Why Net Data Managers?

File systems won't work
No queries, disk centric
Web servers won't work
No structural query, no combining of data
No support for optimization and execution of
high-level queries spanning 1000s of sites
No support for triggers
In reality, nothing more than page servers

16
Limitations of Current DBMSs

Schema-first
Load then query
Data in the box
Scale
Search by content, not by context

17
Key Elements of NDM

Self-describing data (e.g., XML)
NetQueries
Algebraic basis
Stream-processing components
Oil refinery vs. book-order warehouse
Want to do for net-centric, data-intensive
applications what relational DBs did for business
data processing
Reduce the coding effort to produce such
applications, while improving performance,
scalability and reliability.

18
Codd's Contribution

What's the most important aspect of the relational model?
Calculus?
Algebra?
Equivalence?
My opinion: Observing that BDP (business data processing) programs only do about 6-7 different things
scan files
remove fields
select records
remove duplicates
combine records
aggregate records
concatenate files
What are the building blocks of net data
management?
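The handful of operations listed on this slide compose naturally as small functions over streams of records. The following is a sketch only, using Python generators over made-up order records; the function names are illustrative, and "combine records" (a join) is left out to keep the example short.

```python
from itertools import chain

def scan(rows):                      # scan files
    yield from rows

def select(rows, pred):              # select records
    return (r for r in rows if pred(r))

def project(rows, keep):             # remove fields
    return ({k: r[k] for k in keep} for r in rows)

def distinct(rows):                  # remove duplicates
    seen = set()
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            yield r

def aggregate(rows, key, value):     # aggregate records (sum of `value` per `key`)
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[value]
    return totals

# Made-up order records in two "files", for illustration only.
file_a = [{"cust": "ann", "item": "book", "qty": 2}, {"cust": "bob", "item": "pen", "qty": 1}]
file_b = [{"cust": "ann", "item": "book", "qty": 2}, {"cust": "ann", "item": "pen", "qty": 5}]

rows = chain(scan(file_a), scan(file_b))             # concatenate files
rows = select(rows, lambda r: r["cust"] == "ann")
rows = distinct(project(rows, ["cust", "item", "qty"]))
totals = aggregate(rows, "item", "qty")
print(totals)
```

Each step consumes and produces a stream of records, which is what lets a few operators cover many different programs.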

19
Without NDMs
[Diagram labels: Data Sources; Users]
20
With NDMs
21
Kinds of Components

Stream-based query processors
Alerters
Accumulators
Remote monitoring/indexing
Semantic Routers
Replicators: lazy, eager, just-in-time
Semantic caches
Splitters
Access-mode adapters
Partial evaluators
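Three of the component names above can be illustrated on a small stream of readings. This is one possible reading of "splitter", "accumulator", and "alerter", not a definition taken from the slides; the function signatures, the sample readings, and the threshold are all assumptions. For brevity the splitter here collects its substreams into lists rather than forwarding them incrementally.

```python
def splitter(stream, key):
    """Route each reading to a per-key substream (here: a dict of lists)."""
    parts = {}
    for reading in stream:
        parts.setdefault(reading[key], []).append(reading)
    return parts

def accumulator(stream, field):
    """Yield the running mean of `field` after each reading."""
    total = 0.0
    for n, reading in enumerate(stream, start=1):
        total += reading[field]
        yield total / n

def alerter(stream, field, limit):
    """Yield only the readings whose `field` exceeds `limit`."""
    return (r for r in stream if r[field] > limit)

# Made-up vital-sign readings; the threshold of 120 is arbitrary.
readings = [
    {"patient": "A", "pulse": 80},
    {"patient": "B", "pulse": 130},
    {"patient": "A", "pulse": 100},
]

by_patient = splitter(readings, "patient")
means_a = list(accumulator(by_patient["A"], "pulse"))
alerts = list(alerter(readings, "pulse", 120))
print(means_a, alerts)
```

The accumulator and alerter are generators, so they could equally be attached to a live feed instead of a stored list.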

22
Alerting vs. Querying
23
Access Modes: Who Decides

Mode    When data moves    What data moves
Post    Producer           Producer
Push    Producer           Consumer
Poll    Consumer           Producer
Pull    Consumer           Consumer
24
Assembling Applications from Components

Akamai FreeFlow (see NASDAQ site)
Splitting, Replication, Merge, Adapters

[Diagram labels: Browser; Merge; Base Server; Web Content; Text; Graphics; Field Server (three); Pull (two); Push; Replicate; Split]
25
NIAGARA Project

Initial investigation of NDM based on XML
University of Wisconsin and OGI
Stream-oriented XML-QL evaluator
Text-in-context search
NiagaraCQ
Merge operator (and rest of algebra)
XML Firehose

26
Use of NDM for PetDB

NetQueries encode procedures for reconstituting
data
Monitoring sources of interest
Replication, splitting, push, accumulators,
semantic routing for staging data
NetQuery to inform an archive server what to save
Archives, semantic caches express what they
already hold with a NetQuery

27
Building the PetDB System
[Diagram labels: Context Mgr.; Stager (three); Petster; Task Analyzer; Profiler; Private Archive; Pet DB; Replicate Server; IP Server; Secure Local Cache; Back Quote; Data Kennel; WebSnap; Indexer; Stream Processor; Internet Monitor; Public Archives]
28
What Else is Needed?