A Petabyte in Your Pocket: David Maier, Oregon Graduate Institute, with help from D. DeWitt, J. Naughton, and others

1
A Petabyte in Your Pocket
David Maier, Oregon Graduate Institute
with help from D. DeWitt, J. Naughton, L. Delcambre, K. Tufte, V. Papadimos, P. Tucker
2
Your PetDB
  • It's 2015.
  • For $300 a year, you can have a personal petabyte
    database (PetDB).
  • You can talk to it from anywhere.
  • Organizes any kind of digital data.
  • Doesn't lose structure; can restructure
  • Queryable
  • Handles streams
  • Organized by type, content, associations,
    multiple categorizations and groupings
  • Locate items by:
    • How or where you encountered them
    • What you've done with them
    • Where you were when you accessed them

3
What Would I Put in a Petabyte?
  • A lot.
  • Fill my office floor to ceiling with books ≈ 100
    GB
  • What do I do with 10,000× as much?
  • Many possibilities
  • Contents of every book and magazine I read
  • Every web page I visit
  • All email I send or receive
  • Every TV program I watch
  • Every version of every piece of software I use
  • Maps of everywhere I go
  • Notes from every class or seminar I attend
  • All the telephone calls I make
  • My Lifestream (Freeman and Gelernter)
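
The sizing at the top of this slide can be checked with one line of arithmetic. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and taking the slide's 100 GB office-of-books figure at face value; the variable names are illustrative only:

```python
GB = 10**9   # decimal gigabyte (assumption; the slide does not say which unit)
PB = 10**15  # decimal petabyte

office_of_books = 100 * GB               # figure quoted on the slide
offices_per_petabyte = PB // office_of_books

print(offices_per_petabyte)
```

Under those assumptions a petabyte holds 10,000 offices' worth of books, which is where the "10,000 times as much" question comes from.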

4
Streams and Restructuring
  • Can incorporate streamed data on the fly.
    • MD: vital signs from patients in ICU
    • Factory supervisor: status, output rate of all
      machines, finished products, rejects
  • Can restructure data if desired.
    • Combined list of conferences in my area
    • Info sheets on autos I'm considering buying
    • Comparable salaries of faculty at my rank in
      similar departments

5
Anything I Might Want to Refer Back to
  • Personally indexed for me.
  • Can be located in a thousand different ways.
  • What is the company in Massachusetts I read about
    in the article on factory tours when I was on the
    plane to the sales meeting in Atlanta last
    spring?

6
Or Things I Might Want in the Future
  • Histories of news groups and mailing lists
  • Parts of the web I might want to browse,
    including past snapshots
  • Descriptions and prices for any item I might want
    to buy
  • Papers I've been meaning to read
  • Historical data on stocks I'm interested in
  • Functions as a personal web portal

7
Database Not Completely Apt
  • Didn't have to define a scheme for it
  • Doesn't need to know the datatypes I want to
    store in advance
  • Doesn't chop data into rows and columns
    • Unless I ask
  • Can query over information streams
  • Don't need to write and run applications to add
    data
  • Anything I've touched is there
    • Or expressed an interest in
  • Not on a particular computer
  • Doesn't have an outside

8
My PetDB is Good to Me
  • I don't move data between environments
  • I'm never on the wrong machine
  • Never go back to my office to grab a paper, never
    have the wrong folder at a meeting
  • Don't worry a lot about filing systems: PetDB
    organizes itself by ways I like to look for
    information
  • Anticipates what data I'll be using

9
How to Do This?
  • On $300/year
  • Plan A: Pack my office floor to ceiling with disk
    drives.
    • About $1 million.
  • Plan B: Be clever.
    • Share
    • Stage
    • Reconstitute

10
Share
  • Most of the information in my PetDB isn't unique
    to me: magazine article, web page, stock quote.
  • Store one copy.
  • Information Paradox: What's too expensive for one
    may be affordable for all.

[Figure labels: Others' PetDBs, My PetDB]
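
The "store one copy" idea on this slide can be illustrated with a content-addressed store, where identical items from different people map to the same stored blob. This is only a sketch under assumptions: the slides do not say how sharing is implemented, and the `SharedStore` class and its method names are hypothetical.

```python
import hashlib


class SharedStore:
    """Hypothetical store that keeps one physical copy per distinct item."""

    def __init__(self):
        self.blobs = {}   # content digest -> bytes (stored once)
        self.owners = {}  # content digest -> set of owners referencing it

    def put(self, owner, data):
        # Identical content always hashes to the same digest,
        # so a second owner adds a reference, not a second copy.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.owners.setdefault(digest, set()).add(owner)
        return digest

    def get(self, digest):
        return self.blobs[digest]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = SharedStore()
article = b"Magazine article on factory tours. " * 1000
annotation = b"my note: ask about the Massachusetts company"

ref_mine = store.put("me", article)
ref_yours = store.put("you", article)     # same article, no extra storage
ref_note = store.put("me", annotation)    # unique content is stored as usual

print(ref_mine == ref_yours, store.stored_bytes())
```

Both owners hold the same reference to the article, and the total stored size is one article plus the private annotation.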
11
Stage
  • Not all data has to be at my current point of
    connection.
  • Mainly resides in shared and private servers on
    the Internet.
  • Staged to me on a series of data managers.
  • Access time depends on context, likely use
    • Current itinerary: 1 second
    • Upcoming trips: 5 seconds
    • Past trips: 30 seconds
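
The itinerary example above amounts to mapping an item's context to an access-time target. A minimal sketch of that mapping, using the three latency figures from the slide; the tier names, the date-based rule, and the function names are assumptions made for illustration:

```python
from datetime import date

# Latency targets in seconds, taken from the slide.
# The tier names themselves are hypothetical.
TARGET_SECONDS = {"current": 1, "upcoming": 5, "past": 30}


def staging_tier(trip_start, trip_end, today):
    """Classify a trip relative to today (an assumed, simplistic rule)."""
    if trip_start <= today <= trip_end:
        return "current"
    if trip_start > today:
        return "upcoming"
    return "past"


def target_latency(trip_start, trip_end, today):
    """Return the access-time target for a trip's itinerary data."""
    return TARGET_SECONDS[staging_tier(trip_start, trip_end, today)]


today = date(2015, 6, 10)
current = target_latency(date(2015, 6, 8), date(2015, 6, 12), today)
upcoming = target_latency(date(2015, 7, 1), date(2015, 7, 5), today)
past = target_latency(date(2015, 3, 1), date(2015, 3, 4), today)
print(current, upcoming, past)
```

A real stager would presumably weigh many more signals than trip dates; the point is only that likely use, not data type, picks the tier.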

12
Reconstitute
  • If I found it once, PetDB can find it again
  • Remember what procedure or search constructed or
    located data originally.
  • Use the same method to get it again.
  • Need to ensure base data is archived.
  • Plus a small amount of unique content
    • Stuff I've created
    • Foreground information that superimposes my
      personal perspective: selections, annotations,
      responses, manipulations, groupings
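
Reconstitution as described on this slide means keeping the procedure that produced a result rather than the result itself, and replaying it against archived base data. A minimal sketch, assuming a toy in-memory archive; `ARCHIVE`, `PROCEDURES`, `Recipe`, and the sample stock data are all hypothetical and stand in for whatever the real system would use:

```python
# Archived base data (assumed to be preserved by some archive server).
ARCHIVE = {
    "stocks/ACME": [
        ("2015-01-02", 10.0),
        ("2015-01-03", 12.0),
        ("2015-01-04", 11.0),
    ],
}


def closing_prices_above(source, threshold):
    """Example search procedure run over archived base data."""
    return [(day, price) for day, price in ARCHIVE[source] if price > threshold]


# Registry of named procedures, so a recipe can be stored as plain data.
PROCEDURES = {"closing_prices_above": closing_prices_above}


class Recipe:
    """Remembers how a result was obtained instead of storing the result."""

    def __init__(self, procedure, **args):
        self.procedure = procedure
        self.args = args

    def reconstitute(self):
        # Use the same method to get the data again.
        return PROCEDURES[self.procedure](**self.args)


recipe = Recipe("closing_prices_above", source="stocks/ACME", threshold=10.5)
result = recipe.reconstitute()
print(result)
```

Only the small recipe is private to me; the bulky base data lives in the shared archive, which is why the slide stresses that base data must be archived.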

13
What Infrastructure Do I Need?
  • Net Data Managers
  • Network-centric vs. disk-centric
  • Data movement vs. data storage
  • Work on live streams as well as stored data
  • Deal with data of arbitrary types
  • Run queries over thousands of sites
  • Locate data by external contexts as well as
    internal content
  • Large-scale monitoring

14
Data Management Space
15
Why Net Data Managers?
  • File systems won't work
    • No queries, disk-centric
  • Web servers won't work
    • No structural query, no combining of data
    • No support for optimization and execution of
      high-level queries spanning 1000s of sites
    • No support for triggers
    • In reality, nothing more than page servers

16
Limitations of Current DBMSs
  • Schema-first
  • Load then query
  • Data in the box
  • Scale
  • Search by content, not by context

17
Key Elements of NDM
  • Self-describing data (e.g., XML)
  • NetQueries
  • Algebraic basis
  • Stream-processing components
  • Oil refinery vs. book-order warehouse
  • Want to do for net-centric, data-intensive
    applications what relational DBs did for business
    data processing
  • Reduce the coding effort to produce such
    applications, while improving performance,
    scalability and reliability.

18
Codd's Contribution
  • What's the most important aspect of the
    relational model?
    • Calculus?
    • Algebra?
    • Equivalence?
  • My opinion: observing that business data processing
    (BDP) programs only do about 6-7 different things
    • scan files
    • select records
    • combine records
    • concatenate files
    • remove fields
    • remove duplicates
    • aggregate records
  • What are the building blocks of net data
    management?
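
The seven building blocks listed on this slide can be written as small composable functions over records. The sketch below is an illustration, not anything from the talk: records are assumed to be Python dicts, "combine records" is interpreted as an equi-join, "aggregate records" as a grouped sum, and all function names are chosen here.

```python
from collections import defaultdict
from itertools import chain


def scan(file):                        # scan files
    yield from file


def select(records, predicate):        # select records
    return (r for r in records if predicate(r))


def project(records, keep):            # remove fields
    return ({k: r[k] for k in keep} for r in records)


def distinct(records):                 # remove duplicates
    seen = set()
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            yield r


def combine(left, right, on):          # combine records (equi-join)
    index = defaultdict(list)
    for rrec in right:
        index[rrec[on]].append(rrec)
    for lrec in left:
        for rrec in index[lrec[on]]:
            yield {**lrec, **rrec}


def aggregate(records, by, field):     # aggregate records (grouped sum)
    totals = defaultdict(int)
    for r in records:
        totals[r[by]] += r[field]
    return dict(totals)


def concatenate(*files):               # concatenate files
    return chain(*files)


orders_q1 = [{"cust": 1, "amount": 20}, {"cust": 2, "amount": 5}]
orders_q2 = [{"cust": 1, "amount": 30}]
customers = [{"cust": 1, "city": "Portland"}, {"cust": 2, "city": "Madison"}]

orders = concatenate(scan(orders_q1), scan(orders_q2))
large = select(orders, lambda r: r["amount"] >= 10)
totals = aggregate(combine(large, customers, on="cust"), by="city", field="amount")
cities = list(distinct(project(customers, ["city"])))
print(totals, cities)
```

Because each function consumes and produces a stream of records, the pieces chain freely, which is the property the slide asks about for net data management.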

19
Without NDMs
[Figure labels: Data Sources, Users]
20
With NDMs
21
Kinds of Components
  • Stream-based query processors
  • Alerters
  • Accumulators
  • Remote monitoring/indexing
  • Semantic routers
  • Replicators: lazy, eager, just-in-time
  • Semantic caches
  • Splitters
  • Access-mode adapters
  • Partial evaluators

22
Alerting vs. Querying
23
Access Modes: Who Decides

                                 What Data Moves
                                 Producer   Consumer
  When Data Moves   Producer     Post       Push
                    Consumer     Poll       Pull
24
Assembling Applications from Components
  • Akamai FreeFlow (see NASDAQ site)
  • Splitting, replication, merge, adapters

[Figure: component diagram with labels Browser, Merge, Base Server, Pull (×2), Web Content, Text, Graphics, Split, Replicate, Push, Field Server (×3)]
25
NIAGARA Project
  • Initial investigation of NDM based on XML
  • University of Wisconsin and OGI
  • Stream-oriented XML-QL evaluator
  • Text-in-context search
  • NiagaraCQ
  • Merge operator (and rest of algebra)
  • XML Firehose

26
Use of NDM for PetDB
  • NetQueries encode procedures for reconstituting
    data
  • Monitoring sources of interest
  • Replication, splitting, push, accumulators,
    semantic routing for staging data
  • NetQuery to inform an archive server what to save
  • Archives, semantic caches express what they
    already hold with a NetQuery

27
Building the PetDB System
[Figure: system diagram with labels Context Mgr., Stager (×3), Petster, Task Analyzer, Profiler, Private Archive, Pet DB, Replicate Server, IP Server, Secure Local Cache, Back Quote, Data Kennel, WebSnap, Indexer, Stream Processor, Internet Monitor, Public Archives]
28
What Else is Needed?
  • Superimposed Information
  • Much of my unique content is an organizational
    overlay on base data
  • Small-footprint data managers
  • Presentation model of stream data
  • Authorization and Authentication
  • QoS control, content scaling
  • Intelligent prediction, learning
  • Secure staging areas