The Data-Centric Revolution in Networking

1
The Data-Centric Revolution in Networking?
  • Scott Shenker
  • International Computer Science Institute
  • U. C. Berkeley

Liberally stealing the insight and work of
others, particularly Hari Balakrishnan, Deborah
Estrin, Ramesh Govindan, Joe Hellerstein, and Ion
Stoica
2
Two Communities Apart
  • Networking (Internet) researchers
  • don't know and don't care about databases
  • Vast gap between communities
  • much more overlap with other systems communities
  • But data-centrism has narrowed the gap
  • metaphors and algorithms
  • This talk will tell that story in reverse order
  • Internet, then sensornets

3
Our Central Mission
Get data from here to there
4
Host-centric Protocols
  • Protocols defined in terms of IP addresses
  • Unicast: IP address identifies a host
  • Multicast: IP address identifies a set of hosts
  • Destination address is given to protocol
  • Protocol delivers data from one host to another
  • unicast: conceptually trivial
  • multicast: address is logical, not physical

5
Host-centric Applications
  • Classic applications: destination is intrinsic
  • telnet: target machine
  • FTP: location of files
  • electronic mail: email address turns into mail server
  • multimedia conferencing: machines of participants
  • Destination is specified by user (not network)
  • Usually specified by hostname, not address
  • DNS translates names into addresses

6
Domain Name System (DNS)
  • DNS is built around recursive delegation
  • Top-level domains (TLDs): .com, .net, .edu, etc.
  • TLDs delegate authority to subdomains
  • berkeley.edu
  • Subdomains can further delegate
  • cs.berkeley.edu
  • Hierarchy fits host administrative structure
  • Local decentralized control
  • Crucial to efficient hostname resolution
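To make recursive delegation concrete, here is a minimal Python sketch of hierarchical name resolution over a hypothetical in-memory zone tree. The `ZONES` data, the documentation-range addresses, and `resolve` are illustrative assumptions, not real DNS records or a resolver API. Each label, read right to left, follows exactly one delegation, so every zone only needs to know its own children.

```python
from typing import Any, Dict, Optional

# Hypothetical delegation tree (illustration only): each zone maps a child label
# either to a sub-zone (a further delegation) or to a final address record.
ZONES: Dict[str, Any] = {
    "edu": {
        "berkeley": {                      # berkeley.edu, delegated by .edu
            "cs": {"www": "192.0.2.10"},   # cs.berkeley.edu, delegated by berkeley.edu
            "www": "192.0.2.20",
        },
    },
    "com": {},
}


def resolve(name: str, root: Dict[str, Any] = ZONES) -> Optional[str]:
    """Walk the labels right to left, following one delegation per label."""
    node: Any = root
    for label in reversed(name.split(".")):
        if not isinstance(node, dict) or label not in node:
            return None                    # no such delegation or record
        node = node[label]
    return node if isinstance(node, str) else None  # a zone is not an address
```

For example, `resolve("www.cs.berkeley.edu")` visits edu, berkeley, cs, www in turn. The lookup works only because the names follow the administrative hierarchy, which is exactly the property data names lack later in the talk.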

7
Network Research in Early 90s
  • Consumed by a few obsessions
  • Quality of service for streaming media
  • Multicast
  • Congestion control
  • But nobody questioned host-centricity
  • assumed to be the only way to build Internet

8
Surprise 1: The web catches on!
  • But we don't....

9
The Web
  • Web URLs have host-name/path format
  • Essentially the same information as FTP
  • Early web
  • browsers: basically a GUI for FTP
  • URLs were easily transmitted pointers
  • Early web was host-centric
  • and largely ignored (but used) by net researchers

10
Modern Web
  • URLs often function as names of data
  • users think of www.cnn.com as data, not a host
  • Fact that www.cnn.com is a hostname is irrelevant
  • Users want data, not access to particular host
  • The web is now data-centric

11
Data-centric App in Host-centric World
  • Data still associated with host names (URLs)
  • administrative structure of data same as hosts
  • weak point in current web
  • Key enabler: search engines
  • Searchable databases map keywords to URLs
  • Allowed users to find desired data
  • Networkers focused on technical problems
  • HTTP, persistence (URNs), replication (CDNs), ...

12
We Missed the Point!
  • We thought
  • web was an aberration
  • search engines were a sufficient hack
  • No networker (except Jacobson) articulated that
  • web had gone from host-centric to data-centric
  • it was a harbinger of future applications

13
Surprise 2: Stolen Music is Popular!
  • And we finally get the message...

14
The P2P Filesharing Phenomenon
  • Napster: fastest-growing Internet application
  • Music sharing is intrinsically data-centric
  • data never associated with hosts
  • Centralized searchable database
  • listed IP addresses where content could be found
  • analogous to Google+DNS in the web
  • Legal problems forced decentralization
  • Led to Gnutella and other distributed programs

15
Gnutella-style File Sharing
  • Gnutella nodes form an overlay network
  • each node has a few neighbors in a virtual
    network
  • virtual link: node knows the other's IP address
  • do app-level networking on this graph

16
Gnutella-style Searching
  • Keyword queries are flooded (within scope)
  • query is processed locally at each node
  • all nodes having hits respond to source
  • many variations on this theme (Freenet, etc.)
  • Clearly not scalable
  • P2P traffic now sizable fraction of overall load
  • We finally realize that we need a scalable way to
    find data for data-centric applications
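A minimal sketch of Gnutella-style scoped flooding, assuming "within scope" means a TTL-style hop limit and using a hypothetical dictionary-based overlay. It counts query messages, including the duplicate copies that nodes discard, to show that cost tracks the size of the flooded region rather than the number of hits.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood_query(graph: Dict[str, List[str]], files: Dict[str, Set[str]],
                source: str, keyword: str, ttl: int) -> Tuple[Set[str], int]:
    """Flood a keyword query up to ttl overlay hops from source.
    Returns (nodes with hits, number of query messages sent)."""
    hits: Set[str] = set()
    seen = {source}
    messages = 0
    frontier = deque([(source, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if keyword in files.get(node, set()):
            hits.add(node)                 # this node would answer the source directly
        if hops_left == 0:
            continue                       # scope exhausted: do not forward further
        for nbr in graph[node]:
            messages += 1                  # every forwarded copy costs a message
            if nbr not in seen:            # duplicates arrive but are not re-forwarded
                seen.add(nbr)
                frontier.append((nbr, hops_left - 1))
    return hits, messages
```

On a four-node square overlay (A-B, A-C, B-D, C-D) where only D holds the file, a query from A with `ttl=2` finds D but sends six messages; with `ttl=1` it sends two and finds nothing.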

17
Is there life outside the Internet?
  • Yes, and we should have been listening!

18
Sensornets (predating P2P)
  • Vision
  • Many sensing devices with radio and processor
  • Enable fine-grained measurements over large areas
  • Huge potential impact on science and society
  • Technical challenges
  • untethered: power consumption must be limited
  • unattended: robust and self-configuring
  • wireless: ad hoc networking

19
Conceptual Challenge
  • Sensornets are inherently data-centric
  • Users know what data they want, not where it is
  • Estrin, Govindan, Heidemann (2000, etc.)
  • Centralized database infeasible
  • vast amount of data, constantly being updated
  • small fraction of data will ever be queried
  • sending to single site expends too much energy

20
Flood-then-Aggregate
  • General class of methods
  • Flood query to all nodes (or in region)
  • Nodes with data matching query respond
  • Responses are aggregated as appropriate
  • Examples
  • Directed diffusion: reinforce based on data
  • TAG: tree for flood and return-path aggregation
  • Etc....
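The return-path aggregation idea can be sketched as follows, assuming the flood has already built a routing tree toward the querying node and that the query asks for AVG and MAX. The tree, the readings, and the (sum, count, max) partial-state format are illustrative assumptions, not TAG's actual message format. Each node sends one partial result to its parent instead of forwarding every raw reading.

```python
from typing import Dict, List, Tuple


def aggregate(tree: Dict[str, List[str]], readings: Dict[str, float],
              node: str) -> Tuple[float, int, float]:
    """Return the (sum, count, max) partial state for the subtree rooted at node."""
    total, count, peak = readings[node], 1, readings[node]
    for child in tree.get(node, []):
        # In a sensornet this is the single message the child sends up the tree.
        c_total, c_count, c_peak = aggregate(tree, readings, child)
        total, count, peak = total + c_total, count + c_count, max(peak, c_peak)
    return total, count, peak
```

With `tree = {"sink": ["a", "b"], "a": ["c"]}` and readings 20, 22, 18, 30 for sink, a, b, c, the sink receives (90.0, 4, 30.0), so AVG is 22.5 and MAX is 30.0. Keeping (sum, count) rather than a running average is what lets partial results combine correctly at every hop.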

21
Scaling Problems
  • This approach suffers as
  • systems get bigger
  • queries more frequent and more specific
  • For current deployments, not an issue
  • systems are small, queries primitive
  • But if technology progresses as hoped
  • want to get relevant data without flooding
  • similar to situation in Internet

22
Is Data-centric Flooding Necessary?
  • The initial decentralized data-centric designs
    (in both Internet and sensornets) used flooding
  • unscalable and unsustainable
  • Since data-centrism is here to stay, we can't
    ignore this problem
  • We had to broaden our research charter

23
Our Revised Mission
Get data from here to there
24
A DNS for Data?
  • Can we map data names into addresses?
  • a data-centric DNS, distributed and scalable
  • doesn't alter net protocols, but aids data
    location
  • not just about stolen music, but a general
    facility
  • A formidable challenge
  • Data does not have a clear administrative
    hierarchy
  • Likely need to support a flat namespace
  • Can one do this scalably?
  • Data-centrism requires scalable flat lookups

25
Distributed Hash Tables (DHTs)
  • The latest networking fad....

Presented from the Internet perspective but
applies to sensornets as well
26
An Internet-scale Distributed Index
  • Interface: put(key, object), get(key)
  • DHTs form a structured overlay network
  • nodes choose particular neighbors
  • all objects have keys, usually hash(name)
  • each node responsible for range of keys
  • puts/gets routed to appropriate node

27
Example Design: Chord
  • Node and object keys
  • random location around a circle
  • Neighbors
  • nodes 2^-i of the way around the circle
  • found by routing to desired key
  • Routing: greedy
  • pick neighbor closest to destination
  • Storage: own interval
  • node owns key range between her key and previous
    node's key

[Figure: Chord ring, with one node's ownership range marked]
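A minimal Python sketch of the Chord-style design above, assuming a 16-bit identifier ring, SHA-1 to place nodes and objects, and a global view of membership (a real deployment discovers neighbors by routing and must handle joins and failures). `Ring`, `route`, `put`, and `get` are hypothetical names, not an existing library API. Neighbors sit 2^-i of the way around the ring, routing greedily picks the neighbor closest to the key without passing it, and each node stores the keys between its predecessor and itself.

```python
import bisect
import hashlib
from typing import Any, Dict, List, Optional, Tuple

M = 16                # hypothetical identifier size: the ring holds 2**M keys
RING = 2 ** M


def key_of(name: str) -> int:
    """Place a node or object at a pseudo-random point on the ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING


def between(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise interval (a, b]; a == b means the whole ring."""
    return a < x <= b if a < b else (x > a or x <= b)


class Ring:
    def __init__(self, node_ids: List[int]) -> None:
        self.ids = sorted(set(node_ids))
        self.store: Dict[int, Dict[str, Any]] = {n: {} for n in self.ids}

    def successor(self, k: int) -> int:
        """First node at or clockwise after k: the owner of key k."""
        i = bisect.bisect_left(self.ids, k)
        return self.ids[i % len(self.ids)]

    def predecessor(self, n: int) -> int:
        return self.ids[bisect.bisect_left(self.ids, n) - 1]   # index -1 wraps around

    def fingers(self, n: int) -> List[int]:
        """Neighbors: nodes 1/2, 1/4, ..., 2**-M of the way around the ring from n."""
        return [self.successor((n + RING // 2 ** i) % RING) for i in range(1, M + 1)]

    def route(self, start: int, k: int) -> Tuple[int, List[int]]:
        """Greedy routing toward key k; returns (owner of k, nodes visited)."""
        n, path = start, [start]
        while not between(k, self.predecessor(n), n):          # stop at the owner
            succ = self.successor((n + 1) % RING)
            if between(k, n, succ):
                n = succ                                         # next node owns k
            else:
                # Closest preceding neighbor: the farthest finger that does not pass k.
                n = max((f for f in self.fingers(n) if between(f, n, (k - 1) % RING)),
                        key=lambda f: (f - n) % RING)
            path.append(n)
        return n, path

    def put(self, start: int, name: str, obj: Any) -> None:
        owner, _ = self.route(start, key_of(name))
        self.store[owner][name] = obj

    def get(self, start: int, name: str) -> Optional[Any]:
        owner, _ = self.route(start, key_of(name))
        return self.store[owner].get(name)
```

Because any node can compute `key_of(name)` itself, a put issued at one node and a get issued at another both end at the same owner, with no directory consulted along the way.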
28
Key Properties
  • Large aggregate capacity: O(n) storage/bandwidth
  • Scalable
  • O(log n) routing hops and state
  • O((log n)^2) update costs for node joins/leaves
  • Robust: self-configuring and resilient to failures
  • Non-property: strict guarantees under failures

29
Our Version of Data Independence
  • DHT interface allows us to get data by name
  • We no longer care where data is
  • A radical transition in databases
  • perhaps it will be one in networking as well
  • Apologies to Joe Hellerstein...
  • see latest SIGMOD Record for his article

30
Caveat!
  • DHTs are a work-in-progress
  • A flurry of research activity on
  • security
  • replication
  • proximity
  • real operational experience
  • .....
  • For rest of talk, we put these worries aside...

31
Why Not Centralized Solutions?
  • Ugh! (and infeasible for sensornets)
  • Fault tolerance: avoid single point of failure
  • Economic
  • DNS: donated machines, scales organically
  • Centralized solutions require business model
  • Issue still open... but irrelevant to data-centrism
  • need to support interface
  • DHTs allow us to choose between centralized and decentralized

32
Multiple Roles for DHTs
  • Application-specific
  • rolled into P2P application, run on peers
  • General-purpose service
  • run on managed nodes
  • Intrinsic part of Internet architecture
  • run on managed nodes

33
Multiple Roles for DHTs
  • Application-specific
  • rolled into P2P app, run on peers
  • General-purpose service
  • run on managed nodes
  • Intrinsic part of Internet architecture
  • run on managed infrastructure nodes

34
Some Applications using DHTs
  • Partial list
  • File sharing
  • Storage repositories and file systems
  • Backup systems
  • Event notification systems
  • Electronic mail
  • App-layer multicast and streaming media
  • .....
  • Useful substrate for many (not all) large
    distributed applications because hash tables are useful

35
Multiple Roles for DHTs
  • Application-specific
  • rolled into P2P app, run on peers
  • General-purpose service
  • run on managed nodes
  • Intrinsic part of Internet architecture
  • run on managed infrastructure nodes

36
Internet-scale Query Processing
  • Superficial motivation
  • Joins can be implemented with hash tables so...
  • Distributed joins can be implemented with DHTs
  • Scaling: latency O(log n) while computation O(n)
  • PIER (talk later today in session A9!)
  • joins, aggregation, recursive and continuous
    queries
  • Intended targets
  • data in the wild (filesharing, net monitoring,
    etc.)
  • schema provided by standardized protocols
  • no need for ACID semantics
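The "joins via hash tables" argument can be sketched as a two-phase DHT join: every tuple is put into the DHT under its join-attribute value, so matching tuples from both relations meet at the same node, which then runs an ordinary local hash join. This is a minimal sketch, not PIER's implementation; `owner` is a hypothetical stand-in that maps a value to a node by hashing modulo the node count instead of routing through a real DHT, and the local joins, which would run in parallel, run one after another here.

```python
import hashlib
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def owner(value: Any, nodes: List[str]) -> str:
    """Stand-in for DHT routing: the node responsible for hash(value)."""
    h = int.from_bytes(hashlib.sha1(repr(value).encode()).digest(), "big")
    return nodes[h % len(nodes)]


def dht_join(r: List[Row], s: List[Row], r_key: str, s_key: str,
             nodes: List[str]) -> List[Row]:
    # Phase 1 ("rehash"): put every tuple under its join-attribute value, so
    # matching R and S tuples are delivered to the same node.
    r_parts: Dict[str, List[Row]] = {n: [] for n in nodes}
    s_parts: Dict[str, List[Row]] = {n: [] for n in nodes}
    for t in r:
        r_parts[owner(t[r_key], nodes)].append(t)
    for u in s:
        s_parts[owner(u[s_key], nodes)].append(u)
    # Phase 2: each node runs a local hash join (build on R, probe with S).
    out: List[Row] = []
    for n in nodes:
        table: Dict[Any, List[Row]] = defaultdict(list)
        for t in r_parts[n]:
            table[t[r_key]].append(t)
        for u in s_parts[n]:
            for t in table.get(u[s_key], []):
                out.append({**t, **u})
    return out
```

In a net-monitoring flavor of the example, joining a host table with flow records on `ip` pairs each flow with its host name even though no single node ever sees either relation in full.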

37
More Complex Queries
  • Range search
  • using prefix hash table
  • no need to walk tree
  • Keyword search
  • engineering the boolean approach
  • Active research on DHT-based distributed data
    structures for search (net and db communities)
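One way to see "range search without walking a tree" is a fixed-depth prefix scheme: each key is stored in the DHT under the label of its binary prefix, and a range query computes the covering prefix labels directly and gets each one. This is a simplified sketch under stated assumptions: 16-bit integer keys, a fixed 4-bit prefix depth (a real design would likely adapt depth to the data), and a plain dict standing in for DHT put/get. `PrefixIndex` and `label` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

KEY_BITS = 16      # hypothetical: keys are 16-bit integers (e.g. quantized readings)
PREFIX_BITS = 4    # hypothetical fixed prefix depth: 16 buckets
SHIFT = KEY_BITS - PREFIX_BITS


def label(bucket: int) -> str:
    """Binary prefix string used as the DHT key for a bucket."""
    return format(bucket, f"0{PREFIX_BITS}b")


class PrefixIndex:
    def __init__(self) -> None:
        self.dht: Dict[str, List[Tuple[int, Any]]] = {}   # stand-in for put/get

    def insert(self, key: int, obj: Any) -> None:
        self.dht.setdefault(label(key >> SHIFT), []).append((key, obj))

    def range(self, lo: int, hi: int) -> List[Any]:
        out = []
        # Bucket labels are computed directly; each get could be issued in parallel.
        for b in range(lo >> SHIFT, (hi >> SHIFT) + 1):
            for key, obj in self.dht.get(label(b), []):
                if lo <= key <= hi:        # edge buckets may hold out-of-range keys
                    out.append(obj)
        return out
```

A query for [4000, 5200] touches only buckets `0000` and `0001`, and the final filter removes keys from those edge buckets that fall outside the range.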

38
Multiple Roles for DHTs
  • Application-specific
  • rolled into P2P app, run on peers
  • General-purpose service
  • run on managed nodes
  • Intrinsic part of Internet architecture
  • run on managed infrastructure nodes

39
Cleaning Up the Architecture
  • Making URNs a reality
  • webNG based on flat and opaque DHT keys
  • enables persistence and eliminates branding
  • Host identifiers versus routing information
  • IP addresses currently (and stupidly) serve as
    both
  • DHT key: host ID, resolves to routing address
  • Architectural challenge for basic protocols

40
Subverting the Architecture
  • Use DHT for forwarding, not just lookup!
  • e.g., Internet Indirection Infrastructure (i3)
  • similar in spirit to multicast (logical
    addressing)
  • transcends current naming/addressing structures
  • Make overlay the real network layer
  • turn IP into a link layer technology
  • Leverages, not limited by, current infrastructure
  • New network layer is still simple, but not IP
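A toy model of the i3 rendezvous idea, assuming a single dictionary stands in for the DHT that stores triggers; the class and method names are hypothetical, not i3's actual interface. Receivers insert (identifier, address) triggers, senders address packets to an identifier rather than a host, and the node holding the trigger forwards. The same mechanism yields multicast (several triggers on one identifier) and mobility (re-inserting a trigger with a new address).

```python
from typing import Dict, List, Set, Tuple


class Rendezvous:
    """Toy i3-style indirection: packets go to identifiers, not hosts."""

    def __init__(self) -> None:
        # Stand-in for the DHT: identifier -> receiver addresses with live triggers.
        self.triggers: Dict[str, Set[str]] = {}

    def insert_trigger(self, ident: str, addr: str) -> None:
        self.triggers.setdefault(ident, set()).add(addr)

    def remove_trigger(self, ident: str, addr: str) -> None:
        self.triggers.get(ident, set()).discard(addr)

    def send(self, ident: str, data: str) -> List[Tuple[str, str]]:
        """Return the (address, data) deliveries the node holding ident would make."""
        return [(addr, data) for addr in sorted(self.triggers.get(ident, set()))]
```

The sender's code never changes when a receiver moves from one address to another; only the trigger does, which is the sense in which forwarding, not just lookup, goes through the DHT.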

41
New Generation of Networking?
  • Current Internet relies on hierarchies to scale
  • DNS naming, IP addressing, etc.
  • Hierarchies limit flexibility
  • addresses and names have to fit given structure
  • need to care where data/machines are
  • Scalable flat lookup avoids hierarchy
  • network would be structure independent
  • Less of a distinction between hosts and data

42
Do DHTs Apply to Sensornets?
  • Can we build them?
  • Do they help?

43
Finding Sensornet Data w/o Flooding
  • Extract high-level features or events
  • Temperature spikes, toxins, animal sightings
  • Name these events
  • Store/Access events with DHT-like structure
  • Can later get detailed data from specific nodes
  • Call this data-centric storage (DCS)
  • Good for frequent specific queries
  • Not good for long-running or aggregate queries
  • But how do you build a sensornet DHT?

44
Geographic Routing
  • Nodes know their own and their neighbors' positions
  • Packets routed to geographic destination
  • Greedy forwarding, when possible
  • If greedy fails at a void, use the right-hand
    rule to navigate around the void

[Figure: geographic forwarding; nodes A and B, destination (x,y)]
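A minimal sketch of the greedy part of geographic routing, assuming each node knows its own and its neighbors' coordinates (here a hypothetical `pos` table) and that distances are Euclidean. The right-hand-rule perimeter mode is deliberately left out; the function simply stops at a local minimum, which is where a full implementation would switch modes.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def greedy_route(pos: Dict[str, Point], nbrs: Dict[str, List[str]],
                 src: str, dest: Point) -> List[str]:
    """Greedy geographic forwarding. Stops at a local minimum: a node with no
    neighbor closer to dest (the destination itself, or the edge of a void)."""
    path, cur = [src], src
    while True:
        best = min(nbrs[cur], key=lambda n: math.dist(pos[n], dest), default=None)
        if best is None or math.dist(pos[best], dest) >= math.dist(pos[cur], dest):
            # A full implementation would switch to right-hand-rule
            # perimeter forwarding here if cur is not actually closest to dest.
            return path
        path.append(best)
        cur = best
```

On a small graph where A(0,0) and D(1,1) both connect to B(1,0), which connects to C(2,0), packets for (2,0) from either A or D end at C. A node whose only neighbor lies farther from the destination returns immediately, which is the void case the right-hand rule exists for.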
45
Geographic Hash Table (GHT)
  • Keys hashed to random coordinates
  • Likely no node exists at that location!
  • Forwarding ends at node closest to destination
  • Closest node stores the data

[Figure: node A and hashed location (x,y)]
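The GHT steps above can be sketched in a few lines, assuming a hypothetical 100 x 100 deployment area, SHA-1 to hash event names to coordinates, and a global "closest node" computation standing in for where geographic forwarding (greedy plus perimeter mode) would terminate. `hash_to_point`, `home_node`, and `GHT` are illustrative names.

```python
import hashlib
import math
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]
WIDTH, HEIGHT = 100.0, 100.0      # hypothetical deployment area


def hash_to_point(event_name: str) -> Point:
    """Hash an event name to pseudo-random coordinates inside the area."""
    d = hashlib.sha1(event_name.encode()).digest()
    x = int.from_bytes(d[:4], "big") / 2 ** 32 * WIDTH
    y = int.from_bytes(d[4:8], "big") / 2 ** 32 * HEIGHT
    return x, y


def home_node(nodes: Dict[str, Point], event_name: str) -> str:
    """The node closest to the hashed location stores the event. (A global
    minimum stands in for where geographic forwarding would end.)"""
    target = hash_to_point(event_name)
    return min(nodes, key=lambda n: math.dist(nodes[n], target))


class GHT:
    def __init__(self, nodes: Dict[str, Point]) -> None:
        self.nodes = nodes
        self.store: Dict[str, List[Tuple[str, Any]]] = {n: [] for n in nodes}

    def put(self, event_name: str, reading: Any) -> None:
        self.store[home_node(self.nodes, event_name)].append((event_name, reading))

    def get(self, event_name: str) -> List[Any]:
        home = home_node(self.nodes, event_name)
        return [r for e, r in self.store[home] if e == event_name]
```

Since both the detecting sensor and the querier hash the same event name, they agree on the home node without flooding; the query goes to one place, and detailed data can then be fetched from specific nodes.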
46
Additional Algorithms
  • Caching and replication
  • Cache around perimeter, replicate independently
  • Structured replication (SR)
  • Hierarchical decomposition of key space
  • Tree of mirror images

[Figure: mirror images of Hash(event)]
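A sketch of how structured replication places mirror images, assuming a hypothetical 100 x 100 area and a chosen depth d: the area is split into 4^d equal cells and the hashed point is copied to the same relative offset inside every cell. The function names are illustrative. A sensor stores an event at its nearest image, which shortens storage paths; the price is that a query then has to reach all the images.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mirror_images(point: Point, depth: int,
                  width: float = 100.0, height: float = 100.0) -> List[Point]:
    """Split the area into 4**depth equal cells and place a copy of the hashed
    point at the same relative offset inside every cell."""
    k = 2 ** depth
    cw, ch = width / k, height / k
    ox, oy = point[0] % cw, point[1] % ch          # offset of the point in its cell
    return [(i * cw + ox, j * ch + oy) for i in range(k) for j in range(k)]


def nearest_mirror(node_pos: Point, point: Point, depth: int,
                   width: float = 100.0, height: float = 100.0) -> Point:
    """A sensor stores a detected event at the closest mirror image."""
    images = mirror_images(point, depth, width, height)
    return min(images, key=lambda m: math.dist(m, node_pos))
```

At depth 0 the only image is the hashed point itself, so this reduces to plain GHT; each extra level of the hierarchy multiplies the number of images by four.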
47
More Complex Queries
  • Using GHT+SR (which has spatial structure)
  • Range searches in space and value
  • Wavelet analysis
  • New data structures
  • Higher-dimensional range searches
  • Active research in distributed data structures
    for sensornet queries (in net and db communities)

48
We are finally on our way to the land of data
independence...
  • We ask for your guidance....

49
Areas of Common Interest
  • Algorithmic
  • distributed data structures for search
  • Metaphoric
  • thinking about data independence