EECE 411: Design of Distributed Software Applications (or Distributed Systems 101)

1
EECE 411: Design of Distributed Software
Applications (or Distributed Systems 101)
  • Matei Ripeanu
  • http://www.ece.ubc.ca/matei

2
Today's Objectives
  • Class mechanics
  • http://www.ece.ubc.ca/matei/EECE411/
  • Understand real-world applications in terms of:
  • Motivation and objectives
  • Resource requirements
  • (compute/storage/network resources)
  • Architecture (distributed systems part)
  • Examples: P2P applications
  • Start thinking of computer networks from the
    perspective of a networked application

3
P2P Definition(s)
  • Def 1: A class of applications that takes
    advantage of resources (storage, cycles,
    content, human presence) available at the
    edges of the Internet.
  • Edges often turned on/off, without permanent IP
    addresses
  • Def 2: A class of decentralized,
    self-organizing distributed systems, in which
    all or most communication is symmetric.
  • Lots of other definitions that fit in between
  • Lots of (P2P?) systems that fit nowhere

4
P2P Impact: Widespread adoption
  • Skype
  • 560M registered users (Q2 '10)
  • 120M active, 8M paying
  • 15M users online
  • Number of users for file-sharing applications
    (estimate: www.slyck.com, Sept. '06)
  • P2P design techniques are now mainstream!

5
P2P Impact (2): Huge resource users
  • P2P-generated traffic now dominates the Internet
    load (30-50% of the traffic)
  • Cornell.edu (March '02): 60% P2P
  • Internet2 traffic statistics

6
P2P Impact (3): Demonstrate that small,
volatile, non-proprietary resources can be
efficiently harnessed
  • Resources: CPU, storage space
  • Also: network bandwidth, availability, user
    attention and expertise
  • BOINC statistics

7
P2P Impact (4): Social / Business
  • Ability to aggregate resources at large scale is
    disruptive
  • May force companies to change their business
    models
  • Digital content production and distribution
  • Telecommunications companies
  • New collaboration models
  • Crowd-sourcing!

8
Roadmap
  • Definitions
  • Impact
  • Applications
  • Mechanisms
  • A case study

9
Applications (1): Number crunching
  • Examples: BOINC, Folding@home, SETI@home, etc.
  • Application characteristics
  • Massive parallelism
  • Low bandwidth/computation ratio
  • Error tolerance
  • Users do donate real resources
  • However
  • Centralized.
  • Cheating!
  • Approach suitable for a particular class of
    problems.
  • How can the model be extended to problems that are
    not massively parallel?

1.5M / year for power only
10
Applications (2): Online content distribution
(files, streaming)
  • The killer application to date
  • Too many to list them all
  • BitTorrent, FastTrack (KaZaA, KazaaLite, iMesh),
    Gnutella (LimeWire, BearShare)
  • Two independent problems
  • Distributed index
  • Fast content download
  • Environment: unreliable, non-cooperative

11
Applications (3): Performance evaluation
  • Poor online performance is costly
  • $25 billion per year (Zona Research)
  • 28% of attempted online purchases fail (BCG)
  • Slow page download is the primary reason for
    abandoning a transaction
  • User expectations for page download are around 4
    seconds
  • Performance evaluation and monitoring require
    multiple vantage points
  • Connectivity statistics
  • Routing errors
  • Evaluate Web-site performance from the end-user
    perspective

12
Measurements: The Performance Blind Spot
[Figure: network landscape between the back-end infrastructure and the end users. Labels: Web server, App server, Database, Firewall, 3rd-party content, T1, Corporate Network, Corporate User, ISP, Local ISP, Regional Network, Backbone, Enterprise Provider, Major Provider, Consumer User]
Component Testing / Datacenter Monitoring
  • BMC
  • Mercury Interactive
  • Tivoli
  • ProactiveNet
  • HP OpenView
  • Computer Associates

Consumer User
  • Keynote Systems
  • Mercury Interactive
  • BMC/SiteAngel
  • Service Metrics

Critical to estimate end-to-end performance

Slide source: www.porivo.com
13
Measurements: End-to-end Performance
[Figure: network landscape between the back-end infrastructure and the end users. Labels: Web server, App server, Database, Firewall, 3rd-party content, T1, Corporate Network, Corporate User, ISP, Local ISP, Regional Network, Backbone, Enterprise Provider, Major Provider, Consumer User. Annotations: Component Testing, Datacenter Monitoring, End-to-end Web Performance Testing]
Slide source: www.porivo.com
14
More applications
  • Backup storage (HiveNet, OceanStore)
  • Crowd-sourcing
  • Spam filtering
  • Anonymous email
  • Censorship-resistant publishing systems
    (Eternity, Freenet)

15
Roadmap
  • Definitions
  • Impact
  • Applications
  • Mechanisms
  • A Case Study

16
Mechanisms
  • To obtain a resilient system
  • use redundancy for data and services
  • integrate multiple components with uncorrelated
    failure curves.
  • To reduce cost and improve the QoS delivered
  • move service delivery closer to the user
  • integrate multiple clients with uncorrelated
    demand curves
  • (lower over-provisioning at resource providers)
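
Both mechanisms above come down to simple arithmetic. The following is a minimal sketch, assuming replicas fail independently (the "uncorrelated failure curves" condition) and using made-up demand numbers; the function names are illustrative and do not come from the slides.

```python
def availability(p_up: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    return 1.0 - (1.0 - p_up) ** replicas


def peak_of_sum(demands: list[list[float]]) -> float:
    """Peak load at a shared provider that serves all clients together."""
    return max(sum(slot) for slot in zip(*demands))


def sum_of_peaks(demands: list[list[float]]) -> float:
    """Capacity needed if each client is provisioned for its own peak."""
    return sum(max(d) for d in demands)


if __name__ == "__main__":
    # Redundancy: each extra independent replica shrinks the failure probability.
    for n in (1, 2, 3):
        print(n, round(availability(0.9, n), 3))

    # Uncorrelated demand: hypothetical loads over four time slots.
    day_client = [8, 9, 2, 1]
    night_client = [1, 2, 9, 8]
    both = [day_client, night_client]
    print(peak_of_sum(both), sum_of_peaks(both))
```

When the clients' peaks do not coincide, the shared provider needs less capacity than the sum of the individual peaks, which is the "lower over-provisioning" point made on the slide.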

17
Example (I): Cooperative Web serving
  • Problem: Flash crowds!
[Figure: DNS resolution path. Labels: Browser, Resolver, DNS Query, dnssrv, Origin Server (www.matei.com), 216.165.108.10]
18
Example (I): Cooperative Web serving
  • DNS redirection: return a proxy, preferably one
    near the client
  • Cooperative Web caching
  • Fetch data from nearby
[Figure labels: Browser, Resolver, dnssrv, httpprx (two proxies), Origin Server, akamai.cnn.com, 216.165.108.10]
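
The DNS redirection step can be sketched as a lookup that hands back the proxy closest to the requesting client. This is only an illustration: the latency table, the client labels, and the proxy addresses (taken from documentation-only IP ranges) are assumptions, and real redirection services use far richer signals.

```python
def pick_server(client: str,
                latency_ms: dict[str, dict[str, float]],
                origin: str) -> str:
    """Return the address a redirecting DNS server would hand to `client`.

    latency_ms[client] maps each known proxy address to an estimated
    latency from that proxy to the client (hypothetical measurements).
    Falls back to the origin server when no proxy is known for the client.
    """
    candidates = latency_ms.get(client, {})
    if not candidates:
        return origin
    # Prefer the proxy with the lowest estimated latency to this client.
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    table = {"vancouver": {"192.0.2.10": 12.0, "198.51.100.20": 80.0}}
    print(pick_server("vancouver", table, "216.165.108.10"))
    print(pick_server("elsewhere", table, "216.165.108.10"))
```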
19
Roadmap
  • Definitions
  • Impact
  • Uses and Examples
  • Mechanisms
  • A case study
  • File sharing: the Gnutella network, BitTorrent

20
Basic Primitives for File Sharing
  • Join: How do I begin participating?
  • Publish: How do I advertise my file(s)?
  • Search: How do I find a file?
  • Fetch: How do I retrieve a file?
  • Lots of different solutions for each of these
    four primitives.
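
To make the four primitives concrete, here is a toy sketch of one possible solution: every peer shares a single in-memory directory. The `Directory` and `Peer` classes are hypothetical and deliberately centralized; they only show what each primitive has to accomplish, not how Gnutella or BitTorrent do it.

```python
class Directory:
    """Shared lookup state (an assumed, centralized design)."""

    def __init__(self):
        self.peers = {}   # peer name -> Peer object
        self.index = {}   # file name -> set of peer names holding it


class Peer:
    def __init__(self, name: str, directory: Directory):
        self.name = name
        self.directory = directory
        self.files = {}   # file name -> content

    def join(self) -> None:
        """Join: become reachable by other participants."""
        self.directory.peers[self.name] = self

    def publish(self, filename: str, data: bytes) -> None:
        """Publish: advertise a locally stored file."""
        self.files[filename] = data
        self.directory.index.setdefault(filename, set()).add(self.name)

    def search(self, filename: str) -> list[str]:
        """Search: find which peers hold the file."""
        return sorted(self.directory.index.get(filename, ()))

    def fetch(self, filename: str, peer_name: str) -> bytes:
        """Fetch: retrieve the file directly from the chosen peer."""
        return self.directory.peers[peer_name].files[filename]


if __name__ == "__main__":
    d = Directory()
    alice, bob = Peer("alice", d), Peer("bob", d)
    alice.join()
    bob.join()
    alice.publish("song.mp3", b"data")
    holders = bob.search("song.mp3")
    print(holders, bob.fetch("song.mp3", holders[0]))
```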

21
What makes these systems interesting?
  • Large scale
  • Self-organizing networks
  • Fast growth
  • Gnutella: grew more than 50x during the first half
    of 2001, and 50x again from 2001 to 2006
  • Open architecture, simple and flexible protocols
  • Interesting mix of social and technical issues

22
Gnutella search mechanism
[Figure: overlay with nodes Boston, Chicago, MIT, UBC, Calgary. Labels: "Beatles: Yellow Submarine" (the file), "Q: Beatles" (the query)]
  • Search steps
  • A node initiates a search for Yellow Submarine
  • It sends the message to all its neighbors
  • Neighbors forward the message
  • A node with a match initiates a reply message
  • The reply message is back-propagated
  • File download

23
Gnutella Overview
  • Join: on startup, the client contacts a few other
    nodes; these become its neighbors
  • Publish: no need
  • Search
  • Flooding: pass the query to neighbors, who pass the
    query in turn to their own neighbors, and so
    on...
  • Back-propagation in case of success
  • Fetch: get the file directly from the peer (HTTP)
  • Note: this was the original design. Later the
    network moved to a two-layer structure
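
Flooding with back-propagation, as in the original design, can be sketched as a breadth-first traversal that remembers who forwarded the query to whom. The overlay topology below and the hop limit (`ttl`) value are assumptions for illustration; the node names are borrowed from the search-mechanism slide, but the links between them are made up.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl=3):
    """Flood a query from `start`; return the reply path or None.

    neighbors: dict node -> list of neighbor nodes (the overlay)
    files:     dict node -> set of file names stored at that node
    The returned list runs from the node holding the file back to `start`,
    i.e. the route a back-propagated reply would take.
    """
    came_from = {start: None}            # who forwarded the query to each node
    frontier = deque([(start, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if node != start and wanted in files.get(node, set()):
            # Hit: walk the reverse query path back to the initiator.
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path
        if hops_left == 0:
            continue                     # hop limit reached, stop forwarding
        for nxt in neighbors.get(node, []):
            if nxt not in came_from:     # ignore queries already seen
                came_from[nxt] = node
                frontier.append((nxt, hops_left - 1))
    return None


if __name__ == "__main__":
    overlay = {
        "UBC": ["Calgary"],
        "Calgary": ["UBC", "Chicago"],
        "Chicago": ["Calgary", "Boston", "MIT"],
        "Boston": ["Chicago"],
        "MIT": ["Chicago"],
    }
    stored = {"MIT": {"Yellow Submarine"}}
    print(flood_search(overlay, stored, "UBC", "Yellow Submarine", ttl=3))
    print(flood_search(overlay, stored, "UBC", "Yellow Submarine", ttl=2))
```

With this made-up overlay, a hop limit of 3 reaches the node holding the file, while a limit of 2 does not, which illustrates why search coverage in a flooding design depends on how far queries are allowed to travel.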

24
BitTorrent
  • Ingredients
  • A seed node that has the file
  • A .torrent meta-file is built for the file
  • A web server (usually) to index torrents
  • A tracker node is associated with each file
  • Identified in the .torrent
  • File is split into fixed-size segments (e.g.,
    256 KB)
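
Splitting a file into fixed-size segments and recording one hash per segment can be sketched as follows. This is a simplified illustration, not the real .torrent format (which is bencoded and has its own field names); the dictionary keys and the tracker URL are assumptions, and SHA-1 is used here as the per-segment hash.

```python
import hashlib

SEGMENT_SIZE = 256 * 1024  # 256 KB, the example size from the slide


def segment_hashes(data: bytes, segment_size: int = SEGMENT_SIZE) -> list[str]:
    """Hash each fixed-size segment; the last segment may be shorter."""
    return [
        hashlib.sha1(data[i:i + segment_size]).hexdigest()
        for i in range(0, len(data), segment_size)
    ]


def build_metafile(name: str, data: bytes, tracker_url: str) -> dict:
    """Build a simplified stand-in for the .torrent meta-file."""
    return {
        "name": name,
        "length": len(data),
        "tracker": tracker_url,          # identifies the tracker for this file
        "segment_size": SEGMENT_SIZE,
        "hashes": segment_hashes(data),  # lets peers verify each segment
    }


if __name__ == "__main__":
    content = b"x" * (2 * SEGMENT_SIZE + 10)
    meta = build_metafile("demo.bin", content, "http://tracker.example/announce")
    print(meta["length"], len(meta["hashes"]))
```

Per-segment hashes let a downloader verify each segment independently, which matters when segments arrive from many untrusted peers.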

25
How does it work?
26
Overview: system components
32
BitTorrent Overview
  • Join: nothing
  • just find a server/community
  • Publish: create a tracker, spread the .torrent file
  • Search
  • for a file (not included in the protocol)
  • the community is supposed to provide search tools
  • for segments: exchange segment ID maps with
    other peers.
  • Fetch: exchange segments with other peers (HTTP)
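
Exchanging segment ID maps lets each peer work out which segments it can request from whom. A minimal sketch, assuming each map is a list of booleans indexed by segment ID and that the first peer offering a segment is chosen; both the representation and the selection policy are assumptions, and real clients use more careful policies.

```python
def missing_segments(mine: list[bool], theirs: list[bool]) -> list[int]:
    """IDs of segments the other peer has and we still need."""
    return [
        seg for seg, (have, offered) in enumerate(zip(mine, theirs))
        if offered and not have
    ]


def plan_requests(mine: list[bool],
                  peer_maps: dict[str, list[bool]]) -> dict[int, str]:
    """Map each needed segment ID to one peer that can supply it."""
    plan = {}
    for peer, theirs in peer_maps.items():
        for seg in missing_segments(mine, theirs):
            plan.setdefault(seg, peer)   # keep the first peer found
    return plan


if __name__ == "__main__":
    my_map = [True, False, False, False]
    others = {
        "p1": [True, True, False, False],
        "p2": [False, True, True, False],
    }
    print(plan_requests(my_map, others))
```

A segment that no known peer offers simply stays out of the plan until a later map exchange reveals a source for it.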

33
Gnutella vs. BitTorrent: Discussion
  • System properties
  • Reliability?
  • Scalability?
  • Fairness?
  • Overheads?
  • Quality of Service
  • Search coverage for content?
  • Ability to download content fast?
  • Ability to survive flash crowds?

The rest of this course: How to build
(distributed) systems with desirable properties.
34
Assignment 0
  • To do: Subscribe to the mailing list

36
Gnutella: Network Resilience
[Figure: three views of the overlay. Panels: Topology; Random: 30% die; Targeted: 4% die]
from Saroiu et al., MMCN 2002
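
The contrast between random and targeted failures can be reproduced on a toy overlay. The sketch below assumes a hypothetical hub-and-leaf topology, not the measured Gnutella graph, and compares the largest connected component after removing a leaf versus removing the highest-degree node.

```python
def make_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                      # depth-first traversal
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


if __name__ == "__main__":
    # Hypothetical overlay: one hub, five leaves, one extra leaf-to-leaf link.
    edges = [("hub", f"leaf{i}") for i in range(1, 6)] + [("leaf1", "leaf2")]
    overlay = make_graph(edges)
    top = max(overlay, key=lambda n: len(overlay[n]))   # highest-degree node
    print("remove a leaf:", largest_component(overlay, {"leaf3"}))
    print("remove", top + ":", largest_component(overlay, {top}))
```

Losing a leaf leaves the rest of the overlay connected, while losing the hub fragments it, which is the qualitative effect the figure illustrates.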
37
Gnutella: Query distribution
  • Highly heterogeneous distribution for query
    popularity
  • similar to Web page popularity
  • => caching will work well


from Kunwadee et al., 2002
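
Why a skewed popularity distribution favors caching can be checked with a short calculation. The sketch assumes query popularity follows a Zipf-like law with exponent `alpha` (an assumption; the slide only says the distribution is highly heterogeneous) and that the cache holds the most popular items.

```python
def zipf_hit_rate(num_items: int, cache_size: int, alpha: float = 1.0) -> float:
    """Fraction of requests served by a cache of the top `cache_size` items.

    Item at popularity rank r is requested with weight 1 / r**alpha.
    """
    weights = [1.0 / rank ** alpha for rank in range(1, num_items + 1)]
    return sum(weights[:cache_size]) / sum(weights)


if __name__ == "__main__":
    # Caching 1% of the items under skewed vs. uniform popularity.
    print(zipf_hit_rate(1000, 10, alpha=1.0))
    print(zipf_hit_rate(1000, 10, alpha=0.0))
```

With `alpha=0` every item is equally popular and caching 1% of the items serves 1% of the requests; with a skewed distribution the same small cache serves a much larger share.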
38
Gnutella Topology issues (1)
39
Gnutella Topology Mismatch