Slides: 30
Provided by: marume
Learn more at: http://alumni.cs.ucr.edu

1
Planet Scale Software Updates
  • Christos Gkantsidis
  • Thomas Karagiannis
  • Pablo Rodriguez
  • Milan Vojnovic

SIGCOMM 2006
2
Patches
  • Patches (program snippets) upgrade existing
    software with the intention to
  • Fix security vulnerabilities
  • Update drivers
  • Distribute new virus definitions
  • Release new functionality
  • Patches are crucial for maintaining a high level
    of protection, since they deliver updated
    services and applications

3
Problems with Dissemination of software updates
  • Large scale and fast dissemination of software
    updates to millions of Internet users
  • During certain periods, patch distribution can
    account for a large fraction of the traffic on
    servers and across the Internet
  • The growing popularity of software updates
    requires us to know more about
  • The process of creating and releasing patches
  • Traffic characteristics of patch distribution
  • The potential of alternative distribution
    strategies

4
Motivation and Contribution
  • Goal: Find general principles and properties that
    can serve as guidelines for designing fast,
    cost-effective, planet-scale patch dissemination
  • Contributions
  • Clustering patches into groups improves the
    effectiveness of the system
  • Approximately 80% of IPs appear during the first
    day of a patch release
  • Approximately 20% of machines are always online
  • Computers that use the update service tend to be
    kept up to date
  • Existing caches decrease the workload on servers
    by 25% to 35%, while full cache deployment by
    ISPs would reduce it by almost 70%
  • P2P can considerably reduce the load on the
    server and P2P locality can reduce inter-ISP
    traffic while disseminating patches

5
Outline
  • Introduction
  • System and Data Description
  • Characteristics of Patches
  • User Characterization
  • Patch Dissemination Strategies
  • Conclusion

6
Windows Update System
  • System Description
  • Update servers: Query for new updates
  • Distribution servers: Download updates

7
Traffic on Update and Distribution servers
Number of update queries and corresponding
downloads over three days
8
Dataset Characteristics
  • Some definitions
  • Knowledge Base: Set of patches to fix a
    vulnerability
  • Service pack: Large collection of knowledge
    bases
  • Datasets

9
Characteristics of Patches
  • Two ways to update a file
  • Replacing the actual file
  • Download and install the files that need to be
    updated
  • Using a patch file
  • A patch file (delta) describes the difference
    between the existing version on the user's
    computer and the newest version
  • It is more efficient to send patches than the
    files themselves
  • XP SP2: mean file size = 73.2 KB, while mean patch
    size = 32.9 KB
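
As a rough illustration of the size figures above, the sketch below computes the fraction of bytes saved when a delta is shipped instead of the full file. It uses only the two means quoted on the slide; the function name `delta_savings` is made up for this example, and a ratio of means is only an approximation of the average per-file saving.

```python
# Mean sizes quoted on the slide for XP SP2, in KB.
MEAN_FILE_KB = 73.2
MEAN_PATCH_KB = 32.9


def delta_savings(full_kb: float, delta_kb: float) -> float:
    """Fraction of bytes saved by sending the delta instead of the full file."""
    if full_kb <= 0:
        raise ValueError("full_kb must be positive")
    return 1.0 - delta_kb / full_kb


savings = delta_savings(MEAN_FILE_KB, MEAN_PATCH_KB)
print(f"Approximate mean saving: {savings:.1%}")  # roughly 55%
```

Under these numbers, a delta is a little less than half the size of the file it updates.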

10
Characteristics of Patches
  • Designing efficient mechanisms requires
  • Number and size of files affected
  • Frequency of update release
  • Relation between the individual patches
  • Problem
  • User machines can be in a large number of
    configuration states because they are patched
    at different times
  • Solution
  • Examine relationships between updates for
    individual files and cluster the ones which are
    updated together

11
Clustering of Files
  • Used traces of requests for Windows XP SP2
    distribution and full history of updates in XP
    source tree (Data Set-II and Set-IV)
  • To quantify the sets of files that are clustered
    together, compute the cosine correlation between
    every pair of files i and j
  • Files are assumed to be correlated if the
    correlation is > 0.9
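
The transcript does not preserve the exact formula, so the following is only a minimal sketch of the idea using the standard cosine similarity. It assumes each file is represented by a 0/1 vector over update events (1 if the file changed in that event), and it assumes that correlated files are grouped as connected components; both choices, the file names, and the toy data are illustrative, not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Standard cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.9):
    """Group files whose pairwise cosine exceeds the threshold.

    Groups are connected components of the "correlated" relation,
    built with a small union-find structure.
    """
    parent = {name: name for name in vectors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in vectors:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: one column per update event.
updates = {
    "a.dll": [1, 1, 0, 1],
    "b.dll": [1, 1, 0, 1],
    "c.sys": [0, 0, 1, 0],
}
groups = cluster(updates)
print(groups)
```

Files that always change together end up in one group, so a client can be described by which groups it has applied rather than by the state of every individual file.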

12
Clustering of Files
  • Dataset has requests for 2029 files
  • 26 non-overlapping groups identified for 2003
    files
  • The 5 largest groups account for 1877 files and
    are responsible for 95% of the requests

Number of requests satisfied by publishing an
increasing number of groups for File Clustering
Group size distribution for File Clustering
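
The coverage figures on this slide can be checked with a few lines of arithmetic. The sketch below also shows one way a "requests satisfied versus number of groups published" curve could be computed; the per-group request counts are hypothetical, and publishing groups in decreasing order of request count is an assumption.

```python
from itertools import accumulate


def coverage_curve(requests_per_group, total_requests):
    """Cumulative fraction of requests satisfied as groups are published,
    most-requested group first."""
    ordered = sorted(requests_per_group, reverse=True)
    return [c / total_requests for c in accumulate(ordered)]


# File counts quoted on the slide.
TOTAL_FILES = 2029
grouped_share = 2003 / TOTAL_FILES    # files that fall into the 26 groups
top5_share = 1877 / TOTAL_FILES       # files in the 5 largest groups
print(f"Files in any group: {grouped_share:.1%}")
print(f"Files in the 5 largest groups: {top5_share:.1%}")

# Hypothetical request counts per group, for illustration only.
curve = coverage_curve([50, 10, 30, 5], total_requests=100)
print(curve)
```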
13
Clustering of patches
  • Repeated same analysis with patch dataset
  • For 3379 patches, identified 125 groups for 3188
    patches
  • Analysis indicates that
  • Decreasing clustering efficiency
  • Increasing complexity of system

Group size distribution for Patch Clustering
Number of requests satisfied by publishing an
increasing number of groups for Patch Clustering
14
User Characterization
  • Traffic Properties
  • Extract arrival patterns of update queries
  • Queries arrive from two types of machines
  • Always online machines (AOM): Have an automatic
    update service that periodically queries for
    updates
  • Non-AOM: Go on and off, and stay offline for a
    period longer than the pre-specified query
    interval
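
A minimal sketch of how the AOM definition above might be applied to a query trace. It assumes a machine counts as always online when no gap between consecutive queries exceeds the query interval plus some slack; the function name, the `slack` parameter, and the 24-hour interval in the example are assumptions, and the estimator actually used in the paper may differ.

```python
def is_always_online(query_times, query_interval, slack=0.0):
    """Return True if no gap between consecutive queries exceeds
    query_interval + slack. Times and intervals share one unit (here hours)."""
    times = sorted(query_times)
    if len(times) < 2:
        return False  # a single query gives no evidence of periodic polling
    return all(later - earlier <= query_interval + slack
               for earlier, later in zip(times, times[1:]))


# Hypothetical traces, in hours since the start of the observation window.
steady = [0, 24, 48, 72]     # polls once per interval
on_off = [0, 24, 96]         # offline for three intervals

print(is_always_online(steady, query_interval=24))  # True
print(is_always_online(on_off, query_interval=24))  # False
```

In a short trace, machines that were offline before or after the observation window can still pass this test, so the result is best read as an upper estimate.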

15
Distinct IPs over time
  • Examined aggregate volume of user queries with
    respect to distinct IPs using Set-I
  • Approximately 80% of observed IPs appear within
    the first day
  • The number of new IPs per day drops sharply with
    the number of days since the initial observation
    day

Rate of distinct IPs observed over three
days (Peaks are due to the local time differences)
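
The "new IPs per day" measurement can be sketched as follows, assuming the trace is available as (day index, IP) pairs. The record layout and the sample addresses are hypothetical.

```python
def first_seen_fraction_by_day(observations):
    """Map each day to the fraction of all distinct IPs first seen on that day.

    observations: iterable of (day_index, ip) pairs, in any order.
    """
    first_day = {}
    for day, ip in observations:
        if ip not in first_day or day < first_day[ip]:
            first_day[ip] = day

    counts = {}
    for day in first_day.values():
        counts[day] = counts.get(day, 0) + 1

    total = len(first_day)
    return {day: counts[day] / total for day in sorted(counts)}


# Hypothetical trace: five distinct IPs, four of them first seen on day 0.
trace = [
    (0, "192.0.2.1"), (0, "192.0.2.2"), (0, "192.0.2.3"), (0, "192.0.2.4"),
    (1, "192.0.2.1"), (1, "192.0.2.5"),
    (2, "192.0.2.2"),
]
fractions = first_seen_fraction_by_day(trace)
print(fractions)
```

Repeat queries from an IP that has already been seen do not count again, which is why the per-day fractions sum to 1.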
16
Time-of-day effects
  • Europe exhibits the largest query arrival rates
    at the beginning and at the end of the day, due
    to the initial observation time
  • The peak for North America falls within the
    interval 1300 to 1600 hours

Fraction of distinct IPs per continent observed
within the first day versus time
17
Uniformity and burstiness of queries
  • For 50% of the ASes that account for more than
    90% of the distinct IP population, query
    arrivals are not uniform
  • Burstiness is in many cases larger if the
    AS-aggregate query rates are uniform in time

Number of queries for two European ISPs in the
same country over time. Two geographically
collocated ISPs show dissimilar query arrival
patterns within the day because of the different
profiles of their subscribers
(residential: ISP1 vs. corporate: ISP2).
18
Estimated Always on-line Machines
  • Important since they can be instantaneously
    patched using an ideal push patching system
  • Approximately 20% of the population is always
    online and thus could be patched immediately

Estimated percentage of distinct IPs classified
as AOM per country versus the total number of
distinct IPs per country
19
Frequency of computer updates
  • How up-to-date computers are kept around the
    world
  • Users in the US and Japan (90%) keep their
    machines highly updated
  • The percentage in China or France is 50-70%

Distribution of the number of requests for
different delta sizes across different countries
20
Frequency of computer updates
  • 90% of the population is highly updated
  • This shows the importance of automatic patching
    schemes
  • During SP2 distribution, fewer than 5% of users
    had the most recent updates, with 22% of users
    updating from SP1 versions and 60% from XP RTM
    versions

Requests per delta included in SP2
21
Patch Dissemination Strategies
  • Alternative update delivery strategies
  • Caching
  • Peer-to-Peer
  • Peer-to-Peer with locality
  • To evaluate the alternative strategies, hosts
    are assumed to be partitioned into groups called
    subnets
  • Effects of alternative policies in reducing
  • Server load
  • Inter-subnet traffic

22
Caching-Web Caching
  • Currently deployed web caches are used for update
    dissemination
  • Load reduction at the server (α)
  • Experimental results

Fraction of updates needed in subnets covered
by caches
Mean number of updates per subnet over subnets
that deploy caches
23
Caching-Full Deployment
  • Goal: An ideal caching deployment (µ = 1)
  • If every subnet has a cache, the server needs to
    serve at most as many copies of an update as the
    number of subnets
  • Using the dataset Set-II (manual downloads),
  • For file distribution
  • S = 4.18 ⇒ α = 76%
  • For delta distribution
  • S = 3.01 ⇒ α = 67%
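
The two pairs of numbers above are consistent with the relation α = 1 − 1/S, which follows if S is read as the mean number of downloads of an update per subnet and a full cache deployment lets the server send each update to a subnet only once. That reading of S is an inference from the slide, not something the transcript states; the sketch below simply checks the arithmetic.

```python
def full_cache_reduction(s: float) -> float:
    """Server load reduction when each subnet fetches an update only once.

    s is assumed to be the mean number of downloads of an update per subnet
    without caching (an interpretation, not stated in the transcript).
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    return 1.0 - 1.0 / s


for label, s in (("file distribution", 4.18), ("delta distribution", 3.01)):
    print(f"{label}: S = {s}, alpha = {full_cache_reduction(s):.0%}")
```

With S = 1 every subnet needs each update exactly once, so a cache saves nothing; the saving grows as more hosts per subnet request the same update.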

24
Peer-to-Peer
  • P2P is attractive for disseminating updates
  • Advantages
  • Self scalable
  • Capacity increases with number of users
  • Copes well with flash-crowds
  • Challenges
  • Average patch size is small
  • Potentially large set of patches
  • Multiple versions per patch
  • Secure and Timely patch delivery
  • Protecting user privacy
  • Significant savings for the content provider and
    end users arise only if a large number of peers
    target the same version of the same patch at the
    same time

25
Peer-to-Peer with Locality
  • Augment peer matching algorithm to give
    preference to local connections
  • Experiments
  • Estimate amount of data downloaded from remote
    subnets and amount of data uploaded to other
    subnets
  • Performed trace-driven simulations to estimate
    the workload reduction
  • Findings
  • Locality decreases the amount of data uploaded
    per subnet by a factor that decreases
    exponentially with the mean number of active
    users per subnet
  • With locality, the ratio of uploads to downloads
    per subnet increases as a function of the size of
    the subnet
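
One simple way to "give preference to local connections" is to fill a peer list with same-subnet peers first and fall back to remote peers only when there are not enough local ones. The sketch below assumes that strict local-first policy; the function name, the (peer id, subnet) record layout, and the sample peers are hypothetical, and the matching algorithm evaluated in the paper may be more nuanced.

```python
import random


def pick_peers(requester_subnet, candidates, k, rng=random):
    """Choose up to k peers, preferring those in the requester's subnet.

    candidates: list of (peer_id, subnet) tuples.
    rng: the random module or a random.Random instance (for reproducibility).
    """
    local = [p for p in candidates if p[1] == requester_subnet]
    remote = [p for p in candidates if p[1] != requester_subnet]
    rng.shuffle(local)    # randomize within each class to spread load
    rng.shuffle(remote)
    return (local + remote)[:k]


# Hypothetical swarm for one patch.
peers = [("p1", "A"), ("p2", "B"), ("p3", "A"), ("p4", "C")]
chosen = pick_peers("A", peers, k=3, rng=random.Random(0))
print(chosen)
```

With this policy, inter-subnet traffic is generated only when a subnet has fewer than k active peers holding the patch, which matches the intuition that larger subnets benefit more from locality.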

26
Comparison of Strategies
  • Aggregate server load for the distribution of one
    patch with (i) client-server, (ii) caching,
    (iii) P2P with upload time equal to download
    time, (iv) P2P with upload time twice the
    download time, and (v) P2P with as much data
    uploaded as downloaded (full)

27
Conclusion
  • Characterized a large commercial update
    service(WU)
  • Patch distribution systems
  • Use a near-push functionality
  • Have distinct traffic patterns
  • Require minimum delivery time
  • Automatic software updating is one of the
    prominent architectures
  • To reduce complexity, patches can be clustered
    into groups
  • Evaluated applicability of caching and P2P to
    disseminate patches
  • P2P has great potential for fast and effective
    patch delivery

28
Comments
  • Pros
  • Includes a good motivation, experiments,
    practical usage for companies, theory and math
  • Emphasizes how important a good dataset is
  • Well organized
  • The ideas are summarized before getting deeper
    into the details
  • Figures explain most of the written material
  • Cons
  • Some of the formulas given were too abstract
  • Too many references to the technical report
  • Apart from the appendix, there is no difference
    between the paper and the technical report

29
Happy End! Thank you for your patience!