Correlation-Aware Object Placement for Multi-Object Operations

1
Correlation-Aware Object Placement for
Multi-Object Operations
  • Ming Zhong, Kai Shen, Joel Seiferas
  • University of Rochester

2
Problem Overview
  • Multi-object operations in data-intensive
    applications
  • multi-word searches in full-text keyword search
    engines
  • aggregation queries in distributed databases.
  • Operations involving multiple distributed objects
    incur communication and synchronization overhead.
  • Correlation between two objects
  • probability that they are requested together in
    an operation.
  • Correlation-aware object placement
  • intuitive to place highly correlated objects
    together
  • goal: reduce communication overhead subject to
    per-machine capacity constraint (load balance).
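
The correlation definition above can be estimated directly from a request trace. The following is a minimal sketch, assuming each operation is recorded as a list of object names; the function name `pair_correlations` and the sample trace are hypothetical and not taken from the presentation.

```python
from collections import Counter
from itertools import combinations


def pair_correlations(operations):
    """Estimate r(i, j) as the fraction of operations requesting both i and j."""
    total = len(operations)
    if total == 0:
        return {}
    counts = Counter()
    for op in operations:
        # Count each unordered pair of distinct objects once per operation.
        for i, j in combinations(sorted(set(op)), 2):
            counts[(i, j)] += 1
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical multi-word search trace (illustrative data only).
trace = [
    ["car", "rental"],
    ["car", "rental", "cheap"],
    ["cheap", "flights"],
    ["car", "insurance"],
]
r = pair_correlations(trace)
```

Pairs are keyed by their sorted names, so `("car", "rental")` and `("rental", "car")` refer to the same entry.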

3
Realistic Object Correlation Patterns
  • Skewed correlations → sufficient benefit
  • Stable correlations → low adjustment overhead
  • Illustration of skewness and stability in real
    keyword search engine traces (at Ask.com)

The most correlated pair is 177 times more correlated
than the 1000th most correlated pair.
Only 1.2% of pairs had correlations that changed by at
least a factor of two after one month.

4
Problem Context
  • Many large applications are data-intensive and
    distributed.
  • Multi-object operations are increasingly common
  • Yu et al. (2006) studied availability of
    multi-object operations
  • we study the communication and synchronization cost
    of multi-object operations in distributed systems
  • Data skewness in large real-world data sets
  • e.g., web object popularity follows Zipf
    distribution
  • we identify and utilize similar skewness of
    object correlations in multi-object operations

5
Analytical Problem Formulation
  • Input parameters
  • r(i,j): correlation between objects i and j
  • w(i,j): cost between objects i and j when they are
    placed apart
  • s(i): capacity usage of object i
  • c(k): capacity constraint at machine k
  • Object placement variables
  • x(i,k) = 1 if object i is placed at machine k, 0
    otherwise
  • z(i,j) = 0 if objects i and j are placed together, 1
    otherwise
  • Optimization target: minimize
    $\sum_{i,j} r(i,j)\, w(i,j)\, z(i,j)$
  • Capacity constraint at each machine k:
    $\sum_{i} x(i,k)\, s(i) \le c(k)$
  • Problem can be reduced to minimum n-way cut (n is
    the number of machines)
  • NP-hard
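
To make the formulation concrete, the objective and the capacity constraint can be evaluated for any given integral placement. This is a small sketch under assumed data structures (dictionaries keyed by object or by object pair); the helper names and the toy numbers are hypothetical.

```python
def placement_cost(placement, r, w):
    """Sum r(i,j) * w(i,j) over pairs placed on different machines (z(i,j) = 1)."""
    return sum(
        r[(i, j)] * w[(i, j)]
        for (i, j) in r
        if placement[i] != placement[j]
    )


def within_capacity(placement, s, c):
    """Check that the total s(i) of objects at each machine k is at most c(k)."""
    load = {k: 0.0 for k in c}
    for obj, k in placement.items():
        load[k] += s[obj]
    return all(load[k] <= c[k] for k in c)


# Toy instance (illustrative values only).
r = {("a", "b"): 0.5, ("a", "c"): 0.1, ("b", "c"): 0.2}
w = {pair: 1.0 for pair in r}
s = {"a": 1.0, "b": 1.0, "c": 1.0}
c = {0: 2.0, 1: 2.0}

together = {"a": 0, "b": 0, "c": 1}   # a and b co-located
all_on_one = {"a": 0, "b": 0, "c": 0}  # zero cost, but overloads machine 0

cost_together = placement_cost(together, r, w)
```

Placing everything on one machine drives the cost to zero but violates the capacity constraint, which is why the two must be considered jointly.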

6
Simplified Variant: Linear Programming
  • Optimization target: minimize
    $\sum_{i,j} r(i,j)\, w(i,j)\, z(i,j)$
  • Constraints: for every machine k,
    $\sum_{i} x(i,k)\, s(i) \le c(k)$
  • Relax to fractional object placement
  • x(i,k): proportion of object i placed at machine
    k
  • $0.0 \le x(i,k) \le 1.0$ and $\sum_{k} x(i,k) = 1.0$
  • → The problem becomes a linear program, solvable in
    polynomial time
  • The fractional solution values x(i,k) must be
    rounded to integers
  • rounding is very coarse-grained
  • naïve rounding may dramatically inflate the
    minimization target (cost of multi-object
    operations)
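
Written out in full, the relaxed program on this slide might look as follows. The slide does not say how z(i,j) is tied to the fractional x(i,k); the last constraint below is one standard linearization and is an assumption here, not something the slides confirm.

```latex
\begin{aligned}
\text{minimize}\quad
  & \sum_{i,j} r(i,j)\, w(i,j)\, z(i,j) \\
\text{subject to}\quad
  & \sum_{i} x(i,k)\, s(i) \le c(k)
      && \forall k \\
  & \sum_{k} x(i,k) = 1.0
      && \forall i \\
  & 0.0 \le x(i,k) \le 1.0
      && \forall i, k \\
  & z(i,j) = \tfrac{1}{2} \sum_{k} \lvert x(i,k) - x(j,k) \rvert
      && \forall i, j \quad \text{(assumed linearization)}
\end{aligned}
```

With integral x(i,k), the assumed constraint gives z(i,j) = 0 when i and j share a machine and z(i,j) = 1 otherwise, matching the original definition. Each absolute value can be replaced by an auxiliary variable bounded below by both signed differences, which keeps the program linear.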

7
Probabilistic Rounding
  • Loop until all objects are placed:
  • pick a random machine k
  • pick a random probability r in $[0,1]$
  • check every unplaced object i; place i at k if
    $r \le x(i,k)$
  • Probabilistic results
  • object i is placed at machine k with probability
    x(i,k)
  • expected cost (after the rounding) is at most
    twice the cost of the original fractional
    solution
  • expected capacity need (after the rounding) at
    each machine does not change
  • Strength: a polynomial-time 2-approximation is a
    strong result compared to related NP-hard
    problems.
  • Weakness: a probabilistic expectation is not a
    guarantee.
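
The rounding loop on this slide can be sketched in a few lines. This is a minimal illustration, assuming the fractional solution is stored as a nested dictionary `x[i][k]` whose rows sum to 1; the function name, the data layout, and the sample values are hypothetical.

```python
import random


def probabilistic_rounding(x, machines, rng=None):
    """Round fractional placements x[i][k] to exactly one machine per object."""
    rng = rng or random.Random()
    unplaced = set(x)
    placement = {}
    while unplaced:
        k = rng.choice(machines)  # pick a random machine
        r = rng.random()          # pick a random threshold in [0, 1)
        for i in list(unplaced):
            # Strict comparison: with r drawn from [0, 1), an object is
            # accepted in this round with probability exactly x[i][k].
            if r < x[i].get(k, 0.0):
                placement[i] = k
                unplaced.discard(i)
    return placement


# Hypothetical fractional solution (illustrative values only).
x = {
    "a": {0: 1.0},
    "b": {1: 1.0},
    "c": {0: 0.5, 1: 0.5},
    "d": {0: 0.5, 1: 0.5},
}
placement = probabilistic_rounding(x, machines=[0, 1], rng=random.Random(0))
```

Because one threshold r is shared by all unplaced objects in a round, objects with identical fractional placements (here "c" and "d") always end up on the same machine, which is how the rounding preserves co-location.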

8
Important-Object Partial Optimization
  • Overhead
  • the number of variables/constraints in the linear
    program is at least O(objects × machines), which
    can be too large!
  • Important-object partial optimization
  • derive optimal placement only for a few important
    objects (those incurring most of the cost in
    multi-object operations)
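
One way to pick the optimization scope is to rank objects by the pairwise cost they participate in and keep the top few. The slides do not define how importance is measured, so the ranking below is an assumption; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def important_objects(r, w, top_n):
    """Return the top_n objects ranked by the total r(i,j) * w(i,j) they appear in."""
    weight = defaultdict(float)
    for (i, j), corr in r.items():
        cost = corr * w[(i, j)]
        weight[i] += cost
        weight[j] += cost
    # Highest total cost first; ties broken by object name for determinism.
    ranked = sorted(weight, key=lambda obj: (-weight[obj], obj))
    return ranked[:top_n]


# Toy correlations (illustrative values only).
r = {("a", "b"): 0.5, ("a", "c"): 0.1, ("b", "c"): 0.2, ("c", "d"): 0.05}
w = {pair: 1.0 for pair in r}
scope = important_objects(r, w, top_n=2)
```

Only the objects in `scope` would then be passed to the linear program; the rest could be placed by a cheaper method such as random placement.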

Example: dominance of important keywords in the
multi-word search trace at Ask.com.

9
Trace-driven Performance Evaluation
  • Trace-driven keyword index placement to minimize
    the communication cost of multi-keyword searches
  • Compare three data placement approaches on the
    overhead of multi-object operations:
  • Linear programming with probabilistic rounding
  • Random placement
  • Greedy placement: place the most correlated object
    pairs together, subject to the per-machine capacity
    constraint
  • Load balance: the per-machine capacity constraint is
    twice the average per-machine load
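
The greedy baseline and the capacity rule above can be sketched as follows. The slides do not specify tie-breaking or where leftover objects go, so choosing the least-loaded machine in both cases is an assumption, as is the requirement that no single object exceeds the average per-machine load. All names and sample values are hypothetical.

```python
def greedy_placement(r, s, num_machines):
    """Co-locate the most correlated pairs first, within a capacity limit."""
    # Capacity per machine: twice the average per-machine load.
    capacity = 2.0 * sum(s.values()) / num_machines
    load = [0.0] * num_machines
    placement = {}

    def place(obj, k):
        placement[obj] = k
        load[k] += s[obj]

    def least_loaded():
        return min(range(num_machines), key=load.__getitem__)

    # Visit pairs from most to least correlated.
    for (i, j), _ in sorted(r.items(), key=lambda item: -item[1]):
        if i in placement and j in placement:
            continue
        if i in placement or j in placement:
            placed, other = (i, j) if i in placement else (j, i)
            k = placement[placed]
            if load[k] + s[other] <= capacity:
                place(other, k)  # join the already-placed partner
            continue
        k = least_loaded()
        if load[k] + s[i] + s[j] <= capacity:
            place(i, k)
            place(j, k)

    # Objects not placed via any pair go to the least-loaded machine.
    for obj in s:
        if obj not in placement:
            place(obj, least_loaded())
    return placement


# Toy instance (illustrative values only).
r = {("a", "b"): 0.5, ("c", "d"): 0.4, ("a", "c"): 0.1}
s = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 1.0}
placement = greedy_placement(r, s, num_machines=2)
```

In this toy run the two strongest pairs land on separate machines, and the weak ("a", "c") correlation is the only one left split.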

10
Result: Communication Overhead Reduction
  • Overhead reduction compared to random placement
    with varying optimization scope (number of most
    important keywords subject to optimization)

11
Result: Varying System Sizes
  • Cost reduction over random placement is between
    73% and 86%
  • Small variations of the cost reduction at
    different system sizes

12
Conclusion
  • Results
  • analytical aspect: polynomial-time linear
    programming and probabilistic rounding to produce
    a 2-approximation solution
  • systems aspect: important-object partial
    optimization to control overhead; evaluation
    using real application traces
  • Big picture: skewed, stable data distributions
    motivate per-object adaptation in distributed
    system management
  • adapt co-placement of correlated data objects for
    efficient multi-object operations (ICDCS'08)
  • adapt object replication degrees for high
    availability (EuroSys'08)
  • adapt Bloom filter object hash numbers for low
    false-positive rate (PODC'08)