AutoLoop: Automated Action Selection in the Observe-Analyze-Act Loop for Storage Systems

1
AutoLoop: Automated Action Selection in the Observe-Analyze-Act Loop for Storage Systems
Li Yin (U.C. Berkeley), Sandeep Uttamchandani (IBM), John Palmer (IBM), Randy Katz (U.C. Berkeley), Gul Agha (UIUC)
Presented by David Pease (IBM Almaden Research)
2
  • Jim Gray's Turing Award speech, "What Next? A Dozen IT Research Goals," 1999
      • Build a system used by millions of people each day, yet administered and managed by a ½-time person
      • On hardware fault, order a replacement part
      • On overload, adjust automatically

Observe system state, Analyze behavior, Activate
corrective actions
  • Self-management is a necessity and a key value differentiator today
      • Autonomic Computing [IBM]
      • Self-* Systems [CMU]
      • ROC [UC Berkeley/Stanford]
      • AutoAdmin [Microsoft]

- Impact on Total Cost of Ownership
- Scarce skilled administrators
- Growing number of system protocols, users, and application requirements
- One admin per 1-10 TB of storage [Gartner00]
  • Manifestations of the Observe-Analyze-Act Loop
      • Databases [Harvard, Margo Seltzer]
      • Networks [UC Berkeley, Randy Katz]
      • Storage systems [CMU, Greg Ganger]

3
The Observe-Analyze-Act Loop in Storage Systems (I)
[Figure: applications (E-mail, Data Warehousing, Web server, ...) with SLO goals, mapped onto storage resources by Storage Virtualization (mapping application data to storage resources)]
4
The Observe-Analyze-Act Loop in Storage Systems (II)
[Figure: the Observe, Analyze, and Act stages under changing conditions; plots of request size and IOPS over time for the SPC OLTP and Harvard Campus workloads]
  • Adding heterogeneous hardware: DAS, NAS, iSCSI
  • Workload access variations and load surges
  • Failures: hardware failures, software bugs, operator errors
  • Indeterminate goals
      • Imprecise information
      • Changes in number of users, business models, over-provisioning thresholds, performance requirements, etc.
5
Problem Statement: Automation of the Observe-Analyze-Act Loop
[Figure: applications (E-mail, Data Warehousing, Web server, ...) with workload access characteristics, application priorities/utility functions, and SLO goals (latency-bound, throughput-bound), mapped by Storage Virtualization (mapping application data to storage resources) onto components with given component properties]
  • Short-term actions ("painkillers"): low cost
      • Throttling, parameter tuning
  • Long-term actions ("vitamins"): high cost; modify the allocation of resources to data
      • Migration, replication
  • Permanent changes ("surgery"): addition of new hardware
6
Outline
  • Motivation
  • Bird's-eye view of Automated Storage Management
  • AutoLoop: Automated Action Selection
      • Action parameter selection
      • Action selection
  • Conclusion

7
The Management Loop
  • Describe aspects of system behavior: component models, workload models, action models
  • Specify system goals: minimize the number of workloads violating SLOs
  • Decide what to do (the system should invoke throttling), how to do it (workload 1 is throttled to 500 IOPS), and when to do it (it needs to start now)
  • Observe system behaviors and trigger the analyze engine
  • Current state
  • Execute action(s)
8
Goal
  • Automate the Observe-Analyze-Act loop
  • Focus on the key functionality of the Analyze part
      • Automated action selection
      • Decide between short-term and long-term actions

9
The Analyze Engine
Step 1: Generate possible corrective options
Step 2: Compare and pre-filter corrective options
Step 3: Select the action and schedule it
Option = <Action, Action Type, Invocation values>
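The three steps can be pictured as a small pipeline over options of the form <Action, Action Type, Invocation values>. The Python sketch below is illustrative only: the `Option` fields mirror the slide, while `analyze` and the stub step functions are hypothetical placeholders, not AutoLoop's implementation. Scheduling (the second half of Step 3) is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Option:
    """A corrective option: <Action, Action Type, Invocation values>."""
    action: str                 # e.g. "throttle" or "migrate"
    action_type: str            # "short-term" or "long-term"
    invocation_values: Dict[str, float] = field(default_factory=dict)


def analyze(state: dict,
            generate: Callable[[dict], List[Option]],
            prefilter: Callable[[List[Option]], List[Option]],
            select: Callable[[List[Option]], Optional[Option]]) -> Optional[Option]:
    """Hypothetical analyze engine: the three steps are passed in as functions."""
    candidates = generate(state)       # Step 1: generate possible corrective options
    survivors = prefilter(candidates)  # Step 2: compare and pre-filter them
    return select(survivors)           # Step 3: select one (scheduling not modeled)


# Usage with stub steps (all hypothetical).
def generate_stub(state: dict) -> List[Option]:
    return [Option("throttle", "short-term", {"workload1_token_rate": 500.0}),
            Option("migrate", "long-term", {"dataset_size_mb": 500.0})]


def keep_short_term(options: List[Option]) -> List[Option]:
    return [o for o in options if o.action_type == "short-term"]


def pick_first(options: List[Option]) -> Optional[Option]:
    return options[0] if options else None


chosen = analyze({}, generate_stub, keep_short_term, pick_first)
print(chosen.action, chosen.invocation_values)
```

Passing the steps as functions keeps each one replaceable, which matches the way the following slides treat generation, pre-filtering, and selection as separate problems.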

10
The Management Loop
11
Knowledge Base: Component Models
  • Objective: predict the service time for a given load at a component (for example, a storage controller)
      • $\text{Service\_time}_{\text{controller}} = L(R_1, \ldots, R_n)$
      • where $R_i$ denotes the load characteristics of workload i, including read/write ratio, request size, request rate, and random/sequential ratio
  • An example of a component model
      • Single block-level workload: request size 10 KB, read/write ratio 0.8, random access
      • Hardware configuration: FAStT900 storage controller, 8 disks, RAID 0
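A component model maps the per-workload load characteristics R_1, ..., R_n to a predicted service time. The models in this work are built for real hardware such as the configuration above; as a stand-in, the sketch below uses a simple queueing approximation, latency = s / (1 - utilization). Every cost coefficient in `per_request_cost_ms` is invented for illustration, and reading the slide's "read/write ratio 0.8" as 80% reads is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LoadCharacteristics:
    """R_i: load characteristics of workload i at one component."""
    request_rate: float     # requests per second
    request_size_kb: float  # average request size in KB
    read_ratio: float       # fraction of requests that are reads, 0..1
    random_ratio: float     # fraction of requests that are random, 0..1


def per_request_cost_ms(r: LoadCharacteristics) -> float:
    """Hypothetical service demand of one request, in milliseconds."""
    cost = 0.5 + 0.02 * r.request_size_kb     # fixed overhead + transfer time (assumed)
    cost *= 1.0 + 0.5 * (1.0 - r.read_ratio)  # writes assumed 50% costlier
    cost *= 1.0 + 2.0 * r.random_ratio        # random access assumed to add seek time
    return cost


def service_time_ms(loads: Sequence[LoadCharacteristics]) -> float:
    """Service_time_controller = L(R_1, ..., R_n), approximated by an M/M/1 queue."""
    total_rate = sum(r.request_rate for r in loads)
    if total_rate <= 0:
        return 0.0  # no requests arriving, so nothing to predict
    # Rate-weighted mean service demand over all workloads sharing the component.
    mean_cost = sum(r.request_rate * per_request_cost_ms(r) for r in loads) / total_rate
    utilization = total_rate * mean_cost / 1000.0
    if utilization >= 1.0:
        return float("inf")  # saturated: queueing delay grows without bound
    return mean_cost / (1.0 - utilization)  # M/M/1 mean response time


# Slide example: 10 KB random requests; 80% reads (assumed reading of the ratio).
w = LoadCharacteristics(request_rate=100.0, request_size_kb=10.0, read_ratio=0.8, random_ratio=1.0)
print(round(service_time_ms([w]), 2))
```

A fitted regression model would replace `per_request_cost_ms` and the queueing formula; the interface (characteristics in, service time out) is the part that matters for chaining models together.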

12
Knowledge Base: Workload Models
  • Objective: predict the load on component i as a function of workload j's features
      • $\text{Component\_load}_{i,j} = W_{i,j}(\text{workload } j \text{ characteristics})$
[Figure: example workload model; labels 1000 reqs/s, 10 MB/s, 500 IOPS]
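A workload model W_{i,j} translates what an application issues into the load a particular component actually sees. The sketch below is a hypothetical W_{i,j} that assumes only controller-cache misses reach the back-end disks; `cache_hit_ratio` is an assumed feature and the numbers in the usage line are made up, not taken from the slide's figure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WorkloadFeatures:
    """Features of workload j (cache_hit_ratio is an assumed feature)."""
    request_rate: float     # requests per second issued by the application
    request_size_kb: float  # average request size in KB
    cache_hit_ratio: float  # fraction of requests absorbed by the controller cache


def component_load(f: WorkloadFeatures) -> Dict[str, float]:
    """Hypothetical W_{i,j}: load that workload j places on back-end component i."""
    misses_per_s = f.request_rate * (1.0 - f.cache_hit_ratio)  # only misses reach disks
    return {
        "iops": misses_per_s,
        "mb_per_s": misses_per_s * f.request_size_kb / 1024.0,
    }


print(component_load(WorkloadFeatures(request_rate=200.0, request_size_kb=8.0, cache_hit_ratio=0.25)))
```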
13
Knowledge Base: Action Models
  • Objective: predict the effect of a corrective action on component load and workloads
  • Example
      • Workload with 20 KB request size, 0.642 read/write ratio, and 0.026 sequential access ratio

$\text{Request\_rate}_j = A_j(\text{Token\_issue\_rate}_j)$
[Figure: workload request rate vs. token issue rate]
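For throttling, the action model A_j relates the token issue rate granted to workload j to the request rate it actually achieves. A minimal sketch, assuming an idealized token bucket where the achieved rate is capped by both the workload's own demand and its token rate; the `efficiency` factor is a hypothetical stand-in for the measured curve on the slide.

```python
def achieved_request_rate(token_issue_rate: float, offered_rate: float,
                          efficiency: float = 1.0) -> float:
    """Hypothetical action model A_j for throttling.

    Each token admits one request, so workload j cannot run faster than its
    (possibly de-rated) token issue rate, nor faster than its own demand.
    """
    if token_issue_rate < 0 or offered_rate < 0:
        raise ValueError("rates must be non-negative")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return min(offered_rate, efficiency * token_issue_rate)


for tokens in (200.0, 500.0, 1000.0):
    print(tokens, achieved_request_rate(tokens, offered_rate=800.0))
```

Below the workload's demand the curve is linear in the token rate; above it, extra tokens have no effect, which is the saturation behavior a measured A_j would also need to capture.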
14
Knowledge Base: Interpolation Functions
  • Objective: describe workload/system trends and patterns
  • Example
      • Trend: the workload increases 5% every month
      • Pattern: the request rate peaks at 2-4 pm every day
      • The figure shows the load pattern on www.ibm.com over one week [Chase et al., SOSP 2001]
  • Time-series analysis to predict patterns and trends
      • Example: ARIMA (AutoRegressive Integrated Moving Average) algorithm
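To make time-series prediction concrete, the sketch below fits the simplest trend-capable ARIMA variant, ARIMA(1,1,0) with a constant (drift) term: it differences the series once (the "integrated" part) and fits one autoregressive coefficient plus an intercept by ordinary least squares. It has no moving-average or seasonal terms, so it can follow steady monthly growth but not a daily 2-4 pm peak; a full ARIMA implementation (for example, the one in statsmodels) would be used in practice. The data is synthetic.

```python
from typing import List, Sequence, Tuple


def fit_arima_110(series: Sequence[float]) -> Tuple[float, float]:
    """Fit dy_t = c + phi * dy_{t-1} by ordinary least squares on the differenced series."""
    diffs = [b - a for a, b in zip(series, series[1:])]  # d = 1: first differences
    x, y = diffs[:-1], diffs[1:]                          # AR(1) pairs (dy_{t-1}, dy_t)
    if len(x) < 2:
        raise ValueError("need at least 4 observations")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0  # constant differences: pure linear trend
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(series: Sequence[float], steps: int) -> List[float]:
    """Forecast `steps` future values by iterating the AR(1) model and re-integrating."""
    c, phi = fit_arima_110(series)
    level, diff = series[-1], series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff  # predicted next change
        level += diff          # undo the differencing
        forecasts.append(level)
    return forecasts


# Synthetic monthly load growing 5% per month.
monthly_load = [1000.0 * 1.05 ** t for t in range(12)]
print([round(v) for v in forecast_arima_110(monthly_load, 3)])
```

For geometric growth the differences themselves grow by a constant factor, so the fitted phi recovers 1.05 and the forecast continues the trend.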

15
The Management Loop
16
The Analyze Engine

17
Step 1: Generate Possible Corrective Actions
  • Throttling and migration serve as examples of short-term and long-term actions, respectively
  • Throttling
      • Token issue rate for each workload [Chameleon, USENIX 2005]
  • Migration
      • Dataset
      • Migration target
      • Migration speed

18
Corrective Action Generation: Intuitions
  • Formulated as constrained optimization problems
  • Two parts
      • Performance prediction for given parameters
          • Input: invocation parameters
          • Output: expected performance
      • Constrained optimization techniques to select optimal parameters

19
Part 1: Performance Prediction
  • Chain all models together to predict the result of an action
  • Example: throttling
[Figure: model chain linking the action model, workloads 1 to n, and the component model]
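Chaining composes the models so that candidate invocation values go in and predicted latency comes out. The self-contained sketch below chains three hypothetical placeholder models for throttling: a token-bucket action model, a workload model in which every admitted request becomes one I/O on a single shared component, and an M/M/1-style component model with an assumed fixed per-request cost `cost_ms`.

```python
from typing import Dict


def predict_latency_ms(token_rates: Dict[str, float],
                       offered_rates: Dict[str, float],
                       cost_ms: float = 2.0) -> Dict[str, float]:
    """Chain action -> workload -> component models to predict each workload's latency.

    All three models are hypothetical placeholders:
      action model:    admitted rate = min(offered rate, token issue rate)
      workload model:  every admitted request becomes one I/O on the shared component
      component model: M/M/1 latency with a fixed per-request cost `cost_ms`
    """
    # Action model (workloads without a token rate are left unthrottled).
    admitted = {w: min(rate, token_rates.get(w, float("inf")))
                for w, rate in offered_rates.items()}
    # Workload model: total I/O load on the single shared component.
    component_iops = sum(admitted.values())
    # Component model: one shared queue, so every workload sees the same latency.
    utilization = component_iops * cost_ms / 1000.0
    latency = float("inf") if utilization >= 1.0 else cost_ms / (1.0 - utilization)
    # With one component on the path, end-to-end latency is that component's latency.
    return {w: latency for w in offered_rates}


print(predict_latency_ms({"workload1": 100.0}, {"workload1": 300.0, "workload2": 50.0}))
```

With several components on a workload's path, the per-component latencies would be summed, as the throttling constraints require.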
20
Part 2: Optimal Invocation Parameter Selection: Throttling
  • Formulated as a constrained optimization problem
      • Variable: token issue rate for each workload ($t_i$)
      • Objective function: minimize the weighted throughput distance to the SLOs
      • Constraints
          • Workloads should meet their SLO latency goals
          • Each component's latency is predicted by model chaining
          • Latency is summed along the path
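A toy version of this optimization can be solved by brute force. The sketch below swaps in exhaustive grid search over candidate token rates in place of a real constrained optimizer, and assumes the objective takes the form sum_i weight_i * max(0, SLO_iops_i - achieved_i); the model chain (token bucket feeding one shared M/M/1 component with `cost_ms` per request) is hypothetical.

```python
import itertools
from typing import Dict, List, Optional, Tuple


def select_token_rates(offered: Dict[str, float],
                       slo_iops: Dict[str, float],
                       slo_latency_ms: Dict[str, float],
                       weights: Dict[str, float],
                       candidates: List[float],
                       cost_ms: float = 2.0) -> Optional[Tuple[Dict[str, float], float]]:
    """Grid search for token issue rates t_i (a stand-in for a real constrained optimizer).

    Objective (assumed form): minimize sum_i weight_i * max(0, SLO_iops_i - achieved_i).
    Constraint: every workload's predicted latency must not exceed its SLO latency.
    """
    names = sorted(offered)
    best: Optional[Tuple[Dict[str, float], float]] = None
    for combo in itertools.product(candidates, repeat=len(names)):
        tokens = dict(zip(names, combo))
        # Hypothetical model chain: token bucket -> shared M/M/1 component.
        achieved = {w: min(offered[w], tokens[w]) for w in names}
        utilization = sum(achieved.values()) * cost_ms / 1000.0
        if utilization >= 1.0:
            continue  # saturated component: infeasible
        latency = cost_ms / (1.0 - utilization)
        if any(latency > slo_latency_ms[w] for w in names):
            continue  # violates a latency SLO
        distance = sum(weights[w] * max(0.0, slo_iops[w] - achieved[w]) for w in names)
        if best is None or distance < best[1]:
            best = (tokens, distance)
    return best


result = select_token_rates(
    offered={"w1": 400.0, "w2": 200.0},
    slo_iops={"w1": 300.0, "w2": 150.0},
    slo_latency_ms={"w1": 10.0, "w2": 10.0},
    weights={"w1": 1.0, "w2": 2.0},
    candidates=[100.0, 150.0, 200.0, 300.0, 400.0],
)
print(result)
```

Grid search grows exponentially with the number of workloads, which is why a formulation as a constrained optimization problem matters beyond a handful of workloads.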

21
Part 2: Invocation Parameter Selection: Migration
  • Step 1: dataset selection
      • Filter out high-cost (size) and low-benefit (load) datasets
      • Remove infeasible datasets

[Figure: example datasets on the source component]
  • Size 10 MB, load 500 IOPS
  • Size 20 MB, load 100 IOPS
  • Size 500 MB, load 1400 IOPS
  • Size 500 MB, load 500 IOPS
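One way to read "filter high cost and low benefit" is as a dominance test: drop a dataset if another one is no larger yet removes at least as much load. The sketch below applies that test to the four datasets listed above (labeled A to D for convenience); the feasibility rule, that a dataset must fit in `max_free_mb` of free space on some target, is a hypothetical placeholder.

```python
from typing import List, NamedTuple


class Dataset(NamedTuple):
    name: str
    size_mb: float    # cost: amount of data to move
    load_iops: float  # benefit: load removed from the source


def prefilter_datasets(datasets: List[Dataset], max_free_mb: float) -> List[Dataset]:
    """Drop infeasible datasets, then drop any dataset that another one dominates."""
    # Hypothetical feasibility rule: the dataset must fit in the free space on some target.
    feasible = [d for d in datasets if d.size_mb <= max_free_mb]

    def dominated(d: Dataset) -> bool:
        # Another dataset is no larger, removes at least as much load, and is strictly better in one.
        return any(o.size_mb <= d.size_mb and o.load_iops >= d.load_iops
                   and (o.size_mb < d.size_mb or o.load_iops > d.load_iops)
                   for o in feasible)

    return [d for d in feasible if not dominated(d)]


source = [Dataset("A", 10, 500), Dataset("B", 20, 100),
          Dataset("C", 500, 1400), Dataset("D", 500, 500)]
print([d.name for d in prefilter_datasets(source, max_free_mb=1000)])
```

Here the 20 MB and the 500 MB/500 IOPS datasets are both dominated by the 10 MB/500 IOPS one, leaving a cheap option and a high-benefit option for the later steps.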
22
Part 2: Invocation Parameter Selection: Migration (cont.)
  • Step 2: target selection
      • Variable: $S_i \in \{0, 1\}$ indicates whether component i is selected as the target
      • Objective function: minimize the load variance on the source and the target
      • Constraints
          • Workloads remaining on the target should meet their SLOs
          • The migrated dataset should meet its SLO
  • Step 3: migration speed determination
      • Chameleon algorithm: migration is treated as another workload
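With exactly one S_i set to 1, target selection can be done by enumerating candidates. In the sketch below, the per-component ceiling `max_loads` is a hypothetical stand-in for both SLO constraints (above it, workloads on the target, including the migrated dataset, are assumed to miss their SLOs), and variance is taken over the post-migration loads of the source and the chosen target.

```python
from statistics import pvariance
from typing import Dict, Optional


def select_target(source_load: float, dataset_load: float,
                  target_loads: Dict[str, float],
                  max_loads: Dict[str, float]) -> Optional[str]:
    """Pick one target (S_i = 1 for exactly one i) minimizing source/target load variance.

    max_loads[i] is a hypothetical per-component ceiling standing in for the SLO
    constraints on the target and on the migrated dataset.
    """
    best_name: Optional[str] = None
    best_variance = float("inf")
    for name, load in sorted(target_loads.items()):
        new_target_load = load + dataset_load
        if new_target_load > max_loads[name]:
            continue  # constraint violated: skip this candidate
        variance = pvariance([source_load - dataset_load, new_target_load])
        if variance < best_variance:
            best_name, best_variance = name, variance
    return best_name


print(select_target(2000.0, 500.0,
                    target_loads={"c1": 400.0, "c2": 1200.0, "c3": 100.0},
                    max_loads={"c1": 1500.0, "c2": 1500.0, "c3": 1000.0}))
```

Enumeration is fine for a few components; a solver handles the 0/1 formulation when candidate sets grow.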

23
The Analyze Engine

24
Step 2: Compare and Pre-filter Corrective Options
  • Based on skyline analysis
  • Two-dimensional (cost, benefit) skyline graphs
      • Associate each candidate with (cost, benefit) values
      • Divide the graphs into intervals according to benefit values
      • For each interval, eliminate all options not on the skyline
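The pre-filtering step can be sketched directly from the bullets: bucket candidates by benefit interval, then keep only each bucket's skyline, meaning options that no other option beats on both cost and benefit. The fixed `interval_width` and the candidate values below are illustrative assumptions.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    name: str
    cost: float
    benefit: float


def skyline(options: List[Candidate]) -> List[Candidate]:
    """Options not dominated by another option with no more cost and no less benefit."""
    return [o for o in options
            if not any(p.cost <= o.cost and p.benefit >= o.benefit
                       and (p.cost < o.cost or p.benefit > o.benefit)
                       for p in options)]


def prefilter(options: List[Candidate], interval_width: float) -> List[Candidate]:
    """Bucket options into benefit intervals; keep only the skyline of each bucket."""
    buckets: Dict[int, List[Candidate]] = {}
    for o in options:
        buckets.setdefault(int(o.benefit // interval_width), []).append(o)
    kept: List[Candidate] = []
    for key in sorted(buckets):
        kept.extend(skyline(buckets[key]))
    return kept


opts = [Candidate("a", 1, 3), Candidate("b", 2, 5), Candidate("c", 3, 4),
        Candidate("d", 5, 12), Candidate("e", 4, 15)]
print([o.name for o in prefilter(opts, interval_width=10)])
```

The quadratic dominance check is adequate for the small candidate sets produced by Step 1; sort-based skyline algorithms scale better.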

25
Cost/Benefit Definitions
  • Short-term actions
      • Cost: number of workloads operating above or close to their SLO latency
      • Benefit: weighted sum of the workloads' throughput efficiency
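These two definitions can be computed from predicted per-workload results. In the sketch below, "close to" is assumed to mean within a `margin` of 10% of the SLO latency, and throughput efficiency is assumed to be achieved throughput over SLO throughput, capped at 1; both choices, and the weights, are hypothetical.

```python
from typing import List, NamedTuple


class WorkloadState(NamedTuple):
    latency_ms: float      # predicted latency after the action
    slo_latency_ms: float
    throughput: float      # predicted throughput after the action (IOPS)
    slo_throughput: float
    weight: float          # application priority


def short_term_cost(ws: List[WorkloadState], margin: float = 0.1) -> int:
    """Number of workloads above, or within `margin` of, their SLO latency."""
    return sum(1 for w in ws if w.latency_ms >= (1.0 - margin) * w.slo_latency_ms)


def short_term_benefit(ws: List[WorkloadState]) -> float:
    """Weighted sum of throughput efficiency, capped at 1 per workload (assumption)."""
    return sum(w.weight * min(1.0, w.throughput / w.slo_throughput) for w in ws)


states = [WorkloadState(9.5, 10.0, 300.0, 300.0, 1.0),
          WorkloadState(5.0, 10.0, 150.0, 300.0, 2.0),
          WorkloadState(12.0, 10.0, 400.0, 200.0, 0.5)]
print(short_term_cost(states), short_term_benefit(states))
```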

26
Cost/Benefit Definitions
  • Long-term actions
      • Cost: size of the data movement
      • Benefit: headroom
          • Step 1: MaxLoad_j is the maximum load component j can sustain without violating any workload's SLO latency
          • Step 2: AdditionLoad_j is the additional load component j can accommodate without violating any workload's SLO latency
          • Step 3: Headroom is defined as
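Steps 1 and 2 can be computed with any component model whose latency does not decrease with load: bisect for the largest load that keeps latency within the tightest SLO among the component's workloads, then subtract the current load. The sketch covers those two steps only; the M/M/1 latency function with a 5 ms per-request cost is a hypothetical component model.

```python
import math
from typing import Callable, Sequence


def max_load(latency_at: Callable[[float], float], slo_latencies_ms: Sequence[float],
             upper: float, tol: float = 1e-6) -> float:
    """MaxLoad_j: largest load on component j that keeps every workload within its SLO latency.

    Assumes latency_at(load) is non-decreasing in load, so bisection applies;
    the tightest SLO among the workloads on the component is the binding one.
    """
    slo = min(slo_latencies_ms)
    if latency_at(0.0) > slo:
        return 0.0
    if latency_at(upper) <= slo:
        return upper
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if latency_at(mid) <= slo:
            lo = mid
        else:
            hi = mid
    return lo


def addition_load(latency_at: Callable[[float], float], slo_latencies_ms: Sequence[float],
                  current_load: float, upper: float) -> float:
    """AdditionLoad_j: extra load component j can absorb before some SLO is violated."""
    return max(0.0, max_load(latency_at, slo_latencies_ms, upper) - current_load)


def mm1_latency_ms(load_iops: float, cost_ms: float = 5.0) -> float:
    """Hypothetical component model: M/M/1 latency with a 5 ms per-request cost."""
    utilization = load_iops * cost_ms / 1000.0
    return math.inf if utilization >= 1.0 else cost_ms / (1.0 - utilization)


print(round(max_load(mm1_latency_ms, [25.0, 20.0], upper=1000.0)))
print(round(addition_load(mm1_latency_ms, [25.0, 20.0], current_load=100.0, upper=1000.0)))
```

With a 20 ms binding SLO, 5 / (1 - 0.005 L) <= 20 gives L <= 150, so the component can absorb about 50 more IOPS at its current load of 100.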

27
The Analyze Engine

28
Step 3: Action Selection and Scheduling
  • Event types
      • Reactive trigger: the analyze engine is triggered after a system exception has occurred
      • Proactive trigger: the analyze engine is triggered before a system exception occurs
      • Opportunity window: the system is lightly loaded

29
Step 3: Action Selection and Scheduling Flow Chart
[Flow chart: a trigger from the Observe module is tested, through a series of yes/no decisions, as a reactive trigger, an opportunity window, or a proactive trigger]
30
Take-away points
  • Automated system management is a necessity
  • Actions differ in their operational semantics: cost, benefit, and lead time for invocation
  • AutoLoop selects corrective strategies across the entire spectrum of available actions
  • A prototype for deployment in real-world systems is currently being built

31
Questions?
32
Related Publications
  • Chameleon: a self-evolving, fully-adaptive resource arbitrator for storage systems. USENIX 2005.
  • MonitorMining: a gray-box approach for knowledge-base creation. IM 2005.
  • AutoLoop: Automated Action Selection in the Observe-Analyze-Act Loop for Storage Systems. POLICY 2005.
  • Polus: Growing Storage QoS Management beyond a 4-Year Old Kid. FAST 2004.
  • DecisionQoS: an adaptive, self-evolving QoS arbitration module for storage systems. POLICY 2004.
  • EoS: An Approach of Using Behavior Implications for Policy-based Self-management. DSOM 2003.
  • Contact information: yinli AT eecs.berkeley.edu, sandeepu AT us.ibm.com