Title: Generic Adaptive Control
1Generic Adaptive Control
- Contact: Joe Hellerstein
- IBM Thomas J Watson Research Center
- hellers_at_us.ibm.com
- May 16, 2003
- http://www.research.ibm.com/PM
2Participants
- Research
- Joe Bigus (ABLE)
- Markus Debusman (University of Applied Science,
Wiesbaden, Germany)
- Yixin Diao
- Frank Eskesen
- Steve Froehlich
- Joe Hellerstein
- Alexander Keller
- Xue Liu (Univ. of Illinois)
- Sujay Parekh
- Lui Sha (Univ. of Illinois)
- Maheswaran Surendra (team lead)
- Dawn Tilbury (Univ. of Michigan)
- DB2
- Randy Horman
- Matt Huras
- Ed Lassettre
- Sam Lightstone
- Kevin Rose
- HVWS
- Noshir Wadia
- Eric Ye
- Server Group
- Lisa Spainhower
3Example: Configuration Optimization in WebSphere
[Figure labels: End Users, Web Servers, Application Servers]
4Project Goals
- Develop a formal basis for resource management
problems with dynamics (especially policy
enforcement)
- Demonstrate the practical value of the approach
- Evangelize the approach
- Book, tutorials, classes
- Methodology and tools
5Agenda
- Basics of Control Theory
- Regulating concurrent users in Lotus Notes: pole
placement design
- Regulating utilizations in Apache
- Optimizing response times in Apache
- Throttling DB2 utilities
- DB2 self-tuning memory
- Regulating service levels in a multi-tiered
eCommerce system (HotRod)
- Educational efforts (book, tutorials)
- Summary
6Control of Lotus Notes eMail Server
[Figure: feedback loop with labels Workload generator, RPCs, Administrator, MaxUsers, Lotus Notes Server, Target Queue Length, Measured Queue Length]
7System Identification: Estimate Transfer Function
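One common way to estimate a transfer function is to fit a low-order difference equation to measured input/output data. The sketch below assumes a first-order model y(k+1) = a*y(k) + b*u(k), which corresponds to the transfer function G(z) = b / (z - a). Here u would be the control input (for example MaxUsers) and y the measured output (for example queue length), both taken as offsets from an operating point. The model order, the function name, and the synthetic data are illustrative assumptions, not taken from the slides.

```python
def fit_first_order(u, y):
    """Least-squares fit of y[k+1] = a*y[k] + b*u[k]; returns (a, b)."""
    if len(u) != len(y) or len(y) < 3:
        raise ValueError("need equal-length series with at least 3 samples")
    # Accumulate the terms of the 2x2 normal equations for a and b.
    s_yy = s_yu = s_uu = s_ny = s_nu = 0.0
    for k in range(len(y) - 1):
        s_yy += y[k] * y[k]
        s_yu += y[k] * u[k]
        s_uu += u[k] * u[k]
        s_ny += y[k + 1] * y[k]
        s_nu += y[k + 1] * u[k]
    det = s_yy * s_uu - s_yu * s_yu
    if abs(det) < 1e-12:
        raise ValueError("input does not excite the system enough to identify it")
    a = (s_ny * s_uu - s_nu * s_yu) / det
    b = (s_nu * s_yy - s_ny * s_yu) / det
    return a, b


# Synthetic, noise-free data generated from known parameters (illustrative only).
true_a, true_b = 0.4, 0.5
u = [((7 * k) % 5) - 2 for k in range(40)]
y = [0.0]
for k in range(39):
    y.append(true_a * y[k] + true_b * u[k])

a_hat, b_hat = fit_first_order(u, y)
print(a_hat, b_hat)
```

With real measurements the fit will not be exact, so the estimated model should be checked against data that was not used for fitting.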
8Controller Design
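As a rough illustration of pole placement design, the sketch below assumes a first-order plant y(k+1) = a*y(k) + b*u(k) and a PI controller u(k) = Kp*e(k) + Ki*sum(e). Matching the closed-loop characteristic polynomial z^2 + (b*(Kp+Ki) - 1 - a)*z + (a - b*Kp) to the desired (z - p1)(z - p2) gives Kp = (a - p1*p2)/b and Ki = (1 - p1)(1 - p2)/b. The plant values, pole locations, and function names are assumptions chosen for illustration.

```python
def pole_placement_pi(a, b, p1, p2):
    """PI gains that place the closed-loop poles of y[k+1] = a*y[k] + b*u[k] at p1, p2."""
    kp = (a - p1 * p2) / b
    ki = (1.0 - p1) * (1.0 - p2) / b
    return kp, ki


def simulate(a, b, kp, ki, reference, steps):
    """Simulate the closed loop from rest and return the output trajectory."""
    y = 0.0
    integral = 0.0
    outputs = []
    for _ in range(steps):
        error = reference - y
        integral += error              # running sum of errors, including the current one
        u = kp * error + ki * integral
        y = a * y + b * u              # assumed first-order plant
        outputs.append(y)
    return outputs


# Illustrative plant and desired poles (not measured values).
a, b = 0.5, 0.4
kp, ki = pole_placement_pi(a, b, p1=0.3, p2=0.3)
trajectory = simulate(a, b, kp, ki, reference=10.0, steps=50)
print(kp, ki, trajectory[-1])
```

Poles closer to zero give a faster response; poles near the unit circle give a slower response that is less sensitive to noise.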
9Control of Apache Server
Contribution: Multiple Input, Multiple Output
10Apache Control Enablements
[Figure: architecture diagram with labels OS (procfs), CPU util, Mem util, Web Server, Master, Worker Procs, mod_controller, External Controller, GET/SET, SPAWN, KILL, RT info, External RT Probe]
11Model Structure
The Transfer Function Relationship
[Figure: Apache Server with inputs KA and MC and outputs CPU and MEM. Two SISO models: G11 from KA to CPU, G22 from MC to MEM]
12Model Comparison
Model Prediction
Two SISO Models
- CPU: The SISO model fails because MC and KA both
affect CPU; the MIMO model is able to capture this
relationship.
- MEM: Both models do a good job of predicting
system response.
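The two model structures being compared can be written out as follows. G11 and G22 appear on the Model Structure slide; the cross terms G12 and G21 are notation assumed here, by analogy, to express that MC also influences CPU (and that KA may influence MEM).

```latex
\text{Two SISO models:}\quad
\mathrm{CPU}(z) = G_{11}(z)\,\mathrm{KA}(z), \qquad
\mathrm{MEM}(z) = G_{22}(z)\,\mathrm{MC}(z)

\text{MIMO model:}\quad
\begin{bmatrix} \mathrm{CPU}(z) \\ \mathrm{MEM}(z) \end{bmatrix}
=
\begin{bmatrix} G_{11}(z) & G_{12}(z) \\ G_{21}(z) & G_{22}(z) \end{bmatrix}
\begin{bmatrix} \mathrm{KA}(z) \\ \mathrm{MC}(z) \end{bmatrix}
```

The SISO pair is the special case in which the off-diagonal terms are zero, which is why it cannot represent the effect of MC on CPU.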
13Optimization of Apache Server
[Figure: feedback loop with labels Workload generator, Web Service requests, AutoTune Agent, MaxClients, Apache System, Response Time]
14Apache Operation
[Figure: connection flow with labels New Users, New conn, TCP Accept queue, MaxClients, Apache, Close(), Timeout()]
Heuristic: Find the smallest MaxClients that
eliminates TCP queueing
15Impact of MaxClients
[Plot: Response Time vs. MaxClients]
16AutoTune Using Fuzzy Rules
- Fuzzification
- Convert numeric variables to linguistic variables
- Characterized by membership functions
- Rule base
- IF-THEN rules
- Using linguistic variables
- Inference mechanism
- Activate the fuzzy rules (IF)
- Combine the rule actions (THEN)
- Defuzzification
- Convert linguistic variables to numeric variables
17Constructing Fuzzy Rules
[Plot: Response Time (RT) vs. MaxClients, with regions labeled Rule 1 through Rule 4]
- Decision making
- Increment direction
- Increment size
- Rule 1: IF change-in-MaxClients is poslarge and
change-in-RT is neglarge THEN
next-change-in-MaxClients is poslarge
- Rule 2: IF change-in-MaxClients is neglarge and
change-in-RT is poslarge THEN
next-change-in-MaxClients is poslarge
- Rule 3: IF change-in-MaxClients is neglarge and
change-in-RT is neglarge THEN
next-change-in-MaxClients is neglarge
- Rule 4: IF change-in-MaxClients is poslarge and
change-in-RT is poslarge THEN
next-change-in-MaxClients is neglarge
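The four rules can be sketched as a small fuzzy controller that follows the fuzzification, inference, and defuzzification steps. The membership functions (linear ramps on a normalized [-1, 1] scale), the use of min for AND, and the weighted-average defuzzification are assumptions made for this illustration; the slides do not specify them.

```python
def mu_poslarge(x):
    """Degree to which a normalized change x in [-1, 1] is 'poslarge'."""
    return min(1.0, max(0.0, (x + 1.0) / 2.0))


def mu_neglarge(x):
    """Degree to which a normalized change x in [-1, 1] is 'neglarge'."""
    return min(1.0, max(0.0, (1.0 - x) / 2.0))


# Each rule: (membership for change-in-MaxClients,
#             membership for change-in-RT,
#             center of the consequent: +1 for poslarge, -1 for neglarge)
RULES = [
    (mu_poslarge, mu_neglarge, +1.0),  # Rule 1: increase helped, keep increasing
    (mu_neglarge, mu_poslarge, +1.0),  # Rule 2: decrease hurt, reverse direction
    (mu_neglarge, mu_neglarge, -1.0),  # Rule 3: decrease helped, keep decreasing
    (mu_poslarge, mu_poslarge, -1.0),  # Rule 4: increase hurt, reverse direction
]


def next_change_in_maxclients(change_mc, change_rt):
    """Return the normalized next change in MaxClients, in [-1, 1]."""
    # Inference: activate each rule with the min of its two premise memberships.
    weights = [min(mc(change_mc), rt(change_rt)) for mc, rt, _ in RULES]
    # Defuzzification: weighted average of the consequent centers.
    total = sum(weights)
    return sum(w * c for w, (_, _, c) in zip(weights, RULES)) / total


print(next_change_in_maxclients(1.0, -1.0))   # Rule 1 fires fully
print(next_change_in_maxclients(0.5, -1.0))   # Rules 1 and 3 fire partially
```

The sign of the result gives the increment direction and its magnitude gives the increment size; a real controller would scale the result to an actual MaxClients step.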
18AutoTune: Controlling MaxClients on Apache
[Plot annotations: Apache default; Optimized setting]
19AutoTune: Response to a new workload
[Plot annotations: Workload changes; Old optimized setting; New optimized setting]
20DB2 UDB Utilities Throttling (SMART Project)
[Figure: control loop with labels Target Utilization, Sleep Delay, UDB Engine, Server, Disk, CPU Utilizations; utilities: Backup, Restore, Re-Balance]
22Success Is
Small Effect on User Throughput
High System Utilization
[Plot: Utilization vs. Time; annotation: gap due to reduced utilization in sleep periods]
Note: This is a longer-time averaged value than
on slide 5.
23Throttling a Single Utility
[Equation annotations: parameters characterizing DB2; control error; max throughput from utility workload; throughput degradation]
- Standard PI controller tries to reach E = 0
(zero control error)
- Assume linear effect of throttling on Y
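A minimal sketch of such a PI loop follows, assuming the controlled knob is the fraction of time the utility sleeps and that throughput degradation falls linearly as sleep increases. The plant model, gains, sign convention for the error (measured minus target), and all names are assumptions for illustration, not DB2 internals.

```python
def pi_throttle(max_degradation, target_degradation, kp, ki, steps):
    """Adjust the sleep fraction with a PI controller; returns its history."""
    sleep = 0.0          # fraction of time the utility sleeps, in [0, 1]
    prev_error = 0.0
    history = []
    for _ in range(steps):
        # Assumed linear plant: more sleep means less degradation for users.
        degradation = max_degradation * (1.0 - sleep)
        error = degradation - target_degradation
        # Velocity-form PI update: proportional on the change, integral on the error.
        sleep += kp * (error - prev_error) + ki * error
        sleep = min(1.0, max(0.0, sleep))   # keep the knob in its valid range
        prev_error = error
        history.append(sleep)
    return history


# Illustrative numbers: unthrottled utility costs 50% of user throughput,
# and the policy allows 10% degradation.
history = pi_throttle(max_degradation=0.5, target_degradation=0.1,
                      kp=0.2, ki=0.5, steps=100)
print(history[-1])
```

Under the linear assumption the loop settles where 0.5 * (1 - sleep) = 0.1, that is, at a sleep fraction of 0.8.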
24Baseline Measurement idling
[Figure: timeline (Time) for P1, P2, P3 with Control Points and intervals Start1/End1, Start2/End2; legend: Loop Throughput, Other (Sleep) Throughput]
- Start is the perf output after all Pi have read the
new control value.
- End is taken from the output closest to the
control change.
25Baseline Estimation
- Over time, record the sequence (ti, pi, si)
- t: Time
- p: Perf at time t
- s: SleepPct at time t
- Fit a curve to this data to get model M
- E.g., over some fixed time interval of the past
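The recording and fitting steps might look like the sketch below. It assumes the curve is a straight line p = intercept + slope * s fitted by least squares, and that the "fixed time interval of the past" is approximated by keeping the most recent N samples; the class and method names are hypothetical.

```python
from collections import deque


class BaselineEstimator:
    """Keeps recent (t, p, s) samples and fits perf as a linear function of SleepPct."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)   # oldest samples drop off automatically

    def record(self, t, p, s):
        self.samples.append((t, p, s))

    def fit(self):
        """Return (intercept, slope); the intercept estimates perf at SleepPct = 0."""
        n = len(self.samples)
        if n < 2:
            raise ValueError("need at least two samples")
        mean_s = sum(s for _, _, s in self.samples) / n
        mean_p = sum(p for _, p, _ in self.samples) / n
        var_s = sum((s - mean_s) ** 2 for _, _, s in self.samples)
        if var_s == 0:
            raise ValueError("SleepPct never varied; slope is not identifiable")
        cov = sum((s - mean_s) * (p - mean_p) for _, p, s in self.samples)
        slope = cov / var_s
        return mean_p - slope * mean_s, slope


# Synthetic samples (illustrative only): perf falls by 0.5 per SleepPct point.
estimator = BaselineEstimator(window=20)
for t in range(50):
    sleep_pct = (t % 5) * 20.0
    estimator.record(t, 100.0 - 0.5 * sleep_pct, sleep_pct)

intercept, slope = estimator.fit()
print(intercept, slope)
```

A short window tracks workload changes faster but gives a noisier model; a long window does the opposite.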
26Control with disturbance
[Plots: Large Disturbance; Small Disturbance]
- Baseline estimation needs work
- Cannot adjust to large workload change
- Controller response still OK
27Dynamic Surge Protection
- Systems can go from steady state to overloaded
without warning
[Figure label: Internet]
28Resource Actions With Lead Times
- Definition of lead time
- Delay from request to action taking effect
- Examples
- From provisioning a server to its servicing
requests
- From de-provisioning a server to its being
returned to a free pool
- From increasing the size of a buffer pool to the
pool being filled with data
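Because an action only takes effect after its lead time, the request has to be sized for the demand expected at that later point rather than the demand seen now. The sketch below illustrates this with a simple linear trend forecast; the forecasting method, the function name, and the numbers are assumptions for illustration and are not the method used in the project.

```python
import math


def servers_to_request(demand_history, lead_time, capacity_per_server, current_servers):
    """Extra servers to request now so capacity is ready after `lead_time` intervals."""
    if len(demand_history) < 2:
        raise ValueError("need at least two demand samples")
    # Average per-interval growth over the observed history (linear trend).
    growth = (demand_history[-1] - demand_history[0]) / (len(demand_history) - 1)
    # Demand expected when the requested servers actually come online.
    forecast = demand_history[-1] + growth * lead_time
    needed = math.ceil(max(0.0, forecast) / capacity_per_server)
    return max(0, needed - current_servers)


history = [100, 120, 140, 160]   # requests/sec per interval (illustrative)
proactive = servers_to_request(history, lead_time=3,
                               capacity_per_server=50, current_servers=4)
reactive = servers_to_request(history, lead_time=0,
                              capacity_per_server=50, current_servers=4)
print(proactive, reactive)
```

With a lead time of zero the function reduces to a purely reactive policy, which in this example requests nothing even though demand is on track to exceed capacity.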
29Effect of Lead Times on WAS Provisioning
30Benefits of Proactive Provisioning
31Autonomic Computing: Dynamic Surge Protection
32CeBIT Press
- Reuters: IBM Software Can Predict Computer Demand
- C/Net: IBM offers details on autonomic software
- InfoWorld: IBM to show new autonomic suite at CeBIT
- IDG News: IBM to show off new autonomic technology
- InformationWeek: More Autonomic Capabilities From IBM
- InternetNews: IBM Spruces Up Autonomic Computing Offerings
- cw360.com: IBM to demo autonomic technology at CeBIT
33Control Theory Book
- Feedback Control of Computing Systems
- Wiley-Interscience
- Intended audience
- Computer scientists with minimal math background
(geometric series) who want to apply techniques
to practical problems
- Control theorists looking for new applications
- Status
- 10 of 11 chapters at a beta level
- Expected completion by end of June
- Publication in 2004
34Table of Contents
- Introduction (Qualitative control theory)
- Model construction (statistics)
- Z-Transforms and transfer functions (component
models)
- Block diagrams (system models)
- First order systems
- Higher order systems
- State space models (multi-variate models)
- Proportional control (feedback basics)
- Other classical controllers (PID, tuning
controllers)
- State space feedback control (MIMO)
- Advanced topics
35Progress Towards Project Goals
- Develop/identify a formal approach
- Control theory based
- Demonstrate value
- Lotus Notes: control w/o instabilities
- Apache: simple way to optimize tuning parameters
- DB2 Utilities Throttling, HotRod: handling
resource actions with dead times
- HotRod prototype: resource actions w/lead times
- Evangelize
- Feedback Control of Computing Systems,
Wiley-Interscience
- Tutorials: Almaden, Integrated Management,
Stanford/Berkeley
- Classes: Columbia?, University of Michigan?
- AC toolkit integration
36http://www.research.ibm.com/PM
- "Using Control Theory to Achieve Service Level
Objectives in Performance Management," S Parekh,
N Gandhi, JL Hellerstein, D Tilbury, TS Jayram, J
Bigus, Real Time Systems Journal, 2002.
- "Feedback Control of a Lotus Notes Server:
Modeling and Control Design," N Gandhi, S
Parekh, JL Hellerstein, and DM Tilbury,
American Control Conference, 2001. (Best paper in
session.)
- "An Introduction to Control Theory With
Applications to Computer Science," JL Hellerstein
and S Parekh, ACM Sigmetrics, 2001.
- "Using MIMO Feedback Control to Enforce Policies
for Interrelated Metrics With Application to the
Apache Web Server," Y Diao, N Gandhi, JL
Hellerstein, S Parekh, and DM Tilbury, Network
Operations and Management, 2002. (Best paper in
conference.)
- "MIMO Control of an Apache Web Server: Modeling
and Controller Design," Y Diao, N Gandhi, JL
Hellerstein, S Parekh, and DM Tilbury, American
Control Conference, 2002. (Best paper in
session.)
- "Using Fuzzy Control to Maximize Profits in
Service Level Management," Y Diao, JL
Hellerstein, S Parekh. Accepted to the IBM
Systems Journal, 2002.
- "A First-Principles Approach to Constructing
Transfer Functions for Admission Control in
Computing Systems," JL Hellerstein, Y Diao, and S
Parekh, Conference on Decision and Control, 2002.
- "Generic On-Line Discovery of Quantitative Models
for Service Level Management," Y Diao, F Eskesen,
S Froehlich, JL Hellerstein, A Keller, L
Spainhower, and M Surendra, IFIP Symposium on
Integrated Management, 2003.
- "On-Line Response Time Optimization of An Apache
Web Server," Yixin Diao, Xue Liu, Steve
Froehlich, Joseph L Hellerstein, Sujay Parekh,
and Lui Sha. To appear in International Workshop
on Quality of Service, 2003.