Title: Feedback Control Real-Time Scheduling
1 Feedback Control Real-Time Scheduling
- C. Lu, J.A. Stankovic, G. Tao, and S.H. Son, "Design and Evaluation of a Feedback Control EDF Scheduling Algorithm," IEEE Real-Time Systems Symposium (RTSS'99), December 1999.
2 Motivation for Feedback Control Scheduling
- Open-loop scheduling paradigms perform poorly in unpredictable dynamic systems where the workload cannot be accurately modeled
- Many complex applications, e.g., robotics and agile manufacturing, are dynamic and operate in a non-deterministic environment where the precise workload is not known
- It is challenging to build real-time systems that provide predictable performance in a highly uncertain environment
- Feedback control can support the target performance even when the workload varies dynamically, via graceful QoS degradation in a closed loop
3 Motivation
- Apply control-theoretic approaches to real-time performance management
- Feedback control is well known for its robustness in the presence of disturbances, e.g., cruise control or chemical reactor control
- It doesn't need a precise system model
  - If the precise system model were known, feedback control would not be necessary
- Dynamically adapt the system behavior to achieve the target performance (also called the set point) in the feedback loop
4 Feedback Control Concepts
[Figure: feedback control loop with Setpoint, Error, Controller, Control Signal, Controlled RT System, and Measured Perf. fed back and subtracted from the set point]
- Set point: target performance to achieve, e.g., a 1% deadline miss ratio
- Measured perf.: actual performance, e.g., the actual (deadline) miss ratio measured in the current sampling period
- Error = set point - measured perf. = target miss ratio - current miss ratio
5 Feedback Control Loop
- Periodically measure the performance and compare it to the set point to determine the error
- The controller computes the control signal based on the error and the model of the controlled system
- The actuator, e.g., an admission controller or QoS manager, changes the value of the manipulated variable to control the system
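The loop above can be sketched in Python as follows. This is a minimal illustration, not the FC-EDF implementation: `ToyPlant`, its `gain`, and the proportional gain `kp` are hypothetical, and the plant is assumed to respond linearly, with a one-period delay, to whatever adjustment the actuator applies.

```python
from typing import List


class ToyPlant:
    """Hypothetical controlled system: perf(k) = perf(k-1) + gain * adjust(k-1)."""

    def __init__(self, gain: float, perf: float = 0.0) -> None:
        self.gain = gain
        self.perf = perf
        self.pending = 0.0  # adjustment applied now, visible at the next sample

    def sense(self) -> float:
        # Sensor: performance measured over the sampling period that just ended
        self.perf += self.gain * self.pending
        self.pending = 0.0
        return self.perf

    def actuate(self, adjust: float) -> None:
        # Actuator: change the manipulated variable (e.g., admitted load)
        self.pending = adjust


def run_loop(plant: ToyPlant, set_point: float, kp: float, periods: int) -> List[float]:
    """Run a proportional feedback loop and return the measured values."""
    measured_history = []
    for _ in range(periods):
        measured = plant.sense()        # 1. measure performance
        error = set_point - measured    # 2. compare with the set point
        control = kp * error            # 3. controller computes the control signal
        plant.actuate(control)          # 4. actuator applies it
        measured_history.append(measured)
    return measured_history


history = run_loop(ToyPlant(gain=1.0), set_point=1.0, kp=0.5, periods=30)
print(history[:4])  # [0.0, 0.5, 0.75, 0.875]
```

With `gain = 1.0` and `kp = 0.5`, each period halves the remaining error, so the measured value approaches the set point of 1.0.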
6 FC-EDF Architecture
7 Miss Ratio Control Model
- At the kth sampling instant, the miss ratio is
  - $m(k) = m(k-1) + g(k)\,\Delta u(k-1)$, where
  - $m(k-1)$: miss ratio at the (k-1)th sampling period
  - $g(k)$: miss ratio gain
  - $\Delta u(k-1)$: utilization adjustment by admission control and QoS adaptation at the (k-1)th sampling period
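A short sketch that iterates the model above for given sequences of gains and utilization adjustments. The function name and the sample numbers are illustrative only; note that nothing in this linear model keeps m(k) within [0, 1].

```python
from typing import List


def simulate_miss_ratio(m0: float, gains: List[float], du: List[float]) -> List[float]:
    """Iterate m(k) = m(k-1) + g(k) * du(k-1) for k = 1..len(du).

    gains[k-1] plays the role of g(k); du[k-1] is the utilization
    adjustment made in the (k-1)th sampling period. No clamping to [0, 1].
    """
    if len(gains) != len(du):
        raise ValueError("need one gain per utilization adjustment")
    m = [m0]
    for g_k, du_prev in zip(gains, du):
        m.append(m[-1] + g_k * du_prev)
    return m


# Hypothetical gains that grow as the system is pushed further into overload
print(simulate_miss_ratio(0.0, [0.5, 1.0, 2.0], [0.1, 0.1, -0.05]))
```

The same 0.1 increase in utilization raises the miss ratio twice as much when the gain doubles, which is why the time-varying g(k) makes the controller hard to design.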
8 Miss Ratio Control Model
- Instead of considering the time-varying miss ratio gain g(k), they used G, the maximum miss ratio increase per unit load increase
- Miss ratio control is very challenging due to the nonlinear increase of the miss ratio with load!
[Figure: miss ratio vs. load, load axis from 0.9 to 1.3]
9 Miss Ratio Control Model
- Replace g(k) with G:
  - $m(k) = m(k-1) + g(k)\,\Delta u(k-1) \;\Rightarrow\; m(k) = m(k-1) + G\,\Delta u(k-1)$
- Take the z-transform to convert from the time domain to the frequency domain
  - You can then do algebraic manipulation rather than solving difference equations
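10 Miss Ratio Control Model (cont.)
- Apply the z-transform to $m(k) = m(k-1) + G\,\Delta u(k-1)$:
  - $M(z) = z^{-1}M(z) + G z^{-1}\,\Delta U(z)$
  - $M(z) = \frac{G}{z-1}\,\Delta U(z)$
- Transfer function: $T(z) = \frac{\text{output}}{\text{input}} = \frac{M(z)}{\Delta U(z)} = \frac{G}{z-1}$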
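One way to check this derivation: expanding $T(z) = G/(z-1) = G z^{-1}/(1 - z^{-1})$ as a power series in $z^{-1}$ (long division) should reproduce the impulse response obtained by simulating the difference equation directly. Both give 0, G, G, G, ..., so the model is an integrator: a single utilization change shifts the miss ratio permanently. The helper names below are hypothetical.

```python
from typing import List


def series_from_tf(b: List[float], a: List[float], n: int) -> List[float]:
    """First n power-series coefficients of B(z^-1)/A(z^-1), by long division.

    b and a hold coefficients of z^0, z^-1, z^-2, ...; a[0] must be nonzero.
    """
    h: List[float] = []
    for k in range(n):
        acc = b[k] if k < len(b) else 0.0
        for i in range(1, min(k, len(a) - 1) + 1):
            acc -= a[i] * h[k - i]
        h.append(acc / a[0])
    return h


def impulse_from_difference_eq(G: float, n: int) -> List[float]:
    """Simulate m(k) = m(k-1) + G*du(k-1) with du(0) = 1 (n >= 1 samples)."""
    du = [1.0] + [0.0] * (n - 1)
    m = [0.0]
    for k in range(1, n):
        m.append(m[k - 1] + G * du[k - 1])
    return m


G = 0.4
# T(z) = G/(z-1) = G z^-1 / (1 - z^-1)  ->  b = [0, G], a = [1, -1]
print(series_from_tf([0.0, G], [1.0, -1.0], 6))  # [0.0, 0.4, 0.4, 0.4, 0.4, 0.4]
print(impulse_from_difference_eq(G, 6))          # [0.0, 0.4, 0.4, 0.4, 0.4, 0.4]
```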
11 Utilization Control Model
- The miss ratio controller by itself is not stable
  - Under EDF, the MR controller saturates when utilization is less than 1: no deadlines are missed, so the measured miss ratio stays at 0
- In their later work, they added a utilization controller
  - The utilization controller works when U ≤ 1; the miss ratio controller works when U > 1
  - When U ≤ 1, turn on the utilization controller and turn off the MR controller
  - When U > 1, turn on the MR controller and turn off the utilization controller
  - Good idea?
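A sketch of the on/off switching rule described above, assuming both controllers are simple proportional controllers. The targets (0.9 utilization, 0.01 miss ratio) and the gains are placeholders, not values from the paper.

```python
from typing import Callable, Tuple


def make_p_controller(set_point: float, kp: float) -> Callable[[float], float]:
    """Proportional controller: returns a utilization adjustment kp * error."""
    return lambda measured: kp * (set_point - measured)


# Hypothetical set points and gains, for illustration only
util_ctrl = make_p_controller(set_point=0.9, kp=0.5)
mr_ctrl = make_p_controller(set_point=0.01, kp=0.5)


def switched_control(utilization: float, miss_ratio: float) -> Tuple[str, float]:
    """Return the active controller's name and its utilization adjustment."""
    if utilization <= 1.0:
        # MR controller would be saturated here (miss ratio ~ 0 under EDF)
        return "utilization", util_ctrl(utilization)
    return "miss_ratio", mr_ctrl(miss_ratio)


print(switched_control(0.8, 0.0))
print(switched_control(1.2, 0.1))
```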
12 Controller Tuning
- Given the control model shown in the previous slide, apply the root locus method to graphically tune the controller in MATLAB, ensuring stability and the specified transient performance, such as overshoot and settling time
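For the miss ratio model $G/(z-1)$ with a proportional controller K (an assumption here; the slides do not fix the controller type), the unity-feedback closed loop is $KG/(z - 1 + KG)$, with a single pole at $p = 1 - KG$. Its root locus is a segment of the real axis starting at z = 1 and moving left as K grows; the loop is stable for 0 < KG < 2, and the tracking error after a unit step decays as $p^k$. The sketch below computes these quantities; the 2% settling band is a common convention, assumed here.

```python
import math
from typing import Optional


def closed_loop_pole(K: float, G: float) -> float:
    """Pole of K*G/(z-1) under unity feedback: KG / (z - (1 - KG))."""
    return 1.0 - K * G


def is_stable(K: float, G: float) -> bool:
    # Discrete-time stability: the pole must lie strictly inside the unit circle
    return abs(closed_loop_pole(K, G)) < 1.0


def settling_samples(K: float, G: float, band: float = 0.02) -> Optional[int]:
    """Samples until the unit-step tracking error |p|^k falls within `band`.

    Returns None if the loop is unstable or marginally stable.
    """
    p = abs(closed_loop_pole(K, G))
    if p >= 1.0:
        return None
    if p == 0.0:
        return 1  # deadbeat: the error is zero after one sample
    return math.ceil(math.log(band) / math.log(p))


for K in (1.0, 2.0, 5.0):
    print(K, closed_loop_pole(K, 0.5), is_stable(K, 0.5), settling_samples(K, 0.5))
```

With G = 0.5, K = 1 places the pole at 0.5 and settles within 2% in 6 samples, K = 2 gives a deadbeat pole at 0, and K = 5 moves the pole to -1.5, outside the unit circle.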
13 Feedback performance control in software services
- T.F. Abdelzaher, J.A. Stankovic, C. Lu, R. Zhang, and Y. Lu, "Feedback Performance Control in Software Services," IEEE Control Systems, 23(3):74-90, June 2003.
14 Overview
- SW systems are becoming ever larger
- Performance guarantees are required, e.g., in web-based e-commerce
- Control theory
  - A promising theoretical foundation for performance control in complex SW applications, e.g., real-time scheduling, web servers, multimedia control, storage managers, power management, and routing in computer networks
15 Overview
- Software performance assurance problems
- Feedback control problems, focusing on web server performance guarantees
- Data centers
16 SW performance control
- Less rigorous guarantees on performance and quality
- Most SW engineering research deals with the development of functionally correct SW
- Functional correctness is not enough!
  - Timeliness in embedded systems: a correct but delayed action can be disastrous
  - Non-functional QoS attributes, e.g., timeliness, security, availability, etc.
17 Traditional approaches for perf guarantees
- Worst-case estimates of load and resource availability
  - Recall EDF, RM, DM, Priority Ceiling Protocol, etc.
18 New demand for performance assurance
- QoS guarantees are required in a broader scope of applications running in open, unpredictable environments
  - Global communication networks enabling online banking, trading, distance learning, etc.
  - Points of massive aggregation suffering unpredictable loads, potential bottlenecks, DoS attacks, etc.
  - -> The precise workload/system model is unknown a priori
- Failure to meet QoS requirements -> loss of customers or financial damages
- Worst-case analysis/overdesign could be overly pessimistic or wasteful
- A solid analytic framework for cost-effective performance assurance is required
19 Challenges
- How to model the SW architecture?
- How to map a specific QoS problem into a feedback control system?
- How to choose proper SW sensors and actuators to monitor and adjust performance and workloads/resource allocation?
- How to design controllers for servers?
- -> This paper focuses on web servers
20 QoS metrics
- Delay metrics
  - Proportional to time: queuing delays, execution latencies, service response time
- Rate metrics
  - Inversely proportional to time: connection bandwidth, throughput, packet rate
21 Time-related perf attributes
- Can be controlled by adjusting resource allocation
- Queuing theory can predict performance given a particular resource allocation, or vice versa
  - Classical queuing results typically assume Poisson arrival patterns
  - Even if this assumption holds, queuing theory can only predict average performance
  - Arrival patterns in web applications follow heavy-tailed distributions -> bursty arrival patterns
22 Service architecture
- Liquid task model
Fig. 1 Server architecture: (a) computing model; (b) control-oriented representation
23 Liquid task model
- $C_i \ll D_i$
  - $C_i$: time taken to serve request i
  - $D_i$: maximum tolerable response time of request i
  - Tolerable response times are finite, while service times are (by comparison) infinitesimal
- The progress of requests through the server queues is treated as a fluid flow
  - Service rate at stage k = $dN_k(t)/dt$, where $N_k(t)$ is the number of requests processed by stage k
24 Liquid task model
- Volume at time T = requests queued at stage k = $\int_0^T (F_{in} - F_k)\,dt$
  - $F_k$: service rate at stage k
  - $F_{in}$: request arrival rate to this stage
- Valves: points of control, i.e., manipulated variables such as the queue length
- The liquid model does not describe how individual requests are prioritized
  - Control theory can be combined with queuing theory or real-time scheduling
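A minimal numerical sketch of the liquid model: the queued volume at one stage is the integral of arrival rate minus service rate, approximated here with a forward-Euler sum. Clamping the volume at zero (an empty queue cannot drain further) is an added assumption, and the rate functions are hypothetical inputs.

```python
from typing import Callable

Rate = Callable[[float], float]  # rate in requests/s as a function of time t (s)


def queued_volume(f_in: Rate, f_k: Rate, T: float, dt: float = 0.01) -> float:
    """Forward-Euler approximation of V(T) = integral_0^T (F_in - F_k) dt.

    The volume is clamped at zero: an empty queue cannot drain further
    (an assumption added to the fluid model).
    """
    volume = 0.0
    for i in range(round(T / dt)):
        t = i * dt
        volume = max(0.0, volume + (f_in(t) - f_k(t)) * dt)
    return volume


# Overload: 120 req/s arrive, 100 req/s served -> backlog grows by ~20 req/s
print(queued_volume(lambda t: 120.0, lambda t: 100.0, T=10.0))
# Closing the "valve" at t = 5 s (arrivals cut to 60 req/s) drains the backlog
print(queued_volume(lambda t: 120.0 if t < 5.0 else 60.0, lambda t: 100.0, T=10.0))
```

In the second call, the backlog of about 100 requests built up in the first 5 s drains at 40 req/s and is gone by t = 7.5 s.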
25 Server modeling
- A difference equation models the web server
  - $y(k)$: performance, e.g., delay or throughput, measured at the kth sampling period
  - $u(k)$: control input at the kth sampling period
- ARMA (AutoRegressive Moving Average) model:
  - $y(k) = a_1 y(k-1) + a_2 y(k-2) + \dots + a_n y(k-n) + b_1 u(k-1) + b_2 u(k-2) + \dots + b_n u(k-n)$
  - n: system order; a higher-order model is usually (not always!) more accurate but more complex
- The transfer function can be derived from this model
  - Examples: web proxy cache model [4], TCP dynamics [5]
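The ARMA difference equation above can be simulated directly. This sketch assumes zero initial conditions and the same order n for the a and b coefficients, as in the slide's equation; the coefficient values used below are illustrative.

```python
from typing import List


def simulate_arma(a: List[float], b: List[float], u: List[float]) -> List[float]:
    """y(k) = sum_{i=1..n} a_i*y(k-i) + sum_{i=1..n} b_i*u(k-i), zero initial conditions.

    a[i-1] holds a_i and b[i-1] holds b_i; samples before k = 0 are taken as zero.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("this sketch uses the same order n for a and b")
    y: List[float] = []
    for k in range(len(u)):
        yk = 0.0
        for i in range(1, min(k, n) + 1):  # only terms with k - i >= 0
            yk += a[i - 1] * y[k - i] + b[i - 1] * u[k - i]
        y.append(yk)
    return y


# First-order example: a_1 = 0.5, b_1 = 1.0, unit step input
print(simulate_arma([0.5], [1.0], [1.0] * 6))  # [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375]
```

The step response approaches the steady-state value $b_1/(1 - a_1) = 2$.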
26 Transfer function
- Shows the relation between input and output
- Apply the z-transform to y(k) from the previous slide
- Open-loop transfer function vs. closed-loop transfer function
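Assuming zero initial conditions, applying the z-transform to the ARMA model and closing a unity-feedback loop around it with a controller $C(z)$ (notation introduced here) gives the following standard forms. With $n = 1$, $a_1 = 1$, $b_1 = G$, the plant reduces to the miss ratio model $G/(z-1)$.

```latex
% z-transform of the ARMA model (zero initial conditions)
Y(z) = \sum_{i=1}^{n} a_i z^{-i} Y(z) + \sum_{i=1}^{n} b_i z^{-i} U(z)
\;\Longrightarrow\;
P(z) = \frac{Y(z)}{U(z)} = \frac{\sum_{i=1}^{n} b_i z^{-i}}{1 - \sum_{i=1}^{n} a_i z^{-i}}

% Open loop: controller C(z) in series with the plant P(z)
L(z) = C(z)\,P(z)

% Closed loop with unity feedback of the measured output
T_{cl}(z) = \frac{C(z)\,P(z)}{1 + C(z)\,P(z)}
```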
27 Resource allocation for QoS guarantees
- Allocating more/less resources = opening/closing a valve
- Actuators are needed to control the resource allocation or the QoS provided by the system
28 SW system actuators
- Input flow actuators
  - Admission control
    - Controls queue length, server utilization, etc.
    - Rejects some requests under overload
29 SW system actuators
- Quality adaptation actuators
  - Change processing requirements to increase the server rate under overload
    - E.g., return an abbreviated web page under overload
  - Tradeoff between delay and quality
  - Service level m in the range [0, M], where 0 means rejection
30 Resource reallocation actuator
- Alters the amount of allocated resources
- Usually applicable to multiple classes of clients, e.g., dynamically reallocate disk space for differentiated web caching to support a service delay ratio of 1:2 between two service classes [4, 7]
31 QoS Mapping
- Convert common resource management and SW performance assurance problems into FC problems
  - Absolute convergence guarantee
  - Relative guarantee
  - Resource reservation guarantee
  - Prioritization guarantee
  - Statistical multiplexing guarantee
  - Utility optimization guarantee
32 Absolute convergence guarantee
- Convergence to the specified performance (set point)
  - Overshoot: maximum deviation from the set point
  - Settling time: time taken to recover the desired performance
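Given a sampled response to a set-point change, overshoot and settling time can be computed as below. Normalizing overshoot by the step size and using a 2% band for settling are common conventions, assumed here rather than taken from the paper; the response values in the example are hypothetical.

```python
from typing import List, Optional


def overshoot(response: List[float], set_point: float, initial: float = 0.0) -> float:
    """Largest excursion past the set point, as a fraction of the step size."""
    step = set_point - initial
    if step == 0:
        raise ValueError("set point must differ from the initial value")
    # Positive when the response goes beyond the set point in the step's direction
    return max(0.0, max((y - set_point) / step for y in response))


def settling_time(response: List[float], set_point: float,
                  initial: float = 0.0, band: float = 0.02) -> Optional[int]:
    """First sample index after which the response stays within band*|step|
    of the set point; None if it never settles within the record."""
    tol = band * abs(set_point - initial)
    for k in range(len(response)):
        if all(abs(y - set_point) <= tol for y in response[k:]):
            return k
    return None


resp = [0.0, 0.6, 1.2, 0.9, 1.05, 0.99, 1.01, 1.0]  # hypothetical samples
print(overshoot(resp, 1.0), settling_time(resp, 1.0))
```

For this response, the overshoot is 20% of the step and the response stays within the 2% band from sample 5 on.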
33 Absolute convergence guarantee
- Rate and queue length control
  - Results in linear FC
  - (Flow) rate can be directly controlled by actuators
  - Queue length can be linearly controlled by controlling the flow
  - E.g., server utilization control loop
34 Absolute convergence guarantee
- Delay control
  - More difficult
  - Delay is inversely proportional to flow
  - Queuing delay $d = Q/r$, where Q is the queue length and r is the service rate -> nonlinear
35 Relative guarantee
- For example, fix the delays of two traffic classes at a ratio of 3:1
- $H_i$: measured performance of class i
- $C_i$: weight of class i
- Relative guarantee specifies $H_1 : H_2 = 1 : 3$
  - Set point = 1/3
  - Error $e = 1/3 - H_1/H_2$
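A sketch of the relative-guarantee error together with a hypothetical actuator that shifts worker processes between the two classes, in the spirit of the Apache example on the next slide. It assumes $H_i$ is a delay, so a ratio $H_1/H_2$ above the set point (negative error) calls for more processes for class 1; the gain `k` and the rule that each class keeps at least one process are illustrative choices.

```python
def relative_error(h1: float, h2: float, set_point: float = 1.0 / 3.0) -> float:
    """e = set_point - H1/H2 for a two-class relative guarantee (H1:H2 = 1:3)."""
    return set_point - h1 / h2


def reallocate(procs1: int, total: int, e: float, k: float) -> int:
    """Hypothetical proportional actuator for the number of class-1 processes.

    Assumes H_i are delays: e < 0 means class 1 is too slow relative to
    class 2, so class 1 receives more of the `total` processes.
    Each class keeps at least one process.
    """
    new = procs1 - round(k * e * total)
    return max(1, min(total - 1, new))


# Measured delays: class 1 = 2.0 s, class 2 = 4.0 s -> ratio 0.5 > 1/3
e = relative_error(2.0, 4.0)
print(e, reallocate(procs1=10, total=40, e=e, k=0.3))
```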
36 Relative guarantee in Apache web server
- Controlled variable: relative delay ratio
- Manipulated variable: allocated processes per class, to control connection delay
- HTTP protocol summary
  - A client, e.g., a web browser, establishes a TCP connection with a server process
  - The client submits an HTTP request to the server over the TCP connection
  - The server sends the response back to the client
  - The TCP connection is kept open for the Keep-Alive interval, e.g., 15 s
  - -> Claim: connection delay dominates the service response time
  - -> Scheduling can also significantly affect the relative delay ratio, but it is not considered
37 Relative guarantee in Apache web server
- Model the server with the ARMA model, estimating its parameters by the least squares method
  - This is called system identification (SYSID) in control theory
  - Randomly change per-class process allocations
  - Measure the response time
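The least squares step can be sketched for a first-order ARMA model, $y(k) = a\,y(k-1) + b\,u(k-1)$, by solving the 2x2 normal equations directly. The random input stands in for the randomly changed per-class process allocations; the model order, the true parameters (a = 0.6, b = 0.8), and the optional Gaussian noise are assumptions for illustration.

```python
import random
from typing import List, Tuple


def fit_first_order_arma(y: List[float], u: List[float]) -> Tuple[float, float]:
    """Least-squares estimate of (a, b) in y(k) = a*y(k-1) + b*u(k-1)."""
    syy = syu = suu = ry = ru = 0.0
    for k in range(1, len(y)):
        yp, up, yk = y[k - 1], u[k - 1], y[k]
        syy += yp * yp
        syu += yp * up
        suu += up * up
        ry += yp * yk
        ru += up * yk
    # Normal equations [syy syu; syu suu] [a; b] = [ry; ru], solved by Cramer's rule
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("input is not exciting enough to identify the model")
    a = (ry * suu - ru * syu) / det
    b = (ru * syy - ry * syu) / det
    return a, b


def make_data(a: float, b: float, n: int, seed: int = 0,
              noise: float = 0.0) -> Tuple[List[float], List[float]]:
    """Excite a hypothetical first-order plant with random inputs."""
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [0.0]
    for k in range(1, n):
        y.append(a * y[-1] + b * u[k - 1] + rng.gauss(0.0, noise))
    return y, u


y, u = make_data(a=0.6, b=0.8, n=200)
print(fit_first_order_arma(y, u))  # close to (0.6, 0.8)
```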
38 Relative guarantee in Apache web server
- Experimental settings
  - 4 Linux machines run the Surge web workload generator
  - 1 Linux machine runs the Apache web server
  - Premium clients are suddenly increased by 100 at time 870 s
39 Relative guarantee in Apache web server
[Figure: open-loop vs. closed-loop results]
- Stable?
40 Related work
- ControlWare
- CPU scheduling
- Storage management
- Network routers
- Power/heat management
- RTDB
41 Conclusions
- Feedback control is applicable to managing performance in SW systems
- Future work
  - Adaptive/robust control
  - Predictive control
  - Apply to other computational systems such as embedded systems
42 Adaptive Control: Self-Tuning Regulator
- Dynamically estimate a model of the system via the recursive least squares (RLS) method
- The controller then sets the actuators accordingly to support the desired performance
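A minimal recursive least squares (RLS) estimator for a first-order model $y(k) = a\,y(k-1) + b\,u(k-1)$, plus a certainty-equivalence control law that picks the input so the estimated model predicts $y(k+1)$ equal to the set point. This is one simple way to build a self-tuning regulator, shown under stated assumptions (first-order model, forgetting factor `lam`, initial covariance `p0`, a hypothetical true plant); it is not the specific design of the cited HP Labs work.

```python
import random
from typing import List


class RLS:
    """Recursive least squares for theta = [a, b] in y(k) = a*y(k-1) + b*u(k-1)."""

    def __init__(self, lam: float = 1.0, p0: float = 1000.0) -> None:
        self.theta = [0.0, 0.0]          # current estimates of a and b
        self.P = [[p0, 0.0], [0.0, p0]]  # covariance of the estimate
        self.lam = lam                   # forgetting factor (1.0 = no forgetting)

    def update(self, phi: List[float], y: float) -> List[float]:
        """Fold in one sample with regressor phi = [y(k-1), u(k-1)]."""
        P = self.P
        p_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = self.lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
        gain = [p_phi[0] / denom, p_phi[1] / denom]
        err = y - (phi[0] * self.theta[0] + phi[1] * self.theta[1])  # prediction error
        self.theta = [self.theta[0] + gain[0] * err, self.theta[1] + gain[1] * err]
        # P <- (P - gain * phi^T P) / lam; phi^T P equals p_phi^T because P is symmetric
        self.P = [[(P[i][j] - gain[i] * p_phi[j]) / self.lam for j in range(2)]
                  for i in range(2)]
        return self.theta


def certainty_equivalence_input(theta: List[float], y_now: float, set_point: float) -> float:
    """Pick u(k) so the estimated model predicts y(k+1) = set_point."""
    a, b = theta
    if abs(b) < 1e-9:
        return 0.0  # estimate unusable: apply no input
    return (set_point - a * y_now) / b


rng = random.Random(0)
est, y_prev = RLS(), 0.0
for _ in range(200):
    u_prev = rng.uniform(-1.0, 1.0)      # exploratory input
    y_now = 0.6 * y_prev + 0.8 * u_prev  # hypothetical true plant
    est.update([y_prev, u_prev], y_now)
    y_prev = y_now
print(est.theta)  # approaches [0.6, 0.8]
```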
43 References (HP Storage Systems Lab)
- C. Karamanolis, M. Karlsson, and X. Zhu, "Designing controllable computer systems," USENIX Workshop on Hot Topics in Operating Systems (HotOS), Santa Fe, NM, June 2005, pp. 49-54.
- M. Karlsson and M. Covell, "Dynamic black-box performance model estimation for self-tuning regulators," International Conference on Autonomic Computing (ICAC), Seattle, WA, June 2005, pp. 172-182.
44 Autonomic Computing
- General, broader research issues regarding self-tuning, self-managing, self-* systems
- Autonomic computing web site
  - http://autonomiccomputing.org/
- IBM
  - http://www.research.ibm.com/autonomic/index.html
  - Adaptive Systems Department
45 Some University Labs
- Tarek Abdelzaher: http://www.cs.uiuc.edu/homes/zaher/
- Chenyang Lu: http://www.cse.wustl.edu/lu/
46 Next class
- We will discuss papers from our RTES Lab on feedback control of software systems
  - K. D. Kang, J. Oh, Y. Zhou, "Backlog Estimation and Management for Real-Time Data Services," 20th Euromicro Conference on Real-Time Systems (ECRTS '08), July 2-4, 2008, Prague, Czech Republic.
  - C. Basaran, K. D. Kang, M. H. Suzer, K. S. Chung, H. R. Lee, K. R. Park, "Bandwidth Consumption Control and Service Differentiation for Video Streaming," 17th International Conference on Computer Communications and Networks (ICCCN '08), August 3-7, 2008, St. Thomas, U.S. Virgin Islands.
47 Questions?