Title: Conclusions from the European Roadmap on Control of Computing Systems
1Conclusions from the European Roadmap on Control
of Computing Systems
Karl-Erik Årzén, Anders Robertsson, Dan Henriksson (LTH, Lund University, Sweden)
Mikael Johansson, Håkan Hjalmarsson, Karl Henrik Johansson (Royal Institute of Technology, Sweden)
FeBiD06, Vancouver, April 3, 2006
2Background
- Recent large research interest (academically as well as industrially initiated) in
- Control-based methods for resource management in real-time computing and communication systems
- In most cases, allocation of memory, computing and/or communication resources
3Examples
- Performance control of web-servers,
- Dynamic resource management in embedded systems,
- Traffic control in communication networks,
- Transaction management in database servers,
- Autonomic computing
- etc.
4eBusiness
- Multi-tier systems of Web browsers, business logic and databases
- Feedback at various levels
- Queue Control
- IBM, HP, Microsoft, Amazon, ...
- Challenges
- Modeling formalisms (DES, ODEs, queuing theory, ...)
- Design of software and computing systems for controllability
courtesy J. Hellerstein
5ARTIST2
- Roadmap outcome from ARTIST2 workshop in Lund, Sweden, May 2005
- EU/IST FP6 Network of Excellence
- Embedded Systems Design
- NSF-supported workshop on
- Future trends in control of computer systems
- by Hellerstein, Tilbury, Abdelzaher, May 2005
7Roadmap
- Available for download at
- http://www.control.lth.se/user/karlerik/roadmap1.pdf
- Experiment
- You have wireless network access: try the server!
- or not.
8An admission (control) problem
9Report from Swed. Emergency Management Agency
10How to handle the overload problem?
- Overprovision
- (more capacity than needed on average)
- Admission control
- Some are denied access, but server continues to operate.
- Change service
- (sending text-only at high loads)
11Why is control of computing systems interesting?
- Multidisciplinary
- Several new challenges
- Not covered within one traditional research domain (queueing theory, computer science, systems and control)
- Need systematic tools for design and analysis
- robustness to disturbances
- better performance
- Cost of operating computing systems is rising/dominating (60-90%) [Hellerstein et al., 2005]
12Outline
- Background and Motivation
- Computer systems in a control theoretic framework
- Modeling issues
- Roadmap: Research challenges in
- Control of server systems,
- Control of CPU resources,
- Feedback scheduling of control systems,
- Control of communication networks,
- Error control of software systems,
- Control middleware.
- 4.15 pm Panel: Top Three Challenges in Control of Networks and Systems
13Contents of roadmap
- Six research areas
- Control of server systems,
- Control of CPU resources
- Feedback scheduling of control systems,
- Control of communication networks,
- Error control of software systems,
- Control middleware.
how flexibility, adaptivity, performance and
robustness can be achieved in a real-time
computing or communication system through the use
of control theory
14Modeling Formalisms
- Heuristic approach vs. model-based control
- Inherent robustness from feedback control
- One reason why many ad hoc strategies work
- More can be gained (systematic design and analysis)
- Basic principle: Use simple enough models for design and analysis
- Model should capture essential dynamics and show similar behavior as the system for different distributions and load cases.
15Modeling Formalisms
- Identification
- Sampling (SoH), noise, inherent nonlinearities
- First principles (conservation, queueing theory)
- Computing systems: discrete-event dynamic systems (DEDS)
- real-time systems -> timed automata or timed Petri nets
- Risk of state-space explosion (does not scale with arrival/service rates)
- Well-suited for safety and blocking properties, but how does it relate to stability and robustness?
16Modeling of queueing-systems
- Discrete event models
- Queue theoretic model (Markov chains etc.)
- Flow models (cont. time / average models)
- Discrete time models
17Modeling aspects
- Gain-scheduling (standard control principle)
- Choose among different control parameters depending on, e.g., operating condition.
- Good model structure of corresponding computing system may change with work load (e.g., for server systems)
- Flow models OK for high loads
- DEDS models feasible for low loads
- Interpolation between different model structures?!
- Transient vs steady-state behavior
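A minimal sketch of the gain-scheduling idea, assuming a PI-type controller whose gains are looked up from the measured utilization. The table values, the thresholds and the name `scheduled_gains` are hypothetical and chosen only for illustration; they are not taken from the roadmap.

```python
# Hypothetical gain table: (upper utilization bound, (Kp, Ki)).
# Lower gains are used at high load, where the system reacts more sharply.
GAIN_TABLE = [
    (0.5, (0.8, 0.2)),            # low load
    (0.8, (0.5, 0.1)),            # medium load
    (float("inf"), (0.2, 0.05)),  # high load
]


def scheduled_gains(utilization):
    """Return the (Kp, Ki) pair for the current operating condition."""
    for upper_bound, gains in GAIN_TABLE:
        if utilization < upper_bound:
            return gains
    return GAIN_TABLE[-1][1]


low_load_gains = scheduled_gains(0.3)
high_load_gains = scheduled_gains(0.95)
```

The same lookup structure could select between model structures (flow model vs. DEDS model) instead of controller gains.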
18Actuator Mechanisms
- The difference between the service rate, µ, and the arrival rate, λ, determines the delay experienced by the requests.
- Enqueue actuators (changing the arrival rate)
- Admission control mechanism
- Change inter-arrival period of task upstream in multitiered system
- Dequeue actuators (changing the service rate)
- Number of server threads
- Quality adaptation
- Dynamic voltage scaling
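To illustrate how the difference between service rate and arrival rate sets the delay, the sketch below uses the textbook M/M/1 result that the mean time in the system is 1 / (µ - λ). The M/M/1 assumption and all numbers are illustrative assumptions; the slides do not commit to a particular queueing model.

```python
def mm1_mean_delay(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: 1 / (mu - lambda)."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative (service rate positive)")
    if arrival_rate >= service_rate:
        return float("inf")  # overload: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


# Baseline: 90 requests/s arrive, 100 requests/s can be served.
baseline = mm1_mean_delay(90.0, 100.0)

# Enqueue actuator: admission control rejects 20 % of the arrivals.
admission = mm1_mean_delay(0.8 * 90.0, 100.0)

# Dequeue actuator: e.g. more server threads raise the service rate.
faster = mm1_mean_delay(90.0, 120.0)
```

Both actuators shorten the delay by widening the gap between the two rates; they differ in who pays (rejected clients vs. extra resources).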
19Actuators - Implementation aspects
- Gate model
- Call gapping: accept first u(kh) calls in control interval
- Percent blocking: preserves distribution
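The two gate mechanisms can be sketched as follows, assuming the controller output is either a call budget u for the current control interval (call gapping) or an acceptance fraction (percent blocking). Function names and the list-of-booleans interface are hypothetical.

```python
import random
from typing import List


def call_gapping(num_arrivals: int, u: int) -> List[bool]:
    """Accept the first u arrivals in the control interval, reject the rest."""
    budget = max(0, u)
    return [i < budget for i in range(num_arrivals)]


def percent_blocking(num_arrivals: int, accept_fraction: float,
                     rng: random.Random) -> List[bool]:
    """Accept each arrival independently with the same probability."""
    p = min(1.0, max(0.0, accept_fraction))
    return [rng.random() < p for _ in range(num_arrivals)]


rng = random.Random(0)
gapped = call_gapping(5, 2)              # admissions bunch at interval start
blocked = percent_blocking(5, 0.4, rng)  # admissions spread over the interval
```

Because percent blocking thins the arrival stream uniformly, the admitted traffic keeps the character of the offered traffic, whereas call gapping concentrates admissions at the start of each interval.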
20Related research areas
- Similarities/differences
- of the different domains
- Traffic flow control
- Manufacturing and supply chains
- Communication networks
- Power networks
- with respect to
- Where does the congestion appear?
- Routing?
- Available information (dest.)?
- Time/distance matters?
- Package dropping OK or not?
- Control action?
21Control of server systems
- Temporal control locally at server
- Direct or indirect objective
- (service provider vs. customer)
- Queue-management and load balancing
- Inherent nonlinearities
- Multi-tiered systems including large eCommerce
systems
22Example Admission control
- Objective
- Good transient behavior for traffic changes
- Preserve good performance for overload situations
- Measure of admission
- queue length
- average time
- utilization
- CPU load / energy consumption
- memory
- ...
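A minimal sketch of queue-length-based admission control, assuming a discrete-time flow (fluid) model of the queue and a PI controller with conditional anti-windup. The model, the gains and all rates are illustrative assumptions, not the design used in the talk.

```python
def simulate_admission_control(steps=200, h=1.0, arrival_rate=150.0,
                               service_rate=100.0, queue_ref=50.0,
                               kp=0.5, ki=0.1):
    """Simulate a PI admission controller under a sustained overload.

    Flow model: q(k+1) = max(0, q(k) + h * (u(k) - service_rate)),
    where u(k) is the admitted rate, limited to [0, arrival_rate].
    """
    queue = 0.0
    integral = 0.0
    queue_log, admitted_log = [], []
    for _ in range(steps):
        error = queue_ref - queue
        u_unsat = kp * error + integral
        u = min(max(u_unsat, 0.0), arrival_rate)  # actuator constraint
        if u == u_unsat:
            # Anti-windup: integrate only while the actuator is unsaturated.
            integral += ki * h * error
        queue = max(0.0, queue + h * (u - service_rate))
        queue_log.append(queue)
        admitted_log.append(u)
    return queue_log, admitted_log


queue_log, admitted_log = simulate_admission_control()
```

In this sketch the admitted rate settles at the service rate and the queue at its reference, so the excess arrivals are the ones denied access while the server keeps operating.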
23Example Feedforward and feedback
24Control of server systems
- Prediction and state estimation based control
- State and actuator constraints
- Interesting region: When do the flow models cease to be valid?
- Changing models and criteria in different load situations...
- Very exciting new results on discrete-event based estimation and control
- DE-sampling vs. DT-sampling
- control ratio 1/5,
- bandwidth allocation 1/2
25Server systems - Research challenges
- Modeling issues (as discussed before)
- Control queueing theory ?
- Event-based control theory gap
- Control objectives
- References (load, utilization)
- Performance metrics and cost functions
- (upcrossing probabilities)
- Security, reliability, availability, efficiency
- Design patterns/Control patterns
- Software structure control structure and analysis for software design
- Well known in, e.g., process control (ratio control, cascade, midranging etc.)
- When should a queue problem be considered as
- an admission problem?
- a delay control problem?
- Large-scale distributed systems / multi-tier systems
- Distributed control, MPC, ...
26Control of CPU resources
- A large number of feedback-based or adaptive global QoS management systems have been proposed.
- Early ad hoc schemes of multi-level feedback queue scheduling
- control-theoretical approaches using FC-EDF, EUCON
- Stankovic, Lu, Buttazzo, ...
The FC-EDF scheme (from Stankovic et al., 1999)
27Control of CPU resources The challenges and
research directions
- Multiprocessor systems
- Power-aware CPU scheduling
- Dynamic Voltage Scaling
- joint optimization problem of minimizing energy while still meeting real-time constraints
- already today receives considerable attention from the research community.
- End-to-end resource management
- Resource management in distributed systems where an activity spans multiple nodes
- Hierarchical resource allocation schemes
- Cascaded structures with local allocation
- Efficient feedback scheduling mechanisms
- Scheduling algorithm overhead: online optimization doable?
28Feedback scheduling of control tasks
- Actuation
- Task period h_i
- Solve two different problems
- Resource regulation
- Control the total utilization to avoid overloads
- Optimal resource distribution
- Assign individual task periods to optimize
performance
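The resource-regulation part can be sketched with simple period rescaling: if the utilization U = sum(C_i / h_i) at the nominal periods exceeds a setpoint, all periods are stretched by the same factor. This is one simple strategy chosen for illustration; it does not solve the optimal distribution problem, and the function names and numbers are hypothetical.

```python
def utilization(exec_times, periods):
    """Total CPU utilization U = sum(C_i / h_i)."""
    return sum(c / h for c, h in zip(exec_times, periods))


def rescale_periods(exec_times, nominal_periods, u_setpoint):
    """Stretch all task periods equally so that U does not exceed u_setpoint.

    Periods are never made shorter than their nominal values.
    """
    if u_setpoint <= 0:
        raise ValueError("utilization setpoint must be positive")
    u_nominal = utilization(exec_times, nominal_periods)
    scale = max(1.0, u_nominal / u_setpoint)
    return [h * scale for h in nominal_periods]


# Three control tasks: execution times and nominal periods in milliseconds.
exec_times = [2.0, 3.0, 1.0]
nominal = [10.0, 10.0, 10.0]
new_periods = rescale_periods(exec_times, nominal, u_setpoint=0.5)
```

In a feedback scheduler the execution times would be measured or estimated online, and the rescaling repeated whenever they change.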
29Example Dynamic Real-Time Scheduling of Model
Predictive Controllers
- Based on on-line optimization of a cost function
- Convex optimization problem solved in each sample
- Iterative anytime algorithm
- Result gradually refined up to a certain bound
- Attractive control strategy
- Straightforward to use for multi-variable processes
- Ability to handle constraints
- Unattractive real-time properties
- High computational demands
- Very large variations in execution times
Henriksson et al. 2004
30Example Feedback scheduling of MPC control tasks: Main idea
- A process in stationarity may need fewer resources than a process in a transient phase
- Use feedback from the optimization algorithm to determine
- for each MPC task, when to terminate the optimization and output the control signal, and
- the optimization may be terminated early and still produce acceptable results.
- which of several ready MPC tasks should be scheduled for execution.
Henriksson et al., 2004
31
- Current values of the cost functions act as dynamic task priorities
- Constitutes an on-line QoS measure for the task
- Reflects the relative importance of the tasks
- Feedback scheduler distributes the computing resources
- Schedules MPC task with highest cost
- Invoked after each iteration
- Implemented as a separate task
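A minimal sketch of cost-based dynamic priorities, assuming each ready MPC task exposes the current value of its cost function and that one optimizer iteration lowers it. The `MpcTask` class and the halving in `iterate` are hypothetical stand-ins for a real anytime optimizer, not the algorithm of Henriksson et al.

```python
from dataclasses import dataclass


@dataclass
class MpcTask:
    name: str
    cost: float  # current value of the cost function (on-line QoS measure)

    def iterate(self):
        # Placeholder for one iteration of the anytime optimization.
        self.cost *= 0.5


def pick_task(ready_tasks):
    """The task with the highest current cost gets the CPU next."""
    return max(ready_tasks, key=lambda task: task.cost)


def run_scheduler(tasks, iterations):
    """Invoke the feedback scheduler after each optimizer iteration."""
    trace = []
    for _ in range(iterations):
        task = pick_task(tasks)
        task.iterate()
        trace.append(task.name)
    return trace


trace = run_scheduler([MpcTask("A", 8.0), MpcTask("B", 3.0)], 4)
```

A task far from its optimum (for example during a transient) keeps the processor until its cost drops below that of the other tasks.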
32
- Cooperative robot task under resource constraints
- Master and slave configuration
- Ball and beam application
33
- Problems
- MPC tasks exhibit very large variations in execution time
- Traditional scheduling theory not applicable
- Solutions
- Premature termination of optimization
- Dynamic scheduling based on cost functions
34The challenges and research directions for
feedback scheduling of control tasks
- include all the challenges and research directions of control of CPU resources.
- Additionally, the following items are important
- Temporal robustness indices
- Formal performance guarantees
- open question whether it is possible to combine
the flexibility implied by feedback scheduling
with formal guarantees
35Control of Communication Networks
- Example
- Feedback control is embedded in the TCP protocol in the form of a sliding window mechanism.
- Introduced in the 80s to solve the congestive failure problems that had brought down the network.
- We have not experienced system-wide congestive failures again even though the network has grown by orders of magnitude.
- This is a testament to the effectiveness of feedback control in a highly dynamic, decentralized, and fast changing environment.
- Remark
- 9.00 Robust yet Fragile: Intrinsic Tradeoffs in Layered Architectures
36Control of Communication Networks
- Feedback control mechanisms are fundamental for the separation of communication layers
- Gives robustness and allows local optimization and refinements
- Example
- Reliable data transfer over wireless link through suitable feedback control of
- transmission power
- modulation scheme
- channel coding
37Research Challenges in Control of Communication
Networks
- Architectures and model abstractions for network control
- Network models suitable for control and observer design
- Robustness of large scale and distributed systems
- Resource management in wireless networks
- Cross-layer adaptation for new services and optimized performance
38Cross-layer adaptation for improved performance
of cellular and wired networks
- Bandwidth variations in radio link give performance degradations due to large end-to-end delay and improper transport protocol
- Proxy between cellular and wired networks adapts sending rate to bandwidth variations through available radio link state information
[Figure: network diagram with labels Terminal, BTS, RNC (BW variations), 3G-SGSN, 3G-GGSN, 3G Cellular Network, PROXY, Internet, App Server, TCP]
39Proxy hybrid control law
- Controller in proxy regulates sending rate based on
- Events generated by bandwidth changes obtained from RNC
- Sampled measurements of queue length in RNC
Möller et al., 2005
40Experimental evaluation
- Improved time-to-serve-user and link utilization
compared to traditional end-to-end protocol
- Stability and robustness analysis of new protocol
- Ongoing experimental evaluation and testing with
Möller et al., 2005
41Network-aware control architecture
- Estimate network state
- Delay
- Data loss probability
- Bandwidth
- Adjust controller accordingly
42Network-aware controllers
- Control algorithms to cope with communication imperfections
- Control under network delay
- Control under data loss
- Control under bandwidth limitation
- Control under topology constraints
Characteristics depend on network technology
43Delay estimation
- Internet round-trip time (RTT) data are noisy with piecewise constant average
- Complex network dynamics hard to model
- RTT estimation in TCP
- Improved estimation through Kalman filter with hypothesis test (CUSUM filter)
Jacobsson et al., 2004
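A simplified sketch of the idea of combining smoothing with a change detector, assuming RTT samples in milliseconds. It uses the exponentially weighted average of standard TCP (gain 1/8) together with a two-sided CUSUM test that restarts the estimate when the average level shifts. It is not the Kalman-filter design of Jacobsson et al.; the class name, drift and threshold values are hypothetical.

```python
class RttEstimator:
    """EWMA RTT estimate with a two-sided CUSUM change detector."""

    def __init__(self, alpha=0.125, drift=5.0, threshold=40.0):
        self.alpha = alpha          # smoothing gain (1/8 as in standard TCP)
        self.drift = drift          # residuals below this are treated as noise
        self.threshold = threshold  # alarm level for the CUSUM statistics
        self.estimate = None
        self.g_pos = 0.0            # accumulates positive residuals
        self.g_neg = 0.0            # accumulates negative residuals

    def update(self, sample):
        if self.estimate is None:
            self.estimate = sample
            return self.estimate
        residual = sample - self.estimate
        self.g_pos = max(0.0, self.g_pos + residual - self.drift)
        self.g_neg = max(0.0, self.g_neg - residual - self.drift)
        if self.g_pos > self.threshold or self.g_neg > self.threshold:
            # Level shift detected: restart the estimate at the new sample.
            self.estimate = sample
            self.g_pos = 0.0
            self.g_neg = 0.0
        else:
            self.estimate += self.alpha * residual
        return self.estimate


estimator = RttEstimator()
for rtt in [100.0] * 20:
    estimator.update(rtt)
after_jump = estimator.update(200.0)  # a jump in the average RTT
```

With plain smoothing the estimate would creep towards the new level over many samples; the detector lets it follow a piecewise constant average quickly while still filtering noise between jumps.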
44Control middleware
- Middleware
- a software abstraction layer that mediates the interactions between a component or application
- Commonly used in distributed systems to provide communication services.
- Java-RMI, Microsoft's .COM, and CORBA
- Networked embedded system applications,
- e.g., mobile systems and sensor systems.
- GAIA [Román et al., 2002], WSAMI [Issarny et al., 2005], and AURA
45Control middleware
- Research Directions
- The most important research item for control middleware is to develop these systems from research prototypes to something that may be used more widely.
- Middleware functionality
- Still an open question whether the middleware should be
- passive, i.e., provide sensing and actuation services that the application can use to implement the feedback control itself, or
- active, i.e., the middleware should be responsible for the actual control loop.
- Both of these approaches have advantages and disadvantages.
46Error control of software systems (L. Sha)
- The idea behind error control of software is to use ideas similar to those used in feedback control in order to detect malfunctioning software components and, in that case, fall back on a well-tested core software component that is able to provide the basic application service with guarantees on performance and safety.
- Provide techniques and tools that support making the semantic assumptions of each software component explicit and machine checkable.
47
- Simple and reliable core
- System remains in recoverable states
- SIMPLEX architecture [Sha]
- High assurance vs high performance
- Need to stay in recoverable state
- Runs in parallel (cf. bumpless transfer)
- ORTGA, FeBID06
- Maximum stability region
- How to detect conditions for switches? (FDI)
- False alarm vs. non-recovery: risk of instability
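The switching logic can be sketched as below, assuming a one-step prediction of the plant state and a known recoverable region. The scalar plant x(k+1) = x(k) + u(k), the region |x| <= 1 and both controllers are hypothetical examples, not Sha's implementation.

```python
def simplex_step(x, complex_ctrl, safe_ctrl, predict, is_recoverable):
    """Return (control signal, name of the controller that produced it)."""
    u = complex_ctrl(x)
    # Accept the high-performance command only if the predicted next state
    # stays inside the recoverable region; otherwise use the reliable core.
    if is_recoverable(predict(x, u)):
        return u, "complex"
    return safe_ctrl(x), "safe"


def predict(x, u):
    return x + u  # hypothetical plant model


def is_recoverable(x):
    return abs(x) <= 1.0  # hypothetical recoverable region


def safe_ctrl(x):
    return -0.5 * x  # simple, well-tested core controller


def complex_ctrl(x):
    # High-performance controller with a simulated software fault for x > 0.5.
    return 5.0 if x > 0.5 else -0.9 * x


normal = simplex_step(0.2, complex_ctrl, safe_ctrl, predict, is_recoverable)
fallback = simplex_step(0.8, complex_ctrl, safe_ctrl, predict, is_recoverable)
```

A tight recoverable region gives more false alarms (unnecessary fallbacks), while a loose one risks switching too late for the core to recover.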
48Roadmap
- Available for download at
- http://www.control.lth.se/user/karlerik/roadmap1.pdf
50Conclusions
- Thank you for your attention!
- Questions?
- Panel debate
52Proposed solutions for wireless TCP
- Split connection
- Destroys end-to-end semantics
- End-to-end protocols
- Deployment issues
- Link-layer improvements
- Performance limitations
- E.g., Balakrishnan et al., Ludwig and Katz,
Xylomenos et al., Huang et al., Hossain et al.,
RFC 3135 and 3366,