Applying Control Theory to Stream Processing Systems - PowerPoint PPT Presentation

Slides: 34
Transcript and Presenter's Notes

1
Applying Control Theory to Stream Processing Systems
  • Wei Xu (xuw_at_cs.berkeley.edu)
  • Bill Kramer (kramer_at_lbl.gov)
  • Peter Bodik (bodikp_at_cs.berkeley.edu)

2
Just for fun
Jan. 7, 2005
VOD system on an Airbus 330: "We need 20 minutes to reboot the system"
3
Outline
  • System log as data streams
  • Applying control theory
  • Accurate data source
  • Controlling queue length
  • Lessons learned

4
Introduction
  • Goal of our project
  • A flexible and scalable architecture for system
    log processing
  • Explore general techniques of applying control
    theory to systems
  • Problem: data rate up to 1 TB a day
  • Data are very complex

5
Example of system log data
  • request data
  • Apache logs, etc.
  • performance data
  • CPU, memory, etc.
  • failure data
  • Detected problems / error messages
  • reports from operators

6
Preprocessing
  • Sanitize the data
  • Remove/encrypt sensitive information before the
    data get into permanent storage
  • Sanitize at different levels
  • Put logs into common format
  • Merge information from various sources
  • Sampling
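The sanitizing and sampling steps listed above could look roughly like the sketch below. It is only an illustration: the slides do not say which fields are sensitive or how they are protected, so the IPv4 pattern, the keyed hash used as a stand-in for "encrypt", the `SECRET_KEY` value, and the function names are all assumptions.

```python
import hashlib
import hmac
import re

# Hypothetical key; a real deployment would load it from secure storage.
SECRET_KEY = b"example-key"

# Assumption: client IPv4 addresses are the sensitive field.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize(value: str) -> str:
    """Replace a value with a keyed hash. The same input always maps to the
    same token, so sanitized records can still be merged across sources."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def sanitize_line(line: str) -> str:
    """Replace every IPv4-looking token before the line reaches storage."""
    return IPV4_RE.sub(lambda m: "ip-" + pseudonymize(m.group(0)), line)


def keep_sample(key: str, rate: float) -> bool:
    """Deterministic sampling: keep roughly `rate` of all keys, decided by a
    hash of the key so every node makes the same decision."""
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 10_000
    return bucket < rate * 10_000


line = '10.1.2.3 - - [07/Jan/2005:10:00:00] "GET /index.html" 200'
clean = sanitize_line(line)
```

Hashing with a key rather than deleting the field keeps requests from the same client linkable, which matters for the later merge step.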

7
The big picture
8
Early Experiences
  • Ad-hoc Scripts
  • Tedious
  • Hard to change
  • Relational databases
  • Static schema
  • Hard to support temporal queries
  • Have to store all the data

9
Stream processing?
  • system log data are data streams
  • preprocessing is a continuous query
  • Telegraph Continuous Query (TCQ)
  • data stream processing engine
  • SQL queries
  • sliding time window
  • adaptive execution optimized on-the-fly
  • performance doesn't depend on queries
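To make the sliding-time-window idea concrete, here is a minimal sketch in plain Python. It does not use TCQ or its query syntax; the class name `SlidingWindowCounter` and the 10-second width are illustrative assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall within the last `width` seconds."""

    def __init__(self, width: float):
        self.width = width
        self.events = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return the number of events in the window
        (timestamp - width, timestamp]."""
        self.events.append(timestamp)
        # Evict events that have slid out of the window.
        while self.events and self.events[0] <= timestamp - self.width:
            self.events.popleft()
        return len(self.events)


window = SlidingWindowCounter(width=10.0)
counts = [window.add(t) for t in [0.0, 3.0, 9.0, 12.0, 25.0]]
```

Each arriving event updates the result immediately, which is what makes the query continuous: nothing has to be stored beyond the current window.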

10
Data preprocessing architecture
[Figure: load splitter, SLT 1, SLT 2, combiner]
11
Outline
  • System log as data streams
  • Applying control theory
  • Accurate data source
  • Controlling queue length
  • Lessons learned

12
Why do we need control?
  • Data source does not provide accurate data rate

13
Control Problems
  • Not accurate for various reasons
  • Scheduling
  • Time spent on I/O
  • Etc.
  • Providing an accurate data source using feedback
    control
  • By adjusting the desired rate given to the source as input

14
The Control Architecture
[Figure: control loop diagram; values shown: 1500, 1900, 1600]
P Controller (with precompensation): $u(k) = K_P e(k)$
PI Controller: $u(k) = u(k-1) + (K_P + K_I) e(k) - K_P e(k-1)$
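The two control laws on this slide, u(k) = Kp e(k) and u(k) = u(k-1) + (Kp + Ki) e(k) - Kp e(k-1), can be sketched as follows. The plant model is an assumption made for illustration only: the source is taken to deliver a fixed fraction (`plant_gain`) of the requested rate. Precompensation is modelled here as a constant factor on the reference, and the gains and the reference of 1500 are arbitrary, not the values used in the talk.

```python
class PController:
    """P control: u(k) = Kp * e(k), with an optional gain on the reference."""

    def __init__(self, kp, precompensation=1.0):
        self.kp = kp
        self.precompensation = precompensation  # assumed: scales the reference

    def update(self, reference, measurement):
        error = self.precompensation * reference - measurement
        return self.kp * error


class PIController:
    """PI control in incremental form:
    u(k) = u(k-1) + (Kp + Ki) * e(k) - Kp * e(k-1)."""

    def __init__(self, kp, ki, u0=0.0):
        self.kp, self.ki = kp, ki
        self.u_prev = u0
        self.e_prev = 0.0

    def update(self, reference, measurement):
        error = reference - measurement
        u = self.u_prev + (self.kp + self.ki) * error - self.kp * self.e_prev
        self.u_prev, self.e_prev = u, error
        return u


def simulate_source(controller, plant_gain, reference, steps):
    """Toy plant (assumption): actual rate = plant_gain * requested rate."""
    actual = 0.0
    for _ in range(steps):
        requested = controller.update(reference, actual)
        actual = plant_gain * requested
    return actual


p_plain = simulate_source(PController(0.5), 0.8, 1500.0, 60)
p_precomp = simulate_source(PController(0.5, precompensation=3.5), 0.8, 1500.0, 60)
pi = simulate_source(PIController(0.2, 0.5), 0.8, 1500.0, 60)
```

In this toy model the plain P controller settles well below the target. The precompensation factor of 3.5 removes that offset only because it was computed from the known plant gain, whereas the PI controller reaches the target without knowing the gain.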
15
Result: an accurate data source
[Figure panels: P Controller with Pre-compensation; PI Controller]
16
Zoom In
Many small disturbances in a Java program: incremental garbage collection
[Figure panels: P Controller; PI Controller]
17
Outline
  • System log as data streams
  • Applying control theory
  • Accurate data source
  • Controlling queue length
  • Lessons learned

18
Problem: performance disturbance
  • Significant network traffic
  • Memory Leak
  • System Process Interference
  • Packets dropped during transferring stream
  • Other failures

Also, the performance of a node depends on the SELECTIVITY of its relational operators, which in turn depends on the input data.
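A small sketch of why selectivity is data-dependent: the same filter passes very different fractions of its input depending on what the input looks like. The predicate and the two batches of HTTP status codes are made up for illustration.

```python
def selectivity(predicate, tuples):
    """Fraction of input tuples that pass the operator's predicate."""
    tuples = list(tuples)
    if not tuples:
        return 0.0
    passed = sum(1 for t in tuples if predicate(t))
    return passed / len(tuples)


def is_server_error(status):
    """Hypothetical filter: keep only HTTP 5xx responses."""
    return status >= 500


# Two made-up batches of HTTP status codes.
quiet = [200, 200, 404, 200, 500, 200, 200, 200, 200, 200]
incident = [500, 503, 200, 500, 500, 502, 200, 500, 503, 500]

quiet_sel = selectivity(is_server_error, quiet)
incident_sel = selectivity(is_server_error, incident)
```

With the same input rate, the operator emits eight times as many tuples during the incident batch, so downstream load changes even though the query did not.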
19
Description of the system
TCQ: complex internal structure
[Figure: Controlled Data Source, Input Buffer, TCQ]
20
Why do we need control?
  • The TCQ node drops tuples when the result queue fills up

[Figure: Source, Buffer, TCQ, Result Q]
21
Control Problems
  • Regulate queue length on TCQ node
  • By controlling buffer output rate
  • Prevent dropping tuples
  • Maximize throughput
  • Tolerate disturbance

22
System with Control
23
Controller
$u(k) = u(k-1) + (K_P + K_I) e(k) - K_P e(k-1)$
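As a rough illustration of this controller regulating queue length, the sketch below closes the loop around a toy queue. The queue model (length grows by the input rate and shrinks by the service rate each step), the gains, the capacity, and the drop in service rate that stands in for a disturbance are all assumptions, not measurements from the TCQ experiments.

```python
class PIController:
    """u(k) = u(k-1) + (Kp + Ki) * e(k) - Kp * e(k-1)."""

    def __init__(self, kp, ki, u0=0.0):
        self.kp, self.ki = kp, ki
        self.u_prev = u0
        self.e_prev = 0.0

    def update(self, reference, measurement):
        error = reference - measurement
        u = self.u_prev + (self.kp + self.ki) * error - self.kp * self.e_prev
        self.u_prev, self.e_prev = u, error
        return u


def simulate_queue(controller, reference, service_rates, capacity):
    """Toy queue (assumption): q(k+1) = q(k) + input rate - service rate,
    clipped to [0, capacity]; anything above capacity counts as dropped."""
    queue, dropped, history = 0.0, 0.0, []
    for service in service_rates:
        rate_in = max(0.0, controller.update(reference, queue))
        queue = max(0.0, queue + rate_in - service)
        if queue > capacity:
            dropped += queue - capacity
            queue = capacity
        history.append(queue)
    return history, dropped


# Service rate falls from 100 to 80 tuples per step at step 50
# (a stand-in for a disturbance such as CPU contention).
service_rates = [100.0] * 50 + [80.0] * 100
controller = PIController(kp=0.5, ki=0.1, u0=100.0)
history, dropped = simulate_queue(controller, 50.0, service_rates, 100.0)
```

In this run the queue climbs after the disturbance but stays below capacity, and the controller brings it back to the desired length by lowering the input rate toward the new service rate.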
24
Result: regulating queue length
[Figure: Source, Buffer, TCQ, Result Q]
25
Result: under CPU contention
[Figure: Source, Buffer, TCQ, Result Q]
26
Outline
  • System log as data streams
  • Applying control theory
  • Accurate data source
  • Controlling queue length
  • Lessons learned

27
Why is theory useful?
  • One of my implementations ... what happened?

[Figure: Source, Buffer, TCQ, Result Q]
28
What is going on?
[Figure: block diagram with labels: Desired Queue Length; Queue Length Controller; Controlled Output Thread (code reuse); Data Rate to TCQ; Actual Queue Length]
29
Theory meets reality
[Figure: plot of queue length vs. time]
30
Conclusion
  • Advantages of feedback control
  • Makes the system more robust under disturbance
  • Allows more time for failure detection
  • Treat complex systems as black boxes
  • Cope with the system's characteristics instead of
    having to change them
  • Theoretical analysis
  • Implementation is easy
  • System statistics can also be used for SLT

31
Future Work
  • Load balancer
  • Load control across multiple tiers
  • Scheduling of multiple streams

32
Backup Slides
33
Tricky part of parameter estimation
Model evaluation: making the system operate in the desired range
[Figure: data rate vs. free space; x-axis: Free Space; annotation: Non-linear range]
Easy for data source, but queue length ..
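One common way to estimate parameters for this kind of controller design is a least-squares fit of a first-order linear model to measured input and output. The sketch below assumes the model y(k+1) = a y(k) + b u(k); the slides do not state which model or fitting method was used, and the data are synthetic. Such a fit is only meaningful when the measurements come from the linear operating range.

```python
def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y[k+1] = a*y[k] + b*u[k].

    `y` must have one more sample than `u`.
    """
    pairs = list(zip(y[:-1], u, y[1:]))  # (y[k], u[k], y[k+1])
    s_yy = sum(yk * yk for yk, _, _ in pairs)
    s_uu = sum(uk * uk for _, uk, _ in pairs)
    s_yu = sum(yk * uk for yk, uk, _ in pairs)
    s_ny = sum(yn * yk for yk, _, yn in pairs)
    s_nu = sum(yn * uk for _, uk, yn in pairs)
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0:
        raise ValueError("input does not excite the system enough to fit")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (s_ny * s_uu - s_nu * s_yu) / det
    b = (s_nu * s_yy - s_ny * s_yu) / det
    return a, b


# Synthetic data from a known system (a = 0.5, b = 0.4), for illustration.
u = [1.0, 2.0, 0.0, 3.0, 1.0, 4.0, 2.0, 0.0]
y = [0.0]
for uk in u:
    y.append(0.5 * y[-1] + 0.4 * uk)

a_hat, b_hat = fit_first_order(u, y)
```

If the input barely varies, or the system saturates during the experiment, the fitted values describe the wrong regime, which is one reading of the "tricky part" this slide refers to.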