Using Queries for Distributed Monitoring and Forensics

Transcript and Presenter's Notes

1
Using Queries for Distributed Monitoring and Forensics

Atul Singh, Rice University
Peter Druschel, Max Planck Institute for Software Systems
Timothy Roscoe, Intel Research Berkeley
Petros Maniatis, Intel Research Berkeley
2
Building and monitoring a system

Building a distributed system is a complex undertaking
Select properties and algorithms, implement, deploy
Then switch to monitoring the system
Testing, debugging, profiling, tuning
Monitoring is hard and error-prone

Distributed state
Partial faults
Complex interactions
Asynchronous
External factors

3
Monitoring is hard!

Current state of the art
Manual insertion of printf
Bringing logs to one place
Parsing/processing of logs
Scripts (Perl/Python)
Queries (Astrolabe)
Offline by nature

Expose internal state
Ad-hoc, error-prone

Probe exposed state
Correlate events
Bridge the semantic gap

4
Declarative systems: building systems via queries
Probe the state

Declarative specification via queries
Execution by a distributed query processor
P2 [SOSP05]: a prototype declarative system
Concise specifications
Enables rapid prototyping
We present a monitoring framework for P2
Flexible introspection
Retains semantics of application
Online execution tracing

Expose internals
5
Overview

Introduction
P2 Background
Monitoring framework
Example applications/Performance
Conclusions

6
Example route operation in P2
Rule form: action :- event, precondition.

route(B,K) :- route(A,K), nextHop(A,D,B), D K.

[Figure: rule strand for the rule above: Join (route.A, nextHop.A), Select (D K), Project; the nextHop table is application state]
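
The rule strand can be read as a small relational pipeline: join, then select, then project. The sketch below is plain Python, not P2 or its rule language. The function name `route_strand`, the tuple layouts, and especially the `select` predicate are assumptions made for illustration; the comparison between D and K is not legible in the slide text, so the caller supplies it.

```python
from typing import Callable, Iterable, List, Tuple

Route = Tuple[str, int]          # route(A, K): node A holds a request for key K
NextHop = Tuple[str, int, str]   # nextHop(A, D, B): at node A, entry D points to B


def route_strand(
    routes: Iterable[Route],
    next_hops: Iterable[NextHop],
    select: Callable[[int, int], bool],
) -> List[Route]:
    """Join route and nextHop on A, filter on (D, K), project to route(B, K)."""
    hops = list(next_hops)
    out: List[Route] = []
    for a, k in routes:                      # event tuple
        for hop_a, d, b in hops:             # join: route.A == nextHop.A
            if hop_a == a and select(d, k):  # select on D and K (assumed predicate)
                out.append((b, k))           # project: keep only (B, K)
    return out


if __name__ == "__main__":
    routes = [("n1", 40)]
    next_hops = [("n1", 10, "n2"), ("n1", 50, "n3"), ("n9", 10, "n4")]
    # Placeholder predicate, chosen only so the example produces output.
    print(route_strand(routes, next_hops, select=lambda d, k: d < k))
```

Passing the predicate as a parameter keeps the sketch honest about what the transcript does not say, while still showing how each stage of the strand consumes the output of the previous one.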
7
Overview

Introduction
Background
Monitoring framework
Example applications/Performance
Conclusions

8
Introspection and Logging

Introspection at three levels
Application state level
Rule level
Dataflow level
Systematic instrumentation
System is built using smaller, re-usable components
Systematic insertion of logging statements
Logging data is in the form of tuples
Retains semantics of application logic
No need for translation

9
Tracing rule executions

We want to step through the execution
Each step corresponds to a rule
Do it in an online fashion
For rule-level tracing
Need to trace tuples
Match output tuples to input tuples
Track tuples as they go over the wire

[Figure: Node A and Node B running rules r0 and r1 over tuples w, x, y, z]
10
(1) Tracing rule executions

Matching input and output tuples of a rule
Tap elements at the beginning and end of a rule
Execution tracer tracks rule executions
Execution records are stored as tuples in the exec table

[Figure: the Execution Tracer taps a rule's input (x) and output (y) and writes records to exec]
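
As a rough illustration of tapping a rule at its beginning and end, the following Python sketch wraps a rule function and appends one record per execution to an in-memory exec table. It is not P2's dataflow implementation; the class `ExecutionTracer`, the method names, and the record layout `(rule, input ID, output ID)` are assumptions.

```python
import itertools
from typing import Callable, List, Tuple

ExecRecord = Tuple[str, int, int]  # (rule name, input tuple ID, output tuple ID)


class ExecutionTracer:
    """Collects one exec record per completed rule execution."""

    def __init__(self) -> None:
        self.exec_table: List[ExecRecord] = []
        self._ids = itertools.count(1)   # locally unique tuple IDs

    def new_id(self) -> int:
        return next(self._ids)

    def traced(self, rule_name: str, rule: Callable[[tuple], tuple]):
        """Wrap a rule so its input and output tuples are matched up."""
        def run(in_id: int, in_tuple: tuple) -> Tuple[int, tuple]:
            # Tap at the beginning: in_id identifies the triggering tuple.
            out_tuple = rule(in_tuple)
            # Tap at the end: give the output an ID and log the pairing.
            out_id = self.new_id()
            self.exec_table.append((rule_name, in_id, out_id))
            return out_id, out_tuple
        return run


if __name__ == "__main__":
    tracer = ExecutionTracer()
    r1 = tracer.traced("r1", lambda t: ("y",) + t[1:])
    x_id = tracer.new_id()
    y_id, y = r1(x_id, ("x", 7))
    print(tracer.exec_table)  # [('r1', 1, 2)]
```

Because the records are ordinary tuples, they can be queried like any other table, which is the property the slide emphasizes.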
11
(2) Tracing tuples across the wire

Each tuple has a locally unique ID
Tuple ID is sent along with the tuple
Upon receiving, a new tuple is created with a different ID
Hooks in the network in/out handling subsystem
A record is created
tuple's local ID
tuple's remote ID
Node from which it came

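[Figure: tuple x leaves node A through Network Out and arrives at node B through Network In as tuple y; B records the mapping in its tupleTable]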
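
The ID translation at the network boundary might look like the following Python sketch. The `Node` class, the hook names `network_out` and `network_in`, and the dictionary message format are assumptions for illustration; only the three recorded fields (local ID, remote ID, source node) come from the slide.

```python
import itertools
from typing import List, Tuple

# tupleTable record: (local ID, remote ID, source node)
TupleRecord = Tuple[int, int, str]


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self._ids = itertools.count(1)   # IDs are unique only within this node
        self.tuple_table: List[TupleRecord] = []

    def new_id(self) -> int:
        return next(self._ids)

    def network_out(self, tuple_id: int, payload: tuple) -> dict:
        """Outgoing hook: ship the sender's tuple ID along with the tuple."""
        return {"src": self.name, "src_id": tuple_id, "payload": payload}

    def network_in(self, message: dict) -> Tuple[int, tuple]:
        """Incoming hook: assign a fresh local ID and record the mapping."""
        local_id = self.new_id()
        self.tuple_table.append((local_id, message["src_id"], message["src"]))
        return local_id, message["payload"]


if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    b.new_id(); b.new_id()                 # B has already used IDs 1 and 2
    x_id = a.new_id()                      # x gets ID 1 on A
    y_id, y = b.network_in(a.network_out(x_id, ("x", 7)))
    print(b.tuple_table)                   # [(3, 1, 'A')]
```

Keeping IDs only locally unique avoids any global coordination; the receiver's table is what links the two identities of the same tuple.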
12
Putting it all together
[Figure: Node A and Node B, each with its own tupleTable and exec table, running rules r0 and r1 over tuples w, x, y, z]

Of course, in reality it's more complicated
Aborted rule executions
Pipelined rule executions
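
To make the combination concrete, here is a hedged Python sketch that walks backwards from a tuple by alternating between per-node exec records and tupleTable records. The function `trace_back`, the record layouts, and the sample scenario (r0 on A turns w into x, x crosses the wire to become y on B, r1 on B turns y into z) are assumptions; the sketch ignores aborted and pipelined executions and assumes the recorded provenance is acyclic.

```python
from typing import Dict, List, Tuple

ExecRecord = Tuple[str, int, int]    # (rule, input tuple ID, output tuple ID)
TupleRecord = Tuple[int, int, str]   # (local ID, remote ID, source node)


def trace_back(
    node: str,
    tuple_id: int,
    exec_tables: Dict[str, List[ExecRecord]],
    tuple_tables: Dict[str, List[TupleRecord]],
) -> List[Tuple[str, str, int]]:
    """Walk backwards from a tuple, returning (node, step, tuple ID) entries."""
    steps: List[Tuple[str, str, int]] = []
    while True:
        # Was this tuple produced by a local rule execution?
        produced = [r for r in exec_tables.get(node, []) if r[2] == tuple_id]
        if produced:
            rule, in_id, _ = produced[0]
            steps.append((node, rule, tuple_id))
            tuple_id = in_id
            continue
        # Otherwise, did it arrive over the network?
        arrived = [t for t in tuple_tables.get(node, []) if t[0] == tuple_id]
        if arrived:
            _, remote_id, src = arrived[0]
            steps.append((node, "recv from " + src, tuple_id))
            node, tuple_id = src, remote_id
            continue
        return steps  # no further provenance recorded


if __name__ == "__main__":
    exec_tables = {"A": [("r0", 1, 2)], "B": [("r1", 5, 6)]}
    tuple_tables = {"B": [(5, 2, "A")]}
    for step in trace_back("B", 6, exec_tables, tuple_tables):
        print(step)
```

In this scenario the walk starts at z on node B, passes through r1, hops to node A via the tupleTable entry, and ends at r0.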

13
Overview

Introduction
Background
Monitoring framework
Example applications/Performance
Conclusions

14
Example applications (I)

Distributed watchpoints: trigger an event if a condition becomes true
Possibly trace back/forward
Oscillation of faulty/stale information (route flaps)
Gossiping for stabilization or updates
Inconsistent routing in DHTs [Pastry, Chord, ...]
Each node is responsible for a unique region
Route using distinct paths and check [Bamboo, Secure Routing]
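
One of the watchpoints above, route-flap detection, can be sketched in Python as a sliding-window counter. This shows only the local, per-node half of the idea and is not how P2 expresses it (there it would be a query); the class `FlapWatchpoint`, its parameters, and the threshold semantics are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FlapWatchpoint:
    """Fires when a route's next hop changes too often within a time window."""

    def __init__(self, max_changes: int, window: float) -> None:
        self.max_changes = max_changes
        self.window = window
        self._last: Dict[str, str] = {}
        self._changes: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, dest: str, next_hop: str, now: float) -> bool:
        """Feed one route update; return True if the watchpoint triggers."""
        previous = self._last.get(dest)
        self._last[dest] = next_hop
        if previous is None or previous == next_hop:
            return False                      # first sighting or no change
        changes = self._changes[dest]
        changes.append(now)
        # Forget changes that fell out of the sliding window.
        while changes and now - changes[0] > self.window:
            changes.popleft()
        return len(changes) > self.max_changes


if __name__ == "__main__":
    wp = FlapWatchpoint(max_changes=2, window=10.0)
    updates = [("d", "a", 0.0), ("d", "b", 1.0), ("d", "a", 2.0), ("d", "b", 3.0)]
    print([wp.observe(*u) for u in updates])  # [False, False, False, True]
```

When the watchpoint fires, the triggering update is a natural starting point for tracing back through the execution records.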

15
Example applications (II)

Online execution profiling
How much time is spent in each rule?
Where are the bottlenecks?
Which rule is costlier? What operation?
Consistent snapshots [Chandy-Lamport]
Snapshot for the routing state
Queries on the snapshots themselves
What is the degree distribution?
How many node-disjoint paths?
No more than 16 rules for any of the above

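
The profiling questions above reduce to an aggregation over execution records. The Python sketch below assumes each record carries a rule name plus start and end timestamps; that layout and the function name `time_per_rule` are illustrative assumptions, not the format P2 uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed record layout: (rule name, start time in seconds, end time in seconds)
TimedExec = Tuple[str, float, float]


def time_per_rule(records: Iterable[TimedExec]) -> List[Tuple[str, float, int]]:
    """Return (rule, total seconds, executions), costliest rule first."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for rule, start, end in records:
        total[rule] += end - start
        count[rule] += 1
    return sorted(
        ((rule, total[rule], count[rule]) for rule in total),
        key=lambda row: row[1],
        reverse=True,
    )


if __name__ == "__main__":
    records = [("r1", 0.0, 0.5), ("r2", 0.5, 0.75), ("r1", 1.0, 2.0)]
    for row in time_per_rule(records):
        print(row)
```

Sorting by total time answers "which rule is costlier"; dividing the total by the execution count would give a per-execution average for spotting bottlenecks.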
16
Performance

21-node Chord overlay in P2
Monitored node on separate, unloaded machine
Overhead of introspection
CPU (0.98 to 1.3), Memory (8 MB to 13 MB)
Consistent distributed snapshot
Other results in the paper

[Figure: two plots, CPU utilization vs. rate (1/sec) and transmitted packets (x1000) vs. rate (1/sec)]
17
Related Work

Management using database techniques Hy
Performance debugging [Magpie, Causeway...]
Configuration debugging for BGP, OSes [Time-travel...]
Distributed debuggers [WiDS, Pip, Replay Debugging]
Deep embedded monitoring [IBM WebSphere, Adaptations]

18
Conclusions

Declarative development of systems
Integrated approach to building and monitoring
Automatic execution tracing
Online, in-place monitoring
Step towards autonomic distributed systems
Fault-finding tasks evolve with the system
Interesting future directions
User interface
Trade-off between monitoring accuracy and overhead
Questions? Thank You

19
Request to EuroSys

Please schedule my next talk on the first day
Move the submission deadline away from NSDI (last year: NSDI submission 19th Oct, EuroSys 20th)

20
Questions?

Thank You!
