Connecting the Dots: Using Runtime Paths for Macro Analysis - PowerPoint PPT Presentation

Connecting the Dots: Using Runtime Paths for Macro Analysis Mike Chen mikechen_at_cs.berkeley.edu http://pinpoint.stanford.edu – PowerPoint PPT presentation

1
Connecting the Dots: Using Runtime Paths for
Macro Analysis
  • Mike Chen
  • mikechen_at_cs.berkeley.edu
  • http://pinpoint.stanford.edu

2
Motivation
  • Divide and conquer, layering, and replication are
    fundamental design principles
  • e.g. Internet systems, P2P systems, and sensor
    networks
  • Execution context is dispersed throughout the
    system
  • => difficult to monitor and debug
  • Lots of existing low-level tools that help with
    debugging individual components, but not a
    collection of them
  • Much of the system is in how the components are
    put together
  • Observation: a widening gap between the systems
    we are building and the tools we have

3
Current Approach
[Diagram: Apache, Java Bean, and Database components]
4
Current Approach
  • Micro analysis tools, like code-level debuggers
    (e.g. gdb) and application logs, offer details
    of each individual component
  • Scenario
  • A user reports request A1 failed
  • You try the same request, A2, but it works fine
  • What to do next?

[Diagram: requests A1 and A2 passing through Apache, Java Bean, and Database components]
5
Macro Analysis
  • Macro analysis exploits non-local context to
    improve reliability and performance
  • Performance examples: Scout, ILP, Magpie
  • Statistical view is essential for large, complex
    systems
  • Analogy: micro analysis lets you understand the
    details of an individual honeybee; macro
    analysis is needed to understand how the bees
    interact to keep the beehive functioning

6
Observation
  • Systems have a single system-wide execution path
    associated with each request
  • E.g. request/response, one-way messages
  • Scout, SEDA, Ninja use paths to specify how to
    service requests
  • Our philosophy
  • Use only dynamic, observed behavior
  • Application-independent techniques

7
Our Approach
  • Use runtime paths to connect the dots!
  • dynamically captures the interactions and
    dependencies between components
  • look across many requests to get the overall
    system behavior
  • more robust to noise
  • Components are only partially known (gray
    boxes)

[Diagram: Apache, Java Bean, and Database components connected by runtime paths]
8
Our Approach
  • Applicable to a wide range of systems.

9
Open Challenges in Systems Today
  • Deducing system structure
  • manual approach is error-prone
  • static analysis doesn't consider resources
  • Detecting application-level failures
  • often don't exhibit lower-level symptoms
  • Diagnosing failures
  • failures may manifest far from the actual faults
  • multi-component faults
  • Goal: reduce time to detection, recovery,
    diagnosis, and repair

10
Talk Outline
  • Motivation
  • Model and architecture
  • Applying macro analysis
  • Future directions

11
Runtime Paths
  • Instrument code to dynamically trace requests
    through a system at the component level
  • record call path + the runtime properties
  • e.g. components, latency, success/failure, and
    resources used to service each request
  • Use statistical analysis to detect and diagnose
    problems
  • e.g. data mining, machine learning, etc.
  • Runtime analysis tells you how the system is
    actually being used, not how it may be used
  • Complements existing micro analysis tools

12
Architecture
  • Tracer
  • Tags each request with a unique ID, and carries
    it with the request throughout the system
  • Reports observations (component name + resource +
    performance properties) for each component
  • Aggregator + Repository
  • Reconstructs paths and stores them
  • Declarative Query Engine
  • Supports statistical queries on paths
  • Data mining and machine learning routines
  • Visualization

[Architecture diagram: Aggregator, Path Repository, Query Engine, Visualization, Developers/Operators]
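
The aggregator step described on this slide can be sketched roughly as follows. This is a minimal illustration, not Pinpoint's actual code: the `Observation` fields and the `reconstruct_paths` function are hypothetical names, and it assumes each tracer report carries a request ID plus a sequence number that orders the components a request reached.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One tracer report (hypothetical field names)."""
    request_id: str
    seq: int          # order in which the component was reached
    component: str
    host: str
    latency_ms: float
    success: bool


def reconstruct_paths(observations):
    """Group observations by request ID and order them by sequence number."""
    by_request = defaultdict(list)
    for obs in observations:
        by_request[obs.request_id].append(obs)
    return {
        req_id: sorted(group, key=lambda o: o.seq)
        for req_id, group in by_request.items()
    }
```

Reports may arrive out of order from different components, which is why the sketch sorts by sequence number instead of trusting arrival order.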
13
Request Tracing
  • Challenge: maintaining an ID with each request
    throughout the system
  • Tracing is platform-specific but can be
    application-generic and reusable across
    applications
  • 2 classes of techniques
  • Intra-thread tracing
  • Use per-thread context to store request ID (e.g.
    ThreadLocal in Java)
  • ID is preserved if the same thread is used to
    service the request
  • Inter-thread tracing
  • For extensible protocols like HTTP, inject new
    headers that will be preserved (e.g. REQ_ID: xx)
  • Modify RPC to pass request ID under the covers
  • Piggyback onto messages
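
A small sketch of the two tracing classes, written in Python for brevity (the slide's own example is Java's ThreadLocal). The header name `REQ_ID` follows the slide; the function names are hypothetical and the headers are modeled as plain dictionaries, so this only illustrates the idea and is not tied to any particular web server or RPC library.

```python
import threading
import uuid

# Header name taken from the slide's example; everything else is illustrative.
REQ_ID_HEADER = "REQ_ID"

_context = threading.local()  # per-thread storage, like ThreadLocal in Java


def start_request(incoming_headers):
    """Adopt the caller's request ID if present, otherwise mint a new one."""
    req_id = incoming_headers.get(REQ_ID_HEADER) or uuid.uuid4().hex
    _context.req_id = req_id
    return req_id


def current_request_id():
    """Intra-thread tracing: any code on this thread can look up the ID."""
    return getattr(_context, "req_id", None)


def outgoing_headers(headers=None):
    """Inter-thread tracing: copy the ID into the outgoing message headers."""
    out = dict(headers or {})
    req_id = current_request_id()
    if req_id is not None:
        out[REQ_ID_HEADER] = req_id
    return out
```

The per-thread lookup only works while the same thread services the request; once work is handed to another thread or process, the ID has to travel with the message, which is what `outgoing_headers` stands in for.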

14
Talk Outline
  • Motivation
  • Model and architecture
  • Applying macro analysis
  • Inferring system structure
  • Detecting application-level failures
  • Diagnosing failures
  • Future directions

15
Inferring System Structure
  • Key idea: paths directly capture application
    structure

[Figure: application structure captured from 2 requests]
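
One way to picture how paths capture structure: count the component-to-component edges seen across many paths. The sketch below treats each path as a flat, ordered list of component names, which is a simplifying assumption (real paths can branch), and the component names in the usage note are made up.

```python
from collections import Counter


def component_graph(paths):
    """Count caller -> callee edges over a collection of observed paths.

    Each path is the ordered list of component names one request visited.
    """
    edges = Counter()
    for path in paths:
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += 1
    return edges
```

For example, two requests `["Apache", "CartBean", "Database"]` and `["Apache", "CatalogBean", "Database"]` yield four distinct edges, each seen once.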
16
Indirect Coupling of Requests
  • Key idea: paths associate requests with internal
    state
  • Trace requests from web server to database
  • Parse client-side SQL queries to get sharing of
    db tables
  • Straightforward to extend to more fine-grained
    state (e.g. rows)

[Figure: request types vs. database tables]
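
A rough sketch of the table-sharing idea, assuming the tracer hands us `(request_type, sql_text)` pairs. The regular expression is deliberately naive (it only looks at the identifier after FROM, JOIN, INTO, or UPDATE) and is not a real SQL parser; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Naive on purpose: grabs the identifier after FROM, JOIN, INTO or UPDATE.
_TABLE_RE = re.compile(
    r"\b(?:from|join|into|update)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)


def tables_by_request_type(traced_queries):
    """Map each request type to the set of tables its SQL statements touch."""
    sharing = defaultdict(set)
    for request_type, sql in traced_queries:
        for table in _TABLE_RE.findall(sql):
            sharing[request_type].add(table.lower())
    return dict(sharing)


def coupled_request_types(sharing):
    """Return (type_a, type_b, shared_tables) for types sharing a table."""
    types = sorted(sharing)
    return [
        (a, b, sharing[a] & sharing[b])
        for i, a in enumerate(types)
        for b in types[i + 1:]
        if sharing[a] & sharing[b]
    ]
```

Two request types that never call each other can still be coupled through a shared table, which is the kind of indirect dependency this slide points at.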
17
Failure Detection and Diagnosis
  • Detecting application-level failures
  • Key idea: paths change under failures => detect
    failures via path changes.
  • Diagnosing failures
  • Key idea: bad paths touch root cause(s). Find
    common features.

18
Future Directions
  • Key idea: violations of macro invariants are signs
    of buggy implementation or intrusion
  • Message paths in P2P and sensor networks
  • a general mechanism to provide visibility into
    the collective behavior of multiple nodes
  • micro or static approaches by themselves don't
    work well in dynamic, distributed settings
  • e.g. algorithms have upper bounds on the # of
    hops
  • Although hop count violation can be detected
    locally, paths help identify nodes that route
    messages incorrectly
  • e.g. detecting nodes that are slow or that corrupt msgs

19
Conclusion
  • Macro analysis fills the need when monitoring and
    debugging systems where local context is of
    insufficient use
  • Runtime path-based approach dynamically traces
    request paths and statistically infers macro
    properties
  • A shared analysis framework that is reusable
    across many systems
  • Simplifies the construction of effective tools
    for other systems and the integration with
    recovery techniques like RR
  • http://pinpoint.stanford.edu
  • Paper includes a commercial example from Tellme!
    (thanks to Anthony Accardi and Mark Verber)

20
Backup Slides
21
Backup Slides
22
Current Approach
  • Micro analysis tools, like code-level debuggers
    (e.g. gdb) and application logs, offer details
    of each individual component

[Diagram: Apache, Java Bean, and Database components, each annotated with local variable values (e.g. X 1 Y 2) as seen through gdb]
23
Related Work
  • Commercial request tracing systems
  • Announced in 2002, a few months after Pinpoint
    was developed
  • PerformaSure and AppAssure focus on performance
    problems.
  • IntegriTea captures and plays back failure
    conditions.
  • Focus on individual requests rather than overall
    behavior, and on recreating the failure
    condition.
  • Extensive work in event/alarm correlation, mostly
    in the context of network management (i.e. IP)
  • Don't directly capture relationships between
    events
  • Rely on human knowledge or use machine learning
    to suppress alarms.
  • Distributed debuggers
  • PDT, P2D2, TotalView, PRISM, pdbx
  • Aggregate views from multiple components, but do
    not capture relationships and interactions between
    components
  • Comparative debuggers: Wizard, GUARD
  • Dependency models
  • Most are statically generated and are likely to
    be inconsistent.
  • Brown et al. take an active, black box approach
    but it is invasive. Candea et al. dynamically trace
    failure propagation.

24
1. Detecting Failures using Anomaly Detection
  • Key idea: paths change under failures => detect
    failures via path changes
  • Anomalies
  • Unusual paths
  • Changes in distribution
  • Changes in latency/response time
  • Examples
  • Error paths are shorter.
  • User behavior changes under failures
  • Users retry a few times, then give up
  • Implement as long running queries (i.e. diff)
  • Challenges
  • detecting application-level failures
  • comparing sets of paths

25
2. Root-cause Analysis
  • Key idea: all bad paths touch root cause, find
    common features
  • Challenge: a small set of known bad paths and a
    large set of maybes
  • Ideally want to correlate and rank all
    combinations of feature sets
  • E.g. association rule mining
  • May get false alarms because the root cause may
    not be one of the features
  • Automatic generation of dynamic functional and
    state dependency graphs
  • Helps developers and operators understand
    inter-component dependency and inter-request
    dependency
  • Input to recovery algorithms that use dependency
    graphs
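
To make "find common features" concrete, here is a much cruder stand-in for association rule mining: rank each component by the failure rate of the requests that used it. It looks only at single components, not combinations, and assumes every path is already labeled as failed or not; the function name and input shape are hypothetical.

```python
from collections import Counter


def rank_suspects(paths):
    """Rank components by the failure rate of the requests that used them.

    paths: iterable of (components, failed) pairs, where components is the
    collection of component names a request touched and failed is a bool.
    """
    used = Counter()
    used_in_failure = Counter()
    for components, failed in paths:
        for c in set(components):
            used[c] += 1
            if failed:
                used_in_failure[c] += 1
    scores = {c: used_in_failure[c] / used[c] for c in used}
    # Highest failure rate first; ties broken by failure count, then name.
    return sorted(
        scores.items(),
        key=lambda kv: (-kv[1], -used_in_failure[kv[0]], kv[0]),
    )
```

A component that appears in every request (such as a front-end web server) gets a score equal to the overall failure rate, so it ranks below a component that appears mostly in failed requests.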

26
3. Verifying Macro Invariants
  • Key idea: violations of high-level invariants are
    signs of intrusion or bugs
  • Example: Peer auditing
  • Problem: A small number of faulty or malicious
    nodes can bring down the system
  • Corruption should be statistically visible in
    your behavior
  • look for nodes that delay or corrupt messages or
    route messages incorrectly
  • Apply root-cause analysis to locate the
    misbehaving peers
  • Some distributed auditing is necessary
  • Example: P2P implementation verification
  • Problem: are messages delivered as specified by
    the algorithms?
  • Detect extra hops, loops, and verify that the
    paths are correct
  • Can implement as a query
  • select length from paths where (length > log2(N))
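
The query on this slide could be approximated in a few lines, assuming each path is the ordered list of nodes a message visited and that "length" means hop count (number of nodes minus one). Both assumptions are mine; the slide does not define them.

```python
import math


def paths_exceeding_hop_bound(paths, num_nodes):
    """Return (hop_count, path) for every path longer than log2(N) hops."""
    bound = math.log2(num_nodes)
    violations = []
    for path in paths:
        hops = len(path) - 1  # a path of k nodes takes k - 1 hops
        if hops > bound:
            violations.append((hops, path))
    return violations
```

Keeping the whole path in the result, not just its length, is what lets a follow-up analysis look for the nodes that the violating paths have in common.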

27
4. Detecting Single Point of Failure
  • Key idea: paths converge on a single point of
    failure
  • Useful for finding out what to replicate to
    improve availability
  • P2P example
  • Many P2P systems rely on overlay networks, which
    typically are networks built on top of the IP
    infrastructure.
  • It's common for several overlay links to fail
    together if they depend on a shared physical IP
    link that failed
  • Implement as a query
  • intersect edge.IP_links from paths

[Diagram: example paths over nodes A through G]
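
A sketch of the `intersect edge.IP_links from paths` query, assuming each path is a list of overlay edges and each overlay edge is given as the collection of physical IP links it rides on. The data layout and the link names in the usage note are hypothetical.

```python
def shared_ip_links(paths):
    """Intersect the sets of physical IP links used by every path.

    paths: iterable of paths; each path is a list of overlay edges, and each
    overlay edge is the collection of IP links it rides on.
    """
    common = None
    for path in paths:
        links = set()
        for edge_ip_links in path:
            links.update(edge_ip_links)
        common = links if common is None else common & links
    return common or set()
```

With paths `[["ip1", "ip2"], ["ip3"]]` and `[["ip4"], ["ip3", "ip5"]]`, the only link every path depends on is `ip3`, which makes it a candidate for replication.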
28
5. Monitoring of Sensor Networks
  • An emerging area with primitive tools
  • Key idea: use paths to reconstruct topology and
    membership
  • Example
  • Membership
  • select unique node from paths
  • Network topology
  • for directed information dissemination
  • Challenge: limited bandwidth
  • Can record a (random) subset of the nodes for
    each path, then statistically reconstruct the
    paths
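
The membership and topology queries can be sketched as set comprehensions over recorded paths. This version assumes each path records every node it visited, in order; with the sampled subsets mentioned on the slide, consecutive recorded nodes are not necessarily neighbors, so the topology part would need the statistical reconstruction step first.

```python
def membership(paths):
    """Rough equivalent of 'select unique node from paths'."""
    return {node for path in paths for node in path}


def topology(paths):
    """Directed links implied by consecutive nodes in each recorded path."""
    return {(src, dst) for path in paths for src, dst in zip(path, path[1:])}
```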

29
Macro Analysis
  • Look across many requests to get the overall
    system behavior
  • more robust to noise

Macro Analysis
Request 1 Request 2 Request 3 Request 4
Component A X X X
Component B X
Component C X X
30
Properties of Network Systems
  • Web services, P2P systems, and sensor networks
    can have tens of thousands of nodes each running
    many application components
  • Continuous adaptation provides high availability,
    but also makes it difficult to reproduce and
    debug errors
  • Constant evolution of software and hardware

31
Motivation
  • Difficult to understand and debug network systems
  • e.g. Clustered Internet systems, P2P systems and
    sensor networks
  • Composed of many components
  • Systems are becoming larger, more dynamic, and
    more distributed
  • Workload is unpredictable and impractical to
    simulate
  • Unit testing is necessary but insufficient.
    Components break when used together under real
    workload
  • Don't have tools that capture the interactions
    between components and the overall behavior
  • Existing debugging tools and application-level
    logs only do micro analysis

32
Macro vs Micro Analysis
           | Macro Analysis                               | Micro Analysis
Resolution | Component. Complements micro analysis tools. | Line or variable
Overhead   | Low. Can use it in actual deployment.        | High. Typically not used in deployment other than application logs.
33
What's a dynamic path?
  • A dynamic path is the (control flow + runtime
    properties) of a request
  • Think of it as a stack trace across
    process/machine boundaries with runtime
    properties
  • Dynamically constructed by tracing requests
    through a system
  • Runtime properties
  • Resources (e.g. host, version)
  • Performance properties (e.g. latency)
  • Arguments (e.g. URL, args, SQL statement)
  • Success/failure

[Diagram: a request's path through components A to F; each node carries a record such as RequestID 1, Seq Num 1, Name A, Host xx, Latency 10ms, Success true, ..]
34
Related Work
  • Micro debugging tools
  • RootCause provides extensible logging of method
    calls and arguments.
  • Diduce looks for inconsistencies in variable
    usage.
  • Complements macro analysis tools.
  • Languages for monitoring
  • InfoSpect looks for inconsistencies in system
    state using a logic language
  • Network flow-based monitoring
  • RTFM and Cisco NetFlow classify and record
    network flows
  • Statistical and data mining languages
  • S, DMQL, WebML

35
Visualization Techniques
  • Tainted paths: mark all flows that have a certain
    property (e.g. failed or slow) with a distinct
    color and overlay them on the graph
  • Detecting performance bottlenecks: look for
    replicated nodes that have different colors
  • Detecting anomalies: look for missing edges and
    unknown paths

36
Pinpoint Framework
[Diagram: Requests, Components, Communications Layer (Tracing, Internal F/D), Detected Faults, Logs; steps numbered 1 to 3]
37
Experimental Setup
  • Demo app: J2EE Pet Store
  • e-commerce site w/30 components
  • Load generator
  • replay trace of browsing
  • Approx. TPC-W WIPSo load (50% ordering)
  • Fault injection parameters
  • Trigger faults based on combinations of used
    components
  • Inject exceptions, infinite loops, null calls
  • 55 tests with single-component faults and
    interaction faults
  • 5-min runs of a single client (J2EE server
    limitation)

38
Application Observations
  • large number of tightly coupled components that
    are always used together
  • # of components used in a dynamic web page
    request
  • median 14, min 6, max 23

39
Metrics
  • Precision: C/P
  • Recall: C/A
  • Accuracy: whether all actual faults are correctly
    identified (recall = 100%)
  • boolean measure
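
The slide does not spell out C, P, and A. The sketch below assumes the usual reading: C is the set of correctly identified faulty components, P the set of components predicted faulty, and A the set of actually faulty components. Treating empty sets as scoring 1.0 is also a convention picked here, not something taken from the talk.

```python
def evaluate(predicted, actual):
    """Return (precision, recall, accurate) for one fault-injection test.

    Assumed definitions: C = predicted & actual, precision = C/P,
    recall = C/A, accurate = recall is 100%.
    """
    predicted, actual = set(predicted), set(actual)
    correct = predicted & actual
    # Convention chosen for this sketch: empty denominators count as 1.0.
    precision = len(correct) / len(predicted) if predicted else 1.0
    recall = len(correct) / len(actual) if actual else 1.0
    accurate = recall == 1.0  # boolean measure
    return precision, recall, accurate
```

Predicting two components when only one is actually faulty gives precision 0.5, recall 1.0, and an accurate result: all actual faults were found, at the cost of one false alarm.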

40
4 Analysis Techniques
  • Pinpoint: clusters of components that
    statistically correlate with failures
  • Detection: components where Java exceptions were
    detected
  • union across all failed requests
  • similar to what an event monitoring system
    outputs
  • Intersection: intersection of components used in
    failed requests
  • Union: union of all components used in failed
    requests
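
The two simplest baselines, Intersection and Union, reduce to set operations over the failed requests. A minimal sketch, assuming each request is a `(components, failed)` pair; the function names and component names are made up for illustration.

```python
def failed_component_sets(requests):
    """Component sets of the failed requests only."""
    return [set(components) for components, failed in requests if failed]


def union_technique(requests):
    """Union: every component used by at least one failed request."""
    return set().union(*failed_component_sets(requests))


def intersection_technique(requests):
    """Intersection: components used by all failed requests."""
    sets = failed_component_sets(requests)
    return set.intersection(*sets) if sets else set()
```

Union never misses a faulty component that a failed request touched but tends to over-report; Intersection is tighter but can miss faults when different failed requests fail for different reasons.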

41
Results
  • Pinpoint has high accuracy with relatively high
    precision

42
Pinpoint Prototype Limitations
  • Assumptions
  • client requests provide good coverage over
    components and combinations
  • requests are autonomous (don't corrupt state and
    cause later requests to fail)
  • Currently can't detect the following
  • faults that only degrade performance
  • faults due to pathological inputs
  • Single-node only

43
Current Status
  • Simple graph visualization

44
Proposed Research
  • 3 classes of large network systems
  • Clustered Internet systems
  • Tiered architecture, high bandwidth, many
    replicas
  • Peer-to-peer (P2P) systems, including sensor
    networks
  • Widely distributed nodes, dynamic membership
  • Sensor networks
  • Limited storage, processing, and bandwidth.

45
P2P Systems Tracing
  • Trace messages by piggybacking the current node
    name onto the messages
  • Tracing overhead
  • Assume 32-bit per node name and a very
    conservative log2(N) hops for each msg and
  • Data overhead is 40 for a 1500-byte message in a
    10^6-node system

46
P2P Systems Implementation Verification
  • Current debugging technique: lots of printf()s
    on each node, then manually correlating the paths
    taken by messages
  • How do you know the messages are delivered as
    specified by the algorithms?
  • Use message paths to check for routing invariants
  • detect extra hops, loops, and verify that the
    paths are correct
  • Can implement as a query
  • select length from paths where (length > log2(N))
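
Besides the length check, loop detection is another routing invariant that is easy to test once message paths are available. A minimal sketch, assuming a path is the ordered list of node names piggybacked onto a message; the function name is hypothetical.

```python
def find_loops(path):
    """Return the nodes a message visited more than once, in first-repeat order."""
    seen, repeated = set(), []
    for node in path:
        if node in seen and node not in repeated:
            repeated.append(node)
        seen.add(node)
    return repeated
```

For the path `["n1", "n2", "n3", "n2", "n4"]` this reports `["n2"]`, pointing at the node where the message looped back.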