Autonomous Recovery in Componentized Internet Applications, Candea et al., Vikram Negi


1
Autonomous Recovery in Componentized Internet
Applications
Candea et al.
Vikram Negi
2
Introduction
  • Autonomic Problem
  • Approach
  • Results
  • Discussion

3
The Autonomic Problem
  • To allow the application to recover automatically
    from transient and intermittent software failure.

4
The Approach
  • Introduce the idea
  • Macroanalysis (fault detection)
  • Microrebooting (rapid recovery)
  • External Management (recovery action)
  • Integrate and test with JBoss

5
Design Overview
  • Autonomous Process
  • Monitoring
  • Java probes
  • Fault detection
  • Generate Anomaly report
  • Recovery
  • Takes action
  • Total time to recovery.

6
J2EE Review
  • J2EE enterprise apps: collections of reusable
    Java modules
  • JSPs / servlets invoke EJBs, which invoke other
    EJBs, ...
  • EJB: a Java component that conforms to a certain
    interface and provides a service
  • Deployment descriptor (per-bean XML file) conveys
    run-time characteristics and dependencies used
    in deploying the application

7
JBoss Design
  • Open-source J2EE app server
  • Written entirely in Java
  • Microkernel with components held together by JMX
    (Mgmt Support)

8
JAGR: ROC-ified JBoss with Application-Generic
Recovery
  • 3 Tier Architecture
  • Key Components
  • Macroanalysis Engine
  • Microrebooting Hook
  • Recovery Manager

9
Pinpoint: Detection and Localization
  • Store Observation
  • IP address of machine, timestamp
  • Globally unique request ID.
  • Number of calls/returns to EJBs
  • Association between sender and receiver.
  • Collect SQL queries (updates, reads)
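
The observation fields listed on this slide can be pictured as one record per call or return. This is a minimal sketch only: the class names `Observation` and `ObservationLog`, all field names, and the `countCalls` helper are hypothetical and are not taken from Pinpoint itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of one observed call or return between components.
// Field names are illustrative assumptions, not Pinpoint's actual schema.
final class Observation {
    final String hostIp;     // IP address of the machine that observed the event
    final long timestampMs;  // time of the observation
    final String requestId;  // globally unique ID of the end-user request
    final String sender;     // calling component
    final String receiver;   // called component
    final boolean isReturn;  // false for a call, true for a return

    Observation(String hostIp, long timestampMs, String requestId,
                String sender, String receiver, boolean isReturn) {
        this.hostIp = hostIp;
        this.timestampMs = timestampMs;
        this.requestId = requestId;
        this.sender = sender;
        this.receiver = receiver;
        this.isReturn = isReturn;
    }
}

// Hypothetical in-memory store; a real deployment would persist observations.
final class ObservationLog {
    private final List<Observation> entries = new ArrayList<>();

    void record(Observation o) {
        entries.add(o);
    }

    // Number of calls (not returns) from sender to receiver within one request.
    long countCalls(String requestId, String sender, String receiver) {
        long n = 0;
        for (Observation o : entries) {
            if (!o.isReturn && o.requestId.equals(requestId)
                    && o.sender.equals(sender) && o.receiver.equals(receiver)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ObservationLog log = new ObservationLog();
        log.record(new Observation("10.0.0.1", 1L, "req-1", "ServletA", "BeanB", false));
        log.record(new Observation("10.0.0.1", 2L, "req-1", "ServletA", "BeanB", true));
        System.out.println(log.countCalls("req-1", "ServletA", "BeanB"));
    }
}
```

Keying every record on the request ID is what lets the sender/receiver associations be grouped per request later on.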

10
Pinpoint Analysis
  • Analysis Engine
  • Centralized Engine
  • Plugin based architecture
  • Modeling Components
  • Assume both present component behavior and
    historical (normal) behavior have same
    probability distribution.
  • Chi-square test to determine whether the probability
    distributions differ.
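
The chi-square comparison can be sketched as follows. This assumes a component's behavior is summarized as call counts per peer component, which is an illustrative simplification and not necessarily how Pinpoint models it. The class name `ChiSquare`, its methods, and the choice of a 0.05 significance level are all assumptions.

```java
// Sketch of Pearson's chi-square test: current counts versus historical shares.
// Assumes every historical share is greater than zero.
final class ChiSquare {
    // Critical value of the chi-square distribution for 2 degrees of freedom
    // (3 categories) at significance level 0.05.
    static final double CRITICAL_DF2_ALPHA05 = 5.991;

    // Sum over categories of (observed - expected)^2 / expected, where
    // expected = total observed count * historical share of that category.
    static double statistic(long[] observed, double[] historicalShare) {
        long total = 0;
        for (long o : observed) {
            total += o;
        }
        double chi2 = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double expected = total * historicalShare[i];
            double diff = observed[i] - expected;
            chi2 += diff * diff / expected;
        }
        return chi2;
    }

    // True when the statistic exceeds the supplied critical value, i.e. the
    // current behavior is unlikely to follow the historical distribution.
    static boolean differs(long[] observed, double[] historicalShare, double criticalValue) {
        return statistic(observed, historicalShare) > criticalValue;
    }

    public static void main(String[] args) {
        double[] historical = {0.5, 0.3, 0.2}; // hypothetical normal behavior
        long[] current = {20, 30, 50};         // hypothetical current behavior
        System.out.println("chi2 = " + statistic(current, historical));
        System.out.println("differs = " + differs(current, historical, CRITICAL_DF2_ALPHA05));
    }
}
```

The critical value depends on the number of categories, so it is passed in rather than fixed inside `differs`.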

11
Recovery: micro-reboot is not expensive
  • State Segregation
  • Store important state outside the application, in a
    database.
  • Persistent State
  • CMP (container managed persistence, J2EE) is a
    requirement for prototype.
  • Session State
  • Store in a modified SSM (external session state
    store)
  • Containment and Reintegration
  • Microreboot the transitive closure of all inter-EJB
    references
  • XML deployment descriptors to determine grouping
    for closure
  • Complete or micro reboot
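
The grouping step can be sketched as a graph traversal. This assumes the inter-EJB references have already been extracted (for example from the XML deployment descriptors) into a map from each component to the components it references. The class `RebootGroup`, the map layout, and the bean names are hypothetical, and following references in one direction only is a simplifying assumption.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch: compute the set of components to microreboot together.
final class RebootGroup {
    // Returns start plus everything reachable from it through references.
    static Set<String> closure(String start, Map<String, Set<String>> refs) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String component = work.pop();
            // add() returns false for components already visited, so cycles terminate.
            if (seen.add(component)) {
                for (String next : refs.getOrDefault(component, Collections.emptySet())) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical reference graph: BeanA -> BeanB -> BeanC, BeanD -> BeanA.
        Map<String, Set<String>> refs = new HashMap<>();
        refs.put("BeanA", new HashSet<>(Collections.singletonList("BeanB")));
        refs.put("BeanB", new HashSet<>(Collections.singletonList("BeanC")));
        refs.put("BeanD", new HashSet<>(Collections.singletonList("BeanA")));
        System.out.println(closure("BeanA", refs));
    }
}
```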

12
Recovery
  • Enabling Micro reboot
  • Method in JBoss EJB container
  • Preserve Class Loader

13
Manage Recovery
  • Recovery Policy
  • Read failure report; consider components scoring > 1.0
  • Micro-reboot the top n, or all components scoring > 1.0
  • Allow a delay (30 sec)
  • If the error is still present, retry a few times or reboot
    completely
  • Finally report it to sys admin
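
The policy on this slide can be sketched as a selection step plus an escalation step. The 1.0 threshold and the 30-second delay come from the slide. The class `RecoveryPolicy`, its method names, and the limit of three microreboot attempts (standing in for "a few times") are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the recovery policy: pick targets, then escalate if the error persists.
final class RecoveryPolicy {
    enum Action { MICROREBOOT, FULL_REBOOT, NOTIFY_ADMIN }

    static final double SCORE_THRESHOLD = 1.0;     // from the slide
    static final int OBSERVE_DELAY_SECONDS = 30;   // wait between attempts, from the slide
    static final int MAX_MICROREBOOT_ATTEMPTS = 3; // assumption for "a few times"

    // Components scoring above the threshold, highest score first, at most topN.
    // Passing a topN at least as large as the map selects all of them.
    static List<String> targets(Map<String, Double> scores, int topN) {
        return scores.entrySet().stream()
                .filter(e -> e.getValue() > SCORE_THRESHOLD)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Next step given how many recovery attempts have already failed.
    static Action nextAction(int failedAttempts) {
        if (failedAttempts < MAX_MICROREBOOT_ATTEMPTS) {
            return Action.MICROREBOOT;
        }
        if (failedAttempts == MAX_MICROREBOOT_ATTEMPTS) {
            return Action.FULL_REBOOT;
        }
        return Action.NOTIFY_ADMIN;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>(); // hypothetical failure report
        scores.put("BeanA", 2.5);
        scores.put("BeanB", 0.7);
        scores.put("BeanC", 1.4);
        System.out.println(targets(scores, 1) + " then " + nextAction(0));
    }
}
```

A caller would wait `OBSERVE_DELAY_SECONDS` after each action before checking whether the error persists and asking for the next action.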

14
Evaluation Test Framework
  • Application
  • Petstore 1.1 (12 components, 233 Java files, 11K LOC)
  • Petstore 1.3.1 (47 components, 310 Java files, 10K LOC)
  • RUBiS (21 components, 500 Java files, 25K LOC)
  • Workload
  • Implement Simulators with Transition table.
  • 350 clients (max utilization principle)
  • Faultload
  • Based on industry experience
  • No low level hardware or OS faults.

15
Evaluation Detection
  • Results similar to other detectors
  • No discussion on absolute numbers?
  • Faults: forced Java runtime/declared exceptions, call
    omission, and source code bugs
  • (1) How well was the fault detected? (2) How well
    was a major outage detected?

16
Evaluation Localization
  • Localization per algorithm and fault type: CIA
    > 85
  • No absolute data again?

17
Evaluation Recovery
  • Introduce faults in SSM-RUBiS.
  • Restart SSM-RUBiS or micro reboot component.
  • Observations from 10 trials with 350 concurrent
    clients.

18
Full vs. Micro Reboot
  • Injected a null reference fault in SB CommitBid,
    then a corrupt User-Item, SB BrowseCategories and
    SB CommitUserFeedback.
  • Microreboot maintains steady response.
  • 425 vs 3916 failed requests (microreboot vs. full reboot)
  • 61527 vs 56028 successful requests
  • What error conditions did the other trials have?

19
Total Recovery Time
  • Corrupt SB_ViewItem by setting it to NULL.
  • 19.4 sec TRT
  • 18.5 sec in analysis
  • Pinpoint is the bottleneck in micro-reboot recovery.
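
Using only the two figures on this slide, the share of total recovery time (TRT) spent in analysis works out as:

```latex
\frac{18.5\ \text{s}}{19.4\ \text{s}} \approx 0.95,
\qquad
19.4\ \text{s} - 18.5\ \text{s} = 0.9\ \text{s}
```

So roughly 95% of the TRT goes to analysis, and under one second remains for everything else.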

20
Is Pinpoint app-generic?
  • Upgrade to Petstore v.1.3.2
  • Works for the confidence interval
  • How different was the updated version?

21
Performance Overhead
  • Results for 30min fault free run w/ 350 clients
  • In-memory vs. external (SSM) session state
  • Marshalling costs

22
Assumptions
  • Well-defined interfaces for components (.NET, J2EE)
  • Deterministic call paths between components
  • No critical service request
  • Training data for statistical model
  • Guidelines (Crash Only Software)

23
Discussion
  • Overall one of the good papers; maybe a bit verbose
    in the introduction!
  • Integrating framework for earlier work by Candea.
  • Limitation of the present statistical model.
  • Shared EJB state
  • Modify JIT, disable microreboots (ref, static var)
  • Application global data not scrubbed.
  • Cost-benefit: micro reboot vs. total reboot

24
Supplementary
  • Application server: an "operating system" for
    Internet applications (instantiates app
    components in containers, provides runtime system
    services, integrates with web server to make app
    web-accessible)
  • http://people.epfl.ch/george.candea