A Self-Healing Approach for Developing Complex Software Systems

1
The Shadows Project
  • A Self-Healing Approach for Developing Complex
    Software Systems

IBM Haifa Research Lab, Reliable Systems
Presented by Onn Shehory, Shadows project coordinator
IBM Academy Conference, April 2006
2
Outline
  • Introduction
  • Technical overview
  • Organization
  • Shadows background technologies
  • ConTest: Concurrency Testing
  • ATS: Automated Threshold Setting
  • BCT: Behavior Capture and Test
  • Contribution to standards
  • Summary

3
System Complexity
Actual Application Architecture for Consumer
Electronics Company
4
Shadows - Profile
  • Consortium formed to address challenges
    formulated by EU
  • EU 6th Framework R&D Program, call no. 5
  • Strategic Objective 2.5.5: Software & Services
  • Research proposal submitted to EU 9/2005
  • Members
  • IBM, Univ of Milan Bicocca, Univ of Potsdam,
    Univ of Brno, Artisys, Comverse Technologies, Net
    Technologies, Philips, Scapa Technologies,

Blue: Technology Validator; Green: Technology Provider; Pink: Dissemination/Exploitation
5
Technical Overview
  • A paradigm for developing complex software
    systems with design-time and run-time
    self-healing (SH) capabilities
  • Goal: mitigate the challenge of growing software
    complexity and its detrimental impact on software
    quality
  • Integration of several SH technologies across the
    system lifecycle
  • Mainly in middleware and applications

6
Shadows Technologies
  • The underlying set of SH technologies will
    include
  • Verification, and run-time amelioration, of
    Concurrent Systems (IBM)
  • Automatic Threshold Setting for Performance
    Management (IBM)
  • Behavioral Capture and Test (Univ Milan)
  • Formal Methods (Univ. Brno/Potsdam)

7
Technology Validation
  • The contribution of validators includes
  • Gap analysis
  • Requirement definitions
  • Technology evaluation
  • Validation environments include, e.g.
  • Real-time, resource-constrained embedded C
    software (Philips Nexperia)
  • Server-side Java software for high-availability
    telco systems (Comverse's MMS)
  • Avionics software (Artisys)

8
Methodology Flow
[Diagram: Requirements Definition, System Design, Analysis, Development, Testing / Debugging, Assurance, Healing, System Deployment; several stages are labeled SH-Oriented]
9
Abstract Architecture
Integrated Model-Based Framework for Designing
and Managing Self-Healing Systems
System Design and Management Standards
Methodology and Tools
Concurrency Testing
Fault Prediction and Automatic Threshold Setting
Behavioral Capture and Test
Model-Based Technologies
Open Standards
CIM
TPTP
10
Shadows Solution Architecture
11
Background Technologies
  • ConTest: Concurrency Testing
  • ATS: Automated Threshold Setting
  • BCT: Behavior Capture and Test

12
ConTest: Testing Concurrent and Distributed Applications
13
ConTest: the Challenge
  • Finding bugs in parallel and multi-threaded
    software is challenging
  • Bugs depend on the program execution order
  • In a lab environment, only a small subset of the
    possible execution orders occurs
  • As a result, many problems/bugs are discovered
    only in the field
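
The dependence on execution order can be made concrete with a small sketch. It is not ConTest itself: it assumes two hypothetical threads, A and B, that each increment a shared counter in two non-atomic steps (read, then write), and it enumerates every possible interleaving of those steps.

```python
from itertools import combinations

# Each hypothetical thread increments a shared counter in two non-atomic steps.
STEPS = ("read", "write")

def interleavings(n_a, n_b):
    """Yield every order of the two threads' steps that keeps each thread's own order."""
    total = n_a + n_b
    for slots in combinations(range(total), n_a):
        chosen = set(slots)
        yield ["A" if i in chosen else "B" for i in range(total)]

def run(schedule):
    """Execute one schedule and return the final counter value."""
    counter = 0
    local = {"A": 0, "B": 0}   # each thread's private copy of the counter
    pc = {"A": 0, "B": 0}      # index of each thread's next step
    for thread in schedule:
        if STEPS[pc[thread]] == "read":
            local[thread] = counter
        else:
            counter = local[thread] + 1
        pc[thread] += 1
    return counter

schedules = list(interleavings(len(STEPS), len(STEPS)))
buggy = [s for s in schedules if run(s) != 2]
print(f"{len(buggy)} of {len(schedules)} schedules lose an update")
# prints "4 of 6 schedules lose an update"
```

Only the two fully serial schedules (AABB and BBAA) give the correct result. A test run that happens to execute the threads one after another never sees the bug.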

14
ConTest: the Solution
  • ConTest runs existing tests multiple times, using
    different scheduling orders that it creates
  • ConTest increases the probability of revealing
    timing related bugs in Java programs
  • ConTest supports execution replay to reproduce
    the execution that caused the bugs
  • Replay and debugging aids to assist once a bug is
    found
  • Solution for Java done; C/C++ and C# under
    development
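
The idea of rerunning one test under many schedules, and replaying the schedule that failed, can be sketched as follows. This is an illustration under stated assumptions, not ConTest's mechanism: the threads are simulated with Python generators, the scheduler is a seeded random choice, and `increment_steps` and `run_test` are hypothetical names.

```python
import random

def increment_steps(shared):
    """Hypothetical thread body: a non-atomic increment with a scheduling point after each step."""
    value = shared["counter"]       # read
    yield
    shared["counter"] = value + 1   # write
    yield

def run_test(seed, n_threads=3):
    """Run the same test under the schedule derived from `seed`; return the final counter."""
    rng = random.Random(seed)
    shared = {"counter": 0}
    runnable = [increment_steps(shared) for _ in range(n_threads)]
    while runnable:
        thread = rng.choice(runnable)   # the seed decides which thread runs next
        try:
            next(thread)
        except StopIteration:
            runnable.remove(thread)
    return shared["counter"]

# The test expects 3 increments; any other result is a timing-related bug.
failing_seeds = [seed for seed in range(50) if run_test(seed) != 3]
print(f"{len(failing_seeds)} of 50 schedules exposed the bug")

if failing_seeds:
    replay = failing_seeds[0]
    # Replaying the recorded seed reproduces the same schedule and the same failure.
    print("replayed result:", run_test(replay))
```

Recording the seed is the simplest form of replay. A real tool has to record or control scheduling decisions inside the running program instead.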

15
ConTest: Technology in Brief
16
ConTest: Benefits and Future
  • Benefits
  • ConTest improves testing of concurrent and
    distributed applications for timing related bugs
    from early development stages
  • ConTest has minimal impact on the testing process
    and allows re-use of existing tests
  • Reduction of maintenance cost due to higher
    quality
  • Planned, or in the works
  • Automated fix of concurrency bugs
  • For some bug families, this already works

17
ATS: Automated Threshold Setting
18
ATS: Problem Statement
  • Given:
  • A computer system, its components, and the
    applications running on the system
  • The service dependency of applications on
    components
  • When the dependency is unknown, one must revert to
    correlation analysis (data mining, statistical)
  • Service-Level Objectives (SLOs) for the
    system/applications and indications of their
    violations
  • A monitoring infrastructure that monitors
    operational parameters at the components and
    generates/sends component alarms when
    measurements violate thresholds
  • Compute thresholds on the operational values of
    each component metric, such that the percentages
    of false alarms meet pre-specified levels
  • Adapt thresholds to changes in workload patterns,
    system configuration, and SLOs
  • The solution should be computationally efficient
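
The quantities in this problem statement can be illustrated with a short sketch. The data, the `Sample` type and the `alarm_stats` function are hypothetical. The sketch assumes an alarm fires whenever a measurement exceeds the threshold, and that a false alarm is an alarm raised in an interval without an SLO violation.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    metric: float        # measured operational value of one component metric
    slo_violated: bool   # SLO violation indication for the same interval

def alarm_stats(history, threshold):
    """Return (false-alarm %, missed-violation %) for a candidate threshold."""
    alarms = [s for s in history if s.metric > threshold]
    quiet = [s for s in history if s.metric <= threshold]
    false_alarms = sum(not s.slo_violated for s in alarms)
    missed = sum(s.slo_violated for s in quiet)
    false_pct = 100.0 * false_alarms / len(alarms) if alarms else 0.0
    missed_pct = 100.0 * missed / len(quiet) if quiet else 0.0
    return false_pct, missed_pct

# Made-up measurement history for illustration only.
history = [Sample(m, v) for m, v in [
    (10, False), (20, False), (35, False), (40, True),
    (55, True), (60, False), (75, True), (90, True),
]]

for threshold in (30, 50, 70):
    print(threshold, alarm_stats(history, threshold))
```

Raising the threshold lowers the share of false alarms but lets more violations go unnoticed. That trade-off is why the threshold has to be computed against target levels, not guessed.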

19
ATS: Motivation
  • In complex computer systems, manually set
    thresholds are NOT indicative, adaptive, or
    scalable
  • Sub-optimal and rigid performance management
  • Administrator overloaded
  • Automating threshold setting will allow more
    reliable use of component-level performance
    parameters and thresholds for system-level
    performance management

20
ATS: Solution Approach
  • Use standard tools to measure operational
    parameters on components
  • Use SLOs set by administrators or policy
  • Automation of threshold computation procedure
  • Start with initial component level threshold
    values
  • Use histories of thresholds and of SLO violations
  • Build a statistical model for PPV and NPV of the
    thresholds based on the SLO and threshold
    histories
  • Compute updated thresholds via the model to
    satisfy target PPV and NPV
  • Iterate the process to dynamically update the
    thresholds
  • Regular regression is inapplicable, so we use
    logistic regression
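
One possible reading of this loop is sketched below. It is not the published ATS algorithm. It assumes a single metric, fits P(SLO violation | metric value) with a hand-written logistic regression, estimates PPV and NPV for each candidate threshold from that model, and returns the lowest candidate meeting both targets. All function names and data are hypothetical.

```python
import math

def fit_logistic(xs, ys, steps=5000, lr=0.1):
    """Fit P(y=1 | x) = sigmoid(a + b*z) by gradient descent, with z the standardized x."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
    zs = [(x - mean) / std for x in xs]
    a = b = 0.0
    for _ in range(steps):
        preds = [1 / (1 + math.exp(-(a + b * z))) for z in zs]
        a -= lr * sum(p - y for p, y in zip(preds, ys)) / len(xs)
        b -= lr * sum((p - y) * z for p, y, z in zip(preds, ys, zs)) / len(xs)
    return lambda x: 1 / (1 + math.exp(-(a + b * (x - mean) / std)))

def estimate_ppv_npv(model, xs, threshold):
    """Model-based PPV (above the threshold) and NPV (at or below it); None if a side is empty."""
    above = [model(x) for x in xs if x > threshold]
    below = [1 - model(x) for x in xs if x <= threshold]
    ppv = sum(above) / len(above) if above else None
    npv = sum(below) / len(below) if below else None
    return ppv, npv

def update_threshold(model, xs, target_ppv, target_npv):
    """Return the lowest observed value meeting both targets, or None if none does."""
    for candidate in sorted(set(xs)):
        ppv, npv = estimate_ppv_npv(model, xs, candidate)
        if ppv is not None and npv is not None and ppv >= target_ppv and npv >= target_npv:
            return candidate
    return None

# Made-up history: metric values and whether the SLO was violated in that interval.
xs = [10, 20, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90]
ys = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

model = fit_logistic(xs, ys)
threshold = update_threshold(model, xs, target_ppv=0.7, target_npv=0.7)
print("updated threshold:", threshold)
```

Rerunning the fit and the threshold update as new history arrives gives the iterative, adaptive behavior described above. Logistic regression fits here because the modeled outcome (violation or no violation) is binary.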

21
ATS: Status and Plans
  • ATS algorithms formulated and successfully tested
    on a small laboratory system (2005)
  • Paper published and patent filed (2005)
  • Future versions will address large, complex
    systems
  • Multiple and compound SLOs
  • Suggest system reconfiguration to allow for
    better SLOs

22
BCT: Behavior Capture and Test
23
Component-based software
  • Component reuse
  • Reduce costs
  • Increase productivity
  • Unexpected failures
  • Components are robust and reliable, but designed
    without knowledge of the final system ->
    integration problems
  • Integration testing problems
  • no source code
  • incomplete specifications

24
Integration problems
  • Inconsistent interpretation of parameters or
    values
  • Each component's interpretation may be
    reasonable, but incompatible (Mars Climate
    Orbiter, Sept. 1999)
  • Violations of value domains or of capacity or
    size limits
  • Implicit assumptions on ranges of values or sizes
  • Buffer overflow
  • Side effects on parameters or resources
  • Resources not explicitly mentioned in the
    interface
  • temporary files
  • Missing or misunderstood functionality
  • Underspecified functionality leads to incorrect
    assumptions
  • Hit counts
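
The first category, inconsistent interpretation of values, can be shown with a minimal sketch. Both components are hypothetical: each is reasonable on its own, but one returns feet and the other assumes metres, and neither interface says so.

```python
FEET_PER_METRE = 3.28084

def read_distance():
    """Hypothetical component A: returns a distance in feet (unit not stated in its interface)."""
    return 328.0

def should_brake(distance):
    """Hypothetical component B: assumes its argument is in metres."""
    return distance < 150.0

raw = read_distance()
print(should_brake(raw))                    # False: 328 is read as 328 m
print(should_brake(raw / FEET_PER_METRE))   # True: 328 ft is about 100 m
```

Each component passes its own unit tests. The failure appears only when the two are integrated.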

25
Verifying component-based systems
  • Testing
  • Mutation analysis [Ghosh, Mathur, TOOLS 2000]
  • Dynamic analysis
  • Only numeric data [Raz, Koopman, Shaw, ICSE 2002]
  • Requires source code and focuses on data
    [McCamant, Ernst, ESEC/FSE 2003]

26
Behavior Capture and Test (BCT)
  • Key idea
  • Integration analysis and testing require
    information about component behavior
  • Extensive reuse of components produces a lot of
    information
  • Can we capture behavior information to test and
    analyze component integration?

27
BCT Main Steps
  • BCT
  • Capture Behavior Data
  • Monitor component execution
  • Capture run-time information
  • Distill Behavior Models
  • I/O models
  • Interaction models
  • Verify the Run-Time Behavior
  • Verify reused/replaced components with behavior
    models
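
The three steps can be sketched in a few lines. This is an illustration under simplifying assumptions, not the BCT tool: the I/O model is just the observed range of each component's numeric results, the interaction model is the set of observed consecutive call pairs, and all names are hypothetical.

```python
import functools

class BehaviorMonitor:
    """Step 1: record (component name, arguments, result) for every monitored call."""
    def __init__(self):
        self.trace = []

    def watch(self, name):
        def decorate(func):
            @functools.wraps(func)
            def wrapper(*args):
                result = func(*args)
                self.trace.append((name, args, result))
                return result
            return wrapper
        return decorate

def distill_io_model(trace):
    """Step 2a: I/O model as the observed (min, max) result per component."""
    model = {}
    for name, _args, result in trace:
        lo, hi = model.get(name, (result, result))
        model[name] = (min(lo, result), max(hi, result))
    return model

def distill_interaction_model(trace):
    """Step 2b: interaction model as the set of observed consecutive call pairs."""
    names = [name for name, _args, _result in trace]
    return set(zip(names, names[1:]))

def verify(trace, io_model, interaction_model):
    """Step 3: list every deviation of a new trace from the distilled models."""
    violations = []
    for name, args, result in trace:
        lo, hi = io_model.get(name, (result, result))
        if not lo <= result <= hi:
            violations.append(f"{name}{args} returned {result}, outside [{lo}, {hi}]")
    names = [name for name, _args, _result in trace]
    for first, second in zip(names, names[1:]):
        if (first, second) not in interaction_model:
            violations.append(f"unseen call sequence {first} -> {second}")
    return violations

# Capture the behavior of the original component.
reference = BehaviorMonitor()
price_v1 = reference.watch("price")(lambda qty: qty * 2.5)
for qty in (1, 4, 10):
    price_v1(qty)
io_model = distill_io_model(reference.trace)
interaction_model = distill_interaction_model(reference.trace)

# Verify a replacement component against the distilled models.
candidate = BehaviorMonitor()
price_v2 = candidate.watch("price")(lambda qty: qty * 2.5 - 5)
for qty in (1, 4):
    price_v2(qty)
violations = verify(candidate.trace, io_model, interaction_model)
print(violations)
```

No source code or specification of the replaced component is needed. The models come entirely from observed executions.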

28
Contribution to Standards
  • The Shadows project will be based on open
    standards for software lifecycle management
  • Enable true collaboration and interoperability
  • Faster adoption
  • Example: the TPTP framework, enabled by the
    Eclipse open-source IDE
  • Supports software modeling, testing, logging and
    profiling

29
Contribution to Standards (cont.)
  • The Consortium seeks close and productive working
    connections with standards working groups
  • Potential Collaboration with DMTF
  • CIM enhancements and refinements
  • Automated Management models
  • Behavior and State models
  • Policy-Based Management
  • Self-healing models

30
Summary
  • The Shadows initiative is an independent R&D
    effort, which aims to improve the state of the
    art in system lifecycle and system management
  • Shadows will rely on its background technologies
  • Expand them to fix bugs of various types
  • Combine them
  • to cover a large variety of problems
  • for data sharing and mutual improvement
  • Shadows will build on open standards and
    influence them
  • The project entails collaboration with partners
    in Europe
  • Feedback, Early Access, Validation

31
Backup Material