1
Machine Learning in Performance Management
  • Irina Rish
  • IBM T.J. Watson Research Center
  • January 24, 2001

2
Outline
  • Introduction
  • Machine learning applications in Performance
    Management
  • Bayesian learning tools extending ABLE
  • Advancing theory
  • Summary and future directions

3
Learning problems: examples
Pattern discovery, classification, diagnosis, and prediction
4
Approach: Bayesian learning
Bayesian networks: learn (probabilistic) dependency models
[Diagram: example network with factors P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B)]
Pattern classification: P(class|data)?
Prediction: P(symptom|cause)?
Diagnosis: P(cause|symptom)?
Numerous important applications
  • Medicine
  • Stock market
  • Bio-informatics
  • eCommerce
  • Military

5
Outline
  • Introduction
  • Machine-learning applications in Performance
    Management
  • Transaction Recognition
  • In progress: Event Mining,
    Probe Placement, etc.
  • Bayesian learning tools extending ABLE
  • Advancing theory
  • Summary and future directions

6
End-User Transaction Recognition: why is it important?
[Diagram: End-User Transactions (EUTs) observed as sequences of Remote Procedure Calls (RPCs) over a session (connection) between a client workstation and a server (Web, DB, Lotus Notes)]
Examples: Lotus Notes, Web/eBusiness (on-line
stores, travel agencies, trading), database
transactions, buy/sell, search, email, etc.
  • Realistic workload models (for testing
    performance)
  • Resource management (anticipating requests)
  • Quantifying end-user perception of performance
    (response times)

7
Why is it hard? Why learn from data?
Example: EUTs and RPCs in Lotus Notes
8
Our approach: Classification (similar to text
classification) + Segmentation (similar to speech
understanding, image segmentation)
9
How to represent transactions? Feature vectors
over RPC occurrences (one simple option is sketched below)
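As a concrete (assumed) illustration, one simple feature vector counts how often each RPC type occurs in a candidate transaction; the Python sketch below uses a made-up RPC vocabulary rather than the actual Lotus Notes RPC names:

    from collections import Counter

    # Hypothetical RPC vocabulary; the real Lotus Notes RPC names differ.
    RPC_TYPES = ["OPEN_DB", "READ_ENTRIES", "FIND_BY_KEY", "CLOSE_DB"]

    def rpc_counts(rpc_sequence):
        """Map a sequence of RPC names to a fixed-length count vector."""
        counts = Counter(rpc_sequence)
        return [counts.get(rpc, 0) for rpc in RPC_TYPES]

    # One candidate transaction observed as an RPC sequence:
    print(rpc_counts(["OPEN_DB", "READ_ENTRIES", "READ_ENTRIES", "CLOSE_DB"]))
    # -> [1, 2, 0, 1]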
10
Classification scheme
11
Our classifier: naïve Bayes (NB)
1. Simplifying (naïve) assumption: feature
   independence given the class,
   P(f1, ..., fn | class) = P(f1 | class) x ... x P(fn | class)
2. Classification: given an (unlabeled) instance
   f = (f1, ..., fn), choose the most likely class,
   class* = argmax over c of P(c) P(f1 | c) ... P(fn | c)
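A minimal sketch of the two steps above for categorical feature vectors, with add-one smoothing for unseen values; this is an illustration, not the ABLE implementation:

    import math
    from collections import defaultdict

    class NaiveBayes:
        """Categorical naive Bayes with add-one (Laplace) smoothing -- a sketch."""

        def fit(self, X, y):
            self.classes = sorted(set(y))
            self.n = len(y)
            self.class_count = {c: 0 for c in self.classes}
            # counts[c][i][v]: occurrences of value v for feature i within class c
            self.counts = {c: defaultdict(lambda: defaultdict(int)) for c in self.classes}
            self.values = defaultdict(set)        # distinct values seen per feature
            for xs, c in zip(X, y):
                self.class_count[c] += 1
                for i, v in enumerate(xs):
                    self.counts[c][i][v] += 1
                    self.values[i].add(v)
            return self

        def predict(self, xs):
            # Step 2: choose the class maximizing log P(c) + sum_i log P(f_i | c).
            best, best_score = None, -math.inf
            for c in self.classes:
                score = math.log(self.class_count[c] / self.n)
                for i, v in enumerate(xs):
                    num = self.counts[c][i][v] + 1                       # add-one smoothing
                    den = self.class_count[c] + len(self.values[i]) + 1  # room for unseen values
                    score += math.log(num / den)
                if score > best_score:
                    best, best_score = c, score
            return best

    # Toy usage: feature vectors could be RPC counts, classes are transaction types.
    nb = NaiveBayes().fit([[1, 2, 0], [1, 2, 1], [0, 0, 3]],
                          ["read_mail", "read_mail", "search"])
    print(nb.predict([1, 1, 0]))   # -> read_mail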
12
Classification results on Lotus CoC data
[Plot: classification accuracy vs. training set size for NB with Bernoulli, multinomial, or geometric feature models, NB with shifted-geometric features, and a baseline classifier that always selects the most frequent transaction]
  • Significant improvement over the baseline
    classifier (75%)
  • NB is simple, efficient, and comparable to the
    state-of-the-art classifiers
  • SVM: 85-87%, Decision Tree: 90-92%
  • Best-fit distribution (shifted geometric) - not
    necessarily the best classifier! (?)

13
Transaction recognition = segmentation +
classification
Dynamic programming (Viterbi search) over a
(recursive) DP equation, with the naive Bayes
classifier scoring candidate segments (see the
sketch below)
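The DP equation itself is not reproduced on the slide; the sketch below illustrates the general scheme under assumed simplifications (a per-segment log-score supplied by a classifier, a maximum segment length, and no explicit penalty on the number of segments):

    import math

    def segment(rpcs, score_segment, max_len=10):
        """DP segmentation of an RPC stream into labeled transactions.

        best[j] = best total log-score over segmentations of rpcs[:j];
        score_segment(seg) returns (label, log_score) for one candidate segment,
        e.g. the max over classes of a naive Bayes log-posterior.
        """
        n = len(rpcs)
        best = [-math.inf] * (n + 1)
        best[0] = 0.0
        back = [None] * (n + 1)            # (start of last segment, its label)
        for j in range(1, n + 1):
            for i in range(max(0, j - max_len), j):
                label, s = score_segment(rpcs[i:j])
                if best[i] + s > best[j]:
                    best[j] = best[i] + s
                    back[j] = (i, label)
        segments, j = [], n                # recover the segmentation via back-pointers
        while j > 0:
            i, label = back[j]
            segments.append((rpcs[i:j], label))
            j = i
        return list(reversed(segments))

    # Toy scorer: penalize OPEN_DB calls that are not at the start of a segment.
    def toy_score(seg):
        return ("txn", -5.0 * sum(1 for r in seg[1:] if r == "OPEN_DB"))

    print(segment(["OPEN_DB", "READ", "CLOSE", "OPEN_DB", "CLOSE"], toy_score))
    # -> [(['OPEN_DB', 'READ', 'CLOSE'], 'txn'), (['OPEN_DB', 'CLOSE'], 'txn')]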
14
Transaction recognition results
[Plot: EUT recognition accuracy vs. training set size]
  • Good EUT recognition accuracy: 64% (a harder
    problem than classification!)
  • Reversed order of results: the best classifier is
    not necessarily the best recognizer! (?)

=> further research!
15
EUT recognition summary
  • A novel approach: learning EUTs from RPCs
  • Patent, conference paper (AAAI-2000), prototype
    system
  • Successful results on Lotus Notes data (Lotus
    CoC)
  • Classification: naive Bayes (up to 87% accuracy)
  • EUT recognition: Viterbi + Bayes (up to 64%
    accuracy)
  • Work in progress
  • Better feature selection (RPC subsequences?)
  • Selecting best classifier for segmentation task
  • Learning more sophisticated classifiers (Bayesian
    networks)
  • Information-theoretic approach to segmentation
    (MDL)

16
Outline
  • Introduction
  • Machine-learning applications in Performance
    Management
  • Transaction Recognition
  • In progress: Event Mining,
    Probing Strategy, etc.
  • Bayesian learning tools extending ABLE
  • Advancing theory
  • Summary and future directions

17
Event Mining: analyzing system event sequences
[Plot: events from hosts vs. time (sec)]
  • Example: USAA data
  • 858 hosts, 136 event types
  • 67184 data points (13 days, by sec)
  • Event examples
  • High-severity events:
    'Cisco_Link_Down', 'chassisMinorAlarm_On', etc.
  • Low-severity events:
    'tcpConnectClose', 'duplicate_ip', etc.

18
1. Learning event dependency models
  • Current approach
  • learn dynamic probabilistic graphical models
  • (temporal, or dynamic Bayes nets)
  • Predict
  • time to failure
  • event co-occurrence
  • existence of hidden nodes (root causes)
  • Recognize sequence of high-level system states
  • unsupervised version of EUT recognition
    problem

[Diagram: dynamic Bayes net over event nodes Event1, Event2, ..., Event N, EventM, with hidden/unknown structure (???)]
Important issue: incremental learning from data
streams (see the sketch below)
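A full dynamic Bayes net learner is beyond a sketch, but the incremental flavor can be illustrated with sufficient statistics updated one event at a time; the window-based co-occurrence model below is an assumed simplification of the temporal models mentioned above:

    from collections import defaultdict, deque

    class CooccurrenceCounter:
        """Incrementally counts 'event B follows event A within `window` seconds'.

        The counts are sufficient statistics for estimates such as P(B soon | A);
        a temporal/dynamic Bayes net generalizes this to many parents and hidden nodes.
        """

        def __init__(self, window=60.0):
            self.window = window
            self.recent = deque()                                  # (timestamp, event type)
            self.follow = defaultdict(lambda: defaultdict(int))    # follow[A][B]
            self.count = defaultdict(int)                          # occurrences of each type

        def update(self, timestamp, event_type):
            # Drop events that fell out of the time window, then update the counts.
            while self.recent and timestamp - self.recent[0][0] > self.window:
                self.recent.popleft()
            for _, earlier in self.recent:
                self.follow[earlier][event_type] += 1
            self.count[event_type] += 1
            self.recent.append((timestamp, event_type))

        def p_follows(self, a, b):
            """Estimated P(b occurs within the window | a occurred)."""
            return self.follow[a][b] / self.count[a] if self.count[a] else 0.0

    # Streaming usage: one pass, memory bounded by the window and the number of event types.
    c = CooccurrenceCounter(window=60.0)
    for t, e in [(0, "chassisMinorAlarm_On"), (10, "Cisco_Link_Down"), (400, "tcpConnectClose")]:
        c.update(t, e)
    print(c.p_follows("chassisMinorAlarm_On", "Cisco_Link_Down"))   # -> 1.0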
19
2. Clustering hosts by their history
[Plot: host clusters, e.g. "problematic hosts" vs. "silent hosts"]
  • group hosts w/ similar event sequences; what is an
    appropriate similarity (distance) metric? One
    example:
  • e.g., distance between compressed sequences
    (event distribution models); see the sketch below
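One concrete (assumed) instantiation of such a metric: compress each host's history into its event-type distribution and compare hosts with a symmetric divergence; the event names are taken from the earlier slide, the numbers are made up:

    import math
    from collections import Counter

    def event_distribution(events, event_types):
        """Compress a host's event sequence into a smoothed event-type distribution."""
        counts = Counter(events)
        total = len(events) + len(event_types)            # add-one smoothing
        return [(counts.get(e, 0) + 1) / total for e in event_types]

    def jensen_shannon(p, q):
        """Symmetric, bounded divergence between two distributions."""
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        kl = lambda a, b: sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    # Hosts with similar event mixes end up close; silent hosts (few or no events)
    # collapse toward the uniform smoothed distribution and group together.
    types = ["Cisco_Link_Down", "tcpConnectClose", "duplicate_ip"]
    h1 = event_distribution(["tcpConnectClose"] * 50 + ["duplicate_ip"] * 10, types)
    h2 = event_distribution(["tcpConnectClose"] * 40 + ["duplicate_ip"] * 15, types)
    h3 = event_distribution(["Cisco_Link_Down"] * 30, types)
    print(jensen_shannon(h1, h2), jensen_shannon(h1, h3))   # small vs. large

Any standard pairwise-distance clustering (e.g., hierarchical agglomerative) can then be run on the resulting distance matrix.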

20
Probing strategy (EPP)
  • Objective: find probe frequency F that minimizes
  • E(T_probe - T_start) - failure detection, or
  • E(total failure time - total estimated
    failure time) -
    gives an accurate performance estimate
  • Constraint on additional load induced by probes:
    L(F) < MaxLoad
    (a toy instantiation is sketched below)
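A toy sketch of the first objective under assumed models (expected detection delay of about half a probe interval for periodic probing, load linear in the probe frequency, grid search over candidate frequencies); the actual EPP formulation is not shown on the slides:

    def choose_probe_frequency(candidates, load_per_probe, max_load):
        """Pick probe frequency F (probes/sec) minimizing the expected detection delay
        E(T_probe - T_start), subject to the load constraint L(F) <= MaxLoad.

        Assumed models: delay ~ 1/(2F) for periodic probes (failure start uniform
        within a probe interval); induced load L(F) = load_per_probe * F.
        """
        feasible = [f for f in candidates if load_per_probe * f <= max_load]
        if not feasible:
            raise ValueError("no probe frequency satisfies the load constraint")
        return min(feasible, key=lambda f: 1.0 / (2.0 * f))

    # Probe every 1..60 seconds; each probe costs 0.05 CPU-sec; allow <= 1.2% extra load.
    candidates = [1.0 / period for period in range(1, 61)]
    best = choose_probe_frequency(candidates, load_per_probe=0.05, max_load=0.012)
    print(best, "probes/sec -> probe roughly every", round(1.0 / best), "seconds")   # every 5 s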

21
Outline
  • Introduction
  • Machine-learning applications in Performance
    Management
  • Bayesian learning tools extending ABLE
  • Advancing theory
  • Summary and future directions

22

ABLE: Agent Building and Learning Environment
23
What is ABLE? What is my contribution?
  • A JAVA toolbox for building reasoning and
    learning agents
  • Provides visual environment, boolean and fuzzy
    rules, neural networks, genetic search
  • My contributions:
  • naïve Bayes classifier (batch and incremental)
  • Discretization
    (both illustrated in the sketch below)
  • Future releases:
  • General Bayesian learning and inference tools
  • Available at:
  • alphaWorks: www.alphaWorks.ibm.com/tech
  • Project page: w3.rchland.ibm.com/projects/ABLE
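The ABLE API itself is not reproduced here; the sketch below only illustrates, in generic terms, what "batch vs. incremental" naive Bayes learning and equal-width discretization amount to:

    def discretize(value, low, high, bins=10):
        """Equal-width discretization of a continuous feature into a bin index."""
        if value <= low:
            return 0
        if value >= high:
            return bins - 1
        return int((value - low) / (high - low) * bins)

    class CountModel:
        """Sufficient statistics of naive Bayes: class counts and feature-value counts."""
        def __init__(self):
            self.class_count = {}
            self.feature_count = {}        # (class, feature index, value) -> count

        def update(self, features, label):
            # Incremental learning: absorb one labeled instance at a time.
            self.class_count[label] = self.class_count.get(label, 0) + 1
            for i, v in enumerate(features):
                key = (label, i, v)
                self.feature_count[key] = self.feature_count.get(key, 0) + 1

        def fit(self, X, y):
            # Batch learning is the same update applied to the whole training set.
            for features, label in zip(X, y):
                self.update(features, label)
            return self

    # Continuous response times are discretized before counting:
    m = CountModel()
    m.update([discretize(0.42, 0.0, 1.0)], "fast")
    m.update([discretize(3.70, 0.0, 1.0)], "slow")
    print(m.class_count)   # {'fast': 1, 'slow': 1}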

24
How does it work?
25
Who is using the Naïve Bayes tools? Impact on other
IBM projects
  • Video Character Recognition
  • (w/ C. Dorai)
  • Naïve Bayes: 84% accuracy
  • Better than SVM on some pairs of characters
    (average SVM: 87%)
  • Current work: combining Naïve Bayes with SVMs
  • Environmental data analysis
  • (w/ Yuan-Chi Chang)
  • Learning mortality rates using data on air
    pollutants
  • Naïve Bayes is currently being evaluated
  • Performance management
  • Event mining: in progress
  • EUT recognition: successful results

26
Outline
  • Introduction
  • Machine-learning in Performance Management
  • Bayesian learning tools extending ABLE
  • Advancing theory
  • analysis of naïve Bayes classifier
  • inference in Bayesian Networks
  • Summary and future directions

27
Why does Naïve Bayes do well? And when?
Class-conditional feature independence is an
unrealistic assumption! But why/when does it work?
[Plot: true P(class|f) vs. the NB estimate]

When do independence assumptions
not hurt classification?
28
Case 1: functional dependencies
Lemma 1: Naïve Bayes is optimal when features
are functionally dependent
given the class
Proof
29
Case 2: almost-functional (low-entropy)
distributions
  • Lemma 2: Naïve Bayes is a 'good approximation'
    for 'almost-functional'
    dependencies

Formally


  • Related practical examples:
  • RPC occurrences in EUTs are often
    almost-deterministic (and NB does well)
  • Successful local inference in
    almost-deterministic Bayesian networks (Turbo
    coding, mini-buckets; see Dechter & Rish, 2000)

30
Experimental results support theory
Random problem generator uniform P(class)
random P(fclass) 1. A randomly selected entry
in P(fclass) is assigned 2. The rest of entries
uniform random sampling normalization
  • Less noise (smaller )
  • gt NB closer to optimal

2. Feature dependence does NOT correlate with NB
error
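A sketch of such a generator under an assumed parametrization: the selected entry receives probability 1 - delta and the remaining mass delta is spread over the other entries by uniform random sampling plus normalization (the exact value used on the slide is not shown):

    import random

    def random_conditional(n_values, delta):
        """One random column of P(f | class): a dominant entry (probability 1 - delta,
        an assumed parametrization) plus normalized noise over the rest."""
        rest = [random.random() for _ in range(n_values - 1)]
        total = sum(rest)
        noise = [delta * r / total for r in rest]
        dominant = random.randrange(n_values)
        return noise[:dominant] + [1.0 - delta] + noise[dominant:]

    def random_problem(n_features, n_values, n_classes, delta):
        """Uniform P(class); for each class and feature, a random low-entropy P(f | class)."""
        prior = [1.0 / n_classes] * n_classes
        cond = [[random_conditional(n_values, delta) for _ in range(n_features)]
                for _ in range(n_classes)]
        return prior, cond

    prior, cond = random_problem(n_features=3, n_values=4, n_classes=2, delta=0.1)
    print([round(p, 3) for p in cond[0][0]])   # one entry near 0.9, the rest small

Smaller delta means lower-entropy (less noisy) conditionals, the regime in which NB gets closer to optimal.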
31
Outline
  • Introduction
  • Machine-learning in Performance Management
  • Transaction Recognition
  • Event Mining
  • Bayesian learning tools extending ABLE
  • Advancing theory
  • analysis of naïve Bayes classifier
  • inference in Bayesian Networks
  • Summary and future directions

32
From Naïve Bayes to Bayesian Networks
Naïve Bayes model: independent features given the
class
Bayesian network (BN) model: any joint
probability distribution
P(S, C, B, X, D) =
P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
Query: P(lung cancer = yes | smoking = no,
dyspnoea = yes) = ?
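A brute-force illustration of answering such a query by enumeration over the factorization above; the conditional probability tables and the reading of S, C, B, X, D as smoking, lung cancer, bronchitis, X-ray, and dyspnoea are assumptions made for this example:

    from itertools import product

    # Assumed, made-up CPTs for binary variables S, C, B, X, D (True/False).
    P_S = {True: 0.3, False: 0.7}
    P_C_given_S = {True: 0.10, False: 0.01}                      # P(C=True | S)
    P_B_given_S = {True: 0.30, False: 0.10}                      # P(B=True | S)
    P_X_given_CS = {(True, True): 0.90, (True, False): 0.85,
                    (False, True): 0.20, (False, False): 0.05}   # P(X=True | C, S)
    P_D_given_CB = {(True, True): 0.90, (True, False): 0.70,
                    (False, True): 0.60, (False, False): 0.10}   # P(D=True | C, B)

    def bernoulli(p_true, value):
        return p_true if value else 1.0 - p_true

    def joint(s, c, b, x, d):
        """P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)"""
        return (bernoulli(P_S[True], s) *
                bernoulli(P_C_given_S[s], c) *
                bernoulli(P_B_given_S[s], b) *
                bernoulli(P_X_given_CS[(c, s)], x) *
                bernoulli(P_D_given_CB[(c, b)], d))

    def query_cancer(smoking, dyspnoea):
        """P(C=True | S=smoking, D=dyspnoea), summing out B and X."""
        num = sum(joint(smoking, True, b, x, dyspnoea)
                  for b, x in product([True, False], repeat=2))
        den = sum(joint(smoking, c, b, x, dyspnoea)
                  for c, b, x in product([True, False], repeat=3))
        return num / den

    print(query_cancer(smoking=False, dyspnoea=True))

Exact inference by enumeration like this is exponential in the number of variables in general, which is why the later slides turn to approximate algorithms.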
33
Example: Printer Troubleshooting (Microsoft
Windows 95)
[Heckerman, 95]
34
How to use Bayesian networks?
Diagnosis: P(cause|symptom)?
Prediction: P(symptom|cause)?
MEU decision-making (given a utility function)
NP-complete inference problems
=> approximate algorithms
35
Local approximation scheme: Mini-buckets (paper
submitted to JACM)
  • Idea: reduce complexity of inference by ignoring
    some dependencies
  • Successfully used for approximating Most Probable
    Explanation
  • Very efficient on real-life (medical, decoding)
    and synthetic problems

Less noise => higher accuracy (similarly to
naïve Bayes!)
[Plot: approximation accuracy vs. noise]
General theory needed: independence assumptions
and almost-deterministic distributions
Potential impact: efficient inference in complex
performance management models (e.g., event
mining, system dependence models)
36
Summary
  • Performance management
  • End-user transaction recognition (Lotus CoC)
  • novel method, patent, paper; applied to Lotus
    Notes
  • In progress: event mining (USAA), probing
    strategies (EPP)
  • Machine-learning tools (alphaWorks)
  • Extending ABLE w/ Bayesian classifier
  • Applying classifier to other IBM projects
  • Video character recognition
  • Environmental data analysis
  • Theory and algorithms
  • analysis of Naïve Bayes accuracy (Research
    Report)
  • approximate Bayesian inference (submitted
    paper)
  • patent on meta-learning

37
Future directions
Research interest: automated learning and inference
[Diagram: connecting practical problems, theory, and generic tools]
38
Collaborations
  • Transaction recognition
  • J. Hellerstein, T. Jayram (Watson)
  • Event Mining
  • J. Hellerstein, R. Vilalta, S. Ma, C. Perng
    (Watson)
  • ABLE
  • J. Bigus, R. Vilalta (Watson)
  • Video Character Recognition
  • C. Dorai (Watson)
  • MDL approach to segmentation
  • B. Dom (Almaden)
  • Approximate inference in Bayes nets
  • R. Dechter (UCI)
  • Meta-learning
  • R. Vilalta (Watson)
  • Environmental data analysis
  • Y. Chang (Watson)

39
Machine learning discussion group
  • Weekly seminars
  • 11:30-2:30 (w/ lunch) in 1S-F40
  • Active group members
  • Mark Brodie, Vittorio Castelli, Joe Hellerstein,
    Daniel Oblinger,
  • Jayram Thathachar, Irina Rish (more people joined
    recently)
  • Agenda
  • discussions of recent ML papers, book chapters
  • (Pattern Classification by Duda, Hart, and
    Stork, 2000)
  • brain-storming sessions about particular ML
    topics
  • Recent discussions: accuracy of Bayesian
    classifiers (naïve Bayes)
  • Web site:
  • http://reswat4.research.ibm.com/projects/mlreadinggroup/mlreadinggroup.nsf/main/toppage