Analyzing System Logs: A New View of What's Important


1
  • Analyzing System Logs: A New View of What's
    Important
  • Sivan Sabato
  • Elad Yom-Tov
  • Aviad Tsherniak
  • Saharon Rosset
  • IBM Research
  • SysML07 (Second Workshop on Tackling Computer
    Systems Problems with Machine Learning Techniques)
  • Presented By

2
Introduction
  • System logs are a critical tool for system
    administrators.
  • They are massive in volume.
  • We need to rank log messages by importance.
  • Previous work:
    • Ranking using expert rules
    • Visualization
    • A single machine's log

3
What is Important?
  • This paper proposes that an important message is
    one that appears with a probability higher than
    expected.
  • Represent messages of the same type by one
    message type.
  • Calculate the empirical distribution of
    probabilities and rank them.
  • Systems are not homogeneous.
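
One way to read the idea above, as a minimal sketch: score each message type in a log by the fraction of reference logs in which that type appears fewer times, then sort by score. The scoring rule, the function names, and the toy counts are assumptions made for illustration; the slides do not spell out the exact formula.

```python
def score_messages(log_counts, reference_logs):
    """Score each message type by the share of reference logs with a lower count.

    Assumption: this empirical-distribution score is one plausible reading
    of "appears with a probability higher than expected".
    """
    scores = {}
    for msg_type, count in log_counts.items():
        lower = sum(1 for ref in reference_logs if ref.get(msg_type, 0) < count)
        scores[msg_type] = lower / len(reference_logs)
    return scores


def rank_messages(log_counts, reference_logs):
    """Return message types ordered from most to least unusual."""
    scores = score_messages(log_counts, reference_logs)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical per-log counts of each message type.
reference_logs = [
    {"disk_warning": 1, "login": 40},
    {"login": 55},
    {"disk_warning": 2, "login": 60},
    {"login": 50},
]
inspected = {"disk_warning": 9, "login": 45}

scores = score_messages(inspected, reference_logs)
ranking = rank_messages(inspected, reference_logs)
print(ranking)
```

In this toy data the rare message type outranks the frequent one, because its count is unusually high relative to the other logs even though its raw count is lower.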

4
Algorithm
  • Use K-means clustering to divide system logs
    into classes.
  • Estimate the empirical distribution of each
    class.
  • Given a system log, identify a class and rank
    messages according to its P
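
The three steps above can be sketched as follows, assuming the classes have already been found by clustering. The nearest-centroid assignment, the scoring rule (share of same-class logs with a lower count), and all names and numbers are illustrative assumptions rather than the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(logs):
    """Component-wise mean of a list of count vectors."""
    return [sum(column) / len(logs) for column in zip(*logs)]


def nearest_class(vector, centroids):
    """Index of the class whose centroid is closest to the vector."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(vector, centroids[j]))


def rank_within_class(vector, class_logs):
    """Rank message-type indices by the share of class logs with a lower count."""
    n = len(class_logs)
    scores = [sum(1 for ref in class_logs if ref[m] < count) / n
              for m, count in enumerate(vector)]
    return sorted(range(len(vector)), key=lambda m: (-scores[m], m))


# Hypothetical classes; each log is a vector of counts for 3 message types.
classes = [
    [[50, 1, 0], [60, 0, 1], [55, 2, 0]],
    [[5, 30, 9], [4, 35, 8], [6, 28, 10]],
]
centroids = [centroid(logs) for logs in classes]

new_log = [52, 1, 7]
assigned = nearest_class(new_log, centroids)
ranking = rank_within_class(new_log, classes[assigned])
print(assigned, ranking)
```

Message type 2 comes first here: a count of 7 is ordinary for the second class but unusual for the class the log was assigned to, which is the motivation for ranking within a class rather than across all systems.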

5
Clustering
  • K-means tries to minimize an objective function:
  • $J = \sum_{j} \sum_{i \in C_j} d^2(X_i, Z_j)$, where $Z_j$ is the center of cluster $C_j$
  • Inputs:
    • Number of clusters
    • Distance matrix
  • Outputs:
    • Membership matrix
    • Objective function value
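
A minimal sketch of K-means (Lloyd's algorithm) that returns the cluster memberships and the objective value J. As assumptions for this example, it works on feature vectors with squared Euclidean distance instead of a precomputed distance matrix, returns a membership list rather than a matrix, and takes the initial centers as an argument so the run is deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance d^2 between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def kmeans(points, initial_centers, max_iterations=100):
    """Return (membership, centers, objective) for the given points."""
    centers = [list(c) for c in initial_centers]
    membership = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest center.
        new_membership = [nearest(p, centers) for p in points]
        if new_membership == membership:
            break  # converged: assignments no longer change
        membership = new_membership
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, m in zip(points, membership) if m == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Objective J: sum of squared distances from points to their centers.
    objective = sum(squared_distance(p, centers[m])
                    for p, m in zip(points, membership))
    return membership, centers, objective


points = [[0, 0], [0, 2], [10, 0], [10, 2]]
membership, centers, objective = kmeans(points, [[0, 0], [10, 0]])
print(membership, centers, objective)
```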

[Figure: diagram labeled Features, Patterns, and Clusters]

6
Dimensionality Problem
  • The data comprises 3,000 system logs with 15,000
    message types; however, it is sparse.
  • Measuring distances over these 15,000 features
    is computationally intensive.
  • Solution: dimensionality reduction
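
To illustrate what sparsity means here, a small sketch that stores each log as a dictionary of only its non-zero message-type counts and computes a squared Euclidean distance over the keys that actually occur. The representation and the message-type ids are assumptions for illustration; this shows the sparsity only and is not the dimensionality reduction the slides go on to describe.

```python
def sparse_squared_distance(a, b):
    """Squared Euclidean distance between two sparse count vectors.

    Each vector is a dict mapping message-type id -> count; ids that are
    absent are treated as a count of zero.
    """
    keys = a.keys() | b.keys()
    return sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys)


# Hypothetical logs: only a handful of the 15,000 message types occur.
log_a = {12: 3, 407: 1, 9001: 5}
log_b = {12: 1, 9001: 5, 14999: 2}

distance = sparse_squared_distance(log_a, log_b)
print(distance)
```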

7
Feature Construction
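  • Use the Spearman correlation between every pair of
    system logs.
  • $\mathrm{Corr}(x, y) = 1 - \frac{6 \lVert r_x - r_y \rVert^2}{N(N^2 - 1)}$, where $r_x$ and $r_y$ are the rank vectors of $x$ and $y$ and $N$ is their length
  • This maps k logs × n message types to a k × k similarity
    matrix.
  • Question: how are the rank vectors calculated?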
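
One common way to compute rank vectors, sketched here as an assumption rather than as the authors' method: sort the counts, assign ranks starting at 1, and give tied values the average of the ranks they span. The Spearman correlation then follows from the squared difference of the two rank vectors. Note that this shortcut formula is exact only when there are no ties; with many tied counts (for example the zeros in sparse logs) it is an approximation.

```python
def rank_vector(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next value equals the current one.
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation via 1 - 6 * sum(d^2) / (N * (N^2 - 1))."""
    n = len(x)
    rx, ry = rank_vector(x), rank_vector(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def similarity_matrix(logs):
    """Map k logs x n message types to a k x k matrix of correlations."""
    return [[spearman(a, b) for b in logs] for a in logs]


# Hypothetical logs: counts of 4 message types in 3 logs.
logs = [
    [10, 3, 7, 1],
    [20, 5, 9, 2],
    [1, 8, 2, 30],
]
sim = similarity_matrix(logs)
for row in sim:
    print(row)
```

The first two logs have different raw counts but the same ordering of message types, so their correlation is 1; the third log reverses that ordering and gets a correlation of -1 with both.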

8
Evaluation
  • Compare Spearman Correlation to other feature
    construction schemes.
  • Histogram of pairwise distances
  • Maximal Mutual Information
  • Improvement in Score

9
Comment
  • Future work:
    • Correlation-based clustering
    • Feature extraction: choice of distance measure
    • Bi-clustering
    • Fuzzy clustering
  • Evaluation:
    • Use of human expertise to evaluate the ranking
    • Clustering index

10
  • Thank you!
  • Pros and Cons!