2
D1. Data Mining in Data Stream and Sensor Networks
  • Mining data streams
  • Characteristics and challenges of data streams
  • Stream data cubing
  • Stream data clustering
  • Stream classification and anomaly analysis
  • Debugging sensor network systems by data mining

3
Characteristics of Data Streams
  • Data streams
  • Data streams: continuous, ordered, changing, fast,
    huge in amount
  • Traditional DBMS: data stored in finite,
    persistent data sets
  • Characteristics of data streams
  • Huge volumes of continuous data, possibly
    infinite
  • Fast changing, requiring fast, real-time
    response
  • Data streams capture nicely our data processing
    needs of today
  • Random access is expensive: single-scan
    algorithms (can only have one look)
  • Store only the summary of the data seen thus far
  • Most stream data are at a pretty low level or
    multi-dimensional in nature, requiring multi-level
    and multi-dimensional processing

4
Stream Data Applications
  • Telecommunication: calling records
  • Business: credit card transaction flows
  • Network monitoring and traffic engineering
  • Financial market: stock exchange
  • Engineering and industrial processes: power supply,
    manufacturing
  • Sensor monitoring and surveillance: video streams,
    RFIDs
  • Security monitoring
  • Web logs and Web page click streams
  • Massive data sets (even when saved, random access
    is too expensive)

5
Architecture for Stream Query/Mining Processing
[Architecture sketch: multiple streams enter an SDMS (Stream Data
Management System); its Stream Query Processor uses scratch space
(main memory and/or disk) and returns results to the user/application]
6
Challenges for Mining Dynamics in Data Streams
  • Most stream data are at a pretty low level or
    multi-dimensional in nature, requiring multi-level,
    multi-dimensional (ML/MD) processing
  • Analysis requirements
  • Multi-dimensional trends and unusual patterns
  • Capturing important changes at multiple
    dimensions/levels
  • Fast, real-time detection and response
  • Comparing with the data cube: similarities and
    differences
  • Stream (data) cube or stream OLAP: is this
    feasible?
  • Can we implement it efficiently?

7
D1. Data Mining in Data Stream and Sensor Networks
  • Mining data streams
  • Characteristics and challenges of data streams
  • Stream data cubing
  • Stream data clustering
  • Stream classification and anomaly analysis
  • Debugging sensor network systems by data mining

8
Multi-Dimensional Stream Analysis Examples
  • Analysis of Web click streams
  • Raw data at low levels: seconds, web page
    addresses, user IP addresses, ...
  • Analysts want changes, trends, and unusual patterns
    at reasonable levels of detail
  • E.g., "Average clicking traffic in North America
    on sports in the last 15 minutes is 40% higher
    than that in the last 24 hours."
  • Analysis of power consumption streams
  • Raw data: power consumption flow for every
    household, every minute
  • Patterns one may find: "Average hourly power
    consumption surged 30% for manufacturing
    companies in Chicago in the last 2 hours compared
    with the same period a week ago"

9
A Stream Cube Architecture
  • A tilted time frame
  • Different time granularities:
    second, minute, quarter, hour, day, week, ...
  • Critical layers
  • Minimum interest layer (m-layer)
  • Observation layer (o-layer)
  • User watches at the o-layer and occasionally needs
    to drill down to the m-layer
  • Partial materialization of stream cubes
  • Full materialization: too space- and
    time-consuming
  • No materialization: slow response at query time
  • Partial materialization: what do we mean by
    "partial"?

10
Cube: A Lattice of Cuboids
[Lattice sketch: cuboids such as (time, item), (time, item, location),
and (time, item, location, supplier), ordered by dimensionality]
11
Time Dimension: A Tilted Time Model
  • Natural tilted time frame
  • Example: minimal granularity is a quarter (15
    minutes); 4 quarters → 1 hour, 24 hours → 1 day, ...
  • Logarithmic tilted time frame
  • Example: minimal granularity is 1 minute; window
    sizes grow as 1, 2, 4, 8, 16, 32, ...

12
A Tilted Time Model (2)
  • Pyramidal tilted time frame
  • Example: suppose there are 5 frames and each
    holds at most 3 snapshots
  • Given a snapshot number N, if N mod 2^d = 0,
    insert it into frame number d; if a frame already
    holds more than 3 snapshots, kick out the oldest
    one (a minimal sketch follows below)

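The insertion rule above is easy to prototype. Below is a minimal Python sketch, assuming each snapshot goes into the highest-numbered frame d (up to the frame count) for which N mod 2^d = 0; the class name, frame count, and capacity are illustrative choices, not part of the original slides.

```python
from collections import deque

class PyramidalTimeFrame:
    """Pyramidal tilted time frame (illustrative): frame d keeps snapshots
    whose number N satisfies N mod 2^d = 0 (using the largest such d), and
    each frame retains only its most recent `capacity` snapshots."""

    def __init__(self, num_frames=5, capacity=3):
        self.num_frames = num_frames
        # one bounded queue per frame; the oldest snapshot is dropped automatically
        self.frames = [deque(maxlen=capacity) for _ in range(num_frames)]

    def insert(self, n, snapshot):
        # largest frame number d (< num_frames) with n mod 2^d == 0
        d = 0
        while d + 1 < self.num_frames and n % (2 ** (d + 1)) == 0:
            d += 1
        self.frames[d].append((n, snapshot))

ptf = PyramidalTimeFrame()
for n in range(1, 71):
    ptf.insert(n, snapshot=f"summary@{n}")
# e.g. frame 0 now holds snapshots 65, 67, 69; frame 1 holds 62, 66, 70; ...
print([[n for n, _ in frame] for frame in ptf.frames])
```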
13
Two Critical Layers in the Stream Cube
14
OLAP Operations and Cube Materialization
  • OLAP (Online Analytical Processing) operations
  • Roll up (drill up): summarize data
  • by climbing up a hierarchy or by dimension
    reduction
  • Drill down (roll down): reverse of roll-up
  • from higher-level summary to lower-level summary
    or detailed data, or introducing new dimensions
  • Slice and dice: project and select
  • Pivot (rotate): reorient the cube; visualization;
    3D to a series of 2D planes
  • Cube partial materialization
  • Store some pre-computed cuboids for fast online
    processing (a small roll-up/slice sketch follows
    below)

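To make roll-up and slice concrete, here is a small pandas sketch over a hypothetical click-stream fact table; the column names and values are invented for illustration, and partial materialization would correspond to caching results such as `rollup` for reuse.

```python
import pandas as pd

# Hypothetical click-stream fact table at the finest granularity.
facts = pd.DataFrame({
    "minute":  ["09:00", "09:00", "09:01", "09:01"],
    "city":    ["Chicago", "Urbana", "Chicago", "Urbana"],
    "country": ["USA", "USA", "USA", "USA"],
    "item":    ["sports", "news", "sports", "sports"],
    "clicks":  [120, 80, 95, 60],
})

# Roll-up: climb the location hierarchy (city -> country) and drop the
# time dimension, summarizing clicks at the coarser level.
rollup = facts.groupby(["country", "item"], as_index=False)["clicks"].sum()

# Slice: fix one dimension value (item == "sports"); a dice would fix several.
sports_slice = facts[facts["item"] == "sports"]

print(rollup)
print(sports_slice)
```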
15
D1. Data Mining in Data Stream and Sensor Networks
  • Mining data streams
  • Characteristics and challenges of data streams
  • Stream data cubing
  • Stream data clustering
  • Stream classification and anomaly analysis
  • Debugging sensor network systems by data mining

16
On-Line Partial Materialization vs. OLAP
Processing
  • On-line materialization
  • Materialization takes precious space and time
  • Only incremental materialization (with a tilted
    time frame)
  • Only materialize cuboids of the critical
    layers?
  • Online computation may take too much time
  • Preferred solution
  • Popular-path approach: materialize the cuboids
    along the popular drilling paths
  • H-tree structure: such cuboids can be computed
    and stored efficiently using the H-tree structure
  • Online aggregation vs. query-based computation
  • Online computing while streaming: aggregating
    stream cubes
  • Query-based computation: using computed cuboids

17
Stream Cube Structure: From m-layer to o-layer
18
An H-Tree Cubing Structure
[Figure: H-tree structure with the minimal interest layer labeled]
19
Benefits of H-Tree and H-Cubing
  • H-tree and H-cubing
  • Developed for computing data cubes and iceberg
    cubes
  • J. Han, J. Pei, G. Dong, and K. Wang, Efficient
    Computation of Iceberg Cubes with Complex
    Measures, SIGMOD'01
  • Fast cubing, space preserving in cube computation
  • Using H-tree for stream cubing
  • Space preserving
  • Intermediate aggregates can be computed
    incrementally and saved in tree nodes
  • Facilitate computing other cells and
    multi-dimensional analysis
  • The H-tree with computed cells can be viewed as a
    stream cube

20
D1. Data Mining in Data Stream and Sensor Networks
  • Mining data streams
  • Characteristics and challenges of data streams
  • Stream data cubing
  • Stream data clustering
  • Stream classification and anomaly analysis
  • Debugging sensor network systems by data mining

21

Stream Clustering: A K-Median Approach
  • O'Callaghan et al., Streaming-Data Algorithms
    for High-Quality Clustering (ICDE'02)
  • Based on the k-median method
  • Data stream: points from a metric space
  • Find k clusters in the stream such that the sum of
    distances from data points to their closest
    centers is minimized
  • Constant-factor approximation algorithm
  • In small space, a simple two-step algorithm
    (a sketch follows below)
  • For each set of M records Si, find O(k) centers
    in S1, ..., Sl
  • Local clustering: assign each point in Si to its
    closest center
  • Let S' be the centers for S1, ..., Sl, with each
    center weighted by the number of points assigned
    to it
  • Cluster S' to find the k final centers

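A minimal sketch of this two-step procedure, using scikit-learn's k-means as a stand-in for the constant-factor k-median subroutine; the function name, chunk contents, and parameter choices are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans  # stand-in for a k-median subroutine

def stream_cluster(chunks, k, centers_per_chunk):
    """Two-step small-space clustering: find O(k) weighted centers per
    chunk S_i, then cluster the weighted centers down to k final centers."""
    all_centers, all_weights = [], []
    for chunk in chunks:  # step 1: local clustering of each chunk
        km = KMeans(n_clusters=centers_per_chunk, n_init=10).fit(chunk)
        all_centers.append(km.cluster_centers_)
        # weight each center by the number of points assigned to it
        all_weights.append(np.bincount(km.labels_, minlength=centers_per_chunk))
    centers = np.vstack(all_centers)
    weights = np.concatenate(all_weights).astype(float)
    # step 2: cluster the weighted centers to obtain the k final centers
    final = KMeans(n_clusters=k, n_init=10).fit(centers, sample_weight=weights)
    return final.cluster_centers_

rng = np.random.default_rng(0)
chunks = [rng.normal(loc=c, size=(200, 2)) for c in (0.0, 5.0, 10.0)]
print(stream_cluster(chunks, k=3, centers_per_chunk=6))
```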
22

Hierarchical Clustering Tree

[Figure: data points are summarized into level-i medians, which are
summarized into level-(i+1) medians, and so on up the tree]
23
Hierarchical Tree and Drawbacks
  • Method
  • Maintain at most m level-i medians
  • On seeing m of them, generate O(k) level-(i+1)
    medians of weight equal to the sum of the weights
    of the intermediate medians assigned to them
  • Drawbacks
  • Low quality for evolving data streams (registers
    only k centers)
  • Limited functionality in discovering and
    exploring clusters over different portions of the
    stream over time

24
Clustering for Mining Stream Dynamics
  • Network intrusion detection: one example
  • Detect bursts of activity or abrupt changes in
    real time by on-line clustering
  • Our methodology (C. Aggarwal, J. Han, J. Wang,
    P. S. Yu, VLDB'03)
  • Tilted time frame: otherwise dynamic changes
    cannot be found
  • Micro-clustering: better quality than
    k-means/k-median
  • Incremental, online processing and maintenance
  • Two stages: micro-clustering and macro-clustering
  • With limited overhead, achieves high
    efficiency, scalability, quality of results, and
    power of evolution/change detection

25
CluStream: A Framework for Clustering Evolving
Data Streams
  • Design goals
  • High quality for clustering evolving data streams,
    with greater functionality
  • While keeping the stream mining requirements in
    mind
  • One pass over the original stream data
  • Limited space usage and high efficiency
  • CluStream: a framework for clustering evolving
    data streams
  • Divide the clustering process into online and
    offline components
  • Online component: periodically stores summary
    statistics about the stream data
  • Offline component: answers various user questions
    based on the stored summary statistics

26
BIRCH: A Micro-Clustering Approach
  • Clustering feature: CF = (N, LS, SS), where N is
    the number of data points, LS is the linear sum of
    the N points, and SS is the square sum of the N
    points (a small CF sketch follows below)

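A minimal sketch of such a clustering feature, assuming SS is kept as a single scalar (the sum of squared norms) rather than a per-dimension vector; the class and method names are illustrative.

```python
import numpy as np

class ClusteringFeature:
    """CF = (N, LS, SS): point count, linear sum, and (here) the scalar sum
    of squared norms. CFs are additive, so clusters can be updated and
    merged incrementally without revisiting the raw points."""

    def __init__(self, d):
        self.n = 0
        self.ls = np.zeros(d)  # linear sum of the points
        self.ss = 0.0          # sum of squared norms of the points

    def insert(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.ls += x
        self.ss += float(x @ x)

    def merge(self, other):
        self.n += other.n
        self.ls += other.ls
        self.ss += other.ss

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # root mean squared distance of member points from the centroid
        return float(np.sqrt(max(self.ss / self.n - (self.ls @ self.ls) / self.n**2, 0.0)))

cf = ClusteringFeature(d=2)
for p in [(1.0, 2.0), (2.0, 2.0), (3.0, 4.0)]:
    cf.insert(p)
print(cf.n, cf.centroid(), cf.radius())
```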
27
The CluStream Framework
  • Micro-cluster
  • Statistical information about data locality
  • Temporal extension of the cluster-feature vector
  • Multi-dimensional points X1, X2, ..., arriving
    with time stamps T1, T2, ...
  • Each point contains d dimensions, i.e.,
    Xi = (xi1, ..., xid)
  • A micro-cluster for n points is defined as a
    (2·d + 3)-tuple
  • Pyramidal time frame
  • Decides at what moments snapshots of the
    statistical information are stored away on disk

28
CluStream: Pyramidal Time Frame
  • Pyramidal time frame
  • Snapshots of the set of micro-clusters are stored
    following the pyramidal pattern
  • They are stored at differing levels of
    granularity depending on their recency
  • Snapshots are classified into different orders
    varying from 1 to log(T)
  • The i-th order snapshots occur at intervals of α^i,
    where α ≥ 1
  • Only the last (α + 1) snapshots of each order are
    stored

29
CluStream: Clustering On-line Streams
  • Online micro-cluster maintenance
  • Initial creation of q micro-clusters
  • q is usually significantly larger than the number
    of natural clusters
  • Online incremental update of micro-clusters
  • If the new point is within the max-boundary of its
    closest micro-cluster, insert it into that
    micro-cluster
  • Otherwise, create a new micro-cluster
  • May delete an obsolete micro-cluster or merge the
    two closest ones (a maintenance sketch follows
    below)
  • Query-based macro-clustering
  • Based on a user-specified time horizon h and the
    number of macro-clusters k, compute macro-clusters
    using the k-means algorithm

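A simplified sketch of one step of this online maintenance loop; the max-boundary factor, the staleness rule used for deletion, and the class names are illustrative simplifications (e.g., CluStream handles single-point clusters and cluster merging more carefully than shown here).

```python
import numpy as np

class MicroCluster:
    """Temporal extension of a CF vector, conceptually a (2d + 3)-tuple:
    per-dimension sums and square sums, timestamp sums, and the count."""

    def __init__(self, x, t):
        x = np.asarray(x, dtype=float)
        self.n, self.ls, self.ss = 1, x.copy(), x * x
        self.lt, self.st = t, t * t  # linear and square sums of timestamps

    def centroid(self):
        return self.ls / self.n

    def rms_deviation(self):
        var = self.ss / self.n - (self.ls / self.n) ** 2
        return float(np.sqrt(max(float(np.mean(var)), 1e-12)))

    def absorb(self, x, t):
        x = np.asarray(x, dtype=float)
        self.n += 1; self.ls += x; self.ss += x * x
        self.lt += t; self.st += t * t

def update(micro_clusters, x, t, q, boundary_factor=2.0):
    """One online step: absorb the point into its nearest micro-cluster if it
    falls within the max-boundary, otherwise create a new micro-cluster,
    dropping the stalest one (oldest average timestamp) when q is exceeded."""
    dists = [np.linalg.norm(mc.centroid() - x) for mc in micro_clusters]
    i = int(np.argmin(dists))
    if dists[i] <= boundary_factor * micro_clusters[i].rms_deviation():
        micro_clusters[i].absorb(x, t)
        return
    if len(micro_clusters) >= q:
        stalest = min(range(len(micro_clusters)),
                      key=lambda j: micro_clusters[j].lt / micro_clusters[j].n)
        micro_clusters.pop(stalest)
    micro_clusters.append(MicroCluster(x, t))

mcs = [MicroCluster(np.zeros(2), t=0.0)]
for t, point in enumerate([[0.1, 0.0], [5.0, 5.0], [0.2, -0.1]], start=1):
    update(mcs, np.asarray(point, dtype=float), float(t), q=10)
print(len(mcs))
```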
30
D1. Data Mining in Data Stream and Sensor Networks
  • Mining data streams
  • Characteristics and challenges of data streams
  • Stream data cubing
  • Stream data clustering
  • Stream classification and anomaly analysis
  • Debugging sensor network systems by data mining

31
Stream Classification and Concept Drifts
  • Stream Classification
  • Construct a classification model based on past
    records
  • Use the model to predict labels for new data
  • Help decision making
  • Concept drifts
  • Define and analyze concept drifts in data streams
  • Show that expected error is not directly related
    to concept drifts
  • Classify data streams with skewed distributions
    (i.e., rare events)
  • Employ both sampling and ensemble techniques
  • Results indicate the proposed method reduces
    classification errors on the minority class

32
Concept Drifts
  • Changes in P(x, y), where x is the feature vector
    and y is the class label; P(x, y) = P(y|x) P(x)
  • Four possibilities
  • No change: P(y|x) and P(x) remain unchanged
  • Feature change: only P(x) changes
  • Conditional change: only P(y|x) changes
  • Dual change: both P(y|x) and P(x) change
  • Expected error
  • No matter how the concept changes, the expected
    error could increase, decrease, or remain
    unchanged
  • Training on the most recent data could help
    reduce the expected error (a small illustration
    follows below)

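A tiny synthetic illustration of the four cases: shifting the mean of P(x) is a feature change, flipping the weight inside P(y|x) is a conditional change, and doing both is a dual change. The Gaussian/sigmoid forms and parameter values are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample(n, x_mean, w):
    """Draw x ~ P(x) = N(x_mean, 1) and y ~ P(y=1|x) = sigmoid(w * x)."""
    x = rng.normal(loc=x_mean, scale=1.0, size=n)
    p_y = 1.0 / (1.0 + np.exp(-w * x))
    y = rng.binomial(1, p_y)
    return x, y

x0, y0 = sample(1000, x_mean=0.0, w=1.0)    # original distribution
x1, y1 = sample(1000, x_mean=2.0, w=1.0)    # feature change: only P(x) moves
x2, y2 = sample(1000, x_mean=0.0, w=-1.0)   # conditional change: only P(y|x) moves
x3, y3 = sample(1000, x_mean=2.0, w=-1.0)   # dual change: both move
print(y0.mean(), y1.mean(), y2.mean(), y3.mean())
```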
33
Issues in Stream Classification
  • Descriptive model vs. generative model
  • Generative models assume data follows some
    distribution while descriptive models make no
    assumptions
  • Distribution of stream data is unknown and may
    evolve, so descriptive model is better
  • Label prediction vs. probability estimation
  • Classify test examples into one class, or estimate
    P(y|x) for each y
  • Probability estimation is better
  • Stream applications may be stochastic (an example
    could be assigned to several classes with
    different probabilities)
  • Probability estimates provide confidence
    information and can be used in post-processing

34
Mining Skewed Data Stream
  • Skewed distribution
  • Seen in many stream applications where positive
    examples are much rarer than negative ones
  • Credit card fraud detection, network intrusion
    detection
  • Existing stream classification methods
  • Evaluate their methods on data with balanced
    class distribution
  • Problems of these methods on skewed data
  • Tend to ignore positive examples due to their
    small number
  • The cost of misclassifying positive examples is
    usually huge, e.g., misclassifying credit card
    fraud as normal

35
Stream Ensemble Approach (1)
[Figure: data chunks S1, S2, ..., Sm arrive over time; a classification
model trained on them must label the incoming chunk Sm+1]
Use Sm alone as training data? Positive examples are not sufficient!
36
Stream Ensemble Approach (2)
[Figure: positive examples gathered from S1, S2, ..., Sm are combined,
via sampling, with negatives to train classifiers C1, C2, ..., Ck,
which form an ensemble; a sketch of this idea follows below]
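A rough sketch of the sampling-plus-ensemble idea: reuse positive examples accumulated from past chunks, draw several undersamples of the abundant negatives in the current chunk, train one classifier per sample, and average the probability estimates. The decision-tree base learner, function names, and parameters are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_skewed_ensemble(past_positives, current_chunk, k=5, neg_ratio=1.0, seed=0):
    """Train k classifiers, each on all retained positives plus a different
    undersample of the current chunk's negatives; return the ensemble."""
    rng = np.random.default_rng(seed)
    X_pos = past_positives
    X_cur, y_cur = current_chunk
    X_neg = X_cur[y_cur == 0]
    n_neg = min(int(len(X_pos) * neg_ratio), len(X_neg))
    models = []
    for _ in range(k):
        idx = rng.choice(len(X_neg), size=n_neg, replace=False)
        X = np.vstack([X_pos, X_neg[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(n_neg)])
        models.append(DecisionTreeClassifier(max_depth=5).fit(X, y))
    return models

def ensemble_proba(models, X):
    # average the per-model estimates of P(y = 1 | x)
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)

# Tiny synthetic demo: rare positives near 2, abundant negatives near 0.
rng = np.random.default_rng(1)
X_pos = rng.normal(2.0, 0.5, size=(30, 2))
X_cur = rng.normal(0.0, 0.5, size=(1000, 2))
y_cur = np.zeros(1000, dtype=int)
models = train_skewed_ensemble(X_pos, (X_cur, y_cur), k=5)
print(ensemble_proba(models, np.array([[2.0, 2.0], [0.0, 0.0]])))
```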
37
Analysis
  • Error Reduction
  • Sampling
  • Ensemble
  • Efficiency Analysis
  • Single model
  • Ensemble
  • Ensemble is more efficient

38
Experiments (1)
  • Test on concept-drift streams

39
Experiments (2)
  • Test on real data

40
Experiments (3)
  • Model accuracy

41
Experiments (4)
  • Training time

42
Summary: Stream Data Mining
  • Stream data mining: a rich and ongoing research
    field
  • Current research focus in the database community
  • DSMS system architecture, continuous query
    processing, supporting mechanisms
  • Stream data mining and stream OLAP analysis
  • Powerful tools for finding general and unusual
    patterns
  • Effectiveness, efficiency, and scalability: lots
    of open problems
  • Our philosophy on stream data analysis and mining
  • A multi-dimensional stream analysis framework
  • Time is a special dimension: tilted time frame
  • What to compute and what to save? Critical layers
  • Partial materialization and precomputation
  • Mining dynamics of stream data

43
References on Stream Mining
  • Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang,
    Multi-Dimensional Regression Analysis of
    Time-Series Data Streams , Proc. 2002 Int. Conf.
    on Very Large Data Bases (VLDB'02), Hong Kong,
    China, Aug. 2002.
  • C. Giannella, J. Han, J. Pei, X. Yan and P.S. Yu,
    Mining Frequent Patterns in Data Streams at
    Multiple Time Granularities, H. Kargupta, A.
    Joshi, K. Sivakumar, and Y. Yesha (eds.), Next
    Generation Data Mining, 2003.
  • H. Wang, W. Fan, P. S. Yu, and J. Han, Mining
    Concept-Drifting Data Streams using Ensemble
    Classifiers, Proc. 2003 ACM SIGKDD Int. Conf. on
    Knowledge Discovery and Data Mining (KDD'03),
    Washington, D.C., Aug. 2003.
  • C. Aggarwal, J. Han, J. Wang, and P. S. Yu, A
    Framework for Clustering Evolving Data Streams,
    Proc. 2003 Int. Conf. on Very Large Data Bases
    (VLDB'03), Berlin, Germany, Sept. 2003.
  • Y. D. Cai, D. Clutter, G. Pape, J. Han, M. Welge,
    and L. Auvil, MAIDS: Mining Alarming Incidents
    from Data Streams, (system demonstration), Proc.
    2004 ACM-SIGMOD Int. Conf. Management of Data
    (SIGMOD'04), Paris, France, June 2004.
  • C. Aggarwal, J. Han, J. Wang, and P. S. Yu, On
    Demand Classification of Data Streams, Proc.
    2004 Int. Conf. on Knowledge Discovery and Data
    Mining (KDD'04), Seattle, WA, Aug. 2004.
  • C. Aggarwal, J. Han, J. Wang, and P. S. Yu,
    A Framework for Projected Clustering of High
    Dimensional Data Streams, Proc. 2004 Int. Conf.
    on Very Large Data Bases (VLDB'04), Toronto,
    Canada, Aug. 2004.
  • Jiawei Han, Yixin Chen, Guozhu Dong, Jian Pei,
    Benjamin W. Wah, Jianyong Wang, and Y. Dora Cai,
    Stream Cube: An Architecture for
    Multi-Dimensional Analysis of Data Streams,
    Distributed and Parallel Databases, 18(2):
    173-197, 2005.
  • J. Yang, X. Yan, J. Han, and W. Wang,
    Discovering Evolutionary Classifier over High
    Speed Non-static Stream, in S. Bandyopadhyay et
    al. (eds.), Advanced Methods for Knowledge
    Discovery from Complex Data, Springer Verlag,
    2005.
  • Hongyan Liu, Ying Lu, Jiawei Han, and Jun He,
    Error-Adaptive and Time-Aware Maintenance of
    Frequency Counts over Data Streams, in Proc.
    2006 Int. Conf. on Web-Age Information Management
    (WAIM'06), Hong Kong, China, June, 2006.
  • Jing Gao, Wei Fan, and Jiawei Han, A General
    Framework for Mining Concept-Drifting Data
    Streams with Skewed Distributions, in Proc. 2007
    SIAM Int. Conf. on Data Mining (SDM'07),
    Minneapolis, MN, April 2007.
  • Jing Gao, Wei Fan, and Jiawei Han, On
    Appropriate Assumptions to Mine Data Streams:
    Analysis and Practice, Proc. 2007 Int. Conf. on
    Data Mining (ICDM'07), Omaha, NE, Oct. 2007.

44
D1. Data Mining in Data Stream and Sensor Networks
  • Mining data streams
  • Debugging sensor network systems by data mining
  • Software bug mining
  • Bug mining for sensor networks

45
Data Mining for Software Engineering and Computer
System Analysis
  • Software bug localization and failure proximity
  • C. Liu, Z. Lian, and J. Han, How Bayesians
    Debug?, ICDM'06.
  • Chao Liu and Jiawei Han, Failure Proximity: A
    Fault Localization-Based Approach, FSE'06.
  • C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff,
    SOBER: Statistical Model-based Bug
    Localization, FSE 2005.
  • Detection of software plagiarism
  • C. Liu, C. Chen, J. Han, and P. S. Yu, GPLAG:
    Detection of Software Plagiarism by Program
    Dependence Graph Analysis, KDD'06.
  • Mining copy-paste bugs in operating systems
  • Z. Li, S. Lu, S. Myagmar, and Y. Zhou, CP-Miner:
    A Tool for Finding Copy-Paste and Related Bugs in
    Operating System Code, OSDI'04.

46
SOBER: Bug Localization Based on Classification
of the Statistical Distribution of Statement Execution

[Figure: statistical distributions of statement execution in failing
runs vs. passing runs]
  • C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff,
    SOBER: Statistical Model-based Bug
    Localization, FSE 2005.

47
A Comparison with Other Approaches
48
Failure Clustering Based on Likely Fault Roots
  • Failure indexing
  • Identify failures likely due to the same bug

  • C. Liu and J. Han, Failure Proximity: A Fault
    Localization-Based Approach, FSE'06.

49
D1. Data Mining in Data Stream and Sensor Networks
  • Mining data streams
  • Debugging sensor network systems by data mining
  • Software bug mining
  • Bug mining for sensor networks

50
Challenges in Developing Robust Sensor Network
Systems
  • Developing robust sensor network systems is tricky
    and frustrating
  • Bugs in networked sensor systems are often caused
    by complex interactions between multiple,
    often individually non-faulty components
  • Bugs are often not repeatable; the particular
    sequence of events that invokes a bug may not
    be easy to reconstruct
  • Current status: most of the development time goes
    into debugging and troubleshooting the current
    code, which greatly reduces productivity

51
DustMiner: Troubleshooting Interactive Complexity
Bugs in Sensor Networks
  • DustMiner: mine sequences of events that may be
    responsible for faulty behavior, as opposed to
    localized bugs in one module
  • M. Khan, H. Le, H. Ahmadi, T. Abdelzaher, and J.
    Han, DustMiner: Troubleshooting Interactive
    Complexity Bugs in Sensor Networks, Proc. 2008
    ACM Int. Conf. on Embedded Networked Sensor
    Systems (SenSys'08), Raleigh, NC, Nov. 2008
  • Architecture
  • Front-end: collects runtime logs from the system
    being debugged
  • Offline back-end: frequent, discriminative pattern
    mining to uncover likely causes of failure

52
Major Difficulties of SNTS
  • Our (Tarek Abdelzaher et al.'s) previous sensor
    network debugging system, SNTS (DCOSS 2007),
    extracts conditions on the current network state
    that are correlated with failure
  • It mines frequent patterns (those that occur
    frequently when the bugs manifest); however, the
    cause of a problem is often an infrequent pattern
  • DustMiner: automated discriminative sequence
    mining, consisting of two phases
  • Identify frequent patterns that correlate with
    failures, as before
  • Focus on those patterns, correlating them with
    (infrequent) events that may have caused them,
    hence uncovering the true root of the problem

53
Architecture of DustMiner
54
Preventing False Frequent Patterns Using a Dynamic
Search Window
  • Two sample sequences, with different behaviors
  • S1 = <a, b, c, d, a, b, c, d>
  • S2 = <a, b, c, d, a, c, b, d>
  • The system fails when <a> is followed by <c>
    before <b>
  • How to detect <a, c, b> as a discriminative
    pattern?
  • Using Apriori alone will not detect it
  • Solution: a dynamic search window scheme
    (a small sketch follows below)
  • Suppose the search windows are [1, 4] and [4, 8]
    in both sequences
  • Then <a, c, b> will only be found in sequence S2
  • The dynamic search window scheme also
    speeds up the search significantly

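A toy sketch of the window-restricted discriminative mining idea on the two sequences above; the helper names and the brute-force subsequence enumeration are illustrative simplifications of the actual DustMiner miner.

```python
from itertools import combinations

def windowed_patterns(seq, windows, max_len=3):
    """All subsequence patterns (2..max_len events, order preserved) that
    occur entirely inside one of the 1-indexed, inclusive search windows."""
    patterns = set()
    for lo, hi in windows:
        segment = seq[lo - 1:hi]
        for k in range(2, max_len + 1):
            for idx in combinations(range(len(segment)), k):
                patterns.add(tuple(segment[i] for i in idx))
    return patterns

def discriminative(bad_logs, good_logs, windows):
    """Patterns seen in some failing log but in no passing log."""
    bad = set().union(*(windowed_patterns(s, windows) for s in bad_logs))
    good = set().union(*(windowed_patterns(s, windows) for s in good_logs))
    return bad - good

S1 = ["a", "b", "c", "d", "a", "b", "c", "d"]  # passing run
S2 = ["a", "b", "c", "d", "a", "c", "b", "d"]  # failing run
print(discriminative([S2], [S1], windows=[(1, 4), (4, 8)]))  # includes ('a', 'c', 'b')
```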
55
Suppressing Redundant Subsequences
  • Two sample sequences, with different behaviors
  • S1 = <a, b, c, d, a, b, c, d>
  • S2 = <a, b, c, d, a, b, d, c>
  • The system has to have <a> followed by <c> before
    <d>
  • E.g., <enableRadio>, <messageSent>, <ackR>,
    <disableRadio>
  • How to detect <a, d, c> as an error pattern?
  • Using Apriori alone will not detect it
  • Solution: a sequence compression scheme
    (a small sketch follows below)
  • Remove sequence Si if it is a subsequence of Sj
    with the same support
  • <a, b, d> will be removed from S1 but retained in
    S2

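A toy sketch of the compression rule (drop a pattern that is a subsequence of another pattern with the same support); the names and the quadratic scan are illustrative simplifications.

```python
def is_subsequence(small, big):
    """True if `small` occurs in `big` in order (not necessarily contiguously)."""
    it = iter(big)
    return all(event in it for event in small)

def suppress_redundant(patterns):
    """Drop any pattern that is a proper subsequence of another pattern with
    the same support; the longer pattern carries the same evidence."""
    kept = []
    for p, sup in patterns:
        redundant = any(sup == sup_q and len(p) < len(q) and is_subsequence(p, q)
                        for q, sup_q in patterns)
        if not redundant:
            kept.append((p, sup))
    return kept

candidates = [
    (("a", "b", "d"), 1),        # subsequence of the next pattern, same support
    (("a", "b", "d", "c"), 1),   # kept
    (("a", "b", "c"), 2),        # different support, so kept as well
]
print(suppress_redundant(candidates))
```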
56
Two-Stage Mining for Infrequent Events
  • In debugging, less frequent patterns can
    sometimes be more indicative
  • E.g., a single node-reboot event can cause a large
    number of message losses
  • Frequent pattern (FP) mining will miss the real
    cause!
  • Observation: much computation in sensor networks
    is recurrent
  • Two-stage mining
  • Catch the recurrent symptoms (such as multiple
    subsequent message losses or false alarms) by FP
    mining
  • Narrow down the search space and correlate these
    symptoms with less frequent preceding event
    occurrences

57
Experiment I: LiteOS Bug
  • Troubleshoot a simple data collection application
    where several sensors monitor light and report it
    to a sink node
  • Discriminative patterns found only on good logs
  • Discriminative patterns found only on bad logs

58
References on Trouble-Shooting in Software and
Networked Sensor Systems
  • M. Khan, H. Le, H. Ahmadi, T. Abdelzaher, and J.
    Han, DustMiner: Troubleshooting Interactive
    Complexity Bugs in Sensor Networks, Proc. 2008
    ACM Int. Conf. on Embedded Networked Sensor
    Systems (Sensys'08), Raleigh, NC, Nov. 2008
  • M. K. Ramanathan, A. Grama, and S. Jagannathan,
    Path-Sensitive Inference of Function Precedence
    Protocols, ICSE 2007
  • Z. Li and Y. Zhou, PR-Miner: Automatically
    Extracting Implicit Programming Rules and
    Detecting Violations in Large Software Code,
    ESEC/FSE 2005
  • B. Livshits and T. Zimmermann, DynaMine: Finding
    Common Error Patterns by Mining Software Revision
    Histories, ESEC/FSE 2005
  • D. Andrzejewski, A. Mulhern, B. Liblit, and X.
    Zhu, Statistical Debugging Using Latent Topic
    Models, ECML 2007
  • B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M.
    I. Jordan, Scalable Statistical Bug Isolation,
    PLDI 2005
  • C. Liu, L. Fei, X. Yan, J. Han and S. Midkiff,
    Statistical Debugging: A Hypothesis Testing-Based
    Approach, IEEE TSE 2006
  • C. Liu, Z. Lian and J. Han, How Bayesians Debug,
    ICDM 2006
  • C. Liu, X. Yan and J. Han, Mining Control Flow
    Abnormality for Logic Error Isolation, SDM 2006
  • C. Liu, X. Yan, L. Fei, J. Han and S. Midkiff,
    SOBER: Statistical Model-Based Bug Localization,
    ESEC/FSE 2005
  • C. Liu, X. Yan, H. Yu, J. Han and P. S. Yu,
    Mining Behavior Graphs for "Backtrace" of
    Noncrashing Bugs, SDM 2005
  • A. X. Zheng, M. I. Jordan, B. Liblit, M. Naik,
    and A. Aiken, Statistical Debugging: Simultaneous
    Identification of Multiple Bugs, ICML 2006
  • I. Cohen, M. Goldszmidt, T. Kelly, J. Symons,
    Correlating instrumentation data to system
    states: A building block for automated diagnosis
    and control, OSDI 2004.
  • J. Platt, E. Kiciman and D. Maltz, Fast
    Variational Inference for Large-scale Internet
    Diagnosis, NIPS 2007.
  • Rob Powers, Ira Cohen, and Moises Goldszmidt,
    "Short term performance forecasting in enterprise
    systems", KDD 2005.
  • Y.-M. Wang, C. Verbowski, J. Dunagan, Y. Chen, H.
    J. Wang, C. Yuan, and Z. Zhang, STRIDER: A
    Black-box, State-based Approach to Change and
    Configuration Management and Support, Usenix
    LISA 2003.
