Transcript and Presenter's Notes

Title: MAIDS (Mining Alarming Incidents in Data Streams): Implementation Discussion


1
MAIDS (Mining Alarming Incidents in Data
Streams): Implementation Discussion
  • MAIDS group
  • NCSA and Dept. of CS
  • University of Illinois at Urbana-Champaign
  • www.maids.ncsa.uiuc.edu

2
Implementation Essentials
  • Framework: tilted time window
  • Extended FP-growth for mining frequent patterns in
    data streams
  • Extended Naïve Bayes for mining classification
    models in data streams
  • Extended k-means, integrating micro-clustering
    and macro-clustering, for cluster analysis in
    data streams
  • Extended H-tree cubing method for
    multi-dimensional query answering in data streams
  • Application development and testing

3
Framework: Tilted Time Window (1)
  • Natural tilted time frame window
  • Example: minimal unit is a quarter hour; then
    4 quarters → 1 hour, 24 hours → 1 day, ...
  • Logarithmic tilted time frame window
  • Example: minimal unit is 1 minute; then 1, 2, 4,
    8, 16, 32, ... minutes

4
Framework: Tilted Time Window (2)
  • Pyramidal tilted time frame window
  • Example: suppose there are 5 frames and each
    holds at most 3 snapshots
  • Given a snapshot number N: if N mod 2^d = 0,
    insert the snapshot into frame number d. If the
    frame then holds more than 3 snapshots, kick out
    the oldest one. A sketch of this rule follows.
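
A minimal sketch of this insertion rule in Python (the rule is from the
slide; we assume a snapshot goes into the highest frame d it qualifies
for, and the names are illustrative):

NUM_FRAMES = 5         # frames d = 0 .. 4
MAX_SNAPSHOTS = 3      # each frame keeps at most 3 snapshots
frames = [[] for _ in range(NUM_FRAMES)]

def store_snapshot(n, snapshot):
    # Find the highest frame d with n mod 2^d == 0.
    d = 0
    while d + 1 < NUM_FRAMES and n % (2 ** (d + 1)) == 0:
        d += 1
    frames[d].append((n, snapshot))
    if len(frames[d]) > MAX_SNAPSHOTS:
        frames[d].pop(0)       # kick out the oldest snapshot

for n in range(1, 17):
    store_snapshot(n, f"snap-{n}")
# frames[0] now holds snapshots 11, 13, 15; frames[4] holds snapshot 16.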

5
Frequent Pattern Finder
  • Frequent Pattern Finder = tilted window +
    FP-growth
  • A tilted time frame
  • Different time granularities (natural vs.
    pyramidal)
  • second, minute, quarter, hour, day, week, ...
  • Targeted items
  • User- or expert-selected items serve as targeted
    items
  • Trace targeted items and their combinations using
    an FP-tree
  • The FP-tree registers items with tilted time
    window information
  • Mining is based on the extended FP-growth
    algorithm over the tilted window

6
FP-Growth (1): FP-Tree Construction

TID   Items bought              (Ordered) frequent items
100   f, a, c, d, g, i, m, p    f, c, a, m, p
200   a, b, c, f, l, m, o       f, c, a, b, m
300   b, f, h, j, o, w          f, b
400   b, c, k, s, p             c, b, p
500   a, f, c, e, l, p, m, n    f, c, a, m, p

min_support = 3
  • Scan the DB once, find frequent 1-itemsets
  • Sort frequent items in descending frequency
    order: the f-list
  • Scan the DB again, construct the FP-tree (a
    sketch follows)

F-list = f-c-a-b-m-p
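
The two scans can be sketched compactly in Python (class and variable
names are ours, not MAIDS code; the tie order among equally frequent
items is pinned to match the slide's f-list):

from collections import Counter

transactions = [
    ['f','a','c','d','g','i','m','p'],
    ['a','b','c','f','l','m','o'],
    ['b','f','h','j','o','w'],
    ['b','c','k','s','p'],
    ['a','f','c','e','l','p','m','n'],
]
MIN_SUPPORT = 3

# Scan 1: count items, keep the frequent ones in descending frequency.
counts = Counter(item for t in transactions for item in t)
f_list = [i for i, c in counts.most_common() if c >= MIN_SUPPORT]
f_list = ['f','c','a','b','m','p']   # same items; ties ordered as on the slide

class FPNode:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

# Scan 2: insert each transaction's frequent items, in f-list order.
root = FPNode(None)
for t in transactions:
    node = root
    for item in [i for i in f_list if i in t]:
        node = node.children.setdefault(item, FPNode(item))
        node.count += 1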
7
FP-Growth (2): FP-Tree Mining
  • Start at the frequent-item header table of the
    FP-tree
  • Traverse the FP-tree by following the node-links
    of each frequent item p
  • Accumulate all of the transformed prefix paths of
    item p to form p's conditional pattern base (see
    the sketch after the table)

Conditional pattern bases:
item   cond. pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1
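
Reusing the tree built in the slide-6 sketch, a conditional pattern base
can be collected with a plain tree walk (a real FP-growth implementation
would follow the header table's node-links instead; the helper name is
ours):

def conditional_pattern_base(node, target, prefix=()):
    # Collect (prefix path, count) for each occurrence of `target`.
    results = []
    for child in node.children.values():
        if child.item == target:
            if prefix:
                results.append((prefix, child.count))
        else:
            results.extend(
                conditional_pattern_base(child, target, prefix + (child.item,)))
    return results

print(conditional_pattern_base(root, 'm'))
# [(('f', 'c', 'a'), 2), (('f', 'c', 'a', 'b'), 1)]  ->  fca:2, fcab:1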
8
FP-Tree with Tilted Window
  • Fix the order of items (e.g., based on sampling,
    or alphabetic ordering)
  • Construct the FP-tree while scanning the data
    stream
  • Each node contains a tilted time frame for count
    accumulation (see the sketch below)
  • Add new counts to the newest slot
  • Propagate counts to coarser slots when needed

F-list = f-c-a-b-m-p
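
A minimal sketch (ours, not the MAIDS implementation) of a logarithmic
tilted time window that each node could hold in place of a single count.
It behaves like binary carrying: when a level collects more than two
entries, the two oldest merge into one entry at the next coarser level.

class TiltedTimeWindow:
    # Level i holds counts that each cover 2^i batches.
    def __init__(self):
        self.levels = []             # levels[i]: at most two counts

    def add_batch(self, count):
        # Add to the newest slot; propagate to coarser levels when full.
        carry, i = count, 0
        while carry is not None:
            if i == len(self.levels):
                self.levels.append([])
            self.levels[i].append(carry)
            carry = None
            if len(self.levels[i]) > 2:
                # merge the two oldest entries into a coarser one
                carry = self.levels[i].pop(0) + self.levels[i].pop(0)
            i += 1

    def total(self):
        return sum(sum(level) for level in self.levels)

w = TiltedTimeWindow()
for _ in range(7):
    w.add_batch(1)
print(w.levels, w.total())   # [[1], [2], [4]] 7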
9
Mining Frequent Patterns in Dynamic Data Streams
  • Mining happens when a user submits a mining
    query (on-demand)
  • Mining the FP-tree is done by running FP-growth
    on the data in the corresponding time windows
  • To mine frequent patterns in the last 30 minutes,
    sum the counts in the slots covering that period
    (see the sketch below)
  • To mine frequent patterns between 6am and 8am,
    use the slots covering that interval
  • We may compare what has changed in the last
    24 hours by comparing frequent patterns, i.e.,
    mining the current patterns, mining the patterns
    from 24 hours ago, and then comparing them
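
Illustrative only: a "last 30 minutes" query can be answered by summing
the slots whose spans fit inside the period. The slot spans below are
assumed, and the answer is approximate at slot boundaries:

SLOT_SPANS_MIN = [1, 1, 2, 4, 8, 16, 32]   # minutes per slot, newest first

def count_in_last(slot_counts, minutes):
    # Sum the counts of slots that fall entirely within `minutes`.
    total, covered = 0, 0
    for count, span in zip(slot_counts, SLOT_SPANS_MIN):
        if covered + span > minutes:
            break
        total += count
        covered += span
    return total

print(count_in_last([3, 2, 5, 9, 20, 40, 70], 30))
# 39 -- the slots covering the most recent 1+1+2+4+8 = 16 minutes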

10
Classification for Dynamic Data Streams
  • Methodology: Naïve Bayes + tilted time windows
  • Tilted time framework as shown above (natural
    vs. pyramidal)
  • Instead of decision trees, consider models that
    do not change drastically
  • Naïve Bayes with boosting is a good approach
  • Major advantages
  • Stores statistical information related to each
    variable
  • Supports both model construction and prediction
  • Incremental updating and dynamic maintenance
  • Advanced task: comparing models to find changes

11
Bayesian Classification: Why?
  • Probabilistic learning: calculates explicit
    probabilities for hypotheses; among the most
    practical approaches to certain types of learning
    problems
  • Incremental: each training example can
    incrementally increase/decrease the probability
    that a hypothesis is correct. Prior knowledge
    can be combined with observed data.
  • Probabilistic prediction: predicts multiple
    hypotheses, weighted by their probabilities
  • Standard: even when Bayesian methods are
    computationally intractable, they provide a
    standard of optimal decision making against which
    other methods can be measured

12
Bayesian Theorem: Basics
  • Let X be a data sample whose class label is
    unknown
  • Let H be the hypothesis that X belongs to class C
  • For classification problems, determine P(H|X),
    the probability that the hypothesis holds given
    the observed data sample X
  • P(H): prior probability of hypothesis H (i.e.,
    the initial probability before we observe any
    data; reflects the background knowledge)
  • P(X): probability that the sample data is
    observed
  • P(X|H): probability of observing the sample X,
    given that the hypothesis holds

13
Bayesian Theorem
  • Given training data X, the posterior probability
    of a hypothesis H, P(H|X), follows Bayes'
    theorem:
    P(H|X) = P(X|H) P(H) / P(X)
  • Informally, this can be written as
    posterior = likelihood x prior / evidence
  • MAP (maximum a posteriori) hypothesis:
    h_MAP = argmax_h P(h|X) = argmax_h P(X|h) P(h)
  • Practical difficulty: requires initial knowledge
    of many probabilities and significant
    computational cost

14
Naïve Bayes Classifier
  • A simplifying assumption: attributes are
    conditionally independent given the class
  • The probability of observing, say, two elements
    y1 and y2 given class C is the product of the
    probabilities of each element taken separately,
    given the same class:
    P(y1, y2 | C) = P(y1 | C) P(y2 | C)
  • No dependence relation between attributes
  • Greatly reduces the computation cost: only class
    distribution counts are needed (see the sketch
    below)
  • Once P(X|Ci) is known, assign X to the class
    with maximum P(X|Ci) P(Ci)
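
A minimal count-based sketch of this rule (function and variable names
are ours): training only tallies class counts and per-attribute value
counts, which is why the computation cost stays low.

from collections import Counter, defaultdict

def train(rows, labels):
    # rows: list of attribute tuples; only counts are stored
    class_counts = Counter(labels)
    cond_counts = defaultdict(Counter)    # (attr index, label) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            cond_counts[(i, label)][value] += 1
    return class_counts, cond_counts

def classify(x, class_counts, cond_counts):
    # Pick the class Ci maximizing P(Ci) * prod over i of P(x_i | Ci).
    n = sum(class_counts.values())
    def score(label):
        s = class_counts[label] / n
        for i, value in enumerate(x):
            s *= cond_counts[(i, label)][value] / class_counts[label]
        return s
    return max(class_counts, key=score)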

15
Training Dataset
Class C1: buys_computer = yes
Class C2: buys_computer = no
Data sample X = (age < 30, income = medium,
student = yes, credit_rating = fair)
16
Naïve Bayesian Classifier: Example
  • Compute P(X|Ci) for each class
  • P(age < 30 | buys_computer = yes) = 2/9 = 0.222
  • P(age < 30 | buys_computer = no) = 3/5 = 0.6
  • P(income = medium | buys_computer = yes)
    = 4/9 = 0.444
  • P(income = medium | buys_computer = no)
    = 2/5 = 0.4
  • P(student = yes | buys_computer = yes)
    = 6/9 = 0.667
  • P(student = yes | buys_computer = no)
    = 1/5 = 0.2
  • P(credit_rating = fair | buys_computer = yes)
    = 6/9 = 0.667
  • P(credit_rating = fair | buys_computer = no)
    = 2/5 = 0.4
  • X = (age < 30, income = medium, student = yes,
    credit_rating = fair)
  • P(X | buys_computer = yes)
    = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  • P(X | buys_computer = no)
    = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
  • P(X | buys_computer = yes) P(buys_computer = yes)
    = 0.028
  • P(X | buys_computer = no) P(buys_computer = no)
    = 0.007
  • X belongs to class buys_computer = yes
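
The slide's arithmetic, checked in a few lines. The priors 9/14 and 5/14
are inferred from the per-class denominators above (9 "yes" and 5 "no"
training tuples):

p_x_yes = 0.222 * 0.444 * 0.667 * 0.667   # = 0.044, P(X | yes)
p_x_no  = 0.6 * 0.4 * 0.2 * 0.4           # = 0.019, P(X | no)
print(p_x_yes * 9 / 14)   # ~0.028
print(p_x_no * 5 / 14)    # ~0.007  -> predict buys_computer = yes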

17
Naïve Bayes for Data Streams
  • Store single-variable statistics
    (Attribute-Value-ClassLabel, the AVC-list) in
    tilted time windows
  • Incremental update based on count propagation in
    the tilted time window (a sketch follows)
  • For computing accuracy, partition data into a
    training set and a testing set, and derive
    prediction accuracy as for non-stream data
  • Boosting: based on the testing data, put more
    weight on the data whose prediction is incorrect
  • Advanced task: comparing models to find changes
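
A hypothetical sketch of AVC counts kept per tilted time window, reusing
the TiltedTimeWindow class sketched on slide 8; the conditional counts
for any queried period then come from the window totals:

from collections import defaultdict

class StreamingAVC:
    def __init__(self):
        # one tilted time window per (attribute, value, class label)
        self.windows = defaultdict(TiltedTimeWindow)
        self.batch = defaultdict(int)

    def observe(self, row, label):
        # row: dict mapping attribute -> value; tally into current batch
        for attr, value in row.items():
            self.batch[(attr, value, label)] += 1

    def close_batch(self):
        # push the finished batch into each affected tilted time window
        for key, count in self.batch.items():
            self.windows[key].add_batch(count)
        self.batch.clear()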

18
Training and Testing for Data Streams
  • Two classes of models for prediction: peer vs.
    future
  • To predict the future class:
  • Take the data in the current window as testing
    data
  • Take the data in the previous windows as the
    training set
  • Derive models based on different weighting
    schemes (e.g., uniform, linear decreasing,
    logarithmic decreasing; see the sketch after
    this list)
  • Test and select the best model
  • Then, based on this modeling scheme, construct a
    model that includes the current window data as
    new training data
  • To predict the peer class:
  • The training and test partition is along the same
    time framework
  • There is no retraining process
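
The named weighting schemes might look as follows (an illustrative
sketch; index 0 is the most recent past window, and the halving rate of
the logarithmic scheme is our assumption):

def window_weights(num_windows, scheme):
    if scheme == "uniform":          # every past window counts equally
        return [1.0] * num_windows
    if scheme == "linear":           # weight decays linearly with age
        return [(num_windows - i) / num_windows for i in range(num_windows)]
    if scheme == "logarithmic":      # weight halves with each older window
        return [1.0 / 2 ** i for i in range(num_windows)]
    raise ValueError(scheme)

print(window_weights(4, "linear"))        # [1.0, 0.75, 0.5, 0.25]
print(window_weights(4, "logarithmic"))   # [1.0, 0.5, 0.25, 0.125]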

19
www.cs.uiuc.edu/hanj
  • Thank you !!!