Mining of Frequent Patterns from Sensor Data (presentation transcript)



1
Mining of Frequent Patterns from Sensor Data
  • Presented by: Ivy Tong Suk Man
  • Supervisor: Dr. B. C. M. Kao
  • 20 August 2003

2
Outline
  • Outline of the Presentation
  • Motivation
  • Problem Definition
  • Algorithm
  • Apriori with data transformation
  • Interval-List Apriori
  • Experimental Results
  • Conclusion

3
Motivation
  • Continuous items:
  • reflect values from an entity that changes
    continuously in the external environment
  • Update ⇒ change of state of the real entity
  • E.g. temperature reading data:
  • Initial temperature: 25ºC at t = 0s
  • Sequence of updates <timestamp, new_temp>:
  • <1s, 27ºC>, <5s, 28ºC>, <10s, 26ºC>, <14s, ..>
  • t = 0s to 1s: 25ºC
  • t = 1s to 5s: 27ºC
  • t = 5s to 10s: 28ºC
  • What is the average temperature from t = 0s to 10s?
  • Ans: (25×1 + 27×4 + 28×5)/10 = 27.3ºC
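The worked average above is a time-weighted mean of a piecewise-constant reading; a minimal sketch (the helper name is ours, not from the slides):

```python
def time_weighted_avg(initial, updates, t_end):
    """Average of a piecewise-constant reading over [0, t_end).

    updates: list of (timestamp, new_value), sorted by timestamp.
    """
    total, t_prev, value = 0.0, 0.0, initial
    for ts, new_value in updates:
        if ts >= t_end:
            break
        total += value * (ts - t_prev)   # value held over [t_prev, ts)
        t_prev, value = ts, new_value
    total += value * (t_end - t_prev)    # last segment up to t_end
    return total / t_end

# Slide example: 25ºC at t=0, then <1s,27>, <5s,28>, <10s,26>
print(time_weighted_avg(25, [(1, 27), (5, 28), (10, 26)], 10))  # 27.3
```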

4
Motivation
  • Time is a component in some applications
  • E.g. stock price quotes, network traffic data
  • Sensors are used to monitor conditions, for
    example:
  • Prices of stocks, by getting quotations from a
    finance website
  • Weather: measuring temperature, humidity, air
    pressure, wind, etc.
  • We want to find correlations among the readings
    of a set of sensors
  • Goal: to mine association rules from sensor data

5
Challenges
  • How different is it from mining association rules
    from market basket data?
  • Time component
  • When searching for association rules in market
    basket data, time field is usually ignored as
    there is no temporal correlation between the
    transactions
  • Streaming data
  • Data arrives continuously, possibly infinitely,
    and in large volume

6
Notations
  • We have a set of sensors R = {r1, r2, …, rm}
  • Each sensor ri has a set of numerical states Vi
  • Assume binary states for all sensors:
  • Vi = {0, 1} ∀i s.t. ri ∈ R
  • Dataset D: a sequence of updates of sensor states
    in the form <ts, ri, vi>, where ri ∈ R, vi ∈ Vi
  • ts: timestamp of the update
  • ri: sensor to be updated
  • vi: new value of the state of ri
  • For sensors with binary states:
  • an update takes the form <ts, ri>, as the new
    state can be inferred by toggling the old state

7
Example
  • R = {A, B, C, D, E, F}
  • Initial states: all off
  • D = <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>,
    <8,E>, <10,A>, <11,F>, <13,C>

[Figure: per-sensor timelines showing when each sensor is ON]
8
More Notations
  • An association rule is a rule, satisfying certain
    support and confidence restrictions, in the
    form X ⇒ Y, where X ⊆ R, Y ⊆ R and X ∩ Y = ∅

9
More Notations
  • Association rule X ⇒ Y has confidence c%:
  • in c% of the time when the sensors in X are ON
    (with state 1), the sensors in Y are also ON
  • Association rule X ⇒ Y has support s%:
  • in s% of the total length of history, the
    sensors in X and Y are all ON

10
More Notations
  • Let TLS(X) denote the Total LifeSpan of X:
  • the total length of time during which all the
    sensors in X are ON
  • T = total length of history
  • Sup(X) = TLS(X) / T
  • Conf(X ⇒ Y) = Sup(X ∪ Y) / Sup(X)
  • Example:
  • T = 15s
  • TLS(A) = 9, TLS(AB) = 8
  • Sup(A) = 9/15 = 60%
  • Sup(AB) = 8/15 ≈ 53%
  • Conf(A ⇒ B) = 8/9 ≈ 89%

[Figure: timelines of sensors A and B]
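The support and confidence definitions above can be sketched directly over lists of half-open ON-intervals (the representation the interval-list slides use later; helper names are ours):

```python
def tls(intervals):
    """Total lifespan: summed length of half-open intervals [l, h)."""
    return sum(h - l for l, h in intervals)

def support(intervals, T):
    """Sup(X) = TLS(X) / T."""
    return tls(intervals) / T

def confidence(intervals_xy, intervals_x):
    """Conf(X => Y) = Sup(X u Y) / Sup(X) = TLS(XY) / TLS(X)."""
    return tls(intervals_xy) / tls(intervals_x)

# Slide example: T = 15s, A ON during [1,5) and [10,15), AB ON during [2,5) and [10,15)
A  = [(1, 5), (10, 15)]
AB = [(2, 5), (10, 15)]
print(support(A, 15), support(AB, 15), confidence(AB, A))
```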
11
Algorithm A
  • Transform, then apply Apriori
  • Transform the sequence of updates into the form
    of market basket data
  • At each update:
  • take a snapshot of the states of all sensors
  • output all sensors with state = ON as a transaction
  • Attach Weight(transaction) = Lifespan(this update)
    = timestamp(next update) − timestamp(this update)
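Algorithm A's transformation step can be sketched as follows (a hypothetical helper; the slides give no code):

```python
def transform(updates, sensors, t_end):
    """Algorithm A sketch: turn a sequence of binary-sensor updates into
    weighted market-basket transactions.

    updates: sorted list of (timestamp, sensor); each update toggles a state.
    Returns (transaction, weight) pairs; weight = lifespan of the snapshot.
    """
    state = {s: False for s in sensors}   # initial states: all off
    out, t_prev = [], 0
    for ts, s in updates:
        on = frozenset(x for x in sensors if state[x])
        if on and ts > t_prev:            # emit the snapshot held over [t_prev, ts)
            out.append((on, ts - t_prev))
        state[s] = not state[s]           # binary sensor: update toggles the state
        t_prev = ts
    on = frozenset(x for x in sensors if state[x])
    if on and t_end > t_prev:             # final snapshot up to end of history
        out.append((on, t_end - t_prev))
    return out

D = [(1, 'A'), (2, 'B'), (4, 'D'), (5, 'A'), (6, 'E'),
     (7, 'F'), (8, 'E'), (10, 'A'), (11, 'F'), (13, 'C')]
transformed = transform(D, 'ABCDEF', 15)
for txn, w in transformed:
    print(sorted(txn), w)
```

The summed weights of transactions containing A recover TLS(A) = 9 from the example.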

12
Algorithm A - Example
  • Initial states: all off
  • D = <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>,
    <8,E>, <10,A>, <11,F>, <13,C>

[Figure: sensor timelines and the transformed database D′, one weighted transaction per inter-update interval; end of history = 15s]
18
Algorithm A
  • Apply Apriori on the transformed dataset D′
  • Drawbacks:
  • a lot of redundancy
  • adjacent transactions may be very similar,
    differing only in the one sensor whose state was
    updated

19
Algorithm B
  • Interval-List Apriori
  • Uses an interval-list format:
  • <X, interval1, interval2, interval3, …>
  • where intervali is an interval in which all the
    sensors in X are ON
  • TLS(X) = Σ (intervali.h − intervali.l)
  • Example

[Figure: timeline of A, ON during [1,5) and [10,15)]
<A, [1,5), [10,15)>   TLS(A) = (5−1) + (15−10) = 9
20
Algorithm B
  • Step 1:
  • For each ri ∈ R:
  • build a list of the intervals in which ri is ON
    by scanning the sequence of updates
  • calculate the TLS of each ri
  • if TLS(ri) ≥ min_sup, put ri into L1
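Step 1 can be sketched as a single scan over the update sequence (the helper name is ours):

```python
from collections import defaultdict

def build_interval_lists(updates, t_end):
    """Step 1 sketch: one scan of the update sequence builds, for each
    binary sensor (initially off), its list of [on, off) intervals."""
    lists = defaultdict(list)
    on_since = {}                              # sensors currently ON -> on-time
    for ts, sensor in updates:
        if sensor in on_since:                 # update toggles the sensor off
            lists[sensor].append((on_since.pop(sensor), ts))
        else:                                  # update toggles the sensor on
            on_since[sensor] = ts
    for sensor, start in on_since.items():     # close intervals still open
        lists[sensor].append((start, t_end))
    return dict(lists)

D = [(1, 'A'), (2, 'B'), (4, 'D'), (5, 'A'), (6, 'E'),
     (7, 'F'), (8, 'E'), (10, 'A'), (11, 'F'), (13, 'C')]
lists = build_interval_lists(D, 15)
print(lists['A'])  # [(1, 5), (10, 15)]
```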

21
Algorithm B Example
  • Initial states: all off
  • D = <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>,
    <8,E>, <10,A>, <11,F>, <13,C>
  • <A, empty>
  • <B, empty>
  • <C, empty>
  • <D, empty>
  • <E, empty>
  • <F, empty>

22
Algorithm B Example
  • Initial states: all off
  • D = <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>,
    <8,E>, <10,A>, <11,F>, <13,C>
  • <A, [1,∞)>
  • <B, empty>
  • <C, empty>
  • <D, empty>
  • <E, empty>
  • <F, empty>

23
Algorithm B Example
  • Initial states: all off
  • D = <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>,
    <8,E>, <10,A>, <11,F>, <13,C>
  • <A, [1,∞)>
  • <B, [2,∞)>
  • <C, empty>
  • <D, empty>
  • <E, empty>
  • <F, empty>

24
Algorithm B Example
  • Initial states: all off
  • D = <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>,
    <8,E>, <10,A>, <11,F>, <13,C>
  • <A, [1,5)>
  • <B, [2,∞)>
  • <C, empty>
  • <D, [4,∞)>
  • <E, empty>
  • <F, empty>

25
Algorithm B Example
  • Initial states: all off
  • D = <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>,
    <8,E>, <10,A>, <11,F>, <13,C>
  • <A, [1,5), [10,∞)>
  • <B, [2,∞)>
  • <C, [13,∞)>
  • <D, [4,∞)>
  • <E, [6,8)>
  • <F, [7,11)>

26
Algorithm B Example
  • Initial states: all off
  • D = <1,A>, <2,B>, <4,D>, <5,A>, <6,E>, <7,F>,
    <8,E>, <10,A>, <11,F>, <13,C>
  • <A, [1,5), [10,15)>
  • <B, [2,15)>
  • <C, [13,15)>
  • <D, [4,15)>
  • <E, [6,8)>
  • <F, [7,11)>

End of history: T = 15s
27
Algorithm B
  • Step 2:
  • Find all larger frequent sensor-sets
  • Similar to the Apriori Frequent Itemset Property:
  • any subset of a frequent sensor-set must be
    frequent.
  • Method:
  • Generate candidates of size i+1 from frequent
    sensor-sets of size i.
  • Approach used: join two size-i frequent
    sensor-sets that agree on i−1 sensors to obtain a
    sensor-set of size i+1
  • May also prune candidates that have subsets that
    are not large.
  • Count the support by merging (intersecting)
    the interval lists of the two size-i frequent
    sensor-sets
  • If sup ≥ min_sup, put the sensor-set into Li+1
  • Repeat the process until the candidate set is
    empty
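The interval-list merge used to count support can be sketched as a merge-style intersection of two sorted interval lists (the helper name is ours):

```python
def intersect(xs, ys):
    """Intersect two sorted lists of half-open intervals [l, h) in a
    single merge-style linear scan (sketch of the join step)."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        lo = max(xs[i][0], ys[j][0])
        hi = min(xs[i][1], ys[j][1])
        if lo < hi:                    # the two intervals overlap
            out.append((lo, hi))
        if xs[i][1] < ys[j][1]:        # advance the interval that ends first
            i += 1
        else:
            j += 1
    return out

# Slide example: A and B joined to form AB
A = [(1, 5), (10, 15)]
B = [(2, 15)]
print(intersect(A, B))  # [(2, 5), (10, 15)]
```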

28
Algorithm B
  • Example:
  • <A, [1,5), [10,15)>
  • <B, [2,15)>
  • <AB, [2,5), [10,15)>

[Figure: timelines of A and B, T = 15]
29
Algorithm B (Example)
  • Min support count = 3
  • Size-1 sensor-sets: A (LS=9), B (LS=13), C (LS=2),
    D (LS=11), E (LS=2), F (LS=4)
  • C and E are pruned; A, B, D, F are frequent
  • Size-2 candidates: AB (LS=8), AD (LS=6), AF (LS=1),
    BD (LS=11), BF (LS=4)
  • AF is pruned; AB, AD, BD, BF are frequent
  • Size-3 candidate: ABD (LS=6), frequent
30
Algorithm B Candidate Generation
  • When generating a candidate sensor-set C of size
    i from two size-(i−1) sensor-sets LA and LB
    (subsets of C), we also construct the interval
    list of C by intersecting the interval lists of
    LA and LB.
  • Joining two interval lists (of lengths m and
    n) is a key step in our algorithm
  • A simple linear scan requires O(m+n) time
  • There are i different size-(i−1) subsets of C:
  • which two to pick?

31
Algorithm B Candidate Generation
  • Method 1:
  • Choose the two lists with the fewest number of
    intervals
  • ⇒ store the number of intervals for each sensor-set
  • Method 2:
  • Choose the two lists with the smallest count (TLS)
  • Intuitively, a shorter lifespan implies fewer
    intervals
  • Easier to implement:
  • we already have the lifespan from checking whether
    the sensor-set is frequent
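Method 2 can be sketched as follows (hypothetical helper; `freq` maps each frequent sensor-set to its interval list):

```python
def tls(intervals):
    """Total lifespan of a list of half-open intervals [l, h)."""
    return sum(h - l for l, h in intervals)

def pick_join_pair(candidate, freq):
    """Method 2 sketch: among the size-(i-1) subsets of `candidate`,
    pick the two with the smallest total lifespan (TLS) to join."""
    subsets = sorted((candidate - {s} for s in candidate),
                     key=lambda ss: tls(freq[ss]))
    return subsets[0], subsets[1]

# Slide example: candidate ABD; its size-2 subsets and their interval lists
freq = {
    frozenset('AB'): [(2, 5), (10, 15)],  # TLS 8
    frozenset('AD'): [(4, 5), (10, 15)],  # TLS 6
    frozenset('BD'): [(4, 15)],           # TLS 11
}
pair = pick_join_pair(frozenset('ABD'), freq)
print(sorted(sorted(ss) for ss in pair))  # joins AD (TLS 6) with AB (TLS 8)
```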

32
Experiments
  • Data generation:
  • Simulate data generated by a set of n binary
    sensors
  • Make use of a standard market basket dataset
  • With n sensors, each of which can be either ON or
    OFF:
  • ⇒ 2^n possible combinations of sensor states
  • Assign a probability to each of the combinations

33
Experiments Data Gen
  • How to assign the probabilities?
  • Let N be the number of occurrences, in the market
    basket data, of the transaction that contains
    exactly the sensors that are ON
  • E.g. consider R = {A, B, C, D, E, F}
  • Suppose we want to assign a probability to the
    sensor state AC (only A and C are ON)
  • N is the number of transactions that contain
    exactly A and C
  • Assign probability N/|D|, where |D| is the size of
    the market basket dataset
  • Note: the market basket data must be sufficiently
    large, so that transactions that occur very
    infrequently are not given ZERO probability
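The probability assignment can be sketched as follows (hypothetical helper; note that combinations never seen in the basket simply get no entry, i.e. zero probability, which is why a large basket dataset is needed):

```python
from collections import Counter

def state_probabilities(basket, n_transactions):
    """Data-generation sketch: probability of a sensor-state combination
    = (count of transactions containing exactly those items) / |D|."""
    counts = Counter(frozenset(txn) for txn in basket)
    return {state: c / n_transactions for state, c in counts.items()}

# Toy market basket standing in for a real dataset
basket = [{'A', 'C'}, {'A', 'C'}, {'A'}, {'B', 'C'}]
probs = state_probabilities(basket, len(basket))
print(probs[frozenset({'A', 'C'})])  # 0.5
```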

34
Experiments Data Gen
  • Generating the sensor data:
  • Choose the initial state (at t = 0s):
  • randomly, according to the probabilities
    assigned, or
  • pick the combination with the highest assigned
    probability
  • ⇒ first set of sensor states

35
Experiment Data Gen
  • What is the next set of sensor states?
  • For simplicity, in our model only one sensor can
    be updated at a time
  • For any two adjacent updates, the sensor states
    at the two time instants differ by only one
    sensor
  • ⇒ change only one sensor state
  • ⇒ n possible combinations, obtained by toggling
    each of the n sensor states
  • We normalize the probabilities of the n
    combinations by their sum
  • Pick the next set of sensor states according
    to the normalized probabilities
  • Inter-arrival time of updates: exponential
    distribution
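The next-state step can be sketched as follows (hypothetical helper; `probs` is the combination-probability map described on the previous slides):

```python
import random

def next_state(state, probs, sensors, rng):
    """Pick the next sensor-state combination: toggle exactly one sensor,
    weighting the n candidates by their assigned probabilities
    (random.choices normalizes the weights by their sum)."""
    candidates = [set(state) ^ {s} for s in sensors]   # toggle each sensor
    weights = [probs.get(frozenset(c), 0.0) for c in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)                  # fallback: uniform
    return rng.choices(candidates, weights=weights)[0]

# Toy probability map (would come from the market basket counts)
probs = {frozenset(): 0.2, frozenset('A'): 0.3,
         frozenset('AB'): 0.4, frozenset('AC'): 0.1}
rng = random.Random(0)
state = next_state({'A'}, probs, 'ABC', rng)
dt = rng.expovariate(1.0)   # inter-arrival time to the next update
print(state, dt)
```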

36
Experiments
  • Market Basket Dataset:
  • 8,000,000 transactions
  • 100 items
  • number of maximal potentially large itemsets
    = 2000
  • average transaction length = 10
  • average length of maximal large itemsets = 4
  • length of the maximal large itemsets = 11
  • minimum support = 0.05
  • Algorithms:
  • Apriori (cached mode)
  • IL-apriori:
  • (a) random-join (IL-apriori)
  • (b) join-by-smallest-lifespan (IL-apriori-S)
  • (c) join-by-fewest-number-of-intervals (IL-apriori-C)

37
Experiments - Results
  • Performance of the algorithms (larger support)
  • All IL-apriori algorithms outperform cached Apriori

38
Experiments - Results
  • Performance (lower support)
  • More candidates ⇒ joining the interval lists
    becomes expensive for IL-apriori

39
Experiments - Results
  • With more long frequent sensor-sets:
  • Apriori has to match the candidates by searching
    through the DB
  • IL-apriori-C and IL-apriori-S save a lot of
    time when joining the lists

40
Experiments - Results
  • Peak memory usage:
  • cached Apriori stores the whole database
  • IL-apriori stores a lot of interval lists when
    the number of candidates grows large

41
Experiments - Results
(min_sup 0.02)
  • Apriori is faster in the first 3 passes
  • Running time for IL-apriori drops sharply
    afterwards:
  • Apriori has to scan the whole database in every
    pass
  • IL-apriori (C/S) only needs to join relatively
    short interval lists in the later passes

42
Experiments - Results
(min_sup 0.02)
  • Memory requirement for IL-apriori is a lot higher
    when there are more frequent sensor-set interval
    lists to join

43
Experiments - Results
(min_sup 0.05)
  • Runtime for all algorithms increases linearly
    with total number of transactions

44
Experiments - Results
(min_sup 0.05)
  • Memory required by all algorithms increases as
    the number of transactions increases.
  • The rate of increase is faster for IL-apriori

45
Conclusions
  • An interval-list method for mining sensor data
    was described
  • The two interval-list joining strategies are quite
    effective in reducing running time
  • The memory requirement is quite high
  • Future Work:
  • Other methods for joining interval-lists
  • Trade-off between time and space
  • Extending to the streaming case
  • Consider approaches other than the Lossy Counting
    algorithm (G. Manku and R. Motwani, VLDB '02)

46
Q & A