1
On information theory and association rule
interestingness
  • Loo Kin Kong
  • 5th July, 2002

2
In my last DB talk...
  • I talked about some subjective approaches for
    finding interesting association rules
  • Subjective approaches require that a domain
    expert work on a huge set of mined rules
  • Some researchers instead adopt another approach and mine optimal rules
  • Optimal with respect to some objective interestingness measure

3
Contents
  • Basic concepts on probability
  • Entropy and information
  • The maximum entropy method (MEM)
  • Paper review: Pruning Redundant Association Rules Using Maximum Entropy Principle

4
Basic concepts
  • A finite probability space is a pair (S,P), in
    which
  • S is a finite non-empty set
  • P is a mapping P: S → [0,1], satisfying Σs∈S P(s) = 1
  • Each s?S is called an event
  • P(s) is the probability of the event s
  • I'll also use p_s to denote P(s) in this talk
  • A partition U is a collection of mutually
    exclusive events whose union equals S
  • Sometimes this is also known as a system of events

5
Basic concepts (contd)
  • The product U × B of two partitions U = {ai} and B = {bj} is defined as the partition whose elements are all intersections ai ∩ bj of the elements of U and B
  • Graphically

[Figure: a 3×3 grid illustrating the product of partitions U = {a1, a2, a3} and B = {b1, b2, b3}; each cell is an intersection such as a1 ∩ b1]
6
Self-information
  • The probability P(s) of an event s is a measure
    of our uncertainty that the event s would occur
  • If P(s) = 0.999, s is almost certain to occur
  • If P(s) = 0.1, quite reasonably we can believe that s will not occur
  • The self-information of s is defined as I(s) = -log P(s)
  • Note that
  • The smaller the value of P(s), the larger the value of I(s)
  • As P(s) → 0, I(s) → ∞
  • Intuition: when something that is supposed to be very unlikely actually happens, it carries a lot of information
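
A minimal Python sketch of this quantity (the function name and example values are mine, for illustration):

```python
import math

def self_information(p, base=2):
    """I(s) = -log P(s), measured in bits when base = 2."""
    return -math.log(p, base)

print(self_information(0.999))  # ~0.0014 bits: almost no surprise
print(self_information(0.1))    # ~3.32 bits: much more informative
```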

7
Entropy
  • The measure of uncertainty that any event of a partition U will occur is called the entropy of the partition U
  • The entropy H(U) of a partition U is defined as H(U) = -p1 log p1 - p2 log p2 - ... - pN log pN
  • where p1, ..., pN are respectively the probabilities of the events a1, ..., aN of U
  • Note that
  • Each term corresponds to the self-information of
    an event weighted by its probability
  • H(U) is maximal when p1 = p2 = ... = pN = 1/N
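
As a quick illustration of the definition and of the uniform-maximum property, here is a small Python sketch (names and numbers are illustrative):

```python
import math

def entropy(probs, base=2):
    """H(U) = -sum(p * log p) over the event probabilities of a partition."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# The uniform distribution maximizes entropy: log2(4) = 2 bits
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(entropy([0.7, 0.1, 0.1, 0.1]))      # ~1.357, strictly less
```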

8
Conditional entropy
  • Let U = {a1, ..., aN} and B = {b1, ..., bM} be two partitions
  • The conditional entropy of U assuming bj is H(U|bj) = -Σi P(ai|bj) log P(ai|bj)
  • The conditional entropy of U assuming B is thus H(U|B) = Σj P(bj) H(U|bj)
  • We can go on to show that H(U × B) = H(B) + H(U|B) = H(U) + H(B|U)
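
The chain rule H(U × B) = H(B) + H(U|B) can be checked numerically from any joint distribution; a small sketch with a made-up joint table:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint probabilities P(ai ∩ bj) for U = {a1, a2}, B = {b1, b2}
joint = {('a1', 'b1'): 0.4, ('a1', 'b2'): 0.1,
         ('a2', 'b1'): 0.2, ('a2', 'b2'): 0.3}

pB = {}  # marginal P(bj)
for (a, b), p in joint.items():
    pB[b] = pB.get(b, 0.0) + p

# H(U|B) = sum_j P(bj) * H(U|bj), with P(ai|bj) = P(ai ∩ bj) / P(bj)
H_U_given_B = sum(
    pB[b] * H([joint[(a, b)] / pB[b] for a in ('a1', 'a2')]) for b in pB)

print(H(joint.values()))             # H(U × B), ~1.846 bits
print(H(pB.values()) + H_U_given_B)  # H(B) + H(U|B): the same value
```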

9
Mutual information
  • Suppose that U, B are two partitions in S. The mutual information I(U,B) between U and B is I(U,B) = H(U) + H(B) - H(U × B)
  • Applying the equality H(U × B) = H(B) + H(U|B) = H(U) + H(B|U), we get I(U,B) = H(U) - H(U|B) = H(B) - H(B|U)
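
Continuing the same made-up joint distribution, the mutual information follows directly from the three entropies (a sketch, with the marginals written out by hand):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Same hypothetical joint table as before, with its marginals
joint = [0.4, 0.1, 0.2, 0.3]   # P(ai ∩ bj)
pU = [0.5, 0.5]                # P(a1), P(a2)
pB = [0.6, 0.4]                # P(b1), P(b2)

# I(U,B) = H(U) + H(B) - H(U × B); it is 0 iff U and B are independent
I = H(pU) + H(pB) - H(joint)
print(I)  # ~0.125 bits > 0: the joint is not the product of the marginals
```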

10
The maximum entropy method (MEM)
  • The MEM determines the probabilities pi of the
    events in a partition U, subject to various given
    constraints.
  • By the MEM, when some of the pi's are unknown, they must be chosen to maximize the entropy of U, subject to the given constraints.
  • Let's illustrate the MEM with an example.

11
Example: rolling a die
  • Let p1, ..., p6 denote the probabilities that the outcome of rolling the die is 1, ..., 6 respectively.
  • The entropy of this partition U is H(U) = -p1 log p1 - ... - p6 log p6
  • If we have no information about the die
  • By the MEM, we choose p1 = p2 = ... = p6 = 1/6
  • Suppose now we know that a player bets $10 on "odd" each game, and on average wins $2 per game
  • Then p1 + p3 + p5 = 0.6 and p2 + p4 + p6 = 0.4
  • By the MEM, we get p1 = p3 = p5 = 0.2 and p2 = p4 = p6 ≈ 0.133
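
The same answer can be obtained numerically; a sketch using scipy (the constraint encoding and starting point are mine):

```python
import numpy as np
from scipy.optimize import minimize

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)   # avoid log(0)
    return float(np.sum(p * np.log(p)))

constraints = [
    {'type': 'eq', 'fun': lambda p: np.sum(p) - 1.0},           # probabilities sum to 1
    {'type': 'eq', 'fun': lambda p: p[0] + p[2] + p[4] - 0.6},  # P(odd) = 0.6
]
res = minimize(neg_entropy, x0=np.full(6, 1 / 6), bounds=[(0, 1)] * 6,
               constraints=constraints, method='SLSQP')
print(res.x.round(3))  # [0.2 0.133 0.2 0.133 0.2 0.133]
```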

12
Paper review
  • S. Jaroszewicz and D. A. Simovici, "Pruning Redundant Association Rules Using Maximum Entropy Principle"
  • Published in PAKDD '02
  • Highlights:
  • To identify a small and non-redundant set of
    interesting association rules that describes the
    data as completely as possible
  • The solution proposed in the paper uses the
    maximum entropy approach

13
Definitions
  • A constraint C is a pair C = (I, p), where
  • I is an itemset
  • p ∈ [0,1] is the probability of I occurring in a transaction
  • The set of constraints generated by an association rule I → J is defined as C(I → J) = {(I, supp(I)), (I ∪ J, supp(I ∪ J))}
  • A rule K → J is a sub-rule of I → J if K ⊆ I
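
In code, a constraint and the constraints generated by a rule might look like this (a sketch; `supp` is a support-lookup function assumed to be supplied by the caller):

```python
from typing import Callable, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Constraint = Tuple[Itemset, float]  # (itemset I, probability p)

def constraints_of_rule(I: Itemset, J: Itemset,
                        supp: Callable[[Itemset], float]) -> List[Constraint]:
    """C(I -> J) = {(I, supp(I)), (I u J, supp(I u J))}."""
    return [(I, supp(I)), (I | J, supp(I | J))]
```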

14
The active interestingness of a rule w.r.t. a set
of constraints
  • The active interestingness reflects the impact of adding the constraints generated by the rule to the current set of constraints: I_act(C, I → J) = D(Q_{C ∪ C(I → J)}, Q_C)
  • where D is some divergence function, and Q_C is the probability distribution induced by C
  • Q_C is obtained by using the MEM. For simplicity, how Q_C is obtained is omitted here; a procedure and a proof are given in the paper.

15
The passive interestingness of a rule w.r.t. a
set of constraints
  • The passive interestingness is the difference between the confidence estimated from the data and that estimated from the probability distribution induced by the constraints: I_pass(C, I → J) = |conf(I → J) - Q_C(I ∪ J) / Q_C(I)|
  • where Q_C(X) is the probability of X induced by C, and conf(I → J) is the confidence of the rule I → J
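
A direct sketch of this quantity, assuming a hypothetical callable `q_c` that returns the maximum-entropy probability of an itemset under the constraints:

```python
def passive_interestingness(I, J, conf_data, q_c):
    """|conf(I -> J) - Q_C(I u J) / Q_C(I)|, per the definition above."""
    conf_model = q_c(I | J) / q_c(I)  # confidence induced by the constraints
    return abs(conf_data - conf_model)
```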

16
I-nonredundancy
  • A rule I → J is considered I-nonredundant with respect to R, where R is a set of association rules, if
  • I = ∅, or
  • I(C_{I,J}(R), I → J) is larger than some threshold, where I(·) is either I_act(·) or I_pass(·), and C_{I,J}(R) is the set of constraints induced by all sub-rules of I → J in R

17
Pruning redundant association rules
  • Input: a set R of association rules
  • 1. For each singleton Ai in the database:
  • 2. Ri ← {∅ → Ai}
  • 3. k ← 1
  • 4. For each rule I → Ai ∈ R with |I| = k, do:
  • 5. If I → Ai is I-nonredundant w.r.t. Ri, then Ri ← Ri ∪ {I → Ai}
  • 6. k ← k + 1
  • 7. Go to step 4 (while rules with larger antecedents remain)
  • Output: R ← ∪i Ri
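
A Python sketch of this loop (the I-nonredundancy test is abstracted into a callback, since it depends on the MEM machinery above; names are mine):

```python
def prune_rules(rules, singletons, is_nonredundant):
    """rules: iterable of (antecedent frozenset, consequent item) pairs.
    is_nonredundant(rule, kept): stands in for the threshold test on
    I_act/I_pass against the constraints induced by the kept sub-rules."""
    result = set()
    for a in singletons:
        kept = {(frozenset(), a)}  # start from the empty-antecedent rule
        # process antecedents in increasing size (k = 1, 2, ...)
        for rule in sorted((r for r in rules if r[1] == a),
                           key=lambda r: len(r[0])):
            if is_nonredundant(rule, kept):
                kept.add(rule)
        result |= kept
    return result
```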

18
Results
  • A census dataset of elderly people, with 300K tuples, was used in the experiments.
  • When the support threshold was set to 1%, Apriori found 247,476 association rules (without considering the confidence of the rules).
  • The proposed algorithm trimmed the rule set to 194 rules when the interestingness threshold was set to 0.3; the running time was 4,801 s.
  • When the interestingness threshold was lowered to 0.1, the algorithm trimmed the rule set to 2,056 rules; the running time was 15,480 s.

19
Entropy and interestingness
  • Some common measures for ranking the interestingness of association rules are based on entropy and mutual information. Examples include:
  • Entropy gain
  • Gini index
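
In their common form (these are the standard definitions, assumed here since the slide's own formulas are not reproduced), the two measures look like:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini index of a class distribution: 1 - sum(p^2)."""
    return 1 - sum(p * p for p in probs)

def entropy_gain(parent, children):
    """H(parent) minus the weighted entropies of the child distributions.
    children: list of (weight, probs) produced by splitting on a rule."""
    return entropy(parent) - sum(w * entropy(c) for w, c in children)
```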

20
Conclusion
  • Entropy and mutual information are tools that quantify the uncertainty of events.
  • The maximum entropy method (MEM) is an
    application of entropy, allowing us to make
    reasonable guesses on probabilities of events.
  • The MEM can be applied to prune uninteresting
    association rules.

21
References
  • R. J. Bayardo Jr. and R. Agrawal. Mining the Most Interesting Rules. Proc. KDD '99, 1999.
  • D. Hankerson, G. A. Harris, and P. D. Johnson, Jr. Introduction to Information Theory and Compression. CRC Press LLC, 1998.
  • S. Jaroszewicz and D. A. Simovici. A General Measure of Rule Interestingness. Proc. PKDD '01, 2001.
  • S. Jaroszewicz and D. A. Simovici. Pruning Redundant Association Rules Using Maximum Entropy Principle. Proc. PAKDD '02, 2002.
  • A. Papoulis. Probability, Random Variables, and Stochastic Processes, Third Edition. McGraw-Hill, 1991.

22
Q & A