SECURED OUTSOURCING OF FREQUENT ITEMSET MINING

1
SECURED OUTSOURCING OF FREQUENT ITEMSET MINING
  • Hana Chih-Hua Tai
  • Institute of Information Science, Academia Sinica

2
OUTLINE
  • Preliminary: Frequent Itemset Mining
  • Motivation
  • Privacy Model: k-Support Anonymity
  • Algorithm
  • Performance Studies
  • Conclusion

3
OUTLINE
  • Preliminary: Frequent Itemset Mining
  • Motivation

4
FREQUENT ITEMSET MINING (FIM)
  • Discover what happened frequently

When the threshold is set to 3 (60%), {wine} and {cigar} are frequent.
When the threshold is set to 2 (40%), {wine}, {cigar}, {tea}, {beer},
{wine, cigar}, and {wine, beer} are frequent.

Trans. ID   Items
1           wine
2           cigar, wine
3           cigar, tea
4           beer, cigar, wine
5           beer, tea, wine
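
The example above can be reproduced with a brute-force sketch. It is not an efficient miner such as Apriori; it simply counts every candidate itemset, which is enough for five transactions. The names TRANSACTIONS and frequent_itemsets are illustrative and do not come from the slides.

```python
from itertools import combinations

# Example transactions from the slide (transaction ID -> items).
TRANSACTIONS = {
    1: {"wine"},
    2: {"cigar", "wine"},
    3: {"cigar", "tea"},
    4: {"beer", "cigar", "wine"},
    5: {"beer", "tea", "wine"},
}


def frequent_itemsets(transactions, min_sup):
    """Return {itemset: support} for every itemset with support >= min_sup."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing the whole candidate.
            support = sum(1 for t in transactions.values() if set(candidate) <= t)
            if support >= min_sup:
                result[candidate] = support
    return result


print(frequent_itemsets(TRANSACTIONS, 3))
print(frequent_itemsets(TRANSACTIONS, 2))
```

With a threshold of 3 only the two single items survive; lowering it to 2 adds tea, beer, and the two pairs containing wine.
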
5
FREQUENT ITEMSET MINING (FIM)
  • Discover what happened frequently
  • Frequent itemset mining (FIM)

6
THE NEEDS OF OUTSOURCING FIM
  • Those who lack FIM expertise and/or computing resources
    need to outsource the mining tasks to a professional
    third party.

[Figure: Data Owner and Mining Services Provider (Cloud Computing)]
7
THE NEEDS OF OUTSOURCING FIM

[Figure: Data Owner and Mining Services Provider (Cloud Computing), annotated "Privacy?!"]
8
THE RISKS OF OUTSOURCING FIM
  • An encryption/decryption scheme is believed to be a
    possible solution.

[Figure: Mining Services Provider (Cloud Computing)]
9
THE RISKS OF OUTSOURCING FIM
  • An encryption/decryption scheme is believed to be a
    possible solution.

How can the encryption and decryption be achieved? The requirements are:
  • Privacy protected
  • Correct mining results
  • Reasonable overhead

[Figure: Mining Services Provider (Cloud Computing)]

10
THE RISKS OF OUTSOURCING FIM
Trans. ID   Items               Encrypted items
1           wine                a
2           cigar, wine         a, c
3           cigar, tea          c, d
4           beer, cigar, wine   a, b, c
5           beer, tea, wine     a, b, d
11
THE RISKS OF OUTSOURCING FIM
  • Top frequency attack
  • Wine is the most frequent item ⇒ a is wine
  • Approximate support attack
  • The support of cigar is about 55-60% ⇒ c is cigar
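
A minimal sketch of the two attacks, assuming the attacker knows approximate supports of some real items. The background knowledge in KNOWN_SUPPORT, the tolerance value, and the function name guess_items are assumptions made for illustration. The encrypted table is the simple substitution of the example (wine to a, beer to b, cigar to c, tea to d).

```python
from collections import Counter

# Encrypted transactions under a plain one-to-one substitution.
ENCRYPTED = [{"a"}, {"a", "c"}, {"c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}]

# Assumed attacker background knowledge: approximate support (as a fraction).
KNOWN_SUPPORT = {"wine": 0.8, "cigar": 0.58}


def guess_items(encrypted, known_support, tolerance=0.05):
    """Map a cipher item to a real item when exactly one support matches."""
    counts = Counter(item for t in encrypted for item in t)
    n = len(encrypted)
    guesses = {}
    for real, sup in known_support.items():
        matches = [c for c, cnt in counts.items() if abs(cnt / n - sup) <= tolerance]
        if len(matches) == 1:  # a unique match re-identifies the item
            guesses[matches[0]] = real
    return guesses


print(guess_items(ENCRYPTED, KNOWN_SUPPORT))
```

The guess is unambiguous here because no other cipher item shares the support of a or c, which is exactly the weakness that support anonymization targets.
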

12
THE RISKS OF OUTSOURCING FIM
The support information about the frequent itemsets can be utilized to
effectively reveal the raw data as well as the sensitive information
from the anonymized transactions (T. Mielikäinen. Privacy problems with
anonymized transaction databases. In Proc. of Discovery Science, 2004).
13
RELATED WORKS
  • Encrypt each real item with a one-to-many mapping function.
  • Wong, W. K., Cheung, D. W., Hung, E., Kao, B., Mamoulis, N.
    Security in Outsourcing of Association Rule Mining.
    In Proc. of VLDB, 2007.
  • However, this approach does not try to anonymize the support
    information.
  • It was recently broken.
  • Molloy, I., Li, N., Li, T. On the (In)Security and
    (Im)Practicality of Outsourcing Precise Association Rule Mining.
    In Proc. of ICDM, 2009.

14
OUTLINE
  • Preliminary: Frequent Itemset Mining
  • Motivation
  • Privacy Model: k-Support Anonymity

15
K-SUPPORT ANONYMITY ANONYMIZATION
  • For every sensitive item, there are at least k-1
    other items with the same support.
  • The probability of an item being correctly
    re-identified is limited to 1/k, even when the
    precise support information is known.
  • Given a transactional database T, encrypt T into
    E(T) such that
  • There exists a decryption function D such that
    MiningResult(T, min_sup) = D(MiningResult(E(T), min_sup))
    for any minimum support min_sup.
  • E(T) is k-support anonymous.
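
The definition can be checked mechanically. The sketch below counts only items that appear in the transactions, so it is an assumption-laden simplification: in the taxonomy-based method, items at internal taxonomy levels would also contribute supports. The function name and both sample tables are illustrative.

```python
from collections import Counter


def is_k_support_anonymous(transactions, sensitive_items, k):
    """True if every sensitive item shares its support with >= k-1 other items."""
    support = Counter(item for t in transactions for item in t)
    # group_size[s] = number of distinct items whose support equals s
    group_size = Counter(support.values())
    return all(group_size[support[item]] >= k for item in sensitive_items)


# Plain substitution: a is the only item with support 4, so it stands out.
SUBSTITUTED = [{"a"}, {"a", "c"}, {"c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}]

# Same data padded with fake items e..i so each support is shared by 3 items.
PADDED = [
    {"a", "e", "g", "h", "i"},
    {"a", "c", "e", "f", "h", "i"},
    {"c", "d", "e", "f", "g"},
    {"a", "b", "c", "f", "h"},
    {"a", "b", "d", "e", "f", "g"},
]

print(is_k_support_anonymous(SUBSTITUTED, "abcd", 3))
print(is_k_support_anonymous(PADDED, "abcd", 3))
```
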

16
SOLUTION 1 A NAÏVE APPROACH
  • For each set of real items of the same support,
    add enough fake items randomly into transactions
    to make the fake items as frequent as real ones.

Trans. ID   Items               Items after adding fake items
1           wine                a, e, g, h, i
2           cigar, wine         a, c, e, f, h, i
3           cigar, tea          c, d, e, f, g
4           beer, cigar, wine   a, b, c, f, h
5           beer, tea, wine     a, b, d, e, f, g

For k = 3, 16 additional items are required:
  • 4 x 2 = 8 (e, f) for wine
  • 3 x 2 = 6 (g, h) for cigar
  • 2 x 1 = 2 (i) for beer and tea
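
The overhead arithmetic can be sketched as follows. This is one possible reading of the naive approach, not the authors' code: the fake item labels, the seed parameter, and the function name naive_anonymize are assumptions.

```python
import random
from collections import Counter, defaultdict

TRANSACTIONS = [
    {"wine"},
    {"cigar", "wine"},
    {"cigar", "tea"},
    {"beer", "cigar", "wine"},
    {"beer", "tea", "wine"},
]


def naive_anonymize(transactions, k, seed=0):
    """Pad each support group with fake items until it holds k items.

    Returns (padded_transactions, number_of_added_item_occurrences).
    """
    rng = random.Random(seed)
    support = Counter(item for t in transactions for item in t)
    groups = defaultdict(list)  # support value -> real items with that support
    for item, sup in support.items():
        groups[sup].append(item)

    padded = [set(t) for t in transactions]
    added = 0
    fake_id = 0
    for sup, items in sorted(groups.items()):
        for _ in range(max(0, k - len(items))):
            fake = f"fake{fake_id}"  # hypothetical label for a fake item
            fake_id += 1
            # Place the fake item in `sup` randomly chosen transactions.
            for idx in rng.sample(range(len(padded)), sup):
                padded[idx].add(fake)
            added += sup
    return padded, added


padded, added = naive_anonymize(TRANSACTIONS, 3)
print(added)
```

Each fake item costs as many occurrences as the support it imitates, so the overhead grows with both k and the supports of the frequent items.
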
17
A NAÏVE SOLUTION
The storage overhead can become too large when k is large.
18
GENERALIZED FIM
  • Discover all frequent items across concept
    levels, given a taxonomy indicating the
    hierarchical concepts between items

When the threshold is set to 3 (60%), {wine}, {cigar},
{alcoholic}, {beverage}, and {all prod.} are
frequent. {beverage, cigar} is also frequent.
19
GENERALIZED FIM
1. The support of a parent node comes from the
   supports of its child nodes.
2. Only leaf nodes need to appear in the transactions.
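
A small sketch of how supports propagate up a taxonomy. The taxonomy figure is not part of this transcript, so the PARENT hierarchy below is an assumption inferred from the frequent itemsets listed on the slide; the function names are illustrative as well.

```python
# Assumed taxonomy (child -> parent), inferred from the listed results.
PARENT = {
    "wine": "alcoholic",
    "beer": "alcoholic",
    "alcoholic": "beverage",
    "tea": "beverage",
    "beverage": "all prod.",
    "cigar": "all prod.",
}

TRANSACTIONS = [
    {"wine"},
    {"cigar", "wine"},
    {"cigar", "tea"},
    {"beer", "cigar", "wine"},
    {"beer", "tea", "wine"},
]


def extend(transaction, parent):
    """Return the transaction plus every ancestor of each of its items."""
    extended = set(transaction)
    for item in transaction:
        while item in parent:
            item = parent[item]
            extended.add(item)
    return extended


def support(itemset, transactions, parent):
    """Count transactions whose extended form contains the whole itemset."""
    return sum(1 for t in transactions if set(itemset) <= extend(t, parent))


print(support({"alcoholic"}, TRANSACTIONS, PARENT))
print(support({"beverage", "cigar"}, TRANSACTIONS, PARENT))
```

Only leaf items are stored in the transactions; the supports of alcoholic, beverage, and all prod. are derived from their descendants.
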
20
OUTLINE
  • Preliminary: Frequent Itemset Mining
  • Motivation
  • Privacy Model: k-Support Anonymity
  • Algorithm

21
ANONYMIZATION OVERVIEW
  • For storage efficiency, we suggest converting FIM
    to generalized FIM (GFIM).

[Figure: workflow between the Data Owner and the Third Party. The Data
Owner encrypts the transaction data, generating a pseudo taxonomy during
the encryption; the Third Party runs generalized frequent itemset mining
on the encrypted transaction data and the pseudo taxonomy; the Data Owner
decrypts the returned frequent itemsets.]
22
ANONYMIZATION STORAGE EFFICIENCY
  • In GFIM, items can be at multiple levels of a
    taxonomy and only the items at leaf level need to
    appear in the database.

Encrypt with k = 3:

Trans. ID   Items               Encrypted items
1           wine                c, d, g
2           cigar, wine         b, d, g
3           cigar, tea          b, h
4           beer, cigar, wine   a, b, c
5           beer, tea, wine     a, c, d, h

4 additional items required.
23
ANONYMIZATION STORAGE EFFICIENCY

Small storage overhead compared to the naïve
method.
24
ANONYMIZATION EASY DECRYPTION
  • The real frequent itemsets can be obtained by
    filtering out patterns containing any fake item
    in 1 scan of the returned results.

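min_sup = 2
Results on the original data: {beer}, {cigar}, {wine}, {tea},
{beer, wine}, {cigar, wine}
Results returned for the encrypted data: {a}, {b}, {c}, {d}, {e}, {f},
{g}, {h}, {i}, {j}, {k}, {a, c}, {a, f}, {b, f}, {c, e}, ...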
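
The one-scan decryption can be sketched as a filter. The REAL_ITEM mapping is hypothetical: the slides do not state which cipher label stands for which real item, so the assignment below is chosen only to be consistent with the listed results. Any cipher item missing from the mapping is treated as fake.

```python
# Hypothetical secret mapping kept by the data owner (cipher -> real item).
REAL_ITEM = {"a": "beer", "b": "cigar", "f": "wine", "h": "tea"}


def decrypt_results(returned_itemsets, real_item):
    """Single pass: drop itemsets with a fake item, translate the rest."""
    decrypted = []
    for itemset in returned_itemsets:
        if all(item in real_item for item in itemset):
            decrypted.append(frozenset(real_item[item] for item in itemset))
    return decrypted


# A subset of the returned itemsets, for illustration only.
returned = [
    {"a"}, {"b"}, {"c"}, {"f"}, {"h"},
    {"a", "c"}, {"a", "f"}, {"b", "f"}, {"c", "e"},
]
print(decrypt_results(returned, REAL_ITEM))
```

No support needs to be recomputed by the data owner, which is why a single scan of the returned itemsets suffices.
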
25
ANONYMIZATION EASY DECRYPTION

The data owner can obtain the real results in 1
scan of the returned itemsets.
26
ANONYMIZATION ENCRYPTION
The problem is how to build the taxonomy and
encrypt T for k-support anonymity.
27
ANONYMIZATION ENCRYPTION
  • 1. Generalization of the Mining Task
  • Generate a pseudo taxonomy that can
  • (a) preserve the correct and complete mining
    results, and
  • (b) facilitate k-support anonymization.
  • 2. Anonymization with Taxonomy Tree
  • Encrypt T for k-support anonymity with the
    help of the constructed taxonomy tree.

28
1 GENERALIZATION OF THE MINING TASK
  • Build a k-bud tree of T
  • All real items are at the leaf level
  • The number of nodes in the following three categories
    is equal to or greater than k
  • Let x_M denote the most frequent real item in T
  • A_> = {v | sup(v) > sup(x_M) and v is a leaf},
  • A_= = {v | sup(v) = sup(x_M)}, and
  • A_< = {v | sup(v) < sup(x_M) < sup(u)}, where u is
    the parent node of v.

[Figure: a 3-bud tree for the example transactions]
29
1 GENERALIZATION OF THE MINING TASK
[Figure: the items arranged in 3 groups: beer, cigar; wine; tea]
30
1 GENERALIZATION OF THE MINING TASK
[Figure: 3 subtrees over the leaf items (beer), (wine), (cigar), (tea);
node labels printed on the slide: 4, 2, 4, 2, 3]
31
1 GENERALIZATION OF THE MINING TASK
32
1 GENERALIZATION OF THE MINING TASK
[Figure: the resulting 3-bud tree]
33
2 ANONYMIZATION WITH TAXONOMY TREE
  • Alter the k-bud tree and modify T simultaneously
    to achieve k-support anonymity
  • Insertion
  • Split
  • Increase

34
2 ANONYMIZATION WITH TAXONOMY TREE
  • Alter the k-bud tree and modify T simultaneously
    to achieve k-support anonymity
  • Insertion (Ex.)
  • Split
  • Increase

Insertion applies when sup(v) < target-sup < sup(u):
  • p: the node with the target support
  • q: randomly select sup(p) - sup(v) transactions from T(u) - T(v)
  • T(x) is the set of transactions containing the item x.
  • sup(u) and sup(v) should not be changed.
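
A sketch of the insertion operation on transaction-ID sets, under the reading that p becomes the new parent of v and of a new leaf q. The dictionary layout, the node names root and p, and the seed parameter are assumptions; the tree figure is not available in this transcript.

```python
import random


def insertion(tidsets, u, v, target_sup, p, q, seed=0):
    """Insert p between u and v so that sup(p) == target_sup.

    tidsets maps a node to T(node), the set of transaction IDs containing it.
    A new leaf q is placed in target_sup - sup(v) transactions of T(u) - T(v).
    """
    sup_u, sup_v = len(tidsets[u]), len(tidsets[v])
    if not sup_v < target_sup < sup_u:
        raise ValueError("insertion needs sup(v) < target-sup < sup(u)")
    rng = random.Random(seed)
    candidates = sorted(tidsets[u] - tidsets[v])
    t_q = set(rng.sample(candidates, target_sup - sup_v))
    updated = dict(tidsets)       # T(u) and T(v) are left untouched
    updated[q] = t_q              # new leaf
    updated[p] = tidsets[v] | t_q # new internal node: parent of v and q
    return updated


# Hypothetical starting point: beer (support 2) under a root of support 5.
tids = {"root": {1, 2, 3, 4, 5}, "beer": {4, 5}}
after = insertion(tids, "root", "beer", 4, p="p", q="p1")
print(len(after["p"]), sorted(after["p1"]))
```

Because T(q) is disjoint from T(v), sup(p) is exactly sup(v) + sup(q), which equals the target support.
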
35
2 ANONYMIZATION WITH TAXONOMY TREE
For wine (insertion):

TID   Items               Items after insertion
1     wine                wine, p1
2     cigar, wine         cigar, wine, p1
3     cigar, tea          cigar, tea
4     beer, cigar, wine   beer, cigar, wine
5     beer, tea, wine     beer, tea, wine

[Figure: the 3-bud tree after the insertion, showing nodes y, x, and p1]
36
2 ANONYMIZATION WITH TAXONOMY TREE
  • Alter the k-bud tree and modify T simultaneously
    to achieve k-support anonymity
  • Insertion
  • Split (Ex.)
  • Increase

Split applies when target-sup < sup(v):
  • p: randomly select target-sup transactions from T(v)
  • q: receives the remaining transactions, so that T(p) = T(v) - T(q)
  • T(x) is the set of transactions containing the item x.
  • sup(v) should not be changed.
  • The split operation can raise leaf nodes up to
    internal nodes!
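
A sketch of the split operation in the same transaction-ID-set representation. The function signature and the seed parameter are assumptions; the example splits wine into two new leaves as in the worked example of the slides, although the random choice of transactions may differ from the one shown there.

```python
import random


def split(tidsets, v, target_sup, p, q, seed=0):
    """Split leaf v into children p and q with sup(p) == target_sup.

    v keeps its transaction set and becomes an internal node.
    """
    if not 0 < target_sup < len(tidsets[v]):
        raise ValueError("split needs 0 < target-sup < sup(v)")
    rng = random.Random(seed)
    t_p = set(rng.sample(sorted(tidsets[v]), target_sup))
    updated = dict(tidsets)
    updated[p] = t_p               # child with the target support
    updated[q] = tidsets[v] - t_p  # child holding the remaining transactions
    return updated


tids = {"wine": {1, 2, 4, 5}}
after = split(tids, "wine", 3, p="p2", q="p3")
print(sorted(after["p2"]), sorted(after["p3"]))
```

In the transactions, wine is replaced by p2 or p3, and the taxonomy records wine as their parent, so sup(wine) is unchanged.
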
37
2 ANONYMIZATION WITH TAXONOMY TREE
For wine (insertion), then for cigar (split):

TID   Items after insertion   Items after split
1     wine, p1                p1, p2
2     cigar, wine, p1         cigar, p1, p3
3     cigar, tea              cigar, tea
4     beer, cigar, wine       beer, cigar, p2
5     beer, tea, wine         beer, tea, p2

[Figure: the 3-bud tree after the insertion and the split, showing
nodes y, x, p1, p2, and p3]
38
2 ANONYMIZATION WITH TAXONOMY TREE
  • Alter the k-bud tree and modify T simultaneously
    to achieve k-support anonymity
  • Insertion
  • Split
  • Increase (Ex.)

Increase applies when sup(v) < target-sup:
  • Randomly select target-sup - sup(v) transactions
    from T(u) - T(v) and add v to them.
  • sup(v) should not be changed once v belongs to an
    anonymous group, so the Increase operation is applicable
    only to a node that does not belong to any anonymous group!
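
A sketch of the increase operation, again on transaction-ID sets. It assumes that v is a child of u, so that adding v to more transactions of T(u) leaves sup(u) unchanged; the function signature and seed parameter are illustrative.

```python
import random


def increase(tidsets, u, v, target_sup, seed=0):
    """Raise sup(v) to target_sup by adding v to transactions in T(u) - T(v)."""
    sup_v = len(tidsets[v])
    candidates = sorted(tidsets[u] - tidsets[v])
    if not sup_v < target_sup <= sup_v + len(candidates):
        raise ValueError("increase needs sup(v) < target-sup <= sup(u)")
    rng = random.Random(seed)
    extra = set(rng.sample(candidates, target_sup - sup_v))
    updated = dict(tidsets)
    updated[v] = tidsets[v] | extra  # v now appears in the extra transactions
    return updated


# p3 (support 1) is a child of wine (support 4); raise it to support 3.
tids = {"wine": {1, 2, 4, 5}, "p3": {2}}
after = increase(tids, "wine", "p3", 3)
print(sorted(after["p3"]))
```
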
39
2 ANONYMIZATION WITH TAXONOMY TREE
For wine (insertion), then for cigar (split and increase):

TID   Items after split   Items after increase
1     p1, p2              p1, p2, p3
2     cigar, p1, p3       cigar, p1, p3
3     cigar, tea          cigar, tea
4     beer, cigar, p2     beer, cigar, p2
5     beer, tea, p2       beer, tea, p2, p3

[Figure: the 3-bud tree after the insertion, split, and increase, showing
nodes y, x, p1, p2, and p3]
40
2 ANONYMIZATION WITH TAXONOMY TREE
3-support anonymity, after insertion for wine and split and increase for cigar:

TID   Original items      After anonymization   Encrypted items
1     wine                p1, p2, p3            c, d, g
2     cigar, wine         cigar, p1, p3         b, d, g
3     cigar, tea          cigar, tea            b, h
4     beer, cigar, wine   beer, cigar, p2       a, b, c
5     beer, tea, wine     beer, tea, p2, p3     a, c, d, h

[Figure: the final 3-bud tree, showing nodes y, x, p1, p2, and p3]
41
OUTLINE
  • Preliminary: Frequent Itemset Mining
  • Motivation
  • Privacy Model: k-Support Anonymity
  • Algorithm
  • Performance Studies
  • Conclusion

42
PERFORMANCE STUDIES
  • Data sets
  • Retail dataset
  • 88162 transactions with 2117 different items
  • T10I1kD100k dataset
  • 100k transactions with 1000 different items
  • Security
  • Against precise item support attacks
  • Against precise itemset support attacks
  • Storage overhead
  • Execution efficiency

43
SECURITY
  • Against precise item support attacks
  • Item accuracy: the ratio of items being
    re-identified
  • DB accuracy: the average ratio of items in a
    transaction being re-identified
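
One way to compute the two measures is sketched below. The slides give only the one-line definitions, so the exact formulas, the function names, and the sample attacker guess are assumptions made for illustration.

```python
def item_accuracy(guess, truth):
    """Fraction of cipher items whose guessed real item is correct."""
    return sum(1 for c, real in truth.items() if guess.get(c) == real) / len(truth)


def db_accuracy(encrypted, guess, truth):
    """Average, over transactions, of the fraction of items guessed correctly."""
    ratios = []
    for t in encrypted:
        correct = sum(1 for c in t if guess.get(c) == truth[c])
        ratios.append(correct / len(t))
    return sum(ratios) / len(ratios)


# True mapping of the simple substitution example and a partial attacker guess.
TRUTH = {"a": "wine", "b": "beer", "c": "cigar", "d": "tea"}
GUESS = {"a": "wine", "c": "cigar"}
ENCRYPTED = [{"a"}, {"a", "c"}, {"c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}]

print(item_accuracy(GUESS, TRUTH))
print(db_accuracy(ENCRYPTED, GUESS, TRUTH))
```

DB accuracy weights frequent items more heavily than item accuracy does, since a frequent item that is re-identified is exposed in many transactions.
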

[Figure: results on (a) the Retail dataset and (b) the T10I1kD100k dataset]
44
SECURITY
  • Against precise itemset support attacks
  • Item accuracy: the ratio of items being
    re-identified
  • DB accuracy: the average ratio of items in a
    transaction being re-identified

[Figure: results on (a) the Retail dataset and (b) the T10I1kD100k dataset]
45
STORAGE OVERHEAD
EXECUTION EFFICIENCY
46
SUMMARY
  • We proposed k-support anonymity to enhance the
    privacy protection in outsourcing of frequent
    itemset mining (FIM).
  • For storage efficiency, we transformed FIM to
    GFIM, and proposed a taxonomy-based anonymization
    algorithm.
  • Our method allows the data owner to obtain the
    real frequent itemsets in 1 scan of the returned
    results.
  • Experimental results on both real and synthetic
    data sets showed that our method can achieve very
    good privacy protection with moderate storage
    overhead.

47
Q & A