1
Preservation of Proximity Privacy in Publishing
Numerical Sensitive Data
  • J. Li, Y. Tao, and X. Xiao
  • SIGMOD 08
  • Presented by Hongwei Tian

2
Outline
  • What is PPDP
  • Existing Privacy Principles
  • Proximity Attack
  • (e, m)-anonymity
  • Determine e and m
  • Algorithm
  • Experiments and Conclusion

3
Privacy Preserving Data Publishing
  • A true story in Massachusetts, 1997
  • GIC released "anonymized" medical records of state employees
  • A voter registration list was bought for 20 dollars
  • Linking the two re-identified Governor Weld's medical records

4
PPDP
  • Privacy
  • Sensitive information of individuals should be
    protected in the published data
  • More anonymized data
  • Utility
  • The published data should be useful
  • More accurate data

5
PPDP
  • Anonymization Technique
  • Generalization
  • Specific value -> general value
  • Maintain the semantic meaning
  • 78256 -> 7825*, UTSA -> University, 28 -> [20, 30]
  • Perturbation
  • One value -> another random value
  • Huge information loss -> poor utility
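
A minimal sketch in Python of the two generalization examples above; the function names and the interval width are illustrative, not from the paper.

def generalize_zipcode(zipcode, keep_digits=4):
    # 78256 -> "7825*": keep a prefix, mask the remaining digits
    return zipcode[:keep_digits] + "*" * (len(zipcode) - keep_digits)

def generalize_age(age, width=10):
    # 28 -> "[20, 30)": replace the exact age with a coarser interval
    lo = (age // width) * width
    return "[%d, %d)" % (lo, lo + width)

print(generalize_zipcode("78256"))   # 7825*
print(generalize_age(28))            # [20, 30)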

6
PPDP
  • Example of Generalization

7
Some Existing Privacy Principles
  • Generalization
  • SA Categorical
  • k-anonymity
  • l-diversity, (a, k)-anonymity, m-invariance,
  • (c, k)-safety, Skyline-privacy
  • SA Numerical
  • (k, e)-anonymity, Variance Control
  • t-closeness
  • d-presence

8
Next
  • What is PPDP
  • Existing Privacy Principles
  • Proximity Attack
  • (e, m)-anonymity
  • Determine e and m
  • Algorithm
  • Experiments and Conclusion

9
Proximity Attack
10
(e, m)-anonymity
  • I(t)
  • private neighborhood of tuple t
  • Absolute: I(t) = [t.SA - e, t.SA + e]
  • Relative: I(t) = [t.SA·(1 - e), t.SA·(1 + e)]
  • P(t)
  • the risk of proximity breach of tuple t
  • P(t) = x / |G|, where x = number of tuples in G whose SA falls in I(t)
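
A small Python sketch of these two definitions, assuming an equivalence class G is given as the list of its SA values (the function names are mine, not the paper's).

def neighborhood(sa, e, relative=False):
    # Absolute: I(t) = [t.SA - e, t.SA + e]; relative: I(t) = [t.SA*(1 - e), t.SA*(1 + e)]
    return (sa * (1 - e), sa * (1 + e)) if relative else (sa - e, sa + e)

def breach_risk(sa, group_sas, e, relative=False):
    # P(t) = x / |G|, where x counts the SA values of G that fall in I(t)
    lo, hi = neighborhood(sa, e, relative)
    x = sum(1 for v in group_sas if lo <= v <= hi)
    return x / len(group_sas)

# Illustrative values matching the next slide: three of the four SA values
# fall in I(t1) = [980, 1020], so P(t1) = 3/4.
print(breach_risk(1000, [1000, 990, 1015, 2000], e=20))   # 0.75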

11
(e, m)-anonymity
  • e = 20
  • I(t1) = [980, 1020]
  • x = 3, |G| = 4
  • P(t1) = 3/4

12
(e, m)-anonymity
  • Principle
  • Given a real value e and an integer m ≥ 1, a
    generalized table T fulfills absolute (relative)
    (e, m)-anonymity if
  • P(t) ≤ 1/m
  • for every tuple t ∈ T
  • Larger e and m mean a stricter privacy requirement

13
(e, m)-anonymity
  • What is the meaning of m?
  • |G| ≥ m
  • The best situation is that, for any two tuples ti and
    tj in G, tj.SA ∉ I(ti) and ti.SA ∉ I(tj)
  • Similar to l-diversity, where the equivalence class
    has l tuples with distinct SA values

14
(e, m)-anonymity
  • How to ensure that tj.SA does not fall in I(ti)?
  • All tuples in G are sorted in ascending order of
    their SA values
  • j - i ≥ max{left(tj, G), right(ti, G)}

15
(e, m)-anonymity
  • Let maxsize(G) =
  • max over t ∈ G of max{left(t, G), right(t, G)}
  • j - i ≥ maxsize(G)
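
A sketch of maxsize(G) for the absolute case. The slides do not spell out left/right; here I assume left(t, G) counts the tuples at or below t.SA that lie within e of it (including t itself), and right(t, G) symmetrically above; with that reading, maxsize(G) = 2 in the example two slides ahead.

def left(i, sas, e):
    # sas is sorted ascending; count values in sas[0..i] that fall in [sas[i] - e, sas[i]]
    return sum(1 for v in sas[: i + 1] if v >= sas[i] - e)

def right(i, sas, e):
    # count values in sas[i..] that fall in [sas[i], sas[i] + e]
    return sum(1 for v in sas[i:] if v <= sas[i] + e)

def maxsize(sas, e):
    # maxsize(G) = max over all tuples t of max{left(t, G), right(t, G)}
    sas = sorted(sas)
    return max(max(left(i, sas, e), right(i, sas, e)) for i in range(len(sas)))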

16
(e, m)-anonymity
  • Partitioning
  • Ascending order of tuples in G according to SA
    values
  • Hash the ith tuple into the jth bucket using the
    function j = (i mod maxsize(G)) + 1
  • Thus, all tuples (SA values) in the same bucket
    do not fall into the neighborhood of each other.
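
A sketch of this bucket hashing (0-indexed here; the slide's formula j = (i mod maxsize(G)) + 1 is 1-indexed). It reuses maxsize() from the previous sketch.

def partition(sas, e):
    # Sort by SA, then send the i-th value to bucket (i mod maxsize(G)).
    # Members of one bucket are >= maxsize(G) positions apart in the sorted
    # order, so no member falls into another member's neighborhood.
    sas = sorted(sas)
    g = maxsize(sas, e)
    buckets = [[] for _ in range(g)]
    for i, v in enumerate(sas):
        buckets[i % g].append(v)
    return buckets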

17
(e, m)-anonymity
  • (6, 2)-anonymity
  • Privacy is breached
  • P(t3) = 3/4 > 1/m = 1/2
  • Partitioning is needed
  • The tuples are already in ascending order of their SA
    values
  • g = maxsize(G) = 2
  • j = (i mod 2) + 1
  • New P(t3) = 1/2

tupleNo   QI   SA
1         q    10
2         q    20
3         q    25
4         q    30
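
Running the earlier sketches on this example (e = 6, m = 2) reproduces the numbers on the slide:

group = [10, 20, 25, 30]
print(breach_risk(25, group, e=6))   # 0.75 -> P(t3) = 3/4 > 1/2, breached
print(maxsize(group, e=6))           # 2
print(partition(group, e=6))         # [[10, 25], [20, 30]] -> P = 1/2 in each bucket
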
18
Determine e and m
  • Given e and m
  • Check if an equivalence class G satisfies (e,
    m)-anonymity
  • Theorem: G has at least one (e, m)-anonymous
    generalization iff ⌊|G| / maxsize(G)⌋ ≥ m
  • Scan the sorted tuples in G to find maxsize(G)
  • Predict whether G can be partitioned or not
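
A sketch of this check, under the assumption (consistent with the partitioning above) that the condition is ⌊|G| / maxsize(G)⌋ ≥ m, i.e. the modular hashing can produce buckets of at least m tuples.

def can_anonymize(sas, e, m):
    # Predict, without generalizing, whether G can be partitioned into
    # (e, m)-anonymous buckets: the smallest bucket produced by the modular
    # hashing holds floor(|G| / maxsize(G)) tuples, and it must hold >= m.
    return len(sas) // maxsize(sas, e) >= m

print(can_anonymize([10, 20, 25, 30], e=6, m=2))   # True: floor(4 / 2) = 2 >= 2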

19
Algorithm
  • Step 1 Splitting
  • Mondrian, ICDE 2006
  • Splitting is only based on QI-attributes
  • Iteratively find the median of the frequency set
    on one selected QI-dimension to cut G into G1 and
    G2, making sure G1 and G2 both remain legal to
    partition (see the sketch below)
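
A minimal sketch of the splitting step, assuming each tuple is a dict with numeric QI attributes plus an "SA" key, and using can_anonymize() from the previous sketch as the legality test; the real Mondrian heuristic also picks the split dimension by normalized range, which is omitted here.

def split(group, qi_attrs, e, m):
    for attr in qi_attrs:                       # try each QI dimension in turn
        values = sorted(t[attr] for t in group)
        median = values[len(values) // 2]       # median of the frequency set
        g1 = [t for t in group if t[attr] < median]
        g2 = [t for t in group if t[attr] >= median]
        sa = lambda g: [t["SA"] for t in g]
        if g1 and g2 and can_anonymize(sa(g1), e, m) and can_anonymize(sa(g2), e, m):
            # both halves stay legal to partition, so recurse on each of them
            return split(g1, qi_attrs, e, m) + split(g2, qi_attrs, e, m)
    return [group]                              # no legal cut exists: stop splitting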

20
Algorithm
  • Splitting ((6, 2)-anonymity)

(Figure: a group of tuples with SA values 10, 40, 20, 25, 50, 30 being split on a QI attribute)
21
Algorithm
  • Step 2 Partitioning
  • After step 1 stops
  • Check all G produced by splitting
  • Release directly if G satisfies (e, m)-anonymity
  • Otherwise, Partitioning, and then release new
    buckets
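
A sketch of Step 2 on top of the earlier sketches: each class coming out of the splitting step is released as-is if it already satisfies (e, m)-anonymity, and is bucket-partitioned first otherwise.

def release(groups, e, m):
    published = []
    for g in groups:
        sas = [t["SA"] for t in g]
        if all(breach_risk(v, sas, e) <= 1.0 / m for v in sas):
            published.append(sas)                  # already (e, m)-anonymous
        else:
            published.extend(partition(sas, e))    # partition, then release the buckets
    return published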

22
Algorithm
  • Partitioning ((6, 2)-anonymity)

(Figure: the same tuples, SA values 10, 40, 20, 25, 50, 30, hashed into buckets)
23
Next
  • What is PPDP
  • Evolution of Privacy Preservation
  • Proximity Attack
  • (e, m)-anonymity
  • Determine e and m
  • Algorithm
  • Experiments and Conclusion

24
Experiments
  • Real database SAL, from http://ipums.org
  • Attributes are Age, Birthplace, Occupation, and
    Income, with domains [16, 93], [1, 710], [1, 983],
    and [1k, 100k], respectively
  • 500K tuples
  • Compared to a perturbation method (OLAP, SIGMOD
    2005)

25
Experiments - Utility
  • Use count queries with a workload of 1,000 queries
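
A sketch of the kind of utility measurement this implies; the estimator below (uniform spread over the generalized interval) and the error metric are my assumptions, not necessarily the paper's exact setup.

def estimated_count(intervals, q_lo, q_hi):
    # intervals: one generalized (lo, hi) range per published tuple for the
    # queried attribute; each tuple contributes the fraction of its range
    # that overlaps the query range (uniform-distribution assumption).
    total = 0.0
    for lo, hi in intervals:
        if hi > lo:
            total += max(0.0, min(hi, q_hi) - max(lo, q_lo)) / (hi - lo)
        elif q_lo <= lo <= q_hi:                 # exact (un-generalized) value
            total += 1.0
    return total

def relative_error(true_count, est_count):
    return abs(est_count - true_count) / max(true_count, 1)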

26
Experiments - Utility
27
Experiments - Efficiency
28
Conclusion
  • Discussed most of the existing privacy principles in
    PPDP
  • Identified the proximity attack and proposed (e,
    m)-anonymity to prevent it
  • Verified experimentally that the method is effective
    and efficient

29
Any Questions?