Privacy-oriented Data Mining by Proof Checking

Slides: 26
Provided by: Inv81

1
Privacy-oriented Data Mining by Proof Checking
  • Stan Matwin
  • (joint work with Amy Felty)
  • SITE
  • University of Ottawa, Canada
  • stan_at_site.uottawa.ca

2
The TAMALE Group
  • 4 profs
  • Some 30 graduate students
  • Areas: machine learning, data mining, text
    mining, NLP, data warehousing
  • Research in:
  • Inductive Logic Programming
  • Text mining
  • Learning in the presence of knowledge
  • Applications of ML/DM (e.g. in SE: tools for
    maintenance personnel)

3
  • Why did I get into this research?
  • what is already being done and why it's not
    enough
  • the main idea
  • its operation
  • discussion: correctness
  • prototype - Coq and CIC
  • example
  • some technical challenges
  • acceptance?

4
Some useful concepts...
  • opting out vs opting in
  • Use Limitation Principle (ULP): data should be used
    only for the explicit purpose for which it has
    been collected

5
and existing technical proposals
  • On the web: P3P (Platform for Privacy Preferences)
  • W3C standard
  • XML specifications, on websites and in browsers,
    of what can be collected and for what purpose
    (ULP?)
  • Handles cookies
  • A data exchange protocol more than a privacy
    protocol: no provisions for opting out after an
    initial opt-in
  • the ULP part is in NL (natural language): not verifiable

6
Agrawal's data perturbation transformations
  • data is perturbed by random distortion:
    x_i → x_i + r
  • r: uniform or Gaussian
  • a procedure to reconstruct a PAC-estimation of
    the original distribution (but not the values)
  • a procedure to build an accurate decision tree on
    the perturbed distribution
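
The perturbation step above can be sketched in a few lines. This is only an illustration, not the reconstruction procedure from the slide: the attribute (`ages`), the noise width (`spread`), and the fixed seed are assumptions made for the example. It shows that individual values are masked while an aggregate such as the mean is roughly preserved, because the added noise has mean zero.

```python
import random
import statistics


def perturb(values, spread, rng):
    """Return each value plus independent uniform noise in [-spread, spread]."""
    return [x + rng.uniform(-spread, spread) for x in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable (assumption)

# Hypothetical numeric attribute; not data from the presentation.
ages = [rng.randint(20, 60) for _ in range(10_000)]
noisy = perturb(ages, spread=15.0, rng=rng)

# Each released value differs from the original by up to `spread`,
# yet the sample mean barely moves because the noise averages out.
mean_gap = abs(statistics.mean(ages) - statistics.mean(noisy))
```

Reconstructing the full original distribution, as the slide describes, needs more than comparing means; the sketch only illustrates why distribution-level information can survive the distortion.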

7
Agrawal's transformations (cont'd)
  • proposes a measure to quantify privacy: estimate
    intervals and their size
  • lately extended to non-numerical attributes, and
    to association rules
  • does not address the ULP
  • how do we know it is applied?

8
The main idea: towards a verifiable ULP
  • User sets permissions: what can and cannot be
    done with her data
  • Any claim that a piece of software respects these
    permissions is a proof of a theorem about the
    software
  • Verifying the claim is then checking that proof
    against the software

9
Who are the players?
  • User: C
  • Data miner: Org
  • Data mining software developer: Dev
  • Independent verifier: Veri
  • BUT no one owns the data D

10
  • D: database scheme
  • A: given set of database and data mining operations
  • S: source code for A
  • PC(D,A): C's permissions
  • T(PC,S): theorem that S respects PC
  • R(PC,S): proof of T(PC,S)
  • B: binary code of S

11
Discussion - properties
  • It can be proven that C's permissions are
    respected (or not): PC is in fact a verifiable
    ULP
  • PC can be negative (out) or positive (in)
  • proof construction needs to be done only once for
    a given PC, D and A
  • Scheme is robust against cheating by Dev or Org

12
Acceptance issues
  • No Org will give Veri access to S
  • Too much overhead to check R(PC,S) for each
    task, and each user
  • Too cumbersome for C
  • Based on all Orgs buying in

13
Acceptance 1: Veri's operation - access
  • Veri needs
  • PC from C
  • R(PC,S) from Dev
  • S from Dev
  • B from Org
  • Veri could check R(PC,S) at Dev's
  • Veri needs to verify that S (belonging normally
    to Dev) corresponds to B that Org runs.

14
Acceptance 2: overhead
  • Veri runs proof checking on a control basis
  • Org's execution overhead?

15
Issues
  • Naming the fields: XML or disclosure
  • restricted class of theorems for a given
    P - automating proof techniques for this class

16
Acceptance 3: C's perspective
  • Building PCs must be easy for C, based on D and
    processing schema; initially a closed set?
  • permissions could be encoded on a credit card,
    smart card, in the electronic wallet
  • or in the CA; they can then be dynamically
    modified and revoked

17
"Political" aspects: who is Veri?
  • generally trusted
  • consumer association?
  • Ralph Nader?
  • Transparency International?
  • IT expert at the level of instrumenting and
    running the proof checker; connection to Open
    Software Foundation?
  • theorem proving can be cast as "better testing"

18
how to make Orgs buy in?
  • A first Org is needed to volunteer
  • a "Green Data Mining" logo will be granted and
    administered (verified) by Veri
  • other Orgs will have an incentive to join

19
Future work
  • Build the tools
  • expand the prototype
  • extend from Weka to commercial data mining
    packages
  • Integrate with P3P?
  • find a willing Org

21
Link between S and B
  • compilation not an option
  • watermarking solution: B is watermarked by a
    slightly modified compiler with MD5(tar(S)), a
    128-bit digest
  • marks are inserted by a trusted
    makefile-and-compiler in locations in B given by
    Veri and unknown to Org

22
Link
  • Veri, given access to S, can verify that B
    corresponds to S
  • An attack by I requires hacking the compiler
  • An attack by Org requires knowing the locations
    of watermarks

23
Example
  • C restricts her Employee data from participating
    in a join with her Payroll data
  • Record Payroll : Set :=
    mkPay {PID : nat; JoinInd : bool; Position : string; Salary : nat}.
  • Record Employee : Set :=
    mkEmp {Name : string; EID : nat}.
  • Record Combined : Set :=
    mkComb {CID : nat; CName : string; Csalary : nat}.

24
  • Fixpoint Join [Ps : (list Payroll)] : (list Employee)
    -> (list Combined) :=
  • [Es : (list Employee)]
  • Cases Ps of
  • nil => (nil Combined)
  • | (cons p ps) => (app (check_JoinInd_and_find_employee_record p Es)
  • (Join ps Es))
  • end.
  • (check_JoinInd_and_find_employee_record p Es):
    if a record is found in Es whose EID matches p's
    PID and JoinInd permits the join, then a list of
    length 1 with the result of the join is returned,
    otherwise the empty list
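
For readers less familiar with Coq, the behaviour of Join described above can be mirrored in Python. This is a sketch under assumptions: the slides do not show the body of check_JoinInd_and_find_employee_record, so the way a Combined record is filled in (CID from PID, CName from Name, Csalary from Salary) is a guess, and the sample rows are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payroll:
    pid: int
    join_ind: bool
    position: str
    salary: int


@dataclass(frozen=True)
class Employee:
    name: str
    eid: int


@dataclass(frozen=True)
class Combined:
    cid: int
    cname: str
    csalary: int


def check_join_ind_and_find_employee_record(p, es):
    """Return a one-element list if p may be joined and has a match, else []."""
    if not p.join_ind:
        return []  # the user opted this record out of the join
    for e in es:
        if e.eid == p.pid:
            return [Combined(p.pid, e.name, p.salary)]  # assumed field mapping
    return []


def join(ps, es):
    """Structural recursion on ps, following the shape of the Coq Fixpoint."""
    if not ps:
        return []
    head, *tail = ps
    return check_join_ind_and_find_employee_record(head, es) + join(tail, es)


# Invented sample data: the second payroll record is opted out.
payroll = [Payroll(1, True, "analyst", 50000), Payroll(2, False, "manager", 70000)]
employees = [Employee("Ann", 1), Employee("Bob", 2)]
result = join(payroll, employees)
```

Here `result` contains a combined record only for PID 1; the record with JoinInd set to false contributes nothing to the output.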

25
  • Definition PC (S : (list Payroll) → (list Employee) → (list
    Combined)) : Prop :=
  • ∀ Ps : list Payroll. ∀ Es : list Employee.
    (UniqueJoinInd Ps) →
  • ∀ P : Payroll. (In P Ps) → ((JoinInd P) = false →
  • ¬ ∃ C : Combined. ((In C (S Ps Es)) ∧ ((CID
    C) = (PID P))))
  • PC(S) is written as (PC Join); Coq expands the
    definition of PC and provides the theorem
  • a request to the proof-checking operator of Coq will
    check this proof, i.e. it will check that the user
    permissions are encoded into the given Join
    program
  • Whole proof: 300 lines of Coq code; proof
    checking: 1 sec on a 600 MHz machine
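
The permission PC can also be read as an executable check, which makes clear what the Coq theorem rules out. The sketch below is bounded testing over a few small tables, not a proof: the Coq proof covers all inputs, while this covers only the ones enumerated. The reading of UniqueJoinInd (no PID appears with two different JoinInd values), the simplified record types, and `careless_join` are all assumptions introduced for illustration.

```python
from collections import namedtuple
from itertools import product

Payroll = namedtuple("Payroll", "pid join_ind salary")
Employee = namedtuple("Employee", "name eid")
Combined = namedtuple("Combined", "cid cname csalary")


def join(ps, es):
    """Simplified join that honours join_ind."""
    return [Combined(p.pid, e.name, p.salary)
            for p in ps if p.join_ind
            for e in es if e.eid == p.pid]


def careless_join(ps, es):
    """Hypothetical non-compliant variant that ignores join_ind."""
    return [Combined(p.pid, e.name, p.salary)
            for p in ps for e in es if e.eid == p.pid]


def unique_join_ind(ps):
    """Assumed meaning of UniqueJoinInd: one JoinInd value per PID."""
    flags = {}
    return all(flags.setdefault(p.pid, p.join_ind) == p.join_ind for p in ps)


def pc(s, ps, es):
    """PC on one input: no opted-out PID shows up as a CID in s(ps, es)."""
    if not unique_join_ind(ps):
        return True  # premise fails, so the implication holds vacuously
    out = s(ps, es)
    return all(c.cid != p.pid for p in ps if not p.join_ind for c in out)


def holds_on_small_inputs(s):
    """Check pc for every payroll table of length 0..2 over four sample rows."""
    rows = [Payroll(pid, flag, 100) for pid in (1, 2) for flag in (True, False)]
    emps = [Employee("Ann", 1), Employee("Bob", 2)]
    tables = [list(t) for n in range(3) for t in product(rows, repeat=n)]
    return all(pc(s, ps, emps) for ps in tables)


compliant = holds_on_small_inputs(join)
non_compliant = holds_on_small_inputs(careless_join)
```

The compliant join passes on every enumerated table, while the variant that ignores JoinInd fails as soon as an opted-out payroll record has a matching employee. The UniqueJoinInd premise matters: without it, a second record with the same PID and JoinInd set to true would legitimately produce that CID.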