Transcript and Presenter's Notes

Title: Privacy in Cyberspace


1
Privacy in Cyberspace
  • George Cybenko
  • Thayer School of Engineering
  • Dartmouth College
  • gvc@dartmouth.edu
  • May 15, 2003

2
Acknowledgements
  • V.S. Subramanian, U Maryland
  • Ric Upton, ALPHATECH
  • My interest: privacy issues arising in the DARPA
    Total Information Awareness Program
  • See http://www.darpa.mil/iao/TIASystems.htm
  • History of the subject: many problem statements,
    few technological or mathematical frameworks.
    Interest will grow with policy and
    national/homeland security needs.

3
Presentation Outline
  • Concepts of privacy
  • Government, workplace, consumer
  • Examples
  • Mathematical formulations and models
  • Suppression
  • Generalization
  • Logical Models
  • Conclusions

4
Concepts of Privacy
  • 100 years ago
  • People were anonymous 50 miles from home
  • but had no secrets from their neighbors
  • Today
  • Bankers in Singapore know all your financial
    information
    but you don't know who your neighbors are!!!
  • Expectations and technology shape our concepts of
    privacy, e.g., its necessity for democracy and
    human dignity.

5
Three aspects of privacy
  • Secrecy
  • limiting the dissemination of knowledge about
    oneself
  • Anonymity
  • protection from undesired attention
  • Solitude
  • freedom from unwanted proximity to others

6
Downsides of Loss of Privacy
  • Identity theft
  • Fraud
  • Profiling
  • law enforcement
  • life and medical insurance
  • marketing spam
  • employment
  • political beliefs

7
Upsides of Loss of Privacy?
  • More efficient commerce
  • Profiling
  • homeland security
  • lower crime rates
  • lower insurance premiums for some
  • targeted marketing
  • fewer workplace problems
  • more harmonious society

8
Privacy and Government
  • Amendment IV (United States)
  • The right of the people to be secure in their
    persons, houses, papers, and effects, against
    unreasonable searches and seizures, shall not be
    violated, and no warrants shall issue, but upon
    probable cause, supported by oath or affirmation,
    and particularly describing the place to be
    searched, and the persons or things to be seized.
  • E.g., the US government's monitoring of domestic
    communications (phone, email, US mail) requires a
    court order
  • Restrictions on US Census data publication

9
(No Transcript)
10
Privacy in the Workplace
  • "no legal requirement to inform employees of
    which monitoring devices have been installed on
    their computer systems and which activities are
    subject to such monitoring" (in the US)
  • Cyberethics, Richard A. Spinello, 2003, page 185
  • Even keystroke logging is OK, for example.
  • European laws are significantly stricter.

11
Privacy in Commerce
  • No US regulation on data aggregation
  • pending legislation to allow consumers to opt
    out
  • Great interest for targeted marketing
  • 14 billion catalogs were mailed with an average
    cost of $0.70 each. Ninety-eight percent of those
    catalog mailings did NOT result in a sale.
  • A recent Harvard Business Journal study indicated
    that a 5% increase in customer retention can
    generate an incremental gain in profitability of
    more than 25%.
  • See www.abacus-direct.com

12
Health Insurance Portability and Accountability
Act of 1996 (HIPAA)
  • Improve the portability and continuity of health
    insurance coverage in the individual and group
    markets
  • Combat fraud and abuse in health insurance and
    healthcare delivery
  • Promote the use of medical savings accounts
  • Improve access to long-term care services and
    coverage
  • Simplify the administration of health insurance

13
HIPAA Privacy
  • Access To Medical Records. Patients generally
    should be able to see and obtain copies of their
    medical records
  • Notice of Privacy Practices. Covered health
    plans, doctors and other health care providers
    must provide their patients a notice of how they
    may use personal medical information and of their
    rights under the new privacy regulation.
  • Limits on Use of Personal Medical Information.
    The privacy rule sets limits on how health plans
    and covered providers may use individually
    identifiable health information.
  • Prohibition on Marketing. The final privacy rule
    sets new restrictions and limits on the use of
    patient information for marketing purposes.
  • Stronger State Laws. The new federal privacy
    standards do not affect state laws that provide
    additional privacy protections for patients.
  • Confidential communications. Under the privacy
    rule, patients can request that their doctors,
    health plans and other covered entities take
    reasonable steps to ensure that their
    communications with the patient are confidential.
  • Complaints. Consumers may file a formal complaint
    regarding the privacy practices of a covered
    health plan or provider.

14
Privacy in Commerce: Cookies and Web Bugs
Cookies: files that a web browser allows a web
site to write to your drive, readable only by the
same domain.
Web bug: an invisible image embedded in web pages
and web-enabled email, e.g.
<IMG SRC="http://click.myvirtualdeals.com/sp/t.pl?id=4564085245904&o=1"
  BORDER=0 WIDTH=0 HEIGHT=0>
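For concreteness, a minimal sketch of how a domain-scoped cookie is
set, using Python's standard http.cookies module; the visitor ID and
domain are illustrative, echoing the web-bug URL above:

```python
# Minimal sketch: emit a Set-Cookie header scoped to one domain.
# Only pages served from that domain can read the cookie back.
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["visitor_id"] = "4564085245904"            # illustrative ID
cookie["visitor_id"]["domain"] = ".myvirtualdeals.com"
cookie["visitor_id"]["path"] = "/"

# A web server would send this line in its HTTP response:
print(cookie.output())
# e.g. Set-Cookie: visitor_id=4564085245904; Domain=.myvirtualdeals.com; Path=/
```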
15
Possible Cookie Web Bug Scenario
(No Transcript)
16
Individual Privacy - California
  • Cal. Penal Code §§ 631, 632: It is a crime in
    California to intercept or eavesdrop upon any
    confidential communication, including a telephone
    call or wire communication, without the consent
    of all parties.
  • However, a television network that used a hidden
    camera to videotape a conversation that took
    place at a business lunch meeting on a crowded
    outdoor patio of a public restaurant, and that did
    not include "secret" information, did not violate
    the Penal Code's prohibition against eavesdropping
    because it was not a "confidential communication."
  • http://www.rcfp.org/taping/

17
Individual Privacy - Vermont
  • There is no legislation specifically addressing
    interception of communications in Vermont, but
    the state's highest court has held that
    surreptitious electronic monitoring of
    communications in a person's home is an unlawful
    invasion of privacy. Vermont v. Geraw, 795 A.2d
    1219 (Vt. 2002); Vermont v. Blow, 602 A.2d 552
    (Vt. 1991).
  • The state's highest court, however, also has
    refused to find the overhearing of a conversation
    in a parking lot unlawful because that
    conversation was "subject to the eyes and ears of
    passersby." Vermont v. Brooks, 601 A.2d 963 (Vt.
    1991).
  • http://www.rcfp.org/taping/

18
Examples of Privacy Leakage
  • Domestic/business trash - dumpster diving
  • Computer hard drive contents
  • Gateway currently has a promotion in which they
    will exchange a new computer for your old one;
    what about the data?
  • Jan/Feb 2003 IEEE Security and Privacy article by
    Simson Garfinkel about the used hard drive market
  • Thermal imaging
  • MS Office kill buffers in documents

19
End of Part I
  • Questions, Comments?
  • Next: Mathematical and Computational Aspects

20
Mathematical Formulations of Privacy
  • Each individual is a virtual record:

name      birthdate   spouse    ZIP    medical    gender
John Doe  05/12/1974  Jane Doe  03755  allergies  M

Quasi-identifier: birthdate, ZIP, gender
This quasi-identifier is unique for 87% of US
citizens and can be cross-referenced with voter
registration lists. (L. Sweeney, see
http://www.heinz.cmu.edu/researchers/archive/00nov.html)
Definition: A quasi-identifier can be used to
identify most individuals through cross-referencing
(joining) with some other known databases. (A quick
empirical test is sketched below.)
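A small sketch of how one might test a candidate quasi-identifier
empirically: measure the fraction of records made unique by the
chosen columns. The record layout and column names are hypothetical.

```python
# Sketch: fraction of records uniquely identified by a column subset.
from collections import Counter

def unique_fraction(records, columns):
    """Fraction of records whose projection onto `columns` is unique."""
    counts = Counter(tuple(r[c] for c in columns) for r in records)
    unique = sum(1 for r in records
                 if counts[tuple(r[c] for c in columns)] == 1)
    return unique / len(records)

# Sweeney's 87% figure says that, on US census-scale data,
# unique_fraction(records, ("birthdate", "zip", "gender"))
# would come out near 0.87.
```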
21
Quasi-identifiers (Sweeney)
  • A subset of fields that can uniquely identify
    most individuals.
  • GEORGE V CYBENKO
  • 8 WHEELOCK STREET
  • ETNA NH
  • 03753
  • (603)643-1441 
  • 1953
  • GEORGE CYBENKO 
  • DOGFORD RD
  • ETNA NH
  • 03753 
  • (603)643-6269  
  • GEORGE CYBENKO 
  • PO BOX 48
  • HANOVER NH
  • 03755  
  • April 1953

Will the real George Cybenko please stand up? A
nontrivial problem for health care systems with
many clinics/branches is to make sure all
records for an individual are aggregated. Not
adversarial!!
22
Quasi-identifiers
  • Problem statement
  • Do not publish data with quasi-identifiers
  • Assumptions
  • other data sets available for joining
  • minimal bound on allowed anonymity
  • Approaches (a sketch of suppression follows below)
  • suppress data below the minimal anonymity bound
  • generalize reported categories to achieve the minimal
    anonymity bound
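A minimal sketch of the suppression approach, assuming tabular
records and known quasi-identifier positions (both hypothetical
here): withhold every record whose quasi-identifier group falls
below the anonymity bound k.

```python
# Sketch: suppress records whose quasi-identifier combination is
# shared by fewer than k records (the minimal anonymity bound).
from collections import Counter

def suppress_below_bound(records, quasi_ids, k):
    qi = lambda r: tuple(r[i] for i in quasi_ids)
    counts = Counter(qi(r) for r in records)
    return [r for r in records if counts[qi(r)] >= k]

rows = [("03755", "1974", "M"), ("03755", "1974", "M"),
        ("61820", "1974", "F")]
print(suppress_below_bound(rows, quasi_ids=(0, 1, 2), k=2))
# keeps only the two matching 03755 records
```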

23
Quasi-identifiers
  • Who cares about this?
  • Health care providers (HIPAA)
  • Homeland Security (TIA)
  • Census Bureau
  • Financial institutions
  • etc

24
Roadmap
  • Detection
  • Does a possible privacy compromise exist?
  • Is a given protection scheme OK?
  • Protection
  • Suppress or generalize data
  • If existing suppression is not OK, how can it be
    extended?
  1. Protection via cell suppression (US Census,
    Denning)
  2. Detection of suppressed cell invariants
    (Gusfield, Kao)
  3. Protection via attribute generalization (Sweeney)
  4. Protection via nonlinear programming (Cybenko)
  5. A logical framework for privacy leakage with
    background knowledge (Subramanian and Cybenko)

25
Suppression in Cross-Tabulations
name      birthdate   spouse    ZIP    medical    gender
John Doe  05/12/1974  Jane Doe  03755  allergies  M

Quasi-identifier: birthdate, ZIP, gender

        03750  03751  03754  03755  Totals
1970      23     43     18      4      88
1971      12     25     13     10      60
1972      14     33     29      2      78
1973      12     14      3      8      37
Totals    61    115     63     24     263

Allergies by Age and Zipcode
26
Suppression in Cross-Tabulations (Gusfield, Kao,
et al)

        03750       03751       03754       03755      Totals
1970    23 (0,30)   43 (0,50)   18 (0,23)    4 (0,4)     88
1971    12 (12,25)  25 (0,30)   13 (0,15)   10 (0,15)    60
1972    14 (0,17)   33 (0,40)   29 (0,35)    2 (0,20)    78
1973    12 (0,20)   14 (0,30)    3 (0,23)    8 (0,12)    37
Totals  61          115         63          24          263

Allergies by Age and Zipcode
(a,b) is the (lower, upper) bound on a cell value
27
Suppression in Cross-Tabulations (Gusfield, Kao,
et al)
        03750       03751       03754       03755      Totals
1970    23 (0,30)   43 (0,50)   18 (0,23)    4 (0,4)     88
1971    12 (12,25)  25 (25,35)  13 (0,15)   10 (0,15)    60
1972    14 (0,17)   33 (0,40)   29 (0,35)    2 (0,2)     78
1973    12 (0,20)   14 (4,30)    3 (3,23)    8 (0,12)    37
Totals  61          115         63          24          263

(a,b) is the (lower, upper) bound on a cell value c
28
Gusfield, 1988: A cell is invariant iff its edge
is NOT contained in a traversable cycle.

        03750       03751       03754       03755      Totals
1970    23 (0,30)   43 (0,50)   18 (0,23)    4 (0,4)     88
1971    12 (12,15)  25 (25,35)  13 (0,15)   10 (0,15)    60
1972    14 (0,17)   33 (0,40)   29 (0,35)    2 (0,2)     78
1973    12 (0,20)   14 (4,30)    3 (3,23)    8 (0,12)    37
Totals  61          115         63          24          263

(a,b) is the (lower, upper) bound on a cell value c
29
Discussion
  • Invariance of a cell can be determined by solving
    a linear program and determining whether the
    solution is unique (see the LP sketch below)
  • m is the number of suppressed cells
  • n is the number of rows and columns
  • 2n upper and lower bound inequalities
  • L is the max number of bits to represent cells and
    bounds
  • Linear programming requires O((mn)^1.5 mL)
  • Graph algorithms for invariance: O(mn)
  • Can be extended to determine invariance of linear
    combinations of suppressed cells (Kao)
  • Does not appear to scale to higher-dimensional
    tables; the graph construction is patently 2D
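To make the linear-programming test concrete, here is a small sketch
using SciPy. It is the LP test, not Gusfield's graph algorithm, and
the 2x2 table with all interior cells suppressed is hypothetical: a
suppressed cell is invariant iff its minimum and maximum feasible
values under the published totals coincide.

```python
# Sketch: test invariance of suppressed cells by minimizing and
# maximizing each cell subject to published row/column totals.
from scipy.optimize import linprog

row_totals = [5, 7]
col_totals = [6, 6]

# Variables x = (x00, x01, x10, x11); equality constraints Ax = b.
A_eq = [[1, 1, 0, 0],   # row 0 sum
        [0, 0, 1, 1],   # row 1 sum
        [1, 0, 1, 0],   # col 0 sum
        [0, 1, 0, 1]]   # col 1 sum
b_eq = row_totals + col_totals
bounds = [(0, None)] * 4

def feasible_range(cell):
    """Min and max feasible values of one suppressed cell."""
    c = [0.0] * 4
    c[cell] = 1.0
    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
    hi = -linprog([-v for v in c], A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds).fun
    return lo, hi

for cell in range(4):
    lo, hi = feasible_range(cell)
    print(f"x{cell}: range [{lo:.0f}, {hi:.0f}]",
          "invariant" if abs(hi - lo) < 1e-9 else "not invariant")
```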

30
Generalization in Cross-Tabulations
name      birthdate   spouse    ZIP    medical    gender
John Doe  05/12/1974  Jane Doe  03755  allergies  M

Quasi-identifier: birthdate, ZIP, gender

        03750  03751  03754  03755  Totals
1970      23     43     18      4      88
1971      12     25     13     10      60
1972      14     33     29      2      78
1973      12     14      3      8      37
Totals    61    115     63     24     263

        03750  03751  03754-5  Totals
1970      23     43      22      88
1971      12     25      23      60
1972      14     33      31      78
1973      12     14      11      37
Totals    61    115      63     263

Requirement: 6-anonymity
Solution: generalize categories
31
K Anonymity (L. Sweeney)
  • Records have attributes A1, A2, ..., An
  • Without loss of generality, attributes 1 to m are
    quasi-identifiers
  • A record has the k-anonymity property if at least
    k-1 other records have the same quasi-identifier
    attribute values as that record
  • A collection of records is k-anonymous if every
    record is k-anonymous
  • Problem: Given a collection of records, k > 1,
    and the quasi-identifiers, produce a k-anonymous
    derived set of records. (A checker is sketched below.)
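A direct sketch of this definition; the records mirror the table on
the next slide, and the field positions are my choice:

```python
# Sketch: check k-anonymity with respect to given quasi-identifiers.
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True iff every record shares its quasi-identifier values
    with at least k-1 other records."""
    counts = Counter(tuple(r[i] for i in quasi_ids) for r in records)
    return all(counts[tuple(r[i] for i in quasi_ids)] >= k
               for r in records)

# The 5-record table from the next slide; quasi-identifier =
# (birthdate, ZIP, gender):
records = [
    ("professor", "05/12/1974", "Yes", "03755", "cats", "F"),
    ("doctor",    "07/23/1974", "Yes", "61820", "dogs", "M"),
    ("painter",   "05/21/1974", "No",  "61820", "mold", "F"),
    ("roofer",    "05/21/1974", "Yes", "03755", "cats", "M"),
    ("retired",   "05/11/1974", "No",  "03755", "cats", "M"),
]
print(is_k_anonymous(records, quasi_ids=(1, 3, 5), k=2))  # False
```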

32
K Anonymity (L. Sweeney)
job        birthdate   married  ZIP    allergies  gender
professor  05/12/1974  Yes      03755  cats       F
doctor     07/23/1974  Yes      61820  dogs       M
painter    05/21/1974  No       61820  mold       F
roofer     05/21/1974  Yes      03755  cats       M
retired    05/11/1974  No       03755  cats       M

This table is not 2-anonymous.
Define Ai = Ai1 < Ai2 < ... < AiR = a singleton,
where Ai1 is generalized by Ai2, etc.
Example: 03755 is generalized by 0375*, etc.
33
job      birthdate   married  ZIP    allergies  gender
teacher  05/12/1974  Yes      03755  cats       F
doctor   07/23/1974  Yes      61820  dogs       M
painter  05/21/1974  No       61820  mold       F
roofer   05/21/1974  Yes      03755  cats       M
retired  05/11/1974  No       03755  cats       M

After generalizing birthdates:

job      birthdate   married  ZIP    allergies  gender
teacher  05/*/1974   Yes      03755  cats       F
doctor   */*/1974    Yes      61820  dogs       M
painter  */*/1974    No       61820  mold       F
roofer   05/*/1974   Yes      03755  cats       M
retired  05/*/1974   No       03755  cats       M
34
Sweeney's Datafly Algorithm
  • Note: replacing Aij by Ai(j+1) either leaves the
    number of non-k-anonymous records the same or
    decreases it
  • Algorithm (a sketch in code follows below):
  • while there are records that are not k-anonymous
  • select one of the offending records
  • select one of its attributes, say the i-th
  • replace Aij by Ai(j+1)
  • recompute records
  • end while
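A runnable sketch of this loop, with one common simplification of
Datafly: it generalizes the attribute with the most distinct values
across all records at once, rather than one record at a time. The
generalization ladders are hypothetical, and this is my paraphrase,
not Sweeney's published code.

```python
# Greedy Datafly-style generalization sketch.
from collections import Counter

# ladders[attr] maps each value to its next, more general value;
# every chain ends at the singleton "*".
ladders = {
    "zip": {"03755": "0375*", "61820": "6182*",
            "0375*": "*", "6182*": "*"},
    "birth": {"05/12/1974": "05/*/1974", "05/11/1974": "05/*/1974",
              "05/21/1974": "05/*/1974", "07/23/1974": "07/*/1974",
              "05/*/1974": "*/*/1974", "07/*/1974": "*/*/1974",
              "*/*/1974": "*"},
}

def datafly(records, qi, k):
    records = [dict(r) for r in records]
    while True:
        counts = Counter(tuple(r[a] for a in qi) for r in records)
        if all(c >= k for c in counts.values()):
            return records
        # heuristic: generalize the attribute with most distinct values
        a = max(qi, key=lambda x: len({r[x] for r in records}))
        before = [r[a] for r in records]
        for r in records:
            r[a] = ladders[a].get(r[a], r[a])
        if [r[a] for r in records] == before:
            return records  # fully generalized; Datafly suppresses leftovers

recs = [{"zip": "03755", "birth": "05/12/1974"},
        {"zip": "03755", "birth": "05/11/1974"},
        {"zip": "61820", "birth": "07/23/1974"},
        {"zip": "61820", "birth": "05/21/1974"}]
for r in datafly(recs, qi=("zip", "birth"), k=2):
    print(r)
```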

35
Comments on Sweeney's Datafly Algorithm
  • Ends with a k-anonymous collection if there are
    more than k records, because every terminal
    attribute set is a singleton
  • Solutions are not unique in general
  • Can be augmented with a notion of distortion as
    follows:
  • for every generalization, i.e., replacement of Aij
    by Ai(j+1), increment the distortion counter by 1
  • then find a solution that minimizes the
    distortion counter
  • The μ-Argus system (Hundepool and Willenborg) is
    proprietary and the details of its algorithm are
    not published

36
Formulation of k-anonymity as an integer,
nonlinear, convex optimization problem
  • The (0,1) variable x_ijks is 1 if the s-th record
    reports its i-th attribute at the j-th
    generalization level with the k-th value within
    that generalization
  • The (0,1) coefficient a_ijks is 1 if the i-th
    attribute value of the s-th record in the
    j-th generalization is the k-th value in Aij
  • Example: Ai1 = {03755, 03756, 61820, 61821, 61825}
  • Ai2 = {0375*, 6182*}
  • Ai3 = {*}
  • record s = (..., 61821, ...)
  • variables:    x_i11s x_i12s x_i13s x_i14s x_i15s x_i21s x_i22s x_i31s
  • coefficients: a_i11s a_i12s a_i13s a_i14s a_i15s a_i21s a_i22s a_i31s
  •                 0      0      0      1      0      0      1      1

Σ_jk x_ijks = 1 for all i, s
Σ_jk a_ijks x_ijks = 1 for all i, s
37
Formulation of k-anonymity as an integer,
nonlinear, convex optimization problem
  • Σ_jk x_ijks = 1 for all i, s (each record has
    exactly one generalization value for each
    attribute)
  • Σ_jk a_ijks x_ijks = 1 for all i, s (each record
    has a correct generalization value for each
    attribute)
  • Σ_r Π_i Σ_jk x_ijks x_ijkr ≥ k for all s (at
    least k-1 other records generalize identically to
    the s-th record)
  • Minimize Σ_ijks j · x_ijks (penalizes s for
    requiring the j-th generalization level on the
    i-th attribute); the full program is collected
    below
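Collected in one place, in my LaTeX transcription of the constraints
above (the form of the k-anonymity constraint is reconstructed from
the garbled slide text):

```latex
\begin{align*}
\min \;& \sum_{i,j,k,s} j \, x_{ijks} \\
\text{s.t.}\;
  & \sum_{j,k} x_{ijks} = 1 \qquad \forall i,s \\
  & \sum_{j,k} a_{ijks}\, x_{ijks} = 1 \qquad \forall i,s \\
  & \sum_{r} \prod_{i} \sum_{j,k} x_{ijks}\, x_{ijkr} \;\ge\; k \qquad \forall s \\
  & x_{ijks} \in \{0,1\}
\end{align*}
```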

38
Comments on this formulation
  • Very large number of variables
  • High degree of nonlinearity
  • Convex!!
  • Allows different levels of generalization within
    a single attribute for different records
  • Allows for different penalties/distortions easily
  • Algorithms: convex nonlinear programming,
    simulated annealing (SA), genetic algorithms (GA)

39
Logic Based Formulation of Privacy (Subramanian
and Cybenko)
  • Context: a sequence of queries against multiple
    databases
  • Goal: knowledge of attribute values about an
    entity/person must not come to constitute a
    quasi-identifier as the queries are answered
  • Assumes a first-order logic calculus for
    attribute properties and background knowledge

40
Logic Based Formulation of Privacy Key Elements
  • C is a collection of entities (e.g., citizens)
  • Set of knowable attributes P = {p_i} (e.g., ZIP,
    SSN)
  • Some attributes are identifiers
  • Predicate know(p_i) is true (in the context of
    an entity) if the value of p_i is available to an
    analyst
  • Entity inference rules:
  • R: know(p_i) ∧ know(p_j) ∧ ... ∧ know(p_k) →
    know(p)
  • Background knowledge, BK, is a set of such
    inference rules; it captures what is inferable
  • Note: we are reasoning here about attributes in
    general, not specific values of attributes

41
Logic Based Formulation of Privacy
  • Current knowledge, CK, is a set K(P)
    associated with some unknown subset of C, known
    by the analyst.
  • I(CK) is the inferential closure of CK under
    application of BK:
  • CK(0) = CK
  • CK(n+1) = { p : p ∈ CK(n), or there is an R ∈ BK
    such that R(CK(n)) yields p }
  • I(CK) = ∪_{n ≥ 0} CK(n)
  • I(CK) is what an analyst currently knows from
    previous queries and could infer using BK
  • Goal: I(CK) should not contain identifiers (a
    forward-chaining sketch follows below)
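The closure can be computed by forward chaining to a fixpoint. The
next slide notes a linear-time algorithm; this sketch is the naive
version, and the rules and attribute names are hypothetical:

```python
# Sketch: inferential closure I(CK) by naive forward chaining.
# Each rule is (premises, conclusion): knowing all premises
# lets the analyst infer the conclusion.
def inferential_closure(ck, bk):
    """Fixpoint of applying background-knowledge rules to CK."""
    known = set(ck)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in bk:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

BK = [({"zip", "birthdate", "gender"}, "identity"),  # Sweeney-style rule
      ({"phone"}, "zip")]                            # reverse lookup
CK = {"phone", "birthdate", "gender"}
closure = inferential_closure(CK, BK)
print(closure)                  # infers "zip", then "identity"
print("identity" in closure)    # True -> privacy compromised
```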

42
Logic Based Formulation of Privacy
  • The next query would augment CK with new atoms,
    say P', so CK ← CK ∪ P' (this gate is sketched
    below)
  • Compute/update the new inferential closure I(CK)
  • If I(CK) contains identifiers,
  • then suppress some of the atoms in P' until I(CK)
    is identifier-free
  • else report P' to the analyst
  • Linear-time algorithm for computing the
    inferential closure (linear in the number of
    attributes and background rules)
  • Requires some notion of distortion or value to
    determine which query results to suppress
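A sketch of the query gate above, reusing inferential_closure and BK
from the previous snippet. The greedy suppression order is my
choice; the talk only says to suppress some of the new atoms:

```python
# Sketch: release new query atoms only if the resulting closure
# stays identifier-free; otherwise greedily drop atoms and retry.
def answer_query(ck, bk, new_atoms, identifiers):
    released = set(new_atoms)
    while released:
        closure = inferential_closure(ck | released, bk)
        if not (closure & identifiers):
            break
        released.pop()          # drop one new atom and retry
    ck |= released              # analyst's knowledge grows
    return released             # atoms actually reported

CK = {"birthdate", "gender"}
print(answer_query(CK, BK, new_atoms={"zip"}, identifiers={"identity"}))
# -> set(): releasing "zip" would complete a quasi-identifier
```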

43
Summary
  • Many mathematical aspects of privacy have not
    been addressed here
  • cryptography
  • PKI
  • biometrics
  • collusion among analysts, over time, etc.
  • Mathematics and technology to ensure, strengthen,
    and manage privacy will grow in importance
  • Thank you!!!