Title: Privacy

  • Prof. Bhavani Thuraisingham
  • The University of Texas at Dallas
  • March 5, 2008
  • Lecture 18

What is Privacy?
  • Medical community
  • Privacy is about a patient determining what
    patient/medical information the doctor should
    release about him/her
  • Financial community
  • A bank customer determines what financial
    information the bank should release about him/her
  • Government community
  • The FBI collects information about US citizens;
    however, the FBI determines what information about
    a US citizen it can release to, say, the CIA

Some Privacy Concerns
  • Medical and Healthcare
  • Employers, marketers, or others knowing of
    private medical concerns
  • Security
  • Allowing access to individuals' travel and
    spending data
  • Allowing access to web surfing behavior
  • Marketing, Sales, and Finance
  • Allowing access to individuals' purchases

Data Mining as a Threat to Privacy
  • Data mining gives us facts that are not obvious
    to human analysts of the data
  • Can general trends across individuals be
    determined without revealing information about
    particular individuals?
  • Possible threats
  • Combine collections of data and infer information
    that is private
  • Disease information from prescription data
  • Military action from pizza deliveries to the Pentagon
  • Need to protect the associations and correlations
    between the data that are sensitive or private
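As a toy illustration of the first threat, linking an innocuous prescription table with publicly known drug-to-condition knowledge reveals a disease that neither source states on its own. All names, drugs, and mappings below are invented for this sketch:

```python
# Hypothetical datasets: neither one mentions a patient's disease directly.
prescriptions = [
    {"patient": "alice", "drug": "insulin"},
    {"patient": "bob", "drug": "ibuprofen"},
]

# Publicly known association between a drug and the condition it treats.
drug_to_condition = {"insulin": "diabetes", "ibuprofen": "pain"}

def infer_conditions(prescriptions, drug_to_condition):
    """Join the two sources -- this is the privacy-violating inference."""
    return {p["patient"]: drug_to_condition[p["drug"]] for p in prescriptions}

print(infer_conditions(prescriptions, drug_to_condition))
# -> {'alice': 'diabetes', 'bob': 'pain'}
```

The prescription data alone never says "diabetes"; the association between the two collections is what must be protected.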

Some Privacy Problems and Potential Solutions
  • Problem: Privacy violations that result from
    data mining
  • Potential solution: Privacy-preserving data
    mining
  • Problem: Privacy violations that result from
    the inference problem
  • Inference is the process of deducing sensitive
    information from the legitimate responses
    received to user queries
  • Potential solution: Privacy constraint processing
  • Problem: Privacy violations due to unencrypted
    data
  • Potential solution: Encryption at different
    levels
  • Problem: Privacy violations due to poor system
    design
  • Potential solution: Develop a methodology for
    designing privacy-enhanced systems

Privacy Constraint Processing
  • Privacy constraint processing
  • Based on prior research in security constraint
    processing
  • Simple constraint: an attribute of a document is
    private
  • Content-based constraint: if a document contains
    information about X, then it is private
  • Association-based constraint: two or more
    documents taken together are private, although
    individually each document is public
  • Release constraint: after X is released, Y
    becomes private
  • Augment a database system with a privacy
    controller for constraint processing
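The constraint types above can be sketched as a small release checker. The concrete constraint instances (a sensitive topic, an association pair, a release rule) are illustrative assumptions, not the lecture's actual system:

```python
def check_release(request, docs, released):
    """Return the subset of requested doc ids a privacy controller may release.

    Constraint types (from the slide):
      - content-based: a document mentioning the sensitive topic is private
      - association-based: two documents together are private
      - release: once X has gone out, Y becomes private
    """
    SENSITIVE_TOPIC = "disease"                     # content-based (assumed)
    ASSOCIATIONS = {frozenset(("addr", "travel"))}  # association pair (assumed)
    RELEASE_RULES = {"salary": "bonus"}             # after salary, bonus is private

    allowed = []
    for doc_id in request:
        text = docs[doc_id]
        # Content-based constraint: documents about the topic are private.
        if SENSITIVE_TOPIC in text:
            continue
        seen = released | set(allowed)
        # Association-based: never release both halves of a sensitive pair.
        if any(frozenset((doc_id, other)) in ASSOCIATIONS for other in seen):
            continue
        # Release constraint: Y becomes private after X has been released.
        if any(doc_id == y and x in seen for x, y in RELEASE_RULES.items()):
            continue
        allowed.append(doc_id)
        released.add(doc_id)
    return allowed

docs = {"addr": "john's address", "travel": "john travels frequently",
        "rec": "john has disease"}
print(check_release(["rec", "addr", "travel"], docs, set()))
# -> ['addr']  (rec blocked by content, travel blocked by association with addr)
```

In a full system the privacy controller would sit between the query processor and the user, applying such checks during query, update, and release operations.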

Architecture for Privacy Constraint Processing
  • User Interface Manager
  • Privacy Constraints
  • Constraint Manager
  • Database Design Tool: constraints during database
    design operation
  • Update Processor: constraints during updates
  • Query Processor: constraints during query and
    release operations

Semantic Model for Privacy Control
  • Example semantic net: Patient John -- has disease,
    John's address, travels frequently
  • Dark lines/boxes in the diagram contain private
    information
Privacy Preserving Data Mining
  • Prevent useful results from mining
  • Introduce cover stories to give false results
  • Only make a sample of the data available so that an
    adversary is unable to come up with useful rules
    and predictive functions
  • Randomization
  • Introduce random values into the data
  • Challenge is to introduce random values without
    significantly affecting the data mining results
  • Give a range of values for results instead of
    exact values
  • Secure Multi-party Computation
  • Each party knows its own inputs; encryption
    techniques are used to compute the final results
    (rules, predictive functions)
  • Approach: only make a sample of the data available,
    which limits the ability to learn a good classifier
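A minimal sketch of the randomization idea, assuming zero-mean uniform noise: each individual record is masked, yet aggregate statistics such as the mean survive approximately, which is what lets mining still work:

```python
import random

def perturb(values, noise_scale=5.0, seed=0):
    """Add zero-mean uniform noise to each value.

    Individual records become unreliable, but with enough records the
    noise averages out and aggregate statistics are roughly preserved.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-noise_scale, noise_scale) for v in values]

ages = [23, 45, 31, 62, 54, 29, 41, 38] * 250   # 2000 illustrative records
noisy = perturb(ages)

true_mean = sum(ages) / len(ages)
noisy_mean = sum(noisy) / len(noisy)
# No single noisy age can be trusted, yet the two means stay close.
print(round(true_mean, 2), round(noisy_mean, 2))
```

The challenge noted above is visible here: a larger `noise_scale` gives more privacy per record but degrades the statistics the miner needs.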

Cryptographic Approaches for Privacy Preserving
Data Mining
  • Secure Multi-party Computation (SMC) for PPDM
  • Mainly used for distributed data mining
  • Provably secure under some assumptions
  • Learned models are accurate
  • Efficient/specific cryptographic solutions exist
    for many distributed data mining problems
  • Mainly the semi-honest assumption (i.e., parties
    follow the protocols)
  • The malicious model has also been explored
    recently (e.g., the Kantarcioglu and Kardes paper
    in this workshop)
  • Many SMC-based PPDM algorithms share common
    sub-protocols (e.g., dot product, summation, etc.)
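The summation sub-protocol mentioned above can be sketched as a ring-based secure sum. This is a textbook construction under the semi-honest assumption, not code from the cited papers: party 0 masks its input with a random offset, each party adds its own value modulo N, and party 0 removes the mask at the end, so no party ever sees another's raw input.

```python
import random

def secure_sum(private_values, modulus=1_000_000):
    """Ring-based secure summation sketch (semi-honest parties).

    In a real deployment each addition happens on a different machine and
    only the running total travels; here the ring is simulated in one loop.
    """
    rng = random.SystemRandom()
    mask = rng.randrange(modulus)
    running = (private_values[0] + mask) % modulus  # party 0 masks its input
    for v in private_values[1:]:                    # each party adds in turn
        running = (running + v) % modulus           # total alone reveals nothing
    return (running - mask) % modulus               # party 0 removes the mask

print(secure_sum([10, 20, 30]))  # 60
```

Because every intermediate value is offset by the secret mask, an observer of the ring traffic learns only uniformly random residues, assuming the true sum stays below the modulus.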

Cryptographic Approaches for Privacy Preserving
Data Mining
  • Drawbacks
  • Still not efficient enough for very large
    datasets (e.g., petabyte-sized datasets)
  • The semi-honest model may not be realistic
  • The malicious model is even slower
  • Possible new directions
  • New models that can trade off better between
    efficiency and security
  • Game-theoretic / incentive issues in PPDM
  • Combining anonymization and cryptographic
    techniques for PPDM

Perturbation Based Approaches for Privacy
Preserving Data Mining
  • Goal: distort data while still preserving some
    properties for data mining purposes
  • Additive based
  • Multiplicative based
  • Condensation based
  • Decomposition
  • Data swapping

Perturbation Based Approaches for Privacy
Preserving Data Mining
  • Goal: achieve high data mining accuracy with
    maximum privacy protection

Perturbation Based Approaches for Privacy
Preserving Data Mining
  • Privacy is a personal choice, so approaches
    should be individually adaptable (Liu,
    Kantarcioglu and Thuraisingham, ICDM06)
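One way to read "individually adaptable": let each record carry its own privacy level, which controls how much noise that record receives. The levels and noise scales below are illustrative assumptions, not the ICDM06 scheme:

```python
import random

def adaptive_perturb(records, seed=0):
    """Perturb each (value, privacy_level) record with level-dependent noise.

    A privacy-conscious individual picks "high" and gets heavy masking; one
    who does not mind disclosure picks "low" and keeps data nearly intact.
    Scales per level are arbitrary choices for the sketch.
    """
    scales = {"low": 1.0, "medium": 5.0, "high": 15.0}
    rng = random.Random(seed)
    return [value + rng.uniform(-scales[level], scales[level])
            for value, level in records]

# Illustrative salary records, each tagged with its owner's chosen level.
records = [(50000, "high"), (52000, "low"), (61000, "medium")]
print(adaptive_perturb(records))
```

The miner then works with data whose noise varies per record, trading a little accuracy for respecting each individual's choice.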

Perturbation Based Approaches for Privacy
Preserving Data Mining
  • The trend is to make PPDM approaches fit
    real-world settings
  • We investigated perturbation based approaches
    with real-world data sets
  • We give an applicability study of the current
    approaches (Liu, Kantarcioglu and Thuraisingham,
    DKE 07)
  • We found that
  • Reconstructing the original distribution may
    not work well with real-world data sets
  • Distribution reconstruction is a hard problem and
    should not be used as an intermediate step
  • Try to modify perturbation techniques and adapt
    some data mining tools (e.g., Liu, Kantarcioglu
    and Thuraisingham, novel decision tree, UTD
    technical report 06)

CPT: Confidentiality, Privacy and Trust
  • Before I as a user of Organization A send data
    about me to organization B, I read the privacy
    policies enforced by organization B
  • If I agree to the privacy policies of
    organization B, then I will send data about me to
    organization B
  • If I do not agree with the policies of
    organization B, then I can negotiate with
    organization B
  • Even if the web site states that it will not
    share private information with others, do I trust
    the web site?
  • Note that while confidentiality is enforced by the
    organization, privacy is determined by the user.
    Therefore, for confidentiality, the organization
    will determine whether a user can have the data.
    If so, then the organization can further
    determine whether the user can be trusted

Platform for Privacy Preferences (P3P): What is it?
  • P3P is an emerging industry standard that enables
    web sites to express their privacy practices in a
    standard format
  • The format of the policies can be automatically
    retrieved and understood by user agents
  • It is a product of the W3C (World Wide Web
    Consortium)
  • When a user enters a web site, the privacy
    policies of the web site are conveyed to the user.
    If the privacy policies differ from the user's
    preferences, the user is notified and can then
    decide how to proceed

Platform for Privacy Preferences (P3P)
  • Several major corporations are working on P3P
    standards including
  • Microsoft
  • IBM
  • HP
  • NEC
  • Nokia
  • NCR
  • Web sites have also implemented P3P
  • Semantic web group has adopted P3P

Platform for Privacy Preferences (P3P)
  • The initial version of P3P used RDF to specify
    policies; the recent version has migrated to XML
  • P3P policies use XML with namespaces for
    encoding policies
  • P3P has its own statements and data types
    expressed in XML; P3P schemas utilize XML schemas
  • The P3P specification released in January 2005
    uses a catalog shopping example to explain
    concepts; P3P is an international standard and an
    ongoing effort
  • Example: catalog shopping
  • Your name will not be given to a third party, but
    your purchases will be given to a third party
  • <POLICIES xmlns="http://...">
  •   <POLICY name="...">
  •   </POLICY>
  • </POLICIES>
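A rough sketch of what a P3P user agent does with such a policy: retrieve it, parse it, and flag statements that conflict with the user's preferences. The XML shape below is simplified and hypothetical, not the real P3P vocabulary, and the element and attribute names are assumptions for the sketch:

```python
import xml.etree.ElementTree as ET

# Toy policy for the catalog-shopping example: name stays private,
# purchases go to a third party (simplified, not real P3P markup).
POLICY_XML = """
<POLICIES>
  <POLICY name="catalog-shopping">
    <STATEMENT data="name" third-party="no"/>
    <STATEMENT data="purchases" third-party="yes"/>
  </POLICY>
</POLICIES>
"""

def mismatches(policy_xml, preferences):
    """Return data items the site shares against the user's wishes.

    `preferences` maps a data item to True if the user permits
    third-party sharing of it.
    """
    root = ET.fromstring(policy_xml)
    flagged = []
    for stmt in root.iter("STATEMENT"):
        data = stmt.get("data")
        shares = stmt.get("third-party") == "yes"
        if shares and not preferences.get(data, False):
            flagged.append(data)
    return flagged

# A user who permits no third-party sharing gets notified about purchases.
print(mismatches(POLICY_XML, {"name": False, "purchases": False}))
# -> ['purchases']
```

When the flagged list is non-empty, the agent notifies the user, who can then decide how to proceed, exactly the workflow the slide describes.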

P3P and Legal Issues
  • P3P does not replace laws
  • P3P works together with the law
  • What happens if web sites do not honor their
    P3P policies?
  • Then appropriate legal actions will have to be
    taken
  • XML is the technology used to specify P3P policies
  • Policy experts will have to specify the policies
  • Technologists will have to develop the
    supporting tools
  • Legal experts will have to take action if the
    policies are violated

Privacy for Assured Information Sharing
  • Federation diagram: Data/Policy for the
    Federation, connected to Data/Policy for Agency A,
    Agency B, and Agency C

Privacy Preserving Surveillance
  • Raw video surveillance data feeds:
  • A Face Detection and Face Derecognizing system:
    suspicious people are found, while faces of
    trusted people are derecognized to preserve
    privacy
  • A Suspicious Event Detection System: suspicious
    events are found
  • Manual inspection of the video data: report of
    security personnel
  • Output: a comprehensive security report listing
    the suspicious events and people detected
Directions: Foundations of Privacy Preserving
Data Mining
  • We proved in 1990 that the inference problem in
    general was unsolvable; therefore, the suggestion
    was to explore the solvability aspects of the
    problem
  • Can we do something similar for privacy?
  • Is the general privacy problem solvable?
  • What are the complexity classes?
  • What are the storage and time complexities?
  • We need to explore the foundations of PPDM and
    related privacy solutions

Directions: Testbed Development and Application
  • There are numerous PPDM related algorithms. How
    do they compare with each other? We need a
    testbed with realistic parameters to test the
    algorithms
  • It is time to develop real world scenarios where
    these algorithms can be utilized
  • Is it feasible to develop realistic commercial
    products or should each organization adapt
    product to suit their needs?

Key Points
  • 1. There is no universal definition of privacy;
    each organization must define what it means by
    privacy and develop appropriate privacy policies
  • 2. Technology alone is not sufficient for privacy.
    We need technologists, policy experts, legal
    experts and social scientists to work on privacy
  • 3. Some well-known people have said "forget about
    privacy". Therefore, should we pursue research on
    privacy?
  • There are interesting research problems, so there
    is a need to continue with research
  • Something is better than nothing
  • Try to prevent privacy violations, and if
    violations occur, then prosecute
  • 4. We need to tackle privacy from all directions

Application Specific Privacy?
  • Examining privacy may make sense for healthcare
    and financial applications
  • Does privacy work for defense and intelligence
    applications?
  • Is it even meaningful to have privacy for
    surveillance and geospatial applications?
  • Once the image of my house is on Google Earth,
    how much privacy can I have?
  • I may want my location to be private, but does it
    make sense if a camera can capture a picture of
    me?
  • If there are sensors all over the place, is it
    meaningful to have privacy-preserving
    surveillance?
  • This suggests that we need application-specific
    privacy
  • It is not meaningful to examine PPDM for every
    data mining algorithm and every application

Data Mining and Privacy: Friends or Foes?
  • They are neither friends nor foes
  • Need advances in both data mining and privacy
  • Need to design flexible systems
  • For some applications one may have to focus
    entirely on pure data mining while for some
    others there may be a need for privacy-preserving
    data mining
  • Need flexible data mining techniques that can
    adapt to the changing environments
  • Technologists, legal specialists, social
    scientists, policy makers and privacy advocates
    MUST work together