Statistical confidentiality and privacy: 1. General considerations. Robert McCaa, Minnesota Population Center, rmccaa@umn.edu



1
Statistical confidentiality and privacy: 1. General considerations
Robert McCaa, Minnesota Population Center, rmccaa@umn.edu
Inadequate use of microdata has high costs
--Len Cook (2003, Registrar General, ONS)
2
UNSD Principles and Recommendations (Rev. 1,
1997) endorse dissemination of census microdata
  • 1.218 There are a range of methods that can be
    used to make such microdata available while still
    protecting individuals' rights to privacy.
    (Rev. 2 has a stronger statement.)
  • In four decades of distributing microdata there
    is not a single allegation of a breach of
    confidentiality or privacy (includes 100
    microdata stored at CELADE in Santiago, Chile).

3
Why disseminate microdata? Julia Lane, European
Statisticians Conference (2003)
  • 1. Analyze more realistic questions
  • 2. Develop reality-based policy
  • 3. Acquire new constituencies and stakeholders
  • 4. Build trust: reduce suspicions of data cooking
  • 5. Replicate findings
  • a. use standards of UNSD, Eurostat, ISCO, ISCED,
    etc.
  • b. facilitate comparative research in time and
    space
  • 6. Calculate marginal effects
  • 7. Assess data quality
  • and much, much more.

4
Imagine!!!
What's the problem?
  • Confidentializing an integrated microdata base
    with
  • 200 samples of households (70 countries)
  • Containing ½ billion person records with
    thousands of variables
  • Available to tens of thousands of licensed users
    regardless of country of birth, citizenship,
    residence or place of work
  • Without a single allegation of violation of
    privacy or statistical confidentiality--

Ever!!
5
Usage: off-site vs. on-site use (secure
microdata laboratory)? Germany RDC, 2005-8:
ten-to-one
Jan-Sept
RDCs are expensive and attract few users.
6
ONS-UK gold standard
Statistical disclosure control methods may
modify the data or the design of the statistic,
or a combination of both. They will be judged
sufficient when the guarantee of confidentiality
can be maintained, taking account of information
likely to be available to third parties, either
from other sources or as previously released
National Statistics outputs, against the
following standard:
It would take a disproportionate amount of time,
effort and expertise for an intruder to identify a
statistical unit to others, or to reveal
information about that unit not already in the
public domain.
--Protocols on Data Access and Confidentiality,
pp. 7-8, ONS-UK (2004)
www.statistics.gov.uk/about_ns/cop/downloads/prot_data_access_confidentiality.pdf
7
Risk assessment of household samples of UK 1991
census: attempts at matching are fruitless
(few matches, many false positives)
  • After taking into account errors in the data,
    coding variability and changing of personal
    characteristics in time
  • Dale and Elliott, JRSS-A (2003)
    For a user of an outside database,
    attempting this sort of match with no opportunity
    for verification would prove fruitless. In the
    first place, the small degree of expected overlap
    would be a considerable deterrent to an intruder.
    However, if a match between the two files was
    attempted the large number of apparent matches
    would be highly confusing as an intruder would
    have no way of checking correct identification.
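
Illustration (not part of the original slides): the argument above, small expected overlap plus many unverifiable apparent matches, can be sketched as a toy simulation. Everything here is an assumption made for illustration: the function name simulate_match_attempt, the population size, the 2% sampling fraction, the size of the outside file, and the idea that each person's key variables collapse into one uniformly distributed key value. It also ignores data errors, coding variability and change over time, which the slide notes would reduce correct matches further.

```python
import random
from collections import Counter


def simulate_match_attempt(population=100_000, sample_fraction=0.02,
                           outside_size=1_000, n_key_values=5_000, seed=0):
    """Toy model of an intruder matching an outside file against a census sample.

    All parameters are hypothetical; none come from the Dale and Elliott study.
    """
    rng = random.Random(seed)
    # Each person gets one combined key value (think age x sex x occupation
    # x area). Uniformly distributed keys are a simplifying assumption.
    keys = [rng.randrange(n_key_values) for _ in range(population)]
    # The released sample and the intruder's file are independent draws.
    sample_ids = set(rng.sample(range(population),
                                int(population * sample_fraction)))
    outside_ids = rng.sample(range(population), outside_size)
    # Number of sample records carrying each key value.
    sample_key_counts = Counter(keys[i] for i in sample_ids)

    # People who really are in both files.
    true_overlap = sum(1 for i in outside_ids if i in sample_ids)
    # Outside records whose key appears at least once in the sample.
    apparent = sum(1 for i in outside_ids if sample_key_counts[keys[i]] > 0)
    # Outside records whose key appears exactly once in the sample.
    unique_apparent = [i for i in outside_ids
                       if sample_key_counts[keys[i]] == 1]
    # Of those, the ones where the single sample record is the same person.
    correct_unique = sum(1 for i in unique_apparent if i in sample_ids)
    return {
        "true_overlap": true_overlap,
        "apparent_matches": apparent,
        "unique_apparent_matches": len(unique_apparent),
        "correct_unique_matches": correct_unique,
    }


if __name__ == "__main__":
    print(simulate_match_attempt())
```

In this toy model the intruder sees only the apparent matches; the true overlap and the correct matches are visible to the simulation but not to someone without a way to verify identities, which is the point of the quoted passage.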

8
Level of Anonymization (FSO-Germany)
[Figure: degree of confidentiality plotted against degree of analysis potential. Labels: stronger anonymisation method; delete direct identifier; anonymisation method; de-facto anonymised microdata; fully anonymised microdata; complete microdata; confidential microdata.]
Trade-off between confidentiality and analysis
potential: is it monotonic (as portrayed)?
9
Level of Anonymization: not monotonic
[Figure: degree of confidentiality plotted against degree of analysis potential. Labels: stronger anonymisation method; delete direct identifier; anonymisation method; construct sample; de-facto anonymised microdata; fully anonymised microdata; complete microdata; confidential microdata. Values shown: 95, 99, 99.9 and 50, 25, 45.]
Trade-off is not monotonic
10
Resources
  • UN-ECE (2007), Managing Statistical
    Confidentiality & Microdata Access:
    http://www.unece.org/stats/documents/tfcm.htm
  • IHSN Tools & Guidelines, anonymization:
    www.surveynetwork.org
  • Eurostat (1999)

11
UN-ECE (2007) www.unece.org/stats/documents/tfcm.htm
12
IHSN www.Surveynetwork.org
13
IHSN www.Surveynetwork.org
14
IHSN www.Surveynetwork.org
  • Remove variables
  • Identifiers: name, address, low-level
    administrative geography
  • Sensitive: tribe, disability
  • Global recoding
  • Aggregate classes: age (5 yr groups), country of
    birth (continent), administrative geography,
    occupation (4 digit -> 3), etc.
  • Top and bottom coding (continuous
    variables--income, size of residence, number of
    rooms, etc.)
  • Local suppression--sparse categories (population
    n < 2502,500)
  • Data swapping (household geography)
  • Complex perturbations
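
Illustration (not part of the original slides): a minimal Python sketch of the simpler techniques listed above, namely removing identifiers, global recoding (5-year age groups, 4-digit occupation cut to 3 digits), top and bottom coding, and local suppression of sparse categories. The field names, the function names and the min_cell threshold are hypothetical choices made for this example, not IHSN recommendations. Data swapping and complex perturbations are not shown.

```python
from collections import Counter

# Hypothetical field names for direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address"}


def age_group(age, width=5):
    """Global recoding: map an exact age to a fixed-width class, e.g. 37 -> '35-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def clamp(value, bottom, top):
    """Top and bottom coding: pull extreme values in to the chosen limits."""
    return max(bottom, min(top, value))


def anonymize(records, income_bottom, income_top, min_cell=3):
    """Return protected copies of the records; the input is left unchanged."""
    out = []
    for rec in records:
        # Remove variables: drop direct identifiers.
        r = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        # Global recoding: coarser age classes and 3-digit occupation codes.
        r["age"] = age_group(r["age"])
        r["occupation"] = r["occupation"][:3]
        # Top and bottom coding of a continuous variable.
        r["income"] = clamp(r["income"], income_bottom, income_top)
        out.append(r)
    # Local suppression: blank out occupation categories that remain sparse.
    counts = Counter(r["occupation"] for r in out)
    for r in out:
        if counts[r["occupation"]] < min_cell:
            r["occupation"] = None
    return out


if __name__ == "__main__":
    people = [
        {"name": "A", "address": "x", "age": 37, "occupation": "2411", "income": 999_999},
        {"name": "B", "address": "y", "age": 4, "occupation": "2412", "income": -5},
        {"name": "C", "address": "z", "age": 80, "occupation": "5120", "income": 3_000},
    ]
    for row in anonymize(people, income_bottom=0, income_top=100_000, min_cell=2):
        print(row)
```

In practice the thresholds and class widths would be set by the statistical office after a risk assessment; the values above are placeholders.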

15
EUROSTAT statistical confidentiality standards
(Thorogood, 1999) --all endorsed by
IPUMS-International
  • 1. Restrict access to samples
  • 2. Limit geographical detail
  • 3. Re-code unique categories--top and bottom
  • 4. Sign non-disclosure agreement
  • 5. Prohibit redistribution to third parties
  • 6. Prohibit attempts to identify individuals or
    the making of any claim to that effect
  • 7. Require users to provide copies of
    publications

16
EUROSTAT statistical confidentiality standards
(Thorogood, 1999) --all endorsed by
IPUMS-International
  • 8. Construct age from birthdate, if necessary
  • 9. Do not identify date of birth
  • 10. Do not identify precise place of birth
  • 11. Migration timing/place not identified in
    detail
  • 12. Identify place of residence by major civil
    division (pop > 20k, 60k, 100k, 1 million, i.e.,
    national convention)
  • 13. Do sensitivity analysis
  • 14. Do confidentiality assessment (not yet)
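
Illustration (not part of the original slides): a minimal Python sketch of items 8, 9 and 12 above: construct age from birthdate, do not release the date of birth, and identify place of residence only for divisions above a population threshold. The field names, the function names, the "other" code and the default 20,000 threshold are assumptions for this example; the slide notes that the actual threshold follows national convention.

```python
from datetime import date


def age_on(census_day, birthdate):
    """Completed years of age on census day."""
    had_birthday = (census_day.month, census_day.day) >= (birthdate.month, birthdate.day)
    return census_day.year - birthdate.year - (0 if had_birthday else 1)


def protect(record, census_day, division_population, min_population=20_000):
    """Return a copy with age instead of birthdate and coarse residence.

    division_population maps each civil division to its population
    (hypothetical lookup table supplied by the caller).
    """
    out = dict(record)
    # Items 8 and 9: keep age, drop the exact date of birth.
    birthdate = out.pop("birthdate")
    out["age"] = age_on(census_day, birthdate)
    # Item 12: name the division only if it is large enough.
    if division_population.get(out["residence"], 0) < min_population:
        out["residence"] = "other"
    return out


if __name__ == "__main__":
    census_day = date(1991, 4, 21)  # illustrative census date
    populations = {"Bigtown": 150_000, "Smallville": 5_000}  # made-up figures
    person = {"birthdate": date(1950, 4, 22), "residence": "Smallville"}
    print(protect(person, census_day, populations))
```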

17
Countering Fear, Hysteria and Paranoia with reason
There has been no known attempt at
identification with the 1991 SARs microdata
samples of the UK, nor in any other countries
that disseminate samples of microdata
--Elliott and Dale, Journal of the Royal
Statistical Society, 1999
18

No official statistical microdata!!
Why not? Companies want linkable data with names,
addresses, IDs, etc.
Probabilistic linking with 90% of the
population missing is not good enough
ChoicePoint Data Sources and Clients. Source:
Washington Post
http://www.choicepoint.com/
19

No statistical microdata!!
To play pizza video: http://www.aclu.org/pizza/
21
Statistical samples are innocuous. Nothing to
be gained from matching.
Countering Fear, Hysteria and Paranoia with reason
There has been no known attempt at
identification with the 1991 SARs microdata
samples of the UK, nor in any other countries
that disseminate samples of microdata
--Elliott and Dale, Journal of the Royal
Statistical Society, 1999
22
Please allow me to invite you to think about
producing (or permitting IPUMS to produce)
anonymized, integrated samples for all the
censuses of your country for which microdata
survive.
Thank you.
Contact: rmccaa@umn.edu
This ppt is available at
www.hist.umn.edu/rmccaa/ipums-global (see
Port of Spain workshop).