The census in global perspective and the coming census microdata revolution, by Robert McCaa

1
The census in global perspective and the coming
census microdata revolution
Robert McCaa, Steven Ruggles
Minnesota Population Center
http://www.ipums.org
IPUMS International, funded by National Science Foundation
2
Subtext: Why should Nordic countries
participate in a project to preserve the world's
census microdata and help make them usable?
  • Longest historical series of census microdata in
    the world
  • Cross-national research on a global scale
    requires representation of all cultural regions
  • Intriguing demographic, historical laboratory
  • Large pool of scientific talent with global
    concerns
  • Persisting cultural, scientific ties with
    Minnesota (would, for example, U. of Texas be as
    interested?)
3
Globalization of the census and the coming census
microdata revolution
  • 1. Introduction: census and census microdata
  • 2. The population census goes global: coverage,
    periodicity, and content
  • 3. Liberating census microdata: preservation,
    anonymization, integration, dissemination
  • 4. Statistical confidentiality and census
    samples: a 36 year-long perfect record
  • 5. International norms of statistical
    confidentiality
  • 6. Harmonizing and disseminating scientifically
    anonymized census samples: IPUMSi

4
1. Introduction
The census: what is it?
Census microdata: what are they?
How can they be made usable? Why should we care?
5
16th c. census of Mexico (Nahuatl, 1530s).
Here is the home of one...
(from Museum of Anthropology, Mexico City)
original ms.
transcribed
translated
digitized
6
16th c. census of Mexico (Nahuatl, 1530s).
Here is the home of one...
(from Museum of Anthropology, Mexico City)
original ms.
transcribed
translated
digitized
When is a census a census? Goyer (1986)
  • 1. National legal authority
  • 2. Defined enumeration area
  • 3. Complete coverage
  • 4. Simultaneous enumeration
  • 5. Individual enumeration
  • 6. Periodic enumeration
  • 7. Publication of results
  • 8. Dissemination of results
7
An Aztec extended family: 5 conjugal units, 4
generations, 3 married brothers
1530
8
450 years later: an example of a patrilateral
household from rural Morelos, with 5 conjugal unions
and 3 generations
1990
(not kin)
9
Examples to percentages: have there been changes
in 4 1/2 centuries?
10
Census microdata of the late 20th century
What are they?
Who bears preservation responsibility?
Who will make them usable?
Person number
Age
Sex
  • 12100102600700720000011210000104
  • 22200202600700720000011210000104
  • 32300100600700720000012123000000
  • 42300200400700000000000000000000
  • 52300200200700000000000000000000
  • 62300200000700000000000000000000

Census microdata
Censuses are costly
Public goods should be democratized
Where microdata are available, they are used
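
The records on this slide are fixed-width strings, so reading them is a matter of slicing each line at known column positions. The sketch below shows the idea in Python. The slide labels only "Person number", "Age" and "Sex" and gives no column positions, so the layout used here (including the `relationship` field and the function name `parse_record`) is an assumption made for illustration, not the documented layout of any census file.

```python
# Hypothetical column layout. The slide names only "Person number", "Age"
# and "Sex"; every position below is a guess made for illustration.
LAYOUT = {
    "person_number": (0, 1),
    "relationship": (1, 3),  # assumed field, not labeled on the slide
    "sex": (3, 6),
    "age": (6, 9),
}


def parse_record(line: str) -> dict:
    """Slice one fixed-width record into integer fields using LAYOUT."""
    return {name: int(line[start:end]) for name, (start, end) in LAYOUT.items()}


# The first three records shown on the slide.
records = [
    "12100102600700720000011210000104",
    "22200202600700720000011210000104",
    "32300100600700720000012123000000",
]

for rec in records:
    print(parse_record(rec))
```

A real codebook would replace `LAYOUT`; the point is only that without such documentation the digits cannot be interpreted, which is why the metadata must be preserved along with the microdata.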
11
Globalization of the census and the coming census
microdata revolution
  • 1. Introduction: census and census microdata
  • 2. The population census goes global: coverage,
    periodicity, and content
  • 3. Liberating census microdata: preservation,
    anonymization, integration, dissemination
  • 4. Statistical confidentiality and census
    samples: a 36 year-long perfect record
  • 5. International norms of statistical
    confidentiality
  • 6. Harmonizing and disseminating scientifically
    anonymized census samples: the case of IPUMSi

12
2. The population census goes global.
Coverage becomes universal
(thanks to A.N. Kiær, Statistics Norway, who
promoted globalization of the census at the
beginning of the 20th c.)
Content becomes uniform
Decennial censuses become the norm
13
Population censuses became universal in the 20th
century.
Will census microdata ... in the 21st?
  • 153 countries with 1 million pop. in 2000
  • 2000 round figures are provisional

14
Content ... increasingly uniform, principal
source of population information: social
variables
15
Content ... increasingly uniform: education and
migration variables
16
Content ... increasingly uniform: demographic and
economic variables
17
Decennial censuses are the rule (1945-2004)
of 153 countries with 1 million pop., totaling 6
billion people in 2000
  • At least one census per decade: 66
    countries, 50% of world's population
  • Missed a single decennial enumeration: 43
    countries, 38% of world's population
  • Missed 2 or 3 enumerations: 32 countries, 10%
    of pop.
  • Fewer than 3 enumerations: 12 countries,
    2% of pop.
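
The four groups on this slide can be checked against the stated total of 153 countries. The short Python sketch below does that arithmetic, reading the second figure in each row as a percentage of world population (an interpretation, since the transcript carries no percent signs). The variable names are illustrative only.

```python
# (number of countries, share of world population in percent), per slide row.
# Treating the second figure as a percentage is an assumption.
groups = {
    "at least one census per decade": (66, 50),
    "missed a single decennial enumeration": (43, 38),
    "missed 2 or 3 enumerations": (32, 10),
    "fewer than 3 enumerations": (12, 2),
}

total_countries = sum(countries for countries, _ in groups.values())
total_share = sum(share for _, share in groups.values())

print("countries:", total_countries)
print("population share (%):", total_share)
```

The country counts add up to the 153 countries named in the slide heading, and the shares add up to 100, which supports reading them as percentages.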

18
On a millennial scale, censuses and census
microdata survive for only a short, but
significant period
19
Globalization of the census and the coming census
microdata revolution
  • 1. Introduction: census and census microdata
  • 2. The population census goes global: coverage,
    periodicity, and content
  • 3. Liberating census microdata: preservation,
    anonymization, integration, dissemination
  • 4. Statistical confidentiality and census
    samples: a 36 year-long perfect record
  • 5. International norms of statistical
    confidentiality
  • 6. Harmonizing and disseminating scientifically
    anonymized census samples: the case of IPUMSi

20
"official statistics that meet the test of
practical utility are to be compiled and made
available on an impartial basis by official
statistical agencies to honor citizens'
entitlement to public information."
-- UN Statistical Commission, 1994
21
IPUMSi helps five ways
  • 1. Inventory the world's census microdata
  • 2. Preserve endangered microdata and
    documentation
  • 3. Anonymize census microdata to preserve
    statistical confidentiality, using highest
    standards (Statistics Netherlands)
  • 4. Integrate datasets of selected countries using
    UN, Eurostat and other standards
  • 5. Disseminate database free with complete copies
    to all partners

Integrated Public Use Microdata Series -
International
22
IPUMSi
INVENTORIES
  • Microdata...for any population or administrative
    division: nation, province, district, city,
    ethnic group, etc.
  • Example: Latin America
  • - 20 countries
  • - 67 censuses inventoried
  • - 1% - 100% sample densities
  • - 100,000 to 150 million cases
  • - 19th century: 2 censuses; 1960s: 14; 1970s: 17;
    1980s: 16; 1990s: 17
  • Found complete census data for Colombia 1973 and
    16 other countries

23
PRESERVES
IPUMSi
UN Demographic Center for Latin America (CELADE,
Santiago, Chile): 3000 microdata tapes to be
preserved, and metadata (documentation)
24
Preserve against accident, deterioration and
technological obsolescence
  • Microdata
  • - transfer to stable media
  • - use standard data storage protocols
  • - entrust copies with at least two depositories
  • Metadata: collect, catalogue, and reproduce
  • - Enumeration forms (preserve all versions used)
  • - Enumerator and data processing instructions
  • - Codebooks (photocopies and scanned images)
  • - Technical studies, evaluations, reports

UN Stat. Div.: entire archive deposited, to be
scanned
25
Globalization of the census and the coming census
microdata revolution
  • 1. Introduction: census and census microdata
  • 2. The population census goes global: coverage,
    periodicity, and content
  • 3. Liberating census microdata: preservation,
    anonymization, integration, dissemination
  • 4. Statistical confidentiality and census
    samples: a 36 year-long perfect record
  • 5. International norms of statistical
    confidentiality
  • 6. Harmonizing and disseminating scientifically
    anonymized census samples: the case of IPUMSi

26
How anonymized census samples became a standard
statistical product
  • US Census Bureau
  • - 1960 census: 0.1% public use microdata series
  • - 1970 census: six 1% samples harmonized with
    1960
  • - 1984: 1940, 1950 1% samples
  • - 1980, 1990: samples of varying densities, contents
  • CELADE: Latin America
  • - 1960s: 16 countries, densities 1-5%
  • - 1970s: 19 countries, 1-10%

27
How anonymized census samples became a standard
statistical product
  • Canada
  • - 1971, 1976, 1981, 1986, 1991, 1996: varying
    designs, densities
  • - 1996: Data Liberation Initiative led to an
    explosion of usage in research and teaching
  • UK
  • - 1991: 2% individuals, 0.5% households; hundreds
    of publications, thousands of users
  • - 2001: double the densities because
    confidentiality assessments were too conservative.

28
Risk assessment of statistical confidentiality
  • Take into account error, coding variability and
    changing of personal characteristics in time
  • Dale and Elliott, JRSS-A (forthcoming):
    For a user of an outside
    database, attempting this sort of match with no
    opportunity for verification would prove
    fruitless. In the first place, the small degree
    of expected overlap would be a considerable
    deterrent to an intruder. However, if a match
    between the two files was attempted the large
    number of apparent matches would be highly
    confusing as an intruder would have no way of
    checking correct identification.

29
Statistical confidentiality in the USA: a brief
history
  • Before 1954
  • - 1850: exclusively for the use of the
    government, and not to be used...to the
    gratification of curiosity...
  • - 1920s: deny access to data on individuals
  • - 1942: refused to supply War Dept. w/ addresses
    of Japanese-Americans
  • After 1954
  • - census microdata do not reveal identities of
    individuals
  • - basic geographical identifiers, low sample
    densities, masking, swapping, top-coding,
    re-coding
  • In practice, not a single breach or allegation of
    a breach!

30
Heightened concerns about confidentiality in USA
  • Assault on privacy by businesses
  • Distrust of government
  • Never a question of use of census microdata. Yet
    must avoid any possible perception of mis-use to
    retain confidence and cooperation of citizens.
  • Pro-active strategy
  • - Publicize confidentiality safe-guards
  • - Offer a variety of microdata products: higher
    risks, higher security
  • - Data enclaves: expensive, low usage,
    exceedingly detailed microdata

31
Globalization of the census and the coming census
microdata revolution
  • 1. Introduction: census and census microdata
  • 2. The population census goes global: coverage,
    periodicity, and content
  • 3. Liberating census microdata: preservation,
    anonymization, integration, dissemination
  • 4. Statistical confidentiality and census
    samples: a 36 year-long perfect record
  • 5. International norms of statistical
    confidentiality
  • 6. Harmonizing and disseminating scientifically
    anonymized census samples: the case of IPUMSi

32
"statistical confidentiality shall mean the
protection of data related to single statistical
units which are obtained directly for statistical
purposes or indirectly from administrative or
other sources against any breach of the right to
confidentiality. It implies the prevention of
non-statistical utilization of the data obtained
and unlawful disclosure."
-- COUNCIL REGULATION (EC) No 322/97 of 17 February 1997
33
Statistical confidentiality standards in Eurostat
Countries ( in IPUMSi consortium)
  • Norway: Statistics Norway is prohibited to
    publish or disclose data from which information
    about individual persons or firms can be derived.
    Researchers may be given access to such
    information under strict rules and conditions.
    Guidelines provided by the Norwegian Data
    Inspectorate form the framework for internal
    management of data security.
  • Other countries with strict provisions:
    Austria, Canada, Denmark, Finland, France,
    Germany, Ireland, Netherlands, Sweden

34
Anonymized census microdata sample availability
for European countries ( in IPUMSi consortium,
negotiating)
  • 15 countries available via PAU, 1990 round (3 in
    IPUMSi):
  • Belgium, Czech Republic, Estonia, Finland,
    Hungary, Italy, Latvia, Lithuania, Norway,
    Poland, Spain, Sweden, Switzerland, Turkey, UK
  • 11 countries not available via PAU (2 in IPUMSi):
  • Austria, Croatia, Denmark, France, Germany,
    Iceland, Ireland, Netherlands, Portugal, Slovak
    Republic, Slovenia

35
EUROSTAT statistical anonymity standards
(Thorogood, 1999) -- all accepted by IPUMSi
  • 1. small sample size
  • 2. limited geographical detail
  • 3. top and bottom coding of unique categories
  • 4. signed non-disclosure agreement
  • 5. prohibit redistribution of datasets to third
    parties
  • 6. prohibit attempts to identify individuals or
    the making of any claim to that effect
  • 7. require users to provide copies of
    publications

36
EUROSTAT statistical anonymity standards
(Thorogood, 1999) -- all accepted by IPUMSi, and more
  • 8. Age (constructed, where necessary)
  • 9. Never identify date of birth
  • 10. Never identify place of birth
  • 11. Migration timing and place not identified
    in detail
  • 12. Place of residence identified by major civil
    division (pop > 60k, 120k, 250k, 1
    million -- national rule)
  • 13. Sensitivity analysis of variables by
    national experts
  • 14. Confidentiality assessment by national
    experts

37
International Monetary Fund's General Data
Dissemination System: 52 countries with uniform
standards
  • All embrace strict standards of statistical
    confidentiality
  • Prohibit disclosure of information which may
    identify individuals or entities
  • 37 countries distribute anonymized census
    microdata samples

38
Globalization of the census and the coming census
microdata revolution
  • 1. Introduction: census and census microdata
  • 2. The population census goes global: coverage,
    periodicity, and content
  • 3. Liberating census microdata: preservation,
    anonymization, integration, dissemination
  • 4. Statistical confidentiality and census
    samples: a 36 year-long perfect record
  • 5. International norms of statistical
    confidentiality
  • 6. Harmonizing and disseminating scientifically
    anonymized census samples: the case of IPUMSi

39
IPUMSi
Making the data usable... and used.
IPUMSi, 1999-2004: 20 countries, 1850-2000
40
PAYS
IPUMSi
National experts in each country are contracted
to:
  • Assemble microdata and documentation
  • Develop samples to minimize confidentiality
    risks and maximize robustness
  • Design national integration plan:
    census-by-census, concept-by-concept,
    code-by-code
  • Write integrated documentation
41
INTEGRATES
IPUMSi
Standard: UN/Eurostat Principles and Recs...
Census documentation compiled for Colombian
microdata
Photos from Colombia integration project,
February-March 2000
4 experts from DANE (census office)
7 academics (3 universities)
42
IPUMSi integration principles
  • 1. Respect absolute anonymity
  • 2. Preserve all original data, except adjustments
    to ensure privacy (top codes, blurrings, masking,
    re-ordering, etc.)
  • 3. Harmonize codes for countries:
    occupation: ISCO, HISCO (detailed, general);
    education: ISCED; family: IPUMS; etc.
  • 4. Enhance with constructed variables
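
Principles 2 and 3 together imply that a harmonized code is added next to the original code rather than overwriting it. The Python sketch below illustrates that pattern with a small crosswalk table. The sample labels (`CO1993`, `MX1990`), the national codes, the category names and the field name `educ_harmonized` are all invented for illustration; they are not ISCED values and do not describe the actual IPUMSi coding scheme.

```python
# Hypothetical crosswalk from national education codes to shared categories.
# All codes and labels are invented for this example.
CROSSWALK = {
    "CO1993": {1: "primary", 2: "secondary", 3: "university"},
    "MX1990": {10: "primary", 20: "secondary", 30: "secondary", 40: "university"},
}


def harmonize(sample: str, record: dict) -> dict:
    """Return a copy of the record with a harmonized education field added.

    The original national code stays untouched (principle 2); the shared
    category is stored alongside it (principle 3). Codes missing from the
    crosswalk are labeled "unknown" instead of being dropped.
    """
    out = dict(record)
    out["educ_harmonized"] = CROSSWALK[sample].get(record["educ"], "unknown")
    return out


print(harmonize("CO1993", {"age": 34, "educ": 2}))
print(harmonize("MX1990", {"age": 51, "educ": 30}))
print(harmonize("MX1990", {"age": 19, "educ": 99}))
```

Keeping both fields lets a researcher fall back to the national detail whenever the shared categories are too coarse for a single-country analysis.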

43
INTEGRATES
IPUMSi
10 projects started
First 18 months
  • USA: 1850-1880, 1900-2000
  • France: 1962, 1968, 1975, 1982, 1990
  • Norway: 1801, 1865, 1875, 1900;
    negotiating 1960, 1970, 1980, 1990, 2001
  • Canada: 1871, 1881, 1901; negotiating 1961-2001
  • United Kingdom: (1851, 1881), 1991;
    negotiating 1961, 1971, 1981, 2001
  • Argentina: 1869, 1895
  • Colombia: 1964, 1973, 1985, 1993, 2003
  • Vietnam: 1989, 1999
  • Hungary: 1970, 1980, 1990, 2000
44
INTEGRATES
IPUMSi
5 projects planned
  • Mexico: 1960, 1970, 1980, 1990, 2000
  • Spain: 1981, 1991, 2001
  • Brazil: 1960, 1970, 1980, 1991, 2001
  • China: 1982, 1990, 2000
  • Kenya: 1989, 1999
3 negotiations underway
  • Ghana: 1984, 2000
  • Italy: 1981, 1991, 2001
  • Austria: 1971, 1981, 1991, 2001
45
??
IPUMSi
7 future possibilities
Country: Census microdata
a. 1860, 1870, 1880, 1950, 1960, 1970, 1980, 1990, 2000
b. 1961, 1971, 1981, 1991, 2001
c. 1961, 1971, 1976, 1981, 1986, 1991, 1996
d. 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995
e. 1960, 1966, 1970, 1975, 1980, 1985, 1990, 1995
f. 1971, 1981, 1991, 2001
g. 1970, 1980, 1990, 2000
and .... ???
46
ANONYMIZES
IPUMSi
Using the highest standards currently available:
technical (Statistics Netherlands),
administrative (license agreement)
Imagine a new statistical product: a
scientifically anonymized census microdata sample
made up of unidentifiable individuals...
47
IPUMSi preserves statistical confidentiality
(in addition to NSO safe-guards)
  • 1. Construct small samples
  • 2. Suppress geographical detail (minor civil
    divisions and others with less than 100,000
    population), date of birth, 3-4 digit
    occupational codes, etc.
  • 3. Blur codes for sensitive variables where
    identity might be compromised (income)
  • 4. Top-code income, education, etc.
  • 5. Swap a small fraction of records
  • 6. Assess confidentiality risks for unique
    records for all defined geographical areas
    (ARGUS, Statistics Netherlands)
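
Several of these safeguards are simple record-level transformations. The Python sketch below illustrates three of them: suppressing geography below the 100,000 threshold named on the slide, dropping date of birth, and top-coding income, followed by a count of records that remain unique on a set of identifying variables. It is a minimal illustration and not the ARGUS software or the IPUMSi procedure; the field names, the income cap and the function names are assumptions, and blurring and record swapping are left out.

```python
from collections import Counter

POP_THRESHOLD = 100_000   # from the slide: suppress areas under 100,000
INCOME_TOP_CODE = 50_000  # hypothetical cap, chosen only for this example


def anonymize(record: dict, area_population: dict) -> dict:
    """Return a copy of the record with basic disclosure controls applied."""
    out = dict(record)
    out.pop("date_of_birth", None)  # date of birth is never released
    if area_population.get(out["area"], 0) < POP_THRESHOLD:
        out["area"] = None  # suppress small geographic units
    out["income"] = min(out["income"], INCOME_TOP_CODE)  # top-code income
    return out


def unique_records(records: list, keys: tuple) -> list:
    """Return the records whose combination of key values occurs only once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if counts[tuple(r[k] for k in keys)] == 1]


populations = {"A": 250_000, "B": 40_000}
people = [
    {"area": "A", "age": 30, "income": 20_000, "date_of_birth": "1970-05-01"},
    {"area": "A", "age": 30, "income": 90_000, "date_of_birth": "1970-08-12"},
    {"area": "B", "age": 85, "income": 10_000, "date_of_birth": "1915-02-03"},
]

released = [anonymize(p, populations) for p in people]
print(released)
print(unique_records(released, ("area", "age")))
```

Records that remain unique after these steps are the ones a risk assessment would examine further, for example by coarsening age or removing the record from the sample.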

48
Repositories of anonymized census microdata
samples for scientific research
  • ICPSR, University of Michigan
  • ACAP, University of Pennsylvania
  • CELADE, Centro Latino Americano de Demografía,
    Santiago Chile.
  • ECE/PAU, Population Affairs Unit, Geneva
    Switzerland.
  • EWC, East-West Center, U. of Hawaii.
  • IPUMSi, University of Minnesota.
  • Will others (a Nordic institution?) join the
    effort?

49
DISSEMINATES
IPUMSi
  • International web-based access system
  • End-user license agreement protects privacy and
    confidentiality, assures proper use
  • User selects countries, cases, variables, and
    samples -- makes cross-national research
    possible
  • Open architecture software and mirror
    sites available to all partners
50
Why should Nordic countries participate now?
  • Legal and scientific foundations in place:
    EUROSTAT, France, Austria, UK, etc.
  • Project has been underway 18 months of a 5 year
    project; if resources are required, budget
    planning must begin soon.
  • Historical census microdata projects are well
    advanced: 1801, 1865 (100% club), 1875, 1900.
  • Time to turn to contemporary census microdata
51
Additional information at http://www.ipums.org
Thank you
52
Work plan, part II: make census microdata usable
  • 3. Integrate March 2000- National partners
  • -integrate phase I countries using UN/Eurostat
    Principles and Recommendations
  • -help to design prototype
  • Analyze all concepts, variables and codes of
    census schedules for 30 target countries
  • -help to implement for phase I and II countries
  • 4. Disseminate -October 2004
  • - Design international data access engine
  • - Implement with phase I and II countries