Title: Data protection: issues about anonymisation and dissemination for researchers. The French experience
1. Data protection issues about anonymisation and dissemination for researchers: The French experience
- Roxane Silberman
- CCDSHS/Réseau Quetelet
- CESSDA Workshop, Athens, 11-12 October 2006
2. Introduction
- Contribute to the discussion
- The French experience over the last 20 years
- Impact of institutional arrangements
- Impact of pressure and space for negotiations
- Changes in the contexts
- Old and new questions
- Role of the Data Archives in a new world
3. A specific context
- Importance of the National Institute for Statistics (Insee): a lot of surveys
- A specificity of Insee: a scientific dimension (a department of research)
- Funding academic surveys is not a tradition in France
- Socio-political surveys
- Some specialized research institutes: INED, INSERM
- High pressure upon Insee to get access, and questions of equal treatment between researchers
- Less experience in sharing data
4. Different experiences of anonymisation
- Very different experiences and ways to deal with anonymisation in the Réseau Quetelet
- Centre Maurice Halbwachs (ex-Lasmas): access to public statistics
- INED: making surveys (in relation with Insee but as a research institute) and disseminating its own surveys
- CDSP: disseminating socio-political surveys
5. Main issues
- The French legal framework: its evolution and the impact of the European directive
- Changes and differences in practices
- Current questions and current negotiations
6. I. The French legal framework
7. The French implementation
- Four sources
- Statistical law
- Privacy protection law
- Archives law
- Law about information
- Changes over the last 30 years
- Some conflicts between these four sources
8. A. Statistical laws
- Importance of the implementation of the statistical law
- Two regulations: for surveys, for administrative purposes
- The 1951 statistical law
- The CERFA procedure for administrative data
9. The 1951 law
- The 1951 law defines the rules for collecting statistical data for the state (obligation, coordination, statistical secret)
- Personal data and business data
- Promulgation by the Ministry of Economy, under the control of the CNIS (National Council for Statistical Information), which includes the social partners (and researchers), and with authorization of the CNIL for individual data
- The 1986 addition allows Insee to ask for administrative databases
10. Statistical secret
- Formal interpretation
- - statistical secret: no dissemination
- - liberal interpretation: no dissemination of non-anonymised data
- Exception for business data (assumption: business data cannot be anonymised): dissemination through a Committee (Comité du secret) including business representatives
11. Recent changes
- 2004: updating of the 1951 law
- Enlargement of the role of the Comité du secret to give access to business administrative data
- One researcher in the Comité du secret
12. The CERFA procedure
- For administrative purposes
- Visas
- Right of access for citizens through CADA
- Under the control of the administration
- The 1986 addition to the 1951 law gave Insee the right to mobilize administrative data
13. Other changes in the law
- 2004
- 1978 law and the European directive
- Compatibility of research aims (also history and statistics)
14. B. The 1978 law: data protection
- Protection of individual data
- One of the first laws in Europe (the first wave, before the implementation of the 1995 European directive)
- Linked to the SAFARI episode in the context of the computing revolution (1974)
- Had an impact on the statistical law: the Ministry must have the advice of the CNIL for Insee surveys; and the 1986 addition, in order to allow Insee to get administrative data
- Private and public regimes differ: declaration or authorization (cf. the SAFARI episode)
15. Researchers
- Nothing specific for researchers
- No reactions from researchers in 1978
- Additional chapter in 1994 for epidemiologists (with a specific research ethics committee)
- Difficult relations between researchers and the CNIL
- Complex relations between Insee and CNIL
16. The impact of the 1995 European directive
- France had a rather restrictive law, with no need to add much in order to apply the directive (France was late in updating the law)
- But the European directive was seized as an opportunity by the statisticians and the researchers to implement the compatibility of statistical and research aims with the initial aim of the data collection
- Joint lobbying of the statisticians and the researchers
- The 2004 updating of the 1978 law introduces the compatibility of the initial aim with the historical, statistical and scientific aims (impact on long-term storage and on dissemination and reuse)
17. But
- Moves from nominative data to direct or indirect identification
- Some extension in the definition of the sensitive variables: impact for collecting data but also for access
- Same regime for private and public data: declaration, except for sensitive data; more pressure on researchers
- Still no statistician in the CNIL
18. C. The law about the Archives
- Establishes very long delays for access to archives but also an obligation to store data (including individual data)
- A tendency to open archives
19. D. Law about information
- In line with the European directive
- Says nothing about researchers
- But may impact dissemination of data (see Insee and open access to surveys on the website)
20. First conclusions
- Different laws with potential conflicts on right of access
- CADA, CNIL, Archives
- Complexity: interrelation between the CNIL (data protection) and Insee (statistical law): who is deciding what in terms of access for researchers?
21. II. Practices
22. Differences in practices and evolutions
- Different periods
- Different contexts: research / public statistics
- Qualitative and quantitative
23. A first period: the end of the 70s and the beginning of the 80s
- A general feature: France was late in setting up a Data Archive (1986)
- Socio-political surveys: CIDSP (Grenoble)
- Insee and other statistical bodies: only individual access through personal relations, but anonymisation is not a main issue; very different arguments but mainly no real discussion on the topic, and individual exceptions
- Research institutes such as Ined: no culture of sharing data
- Sharing data is the issue, not anonymisation
24. Second period: 1986-1998
- Some progress in sharing data
- First collective agreement with Insee and other statistical departments
- Access restricted to the CNRS (not the universities)
- Anonymisation is not a main issue: a liberal interpretation of the statistical law, right to disseminate anonymised data, mostly direct identification
- Commercial issues are more important
- First discussion in INED about sharing data
25. Third period: 1999-2002
- Changes in the level of anonymisation: a decision from CNIL and Insee for the 1999 census: no more dissemination at the level of small geographical units, no dissemination of sensitive variables (less detail on nationality and country of birth)
- Huge impact for geographers (but also for urban managers and municipalities)
- Difficulties for specific geographical aggregations
- Difficulties for longitudinal analysis
- Difficulties for contextual analysis
26. Negotiation with Insee and other statistical departments
- Other problems
- - access for universities
- - access to other statistical departments
- - costs
- General negotiation
- A committee in charge of a national policy for social sciences research, Insee and other statistical departments: a place to discuss and negotiate (CCDSHS)
27. A rather liberal situation
- A lot of individual micro-data available for all researchers (also available for other countries)
- Access to business surveys through the Comité du secret (see composition): few refusals
- Unequal access to administrative data
28. ... but growing concerns with anonymisation
- Mainly: not much progress on Census dissemination; a complex system with different products that have been offered to replace the less anonymised sample: tabulation but no modelling, time-consuming
- (NB: at the same time, urban managers were more successful)
- Geographical levels: anonymisation became the general rule for all surveys
- New restrictions
- - 1) sensitive variables: nationality, country of birth, spoken languages
- - 2) indirect identification: not only geographical precision but also professions, income...
29. ... in all sectors
- Indirect identification also becomes an issue for research institutes such as Ined
- New pressure on individual researchers who were unaware of these issues: notifying surveys to the CNIL, asking for authorizations, paying attention to confidentiality
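Indirect identification, as raised in the slides above, can be illustrated with a minimal sketch. It assumes a simple rule that is not taken from the talk: a record is treated as at risk when fewer than k records share its combination of values on variables such as place, profession and income. The records, variable names and threshold below are hypothetical and invented for illustration only.

```python
from collections import Counter

# Toy records, invented for illustration only (not real survey data).
RECORDS = [
    {"commune": "A", "profession": "baker", "income_band": "low"},
    {"commune": "A", "profession": "baker", "income_band": "low"},
    {"commune": "A", "profession": "notary", "income_band": "high"},
    {"commune": "B", "profession": "teacher", "income_band": "mid"},
    {"commune": "B", "profession": "teacher", "income_band": "mid"},
    {"commune": "B", "profession": "surgeon", "income_band": "high"},
]


def risky_records(records, variables, k=2):
    """Return the records whose combination of values on `variables`
    is shared by fewer than k records (k is a hypothetical threshold)."""

    def key(record):
        return tuple(record[v] for v in variables)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Detailed file: place, profession and income band are all released.
detailed = risky_records(RECORDS, ["commune", "profession", "income_band"])
# Profession removed: the two high-income records remain unique per commune.
no_profession = risky_records(RECORDS, ["commune", "income_band"])
# Geography removed as well: every income band is now shared by two records.
income_only = risky_records(RECORDS, ["income_band"])

print(len(detailed), len(no_profession), len(income_only))
```

In this toy file, removing profession alone does not help, because geographical precision combined with income still isolates two records. This is the kind of trade-off between detail and confidentiality that the slides describe.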
30. Lobbying to change the 1978 law
- Difficult relations with the CNIL (no statisticians)
- Discussion with the CNIL
- Common lobbying: research and Insee
- The 2004 updating of the 1978 law opens new space to negotiate
31. ... but not the only problem
- The statistical law
- The argument of the statistical secret comes back in a different way (responsibility, possible sanctions, response rate)
- Changes in the statistical law? Status for researchers?
32. ... in a new context
- New and powerful statistical tools that demand all data
- More administrative data that can also be merged with surveys
- More panels, difficult to anonymise
33. Differences in practices for anonymisation
- Differences between surveys
- Differences between institutions (Insee, statistical departments, governmental agencies, research institutes, individual researchers): different knowledge about the laws, different interpretations, different practices about indirect identification
- Discussion about indirect identification with the CNIL
34. ... but also some access
- Access through CNIL and Comité du secret
- Individual contracts under the responsibility of the statistical department or the governmental agency for administrative data
- Access through CADA (even for newspapers) and National Archives
- Impact of law about information
- A specific treatment: the research unit in Insee
35. III. Current discussions and new negotiation
- Very different practices and situations
- High pressure from researchers
- Space in the law
36. Two directions
- Research files
- Safe centers
37. Research files
- General idea: an intermediate level between anonymised data and safe centers
- Two levels: public files (now on the Insee web site) and research files
- A negotiation: Insee / CNIL / Ministry of Research
- A general authorization from the CNIL that will allow discussing more detailed files with Insee
- The Data Archives will be in charge of dissemination and will have the responsibility
- Relies on organization and confidence
38. Safe centers
- For more sensitive data
- Also to merge datasets
- Business data will go in the safe centers
- Census: not very clear at the moment
- Different options and questions are currently under discussion
- Safe centers or remote access
- Role of the researchers
- Role of the Data Archives
39. Conclusions 1
- A different world: different data (panels, administrative data, merged datasets), powerful statistical tools, more concern about confidentiality
- Space for negotiation in the law and between the laws
- Discussion and collective pressure are effective
- But a need for organization: different levels of dissemination
40. Conclusions 2
- Role of the Data Archives in this new world?
- Importance of information and documentation in a distributed system
- More discussion about indirect identification
- More discussion about sensitive variables
- More discussion about the choice between anonymisation and different levels and systems of dissemination
- Data Archives as an actor in the negotiations