Title: Sharing of social science and humanities data
1Sharing of social science and humanities data
- Professor Denise Lievesley
- Head of School of Social Science and Public
Policy, King's College London, and
- Chair, European Statistical Advisory Committee
2- Principles
- Policy
- Practice
- Partnerships
3Principles
- Scientific principle: research findings together
with the data should be available for others to
refute, confirm, clarify, or extend the results;
part of public accountability
- Responsibility to funders and to society to use
resources efficiently (data are often
under-exploited)
- Important to reduce response burden
- Increasing international responsibilities
4Scientific paradigm
- Many codes of professional conduct espouse these
principles, e.g.
- The International Statistical Institute's
Declaration on Professional Ethics states that "A
principle of all scientific work is that it
should be open to scrutiny, assessment and
possible validation by fellow scientists."
5Two important publications
- Fienberg S., Martin and Straf (1985) Sharing
Research Data. National Academy Press
- Arzberger P., Schroeder, Beaulieu, Bowker, Casey,
Laaksonen, Moorman, Uhlir, Wouters (2004)
Promoting Access to Public Research Data for
Scientific, Economic, and Social Development.
Data Science Journal
6- Publicly funded research data are a public
good, produced in the public interest. As such
they should remain in the public realm.
Availability should be restricted only by
legitimate considerations of national security
restrictions; protection of confidentiality and
privacy; intellectual property rights; and
time-limited exclusive use by principal
investigators.
7- In recent years, the debate on e-science has
tended to focus on the open access to the
digital output of scientific research, namely,
the results of research published by researchers
as the articles in the scientific journals. This
focus on publications often overshadows the
issues of access to the input of research - the
research data, the raw material at the heart of
the scientific process and the object of
significant annual public investments. In terms
of access, availability of research data
generally poses more serious problems than access
to publications. (Arzberger et al., 2004)
8Reduction of response burden
- Compliance costs are important, especially in small
countries and in surveys of elites, businesses,
institutions
- Fresh data collection takes time and resources
- Secondary data analysis can take place in a
resource-constrained (including
time-constrained) environment
9Conclusion
- Deliberate replication is to be encouraged
- Duplication in ignorance of previous research is
to be abhorred
- There is growing awareness that failure to
exploit the full potential of data has costs for
society, and many institutions and agencies now
espouse the aim of ensuring that data are used as
extensively as possible.
10Importance of establishing policies on data
access, sharing and preservation
- by
- funding agencies
- universities or university consortia
- professional societies
- data producers
- Policies need an implementation plan which
must pay attention to the sticks and carrots and
to the means of achieving the plan
11Example policy: UK Economic and Social Research
Council
- limits new data collection
- encourages secondary analysis
- requires deposit of new data and derived data in
UK Data Archive
- determines the date for deposit
- sets standards for documentation
- provides resources for data access and
preservation
- builds data commons
- funds data use workshops.
12Barriers to data access
- legal obstacles, especially with respect to
confidentiality, commitments to respondents
- technical and financial obstacles, including
in-house capacity to handle the complex aspects
of micro-data dissemination such as data
anonymization
- political obstacles
- psychological obstacles: the tendency to control
access, perhaps because of concerns over
misinterpretation or because data is power
13Incentives in academic system
- In 1985 the report of the US Committee on
National Statistics pointed out that "A scientist
is recognised and rewarded through the scientific
community and its institutions. Researchers will
have greater incentives to share data if the
community and its institutions foster the idea
that the practice advances science and is part of
what is recognised as necessary and proper
scientific behaviour."
- Competition, performance targets, etc.
14Policies must pay attention to the
responsibilities of data users
- acknowledge and give credit
- respect conditions of access
- use data responsibly
- provide feedback on use
- Value and role of data intermediaries
15Benefits to universities of sharing data
- Development of knowledge
- Encourage greater exploitation of data and
therefore greater impact
- Contribute to sound policy decisions
- Foster multiple perspectives on data
- Facilitate comparative research
- Create knowledgeable data community
- Provide feedback on data and improve data quality
- Improve citations and competitiveness
- Improve quality and relevance of teaching
16Putting the plan into practice
- Promotion of the plan
- Clear guidance for data producers
- Resources
- for providing access
- for preservation
17Access: one size doesn't fit all
- Needs of users/usages differ
- especially in relation to their sophistication
and the need for individual level data
- Data sets vary, especially in relation to
sensitivity of content and possibility of
disclosure
- Particular challenges are posed by
- Integrated, longitudinal data
- Qualitative data
- Administrative data
- Cross-national data
18Shared resources?
- Centralisation v. disseminated model
- Specialised services v. generic
- Delivering data remotely v. safe havens
19Partnership - with data intermediaries
- for both technical work and advocacy, partnership
across the data archiving, data librarian,
statistical and research communities is to be
encouraged
- Preservation
- Metadata and documentation
- Providing access
- Keeping records
- Running user training
20Preservation is essential
- Having collected data at some cost to the
taxpayer, it behoves us to manage them well.
- Alongside dissemination, this entails data
preservation.
- Due to poor data management, human error as well
as technical change and inadequate use of
technology, many data sets are no longer
readable.
- Thus all that remains of this important legacy
are the, often quite superficial, reports or
papers that were produced at the time.
- To this extent an important part of our heritage
is lost and we are severely limited in our
analysis of change.
21- Long-term preservation of electronic material is
not a straightforward task, especially with data
sets which have embedded software
- It can be hard to persuade financial authorities
to spend money on the preservation of data for
historians and researchers of the future, when
there are so many pressing problems today.
22Partnerships - with government data agencies
- to broaden data use and reuse
- to foster diversity and deepen the quality of
data analysis thereby extracting more information
from the data
- to add value to data by bringing subject-matter
knowledge to data analysis
- to improve data quality (Data analysts can and
often do detect errors in data and when they
provide feedback to statistical agencies, this
can lead to improvements in future data
collection.)
23Such agencies aim to graduate from being data
producers to generators of information and
knowledge
- attention to data collection at expense of
generation of information and knowledge
- collection costly and difficult
- importance of quality of data
- mountains of data insufficiently processed and
analysed
- most people not adept at understanding data
- important for government agencies to get involved
in interpretation and use of information
24- It is the responsibility of official agencies to
ensure that the widest possible use is made of
data, consistent of course with legal
constraints and ethical undertakings.
- Partnership with universities is a key way of
enabling them to deliver on this responsibility.
25Case study: building the Secondary Uses Service,
National Health Service in England - individual
patient care records
- Conducting audits of clinical practice
- Surveillance of infectious diseases
- Management of the health system
- Monitor equity of access and provision
- Evidence-based health policy
- Providing better information to the general
public
- Improving the quality and safety of care
26Aim of SUS: to promote the widest possible
informed use of the data whilst maintaining trust
in the system
- Hierarchy of data access consistent with ensuring
lowest risk of patient identification
- Need to know
- Role of honest brokers and safe havens
- Development of virtual safe havens
27Information governance of Secondary Uses
Service
- aggregate data widely available
- default anonymised
- or pseudonymised
- if identifiers needed, consent should be obtained
- full justification in terms of benefits to be
made for exceptions
- exceptions assessed by transparent, equitable,
replicable and open process involving patients'
representatives
- requirement for safety and security of
information (i.e. accountability)
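As a rough illustration of the "default anonymised or pseudonymised" rule above, the sketch below replaces a direct patient identifier with a keyed hash (HMAC-SHA256) before a record is released for secondary use. It does not describe how the Secondary Uses Service is actually implemented: the field names, the `pseudonymise` and `release_record` helpers and the key handling are all hypothetical, and a real system would also have to address key custody and the risk of re-identification from the remaining fields.

```python
import hashlib
import hmac

# Hypothetical secret, assumed to be held only by the data custodian.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def release_record(record: dict) -> dict:
    """Copy a record, replacing the direct identifier with a pseudonym."""
    released = {k: v for k, v in record.items() if k != "patient_id"}
    released["pseudonym"] = pseudonymise(record["patient_id"])
    return released


# Illustrative record with made-up values (not real data).
record = {"patient_id": "ID-0001", "diagnosis": "asthma", "year": 2008}
released = release_record(record)
```

Because the same key always yields the same pseudonym, records for one patient can still be linked across extracts without exposing the identifier itself.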
28Partnerships internationally
- Data archives
- Cochrane Collaboration
- Campbell Collaboration
- National Library of Health
- Communities of practice
- Principle of reciprocity
32Results of a meta-analysis
- Collation of the results of many studies
contradicts this advice
- Extract from publicity prepared for the UK
"Reduce the Risk" Campaign (early 1990s)
- "The risk of cot death is reduced if babies are
not put on the tummy to sleep. Place your baby on
the back to sleep. ... Healthy babies placed on
their backs are not more likely to choke."
34Iain Chalmers
- "No doubt like millions of his other readers, I
passed on and acted on this apparently rational
and authoritative advice."
- "We now know that the advice promulgated so
successfully in Spock's book led to thousands, if
not tens of thousands, of avoidable cot deaths."
- (Letter to BMJ)
35Communities of practice
- International Social Survey Programme
- CROP - the Comparative Research Programme on
Poverty, whose major aim is to produce sound and
reliable knowledge, which can serve as a basis
for poverty reduction
- RENCORE - encourage and enhance comparative
empirical research of individual, national and
institutional level data from the states of
western, central and eastern Europe
- Cleveland conference on education research
- African Programme on Rethinking Development
Economics
36- Concluding remarks
- Social scientists and humanities researchers
are involved in the creation of a diverse range
of datasets, many of which are unique, rich in
information content and incapable of replication.
- Sharing allows scientists to extend the value
of these datasets through new, high quality,
ethical research and exploitation.
- It also reduces unnecessary duplication of
data collection.
- Building preservation systematically into
routine data management is part of good research
practice: it strengthens quality, enables
replication and audit, and provides a sound basis
for data sharing.
37- Research data grow in value the more they are
used, unlike most commodities which are
diminished with use.