Provision of access to data for secondary analysis

1
Provision of access to data for secondary analysis
  • Louise Corti, Jo Wathan and Keith Cole
  • Economic and Social Data Service, E-society
    Programme
  • March 07

2
Overview of chapter
  • Why access secondary quantitative data?
  • brief overview of the potential of secondary data
  • Finding, accessing and obtaining secondary data
  • describes the ESDS distributed national on-line
    data service designed
  • Case studies: the UK Economic and Social Data
    Service
  • practical exemplars of how data can be re-used

3
Why access secondary quantitative data?
  • Quantitative methods have an important,
    longstanding place in social research. They can
    identify:
  • typical characteristics and background
    description
  • the amount of variation within a population of
    interest
  • differences between groups
  • how possible explanatory factors can account for
    differences
  • predictions and forecasts
  • Kinds of data
  • Micro data resemble the sort of data obtained
    from a survey
  • Longitudinal data follow the same individuals (or
    other study unit) over time
  • Macro or aggregate data contain records for much
    larger units, e.g. countries or regions

4
Secondary analysis
  • reduces respondent burden
  • enables data linkage and the creation of new
    datasets
  • informs policy disputes about the interpretation
    of analyses
  • provides transparency within research
  • enables methodologists to learn from each other
  • allows students to engage with real data, to
    obtain results which relate to the real world and
    to tackle real problems of data management
    (substantive social science and research methods
    teaching)

5
Data expensive
  • Collecting good quality, reliable, representative
    data is expensive and technically demanding
  • In 2001/2 the British General Household Survey
    (GHS) sample included all individuals in 8,989
    households and cost £1.43 million
  • In 2001, the American Community Survey collected
    data from nearly 400,000 interviews in the year
    at an estimated cost of $131 million

6
Data historical - enabling trend analysis
  • In the UK the General Household Surveys (GHS) and
    Labour Force Surveys (LFS) date back to 1971 and
    1973
  • In the United States, the General Social Survey
    series dates back to 1972 and Current Population
    Survey data date back to 1964 (ICPSR)
  • Longitudinal studies
  • US Panel Study of Income Dynamics, started in
    1968
  • German Socioeconomic Panel in 1984
  • British Household Panel Survey in 1991

7
Finding, accessing and obtaining secondary data
  • The development of secondary analysis has
    depended on the development and growth of social
    science data archives
  • Inter-university Consortium for Political and
    Social Research (ICPSR)
  • the UK Data Archive (UKDA)
  • Zentralarchiv für Empirische Sozialforschung
    (ZA)
  • Norwegian Social Science Data Services (NSD)
  • Now networked
  • Council of European Social Science Data Archives
    (CESSDA)
  • International Federation of Data Organisations
    (IFDO)

8
Changing provision
  • early data archives predated e-social science,
    and the internet as we know it, by decades
  • the gradual development of online data archives
    and dissemination services has varied across the
    world
  • the more mature archives have reached the point
    at which most users will interact with the data
    service wholly through the internet
  • Internet delivery has broadened the potential
    role of data services

9
Functions of the modern archive
  • acquire - nurture, cajole, plead, evaluate
  • prepare, document and enhance data - check and
    add context
  • store data safely for ever - back up, store and
    migrate
  • distribute data - download, explore online
  • provide support for their use - promote, write,
    teach
  • improve resource discovery and data access - R&D

10
Acquisition and checking
  • data archives typically select and evaluate
    potential data collections against criteria
    designed to ensure that they are appropriate for
    re-use
  • assessed for their
  • research value, quality, degree of fit with the
    existing collection
  • data are checked and validated by the receiving
    archive by
  • examining the data values or text - validation
    and consistency checking
  • ensuring that, where required, the data are
    anonymous
  • checking for intellectual property and
    commercial ownership rights in the data

11
Documentation and metadata
  • Documentation which enables users to understand
    the origins of the data and to correctly
    interpret outputs
  • user guides created - how the data were collected
  • questionnaires, code books,
    interviewer instructions, technical reports,
    original and subsequent publications and outputs
  • catalogue record, and full variable and value
    labels (standard used - DDI)
  • a few archives work closely with data creators in
    the early stages to ensure that good data
    management practices are adhered to

12
Online dissemination
  • first steps towards online data archiving and
    dissemination came with the development of
    archive websites
  • increasingly sophisticated data catalogues
  • nowadays, searchable online data catalogues
    enable users to search and browse collections
  • and view documentation freely online
  • online registration, account management, data
    download
  • access data via a web browser

13
New generation data services
  • online data exploration with tools
  • Survey Documentation and Analysis (SDA), Nesstar,
    Beyond 20/20, interactive (GIS) mapping tools
  • increasingly necessary to link to data sites,
    offsite support and related datasets as the
    complexity of the data infrastructure increases
  • data services may be distributed services
  • data need not be co-located
  • social science increasingly looking to the
    potential of grid technologies

14
Economic and Social Data Service (ESDS)
  • new generation distributed data service that
    provides a seamless integrated service
  • offers enhanced support for the secondary use of
    key economic and social data across the research,
    learning and teaching communities
  • value-added service goes far beyond the original
    role of traditional data archives as data storage
    and dissemination houses
  • brings together centres of expertise in data
    creation, dissemination, preservation and use

15
UK Data archiving history
  • Data Archive established in 1968 (as Data Bank)
  • funded by (then) SSRC to provide a service to UK
    HE sector
  • initial focus on academic surveys then government
    survey data
  • new distributed service established 1 January
    2003 as the ESDS
  • core archiving service plus four value-added
    specialist services

16
Types of data
  • ESDS acquires mixed data types and formats
  • social surveys
  • aggregate data
  • administrative data
  • textual data
  • images
  • audio visual data
  • UKDA hosts specialist Qualidata unit, Census
    unit, and History Data Service
  • since 2005 designated as Place of Deposit by
    The National Archives (TNA)
  • New data types
  • Online surveys, interviews and focus groups
  • social transaction data
  • Linked admin data
  • blogs and so on

17
Who produces the social science data held by ESDS?
  • government agencies
  • increasing tendency for government agencies to
    contract out survey work to private sector
    (NatCen)
  • academic sector
  • private sector
  • local Government
  • Research Council funded
  • ESRC, MRC, NERC, AHRB, Wellcome, Leverhulme
  • increasing number of large digitisation projects
  • JISC, NOF
  • access to international data via links with other
    data archives worldwide
  • IGOs

18
Core Service
  • run by UKDA
  • acquiring, processing, preserving and
    disseminating data
  • data creation and deposit support
  • central registration service operating across the
    ESDS
  • central 'first stop' help desk service
  • front line user support
  • cataloguing and describing data
  • maintaining and developing web presence
  • publicity and training

19
Specialist data services
  • ESDS Government
  • ESDS International
  • ESDS Longitudinal
  • ESDS Qualidata
  • Greater emphasis on
  • value-added data and documentation
  • enhanced resource discovery
  • improved delivery services
  • support and training for the secondary use of
    data for research, learning and teaching
  • outreach and promotion

20
Facts and figures UKDA
  • 4,000 datasets in the collection
  • 350 new datasets and editions added each year
  • 30,000 registered users
  • 15,000 datasets distributed worldwide p.a.
  • 100,000 online sessions p.a.
  • 15,000,000 web hits p.a.

21
Data In
  • Data acquisition
  • offers and proactive scoping of data
  • formal data evaluation via committee
  • Data ingest
  • checking, verifying
  • converting, formatting, processing
  • documenting and contextualising
  • Data preservation
  • long-term data management
  • Preservation Policy

22
Online exploration
  • Online data browsing, including
  • simple data analysis, visualisation, downloading
    and subsetting via Nesstar
  • ESDS Government Vital Statistics online
  • International macro data via Beyond 20/20 and
    visualisation interface
  • ESDS Qualidata Online interview transcripts
  • Census data services

23
1 Using Government microdata to explore health
  • UK is fortunate in its wealth of available major
    cross-sectional surveys
  • government surveys are rich resources
  • large micro data files with a large number of
    detailed variables
  • series of repeated cross sections which enable
    comparisons over time
  • nationally representative United Kingdom or
    constituent countries
  • sample survey data, which may involve a degree of
    complexity - structure (hierarchical) and
    sampling strategy
  • data holdings and documentation are extensive

24
1 Government data
  • General Household Survey/Continuous Household
    Survey (NI)
  • Labour Force Survey/NI LFS
  • Health Survey for England/Wales/Scotland
  • Family Expenditure Survey/NI FES
  • British/Scottish Crime Survey
  • Family Resources Survey
  • National Food Survey/Expenditure and Food Survey
  • ONS Omnibus Survey
  • Survey of English Housing
  • British Social Attitudes/Scottish Social
    Attitudes/Young People's Social Attitudes/NI Life
    and Times
  • National Travel Survey
  • Time Use Survey
  • Vital Statistics for England and Wales

25
1 Investigating smoking
  • ESDS high web presence
  • Google search ESDS pages
  • ESDS catalogue advanced searching on key words
    study and variable level information
  • browse by subject
  • major studies lists
  • Government series pages
  • theme guides
  • publications database
  • software and analysis
  • guides

26
1 Accessing Data
  • register with ESDS, using the online
    authentication system ATHENS (currently moving
    towards a new system Shibboleth which provides a
    greater degree of differentiation in user types)
  • ESDS Users must specify the purpose for which
    they will use each data set
  • registered users can choose to download the whole
    file (typically SPSS, Stata and tab delimited)
    or undertake further analyses, including
    graphing, within Nesstar
  • more stringent conditions apply to more sensitive
    data such as detailed microdata with detailed
    geography (Special Licence)

27
1 Online exploration
  • Nesstar system - allows unregistered users to
    view metadata and univariate distributions online
  • based on the DDI standard to describe data
  • permits users to specify subsets and download in
    a wide range of formats
  • ability to quickly browse data is useful where
    particular subsets of cases in the data are of
    interest
  • e.g. to use the GHS for an analysis of people who
    would like to give up smoking, one needs to know
    whether there is a sufficiently large number of people

28
(No Transcript)

29
1 What can a user do with the data?
  • multivariate analyses that look within households
    and analyses that look at change over time
  • look at relationships between multiple individual
    characteristics
  • depth of many questionnaires allows users to
    explore the validity of existing means of
    operationalising concepts, or to use new ones

30
2 Analysing longitudinal health data
  • true cohort analysis requires information about
    the same individuals over time
  • explore the chronological ordering of behaviours
    or characteristics
  • ESDS Longitudinal specializes in supporting five
    major UK-based longitudinal data sets
  • British Household Panel Survey (BHPS)
  • 1970 British Cohort Study (BCS70)
  • National Child Development Study (NCDS)
  • Millennium Cohort Study (MCS)
  • English Longitudinal Study of Ageing (ELSA)
  • BHPS is a household hierarchical dataset -
    interviews all members of the households of panel
    members. Can explore household factors

31
3 Providing a common user interface to
international macro data to support comparative
research
  • researchers now require access to the key
    international evidence bases in order to
    contribute and comment on trans-national policy
    responses to global issues
  • ESDS International was established to address
    these needs through the provision of free
    web-based access to a portfolio of authoritative,
    high quality international databanks
  • high quality, regularly updated time series
    databanks - contain huge range of macro-economic
    and social indicators aggregated to national or
    regional level worldwide

32
  • supported datasets are produced by a number of key
    International Governmental Organisations (IGOs)
    such as the International Monetary Fund, the
    United Nations, the World Bank, the Organisation
    for Economic Cooperation and Development and the
    International Energy Agency
  • access via a common user interface to all the
    international aggregate datasets which makes it
    easy for users to obtain access to data
  • Beyond 20/20 Web Data Server (WDS) is used to
    display, subset, visualize, chart and download data
  • example: Iraqi exports to the rest of the world
    1980-2005 (Source: International Monetary Fund (IMF),
    Direction of Trade Statistics (DOTS), July 2006)

33
  • CommonGIS used to build a web-based data
    exploration interface to geographically
    referenced international data
  • CommonGIS provides standard GIS functionality and
    can be used as a tool for visualisation and
    exploratory analysis based on geographically
    referenced statistical data
  • CommonGIS visualization shows the relationship
    between birth and death rates in European
    countries in 2005 (source: CIA World Factbook)
  • the cross classification map shows those
    countries, such as Moldova, which have high birth
    and death rates

34
4 Grid-enabling quantitative datasets to support
more complex forms of analysis
  • Data Grids facilitate unimpeded and integrated
    use of distributed, heterogeneous, autonomous
    data resources
  • grid enabling a dataset creates new opportunities
    for its use
  • enables users to integrate it with other datasets
  • makes it possible to analyse the dataset using
    techniques that require the kind of computational
    power that is only feasible using the Grid
    (e.g. more complex models, more data points)
  • standardisation of procedures and mechanisms used
    to access and update the dataset increases its
    shareability
  • automated analyses (i.e. analyses can be re-run
    automatically when databases are updated)

35
4 ConvertGrid Key Objectives
  • a practical demonstration of how the Grid can be
    used to facilitate data integration and overcome
    a major barrier to research use of multiple
    datasets
  • demonstrates how to build a social science Data
    Grid by grid enabling a number of key
    geo-referenced socio-economic data sources
  • uses Grid technologies to extend the
    functionality of an existing web based data
    service (i.e. Convert) to exploit the existence
    of a Data Grid
  • demonstrates how Grid technologies can automate
    complex workflows and enhance the capacity to
    address substantive social science research
    questions
  • builds a user interface to a Grid based service
    which is suitable for student/teaching use

36
4 ConvertGrid The Research Context
  • many research questions require the combination
    of data from multiple geo-referenced datasets
  • e.g. linking postcoded data to census geography
  • conversion of data relating to different
    geographies to a common target geography is
    a complex, time-consuming task
  • requires a range of data handling/processing
    skills
  • the data conversion process will require users to
    perform the following generic tasks
  • extract and download data in different formats
    from a number of databases using different
    interfaces
  • convert each dataset to the desired target
    geography using geographical conversion tables
  • combine the converted sets into a single dataset
    for analysis
  • these generic tasks can be automated!
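
The generic tasks listed above could be automated along the following lines. This is a minimal sketch and not ConvertGrid's actual implementation: the zone codes, the weights in the conversion table and the function names `convert_geography` and `combine` are all hypothetical, and it assumes that counts can be apportioned using one weight per source-target pair.

```python
from collections import defaultdict


def convert_geography(values, lookup):
    """Re-apportion counts from source zones to target zones.

    `lookup` maps each source zone to a list of (target zone, weight)
    pairs; the weights for one source zone are assumed to sum to 1.
    """
    converted = defaultdict(float)
    for source, value in values.items():
        for target, weight in lookup[source]:
            converted[target] += value * weight
    return dict(converted)


def combine(**datasets):
    """Merge converted datasets into one record per target zone."""
    zones = set().union(*(data.keys() for data in datasets.values()))
    return {
        zone: {name: data.get(zone) for name, data in datasets.items()}
        for zone in sorted(zones)
    }


# Illustrative inputs only: applicants reported for 1998 electoral wards,
# and persons aged 16-19 already reported for 1991 census wards.
applicants_1998 = {"E1": 40.0, "E2": 60.0}
persons_1991 = {"W1": 200.0, "W2": 100.0}

# Hypothetical conversion table: E2 is split evenly across W1 and W2.
lookup_1998_to_1991 = {
    "E1": [("W1", 1.0)],
    "E2": [("W1", 0.5), ("W2", 0.5)],
}

applicants_1991 = convert_geography(applicants_1998, lookup_1998_to_1991)
combined = combine(persons_16_19=persons_1991, applicants=applicants_1991)
print(combined)
```

Weighted summing of this kind only suits counts. Averages, such as house prices, would need a weighted mean instead, which this sketch does not cover.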

37
4 ConvertGrid A Worked Example
  • what factors explain spatial variations in
    participation rates in higher education
  • study target geography 1991 Census Ward
  • data required
  • 1991 Census
  • total persons aged 16-17 and 18-19 (1991 Census
    Ward)
  • Neighbourhood Statistics
  • number of applicants aged under 20 entering
    university (1998 Electoral Ward)
  • Experian
  • average house price sales Quarter 2 2000 to
    Quarter 1 2001 (1999 Postcode Sectors)

38
4 ConvertGrid Data Visualisation Interface
High average house price sales but low
participation rates
Low average house price sales but high
participation rates
Ten minutes from start to finish
  • relationship between average house price sales
    (Experian) and percentage of 16-19 year olds
    entering university (Neighbourhood Statistics,
    Census aggregate statistics)
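
The cross-classification behind this kind of display could be sketched as follows. The ward codes and figures are invented for illustration, and using the median as the high/low cut-off is an assumption; the slides do not say how the interface defines "high" and "low".

```python
from statistics import median

# Invented example values per ward, already on a common target geography.
wards = {
    "W1": {"price": 250000, "entrants": 20, "persons_16_19": 200},
    "W2": {"price": 80000, "entrants": 45, "persons_16_19": 150},
    "W3": {"price": 90000, "entrants": 10, "persons_16_19": 200},
    "W4": {"price": 300000, "entrants": 90, "persons_16_19": 180},
}


def participation_rate(ward):
    """Percentage of 16-19 year olds entering university."""
    return 100.0 * ward["entrants"] / ward["persons_16_19"]


rates = {code: participation_rate(ward) for code, ward in wards.items()}

# Assumed cut-offs: the median of each measure across all wards.
price_cut = median(ward["price"] for ward in wards.values())
rate_cut = median(rates.values())


def classify(code):
    """Label a ward by its position relative to both cut-offs."""
    price = "high" if wards[code]["price"] > price_cut else "low"
    rate = "high" if rates[code] > rate_cut else "low"
    return f"{price} price / {rate} participation"


classes = {code: classify(code) for code in wards}
for code, label in classes.items():
    print(code, label)
```

Wards in the two off-diagonal classes correspond to the two groups picked out on the slide.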

39
5 Mixed Methods Data
  • there is an increasing interest in and
    recognition of the value of re-using qualitative
    data
  • in the past few years there has been a
    significant move to utilise mixed methods
    strategies in research
  • ESDS has seen the deposit of multiple methods
    datasets combining quantitative and qualitative
    datasets
  • processed and supported by dedicated unit - ESDS
    Qualidata

40
5 ESDS Qualidata
  • range of qualitative datasets, hosted by the UK
    Data Archive
  • data from National Research Council (ESRC)
    individual and programme research grant awards
    (Data Policy)
  • data from classic social science studies
  • other funders/sources
  • focus on DIGITAL Collections, but also facilitate
    paper-based archiving

41
5 Types of qualitative data
  • diverse data types: in-depth interviews,
    semi-structured interviews, focus groups, oral
    histories, mixed methods data, open-ended survey
    questions, case notes/records of meetings,
    diaries/research diaries
  • multimedia: audio, video, photos and text (most
    common is interview transcriptions)
  • formats: digital, paper, analogue audio-visual
  • data structures - differ across different
    document types

42
5 Classic study datasets
  • Townsend: Poverty, old age and Katherine
    Buildings
  • Thompson: oral history and Edwardians
  • Goldthorpe et al.: The Affluent Worker
  • Jackson and Marsden: Education and the Working
    Class
  • National Social Policy and Social Change Archive

43
5 Online access to data

44
5 Schoolchildren's attitudes towards risk-taking
and health
  • typical example of a mixed methods study might be
    undertaking a sample survey and conducting
    ethnographic fieldwork (e.g. observation and
    in-depth interviews) based on the survey sample
    or on other cases
  • Incidents and the Health-related Behaviour of
    Schoolchildren, 1997, M. Denscombe
  • Studying critical incidents in the life of young
    people which act as crucial flashpoints in the
    generation of attitudes towards health-related
    behaviour

45
5 Schoolchildren's attitudes towards risk-taking
and health
  • the project used a mixture of quantitative and
    qualitative methodology
  • survey of 1648 children
  • eleven transcripts of focus group interviews
  • eight transcripts of interviews - two students
    together
  • Denscombe in-depth interviews also cover a lot of
    detail about the role and pressure of exams at
    the age of 15/16, and future life ambitions

46
Secondary use?
  • qualitative aspect can offer a more detailed
    explanation of a quantitative analysis and
    possibly enable a more complex model to be built
  • sequencing of data collection methods or the
    selection of cases needs to be carefully
    considered in re-use
  • in larger data collections, the data types may
    have been collected by different teams with
    differing methodological agendas - researchers
    tend to prioritise one method because of
    familiarity with the data type and analytic
    methods
  • possibility that each method could show
    conflicting findings - re-users should be aware
    how they report findings and be reflexive about
    how the secondary data were selected, confronted
    and analysed

47
Collaboration - UK
  • Government agencies work closely
  • Research Councils on formal data sharing policies
  • Research Centres and Programmes collecting data
  • Other funding agencies, e.g. JISC, on technical
    issues
  • authentication, digitisation, T&L resources
  • TNA on records management and preservation
    practice
  • E-science on grid enabled data issues, ontologies
  • Research Methods centres on data quality and
    secondary analysis

48
Conclusion
  • secondary analysis permits a range of valuable
    analyses to be undertaken quickly, effectively,
    transparently and with minimal respondent burden
  • digital formats have enabled users to easily
    consult full documentation, explore and analyse
    data online
  • and to make linkages between appropriate
    resources in a context of an increasingly complex
    data infrastructure
  • data access services themselves may be virtual
    centres, distributed across multiple sites
  • anticipate that grid developments will provide
    increased scope for harmonising access to
    different data types

49
  • Contact
  • www.esds.ac.uk
  • help_at_esds.ac.uk
  • corti_at_essex.ac.uk
  • 01206 872145