ALSPAC Record Linkage to External Databases

1
ALSPAC Record Linkage to External Databases
  • Andy Boyd
  • ALSPAC, Social Medicine
  • University of Bristol

2
The data sources and processes involved
  • The processes involved in linkage projects
  • Overview of ALSPAC's existing data linkage
    projects
  • National Pupil DB & Geographic linkage as
    examples
  • Data Availability & Linkage Problems

3
Processes involved in linkage projects
  • Find the contact
  • Ethics: informed consent and/or Section 60
    support
  • Data Security
  • HM Revenue & Customs
  • Creating a linkage data set
  • Data QC checks
  • Identifiers
  • Formats and data normalisation

4
Processes involved in linkage projects (cont.)
  • Who links the data?
  • One of the two parties or an independent 3rd
    party
  • Processing the data
  • Anonymity vs sufficient data for research
  • Ages in Months Years
  • First Half of Postcode
  • Recode unusual outcomes into wider categories
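
The three anonymisation steps listed on this slide can be sketched in a few lines. This is only an illustration, not ALSPAC's actual procedure: the function names and the `min_count` threshold of 5 are assumptions made for the example.

```python
from collections import Counter


def outward_code(postcode: str) -> str:
    """Return the first half (outward code) of a UK postcode."""
    compact = "".join(postcode.split()).upper()
    # The second half (inward code) is always the last three characters.
    return compact[:-3]


def age_in_years(age_months: int) -> int:
    """Coarsen an age recorded in months to completed years."""
    return age_months // 12


def recode_rare(values, min_count=5, other="Other"):
    """Fold categories seen fewer than min_count times into one wider category.

    min_count=5 is an illustrative threshold, not a documented ALSPAC rule.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


print(outward_code("bs8 2bn"))                       # first half only
print(age_in_years(95))                              # completed years
print(recode_rare(["a", "a", "b"], min_count=2))     # rare value recoded
```

Each step gives up detail to reduce the chance of identifying a cohort member, which is the anonymity versus research-value trade-off the slide describes.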

5
Major External Databases
  • Health related datasets
  • Office for National Statistics (ONS) Tracing
  • Cancer Registry GRO
  • NSTS (NHS Strategic Tracing Service)
  • Electronic antenatal birth records
  • PCT data (Exeter DB, My Quest)
  • Non health Datasets
  • National Pupil Database (DCSF, DIUS, UCAS)
  • ALSPAC Schools Collection
  • G.I.S Datasets (Geographic Information Systems)
  • DWP
  • Home Office: linkage currently being
    investigated

6
National Pupil Database
  • Maintained by the Dept. for Children, Schools and Families (DCSF)
  • Covers all state maintained schools in England
  • Census: previously annual, now at 3 time points a year
  • Data at school and pupil level
  • Key data include
  • Exam results
  • Attendance
  • Pupil demographics (including address, ethnicity,
    Free School Meals, Special Educational Needs)
  • School Characteristics (pupil numbers, staff:pupil
    ratios)

7
NPD - How we did it
  • 3rd party conducted the match: The Fischer Trust
    (independent charity)
  • Provided data on the eligible cohort
  • ALSPAC & DCSF provided the following linkage
    variables
  • Surname, Forename, Familiar name
  • Date of Birth, Gender
  • Postcode, Previous Postcode & Postcode accuracy
    flag
  • Current School (from ALSPAC data collection)
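
As an illustration of how linkage variables like these can be used, here is a minimal exact-match sketch on normalised identifiers. It is not the third party's method, which a later slide notes is undocumented. The field names (`cohort_id`, `pupil_id`, `surname` and so on) are hypothetical, and a real match would usually also use fuzzy or probabilistic comparison.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Strip accents, punctuation and spacing, then upper-case."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalpha()).upper()


def normalise_postcode(postcode: str) -> str:
    """Remove whitespace and upper-case so 'bs8 2bn' equals 'BS82BN'."""
    return "".join(postcode.split()).upper()


def match_key(rec: dict) -> tuple:
    """Build the comparison key from the (hypothetical) linkage fields."""
    return (
        normalise_name(rec["surname"]),
        normalise_name(rec["forename"]),
        rec["dob"],
        rec["gender"].upper(),
        normalise_postcode(rec["postcode"]),
    )


def link(cohort: list, external: list) -> dict:
    """Return {cohort_id: pupil_id} for unambiguous exact matches only."""
    index = {}
    for rec in external:
        index.setdefault(match_key(rec), []).append(rec["pupil_id"])
    links = {}
    for rec in cohort:
        candidates = index.get(match_key(rec), [])
        if len(candidates) == 1:  # skip unmatched and ambiguous cases
            links[rec["cohort_id"]] = candidates[0]
    return links
```

Cases with no candidate or more than one candidate are left unlinked here. In practice they would go to a second pass using previous postcode, familiar name or current school.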

8
NPD - Details
  • ALSPAC Cohort covers three academic years
  • We hold data on all YPs (young people) across these three years:
    approx. 600,000 cases a year
  • Figures based on eligible cohort
  • 17,671 linked (86%)
  • Majority of unlinked cases thought to be in
    private education (will be in NPD from KS4)

9
NPD - Advantages
  • Covers all English state schools
  • Good match rate for eligible cohort
  • Regular updates
  • Access to confidential variables
  • PLUG workshops provide good opportunities to
    discuss data and solutions to problems

10
NPD - Problems
  • Central ID QC issues (a few duplicates)
  • Only covers English state-maintained schools until
    KS4, when a re-link is needed: extra costs, and bias
    until then
  • Data collection method/standards varies from
    school to school
  • Documentation (lack of)
  • Size of raw data, time consuming to process
  • Fixed time point census, so doesn't record all
    school movements (especially the annual census)

11
G.I.S Data
  • Spatial data held at many geographic levels
  • Geographies range in scale from 0.1 metres to
    regional/national data
  • Tied together via postcode or grid reference as
    central ID
  • Key data include
  • NSPD (formerly the All Fields Postcode Directory): geo
    linking database
  • Deprivation & Socio-Economic indices (IMD,
    Townsend, Acorn)
  • Census data

12
G.I.S - How we link cases to data
  • Master file of Postcodes
  • Postcodes linked to grid reference
  • Grid references of various scales
  • PCs/GridRef mapped to
  • Electoral geographies
  • Census geographies
  • Ethics
  • We don't generally identify residence at PC or
    equivalent level
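
The lookup chain on this slide (postcode, then grid reference, then wider geographies) can be sketched as below. The tables are tiny placeholders with invented values, not real NSPD content, and the names `POSTCODE_TO_GRID`, `POSTCODE_TO_AREAS` and `geographies_for` are assumptions for the example.

```python
# Placeholder lookup tables; every value here is invented for illustration.
POSTCODE_TO_GRID = {"ZZ11ZZ": (358412, 173655)}  # easting, northing in metres
POSTCODE_TO_AREAS = {
    "ZZ11ZZ": {"ward": "WARD-EXAMPLE", "census_area": "CENSUS-EXAMPLE"},
}


def coarsen(easting: int, northing: int, resolution_m: int) -> tuple:
    """Snap a grid reference down to the corner of its resolution_m cell."""
    return (
        easting // resolution_m * resolution_m,
        northing // resolution_m * resolution_m,
    )


def geographies_for(postcode: str, resolution_m: int = 1000):
    """Map a postcode to a coarsened grid cell plus its wider geographies."""
    key = "".join(postcode.split()).upper()
    if key not in POSTCODE_TO_GRID:
        return None  # postcode not in the master file
    easting, northing = POSTCODE_TO_GRID[key]
    result = {"grid_cell": coarsen(easting, northing, resolution_m)}
    result.update(POSTCODE_TO_AREAS.get(key, {}))
    return result


print(geographies_for("zz1 1zz"))
```

Returning only the coarsened cell and the wider areas, never the postcode itself, matches the ethics point above about not identifying residence at postcode level.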

[Figure: Ordnance Survey, The National Grid]
13
G.I.S - Details
  • 50,000 ALSPAC address points, associated with a
    date range which can then be linked to ALSPAC
    data collection
  • Linkage examples
  • Indices of multiple deprivation
  • Travel from home to school patterns
  • Cancer rates and residential distance from
    power lines
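
Linking a dated data collection to the right address point amounts to a date-range lookup. The sketch below assumes each case has a history of `(start, end, address_id)` tuples, with `end=None` for the current address. That layout is an assumption for illustration, not the actual ALSPAC schema.

```python
from datetime import date


def address_at(history, when):
    """Return the address_id whose date range contains `when`, else None.

    history: list of (start_date, end_date_or_None, address_id) tuples.
    A None result corresponds to a gap in the address record.
    """
    for start, end, address_id in history:
        if start <= when and (end is None or when <= end):
            return address_id
    return None


history = [
    (date(1991, 4, 1), date(1995, 8, 31), "ADDR-1"),
    (date(1995, 9, 1), None, "ADDR-2"),
]
print(address_at(history, date(1993, 1, 1)))
```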

[Figure: The geographic relation between household income
and polluting factories (FoE, 1999)]
14
G.I.S - Advantages
  • Many data sets in public domain (or available
    through Athens)
  • Many geographies are broad enough to not identify
    cohort members
  • National picture (some exclude Scotland)

15
G.I.S - Problems
  • Shifting geographies across time points
  • Royal Mail change postcodes
  • Postcode not precise enough in some cases
  • Postcode boundaries are not contiguous with other
    geographic boundaries

16-18
Accuracy issues with analysis at postcode level
[Figures over three slides; panel labels: Address level, Postcode level]

19
Data Availability & Linkage Problems
  • Cohort Data
  • GIS Data
  • GIS Ethics

20
Linkage problems with the cohort data
  • Missing data
  • Especially problematic for the cases who didn't
    enrol in the original recruitment
  • Partners
  • 69 cases with no known birth outcome
  • Gaps in the address data
  • However
  • ONS matched 99.7% of mothers, so we have their old
    & new NHS numbers and cleaned data (original
    recruitment cases only)

21
Linkage problems we encounter
  • Many of the early records are paper based or in
    varied formats.
  • Quality Control: ONS data returned to us with 37
    incorrect ALSPAC IDs
  • Unknown methods: no documentation from ONS or
    Fischer regarding the quality of the match
  • Lack of uniqueness in the ID (either duplicates
    or multiple IDs per case)
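
Both kinds of ID problem named in the last bullet can be checked mechanically before linking. The sketch below assumes the returned data can be reduced to `(linkage_id, case_id)` pairs. The function name and that layout are hypothetical.

```python
from collections import defaultdict


def id_uniqueness_report(rows):
    """Find IDs shared by several cases and cases that carry several IDs.

    rows: iterable of (linkage_id, case_id) pairs.
    Returns (shared_ids, multi_id_cases), both as plain dicts.
    """
    cases_by_id = defaultdict(set)
    ids_by_case = defaultdict(set)
    for linkage_id, case_id in rows:
        cases_by_id[linkage_id].add(case_id)
        ids_by_case[case_id].add(linkage_id)
    shared_ids = {
        i: sorted(cases) for i, cases in cases_by_id.items() if len(cases) > 1
    }
    multi_id_cases = {
        c: sorted(ids) for c, ids in ids_by_case.items() if len(ids) > 1
    }
    return shared_ids, multi_id_cases


rows = [("X1", "c1"), ("X1", "c2"), ("X2", "c3"), ("X3", "c3")]
shared, multi = id_uniqueness_report(rows)
print(shared, multi)
```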

22
GIS Data Availability
  • Collected as administrative resource
  • Not yet cleaned, documented and presented to
    usual ALSPAC standards
  • Initiatives under way to validate and fill gaps
    in record
  • Schools GIS data in the main not processed
  • Aim to build into standard ALSPAC resource

23
GIS Ethics
  • Postcode level or greater accuracy treated as a
    personal identifier
  • Research proposals to use these data need ALSPAC
    Law & Ethics Approval
  • Broader geographical data can be released in
    normal manner
  • A two-stage process is used to collect and
    process precise data

24
GIS Ethics
  • Step 1: Postcodes (or full address) provided to
    researcher with unique collection ID, with no
    other data attached
  • Step 2: Researcher attaches their data and
    returns file to ALSPAC
  • Step 3: ID converted to the appropriate
    collaborator ID, postcode data removed
  • Step 4: Requested ALSPAC data added to the file
    and data sent to the researcher
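
The four steps above can be sketched as a small ID-swapping pipeline. This is a simplified illustration under assumed data layouts, not ALSPAC's actual implementation: the function names, the dict-based files and the use of random hex strings as collection IDs are all assumptions.

```python
import secrets


def step1_issue(postcodes_by_alspac_id):
    """ALSPAC side: issue postcodes under fresh one-off collection IDs.

    Returns (file_for_researcher, key_table); the key table stays with ALSPAC.
    """
    to_researcher, key_table = [], {}
    for alspac_id, postcode in postcodes_by_alspac_id.items():
        collection_id = secrets.token_hex(8)  # random, carries no meaning
        key_table[collection_id] = alspac_id
        to_researcher.append(
            {"collection_id": collection_id, "postcode": postcode}
        )
    return to_researcher, key_table


def step2_attach(file_rows, derive):
    """Researcher side: attach a derived value computed from the postcode."""
    return [{**row, "derived": derive(row["postcode"])} for row in file_rows]


def step3_4_release(returned_rows, key_table, collaborator_ids, alspac_data):
    """ALSPAC side: swap IDs, drop the postcode, add the requested data."""
    released = []
    for row in returned_rows:
        alspac_id = key_table[row["collection_id"]]
        released.append({
            "collaborator_id": collaborator_ids[alspac_id],
            "derived": row["derived"],  # postcode is deliberately not copied
            **alspac_data[alspac_id],
        })
    return released
```

The point of the design is that the researcher never holds a file containing both the postcode and the cohort data. The only bridge between the two is the key table, which does not leave ALSPAC.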

25
Andy Boyd
A.W.Boyd_at_Bristol.ac.uk