Data harmonisation and the value of cross-cohort analysis: an overview of the new CLOSER programme

1
Data harmonisation and the value of cross-cohort
analysis: an overview of the new CLOSER
programme
  • Jane Elliott
  • Director of the Centre for Longitudinal Studies
    and Director of CLOSER
  • J.Elliott@ioe.ac.uk

2
Summary
  • A brief overview of CLOSER
  • Early progress on harmonisation work packages
  • Biological structure and function
  • Socioeconomic status and qualifications
  • Uniform Search Platform
  • Contextual database
  • Benefits of cross cohort analysis

3
Cohorts and Longitudinal Studies Enhancement
Resources (CLOSER)
  • Nine Longitudinal Studies
      • Hertfordshire Cohort Study
      • 1946 British Birth Cohort
      • 1958 British Birth Cohort
      • 1970 British Birth Cohort
      • ALSPAC (Avon Longitudinal Study of Parents and
        Children)
      • Millennium Cohort Study
      • Southampton Women's Study
      • Life Study
      • Understanding Society
  • Funded by ESRC and MRC

4
Objectives and timetable
  • Maximise the use, value and impact of data
    collected through a portfolio of key UK
    longitudinal studies
  • Stimulate interdisciplinary research across major
    longitudinal studies
  • Provide common resources for research
  • Assist with training and development
  • Share information and expertise between study
    teams
  • 1st October 2012 to 30th September 2017

5
Work streams
  • 4 work packages on data harmonisation
  • 3 work packages on data linkage
  • Core work on
      • Impact: led by the British Library
      • Training and Capacity Building
      • Uniform Search Platform
  • Leadership team contributing to strategic
    planning, sharing of best practice, funders'
    strategies
  • See our website www.CLOSER.ac.uk for further
    information
  • Twitter: @CLOSER_UK

6
Leadership team
7
Vision for the USP
  • Portal to discovery of hundreds of thousands of
    variables, questions and data collection
    instruments across the nine longitudinal studies
  • covering survey and biomedical data collection
  • promoting CLOSER harmonisation work
  • state-of-the-art searching tool
  • focus on improving visibility of associations
    between (currently) disparate metadata items
  • shared subject/topic classification
  • We should remember that this is massively
    ambitious: something that matches or surpasses
    the best multi-study metadata repository out
    there
  • RAND Survey Meta Data Repository covering the HRS
    family of studies:
    https://mmicdata.rand.org/megametadata/

8
Why do it?
  • Benefits to users
      • single resource discovery portal replacing a
        fractured resource discovery landscape
      • lowers barriers to conducting cross-cohort
        analysis
      • increased visibility of cohort data and resources
  • Benefits to data managers
      • standardised management workflows for metadata
        that is currently curated in isolation
      • workflows in place for future joiners
  • Benefits to Principal Investigators/survey
    commissioners
      • makes prospective harmonisation easier
      • promotion and re-use of tested questions and
        instruments

9
Assumptions, constraints
  • Not a data repository
  • Not a major software development project
      • the major effort is in metadata
        creation/enhancement
  • DDI-L agreed as standard for metadata exchange
      • covers subject areas (bio and soc science) and
        data collection methods (hard instrument and
        survey)
      • designed for marking up longitudinal/repeated
        metadata items
  • Colectica Designer selected as preferred metadata
    ingest/editing software

10
Challenges
  • Legacy metadata
      • elderly and decrepit!
      • not always designed for equivalence within a
        study, much less across studies
      • differing or non-existent naming conventions
      • substantial (manual) effort required to establish
        equivalences and level of equivalence
  • Metadata managed by five or six different units:
    different formats, workflows, vocabularies
  • Relative lack of familiarity with DDI-L
      • uneven knowledge across study units

11
Metadata: state of play
  • >200k variables
  • c.150 data collections
      • CAI, PAPI, nurse visit, clinic-based protocol,
        biosamples, etc.
  • c.85 validated survey instruments
      • GHQ, AUDIT, Malaise Inventory, etc.
      • c.10 instruments used in >1 study
  • c.20 validated clinical measures
      • blood pressure, bone density, lung function, etc.
      • range of instruments used
  • c.15 cognitive or physical tests

12
How to do it?
  • USP will be a web interface that sits on top of a
    central repository fed by metadata created and
    delivered both by the individual study units and
    the CLOSER core
  • Study units continue to curate metadata as they
    see fit but not in conflict with proposed USP
    metadata profile
  • Substantial metadata creation and enhancement to
    be undertaken by the study units: inputting
    historical questionnaires and mapping between
    data items and data collection
  • CLOSER core responsible for identifying common
    (cross-study) variable and question schemes,
    allowing studies to reference these and also any
    agreed controlled vocabularies (concept, life
    stage etc.)
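
The arrangement above, in which study units deliver metadata that references shared concepts and a central index is searched across studies, can be sketched roughly as follows. This is a minimal illustration only, not the USP design: the record fields, study names, variable names and concept terms are all hypothetical, and the real platform exchanges DDI-L metadata rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableRecord:
    """One variable as described by a study unit (all fields hypothetical)."""
    study: str    # study that holds the variable
    name: str     # variable name used by that study
    label: str    # human-readable label
    concept: str  # term from a shared controlled vocabulary


def build_index(records):
    """Group variable records by shared concept, across all studies."""
    index = {}
    for rec in records:
        index.setdefault(rec.concept, []).append(rec)
    return index


def search(index, concept):
    """Return sorted (study, variable name) pairs tagged with the concept."""
    return sorted((rec.study, rec.name) for rec in index.get(concept, []))


# Invented example records standing in for metadata from two study units.
records = [
    VariableRecord("StudyA", "bmi_age11", "Body mass index at 11", "body size"),
    VariableRecord("StudyB", "wt_kg_10", "Weight (kg) at 10", "body size"),
    VariableRecord("StudyA", "hiqual_33", "Highest qualification at 33", "education"),
]

index = build_index(records)
matches = search(index, "body size")
print(matches)
```

Because each study only has to tag its own variables with an agreed concept, the studies can keep their local naming conventions while the central index still finds equivalent items across them.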

13
Contextual database - rationale
  • Life course approach stresses the importance of
    the connection between individuals and the
    historical and socioeconomic context in which
    these individuals lived
  • But some research based on cohort studies pays
    little attention to the social, economic or
    historical context that helps shape the lives of
    individuals
  • Some data on social change and social context
    will come from the studies themselves (e.g.
    breastfeeding)
  • Aim of the contextual database is to provide a
    central source of key indicators over time likely
    to be of direct relevance to cohort research

14
Source: Changing Britain, Changing Lives: Three
generations at the turn of the century, Table 8.3
(Wadsworth et al)
15
Proportion of women in paid employment, by age
and cohort
Source: Jenny Neuburger, paper presented at CLS,
June 2008
16
Contextual database - elements
  • Economic indicators
  • Qualifications and education
  • Demography
  • Health and health behaviour
  • Inequality and poverty
  • Labour market and unemployment
  • Housing
  • Digital economy
  • Also want to include policy narratives and a
    bibliography
17
Work package 1: Biological structure and function
Two years: March 2013 to February 2015
William Johnson, Rebecca Hardy
MRC Unit for Lifelong Health and Ageing
18
Research priority: body size, because of the
obesity epidemic and the long-term consequences
of adiposity on health and well-being
Need for harmonisation
Body size data from a single study | Harmonised body size data across multiple studies
Restricted N and power | Larger N and greater power
Results may not be generalizable | Replication of results and quantification of heterogeneity
Modelling capability dependent on study data | Modelling capability increased by pooling data
Age and period effects confounded | Decompose age and period effects
No cohort effects (secular trend) | Investigate cohort effects
19
First papers
  • Compare body size distributions and mean
    trajectories, across different phases of the
    life course, between cohorts
  • Investigate how SEP inequalities in body size
    trajectories, across different phases of the
    life course, differ between cohorts
Li L et al. Am J Epidemiol. 2008
Howe LD et al. JECH. 2012
20
Studies
21
Data
22
Challenges
  • Between studies
      • Data covering different age ranges
      • Data increasingly positively skewed in more
        recent studies
  • Within individuals
      • Different number of observations at different
        exact ages
      • Different precision of data
  • Within and between individuals
      • Both measured and self-report data
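
One small part of these challenges, bringing height and weight recorded in different units onto a common scale while keeping track of whether a value was measured or self-reported, might look like the sketch below. It is an illustration under assumptions, not the work package's method: the record layout, field names and unit codes are hypothetical.

```python
LB_TO_KG = 0.45359237   # exact definition of the pound in kilograms
INCH_TO_M = 0.0254      # exact definition of the inch in metres


def harmonise_body_size(record):
    """Convert one raw record to kg, metres and BMI (hypothetical layout)."""
    weight = record["weight"]
    if record["weight_unit"] == "lb":
        weight *= LB_TO_KG
    elif record["weight_unit"] != "kg":
        raise ValueError(f"unknown weight unit: {record['weight_unit']}")

    height = record["height"]
    if record["height_unit"] == "in":
        height *= INCH_TO_M
    elif record["height_unit"] == "cm":
        height /= 100.0
    elif record["height_unit"] != "m":
        raise ValueError(f"unknown height unit: {record['height_unit']}")

    return {
        "study": record["study"],
        "age": record["age"],
        "weight_kg": weight,
        "height_m": height,
        "bmi": weight / height ** 2,
        # Keep the mode of collection so analysts can model or adjust for it.
        "self_report": record["self_report"],
    }


# Invented records: one in imperial units (self-report), one metric (measured).
older = harmonise_body_size({
    "study": "StudyA", "age": 36, "weight": 154, "weight_unit": "lb",
    "height": 70, "height_unit": "in", "self_report": True,
})
newer = harmonise_body_size({
    "study": "StudyB", "age": 36, "weight": 70, "weight_unit": "kg",
    "height": 175, "height_unit": "cm", "self_report": False,
})
print(round(older["bmi"], 1), round(newer["bmi"], 1))
```

Carrying the `self_report` flag through, rather than discarding it, leaves the choice of how to handle reporting bias to the analyst.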
24
What we are aiming to achieve
1) Demonstration research project focussing on
   socioeconomic differences in growth and obesity
   across cohorts
2) A harmonised dataset, with accompanying
   documentation for other users
25
Socio-economic data harmonisation work package
  • Claire Crawford, Brian Dodgeon, Tim Morris, Sam
    Parsons, Anna Vignoles (lead)
  • Two years: April 2013 to March 2015

26
What measures?
  • Measures to be harmonised are
  • parental education level
  • cohort member level of education
  • socio-economic (occupation) status
  • household equivalised income
  • home ownership
  • Cohorts: NSHD, NCDS, BCS70, ALSPAC, MCS
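
As an illustration of what "household equivalised income" involves, the sketch below adjusts household income for household size and composition. The slides do not say which equivalence scale the work package uses; the modified OECD scale (1.0 for the first person aged 14 or over, 0.5 for each further person aged 14 or over, 0.3 for each child under 14) is assumed here purely as an example, and the function names are hypothetical.

```python
def modified_oecd_factor(n_aged_14_plus, n_children_under_14):
    """Equivalence factor under the modified OECD scale (assumed scale)."""
    if n_aged_14_plus < 1:
        raise ValueError("a household needs at least one person aged 14+")
    return 1.0 + 0.5 * (n_aged_14_plus - 1) + 0.3 * n_children_under_14


def equivalise(income, n_aged_14_plus, n_children_under_14):
    """Divide household income by the equivalence factor."""
    return income / modified_oecd_factor(n_aged_14_plus, n_children_under_14)


# The same income of 42,000 implies different living standards by household type.
single_adult = equivalise(42000, 1, 0)
couple_two_children = equivalise(42000, 2, 2)
print(single_adult, round(couple_two_children, 2))
```

Applying one scale consistently across cohorts is what makes the resulting income measure comparable, whichever scale is eventually chosen.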

27
Priority Measures agreed
  • Highest qualification (vocational/academic
    separately) held at every age
  • Age left full-time education
  • Whether the person went past compulsory schooling
  • Average GCSE score or equivalent
  • GCSE grades in mathematics and English (not for
    all cohorts)
  • For cohort members' parents: age left full-time
    education and highest qualification at birth of
    the cohort member (CM)
  • Grandparents: age left school
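
The "went past compulsory schooling" measure has to allow for the minimum leaving age changing between cohorts, so the same age left education can mean different things. A minimal sketch, assuming a minimum leaving age of 15 for the 1946 cohort and 16 for the 1958 and 1970 cohorts (England and Wales); the lookup table and function name are illustrative, and the studies' own derivations may differ.

```python
# Assumed minimum school leaving ages by cohort (illustrative, not from the slides).
MIN_LEAVING_AGE = {"NSHD": 15, "NCDS": 16, "BCS70": 16}


def stayed_past_compulsory(cohort, age_left_education):
    """True if the person stayed on beyond the minimum leaving age.

    Returns None when the age left education is missing, so that missing
    data is not silently coded as "left at the minimum age".
    """
    if age_left_education is None:
        return None
    return age_left_education > MIN_LEAVING_AGE[cohort]


# Leaving at 16 counts as post-compulsory in the 1946 cohort but not the 1958 one.
print(stayed_past_compulsory("NSHD", 16))
print(stayed_past_compulsory("NCDS", 16))
```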

28
Measures available by cohort
Cohorts: NSHD, NCDS, BCS70, ALSPAC, MCS
Cohort member
  Highest qualification (each age) ? ? ? ?
  Age left full-time education ? ? ? ?
  Post-compulsory education ? ? ? ?
  Maths grade (O level, CSE, GCSE) ? ? ? ?
  English grade (O level, CSE, GCSE) ? ? ? ?
  Exam total score (O level, CSE, GCSE) ? ? ? ?
Parent
  Age left full-time education ? ? ? ? ?
  Highest qualification (birth or nearest data collection point) ? ? ? ?
Grandparent
  Age left full-time education ?
29
The value of cross-cohort analysis
  1. A meta-narrative of societal change over time
  2. Creating a synthetic life course: understanding
    lifetime trajectories
  3. Investigate cohort effects: examining the impact
    of different social and policy contexts
  4. Replication of results: checking the robustness
    of models
  5. Larger N and greater power
  6. Decompose age and period effects
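
The last point can be made concrete with a small sketch. Within a single birth cohort, the calendar year of observation is fixed by age (period = birth year + age), so age and period effects cannot be told apart; pooling cohorts means the same age is observed in several periods. The birth years below are those of the 1946, 1958 and 1970 cohorts, while the ages and function names are illustrative assumptions.

```python
def observations(birth_years, ages):
    """List (birth year, age, period) triples, where period = birth year + age."""
    return [(b, a, b + a) for b in birth_years for a in ages]


def period_determined_by_age(obs):
    """True if every age is seen in exactly one period (age and period confounded)."""
    periods_by_age = {}
    for _, age, period in obs:
        periods_by_age.setdefault(age, set()).add(period)
    return all(len(periods) == 1 for periods in periods_by_age.values())


ages = [23, 33, 42]  # illustrative ages at data collection
single_cohort = observations([1958], ages)
pooled_cohorts = observations([1946, 1958, 1970], ages)

print(period_determined_by_age(single_cohort))
print(period_determined_by_age(pooled_cohorts))
```

Pooling breaks the one-to-one link between age and period, but age, period and cohort remain linearly dependent, so a full three-way decomposition still needs further modelling assumptions.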

30
Lifetime systolic blood pressure trajectories and
velocities (predicted means)
Panels: Men, Women
Source: Wills et al. PLOS Med, 2011