Big Data in UK Biobank: Opportunities and Challenges

Slides: 43
Provided by: CTSU1
Transcript and Presenter's Notes

1
Big Data in UK Biobank: Opportunities and Challenges
Funders: Wellcome Trust and Medical Research Council,
with Department of Health, Scottish & Welsh Governments,
British Heart Foundation and Diabetes UK
  • Rory Collins
  • UK Biobank Principal Investigator
  • BHF Professor of Medicine & Epidemiology
  • Nuffield Department of Population Health
  • University of Oxford, UK

2
UK Biobank Prospective Cohort
  • 500,000 UK men and women aged 40-69 years when
    recruited and assessed during 2006-2010
  • Extensive baseline questions and measurements,
    with stored biological samples (and opportunities
    to add enhanced assessments in large subsets)
  • Repeat assessments over time in subsets of the
    participants to allow for sources of variation
  • General consent for follow-up through all health
    records and for all types of health research
  • Sufficiently large numbers of people developing
    different conditions to assess causes reliably

3
Need for prospective studies to be LARGE: CHD
versus SBP for 5K vs 50K vs 500K people in the
Prospective Studies Collaboration (PSC)
[Figure: three panels (5000, 50,000 and 500,000 people) showing CHD on a doubling scale (1 to 256) against usual SBP (120 to 180 mmHg), with one line per age at risk (40-49, 50-59, 60-69, 70-79, 80-89)]
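A rough way to see why size matters: the standard error of a log rate ratio is approximately sqrt(1/a + 1/b), where a and b are the numbers of events in the two groups being compared, so a 100-fold larger study gives roughly 10-fold tighter estimates. The sketch below is illustrative only. The 2% event proportion and the even split between two SBP groups are assumptions made for the example, not PSC results.

```python
import math


def log_rr_standard_error(events_a: float, events_b: float) -> float:
    """Approximate standard error of a log rate ratio from two event counts."""
    return math.sqrt(1.0 / events_a + 1.0 / events_b)


def ci_limit_ratio(n_people: int, event_fraction: float = 0.02) -> float:
    """Ratio of upper to lower 95% confidence limit for a rate ratio.

    Assumes (hypothetically) that event_fraction of people have an event
    and that events split evenly between the two groups compared.
    """
    events_per_group = n_people * event_fraction / 2
    se = log_rr_standard_error(events_per_group, events_per_group)
    return math.exp(2 * 1.96 * se)


for n in (5_000, 50_000, 500_000):
    print(f"{n:>7} people: upper/lower CI limit ratio = {ci_limit_ratio(n):.2f}")
```

The closer the printed ratio is to 1, the more precisely the association is estimated.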
4
Locations of UK Biobank assessment centres around
the UK (with people recruited from urban and
rural areas)
5
UK Biobank: 500,000 participants aged 40-69
recruited in 2007-10
Characteristic | Group | Participants
Age | 40-49 | 119,000
Age | 50-59 | 168,000
Age | 60-69 | 213,000
Gender | Male | 228,000
Gender | Female | 270,000
Deprivation | More | 92,000
Deprivation | Average | 166,000
Deprivation | Less | 241,000
Generalisability (not representativeness)
Heterogeneity of study population allows
associations with disease to be studied reliably
6
Production line baseline assessment
visit (improved throughput & efficient staffing)
7
Baseline assessment: Questionnaire content
  • Self-completion topics: median time (minutes)
  • Socio-demographics: 1.7
  • Ethnicity: 0.1
  • Work-employment: 1.4
  • Physical activity: 4.4
  • Smoking (non-smokers): 0.5
  • Smoking (past/current smokers): 1.5
  • Diet (food frequency): 4.5
  • Alcohol: 1.1
  • Sleep: 1.2
  • Sun exposure: 1.3
  • Environmental exposures: 1.0
  • Early life factors: 0.8
  • Family history of common diseases: 1.6
  • Reproductive history & screening (women): 2.4
  • Reproductive history & screening (men): 0.8
  • Sexual history: 0.4
  • General health: 2.1

  • Interview topics: median time (minutes)
  • Medical history/medication: 3.1
  • Occupation: 0.4
  • Other: 0.6
  • Total time: 4.1

Subset of 200,000 participants: repeated daily
diet diaries conducted via the internet
Touchscreen and interview questions (plus extra
enhancement questions) available at
www.ukbiobank.ac.uk
8
Baseline assessment: Physical measurements (with
enhanced measures in large subsets)
  • All 500,000 participants
      • Blood pressure & heart rate
      • Height (standing/seated)
      • Waist/hip circumference
      • Weight/impedance
      • Spirometry
      • Heel ultrasound
  • Subset: 175,000 participants
      • Hearing test
      • Vascular reactivity
  • Subset: 120,000 participants
      • Visual acuity, refractive index & intraocular
        pressure
  • Subset: 85,000 participants
      • Retinal images & optical coherence tomograms
      • Fitness test (ECG limb leads)

9
UK Biobank: different types of biological
sample allowing a wide range of different assays
Sample collection tube | Fractions collected | Potential assays
Na EDTA | Plasma; Buffy coat; Red cells | Plasma proteome and metabonome; Assays of genomic DNA; Membrane lipids and heavy metals
Lithium Heparin (PST) | Plasma | Plasma proteome and metabonome (without haemolysis)
Silica clot accelerator (SST) | Serum | Serum proteome and metabonome (without haemolysis)
Acid citrate dextrose | Whole blood | Assays of DNA extracted from EBV immortalised cell lines (B-cell transcriptome)
EDTA | Whole blood | Standard haematological parameters
Tempus RNA stabilisation | Whole blood with lysis reagent | Blood transcriptome; Representative transcriptomes of other tissues
Urine | Urine | Urine proteome and metabonome; Gut microbiome
Saliva | Mixed saliva sample | Salivary proteome and metabonome; Salivary microbiome (Mucosal proteome and metabonome)
10
Further enhancements of the phenotyping of UK
Biobank participants currently being conducted
  • Web-based assessments of diet completed

11
Web-based dietary assessment: 24-hr recall
  • Design considerations
      • Easy and quick: takes only 10-15 minutes
      • Automated data collection and coding
      • Repeatable (capturing seasonal variation)
      • Detailed enough to estimate nutrient intake
  • Over 200,000 participants completed the
    questionnaire at least once, and about 90,000 did
    so more than once

12
Future web-based assessments for exposures
  • Cognitive function
      • Repeat assessment of baseline measures
      • Broaden cognitive phenotyping with new measures
      • Complements enhanced cognitive function
        assessment that is planned for the imaging
        assessment visit
  • Occupational history
      • Information about all previous occupations (not
        just latest)
      • Greater detail on type of work and duration
  • Physical activity questionnaire (RPAQ)
      • Complement data from activity monitor

13
Further enhancements of the phenotyping of UK
Biobank participants currently being conducted
  • Web-based assessments of diet completed and next
    to be cognition/mental health (2014)
  • Wrist-worn accelerometers to be mailed to all
    participants who agree to wear one (2013-15)

14
UK Biobank wrist-worn accelerometer
  • 45% of participants agree to wear one
  • Willing participants sent device by mail
  • It is to be worn continuously for 7 days
  • Returned by mail and data downloaded
  • Device cleaned and sent to next participant
  • 100K participants from mid-2013 to mid-2015
    (50,000 complete data-sets already obtained)

15
Further enhancements of the phenotyping of UK
Biobank participants currently being conducted
  • Web-based assessments of diet completed and next
    to be cognition/mental health (2014)
  • Wrist-worn accelerometers to be mailed to all
    participants who agree to wear one (2013-15)
  • Biobank chip to genotype (GWAS, candidate SNPs,
    exome) all participants (2013-15)

16
Genotyping of all UK Biobank participants
  • 820K bespoke UK Biobank Affymetrix genotyping
    chip
      • 250,000 SNPs in a whole-genome array
      • 200,000 markers for known risk factor or disease
        associations, copy number variation, loss of
        function, and insertions/deletions
      • 150,000 exome markers for high proportion of
        non-synonymous coding variants with allele
        frequency over 0.02
  • Estimate (impute) additional genotypes by
    combining measured genotypes with reference
    sequence data
  • Researchers can study associations of genotype
    data with biochemical risk factors and detailed
    phenotyping from baseline assessment, along with
    health outcomes
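
The imputation step above can be illustrated with a toy sketch: sites not measured on the chip are filled in from the reference haplotype that best matches the measured sites. This is a deliberately simplified nearest-match rule chosen for illustration, not the method used for UK Biobank; practical imputation tools use probabilistic haplotype models over large phased reference panels. The function name and the example data are hypothetical.

```python
from typing import List, Optional, Sequence

# Alleles are coded 0/1; None marks a site that was not measured on the chip.
Haplotype = Sequence[Optional[int]]


def impute_from_reference(sample: Haplotype,
                          reference_panel: List[Sequence[int]]) -> List[int]:
    """Fill unmeasured sites by copying the best-matching reference haplotype."""

    def mismatches(ref: Sequence[int]) -> int:
        # Count disagreements at measured sites only.
        return sum(1 for s, r in zip(sample, ref) if s is not None and s != r)

    best = min(reference_panel, key=mismatches)
    # Keep measured alleles; copy reference alleles where nothing was measured.
    return [r if s is None else s for s, r in zip(sample, best)]


# Hypothetical reference sequence data (three haplotypes, five sites).
reference_panel = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1],
]
measured = [1, None, 0, None, 1]  # sites 2 and 4 are not on the chip
print(impute_from_reference(measured, reference_panel))
```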

17
Further enhancements of the phenotyping of UK
Biobank participants currently being conducted
  • Web-based assessments of diet completed and next
    to be cognition/mental health (2014)
  • Wrist-worn accelerometers to be mailed to all
    participants who agree to wear one (2013-15)
  • Biobank chip to genotype (GWAS, candidate SNPs,
    exome) all participants (2013-15)
  • Standard panel of assays (e.g. lipids & clotting)
    on samples from all participants (2014-15)

18
Rationale for assaying many standard markers in
baseline samples from all 500,000 participants  
  • Cost-effective way of increasing the usability of
    the resource for researchers, by providing data
    for:
      • Cross-sectional analyses with prevalent disease
      • Identification of subsets based on assay values
  • Conducting these assays in all of the
    participants at the same time should facilitate
    good quality control
  • Lower cost for conducting all of these assays at
    one time rather than in multiple retrievals and
    assays
  • Facilitates management of depletable samples

19
Consideration of a proposal to conduct assays of
biomarkers of infectious disease in all
participants
  • Request from the international research community
    to facilitate studies of the associations of
    infectious agents with disease (in particular,
    different types of cancer)
  • Plan would be to assay a panel of infectious
    agents (e.g. HPV, Hepatitis B & C, HBV, EBV, H.
    pylori) in the baseline sample collected from all
    500,000 participants
  • As with the biochemical and genetic assays that
    are being conducted, assays of a wide range of
    infectious agents would increase the efficient
    use of the resource
  • Detailed proposal for funding is now being
    developed

20
Further enhancements of the phenotyping of UK
Biobank participants currently being conducted
  • Web-based assessments of diet completed and next
    to be cognition/mental health (2014)
  • Wrist-worn accelerometers to be mailed to all
    participants who agree to wear one (2013-15)
  • Biobank chip to genotype (GWAS, candidate SNPs,
    exome) all participants (2013-15)
  • Standard panel of assays (e.g. lipids & clotting)
    on samples from all participants (2014-15)
  • Information from multiple imaging modalities
    (e.g. brain/heart/body MRI & bone/joint DEXA)

21
Imaging of 100,000 UK Biobank participants
  • MRI of brain, heart and abdomen
  • DEXA of bones, joints and body
  • Ultrasound of carotid arteries
  • Shortened baseline assessment plus more detailed
    cognitive function tests and ECG to detect rhythm
    disturbances

Pilot phase: 4-6,000 people in 1 centre (2014-15)
Main phase: 95,000 people in 3 centres (2015-19)
Opportunities for repeat imaging in sub-sets
(e.g. as part of MRC's focus on dementia)
22
Body Mass Index (BMI) vs Heart Disease and Stroke
(PSC: 1M people followed for 12 years; Lancet 2009)
[Figure: annual deaths per 1000 (floated so mean = PSC rates at age 65-69), on a doubling scale from 10 to 160, against baseline BMI (15 to 50 kg/m2), for heart disease (18 237 deaths) and stroke (6122 deaths)]
Adjusted for age, sex, smoking & study; first 5
years of follow-up excluded
23
Similar age, gender, BMI & body fat, but
different amounts of INTERNAL FAT
5.86 litres of internal fat
1.65 litres of internal fat
24
Atrial fibrillation (AF) prevalence and
mortality during the period between 1993 and 2007
Prevalence: increasing
Mortality: little change
Piccini et al. Circulation: Cardiovascular
Quality and Outcomes. 2012
25
Consideration of prolonged cardiac monitoring
  • Cardiac arrhythmias (especially AF)
      • can indicate significant underlying cardiac
        disease
      • can directly cause significant morbidity and
        mortality
      • are important risk factors for cardio-embolic
        events (esp. stroke)
  • Detection requires prolonged monitoring
      • many are intermittent (e.g. paroxysmal AF)
      • substantial under-detection with standard 12-lead
        ECG
  • AF increases with age (<50 years: <1%; >80 years:
    10%)
  • No large-scale population-based prospective
    studies with prolonged monitoring, so the full
    extent/impact of AF on health outcomes is likely
    to have been underestimated

26
Example of device for prolonged arrhythmia
detection
  • iRhythm Zio Patch
      • Has been used in 18,000 people
      • Non-invasive stick-on patch
      • Comfortable (median wear 12 days)
      • Can be applied in clinic or at home
      • Beat-to-beat ECG recording
      • Validated against reference Holter
      • Potentially recyclable device chip which stores
        data for downloading
  • Planning to pilot feasibility and acceptability
    during imaging pilot

27
UK Biobank Centralised follow-up of health
  • Death and cancer registries
  • In-patient and out-patient hospital episodes
    (including psychiatric) and related procedure
    registries
  • Primary care records of health conditions,
    prescriptions, diagnostic tests and other
    investigations
  • Other health-related: disease registries;
    dispensing records; imaging; screening; dental
    records
  • Direct to participants: self-reported medical
    conditions; treatments actually being taken;
    degree of functional impairment; cognitive and
    psychological scores

28
Health outcome data-linkage challenges
  • Regulation, bureaucracy, and permissions (despite
    explicit consent from participants)
  • Data transfer, matching and coding queries
  • Understanding different data structures
  • Mapping between coding systems
  • Mapping between different countries
  • Presenting outcome data to researchers
  • Original outcome codes
  • Post-adjudication outcomes
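
The "mapping between coding systems" challenge listed above can be illustrated with a minimal sketch in which codes from different systems are mapped to one study-level outcome label while the original code is kept for presentation to researchers. ICD-9 410 and ICD-10 I21 (acute myocardial infarction) and ICD-10 I63 (cerebral infarction) are real codes; the outcome labels, function names and table layout are assumptions made for the example and do not describe UK Biobank's actual mapping.

```python
from typing import Dict, Optional, Tuple

# (coding system, code prefix) -> study-level outcome label.
# The labels on the right are hypothetical names chosen for this sketch.
CODE_MAP: Dict[Tuple[str, str], str] = {
    ("ICD9", "410"): "myocardial_infarction",
    ("ICD10", "I21"): "myocardial_infarction",
    ("ICD10", "I63"): "ischaemic_stroke",
}


def map_outcome(system: str, code: str) -> Optional[str]:
    """Return the common outcome label for a source code, or None if unmapped."""
    normalised = code.replace(".", "").upper()
    for (map_system, prefix), outcome in CODE_MAP.items():
        if system == map_system and normalised.startswith(prefix):
            return outcome
    return None


def present(system: str, code: str) -> dict:
    """Keep the original code alongside the harmonised outcome."""
    return {"original": (system, code), "outcome": map_outcome(system, code)}


print(present("ICD10", "I21.9"))
print(present("ICD9", "410.1"))
```

Returning None for unmapped codes, rather than guessing, keeps unrecognised records visible for a later coding query.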

29
Progress with UK-wide linkage to outcome data
(both before and after baseline assessment)
30
Meaning of coded data from health records
  • What do the coded data actually tell us?
  • Characteristics of coded data
  • How accurate?
  • How detailed?
  • How complete?
  • Do we need to go beyond the coded data?

31
UK Biobank: Expected numbers of participants
developing diseases during long-term follow-up
Condition | 2012 | 2017 | 2022
Diabetes | 10,000 | 25,000 | 40,000
MI/CHD death | 7,000 | 17,000 | 28,000
Stroke | 2,000 | 5,000 | 9,000
COPD | 3,000 | 8,000 | 14,000
Breast cancer | 2,500 | 6,000 | 10,000
Colorectal cancer | 1,500 | 3,500 | 7,000
Prostate cancer | 1,500 | 3,500 | 7,000
Lung cancer | 800 | 2,000 | 4,000
Hip fracture | 800 | 2,500 | 6,000
Rh. arthritis | 800 | 2,000 | 3,000
Alzheimer's | 800 | 3,000 | 9,000
32
General strategy for outcome adjudication
  • Avoid false positive cases (but tolerate some
    false negatives)
  • Geographical generalisability
  • Cost-effectiveness
  • Future-proofed
  • Scalability
  • Staged approach
  • Ascertain
  • Confirm
  • Classify

33
Staged approach to outcome adjudication
APPROACH | CHARACTERISTICS | POSSIBLE DATA SOURCES
ASCERTAINMENT of suspected cases | Cost-effective; Feasible; Scalable | Death registers; Cancer registers; Hospital episodes; Primary care records; Web-based questionnaires


34
Staged approach to outcome adjudication
APPROACH | CHARACTERISTICS | POSSIBLE DATA SOURCES
ASCERTAINMENT of suspected cases | Cost-effective; Feasible; Scalable | Death registers; Cancer registers; Hospital episodes; Primary care records; Web-based questionnaires
CONFIRMATION of case-ness | As above, but greater cost/lower feasibility | Cross-referencing e-records; Disease registers

35
Staged approach to outcome adjudication
APPROACH | CHARACTERISTICS | POSSIBLE DATA SOURCES
ASCERTAINMENT of suspected cases | Cost-effective; Feasible; Scalable | Death registers; Cancer registers; Hospital episodes; Primary care records; Web-based questionnaires
CONFIRMATION of case-ness | As above, but greater cost/lower feasibility | Cross-referencing e-records; Disease registers
CLASSIFICATION of disease cases | More involved and costly per case | Review of clinical records; Tumour collections/assays; Specialised databases (e.g. imaging)
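
The three stages can be sketched as a simple pipeline in which each stage is attempted only for cases that passed the previous, cheaper one. Everything named here is hypothetical: the record fields, the source names, and in particular the rule that confirmation needs at least two independent sources, which is just one possible way to favour avoiding false positives over avoiding false negatives. The Expert Working Group protocols are not described in these slides.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SuspectedCase:
    participant_id: str
    condition: str
    sources: Set[str] = field(default_factory=set)  # where the case was flagged
    subtype: Optional[str] = None                   # set at classification


def ascertain(case: SuspectedCase) -> bool:
    """Stage 1: any routine data source flags the condition."""
    return len(case.sources) >= 1


def confirm(case: SuspectedCase, min_sources: int = 2) -> bool:
    """Stage 2 (assumed rule): require agreement across independent sources."""
    return len(case.sources) >= min_sources


def classify(case: SuspectedCase, reviewed_subtype: Optional[str]) -> bool:
    """Stage 3: record a subtype obtained from costlier per-case review."""
    if reviewed_subtype is None:
        return False
    case.subtype = reviewed_subtype
    return True


def adjudicate(case: SuspectedCase, reviewed_subtype: Optional[str] = None) -> str:
    """Run the stages in order and report how far the case progressed."""
    if not ascertain(case):
        return "not ascertained"
    if not confirm(case):
        return "suspected"
    if not classify(case, reviewed_subtype):
        return "confirmed"
    return "classified"


case = SuspectedCase("P1", "stroke", {"hospital_episodes"})
print(adjudicate(case))
case.sources.add("death_register")
print(adjudicate(case, reviewed_subtype="ischaemic"))
```

Ordering the stages from cheapest to most costly means the expensive record review is only spent on cases that are already confirmed.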
36
Expert Working Groups developing protocols for
ascertainment, confirmation and classification
37
UK Biobank Principles of Access
  • UK Biobank is available to all bona fide
    researchers for all types of health-related
    research that is in the public interest
  • No preferential or exclusive access (and, in
    particular, access does not involve
    collaboration with UK Biobank)
  • Researchers have to pay for access to the
    Resource for their proposed research on a
    cost-recovery basis only
  • Access to the biological samples that are limited
    and depletable will be carefully controlled and
    coordinated
  • Researchers are required to publish their
    findings and return the data so that other
    researchers can use them

38
Showcase e-catalogue of data items currently
in the UK Biobank Resource (www.ukbiobank.ac.uk)
39
Showcase supports search strategies for data
items in the UK Biobank Resource
40
Body Composition Body Fat
41
Preliminary applications subdivided by type of
researcher, location and type of research
42
What makes UK Biobank special?
  • PROSPECTIVE: It can assess the full effects of a
    particular exposure (such as smoking) on all
    types of health outcome (such as cancer, vascular
    disease, lung disease, dementia)
  • DETAILED: The wide range of questions, measures
    and samples at baseline allows good assessment of
    exposures, and outcome adjudication allows good
    disease classification
  • BIG: Inclusion of large number of participants
    allows reliable assessment of the causes of a
    wide range of diseases, and of the combined
    impact of many different exposures

Unique combination of BREADTH and DEPTH