About This Presentation
Title:

Genome-wide association studies

Description:

Genome-wide association studies. Misha Kapushesky. Slides: Johan Rung, EBI. St. Petersburg, Russia, 2010.

Slides: 76
Provided by: logicPdm
Learn more at: http://logic.pdmi.ras.ru

1
Genome-wide association studies
  • Misha Kapushesky
  • Slides: Johan Rung, EBI
  • St. Petersburg, Russia, 2010

2
Overview
  • Methods for genome-wide association studies
  • Montreal GWAS for Type 2 Diabetes
  • GWAS results - context and caveats

3
Study coverage
  • Associating phenotype/disease state to genetic
    variation
  • Cost per genotype has decreased
  • Instead of a candidate gene approach, just scan
    the entire genome
  • SNP microarrays covering up to 5M SNPs on one
    chip
  • Increased sample sizes

4
Recombination
5
Linkage disequilibrium
Two markers on the genome are inherited together
more often than would be expected by chance. This
leads to high correlation between nearby markers
in the same haplotype block.
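
As a rough illustration of how this correlation is usually quantified, the sketch below computes the disequilibrium coefficient D and the squared correlation r² between two biallelic markers from haplotype counts. The function name and the example counts are invented for illustration and are not taken from the slides.

```python
def ld_from_haplotypes(n_ab, n_aB, n_Ab, n_AB):
    """Return (D, r2) for two biallelic markers from the four haplotype counts.

    First letter: allele at marker 1 (A or a).
    Second letter: allele at marker 2 (B or b).
    Assumes both markers are polymorphic (otherwise r2 is undefined).
    """
    total = n_ab + n_aB + n_Ab + n_AB
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at marker 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at marker 2
    d = p_AB - p_A * p_B           # deviation from independent inheritance
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r2


# Hypothetical counts: the two markers always travel together (perfect LD).
perfect = ld_from_haplotypes(n_ab=50, n_aB=0, n_Ab=0, n_AB=50)
# Hypothetical counts: all four haplotypes equally common (no LD).
independent = ld_from_haplotypes(n_ab=25, n_aB=25, n_Ab=25, n_AB=25)
print(perfect, independent)
```

An r² close to 1 means one marker is a good proxy for the other, which is what makes tagging possible.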
6
Haplotypes and genotype tagging
7
Association studies
  • Linkage disequilibrium enables association
    studies, because of detection by proxy - not
    every variant needs to be typed

8
Study power
9
Study power
  • The power of a study is the probability of
    correctly detecting a true positive
  • To calculate this, you need:
      • risk model
      • genotype relative risk
      • allele frequency
      • number of cases and controls
      • population penetrance
      • acceptable rate of false positives
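
To make the ingredients above concrete, here is a minimal power sketch under stated assumptions: a multiplicative risk model, a disease rare enough that the control allele frequency is close to the population frequency, and a two-sided z-test comparing allele frequencies between cases and controls. These modelling choices and the function names are assumptions for illustration; the slides do not specify a power formula.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(p, grr):
    """Risk-allele frequency among cases under a multiplicative model.

    p is the population risk-allele frequency, grr the per-allele
    genotype relative risk.
    """
    return grr * p / (grr * p + (1 - p))


def allelic_test_power(p, grr, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic z-test (normal approximation)."""
    nd = NormalDist()
    p_case = case_allele_freq(p, grr)
    p_ctrl = p  # assumption: rare disease, controls resemble the population
    # Every individual contributes two alleles to the comparison.
    se = sqrt(p_case * (1 - p_case) / (2 * n_cases)
              + p_ctrl * (1 - p_ctrl) / (2 * n_controls))
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_ctrl) / se
    # Probability that the test statistic falls in either rejection tail.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Hypothetical scenario: allele frequency 0.3, relative risk 1.2.
for n in (1000, 5000, 20000):
    print(n, round(allelic_test_power(0.3, 1.2, n, n, alpha=5e-8), 3))
```

With a relative risk of 1 the function returns alpha, as it should: the only "detections" are false positives.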

10
How many SNPs should be tested? Studies of small
regions revealed linkage disequilibrium blocks in
which common SNPs are highly correlated (usually
<10,000-30,000 base pairs in African populations
or 30,000-50,000 base pairs in the newer European
and Asian populations) (22). This motivated the
HapMap Project (www.hapmap.org; 12), which has
validated approximately 4 million SNPs, including
2.8 million of the estimated 10 million common
SNPs in major world populations, while creating
competition among biotechnology companies to
develop high-throughput genotyping technologies.
Sequencing and genotyping studies showed that
sets of 500,000 (European populations) to
1,000,000 (African populations) SNPs could "tag"
(serve as proxies for) approximately 80% of
common SNPs (23).
11
Quality controls
  • Call rates for samples and SNPs
  • Exclusion of low frequency SNPs
  • Exclusion of SNPs out of Hardy-Weinberg
    Equilibrium
  • Clean (or take into account) population
    stratification
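
The first two filters above can be sketched per SNP as follows. The thresholds (95% call rate, 1% minor allele frequency) and the data layout (one allele count of 0, 1 or 2 per sample, `None` for a failed call) are illustrative assumptions, not values prescribed by the slides.

```python
def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Summarise call rate and minor allele frequency for one SNP.

    genotypes: list with one entry per sample, holding the count of one
    allele (0, 1 or 2) or None when the genotype could not be called.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if called:
        freq = sum(called) / (2 * len(called))  # frequency of the counted allele
        maf = min(freq, 1 - freq)               # frequency of the rarer allele
    else:
        maf = 0.0
    passes = call_rate >= min_call_rate and maf >= min_maf
    return {"call_rate": call_rate, "maf": maf, "passes": passes}


# Hypothetical SNPs: one with a missing call, one that is almost monomorphic.
print(snp_qc([0, 1, 2, None]))
print(snp_qc([0] * 99 + [1]))
```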

12
Hardy-Weinberg Equilibrium
  • If the alleles A and B have frequencies p and q,
    you would expect the following genotype
    frequencies:
      • AA: p²
      • AB: 2pq
      • BB: q²
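
A minimal sketch of how the expected frequencies above are typically turned into a test: estimate p from the observed genotype counts, compute the expected counts, and compare them with a χ² statistic on one degree of freedom. This is one common variant (exact tests are also widely used); the function name and the counts are illustrative only.

```python
from math import erfc, sqrt


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at one SNP.

    Assumes both alleles are present, so no expected count is zero.
    Returns (chi2, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: exactly at equilibrium, then with no heterozygotes.
print(hwe_chi2(25, 50, 25))
print(hwe_chi2(50, 0, 50))
```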

13
Hardy-Weinberg Equilibrium
  • When observed genotype frequencies deviate from
    those expected under HWE, this can indicate:
      • population stratification
      • different mutation rates between males and
        females
      • different fitness between alleles
      • genotype calling problems
      • true association at the locus

14
Binary or real-valued phenotypes
  • Binary traits are typically disease state labels
    (case or control)
  • Real-valued traits are quantitatively measured
    phenotypes:
      • blood sugar
      • lipids
      • height
      • BMI
      • gene expression

15
Molecular vs disease phenotypes
  • Disease phenotypes are the result of combinations
    of molecular phenotypes in the body
  • Progression with time
  • Precision of phenotype measurement

16
Molecular vs disease phenotypes
  • Many physiological phenotypes involved in
    disease dynamics

17
Molecular vs disease phenotypes
Molecular phenotypes can give more precise
information about disease state
18
Association statistics
  • Association statistics for binary traits are most
    often based on a χ²-statistic, computed from the
    genotype count table, or on a logistic regression
    model
  • The χ²-statistic summarizes independence between
    disease state and genotype

19
Association statistics
  • For aa in cases, you would expect R·n0/N
  • The sum of the squared differences between
    observed and expected counts, each divided by
    the expected count, is χ²-distributed

            aa    aA    AA    Sum
Cases       r0    r1    r2    R
Controls    s0    s1    s2    S
Count       n0    n1    n2    N
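
Using the notation of the count table above, the genotype test can be sketched as below. Each expected count is the row total times the column total divided by N, and the statistic is compared with a χ² distribution on two degrees of freedom. The function name and example counts are illustrative assumptions.

```python
from math import exp


def genotype_chi2(cases, controls):
    """2 x 3 genotype chi-square test.

    cases    = (r0, r1, r2): genotype counts aa, aA, AA among cases
    controls = (s0, s1, s2): genotype counts aa, aA, AA among controls
    Assumes every genotype is observed at least once overall.
    Returns (chi2, p_value) with 2 degrees of freedom.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    chi2 = 0.0
    for r, s in zip(cases, controls):
        n = r + s                              # column total (n0, n1 or n2)
        for observed, row_total in ((r, R), (s, S)):
            expected = row_total * n / N       # e.g. R * n0 / N for aa in cases
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of a chi-square with 2 df: exp(-x / 2).
    return chi2, exp(-chi2 / 2)


# Hypothetical counts, not taken from the study.
print(genotype_chi2(cases=(10, 20, 20), controls=(20, 20, 10)))
```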
20
Regression
  • For real-valued phenotypes, use linear regression
  • For binary phenotypes, use logistic regression
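
For the real-valued case, a minimal sketch is an ordinary least-squares fit of the phenotype on the number of copies of one allele (0, 1 or 2). The additive coding is an assumption made here for illustration; the logistic case is not shown, and the data are made up.

```python
def additive_linear_fit(allele_counts, phenotypes):
    """Least-squares fit: phenotype = intercept + slope * allele_count.

    Assumes at least two different allele counts are present.
    Returns (slope, intercept); the slope is the per-allele effect.
    """
    n = len(allele_counts)
    mean_x = sum(allele_counts) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(allele_counts, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data in which each extra allele adds 2 units to the trait.
print(additive_linear_fit([0, 1, 2, 0, 1, 2], [1, 3, 5, 1, 3, 5]))
```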

21
Population stratification
  • Population stratification occurs when groups or
    subpopulations within your sample are more
    related than would be expected by chance
  • This introduces correlations and inflates
    association statistics, and needs to be corrected for

22
Genomic control
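
This slide carries only a figure. As a hedged sketch of the usual genomic control idea (not necessarily the exact procedure used in the study), the inflation factor λ is the median of the observed 1-df χ² statistics divided by the median expected under the null, and every statistic is then divided by λ. The function name and the clipping of λ at 1 are assumptions based on common practice.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
# If Z is standard normal, Z**2 is chi-square(1), so the median is the
# squared 75th percentile of Z.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for 1-df chi-square tests."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # common convention: never inflate the statistics
    return lam, [x / lam for x in chi2_stats]


# Hypothetical genome-wide statistics that are uniformly twice too large.
lam, corrected = genomic_control([2 * CHI2_1DF_MEDIAN] * 5)
print(lam, corrected[0])
```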
23
Eigenstrat
24
Imputation
  • Using a reference population (like HapMap or 1000
    Genomes) we can infer the genotype of SNPs that
    were not tested
  • IMPUTE or MACH are commonly used
  • Yields probabilistic genotypes that need special
    treatment
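
One common way to handle probabilistic genotypes is to collapse the three genotype probabilities into an expected allele count (a "dosage") and use that in the regression. The sketch below assumes the probabilities are given in the order AA, AB, BB; it does not reproduce the output format of IMPUTE or MACH.

```python
def expected_dosage(p_aa, p_ab, p_bb):
    """Expected number of B alleles given imputed genotype probabilities."""
    total = p_aa + p_ab + p_bb
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities should sum to 1")
    return p_ab + 2 * p_bb


# Hypothetical imputed SNP: a confident call and an uncertain one.
print(expected_dosage(0.0, 0.0, 1.0))
print(expected_dosage(0.25, 0.5, 0.25))
```

Keeping the dosage rather than the most likely genotype carries the imputation uncertainty into the association test.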

25
Imputation
Wu et al, Nat. Genet. 41, 991-995, 2009
26
Montreal GWAS
27
Type 2 diabetes
  • Blood glucose levels are regulated by insulin
    release
  • Increased blood glucose levels trigger the release
    of insulin, which signals muscle cells to take
    up glucose
  • Through β-cell dysfunction or insulin resistance,
    insulin regulation is impaired, leading to
    increased glucose levels and eventually type 2
    diabetes

28
Type 2 diabetes
29
Genetics of type 2 diabetes
  • Before GWAS, T2D genetics was studied with
    linkage studies and candidate gene approaches
  • Results in particular for MODY variants, caused
    by disruptions of single genes
  • Genome-wide association studies and SNP arrays
    made it possible to study complex diseases
  • Five large GWAS for T2D in 2007
  • DIAGRAM meta-analysis in 2008

30
Montreal GWAS
  • Part of a larger T2D project at McGill and Genome
    Quebec
  • After initial planning for candidate gene
    genotyping, we switched to a GWAS strategy

31
Multi-stage GWAS
  • Two main strategies for increasing study power
  • Meta-analyses increase effective sample size by
    combining results from different studies
  • Multi-stage approaches scan the whole genome with
    relatively low power, followed by focusing in on
    the hits with higher power
  • Maximizing power in a single study in a
    cost-effective way

32
Multi-stage GWAS
33
Study design
Fasting glucose, normoglycemic individuals (previously published, Science, May 2007)
  • Stage 1: French (N = 654)
  • Stage 2: rs560887 (N = 9,353)
Case-control T2D association
  • Stage 1: genome-wide scan, 392,365 SNPs, French
    (N = 1,376): 679 cases, 697 controls
  • Fast-track confirmation: 57 SNPs, French
    (N = 5,511): 2,617 cases, 2,894 controls
    (previously published, Nature, Feb 2007)
  • Focused Stage 2: 16,273 SNPs, French
    (N = 4,977): 2,245 cases, 2,732 controls
  • Focused Stage 3: 28 SNPs, Danish (N = 7,698):
    3,334 cases, 4,364 controls
QT association in populations
  • Stage 4 population effect study: 1 SNP
    (rs2943641), population-based study samples:
    French (N = 3,351), Finnish (N = 5,183),
    Danish (N = 5,824)
34
Stage 1 samples
  • French individuals: 690 cases, 670 controls
  • Criteria for cases:
      • T2D
      • First degree relative with T2D
      • Non-obese (BMI < 31 kg/m², 25.8 ± 2.8 kg/m²)
  • Controls from DESIR, a prospective French cohort
      • Normal glucose tolerance for the 9 years of the
        study

35
Stage 1 SNPs
  • Tested on Illumina Human1 (100k) and HumanHap300
    (300k)
  • 392,935 unique SNPs from the combined arrays

36
Stage 1 results
37
Fast-track validation
  • Top 57 SNPs fast-tracked and tested on a Sequenom
    panel in 2,617 cases and 2,894 controls
  • Relaxed criteria for cases:
      • BMI < 35 kg/m² (28.9 ± 3.7 kg/m²)
  • Sladek et al., Nature 445, 881-885, 2007

38
Results
SNP          Chr   Position     pMAX            Closest gene
rs7903146    10    114748339    1.5 x 10^-34    TCF7L2
rs13266634   8     118253964    6.1 x 10^-8     SLC30A8
rs1111875    10    94452862     3.0 x 10^-6     HHEX
rs7923837    10    94471897     7.5 x 10^-6     HHEX
rs7480010    11    42203294     1.1 x 10^-4     LOC387761
rs3740878    11    44214378     1.2 x 10^-4     EXT2
rs11037909   11    44212190     1.8 x 10^-4     EXT2
rs1113132    11    44209979     3.3 x 10^-4     EXT2
39
SLC30A8
Chimienti et al., Biometals 18:313
40
HHEX
41
HHEX controls pancreatic development
Hex homeobox gene-dependent tissue positioning is
required for organogenesis of the ventral
pancreas. Bort (2004)
Heart induction by Wnt antagonists depends on the
homeodomain transcription factor Hex. Foley (2005)
The homeobox gene Hex is required in definitive
endodermal tissues for normal forebrain, liver
and thyroid formation. Martinez Barbera (2000)
Habener, Endocrinology 146:1025
42
Stage 2
  • Top 5% of GWAS hits were selected for design of a
    focused Stage 2
  • Control for population bias with EIGENSTRAT
  • iSelect array with 16,405 SNPs, tested on 2,245
    cases, 2,732 controls (French)
  • Analysis with EIGENSTRAT and selection of 28 SNPs
    for a focused Stage 3

43
QC
Exclusion criterion          Samples
Call rate < 95%              27
Continental stratification   296
Sex mismatch                 64
Related individuals          70
Total                        457

Chromosome   SNPs     Failed HWE   Failed MAF   Successful
TOTAL        16,360   48           43           16,273
44
EIGENSTRAT correction
filters for MAF, HWE, call rate
filters for MAF, HWE, call rate and r²
45
Results - stage 1 vs stage 2
46
Results - taking out known loci
48
Stage 3
  • The top 28 SNPs were tested using a Sequenom
    panel in 7,700 Danish cases and controls
  • We confirm association of TCF7L2, WFS1 and CDKAL1,
    and find one new association: rs2943641, near IRS1

49
rs2943641
  • We studied the effect of variation in rs2943641
    on T2D risk and metabolic phenotypes in general
    populations:
      • DESIR: 3,351 French adults
      • Inter99: 5,183 Danish adults
      • NFBC 1986: 5,824 Finnish adolescents

50
Metabolic traits
  • A variety of indexes to capture β-cell function
    and insulin resistance
  • HOMA-B and HOMA-IR based on fasting levels of
    glucose and insulin
  • For Inter99, we had access to OGTT data and could
    calculate other measures of insulin response:
      • time course data
      • AUC
      • corrected insulin response (CIR)
      • disposition indexes
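
A minimal sketch of the fasting indexes and the AUC, assuming the widely quoted HOMA approximations (glucose in mmol/l, insulin in µU/ml) and a trapezoidal rule for the area under the OGTT curve. The slides do not state which formulas, units or conversions were used in the study, so the functions and example values below are illustrative only.

```python
def homa_ir(glucose_mmol_l, insulin_uU_ml):
    """HOMA insulin resistance index: glucose * insulin / 22.5."""
    return glucose_mmol_l * insulin_uU_ml / 22.5


def homa_b(glucose_mmol_l, insulin_uU_ml):
    """HOMA beta-cell function index: 20 * insulin / (glucose - 3.5).

    Only meaningful for fasting glucose above 3.5 mmol/l.
    """
    return 20 * insulin_uU_ml / (glucose_mmol_l - 3.5)


def trapezoid_auc(times, values):
    """Area under a time course by the trapezoidal rule."""
    points = list(zip(times, values))
    return sum((t1 - t0) * (v0 + v1) / 2
               for (t0, v0), (t1, v1) in zip(points, points[1:]))


# Hypothetical individual; OGTT sampled at 0, 30 and 120 minutes.
print(homa_ir(4.5, 5.0), homa_b(4.5, 5.0))
print(trapezoid_auc([0, 30, 120], [40.0, 300.0, 170.0]))
```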

51
Oral Glucose Tolerance Test
52
Metabolic traits 1
Metabolic trait                  Cohort     rs2943641 C/C  rs2943641 C/T  rs2943641 T/T  P add   P dom   P rec
Age                              NFBC 1986  16             16             16
Age                              DESIR      47.1 ± 9.8     47.5 ± 9.9     47.6 ± 10.1
Age                              INTER99    44.9 ± 7.9     45.4 ± 7.8     45.2 ± 7.6
Sex                              NFBC 1986  1062/1092      1153/1208      322/346
Sex                              DESIR      645/728        728/812        216/222
Sex                              INTER99    776/942        974/1070       307/354
BMI (kg/m²)                      NFBC 1986  21.3 ± 3.8     21.3 ± 3.7     21.1 ± 3.5     0.24    0.43    0.21
BMI (kg/m²)                      DESIR      24.5 ± 3.7     24.4 ± 3.5     24.4 ± 3.4     0.55    0.63    0.61
BMI (kg/m²)                      INTER99    25.6 ± 3.9     25.4 ± 4.1     25.7 ± 4.2     0.57    0.094   0.24
Fasting plasma glucose (mmol/l)  NFBC 1986  5.13 ± 0.41    5.14 ± 0.40    5.13 ± 0.41    0.77    0.62    0.90
Fasting plasma glucose (mmol/l)  DESIR      5.21 ± 0.44    5.20 ± 0.42    5.18 ± 0.43    0.05    0.32    0.07
Fasting plasma glucose (mmol/l)  INTER99    5.31 ± 0.40    5.31 ± 0.41    5.33 ± 0.39    0.66    0.93    0.32
Fasting serum insulin (pmol/l)   NFBC 1986  78.7 ± 48.6    76.8 ± 44.5    71.7 ± 32.1    0.001   0.03    0.0009
Fasting serum insulin (pmol/l)   DESIR      50.6 ± 32.9    48.4 ± 29.7    49.1 ± 29.1    0.05    0.003   0.76
Fasting serum insulin (pmol/l)   INTER99    38.8 ± 24.7    36.4 ± 21.9    37.6 ± 23.3    0.018   0.0043  0.49
53
Metabolic traits 2
Metabolic trait            Cohort     rs2943641 C/C  rs2943641 C/T  rs2943641 T/T  P add        P dom        P rec
HOMA-B                     NFBC 1986  141 ± 95.1     136 ± 80.1     131 ± 91.6     0.006        0.05         0.009
HOMA-B                     DESIR      109 ± 87.0     103 ± 64.8     108 ± 92.2     0.16         0.006        0.24
HOMA-B                     INTER99    75.2 ± 65.6    68.3 ± 42.2    71.0 ± 49.9    0.005        0.0011       0.32
HOMA-IR                    NFBC 1986  2.52 ± 1.63    2.47 ± 1.58    2.29 ± 1.06    0.007        0.07         0.005
HOMA-IR                    DESIR      1.95 ± 1.35    1.86 ± 1.20    1.88 ± 1.17    0.03         0.004        0.95
HOMA-IR                    INTER99    1.54 ± 1.00    1.44 ± 0.89    1.49 ± 0.95    0.026        0.0058       0.59
Insulin 30                 INTER99    300 ± 183      277 ± 172      281 ± 169      0.0019       8.1 x 10^-4  0.14
Insulin 120                INTER99    176 ± 138      163 ± 127      162 ± 124      0.0059       0.011        0.057
AUC insulin                INTER99    22000 ± 13800  20300 ± 12900  20500 ± 12700  6.9 x 10^-4  2.2 x 10^-4  0.12
Glucose 30                 INTER99    8.19 ± 1.53    8.17 ± 1.56    8.22 ± 1.50    0.72         0.34         0.55
Glucose 120                INTER99    5.51 ± 1.11    5.51 ± 1.11    5.47 ± 1.15    0.54         0.99         0.23
AUC glucose                INTER99    182 ± 101      181 ± 102      180 ± 99.5     0.44         0.48         0.59
AUC insulin / AUC glucose  INTER99    32.5 ± 17.4    30.1 ± 16.2    30.6 ± 16.1    6.0 x 10^-4  1.6 x 10^-4  0.13
CIR                        INTER99    1140 ± 4210    1000 ± 1130    1000 ± 1060    0.045        0.066        0.17
ISI                        INTER99    0.151 ± 0.095  0.16 ± 0.098   0.156 ± 0.096  0.026        0.0058       0.59
Disp. Index (CIR x ISI)    INTER99    180 ± 1610     147 ± 220      143 ± 174      0.73         1.0          0.50
54
IRS1 locus - rs2943641
55
IRS1
  • G972R is a missense polymorphism in IRS1 that is
    known to impair insulin signalling (rs1801278)
    (Almind 1993)
  • G972R is associated with insulin resistance and
    insulin release (Clausen 1995, Sesti 2001)
  • In mice, IRS1 disruption causes disrupted insulin
    action, both in target tissues and in β-cells
    (Nandi 2004)
  • Also linked to insulin resistance, glucose
    intolerance, islet hyperplasia (Tamemoto 1994,
    Araki 1994, Terauchi 1997, Withers 1998)
  • G972R is not conclusively associated with T2D
    (Florez 2004, Florez 2007, Jellema 2003,
    Zeggini 2004)
  • We detect no epistasis between rs2943641 and
    G972R in DESIR or NFBC, only nominal significance
    in Inter99
  • Evidence for link between rs2943641 and IRS1?

56
rs2943641 - IRS1 protein association
57
rs2943641 - IRS1 protein association
                                             rs2943641 CC   rs2943641 CT   rs2943641 TT   PAdd   PDom   PRec
n (male/female)                              74 (35/39)     88 (51/37)     28 (10/18)
Age (years)                                  42.5 ± 17.1    43.5 ± 16.9    43.2 ± 17.6
BMI (kg/m²)                                  25.0 ± 3.8     24.9 ± 3.9     25.3 ± 4.1     0.3    0.7    0.2
Rd insulin clamp (mg/kgFFM/min)              10.4 ± 3.5     11.0 ± 3.2     11.7 ± 3.7     0.2    0.2    0.4
Di (x 10^-7)                                 1.7 ± 1.1      1.8 ± 1.3      1.8 ± 1.1      0.8    0.8    0.9
IRS-1 protein basal (AU)                     296.7 ± 167.7  314.0 ± 155.1  413.1 ± 227.6  0.03   0.3    0.009
IRS-1 protein insulin (AU)                   276.6 ± 143.6  280.9 ± 156.4  313.3 ± 147.9  0.3    0.7    0.2
IRS-1-associated PI3K activity basal (AU)    25.0 ± 12.6    26.6 ± 15.4    30.1 ± 17.2    0.3    0.4    0.4
IRS-1-associated PI3K activity insulin (AU)  47.1 ± 29.9    56.6 ± 32.1    72.2 ± 41.3    0.001  0.02   0.002
58
Conclusions
  • The multi-stage study detected T2D risk loci that
    were later confirmed in other cohorts (SLC30A8,
    HHEX)
  • Variation in rs2943641 is associated with:
      • T2D risk
      • increased insulin levels
      • impaired insulin sensitivity
      • IRS1 protein levels
      • IRS1 activity in insulin signaling pathway
  • The study provided a full story from GWAS scan to
    functional evidence thanks to rich phenotyping

59
Paper
Rung et al., Nature Genetics, 41, 1110-1115, 2009
60
Acknowledgements
Rosalie Frechette, Valérie Catudal, Philippe
Laflamme, Stephane Cauchi, Christian Dina, David
Meyre, Christine Cavalcanti-Proença, Anders
Albrechtsen, Torben Hansen, Knut Borch-Johnsen,
Torsten Lauritzen, Marjo-Riitta Järvelin, Jaana
Laitinen, Emmanuelle Durand, Paul Elliott, Samy
Hadjadj, Michel Marre, Alexander Montpetit,
Charlotta Pisinger, Barry Posner, Anneli Pouta,
Marc Prentki, Rasmus Ribel-Madsen, Aimo Ruokonen,
Anelli Sandbaek, Jean Tichet, Martine Vaxillaire,
Jorgen Wojtaszewski, Allan Vaag
  • Johan Rung
  • Rob Sladek
  • Philippe Froguel
  • Oluf Pedersen
  • Constantin Polychronakos
  • Ghislain Rocheleau
  • Alexander Mazur
  • Lishuang Shen
  • David Serre
  • Philippe Boutin
  • Daniel Vincent
  • Alexandre Belisle
  • Samy Hadjadj
  • Beverley Balkau
  • Barbara Heude
  • Guillaume Charpentier
  • Tom Hudson
  • Sebastien Brunet
  • François Bacot

61
GWAS into context
  • Complexity of interactions in biological
    systems...

62
Complexity
  • ...a lot of complexity

63
(Figure: network diagram with nodes labelled A to G)
64
Redundancy
65
Network structure
  • Biological networks have a scale-free structure

(Figure: log-log plot of Log(genes) against Log(edges).
Most genes have few connections; few genes have many
connections.)
66
Signal propagation
  • The structure of biological networks results in
    robustness against random errors
  • Most mutations, even knockouts, can go
    unnoticed because of redundancy and network
    wiring
  • Low probability of knocking out a hub

67
Common diseases
  • What is more common - disease caused by many
    variants with low effect, or by few rare variants
    with strong effects?
  • GWAS so far have by necessity focused on common
    variants
  • Many known rare variants are associated with common
    diseases - or with phenotypes that may contribute
    and progress to disease

68
Common disease / common variant
  • The hypothesis that most common diseases are
    caused by a large number of variants, common in a
    general population, but each adding just a small
    risk
  • GWAS find many loci for common complex
    diseases, each with small risk
  • But... GWAS-detected loci so far only explain a
    very small fraction of the observed variation

69
Rare variants
  • With improved and lower cost sequencing, we can
    address rare variants
  • Not just SNPs
  • Utility of extreme cohorts
  • Ex. A new highly penetrant form of obesity due
    to deletions on chromosome 16p11.2 (Nature Feb
    4, 2010)

70
Polygenic contributions
  • Groups of SNPs that are not genome-wide
    significant have been shown to be associated
    with phenotype
  • Individual SNPs cannot be inferred, just group
    action
  • Supports the idea of many weak variants
    responsible for effect
  • Ex. Common polygenic variation contributes to
    risk of schizophrenia and bipolar disorder
    (Nature 460, 748-752)
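
One way to picture the "group action" described above is a per-individual score that sums allele counts across many SNPs, each weighted by an effect estimate. The sketch below is a generic weighted sum under that assumption; the weights and genotypes are invented, and it is not a reproduction of the cited study's method.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of risk-allele counts across a group of SNPs.

    allele_counts: number of risk alleles (0, 1 or 2) per SNP for one person
    weights: one effect estimate per SNP, e.g. from a discovery sample
    """
    if len(allele_counts) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical individual typed at three weakly associated SNPs.
print(polygenic_score([0, 1, 2], [0.5, 0.25, 0.125]))
```

The score can then be tested against the phenotype like any other real-valued predictor, even though no single SNP in it reaches significance.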

71
Meta-analysis caveats
  • Meta-analysis on heterogeneous data
  • Phenotypes
  • Quality control
  • Platforms
  • Genotype calling
  • Analysis

72
Future directions for GWAS
  • Sequencing is cheaper and yielding higher quality
    data
  • Better basis for studying and detecting rare
    variants and their effect on diseases or
    phenotypes
  • Copy number variants
  • Genetic interactions, GxE interactions
  • More samples -> higher power

73
Future directions for GWAS
  • Complex phenotypes
  • Association of genetic loci to:
      • genome-wide expression levels
      • protein levels
      • metabolite levels

74
Future directions for GWAS
  • More data shared -> better quality of results
  • As in other branches of science, data sharing,
    transparency and openness should be promoted

75
Resources
  • Analysis software packages
      • PLINK - http://pngu.mgh.harvard.edu/purcell/plink/
      • Abel - http://mga.bionet.nsc.ru/yurii/ABEL/
      • MERLIN - http://www.sph.umich.edu/csg/abecasis/merlin/
  • Imputations
      • IMPUTE - http://mathgen.stats.ox.ac.uk/impute/impute.html
      • MACH - http://www.sph.umich.edu/csg/abecasis/MACH/
  • Population structure
      • Eigenstrat - http://genepath.med.harvard.edu/reich/Software.htm
      • EMMA(X) - http://genetics.cs.ucla.edu/emmax/index.html
  • Meta-analysis
      • METAL - http://www.sph.umich.edu/csg/abecasis/METAL/
      • GWAMA - http://www.well.ox.ac.uk/gwama/index.shtml
  • Data
      • EGA - http://www.ebi.ac.uk/ega/