Gene mapping: Linkage and association methods

Slides: 42

Provided by: cscu9

1
Gene mapping: Linkage and association methods

Disease gene mapping is one of the main purposes
of genotyping
Two major approaches: linkage and association
analyses

2
Linkage analysis

Try to localize genes affecting specific
phenotypes
Search for co-segregation of disease and marker
alleles

3
Basics of Linkage Analysis

Idea of Linkage Analysis
Types of Linkage Analysis
Parametric Linkage Analysis
Conclusions

5
Linkage Analysis

One of the two main approaches in gene mapping.
Uses pedigree data.

6
Genetic linkage and linkage analysis

Two loci are linked if they lie close together on
the same chromosome.
The task of linkage analysis is to find markers
that are linked to the hypothetical disease locus
Complex diseases in focus → usually need to
search for one gene at a time
Requires mathematical modelling of meiosis

7
Meiosis and crossover

The number of crossover sites is thought to follow
a Poisson distribution.
Their locations are generally random and
independent of each other.

8
The simple idea
Recombination fraction θ
Always between 0 and 0.5

Task: find the θ that maximises L(θ | data)
Obtain a measure of the degree of evidence in favour
of linkage (LOD score)
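
As a minimal sketch of this task, assume the simplest possible data: n phase-known, fully informative meioses, of which r are recombinant, so that L(θ) = θ^r (1 − θ)^(n − r). This setting, the counts used and the function names are illustrative assumptions, not part of the slides.

```python
import math

def likelihood(theta, n, r):
    """L(theta) for r recombinants among n phase-known informative meioses."""
    return theta ** r * (1 - theta) ** (n - r)

def lod(theta, n, r):
    """LOD score: log10 of L(theta) relative to L(0.5), i.e. no linkage."""
    return math.log10(likelihood(theta, n, r) / likelihood(0.5, n, r))

def max_lod(n, r, steps=500):
    """Grid search over theta in [0, 0.5] for the maximum-likelihood estimate."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: likelihood(t, n, r))
    return best, lod(best, n, r)

# Hypothetical counts: 2 recombinants among 20 meioses.
theta_hat, z_max = max_lod(n=20, r=2)
print(f"theta_hat = {theta_hat:.3f}, max LOD = {z_max:.2f}")
```

In real pedigrees phase is usually unknown and the likelihood has to be summed over genotype configurations, so this closed form only illustrates the principle.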

9
Markers and inheritance

Polymorphic loci whose locations are known
Most often SNPs or microsatellites
Inherited within the chromosomes

10
Markers and information

Two individuals share the same allele label → they
share the allele IBS (identical by state)
Two individuals share an allele with the same
(grand)parental origin → they share an allele IBD
(identical by descent)
IBS sharing can easily be deduced from genotypes.
IBD sharing requires more information. One can
try to deduce IBD sharing based on family
structure and inheritance.

11
Markers and information
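Parents: 1,2 and 2,3
Children: 1,2 and 1,3
The children share allele 1 IBS.
They also share it IBD: both copies come from the
1,2 parent, the only parent carrying allele 1.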
12
Markers and information
Parents: 1,2 and 1,3
Children: 1,2 and 1,3
The children share allele 1 IBS.
They do not share alleles IBD: each child inherited
its copy of allele 1 from a different parent.
13
Markers and information
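Parents: 1,1 and 2,3
Children: 1,2 and 1,3
The children share allele 1 IBS.
They either share or do not share it IBD: the 1,1
parent may have passed on the same copy or two
different copies of allele 1.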
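
The three sibling-pair examples can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, and calling one parent "father" and the other "mother" is arbitrary. It enumerates every transmission pattern consistent with the observed genotypes and collects the possible numbers of alleles shared IBD.

```python
from collections import Counter
from itertools import product

def ibs_count(child_a, child_b):
    """Number of alleles two genotypes share by state (label match only)."""
    return sum((Counter(child_a) & Counter(child_b)).values())

def possible_ibd_counts(father, mother, child_a, child_b):
    """Set of IBD counts compatible with the parents' and children's genotypes."""
    def transmissions(child):
        # (i, j): index of the paternal and maternal allele copy passed on.
        return [(i, j) for i, j in product(range(2), repeat=2)
                if sorted((father[i], mother[j])) == sorted(child)]
    counts = set()
    for (fa, ma), (fb, mb) in product(transmissions(child_a),
                                      transmissions(child_b)):
        # Sharing IBD means both children received the same parental copy.
        counts.add(int(fa == fb) + int(ma == mb))
    return counts

print(ibs_count((1, 2), (1, 3)))
print(possible_ibd_counts((1, 2), (2, 3), (1, 2), (1, 3)))
print(possible_ibd_counts((1, 2), (1, 3), (1, 2), (1, 3)))
print(possible_ibd_counts((1, 1), (2, 3), (1, 2), (1, 3)))
```

A set with more than one element means the genotypes alone cannot decide IBD status, which is the situation with the homozygous 1,1 parent.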
14
Building blocks of linkage analysis
Marker maps
Pedigree structures
Genotypes
Phenotypes
15
Building blocks of linkage analysis

Information about the disease model (in parametric
analysis)

Penetrance of aa: probability of a homozygous
carrier being affected
Penetrance of Aa: probability of a heterozygote
being affected
Penetrance of AA: probability of a non-carrier
being affected (phenocopy rate)

Assumed disease allele frequency
Marker allele frequencies
Information about environmental variables

17
Types of linkage analysis

Parametric vs. non-parametric
Dichotomous vs. continuous phenotypes
Elston-Stewart vs. Lander-Green vs. heuristic
Two-point vs. multipoint
Genome scan vs. candidate gene

19
Maximum likelihood estimation

A common approach in statistical estimation
Define hypotheses
Generate likelihood function
Estimate
Test hypotheses
Draw statistical conclusions

20
Hypotheses in linkage analysis

H0: θ = 0.5
the disease locus is not linked to the marker(s)
HA: θ ≠ 0.5 (that is, θ < 0.5)
the disease locus is linked to the marker(s)

21
Likelihood function for a single nuclear family

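$$L_j = \sum_{g_F} P(g_F)\,P(y_F \mid g_F) \sum_{g_M} P(g_M)\,P(y_M \mid g_M) \prod_i \sum_{g_{O_i}} P(g_{O_i} \mid g_F, g_M)\,P(y_{O_i} \mid g_{O_i})$$

P(g), P(g | g_F, g_M): genotype probabilities
P(y | g): phenotype probabilities
F = father, M = mother, O_i = offspring i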
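
A minimal sketch of this nuclear-family likelihood follows, under assumptions that are not in the slides: a single biallelic disease locus with no marker data, genotypes coded as the number of copies (0, 1, 2) of disease allele a, Hardy-Weinberg genotype priors for the parents, and hypothetical penetrance and allele-frequency values.

```python
def hwe_prior(q):
    """P(genotype) for 0, 1, 2 copies of allele a with frequency q."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

def transmit(g_child, g_f, g_m):
    """Mendelian P(child genotype | parental genotypes)."""
    pf, pm = g_f / 2, g_m / 2  # chance each parent passes on allele a
    return [(1 - pf) * (1 - pm),
            pf * (1 - pm) + (1 - pf) * pm,
            pf * pm][g_child]

def pheno_prob(affected, g, penetrance):
    """P(phenotype | genotype); penetrance[g] = P(affected | g copies of a)."""
    return penetrance[g] if affected else 1 - penetrance[g]

def family_likelihood(y_f, y_m, y_children, q, penetrance):
    """Sum over parental genotypes, product over offspring."""
    prior = hwe_prior(q)
    total = 0.0
    for g_f in range(3):
        for g_m in range(3):
            term = (prior[g_f] * pheno_prob(y_f, g_f, penetrance)
                    * prior[g_m] * pheno_prob(y_m, g_m, penetrance))
            for y_o in y_children:
                term *= sum(transmit(g_o, g_f, g_m)
                            * pheno_prob(y_o, g_o, penetrance)
                            for g_o in range(3))
            total += term
    return total

# Hypothetical model: phenocopy rate 0.01, penetrances 0.8 (Aa) and 0.9 (aa).
print(family_likelihood(False, True, [True, False], q=0.01,
                        penetrance=[0.01, 0.8, 0.9]))
```

In an actual linkage analysis the genotype terms also cover the marker loci, and the transmission probabilities depend on the recombination fraction.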
22
Several independent families

The likelihood functions of multiple independent
families are combined
$$L = \prod_j L_j \quad \text{or} \quad \log L = \sum_j \log L_j$$
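
A short sketch of why the log form is convenient: per-family log-likelihoods are simply added, which also avoids numerical underflow when many small likelihoods are multiplied. The per-family values below are hypothetical.

```python
import math

def combined_log10_likelihood(family_likelihoods):
    """log10 of the product of independent family likelihoods."""
    return sum(math.log10(l) for l in family_likelihoods)

def combined_lod(lik_at_theta, lik_at_half):
    """Total LOD: sum of per-family log10 likelihood ratios."""
    return sum(math.log10(a / b) for a, b in zip(lik_at_theta, lik_at_half))

# Hypothetical likelihoods of three families at some theta and at theta = 0.5.
at_theta = [2.0e-4, 5.0e-6, 1.2e-3]
at_half = [1.0e-4, 1.0e-6, 1.0e-3]
print(combined_log10_likelihood(at_theta))
print(combined_lod(at_theta, at_half))
```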

23
Testing of hypotheses

Compute the values of the likelihood function under
the null and alternative hypotheses.
Their relationship is expressed by the LOD score,
LOD(θ) = log10 [ L(θ) / L(0.5) ]
(essentially derived from the likelihood ratio
test statistic).

24
On significance levels

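The p-value is the probability of obtaining a result
at least as extreme as the observed one when the
null hypothesis is true; rejecting at significance
level α means a true null hypothesis is rejected
with probability α.
A LOD-score threshold of 3 corresponds to a
single-test p-value of approximately 0.0001
Often, the significant regions identified are
quite large, 10-40 cM (tens of millions of base pairs)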
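
The correspondence between LOD 3 and p ≈ 0.0001 can be checked with a small sketch. It assumes the usual large-sample approximation, which the slides do not spell out: 2 ln(10) × LOD is treated as a χ² statistic with 1 degree of freedom, and the p-value is halved because only θ < 0.5 counts as evidence for linkage. The function name is hypothetical.

```python
import math

def lod_to_p(lod):
    """Approximate one-sided single-test p-value for a given LOD score."""
    chi2 = 2 * math.log(10) * lod  # likelihood-ratio statistic
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)); halve for one-sidedness.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))

for z in (1.0, 2.0, 3.0):
    print(f"LOD {z:.0f}: p is about {lod_to_p(z):.6f}")
```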

25
[Figure: LOD score (vertical axis, marked at 0.0, 0.5 and 0.56) plotted against recombination fraction (horizontal axis, marked at 0.0, 0.14 and 0.5)]
LOD > 3 is taken as evidence of linkage.
27
Conclusions

Linkage analysis is a pedigree-based approach to
gene mapping.
Parametric vs. nonparametric methods.
Hypothesis-driven vs. explorative analysis.
Meta-analysis (integration of several studies
into one big study) is becoming increasingly
popular.

28
Fine mapping and association analysis

After a successful linkage analysis, what to do next?
How to refine the linked region: where exactly
is the disease susceptibility locus?
Outline of the rest of the lecture
Allelic association
χ² test
LD mapping

29
Allelic association

An example: a leukaemia study in which a number of
affected and healthy control persons have been
contacted for DNA samples
A candidate gene has been suggested: GSTM1, which
functions in the metabolism of benzene
GSTM1 has two different alleles, 1 and 2, where
a person is positive for allele 1 if their
genotype is 1/1 or 1/2
a person is null if their genotype is 2/2
The numbers of leukaemic and control individuals
who are positive or null with respect to allele 1
are compared with a χ² test to find out
whether there is a statistically significant
difference

30
Allelic association

Results: observed frequencies
Expected frequencies
[Tables not included in the transcript]

31
Test statistic

The observed frequencies are compared with the
expected frequencies (null hypothesis H0: carrier
status and disease occurrence are independent of
each other).
Test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
where
o_i is the observed frequency for class i, e_i the
expected frequency for class i
k is the number of classes
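
The computation can be sketched as follows. The 2×2 counts are hypothetical, since the study's observed frequencies are not reproduced in this transcript; expected counts are derived from the row and column totals under independence.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical table: rows = affected / control, columns = positive / null.
stat, df = chi_square([[10, 20], [20, 10]])
print(f"chi-square = {stat:.2f}, df = {df}")
```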

32
Allelic association

Now, χ² = 111.39.
Degrees of freedom for the test: df = (r − 1)(s − 1),
where r = number of rows, s = number of columns
Here, df = (2 − 1)(2 − 1) = 1
The χ² value is then compared with the critical
values of the χ² distribution for the given df

33
χ² distribution: critical values for chosen
significance levels

df \ p   0.10    0.05    0.025   0.01    0.005
1        2.71    3.84    5.02    6.63    7.88
2        4.61    5.99    7.38    9.21    10.60
3        6.25    7.81    9.35    11.34   12.84
4        7.78    9.49    11.14   13.28   14.86
5        9.24    11.07   12.83   15.09   16.75
6        10.64   12.59   14.45   16.81   18.55
7        12.02   14.07   16.01   18.48   20.28
8        13.36   15.51   17.53   20.09   21.96
9        14.68   16.92   19.02   21.67   23.59
10       15.99   18.31   20.48   23.21   25.19
11       17.28   19.68   21.92   24.73   26.76

When the observed value of the test statistic is
greater than the critical value (for the chosen
significance level) given in the table, the null
hypothesis can be rejected.
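
For df = 1 the tail probability has a closed form, P(χ² > x) = erfc(√(x/2)), so the first row of the table can be checked without statistical libraries. This is a sketch for the one-degree-of-freedom case only; the function name is hypothetical.

```python
import math

def chi2_tail_df1(x):
    """P(chi-square with 1 df exceeds x)."""
    return math.erfc(math.sqrt(x / 2))

# Critical values for df = 1 as listed in the table.
row_df1 = [(0.10, 2.71), (0.05, 3.84), (0.025, 5.02), (0.01, 6.63), (0.005, 7.88)]
for alpha, crit in row_df1:
    print(f"alpha = {alpha}: tail probability at {crit} is {chi2_tail_df1(crit):.4f}")
```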
34
Allelic association

The value we obtained, χ² = 111.39, exceeds all
critical values for df = 1 given in the table. We
conclude that H0 can be rejected, and thus there
is a statistically significant difference between
the affected and the healthy with respect to GSTM1
genotypes.
The relative frequencies of null and positive
genotypes show the same
It seems that different GSTM1 genotypes, by
changing benzene metabolism, considerably
affect the probability of getting leukaemia

Note: compared to linkage analysis, which is
based on the observed inheritance patterns in
pedigrees, association analysis studies the
correlation between allele presence and a disease
at the population level
We find an allele or a haplotype overrepresented
in affected individuals
BUT the statistical correlation does not
imply a causal relationship!
Quite often, the associated allele or haplotype
is not the cause of the disease itself, but is
merely correlated with the presence of the actual
susceptibility gene on the same chromosome. It is
then said to be in linkage disequilibrium with
the disease gene.

36
[Figure: an original mutation on one chromosome in the founder population, followed over time to the current generation; labels A, B and C; an affected pedigree is marked]
37
LD mapping

The marker itself is NOT the reason for the
disease, but it is located near the disease
susceptibility gene, and there is correlation
between the presence of a certain marker allele and
the disease gene allele (LD)
The correlation, i.e. LD, is based on the founder
effect: the disease allele arose a long
time ago on a certain ancestral chromosome, and
the majority of disease alleles existing at present
descend from that original mutation
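
Why markers close to the disease gene stay correlated with it can be illustrated with the standard deterministic decay of LD, D_t = D_0 (1 − θ)^t, where θ is the recombination fraction between the loci and t the number of generations since the founder mutation. This formula is not in the slides and assumes random mating and no drift, selection or new mutation; the numbers are hypothetical.

```python
def ld_after(d0, theta, generations):
    """Expected LD coefficient D after recombination has eroded it for t generations."""
    return d0 * (1 - theta) ** generations

# Hypothetical founder LD of 0.2, followed for 100 generations.
for theta in (0.001, 0.01, 0.1):
    print(f"theta = {theta}: D after 100 generations = {ld_after(0.2, theta, 100):.4f}")
```

Under these assumptions, tightly linked markers retain most of their original LD while loosely linked ones lose almost all of it.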

38
LD mapping: Utilizing the founder effect
39
Data
[Figure: example data set of marker alleles (coded 1 or 2, with ? for missing values) at markers S2 ... SNP1 ... around the disease locus, each row labelled with a disease status (a, c or ?)]

40
Many approaches, several programs

Old-fashioned allele association with some
simple test (problem: multiple testing)
TDT
Modelling of the LD process: Bayesian methods, EM
algorithm, integrated linkage and LD analysis

41
Limitations: LD is a random process

The amount of LD changes continuously but slowly
under the natural forces of
genetic drift
population structure
natural selection
new mutations
founder effect
Even if two pairs of loci are exactly the same
distance apart, their amounts of LD may vary a lot.
→ This limits the accuracy of LD mapping, though
it is much more accurate in pinpointing the
location of a disease gene compared to linkage