Molecular Evolution: Selection

Slides: 89
Provided by: fenBilk


1
Molecular Evolution: Selection
  • The KA/KS ratio is the ratio between the number of
    non-synonymous substitutions per non-synonymous
    site (KA) and the number of synonymous
    substitutions per synonymous site (KS) in a gene
    during a specific evolutionary period.
  • Assuming that KS provides an index of the random
    mutation rate, the KA/KS ratio measures whether
    the rate of protein evolution differs from the
    rate expected under neutral drift.
  • If KA > KS (KA/KS > 1), this is taken to indicate
    accelerated amino-acid change, which might be due
    to positive selection.
  • Conversely, if KA < KS (KA/KS < 1), this suggests
    purifying selection.
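The slide does not say how KA and KS are estimated. As a minimal sketch, assuming the numbers of differing non-synonymous and synonymous sites, and the numbers of non-synonymous and synonymous sites, have already been counted (for example with a Nei-Gojobori-style codon comparison, not shown), the Python below converts them to per-site distances with the Jukes-Cantor correction and reads the ratio as in the bullets above. The function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct an observed proportion of differing sites p for multiple hits:
    d = -3/4 * ln(1 - 4p/3). Undefined once p reaches 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (KA, KS, KA/KS). Hypothetical helper; site counts are assumed given."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # non-synonymous substitutions per site
    ks = jukes_cantor(syn_diffs / syn_sites)        # synonymous substitutions per site
    if ks == 0:
        raise ZeroDivisionError("no synonymous substitutions: KA/KS is undefined")
    return ka, ks, ka / ks


def interpret(ratio):
    """Map a KA/KS ratio to the reading given on the slide."""
    if ratio > 1:
        return "accelerated amino-acid change (possible positive selection)"
    if ratio < 1:
        return "purifying selection"
    return "consistent with neutral drift"


# Hypothetical counts: 10 of 300 non-synonymous sites and 20 of 100 synonymous sites differ.
ka, ks, ratio = ka_ks(10, 300, 20, 100)
print(f"KA={ka:.4f} KS={ks:.4f} KA/KS={ratio:.3f} -> {interpret(ratio)}")
```

Published analyses usually rely on established tools (for example codeml in PAML) rather than hand counts; the sketch only shows the arithmetic behind the ratio.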

2
Brain Development in Primates
  • Genes of interest: MCPH1 (the gene that encodes
    microcephalin) and ASPM (abnormal spindle-like,
    microcephaly associated).
  • Both MCPH1 and ASPM are evolutionarily ancient,
    with orthologues that are likely to be present in
    all chordates.

3
MCPH
4
MCPH
5
(No Transcript)
6
(A) Schematic representation of the alignment.
Promoter regions, exons, and introns are marked
in gray, red, and blue, respectively. White
segments correspond to gaps. (B) Positions of
long (50 bp or longer) insertions/deletions. 'O'
denotes orangutan, 'M' macaque, 'OGCH' the
orangutan-gorilla-chimpanzee-human clade, and
'GCH' the gorilla-chimpanzee-human clade. (C)
Positions of polymorphic bases derived from the
GenBank single nucleotide polymorphism (SNP)
database. (D) Position of the CpG island. The
approximately 800-bp-long CpG island includes the
promoter, 5' UTR, first exon, and a small portion
of the first intron. (E) Location of an
approximately 3-kb-long segmental
duplication. (F) Positions of selected motifs
associated with genomic rearrangements in the
human sequence. Numbers in parentheses give the
number of allowed differences from the consensus
motif (zero for short or two ambiguous motifs,
two for longer sites). (G) Distribution of
repetitive elements. The individual ASPM genes
share the same repeats except for the indels marked
in (B). (H) DNA identity and GC content. Both plots
were made using a 1-kb-long sliding window with
100-bp overlaps. The GC profile corresponds to
the consensus sequence; the individual sequences
have nearly identical profiles.
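Panel (H) describes windowed identity and GC plots. A minimal Python sketch of that kind of profile follows, assuming "1-kb-long sliding window with 100-bp overlaps" means consecutive 1,000-bp windows share 100 bp (a 900-bp step), and that the two input sequences are already aligned to equal length with gap characters counted as mismatches. The function names are hypothetical and this is not the authors' pipeline.

```python
def windows(length, size=1000, overlap=100):
    """Yield (start, end) of windows of `size` bp; consecutive windows share `overlap` bp."""
    step = size - overlap
    start = 0
    while start + size <= length:
        yield start, start + size
        start += step


def gc_profile(seq, size=1000, overlap=100):
    """Fraction of G + C in each window, as (window start, fraction) pairs."""
    seq = seq.upper()
    profile = []
    for start, end in windows(len(seq), size, overlap):
        chunk = seq[start:end]
        profile.append((start, (chunk.count("G") + chunk.count("C")) / size))
    return profile


def identity_profile(aligned_a, aligned_b, size=1000, overlap=100):
    """Fraction of identical columns per window for two aligned, equal-length sequences.
    Columns containing a gap ('-') count as mismatches (an assumption)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    profile = []
    for start, end in windows(len(a), size, overlap):
        same = sum(1 for x, y in zip(a[start:end], b[start:end]) if x == y and x != "-")
        profile.append((start, same / size))
    return profile
```

Windows that would run past the end of the sequence are simply dropped here; a real plotting script might instead pad or shorten the last window.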
7
(No Transcript)
8
Linkage Studies
  • Monogenic and Complex Studies

9-33
(No Transcript)
34
Nail-Patella Syndrome
  • Nail-Patella Syndrome (also called Fong's
    Disease or Hereditary Onycho-Osteodysplasia,
    'HOOD') is characterized by several typical
    abnormalities of the arms and legs, as well as
    kidney disease and glaucoma.

35
Recombination Frequency
  • Goal: determine the linkage distance between the
    two genes (B/O and NP genes). The original mating
    in generation I and the first two matings in
    generation II are testcrosses. The third mating in
    generation II is not informative because it
    involves the A allele, which we are not following.
    We have a total of 16 offspring that are
    informative. Of these, three were recombinant. As
    with all testcrosses, this gives a genetic
    distance of 100 × (3/16) = 18.75 ≈ 18.8 cM.

http://www.ndsu.edu/instruct/mcclean/plsc431/linkage/linkage6.htm
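The testcross arithmetic on this slide (and the one-in-eight case on slide 38) reduces to counting recombinants among informative offspring. A minimal Python sketch, assuming each offspring has already been scored as parental ("P"), recombinant ("R"), or uninformative ("?"); the labels, the function name, and the four uninformative offspring in the example are hypothetical. The distance uses the slide's direct conversion of 1% recombination to 1 cM, with no mapping-function correction.

```python
def map_distance_cm(offspring):
    """Recombination frequency and map distance (cM) from scored offspring.

    offspring: iterable of "P" (parental), "R" (recombinant) or "?" (uninformative).
    Uninformative offspring are ignored, as on the slide.
    """
    informative = [o for o in offspring if o in ("P", "R")]
    if not informative:
        raise ValueError("no informative offspring")
    rf = informative.count("R") / len(informative)
    return rf, 100 * rf


# 3 recombinants among 16 informative offspring (plus 4 made-up uninformative ones),
# and 1 recombinant among 8.
print(map_distance_cm(["R"] * 3 + ["P"] * 13 + ["?"] * 4))
print(map_distance_cm(["R"] + ["P"] * 7))
```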
36
Lod Score Method of Estimating Linkage Distances
The following pedigree will be used to
demonstrate a method developed to determine the
distance between genes. This approach has been
widely adopted in various systems, and genetic
analysis programs have been developed based on this
technique.
37
Pedigree
  • Even though we are working with the same two
    genes, nail-patella and blood type, in this
    pedigree the dominant nail-patella allele appears
    to be coupled with the A blood type allele.
  • Remember that in the previous example, the dominant
    nail-patella allele was linked with the B allele.
    This is an important point in genetics: the
    particular alleles of two genes that are linked
    together are not necessarily the same throughout
    a species.
  • Why? Because at some point in the lineage of
    this family, the disease (nail-patella) allele
    recombined and became linked to a different blood
    type allele. In still other lineages, the
    nail-patella-causing allele is linked to the O
    blood type allele.

38
Recombination Frequency
  • Here we have one recombinant among the eight
    progeny. This gives us a recombination frequency
    of 1/8 = 0.125 and a distance of 12.5 cM.

39
LOD Score Method
  • Developed by Newton E. Morton, the LOD score
    method is an iterative approach in which LOD
    scores are calculated for a series of proposed
    linkage distances.

40
LOD Score Method
  • A linkage distance is estimated, and given that
    estimate, the probability of the observed birth
    sequence is calculated. That value is then
    divided by the probability of the same birth
    sequence assuming that the genes are unlinked.
    The base-10 logarithm of this ratio is the LOD
    score for this linkage distance estimate.
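The procedure on slides 39 and 40 can be sketched in Python for the simple phase-known case used in this example: each parental-type child has probability (1 - θ)/2, each recombinant-type child θ/2, and every child 0.25 when the genes are unlinked. Logarithms are summed instead of multiplying probabilities, to avoid underflow in larger pedigrees. The function names and the θ grid are illustrative assumptions, not Morton's original formulation or any particular linkage program.

```python
import math


def lod_score(theta, recombinants, parentals):
    """LOD score at recombination fraction theta for a phase-known pedigree.

    P(births | theta)    = ((1 - theta)/2)**parentals * (theta/2)**recombinants
    P(births | unlinked) = 0.25**(parentals + recombinants)
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    log_linked = parentals * math.log10((1 - theta) / 2) + recombinants * math.log10(theta / 2)
    log_unlinked = (parentals + recombinants) * math.log10(0.25)
    return log_linked - log_unlinked


def scan(recombinants, parentals, thetas):
    """Evaluate the LOD score over a series of proposed distances; return the best (theta, LOD)."""
    scores = [(t, lod_score(t, recombinants, parentals)) for t in thetas]
    return max(scores, key=lambda pair: pair[1])


# Slide example: 1 recombinant and 7 parental types.
grid = (0.05, 0.1, 0.125, 0.2, 0.3, 0.4, 0.5)
for theta in grid:
    print(f"theta={theta:<5} LOD={lod_score(theta, 1, 7):.3f}")
print(scan(1, 7, grid))
```

On this grid the maximum falls at θ = 0.125, the observed recombination frequency (1/8), which is the value the slides go on to evaluate.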

41
LOD Score Method
42
Example
  • In this first birth sequence, we have an
    individual with a parental genotype. The
    probability of this event is (1 - 0.125). Because
    there are two parental types, this value is
    divided by two to give a value of 0.4375. In this
    pedigree we have a total of seven parental types.
    We also have one recombinant type. The
    probability of this event is 0.125, which is
    divided by two because two recombinant types
    exist, giving 0.0625.

43
Example
  • What would the probability of this birth sequence
    be if these genes were unlinked?
  • When two genes are unlinked, the recombination
    frequency is 0.5. Therefore, the probability of
    any given genotype would be 0.5/2 = 0.25.

44
Linkage Probability
  • The probability of a given birth sequence is the
    product of the probabilities of the independent
    events. So the probability of the birth sequence
    based on our estimate of 0.125 as the
    recombination frequency would be
    $(0.4375)^7 (0.0625)^1 = 0.0001917$.

45
Non-linkage Probability
  • The probability of the birth sequence based on no
    linkage would be $(0.25)^8 = 0.0000153$.

46
Calculation of LOD score
  • Now divide the linkage probability by the
    non-linkage probability and you get a value of
    12.566. Next take the base-10 log of this value,
    and you obtain a value of 1.099. This value is
    the LOD score.
  • $\mathrm{LOD} = \log_{10}\left[\frac{(0.4375)^7 (0.0625)^1}{(0.25)^8}\right] = \log_{10}(12.566) = 1.099$

47
In practice, we would like to see a LOD score
greater than 3.0. This means that the odds in
favor of linkage at this distance are more than
1000 to 1 ($10^3$) compared with no linkage.
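As a rough illustration of this threshold: a LOD score converts back to odds as 10 raised to the LOD, and LOD scores from independent families can be summed. Assuming additional families each as informative as the pedigree above (a hypothetical scenario), the sketch below asks how many would be needed to pass 3.0. The function names are illustrative.

```python
import math


def odds_from_lod(lod):
    """Odds in favor of linkage implied by a LOD score (the base-10 log of the odds)."""
    return 10 ** lod


def families_needed(lod_per_family, threshold=3.0):
    """How many independent, equally informative families push the summed LOD past threshold."""
    n = math.ceil(threshold / lod_per_family)
    if n * lod_per_family <= threshold:  # require strictly greater than the threshold
        n += 1
    return n


print(odds_from_lod(3.0))     # odds at the conventional threshold
print(odds_from_lod(1.099))   # odds for the single pedigree above
print(families_needed(1.099))
```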
48-51
(No Transcript)
52
Case Control Studies
  • Modified from Iris A. Granek, M.D., M.S.

53
Case-control studies
  • Search for differences in allele frequency
    between people with the disease (cases) and
    people without it (controls), on the assumption
    that differences in frequency are associated with
    the disease outcome.
  • Can also be applied to exposure to a chemical or
    a carcinogen instead of alleles (genotypes).

54
Case Selection
  • Define the source population
      • residents of a geographic region
      • hospital inpatients or clinic patients
  • Strict case definition
      • inclusion criteria

55
Control Selection
  • Same source population as the cases
  • Choose the controls at random from the source
    population
  • spouses
  • associates
  • patients within the same facility
  • matched for certain criteria

56
Hospital Controls
  • Without regard to diagnosis
  • Excluding certain diseases
  • Including only diseases believed to be unrelated
    to the exposures (or alleles) being studied
  • Clinic patients from same hospital

57
Case Control Study Design
  • Compares the distribution of exposure in cases
    (with disease) vs. controls (without disease)

58
Exposure History in Cases and Controls
                        CASES        CONTROLS
  Exposed               a            b
  Not Exposed           c            d
  Totals                a + c        b + d
  Proportion exposed    a/(a + c)    b/(b + d)

59
Distribution of past benzene exposure among
leukemia cases vs. controls
  • 20 leukemia cases found among a large group of
    chemical workers
  • 16 cases had past benzene exposure
  • Proportion of cases exposed to benzene:
    16/20 = 80%
  • 100 healthy controls randomly selected from the
    same group of chemical workers
  • 12 controls had past benzene exposure
  • Proportion of controls exposed to benzene:
    12/100 = 12%

60
Odds Ratio: Unmatched Analysis
                   CASES   CONTROLS
  EXPOSED          a       b
  NOT EXPOSED      c       d
  • Odds of exposure in cases: a/c
  • Odds of exposure in controls: b/d
  • Odds ratio: $OR = \dfrac{a/c}{b/d} = \dfrac{ad}{bc}$

61
Odds Ratio: Unmatched Analysis
                   LEUKEMIA   CONTROLS
  BENZENE          16         12
  NO BENZENE       4          88
  • Odds of exposure in cases: 16/4
  • Odds of exposure in controls: 12/88
  • Odds ratio: $OR = \dfrac{16 \times 88}{4 \times 12} = 29.3$
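The 2 × 2 arithmetic on slides 58 to 61 can be checked with a few lines of Python, using the slides' cell labels (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls). The function names are illustrative.

```python
def proportions_exposed(a, b, c, d):
    """Proportion exposed among cases, a/(a + c), and among controls, b/(b + d)."""
    return a / (a + c), b / (b + d)


def odds_ratio(a, b, c, d):
    """Unmatched odds ratio: (a/c) / (b/d) = ad / bc."""
    if b == 0 or c == 0:
        raise ZeroDivisionError("a zero in cell b or c makes ad/bc undefined")
    return (a * d) / (b * c)


# Benzene example: 16 of 20 cases and 12 of 100 controls exposed.
print(proportions_exposed(16, 12, 4, 88))
print(round(odds_ratio(16, 12, 4, 88), 1))
```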

62
Odds Ratios
  • OR > 1 indicates a positive association between
    the factor and the disease
  • Here, the odds of past benzene exposure among the
    leukemia cases were about 29 times the odds among
    the controls
  • OR < 1 indicates the factor is protective
  • OR = 1 indicates no association

63
95% Confidence Limits
  • If the study were repeated many times, about 95%
    of the intervals calculated this way would contain
    the true value; the ends of the interval are the
    confidence limits
  • An odds ratio is statistically significant (at
    the 5% level) if its 95% confidence interval does
    not include 1
  • OR = 7 (95% CI 0.5-15.0): not statistically significant
  • OR = 7 (95% CI 3.0-12.0): statistically significant
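The slides do not say how the confidence limits are computed. One common approach for an unmatched odds ratio is Woolf's (logit) method, which treats ln(OR) as approximately normal with standard error sqrt(1/a + 1/b + 1/c + 1/d). A minimal sketch under that assumption, applied to the benzene table and to the slide's "excludes 1" rule; the function names are illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI by Woolf's logit method (all cells must be > 0)."""
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf's method needs every cell count to be positive")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


def is_significant(lower, upper, null=1.0):
    """Significant at the interval's level if the CI excludes the null value 1."""
    return not (lower <= null <= upper)


or_, lo, hi = odds_ratio_ci(16, 12, 4, 88)
print(f"OR={or_:.1f}, 95% CI {lo:.1f}-{hi:.1f}, significant={is_significant(lo, hi)}")
print(is_significant(0.5, 15.0), is_significant(3.0, 12.0))  # the slide's two examples
```

The very wide interval for the benzene data reflects the small cell count of 4 unexposed cases.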

64
Advantages of Case Control
  • Quick and Inexpensive
  • Optimal for rare diseases
  • Useful for diseases of long latency from exposure
    to disease development
  • Can evaluate multiple risk factors

65
Bias in Case Control Studies
  • Bias is a systematic error in the study that
    distorts the results and limits the validity of
    the conclusions.
  • Selection Bias
  • Confounding
  • Observation Bias (recall bias, interviewer bias,
    misclassification)

66
Selection Bias
  • Systematic errors arising from the way the
    subjects are selected
  • Study subjects are selected in a way that can
    misleadingly increase or decrease the magnitude
    of an association
  • The exposure of the selected cases differs from
    that of all cases in the source population, or the
    exposure of the selected controls differs from
    that of non-diseased people in the source population

67
Selection Bias
                   Source Population            Study Sample
  With disease     E E E E X X X X X X X X      E E E E X X    (Cases)
  Without disease  E E E E E E E E X X X X      E E X X X X    (Controls)
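The diagram has no legend in this transcript. Assuming E marks an exposed person and X an unexposed one, counting the symbols shows how the way subjects are selected can distort the association. A minimal Python sketch (function names illustrative):

```python
def count_exposure(row):
    """Count exposed (E) and unexposed (X) symbols in one row of the diagram."""
    symbols = row.split()
    return symbols.count("E"), symbols.count("X")


def odds_ratio(diseased_row, healthy_row):
    """Odds ratio of exposure: (E/X among diseased) / (E/X among non-diseased)."""
    e_d, x_d = count_exposure(diseased_row)
    e_h, x_h = count_exposure(healthy_row)
    return (e_d * x_h) / (x_d * e_h)


source = odds_ratio("E E E E X X X X X X X X", "E E E E E E E E X X X X")
sample = odds_ratio("E E E E X X", "E E X X X X")
print(source, sample)
```

Under that reading, exposure looks protective in the source population (OR = 0.25) but harmful in the study sample (OR = 4.0), purely because of how cases and controls were selected.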
68
Confounding
  • Distortion of the true relationship between the
    exposure and outcome due to a mutual relationship
    with another factor
  • Can create an apparent association, and can also
    cause a true association to go unobserved
  • A confounder must be associated with both the
    exposure and the outcome

69
Confounding Factors
[Diagram: Benzene Exposure, Lung Cancer, and Cigarette Smoking (Confounder)]
70
Controlling for Confounding
  • The effect of confounding variables can be
    controlled:
      • during the data analysis, by various methods
          • stratification
          • multivariate analysis
      • during the study design, by matching controls
        and cases for the factor
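Stratification is often summarized with the Mantel-Haenszel pooled odds ratio, OR_MH = Σ(a_i d_i / n_i) / Σ(b_i c_i / n_i) over strata i. The slides do not name a specific method, so this is one common choice rather than the presentation's. The counts below are invented for illustration: within each stratum of the confounder, exposure has no effect (OR = 1), yet collapsing the strata into one table suggests an association.

```python
def crude_or(strata):
    """Odds ratio after collapsing all strata into one 2x2 table (a, b, c, d)."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio across strata; each stratum is (a, b, c, d) with
    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical strata of the confounder (e.g. smokers, then non-smokers).
strata = [(40, 20, 10, 5), (5, 10, 20, 40)]
print(crude_or(strata), mantel_haenszel_or(strata))
```

Here the crude odds ratio is 2.25 while the stratified estimate is 1.0: the apparent association comes entirely from the confounder.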

71
Matched Case Control Design
  • Controls are selected to match cases on factors
    associated with the disease
      • age, sex, race, socioeconomic status
  • Makes the two groups similar on factors other
    than the exposure of interest
  • The groups cannot be compared on the matched factors
  • A matched analysis must be used
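For 1:1 matched pairs, the standard matched-pair odds ratio uses only the discordant pairs: the number of pairs in which the case was exposed and the control was not, divided by the number in which the control was exposed and the case was not. A minimal sketch with invented pairs; the function name is illustrative.

```python
def matched_pair_or(pairs):
    """Matched-pair odds ratio for 1:1 matching.

    pairs: iterable of (case_exposed, control_exposed) booleans, one per matched pair.
    Concordant pairs (both or neither exposed) do not contribute.
    """
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        raise ZeroDivisionError("no pairs with only the control exposed")
    return case_only / control_only


# Hypothetical data: 6 pairs with only the case exposed, 2 with only the control
# exposed, and 12 concordant pairs.
pairs = [(True, False)] * 6 + [(False, True)] * 2 + [(True, True)] * 5 + [(False, False)] * 7
print(matched_pair_or(pairs))
```

McNemar's test, which is also based only on the discordant pairs, is the usual significance test for this design.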

72
Observation Bias
  • Interviewer (data collection) bias
      • keep data collection the same for cases and controls
  • Misclassification bias
      • incorrect characterization of exposure
  • Recall bias
      • recall of exposures may be influenced by current
        disease status

73
Calculate the Odds Ratio
74
Esophageal cancer and alcohol
75
Fisher's Exact Test
http://www.matforsk.no/ola/fisher.htm
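The linked calculator is not reproduced here, but a two-sided Fisher's exact test for a 2 × 2 table can be sketched with exact integer arithmetic. With the margins fixed, the count in the top-left cell follows a hypergeometric distribution, and the p-value sums the probabilities of all tables that are no more probable than the one observed. This is one common two-sided convention (others exist), and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]
    (rows = exposed / not exposed, columns = cases / controls)."""
    row1, row2 = a + b, c + d   # exposed and not-exposed totals
    col1 = a + c                # cases total
    total = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalized hypergeometric probability of the table whose top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer weights make the "no more probable than observed" comparison exact.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / total


print(fisher_exact_two_sided(16, 12, 4, 88))  # benzene table from the earlier slides
```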
76-88
(No Transcript)