Figures of Merits (1) Assessing the quality of a solution - PowerPoint PPT Presentation

Provided by: ronsh
Transcript and Presenter's Notes

Title: Figures of Merits (1) Assessing the quality of a solution


1
TREES
2
Trees
3
Same thing

4
Evaluation of the tree topology
The maximum parsimony principle
5
Genes: 0 = absent, 1 = present
species g1 g2 g3 g4 g5 g6
s1 1 0 0 1 1 0
s2 0 0 1 0 0 0
s3 1 1 0 0 0 0
s4 1 1 0 1 1 1
s5 0 0 1 1 1 0
6
Evaluate this tree
[Figure: tree with leaves s2, s1, s4, s3, s5]
7
Gene number 1
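[Figure: the tree with the g1 states at its leaves: s1 = 1, s4 = 1, s3 = 1, s2 = 0, s5 = 0]
8
Gene number 1, Option number 1.
[Figure: the same tree and leaf states, with two internal nodes labeled 1]
9
Gene number 1, Option number 2.
[Figure: tree with leaves s1, s4, s3, s2, s5]
Number of changes for g1 = 1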
10
Gene number 2, Option number 1.
[Figure: tree with leaves s2, s1, s4, s3, s5]
11
Gene number 2, Option number 2.
[Figure: tree with leaves s2, s1, s4, s3, s5]
12
Gene number 2, Option number 3.
[Figure: tree with leaves s2, s1, s4, s3, s5]
Number of changes for g2 = 2
13
Gene number 3, Option number 1.
[Figure: tree with leaves s2, s1, s4, s3, s5]
14
Gene number 3, Option number 2.
[Figure: tree with leaves s2, s1, s4, s3, s5]
Number of changes for g3 = 1
15
Gene number 4, Option number 1.
[Figure: tree with leaves s2, s1, s4, s3, s5]
16
Gene number 4, Option number 2.
[Figure: tree with leaves s2, s1, s4, s3, s5]
Number of changes for g4 = 2
17
Gene number 5 is the same as Gene number 4
Number of changes for g5 = 2
18
Gene number 6, 1 option only
[Figure: tree with leaves s2, s1, s4, s3, s5]
Number of changes for g6 = 1
19
Sum of changes
Number of changes for g1 = 1
Number of changes for g2 = 2
Number of changes for g3 = 1
Number of changes for g4 = 2
Number of changes for g5 = 2
Number of changes for g6 = 1
Sum of changes for this tree topology = 9
Can we do better?
20
The MP (most parsimonious) tree
[Figure: the most parsimonious tree, with leaves s2, s1, s4, s3, s5]
Sum of changes for this tree topology = 8
21
TR = TREE ROOTED
How many rooted trees?
N = 2, TR(2) = 1
N = 3, TR(3) = 3
N = 4, TR(4) = 15
22
How many rooted trees
2 sequences: 1 tree
3 sequences: 3 trees
4 sequences: 3 × 5 = 15 trees
5 sequences: 3 × 5 × 7 = 105 trees
TR(n) = 1 × 3 × 5 × 7 × ... × (2n-3)
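The product formula can be checked with a few lines of Python. This is a minimal sketch; the function name `rooted_tree_count` is made up for illustration and does not come from the slides.

```python
def rooted_tree_count(n):
    """Number of rooted binary trees on n labeled sequences: 1*3*5*...*(2n-3)."""
    if n < 2:
        raise ValueError("need at least two sequences")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-3 (empty product for n = 2).
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


for n in range(2, 7):
    print(n, "sequences:", rooted_tree_count(n), "rooted trees")
```

The count grows so quickly that scoring every topology stops being feasible after a fairly small number of sequences, which is why heuristic tree searches are used in practice.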
24
Rooting...
25
Rooting the tree
26
Rooted vs. unrooted trees
[Figure: a rooted tree and an unrooted tree, each on leaves 1, 2 and 3]
27
Rooted vs. Unrooted
The position of the root does not affect the MP
score.
28
Intuition: why rooting doesn't change the score
Gene number 1, Option number 1.
[Figure: the tree with the g1 states at its leaves (s1 = 1, s4 = 1, s3 = 1, s2 = 0, s5 = 0) and two internal nodes labeled 1]
The change will always be on the same branch, no
matter where the root is positioned
29
How can we root the tree? We want rooted trees!
32
Gorilla gorilla (Gorilla)
Pan troglodytes (Chimpanzee)
Homo sapiens (human)
Gallus gallus (chicken)
33
Evaluate all 3 possible UNROOTED trees
MP tree
34
Rooting based on a priori knowledge
[Figure: two trees on Human, Chimp, Gorilla and Chicken]
35
Ingroup / Outgroup
[Figure: tree with Chicken as the OUTGROUP and Human, Chimp and Gorilla as the INGROUP]
36
Monophyletic groups
[Figure: tree on Chicken, Human, Chimp and Gorilla]
Gorilla, Human and Chimp form a monophyletic group
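A group is monophyletic when some node of the rooted tree has exactly that group as its set of descendant leaves. The sketch below checks this on a nested-tuple tree. The topology `TREE` is an assumption (chicken as outgroup, human and chimp as sisters), since the slide figure did not survive, and the helper names are hypothetical.

```python
def leaves(tree):
    """Leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set below every node of the rooted tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some node has exactly `group` as its descendant leaves."""
    return frozenset(group) in set(clades(tree))


# Assumed rooted topology, rooted with chicken as the outgroup.
TREE = ("Chicken", ("Gorilla", ("Human", "Chimp")))

print(is_monophyletic(TREE, {"Gorilla", "Human", "Chimp"}))
print(is_monophyletic(TREE, {"Chicken", "Gorilla"}))
```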
37
How to efficiently compute the MP score of a tree
38
The Fitch algorithm (1971)
Post-order tree scan. At each internal node, take
the state sets of the two child nodes: if their
intersection is empty, assign their union to the
node; otherwise, assign their intersection.
39
Number of changes
Total number of changes = number of union
operations.
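The post-order scan can be sketched in Python on the gene presence/absence table from the earlier slides. The tree figures were lost in the transcript, so the three topologies below are assumptions: `TREE_A` and `TREE_B` were chosen because they reproduce the totals quoted on the slides (9 and 8), and `TREE_A_REROOTED` is the same unrooted tree as `TREE_A` with the root moved. All names are hypothetical.

```python
# Presence/absence table from the slides (columns g1..g6).
GENES = {
    "s1": [1, 0, 0, 1, 1, 0],
    "s2": [0, 0, 1, 0, 0, 0],
    "s3": [1, 1, 0, 0, 0, 0],
    "s4": [1, 1, 0, 1, 1, 1],
    "s5": [0, 0, 1, 1, 1, 0],
}


def fitch(tree, site):
    """Return (state set, number of changes) for one site of a binary tree."""
    if isinstance(tree, str):                  # leaf: its observed state
        return {GENES[tree][site]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, site)
    right_set, right_changes = fitch(right, site)
    common = left_set & right_set
    if common:                                 # intersection: no extra change
        return common, left_changes + right_changes
    # Empty intersection: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree):
    """Per-gene change counts and their sum."""
    per_gene = [fitch(tree, site)[1] for site in range(6)]
    return per_gene, sum(per_gene)


# Assumed topologies (not recoverable from the transcript).
TREE_A = ((("s1", "s4"), "s3"), ("s2", "s5"))
TREE_B = ((("s3", "s4"), "s1"), ("s2", "s5"))
TREE_A_REROOTED = ("s3", (("s1", "s4"), ("s2", "s5")))

for name, tree in [("A", TREE_A), ("A rerooted", TREE_A_REROOTED), ("B", TREE_B)]:
    print(name, parsimony_score(tree))
```

Under these assumed topologies, tree A gives the per-gene counts 1, 2, 1, 2, 2, 1 listed on the slides, moving the root leaves its total unchanged, and tree B needs one change fewer.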
40
Likelihood
41
  • Parsimony has many shortcomings. To name a few:
  • All changes are counted the same, which is not
    true for biological systems (Leu -> Ile is much
    more likely than Leu -> His).
  • It cannot take biological context into account
    (secondary structures, dependencies among sites,
    evolutionary distances between the analyzed
    organisms, etc).
  • Its statistical basis is questionable.

42
Alternative: MAXIMUM-LIKELIHOOD METHOD
43
Maximum likelihood uses a probabilistic model of
evolution. Each amino acid has a certain
probability to change and this probability
depends on the evolutionary distance.
Evolutionary distances are inferred from the
entire set of sequences.
44
Evolutionary distances
Positions in an alignment can be conserved for
two reasons: either because of functional
constraints, or because only a short evolutionary
time has elapsed since the divergence of the
organisms. 5 replacements in 10 positions between
2 chimps is considered very variable. 5
replacements between human and cucumber is not
considered very variable. Maximum likelihood
takes this information into account.
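To make the chimp versus cucumber intuition concrete, here is a minimal sketch under an assumed model that the slides do not name: a symmetric 20-state model in which every amino acid is equally likely to be replaced by any other, with distance d measured in expected replacements per site. Under that assumption the probability that a site differs after distance d is (19/20)(1 - exp(-20d/19)). The distances 0.01 and 1.0 are made-up stand-ins for "two chimps" and "human and cucumber".

```python
import math


def p_different(d, states=20):
    """Probability that a site differs after distance d (symmetric model)."""
    return (states - 1) / states * (1.0 - math.exp(-states * d / (states - 1)))


def likelihood(k, n, d):
    """Probability of seeing exactly k differing sites out of n at distance d."""
    p = p_different(d)
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


# Hypothetical distances: very close relatives vs. very distant ones.
for label, d in [("close (d = 0.01)", 0.01), ("distant (d = 1.0)", 1.0)]:
    print(label, likelihood(5, 10, d))
```

The same observation (5 replacements in 10 positions) is extremely improbable at the short distance and unremarkable at the long one, which is the information parsimony ignores.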
45
Maximum Parsimony | Maximum Likelihood
All changes are considered the same | Different probabilities for different types of substitutions
Statistically questionable | Statistically robust
Ignores biological context | Accounts for biological context
46
The likelihood computations
With likelihood models we can:
1. Infer the most likely phylogenetic tree
2. Compute conservation for each site
47
Maximum likelihood tree reconstruction
This is incredibly difficult (and challenging)
from the computational point of view, but
efficient algorithms to find approximate
solutions were developed.
48
Tree reconstruction using distance based methods
  • Two steps
  • Compute a distance D(i,j) between any two
    sequences i and j.
  • Find the tree that agrees most with the distance
    table.
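The first step can be sketched as follows. The slides do not say which distance measure is meant, so this example assumes the simplest one, the fraction of differing positions between two aligned sequences (the p-distance). The toy sequences and function names are made up for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in compared) / len(compared)


def distance_table(seqs):
    """D(i, j) for every pair of sequence names."""
    return {
        (i, j): p_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


# Toy alignment (hypothetical data).
TOY = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACGA"}

for pair, dist in distance_table(TOY).items():
    print(pair, dist)
```

The second step, finding the tree that best fits this table, is what methods such as neighbor-joining address.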

49
Neighbor-joining is based on star decomposition
[Figure: a star tree on A, B, C, D, E; C and B are joined into (C,B), then E is added to give ((C,B),E). Red = best pair to group together.]
In each step we cluster a pair so that the sum of
branches is minimal
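One way to pick the pair to cluster is the standard neighbor-joining criterion Q(i,j) = (n-2)·d(i,j) - r(i) - r(j), where r(i) is the sum of distances from i to all other taxa; the pair with the smallest Q is joined. The slides do not spell out this formula, so treating it as the rule behind "sum of branches is minimal" is an assumption. The distance values below are made up and are unrelated to the taxa A to E in the figure.

```python
from itertools import combinations


def nj_best_pair(names, dist):
    """Return the pair minimizing the neighbor-joining Q criterion, and its Q."""
    n = len(names)

    def d(i, j):
        return dist[frozenset((i, j))]

    # r(i): total distance from taxon i to every other taxon.
    totals = {i: sum(d(i, j) for j in names if j != i) for i in names}
    q = {
        (i, j): (n - 2) * d(i, j) - totals[i] - totals[j]
        for i, j in combinations(names, 2)
    }
    best = min(q, key=q.get)
    return best, q[best]


# Hypothetical distance table for five taxa.
NAMES = ["a", "b", "c", "d", "e"]
DIST = {
    frozenset(("a", "b")): 5, frozenset(("a", "c")): 9,
    frozenset(("a", "d")): 9, frozenset(("a", "e")): 8,
    frozenset(("b", "c")): 10, frozenset(("b", "d")): 10,
    frozenset(("b", "e")): 9, frozenset(("c", "d")): 8,
    frozenset(("c", "e")): 7, frozenset(("d", "e")): 3,
}

print(nj_best_pair(NAMES, DIST))
```

In this toy table the closest pair by raw distance is (d, e), yet the criterion selects (a, b): neighbor-joining corrects each distance for how far both taxa are from everything else.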
51
A few words on Human Immunodeficiency Virus (HIV)
The virus: HIV
The disease/syndrome: Acquired Immunodeficiency
Syndrome (AIDS)
First recognized clinically in 1981. By 1992, it
had become the major cause of death in
individuals of 25-44 years of age in the U.S.
52
HIV
By December 2002, 20 million people had died of
AIDS. Infected in 2002: 5 million. Number
currently infected: 42 million,
1 out of every 100 adults aged 15-49 in the
world population.
53
HIV
HIV is the leading cause of death in
sub-Saharan Africa. In some parts of this region
25-30% of the population is infected.
1 out of 3 children in these areas has lost at
least one parent.
54
Sub-Saharan Africa refers to the territories
south of the Sahara. In the past the term "Black
Africa" was also used for the same region, but
today it is obsolete because it is considered
politically incorrect. "Tropical Africa" might
be taken as an alternative label for the same
region; however, it excludes South Africa,
which lies outside the tropics.
55
HIV is a lentivirus
Species: HIV
Genus: Lentivirus
Family: Retroviridae
Lentiviruses have a long incubation time, and are
thus called slow viruses.
56
HIV-1 and HIV-2
In 1986, a distinct type of HIV prevalent in
certain regions of West Africa was discovered
and was termed HIV type 2. Individuals infected
with type 2 also had AIDS, but had a longer
incubation time and lower morbidity.
57
Morbidity vs. Mortality
  • Morbidity: the prevalence of a disease

The probability that a randomly selected person
out of the entire population is ill at time t.
58
Morbidity vs. Mortality
Mortality: deaths from a disease or in general
  • Mortality rate = death rate

59
Origin of HIV-1 in the chimpanzee Pan troglodytes
troglodytes
Nature Vol. 397. Pages 436-441. 1999.
60
Five lines of evidence have been used to
substantiate zoonotic transmission of primate
lentiviruses:
1. Similarities in viral genome organization
2. Phylogenetic relatedness
3. Geographic coincidence
4. Plausible routes of transmission
5. Prevalence in the natural host
61
For HIV-2, a virus (SIVsm) that is genomically
indistinguishable and phylogenetically closely
related was found in substantial numbers of
wild-living sooty mangabeys whose natural habitat
coincides with the epicenter of the HIV-2 epidemic
63
Close contact between sooty mangabeys and humans
is common because these monkeys are hunted for
food and kept as pets. No fewer than six
independent transmissions of SIVsm to humans have
been proposed.
The origin of HIV-1 is much less certain.
64
HIV-1 is most similar in sequence and genomic
organization to viruses found in chimpanzees
(SIVcpz).
65
  • BUT several observations cast doubt on the
    theory that chimpanzees are the natural host
    and reservoir for HIV-1:
  • There is a wide spectrum of diversity between
    HIV-1 and SIVcpz.
  • There is an apparently low prevalence of SIVcpz
    infection in wild-living animals.
  • Chimpanzees are present in geographic
    regions of Africa where AIDS was not initially
    recognized.

66
Rather, it has been suggested that another, yet
unidentified, primate species could be the
natural host for SIVcpz and HIV-1.
67
Marilyn
We recently identified a fourth chimpanzee with
natural SIVcpz infection. This animal
(Marilyn) was wild-caught in Africa (country of
origin unknown), exported to the United States as
an infant, and used as a breeding female in a
primate facility until her death at age 26.
68
How was the SIV found?
During a serosurvey in 1985, Marilyn was the only
chimpanzee of 98 tested who had antibodies
strongly reactive against HIV-1 by enzyme-linked
immunosorbent assay (ELISA) and western
immunoblot.
69
Maybe Marilyn was infected with HIV during her
stay in the U.S.?
She had never been used in AIDS research and had
not received human blood products after 1969. She
died in 1985 after giving birth to still-born
twins.
70
To show that she did not have AIDS
An autopsy revealed endometritis, retained
placental elements and sepsis as the final cause
of death. Depletion of lymphoid tissues was not
noted.
71
PCR was used to amplify HIV- or SIV-related DNA
sequences directly from uncultured (frozen)
spleen and lymph-node tissue obtained at the
autopsy in order to characterize the infection
responsible for Marilyn's HIV-1 seropositivity.
72
Amplification and sequence analysis of subgenomic
gag (508 base pairs (bp)) and pol (766 bp)
fragments revealed the presence of a virus
related to, but distinct from, known SIVcpz and
HIV-1 strains.
73
PCR was used to amplify and sequence four
overlapping subgenomic fragments that together
comprised a complete proviral genome. The genome
was termed SIVcpzUS.
74
Provirus: The "provirus" is the form of the virus
that is capable of being integrated into the
host genome. In the case of HIV it means the
DNA "copy" of the HIV genome (HIV normally
carries its genes around in RNA form).
75
Provirus: As far as the host cell's
machinery is concerned, this extra DNA is no
different from the cell's own DNA.
76
Only three other SIVcpz strains have been
reported: two from animals wild-caught in Gabon
(SIVcpzGAB1 and SIVcpzGAB2) and one from a
chimpanzee exported to Belgium from Zaire
(SIVcpzANT).
77
SIVcpzGAB1 and SIVcpzANT have been sequenced
completely, but only 280 bp of the pol sequence
are available for SIVcpzGAB2.
78
  • To determine the evolutionary relationships of
    SIVcpzUS to these and other HIV and SIV
    sequences:
  • Sequences were downloaded from the HIV
    sequence database
    (http://hiv-web.lanl.gov/HTML/compendium.html).
  • Neighbour-joining was used to construct the tree,
    based on the full-length Pol sequences.
  • Maximum likelihood was also used and yielded
    very similar topologies.

79
The neighbour-joining method was applied to
protein-sequence distances calculated by the
method of Kimura. Clade support values were
computed with 1,000 bootstrap replicates. NJ
computations were performed using the CLUSTAL_X
program.
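Two ingredients of this procedure can be sketched in Python. The distance function uses the correction commonly attributed to Kimura for protein sequences, d = -ln(1 - p - 0.2p²), where p is the observed fraction of differing sites; whether the analysis used exactly this variant is an assumption. The bootstrap function resamples alignment columns with replacement, which is the usual way a replicate is built. The toy alignment and all names are hypothetical, and this is not CLUSTAL_X code.

```python
import math
import random


def kimura_protein_distance(p):
    """Corrected distance from the observed fraction p of differing sites."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


# Toy protein alignment (hypothetical data).
TOY = {"x": "MKVLAT", "y": "MKILAS", "z": "MRVLGT"}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicate = bootstrap_alignment(TOY, rng)
print(replicate)
print(kimura_protein_distance(0.5))
```

Clade support is then the fraction of replicates (1,000 in the analysis described above) whose reconstructed tree contains the clade.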
80
These analyses identified SIVcpzUS unambiguously
as a new member of the HIV-1/SIVcpz group of
viruses.