ALIGNMENT OF NUCLEOTIDE - PowerPoint PPT Presentation

Provided by: marti285
1
ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES
2
(No Transcript)
3
Homology: The term was coined by Richard Owen in
1843. Definition: Similarity resulting from
common ancestry.
4
Homology: A qualitative statement
  • Homology designates a relationship of common
    descent between entities.
  • Two genes are either homologs or not.
  • It doesn't make sense to say two genes are 43%
    homologous.
  • It doesn't make sense to say Linda is 24%
    pregnant.

5
Homology
By comparing homologous characters, we can
reconstruct the evolutionary events that have led
to the formation of the extant sequences from the
common ancestor.
6
Homology
When dealing with sequences, we are interested in
POSITIONAL HOMOLOGY. We identify positional
homology by ALIGNMENT.
7
Ancestral sequence: ACTGGGCCCAAATC
Descendant 1 (1 insertion, 1 substitution): AACAGGGCCCAAATC
Descendant 2 (1 deletion, 1 substitution): CTGGGCCCAGATC
Correct alignment (2 mismatches):
--CTGGGCCCAGATC
AACAGGGCCCAAATC
Incorrect alignment (10 mismatches):
CTGGGCCCAGATC--
AACAGGGCCCAAATC
8
Ancestral sequence: unknown!
Processes along each lineage: unknown
AACAGGGCCCAAATC
CTGGGCCCAGATC
Correct alignment? Incorrect alignment?
CTGGGCCCAGATC--
AACAGGGCCCAAATC
(10 mismatches)
--CTGGGCCCAGATC
AACAGGGCCCAAATC
(2 mismatches)
9
ACCTGAATTTGCCC
T9 G5T ACA12
-A6 -A7 T8A G2
ACCTTAATTGCACACC
AGCCTGATTGCCC
ACCTTAATTGCACACC
AGCCTGATTGCCC---
C2G, T4C, A6G, A12C, -ACC14
10
Positional homology: A pair of nucleotides from
two aligned sequences that have descended from
one nucleotide in the ancestor of the two
sequences.
Alignment: A hypothesis concerning positional
homology among residues in a sequence.
11
An alignment consists of a series of paired
bases, one base from each sequence. There are
three types of pairs:
(1) matches: the same nucleotide appears in both
sequences;
(2) mismatches: different nucleotides are found in
the two sequences;
(3) gaps: a base in one sequence and a null base
in the other.
GCGGCCCATCAGGTAGTTGGTG-G
GCGTTCCATC--CTGGTTGGTGTG
12
Sequence alignment = The identification of the
locations of deletions or insertions that might
have occurred in either of the two lineages since
their divergence from a common ancestor.
Insertion or deletion = indel, or gap.
13
Sequence alignment: 1. Pairwise alignment 2.
Multiple alignment
14
  • Two DNA sequences, A and B.
  • Their lengths are m and n, respectively.
  • The number of matched pairs is x.
  • The number of mismatched pairs is y.
  • The total number of bases in gaps is z.
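The three pair types and the counts x, y, and z can be tallied directly from an alignment. The sketch below is illustrative only: the function name `count_pairs` and the use of "-" for the null base are assumptions, and the identity m + n = 2(x + y) + z is not stated on the slides but follows from the definitions (each match or mismatch uses one base from each sequence, each gap position uses one base).

```python
def count_pairs(row_a: str, row_b: str, gap: str = "-"):
    """Return (x, y, z): matches, mismatches, and bases paired with a null base."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    x = y = z = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            raise ValueError("a column cannot pair two null bases")
        if a == gap or b == gap:
            z += 1          # gap: a base opposite a null base
        elif a == b:
            x += 1          # match
        else:
            y += 1          # mismatch
    return x, y, z


# The alignment shown on the pair-types slide.
row_a = "GCGGCCCATCAGGTAGTTGGTG-G"
row_b = "GCGTTCCATC--CTGGTTGGTGTG"
x, y, z = count_pairs(row_a, row_b)
m = len(row_a.replace("-", ""))  # length of sequence A without gaps
n = len(row_b.replace("-", ""))  # length of sequence B without gaps
print(f"x={x} y={y} z={z} m={m} n={n}")
print("m + n == 2(x + y) + z:", m + n == 2 * (x + y) + z)
```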
15
There are terminal and internal gaps.
GCGG-CCATCAGGTAGTTGGTG--
GCGTTCCATC--CTGGTTGGTGTG
16
A terminal gap may indicate missing data.
GCGG-CCATCAGGTAGTTGGTG--
GCGTTCCATC--CTGGTTGGTGTG
17
An internal gap indicates that a deletion or an
insertion has occurred in one of the two
lineages.
GCGG-CCATCAGGTAGTTGGTG--
GCGTTCCATC--CTGGTTGGTGTG
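The distinction between terminal and internal gaps can be made mechanically. This is a minimal sketch, assuming that a gap run is "terminal" when it touches either end of an aligned row and "internal" otherwise; the function name `classify_gaps` is hypothetical.

```python
import re


def classify_gaps(row: str, gap: str = "-"):
    """Return (start, length, kind) for every run of gap characters in one aligned row."""
    runs = []
    for run in re.finditer(re.escape(gap) + "+", row):
        touches_end = run.start() == 0 or run.end() == len(row)
        kind = "terminal" if touches_end else "internal"
        runs.append((run.start(), run.end() - run.start(), kind))
    return runs


top = "GCGG-CCATCAGGTAGTTGGTG--"
bottom = "GCGTTCCATC--CTGGTTGGTGTG"
print(classify_gaps(top))
print(classify_gaps(bottom))
```

For the example rows, the top sequence has one internal gap of length 1 and one terminal gap of length 2, and the bottom sequence has one internal gap of length 2.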
18
The alignment is the first step in many
evolutionary and functional studies. Errors in
alignment tend to amplify in later computational
stages.
19
Methods of alignment: 1. Manual 2. Dot
matrix 3. Distance matrix 4. Combined (Distance +
Manual)
20
  • Manual alignment. When there are few gaps and the
    two sequences are not too different from each
    other, a reasonable alignment can be obtained by
    visual inspection.

GCG-TCCATCAGGTAGTTGGTGTG
GCGTTCCATCAGGTGGTTGGTGTG
21
Advantages of manual alignment: (1) use of a
powerful and trainable tool (the brain, well,
some brains). (2) ability to integrate
additional data, e.g., domain structure,
biological function.
22
(No Transcript)
23
(No Transcript)
24
Protein Alignment may be guided by Tertiary
Structures
Escherichia coli DjlA protein
Homo sapiens DjlA protein
25
Disadvantages of manual alignment: (1) The
method is subjective and unscalable.
26
The dot-matrix method: The two sequences are
written out as column and row headings of a
two-dimensional matrix. A dot is put in the
dot-matrix plot at a position where the
nucleotides in the two sequences are identical.
27
The alignment is defined by a path from the
upper-left element to the lower-right element.
28
There are 4 possible steps in the path:
  • (1) a diagonal step through a dot = match.
  • (2) a diagonal step through an empty element of
    the matrix = mismatch.
  • (3) a horizontal step = a gap in the sequence on
    the top of the matrix.
  • (4) a vertical step = a gap in the sequence on
    the left of the matrix.
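Building the dot matrix itself takes only a few lines. The sketch below is an illustration, not the presenter's code; the names `dot_matrix` and `render` and the two short example sequences are assumptions.

```python
def dot_matrix(top: str, left: str):
    """matrix[i][j] is True where left[i] and top[j] are the same nucleotide."""
    return [[a == b for b in top] for a in left]


def render(top: str, left: str) -> str:
    """Plain-text plot: '*' marks a dot, '.' marks an empty element."""
    lines = ["  " + " ".join(top)]
    for base, row in zip(left, dot_matrix(top, left)):
        lines.append(base + " " + " ".join("*" if dot else "." for dot in row))
    return "\n".join(lines)


# Hypothetical short sequences, chosen only to keep the plot readable.
print(render("GCGTTCCA", "GCGTCCA"))
```

A run of dots along a diagonal corresponds to a stretch of consecutive matches, which is what the alignment path tries to follow.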

29
Forbidden directions
Allowed directions
30
A dot matrix may become cluttered. With DNA
sequences, about 25% of the elements will be
occupied by dots by chance alone (one in four,
assuming equal base frequencies).
31
Window size = 1, stringency = 1, alphabet size = 4
The number of spurious matches is determined by
window size, stringency, and alphabet size.
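The effect of these three parameters can be sketched numerically. The code below assumes one common reading of the terms: a dot is drawn at (i, j) when at least `stringency` of the `window` consecutive diagonal positions starting there are identical. It also assumes independent, equally frequent residues for the chance calculation. Both function names are hypothetical.

```python
from math import comb


def filtered_dots(top: str, left: str, window: int = 1, stringency: int = 1):
    """Return the set of (i, j) positions that pass the window/stringency filter."""
    dots = set()
    for i in range(len(left) - window + 1):
        for j in range(len(top) - window + 1):
            hits = sum(left[i + k] == top[j + k] for k in range(window))
            if hits >= stringency:
                dots.add((i, j))
    return dots


def chance_dot_probability(window: int, stringency: int, alphabet_size: int) -> float:
    """Probability that a random window reaches the stringency (binomial tail)."""
    p = 1 / alphabet_size
    return sum(
        comb(window, k) * p**k * (1 - p) ** (window - k)
        for k in range(stringency, window + 1)
    )


for w, s, a in [(1, 1, 4), (3, 2, 4), (1, 1, 20)]:
    print(f"window={w} stringency={s} alphabet={a}: "
          f"{chance_dot_probability(w, s, a):.3f}")
```

Under these assumptions the expected fraction of spurious dots is about 25% for DNA with window 1, about 16% with window 3 and stringency 2, and about 5% for a 20-letter amino-acid alphabet with window 1.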
32
Window size = 1, stringency = 1, alphabet size = 4
Window size = 3, stringency = 2, alphabet size = 4
33
Window size = 1, stringency = 1, alphabet size = 20
34
Dot-matrix methods. Advantages: May unravel
information on the evolution of sequences.
35
Window size = 60 amino acids; stringency = 24
matches
Advantages: Highlighting information
36
Window size = 60 amino acids; stringency = 24
matches
Advantages: Highlighting information
The two diagonally oriented parallel lines most
probably indicate that a small internal
duplication has occurred in the bacterial gene.
37
Dot-matrix methods. Disadvantage: May not
identify the best alignment.
38
Distance and similarity methods
39
The best possible alignment (optimal alignment)
is the one in which the numbers of mismatches and
gaps are minimized according to certain criteria.
40
Unfortunately, reducing the number of mismatches
results in an increase in the number of gaps, and
vice versa.
41
α = matches, β = mismatches, γ = nucleotides in
gaps, δ = gaps
42
Gap penalty (or cost) is a factor (or a set of
factors) by which the gap values (numbers and
lengths of gaps) are multiplied to make the gaps
equivalent in value to the mismatches. The gap
penalties are based on our assessment of how
frequently different types of insertions and
deletions occur in evolution in comparison with
the frequency of occurrence of point
substitutions.
43
Mismatch penalty is an assessment of how
frequently substitutions occur.
44
  • The distance (dissimilarity) index (D) between
    two sequences in an alignment is

$$D = \sum_i m_i y_i + \sum_k w_k z_k$$

where $y_i$ is the number of mismatches of type i,
$m_i$ is the mismatch penalty for an i-type
mismatch, $z_k$ is the number of gaps of length k,
and $w_k$ is a positive number representing the
penalty for gaps of length k.
45
  • The similarity index (S) between two sequences in
    an alignment is

$$S = x - \sum_k w_k z_k$$

where x is the number of matches, $z_k$ is the
number of gaps of length k, and $w_k$ is a positive
number representing the penalty for gaps of
length k.
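Both indices can be evaluated from counts alone. The sketch below assumes the forms D = Σ m_i·y_i + Σ w_k·z_k and S = x − Σ w_k·z_k, which is how the definitions above read. The mismatch classes (transition, transversion), their penalties, and the gap-penalty function are illustrative assumptions, not values from the slides.

```python
def gap_cost(gap_counts, gap_penalty):
    """Sum of w_k * z_k over gap lengths k; gap_counts maps k -> z_k."""
    return sum(gap_penalty(k) * z_k for k, z_k in gap_counts.items())


def distance_index(mismatch_counts, mismatch_penalty, gap_counts, gap_penalty):
    """D: weighted mismatches plus weighted gaps (smaller is better)."""
    mismatch_cost = sum(mismatch_penalty[i] * y_i for i, y_i in mismatch_counts.items())
    return mismatch_cost + gap_cost(gap_counts, gap_penalty)


def similarity_index(matches, gap_counts, gap_penalty):
    """S: matches minus weighted gaps (larger is better)."""
    return matches - gap_cost(gap_counts, gap_penalty)


# Counts taken from the 24-column example alignment used earlier:
# 17 matches, 2 transitions, 2 transversions, one gap of length 1, one of length 2.
mismatch_counts = {"transition": 2, "transversion": 2}
mismatch_penalty = {"transition": 1.0, "transversion": 2.0}  # assumed values
gap_counts = {1: 1, 2: 1}


def w(k):
    return 1 + 0.1 * k  # assumed gap-penalty function


print("D =", round(distance_index(mismatch_counts, mismatch_penalty, gap_counts, w), 3))
print("S =", round(similarity_index(17, gap_counts, w), 3))
```

With these assumed penalties, D comes to about 8.3 and S to about 14.7. Changing the penalties changes which of several candidate alignments scores best.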
46
The gap penalty has two components: a gap-opening
penalty and a gap-extension penalty.
47
Three main systems:
(1) Fixed gap-penalty system: gap extension
costs 0.
(2) Linear gap-penalty system: the gap-extension
cost is calculated by multiplying the gap length
minus 1 by a constant representing the
gap-extension penalty for increasing the gap by 1.
(3) Logarithmic gap-penalty system: the
gap-extension penalty increases with the
logarithm of the gap length, i.e., more slowly.
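The three systems can be written as functions of the gap length k. This is a sketch under stated assumptions: the opening and extension constants are arbitrary, and the logarithmic form (opening plus extension times the natural log of k) is one plausible reading of "increases with the logarithm of the gap length".

```python
import math


def fixed_gap_penalty(k: int, opening: float = 2.0) -> float:
    """Fixed system: the cost does not depend on gap length."""
    return opening


def linear_gap_penalty(k: int, opening: float = 2.0, extension: float = 0.5) -> float:
    """Linear system: opening cost plus (k - 1) times the extension constant."""
    return opening + (k - 1) * extension


def log_gap_penalty(k: int, opening: float = 2.0, extension: float = 0.5) -> float:
    """Logarithmic system (assumed form): extension cost grows with ln(k)."""
    return opening + extension * math.log(k)


for k in (1, 2, 5, 20):
    print(k, fixed_gap_penalty(k), linear_gap_penalty(k), round(log_gap_penalty(k), 2))
```

All three agree for a gap of length 1. For longer gaps the linear penalty grows fastest, the logarithmic penalty more slowly, and the fixed penalty not at all.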
48
(No Transcript)
49
Further complications: Distinguishing among
different matches and mismatches. For example, a
mismatched pair consisting of Leu and Ile, which
are very similar biochemically to each other, may
be given a lesser penalty than a mismatched pair
consisting of Arg and Glu, which are very
dissimilar from each other.
50
Lesser penalty than
51
Alignment algorithms
52
Aim: Find the alignment associated with the
smallest D (or largest S) from among all possible
alignments.
53
The number of possible alignments may be
astronomical. For example, when two sequences
300 residues long each are compared, there are
10^88 possible alignments. In comparison, the
number of elementary particles in the universe is
only 10^80.
54
There are computer algorithms for finding the
optimal alignment between two sequences that do
not require an exhaustive search of all the
possibilities.
55
The Needleman-Wunsch algorithm uses dynamic
programming
56
Dynamic programming: a computational technique.
It is applicable when large searches can be
divided into a succession of small stages, such
that (1) the solution of the initial search stage
is trivial, (2) each partial solution in a later
stage can be calculated by reference to only a
small number of solutions in an earlier stage,
and (3) the last stage contains the overall
solution.
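The three conditions map directly onto a Needleman-Wunsch score table: the first row and column are trivial, each cell depends on only three neighbors, and the last cell holds the optimal score. The sketch below is a minimal version that assumes a simple scoring scheme (match +1, mismatch -1, and a flat -2 per gap position) rather than the D or S indices with length-dependent gap penalties; the parameter values are illustrative.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment by dynamic programming; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # trivial first column: a[:i] against nothing
        score[i][0] = i * gap
    for j in range(1, m + 1):          # trivial first row
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] opposite a gap
                score[i][j - 1] + gap,       # b[j-1] opposite a gap
            )
    # Trace back from the last cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The table has (n + 1)(m + 1) cells, so the work grows with the product of the sequence lengths instead of with the number of possible alignments.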
57
Multiple Sequence Alignment
58
Alignments can be easy or difficult
Easy:
GCGGCCCA TCAGGTAGTT GGTGG
GCGGCCCA TCAGGTAGTT GGTGG
GCGTTCCA TCAGCTGGTT GGTGG
GCGTCCCA TCAGCTAGTT GGTGG
GCGGCGCA TTAGCTAGTT GGTGA
Difficult:
TTGACATG CCGGGG---A AACCG
T-GACATG CCGGTG--GT AAGCC
TTGGCATG -CTAGG---A ACGCG
TTGACATG -CTAGGGAAC ACGCG
TTGACATC -CTCTG---A ACGCG
59
(No Transcript)
60
Multiple Alignment
  • 2 methods:
  • Dynamic programming (exhaustive, exact)
  • Consider 2 protein sequences of 100 amino acids
    in length.
  • If it takes 100^2 seconds to exhaustively align
    these sequences, then it will take 100^3 seconds
    to align 3 sequences, 100^4 to align 4
    sequences, etc.
  • More time than the universe has existed to align
    20 sequences exhaustively.
  • Progressive alignment (heuristic, approximate)
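The growth in the exhaustive case is easy to check with a few lines of arithmetic. The sketch below takes the slide's premise at face value (length^k seconds for k sequences); the age of the universe, about 13.8 billion years, is an outside figure and not from the slides.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # approximate; assumption not taken from the slides


def exhaustive_seconds(num_sequences: int, length: int = 100) -> int:
    """Seconds needed under the slide's premise: length ** num_sequences."""
    return length ** num_sequences


for k in (2, 3, 4, 20):
    years = exhaustive_seconds(k) / SECONDS_PER_YEAR
    verdict = "longer" if years > AGE_OF_UNIVERSE_YEARS else "shorter"
    print(f"{k:2d} sequences: {years:.3g} years ({verdict} than the age of the universe)")
```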

61
Progressive Alignment
  • Devised by Feng and Doolittle in 1987.
  • Essentially a heuristic method and as such is not
    guaranteed to find the optimal alignment.
  • Requires (n-1) + (n-2) + (n-3) + ... + (n-n+1) =
    n(n-1)/2 pairwise alignments as a starting point
  • Most successful implementation is Clustal (Des
    Higgins)
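The number of starting pairwise alignments is the number of unordered pairs among n sequences, n(n-1)/2. A short sketch, using the five globin names from the Clustal overview slide as labels only (no sequences are aligned here):

```python
from itertools import combinations


def pairwise_count(n: int) -> int:
    """(n-1) + (n-2) + ... + 1 = n(n-1)/2 unordered pairs."""
    return n * (n - 1) // 2


names = ["Hbb_Human", "Hbb_Horse", "Hba_Human", "Hba_Horse", "Myg_Whale"]
pairs = list(combinations(names, 2))
print(len(pairs), "pairwise alignments for", len(names), "sequences")
for first, second in pairs:
    print(first, "vs", second)
```

Five sequences need 10 pairwise alignments, which matches the 10 entries below the diagonal of a 5-by-5 distance matrix.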

62
Overview of Clustal Procedure
CLUSTAL
1. Quick pairwise alignments 2. Distances for
each pair 3. Distance matrix
               1    2    3    4    5
Hbb_Human  1   -
Hbb_Horse  2  .17   -
Hba_Human  3  .59  .60   -
Hba_Horse  4  .59  .59  .13   -
Myg_Whale  5  .77  .77  .75  .75   -
Neighbor-joining tree (guide tree)
[Figure: guide tree with leaves Hbb_Human, Hbb_Horse, Hba_Human, Hba_Horse, Myg_Whale; internal nodes numbered 1 to 4]
Progressive alignment following guide tree
1 PEEKSAVTALWGKVN--VDEVGG
2 GEEKAAVLALWDKVN--EEEVGG
3 PADKTNVKAAWGKVGAHAGEYGA
4 AADKTNVKAAWSKVGGHAGEYGA
5 EHEWQLVLHVWAKVEADVAGHGQ
63
Clustal good points/bad points
  • Advantages
  • Speed.
  • Disadvantages
  • No way of knowing if the alignment is correct.

64
Effect of gap penalties on amino-acid alignment:
human pancreatic hormone precursor versus
chicken pancreatic hormone.
(a) Penalty for gaps is 0.
(b) Penalty for a gap of size k nucleotides is
w_k = 1 + 0.1k.
(c) The same alignment as in (b), only the
similarity between the two sequences is further
enhanced by showing pairs of biochemically
similar amino acids.
65
An Alignment
Spinach   GCGGCTCA TCAGGTAGTT GGTG-G
Rice      GCGGCCCA TCAGGTAGTT GGTG-G
Mosquito  GCGTTCCA TC--CT-GTT GGTGTG
Monkey    GCGTCCCA TCAGCTAGTT GTTG-G
Human     GCGGCGCA TTAGCTAGTT GGTG-A