Heuristic approaches

Provided by: m.prasadnaidu

1
Heuristic approaches and scoring matrices

M. Prasad Naidu
MSc Medical Biochemistry, Ph.D.

2
Introduction

These methods rely on two algorithms:
BLAST
FASTA
FASTA is an algorithm developed by Pearson and
Lipman. It is generally slower than BLAST but can
be more sensitive in some searches.
BLAST is an algorithm developed by Altschul et
al. in 1990. It finds high-scoring local
alignments between two sequences. Gapped versions
are now available.

3
BLASTP algorithm

The BLAST algorithm involves the following steps:
Breaking the query sequence into words of a defined size.
Finding a word match (hit) in the target sequence.
Extending the alignment around the hit to obtain an
HSP (high-scoring segment pair).

4
Breaking the sequence into words of a defined size

Query: AILDTGATGDA
Word size: 4

AILD ILDT LDTG DTGA TGAT GATG ATGD TGDA
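
The word list above can be reproduced with a sliding window. This is a minimal sketch, not the BLAST implementation; the function name `split_into_words` is hypothetical and chosen only for illustration.

```python
def split_into_words(sequence, word_size):
    """Return every overlapping word of length word_size, left to right."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]


# Query and word size taken from the slide.
words = split_into_words("AILDTGATGDA", 4)
print(" ".join(words))
```

A query of length 11 with word size 4 gives 11 - 4 + 1 = 8 words. Actual BLASTP also considers similar "neighborhood" words that score above a threshold; the sketch covers exact words only.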
5
Finding a high-scoring segment pair (HSP)

MQVWGWAILDTVATDAAMLL
......AILD
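
The word match shown above can be located by scanning the target for each query word. The sketch below assumes exact matching only and uses a hypothetical helper name, `find_word_hits`; real BLAST uses an indexed lookup and also accepts similar words.

```python
def find_word_hits(query, subject, word_size):
    """Return (word, query_index, subject_index) for every exact word match.

    Indices are 0-based.
    """
    hits = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        s_pos = subject.find(word)
        while s_pos != -1:
            hits.append((word, q_pos, s_pos))
            s_pos = subject.find(word, s_pos + 1)
    return hits


# Query and target sequence taken from the slides.
hits = find_word_hits("AILDTGATGDA", "MQVWGWAILDTVATDAAMLL", 4)
for word, q_pos, s_pos in hits:
    print(word, q_pos, s_pos)
```

For the slide's sequences this reports AILD at target index 6 (the seventh residue) and the overlapping word ILDT one position later; both lie on the same diagonal.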

6
Extending the alignment

MQVWGWAILDTVATDAAMLL
......AILDTGATGDA

Parameters in a BLAST result:
Percentage identity (often loosely called percentage homology)
Alignment score
Number of residues aligned
E-value
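
The extension step and two of the reported parameters can be sketched as an ungapped extension in both directions from the seed. The scoring used here (+1 match, -1 mismatch) and the drop-off limit `x_drop=3` are illustrative assumptions; BLASTP itself scores with a substitution matrix, and the E-value is not computed here because it needs statistical parameters the slides do not give.

```python
def extend_one_side(pairs, score, x_drop):
    """Walk over residue pairs; return (best gain, length at best gain)."""
    running = best = best_len = 0
    for length, (a, b) in enumerate(pairs, start=1):
        running += score(a, b)
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # score fell too far below the best seen so far
    return best, best_len


def extend_hit(query, subject, q_pos, s_pos, word_size,
               match=1, mismatch=-1, x_drop=3):
    """Extend a seed without gaps; return (q_start, s_start, length, score)."""
    def score(a, b):
        return match if a == b else mismatch

    seed = sum(score(a, b) for a, b in zip(query[q_pos:q_pos + word_size],
                                           subject[s_pos:s_pos + word_size]))
    right_gain, right_len = extend_one_side(
        zip(query[q_pos + word_size:], subject[s_pos + word_size:]),
        score, x_drop)
    left_gain, left_len = extend_one_side(
        zip(reversed(query[:q_pos]), reversed(subject[:s_pos])),
        score, x_drop)
    length = left_len + word_size + right_len
    return q_pos - left_len, s_pos - left_len, length, seed + left_gain + right_gain


query, subject = "AILDTGATGDA", "MQVWGWAILDTVATDAAMLL"
q_start, s_start, length, hsp_score = extend_hit(query, subject, 0, 6, 4)
q_seg = query[q_start:q_start + length]
s_seg = subject[s_start:s_start + length]
identities = sum(a == b for a, b in zip(q_seg, s_seg))
percent_identity = 100.0 * identities / length
print(q_seg, s_seg, hsp_score, length, percent_identity)
```

Under these assumed scores the extension stops at the best-scoring segment, AILDTGAT against AILDTVAT, rather than running to the end of the query, because the trailing residues lower the score.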
7
FastA algorithm

In the FASTA algorithm the word size is called the
k-tuple (ktup).
The k-tuple is typically 1 or 2 for protein
sequences and larger, commonly 4 to 6, for
nucleotide sequences.
The FASTA algorithm follows steps similar to those
of BLAST, but the alignment is generated
differently.

8
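Breaking the sequence into k-tuples (k = 1)

Sequence and residue positions:
Residue   F  A  M  L  G  F  I  K  Y  L   P   G   C   M
Position  1  2  3  4  5  6  7  8  9  10  11  12  13  14

Lookup table of positions for each residue:
A  2
C  13
F  1, 6
G  5, 12
I  7
K  8
L  4, 10
M  3, 14
P  11
Y  9
(B, D, E, H, N, Q, R, S, T, V, W and Z do not occur in the sequence.)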
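
The lookup table above can be built in a single pass over the sequence. This is a minimal sketch with a hypothetical function name, `build_lookup`; positions are 1-based to match the slide.

```python
from collections import defaultdict


def build_lookup(sequence, k=1):
    """Map each k-tuple to the list of 1-based positions where it starts."""
    table = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        table[sequence[i:i + k]].append(i + 1)
    return dict(table)


# Sequence from the slide, with k = 1 as in the example.
lookup = build_lookup("FAMLGFIKYLPGCM")
for residue in sorted(lookup):
    print(residue, lookup[residue])
```

Residues that never occur in the sequence simply have no entry in the table.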

9
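Lookup table for the first sequence (F A M L G F I K Y L P G C M):
A 2; C 13; F 1, 6; G 5, 12; I 7; K 8; L 4, 10; M 3, 14; P 11; Y 9

Offsets for the second sequence (position in the first
sequence minus position in the second):
Position  Residue  Offsets
1         T        none
2         G        3, 10
3         F        -2, 3
4         I        3
5         K        3
6         Y        3
7         L        -3, 3
8         P        3
9         G        -4, 3
10        A        -8
11        C        2
12        T        none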
The most frequently occurring offset is 3, so the
alignment starts after skipping the first three
residues of the first sequence.
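
The offset counting on this slide can be sketched as follows: for every residue of the second sequence, look up where it occurs in the first sequence and record the position difference. The function name `offset_histogram` is hypothetical, and the sketch covers only this counting step, not the later rescoring that FASTA performs.

```python
from collections import Counter, defaultdict


def offset_histogram(seq1, seq2, k=1):
    """Count offsets (position in seq1 minus position in seq2) of shared k-tuples."""
    positions = defaultdict(list)
    for i in range(len(seq1) - k + 1):
        positions[seq1[i:i + k]].append(i + 1)  # 1-based, as on the slide

    offsets = Counter()
    for j in range(len(seq2) - k + 1):
        for i in positions.get(seq2[j:j + k], []):
            offsets[i - (j + 1)] += 1
    return offsets


# The two sequences from the slides.
offsets = offset_histogram("FAMLGFIKYLPGCM", "TGFIKYLPGACT")
best_offset, count = offsets.most_common(1)[0]
print(best_offset, count)
```

For the slide's sequences, offset 3 occurs 8 times and every other offset occurs once, which is why the alignment is placed on that diagonal.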
10
Alignment of the sequences

F A M L G F I K Y L P G C M
. . . T G F I K Y L P G A C T

Parameters in a FASTA result:
Percentage identity (often loosely called percentage homology)
Alignment score
Number of residues aligned
P-score
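
Placing the second sequence at a chosen offset and counting identical residues can be sketched as below. The offset of 3 is the most frequent offset from the previous slide; the helper name `align_on_diagonal` is hypothetical, and the sketch counts identities only, without the matrix-based rescoring or statistics that FASTA reports.

```python
def align_on_diagonal(seq1, seq2, offset):
    """Shift seq2 right by `offset` (or seq1 if offset is negative).

    Returns (row1, row2, identities, overlap), padding with spaces.
    """
    row1 = " " * max(0, -offset) + seq1
    row2 = " " * max(0, offset) + seq2
    width = max(len(row1), len(row2))
    row1, row2 = row1.ljust(width), row2.ljust(width)
    overlap = sum(a != " " and b != " " for a, b in zip(row1, row2))
    identities = sum(a == b and a != " " for a, b in zip(row1, row2))
    return row1, row2, identities, overlap


row1, row2, identities, overlap = align_on_diagonal(
    "FAMLGFIKYLPGCM", "TGFIKYLPGACT", 3)
print(row1)
print(row2)
print(identities, overlap)
```

With the slide's sequences the two rows overlap over 11 residues, 8 of which are identical.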
11
Scoring schemes

Identity scoring matrix
Residue-to-residue scores are expressed here as
similarity scores.
A 4 x 4 matrix is built for nucleotides and a
20 x 20 matrix for amino acids.
In the matrix below, a match scores 1 and a mismatch
scores 0. Some schemes penalize a mismatch with -1
instead.

   A  T  G  C
A  1  0  0  0
T  0  1  0  0
G  0  0  1  0
C  0  0  0  1
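
An identity matrix like the one above is easy to generate for any alphabet. The sketch below uses hypothetical helper names and scores two equal-length sequences position by position, with no gaps; the two example sequences are made up for illustration.

```python
def identity_matrix(alphabet, match=1, mismatch=0):
    """Build a nested dict: matrix[a][b] is the score for pairing a with b."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}


def score_ungapped(seq1, seq2, matrix):
    """Sum the matrix scores over aligned positions (no gaps)."""
    return sum(matrix[a][b] for a, b in zip(seq1, seq2))


dna = identity_matrix("ATGC")                      # 4 x 4
protein = identity_matrix("ACDEFGHIKLMNPQRSTVWY")  # 20 x 20

# Illustrative sequences: four matches and one mismatch.
score_01 = score_ungapped("ATGCA", "ATGGA", dna)
score_pm = score_ungapped("ATGCA", "ATGGA", identity_matrix("ATGC", 1, -1))
print(score_01, score_pm)
```

The same pair of sequences scores 4 with the 1/0 scheme and 3 with the 1/-1 scheme, so scores are only comparable within one scheme.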
12
PAM Matrices

These were first developed by Margaret Dayhoff
and co-workers in 1978.
The model assumes that evolutionary change follows
a Markov model, i.e. each residue change occurs
independently of previous mutations at that
position.
One PAM (point accepted mutation) is a unit of
evolutionary divergence in which 1% of the amino
acids have changed, i.e. one accepted change per
100 residues. This does not imply that after 100
PAM every amino acid is different, because the same
position can change more than once or revert.
Dayhoff and co-workers calculated the frequencies
of accepted mutations for 1 PAM by analyzing
closely related families of sequences.
The scores are expressed as log-odds ratios.
The PAM 1 matrix can be extrapolated to any number
of PAMs: the PAM N matrix is obtained by
multiplying the PAM 1 mutation probability matrix
by itself N times.
A lower PAM distance is used for closely related
protein sequences and a higher one for more
divergent proteins.
For example, PAM 30 is used for closely related
proteins and PAM 250 for divergent ones.
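
The extrapolation from PAM 1 to PAM N and the conversion to log-odds scores can be illustrated with a toy model. Everything numeric below is an assumption made for illustration: a made-up two-letter alphabet with equal frequencies and a 1% change per step, not Dayhoff's data. Rows are taken to be the original residue and columns the replacement, and the score is 10 * log10 of the mutation probability divided by the background frequency of the replacement.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Return m multiplied by itself n times (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result


def log_odds(m, freqs):
    """score[i][j] = 10 * log10(P(i -> j) / background frequency of j)."""
    size = len(m)
    return [[10 * math.log10(m[i][j] / freqs[j]) for j in range(size)]
            for i in range(size)]


# Toy PAM 1: each residue has a 1% chance of changing per PAM unit.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_freqs = [0.5, 0.5]

toy_pam100 = mat_pow(toy_pam1, 100)
toy_scores = log_odds(toy_pam100, toy_freqs)
print(round(toy_pam100[0][0], 3), toy_scores)
```

In this toy model about 57% of positions still show the original residue after 100 PAM, which illustrates why 100 PAM does not mean that every residue has changed. Identities get a positive score and substitutions a negative one.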

13
PAM 250 scoring matrix
14
BLOSUM Matrices

These matrices were developed by Henikoff and
Henikoff in 1992.
Like the PAM matrices, they contain log-odds
scores.
The data were derived from ungapped local
alignments (blocks) of conserved regions of
related, often distantly related, proteins
deposited in the BLOCKS database.
BLOSUM 30 is used for comparing highly divergent
sequences and BLOSUM 90 for closely related
proteins.
The most commonly used matrix is BLOSUM 62, which
is built from blocks in which sequences sharing
62% or more identity are clustered together.
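
The log-odds idea behind BLOSUM can be sketched on a single alignment column: count the residue pairs observed in the column, compare their frequencies with what the residue frequencies alone would predict, and take the logarithm of the ratio. This is a simplified sketch under stated assumptions: the column (nine A and one S) is a toy example, the function name `blosum_style_scores` is hypothetical, and the clustering by percent identity and the final rounding to integers are left out.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_style_scores(column):
    """Return (observed pair freqs, residue freqs, scores in half-bits)."""
    pair_counts = Counter()
    for a, b in combinations(column, 2):
        pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Residue frequency implied by the pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = 2 * math.log2(freq / expected)
    return q, dict(p), scores


q, p, scores = blosum_style_scores("AAAAAAAAAS")
print(q, p, scores)
```

Pairs that are never observed, such as S with S in this column, get no score in the sketch because their observed frequency is zero.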

15
BLOSUM 62 Matrix
16
THANK YOU
