Heuristic approaches - PowerPoint PPT Presentation

BIOTECHNOLOGY – PowerPoint PPT presentation

Slides: 17
Provided by: m.prasadnaidu


1
Heuristic approaches and scoring matrices
  • M.Prasad Naidu
  • MSc Medical Biochemistry, Ph.D.

2
Introduction
  • Two heuristic algorithms are widely used:
  • BLAST
  • FASTA
  • FASTA is an algorithm developed by Pearson and
    Lipman. It is generally more sensitive than BLAST,
    though slower.
  • BLAST (Basic Local Alignment Search Tool) was
    developed by Altschul et al. in 1990. It finds
    high-scoring local alignments between a query and
    other sequences. Gapped versions are now available.

3
BLASTP algorithm
  • The BLAST algorithm involves the following steps:
  • Breaking the query sequence into words of a defined size.
  • Finding word matches that seed an HSP (High-scoring Segment Pair).
  • Extending the alignment from each matching word.

4
Breaking the sequence into words of a defined size
  • Query: AILDTGATGDA
  • Word size: 4
  • Overlapping words: AILD, ILDT, LDTG, DTGA, TGAT, GATG, ATGD, TGDA
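The word-splitting step can be sketched in a few lines of Python. This is an illustrative sketch, not code taken from BLAST; the function name `query_words` is hypothetical.

```python
def query_words(query: str, w: int) -> list[str]:
    """Return every overlapping word of length w in the query (BLAST step 1)."""
    # A sequence of length L yields L - w + 1 overlapping words.
    return [query[i:i + w] for i in range(len(query) - w + 1)]

print(query_words("AILDTGATGDA", 4))
# ['AILD', 'ILDT', 'LDTG', 'DTGA', 'TGAT', 'GATG', 'ATGD', 'TGDA']
```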
5
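Finding a High-scoring Segment Pair
  • Subject sequence: MQVWGWAILDTVATDAAMLL
  • The query word AILD matches subject positions 7 to 10; this
    word hit seeds a High-scoring Segment Pair (HSP).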
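A minimal sketch of hit finding, assuming exact word matches only. BLASTP itself also accepts similar "neighborhood" words that score above a threshold under a substitution matrix, which this sketch omits. `find_word_hits` is a hypothetical name, and positions are 0-based, so the AILD hit at subject positions 7 to 10 appears as `(0, 6)`.

```python
def find_word_hits(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Return 0-based (query_pos, subject_pos) pairs where a query word
    occurs exactly in the subject sequence (simplified: no neighborhood words)."""
    # Index every word of the subject by its starting positions.
    index: dict[str, list[int]] = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    # Look up each overlapping query word in the index.
    hits = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            hits.append((i, j))
    return hits

print(find_word_hits("AILDTGATGDA", "MQVWGWAILDTVATDAAMLL", 4))
# [(0, 6), (1, 7)]
```

The overlapping word ILDT also hits, one residue further along the same diagonal (subject position minus query position is 6 for both), so both hits point to the same candidate alignment.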

6
Extending the alignment
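  • Subject: MQVWGWAILDTVATDAAMLL
  • Query:         AILDTGATGDA
  • The AILD hit is extended in both directions along the same
    diagonal, aligning the query with subject positions 7 to 17.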
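The extension step can be sketched as an ungapped extension with an X-drop rule: extension in each direction stops once the running score falls more than `x_drop` below the best score seen, and the alignment is trimmed back to that best point. The +1/-1 scores and `x_drop=2` are illustrative assumptions, and `extend_hit` is a hypothetical name; BLASTP scores residues with a substitution matrix such as BLOSUM 62, and gapped versions also allow gaps during extension.

```python
def extend_hit(query, subject, q_pos, s_pos, w, match=1, mismatch=-1, x_drop=2):
    """Ungapped X-drop extension of a word hit (simplified BLAST step 3).

    Returns (q_start, q_end, s_start, score) with 0-based, inclusive q_end.
    """
    def pair_score(a, b):
        return match if a == b else mismatch

    # Score of the seed word itself.
    seed = sum(pair_score(query[q_pos + k], subject[s_pos + k]) for k in range(w))

    # Extend right until the score drops more than x_drop below its best.
    best_right, run, right_len, k = 0, 0, 0, 0
    while q_pos + w + k < len(query) and s_pos + w + k < len(subject):
        run += pair_score(query[q_pos + w + k], subject[s_pos + w + k])
        if run > best_right:
            best_right, right_len = run, k + 1
        elif best_right - run > x_drop:
            break
        k += 1

    # Extend left in the same way.
    best_left, run, left_len, k = 0, 0, 0, 1
    while q_pos - k >= 0 and s_pos - k >= 0:
        run += pair_score(query[q_pos - k], subject[s_pos - k])
        if run > best_left:
            best_left, left_len = run, k
        elif best_left - run > x_drop:
            break
        k += 1

    score = seed + best_left + best_right
    return q_pos - left_len, q_pos + w - 1 + right_len, s_pos - left_len, score

query, subject = "AILDTGATGDA", "MQVWGWAILDTVATDAAMLL"
q0, q1, s0, score = extend_hit(query, subject, 0, 6, 4)
print(query[q0:q1 + 1], subject[s0:s0 + q1 - q0 + 1], score)
# AILDTGAT AILDTVAT 6
```

With these toy scores the reported alignment is trimmed at AILDTGAT, because the last three columns (G/D, D/A, A/A) add up to -1; a different scoring scheme could keep the full query.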

Parameters in a BLAST result:
  • Percentage identity (often loosely called percentage homology)
  • Score of the alignment
  • Number of residues aligned
  • E-value
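Two of these parameters can be computed as sketched below. Percentage identity is counted here over all aligned columns (programs differ in the denominator they use). The E-value follows the Karlin-Altschul formula E = K·m·n·e^(-λS), where m and n are the query and database lengths and S is the raw score. The λ and K values in the example are assumed placeholders, not values to rely on, and the function names are hypothetical.

```python
import math

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns with identical residues ('-' gaps never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)

def e_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)

# Query aligned to subject positions 7 to 17 without gaps.
print(round(percent_identity("AILDTGATGDA", "AILDTVATDAA"), 1))  # 72.7

# Assumed parameters: lambda = 0.267, K = 0.041, a 100-residue query, 1e6-residue database.
print(e_value(40, 100, 10**6, 0.267, 0.041))
```

A higher score gives an exponentially smaller E-value, which is why a lower E-value indicates a more significant match.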
7
FastA algorithm
  • In FASTA, the word size is called the k-tuple (ktup).
  • Generally, the k-tuple is 3 or 4 for nucleotide
    sequences and 1 or 2 for protein sequences.
  • The FASTA algorithm involves steps similar to those of
    BLAST, but the alignment is generated differently:
    FASTA records the k-tuple positions of one sequence in
    a lookup table and counts matches along each diagonal
    (offset) before building the alignment.

8
Breaking the sequence into k-tuples (ktup = 1)
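  • Sequence 1: F A M L G F I K Y L P G C M (positions 1 to 14)
  • Lookup table (positions of each residue in sequence 1):

Residue   A  C   F     G      I  K  L      M      P   Y
Position  2  13  1, 6  5, 12  7  8  4, 10  3, 14  11  9

(B, D, E, H, N, Q, R, S, T, V, W and Z do not occur.)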
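The lookup table can be built with a dictionary from each k-tuple to its 1-based start positions. This is a minimal sketch; `ktup_lookup` is a hypothetical name and not part of the FASTA package.

```python
from collections import defaultdict

def ktup_lookup(seq: str, ktup: int = 1) -> dict[str, list[int]]:
    """Map each k-tuple of seq to the list of its 1-based start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        table[seq[i:i + ktup]].append(i + 1)
    return dict(table)

table = ktup_lookup("FAMLGFIKYLPGCM", ktup=1)
for residue in sorted(table):
    print(residue, table[residue])
# A [2], C [13], F [1, 6], G [5, 12], I [7], K [8], L [4, 10], M [3, 14], P [11], Y [9]
```

Residues absent from the sequence simply have no entry, which corresponds to the empty columns of the table.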

9

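For each residue of sequence 2 (T G F I K Y L P G A C T, positions
1 to 12), offset = position in sequence 1 - position in sequence 2:

Sequence 2  T1  G2     F3     I4  K5  Y6  L7     P8  G9     A10  C11  T12
Offset(s)   -   3, 10  -2, 3  3   3   3   -3, 3  3   -4, 3  -8   2    -

(- means the residue does not occur in sequence 1.)
The most frequent offset is 3 (8 of the 14 hits), so the best
diagonal shifts sequence 2 three residues to the right: residue i
of sequence 2 is aligned with residue i + 3 of sequence 1.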
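Counting hits per offset (diagonal) can be sketched with a `Counter`. This is a simplified illustration with a hypothetical function name `diagonal_offsets`; it stops at the counting step shown on the slide, whereas FASTA goes on to score and join the best diagonal regions with a substitution matrix.

```python
from collections import Counter

def diagonal_offsets(seq1: str, seq2: str, ktup: int = 1) -> Counter:
    """Count offsets (position in seq1) - (position in seq2) over all shared
    k-tuples; hits with the same offset lie on the same diagonal."""
    positions: dict[str, list[int]] = {}
    for i in range(len(seq1) - ktup + 1):
        positions.setdefault(seq1[i:i + ktup], []).append(i + 1)  # 1-based
    offsets = Counter()
    for j in range(len(seq2) - ktup + 1):
        for i in positions.get(seq2[j:j + ktup], []):
            offsets[i - (j + 1)] += 1
    return offsets

counts = diagonal_offsets("FAMLGFIKYLPGCM", "TGFIKYLPGACT")
best_offset, hits = counts.most_common(1)[0]
print(best_offset, hits)  # 3 8

# Show sequence 2 shifted right by the best offset under sequence 1.
print("FAMLGFIKYLPGCM")
print(" " * best_offset + "TGFIKYLPGACT")
```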
10
Alignment of the sequences
  • F A M L G F I K Y L P G C M
  •       T G F I K Y L P G A C T

Parameters in a FASTA result:
  • Percentage identity (often loosely called percentage homology)
  • Score of the alignment
  • Number of residues aligned
  • P-score
11
Scoring schemes
  • Identity scoring matrix
  • Residue-to-residue scores are expressed here as
    similarities.
  • A 4 x 4 matrix is built for nucleotides and a
    20 x 20 matrix for amino acids.
  • A match scores 1 and a mismatch scores 0.

  A T G C
A 1 0 0 0
T 0 1 0 0
G 0 0 1 0
C 0 0 0 1
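Scoring an ungapped alignment with this matrix is a sum of per-column lookups. A minimal sketch following the matrix shown (match 1, mismatch 0); `IDENTITY` and `alignment_score` are hypothetical names.

```python
NUCLEOTIDES = "ATGC"

# Identity matrix: 1 on the diagonal (match), 0 elsewhere (mismatch).
IDENTITY = {(a, b): 1 if a == b else 0 for a in NUCLEOTIDES for b in NUCLEOTIDES}

def alignment_score(seq_a: str, seq_b: str, matrix: dict) -> int:
    """Sum the matrix scores over the columns of an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))

print(alignment_score("ATGCA", "ATGGA", IDENTITY))  # 4
```

Swapping in a different dictionary (for example one holding PAM or BLOSUM values for amino acids) changes the scoring scheme without changing the function.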
12
PAM Matrices
  • PAM (Point Accepted Mutation) matrices were first
    developed by Margaret Dayhoff and co-workers in 1978.
  • The model assumes that evolutionary change follows
    a Markov process, i.e. each residue substitution
    occurs independently of previous mutations. One
    PAM is the unit of evolutionary divergence in which
    1% of the amino acids have changed (one accepted
    point mutation per 100 residues). This does not
    imply that after 100 PAMs every amino acid is
    different, because some positions mutate more than
    once and others not at all.
  • Dayhoff and co-workers calculated the frequencies
    of accepted mutations for 1 PAM by analyzing
    families of closely related sequences.
  • The scores are expressed as log-odds ratios.
  • The 1 PAM matrix can be extrapolated to any
    evolutionary distance: the PAM N matrix is obtained
    by multiplying the 1 PAM mutation probability
    matrix by itself N times.
  • Lower-numbered PAM matrices are used for closely
    related protein sequences and higher-numbered ones
    for more divergent proteins.
  • For example, PAM 30 is used for closely related
    proteins and PAM 250 for divergent ones.
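The extrapolation and the log-odds conversion can be illustrated with a toy model. This sketch assumes a hypothetical 4-letter alphabet with uniform background frequencies and a PAM1 matrix in which 99% of residues stay unchanged and the remaining 1% is spread evenly over the other letters; Dayhoff's real matrices are 20 x 20 and estimated from observed mutations. PAM N is computed by multiplying PAM1 by itself N times, and each score is taken as 10·log10(P(i -> j) / f_j), rounded to an integer.

```python
import math

def mat_mult(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def pam_n(pam1, n):
    """Extrapolate a PAM1 mutation-probability matrix to PAM n (pam1 to the nth power)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result

def log_odds(mutation, freqs):
    """Score_ij = 10 * log10(P(i -> j) / f_j), rounded to an integer."""
    return [[round(10 * math.log10(p / freqs[j])) for j, p in enumerate(row)]
            for row in mutation]

# Hypothetical toy PAM1: 99% unchanged, 1% spread over the 3 other letters.
PAM1 = [[0.99 if i == j else 0.01 / 3 for j in range(4)] for i in range(4)]
FREQS = [0.25] * 4  # assumed uniform background frequencies

for n in (30, 250):
    m = pam_n(PAM1, n)
    print(f"PAM{n}: P(unchanged) = {m[0][0]:.2f}, first row of scores = {log_odds(m, FREQS)[0]}")
```

In this uniform toy model, PAM 30 gives clearly positive scores for identities and negative scores for mismatches, while at PAM 250 the scores approach zero because the sequences are almost indistinguishable from random. Real PAM 250 keeps more structure because substitution rates differ between amino acids.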

13
PAM 250 scoring matrix
14
BLOSUM Matrices
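  • BLOSUM (BLOcks SUbstitution Matrix) matrices were
    developed by Henikoff and Henikoff in 1992.
  • Like PAM matrices, they contain log-odds scores,
    but each BLOSUM matrix is derived directly from
    observed substitutions rather than extrapolated
    from a single 1 PAM matrix.
  • The substitution data were derived from ungapped
    local alignments (blocks) of conserved regions of
    distantly related proteins in the BLOCKS database.
  • BLOSUM 30 is used for comparing highly divergent
    sequences and BLOSUM 90 for closely related
    proteins; the numbering runs opposite to PAM.
  • The most commonly used matrix is BLOSUM 62, built
    from blocks in which sequences sharing at least 62%
    identity were clustered together. It is the default
    matrix for protein searches in BLASTP.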
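The core BLOSUM calculation can be sketched on a single toy block: count residue pairs within each column, convert the counts to observed pair frequencies q_ij, derive background frequencies p_i, and score each pair as 2·log2(q_ij / e_ij) in half-bit units, where e_ij = p_i² for identical pairs and 2·p_i·p_j otherwise. This simplified sketch assumes a hypothetical four-sequence block and skips the clustering at the identity threshold (62% for BLOSUM 62) and the pooling of many blocks that the real matrices use; `blosum_scores` is a hypothetical name.

```python
import math
from collections import Counter
from itertools import combinations

def blosum_scores(block: list[str]) -> dict[tuple[str, str], int]:
    """Half-bit log-odds scores from one ungapped block of aligned sequences
    (simplified: no clustering of similar sequences, a single block)."""
    pair_counts = Counter()
    for column in zip(*block):
        # Every unordered pair of residues observed in the same column.
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency: p_i = q_ii + sum over j != i of q_ij / 2.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores

print(blosum_scores(["AC", "AC", "AC", "GC"]))
# {('A', 'A'): 2, ('A', 'G'): 3, ('C', 'C'): 2}
```

With so little data the A/G score comes out positive simply because A/G is the only substitution observed; real BLOSUM matrices are estimated from thousands of blocks.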

15
BLOSUM 62 Matrix
16
THANK YOU