Sequence alignment - PowerPoint PPT Presentation

Sequence alignment Gabor T. Marth Department of Biology, Boston College marth_at_bc.edu Biologically significant alignment Biologically plausible alignment Spurious ... – PowerPoint PPT presentation

Slides: 27
Provided by: Gabo88

Transcript and Presenter's Notes



1
Sequence alignment
BI420 Introduction to Bioinformatics
Gabor T. Marth
Department of Biology, Boston College marth_at_bc.edu
2
Biologically significant alignment
hba_human
hbb_human
http://artedi.ebc.uu.se/programs/pairwise.html
3
Biologically plausible alignment
4
Spurious alignment
(BRCA1 variant)
Examples from Biological sequence analysis.
Durbin, Eddy, Krogh, Mitchison
5
Alignment types
How do we align the words CRANE and FRAME?
CRANE
FRAME
3 matches, 2 mismatches
How do we align words that are different in
length?
COELACANTH
P-ELICAN--
COELACANTH
-PELICAN--
5 matches, 2 mismatches, 3 gaps
In this case, if we assign 1 point for each match
and -1 for each mismatch or gap, we get 5 x 1 + 2 x
(-1) + 3 x (-1) = 0. This is the alignment score.
Examples from BLAST. Korf, Yandell, Bedell
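The scoring rule above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides; the function name `score_alignment` and its parameter names are assumptions, and the default values are the +1 / -1 / -1 scheme used in the example.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score two already-aligned strings of equal length, column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # a gap character in either row
        elif a == b:
            score += match      # identical letters
        else:
            score += mismatch   # different letters
    return score


print(score_alignment("CRANE", "FRAME"))            # 3 matches, 2 mismatches
print(score_alignment("COELACANTH", "P-ELICAN--"))  # 5 matches, 2 mismatches, 3 gaps
```

The same function can be used to compare candidate alignments of the same two words: the one with the highest score is preferred.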
6
Finding the best alignment
COELACANTH
PE-LICAN--   S = -2
COELACANTH
P-EL-ICAN-   S = -6
COELACANTH
PELICAN---   S = -10
COELACANTH
P-ELICAN--   S = 0
7
Global alignment: Needleman-Wunsch
Aligning the words SHAKE and SPEARE
Example from Higgs and Attwood
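A minimal sketch of Needleman-Wunsch global alignment follows. It assumes the simple +1 / -1 / -1 (match / mismatch / gap) scheme from the earlier word examples, which is not necessarily the scoring used in the Higgs and Attwood figure, and a linear gap penalty. The function name `needleman_wunsch` is illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,     # gap in b
                          F[i][j - 1] + gap)     # gap in a

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("SHAKE", "SPEARE")
print(score)
print(top)
print(bottom)
```

Several alignments can share the optimal score; the traceback here prefers the diagonal move, so it reports only one of them.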
8
Local alignment: Smith-Waterman
Example from Higgs and Attwood
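Smith-Waterman differs from the global method in two ways: cell scores are never allowed to fall below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The sketch below assumes a +1 / -1 / -1 scoring scheme with a linear gap penalty (an assumption, not the scoring of the Higgs and Attwood example); the function name is illustrative.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, segment_a, segment_b) for one best local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                     # start a new local alignment
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("COELACANTH", "PELICAN"))
```

With these words the local alignment keeps only the well-matched core and drops the poorly matching ends that a global alignment would have to pay for.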
9
Visualizing pair-wise alignments
10
Sequence similarity and scoring
Match-mismatch-gap penalties, e.g. match = 1,
mismatch = -5, gap = -10
Scoring matrices
11
Multiple alignments
ClustalW
12
Anchored multiple alignment
13
Similarity searching vs. alignment
Alignment
Similarity search
query
database
14
The BLAST algorithms
Program | Database | Query | Typical uses
BLASTN | Nucleotide | Nucleotide | Mapping oligonucleotides, amplimers, ESTs, and repeats to a genome. Identifying related transcripts.
BLASTP | Protein | Protein | Identifying common regions between proteins. Collecting related proteins for phylogenetic analysis.
BLASTX | Protein | Nucleotide | Finding protein-coding genes in genomic DNA.
TBLASTN | Nucleotide | Protein | Identifying transcripts similar to a known protein (finding proteins not yet in GenBank). Mapping a protein to genomic DNA.
TBLASTX | Nucleotide | Nucleotide | Cross-species gene prediction. Searching for genes missed by traditional methods.
15
BLAST report
16
BLAST report
gi|7428631
http://www.ncbi.nih.gov/BLAST/
17
The BLAST algorithm
Sequence alignment takes place in a 2-dimensional
space where diagonal lines represent regions of
similarity. Gaps in an alignment appear as broken
diagonals. The search space is sometimes
considered as 2 sequences and sometimes as query x
database.
  • Global alignment vs. local alignment
  • BLAST is local
  • Maximum scoring pair (MSP) vs. High-scoring pair
    (HSP)
  • BLAST finds HSPs (usually the MSP too)
  • Gapped vs. ungapped
  • BLAST can do both

18
The BLAST algorithm
BLOSUM62 neighborhood of RGD
RGD 17, KGD 14, QGD 13, RGE 13, EGD 12, HGD 12, NGD 12,
RGN 12, AGD 11, MGD 11, RAD 11, RGQ 11, RGS 11, RND 11,
RSD 11, SGD 11, TGD 11
  • Speed gained by minimizing search space
  • Alignments require word hits
  • Neighborhood words
  • W and T modulate speed and sensitivity

T = 12
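The neighborhood of a word can be enumerated by scoring every possible word of the same length against it and keeping those that reach the threshold T. The sketch below hard-codes only the three BLOSUM62 rows needed for the query word RGD, to stay short. It assumes a word is kept when its score is greater than or equal to T; the names `word_score` and `neighborhood` are illustrative.

```python
from itertools import product

# Column order for the matrix rows below.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 rows for R, G and D only (enough to score any word against RGD).
BLOSUM62_ROWS = {
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
    "D": [-2, -2, 1, 6, -3, 0, 2, -1, -1, -3, -4, -1, -3, -3, -1, 0, -1, -4, -3, -3],
}


def word_score(query_word, word):
    """Sum the substitution scores of the paired letters."""
    return sum(BLOSUM62_ROWS[q][AMINO_ACIDS.index(w)]
               for q, w in zip(query_word, word))


def neighborhood(query_word, threshold):
    """Return (word, score) pairs scoring at least `threshold`, best first."""
    hits = []
    for letters in product(AMINO_ACIDS, repeat=len(query_word)):
        word = "".join(letters)
        score = word_score(query_word, word)
        if score >= threshold:
            hits.append((word, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


for word, score in neighborhood("RGD", 11):
    print(word, score)
```

Lowering T enlarges the neighborhood (more sensitivity, more word hits to process), while raising it shrinks the neighborhood (more speed).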
19
Word length
20
2-hit seeding
  • Alignments tend to have multiple word hits.
  • Isolated word hits are frequently false leads.
  • Most alignments have large ungapped regions.
  • Requiring 2 word hits on the same diagonal (within
    40 aa of each other, for example) greatly increases
    speed at a slight cost in sensitivity.
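The two-hit idea can be sketched as follows. For brevity this assumes exact word matches only (no neighborhood words), identifies a diagonal by subject position minus query position, and requires the two hits to be non-overlapping and at most `window` positions apart. The function names and the return format are illustrative.

```python
from collections import defaultdict


def word_hits(query, subject, w=3):
    """Yield (query_pos, subject_pos) for every exact shared word of length w."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            yield i, j


def two_hit_seeds(query, subject, w=3, window=40):
    """Return sorted (diagonal, first_query_pos, second_query_pos) seeds."""
    by_diagonal = defaultdict(list)
    for i, j in word_hits(query, subject, w):
        by_diagonal[j - i].append(i)

    seeds = []
    for diagonal, positions in by_diagonal.items():
        last = None
        for pos in sorted(positions):
            if last is None:
                last = pos
            elif pos - last < w:
                continue                      # overlaps the previous hit
            else:
                if pos - last <= window:
                    seeds.append((diagonal, last, pos))
                last = pos                    # too far apart: start over
    return sorted(seeds)


print(two_hit_seeds("ACDEFGHIKL", "ACDXXGHIKL"))
print(two_hit_seeds("ACDEFGHIKL", "ACDXXXXXXX"))  # one isolated hit, no seed
```

Only the seeds returned here would go on to the extension step, which is where the speed gain comes from.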

21
Extension of the seed alignments
  • Alignments are extended from seeds in each
    direction.
  • Extension is terminated when the score drops
    more than X below the maximum score seen so far.

The quick brown fox jumps over the lazy dog.
The quiet brown cat purrs when she sees him.
Text example: match = 1, mismatch = -1, no gaps
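The extension rule can be tried on the two sentences of the text example. The sketch below extends to the right only, without gaps, using match = 1 and mismatch = -1, and stops once the running score has fallen more than X below the best score seen. The function name `extend_right` and its return value (best score, length of the best extension) are assumptions made for illustration.

```python
def extend_right(query, subject, q_start, s_start, x_drop, match=1, mismatch=-1):
    """Ungapped rightward extension; returns (best_score, best_length)."""
    score = best = best_len = 0
    limit = min(len(query) - q_start, len(subject) - s_start)
    for k in range(limit):
        same = query[q_start + k] == subject[s_start + k]
        score += match if same else mismatch
        if score > best:
            best, best_len = score, k + 1     # new maximum: remember where
        elif best - score > x_drop:
            break                             # dropped more than X below the maximum
    return best, best_len


a = "The quick brown fox jumps over the lazy dog."
b = "The quiet brown cat purrs when she sees him."

for x in (1, 5):
    best, length = extend_right(a, b, 0, 0, x)
    print(x, best, repr(a[:length]))
```

A small X stops at the first short run of mismatches, while a larger X lets the extension cross them and reach the longer matching stretch beyond.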
22
BLAST statistics
>gi|23098447|ref|NP_691913.1| (NC_004193)
3-oxoacyl-(acyl carrier protein) reductase
[Oceanobacillus iheyensis]
Length = 253
Score = 38.9 bits (89), Expect = 3e-05
Identities = 17/40 (42%), Positives = 26/40 (64%)
Frame = -1
Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
            VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI 49
How significant is this similarity?
23
Scoring the alignment
Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
            VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI 49
4
-1
4
S (score)
24
The Karlin-Altschul equation
The Expect or E-value
Scaling factor
A minor constant
Normalized score
Expected number of alignments
Raw score
Length of query
Length of database
Search space
The P-value
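The equation itself did not survive in this transcript; only its labels did. Those labels match the standard form of the Karlin-Altschul equation, which is assumed here. The symbol names (E, K, m, n, lambda, S, P) follow common usage and are not taken from the slide.

```latex
E = K \, m \, n \, e^{-\lambda S}
\qquad\qquad
P = 1 - e^{-E}
```

Here E is the expected number of alignments (the Expect or E-value), K is the minor constant, m is the length of the query, n is the length of the database, mn is the search space, lambda is the scaling factor, S is the raw score, lambda S is the normalized score, and P is the P-value.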
25
The sum-statistics
Sum statistics increase the significance
(decrease the E-value) for groups of consistent
alignments.
26
The sum-statistics
The sum score is not reported by BLAST!