Title: Interactive tools and programming environments for sequence analysis
1 Interactive tools and programming environments
for sequence analysis
TATACATAAAGACCCAAATGGAACTGTTCTAGATGATACACTAGCATTAA
GAGAAAAATTCGAAGAATCAGTCGATAAATACAAACTTCATTTTACTGGA
TTAATCGCTGACAAAATTGCAAAAGAAAAACTGAATACTTACGTCCTCAC
TTATAAAAAAGCAGACGAAGCTATGCCTGCAGACGAAGCTATGCCAACTG
ATGTACCTAGTACTTCTGTTACTGGATCAACAATGGCAAAC.
- Bernardo Barbiellini
- Northeastern University
2 Overview
- Matlab and Darwin bioinformatics tools
- Dotplots and statistical significance of alignments
- Scoring matrices from an evolution model
- Evolutionary distances and phylogenetic trees
- Unified approach for the sequence alignment and
structure prediction
3 Matlab toolbox and Darwin
- Computer language appropriate for bioinformatics
- A workbench to automate repetitive tasks
- Based on linear algebra and statistics
- Matlab toolbox developed by The MathWorks
- Darwin developed by Gaston Gonnet (ETHZ)
4 Extra features
- Loading of and retrieval from sequence databases
- Fast searching for sequence fragments
- Sequence alignment
- Generation of random sequences, distributions and mutations
- Creation of phylogenetic trees
- Plotting functions
- Matrix and vector arithmetic
- I/O to communicate with other programs
5 Calling BioPerl functions in MATLAB
Documentation by Brian Madsen (NU and co-op at The MathWorks):
>> help perl
PERL calls perl script using appropriate operating system executable.
PERL(PERLFILE) calls perl script specified by the file PERLFILE using appropriate perl executable.
PERL(PERLFILE,ARG1,ARG2,...) passes the arguments ARG1,ARG2,... to the perl script file PERLFILE, and calls it by using appropriate perl executable.
RESULT=PERL(...) outputs the result of attempted perl call.
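The same calling pattern can be sketched outside MATLAB. The Python function below is a hypothetical analogue of MATLAB's `RESULT=PERL(PERLFILE,ARG1,...)`, built on the standard `subprocess` module. It is not the MathWorks implementation. The name `call_perl` and its `perl` parameter are illustrative; `perl` is the interpreter name, assumed to be on the PATH.

```python
import shutil
import subprocess


def call_perl(perlfile, *args, perl="perl"):
    """Run a script with arguments and return its standard output.

    Hypothetical Python analogue of RESULT = PERL(PERLFILE, ARG1, ARG2, ...).
    `perl` is the interpreter name (assumed to be on the PATH) or a full path.
    """
    exe = shutil.which(perl) or perl  # resolve the interpreter on the PATH
    completed = subprocess.run(
        [exe, perlfile, *(str(a) for a in args)],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

A call would look like `call_perl("parse_genbank.pl", "sequence.gb")`; the script and file names are hypothetical. Raising on failure (`check=True`) is a design choice of this sketch, not documented MATLAB behavior.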
6 Visual Tool: Dotplot (1)
Pairwise sequence comparison
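A dotplot puts one sequence on each axis and marks every cell where the two residues are identical. Runs of dots along a diagonal then reveal similar segments. Below is a minimal text-mode sketch; the function names are illustrative and are not the Matlab or Darwin API.

```python
def dotplot(seq_a, seq_b):
    """Dot matrix: entry [i][j] is 1 when seq_a[i] == seq_b[j], else 0."""
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]


def render(matrix, mark="*", blank="."):
    """Draw the dot matrix as text, one row per residue of the first sequence."""
    return "\n".join("".join(mark if cell else blank for cell in row) for row in matrix)


# Comparing a sequence with itself gives a full main diagonal.
print(render(dotplot("GATTACA", "GATTACA")))
```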
7 Visual Tool: Dotplot (2)
Filtered Image
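A filtered image can be approximated with the common window/stringency filter. A cell is kept only if a short diagonal window starting there contains enough identical pairs, which removes isolated random matches. The slide does not state which filter was used. `window` and `stringency` are assumed parameters, and marking the window's first cell is only one of several conventions.

```python
def filtered_dotplot(seq_a, seq_b, window=3, stringency=2):
    """Window/stringency filtered dot matrix.

    Cell [i][j] is 1 when at least `stringency` of the `window` pairs
    (seq_a[i+k], seq_b[j+k]), k = 0..window-1, are identical. Cells where
    the window would run past the end of either sequence stay 0.
    """
    rows, cols = len(seq_a), len(seq_b)
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows - window + 1):
        for j in range(cols - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= stringency:
                out[i][j] = 1
    return out


for row in filtered_dotplot("ACGTACGT", "ACGT", window=4, stringency=4):
    print("".join("*" if cell else "." for cell in row))
```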
The optimal alignment for a given scoring scheme is computed with dynamic
programming, which also yields the alignment score.
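The dynamic programming step can be sketched with a global (Needleman-Wunsch) alignment that uses a linear gap penalty. The slide does not say whether a global or a local alignment was used. The match, mismatch and gap values are illustrative stand-ins for a real scoring matrix such as PAM.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The match/mismatch/gap values are
    illustrative stand-ins for a substitution matrix such as PAM.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(seq_a), len(seq_b)
    # F[i][j] = best score for aligning seq_a[:i] with seq_b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair(seq_a[i - 1], seq_b[j - 1]),
                          F[i - 1][j] + gap,   # seq_a[i-1] against a gap
                          F[i][j - 1] + gap)   # seq_b[j-1] against a gap
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("GATTACA", "GCATGCA"))
```

The score in `F[n][m]` is the score referred to above. The traceback returns just one of possibly several optimal alignments.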
8 Quantitative Tools to Check Statistical Significance
Extreme value distribution
Score in bits
Simulation with random sequences
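Karlin-Altschul theory predicts that local alignment scores of unrelated sequences follow an extreme value (Gumbel) distribution, $P(S \ge x) \approx 1 - \exp(-K m n e^{-\lambda x})$, for sequence lengths $m$ and $n$. The score in bits is $S' = (\lambda S - \ln K)/\ln 2$, where $\lambda$ and $K$ depend on the scoring system. When $\lambda$ and $K$ are not known, the simulation route can be sketched instead:

1. Score the real pair with a Smith-Waterman recurrence.
2. Score many shuffled copies of one sequence.
3. Report an empirical p-value.

The scores, the number of trials and the choice to shuffle only `seq_b` are assumptions of this sketch.

```python
import random


def local_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for a in seq_a:
        cur = [0] * (len(seq_b) + 1)
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def empirical_p_value(seq_a, seq_b, trials=200, seed=0):
    """Compare the real score with scores against shuffled copies of seq_b.

    Shuffling keeps the residue composition but destroys the order, so the
    shuffled scores sample a null distribution. Returns (score, p) with
    p = (k + 1) / (trials + 1), where k counts shuffled scores >= the real one.
    """
    rng = random.Random(seed)
    observed = local_score(seq_a, seq_b)
    letters = list(seq_b)
    k = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if local_score(seq_a, "".join(letters)) >= observed:
            k += 1
    return observed, (k + 1) / (trials + 1)
```

The `+ 1` terms keep the estimate away from zero, because a finite simulation cannot show that a p-value is smaller than about `1 / (trials + 1)`.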
9 PAM Evolution Model
The score of a pairwise alignment is obtained from a scoring matrix, and
building scoring matrices requires a model. This model is based on
evolution, so it can also be used to calculate evolutionary distances
between species.
PAM stands for Point Accepted Mutation.
10 Step 1: Order of the Amino Acids
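Every 20 x 20 matrix in the following steps needs a fixed row and column order. The ordering shown on the slide cannot be recovered from this text. This sketch therefore assumes the common alphabetical-by-three-letter-code order A R N D C Q E G H I L K M F P S T W Y V, which is the order used, for example, in NCBI-format BLOSUM62 files.

```python
# Assumed order: alphabetical by three-letter code (Ala, Arg, Asn, Asp, Cys, ...).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(protein):
    """Map a one-letter protein string to row/column indices of a 20x20 matrix."""
    return [INDEX[aa] for aa in protein.upper()]


print(encode("MKV"))
```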
11 Step 2: Mutation Matrices
Markov model: $\mathrm{PAM}_X = (\mathrm{PAM}_1)^X$ (stochastic matrices)
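Under the Markov model, $X$ PAM units of evolution are obtained by multiplying the 1-PAM matrix by itself $X$ times. The sketch below makes three assumptions:

- It uses a row-stochastic convention: the row is the original residue, and each row sums to 1. Dayhoff's published tables are usually presented with columns summing to 1 instead.
- `pam1_from_counts` is a hypothetical simplification. It normalizes a symmetric table of pair counts and rescales it so that 1% of residues are expected to change. It does not reproduce Dayhoff's relative-mutability procedure.
- The 3-letter counts are toy numbers.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, x):
    """m raised to the integer power x >= 0, by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while x > 0:
        if x & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        x >>= 1
    return result


def pam1_from_counts(counts):
    """Hypothetical toy 1-PAM matrix from a symmetric table of pair counts.

    Each row is normalised to sum to 1 (row = original residue), then the
    matrix is mixed with the identity so that the expected fraction of
    changed residues, sum_i f_i (1 - M[i][i]), is exactly 1%.
    """
    n = len(counts)
    totals = [sum(row) for row in counts]
    raw = [[counts[i][j] / totals[i] for j in range(n)] for i in range(n)]
    freqs = [t / sum(totals) for t in totals]  # stationary frequencies of raw
    change = sum(freqs[i] * (1 - raw[i][i]) for i in range(n))
    s = 0.01 / change  # assumes change >= 0.01 so that s <= 1
    return [[(1 - s) * (1.0 if i == j else 0.0) + s * raw[i][j] for j in range(n)]
            for i in range(n)]


PAM1 = pam1_from_counts([[8, 1, 1], [1, 6, 1], [1, 1, 4]])
PAM250 = mat_pow(PAM1, 250)
print([round(sum(row), 12) for row in PAM250])  # rows of a stochastic matrix sum to 1
```

Repeated squaring needs only about $\log_2 X$ matrix products, instead of the $X - 1$ products of naive repeated multiplication.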
12 Step 3: Distribution of Amino Acids
The amino-acid frequencies are the eigenvector of the mutation matrix with eigenvalue 1 (the stationary distribution)
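A stochastic matrix always has an eigenvalue equal to 1, and the matching eigenvector, normalized to sum to 1, is the stationary distribution. When rows are the "from" residue, this is a left eigenvector, $f M = f$. The power-iteration sketch below uses a toy 3-letter matrix, not amino-acid data. It assumes the chain is irreducible and aperiodic, so the iteration converges.

```python
def stationary(m, tol=1e-12, max_iter=100_000):
    """Left eigenvector f of a row-stochastic matrix with f M = f and sum(f) = 1.

    Power iteration from the uniform vector. Assumes the chain is irreducible
    and aperiodic, so the iteration converges to a unique distribution.
    """
    n = len(m)
    f = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(f[i] * m[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, f)) < tol:
            return nxt
        f = nxt
    raise RuntimeError("power iteration did not converge")


# Toy 3-letter mutation matrix (illustrative, not amino-acid data).
TOY_M = [[0.8, 0.1, 0.1], [0.125, 0.75, 0.125], [1 / 6, 1 / 6, 2 / 3]]
print(stationary(TOY_M))  # close to [10/24, 8/24, 6/24]
```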
13 Step 4: Evolutionary time vs. sequence
differences
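The model links PAM distance to observed sequence difference. Let $M$ be the 1-PAM matrix and $f$ the stationary frequencies. The expected fraction of identical sites after $X$ PAM units is $\sum_i f_i (M^X)_{ii}$. This starts at 1 and decays towards $\sum_i f_i^2$. Reading the curve backwards turns an observed identity into an estimated distance. The sketch uses a two-letter toy model with exactly 1% change per PAM unit, not amino-acid data.

```python
# Toy two-letter model (illustrative, not amino-acid data): 1% change per PAM unit.
TOY_PAM1 = [[0.99, 0.01], [0.01, 0.99]]
TOY_FREQS = [0.5, 0.5]


def identity_curve(pam1, freqs, max_x):
    """Expected fraction of identical sites after X PAM units, for X = 0..max_x.

    identity(X) = sum_i f_i * (PAM1^X)[i][i]; PAM1^X is built up one
    multiplication at a time.
    """
    n = len(pam1)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # PAM1^0
    curve = []
    for _ in range(max_x + 1):
        curve.append(sum(freqs[i] * power[i][i] for i in range(n)))
        power = [[sum(power[i][k] * pam1[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return curve


def distance_from_identity(curve, observed_identity):
    """Smallest X whose expected identity is <= the observed identity, or None."""
    for x, value in enumerate(curve):
        if value <= observed_identity:
            return x
    return None  # more divergent than the tabulated range


curve = identity_curve(TOY_PAM1, TOY_FREQS, 100)
print(distance_from_identity(curve, 0.95))
```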
14 Step 5: Scoring Matrix
The Dayhoff scoring matrix is symmetric
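Log-odds scores compare two probabilities: that residues $i$ and $j$ are aligned after $X$ PAM units, and that the pair occurs by chance. The score is $S_{ij} = 10 \log_{10}\big((M^X)_{ij} / f_j\big)$, the scale Dayhoff used before rounding to integers.

The matrix is symmetric whenever the model is time-reversible, $f_i M_{ij} = f_j M_{ji}$. This holds for a matrix built from symmetric counts of accepted mutations. In the sketch, a toy reversible 3-letter matrix stands in for $M^X$; the numbers are illustrative, not Dayhoff's data.

```python
import math

# Toy reversible 3-letter model (illustrative numbers, not Dayhoff's data).
# M[i][j] = A[i][j] / (row sum i) for a symmetric count table A, so that
# f_i M[i][j] = f_j M[j][i] with f_i proportional to the row sums.
COUNTS = [[8, 1, 1], [1, 6, 1], [1, 1, 4]]
ROW_SUMS = [sum(row) for row in COUNTS]
TOY_M = [[c / ROW_SUMS[i] for c in row] for i, row in enumerate(COUNTS)]
TOY_F = [s / sum(ROW_SUMS) for s in ROW_SUMS]


def log_odds(pam_x, freqs, scale=10.0):
    """S[i][j] = scale * log10(PAM_X[i][j] / f_j), before any integer rounding.

    PAM_X[i][j] / f_j equals f_i PAM_X[i][j] / (f_i f_j): the probability of
    the aligned pair (i, j) under the model divided by its chance probability.
    """
    n = len(pam_x)
    return [[scale * math.log10(pam_x[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]


for row in log_odds(TOY_M, TOY_F):
    print(" ".join(f"{s:6.2f}" for s in row))
```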
15 Tree Construction (1): Evolutionary distance
calculations
Maximum Likelihood
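One way to estimate the evolutionary distance by maximum likelihood is a grid search. For each candidate $X$, compute the log-likelihood of the aligned residue pairs under $M^X$, and keep the best $X$.

The sketch makes four assumptions:

- a gap-free alignment given as index pairs;
- a row-stochastic 1-PAM matrix;
- integer distances from 1 to `max_x`;
- the same two-letter toy model as a stand-in for amino-acid data.

Darwin's own routine is not reproduced here.

```python
import math

# Toy two-letter model (illustrative, not amino-acid data): 1% change per PAM unit.
TOY_PAM1 = [[0.99, 0.01], [0.01, 0.99]]


def ml_pam_distance(pairs, pam1, max_x=500):
    """Maximum-likelihood PAM distance for a gap-free alignment (grid search).

    `pairs` lists aligned residue index pairs (i, j). For each X the
    log-likelihood is sum log PAM1^X[i][j]; the frequency factor f_i does not
    depend on X and is dropped. Returns the integer X in 1..max_x with the
    largest log-likelihood.
    """
    n = len(pam1)
    power = [row[:] for row in pam1]  # PAM1^1
    best_x, best_ll = None, -math.inf
    for x in range(1, max_x + 1):
        ll = sum(math.log(power[i][j]) for i, j in pairs)
        if ll > best_ll:
            best_x, best_ll = x, ll
        power = [[sum(power[i][k] * pam1[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return best_x


# 90 identical and 10 different aligned positions.
print(ml_pam_distance([(0, 0)] * 90 + [(0, 1)] * 10, TOY_PAM1, 100))
```

A finer grid, or a one-dimensional optimizer over real-valued $X$, would give non-integer distances.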
16 Tree Construction (2): Table of distances
17 Tree Construction (3): Neighbor-joining algorithm
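The neighbor-joining algorithm (Saitou and Nei) repeats three operations until two nodes remain:

1. Join the pair of nodes that minimizes the Q-criterion, $Q(a,b) = (n-2)\,d(a,b) - \sum_k d(a,k) - \sum_k d(b,k)$.
2. Assign branch lengths to the two joined children.
3. Replace the two children by a new node $u$ whose distance to every other node $c$ is $\tfrac{1}{2}\big(d(a,c) + d(b,c) - d(a,b)\big)$.

The sketch works on a small example table, not data from the slides. Generated node names such as `node1` are illustrative.

```python
def table_from_pairs(pairs):
    """Expand {(x, y): distance} into a symmetric dict-of-dicts."""
    table = {}
    for (x, y), value in pairs.items():
        table.setdefault(x, {})[y] = value
        table.setdefault(y, {})[x] = value
    return table


def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric distance table (dict-of-dicts).

    Returns (joins, last_edge): each join is (new_node, a, len_a, b, len_b);
    last_edge = (x, y, length) connects the final two nodes.
    """
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising the Q-criterion (first pair wins ties).
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from a and b to their new parent u.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = f"node{len(joins) + 1}"
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        joins.append((u, a, len_a, b, len_b))
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    x, y = nodes
    return joins, (x, y, d[x][y])


EXAMPLE = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
           ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
           ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
joins, last = neighbor_joining(list("abcde"), table_from_pairs(EXAMPLE))
for join in joins:
    print(join)
print(last)
```

The result is an unrooted tree. The final edge simply connects the last two nodes, and rooting (for example at the midpoint) is a separate step.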
18 Unified approach for the sequence alignment and
structure prediction
19 Conclusions
- The highly efficient dynamic programming algorithms used in this
integrated environment are particularly well suited to high-performance
computers.
- Trees constructed from optimal PAM distances are better than those built
from routine distance scores obtained with a single scoring matrix.
- The unified approach to sequence alignment and structure prediction
provides a powerful formalism for biologists.
20 ASCC, Northeastern University
21 Northeastern University (NU)/Hewlett-Packard (HP) Company Collaborative
Research Program on Bioinformatics
- Bernardo Barbiellini, Assoc. Director, ASCC
- Arun Bansil, Professor of Physics, Director, ASCC
- Bill Detrich, Prof. Biochem. and Marine Biology, Director, Bioinformatics M.S.
- Kostia Bergman, Prof. Biology
- Mike Malioutov, Stone Professor of Applied Statistics
- Mary Jo Ondrechen, Professor of Chemistry
- Nagarajan Sankrithi, graduate student, NU
- Imtiaz Khan, graduate student, NU
- Alper Uzun, graduate student, NU
- Larry Weissman, staff, HP/Compaq
- Barry Latham, staff, HP/Compaq
- Bob Morgan, staff, HP/Compaq
22 Other Bioinformatics activities at ASCC
- BIO3580 DNA and Protein Sequence Analysis (2001, 2002)
- MATLAB Bioinformatics Tool presentation (Robert Henson)
- Summer Institute of Mathematical Studies on Bioinformatics (2002) (Professor Mike Malioutov)
- Student projects proposed by Dr. Matteo Pellegrini (Protein Pathways/UCLA)