Interactive tools and programming environments for sequence analysis

Provided by: bioin1
Learn more at: http://nuweb.neu.edu

1
Interactive tools and programming environments
for sequence analysis
TATACATAAAGACCCAAATGGAACTGTTCTAGATGATACACTAGCATTAA
GAGAAAAATTCGAAGAATCAGTCGATAAATACAAACTTCATTTTACTGGA
TTAATCGCTGACAAAATTGCAAAAGAAAAACTGAATACTTACGTCCTCAC
TTATAAAAAAGCAGACGAAGCTATGCCTGCAGACGAAGCTATGCCAACTG
ATGTACCTAGTACTTCTGTTACTGGATCAACAATGGCAAAC.
  • Bernardo Barbiellini
  • Northeastern University

2
Overview
  • Matlab and Darwin bioinformatics tools
  • Dotplot and statistical significance of alignments
  • Scoring matrices from an evolution model
  • Evolutionary distances and phylogenetic trees
  • Unified approach for the sequence alignment and
    structure prediction

3
Matlab toolbox and Darwin
  • Computer language appropriate for bioinformatics
  • A workbench to automate repetitive tasks
  • Based on linear algebra and statistics
  • Matlab toolbox developed by MathWorks
  • Darwin developed by Gaston Gonnet (ETHZ)

4
Extra features
  • Loading of and retrieval in sequence databases
  • Fast searching for sequence fragments
  • Sequence alignment
  • Generation of random sequences, distributions and
    mutations
  • Creation of Phylogenetic trees
  • Plotting functions - matrix and vector arithmetic
  • I/O: communicate with other programs

5
Calling Bioperl functions in MATLAB
Documentation by Brian Madsen (NU and co-op at the MathWorks)
>> help perl
PERL calls perl script using appropriate operating system
PERL(PERLFILE) calls perl script specified by the file PERLFILE using appropriate perl executable.
PERL(PERLFILE,ARG1,ARG2,...) passes the arguments ARG1,ARG2,... to the perl script file PERLFILE, and calls it by using appropriate perl executable.
RESULT=PERL(...) outputs the result of attempted perl call.
6
Visual Tool Dotplot (1)
Pairwise sequence comparison
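A dotplot marks every position pair where the two sequences carry the same letter, so shared regions show up as diagonals. The following is a minimal sketch of that idea, not the Matlab or Darwin tool itself; the function names `dotplot` and `render` and the example sequences are hypothetical.

```python
def dotplot(seq_a, seq_b):
    """Return a matrix with 1 where seq_a[i] == seq_b[j], else 0."""
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw the matrix as text: one row per letter of the first sequence."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    # Two made-up sequences that differ at one position.
    print(render(dotplot("GATTACA", "GATCACA")))
```

Real tools usually filter this raw matrix with a sliding window to suppress isolated chance matches.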
7
Visual Tool Dotplot (2)
Filtered Image
The best alignment is obtained by dynamic
programming, which also yields an alignment score.
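How dynamic programming produces an alignment score can be sketched with the Needleman-Wunsch recurrence for global alignment. This is an illustration only: the slides do not say which variant is used, and the function name and the match, mismatch and gap values here are assumptions.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment (linear gap cost)."""
    # prev[j] = best score for the processed prefix of seq_a against seq_b[:j].
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * gap]  # aligning seq_a[:i] against an empty prefix
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)  # align a with b
            up = prev[j] + gap        # a aligned to a gap
            left = curr[j - 1] + gap  # b aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATCACA"))
```

Only two rows of the table are kept because the sketch returns the score alone; recovering the alignment itself would need the full table or traceback pointers.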
8
Quantitative Tools to Check Statistical
Significance
Extreme value distribution
Score in bits
Simulation with random sequences
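The simulation with random sequences can be sketched as a shuffling test: score the real pair, then score many shuffled versions of one sequence and count how often chance does as well. This sketch assumes a simple local (Smith-Waterman) score with made-up parameters. It does not fit the extreme value distribution or convert scores to bits; it only gives an empirical p-value. All names are hypothetical.

```python
import random


def local_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score of the best local alignment (linear gap cost)."""
    best = 0
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            score = max(
                0,  # a local alignment may restart anywhere
                prev[j - 1] + (match if a == b else mismatch),
                prev[j] + gap,
                curr[j - 1] + gap,
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def empirical_p_value(seq_a, seq_b, trials=200, seed=0):
    """Return (observed score, fraction of shuffles scoring at least as high)."""
    rng = random.Random(seed)
    observed = local_alignment_score(seq_a, seq_b)
    letters = list(seq_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)  # keeps the composition, destroys the order
        if local_alignment_score(seq_a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)
```

Shuffling preserves the letter composition, so the null scores reflect what unrelated sequences of the same makeup would produce.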
9
PAM Evolution Model
The score of a pairwise alignment is obtained by
using a scoring matrix. We need a model to build
scoring matrices. This model is based on
evolution, so it can also be used to calculate
evolutionary distances between species.
PAM stands for Point Accepted Mutation
10
Step 1 Order of the Amino Acids
11
Step 2 Mutation Matrices
Markov model: pamX = (pam1)^X (stochastic matrices)
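In the Markov model, the mutation matrix for X PAM units is the 1-PAM matrix raised to the power X. A minimal sketch of that matrix power follows. The 2-letter matrix is a hypothetical toy, not the real 20x20 pam1, and the column convention (entry [i][j] is the probability that letter j becomes letter i) is an assumption.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(m, x):
    """Compute m**x for a non-negative integer x by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while x > 0:
        if x & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        x >>= 1
    return result


# Hypothetical 2-letter "pam1": each column sums to 1 (stochastic matrix).
TOY_PAM1 = [[0.9925, 0.0150],
            [0.0075, 0.9850]]

if __name__ == "__main__":
    for row in mat_pow(TOY_PAM1, 250):
        print(["%.4f" % v for v in row])
```

Because a product of stochastic matrices is again stochastic, every power keeps columns that sum to 1.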
12
Step 3 Distribution of Amino Acids
Eigenvector of the mutation matrix (eigenvalue 1)
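The eigenvector with eigenvalue 1 is the distribution that the mutation matrix leaves unchanged. One simple way to find it is power iteration: apply the matrix repeatedly to any starting distribution until it stops changing. The sketch below assumes a column-stochastic matrix and uses a hypothetical 2-letter toy in place of the real 20x20 matrix.

```python
def stationary_distribution(m, tol=1e-12, max_iter=100000):
    """Power iteration for f with M f = f (M column-stochastic)."""
    n = len(m)
    f = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(m[i][j] * f[j] for j in range(n)) for i in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, f)) < tol:
            return nxt
        f = nxt
    return f


# Hypothetical 2-letter mutation matrix; entry [i][j] = P(letter j -> letter i).
TOY_PAM1 = [[0.9925, 0.0150],
            [0.0075, 0.9850]]

if __name__ == "__main__":
    print(stationary_distribution(TOY_PAM1))
```

For this toy matrix the balance condition 0.0075 f0 = 0.015 f1 gives f = (2/3, 1/3), which the iteration approaches.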
13
Step 4 Evolutionary time vs. sequence
differences
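The link between evolutionary time and observed differences can be sketched as follows: after X PAM units, the expected fraction of positions that differ is 1 minus the sum over letters of f_i times the i-th diagonal entry of pam1^X. The 2-letter matrix and frequencies are a hypothetical toy, chosen so that one PAM unit changes 1% of positions.

```python
def expected_difference(m, f, x):
    """Expected fraction of differing positions after x steps of matrix m."""
    n = len(m)
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(x):
        # p becomes m * p, i.e. one more PAM unit of evolution.
        p = [[sum(m[i][k] * p[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return 1.0 - sum(f[i] * p[i][i] for i in range(n))


# Hypothetical toy: entry [i][j] = P(letter j -> letter i), with frequencies f.
TOY_PAM1 = [[0.9925, 0.0150],
            [0.0075, 0.9850]]
TOY_FREQ = [2 / 3, 1 / 3]

if __name__ == "__main__":
    for x in (1, 10, 50, 100, 250):
        print(x, round(expected_difference(TOY_PAM1, TOY_FREQ, x), 4))
```

The curve is not linear: repeated and reverse mutations hide earlier changes, so the observed difference saturates (here at 1 - (4/9 + 1/9) = 4/9) while the PAM distance keeps growing.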
14
Step 5 Scoring Matrix
The Dayhoff scoring matrix is symmetric
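A sketch of how a scoring matrix can be derived from the mutation matrix, assuming the common log-odds form score(i, j) = 10 log10( (pam1^X)[i][j] / f_i ). The scaling factor, the column convention and the 2-letter toy data are assumptions, not values from the slides. The matrix comes out symmetric because the model satisfies f_j M[i][j] = f_i M[j][i].

```python
import math


def matrix_power(m, x):
    """m**x for a positive integer x by repeated multiplication."""
    n = len(m)
    p = [row[:] for row in m]
    for _ in range(x - 1):
        p = [[sum(m[i][k] * p[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return p


def dayhoff_scores(pam1, f, x):
    """Log-odds scores at PAM distance x, in units of 10*log10."""
    px = matrix_power(pam1, x)
    n = len(f)
    return [[10.0 * math.log10(px[i][j] / f[i]) for j in range(n)] for i in range(n)]


# Hypothetical toy: entry [i][j] = P(letter j -> letter i), with frequencies f.
TOY_PAM1 = [[0.9925, 0.0150],
            [0.0075, 0.9850]]
TOY_FREQ = [2 / 3, 1 / 3]

if __name__ == "__main__":
    for row in dayhoff_scores(TOY_PAM1, TOY_FREQ, 250):
        print(["%.3f" % v for v in row])
```

Positive entries mark pairs seen more often than chance at that distance; negative entries mark pairs seen less often.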
15
Tree Construction 1: Evolutionary distance
calculations
Maximum Likelihood
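A maximum likelihood distance can be sketched as a search over PAM distances: for each X, compute the probability of the aligned letter pairs under pam1^X and keep the X with the highest likelihood. The sketch assumes an ungapped alignment of equal-length sequences, a hypothetical 2-letter alphabet and toy matrix, and a plain linear scan; it is not the procedure used in Darwin.

```python
import math

# Hypothetical toy model: entry [i][j] = P(letter j -> letter i).
ALPHABET = "AB"
TOY_PAM1 = [[0.9925, 0.0150],
            [0.0075, 0.9850]]
TOY_FREQ = [2 / 3, 1 / 3]


def ml_pam_distance(aligned_a, aligned_b, max_pam=500):
    """Return the integer PAM distance in 1..max_pam with the highest likelihood."""
    index = {c: k for k, c in enumerate(ALPHABET)}
    pairs = [(index[a], index[b]) for a, b in zip(aligned_a, aligned_b)]
    n = len(ALPHABET)
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    best_x, best_ll = None, -math.inf
    for x in range(1, max_pam + 1):
        # p becomes pam1**x.
        p = [[sum(TOY_PAM1[i][k] * p[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Letter j occurs with frequency f_j and becomes letter i with probability p[i][j].
        ll = sum(math.log(TOY_FREQ[j] * p[i][j]) for i, j in pairs)
        if ll > best_ll:
            best_x, best_ll = x, ll
    return best_x
```

Identical sequences give the smallest distance in the scanned range, and the estimate grows as more aligned positions differ.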
16
Tree Construction 2: Table of distances

17
Tree Construction 3: Neighbor joining algorithm
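Neighbor joining builds a tree from the table of distances by repeatedly joining the pair of nodes that minimises Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), where r is the row sum. The following is a minimal sketch of the standard algorithm; the function name, the `nodeK` labels for internal nodes and the edge-list output are choices made here for illustration.

```python
def neighbor_joining(names, dist):
    """Return the NJ tree as a list of (node, neighbour, branch_length) edges."""
    # Dict-of-dicts copy so that clusters can be added by name.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pair with the smallest Q value; ties go to the first pair found.
        a, b = min(
            ((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - totals[p[0]] - totals[p[1]],
        )
        counter += 1
        u = "node%d" % counter
        len_a = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges += [(a, u, len_a), (b, u, len_b)]
        # Distances from the new internal node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    table = [[0, 5, 9, 9, 8],
             [5, 0, 10, 10, 9],
             [9, 10, 0, 8, 7],
             [9, 10, 8, 0, 3],
             [8, 9, 7, 3, 0]]
    for edge in neighbor_joining(taxa, table):
        print(edge)
```

The example table is a made-up additive distance matrix, so the branch lengths reproduce the input distances exactly; real PAM distance tables are only approximately additive.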
18
Unified approach for the sequence alignment and
structure prediction
19
Conclusions
  • The highly efficient dynamic programming
    algorithms used in this integrated environment
    are particularly well suited to
    high-performance computers.
  • Trees constructed using optimal PAM distances
    are better than those built from the routine
    distance scores obtained using a single scoring matrix.
  • The unified approach for the sequence alignment
    and structure prediction provides a powerful
    formalism for biologists.

20
ASCC Northeastern University
21
Northeastern University (NU)/Hewlett-Packard (HP)
Company Collaborative Research Program on
Bioinformatics
  • Bernardo Barbiellini, Assoc. Director, ASCC
  • Arun Bansil, Professor of Physics, Director, ASCC
  • Bill Detrich, Prof. Biochem. and Marine Biology,
    Director, Bioinformatics M.S.
  • Kostia Bergman, Prof. Biology
  • Mike Malioutov, Stone Professor of Applied Statistics
  • Mary Jo Ondrechen, Professor of Chemistry
  • Nagarajan Sankrithi, graduate student NU
  • Imtiaz Khan, graduate student NU
  • Alper Uzun, graduate student NU
  • Larry Weissman, staff HP/Compaq
  • Barry Latham, staff HP/Compaq
  • Bob Morgan, staff HP/Compaq
22
Other Bioinformatics activities at ASCC
  • BIO3580 DNA and Protein Sequence Analysis (2001,
    2002)
  • MATLAB BIOINFORMATICS TOOL presentation (Robert
    Henson)
  • Summer Institute of Mathematical Studies on
    Bioinformatics (2002) (Professor Mike Malioutov)
  • Student projects proposed by Dr. Matteo
    Pellegrini (Proteinpathways/UCLA)