Basic Overview of Bioinformatics Tools and Biocomputing Applications I - PowerPoint PPT Presentation

Provided by: cac6


1
Basic Overview of Bioinformatics Tools and
Biocomputing Applications I
  • Dr Tan Tin Wee
  • Director
  • Bioinformatics Centre

2
Software Tools
  • Data stored in retrievable forms in database
    systems
  • Data generated by machines: DNA/protein
    sequencers and other automated systems

[Figure: diagram linking Automated Machines, Research Labs, Biological Data, Databases, Analytical Tools and New Knowledge]
3
Common Computational Analyses
  • Sequence Assembly
  • Simple sequence analysis
  • Translation, reverse complement, ORF detection
  • Composition statistics (protein and DNA)
  • Molecular mass
  • Total charge and pI; local hydropathy
  • Simple determination of secondary structures
  • Restriction site analysis
  • Internal repeat analysis
  • Detection of active sites, functional residues,
    characteristic structures, substrates, and
    processing signals

4
Common Computational Analyses
  • Database sequence search
  • Multiple alignment
  • Secondary and tertiary structure prediction;
    transmembrane helix detection
  • Structure modeling
  • Docking prediction and design
  • Hidden Markov model searches

5
Sequence Assembly
  • Fragmented data from DNA sequencers
  • Detection of Overlap
  • Merging of Contigs
  • Assembly into continuous sequence
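The assembly steps above (detect overlaps, merge, assemble into one sequence) can be sketched as a toy greedy assembler. This is a minimal illustration under simplifying assumptions, not how production assemblers work: reads are assumed error-free and all from one strand, and `overlap`, `greedy_assemble` and the `min_len` cutoff are hypothetical names chosen for this example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAA']
```

Greedy merging is easy to follow but can mis-assemble repeated regions, and it ignores sequencing errors and reverse-complemented reads; practical assemblers use overlap-graph or de Bruijn-graph methods instead.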

6
Sequence Format Interconversion
  • DNA/Protein and other sequence data come in
    different formats.
  • Annotations
  • Different programs use different formats
  • Interconversion utility tools
  • e.g. READSEQ, TOGCG, TOSTADEN
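As a small illustration of format interconversion, the sketch below reads FASTA, rewraps it at a chosen line width, and writes a two-column, tab-separated layout. It does not reproduce READSEQ, TOGCG or TOSTADEN: the tabular format and the function names are assumptions made for this example, and annotations beyond the FASTA header line are not handled.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; the header excludes '>'."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records, width: int = 60) -> str:
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


def write_tabular(records) -> str:
    """Hypothetical one-record-per-line format: first word of header <TAB> sequence."""
    return "".join(f"{(h.split() or [''])[0]}\t{s}\n" for h, s in records)


recs = read_fasta(">s1 demo\nACGT\nAC\n>s2\nGG\n")
print(write_tabular(recs), end="")
```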

7
Simple Sequence Analysis
1. Linear sequence, e.g. DNA/protein
2. Open a window of n residues (n ≥ 1, variable)
   and slide it along the sequence
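The sliding window in step 2 is the common core of many of the analyses that follow. A minimal Python sketch, assuming the sequence is a plain string; `windows` and `gc_profile` are illustrative names, and GC fraction is just one example of a statistic computed per window.

```python
from typing import Iterator


def windows(seq: str, n: int, step: int = 1) -> Iterator[tuple[int, str]]:
    """Yield (start, window) for every window of size n, sliding by `step`."""
    if n < 1:
        raise ValueError("window size must be at least 1")
    for start in range(0, len(seq) - n + 1, step):
        yield start, seq[start:start + n]


def gc_profile(seq: str, n: int) -> list[float]:
    """Example per-window statistic: fraction of G or C in each window."""
    return [sum(b in "GCgc" for b in w) / n for _, w in windows(seq, n)]


print(list(windows("ACGTA", 3)))  # [(0, 'ACG'), (1, 'CGT'), (2, 'GTA')]
print(gc_profile("GGAA", 2))      # [1.0, 0.5, 0.0]
```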
8
Some Simple Sequence Analysis Applications
  • DNA complementary strand, e.g. COMPLEMENT REVERSE
  • Open a window of size 1
  • A → T
  • C → G
  • T → A
  • G → C
  • Slide to the next window of size 1
  • Proceed to the end of the sequence
  • Reverse the order of the complement
  • 5' ...ATCTCGATACTACTACG...3'
  • 3' ...TAGAGCTATGATGATGC...5'
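The complement-then-reverse procedure fits in a few lines of Python. The translation table covers only A, C, G and T; other characters, such as IUPAC ambiguity codes, pass through unchanged. The function names are illustrative, not those of the COMPLEMENT program.

```python
# Base-pairing table for the four DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Window of size 1: replace each base by its partner (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement, then reverse, so the result reads 5' to 3'."""
    return complement(seq)[::-1]


s = "ATCTCGATACTACTACG"
print(complement(s))          # TAGAGCTATGATGATGC  (the 3'->5' strand shown above)
print(reverse_complement(s))  # CGTAGTAGTATCGAGAT
```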

9
Some Simple Sequence Analysis Applications
  • DNA-to-protein sequence translation, e.g.
    TRANSLATE
  • Open a window of 3 bases
  • Look up the codon table (genetic code)
  • Assign the amino acid residue
  • Slide the window to the next 3 bases
  • Proceed until a stop codon is detected
  • Repeat the whole procedure for all six reading frames

[Figure: the sequence ATACTACTGAGATCTAGGCTAGTACTGCGTGCG read in Frames 1, 2 and 3; Frames 4-6 are read on the complementary strand]
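A sketch of the translation procedure, assuming the standard genetic code and taking frames 4-6 to be frames 1-3 of the reverse complement (programs differ in how they number and read the complementary frames). Rather than stopping at the first stop codon, this version translates each whole frame and marks stops with `*`, which makes the six frames easy to compare; codons containing other characters become `X`. The function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMP = str.maketrans("ACGT", "TGCA")


def translate(seq: str, frame: int = 0) -> str:
    """Slide a 3-base window from offset `frame`, looking up each codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[int, str]:
    """Frames 1-3 on the given strand, frames 4-6 on the reverse complement."""
    rc = seq.upper().translate(COMP)[::-1]
    frames = {f + 1: translate(seq, f) for f in range(3)}
    frames.update({f + 4: translate(rc, f) for f in range(3)})
    return frames


for number, protein in six_frames("ATACTACTGAGATCTAGGCTAGTACTGCGTGCG").items():
    print(number, protein)
```

For the example sequence in the figure, frame 1 translates to `ILLRSRLVLRA`.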
10
Some Simple Sequence Analysis Applications
  • Detect open reading frames, e.g. ORF
  • Translate the sequence; report long stretches
    between start and stop codons
  • Compositional analysis
  • e.g. calculate the totals of A, T, G and C
  • e.g. calculate the total molecular mass of a protein
    and the percentage of each amino acid
  • e.g. total charge composition, pI
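A minimal sketch of ORF detection and base composition on the given strand only, assuming an ORF is an ATG followed by in-frame codons up to the first stop codon. `find_orfs`, `min_codons` and the other names are hypothetical; ORFs nested inside a longer one in the same frame are skipped, and a full search would also scan the reverse complement.

```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) slices of ATG...stop ORFs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 > len(seq):
                    break  # ran off the end without a stop codon
                if (j - i) // 3 >= min_codons:  # codons before the stop
                    orfs.append((i, j + 3))  # end includes the stop codon
                i = j + 3  # resume scanning after this ORF
            else:
                i += 3
    return orfs


def base_composition(seq: str) -> dict[str, int]:
    """Count each base (the 'total A, T, G, C' step)."""
    return dict(Counter(seq.upper()))


def gc_content(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


s = "CCATGAAATTTGGGTAACC"
print([s[a:b] for a, b in find_orfs(s)])  # ['ATGAAATTTGGGTAA']
```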

11
Some Simple Sequence Analysis Applications
  • Simple prediction of the secondary structure of a
    protein sequence
  • Decide on a window size
  • For each window of amino acids, compute the
    statistical potential to form a helix, beta sheet,
    turn, etc. (e.g. the Chou-Fasman and GOR algorithms)
  • Use a statistical potential chart
  • Plot the potentials in graphical or pictorial format
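The window-and-potential idea can be sketched as a moving average of per-residue helix propensities. This is a simplification, not the full Chou-Fasman or GOR method: Chou-Fasman also applies nucleation and extension rules, and GOR uses information-theoretic parameters. The P(α) values below are the commonly tabulated Chou-Fasman helix parameters and should be checked against a primary source; the window size of 6, the threshold of 1.0 and the function names are illustrative choices.

```python
# Chou-Fasman helix propensities P(alpha), as commonly tabulated (one-letter codes).
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13, "Q": 1.11,
    "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83,
    "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def propensity_profile(protein: str, window: int = 6, table=P_ALPHA) -> list[float]:
    """Mean propensity of each window (window i covers residues i .. i+window-1)."""
    values = [table[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def helix_windows(protein: str, window: int = 6, threshold: float = 1.0) -> list[int]:
    """Start positions of windows whose mean P(alpha) exceeds `threshold`."""
    profile = propensity_profile(protein, window)
    return [i for i, p in enumerate(profile) if p > threshold]


print(helix_windows("EEEEEEGGGGGG"))  # [0, 1, 2, 3]
```

Plotting `propensity_profile` against residue position gives the graphical display described in the last step.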

12
Some Simple Sequence Analysis Applications
  • Restriction mapping, e.g. MAP, MAPPLOT, MAPSORT,
    PLASMIDMAP
  • Table of restriction enzymes and their cut sites,
    e.g. EcoRI, BamHI, AluI; EcoRI recognises GAATTC
    and cuts it as G^AATTC, leaving an AATT overhang
  • Take a DNA sequence
  • Pattern-match it against the list of cut sites
  • For each match, assign the restriction enzyme
  • Calculate the distances between cut sites
  • Display as a table, a graph, or a restriction map

[Figure: simulated gel and plasmid map]
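A sketch of the restriction-mapping steps, assuming a linear DNA molecule and palindromic recognition sites, so that searching the top strand alone finds every site. The three enzymes, their sites and top-strand cut positions (EcoRI G^AATTC, BamHI G^GATCC, AluI AG^CT) are standard; the table layout and function names are illustrative. A circular plasmid would need the first and last fragments joined.

```python
# Recognition site and cut offset (cut position on the top strand, counted
# from the first base of the site).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "AluI": ("AGCT", 2),
}


def find_cuts(seq: str, enzymes=ENZYMES) -> list[tuple[int, str]]:
    """Pattern-match every site; return sorted (cut_position, enzyme) pairs."""
    seq = seq.upper()
    cuts = []
    for name, (site, offset) in enzymes.items():
        start = seq.find(site)
        while start != -1:
            cuts.append((start + offset, name))
            start = seq.find(site, start + 1)  # allow overlapping sites
    return sorted(cuts)


def fragment_lengths(seq_len: int, cut_positions: list[int]) -> list[int]:
    """Distances between successive cuts (fragment sizes) for a linear molecule."""
    bounds = [0] + sorted(set(cut_positions)) + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


dna = "AAGAATTCAAAGGATCCAA"
cuts = find_cuts(dna)
print(cuts)                                              # [(3, 'EcoRI'), (12, 'BamHI')]
print(fragment_lengths(len(dna), [c for c, _ in cuts]))  # [3, 9, 7]
```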
13
Some Simple Sequence Analysis Applications
  • Protein sequence motif pattern matching, e.g.
    PROSITEMAP, MOTIFS, BLOCKS
  • Table/database of sequence patterns/motifs and
    their signature sequences, e.g. Arg-Gly-Asp (RGD),
    or consensus sequences (e.g. PROSITE, BLOCKS db)
  • Take the protein sequence
  • Pattern-match it against the list of signature sites
  • For each match, assign a potential function
    according to the database
  • Display in a table, graphically, or hyperlinked
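Motif matching of this kind is often done with regular expressions. The sketch below translates a basic subset of PROSITE pattern syntax (`-` separators, `x`, `[...]`, `{...}`, `(n)` and `(n,m)` repeats, and the `<`/`>` terminal anchors) into a Python regex. It is a simplified converter written for this example, not the PROSITE tools themselves, and it ignores rarer syntax such as `>` inside brackets. The example patterns are RGD and the N-glycosylation pattern `N-{P}-[ST]-{P}`; the function names are illustrative.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern (e.g. 'N-{P}-[ST]-{P}') into a Python regex."""
    p = pattern.rstrip(".")
    p = p.replace("{", "[^").replace("}", "]")  # {P} = any residue except P
    p = p.replace("(", "{").replace(")", "}")   # x(2,4) = 2 to 4 residues
    p = p.replace("-", "").replace("x", ".")    # '-' separates elements; x = any
    return p.replace("<", "^").replace(">", "$")  # N- and C-terminal anchors


def find_motifs(protein: str, patterns: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched text) for every hit, overlaps included."""
    hits = []
    for name, pattern in patterns.items():
        # A lookahead with a capture group lets matches overlap.
        regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
        for m in regex.finditer(protein.upper()):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda h: h[1])


MOTIFS = {"RGD": "R-G-D", "N-glycosylation": "N-{P}-[ST]-{P}"}
print(find_motifs("KRGDSANGSA", MOTIFS))
# [('RGD', 1, 'RGD'), ('N-glycosylation', 6, 'NGSA')]
```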

14
Some Simple Sequence Analysis Applications
  • Peptide cleavage maps, e.g. PEPTIDESORT,
    PEPTIDEMAP
  • Table of proteases vs cleavage sites, e.g. trypsin,
    chymotrypsin, and chemical cleavage sites such as
    cyanogen bromide
  • Pattern-match against the entire protein sequence
  • Calculate the sizes of the peptide fragments
  • Sort and map; plot as electrophoretic patterns on
    a log-linear simulated digest
  • Compute partial digest patterns
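The cleavage-map steps can be sketched with one rule function per reagent. The rules encoded here are textbook simplifications (trypsin cuts after K or R unless the next residue is P; cyanogen bromide cuts after M), and a chymotrypsin rule could be added the same way. Fragment sizes are given as residue counts, since masses would need a residue mass table not included here, and `partial_digest` models incomplete digestion as up to `max_missed` missed cleavages. All names are illustrative.

```python
def trypsin(seq: str, i: int) -> bool:
    """Cut after K or R, except when the next residue is P."""
    return seq[i] in "KR" and seq[i + 1] != "P"


def cyanogen_bromide(seq: str, i: int) -> bool:
    """Chemical cleavage after methionine."""
    return seq[i] == "M"


def digest(seq: str, rule) -> list[str]:
    """Complete digest: split after every position where `rule` says to cut."""
    fragments, start = [], 0
    for i in range(len(seq) - 1):  # never cut after the last residue
        if rule(seq, i):
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


def partial_digest(fragments: list[str], max_missed: int = 1) -> list[str]:
    """Products with up to `max_missed` missed cleavages (adjacent fragments joined)."""
    return [
        "".join(fragments[i:i + k + 1])
        for i in range(len(fragments))
        for k in range(max_missed + 1)
        if i + k < len(fragments)
    ]


frags = digest("AKPGRAAKAA", trypsin)
print(frags, [len(f) for f in frags])  # ['AKPGR', 'AAK', 'AA'] [5, 3, 2]
print(partial_digest(frags))           # ['AKPGR', 'AKPGRAAK', 'AAK', 'AAKAA', 'AA']
```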

15
Some Simple Sequence Analysis Applications
  • DOTPLOT: self-comparison
  • Choose a window size
  • Compare the window against the entire length of
    the same sequence
  • Report matches above a threshold
  • Plot them on a graph
  • Slide the window and repeat until the end of the
    sequence
  • Detection of internal repeats
  • Pairwise comparison: detection of homology

[Figure: dot plot of Sequence A against itself]
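A sketch of the windowed dot-plot procedure, assuming identity scoring: a dot is recorded at (i, j) when the windows starting at those positions share at least `threshold` identical residues. Comparing a sequence with itself gives the main diagonal plus off-diagonal runs for internal repeats; passing two different sequences gives the pairwise comparison. `dotplot` and `render` are illustrative names.

```python
def dotplot(a: str, b: str, window: int = 3, threshold: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) where a[i:i+window] and b[j:j+window] share >= threshold identities."""
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= threshold:
                dots.append((i, j))
    return dots


def render(a: str, b: str, dots) -> str:
    """Crude text plot: rows follow sequence a, columns follow sequence b."""
    grid = [["."] * len(b) for _ in a]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


seq = "ACGTTACGTT"  # contains the repeat ACGTT twice
print(render(seq, seq, dotplot(seq, seq, window=3, threshold=3)))
```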
16
Some Simple Sequence Analysis Applications
  • RNA secondary structure analysis
  • Mfold, PlotFold, FoldRNA, Squiggles, Circles,
    Domes, Mountains, StemLoop
  • Folding of RNA into stems and loops
  • Calculation of energy to predict the stability
    of the structure
  • Display of the structure and alternative structures

[Figure: RNA sequence folded into a stem-loop structure]
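Programs such as Mfold predict structure by minimising free energy, which needs an extensive set of thermodynamic parameters. As a much simpler, swapped-in technique that still shows the stem-and-loop dynamic programming idea, the sketch below implements the Nussinov algorithm, which only maximises the number of nested base pairs (Watson-Crick plus G-U), with an assumed minimum hairpin loop of 3 unpaired bases. It is not the energy-minimisation method used by Mfold, and the function name is illustrative.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with >= min_loop unpaired bases per hairpin."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best pair count within seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov("GGGAAAUCC"))  # 3: a 3-pair stem closing an AAA loop
```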
17
Database Searching
  • Text-based database searching: using a text
    string to match an annotation in a sequence
    database record, i.e. keyword search
  • Sequence-based database searching: using a
    biological sequence to match all or part of it
    against the sequence in every record of a
    sequence database

18
Text-Based Database Searching
  • Examples: Entrez, SRS, DBGET, AceDB (common
    integrated database systems)
  • Search concepts
  • Boolean search: AND, OR, NOT
  • Broadening the search
  • Narrowing the search
  • Proximity searching, soundex
  • Wild cards, stemming, e.g. thala* for thalasemia,
    thalassemia, thalassemic
  • Use standard string search algorithms, Boolean
    operations and vocabulary matches
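Boolean operators map naturally onto set operations over the IDs of matching records. In this sketch the records are made-up annotation strings, matching is whole-word, and `*` wildcards are handled with Python's `fnmatch`; proximity search, soundex and stemming are not implemented. `RECORDS` and `term_hits` are hypothetical.

```python
import fnmatch

# Hypothetical annotation records standing in for database entries.
RECORDS = {
    "rec1": "human beta globin thalassemia mutation",
    "rec2": "drosophila period circadian clock gene",
    "rec3": "human alpha thalassemic allele",
}


def term_hits(term: str, records=RECORDS) -> set[str]:
    """IDs of records containing a word that matches `term` ('*' is a wildcard)."""
    pattern = term.lower()
    return {
        rid for rid, text in records.items()
        if any(fnmatch.fnmatchcase(word, pattern) for word in text.lower().split())
    }


all_ids = set(RECORDS)
human, thal = term_hits("human"), term_hits("thala*")
print(sorted(human & thal))                # AND -> ['rec1', 'rec3']
print(sorted(term_hits("period") | thal))  # OR  -> ['rec1', 'rec2', 'rec3']
print(sorted(all_ids - human))             # NOT human -> ['rec2']
```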

19
Text-based Database Searching
  • Example: to find the human homolog of the
    Drosophila per gene
  • Procedure
  • Go to Entrez on the Web
  • In All Fields, enter "human" "per"
  • The hits returned are irrelevant, so broaden the search
  • "human" "period" gives more hits
  • Check every one to find the human RIGUI gene
  • The process is hit and miss and relies on clever
    guesswork: free-form terms or a controlled vocabulary
    (MeSH terms)? Boolean searches?

20
Sequence-based Database Searching
  • Homology search
  • Global or local sequence alignment
  • Needleman-Wunsch algorithm
  • Smith-Waterman algorithm
  • Lipman-Pearson FASTA
  • Altschul's BLAST
  • Take a sequence and make a pairwise comparison
    with each sequence in the database
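A minimal sketch of sequence-based database searching by exhaustive pairwise comparison, using a Needleman-Wunsch global alignment score. The scores (match +1, mismatch -1, linear gap -2), the tiny database and the function names are assumptions for illustration; real searches use substitution matrices, report significance statistics, and often rely on heuristics such as FASTA and BLAST rather than full dynamic programming against every record.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: prefixes of b against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # a[i-1] against a gap
                         cur[j - 1] + gap)  # b[j-1] against a gap
        prev = cur
    return prev[-1]


def search(query: str, database: dict[str, str]) -> list[tuple[int, str]]:
    """Score the query against every database sequence, best hits first."""
    return sorted(((nw_score(query, seq), name) for name, seq in database.items()),
                  reverse=True)


db = {"seqX": "ACGT", "seqY": "TTTT", "seqZ": "AGT"}
print(search("ACGT", db))  # [(4, 'seqX'), (1, 'seqZ'), (-2, 'seqY')]
```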

21
Sequence-based Database Searching
  • Basic assumptions
  • Sequences of homologous genes/proteins diverge
    over time even though their structure and/or
    function change little
  • Significant sequence similarity is taken as
    evidence of potential structural/functional
    similarity or a common evolutionary origin
  • Using a well-characterised protein, infer the
    function of an unknown sequence at the gene or
    protein sequence level

22
Sequence-based Database Searching
  • Global alignment forces a complete alignment of
    the two input sequences in a pairwise comparison
  • Local alignment looks for local stretches of
    similarity and tries to align the most similar
    segments
  • The algorithms used may be similar, but the
    outputs differ, and statistics are needed to
    assess the results
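The difference between global and local alignment shows up clearly when a short sequence matches only part of a longer one. In the sketch below, one dynamic-programming routine computes either score: the local (Smith-Waterman) variant clamps cell scores at zero and reports the best cell anywhere in the matrix, while the global (Needleman-Wunsch) variant charges end gaps and reports the bottom-right cell. The scoring values and the function name are illustrative assumptions.

```python
def align_score(a: str, b: str, local: bool = False,
                match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) or local (Smith-Waterman) alignment score."""
    cols = len(b) + 1
    # Global: leading gaps are charged; local: alignments may start anywhere.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0  # best cell seen anywhere (used by the local variant)
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            v = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur[j] = max(0, v) if local else v
            best = max(best, cur[j])
        prev = cur
    return best if local else prev[-1]


long_seq, short_seq = "GGGGACGTACGTGGGG", "ACGTACGT"
print(align_score(long_seq, short_seq, local=True))  # 8
print(align_score(long_seq, short_seq))              # -8
```

With these scores the short sequence scores 8 locally (its exact 8-base match) but -8 globally, because the eight unmatched flanking bases of the longer sequence must be aligned to gaps.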

23
Sequence-based Database Searching
  • Alignment scoring
  • Substitution scores and substitution matrices: PAM,
    BLOSUM
  • Affine gap costs/gap penalties and gap scores
  • Optimal alignments by dynamic programming:
    Needleman-Wunsch algorithm, Smith-Waterman
    algorithm (SSEARCH)
  • Additional heuristics: FASTA, BLAST
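Affine gap costs need three dynamic-programming matrices rather than one (Gotoh's formulation): one for aligned pairs and one for each direction of gap, so that opening a gap can cost more than extending it. The sketch below computes a global score in which a gap of length k contributes `gap_open + k * gap_extend`. The default values and the toy substitution function standing in for a PAM or BLOSUM matrix are assumptions, and the traceback needed to print the alignment itself is omitted.

```python
NEG = float("-inf")


def toy_sub(x: str, y: str) -> int:
    """Stand-in for a substitution matrix such as PAM or BLOSUM."""
    return 2 if x == y else -1


def affine_global_score(a: str, b: str, sub=toy_sub,
                        gap_open: int = -3, gap_extend: int = -1):
    """Global score with affine gaps (gap of length k scores gap_open + k * gap_extend).

    M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    """
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = sub(a[i - 1], b[j - 1]) + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,  # open a new gap
                          X[i - 1][j] + gap_extend,             # extend the gap
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("AAAATTTT", "AATT"))  # 1
```

For `AAAATTTT` against `AATT`, one gap of length 4 (cost -7) beats two gaps of length 2 (cost -10), so the best score is 4 * 2 - 7 = 1.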