Sequence Alignment Algorithms

Description:

Title: Introduction to C++ Software evolution Author: Physics Last modified by: partha Created Date: 8/31/2000 7:11:56 AM Document presentation format – PowerPoint PPT presentation

1
Sequence Alignment Algorithms: Application to
Bioinformatics Tool Development
  • Dr. S. Parthasarathy
  • Reader and Head
  • Department of Bioinformatics
  • Bharathidasan University
  • Tiruchirappalli 620 024
  • (E-mail: partha_at_cnld.bdu.ac.in)

2
Plan
  • Introduction to Bioinformatics
  • Sequence alignment algorithms
  • Global alignment: Needleman-Wunsch algorithm
  • Local alignment: Smith-Waterman algorithm
  • Predicting a fold for a protein sequence
    (PredictFold)
  • Methodology
  • Algorithm, coding and tool development
  • Benchmarking
  • Conclusions

PredictFold
3
Introduction
  • Why do we need Bioinformatics?
  • What is Bioinformatics?
  • Where is Bioinformatics used?

4
Why?
  • Biological Data Explosion
  • How did Biological Data Explosion happen?
  • Sequence Databases are far LARGER than the
    Structure Databases
  • Why so?

5
Introduction: Biological Data, Genome Projects
  • Latest Revolution
  • On 26 June 2000 - Announcement of completion of
    the draft of the Human Genome
  • Genetic Code of Human Life is Cracked by
    Scientists
  • Human Genome contains 3.2 x 10^9 bp
  • Unit of (Genome) sequence length
  • bp (base pairs)
  • Mbp (Mega base pairs) = 10^6 bp
  • Gbp (Giga base pairs) = 10^9 bp
  • huge (human genome equivalent) = 3.2 Gbp
  • Unit of Genetic distance
  • centiMorgan (cM) - arbitrary unit, named for
    Thomas Hunt Morgan
  • (e.g. 1 cM = 0.01 recombination frequency)

6
Introduction: Biological Data, Genome Projects
[Figure: two images dated 16 February 2001 and 15 February 2001]
7
Biological Data: Recombinant DNA Technology
  • Old Revolution
  • 1940 Role of DNA as the genetic material was
    confirmed
  • 1953 Discovery of DNA structure by James Watson
    and Francis Crick
  • 1966 Establishment of the Genetic Code
  • 1967 DNA ligase was isolated (it joins two
    strands of DNA together): Molecular Glue
  • 1970 Isolation of a restriction enzyme:
    Molecular Scissors
  • 1972 Recombinant DNA molecules were generated
    at Stanford University, USA
  • 1973 DNA fragments were joined to the plasmid
    pSC101 isolated from E. coli. They could
    replicate when introduced into E. coli.
  • The discoveries of 1972 and 1973 triggered off
    the biggest scientific revolution: Genetic
    Engineering

8
Biological Data Explosion
  • GenBank, National Center for Biotechnology
    Information (NCBI), USA
  • 44 Gbp of DNA in 40 million sequences (up to
    2004)
  • Protein Data Bank (PDB), Research Collaboratory
    for Structural Bioinformatics (RCSB), USA
  • 29,000 structures (2004)
  • QUALITY of Data - HIGH
  • Experimental error in modern genomic sequencing
    is extremely low
  • QUANTITY of Data - HUGE
  • With recombinant DNA technology and genomic
    sequencing, the size of sequence databases is
    increasing very rapidly
  • SEQUENCE versus STRUCTURE Databases
  • Sequence databases are far LARGER than structure
    databases
  • This leads to Bioinformatics

9
What?
  • What is Bioinformatics?
  • Define Bioinformatics

10
Bioinformatics - Definition
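[Figure: the word "Bioinformatics" shown together with the alignment recurrence F(i,j) = max{ F(i-1,j-1) + s(xi,yj), F(i-1,j) - d, F(i,j-1) - d }, the DNA string atcggcatgcatcagtcatgcaactg and the peptide strings PEPTIDESE / QSEDITPEP]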
Bioinformatics is an integration of mathematical,
statistical and computer methods to analyze
biological data. We use computer programs to
make inference from the biological data, to make
connections among them and to derive useful and
interesting predictions. The marriage of biology
and computer science has created a new field
called Bioinformatics. - Arthur M. Lesk
11
Biology: Basic Definitions
  • Cell - the building block of living organisms
  • Eukaryotic cells or organisms have the nucleus
    separated from the cytoplasm by a nuclear
    membrane and the genetic material borne on a
    number of chromosomes consisting of DNA and
    protein
  • Chromosome
  • The physical basis of heredity. Deeply staining,
    rod-like structures present within the nuclei of
    eukaryotes
  • Contains DNA and protein arranged in a compact
    manner
  • Replicates identically during cell division
  • The same number of chromosomes is present in the
    cells of a particular species (e.g. Human: 22,
    X and Y)

12
Genome: Basic Definitions
  • Genome
  • A complete set of chromosomes inherited from one
    parent
  • Gene
  • One of the units of inherited material carried
    on chromosomes. Genes are arranged in a linear
    fashion on DNA. Each represents one character,
    which is recognized by its effect on the
    individual bearing the gene in its cells. There
    are many thousands of genes in each nucleus.
  • DNA (Deoxyribonucleic Acid)
  • DNA is made up of FOUR bases
  • a, t, g, c: adenine, thymine, guanine,
    cytosine
  • Protein
  • Protein is made up of TWENTY different amino
    acids
  • A, T, G, C, ...: Alanine, Threonine, Glycine,
    Cysteine, ...

13
Central Dogma
DNA:     CCTGAGCCAACTATTGATGAA
RNA:     CCUGAGCCAACUAUUGAUGAA
Protein: PEPTIDE
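The three strings on this slide can be reproduced with a few lines of Python. This is only a sketch: the function names `transcribe` and `translate` are made up for illustration, and the codon table is built from the standard genetic code written in the conventional TCAG order.

```python
# Standard genetic code in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```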
14
Genome Data: Human and Model Organisms
  • Most mapping and sequencing technologies were
    developed from studies of simpler non-human
    organisms
  • Non-human / model organisms
  • Bacterium, Escherichia coli - 4.6 Mbp
  • Yeast, Saccharomyces cerevisiae - 12.1 Mbp
  • Fruit fly, Drosophila melanogaster - 180.0 Mbp
  • Roundworm, C. elegans - 95.5 Mbp
  • Laboratory mouse, Mus musculus - 3.0 Gbp
  • Human: a more complex genome
  • Human, Homo sapiens - 3.2 Gbp

15
Genome Data: Human (Homo sapiens)
  • Genome: 1
  • Chromosomes: 23
  • Genes / DNAs: 30,000
  • Nucleotides: 3.2 x 10^9 bp

16
Bioinformatics in Genome Research
  • Data Collection and Interpretation
  • Collecting and Storing Data
  • Sequence generated by genome research will be
    used as primary information source for human
    biology and medicine
  • The vast amount of data produced will first need
    to be collected, stored and distributed
  • Interpretation of Data
  • Recognizing where genes begin and end
  • Searching a database for a particular DNA
    sequence may uncover a homologous sequence in a
    known gene from a model organism, revealing
    insights into the function of the corresponding
    human gene

17
Understanding Gene Function
  • Correct protein function depends on the 3-D or
    folded structure the protein assumes in
    biological environments
  • Understanding protein structure will be essential
    in determining gene function

[Diagram linking Gene, Protein, Structure and Function]
18
Where?
  • Where is Bioinformatics used?
  • What are the uses of Bioinformatics?
  • Applications of Bioinformatics

19
Bioinformatics Tasks
  • Sequence Analysis (Protein sequences)
  • Similarity and homology
  • Pairwise local/global alignment
  • GCG: SeqLab, SeqWeb
  • Scoring Matrices - PAM, BLOSUM
  • Database Search
  • BLAST, FASTA
  • Multiple alignment
  • ClustalW, PRINTS, BLOCKS
  • Secondary Structure Prediction (from Sequence)
  • Proteins: α-helix, β-sheet, turn or coil
  • Protein Folding

20
Bioinformatics Tasks
  • Structure analysis: experimental determination
  • X-ray crystallography: 3-dimensional coordinates,
    structure
  • Nuclear Magnetic Resonance (NMR)
  • PDB: Protein Data Bank
  • RasMol: molecular viewing software
  • High-throughput crystallographic structure
    determination
  • High-flux synchrotron radiation sources (data
    collection)
  • Multiple anomalous diffraction method (data
    interpretation)
  • Bioinformatics - Structure Prediction
  • Homology modelling: InsightII, SwissPDBViewer,
    Biosuite
  • ab initio method - Monte Carlo simulation
  • Protein Structure Classification
  • SCOP - Structural Classification Of Proteins
  • CATH - Class, Architecture, Topology, Homologous
    superfamily
  • FSSP - Fold classification based on Structure-
    Structure alignment of Proteins, obtained by
    DALI (Distance-matrix ALIgnment)

21
Bioinformatics Tasks
  • Protein Engineering
  • Mutations
  • Alter particular amino acid/base for desired
    effect
  • Site directed mutagenesis
  • Identify the potential sites where we can do
    alterations
  • Applications
  • Agricultural: genetically modified plants,
    vegetables, GM food
  • Pharmaceutical: molecular modelling based drug
    design
  • Medical: gene therapy
  • DNA Bending
  • Application to Genomes
  • (Ref: M.G. Munteanu, K. Vlahovicek,
    S. Parthasarathy, I. Simon and S. Pongor, Rod
    models of DNA: sequence-dependent anisotropic
    elastic modelling of local phenomena, Trends in
    Biochemical Sciences, 23 (1998) 341-347)

22
Bioinformatics Tasks: Genomics and Proteomics
  • Genomics is the study of the structure, content,
    evolution and functions of genes in genomes
  • Aims of Genomics
  • To establish an integrated web based database and
    research interface
  • To assemble physical, genetic and cytological
    maps of the genome
  • To identify and annotate the complete set of
    genes encoded within a genome
  • To provide the resources for comparison with
    other genomes

23
Proteomics: Proteome
  • A proteome is the complete collection of proteins
    in a cell/tissue/organism at a particular time.
    Unlike genomes, which are stable over the
    lifetime of the organism, proteomes change
    rapidly as each cell responds to its changing
    environment and produces new proteins in
    different amounts.
  • The genome is a more stable entity. An organism
    has only one genome but many proteomes.
  • For an organism, there may be
  • one body-wide proteome,
  • about 200 tissue proteomes,
  • about a trillion (10^12) individual cell
    proteomes.

24

Proteomics: Definition
  • The study of proteomes that includes determining
    the 3D shapes of proteins, their roles inside
    cells, the molecules with which they interact,
    and defining which proteins are present and how
    much of each is present at a given time.

25

Proteomics: Applications
  • To correlate proteins on the basis of their
    expression profiles.
  • To observe patterns in protein synthesis; changes
    in these patterns can be used as an indicator of
    the state of the cell and its gene expression.
  • To characterize bacterial pathogens and to
    develop novel antimicrobials.
  • To identify regions of the bacterial genome that
    encode pathogenic determinants.
  • To develop drugs, and in toxicology (structural
    proteomics)
  • Proteomics as a tool for plant genetics and
    breeding

26
Systems Biology
  • Systems Biology is a new perspective and emerging
    field for research in the post-genomic era.
  • It aims at system level understanding of
    biological systems.
  • It studies whole cells/tissues/organisms not by
    a traditional reductionist approach but by
    holistic means, in an iterative attempt to model
    the complete cell/tissue/organism.
  • It is an integrated and interacting network of
    genes, proteins and biochemical reactions which
    give rise to life.

27
Systems Biology
28
Sequence Alignment Algorithms
  • Similarity and Homology
  • Sequence Comparison - Issues
  • Types of alignments
  • Algorithms Used

29
Sequence similarity and homology
  • Nature is a tinkerer and not an inventor. New
    sequences are adapted from pre-existing sequences
    rather than invented de novo. There is therefore
    significant similarity between a new sequence and
    already known sequences, which is fortunate for
    computational sequence analysis.
  • Similarity: a measure of resemblance and
    difference, independent of the source of the
    resemblance.
  • Homology: the sequences and the organisms in
    which they occur are descended from a common
    ancestor.
  • If two related sequences are homologous, then we
    can transfer information about structure and/or
    function, by homology.

30
3-D Structure and Homology
  • 3-D structure patterns (motifs) of proteins are
    much more evolutionarily conserved than amino
    acid sequences - This type of Homology search
    could prove more fruitful
  • Particular motifs may serve similar functions in
    several different proteins, information that
    would be valuable in genome analysis
  • Only a few protein motifs can be recognised at
    the sequence level
  • Development of more analytic capabilities to
    facilitate grouping protein sequences into motif
    families will make homology searches more useful

31
Sequence Comparison: Issues
  • Types of alignment
  • Global: end-to-end matching
    (Needleman-Wunsch)
  • Local: portions or subsequences matching
    (Smith-Waterman)
  • Scoring system used to rank alignments
  • PAM and BLOSUM matrices
  • Algorithms used to find optimal (or good) scoring
    alignments
  • Heuristic
  • Dynamic Programming
  • Hidden Markov Model (HMM)
  • Statistical methods used to evaluate the
    significance of an alignment score
  • Z-score, P-value and E-value

32
Substitution Matrices
  • PAM (Point Accepted Mutation)
  • BLOSUM (BLOcks SUbstitution Matrix)

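[Figure: scale of matrix numbers against evolutionary distance. PAM numbers rise with distance (40 close, 250, 500 distant); BLOSUM numbers fall with distance (90 close, 62 default, 30 distant).]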
33
Types of Algorithms
  • Heuristic
  • A heuristic is an algorithm that will yield
    reasonable results, even if it is not provably
    optimal or lacks even a performance guarantee.
  • In most cases, heuristic methods can be very
    fast, but they make additional assumptions and
    will miss the best match for some sequence pairs.
  • Dynamic Programming
  • An algorithm that finds optimal alignments for
    an additive alignment score by building them up
    from the optimal alignments of shorter prefixes
  • (We are going to discuss it soon.)
  • This type of algorithm is guaranteed to
    find the optimal scoring alignment or set of
    alignments.
  • HMM - Based on probability theory; very
    versatile.

34
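Global Alignment: Needleman-Wunsch Algorithm
  • Formula

\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) & \text{(D)} \\
F(i-1,\,j) - d & \text{(H)} \\
F(i,\,j-1) - d & \text{(V)}
\end{cases}
\]

[Figure: the cell F(i,j) is filled from F(i-1,j-1) by a diagonal move (D), from F(i-1,j) by a horizontal move (H) and from F(i,j-1) by a vertical move (V)]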
35
Global Alignment: Needleman-Wunsch Algorithm
  • Gap penalties
  • Linear score: \( f(g) = -g\,d \)
  • Affine score: \( f(g) = -d - (g-1)\,e \)
  • d = gap-open penalty, e = gap-extend penalty,
    g = gap length
  • Trace back
  • Start from the value in the bottom-right corner
    and trace back to the top-left corner (i.e. the
    sequences are always aligned end to end).
  • Algorithm complexity
  • It takes O(nm) time and O(nm) memory, where n and
    m are the lengths of the sequences.
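The recurrence and the trace back can be put together in a short Python sketch. It assumes a linear gap penalty d and a simple match/mismatch score in place of a PAM or BLOSUM matrix; the function name and the default values are illustrative and are not taken from the slides.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, d=2):
    """Global alignment with a linear gap penalty d (illustrative scores)."""
    def s(a, b):
        return match if a == b else mismatch

    n, m = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for j in range(1, m + 1):
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # diagonal
                F[i - 1][j] - d,                          # gap in y
                F[i][j - 1] - d,                          # gap in x
            )

    # Trace back from the bottom-right corner to the top-left corner.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)
print(top)
print(bottom)
```

Both the fill and the stored matrix are O(nm), matching the complexity stated on the slide. When several alignments tie for the best score, this sketch returns only one of them.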

36
Local Alignment: Smith-Waterman Algorithm
  • Same as the global alignment algorithm, with
    TWO differences.
  • F(i,j) takes the value 0 (zero) if all other
    options have a value less than 0.
  • The alignment can end anywhere in the matrix.
  • Take the highest value of F(i,j) over the whole
    matrix and start the trace back from there.

37
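Local Alignment: Smith-Waterman Algorithm
  • Formula

\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) & \text{(D)} \\
F(i-1,\,j) - d & \text{(H)} \\
F(i,\,j-1) - d & \text{(V)} \\
0 & \text{(if all other values are less than 0)}
\end{cases}
\]

[Figure: the cell F(i,j) is filled from F(i-1,j-1) by a diagonal move (D), from F(i-1,j) by a horizontal move (H) and from F(i,j-1) by a vertical move (V)]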
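A minimal Python sketch of the local variant follows. It again assumes a linear gap penalty and a match/mismatch score chosen only for illustration. The two differences from the global algorithm are the extra 0 in the maximum and the trace back, which starts at the highest cell and stops at the first zero.

```python
def smith_waterman(x, y, match=2, mismatch=-1, d=2):
    """Local alignment with a linear gap penalty d (illustrative scores)."""
    def s(a, b):
        return match if a == b else mismatch

    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] - d,
                F[i][j - 1] - d,
            )
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j

    # Trace back from the highest cell until a zero is reached.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("AAAGATTACATTT", "CCGATTACAGG"))
# (14, 'GATTACA', 'GATTACA')
```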
38
Web based server development
  • Design the web page to get the data
  • Use cgi-bin or Perl script to parse the submitted
    data
  • Invoke the corresponding program to get the
    appropriate results
  • Send the results either by e-mail or to the web
    page directly

39
Application to Bioinformatics Tool Development
  • To predict a fold for a protein sequence

PredictFold
40
To predict a fold for a protein sequence
PredictFold
  • To predict possible folds for a given protein
    sequence, whose structure is not known
  • To develop a fold recognition technique / tool
    that is sensitive in detecting folds of given
    protein sequences in the twilight zone (sequences
    sharing less than 25% identity)
  • Application of the fold recognition strategy to
    genomic annotation
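The twilight-zone criterion can be checked on a pair of aligned sequences with a small Python sketch. One convention is assumed here: identity is counted over columns where neither sequence has a gap. Other conventions exist, and the function names are illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(p, q) for p, q in zip(a, b) if p != "-" and q != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for p, q in pairs if p == q)
    return 100.0 * identical / len(pairs)


def in_twilight_zone(a: str, b: str, threshold: float = 25.0) -> bool:
    """True if the aligned pair shares less than `threshold` percent identity."""
    return percent_identity(a, b) < threshold


print(percent_identity("ACDE", "ACDF"))          # 75.0
print(in_twilight_zone("ACDEFGHI", "AKLMNPQR"))  # True (12.5 % identity)
```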

41
Twilight Zone Sequences: Example, Cytochrome
Sequences
  • 256b
  • >256BA CYTOCHROME B562 (OXIDIZED) - CHAIN A
    ADLEDNMETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLED
    KSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRN
    AYHQKYR
  • >256BB CYTOCHROME B562 (OXIDIZED) - CHAIN B
    ADLEDNMETLNDNLKVIEKADNAAQVKDALTKMRAAALDAQKATPPKLED
    KSPDSPEMKDFRHGFDILVGQIDDALKLANEGKVKEAQAAAEQLKTTRN
    AYHQKYR
  • 2ccy
  • >2CCYA CYTOCHROME C(PRIME) - CHAIN A
    QQSKPEDLLKLRQGLMQTLKSQWVPIAGFAAGKADLPADAAQRAENMAMV
    AKLAPIGWAKGTEALPNGETKPEAFGSKSAEFLEGWKALATESTKLAAA
    AKAGPDALKAQAAATGKVCKACHEEFKQD
  • >2CCYB CYTOCHROME C(PRIME) - CHAIN B
    QQSKPEDLLKLRQGLMQTLKSQWVPIAGFAAGKADLPADAAQRAENMAMV
    AKLAPIGWAKGTEALPNGETKPEAFGSKSAEFLEGWKALATESTKLAAA
    AKAGPDALKAQAAATGKVCKACHEEFKQD

42
Example: Sequence Similarity
  • lalign output for 256b and 2ccy follows

43
Example: Cytochrome Structures
[Figure: structures of 256b and 2ccy. CYTOCHROME STRUCTURES (seq. similarity 24%)]
44
Goals
  • Exploration of suitable fold recognition
    techniques that are sensitive in detecting
    similar folds despite low sequence similarity
  • Identification of functional motifs in proteins
    at sequence (1D) and structure (3D) level
  • Development of a protocol that aids in the rapid
    classification and annotation of genomic data
    based on functional motifs

45
Methodology
  • Reduction of 3D-structure to 1D-environment
    string. Environment at each residue position is a
    function of local secondary structure and extent
    of exposure to the solvent (based on 3D-1D
    profile method developed by Eisenberg et al.,
    1991).
  • Extract residue environment profiles of the
    available protein structures.
  • A scoring matrix is generated from a library of
    profiles. Each matrix element is the information
    value of a residue in the given environment.
  • A library of environment strings is created for
    the available protein fold structures.
  • The probe sequence is queried against this
    library to look for best matches.

46
Workflow
47
Residue Environments
[Figure: residue environments labelled by secondary structure (Helix, Strand, Coil) and by burial (Buried, Partially buried, Exposed)]
48
Residue Environments
  • The residue environments are described by
  • the area (A) of the residue buried in the protein
  • the fraction (f) of side-chain area that is
    covered by polar atoms (O and N)
  • the local secondary structure

49
Residue Environments
  CLASS             Area (A), Å^2     FRACTION (f)
  BURIED 1 (B1)     A > 114           f < 0.45
  BURIED 2 (B2)     A > 114           0.45 < f < 0.58
  BURIED 3 (B3)     A > 114           f > 0.58
  PARTIAL 1 (P1)    40 < A < 114      f < 0.67
  PARTIAL 2 (P2)    40 < A < 114      f > 0.67
  EXPOSED (E0)      A < 40            f > 0.67

50
Residue Environment Classes
  • We have 6 classes based on the extent of exposure
    to solvent
  • We have 3 classes based on secondary structure:
    Alpha Helix (A), Beta Sheet (B) and Coil (C)
  • Total: 6 x 3 = 18 environments
  • B1A, B1B, B1C, B2A, B2B, B2C, B3A, B3B, B3C,
    P1A, P1B, P1C, P2A, P2B, P2C, E0A, E0B, E0C.
  • For example
  • B1A - Buried 1, Alpha Helix
  • P2B - Partially Buried 2, Beta Sheet
  • E0C - Exposed 0, Coil
51
Scoring Table
  • The scoring table used in this case is a 20 x 18
    matrix, constructed from a statistical analysis
    of the profile library (consisting of 1200
    protein structures) provided by PROFILES_3D
    module of Insight II (Accelrys Inc.)
  • The scores \( S_{ij} \) are calculated using the
    formula
  • \( S_{ij} = 100 \times \ln \left[ P(i \mid j) / P_i \right] \)
  • where \( P(i \mid j) \) is the probability of
    finding residue i in the environment j and
    \( P_i \) is the overall probability of finding
    residue i in any environment.
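As a sketch of how such a table could be built from observed counts, the Python function below estimates P(i|j) and P_i as simple frequencies and applies the formula. The input layout (a dictionary keyed by residue and environment) and the toy counts are hypothetical. Zero counts are skipped because ln(0) is undefined, and the slides do not say how the actual table handles them.

```python
import math
from collections import defaultdict


def scoring_table(counts):
    """counts[(residue, environment)] -> number of observations.

    Returns S[(residue, environment)] = 100 * ln(P(i|j) / P_i).
    """
    total = sum(counts.values())
    residue_total = defaultdict(int)
    env_total = defaultdict(int)
    for (residue, env), c in counts.items():
        residue_total[residue] += c
        env_total[env] += c

    scores = {}
    for (residue, env), c in counts.items():
        if c == 0:
            continue  # ln(0) is undefined; skipped in this sketch
        p_i_given_j = c / env_total[env]         # P(i | j)
        p_i = residue_total[residue] / total     # P_i
        scores[(residue, env)] = 100.0 * math.log(p_i_given_j / p_i)
    return scores


# Toy counts (hypothetical): leucine prefers the buried helix environment.
toy = {("L", "B1A"): 30, ("L", "E0C"): 10, ("K", "B1A"): 10, ("K", "E0C"): 30}
for key, value in sorted(scoring_table(toy).items()):
    print(key, round(value, 1))
```

A positive score means the residue is seen in that environment more often than expected from its overall frequency, and a negative score means less often.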

52
Scoring Table
  • The scoring table contains measure of the
    compatibility of the 20 amino acids with the 18
    environmental classes.
  • The individual matrix elements are propensities
    (information values) for the amino acid residues.

53
Scoring Table
54
Fold Library
  • 1565 functional forms
  • Scan PDB to identify all the structures having
    these folds
  • Identify a representative structure with
    resolution 2.5 Å or better
  • Check the quality of the structure (occupancy,
    R-factor, stereochemistry)
  • 968 chains
55
DALI / FSSP Fold Library
  • DALI: http://www.ebi.ac.uk/dali
  • Touring protein fold space with DALI/FSSP, Lisa
    Holm and Chris Sander, Nucleic Acids Research,
    (1998), 26, 316-319
  • Mapping the Protein Universe, Lisa Holm and
    Chris Sander, Science, (1996), 273, 595-602

56
Sequence Comparison: Details
  • Type of alignment
  • Local - portions or subsequences matching
  • Smith-Waterman algorithm
  • Scoring table: 3D-1D matrix
  • Algorithm used: dynamic programming
  • Alignment score: Z-score

57
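Local Alignment: Smith-Waterman Algorithm
  • Formula

\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + S(x_i, y_j) & \text{(D)} \\
F(i-1,\,j) - d & \text{(H)} \\
F(i,\,j-1) - d & \text{(V)} \\
0 & \text{(if all other values are less than 0)}
\end{cases}
\]

[Figure: the cell F(i,j) is filled from F(i-1,j-1) by a diagonal move (D), from F(i-1,j) by a horizontal move (H) and from F(i,j-1) by a vertical move (V)]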
58
Gap Penalties
  • Linear score: \( f(g) = -g\,d \)
  • Affine score: \( f(g) = -d - (g-1)\,e \)
  • d = gap-open penalty, e = gap-extend penalty,
    g = gap length
  • Gap penalty values used are
  • d = 500
  • e = 50
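The two gap models can be compared numerically with a short Python sketch that uses the values from this slide (d = 500, e = 50) as defaults. The function names are illustrative.

```python
def linear_gap(g: int, d: int = 500) -> int:
    """Linear gap score f(g) = -g * d for a gap of length g >= 1."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -g * d


def affine_gap(g: int, d: int = 500, e: int = 50) -> int:
    """Affine gap score f(g) = -d - (g - 1) * e for a gap of length g >= 1."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


for g in (1, 2, 5):
    print(g, linear_gap(g), affine_gap(g))
# 1 -500 -500
# 2 -1000 -550
# 5 -2500 -700
```

With the affine model, opening a gap is expensive but extending it is cheap, so one long gap costs far less than several short ones of the same total length.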

59
Local Alignment
  • Trace back
  • Alignment can end anywhere in the matrix
  • Take the highest value of F(i,j) over the whole
    matrix and start trace back from there.
  • Algorithm complexity
  • It takes O(nm) time and O(nm) memory, where n and
    m are the lengths of the sequences.

60
Significance of an Alignment Score
  • Statistical methods used to evaluate the
    significance of an alignment score
  • Z-score, P-value and E-value
  • Significance of score
  • Z-score = (score - mean) / std. dev.
  • Measures how unusual our original match is.
  • Z ≥ 5 is significant.
  • P-value measures the probability that the
    alignment is no better than random. (Z and P
    depend on the distribution of the scores.)
  • P ≤ 10^-100: exact match.
  • E-value is the expected number of sequences that
    give the same Z-score or better. (E = P x size
    of the database)
  • E ≤ 0.02: sequences probably homologous
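The Z-score and E-value definitions translate directly into Python. In this sketch the background scores are assumed to come from comparing the probe with unrelated or shuffled sequences, and the population standard deviation is used. The slides do not specify either choice, and the example numbers are made up.

```python
import statistics


def z_score(score: float, background: list) -> float:
    """Z = (score - mean) / std. dev. of a background score distribution."""
    mean = statistics.mean(background)
    std_dev = statistics.pstdev(background)  # population standard deviation
    if std_dev == 0:
        raise ValueError("background scores have zero spread")
    return (score - mean) / std_dev


def e_value(p_value: float, database_size: int) -> float:
    """E = P x size of the database."""
    return p_value * database_size


background = [10, 20, 30, 40, 50]  # hypothetical background scores
z = z_score(100, background)
print(round(z, 2), "significant" if z >= 5 else "not significant")
print(e_value(1e-5, 968))
```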

61
Benchmarking
  • All 968 proteins in the fold library were
    profiled on each of the other members
  • A histogram of the rank obtained by the self
    score, against the number of sequences with that
    rank, is shown in the figure.
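The rank of the self score and the histogram of ranks can be sketched in Python as follows. The nested dictionary layout and the toy scores are hypothetical and only illustrate the bookkeeping. Rank 1 means that no other library member scores higher than the protein's own profile.

```python
from collections import Counter


def self_rank(scores: dict, self_id: str) -> int:
    """Rank of the self score among all library scores (1 = highest)."""
    own = scores[self_id]
    return 1 + sum(1 for name, s in scores.items() if name != self_id and s > own)


def rank_histogram(all_scores: dict) -> Counter:
    """all_scores[probe][library_member] -> score; returns rank -> count."""
    return Counter(self_rank(scores, probe) for probe, scores in all_scores.items())


# Toy library of three proteins (made-up scores).
toy = {
    "a": {"a": 900, "b": 300, "c": 100},
    "b": {"a": 500, "b": 450, "c": 100},
    "c": {"a": 200, "b": 150, "c": 700},
}
print(rank_histogram(toy))  # Counter({1: 2, 2: 1})
```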

62
Benchmarking
63
Benchmarking
  • Report
  • 797 proteins retain the self score as the highest
    score
  • 63 proteins report the self score as the second
    highest score
  • About 100 proteins have ranks between 5 and 100.
  • Limitations
  • Prediction is restricted to the 968 folds in the
    library
  • The algorithm is insensitive to partially folded
    sequences
  • Specific to globular proteins and not for
    membrane proteins
  • Sequences that fold in the presence of cofactors
    and ligands are not accounted for

64
Web based server development
  • Design the web page to get the data
  • Use cgi-bin or Perl script to parse the submitted
    data
  • Invoke the corresponding program to get the
    appropriate results
  • Send the results either by e-mail or to the web
    page directly
  • Prepare a user manual to describe the salient
    features of the server
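The parse and respond steps can be illustrated with a Python sketch. Python is used here instead of the Perl or cgi-bin script mentioned on the slide, and the form field names `sequence` and `email` are hypothetical. The step that invokes the fold recognition program is left out, so `render_result` simply formats hits that are passed to it.

```python
from html import escape
from urllib.parse import parse_qs

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_submission(body: str) -> dict:
    """Parse a URL-encoded form body and validate the protein sequence."""
    fields = parse_qs(body)
    raw = fields.get("sequence", [""])[0]
    sequence = "".join(raw.split()).upper()  # drop spaces and line breaks
    email = fields.get("email", [""])[0].strip()
    if not sequence or not set(sequence) <= AMINO_ACIDS:
        raise ValueError("sequence must use the 20 standard amino acid letters")
    return {"sequence": sequence, "email": email}


def render_result(sequence: str, hits: list) -> str:
    """Format (fold name, Z-score) pairs as a small HTML page."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{z:.1f}</td></tr>" for name, z in hits
    )
    return (
        f"<html><body><p>Query length: {len(sequence)}</p>"
        f"<table>{rows}</table></body></html>"
    )


form = parse_submission("sequence=ADLE+DNME%0Atln&email=user%40example.org")
print(form)
print(render_result(form["sequence"], [("256b", 7.3)]))
```

Validating the sequence before calling the prediction program keeps malformed input away from it, and escaping the fold names keeps the result page well formed.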

65
Conclusions
  • PredictFold A program to predict possible folds
    for a new protein sequence based on the 3D-1D
    profile method
  • Benchmarking results show the reliability of the
    method
  • There is considerable scope for further
    improvement

66
Future Directions
  • To update the fold library by including more
    known folds
  • To use the predicted secondary structure
    information of the given sequence also
  • To optimise the source code for efficient,
    automatic handling of genome sequences
  • To combine results from other algorithms (ORF,
    HMM, etc.) to detect remote homologs
  • To develop and maintain a web-based server for
    fold recognition

67
BT versus IT
  • Bioinformatics, including Biotechnology (BT),
    requires a lot of Information Technology (IT)
    skills for genomic annotation projects
  • Bioinformatics is also one of the potential areas
    for IT professionals
  • Genome projects will be the next huge task for IT
    industries (like the Y2K problem in the past)
  • BT will take on IT in the near future

68
Conclusions
  • Developing Web based Bioinformatics tools
  • Develop/modify useful algorithms
  • Generate computer source codes
  • Create/Maintain Web based server
  • Using existing Web based tools efficiently
  • Ethical issues
  • Bioethics and biosafety: always ensure that you
    neither develop nor use any bioinformatics tool
    that is harmful to the environment or society
  • Cloning of human, Terminator technology, GM Food,
    etc.

69
References (latest)
  • Arthur M. Lesk, Introduction to Bioinformatics,
    Oxford University Press, New Delhi (2003).
  • D. Higgins and W. Taylor (Eds), Bioinformatics:
    Sequence, Structure and Databanks, Oxford
    University Press, New Delhi (2000).
  • R. Durbin, S.R. Eddy, A. Krogh and G. Mitchison,
    Biological Sequence Analysis, Cambridge Univ.
    Press, Cambridge, UK (1998).
  • A. Baxevanis and B.F. Ouellette, Bioinformatics:
    A Practical Guide to the Analysis of Genes and
    Proteins, (Third Edition) Wiley-Interscience,
    Hoboken, NJ (2005).
  • G. Gibson and S.V. Muse, A Primer of Genome
    Science, Sinauer Associates, USA (2002).
  • N.C. Jones and P.A. Pevzner, An Introduction to
    Bioinformatics Algorithms, Ane Books, New Delhi
    (2005).
  • Michael S. Waterman, Introduction to
    Computational Biology, Chapman & Hall, (1995).
  • J.A. Glasel and M.P. Deutscher (Eds),
    Introduction to Biophysical Methods for Protein
    and Nucleic Acid Research, Academic Press, New
    York (1995).
  • D.S.T. Nicholl, An Introduction to Genetic
    Engineering, (Second Edition) Cambridge Univ.
    Press, UK (2002).

70
References
  • 3D-1D Profile method
  • J.U. Bowie, R. Lüthy and D. Eisenberg, Science,
    253, 164-170 (1991).
  • Ostensible Recognition of Folds (ORF) method
  • Rajeev Aurora and George D. Rose, Proc. Natl.
    Acad. Sci. (USA), 95(6), 2818-2823 (1998).
  • Superfamily Hidden Markov Model (SHMM) method
  • A. Krogh, M. Brown, I.S. Mian, K. Sjolander and
    D. Haussler, J. Mol. Biol. 235(5), 1501-31 (1994).

71
Important Bioinformatics Resources
  • Databases and Tools
  • NCBI, NIH - www.ncbi.nlm.nih.gov
  • EMBL, EBI - www.ebi.ac.uk
  • ExPasy, Swiss - www.expasy.org
  • DDBJ - www.ddbj.nig.ac.jp
  • PDB - www.rcsb.org/pdb
  • Software
  • Accelrys - www.accelrys.com/products
  • GCG, Insight II, Cerius II, Discovery Studio
  • TCS - www.atc.tcs.co.in/biosuite/
  • BIOSUITE
  • Jalaja Technologies - www.jalaja.com
  • GENOCLUSTER

72
  • Thank You