CS 7010: Computational Methods in Bioinformatics (course review) - PowerPoint PPT Presentation

Slides: 31
Provided by: win95c


CS 7010: Computational Methods in Bioinformatics (course review)
Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: xudong_at_missouri.edu
Phone: 573-882-7064 (O)
http://digbio.missouri.edu
Technical Definitions
  • NIH (http://www.bisti.nih.gov/)
  • Bioinformatics: research, development, or
    application of computational tools and approaches
    for expanding the use of biological, medical,
    behavioral or health data, including those to
    acquire, represent, describe, store, analyze, or
    visualize such data.
  • Computational Biology: the development and
    application of data-analytical and theoretical
    methods, mathematical modeling and computational
    simulation techniques to the study of biological,
    behavioral, and social systems.

Course Topics
  • Data interpretation in analytical technologies
  • Data management and computational infrastructure
  • Discovery from data mining
  • Modeling, prediction and design
  • Theoretical in silico biology
  • Cover classical/mainstream bioinformatics
    problems from a computer science perspective

Discovery from Data Mining (I)
Discovery from Data Mining (II)
  • Data source
  • Genomic / protein sequence
  • Microarray data
  • Protein interaction
  • Complicated data
  • Large-scale, high-dimensional
  • Noisy (false positives and false negatives)

Discovery from Data Mining (III)
  • Pattern/knowledge discovery from data
  • much biological data is generated by biological
    processes that are not well understood
  • interpretation of such data requires discovery of
    convoluted relationships hidden in the data
  • which segment of a DNA sequence represents a
    gene or a regulatory region
  • which genes are possibly responsible for a
    particular disease

Modeling, Prediction and Design (I)
  • Modeling and prediction of biological
  • Sequence comparison
  • Secondary structure prediction
  • Gene finding
  • Regulatory sequence identification

Modeling, Prediction and Design (II)
  • Prediction of outcomes of biological processes
  • computing will become an integral part of modern
    biology through an iterative process of
  • From prediction to engineering design
  • Drug design
  • Protein structure prediction to protein
  • Design genetically modified species

Scope of Bioinformatics
  • data management, data mining, modeling,
    prediction, theory formulation

  • An indispensable part of biological science,
    with a scientific aspect and an engineering aspect
  • Draws on computer science, biology, statistics,
    mathematics, physics, chemistry, engineering, ...

Bioinformatics Foundations
  • Technology
  • Biology/medicine
  • Computer Science
  • Statistics
  • From interdisciplinary field to a distinct
    discipline

Course Coverage
  • A general introduction to the field of
    bioinformatics
  • problem definitions: from a biological problem
    to a computable problem
  • key computational techniques
  • A way of thinking for tackling biological problems
  • how to look at a biological problem from a
    computational point of view
  • how to formulate a computational problem to
    address a biological issue
  • how to collect statistics from biological data
  • how to build a computational model
  • how to design algorithms for the model
  • how to test and evaluate a computational
  • how to assess the confidence of a prediction
    result
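One common way to approach the last item, assessing the confidence of a prediction result, is an empirical p-value from a permutation (shuffle) test. The sketch below is illustrative only: the score (number of identical positions between two equal-length, ungapped sequences), the number of shuffles, and the helper names `identity_score` and `shuffle_p_value` are assumptions made for this example, not methods taken from the course.

```python
import random


def identity_score(a: str, b: str) -> int:
    """Hypothetical score: count positions where two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y)


def shuffle_p_value(a: str, b: str, n_shuffles: int = 999, seed: int = 0) -> float:
    """Empirical p-value of identity_score(a, b) under a shuffled-sequence null.

    The null distribution is built by randomly permuting `a` (which keeps its
    residue composition) and re-scoring against `b`. The +1 terms count the
    observed score as one draw from the null, so the estimate is never zero.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = identity_score(a, b)
    residues = list(a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if identity_score("".join(residues), b) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    seq = "ATGCGTACGTTAGCCGATCGATCGGCTAGCTAGGCTAACG"
    print(shuffle_p_value(seq, seq))        # identical sequences: should be small
    print(shuffle_p_value(seq, seq[::-1]))  # compare against a reversed copy
```

Shuffling preserves composition, so this null asks whether the observed similarity exceeds what residue composition alone would produce. With a finite number of shuffles the smallest reportable p-value is 1/(n_shuffles + 1).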

Dong's top 10 list for computational methods in BI
  1. Dynamic programming
  2. Neural network
  3. Hidden Markov Model
  4. Hypothesis test
  5. Bayesian statistics
  6. Clustering
  7. Information theory
  8. Support Vector Machine
  9. Maximum likelihood
  10. Sampling search (Gibbs, Monte Carlo, etc.)
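As an illustration of item 1 applied to pairwise sequence comparison, here is a minimal sketch of a Needleman-Wunsch-style global alignment score computed by dynamic programming. The scoring scheme (match +1, mismatch -1, linear gap penalty -2) and the function name `global_alignment_score` are assumptions for this example; practical tools typically use substitution matrices and affine gap penalties, and they also recover the alignment itself by traceback, which is omitted here.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score with a linear gap penalty (assumed scoring).

    score[i][j] holds the best score for aligning a[:i] with b[:j]; each cell
    extends a smaller subproblem by a (mis)match, a gap in b, or a gap in a.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap         # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap       # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)
    return score[n][m]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GCATGCU"))
```

Each cell depends only on its left, upper, and upper-left neighbors, so filling the table row by row takes O(nm) time and space for sequences of lengths n and m.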

Research Areas
  1. Solved problems
  2. Developed areas with remaining challenges that
    are hard to solve
  3. Developing areas
  4. Emergent areas
  5. Future directions

Solved Problems
  • DNA sequence base calling and assembly
  • Pairwise sequence comparison
  • Protein secondary structure prediction
  • Disordered regions in proteins
  • Transmembrane segment prediction
  • Subcellular localization
  • Signal peptide prediction
  • Protein geometry
  • Homology modeling
  • Physical/genetic mapping informatics

Developed areas with remaining challenges
  • Gene finding
  • Phylogenetic tree construction and evolution
  • Protein docking
  • Drug design
  • Protein design
  • Linkage analysis and quantitative trait loci (QTL)
  • Microarray data collection
  • Gene expression clustering

Developing Areas
  • Multiple sequence comparison and remote homolog
  • Repetitive sequence analysis
  • Protein structure comparison
  • Protein tertiary structure prediction
  • RNA secondary structure prediction
  • Regulatory sequence analysis
  • Computational proteomics
  • Protein interaction networks
  • Gene ontology and function prediction
  • Computational neural science and applications in
    various species and systems (e.g., cancer)

Emergent Areas
  • Pathway (regulatory network) prediction
  • ChIP-chip analysis
  • Tiling array analysis
  • Haplotype/SNP analysis
  • Computational comparative genomics
  • Text (literature) mining
  • Small RNA and anti-sense regulation
  • Alternative splicing prediction
  • Computational metabolomics

Possible future directions
  • Genome semantics
  • Membrane protein structure prediction
  • RNA tertiary structure prediction
  • Post-translational modification
  • Dynamics of regulatory networks
  • Virtual cell/organism modeling
  • Phenotype-genotype relationship
  • (nobody knows)

Where the science is going? (1)
  • Bioinformatics has been a technology for
    biological research: interpretation of data
    generated by bench biologists
  • We are starting to see a trend in which
    computational predictions can guide experimental
    design
  • As more high-throughput technologies become
    available, discovery-driven science will play an
    increasingly important role in biology
  • As computational techniques continue to mature
    for biological applications, we will see more
    computational applications with powerful
    prediction capabilities

Where the science is going? (2)
  • Like physics, where general rules and laws are
    taught at the start, biology will surely be
    presented to future generations of students as a
    set of basic systems ....... duplicated and
    adapted to a very wide range of cellular and
    organismic functions, following basic
    evolutionary principles constrained by Earth's
    geological history.
  • --Temple Smith, Current Topics in
    Computational Molecular Biology

Major research centers (1)
  • National Center for Biotechnology Information
    (NCBI) of NIH (http://www.ncbi.nlm.nih.gov/)
  • the home of many important databases including
  • the home of many important bioinformatics tools
    including BLAST

Major research centers (2)
  • European Molecular Biology Laboratory (EMBL)
  • has some of the most powerful research groups in
  • Has numerous tools and databases

Major research centers (3)
  • Sanger Institute (http://www.sanger.ac.uk/)
  • The Institute for Genomic Research (TIGR,
    http://www.tigr.org/)
  • Swiss-Prot

Major Universities in US
  • University of California at Santa Cruz
  • University of California at San Diego
  • Washington University
  • University of Southern California
  • Stanford University
  • Columbia University
  • Boston University
  • Harvard University
  • MIT
  • Virginia Tech

Major journals
  • Bioinformatics
  • Nucleic Acids Research
  • Genome Research
  • Journal of Computational Biology
  • Journal of Bioinformatics and Computational
    Biology
  • In Silico Biology
  • Briefings in Bioinformatics
  • Applied Bioinformatics
  • IEEE/ACM Transactions on Computational Biology
    and Bioinformatics
  • Proteins: Structure, Function, and Bioinformatics
  • Journal of Computer Science and Technology
  • Genomics, Proteomics and Bioinformatics

Major conferences
  • Intelligent Systems for Molecular Biology (ISMB)
  • Annual Conference on Computational Biology
  • IEEE/Computational Systems Bioinformatics
    Conference (CSB)
  • Pacific Symposium on Biocomputing (PSB)
  • European Conference on Computational Biology
  • IEEE Conference on Biotechnology and
    Bioinformatics (BIBE)
  • International Workshop on Genome Informatics
  • Asia-Pacific Bioinformatics Conference (APBC)

  • Michael Waterman
  • Phil Green
  • Gene Myers
  • Barry Honig
  • No Nobel Prize winner yet

  • Scope of the new biology (large-scale)
  • Technology (tool development) vs. science
    (biological application)
  • Knowledge vs. prediction
  • Experimental vs. computational/theoretical
  • First principle vs. empirical / statistical
  • Automated vs. curated
  • One machine can do the work of fifty ordinary
    men. No machine can do the work of one
    extraordinary man.

Choosing Bioinformatics as a Career - 1
  • Field outlook
  • Must be a believer in bioinformatics (for its
    value to science)
  • Must have strong motivation and be willing to go
    the extra mile (learn more disciplines)
  • Technologist vs. technician

Choosing Bioinformatics as a Career - 2
  • Molecular, cellular, and evolutionary biology
  • understanding the science
  • Computational, mathematical, and statistical
  • mastering the techniques
  • High-throughput measurement technologies
  • Knowing what biological data are obtainable