Title: Biology as an Information Science, Rob Rutherford, St. Olaf College
1 Biology as an Information Science
Rob Rutherford
St. Olaf College
2
1. The Biologist's Perspective
2. Genome Biology
3. A Survey of Challenges
3 "If the biota, in the course of eons, has built
something ... who but a fool would discard
seemingly useless parts? To keep every cog and
wheel is the first precaution of intelligent
tinkering."
- Aldo Leopold (1887-1948)
4 Figure 1.18: Careful observation and measurement
provide the raw data for science
5 Productive Tinkerers
PubMed had 400,000 new research articles entered
in 2002. (NCBI-NLM, 2003)
6 NIH-NLM 2003
7 NIH-NLM 2003
8 Historically diverse opinions on math and biology
9 "If your experiment needs statistics, you ought
to have done a better experiment."
- Rutherford (the other one)
10 "To consult the statistician after an experiment
is finished is to ask him to conduct a post
mortem examination. He can perhaps say what the
experiment died of."
- R. A. Fisher (1956, University of Adelaide Archives)
11 Part 2: Genome Biology
12 The human genome
- 3.2 x 10^9 bp
- If each base were one mm long
- 2000 miles, across the center of Africa
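The scale comparison above can be checked with a little arithmetic. The sketch below assumes one base per millimetre, as the slide does, and the standard conversion of 1.609344 km per mile; the variable names are illustrative only.

```python
# Rough check of the "one base = one millimetre" analogy.
GENOME_BASES = 3.2e9        # base pairs in the human genome (figure from the slide)
MM_PER_BASE = 1.0           # assumption stated on the slide
MM_PER_KM = 1_000_000       # 1 km = 1,000,000 mm
KM_PER_MILE = 1.609344      # standard international mile

length_km = GENOME_BASES * MM_PER_BASE / MM_PER_KM
length_miles = length_km / KM_PER_MILE

print(f"{length_km:.0f} km, about {length_miles:.0f} miles")
```

This gives 3,200 km, or roughly 1,988 miles, which agrees with the slide's rounded figure of 2000 miles.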
13 Human Genes
Rutherford
14 Imagine we found an ancient document 1,000x
longer than the King James Bible, and we could
read only 10-20% of the words.
15 Things we don't know
- The sequence(s) of some of the genes
- What many/most of them do at a basic level
- How they work together dynamically
- Very little is known about the areas outside genes, which control them and show what is most important
Each organism has its own genome, and they work
together at many levels to build an ecosystem.
17 Part 3: A Survey of Problems/Opportunities
18 The Central Dogma
- DNA: Information Warehouse (4 nucleic acid letters: A, T, G, C)
- RNA: Temporary copy of a gene
- Protein: Working Cellular Machine (20 amino acid letters)
Image: RNA polymerase, PDB
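The flow of information in the Central Dogma can be sketched in a few lines. This is a minimal illustration, not a biological simulation: it assumes the input is the coding strand of a gene, uses the standard genetic code, and ignores splicing, start-site selection, and reverse strands. The function names and the example sequence are made up for this sketch.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def transcribe(dna_coding_strand):
    """DNA -> RNA: the temporary copy uses U in place of T."""
    return dna_coding_strand.upper().replace("T", "U")

def translate(mrna):
    """RNA -> protein: read three letters at a time until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGGCCTGGTAA"          # hypothetical toy gene
mrna = transcribe(dna)        # AUGGCCUGGUAA
protein = translate(mrna)     # MAW
print(dna, "->", mrna, "->", protein)
```

The 4-letter alphabet is read in triplets, which is why 64 codons are enough to encode 20 amino acids plus stop signals.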
19 A Survey of Problems
- Finding and Understanding Genes
- Molecular Structure and Function
- Evolution
- Gene Expression
- Networks
- Other areas
20 Finding and Understanding Genes
22 Finding Conserved Regions/Domains
HIV protein
Comparing your sequence versus models derived
from curated, known protein families
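One simple way to picture "comparing your sequence versus a model of a family" is a position-specific scoring matrix (PSSM). The sketch below is a deliberately simplified stand-in; curated family databases generally use richer profile models, and nothing here is taken from the slide's actual tool. The toy alignment, the query sequence, the uniform background, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def build_pssm(aligned_seqs, pseudocount=1.0):
    """Turn equal-length aligned sequences into per-column log-odds scores."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background frequency
    total = len(aligned_seqs) + pseudocount * len(ALPHABET)
    pssm = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(seq[col] for seq in aligned_seqs)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm

def best_match(pssm, query):
    """Slide the model along the query; return (best score, start index)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(query) - width + 1):
        score = sum(pssm[i][query[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start

family = ["DTGA", "DTGS", "DSGA"]   # hypothetical aligned family members
query = "MKLDTGAWW"                 # hypothetical query sequence
score, start = best_match(build_pssm(family), query)
print(f"best window starts at {start}: {query[start:start + 4]} (score {score:.2f})")
```

A high-scoring window suggests the query contains a region resembling the family; real tools add gap handling and statistical significance estimates on top of this idea.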
23 Determining Molecular Structure
- Imaging
- Experimental data
- Predicting structure in silico from sequence
24 HIV reverse transcriptase
Structure is Function
- DNA (human genome)
- RNA (HIV virus)
- Protein
Goodsell, PDB
25 Goodsell, PDB
26 Experimental structures in the Protein Data Bank
27 Evolution and Phylogenetics
Thanks to Porterfield
28 Gene Expression
Which gene products (RNA/protein) are being used when?
29 Microarray
One spot for each gene
30 Comprehensive Experiments Data Explosion
- 30,000 genes on one microarray
- x 100 microarray experiments
- = 3,000,000 data points
How to mine this data for biologically
important trends?
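The data-point count above is a simple product, and one common first step in mining such data is to look for genes whose expression rises and falls together. The sketch below shows both, using Pearson correlation as one possible similarity measure; the slide does not name a method, and the gene names and expression values are invented for illustration.

```python
import math

# Size of the data set described on the slide.
genes_per_array = 30_000
experiments = 100
data_points = genes_per_array * experiments
print(f"{data_points:,} data points")

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical expression levels across four experimental conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}

r_ab = pearson(profiles["geneA"], profiles["geneB"])
r_ac = pearson(profiles["geneA"], profiles["geneC"])
print(f"geneA vs geneB: {r_ab:+.2f}, geneA vs geneC: {r_ac:+.2f}")
```

Genes with correlations near +1 are candidates for co-regulation, while values near -1 suggest opposing regulation; with 30,000 genes, the number of pairwise comparisons is itself a computational and statistical challenge.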
31 [Figure: expression data. Axes: Experimental Conditions, 4000 Genes. Legend: Gene turned on, Gene turned off.]
32 Figure 1.23x1: Biotechnology laboratory
33 Networks
- Biochemical Pathways
- Signaling Networks
- Transcriptional Networks
- Computational Neuroscience
34 Metabolic Pathway Map
35 "If the biota, in the course of eons, has built
something ... who but a fool would discard
seemingly useless parts? To keep every cog and
wheel is the first precaution of intelligent
tinkering."
- Aldo Leopold (1887-1948)
36 Other Opportunities
- Organismal Physiology
- Populations
- Communities
- Ecosystems
37 Figure 1.3: Some properties of life
38 Same issues in Macro Biology
- Long history of mathematical modeling
- Huge datasets from
  - GPS/GIS
  - Remote sensing
39 Where is all this leading?
40 Part 3: How do we prepare our students for this
future?
41 The systems biologist
- Biologist who is an intelligent and skeptical consumer of large data sets
- Probability and Statistics
- SQL and database basics
- Equilibrium and rates of change (Calculus)
- Exposure to system-level data
- And who knows how and when to collaborate(!)
42 The Tool Builders
- Excellent mathematical skills (algorithms, linear algebra, data structures)
- Be comfortable in a Linux/Unix environment, and know Perl and C/C++.
- A deep background in 2 advanced areas of biology with chemistry prerequisites.
- Graduate training
43 End
45 Two Wings of Bioinformatics
- Housekeeping Bioinformatics
  - Representation, storage, and distribution of data
- Analytical Bioinformatics
  - New tools for the discovery of knowledge in data
46 CASP: Critical Assessment of Techniques for
Protein Structure Prediction
An example
- A community-wide contest, held every two years, to test protein
structure prediction from primary sequence