Title: DNA in the chromosomes of the genome contains all the information to develop an organism and operate all its cell types.
1 The Elements of Molecular Biology
A principal goal is to understand cells and
organisms as molecular systems / machines. The
basic classes of molecules are:
- DNA in the chromosomes of the genome contains all
the information to develop an organism and
operate all its cell types.
- RNA serves both short-term informational roles
and structural roles.
- Proteins execute the functions of a cell and
provide its structural integrity.
- Small metabolites (fats, sugars, etc.) provide
energy and raw materials, and serve some limited
structural roles.
2 Cells As Molecular Machines
3 Understanding Cells at the Molecular Level
- Determining the DNA sequences of the chromosomes
of a species. (Sequencing)
- An accurate parts list of all the proteins and
RNAs in the cell. (Annotation)
- A graph of all the interactions taking place
between these agents. (Pathways)
- What is happening during each interaction. (Function)
- Where each interaction is taking
place. (Subcellular Localization)
- Simulating the system to predict behavior.
(Cellular Dynamics)
4 Current State
- We can sequence the euchromatic portions of
genomes.
- We can recognize 75% of the genes, but not
accurately unless they have been experimentally
verified. We don't know much about alternate
splicing.
- We can crudely observe expression of mRNAs and,
with even greater difficulty, observe the more
abundant proteins.
- Most accurate molecular biological information is
still being verified one hypothesis at a time.
- We must either coordinate efforts or reduce
experimental costs to the point where each
investigator is greatly empowered.
5 Current Technologies
- Sequencing: Randomly sample and sequence 600bp
stretches from the ends of segments of a given
length and assemble, followed by a directed
finishing phase.
- Expression Assays: High density arrays where
each spot is a set of 18-50bp DNAs complementary
to the RNA sequence to be measured, or geometric
amplification from a pair of DNA probes
complementary to the RNA sequence (quantitative
PCR).
- Proteomics: Mass spectrometers can measure the
amount and atomic weight of ionized protein
pieces (peptides), allowing complex mixtures to be
analyzed.
- Light Microscopy: With confocal microscopes and
antibody, RNA, or organo-metallic staining,
phenomena involving only a few particles are
being observed.
- All of these technologies involve interesting
problems in the interpretation of the data.
- Data Analysis vs. Data Mining
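The "geometric amplification" behind quantitative PCR can be illustrated with a short sketch. It assumes an idealized reaction in which every cycle multiplies the template by (1 + efficiency); the function names and the numbers in the usage note are illustrative assumptions, not taken from the slides.

```python
import math

def copies_after(n0, cycles, efficiency=1.0):
    """Template copies after `cycles` rounds; efficiency=1.0 is perfect doubling."""
    return n0 * (1.0 + efficiency) ** cycles

def cycles_to_threshold(n0, threshold, efficiency=1.0):
    """Fractional cycle number at which n0 starting copies reach `threshold`."""
    return math.log(threshold / n0) / math.log(1.0 + efficiency)

def fold_difference(ct_a, ct_b, efficiency=1.0):
    """Ratio of starting amounts (sample a over sample b) implied by two
    threshold cycles measured against the same detection threshold."""
    return (1.0 + efficiency) ** (ct_b - ct_a)
```

Under these assumptions, a sample that crosses the detection threshold three cycles earlier than another started with about eight times as much template, e.g. `fold_difference(20, 23)`. Real assays have efficiencies below 1 and need calibration, which is one of the data interpretation problems the slide mentions.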
6 On Systems Biology
- The goal is to understand cells and organisms as
molecular systems. An objective that can be
reached in the next 10-25 years is the
elucidation of the topology of the circuit
diagram of a species' cells.
  - A graph whose nodes are proteins, RNAs, DNAs, and
  metabolites
  - Edges or hyperedges describe relationships, e.g.
  Catalyzes A->B, Kinase (adds phosphate group),
  Complexes with, etc.
  - Nodes carry attributes describing where and when
  the agent is present
- We believe that such knowledge will result in
tremendous understanding in that it will
generate many hypotheses that are true.
- Many circuits, when modeled as stochastic
differential equations, are so robust to their
parameters that they have only a handful of
operating ranges, and the biologically relevant
one(s) is readily evident.
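The circuit diagram described above (typed nodes, hyperedges for relationships, and where/when attributes on nodes) can be sketched as a small data structure. This is only one possible representation; the class names, role labels, and the molecules in `build_example` (EnzE, KinK, TargetT, A, B) are hypothetical placeholders.

```python
from dataclasses import dataclass, field

KINDS = {"protein", "RNA", "DNA", "metabolite"}

@dataclass
class HyperEdge:
    relation: str   # e.g. "catalyzes", "kinase", "complexes_with"
    roles: dict     # role label -> node name; may join more than two nodes

@dataclass
class Circuit:
    kinds: dict = field(default_factory=dict)   # node name -> molecule class
    attrs: dict = field(default_factory=dict)   # node name -> where/when attributes
    edges: list = field(default_factory=list)

    def add_node(self, name, kind, where=None, when=None):
        if kind not in KINDS:
            raise ValueError(f"unknown molecule class: {kind}")
        self.kinds[name] = kind
        self.attrs[name] = {"where": where, "when": when}

    def relate(self, relation, **roles):
        """Add a hyperedge; every participating node must already exist."""
        missing = [n for n in roles.values() if n not in self.kinds]
        if missing:
            raise KeyError(f"unknown nodes: {missing}")
        self.edges.append(HyperEdge(relation, dict(roles)))

    def partners(self, name):
        """Names of all nodes that share at least one edge with `name`."""
        out = set()
        for edge in self.edges:
            members = set(edge.roles.values())
            if name in members:
                out |= members - {name}
        return out

def build_example():
    """Hypothetical two-edge circuit: one catalysis, one phosphorylation."""
    c = Circuit()
    c.add_node("EnzE", "protein", where="cytoplasm", when="always")
    c.add_node("A", "metabolite", where="cytoplasm")
    c.add_node("B", "metabolite", where="cytoplasm")
    c.add_node("KinK", "protein", where="nucleus", when="stress")
    c.add_node("TargetT", "protein", where="nucleus")
    c.relate("catalyzes", enzyme="EnzE", substrate="A", product="B")
    c.relate("kinase", kinase="KinK", substrate="TargetT")
    return c
```

A hyperedge is used for "Catalyzes A->B" because the relationship joins three agents (enzyme, substrate, product) rather than two; `build_example().partners("EnzE")` returns the set containing A and B.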
7 Apoptosis: Programmed cell death
8 A Proposal based on Drosophila
- Goal: Develop a comprehensive set of accurate
genomics-based data and biological resources for
the advancement of cellular biology. Make them
openly and readily accessible and available to
all.
- Sequence 10-20 species of Drosophila, 3-5 to a
nearly-finished state and 10-15 to an assembled
state.
- Understand the evolution of the genomes and
accurately annotate the genes and regulatory
signals using comparative informatics.
- Develop a comprehensive, experimentally verified
annotation of all possible transcripts of the
nearly-finished species. Capture each in a
Gateway clone library.
- Build state-of-the-art expression assays for
these genomes and apply them in a number of
surveys.
- Utilize mass spec. technology to begin
systematically exploring complex formation and
signalling cascades of Drosophila cells.
9 Why Drosophila?
- Sufficiently interesting model: development,
differentiation, behavior
- Easy to stock lines, and many experimental
methods.
- Compact genome: 20 Dros. genomes ≈ 1 mammalian
genome
- Excellent evolutionary and molecular model for
humans
- Large community: coordinated efforts are possible
10 Comparative Gene Finding
Require ab initio models to predict orthologs
consistent with the evolutionary ladder amongst
all strains.
Given a ladder of 5-10 Drosophila genomes, we
should be able to accurately discern most
protein-encoding genes, many RNA genes, and a
host of regulatory signals.
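One simple way to see why a ladder of genomes helps is to score how well each position of an alignment is conserved across species. The sketch below is a toy stand-in, not the ab initio models the slide calls for: it assumes the sequences are already aligned to equal length, and the function names, thresholds, and any test sequences are illustrative assumptions.

```python
from collections import Counter

def column_conservation(aligned):
    """Fraction of sequences sharing the most common base, per alignment column."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for column in zip(*aligned):
        bases = [b for b in column if b != "-"]   # gaps never count as agreement
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / len(aligned))
    return scores

def conserved_runs(scores, threshold=1.0, min_len=4):
    """Half-open (start, end) intervals where every column scores >= threshold."""
    runs, start = [], None
    for i, s in enumerate(scores + [float("-inf")]):   # sentinel closes a trailing run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs
```

Long runs of fully conserved columns are candidates for functional sequence such as coding exons or regulatory signals. With more species on the ladder, chance agreement becomes less likely, which is the intuition behind using 5-10 genomes.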
11 The Role of Informatics
- We need to make computers easier to program,
i.e. we need to put scientific computing in the
hands of the scientists.
- Our information management technologies are
inadequate: data sets are huge, semi-structured,
error-prone, and not integrated. We need to
model these and develop flexible data mining
capabilities over them.
- There will be a continued need for new algorithms
and tools as driven by new technologies and
protocols.
- Physical simulation systems of various types
will be needed: docking, ligand binding,
stochastic differential equations.
- Experimental design, driven by analysis and
simulation, should be a part of our discipline
and is an area where we can contribute but are
not contributing.
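As a small illustration of the stochastic differential equation simulations listed above, here is a minimal sketch of Euler-Maruyama integration. The one-species model dX = (k - g·X) dt + sigma dW (constant production, first-order decay, additive noise) and all parameter values are hypothetical and chosen only for demonstration.

```python
import math
import random

def euler_maruyama(x0, drift, diffusion, dt, steps, rng):
    """Integrate dX = drift(X) dt + diffusion(X) dW with a fixed step dt."""
    xs = [x0]
    x = x0
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        dw = rng.gauss(0.0, 1.0) * sqrt_dt   # Wiener increment ~ N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dw
        x = max(x, 0.0)                      # crude clamp: amounts stay non-negative
        xs.append(x)
    return xs

def simulate_expression(k=10.0, g=1.0, sigma=0.5, x0=0.0,
                        dt=0.01, steps=2000, seed=0):
    """Hypothetical production/decay model; returns the sampled trajectory."""
    rng = random.Random(seed)
    return euler_maruyama(x0, lambda x: k - g * x, lambda x: sigma,
                          dt, steps, rng)
```

With `sigma=0` the trajectory relaxes deterministically toward k/g. Rerunning over a range of `k`, `g`, and `sigma` is a simple way to check how sensitive a circuit's operating range is to its parameters.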