Title: Some Biology That Computer Scientists Need for Bioinformatics
Lenwood S. Heath
Virginia Tech, Blacksburg, VA 24061
heath_at_cs.vt.edu
University of Maryland, December 14, 2001
Overview
- Some Molecular Biology and Genomics
- Language of the New Biology
- Existing bioinformatics tools
- Bioinformatics challenges
- Bioinformatics at Virginia Tech
I. Some Molecular Biology
- The instruction set for a cell is contained in its chromosomes.
- Each chromosome is a long molecule of DNA.
- Each DNA molecule contains hundreds or thousands of genes.
- Each gene encodes a protein.
- A gene is transcribed to mRNA in the nucleus.
- An mRNA is translated to a protein on ribosomes.
Transcription and Translation
[Figure: DNA is transcribed to mRNA, and mRNA is translated to protein.]
Elaborating Cellular Function
[Figure: DNA is transcribed to mRNA, and mRNA is translated to protein via the genetic code; the diagram also labels reverse transcription, regulation, and degradation.]
- Functions
- Structure
- Catalyze chemical reactions
- Respond to environment
Thousands of Genes!
Chromosomes
- Long molecules of DNA: 10^4 to 10^8 base pairs
- 23 matched pairs in humans
- A gene is a subsequence of a chromosome that encodes a protein.
- Proteins are associated with cell function, structure, and regulation.
- Only a fraction of the genes are in use at any time.
- Every gene is present in every cell.
DNA Strand
[Figure: a single DNA strand with a 2-deoxyribose (sugar) backbone and attached bases, running from the 5' end to the 3' end.]
Bases
A (adenine) complements T (thymine)
C (cytosine) complements G (guanine)
Complementary DNA Strands
[Figure: two complementary DNA strands paired base by base to form double-stranded DNA.]
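The pairing rule (A with T, C with G) can be written as a small string transformation. The following is a minimal sketch, not part of the original slides; the function names and the example sequence are illustrative assumptions. Because the two strands run in opposite directions, the partner strand read 5' to 3' is the reverse of the base-by-base complement.

```python
# Watson-Crick pairing: A <-> T, C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    example = "ATGCCG"  # hypothetical sequence, chosen only for illustration
    print(complement(example))          # TACGGC
    print(reverse_complement(example))  # CGGCAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.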
RNA Strand
[Figure: a single RNA strand with a ribose (sugar) backbone and attached bases, running from the 5' end to the 3' end.]
Bases
U (uracil) replaces T (thymine)
Transcription of DNA to mRNA
[Figure: a coding DNA strand, its template DNA strand, and the mRNA strand transcribed from the template.]
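Transcription can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not code from the talk: the function names and the example sequences are hypothetical. The mRNA is built by pairing bases against the template strand, so it ends up with the same sequence as the coding strand, with U in place of T.

```python
# Pairing of a template DNA base with the RNA base placed opposite it.
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_template(template_3_to_5: str) -> str:
    """Build the mRNA (5' to 3') from the template strand read 3' to 5'."""
    return template_3_to_5.upper().translate(TEMPLATE_TO_RNA)


def transcribe_coding(coding_5_to_3: str) -> str:
    """Shortcut: the mRNA equals the coding strand with T replaced by U."""
    return coding_5_to_3.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCC"    # hypothetical coding strand, 5' to 3'
    template = "TACCGG"  # its template strand, written 3' to 5'
    print(transcribe_template(template))  # AUGGCC
    print(transcribe_coding(coding))      # AUGGCC
```

Both routes give the same mRNA, which is why sequence databases usually store only the coding strand.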
Proteins and Amino Acids
- A protein is a large molecule that is a chain of amino acids (100 to 5000).
- There are 20 common amino acids (Alanine, Cysteine, ..., Tyrosine).
- Three bases --- a codon --- suffice to encode an amino acid.
- There are also START and STOP codons.
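Why three bases suffice is a counting argument: with four bases, words of length 1 or 2 give only 4 or 16 distinct codes, fewer than the 20 amino acids, while length 3 gives 64. A short sketch (the helper name `words_of_length` is an assumption made for this example) enumerates the possibilities:

```python
from itertools import product

BASES = "ACGU"
AMINO_ACIDS = 20  # common amino acids that need distinct codes


def words_of_length(n: int) -> list[str]:
    """Enumerate every base sequence of length n."""
    return ["".join(p) for p in product(BASES, repeat=n)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        count = len(words_of_length(n))
        print(n, count, count >= AMINO_ACIDS)
    # 1 4 False
    # 2 16 False
    # 3 64 True
```

The 64 codons exceed the 20 amino acids, so several codons map to the same amino acid and some are left over to serve as STOP signals.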
Genetic Code
Translation to a Protein
[Figure: an mRNA strand is translated into a nascent polypeptide (arginine, histidine, alanine, phenylalanine), whose amino acids are bound together by peptide bonds.]
Unlike DNA, proteins have a three-dimensional structure that is essential to their function.
A protein folds into a three-dimensional shape that cannot yet be predicted from its primary sequence.
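Translation is, computationally, a table lookup over consecutive codons. The sketch below uses the standard genetic code with one-letter amino acid symbols; the function name `translate` and the example mRNA are assumptions made for illustration, and real tools handle many details (alternative codes, ambiguous bases, reading frames) that are omitted here.

```python
# Standard genetic code, codons enumerated in U, C, A, G order.
# One-letter amino acid symbols; "*" marks a STOP codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(mrna: str) -> str:
    """Translate from the first START codon (AUG) up to the first STOP codon.

    Assumes the input contains only A, C, G, and U.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no START codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical mRNA: START, Arg, His, Ala, Phe, STOP.
    print(translate("AUGCGUCAUGCUUUUUAA"))  # MRHAF
```

The output is only the primary sequence; as noted above, the folded three-dimensional shape is a separate and much harder problem.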
Cell's Fetch-Execute Cycle
- Stored Program: DNA, chromosomes, genes
- Fetch/Decode: RNA, ribosomes
- Execute Functions: Proteins --- oxygen transport, cell structures, enzymes
- Inputs: Nutrients, environmental signals, external proteins
- Outputs: Waste, response proteins, enzymes
II. The Language of the New Biology
A new language has been created. Words in the language that are useful for today's talks:
- Genomics
- Functional Genomics
- Proteomics
- cDNA Microarrays
- Global Gene Expression Patterns
Genomics
- Discovery of genetic sequences and the ordering of those sequences into individual genes, gene families, and chromosomes.
- Identification of sequences that code for gene products/proteins and of sequences that act as regulatory elements.
Genome Sequencing Projects
- Drosophila
- Yeast
- Mouse
- Rat
- Arabidopsis
- Human
- Microbes
Drosophila Genome
Functional Genomics
- The biological role of individual genes.
- Mechanisms underlying the regulation of their expression.
- Regulatory interactions among them.
Glycolysis, Citric Acid Cycle, and Related Metabolic Processes
Gene Expression
- Only certain genes are turned on at any particular time.
- When a gene is transcribed (copied to mRNA), it is said to be expressed.
- The mRNA in a cell can be isolated. Its contents give a snapshot of the genes currently being expressed.
- Correlating gene expression with conditions gives hints about the dynamic functioning of the cell.
Gene Expression Control Points
Responses to Environmental Signals
Intracellular Decision Making
Microarray Technology
- In the past, gene expression and gene interactions were examined known gene by known gene, process by process.
- With microarray technology, large groups of genes and their associated interactions can be examined simultaneously.
- This makes possible the discovery of new cellular mechanisms involving gene expression.
Flow of a Microarray Experiment
[Figure: flowchart of a microarray experiment; the labeled steps include selecting cDNAs, PCR, robotic printing, extracting RNA, reverse transcription and fluorescent labeling, replication and randomization, and test of hypotheses.]
Relative Abundance Detection
[Figure: labeled treatment and control samples are mixed and hybridized to spots (sequences affixed to the slide); excitation and emission are then used for detection.]
Gene Expression Varies
[Figure: Cy5 to Cy3 ratios.]
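Cy5 to Cy3 ratios are commonly analyzed on a log2 scale, so that a doubling and a halving are symmetric around zero. The sketch below is a simplified illustration, not the analysis used in the talk: the intensity values are made up, the function name is hypothetical, and real pipelines also perform background correction and normalization.

```python
import math


def log2_ratios(cy5: list[float], cy3: list[float]) -> list[float]:
    """Per-spot log2(Cy5 / Cy3); intensities are assumed to be positive."""
    return [math.log2(red / green) for red, green in zip(cy5, cy3)]


if __name__ == "__main__":
    # Hypothetical spot intensities for three spots.
    cy5 = [800.0, 200.0, 400.0]
    cy3 = [200.0, 800.0, 400.0]
    print(log2_ratios(cy5, cy3))  # [2.0, -2.0, 0.0]
```

A value of 2.0 means the Cy5-labeled sample is four times as abundant at that spot, -2.0 means one quarter as abundant, and 0.0 means no difference.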
III. Existing Computational Tools in Bioinformatics
- Sequence similarity
- Multiple sequence alignments
- Database searching
- Evolutionary (phylogenetic) tree construction
- Sequence assemblers
- Gene finders
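As one concrete instance of sequence similarity, the sketch below scores a global alignment of two sequences by dynamic programming, in the style of the Needleman-Wunsch algorithm. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; production tools use tuned substitution matrices and gap penalties.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a and b, keeping one table row at a time."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # prefix of a aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))  # 7
    print(global_alignment_score("ACGT", "AGT"))         # 2
```

The running time is proportional to the product of the two lengths, which is one reason database search tools rely on heuristics rather than full alignment against every entry.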
Existing Biological Databases
- Molecular sequences: genomic DNA, mRNA, ESTs, proteins
- Protein domains, motifs, or blocks
- Protein families
- Genomes
- Nomenclature and ontologies
- Biological literature
IV. Challenges for Bioinformatics
- Analyzing and synthesizing complex experimental data
- Representing and accessing vast quantities of information
- Pattern matching
- Data mining --- whole genome analysis
- Gene discovery
- Function discovery
- Modeling the dynamics of cell function
V. Bioinformatics at Virginia Tech
Computer science interacts with the life sciences.
- Computer Science in Bioinformatics
- Joint research with plant biologists, microbial biologists, biochemists, cell-cycle biologists, animal scientists, crop scientists, and statisticians.
- Projects: Expresso, Nupotato, MURI, Arabidopsis Genome, Barista, Cell-Cycle Modeling
- Graduate option in bioinformatics
- Virginia Bioinformatics Institute (VBI)
Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis
- Integration of design and procedures
- Integration of image analysis tools and statistical analysis
- Data mining using inductive logic programming (ILP)
- Closing the loop
- Integrating models
Nupotato
- Potatoes originated in the Andes, where there are many varieties.
- Many varieties survive at high altitude in cold, dry conditions.
- Microarray technology can be used to investigate genes that are responsible for stress resistance and for the production of nutrients.
MURI
- Some microorganisms have the ability to survive drying out or intense radiation.
- Their genomes are just now being sequenced.
- Using microarrays and proteomics, we will try to computationally correlate the genes in the genomes with the special traits of the microorganisms.
- We are currently using multiple genome analysis.
Arabidopsis Genome Project
- Arabidopsis is a model higher plant.
- It is the first higher plant whose genome has been fully sequenced.
- Gene finder software has been used to identify putative genes.
- We are computationally mining the regulatory regions of these genes for promoter patterns.
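One simple way to look for candidate promoter patterns is to count short words (k-mers) that recur across the upstream regions of many genes. The sketch below is a toy illustration under stated assumptions, not the project's actual method: the function name, parameters, and the three upstream sequences are all hypothetical.

```python
from collections import Counter


def shared_kmers(upstream_regions: list[str], k: int, min_regions: int) -> dict[str, int]:
    """Return k-mers that occur in at least min_regions of the given regions."""
    counts: Counter[str] = Counter()
    for region in upstream_regions:
        region = region.upper()
        # Use a set so each k-mer is counted at most once per region.
        kmers = {region[i:i + k] for i in range(len(region) - k + 1)}
        counts.update(kmers)
    return {kmer: n for kmer, n in counts.items() if n >= min_regions}


if __name__ == "__main__":
    # Made-up upstream sequences that share the word TATAAA.
    regions = ["GGTATAAACC", "CCTATAAAGG", "AATATAAATT"]
    print(shared_kmers(regions, k=6, min_regions=3))  # {'TATAAA': 3}
```

A word shared by many regions is only a candidate; judging whether it is over-represented requires comparison against background sequence.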
Barista
- Barista serves Expresso!
- Software development team across projects to minimize duplication of effort.
- Work with Linux, Perl, C, Python, CVS, Apache, PHP, ...
Virginia Bioinformatics Institute (VBI)
- Research institute based at Virginia Tech
- Established July 1, 2000, with $3 million
- Will occupy 2 buildings and have 100 employees in 4 years
Getting Into Bioinformatics
- Learn some biology --- genetics, cell biology
- Study computational (molecular) biology
- Get involved with bioinformatics research in interdisciplinary teams
- Work with biologists to solve their problems