Some Biology That Computer Scientists Need for Bioinformatics

Provided by: drrutha
Learn more at: https://www.cs.umd.edu
1
Some Biology That Computer Scientists Need for Bioinformatics
Lenwood S. Heath
Virginia Tech, Blacksburg, VA 24061
heath_at_cs.vt.edu
University of Maryland, December 14, 2001
2
Overview
  • Some Molecular Biology and Genomics
  • Language of the New Biology
  • Existing bioinformatics tools
  • Bioinformatics challenges
  • Bioinformatics at Virginia Tech

3
I. Some Molecular Biology
  • The instruction set for a cell is contained in
    its chromosomes.
  • Each chromosome is a long molecule of DNA.
  • Each DNA molecule contains hundreds or thousands of
    genes.
  • Each gene encodes a protein.
  • A gene is transcribed to mRNA in the nucleus.
  • An mRNA is translated to a protein on ribosomes.

4
Transcription and Translation
DNA --(Transcription)--> mRNA --(Translation)--> Protein
5
Elaborating Cellular Function
[Figure: DNA --(Transcription)--> mRNA --(Translation)--> Protein, annotated with Regulation, Degradation, Reverse Transcription, and the Genetic Code]
  • Functions
  • Structure
  • Catalyze chemical reactions
  • Respond to environment

Thousands of Genes!
6
Chromosomes
  • Long molecules of DNA: 10^4 to 10^8 base pairs
  • 23 pairs in humans
  • A gene is a subsequence of a chromosome that
    encodes a protein.
  • Proteins are associated with cell function,
    structure, and regulation.
  • Only a fraction of the genes are in use at any
    time.
  • Every gene is present in every cell.

7
DNA Strand
[Figure: a DNA strand running from the 5' end to the 3' end, with a 2-deoxyribose (sugar) backbone carrying the bases A, C, G, and T]
A (adenine) complements T (thymine)
C (cytosine) complements G (guanine)
8
Complementary DNA Strands
[Figure: two complementary DNA strands paired base by base to form double-stranded DNA]
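The pairing rules (A with T, C with G) translate directly into code. The following is a minimal Python sketch; the function names and the example sequence are assumptions made for illustration and do not come from the slides. Because the two strands of double-stranded DNA run in opposite 5'-to-3' directions, the complement is reversed when the partner strand is written 5' to 3'.

```python
# Watson-Crick pairing for DNA: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


seq = "ATGCGT"  # hypothetical example sequence, read 5' to 3'
print(complement(seq))          # TACGCA
print(reverse_complement(seq))  # ACGCAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on any implementation.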
9
RNA Strand
[Figure: an RNA strand running from the 5' end to the 3' end, with a ribose (sugar) backbone carrying the bases]
U (uracil) replaces T (thymine)
10
Transcription of DNA to mRNA
[Figure: double-stranded DNA with the coding strand and the template strand labeled, and the mRNA strand transcribed from the template strand]
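Transcription can be sketched in code from either strand. The mRNA carries the same sequence as the coding strand with U in place of T; equivalently, it is the complement of the template strand, written in RNA bases and reversed so that it reads 5' to 3'. The function names and sequences below are assumptions for illustration only.

```python
# Complement a DNA base into the RNA base that pairs with it.
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def transcribe_coding(coding: str) -> str:
    """mRNA from the coding strand (5'->3'): same sequence, T replaced by U."""
    return coding.upper().replace("T", "U")


def transcribe_template(template: str) -> str:
    """mRNA from the template strand (given 5'->3').

    Each base is complemented into RNA, then the result is reversed
    so that the mRNA is also written 5'->3'.
    """
    return template.upper().translate(DNA_TO_RNA_COMPLEMENT)[::-1]


coding = "ATGCGTCAT"    # hypothetical coding strand
template = "ATGACGCAT"  # its reverse complement, i.e. the template strand
print(transcribe_coding(coding))      # AUGCGUCAU
print(transcribe_template(template))  # AUGCGUCAU
```

Both routes give the same mRNA, which is why sequence databases usually report the coding strand.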
11
Proteins and Amino Acids
  • A protein is a large molecule that is a chain of
    amino acids (100 to 5000).
  • There are 20 common amino acids
    (Alanine, Cysteine, ..., Tyrosine).
  • Three bases --- a codon --- suffice to encode an
    amino acid.
  • There are also START and STOP codons.
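The statement that three bases suffice is a counting argument: with 4 possible bases, a word of length k can take 4^k distinct values, and k = 3 is the smallest length for which 4^k is at least 20. A small Python check, with hypothetical helper names:

```python
def distinct_words(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct base strings of the given length."""
    return alphabet_size ** length


def min_codon_length(n_amino_acids: int = 20) -> int:
    """Smallest word length whose distinct words can cover all amino acids."""
    k = 1
    while distinct_words(k) < n_amino_acids:
        k += 1
    return k


for k in (1, 2, 3):
    print(k, distinct_words(k))  # 1 4, then 2 16, then 3 64
print(min_codon_length())        # 3
```

Since 64 codons cover only 20 amino acids plus the stop signal, several codons map to the same amino acid.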

12
Genetic Code
13
Translation to a Protein
[Figure: an mRNA strand translated codon by codon into a nascent polypeptide of amino acids (arginine, histidine, alanine, phenylalanine) bound together by peptide bonds]
Unlike DNA, proteins have a three-dimensional structure that is
essential to protein function.
A protein folds into a three-dimensional shape that
cannot yet be predicted from the primary sequence.
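Translation can be sketched as a table lookup over successive codons. The Python below builds the standard genetic code from its usual compact listing (bases ordered U, C, A, G; `*` marks a stop codon) and reads an mRNA from its first AUG. The function name and the example mRNA are assumptions for illustration; the example was chosen so that it encodes the four amino acids named on the slide, and it is not the sequence from the figure.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; '*' is STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG (START) up to the first STOP codon.

    Returns one-letter amino acid codes; returns "" if there is no AUG.
    Assumes the input contains only A, C, G, and U.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: AUG CGU CAU GCU UUU UAA
print(translate("AUGCGUCAUGCUUUUUAA"))  # MRHAF
```

In the output, M is methionine (the START codon), followed by arginine (R), histidine (H), alanine (A), and phenylalanine (F).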
17
Cell's Fetch-Execute Cycle
  • Stored Program: DNA, chromosomes, genes
  • Fetch/Decode: RNA, ribosomes
  • Execute Functions: Proteins --- oxygen
    transport, cell structures, enzymes
  • Inputs: Nutrients, environmental signals,
    external proteins
  • Outputs: Waste, response proteins, enzymes

18
II. The Language of the New Biology
A new language has been created. Some words in this
language that are useful for today's talks:
  • Genomics
  • Functional Genomics
  • Proteomics
  • cDNA Microarrays
  • Global Gene Expression Patterns
19
Genomics
  • Discovery of genetic sequences and the ordering
    of those sequences into
      • individual genes
      • gene families
      • chromosomes
  • Identification of
      • sequences that code for gene products/proteins
      • sequences that act as regulatory elements

20
Genome Sequencing Projects
  • Drosophila
  • Yeast
  • Mouse
  • Rat
  • Arabidopsis
  • Human
  • Microbes

21
Drosophila Genome
22
Functional Genomics
  • The biological role of individual genes.
  • Mechanisms underlying the regulation of their
    expression.
  • Regulatory interactions among them.

23
Glycolysis, Citric Acid Cycle, and Related
Metabolic Processes
24
Gene Expression
  • Only certain genes are turned on at any
    particular time.
  • When a gene is transcribed (copied to mRNA), it
    is said to be expressed.
  • The mRNA in a cell can be isolated. Its
    contents give a snapshot of the genes currently
    being expressed.
  • Correlating gene expressions with conditions
    gives hints into the dynamic functioning of the
    cell.
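One simple way to correlate expression with a condition is a Pearson correlation between a gene's expression level and a measured condition across samples. The sketch below is only an illustration of that idea; the gene names and all numbers are made up, and real analyses use replicated measurements and statistical testing.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values for illustration only; these are not measurements.
temperature = [10, 15, 20, 25, 30]  # hypothetical condition per sample
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with temperature
    "geneB": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls with temperature
    "geneC": [2.0, 5.0, 1.0, 4.0, 3.0],  # no clear trend
}

for gene, levels in expression.items():
    print(gene, f"{pearson(temperature, levels):+.2f}")
# geneA +1.00
# geneB -1.00
# geneC +0.10
```

A strong positive or negative value is only a hint that a gene responds to the condition; it does not establish a mechanism.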

25
Gene Expression Control Points
26
Responses to Environmental Signals
27
Intracellular Decision Making
28
Microarray Technology
  • In the past, gene expression and gene
    interactions were examined known gene by known
    gene, process by process.
  • With microarray technology:
      • Simultaneous examination of large groups of genes
        and associated interactions
      • Possible discovery of new cellular mechanisms
        involving gene expression

29
Flow of a Microarray Experiment
[Figure: flow diagram with the stages Select cDNAs, PCR, Robotic Printing, Extract RNA, Reverse Transcription and Fluorescent Labeling, Replication and Randomization, Test of Hypotheses]
30
Relative Abundance Detection
[Figure: labeled treatment and control samples are mixed and hybridized to spots (sequences affixed to the slide); relative abundance is detected by fluorescence excitation and emission]
31
Gene Expression Varies
Cy5 to Cy3 ratios
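Two-channel ratios are commonly analyzed on a log2 scale, so that a doubling and a halving are symmetric around zero. The sketch below assumes that convention; the spot names and intensities are made up, and which dye labels the treatment sample depends on the experiment design.

```python
from math import log2


def log_ratio(cy5: float, cy3: float) -> float:
    """log2 of the Cy5/Cy3 intensity ratio for one spot."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return log2(cy5 / cy3)


# Made-up (Cy5, Cy3) intensities for illustration only.
spots = {
    "spot1": (800.0, 200.0),
    "spot2": (150.0, 600.0),
    "spot3": (500.0, 500.0),
}

for name, (cy5, cy3) in spots.items():
    print(name, log_ratio(cy5, cy3))
# spot1 2.0
# spot2 -2.0
# spot3 0.0
```

A value of 2.0 means four times more Cy5 signal than Cy3 signal; -2.0 means four times less.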
32
III. Existing Computational Tools in
Bioinformatics
  • Sequence similarity
  • Multiple sequence alignments
  • Database searching
  • Evolutionary (phylogenetic) tree construction
  • Sequence assemblers
  • Gene finders
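As an illustration of the first item, sequence similarity is often scored with dynamic programming. The sketch below computes a global alignment score in the style of the Needleman-Wunsch algorithm, which is a technique chosen here for illustration and is not named in the slides. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best global alignment score of a and b under a linear gap penalty.

    Only the previous row of the dynamic-programming table is kept,
    so memory use is proportional to len(b).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # align "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                         # align a[:i] against ""
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,            # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "ACT"))         # 1
```

The second example aligns ACGT with AC-T: three matches and one gap give 3 - 2 = 1. Database search tools use faster heuristics built around the same kind of scoring.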

33
Existing Biological Databases
  • Molecular Sequences: Genomic DNA, mRNA, ESTs,
    proteins
  • Protein domains, motifs, or blocks
  • Protein families
  • Genomes
  • Nomenclature and ontologies
  • Biological literature

34
IV. Challenges for Bioinformatics
  • Analyzing and synthesizing complex experimental
    data
  • Representing and accessing vast quantities of
    information
  • Pattern matching
  • Data mining --- whole genome analysis
  • Gene discovery
  • Function discovery
  • Modeling the dynamics of cell function

35
V. Bioinformatics at Virginia Tech
Computer science interacts with the life
sciences.
  • Computer Science in Bioinformatics
  • Joint research with plant biologists, microbial
    biologists, biochemists, cell-cycle biologists,
    animal scientists, crop scientists,
    statisticians.
  • Projects: Expresso, Nupotato, MURI, Arabidopsis
    Genome, Barista, Cell-Cycle Modeling
  • Graduate option in bioinformatics
  • Virginia Bioinformatics Institute (VBI)

36
Expresso: A Problem Solving Environment (PSE) for
Microarray Experiment Design and Analysis
  • Integration of design and procedures
  • Integration of image analysis tools and
    statistical analysis
  • Data mining using inductive logic programming
    (ILP)
  • Closing the loop
  • Integrating models

37
Nupotato
  • Potatoes originated in the Andes, where there are
    many varieties.
  • Many varieties survive at high altitude in cold,
    dry conditions.
  • Microarray technology can be used to investigate
    genes that are responsible for stress resistance
    and that are responsible for the production of
    nutrients.

38
MURI
  • Some microorganisms have the ability to survive
    drying out or intense radiation.
  • Their genomes are just being sequenced.
  • Using microarrays and proteomics, we will try to
    correlate computationally the genes in the
    genomes with the special traits of the
    microorganisms.
  • We are currently using multiple genome analysis.

39
Arabidopsis Genome Project
  • Arabidopsis is a model higher plant.
  • It is the first higher plant whose genome has
    been fully sequenced.
  • Gene finder software has been used to identify
    putative genes.
  • We are computationally mining the regulatory
    regions of these genes for promoter patterns.
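Scanning a regulatory region for a promoter pattern can be sketched with a regular expression. The example below searches for a commonly cited TATA-box consensus, TATA(A/T)A(A/T); this is an illustration of pattern mining in general, not the method used in the project. The function name and the upstream sequence are hypothetical.

```python
import re

# TATA-box consensus TATA(A/T)A(A/T). The lookahead lets overlapping
# occurrences be reported, since each match itself has zero width.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_motif(sequence: str, pattern: re.Pattern = TATA_BOX):
    """Return (position, matched text) for every occurrence, 0-based."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


upstream = "GGCTATAAATAGGCCGTATATATCC"  # hypothetical upstream region
print(find_motif(upstream))  # [(3, 'TATAAAT'), (16, 'TATATAT')]
```

Exact consensus matching is a crude filter; real promoter analysis typically scores candidate sites with position weight matrices and considers their distance from the gene.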

40
Barista
  • Barista serves Expresso!
  • Software development team across projects to
    minimize duplication of effort.
  • Work with Linux, Perl, C, Python, CVS, Apache,
    PHP, ...

41
Virginia Bioinformatics Institute (VBI)
  • Research institute based at Virginia Tech
  • Established July 1, 2000, with $3 million
  • Will occupy 2 buildings and have 100 employees
    in 4 years

42
Getting Into Bioinformatics
  • Learn some biology --- genetics, cell biology
  • Study computational (molecular) biology
  • Get involved with bioinformatics research in
    interdisciplinary teams
  • Work with biologists to solve their problems