1
Introduction to Bioinformatics
BCB 444/544
Instructor: Drena Dobbs, ddobbs_at_iastate.edu
TAs: Michael Terribilini, terrible_at_iastate.edu
     Jeff Sander, jdsander_at_iastate.edu
     Pete Zaback, petez_at_iastate.edu
Lab: MBB 106, 4-4991
2
BCB 444/544 Introduction to Bioinformatics
  • Website
  • http://bindr.gdcb.iastate.edu/bcb544
  • Syllabus & Schedules
  • Lecture PPTs
  • Lab Exercises
  • Practice Exams, etc.
  • Check regularly for updates!

3
BCB 444/544 Introduction to Bioinformatics
Textbook: Discovering Genomics, Proteomics, & Bioinformatics,
Campbell & Heyer, 2nd Edition, 2007
Textbook Companion Website:
http://www.aw-bc.com/geneticsplace/ (click on
purple textbook cover)
4
BCB 444/544 Introduction to Bioinformatics
Computer Laboratory: Meets in 1304 MBB
Current schedule: Thurs 1-3 PM
Conflicts? Alternatives?
5
BCB 444/544 - Introduction to Bioinformatics
Lecture 1: What is Bioinformatics? (&
Computational Biology) 1_Aug21
6
Reading & Exercises (before lecture)
  • Chp 1: What's Wrong with my Child?
  • Wed Aug 23
  • C&H Chp 1.1, pp ii-9
  • Discovery Questions (DQ) 1-13
  • Access website while reading & complete DQs
    (But, do not submit answers by email)
  • Math Minute (MM) 1.1 DQs
  • Fri Aug 25
  • C&H Chp 1.2, pp 9-19
  • DQs 14-31
  • MM 1.2 & 1.3

7
What is Bioinformatics? (& What is Computational
Biology?)
  • Wikipedia:
  • Bioinformatics & computational biology involve
    the use of techniques from mathematics,
    informatics, statistics, and computer science (&
    engineering) to solve biological problems

8
What is Bioinformatics? (& What is Computational
Biology?)
  • Gerstein:
  • (Molecular) Bioinformatics is conceptualizing
    biology in terms of molecules & applying
    informatics techniques - derived from
    disciplines such as mathematics, computer
    science, and statistics - to organize and
    understand information associated with these
    molecules, on a large scale

Modified from Mark Gerstein
9
What is the Information? Biological Sequences,
Structures, Processes
  • Central Dogma of Molecular Biology
  • DNA sequence -> RNA -> Protein ->
    Phenotype
  • Molecules
  • Sequence, Structure, Function
  • Processes
  • Mechanism, Specificity, Regulation
  • Central Paradigm for Bioinformatics
  • Genomic (DNA) Sequence
  • -> mRNA & other RNA Sequences
  • -> Protein Sequences -> RNA &
    Protein Structures -> RNA & Protein
    Functions -> Phenotype
  • Large Amounts of Information
  • Standardized
  • Statistical

Modified from Mark Gerstein
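Worked example (not part of the original slides): a minimal Python sketch of the DNA -> RNA -> Protein flow, assuming the standard genetic code and treating the input as the coding strand. The input is the first 33 bases of the DNA sequence shown on slide 12; the function names are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA: same letters, with U in place of T."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from position 0 until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# First 33 bases of the slide-12 DNA sequence (assumed to be in frame).
DNA_FRAGMENT = "atggcaattaaaattggtatcaatggttttggt"
mrna = transcribe(DNA_FRAGMENT)
protein = translate(mrna)
print(mrna)
print(protein)
```

The last arrow, Protein -> Phenotype, has no comparably simple code; most of the questions on the later slides are about that step.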
10
Explosion of "Omes" & "Omics!" Genome,
Transcriptome, Proteome
  • Genome - the complete collection of DNA (genes
    and "non-genes") of an organism
  • Transcriptome - the complete collection of RNAs
    (mRNAs & others) expressed in an organism
  • Proteome - the complete collection of proteins
    expressed in an organism

11
Genome: Constant; Transcriptome & Proteome:
Variable
  • Genome - the complete collection of DNA (genes
    and "non-genes") of an organism

Note: Although the DNA is "identical" in all
cells of an organism, the sets of RNAs or
proteins expressed in different cells & tissues
of a single organism vary greatly -- and depend
on variables such as environmental conditions,
age, developmental stage, disease state, etc.
  • Transcriptome - the complete collection of RNAs
    (mRNAs & others) expressed in an organism
  • Proteome - the complete collection of proteins
    expressed in an organism

12
Molecular Biology Information: DNA & RNA
Sequences
  • Functions
  • Genetic material
  • Information transfer (mRNA)
  • Protein synthesis (tRNA/mRNA)
  • Catalytic & regulatory activities
  • (some very new!)
  • Information
  • 4 letter alphabet
  • (DNA nucleotides AGCT)
  • 1,000 base pairs in a small gene
  • 3 x 10^9 bp in a genome (human)

DNA sequence:
atggcaattaaaattggtatcaatggttttggtc
gtat gcacaacaccgtgatgacattgaagttgtaggtattaa atggct
tatatgttgaaatatgattcaactcacggtcg aaagatggtaacttagt
ggttaatggtaaaactatccg Gcaaacttaaactggggtgcaatcggtg
ttgatatcgctttaactgatgaaactgctcgtaaacatatcactgcaggc
gcaaaaaaagtt
RNA sequence has "U" instead of "T"
  • Where are the genes?
  • Which DNA sequences encode mRNA?
  • Which DNA sequences are "junk"?
  • Which RNA sequences encode protein?

Modified from Mark Gerstein
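Worked example (not part of the original slides): a small Python sketch of what "4 letter alphabet" and "3 x 10^9 bp" imply. It counts bases in the first unbroken run of the DNA sequence above and estimates the raw size of a human genome, assuming an ideal 2 bits per base with no compression and no ambiguity codes.

```python
import math
from collections import Counter

# First unbroken run of the DNA sequence shown on this slide.
FRAGMENT = "atggcaattaaaattggtatcaatggttttggtc"


def base_composition(seq: str) -> dict:
    """Count each of the four DNA letters, ignoring case."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    comp = base_composition(seq)
    return (comp["G"] + comp["C"]) / sum(comp.values())


composition = base_composition(FRAGMENT)

# A 4-letter alphabet needs log2(4) = 2 bits per base.
bits_per_base = math.log2(4)
genome_bp = 3 * 10**9  # human genome size quoted on the slide
genome_megabytes = genome_bp * bits_per_base / 8 / 1e6

print(composition, round(gc_fraction(FRAGMENT), 3))
print(genome_megabytes, "MB")
```

Under these assumptions the sequence itself fits in well under a gigabyte; the hard part is the questions listed above, not the storage.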
13
Molecular Biology Information: Protein Sequences
Functions: Most cellular functions are performed
or facilitated by proteins
Protein sequences:
d1dhfa_ LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTT
d8dfr__ LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTS
d4dfra_ ISLIAALAVDRVIGMENAMPWN-LPADLAWFKRNTL
d3dfr__ TAFLWAQDRDGLIGKDGHLPWH-LPDDLHYFRAQTV
  • Biocatalysis
  • Cofactor transport/storage
  • Mechanical motion/support
  • Immune protection
  • Regulation of growth and differentiation
  • Information
  • 20 letter alphabet (amino acids)
  • ACDEFGHIKLMNPQRSTVWY
  • (but not BJOUXZ)
  • 300 aa in an average protein
  • (in bacteria)
  • 3 x 10^6 known protein sequences
  • What is this protein?
  • Which amino acids are most important -- for
    folding, activity, interaction with other
    proteins?
  • Which sequence variations are harmful (or
    beneficial)?

Modified from Mark Gerstein
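Worked example (not part of the original slides): a Python sketch that checks the four aligned fragments from this slide against the 20-letter alphabet and counts identical positions for each pair. The fragments are rejoined across the transcript's line wraps, which is an assumption; "-" is treated as an alignment gap and gap columns are skipped.

```python
from itertools import combinations

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Aligned fragments as listed on the slide (wrapped lines rejoined).
ALIGNED = {
    "d1dhfa_": "LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTT",
    "d8dfr__": "LNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTS",
    "d4dfra_": "ISLIAALAVDRVIGMENAMPWN-LPADLAWFKRNTL",
    "d3dfr__": "TAFLWAQDRDGLIGKDGHLPWH-LPDDLHYFRAQTV",
}


def is_valid_protein(seq: str) -> bool:
    """True if every non-gap character is one of the 20 amino acid letters."""
    return all(ch in AMINO_ACIDS for ch in seq if ch != "-")


def identity_counts(a: str, b: str) -> tuple:
    """Return (identical positions, compared positions) for two aligned rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in pairs)
    return matches, len(pairs)


for (name_a, seq_a), (name_b, seq_b) in combinations(ALIGNED.items(), 2):
    matches, compared = identity_counts(seq_a, seq_b)
    print(f"{name_a} vs {name_b}: {matches}/{compared} identical")
```

Columns that stay identical across all four rows are natural first candidates for the "which amino acids are most important" question, although identity alone does not prove functional importance.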
14
Molecular Biology Information: Macromolecular
Structures
  • DNA/RNA/Protein Structures
  • How does a protein (or RNA) sequence fold into an
    active 3-dimensional structure?
  • Can we predict structure from sequence?
  • Can we predict function from structure (or
    perhaps, from sequence alone?)

Modified from Mark Gerstein
15
We don't yet understand the protein folding code
- but we try to engineer proteins anyway!
Modified from Mark Gerstein
16
Molecular Biology Information: Biological
Processes
  • Functional Genomics
  • How do patterns of gene expression determine
    phenotype?
  • Which genes and proteins are required for
    differentiation during development?
  • How do proteins interact in biological networks?
  • Which genes and pathways have been most highly
    conserved during evolution?

17
On a Large Scale? Whole Genome Sequencing
"Genome sequences now accumulate so quickly that,
in less than a week, a single laboratory can
produce more bits of data than Shakespeare
managed in a lifetime, although the latter make
better reading." -- G A Petsko, Nature 401:
115-116 (1999)
Modified from Mark Gerstein
18
Automated Sequencing for Genome Projects
Another recent improvement: rapid high-resolution
separation of fragments in capillaries
instead of gels (E Yeung, Ames Lab, ISU)
Modified from Eric Green
19
1st Draft Human Genome - "Finished" in 2001
Modified from Eric Green
20
Human Genome Sequencing
  • Two approaches
  • Public (government) - International Consortium
  • (6 countries, NIH-funded in US)
  • "Hierarchical" cloning & BAC-by-BAC sequencing
  • Map-based assembly
  • Private (industry) - Celera - Craig Venter,
    CEO
  • Whole genome random "shotgun" sequencing
  • Computational assembly
  • (took advantage of public maps &
    sequences, too)
  • Guess which human genome they sequenced? Craig's
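
Worked example (not part of the original slides): a toy Python sketch of "computational assembly" for shotgun reads. It uses a simple greedy overlap-merge, which is an illustrative stand-in and not the assembler used by Celera or the public consortium. The "genome" is the first 21 bases of the slide-12 DNA sequence, and the three reads are hand-picked overlapping substrings; real reads contain errors and repeats that defeat this naive approach.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


GENOME = "ATGGCAATTAAAATTGGTATC"  # first 21 bases of the slide-12 sequence
READS = [GENOME[10:21], GENOME[0:10], GENOME[5:15]]  # deliberately out of order

print(greedy_assemble(READS))
```

The map-based (BAC-by-BAC) approach reduces the same problem to many small assemblies whose positions are already known, which is why it is less sensitive to repeats.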

21
Public Sequencing - International Consortium
Modified from Eric Green
22
Comparison of Sequenced Genome Sizes
Plants? Some have much larger genomes than
human!
Modified from Eric Green
23
"Complete" Human Genome Sequence - What next?
from Eric Green
24
Next Step after the Sequence?
Understanding Gene Function on a Genomic Scale
  • Expression Analysis
  • Structural Genomics
  • Protein Interactions
  • Pathway Analysis
  • Systems Biology
  • Evolutionary Implications of
  • Introns & Exons
  • Intergenic Regions as "Gene Graveyard"

Modified from Mark Gerstein
25
Interpreting the Human Genome Sequence!
from Eric Green
26
Comparative Genomics: compare entire genomic
sequences
from Eric Green
27
Comparing Genomes Functional Elements
from Eric Green
28
Gene Expression Data: the Transcriptome
MicroArray Data
  • Yeast Expression Data
  • Levels for all 6,000 genes!
  • Experiments to investigate how genes respond to
    changes in environment or how patterns of RNA
    expression change in normal vs cancerous tissue

ISU's Biotechnology Facilities include
state-of-the-art Microarray Instrumentation
Modified from Mark Gerstein
29
Other "Omes"? Proteome, Metabolome, etc.
ISU's Biotechnology Facilities include
state-of-the-art Proteomics Instrumentation
ISU's Biotechnology Facilities include
state-of-the-art Metabolomics Instrumentation
30
Systems Biology attempts to integrate all of
these - & more - to explain the complex behaviors
of whole systems (cells, organisms, ecosystems)
How are "Omes" related?
31
Other Genome-Scale Experiments
Systematic Knockouts: Make "knockout" (null)
mutations in every gene - one at a time - and
analyze the resulting phenotypes! For yeast:
6,000 KO mutants!
2-hybrid Experiments: For each (and every)
protein, identify every other protein with which
it interacts! For yeast: 6000 x 6000 / 2
~ 18M interactions!!
Modified from Mark Gerstein
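Worked example (not part of the original slides): the 18M figure is a rounded estimate. A short Python check of the exact pair counts, assuming 6,000 yeast proteins and symmetric interactions (A-B is the same test as B-A):

```python
import math

n_proteins = 6000

# The slide's back-of-envelope estimate: n * n / 2.
slide_estimate = n_proteins * n_proteins // 2

# Exact number of unordered pairs of two different proteins: n * (n - 1) / 2.
distinct_pairs = math.comb(n_proteins, 2)

# If self-interactions (homodimers) are also tested, add one test per protein.
pairs_with_self = distinct_pairs + n_proteins

print(slide_estimate, distinct_pairs, pairs_with_self)
```

All three values round to 18 million, so the slide's shortcut is fine for estimating experimental scale.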
32
Molecular Biology Information: Integrating Data
  • Understanding the function of genomes requires
    integration of many diverse and complex types of
    information
  • Metabolic pathways
  • Regulatory networks
  • Whole organism physiology
  • Evolution, phylogeny
  • Environment, ecology
  • Literature (MEDLINE)

Modified from Mark Gerstein
33
Storing & Analyzing Large-scale
Information: Exponential Growth of Data Coupled
with Development of Fast Computer Technology
  • CPU vs Disk & Net
  • Increases in computer speed & storage capacity
    have been dramatic
  • Improved computing resources have been a driving
    force in Bioinformatics

ISU's supercomputer "CyBlue" is among 100 most
powerful in the world!
Modified from Mark Gerstein
34
Bioinformatics is born! & more Bioinformaticists
are needed!
(Internet picture adapted from D Brutlag,
Stanford)
Modified from Mark Gerstein
35
Informatics techniques in Bioinformatics
  • Databases
  • Building, Querying
  • Object-oriented DB
  • String Comparison
  • Text search
  • Alignment
  • Significance statistics
  • Finding Patterns
  • Machine Learning
  • Data Mining
  • Statistics
  • Linguistics
  • Geometry
  • Robotics
  • Graphics (Surfaces, Volumes)
  • Comparison & 3D Matching
  • Simulation & Modeling
  • Newtonian Mechanics
  • Electrostatics
  • Numerical Algorithms
  • Simulation
  • Network modeling
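
Worked example (not part of the original slides): a minimal Python sketch of the "String Comparison / Alignment" technique, using the classic Needleman-Wunsch dynamic programming recurrence for a global alignment score. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions; real tools use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch, linear gap cost)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if x == y else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept in memory, which is enough for the score; recovering the alignment itself requires the full table or a traceback structure. The "Significance statistics" bullet asks the follow-up question: how likely is a score this high by chance?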

36
Challenges in Organizing Information: Redundancy
and Multiplicity
  • Different sequences can have the same structure
  • Organism has many similar genes
  • Single gene may have multiple functions
  • Genes and proteins function in genetic and
    regulatory pathways
  • How do we organize all this information so that
    we can make sense of it?

Integrative Genomics: sequences <-> motifs <->
genes <-> structures <-> functions <-> pathways <->
expression levels <-> regulatory systems <-> ...
Modified from Mark Gerstein
37
Molecular Parts Conserved Domains
Modified from Mark Gerstein
38
"Parts List" approach to bike maintenance
How many roles can these play? How flexible and
adaptable are they mechanically?
What are the shared parts (bolt, nut, washer,
spring, bearing), unique parts (cogs, levers)?
What are the common parts -- types of parts (nuts
& washers)?
Where are the parts located?
Modified from Mark Gerstein
39
World of structures is also finite, providing a
valuable simplification
(human)
30,000 genes
2,000 folds
(T. pallidum)
2,000 genes
Global Surveys of a Finite Set of Parts from Many
Perspectives Same logic for pathways, functions,
sequence families, blocks, motifs....
Modified from Mark Gerstein
40
Is this Bioinformatics? (1, with Answers)
  • Creating digital libraries
  • Automated bibliographic search and textual
    comparison
  • Knowledge bases for biological literature
  • Methods for structure determination
  • Computational X-ray crystallography
  • NMR structure determination
  • Distance Geometry
  • Metabolic pathway simulation
  • The DNA Computer

YES
YES
YES
No
Modified from Mark Gerstein
41
Is this Bioinformatics? 2
  • Gene identification by sequence inspection
  • Prediction of splice sites, promoters, etc.
  • DNA methods in forensics
  • Modeling populations of organisms
  • Ecological Modeling
  • Genomic sequencing methods
  • Assembling contigs
  • Physical and genetic mapping
  • Linkage analysis
  • Linking specific genes to various traits

YES
YES
YES
YES
YES
Modified from Mark Gerstein
42
Is this Bioinformatics? 3
  • Rational drug design
  • RNA structure prediction
  • Protein structure prediction
  • Radiological image processing
  • Computational representations for human anatomy
  • (e.g., Visible Human)
  • Artificial life simulations
  • Artificial immunology / Computer security

YES
No
Modified from Mark Gerstein
43
So, this is Bioinformatics
What is it good for?
44
Application I: Designing Drugs
  • Understanding how proteins bind other molecules
  • Docking & structure modeling
  • Designing inhibitors

(Figures adapted from Olsen Group Docking Page at
Scripps, Dyson NMR Group Web page at Scripps,
and from Computational Chemistry Page at Cornell
Theory Center.)
Modified from Mark Gerstein
45
Application II: Finding homologs
Modified from Mark Gerstein
46
Finding WHAT? Homologs - "same genes" in
different organisms
  • Human vs Mouse vs Yeast
  • Much easier to do experiments on yeast!

Best Sequence Similarity Matches to Date Between
Positionally Cloned Human Genes and S. cerevisiae
Proteins
(Acc# = GenBank accession number of the cDNA)

Human Disease                          MIM #   Human Gene  Human cDNA Acc#  BLASTX P-value  Yeast Gene  Yeast cDNA Acc#  Yeast Gene Description
Hereditary Non-polyposis Colon Cancer  120436  MSH2        U03911           9.2e-261        MSH2        M84170           DNA repair protein
Hereditary Non-polyposis Colon Cancer  120436  MLH1        U07418           6.3e-196        MLH1        U07187           DNA repair protein
Cystic Fibrosis                        219700  CFTR        M28668           1.3e-167        YCF1        L35237           Metal resistance protein
Wilson Disease                         277900  WND         U11700           5.9e-161        CCC2        L36317           Probable copper transporter
Glycerol Kinase Deficiency             307030  GK          L13943           1.8e-129        GUT1        X69049           Glycerol kinase
Bloom Syndrome                         210900  BLM         U39817           2.6e-119        SGS1        U22341           Helicase
Adrenoleukodystrophy, X-linked         300100  ALD         Z21876           3.4e-107        PXA1        U17065           Peroxisomal ABC transporter
Ataxia Telangiectasia                  208900  ATM         U26455           2.8e-90         TEL1        U31331           PI3 kinase
Amyotrophic Lateral Sclerosis          105400  SOD1        K00065           2.0e-58         SOD1        J03279           Superoxide dismutase
Myotonic Dystrophy                     160900  DM          L19268           5.4e-53         YPK1        M21307           Serine/threonine protein kinase
Lowe Syndrome                          309000  OCRL        M88162           1.2e-47         YIL002C     Z47047           Putative IPP-5-phosphatase
Neurofibromatosis, Type 1              162200  NF1         M89914           2.0e-46         IRA2        M33779           Inhibitory regulator protein
Choroideremia                          303100  CHM         X78121           2.1e-42         GDI1        S69371           GDP dissociation inhibitor
Diastrophic Dysplasia                  222600  DTD         U14528           7.2e-38         SUL1        X82013           Sulfate permease
Lissencephaly                          247200  LIS1        L13385           1.7e-34         MET30       L26505           Methionine metabolism
Thomsen Disease                        160800  CLC1        Z25884           7.9e-31         GEF1        Z23117           Voltage-gated chloride channel
Wilms Tumor                            194070  WT1         X51630           1.1e-20         FZF1        X67787           Sulphite resistance protein
Achondroplasia                         100800  FGFR3       M58051           2.0e-18         IPL1        U07163           Serine/threonine protein kinase
Menkes Syndrome                        309400  MNK         X69208           2.1e-17         CCC2        L36317           Probable copper transporter
Modified from Mark Gerstein
47
Application III: Genome/Transcriptome/Proteome
Characterization & Comparison
  • Databases, statistics
  • Occurrence of specific genes or features in a
    genome
  • How many kinases in yeast?
  • Compare Tissues
  • Which proteins are expressed in cancer vs normal
    tissues?
  • Diagnostic tools
  • Drug target discovery

Modified from Mark Gerstein
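Worked example (not part of the original slides): a Python sketch of the simplest possible answer to "How many kinases in ...?", a keyword search over gene descriptions. The data are only the yeast gene descriptions from the slide-46 homolog table, not the yeast genome, and matching the word "kinase" in free-text annotation is a crude assumption; a real survey would search by sequence or domain similarity.

```python
# Yeast gene -> description, copied from the slide-46 homolog table.
YEAST_GENE_DESCRIPTIONS = {
    "MSH2": "DNA repair protein",
    "MLH1": "DNA repair protein",
    "YCF1": "Metal resistance protein",
    "CCC2": "Probable copper transporter",
    "GUT1": "Glycerol kinase",
    "SGS1": "Helicase",
    "PXA1": "Peroxisomal ABC transporter",
    "TEL1": "PI3 kinase",
    "SOD1": "Superoxide dismutase",
    "YPK1": "Serine/threonine protein kinase",
    "YIL002C": "Putative IPP-5-phosphatase",
    "IRA2": "Inhibitory regulator protein",
    "GDI1": "GDP dissociation inhibitor",
    "SUL1": "Sulfate permease",
    "MET30": "Methionine metabolism",
    "GEF1": "Voltage-gated chloride channel",
    "FZF1": "Sulphite resistance protein",
    "IPL1": "Serine/threonine protein kinase",
}


def genes_matching(keyword: str, descriptions: dict) -> list:
    """Genes whose description contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [gene for gene, text in descriptions.items() if keyword in text.lower()]


kinases = genes_matching("kinase", YEAST_GENE_DESCRIPTIONS)
print(len(kinases), kinases)
```

The same query shape (filter a table of annotated features, then count) underlies the "Databases, statistics" bullet; only the size of the table and the quality of the annotation change.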
48
Web Resources for Bioinformatics & Computational
Biology
  • Wikipedia: Bioinformatics
  • NCBI - National Center for Biotechnology
    Information
  • ISCB - International Society for Computational
    Biology
  • JCB - Jena Center for Bioinformatics
  • UBC - Bioinformatics Links Directory

49
ISU Resources & Experts
  • ISU Research Centers & Graduate Training
    Programs
  • BCB - Bioinformatics & Computational Biology
  • L.H. Baker Center - Bioinformatics & Biological
    Statistics
  • CIAG - Center for Integrated Animal Genomics
  • CILD - Computational Intelligence, Learning &
    Discovery
  • NSF IGERT Training Grant - Computational
    Molecular Biology
  • ISU Facilities
  • Biotech - Instrumentation Facilities
  • CIAG - Center for Integrated Animal Genomics
  • PSI - Plant Sciences Institute
  • PSI Centers

50
For fun: DNA Interactive "Genomes": A tutorial
on genomic sequencing, gene structure, gene
prediction. Howard Hughes Medical Institute
(HHMI) & Cold Spring Harbor Laboratory (CSHL)
http://www.dnai.org/c/index.html
51
Building Designer Zinc Finger DNA-binding
Proteins
J Sander, P Zaback, Fengli Fu, J Townsend,
R Winfrey, D Wright, K Joung, D Voytas, D Dobbs
52
Predicting DNA-Protein & RNA-Protein Binding
Sites
M Terribilini, P Zaback, J Sander, JH Lee,
C Yan, F Wu, V Honavar, R Jernigan, D Dobbs