CS 598SS Probabilistic Methods in Biological Sequence Analysis - PowerPoint PPT Presentation

1
CS 598SS: Probabilistic Methods in Biological
Sequence Analysis
  • Saurabh Sinha

2
What is the course about?
  • Bioinformatics / Computational Biology
  • Algorithms for analyzing genomes
  • Probabilistic methods

3
What is the course format?
  • Research course
  • Lectures by instructor
  • Student presentations of research papers
  • 1 student per paper
  • 2 papers in a session, typically
  • Research project presentation
  • 1-3 students per project
  • 20 - 30 mins presentation at end of course.

4
Grading
  • Project: 60%
  • Paper presentation: 20%
  • Participation (including assignments or quizzes,
    if any): 20%

5
Expectations
  • Programming skills (for the project)
  • Basic exposure to probability theory
  • Basic exposure to algorithms

6
What you can do at the end of the course
  • Start working on research projects in
    bioinformatics / biological sequence analysis
  • Use principled approaches, supported by
    probability theory, instead of ad hoc methods
  • If project succeeds, publish a paper in
    bioinformatics
  • Join me as a graduate advisee ?

7
Administrative Details
  • Instructor
  • Saurabh Sinha
  • Room 2122, Siebel Center
  • Email: sinhas_at_cs.uiuc.edu
  • Class hrs: Wed & Fri, 12:30pm - 1:45pm, 1111 SC
  • Office hrs: Thursdays, 2pm - 3pm, 2122 SC
  • CRN: 46042
  • Credits: 4 graduate hrs
  • Welcome to sit in, if not taking for credit

8
Recommended books
  • Not required, but recommended
  • Biological Sequence Analysis: Probabilistic
    Models of Proteins and Nucleic Acids
  • -- Durbin, Eddy, Krogh, Mitchison
  • Bioinformatics: The Machine Learning Approach
  • -- Baldi, Brunak

9
Companion Course
  • CS498-CXZ: Algorithms in Bioinformatics (Fall
    2005)
  • T/Th 12:30pm - 1:45pm
  • http://sifaka.cs.uiuc.edu/course/498cxz05f/info.html

10
Why study bioinformatics?
  • Molecular biology is the new frontier of 21st
    century science
  • Computer science is the crown prince of 20th
    century engineering
  • Bioinformatics is the application and development
    of computer science with the goal of supporting
    molecular biology

11
Why study bioinformatics?
  • Flood of data: several gigabytes of sequence and
    gene expression data
  • Noise in the data
  • Biological
  • Experimental
  • Algorithms needed to make discoveries
  • Probabilistic methods
  • Need for efficiency

12
Why study bioinformatics?
  • The big picture
  • Human health and quality of life
  • Fundamental science
  • Billions of dollars being spent
  • Health research gets the major chunk of the US
    Govt's funds
  • Fundamental health research is at the molecular
    level
  • Molecular biology research increasingly a
    quantitative science

13
Fundamental Science
  • Recent issue of Science: top 25 questions
  • What Is the Universe Made Of?
  • What Is the Biological Basis of Consciousness?
  • Why Do Humans Have So Few Genes?
  • To What Extent Are Genetic Variation and Personal Health Linked?
  • Can the Laws of Physics Be Unified?
  • How Much Can Human Life Span Be Extended?
  • What Controls Organ Regeneration?
  • How Can a Skin Cell Become a Nerve Cell?
  • How Does a Single Somatic Cell Become a Whole Plant?
  • How Does Earth's Interior Work?
  • Are We Alone in the Universe?
  • How and Where Did Life on Earth Arise?
  • What Determines Species Diversity?
  • What Genetic Changes Made Us Uniquely Human?
  • How Are Memories Stored and Retrieved?
  • How Did Cooperative Behavior Evolve?
  • How Will Big Pictures Emerge from a Sea of Biological Data?
  • How Far Can We Push Chemical Self-Assembly?
  • What Are the Limits of Conventional Computing?
  • Can We Selectively Shut Off Immune Responses?
  • Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?
  • Is an Effective HIV Vaccine Feasible?
  • How Hot Will the Greenhouse World Be?
  • What Can Replace Cheap Oil -- and When?
  • Will Malthus Continue to Be Wrong?

14
  • Let the fun begin

15
Heredity and DNA
  • Heredity: children resemble parents
  • Easy to see
  • Hard to explain
  • DNA discovered as the physical (molecular)
    carrier of hereditary information
  • Watson & Crick explain DNA structure in 1953.

16
Life, Cells, Proteins
  • The study of life → the study of cells
  • Cells are born, do their job, duplicate, die
  • All these processes controlled by proteins

17
Protein functions
  • Enzymes (catalysts)
  • Control chemical reactions in cell
  • E.g., Aspirin inhibits an enzyme that produces
    the inflammation messenger
  • Transfer of signals/molecules between and inside
    cells
  • E.g., sensing of environment
  • Regulate activity of genes

18
DNA
  • DNA is a molecule: deoxyribonucleic acid
  • Double helical structure (discovered by Watson,
    Crick & Franklin)
  • Chromosomes are densely coiled and packed DNA

19
[Figure: chromosome and DNA]
SOURCE: http://www.microbe.org/espanol/news/human_genome.asp
20
The DNA Molecule
5'
G -- C A -- T T -- A G -- C C -- G G -- C T -- A
G -- C T -- A T -- A A -- T A -- T C -- G T -- A
3'
[Figure labels: Base, Nucleotide]
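The pairing in the figure above (A with T, G with C) means that one strand determines the other. A minimal Python sketch, assuming a strand is given as a plain string read 5' to 3'; the name `reverse_complement` is an illustrative choice, not something from the slides:

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, also read 5' to 3'."""
    # The two strands run in opposite directions, so walk the input backwards.
    return "".join(COMPLEMENT[base] for base in reversed(strand))

# Left-hand bases of the first row of pairs in the figure, read in order.
print(reverse_complement("GATGCGT"))  # ACGCATC
```

Reading the right-hand bases of the same row in reverse order gives the same string, which is a quick sanity check on the pairing rule.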
21
Protein
  • Protein is a sequence or chain of amino-acids
  • 20 possible amino acids
  • The amino-acid sequence folds into a 3-D
    structure called a protein

22
Protein Structure
[Figure: the DNA repair protein MutY (blue) bound to DNA (purple). PNAS cover, courtesy Amie Boal.]
23
From DNA to Amino-acid sequence
[Figure label: Cell]
SOURCE: http://www.biologycorner.com/resources/DNA-RNA.gif
24
From DNA to Protein: In words
  • DNA: nucleotide sequence
  • Alphabet size 4 (A,C,G,T)
  • DNA → mRNA (single stranded)
  • Alphabet size 4 (A,C,G,U)
  • mRNA → amino acid sequence
  • Alphabet size 20
  • Amino acid sequence folds into a 3-dimensional
    molecule called a protein

25
What about RNA ?
  • RNA: ribonucleic acid
  • U instead of T
  • Usually single stranded
  • Has base-pairing capability
  • Can form simple non-linear structures
  • Life may have started with RNA

26
Central Dogma
  • Information flows from DNA to RNA to Protein
  • Once information has passed on to protein, it
    cannot come back to DNA

27
DNA and genes
  • DNA is a very long molecule
  • If kept straight, will cover 5cm (!!) in human
    cell
  • DNA in human has 3 billion base-pairs
  • String of 3 billion characters !
  • DNA harbors genes
  • A gene is a substring of the DNA string
  • A gene codes for a protein

28
Genes code for proteins
  • DNA → mRNA → protein can actually be written as
    Gene → mRNA → protein
  • A gene is typically a few hundred base-pairs (bp)
    long

29
Transcription
  • Process of making a single stranded mRNA using
    double stranded DNA as template
  • Only genes are transcribed, not all DNA
  • Gene has a transcription start site and a
    transcription stop site
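The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DNA is given as its coding (non-template) strand, so the mRNA reads the same with U in place of T, and the start and stop sites are plain string indices. The function name, the indices, and the sequence are made up for the example.

```python
def transcribe(dna: str, start: int, stop: int) -> str:
    """Return the mRNA for the gene occupying dna[start:stop].

    Assumes `dna` is the coding strand, so the transcript has the same
    sequence as the gene with every T replaced by U.
    """
    gene = dna[start:stop]  # only the gene is transcribed, not all DNA
    return gene.replace("T", "U")

dna = "CCGATGGCATTTTAAGG"        # made-up sequence for illustration
print(transcribe(dna, 3, 15))    # AUGGCAUUUUAA
```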

30
Step 1: From DNA to mRNA
Transcription
SOURCE: http://academy.d20.co.edu/kadets/lundberg/DNA_animations/rna.dcr
31
Translation
  • Process of making an amino acid sequence from
    (single stranded) mRNA
  • Each triplet of bases translates into one amino
    acid
  • Each such triplet is called a codon
  • The translation is basically a table lookup
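To make the "table lookup" concrete, here is a minimal Python sketch. It builds the standard genetic code as a dictionary from codon to one-letter amino acid and reads an mRNA three bases at a time. The names `CODON_TABLE` and `translate` are illustrative, and the sketch assumes the reading frame starts at the first base and that translation ends at the first stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG x UCAG x UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # the table lookup
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCAUUUUAA"))  # MAF
```

The table has 64 entries for 20 amino acids plus stop, so several codons map to the same amino acid.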

33
The Genetic Code
SOURCE: http://www.bioscience.org/atlases/genecode/genecode.htm
34
Step 2: mRNA to Amino acid sequence
Translation
SOURCE: http://bioweb.uwlax.edu/GenWeb/Molecular/Theory/Translation/translation.htm
35
Gene structure
SOURCE: http://www.wellcome.ac.uk/en/genome/thegenome/hg02b001.html
36
Gene structure
  • Exons and Introns
  • Introns are spliced out, and are not part of
    mRNA
  • Promoter (upstream) of gene
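Splicing can be pictured as keeping the exon intervals of the transcript and discarding what lies between them. A minimal Python sketch, assuming the exon boundaries are already known and given as 0-based, end-exclusive index pairs; the sequence and coordinates are made up for illustration:

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals in order; everything between them (the introns) is dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)

pre_mrna = "AUGGCAGUAAGUCCUUAGUUUUAA"  # made-up transcript: exon, intron, exon
exons = [(0, 6), (18, 24)]             # hypothetical exon coordinates
print(splice(pre_mrna, exons))         # AUGGCAUUUUAA
```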

37
Gene expression
  • Process of making a protein from a gene as
    template
  • Transcription, then translation
  • Can be regulated

38
Gene Regulation
  • Chromosomal activation/deactivation
  • Transcriptional regulation
  • Splicing regulation
  • mRNA degradation
  • mRNA transport regulation
  • Control of translation initiation
  • Post-translational modification

39
Transcriptional regulation
TRANSCRIPTION FACTOR

40
Transcriptional regulation
TRANSCRIPTION FACTOR
41
The importance of gene regulation
42
Genetic regulatory network controlling the
development of the body plan of the sea urchin
embryo. Davidson et al., Science,
295(5560):1669-1678.
43
  • That was the circuit responsible for
    development of the sea urchin embryo
  • Nodes: genes
  • Switches: gene regulation
  • Change the switches and the circuit changes
  • Gene regulation: significance
  • Development of an organism
  • Functioning of the organism
  • Evolution of organisms
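The "nodes and switches" picture can be illustrated with a toy Boolean network in Python. This is a deliberately simplified sketch and not the sea urchin network: the three genes A, B and C and their rules are invented, each gene is either on or off, and all genes update at the same time.

```python
def step(state, rules):
    """Update every gene at once from the current on/off state."""
    return {gene: rule(state) for gene, rule in rules.items()}

def run(state, rules, steps):
    """Apply `step` repeatedly and return the final state."""
    for _ in range(steps):
        state = step(state, rules)
    return state

# Hypothetical circuit: A is an input, A activates B, B activates C, A represses C.
rules = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"] and not s["A"],
}
# Same genes, one switch changed: A no longer represses C.
rewired = dict(rules, C=lambda s: s["B"])

start = {"A": True, "B": False, "C": False}
print(run(start, rules, 2)["C"])    # False
print(run(start, rewired, 2)["C"])  # True
```

The genes are identical in both runs; only one regulatory rule differs, and gene C ends up in a different state.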

44
Genome
  • The entire sequence of DNA in a cell
  • All cells have the same genome
  • All cells came from repeated duplications
    starting from initial cell (zygote)
  • Human genome is 99.9% identical among individuals
  • Human genome is 3 billion base-pairs (bp) long

45
Genome features
  • Genes
  • Regulatory sequences
  • The above two make up 5% of the human genome
  • What's the rest doing?
  • We don't know for sure
  • Annotating the genome
  • Task of bioinformatics

46
Some genome sizes
Organism                                Genome size (base pairs)
Virus, Phage Φ-X174                     5387 (first sequenced genome)
Virus, Phage λ                          5 × 10^4
Bacterium, Escherichia coli             4 × 10^6
Plant, Fritillaria assyriaca            13 × 10^10 (largest known genome)
Fungus, Saccharomyces cerevisiae        2 × 10^7
Nematode, Caenorhabditis elegans        8 × 10^7
Insect, Drosophila melanogaster         2 × 10^8
Mammal, Homo sapiens                    3 × 10^9
Note: The DNA from a single human cell has a length of 1.8 m.
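A rough check of the length quoted in the note can be done in Python. The sketch assumes a rise of about 0.34 nm per base pair (the usual textbook value for B-form DNA, not a number from the slides) and two copies of the 3 billion bp genome per cell:

```python
RISE_PER_BP_NM = 0.34  # assumed rise per base pair, in nanometres

def dna_length_m(base_pairs: float) -> float:
    """Stretched-out length in metres of a DNA molecule with this many base pairs."""
    return base_pairs * RISE_PER_BP_NM * 1e-9

one_copy = dna_length_m(3e9)        # one copy of the human genome
two_copies = dna_length_m(2 * 3e9)  # both copies in a typical cell
print(f"{one_copy:.2f} m, {two_copies:.2f} m")  # 1.02 m, 2.04 m
```

Under these assumptions the estimate lands at about 2 m, the same order of magnitude as the 1.8 m quoted in the note.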
47
Evolution
  • A model/theory to explain the diversity of life
    forms
  • Some aspects known, some not
  • An active field of research in itself
  • Bioinformatics deals with genomes, which are
    end-products of evolution. Hence bioinformatics
    cannot ignore the study of evolution

48
endless forms most beautiful and most
wonderful - Charles Darwin
49
Evolution
  • All organisms share the genetic code
  • Similar genes across species
  • Probably had a common ancestor
  • Genomes are a wonderful resource to trace back
    the history of life
  • Got to be careful though -- the inferences may
    require clever techniques

50
Evolution
  • Lamarck, Darwin, Weismann, Mendel

51
Some mechanisms of evolution
  • New or different genes
  • Gene duplication: new gene
  • Gene mutation: different gene
  • New or different regulation of genes
  • Switches change, therefore circuit changes, even
    though genes are same
  • A difference of time scales