Title: CS273A Computational Tour of the Human Genome, Gill Bejerano, Fall 2009/10, Stanford
1This Friday, 10am, Beckman B-200: Introduction to text processing lingos.
2Lecture 3
- Genome Content
- Repetitive Sequences
- Genes
3Our Place in the Tree of Life
you are here
Human Molecular Genetics, 3rd Edition
4Metazoans (multi-cellular organisms)
you are here
Human Molecular Genetics, 3rd Edition
5Vertebrates
Stickleback
Lizard
Opossum
you are here
Human Molecular Genetics, 3rd Edition
6INTERSPECIES VARIATION IN GENOME SIZE WITHIN
VARIOUS GROUPS OF ORGANISMS
Figure from Ryan Gregory (2005)
7Meet Your Genome Continues
Human Molecular Genetics, 3rd Edition
9Repeats / Mobile Elements ("selfish DNA")
Human Genome: 3×10^9 letters
1.5% known function
>50% junk
10Adapted from Lunter
13TE composition and assortment vary among eukaryotic genomes
[Figure: DNA transposons, LTR retrotransposons, and non-LTR retrotransposons on a 0-100 scale for Rice, Fugu, Mouse, Human, Mosquito, Nematode, Slime mold, Neurospora, Arabidopsis, Drosophila, Fission yeast, and Budding yeast]
Feschotte & Pritham 2006
20Assembly Challenges
21Inferring Phylogeny Using Repeats
Nishihara et al., 2006
22Functional elements from Mobile Elements
Co-option event, probably due to favorable
genomic context
Yass is a small town in New South Wales,
Australia.
Bejerano et al., Nature 2006
23The amount of TE DNA correlates positively with genome size
[Figure: genomic DNA, TE DNA, and protein-coding DNA in Mb (axis 0-3000) for Rice, Maize, Mosquito, Slime mold, Brassica, Plasmodium, Sea squirt, Neurospora, Arabidopsis, Fugu, Drosophila, Nematode, Zebrafish, Mouse, Fission yeast, Budding yeast, and Human]
Feschotte & Pritham 2006
24The proportion of protein-coding genes decreases
with genome size, while the proportion of TEs
increases with genome size
[Figure legend: TEs; protein-coding genes]
Gregory, Nat Rev Genet 2005
25Genome Size Variability
1 pg = 978 Mb
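The conversion on this slide can be applied directly. The sketch below is only an illustration: the function names are made up here, and the 3000 Mb human figure is the rounded 3×10^9 letters quoted earlier in the lecture, not a measured C-value.

```python
# Conversion factor quoted on the slide: 1 pg of DNA corresponds to 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a DNA mass in picograms to a genome size in megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases: float) -> float:
    """Convert a genome size in megabases to a DNA mass in picograms."""
    return megabases / MB_PER_PG


# Assumption: a human genome of roughly 3000 Mb (3x10^9 letters).
human_mb = 3000.0
print(round(mb_to_pg(human_mb), 2))  # about 3.07 pg
print(pg_to_mb(1.0))                 # 978.0 Mb
```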
26Simple Repeats
- Every possible motif of mono-, di-, tri-, and tetranucleotide repeats is vastly overrepresented in the human genome.
- These are called microsatellites. Longer repeating units are called minisatellites. The really long ones are called satellites.
- Highly polymorphic in the human population.
- Highly heterozygous in a single individual.
- As a result, microsatellites are used in paternity testing, forensics, and the inference of demographic processes.
- There is no clear definition of how many repetitions make a simple repeat, nor of how imperfect the different copies can be.
- Highly variable between genomes: e.g., using the same search criteria, the mouse and rat genomes have 2-3 times more microsatellites than the human genome. They're also longer in mouse and rat.
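Because there is no agreed definition of a simple repeat, any search has to pick its own thresholds. The following is a minimal sketch, not a standard tool: `find_microsatellites`, `max_unit`, and `min_repeats` are names chosen here for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re


def find_microsatellites(seq, max_unit=4, min_repeats=5):
    """Return (start, unit, copies) for perfect tandem repeats.

    max_unit    -- longest repeating unit considered (4 = up to tetranucleotide)
    min_repeats -- arbitrary cutoff for how many copies count as a repeat
    """
    # The lazy quantifier makes the regex try the shortest unit first,
    # so "AAAAAA" is reported as unit "A" rather than unit "AA".
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequence (made up): a (CA)6 run and an (A)7 run with short flanks.
toy = "GGT" + "CA" * 6 + "TTG" + "A" * 7 + "C"
print(find_microsatellites(toy))                 # [(3, 'CA', 6), (18, 'A', 7)]
print(find_microsatellites(toy, min_repeats=7))  # [(18, 'A', 7)]
```

Changing `min_repeats` changes which loci are reported, which is why counts are only comparable between genomes when the same search criteria are used.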
30- Restriction enzymes recognize and cut within specific palindromic sequences in the DNA, known as restriction sites. A site is usually 4 or 6 base pairs long.
[Figure labels: blunt end, sticky end]
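"Palindromic" here means that the site reads the same on both strands, i.e. it equals its own reverse complement. A small sketch of that check follows; the helper names are invented for illustration, and the EcoRI site GAATTC is used as an example that does not appear on the slide.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(site):
    """True if the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


print(is_palindromic_site("GAATTC"))   # True  (EcoRI site, 6 bp)
print(is_palindromic_site("GATTACA"))  # False
```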
31DNA Fingerprint Basics
- A restriction enzyme that cuts at the points shown by the arrows will produce DNA fragments of different sizes.
32DNA fragments are then separated based on size
using gel electrophoresis.
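The two steps above (cut at restriction sites, then sort fragments by size) can be sketched on toy sequences. Everything in this example is an assumption made for illustration: the sequences are invented, the molecule is treated as linear and single-stranded, and the site and cut position are those of EcoRI (G^AATTC).

```python
def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site.

    cut_offset is the position inside the site where the cut falls
    (1 for a cut after the first base, as in G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def gel_order(fragments):
    """Fragment lengths, largest first (small fragments migrate farthest)."""
    return sorted((len(f) for f in fragments), reverse=True)


# Two hypothetical individuals; the second carries a point mutation
# (GAATTC -> GAGTTC) that destroys the second restriction site.
sample_a = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
sample_b = "AAA" + "GAATTC" + "CCCCC" + "GAGTTC" + "TT"

print(gel_order(digest(sample_a, "GAATTC", 1)))  # [11, 7, 4]
print(gel_order(digest(sample_b, "GAATTC", 1)))  # [18, 4]
```

The differing band patterns for the two samples are the kind of signal a DNA fingerprint compares.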
33DNA Fingerprinting can be used in paternity
testing or murder cases.
35- From an evolutionary point of view, transposons and simple repeats are very different.
- Different instances of the same transposon share common ancestry (but not necessarily a direct common progenitor).
- Different instances of the same simple repeat most often do not.
36The Gene-ome makes up < 2% of the H.G.
Human Molecular Genetics, 3rd Edition
37Gene Structure
- Signal: a string of DNA recognized by the cellular machinery
38Gene Processing
Eukaryotic Gene Structure
39Gene Finding: The Practice
Challenge: The genes, the whole genes, and nothing but the genes
Problems:
- spliced ESTs: legitimate gene isoform?
- predicting gene isoforms
- tissue/condition-specific genes / gene isoforms
- single exon genes
- pseudogenes
Practice
40Evolution of Gene Finding Tools
etc
41The Human Gene Set
HGC, 2001
42Celera, 2001
43wrong!
44Signal Transduction
45Ancient Origins of Important Gene Families
46- Multigene families due to
- Single gene duplication
- Segment duplication: tandem duplication or duplicative transposition
  - a b c d e f g
  - a b c d e f b c d g
- Horizontal gene transfer
- Genome-wide doubling event
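The segment duplication example above (a b c d e f g becoming a b c d e f b c d g) can be reproduced with list slicing. This is a toy model with hypothetical function names; genes are just labels and indices are zero-based.

```python
def tandem_duplication(genes, start, end):
    """Copy genes[start:end] and place the copy right after the original."""
    return genes[:end] + genes[start:end] + genes[end:]


def duplicative_transposition(genes, start, end, target):
    """Copy genes[start:end] and insert the copy at index target."""
    return genes[:target] + genes[start:end] + genes[target:]


ancestral = list("abcdefg")

# Segment "b c d" copied and inserted between f and g, as on the slide.
print(" ".join(duplicative_transposition(ancestral, 1, 4, 6)))
# a b c d e f b c d g

# The same segment duplicated in tandem instead.
print(" ".join(tandem_duplication(ancestral, 1, 4)))
# a b c d b c d e f g
```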
47Horizontal Gene Transfer
48Horizontal Gene Transfer in the H.G.
HGC, 2001
49Or is it?
Kurland et al., 2003
50HGT between fish and their parasites
51Retroposed Genes and Pseudogenes