Genome Organization and Genomics - PowerPoint PPT Presentation

About This Presentation
Title:

Genome Organization and Genomics

Description:

Genome Organization and Genomics – PowerPoint PPT presentation

Slides: 38
Provided by: pat8151

Transcript and Presenter's Notes

Title: Genome Organization and Genomics


1
Genome Organization and Genomics
  • Reading
  • Weaver Chapter 24
  • Nature 409 860
  • Science 291 1304

2
(No Transcript)
3
Human Genome Project timeline
4
Increases in sizes of genomes sequenced

Genome sequenced            Year    Genome size  Comment
Bacteriophage φX174         1977    5.38 kb      1st genome sequenced
Bacteriophage λ             1982    48.5 kb
Epstein-Barr virus          1984    172 kb
Yeast chromosome III        1992    315 kb       1st chromosome sequenced
Haemophilus influenzae      1995    1.8 Mb       1st genome of cellular org.
Saccharomyces cerevisiae    1996    12 Mb        1st eukaryotic genome
Caenorhabditis elegans      1998    97 Mb        1st multicellular organism
Drosophila melanogaster     2000    165 Mb
Arabidopsis thaliana        2000    125 Mb       1st plant genome sequenced
Homo sapiens                2001    3000 Mb      1st mammalian genome
Rice (Oryza sativa)         2002    430 Mb       1st crop plant
Pufferfish (Fugu rubripes)  2002    400 Mb       Smallest known vertebrate genome
Mouse (Mus musculus)        2002.3  2700 Mb      Closest model organism to man
5
Genome Sequencing
Genome 3 Gb
6
What does the sequence mean?
TCACAATTTAGACATCTAGTCTTCCACTTAAGCATATTTAGATTGTTTCC
AGTTTTCAGCTTTTATGACTAAATCTTCTAAAATTGTTTTTCCCTAAATG
TATATTTTAATTTGTCTCAGGAGTAGAATTTCTGAGTCATAAAGCGGTCA
TATGTATAAATTTTAGGTGCCTCATAGCTCTTCAAATAGTCATCCCATTT
TATACATCCAGGCAATATATGAGAGTTCTTGGTGCTCCACATCTTAGCTA
GGATTTGATGTCAACCAGTCTCTTTAATTTAGATATTCTAGTACATACAA
AATAATACCTCAGTGTAACCTCTGTTTGTATTTCCCTTGATTAACTGATG
CTGAGCACATCTTCATGTGCTTATTGACCATTAATTAGTCTTATTTGTTA
AATGTCTCAAATATTTTATACAGTTTTACATTGTGTTATTCATTTTTTAA
AAAATTCATTTTAGGTTATATGTATGTGTGTGTCAAAGTGTGTGTACATC
TATTTGATATATGTATGTCTATATATTCTGGATACCATCTCTGTTTCATG
CATTGCATATATATTTGCCTATTTAGTGGTTTATCTTTTCATTTTCTTTT
GGTATCTTTTCATTAGAAATGTTATTTATTTTGAGTAAGTAACATTTAAT
ATATTCTGTAACATTTAATGAATCATTTTATGTTATGTTTAGTATTAAAT
TTCTGAAAACATTCTATGTATTCTACTAGAATTGTCATAATTTTATCTTT
TATATACATTGATATTTTTATGTCAAATATGTAGGTATGTGATATTATGC
ACATGGTTTTAATTCAGTTAATTGTTCTTCCAGATGTTTGTACCATTCCA
ACATCATTTAAATCATTAAATGAAAAGCCTTTCCTTACTAGCTAGCCAGC
TTTGAAAATCCATTCATAGGGTTTGTGTTAATATATTTTTGTTCTTTTTT
TTCCTTTCTACTGATCTCTTTATATTAATACCTACTGTGGCTTTATATGA
AGTCATGGAATAATACGTAGTAAGCCCTCTAACACTGTTCTGTTACTGTT
GTTATTGTTTTCTCAGGGTACTTTGAAATATTCGAGATTTTATTATTTTT
TAGTAGCCTAGATTTCAAGATTGTTTTGACGATCAATTTTTGAATCAATT
GTCAATATTTTTAGTAATAAAATGATGATTTTTGATTGGAAATACATTAA
ATCTATAAGCCAAATTGGAGATTATTGATATATTAACAAAAATGAGTTTT
CCAGTCCATGAATGTATGCACATTATAAAATTCATTCTTAAGTATGTCAT
TTTTTAAGTTTTAGTTTCAGCAGTATATGTTTGTTACATAGGTAAACTCC
TGTCATGGGGGTTAGTTGTACAGGTTATTTTATCATCCAGGCATAAAGCC
CAGTACCCAGTAGTTATCTTTTCTGCTCCTCTCCCTCCTGTCACCCTCCA
CTCTCAAGTAGACCCCAGTTTCTGTTGTTCTCTTCTTTGCATTAATGACT
TCTCATCATTTAGATTGCACTTGTAAGTGAGAACAGGACGTATGTGGTTT
TCTACTCCTGTGTTAGTTTGCTAAGGATAACCACCTCCATCTCCATCCAT
GTTCCCACAAAAGACATGATCTCCTTTTTTATGGCTGCATATTATTCCAT
GGTATATATGTACCACATTTTCTTTATCCAATCTGTCATTGATGGACATT
TAGGTTGTTTCCACATCATTGCCGTTGTAAATACTGCTGCAGTGAATATT
CGTGTGTATGTCTTTATGGTAGAATGATTTATATTCCTCTGGGTATATTT
CCAAGTAATGGGATGGTTGGGTCAAATGGTAATTCTGCTTTTAGCTTTTT
GAGGAATTGCCATATTGCCTTTCACAACGGTTGAACTAATTTATACTCCC
AAGAGTGTATAAGTTGTTCCTTTTTCTCTGCAACCTCGACATCACCTGTT
ATTTATGACTTTTATATAATAGCCATTCTGCTGGTCTGAGATGGTATCTC
ATTATGATTTTGATTTGCATTTCTCTAATGCTCAGTGATATTGAGCTTGG
CTGCATATATGTCTTCTTTTAAAAATATCTGTTCATGTCCTTTGCCTAAT
TTATAACGGGGTTGTTTGTTTTTCTCTTGTAAATTTGTTTAAGTTCCTTA
TAGATTCTAGGTATTAAACCTTTTTTCAGAGGCGTGGCTTGCAAATATTT
TCTCCCATTCTATAGGTTGTCTGTTTATTCTGTTGATAGTTTCCCTTGCT
GTGCAGAAGCTCTTAACTTTAATTAGATCCGACTTGTCAATTTTTGCTTT
GGTCGCAATTGCTTTTGATGTTATTGTCGTGAAATCTTTGCTAGTTCTTA
GGTCCAGGATGATATTGCCCAAGTTGTCTTCCAGGGCTTTTATAATTTTG
GATTTTACATTTAAGTCTTAATATATTTATTAAATTTGTTAGGGTTTCAG
GATACAAGGACAATATAGCAGCAAACAATGTAAAAGTAAAATCTGAAAAA
TAATAGAAAACAGTTTAATTGAACACTTTACCATTATGTAATGCCCTTCT
TTGTCTTTCCTGATCTTTGTTGGTTTGAAGTTCAAAAAAGACAAACTTAA
TGGTACAATAGGTATTGTAGATTTCAGGACTTTCTGTATAAAATATTTTG
TATATATGAATAGATCATTTTTTATTTCCAGTCTTTAAACATTTTCTTAA
CATTTTCTTCTATTGCTTCACTTCACTCGCTAGGACCATCAGGACAGTGT
TGAACAGAAATTGTCAGACTGATCATCACAACTTTTTCTAGATTTTAGAA
GGAAATTTTTCTTTATTTCAACATAAAGCAGCATGTTAATGCCAAGTTTT
AATATGTGTTATCAGATTGAAATTTTTTTGTATATTTCTACATTACCAAG
AATTTTTAGCAAGAGTTTTTGTTGAGTTTTAATTTAAAAATCATTTGTTA
ATTTCATCTGATTTTTTTATTTCTCTTTTTACCTTAAGAGATTAAACTGA
CTACAGATTGAATATAAACAAACAAACAAACAAACAAAAACTCTAAAATG
CTGTGGATCAACACCACTTAGTAATTTGTATACTTGGATTCAATTTGCTG
AAATTTTGTTAGACATTTTTGCGTCGATATTTATGAGGGATGTTGATCTG
TAAAAGTATTAAAATGCCTTTGACAGATTTTGATAGCAGTGTTATTCTGG
CCTAATAAATCAAACTGAGGTATGATCCTTCCTTTTCTATTTCTTAATAG
CATTTTTAAAATTGGTGGTTTTTTCCTTCCTTAGTGAAATTTACCAGCAA
AGTAACAGGCCTTATATTTCTCTTGTGGAAATATTTTAATTTCAAATTAA
TGGTATTTTGTTCTTGTAGGGTGGTAATTTTCTCTGTGTTTGGTCTTAAT
GGACTCTTAGCTGATCACCCAGTTACTCAGCGAGGTCTCTTCACTCTGGA
AGAGCTGGAACTCCAGTGTGTTTTAGTGCAGCATGACCACGGGTATTACC
GTTCAACATTTAGGCTTTATCAGTGATAACTATTTGTCCTCATGGAGTTT
TTGCCGCTGGGCCTACACAGTTTAGGCTTCAGCTTAGAACACATAATGAA
TTCTTATGCAGATTTCTGCCCACCTTTGACCTTTCATGATTTCCTCTTCT
TGGGTAAGCTGCCTTATTAATCTGATACACTTCAGCAGTCCAGAACTACA
CTCTTTCCCTTCTCTGCTCTTGGAGATGACTCTTTTGTCTGAGATTCACT
TTGCTGTGCTGAAAAAGAAAAGTGCTTCAAGGAAGATACCAAGGAAAATC
ACAGGGCTCATTTATGTATTTCTCTTCTTTCAAGGACTACAGCTTTGTGT
TGCCTATGTTCAATTTCTGAAAATAATTAGAGCATATATACTCTGTGTGA
GAAGGCAAATCCAGACAGTTAGTTTGTATGACTAGAAGCAGAAGTCTACA
TGGAGAATTTTACTTAACTGTGTTATAGTTTCTTTAATTATTTCAAGAGT
ATGTTTAATGTTCCACAGATCTCATTCTATAAATCTTTATCATCTTAGAG
CTCTGATACTATTTAGAATTACTATTCCTTCAAATAAGAGATTAGAAACA
GGGTTATATTTGGGGTAGGTTGACTTACTTTTCTGGGAACCAAAGCATAT
TAAATTGACCAGTTTTAACACACTTCTATGTATGCACAAAGATATATATT
TACATTCTGCAAAATCATTCTTTCCTTTTTGAATTTGAAAAGGATCTTTG
GTATACAGATATTCAATAGCCAGCCTGAAGATTCATTTGAATTCATTTAA
TGTTTAGATTCACTACATGAAATGATCCAGAAGAGAGTACTCAAATATAA
GTATCTATAACGATGGAAATATACATCTCCACTGCCCAAGATGGTAGTCA
TGAGTCAATATTGATCATGTGAGACGTGGCAAGTGTTACTCAGGGTCTCA
ATATTTAAATGTATTAAGCTTTAATTAATGTAAATTTGAATTTAGCAAAA
CATGTATAGCTTGTGGTTACTGTTTTATTCAGTGCCAATATAGAACATTT
CCATGATTACAGAAAGTTATCTTAGAATACTCAGTTCTGGACTATTTTAT
CTGGCTAAATTAAATGTTAAAATATTACAAATTCATCTTCAGGCTGGCTG
TTGAATATTTTTATAGCAAAAGTCATTTATAAATTTAAAACTCAAATAAT
TATCTTTTTCAATATGTAAAATATGTCTTTACATATTCTACTCCCTTCTT
ACATACATATTCTGATGTAACATAGGTATTCTCTTATTCATGCACACTGA
AATGACAACATAAATAATTTTACTAAGTGTCACCATATAAAAAACTTTGA
ACAAAATCAGATTATATCACTGTGGATATTTCTATTTTGAACTAACTTAG
ATGATAATTTTAATCTATATCCTAGATGAACTTTAAATCAATAAAATCTC
TCAATGGTGTTATAAATCTCAAGCCATTAGCCACTGATTATCCCATTTTT
ATTCTTTTCATATTAATTTTATTGCCATGTATGAATGCTGTAGCATCCAT
GTTTAAATACTAGTTAACAAAATGCACTGGCATCAGATACAATAAGGATG
AAATGAGATATAATTAGGACTCTGGTAACACACATAAAATTGGAAAGATA
CCCTGAAATTCAAGCCAAGAAGATATTTATCCAGCTTATTTTATTTTGAG
ACAGAGTCTTGCTCTCTCACTCAGGCTGGAGTGCAGTGGACCATTCTAGG
CTCGCTCCAACCTCTGTCTCCCAAATTGAAGTAATTCTCGTGCCTCAATC
TCCCGAGTAGCTGGGATTACAGGCATGTGTCACCAAGCCTGGCTGATTTT
TGTAGTTTTAGTAGAGACGGGGTTTCACCATGATGGCCAGGCTGGTCTTG
AACTCCTGGCCTCAAGTGACTGGAACACCTCGGCCTCCTAAAGTGCTGGG
ATTACAGACGAGAGCCACTGAACAGCTTTGATCCAACTTATTTGGATGAA
TGAGTTACATATTTTACATTAAATCTGTTATTGTGATAATTCTTCATGTT
ATTTTCCATGTATAGATTTATATATAATGTAATTTTAATTTTTTTTCACC
GGAGAGTATAAACAACAATTATTTTATAAACAGGATAATAAAAATAAGAC
AAAAATTGTTGAAATGTCTTCATTTGACTACTAACTTTTTACATGTTTGT
TACTTTGAAGCTGTTATCAATACTTGTGATGTATTACAATTAAGTAAAGA
TTTAAAGATGCCATTTTTAACTTATTATGACACAAAGTCTATAAATTCTT
ATATTTTGAGATTTGTATTTAAATAACTTGTGAAATTTAATTTTAAAATA
AAATTTCTTCTATGGATTGGTCTTCAATCGAGGCATAAAAAGGAATATAA
CAGTGTGGCACTATAACTTCTATATTGAATTTCTATATTATTTAACACAA
TTATAATTTTGCTAATGAATTGTAATGTTTTTAAAAAGCTAGGTGAATTT
TATTAAATTCATTACATGGCGATAACACAGAGAAAACATTTTGGGGATTC
TTTTAAAATGGTATGTACAAAAGCTTAAAAGTTGTTATGTAGTGGCAGAG
ATAAAAAAGTAAAACAAAAAAAAGCTTAAAAGTTTGCTTTACTATTTATA
GGCTCATAAGTGTAAGTGTGCCAGAAAATGAAAAAGAAAGGAGAGAAATT
ATAAATAACTGTGTGGAAAACACAGATAAAGCATAAAGATAGAATATAAA
GATAGAAGCATTTTAATATGAGGCAGTGATGGCTTTTTGAAGAATCCCAA
CTAAGGACCTACTTTTAGTTAATAAATAATATGTTTCTAATCCCTATATT
GTCCACAGCAACCTTTTTAGGACATGGAGCAGTGACTATGAGTGCCAGAA
GGCAAGAGTAGAAGCAATTGTAAAATCATGAACACTAGTTTGTAAAATCC
TCACTGAGATATAATATCTGTTTGCCTCTACCTTAGAATTATTAATGTCT
TGAGGGCTGGGA
A very small piece of chromosome 21
7
Genomics and the human genome
  • Genetics: single-gene level
  • What does this gene do?
  • When is it expressed?
  • What does it interact with?
  • Etc.
  • Genomics: genetics at the genome level
  • What is the structure of the genome?
  • What are all the genes in this organism?
  • When are they expressed?
  • What do they do and how do they interact to
    produce a cell?
  • How are the genes in one organism similar to
    other organisms?
  • Etc.

8
(No Transcript)
9
E. coli K-12 genome: 4289 ORFs, 115 structural RNAs, 4.6 × 10^6 bp
10
H. influenzae
1830 kb, 1743 genes, 40% with no human counterpart
11
Mycoplasma genitalium: 580 kb, 482 genes!
12
Methanococcus jannaschii: 1660 kb, 1738 genes, 60% with no human counterpart
13
Eukaryote genomes sequenced

Organism      Genome size   Estimated number of genes
Arabidopsis   125 Mb        25,500
Yeast         12.1 Mb       5,800
Fruit fly     180 Mb        13,600
Worm          97 Mb         19,000
Human         3200 Mb       20,000-30,000
14
S. cerevisiae: 12,000 kb, 5885 genes, 16 chromosomes
15
H. sapiens: 3.2 Gb, 23 pairs of chromosomes, 20,000-40,000 genes
16
Genomic structure
  • 3 Gb
  • Only 3% protein coding!

17
Genomic annotation
Gene finding
  • Sequence features
  • Hexamer frequency (codon usage)
  • Sequence similarity
  • Known expressed sequences
  • Known genes

Gene function: similarity to known genes in any
organism
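
The hexamer-frequency feature listed above can be illustrated with a short sketch. This is not the method used by any particular gene finder; it only counts overlapping 6-mers in a DNA string, which is the raw statistic such tools build on. The function names and the choice to skip windows containing ambiguous bases are assumptions made for this example; the test sequence is the first line of the chromosome 21 fragment from slide 6.

```python
from collections import Counter


def hexamer_counts(seq, k=6):
    """Count every overlapping k-mer (k=6 gives hexamers) in a DNA string.

    Windows containing anything other than A, C, G or T are skipped
    (an assumption of this sketch, e.g. to ignore N bases).
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def hexamer_frequencies(seq, k=6):
    """Convert k-mer counts into relative frequencies that sum to 1."""
    counts = hexamer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# First 50 bases of the chromosome 21 fragment shown on slide 6.
example = "TCACAATTTAGACATCTAGTCTTCCACTTAAGCATATTTAGATTGTTTCC"

if __name__ == "__main__":
    for kmer, n in hexamer_counts(example).most_common(3):
        print(kmer, n)
```

A real gene finder would compare such frequencies between known coding and non-coding sequence; a 50-base window is far too short for that and is used here only to keep the example small.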
18
Tools
  • Public Databases
  • NCBI: clearinghouse for all data related to
    genomes
  • Genomes, Genes, Proteins, SNPs, ESTs, Taxonomy,
    etc.
  • TIGR: hand-curated database
  • Analysis Software
  • Database query (find similar sequences),
    alignment algorithms, family id (clustering),
    gene prediction, repeat finding, experimental
    design, etc
  • Except for query routines, these are generally
    not accessible to biologists. Instead, results
    are made available via databases and browsers
  • Browsers
  • Genome: Ensembl, MapViewer
  • Comparative genomics: VISTA, UCSC
  • Can query on location or gene name; everyone
    plays together!

19
Queries and Alignments
  • Find matches between genomes
  • Queries find local alignments for a gene or
    other short sequence
  • Global alignments attempt to optimally align
    complete sequences
  • Indels are insertions/deletions that help
    construct alignments

AGGATGAGCCAGATAGGA---ACCGATTACCGGATAGC
AGGATGA-CCAGATAGGAGTGACCGATTACCGGATAGC
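
As a rough illustration of how a global alignment with indels can be computed, here is a minimal sketch of the Needleman-Wunsch dynamic-programming algorithm. The slides do not say which algorithm or scoring scheme produced the alignment above, so the scores used here (match +1, mismatch -1, gap -2) and the function name `global_align` are assumptions; with different scores, or where several alignments tie, the gaps may be placed differently from the slide.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score); '-' marks an indel.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# The two sequences from the slide, with gap characters removed.
seq1 = "AGGATGAGCCAGATAGGAACCGATTACCGGATAGC"
seq2 = "AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC"

if __name__ == "__main__":
    top, bottom, total = global_align(seq1, seq2)
    print(top)
    print(bottom)
    print("score:", total)
```

Database queries, by contrast, look for local alignments and rely on faster heuristic search; this sketch covers only the global case.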
20
(No Transcript)
21
(No Transcript)
22
Table 11. Genome overview.
------------------------------------------------------------------------
Size of the genome (including gaps): 2.91 Gbp
Size of the genome (excluding gaps): 2.66 Gbp
Longest contig: 1.99 Mbp
Longest scaffold: 14.4 Mbp
Percent of AT in the genome: 54
Percent of GC in the genome: 38
Percent of undetermined bases in the genome: 9
Most GC-rich 50 kb: Chr. 2 (66)
Least GC-rich 50 kb: Chr. X (25)
Percent of genome classified as repeats: 35
Number of annotated genes: 26,383
Percent of annotated genes with unknown function: 42
Number of genes (hypothetical and annotated): 39,114
Percent of hypothetical and annotated genes with unknown function: 59
Gene with the most exons: Titin (234 exons)
Average gene size: 27 kbp
Most gene-rich chromosome: Chr. 19 (23 genes/Mb)
Least gene-rich chromosomes: Chr. 13 (5 genes/Mb), Chr. Y (5 genes/Mb)
Total size of gene deserts (>500 kb with no annotated genes): 605 Mbp
Percent of base pairs spanned by genes: 25.5 to 37.8
Percent of base pairs spanned by exons: 1.1 to 1.4
Percent of base pairs spanned by introns: 24.4 to 36.4
Percent of base pairs in intergenic DNA: 74.5 to 63.6
Chromosome with highest proportion of DNA in annotated exons: Chr. 19 (9.33)
Chromosome with lowest proportion of DNA in annotated exons: Chr. Y (0.36)
Longest intergenic region (between annotated + hypothetical genes): Chr. 13 (3,038,416 bp)
Rate of SNP variation: 1/1250 bp
23
Titin 234 introns
24
Properties of Human Proteins
Number: 31,778
Average length: 352 amino acids
Matches to nonhuman proteins: 75
Matches to mouse proteins: 69
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
Genomic Analysis of Gene Expression
How to identify genes responsible for disease?
How to determine global changes in gene expression?
How to intervene?
29
Global Genomic Analysis
  • Microarrays
  • Two-hybrid maps
  • ChIP
  • ChIP-chip
  • Mass spec
  • SAGE
  • Systematic gene knockdown (genomic RNAi)
  • Insertional gene disruption
  • GST libraries
  • Linkage maps
  • Haplotype maps (see Nature 437 1365)
  • SNPs
30
Microarray Analysis
  • Generate a grid (array) containing small regions
    of DNA from known genes
  • Generate fluorescently labeled probes from mRNA
  • Use hybridization to quantify gene expression by
    measuring fluorescent signals
  • Enables dissection of global gene expression
    patterns between cells and tissues, normal vs.
    disease, treated vs. control
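
One common way to turn the fluorescent signals into a comparison between two conditions is a per-gene log2 ratio. The sketch below assumes that approach; the slides do not specify how the signals were analyzed. The gene names and intensities are made-up illustrative values, and the function names, the pseudocount, and the two-fold threshold are assumptions of this example.

```python
import math


def log2_ratios(sample, reference, pseudocount=1.0):
    """Per-gene log2(sample / reference) for genes measured in both channels.

    The pseudocount keeps the ratio defined when an intensity is zero.
    """
    shared = sample.keys() & reference.keys()
    return {
        gene: math.log2((sample[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in shared
    }


def changed_genes(ratios, threshold=1.0):
    """Label genes whose log2 ratio is at least +threshold or at most -threshold.

    threshold=1.0 corresponds to a two-fold change in either direction.
    """
    calls = {}
    for gene, ratio in ratios.items():
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= -threshold:
            calls[gene] = "down"
    return calls


# Hypothetical spot intensities (arbitrary units), for illustration only.
treated = {"geneA": 1599.0, "geneB": 410.0, "geneC": 99.0}
control = {"geneA": 399.0, "geneB": 400.0, "geneC": 799.0}

if __name__ == "__main__":
    ratios = log2_ratios(treated, control)
    for gene in sorted(ratios):
        print(gene, round(ratios[gene], 2))
    print(changed_genes(ratios))
```

Real microarray pipelines also apply background correction and normalization before any such comparison; those steps are left out here.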
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
Molecular Portrait of Breast Cancer
  • Microarrays allowed expression analysis of 8012
    genes between normal and breast cancer cells
  • Allowed identification of genes whose expression
    level changed upon treatment with doxorubicin,
    and classification of tumors based on treatment
    success and expression patterns
  • By profiling gene expression patterns, treatment
    can be personalized to increase chances of
    success
  • New genes identified as potential clinical
    targets