The Human Genome Project: The End Is in Sight

Author: Bruce A. Roe (file created 1/26/2001; 64 slides)
Learn more at: http://www.genome.ou.edu

1
Automation for Genomics Discovery at the Oklahoma
Genome Center
Bruce A. Roe, Department of Chemistry and
Biochemistry, University of Oklahoma, Norman, OK
73019
Working Innovation into the Drug Discovery
Pipeline, June 3, 2004, Houston Marriott Medical
Center
2
Central Dogma of Molecular Biology
Each Chromosome Contains Hundreds of Genes
3
What is a GENOME?
For humans, it is the complete set of 23 chromosome
pairs that we inherit from our parents. The human
genome contains all the information needed to make
a human. Most bacteria have only a single
chromosome, which represents the bacterium's genome
and contains all the information needed to make
that bacterium.
4
Human Genome Project Goals 1998-2003
  • Achieve 5-fold coverage of at least 90% of the
    genome in a working draft based on mapped
    clones and finish one-third of the 3 billion
    base pair human genomic DNA sequence by the end
    of 2000
  • Finish the complete human genome sequence by the
    end of April 2003, marking the 50th anniversary
    of the discovery of the double helix structure of
    DNA by Watson and Crick
  • Make the sequence totally and freely accessible
  • Reduce the cost of DNA sequencing to 25
    cents/base over this 5 year period by developing
    new technologies
  • Study human genome sequence variation by creating
    a Single Nucleotide Polymorphism (SNP) map with
    at least 100,000 markers
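To see roughly what "5-fold coverage" buys, the sketch below uses the idealized Lander-Waterman model, an assumption added here rather than something stated on the slide: if reads land uniformly at random, the expected fraction of bases covered at least once at mean coverage c is 1 - e^(-c), and coverage relates read count N, read length L and genome size G by c = N·L/G. The 500-base read length is a hypothetical value chosen only for illustration.

```python
import math


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases covered by at least one read (idealized Lander-Waterman)."""
    return 1.0 - math.exp(-coverage)


def reads_needed(genome_size: float, coverage: float, read_length: float) -> float:
    """Read count N such that N * read_length / genome_size equals the target coverage."""
    return coverage * genome_size / read_length


GENOME_SIZE = 3.0e9  # ~3 billion bases, as on the slide
READ_LENGTH = 500    # hypothetical average read length (assumption)

for c in (1, 3, 5):
    print(f"{c}-fold: {fraction_covered(c):.4f} of bases covered, "
          f"{reads_needed(GENOME_SIZE, c, READ_LENGTH):.2e} reads")
```

In this idealized model, 5-fold coverage leaves about 0.7% of bases unsequenced (e^-5 ≈ 0.0067); real data typically fall short of the ideal because reads and clones are not uniformly distributed.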

5
How Far Have We Come as of June 2004?

  • Over 99% of the 3.15 billion bases in the human
    genome had been sequenced to finished quality as
    of April 2003. All the data are available in the
    public databases.
  • Ten human chromosomes (7,9,10,13,14,19,20,21,22,Y)
    have been annotated and published and the
    remaining 14 are in the final phases of
    annotation.
  • There are fewer than 400 gaps in the sequence of
    the 24 chromosomes (22 numbered chromosome pairs
    plus X and Y)
  • The cost of finished genomic DNA sequencing is
    slightly less than 8 cents per finished base,
    thanks to improved automation.
  • Three quality-checking exercises were held in
    which two groups checked the quality of another
    group's sequence, both in silico and by
    re-sequencing.

http://www.ncbi.nlm.nih.gov/genome/seq/HsHome.shtml
6
How do we sequence DNA?
The process is similar to taking many copies of a
newspaper, shredding them, and then trying to put
together a copy of the original newspaper. This
is accomplished by breaking many copies of the
DNA into small pieces and determining the order
of the four bases in each of these small pieces.
Then we overlap the small sequenced pieces to
obtain the sequence of the original, larger DNA.
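The shredded-newspaper idea can be shown with a toy greedy assembler: repeatedly merge the pair of fragments with the longest suffix-prefix overlap until no overlaps remain. This is an illustrative sketch under idealized assumptions (error-free reads from one strand, no repeats); it is not how Phrap, used later in this pipeline, actually works. The function names and the `min_overlap` threshold are hypothetical, and the test sequence is simply the first 24 bases of the example sequence shown later in this document.

```python
def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if shorter than min_overlap)."""
    start = 0
    while True:
        # Look for b's first min_overlap bases somewhere in a.
        start = a.find(b[:min_overlap], start)
        if start == -1:
            return 0
        # Accept the hit only if everything from there to the end of a matches b's prefix.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge the best-overlapping pair of pieces until no overlaps remain; return the contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining pieces do not overlap; keep them as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


original = "AGCCACACAGTGTCCACCGGATGG"
reads = [original[12:24], original[0:10], original[6:16]]  # shuffled, overlapping "shreds"
print(greedy_assemble(reads))
```

With these three overlapping shreds the greedy merge recovers the original 24-base sequence as a single contig; with repeats or sequencing errors, a greedy strategy like this can mis-join pieces, which is one reason real assemblers weigh base quality.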
7
(No Transcript)
8
Sequence Pipeline at the University of Oklahoma
Genome Center, OU-ACGT
9
Hydroshear
  • GeneMachines, Inc. San Carlos, CA
  • Precision-drilled ruby orifice
  • 500 µl syringe pump
  • Pump retraction speed range: 0 to 40
  • A 100 to 300 µl sample sheared at a retraction
    speed setting of 10 produces 1 to 4 Kbp DNA
    fragments

10
Genetix QPixII Colony Picker
Digitizes colonies and picks them in batches of 96
into 384-well plates. Pins are sterilized after
each set of 96 colonies is picked.
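Picking in batches of 96 into a 384-well plate implies a mapping from each 96-position batch onto one quadrant of the 384-well grid. A common convention, assumed here because the slide does not say how the QPixII lays out its batches, interleaves four 96-well layouts so that batch (quadrant) q fills wells offset by q // 2 rows and q % 2 columns. The function name and the 0-3 quadrant numbering are hypothetical.

```python
ROWS_96 = "ABCDEFGH"           # 8 rows x 12 columns
ROWS_384 = "ABCDEFGHIJKLMNOP"  # 16 rows x 24 columns


def well_96_to_384(quadrant: int, well: str) -> str:
    """Map a 96-well position (e.g. 'B3') in batch/quadrant 0-3 to a 384-well position."""
    if quadrant not in range(4):
        raise ValueError("quadrant must be 0, 1, 2 or 3")
    row = ROWS_96.index(well[0].upper())  # raises ValueError for rows outside A-H
    col = int(well[1:]) - 1
    if not 0 <= col < 12:
        raise ValueError("96-well columns run from 1 to 12")
    # Interleave: quadrants 0/1 use even rows, 2/3 odd rows; 0/2 even columns, 1/3 odd.
    row384 = 2 * row + quadrant // 2
    col384 = 2 * col + quadrant % 2
    return f"{ROWS_384[row384]}{col384 + 1}"


for q in range(4):
    print(q, well_96_to_384(q, "A1"), well_96_to_384(q, "H12"))
```

Because every (quadrant, well) pair lands on a distinct well, four batches of 96 fill all 384 positions exactly once.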
11
Cell Growth in 384 well plates in a HiGro
  • Capacity: 48 shallow 384-well plates or 24
    deep-well plates.
  • Cells are grown in TB medium supplemented with
    salts and antibiotic.
  • Cells are shaken at 520 rpm for 22 hours at 37°C.
  • After 3.5 hours, oxygen is added at 0.5 ft³/min
    for 0.5 second every 30 seconds.
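The pulsed-oxygen schedule in the HiGro implies a small duty cycle. Assuming, since the slide does not state it, that pulsing continues from the 3.5-hour mark until the end of the 22-hour growth, the arithmetic below gives the number of pulses and the total oxygen volume delivered. The constant and function names are hypothetical.

```python
GROWTH_HOURS = 22.0      # total shaking time (from the slide)
DELAY_HOURS = 3.5        # oxygen pulses begin after this (from the slide)
FLOW_FT3_PER_MIN = 0.5   # flow while the valve is open (from the slide)
PULSE_S = 0.5            # valve-open time per pulse (from the slide)
PERIOD_S = 30.0          # one pulse every 30 seconds (from the slide)


def oxygen_budget(growth_h: float = GROWTH_HOURS, delay_h: float = DELAY_HOURS):
    """Return (pulses, total ft^3, duty cycle), assuming pulsing runs to the end of growth."""
    pulsed_seconds = (growth_h - delay_h) * 3600
    pulses = int(pulsed_seconds // PERIOD_S)
    ft3_per_pulse = FLOW_FT3_PER_MIN * PULSE_S / 60  # pulse length converted to minutes
    return pulses, pulses * ft3_per_pulse, PULSE_S / PERIOD_S


pulses, total_ft3, duty = oxygen_budget()
print(f"{pulses} pulses, {total_ft3:.2f} ft^3 of O2, duty cycle {duty:.1%}")
```

Under that assumption the valve is open only about 1.7% of the time, delivering roughly 9.25 ft³ of oxygen to the incubator over about 2,220 pulses per growth run.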

12
Zymark SciClone with Twister II
13
Subclone Isolation I (Mini-Staccato)
  • This Zymark robot has a 384-cannula array, four
    built-in shakers, three attached storage racks,
    built-in barcoding and a Twister II robotic arm.
  • This automation has allowed us to perform DNA
    isolation completely unattended on as many as
    80 384-well plates of bacterial cells per day.

14
Subclone Isolation I (Mini-Staccato)
The initial lysis solution (NaOH and SDS) is
added to each of four 384 well plates containing
bacterial cells that were loaded onto the
built-in shakers incorporated into the SciClone
workspace deck.
15
Subclone Isolation I (Mini-Staccato)
The second solution, TE-RNase A, is added to each
of the 384 well plates and again shaken on the
four auto-centering magnetic shakers on the
SciClone workspace deck.
16
Subclone Isolation I (Mini-Staccato)
Once all three lysis solutions are added and the
plates are shaken after each addition, the plates
are transferred from the SciClone workspace deck
to a storage rack by the Twister II robotic arm.
17
Fluorescent DNA Sequencing
18
Subclone Isolation and Sequencing Reaction
Pipetting (Velocity 11 VPrep)
  • Liquid handling station with 384-channel pipettor
    head
  • Four movable shelves on either side of the
    pipettor head
  • Used for subclone isolation, sequencing reaction
    set-up and, as shown here, the ethanol-acetate
    precipitation clean-up step.

19
Thermocycling (ABI 9700)
20
Capillary Electrophoresis DNA Sequencing
  • Our present capacity is fourteen 96-capillary
    ABI 3700 DNA sequencing instruments, each capable
    of analyzing two 384-well thermocycle plates or
    eight 96-well thermocycle plates per day.
  • The DNA sequencing data are transferred to the
    Sun computer workgroup for base calling (Phred),
    assembly (Phrap) and analysis (Consed).
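For context on the base-calling step: Phred assigns each called base a quality score Q = -10·log10(P), where P is the estimated probability that the call is wrong, so Q20 means a 1-in-100 chance of error and Q30 a 1-in-1,000 chance. The sketch below converts between the two and counts high-quality bases in a read; the Q20 cutoff and the example scores are illustrative assumptions, not this center's settings.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def count_high_quality(quals: list[int], cutoff: int = 20) -> int:
    """Number of base calls at or above the quality cutoff."""
    return sum(q >= cutoff for q in quals)


example_quals = [8, 15, 22, 30, 35, 40, 40, 38, 25, 12]  # hypothetical per-base scores
print(phred_to_error(20))                 # 0.01
print(round(error_to_phred(0.001)))       # 30
print(count_high_quality(example_quals))  # 7
```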

21
(No Transcript)
22
Primer synthesis (Mermade IV) for PCR-based
closure and finishing
  • Standard phosphoramidite chemistry in an
    argon-filled reaction chamber.
  • 192 primers synthesized at 2.5 nmol scale, twice
    each day.
  • 2.5 nmol synthesis (50 cents/oligo) is typically
    used for either PCR or DNA sequencing primers,
    but can be scaled to 10 nmol.

23
Data assembly and Analysis
Phred/Phrap/Consed
Sun V880 server
Exgap
  • 32 GB RAM, running the Solaris 8 OS, with 3 TB of
    data stored on RAID-5 arrays with autoloader tape
    backup
  • Also 12 workstations, each with 1 GB RAM

24
Sanger, Keio, Wash U, OU
25
Human Chromosome 22 Sequence Features
  • 39% of the sequence is occupied by genes,
    including their introns and 5' and 3'
    untranslated regions.
  • 3% of the complete sequence encodes the protein
    products of these genes.
  • 42% of the sequence is composed of repetitive
    sequences, compared to 46% for the entire
    genome.
  • Only slightly over half of the genes predicted
    for human chromosome 22 can be experimentally
    validated.

Shoemaker DD., et al. Experimental annotation
of the human genome using microarray technology.
Nature. 409, 922-7 (2001).
26
An Individual's Genome Differs from the DNA of:
  • Siblings by 1 to 2 million bases (99.98%
    identical), with coding regions 99.99999%
    identical
  • Unrelated humans by 6 million bases (99.8%
    identical overall), with coding regions 99.9999%
    identical
  • Chimpanzees by about 100 million base pairs (98%
    identical)
  • Baboons by about 300 million base pairs (92%
    identical)
  • Mice by about 2.8 billion bases, but coding
    regions are 90% identical
  • Leaf spinach by about 2.9 billion bases, but
    coding regions are 40% identical

27
Differences between individuals
AGCCACACAGTGTCCACCGGATGGTTGATTTTGAAGCAGAGTTAGCTTGT
CACCTGCCTCCCTTTCCCGGGACAACAGAAGCTGACCTCTTTGNTCTCTT
GCGCAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTC
CACTCAGCCACTGGATTTAAGCAGAGTTCAAGTAAGTACTGGTTTGGGGA
GNAGGGTTGCAGCGGCNGAGCCAGGGTCTCCACCCAGGAAGGACTNATCG
GGCAGGGTGTGGGGAAACAGGGAGGTTGTTCAGATGACCACGGGACACCT
TTGACCCTGGCCGCTGTGGAGTGTTTGTGCTGGTTGATGCCTTCTGGGTG
TGGAATTGTTTTTCCCGGAGTGGCCTCTGCCCTCTCCCCTAGCCTGTCTC
AGATCCTGGGAGCTGGTGAGCTGCCCCCTGCAGGTGGATCGAGTAATTGC
AGGGGTTTGGCAAGGACTTTGACAGACATCCCCAGGGGTGCCCGGGAGTG
TGGGGTCCNAGCCAG
The yellow underlined sequence is the first exon
of the BCR gene involved in leukemia. Only 5
bases (N) differ in non-gene regions.
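Marking differing positions with N makes them easy to locate programmatically. The sketch below joins the sequence lines shown above and reports the 1-based position of each N. Treating N as a "differs between individuals" marker follows this slide's usage; in standard IUPAC notation N simply means "any base". The function name is hypothetical.

```python
SEQUENCE_LINES = [
    "AGCCACACAGTGTCCACCGGATGGTTGATTTTGAAGCAGAGTTAGCTTGT",
    "CACCTGCCTCCCTTTCCCGGGACAACAGAAGCTGACCTCTTTGNTCTCTT",
    "GCGCAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTC",
    "CACTCAGCCACTGGATTTAAGCAGAGTTCAAGTAAGTACTGGTTTGGGGA",
    "GNAGGGTTGCAGCGGCNGAGCCAGGGTCTCCACCCAGGAAGGACTNATCG",
    "GGCAGGGTGTGGGGAAACAGGGAGGTTGTTCAGATGACCACGGGACACCT",
    "TTGACCCTGGCCGCTGTGGAGTGTTTGTGCTGGTTGATGCCTTCTGGGTG",
    "TGGAATTGTTTTTCCCGGAGTGGCCTCTGCCCTCTCCCCTAGCCTGTCTC",
    "AGATCCTGGGAGCTGGTGAGCTGCCCCCTGCAGGTGGATCGAGTAATTGC",
    "AGGGGTTTGGCAAGGACTTTGACAGACATCCCCAGGGGTGCCCGGGAGTG",
    "TGGGGTCCNAGCCAG",
]


def variable_positions(seq: str, marker: str = "N") -> list[int]:
    """1-based positions at which the marker character appears in seq."""
    return [i + 1 for i, base in enumerate(seq.upper()) if base == marker]


sequence = "".join(SEQUENCE_LINES)
positions = variable_positions(sequence)
print(f"{len(positions)} marked positions out of {len(sequence)} bases: {positions}")
```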
28
Human Chromosome 22 Single Nucleotide
Polymorphisms
Number of overlaps: 335
Size of overlaps: 13,203,147 bp
Number of SNPs: 11,116 (about 1 per 1,000 bp)
Number of substitutions: 9,123 (82%)
Number of ins/del: 1,193 (18%)
Only 48 of the 11,116 SNPs were in coding regions,
a 10-fold lower density than in non-coding regions.
E. Dawson, et al. A SNP resource for human
chromosome 22: extracting dense clusters of SNPs
from the genomic sequence. Genome Research 11,
170-178 (2001).
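The SNP summary can be cross-checked with simple arithmetic using only the numbers on this slide: dividing the overlap size by the SNP count gives the average spacing between SNPs, and dividing the substitution and coding-region counts by the SNP total gives their shares.

```python
OVERLAP_BP = 13_203_147  # size of overlaps (from the slide)
TOTAL_SNPS = 11_116      # number of SNPs (from the slide)
SUBSTITUTIONS = 9_123    # number of substitutions (from the slide)
CODING_SNPS = 48         # SNPs in coding regions (from the slide)

bp_per_snp = OVERLAP_BP / TOTAL_SNPS
substitution_share = SUBSTITUTIONS / TOTAL_SNPS
coding_share = CODING_SNPS / TOTAL_SNPS

print(f"about 1 SNP per {bp_per_snp:,.0f} bp")
print(f"substitutions: {substitution_share:.0%} of SNPs")
print(f"coding-region SNPs: {coding_share:.2%} of SNPs")
```

The average spacing works out to about one SNP per 1,190 bp, which the slide rounds to 1/1000 bp, and substitutions make up 82% of the SNPs, matching the slide.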
29
We are each like a different symphony orchestra,
all playing the same instruments slightly
differently.
30
Good news and Bad news
  • Good news: fewer than 40,000 genes (counting dark
    space?)
  • We only know the function of about half the
    predicted genes.
  • Likely more than 1 million different gene products
    based on alternative splicing and
    post-translational modifications.

31
Where we stand now
  • We essentially have the dictionary with all
    the words (genes) spelled correctly, but only
    slightly more than half of the words (genes) have
    definitions.
  • Slightly over half of the 936 genes predicted for
    human chromosome 22 have been experimentally
    validated.
  • 223 have a known function and expression
  • 172 have no known function but evidence for
    expression
  • 182 have no known function and no evidence for
    expression
  • 228 pseudogenes
  • Through comparative genomic sequencing we can
    annotate the human genome based on evolutionarily
    conserved gene sequences and use model systems to
    study gene expression.

32
If a genomic region is conserved in evolutionarily
distant organisms, it is present because the
region is maintained through selective pressure
over evolutionary time, likely because it performs
a necessary function.
33
(No Transcript)
34
Chimpanzee and Baboon Genomic Sequencing
  • Medically important model eukaryotic organisms
  • The chimpanzee is our nearest evolutionary
    relative, with a genome that has 98% sequence
    identity with the human genome
  • The baboon genome has 92% sequence identity
    with the human genome

35
PIP Plot of a region of human chr22 compared to
syntenic regions of baboon and mouse
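A PIP (percent identity plot) shows how similar a reference sequence is to an aligned sequence from another species along its length, so conserved stretches stand out as high-identity runs. PipMaker-style plots are built from gap-free alignment segments; the sketch below substitutes a simpler sliding-window identity over two already-aligned sequences. The sequences, window size and step are hypothetical.

```python
def window_identity(ref: str, other: str, window: int, step: int = 1) -> list[tuple[int, float]]:
    """Percent identity in sliding windows over two pre-aligned, equal-length sequences.

    Returns (1-based window start, percent identity) pairs. Alignment gaps ('-')
    never count as matches. This is a simplified stand-in, not PipMaker's method.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(ref) - window + 1, step):
        pairs = zip(ref[start:start + window], other[start:start + window])
        matches = sum(a == b and a != "-" for a, b in pairs)
        results.append((start + 1, 100.0 * matches / window))
    return results


# Hypothetical aligned fragments: conserved at the left end, diverged toward the right.
human = "ATGGCCAAGTTCGACCTGAA"
other = "ATGGCCAAGTACGTTCAGTT"
for start, pid in window_identity(human, other, window=10, step=5):
    print(f"window starting at {start}: {pid:.0f}% identity")
```

Here identity falls from 100% to 70% to 40% across the three windows; on a real plot, a run of high-identity windows is what flags a candidate conserved element.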
36
34 Kbp deletion in baboon
37
Exons in one copy of a duplicated zebrafish gene
share 75% homology with human but are greatly
diverged (<50% homology) in the other copy
38
A complementary approach is to determine whether
the predicted protein-coding conserved elements are
functional by investigating their expression
profiles during development.
39
Whole mount in situ hybridization using zebrafish
as the model organism
"Small people that swim in the water and breathe
through gills" (Han Wang, OU)
40
Zebrafish as a model system
  • Have a short time to reproductive maturity, about
    3 months.
  • Can be easily bred in the lab in large numbers.
  • Are small: an adult is just a few centimeters
    long.
  • Have a 5-day embryonic development period from
    fertilized egg to swimming fish.
  • The embryos are transparent, making it easy to
    see internal organs during development.
  • Are well established as a resource for genetic
    studies.
  • The Sanger Institute is completing the genome
    sequence, which presently is 50% complete and
    publicly available.
  • More than 90% of the predicted human genes have
    a zebrafish ortholog.

41
Whole mount in situ hybridization
[Figure: mRNA detected by a DIG-labeled ssDNA or RNA probe (digoxigenin-labeled uridine), an alkaline phosphatase-conjugated anti-DIG antibody and BCIP/NBT substrate, with washes between steps]
1. Add a digoxigenin-labeled probe complementary to
the RNA of interest.
2. Add an alkaline phosphatase-conjugated antibody
that binds to digoxigenin.
3. Add BCIP/NBT, which forms an insoluble dark
purple dye when BCIP is dephosphorylated by the
alkaline phosphatase, thereby coloring the cell.
BCIP: 5-bromo-4-chloro-3-indolyl phosphate; NBT:
nitro blue tetrazolium
42
Exon-specific ssDNA primers
MerMade synthesis of unique exon-specific primers
for the gene of interest
These steps have now been automated in a 96-well
format
43
Ethidium bromide-stained 1% agarose gel of dsPCR
products amplified from genomic DNA and of the
subsequently unidirectionally amplified
single-stranded DNA probes
  • These studies clearly demonstrate that, contrary
    to popular belief, single-stranded DNA contains
    regions that fold into enough double-stranded
    secondary structure for ethidium bromide to
    bind.
  • However, agarose gel electrophoresis is labor
    intensive (slab gel preparation and loading),
    electrophoresis is time consuming, and detection
    typically requires the use of carcinogenic
    ethidium bromide.

44
AMS-90 for ssPCR primer, dsPCR and single strand
unidirectional exon amplification
45
PCR and Unidirectional Single Primer
Amplification on the AMS-90
Both double- and single-stranded DNA can be rapidly
resolved, detected and archived on the AMS-90
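Why a single primer gives a single-stranded probe: with only one primer, each cycle copies the same template strand once, and the new strands cannot be primed again by that primer, so single-stranded product grows linearly with cycle number, while two-primer PCR roughly doubles double-stranded product every cycle. In the workflow on these slides, the dsPCR product amplified from genomic DNA serves as the template for the single-primer step. The sketch assumes perfectly efficient cycles and a hypothetical starting template count; real yields are lower.

```python
def linear_single_strand_copies(templates: int, cycles: int) -> int:
    """Single-primer (unidirectional) amplification: one new strand per template per cycle."""
    return templates * cycles


def exponential_double_strand_copies(templates: int, cycles: int) -> int:
    """Two-primer PCR: double-stranded copies double every cycle."""
    return templates * 2 ** cycles


START = 1_000  # hypothetical number of dsPCR template molecules
for n in (10, 20, 30):
    print(n, linear_single_strand_copies(START, n), exponential_double_strand_copies(START, n))
```

After 30 ideal cycles, 1,000 templates give 30,000 single strands by the linear route versus about 1.07 × 10^12 double-stranded copies by two-primer PCR, which is why the single-stranded product is made from an already amplified dsPCR template.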
46
Custom MerMade Synthesized 20-mer DNA Primers
Rapidly Analyzed on the AMS-90
Rapid analysis of single-stranded oligonucleotides:
30 seconds/lane run time vs. over an hour/sample
by capillary electrophoresis
47
AMS-90 vs Ethidium Bromide Stained Agarose Gels
or Capillary Electrophoresis
  • Both can be used to resolve and view both
    double-stranded and single-stranded DNA
  • However, analysis on the AMS-90
      • requires minimal human interaction,
      • needs no separate photography,
      • requires much less technician time,
      • eliminates the use of carcinogenic ethidium
        bromide,
      • is less error prone and
      • takes much less time overall.

48
  • Human hypothetical protein KIAA0819
  • One gene with 11 exons on Hu Chr 22
  • This one gene is split into 2 genes in zebrafish:
      • ZF1 - genomic location 307,280-316,461 bp on
        Sanger Institute chromosome fragment ctg14067,
        with the first 4 exons
      • ZF2 - genomic location 107,344-119,287 bp on
        Sanger Institute chromosome fragment ctg11065,
        with the remaining 7 exons
  • Note: 4 + 7 = 11

49
A multiPIP analysis of the predicted genes from
human, rat, mouse, fugu and zebra fish (ZF1 and
ZF2) with homology to cDNA probe KIAA0819
50
Orthologous duplicated copies in zebrafish of the
single-copy human KIAA0819 gene
51
Whole mount in situ hybridization of ssDNA probes
for the ZF1 gene
Only the antisense probe hybridized, to the otic
placode
52
Expression of ZF1 Gene in the Otic Placode
53
Whole mount in situ hybridization of an ssDNA
probe unique to the ZF2 gene at 24 and 48 hpf
[Figure: antisense and sense probe panels at 24 and 48 hpf; labeled structures: hindbrain, forebrain, otic placode, pectoral fin]
Only the antisense probe hybridized, to the
hindbrain, forebrain, otic placode and pectoral
fin
54
(No Transcript)
55
  • Expression analysis shows functional divergence
    of zf1 and zf2 after duplication
  • ZF1 is expressed only in the otic placode, seen
    from 24 to 120 hpf
  • ZF2 is expressed in the hindbrain, otic placode
    and pectoral fin, with the expression in the
    otic placode differing from that of ZF1
  • It is highly likely that the single human gene
    is expressed in the developing ear and brain and
    is involved in early limb development

56
Whole mount in situ hybridization of an ssDNA
probe for human gene NM_032775/ENSG00000185214
on Hu Chr 22 at positions 19,120,360-19,174,676
(no ESTs confirming expression)
Only the antisense probe hybridized, to the otic
placode and swim bladder
57
Summary of in situ hybridization studies
Gene Antisense probe Sense probe ESTs Dj508I15.c2
2.5 Brain - Phf5a-like gene Brain
- - KIAA0819-ZF1 Otic placode -
KIAA0819-ZF2 Hind brain, Otic placode, -
and pectoral fin NM_032775 Otic placode and
- - swim bladder DGCR8 Hind brain,
Hind brain and Branchial arches, pectoral
fin Heart, and pectoral fin AP000553.6 Notochord,
liver Notochord - Hind brain, and Otic
placode
3 out of 7 predicted genes but with no previous
evidence for expression
58
Conclusions
  • It is now clear that there are large conserved
    sequence regions in evolutionarily distant
    organisms ranging from humans to fish. If these
    regions are conserved, the function of the
    encoded genes is also likely conserved.
  • The zebrafish is an ideal system in which to
    investigate gene expression profiles for genes
    that are human orthologs.
  • All aspects of this work have been and will
    continue to be improved by automation.

59
What's next for our Genome Center?
  • Participate in sequencing the mouse, chimp,
    baboon, lemur, bovine, dog, cat, chicken and
    zebrafish genomes, concentrating on:
      • regions of high biological interest and
      • regions orthologous to human chromosome 22
  • Sequence the Medicago truncatula genome (a model
    legume closely related to alfalfa) using a mapped
    BAC-based approach concentrating on coding regions
  • Continue sequencing selected pathogenic bacteria
  • Investigate the function of predicted genes of
    unknown function in the zebrafish system, first by
    whole mount in situ hybridization and then by
    expression knockdown experiments with morpholino
    oligos, once robust, automated methods have been
    developed.

60
Laboratory Organization
Bruce Roe, PI
Research Teams
Limei Yang Angie Prescott Audra Wendt Mandi
Aycock
Doris Kupfer Julia Kim Sun So Graham Wiley
Ziyun Yao Steve Shaull Youngju Yoon
Jami Milam Sara Downard Ging Sobhraksha
ShaoPing Lin Honggui Jia Hongming Wu Baifang
Qin Peng Zhang
Fares Najar Chunmei Qu Keqin Wang
Shuling Li
Stephan Deschamps Shelly Oommen Christopher
Lau
Trang Do Anh Do Lily Fu Yang Ye Tessa Manning
Fu Ying Liping Zhou Ruihua Shi Junjie Wu
Pheobe Loh Sulan Qi Bart Ford
Lin Song Ying Ni Huarong Jiang
Axin Hua Weihong Xu Yanhong Li
Previous undergraduate res. student
Present undergraduate res. student Previous
graduate student Present graduate student
Funding from the NHGRI, Noble Foundation, DOE and
NSF. Collaborators at Sanger, CWRU, CHOP, Keio,
UIUC and Riken
61
The ACGT Team
62
Peggy and Charles Stephenson Center
63
(No Transcript)