Human Genome Project - PowerPoint PPT Presentation

About This Presentation
Title:

Human Genome Project

Slides: 13
Provided by: t80

Transcript and Presenter's Notes



1
Human Genome Project
2
Basic Strategy
  • Goal: determine the sequence of the roughly 3
    billion base pairs of the human genome. The
    project formally started in 1990.
  • Various side projects: genetic diseases,
    variation between individuals, ethnic variation,
    comparison to other species.
  • Strategy:
  • 1. Physical map relating specific DNA markers to
    the proper chromosomal position.
  • 2. Overlapping set of cloned DNAs (contigs).
  • 3. Sequencing and assembly.
  • 4. Finding the genes in the sequence.
  • 5. Annotation of gene function.

3
Physical Maps
  • A genetic map uses recombination, crossing over
    during meiosis, to determine how frequently two
    genes (or markers) are inherited together.
  • A physical map determines where a given DNA
    marker is located on the DNA of the chromosome.
  • Genetic and physical maps are (supposed to be)
    colinear: all the genes appear in the same order
    in both maps. But the distances are quite
    different: there is very little recombination
    near the centromeres, so large physical distances
    there correspond to very short recombination
    distances.
  • Genetic maps using microsatellite (SSR) markers
    were used to develop physical maps: the
    appropriate SSR sites were expected to be found
    on the corresponding cloned DNA.
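Colinearity (same order, different distances) can be illustrated with a minimal Python sketch. The marker names and positions are hypothetical toy values, not real map data; both maps are assumed to run in the same orientation, and the helper names (map_order, is_colinear, cm_per_mb) are illustrative only.

```python
from typing import Dict, List, Tuple

def map_order(positions: Dict[str, float]) -> List[str]:
    """Marker names sorted by position along one map."""
    return sorted(positions, key=lambda marker: positions[marker])

def is_colinear(genetic_cm: Dict[str, float], physical_mb: Dict[str, float]) -> bool:
    """True if both maps put their shared markers in the same order."""
    shared = genetic_cm.keys() & physical_mb.keys()
    genetic = [m for m in map_order(genetic_cm) if m in shared]
    physical = [m for m in map_order(physical_mb) if m in shared]
    return genetic == physical

def cm_per_mb(genetic_cm: Dict[str, float],
              physical_mb: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Recombination rate (cM per Mb) between adjacent markers on the physical
    map; assumes every marker appears on both maps."""
    order = map_order(physical_mb)
    return {
        (left, right): (genetic_cm[right] - genetic_cm[left])
        / (physical_mb[right] - physical_mb[left])
        for left, right in zip(order, order[1:])
    }

# Hypothetical markers: the M2-M3 interval spans 30 Mb but only 1 cM,
# the kind of pattern expected across a centromere.
genetic = {"M1": 0.0, "M2": 10.0, "M3": 11.0, "M4": 21.0}
physical = {"M1": 0.0, "M2": 10.0, "M3": 40.0, "M4": 50.0}
print(is_colinear(genetic, physical))  # True
print(cm_per_mb(genetic, physical))
```

The order check is what "colinear" means; the per-interval rate shows why map distances still disagree.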

4
Sequence Tagged Sites
  • A sequence tagged site (STS) is a short sequence
    that is unique in the genome.
  • You obtain the sequence information from cloned
    DNA, and then locate it in the genome.
  • Using PCR, it is then possible to determine
    whether your STS is present in any other clone or
    cell line.
  • Obtaining an STS: sequence the ends of large
    cloned DNAs (BACs or YACs, for example).
  • Uniqueness: use the cloned DNA from the STS as a
    probe on a Southern blot of genomic DNA; if the
    STS is unique, only 1 band will hybridize.
  • Repetitive DNA is very common in the human
    genome, and many DNA sequences are not unique.
  • A good source of unique DNA is EST clones: cDNA
    made from messenger RNA.
  • Size: a DNA sequencing run will usually give
    500-600 bp of good, reliable sequence
    information. On the other hand, consider the
    size of the genome: $3 \times 10^9$ bp. Each base is
    one of 4 choices, so a given 16 bp sequence will
    appear about once in $4^{16} \approx 4.3 \times 10^9$ bp.
    In practice, 20 bp is about the minimum primer
    length for specific PCR amplification, and 24 bp
    is about the minimum that will give a good BLAST
    hit.
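The 16 bp figure follows from simple arithmetic, sketched below in Python. This assumes a random genome with equal base frequencies and counts one strand only; real genomes are repetitive and compositionally biased, so the random model is optimistic, which is why the practical lengths quoted above are longer. Function names are illustrative.

```python
def expected_occurrences(k: int, genome_size: float) -> float:
    """Expected chance occurrences of one specific k-bp sequence in a random
    genome with equal base frequencies (4**k possible k-mers), one strand."""
    return genome_size / 4 ** k

def min_unique_length(genome_size: float) -> int:
    """Smallest k at which a specific k-mer is expected to occur less than once."""
    k = 1
    while 4 ** k <= genome_size:
        k += 1
    return k

human = 3e9
print(min_unique_length(human))  # 16
for k in (16, 20, 24):
    print(k, expected_occurrences(k, human))
```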

5
Somatic Cell Hybrids
  • Human and mouse (or hamster) cultured cells can
    be fused together using polyethylene glycol.
  • The resulting fused cell is a heterokaryon: it
    has 2 nuclei from different species.
  • If the heterokaryon undergoes mitosis, the nuclei
    fuse.
  • Human chromosomes are unstable in a mixed
    nucleus, and most of them are randomly lost. The
    mouse chromosomes are all retained.
  • Different cell lines can be established that
    contain different combinations of human
    chromosomes.
  • You can identify which human chromosomes remain
    using chromosome banding techniques.
  • Somatic cell hybrids are a good way to determine
    which chromosome a DNA sequence is on, and
    sometimes to assign gene products or phenotypes
    to chromosomes.
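Assigning a sequence to a chromosome with a hybrid panel is a concordance test: the chromosome whose presence or absence across the cell lines matches the PCR result for the sequence is the one that carries it. The Python sketch below assumes the PCR primers amplify only human DNA (not mouse), requires perfect concordance for simplicity, and uses a made-up four-line panel; real panels have more lines and tolerate a little discordance from partial retention or scoring errors.

```python
from typing import Dict, List, Set

def assign_chromosome(panel: Dict[str, Set[str]],
                      marker_pcr: Dict[str, bool]) -> List[str]:
    """Return the human chromosomes whose presence/absence across the hybrid
    lines exactly matches the marker's PCR result in every tested line."""
    candidates = set().union(*panel.values())
    return sorted(
        chrom for chrom in candidates
        if all((chrom in panel[line]) == present
               for line, present in marker_pcr.items())
    )

# Hypothetical panel: human chromosomes retained in each hybrid line.
panel = {
    "line1": {"1", "5", "X"},
    "line2": {"5", "7"},
    "line3": {"1", "7", "X"},
    "line4": {"2", "X"},
}
# Hypothetical PCR results for one STS (True = product amplified).
pcr = {"line1": True, "line2": True, "line3": False, "line4": False}
print(assign_chromosome(panel, pcr))  # ['5']
```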

6
Radiation Hybrids
  • Standard somatic cell fusions contain entire
    human chromosomes. To locate a gene more
    closely, you need to use chromosome fragments.
  • Start by irradiating human cells with a
    controlled dose of X-rays, which breaks the
    chromosomes into fragments. Then fuse the
    irradiated cells to mouse cells. The human
    chromosome fragments get integrated into the
    mouse chromosomes.
  • Create a panel of mouse/human hybrid cell lines.
  • The current standard panels contain about 100
    cell lines.
  • Each line contains about 32% of the human genome.
  • Average size of a human genome fragment: 25 kbp.
  • More radiation produces smaller fragments.
  • Mapping: the hybrid cell lines contain random
    human chromosome fragments, but closely linked
    sites are usually in the same cell line (the same
    basic principle as recombination mapping).
  • Until you have located some of the markers on the
    chromosomes, radiation hybrid mapping only gives
    you information about whether any two sequences
    are close together on the chromosome.
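The "closely linked sites are usually in the same cell line" idea can be made quantitative. The Python sketch below uses a simplified two-point model, assumed here for illustration: every fragment is retained independently with the same probability r, and θ is the probability that radiation breaks the DNA between two markers. Lines where one marker is present and the other absent then occur with probability 2r(1 − r)θ, which can be solved for θ. The retention scores are hypothetical; real radiation hybrid analyses use likelihood-based multipoint methods.

```python
from typing import Sequence

def estimate_breakage(a: Sequence[int], b: Sequence[int]) -> float:
    """Two-point estimate of the breakage probability theta between markers A and B.

    a, b: 0/1 retention scores for the two markers in the same hybrid lines.
    Simplified model: discordant lines (A+B- or A-B+) occur with probability
    2 * r * (1 - r) * theta, with one pooled retention frequency r.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two equal-length, non-empty retention vectors")
    r = (sum(a) + sum(b)) / (2 * n)                     # pooled retention frequency
    if not 0 < r < 1:
        raise ValueError("retention frequency must be strictly between 0 and 1")
    discordant = sum(x != y for x, y in zip(a, b)) / n  # observed discordance
    return min(discordant / (2 * r * (1 - r)), 1.0)     # theta cannot exceed 1

# Hypothetical scores across 8 hybrid lines (1 = marker retained).
a = [1, 0, 1, 0, 0, 1, 0, 0]
b = [1, 0, 0, 0, 0, 1, 0, 1]
print(estimate_breakage(a, a))  # 0.0 (never separated)
print(estimate_breakage(a, b))
```

A θ near 0 means the two markers are almost always co-retained and therefore close; a θ near 1 means they behave as if unlinked.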

7
Contigs
  • A contig is a set of partially overlapping
    clones: a contiguous set of clones with no gaps
    between them.
  • Contigs allow you to build up the sequence of the
    chromosome over much larger regions than any
    single clone.
  • The first reasonably complete physical map of the
    human genome involved contigs generated by YACs
    (yeast artificial chromosomes).
  • Initially, you have a collection of clones with
    no information about how they are ordered on the
    chromosome.
  • Contigs are built up by using PCR to identify
    unique sequences (STS or EST) on each clone, and
    then looking for overlaps between the clones.
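Building contigs from STS content can be sketched as a graph problem: treat two clones as overlapping when PCR finds at least one shared STS, and each contig is a connected group of clones. The Python sketch below only groups clones; it does not order them within a contig. The clone and STS names are hypothetical, and it assumes every STS is truly unique, since a repeated "STS" would falsely join unrelated clones.

```python
from typing import Dict, List, Set

def build_contigs(sts_content: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group clones into contigs. Two clones are linked when they share at
    least one STS; a contig is a connected group of linked clones."""
    clones_with: Dict[str, Set[str]] = {}   # STS -> clones containing it
    for clone, markers in sts_content.items():
        for sts in markers:
            clones_with.setdefault(sts, set()).add(clone)

    seen: Set[str] = set()
    contigs: List[Set[str]] = []
    for start in sts_content:
        if start in seen:
            continue
        seen.add(start)
        component: Set[str] = set()
        stack = [start]                     # depth-first walk through shared STSs
        while stack:
            clone = stack.pop()
            component.add(clone)
            for sts in sts_content[clone]:
                for other in clones_with[sts]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        contigs.append(component)
    return contigs

# Hypothetical STS content of four clones, scored by PCR.
content = {
    "A": {"s1", "s2"},
    "B": {"s2", "s3"},
    "C": {"s5"},
    "D": {"s3", "s4"},
}
print([sorted(c) for c in build_contigs(content)])  # [['A', 'B', 'D'], ['C']]
```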

8
Sequencing Strategy
  • Once a contig map of the genome was obtained, it
    was necessary to sequence each individual clone.
  • Most of the actual human genome sequencing was
    done on BAC clones, which are less prone to
    rearrangement than YAC clones. BACs are about
    100-200 kbp long.
  • Large clones are generally sequenced by shotgun
    sequencing: the large cloned DNA is randomly
    broken up into a series of small fragments (less
    than 1 kb). These fragments are cloned and
    sequenced. A computer program then assembles
    them based on overlaps between the sequences of
    the individual clones.
  • To ensure that every bit has been covered, you
    need to sequence random clones until you have
    covered each spot 5-10 times on average.
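Why 5-10-fold coverage is needed can be estimated with the Lander-Waterman (Poisson) approximation: if reads start at random positions, the expected fraction of bases hit by no read at average coverage c is e^(-c). The Python sketch below assumes error-free reads placed uniformly at random; the 150 kbp clone and 500 bp read lengths are illustrative values taken from the ranges on these slides, not data from any real project.

```python
import math

def reads_needed(target_len: int, read_len: int, coverage: float) -> int:
    """Reads required so that total sequenced bases / target length = coverage."""
    return math.ceil(coverage * target_len / read_len)

def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of the target hit by no read, assuming read start
    positions are random (Poisson model): e**(-coverage)."""
    return math.exp(-coverage)

bac_len, read_len = 150_000, 500   # illustrative BAC size and read length
for c in (5, 8, 10):
    missed = fraction_uncovered(c) * bac_len
    print(f"{c}x: {reads_needed(bac_len, read_len, c)} reads, ~{missed:.0f} bases uncovered")
```

At 5x roughly 1,000 bases of a 150 kbp clone are still expected to be missing; at 10x fewer than 10 are, which is why coverage in this range was the usual target.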

9
Whole Genome Shotgun Sequencing
  • Why bother creating a large-scale physical map,
    with all that YAC and BAC cloning, radiation
    hybrids, STS comparisons, etc.? Why not just
    fragment the whole genome into 1 kb pieces,
    sequence them all, and let the computer assemble
    the whole genome?
  • In practice, the genome is cloned into large
    fragments first, and then each large fragment is
    broken up for shotgun sequencing. But the large
    fragments are not ordered: no physical map or set
    of contigs is created.
  • Requires a lot of overlapping coverage.
  • Also requires good software.
  • Very successful for prokaryotic genomes (10 Mbp
    or less).
  • But the human genome is 300 times larger.
  • Big problem: repetitive DNA, which is
    everywhere, and especially near the centromeres.
    To find overlaps between clones, you need unique
    regions.
  • It was long debated whether whole genome shotgun
    sequencing would work for large eukaryotic genomes
    when no other information is available to provide
    order. The public Human Genome Project relied on
    mapped BAC clones instead.
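"Let the computer assemble" can be illustrated with a greedy overlap assembler: repeatedly merge the two sequences with the longest suffix-prefix overlap until no overlap of at least a minimum length remains. This Python sketch is a simple stand-in, not how production assemblers work; it ignores sequencing errors, reverse complements, reads contained in other reads, and paired reads, and the reads are toy strings. It also shows why repeats are a problem: any repeat longer than the overlap being tested can make two unrelated reads look like neighbours, and the greedy choice cannot recover from that.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, if at
    least min_len long; otherwise 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start           # first hit gives the longest overlap
        start += 1

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the pair with the longest overlap until none reaches min_len.
    Returns the resulting contigs (one string if everything joined)."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break                           # no remaining overlaps: gaps stay open
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```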

10
Gene Detection
  • The best evidence that a given DNA sequence is
    expressed is to find an EST (cDNA copy of mRNA)
    that matches it. Large numbers of EST libraries
    have been constructed and sequenced.
  • A major result of this was the discovery that
    many genes have several different splicing
    patterns: some sequences are exons in some
    tissues but introns in others.
  • Homology searches, using BLAST, are a good way to
    find genes. If a DNA sequence closely matches a
    sequence from another organism, it has been
    evolutionarily conserved, and that usually means
    that it is an expressed gene.
  • Exon prediction: exons need to be open reading
    frames (no stop codons), and they display
    patterns of nucleotide usage different from
    random DNA. Several different programs exist,
    and they give somewhat varying results.
  • Hypothetical genes are genes whose existence
    has been predicted by computer but which lack
    any experimental or cross-species data to confirm
    it.
  • A conserved hypothetical gene is a sequence
    that matches other species even though there is
    no EST or other experimental evidence for its
    expression.
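The open-reading-frame requirement for exons can be checked mechanically. The Python sketch below scans the three forward reading frames for maximal stretches of codons with no stop codon (TAA, TAG, TGA). It covers only that one constraint: it does not scan the reverse strand, and it ignores the nucleotide-usage statistics and splice signals that real exon-prediction programs also use. The function names and the test sequence are illustrative.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def open_stretches(seq: str, frame: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) coordinates of maximal stop-free codon runs in one
    forward frame (0, 1 or 2); end is exclusive."""
    seq = seq.upper()
    run_start = frame
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            if pos > run_start:
                yield (run_start, pos)
            run_start = pos + 3             # next run begins after the stop codon
    last_end = frame + 3 * ((len(seq) - frame) // 3)  # end of the last whole codon
    if last_end > run_start:
        yield (run_start, last_end)

def longest_open_stretch(seq: str) -> Tuple[int, int, int]:
    """(length, start, end) of the longest stop-free stretch in any forward frame."""
    best = (0, 0, 0)
    for frame in range(3):
        for start, end in open_stretches(seq, frame):
            if end - start > best[0]:
                best = (end - start, start, end)
    return best

print(list(open_stretches("ATGAAATAGCCC", 0)))  # [(0, 6), (9, 12)]
print(longest_open_stretch("ATGAAATAGCCC"))     # (9, 2, 11)
```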

11
Gene Annotation
  • Computer predictions of gene function are
    mediocre at best. Humans, especially those who
    are experts in the field, do a much better job of
    evaluating evidence and deciding what a given
    gene's function is.
  • A big problem is that there is too much
    information, and it is not uniformly coded or
    maintained. The scientific literature contains
    numerous examples of the same gene or protein
    with several different names, and getting common
    definitions of functions is even harder.
  • To counter this, the Gene Ontology Consortium
    (GO) has created a controlled vocabulary of about
    11,000 terms.
  • Every gene product (protein) can be annotated
    in three general categories:
  • molecular function: what the protein actually
    does, such as kinase activity
  • biological process: what cellular process the
    protein participates in, such as signal
    transduction
  • cellular component: where the protein is found in
    the cell, such as integral to the plasma
    membrane
  • Each gene product can have multiple descriptive
    terms.
  • The terms are hierarchical: more specific terms
    are contained within less specific terms.
  • However, a given term can have more than one
    parent and more than one child term, so the
    hierarchy is a directed acyclic graph rather than
    a simple tree.
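Because a term can have several parents, an annotation to a specific term also implies every less specific term above it along all paths (GO documentation calls this the true path rule). The Python sketch below propagates annotations up a small parent map. The terms ("root", "A", "B", "C", "D") and gene names are hypothetical placeholders, not real GO terms or identifiers.

```python
from typing import Dict, List, Set

def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following parent links upward from term."""
    found: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in found:                  # a term may be reached by several paths
            found.add(t)
            stack.extend(parents.get(t, []))
    return found

def propagate(annotations: Dict[str, Set[str]],
              parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Expand each gene product's direct terms with all of their ancestors."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in annotations.items()}

# Hypothetical toy ontology: C has two parents, so this is a DAG, not a tree.
parents = {"C": ["A", "B"], "A": ["root"], "B": ["root"], "D": ["B"]}
annotations = {"geneX": {"C"}, "geneY": {"D"}}
print(sorted(ancestors("C", parents)))  # ['A', 'B', 'root']
print(propagate(annotations, parents))
```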

12
GO Example