Genome Sequence Acquisition and Analysis - PowerPoint PPT Presentation

Genome Sequence Acquisition and Analysis

Slides: 43
Provided by: bidlabLif
Transcript and Presenter's Notes

1
Genome Sequence Acquisition and Analysis
2
Outline
  • Defining Genomes
  • What Have We Learned from the Human Genome Draft
    Sequences?

3
Goals for this Chapter 1
  • 1.1 Defining Genomes
  • Define the field of genomics
  • Learn how genomes are sequenced
  • Understand the utility of short DNA segments
  • Utilize online tools to analyze genome sequences

4
What Is Genomics?
  • Genome
  • The total DNA content of a haploid cell or half
    the DNA content of a diploid cell.
  • Genomics
  • Genomics involves large data sets (about 3
    billion base pairs for the human genome) and
    high-throughput methods (fast methods for
    collecting the data)
  • Sequencing DNA and cataloging genome variation
    within a population, as well as the
    transcriptional control of genes.
  • From DNA sequence analysis to an organism's
    response to environmental perturbations.

5
Proteomics
  • Proteome
  • The complete protein content of a cell/organism
    at a given moment.
  • Proteomics (-omic)
  • Other terms
  • Transcriptome
  • Metabolome

6
How Are Whole Genomes Sequenced?
  • Sanger method (dideoxy method)
  • Dr. Fred Sanger
  • Polymerase chain reaction (PCR)

7
Figure 1.1
8
Figure 1.2
9
What Is An E-value?
  • A BLASTn search returns hits, sequences that
    produce significant alignments to the query
    sequence.
  • The significance of a hit is measured by its
    E-value, or expect value.
  • Biologically significant hits tend to have
    E-values much less than 1.0.
  • The larger the E-value, the greater the chance
    that the similarity between the hit and the query
    is due to mere coincidence.
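
The relationship between score and E-value can be made concrete with a short calculation. The sketch below assumes the standard Karlin-Altschul approximation E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The function name and the example lengths are illustrative only; BLAST itself uses effective (edge-corrected) lengths, so reported values will differ somewhat.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate number of hits expected by chance with at least this bit score.

    Uses E = m * n * 2**(-S'), ignoring BLAST's effective-length correction.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical search: a 500-base query against a 3-billion-base database.
    for score in (20, 40, 60, 80):
        print(score, expect_value(score, 500, 3_000_000_000))
```

Each additional bit of score halves the E-value, which is why strong hits have E-values many orders of magnitude below 1.0.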

12
Why Do the Databases Contain So Many Partial
Sequences?
  • Human Genome Project (HGP)
  • By comparing different genomes, we should be able
    to better understand genomes in general, and the
    human genome in particular.
  • Sequence-tagged Sites (STSs)
  • The short segments of unique DNA sequence along
    every chromosome.
  • Bacterial Artificial Chromosomes (BACs)
  • About 150 kb
  • Yeast Artificial Chromosomes (YACs)
  • 150 kb to 1.5 Mb

13
Figure 1.3
14
Expressed Sequence Tags (ESTs)
  • The short segments of cDNA
  • Used to identify genes
  • ESTs hint at the size of the genome and at
    alternative ways to splice mRNA.
  • ESTs are helpful for labs interested in cloning
    particular genes.
  • Related databases
  • dbEST,
  • UniGene,
  • HomoloGene.

15
Institutes
  • National Institutes of Health (NIH)
  • The Institute for Genomic Research (TIGR)

16
How Do We Make Sense of All These Bases?
  • Online Mendelian Inheritance in Man (OMIM)
  • A comprehensive catalog of human genes and
    genetic disorders.
  • GeneCards: a database of annotated genes in the
    human genome.
  • ORFs and Translation
  • Open reading frames (ORFs, pronounced "orfs")
  • Recorded as accession number.
  • Coding Sequence (CDS)
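
As an illustration of how ORFs relate to translation, here is a minimal sketch of an ORF finder. It assumes the standard genetic code, scans only the three forward reading frames, and treats an ORF as ATG through the first in-frame stop codon; the function name `find_orfs` and the `min_codons` parameter are hypothetical, and real gene finders also consider the reverse strand, introns, and codon usage.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end, protein) for each forward-strand ORF.

    start/end are 0-based, end-exclusive, and include the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    protein = "".join(
                        CODON_TABLE.get(dna[j:j + 3], "X")  # X = ambiguous codon
                        for j in range(start, i, 3)
                    )
                    orfs.append((frame, start, i + 3, protein))
                start = None  # stop codon closes the ORF
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGGCTTGATT"))
```

The translated portion between the start and stop codons corresponds to what a database record would annotate as the coding sequence (CDS).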

17
Can We Predict Protein Functions?
  • Kyte-Doolittle plot (hydropathy plot)
  • To predict whether a protein is an integral
    membrane protein or not.
  • The 3D shape of a protein is probably its most
    important characteristic.
  • Conserved Domain (CD)
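
A hydropathy plot is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the function name and the example sequence are made up for illustration, and the window of 19 residues with a cutoff near 1.6 is a commonly cited rule of thumb for membrane-spanning segments rather than a definitive test.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the protein."""
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
    seq = "KKDDEE" + "LIVLAVLLIAVLGLLVAIL" + "RRKDEE"
    profile = hydropathy_profile(seq)
    peak = max(profile)
    print(f"peak window average: {peak:.2f}")
    print("possible membrane-spanning segment" if peak > 1.6 else "no strong peak")
```

Plotting the returned list against window position gives the familiar hydropathy plot.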

18
3D Structures
  • Protein Data Bank (PDB)
  • Entrez Structure

19
Structure-Function Relationships
  • Gene Ontology (GO)
  • Biological process is the "why": the overall
    objective toward which this protein contributes.
  • Molecular function is the "what": the biochemical
    activity the protein accomplishes.
  • Cellular component is the "where": the location of
    protein activity.
  • Gene Ontology's unification of protein roles will
    help us communicate more effectively as we
    determine how genomes produce multifunctional
    cells.

20
How Well Are Genes Conserved in Diverse Species?
  • Clusters of Orthologous Groups (COGs)
  • Enzyme Commission (EC) numbers
  • Swiss-Prot
  • Phylogenetic trees
  • Paralogs
  • Genes that arose from a common ancestral gene
    within one species.
  • Orthologs
  • The same gene in two organisms, evolved from a
    common ancestral gene in another species.
  • Synteny
  • Genetic loci located on the same chromosome
    within a species, even if they are separated by
    a great distance.
  • Homology
  • Two sequences are described as homologous if
    their sequences are similar because of a common
    evolutionary origin.

21
How Do You Know Which Bases Form a Gene?
  • Intergenic sequence
  • Eukaryotic genes may contain introns as well as
    the coding exons.
  • Some ORFs are called pseudogenes because mutations
    have rendered them nonfunctional.

22
Gene Expression Process
23
The Gene Structure
Diagram labels: genes, coding regions, upstream and downstream regions, DNA-binding transcription factors, binding sites.
Example binding site: 5'-AGCAATAGG-3' / 3'-TCGTTATCC-5'
24
Transcription Factors (TFs)
  • DNA-binding proteins
  • Recognize specific sites (sequences) -> binding
    sites.
  • Transcription factors activate transcription
    initiation by RNA Polymerase II or III.
  • Regulating the gene transcription process.
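
Because DNA is double-stranded, a binding site can appear on either strand. The sketch below searches both strands for an exact copy of the example site AGCAATAGG from the gene-structure slide; the function names and the test sequence are hypothetical, and exact matching is a simplification, since real transcription factors tolerate mismatches and are usually modeled with position weight matrices.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(dna, motif):
    """Return (position, strand) for every exact occurrence of motif.

    Positions are 0-based on the given strand; '-' means the motif lies
    on the opposite strand, i.e. its reverse complement matches here.
    """
    dna, motif = dna.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(dna) - len(motif) + 1):
        window = dna[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    upstream = "TTAGCAATAGGTTCCTATTGCTAA"  # made-up upstream region
    print(find_sites(upstream, "AGCAATAGG"))
```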

Binding Site
A Transcription Factor
25
Some TF Binding Sites
26
Intron and Exon
27
RNA Splicing (Pre-mRNA -> Mature mRNA)
Diagram: gene (DNA, 5' to 3', exons and introns) -> transcription -> pre-mRNA -> splicing -> mRNA -> translation -> protein
28
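Splice Sites
Diagram: pre-mRNA, 5' to 3', with an intron between two exons and a cut at each exon-intron boundary.
Donor site (5' splice site): AG|GTAGGT
Acceptor site (3' splice site): PyPyPyPyPyNCAG|
(| marks the cut; Py is a pyrimidine, N is any base)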
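
Splicing can be sketched as removing introns whose boundaries follow the GT...AG pattern seen at the donor and acceptor sites. The example below is a toy model under stated assumptions: intron coordinates are supplied by the caller, sequences use DNA letters (T rather than U), and the function name `splice` and the example sequences are invented for illustration.

```python
def splice(pre_mrna, introns):
    """Join exons by removing introns given as (start, end) intervals.

    Intervals are 0-based and end-exclusive. Each intron must begin with
    GT (donor site) and end with AG (acceptor site), else ValueError.
    """
    seq = pre_mrna.upper()
    exons = []
    pos = 0
    for start, end in sorted(introns):
        intron = seq[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} breaks the GT...AG rule")
        exons.append(seq[pos:start])  # exon sequence before this intron
        pos = end
    exons.append(seq[pos:])  # final exon
    return "".join(exons)


if __name__ == "__main__":
    exon1, intron, exon2 = "ATGCAG", "GTAGGTTTTTTCCCTCAG", "GATTAA"
    pre = exon1 + intron + exon2
    print(splice(pre, [(len(exon1), len(exon1) + len(intron))]))
```

Supplying different intron sets for the same pre-mRNA gives different mature transcripts, which is the basis of alternative splicing.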
29
How Many Proteins Can One Gene Make?
30
Goals for this Chapter 2
  • 1.2 What Have We Learned from the Human Genome
    Draft Sequences?
  • Survey human genome
  • Verify genome annotations with online tools
  • Recognize alternative forms of genes
  • Explore epigenetic regulation of genome function

31
Overview of Human Genome First Draft
  • Published on 15 Feb. 2001.
  • Humans were estimated to have approximately 35,000 genes.
  • Draft sequence means the DNA was sequenced on
    average four times, with finished sequence having
    eightfold coverage and errors estimated to be one
    in 10,000 bp.
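
To see why coverage depth matters, one can estimate how much of a genome remains unsequenced at a given depth. The sketch below assumes the idealized Lander-Waterman (Poisson) model, in which reads land uniformly at random, so the fraction of bases never covered at depth c is e^(-c). The function names are illustrative, and real projects deviate from this model because of repeats and cloning bias.

```python
import math


def fraction_uncovered(coverage):
    """Expected fraction of bases with zero reads at the given fold coverage."""
    return math.exp(-coverage)


def expected_uncovered_bases(genome_size, coverage):
    """Expected number of bases left unsequenced under the Poisson model."""
    return genome_size * fraction_uncovered(coverage)


if __name__ == "__main__":
    genome_size = 3_000_000_000  # about 3 billion bp for the human genome
    for depth in (4, 8):  # draft vs. finished coverage
        gap = expected_uncovered_bases(genome_size, depth)
        print(f"{depth}x coverage: about {gap:,.0f} bases uncovered")
```

Under this model, doubling the depth from fourfold to eightfold shrinks the expected gap by a factor of e^4, roughly 55.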

32
Figure 1.5
CpG dinucleotides form CpG islands, and the
cytosine base is often methylated.
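
One common way to flag a candidate CpG island is to compare the observed number of CpG dinucleotides with the number expected from the base composition. The sketch below computes GC content and the observed/expected CpG ratio for a sequence. The thresholds mentioned afterward (GC content of at least 50% and a ratio of at least 0.6 over a stretch of 200 bp or more) are the widely used Gardiner-Garden and Frommer criteria, not something stated on the slide, and the function name is hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_content, observed/expected CpG ratio) for a DNA sequence.

    Expected CpG count is (number of C * number of G) / sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


if __name__ == "__main__":
    # Made-up sequences: one CpG-rich, one CpG-poor.
    for name, s in (("rich", "CGCGGCGCTACGCG"), ("poor", "ATTGCATTCATGCA")):
        gc, ratio = cpg_stats(s)
        print(name, round(gc, 2), round(ratio, 2))
```

In practice the statistics are computed in sliding windows along a chromosome, and windows passing the thresholds are merged into islands.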
33
UTR
34
Figure 1.6
35
3-Dimensional
Gene Regulation Mechanism
TF 1
TF 2
Cooperation (Protein-Protein Interaction)
Correlation of Site Occurrences
Binding Site 2
Binding Site 1
Transcription start direction
The graphic is from Dr. Thomas Werner's tutorial
at ISMB 2000.
36
Figure 1.7
  • Humans have more cells with specialized functions
    than yeast or worms.
  • Humans need to regulate gene expression very
    carefully.
  • Proteome complexity is regulated by gene
    expression.

37
Figure 1.8a
38
Figure 1.8b
39
When Are the Data Sufficient?
  • A Gene Is a Gene Is a Gene Short of
  • Every Gene Has a Promoter
  • Other Than rRNA and tRNA, All Genes Produce
    Proteins

40
Can the Genome Alter Gene Expression Without
Changing the DNA Sequence?
  • Imprinting: a process mammals use to mark a small
    set of genes during gametogenesis so that only
    the paternal or maternal copy will be
    transcribed and the other allele at the same
    locus will remain silent.
  • As of 2002, more than 20 mammalian
    genes were known to be imprinted.

41
What Is the Fifth Base in DNA?
  • Methyl-Cytosine
  • Dnmt1p is an enzyme that adds a methyl (CH3)
    group onto hundreds of thousands of cytosines,
    but not all of them, throughout the genome.
  • The methylome is a significant component of gene
    regulation and thus the proteome.
  • Direct methylation may block the transcription of
    some genes.

42
Figure 1.11