Gene Structure and Identification III - PowerPoint PPT Presentation

Slides: 28
Provided by: csta2
Transcript and Presenter's Notes

1
Gene Structure and Identification III
BIO520 Bioinformatics Jim Lund
2
For real prediction we need to:
  • Solve the protein folding problem
  • Solve the molecular docking/binding problem
  • Develop realistic simulations of molecules in
    cells
  • Simulate multicellular systems

3
Promoter/Enhancer analysis
  • Regulatory Sequences
  • Known Consensus Sequences
  • Consensus Sequence Generation
  • Using functional (experimental) Data
  • Real examples

4
Gene Regulatory Sequences
  • Functional sites
  • Consensus
  • Experimental tests
  • Inferred sites
  • Transcriptome analysis

5
Sequence Logos
  • http://weblogo.berkeley.edu/
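The letter-stack heights in a sequence logo are based on the information content of each aligned position. A minimal sketch of that calculation follows, assuming a uniform four-letter background and omitting the small-sample correction that logo tools usually apply. The aligned sites are hypothetical and invented for illustration; they are not from the slides.

```python
import math
from collections import Counter

def column_information(sites):
    """Return the information content (bits) of each column of aligned DNA sites."""
    length = len(sites[0])
    info = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies in this column.
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        # 2 bits is the maximum for a 4-letter alphabet (uniform background assumed).
        info.append(2.0 - entropy)
    return info

# Hypothetical aligned binding sites, for illustration only.
sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
bits = column_information(sites)
for position, value in enumerate(bits, start=1):
    print(f"position {position}: {value:.2f} bits")
```

Fully conserved columns reach 2 bits; columns with mixed bases score lower.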

6
(No Transcript)
7
Position Weight Matrix
  PO   A   C   G   T  Consensus
  01   6   4   4   6  N
  02   4   9   3   4  N
  03  12   4   3   1  A
  04   6   1  11   2  R
  05   3   2  11   4  G
  06   3   3   4  10  N
  07   3  10   3   4  N
  08  11   2   4   3  A
  09   4   9   3   4  N
  10   3   6   3   8  N
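One common way to use a count matrix like this is to convert it to log-odds scores and slide it along a sequence. The sketch below does that with the counts from the slide. The pseudocount of 1, the uniform background of 0.25, and the function names are assumptions made for illustration, not something the slides specify.

```python
import math

BASES = "ACGT"

# Count matrix copied from the slide: one tuple per position, columns A, C, G, T.
COUNTS = [
    (6, 4, 4, 6), (4, 9, 3, 4), (12, 4, 3, 1), (6, 1, 11, 2), (3, 2, 11, 4),
    (3, 3, 4, 10), (3, 10, 3, 4), (11, 2, 4, 3), (4, 9, 3, 4), (3, 6, 3, 8),
]

def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2(frequency / background) per position (assumed scheme)."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append({base: math.log2(((n + pseudocount) / total) / background)
                       for base, n in zip(BASES, row)})
    return matrix

def score(matrix, site):
    """Sum the per-position log-odds for a site as wide as the matrix."""
    return sum(column[base] for column, base in zip(matrix, site))

def best_hit(matrix, sequence):
    """Return (score, 0-based offset) of the best-scoring window in sequence."""
    width = len(matrix)
    windows = [(score(matrix, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1)]
    return max(windows)

PWM = log_odds(COUNTS)
# Highest-scoring base at each position (ties resolved in A, C, G, T order).
most_likely = "".join(max(BASES, key=lambda b: column[b]) for column in PWM)
print(most_likely)
print(best_hit(PWM, "TTTT" + most_likely + "TTTT"))
```

In practice a score threshold is also needed to decide which windows count as sites; choosing it is outside the scope of this sketch.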

8
EUKARYOTES
  • More complex signals
  • Basal/core promoter
  • Promoter
  • Enhancers
  • More genes
  • More dispersed signals
  • Larger promoters, distant enhancers, regulatory
    sites in introns.
  • Combinatoric regulation common

9
Basal Promoter Analysis
Myers and Maniatis, Genes VI, 831
  • TATA-box: -25 to -30, TBP
  • CCAAT-box: -212 to -57, CTF/NF1
  • GC-box: -164 to +1, SP1
  • Cap signal (KCWKYYYY): +1 to +5
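A consensus written in IUPAC ambiguity codes, such as the cap signal above, can be searched for by translating it into a regular expression. The following is a rough sketch; the helper names and the test sequence are hypothetical, and exact consensus matching is only one of several ways to locate such sites.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Build a lookahead regex so that overlapping matches are also reported."""
    body = "".join(IUPAC[code] for code in consensus.upper())
    return re.compile("(?=(" + body + "))")

def find_sites(consensus, sequence):
    """Return (0-based offset, matched text) for every consensus match."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

CAP_SIGNAL = "KCWKYYYY"      # consensus as given on the slide
example = "AAAGCATCTCTAAA"   # hypothetical sequence, for illustration only
hits = find_sites(CAP_SIGNAL, example)
print(hits)
```

Short degenerate patterns match often by chance, so hits are usually filtered by their position relative to the transcription start site.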

10
Finding PolII sites (transcription start site)
  • Promoter Scan
  • TSSG/TSSW (TSSP for plants)
  • Core-Promoter
  • FPROM
  • BCM Search Launcher

11
Enhancer Elements
  • Octamer OCT1, OCT2
  • κB NF-κB
  • ATF ATF
  • AP1 AP1
  • ..

12
Consensus Sequence Databases
  • TRANSFAC
  • TFD (transcription factor database)

13
Consensus Sequence Databases
  • Finding sites in promoter regions
  • TESS
  • http://www.cbil.upenn.edu/cgi-bin/tess/tess
  • TFSEARCH
  • http://www.cbrc.jp/research/db/TFSEARCH.html
  • BCM Search Launcher
  • http://searchlauncher.bcm.tmc.edu/seq-search/gene-search.html

14
HBB promoter (TESS)
15
Sequence-based algorithms
  • Genes from:
      • Microarray transcription analysis
      • ChIP-chip experiments
      • Orthologous sequences
      • Experimental/other
  • Programs for finding consensus sites:
      • MEME analysis of clusters
      • AlignACE
      • BioProspector/CompareProspector

16
Practical Gene Finding
  • Use ALL tools
  • Predictive: stitch together a consensus
  • ORF finders
  • Find patterns (and WWW pattern searches)
  • HMM: GRAIL, Genscan
  • Comparative
  • BLASTN, BLASTX
  • Compare genomes (human-mouse)
  • cDNA, protein, genetic evidence
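As an illustration of the simplest predictive tool in the list, an ORF finder, here is a minimal six-frame sketch. It assumes an ORF runs from the first ATG after a stop codon through the next in-frame stop, reports 0-based half-open coordinates on the forward strand, and ignores introns entirely. The function names and the `min_codons` parameter are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons):
    """Return (start, end) of ATG...stop ORFs in the three frames of one strand."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG since the last stop
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found

def find_orfs(seq, min_codons=30):
    """Return (strand, start, end) in forward-strand, 0-based half-open coordinates."""
    seq = seq.upper()
    n = len(seq)
    results = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    # Map reverse-strand hits back onto forward-strand coordinates.
    results += [("-", n - e, n - s)
                for s, e in orfs_in_strand(reverse_complement(seq), min_codons)]
    return sorted(results, key=lambda r: r[1])

# Hypothetical toy sequence: ATG, four GCT codons, TAA, with flanking bases.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

Because eukaryotic coding regions are split by introns, a plain ORF scan like this is only one line of evidence to combine with the other tools listed.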

17
ORFs-aldolase gene
18
Genomic DNA-cDNA alignment
[Figure: genomic DNA-cDNA alignment; labels: P, DNA sequencing, Align (GAP), cDNA]
19
Comparative Genomics
  • Conservation of coding regions
  • Identification of transcription signals
  • words in common
  • Example: yeast comparisons
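The "words in common" idea can be sketched as a search for k-mers shared between two upstream regions. This is a simplified illustration only: the word length of 6, the function names, and the two sequences are assumptions, and real comparisons would also account for alignment and background frequency.

```python
def kmers(seq, k):
    """Return the set of all length-k words in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_words(seq_a, seq_b, k=6):
    """Return the sorted list of k-mers present in both sequences."""
    return sorted(kmers(seq_a, k) & kmers(seq_b, k))

# Hypothetical upstream regions from two species, for illustration only.
region_a = "AAATATAAAGGG"
region_b = "CCCTATAAACCC"
print(shared_words(region_a, region_b, k=6))
```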

20
Ensembl prediction pipeline
  • DNA
  • RepeatMasker
  • Genscan
  • Pmatch all human proteins and cDNAs
  • BLAST Genscan peptides vs. protein, UniGene, EST, vert mRNA
  • MiniGenewise, MiniEst2genome
  • Genes
21
(No Transcript)
22
(No Transcript)
23
Genscan features
  • Model both strands at once
  • Each state may output a string of symbols
    (according to some probability distribution).
  • Explicit intron/exon length modeling
  • Advanced splice site modeling
  • Complete intron/exon annotation for sequence
  • Able to predict multiple genes and partial/whole
    genes
  • Parameters learned from annotated genes
  • Separate parameter training for different C+G
    content groups (<43%, 43-51%, 51-57%, >57% C+G
    content)
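The last bullet implies that a sequence is first assigned to a composition group before a parameter set is chosen. A minimal sketch of that assignment follows. How exact boundary values (43, 51, 57) are assigned is an assumption here, as are the function names; the slide gives only the ranges.

```python
def gc_percent(seq):
    """Return the percentage of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def gc_group(seq):
    """Assign seq to one of the four composition groups listed on the slide."""
    pct = gc_percent(seq)
    # Boundary handling (a value of exactly 43 goes to "43-51", etc.) is assumed.
    if pct < 43:
        return "<43"
    if pct < 51:
        return "43-51"
    if pct < 57:
        return "51-57"
    return ">57"

print(gc_group("ATATATATGC"))  # 20% G+C
print(gc_group("ACGT"))        # 50% G+C
print(gc_group("GCGCGCGCAT"))  # 80% G+C
```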

24
GENSCAN predictions
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
 7.00 Prom +  63096  63135   40                              -2.75
 7.01 Init +  63183  63274   92  2  2  103   77   142 0.997  14.61
 7.02 Intr +  63403  63625  223  1  1   83   96   181 0.999  15.61
 7.03 Term +  64524  64652  129  2  0  101   50    83 0.373   3.00
 7.04 PlyA +  64758  64763    6                               1.05
 8.00 Prom +  70508  70547   40                              -4.75
 8.01 Init +  70595  70686   92  1  2  103   77   133 0.990  13.71
 8.02 Intr +  70817  71039  223  2  1  100   96   217 0.999  20.91
 8.03 Term +  71890  72018  129  0  0  116   43   119 0.827   7.40
 8.04 PlyA +  72126  72131    6                               1.05
 9.00 Prom +  74399  74438   40                              -8.25
 9.01 Sngl +  76602  76847  246  2  0   71   50   218 0.886  11.13
 9.02 PlyA +  76928  76933    6                               1.05

25
GENSCAN predicted exons
26
Annotated predicted exons
27
HBB gene
  • HBB exons 1-3
      • 70545..70686
      • 70817..71039
      • 71890..72150
  • GENSCAN
      • 70595..70686
      • 70817..71039
      • 71890..72018
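The comparison on this slide can be made explicit by checking each annotated exon against the corresponding GENSCAN exon. The sketch below uses the coordinates from the slide, treats them as 1-based inclusive, and pairs exons by order; both of those choices, and the function name, are assumptions.

```python
# Coordinates from the slide, treated as 1-based inclusive (assumed).
ANNOTATED = [(70545, 70686), (70817, 71039), (71890, 72150)]
PREDICTED = [(70595, 70686), (70817, 71039), (71890, 72018)]

def compare_exons(annotated, predicted):
    """Pair exons by order and report boundary agreement and overlap length."""
    report = []
    for (a_start, a_end), (p_start, p_end) in zip(annotated, predicted):
        overlap = max(0, min(a_end, p_end) - max(a_start, p_start) + 1)
        report.append({
            "annotated": (a_start, a_end),
            "predicted": (p_start, p_end),
            "start_match": a_start == p_start,
            "end_match": a_end == p_end,
            "overlap_nt": overlap,
        })
    return report

report = compare_exons(ANNOTATED, PREDICTED)
for number, row in enumerate(report, start=1):
    print(number, row)
```

All internal splice boundaries agree; the two sets differ only at the start of exon 1 and the end of exon 3, possibly because the annotated exons include untranslated sequence while GENSCAN reports coding exons.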