Evolutionary and genomic approaches to find gene regulatory sequences - PowerPoint PPT Presentation

About This Presentation
Slides: 68
Provided by: rossha6
Learn more at: http://www.bx.psu.edu

Transcript and Presenter's Notes

1
Evolutionary and genomic approaches to find gene
regulatory sequences
  • Penn State University, Center for Comparative
    Genomics and Bioinformatics: Webb Miller,
    Francesca Chiaromonte, Anton Nekrutenko, Kateryna
    Makova, Stephan Schuster, Ross Hardison
  • University of California at Santa Cruz: David
    Haussler, Jim Kent
  • Children's Hospital of Philadelphia: Mitch Weiss
  • NimbleGen: Roland Green

University of Nebraska, Lincoln, February 14, 2007
2
Major goals of comparative genomics
  • Identify all DNA sequences in a genome that are
    functional
      • Selection to preserve function
      • Adaptive selection
  • Determine the biological role of each functional
    sequence
  • Elucidate the evolutionary history of each type
    of sequence
  • Provide bioinformatic tools so that anyone can
    easily incorporate insights from comparative
    genomics into their research

3
Known types of gene regulatory regions
G.A. Maston, S.K. Evans, M.R. Green (2006) Annu.
Rev. Genomics Hum. Genet. 7:29-59.
4
Regulatory regions tend to be clusters of
transcription factor binding sites
Figure: SV40 promoters and enhancer, with
sequence-specific binding sites
5
Properties of known regulatory regions
  • Binding sites for transcription factors, many
    with sequence specificity
  • Clusters of binding sites
  • Conventional promoters encompass major start
    sites for transcription
  • Conserved over evolutionary time?

6
Structures involved in transcription are probably
more complex
Middle image: green, active transcription
(Br-UTP label); red, all nucleic acids; HeLa
cell. Sides: EM spreads of transcripts.
Peter R. Cook, Oxford University,
http://users.path.ox.ac.uk/pcook/images/Images.html
7
Domain opening is associated with movement to
non-heterochromatic regions
Schübeler, Francastel, Cimbora, Reik, Martin,
Groudine (2000) Genes Dev. 14:940-950
8
Other possible activities for sequences involved
in gene regulation
  • Opening or closing a chromosomal domain
  • Moving a gene to or away from a transcription
    factory
  • Controlling how long a gene stays in a
    transcription factory
      • Long association: high-level expression;
        really long gene
      • Short association: lower-level expression;
        rapid regulation
  • Are these conserved over evolutionary time?

9
3 modes of evolution
Sequence matches at longer phylogenetic distances
could reflect purifying selection. Sequence
differences at closer phylogenetic distances
could reflect adaptive evolution.
10
Conservation vs. Constraint
  • Conserved sequences are those that align between
    two species thought to be descended from a common
    ancestor
  • Constrained sequences show evidence in their
    alignments of negative (purifying) selection
      • E.g., they change at a rate significantly
        slower than neutral DNA
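The last bullet can be made concrete with a simple model. A minimal sketch, assuming each aligned site in a region changes with a fixed per-site probability estimated from neutral DNA (for example, ancestral repeats); a region shows constraint if it has significantly fewer substitutions than that rate predicts. The binomial model and the function names are illustrative assumptions, not the method behind phastCons, which uses a phylogenetic hidden Markov model.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def constraint_p_value(substitutions: int, aligned_sites: int, neutral_rate: float) -> float:
    """One-sided p-value for 'this region changed more slowly than neutral DNA'.

    neutral_rate is the per-site probability of a substitution, assumed to be
    estimated from a neutral reference such as ancestral repeats. A small value
    means so few substitutions are unlikely under neutrality: evidence of
    constraint, not merely conservation (alignability).
    """
    return binomial_cdf(substitutions, aligned_sites, neutral_rate)


# Hypothetical region: 200 aligned sites with 10 substitutions, neutral rate 0.3
# (about 60 substitutions would be expected if the region evolved neutrally).
print(constraint_p_value(10, 200, 0.3))
```

A region that aligns but has about the neutral number of substitutions (e.g. 60 of 200 here) is conserved in the sense of the first bullet but gives no evidence of constraint.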

11
Ideal cases for interpretation
12
Messages about evolutionary approaches to
predicting regulatory regions
  • Regulatory regions are conserved, but not all to
    the same phylogenetic distance.
  • Incorporation of pattern and composition
    information along with conservation can lead
    to effective discrimination of functional classes
    (regulatory potential).
  • Regulatory potential in combination with
    conservation of a GATA-1 binding motif is an
    effective predictor of enhancer activity.
  • In vivo occupancy by GATA-1 suggests other
    activities in addition to enhancers.
  • Comparison of polymorphism and divergence from
    closely related species can reveal regulatory
    regions that are under recent selection.

13
Finding all gene regulatory regions is a
challenge for comparative genomics
  • Known regulatory regions for the HBB complex
      • 23 total
      • 19 conserved (align) between human and mouse
      • Many show no significant difference from the
        bulk of neutral DNA in a measure of constraint
        (phastCons)

14
Two extremes of constraint in TRRs
15
ENCODE projects
  • ENCODE (ENCyclopedia Of DNA Elements) consortium
    aiming to find function for all human DNA
    sequences
  • Phase I focused on 1% of human DNA
      • 30 Mb, 44 regions
      • About 10 regions had known genes of interest
        (CFTR, HOX)
      • Others were chosen to sample regions varying
        in gene density and alignability with mouse
  • Major areas
      • Genes and transcripts
      • Transcriptional regulation
      • Chromatin structure
      • Multiple sequence alignment
      • Variation in human populations

16
Biochemical assays for protein-binding sites in
DNA
Purified protein, naked DNA
Chromatin immunoprecipitation (ChIP): DNA sites
occupied by a protein inside cells
17
ChIP-on-chip to examine many sites
18
Putative transcriptional regulatory regions
pTRRs
  • Antibodies against 10 sequence-specific factors
      • Sp1, Sp3, E2F1, E2F4, cMyc, STAT1, cJun, CEBPe,
        PU1, RA Receptor A
  • High-resolution ChIP-chip platforms: Affymetrix
    and NimbleGen
  • Data from several different labs in the ENCODE
    consortium
  • High-likelihood ChIP-chip hits
      • 5% false discovery rate
  • Supported by chromatin modification data
      • Modified histones in chromatin: H4Ac, H3Ac,
        H3K4me, H3K4me2, H3K4me3, etc.
      • DNase hypersensitive sites (DHSs) or
        nucleosome-depleted sites
  • Result: a set of 1369 pTRRs
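The slide does not say how each lab computed its 5% false discovery rate. As one common possibility (an assumption for illustration), here is the Benjamini-Hochberg step-up procedure applied to per-region p-values; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], fdr: float = 0.05) -> list[int]:
    """Indices of tests called significant by the Benjamini-Hochberg step-up rule.

    Sort p-values ascending, find the largest rank k with p_(k) <= k/m * fdr,
    and call the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest qualifying rank (step-up)
    return sorted(order[:cutoff_rank])


# Hypothetical per-region p-values from a ChIP-chip scan
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))  # [0, 1]
```

The step-up rule keeps the largest qualifying rank, so a p-value that fails its own threshold can still be called if a larger one further down the list passes.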

19
A small fraction of cis-regulatory modules are
conserved from human to chicken
  • About 4% of pTRRs, 4% of DNase HSs, and 4-7% of
    promoters active in multiple cell lines
  • Tend to regulate genes whose products control
    transcription and development

Millions of years: 91, 173, 310, 450
David King
20
Most pTRRs are conserved in eutherian mammals
Percentage of each class that aligns no further than
the given clade:

Clade          pTRRs   DNase HSs   Promoters
Primates          3       11         1-13
Eutherians       71       70         63
Marsupials       21       14         16-28
Tetrapods         4        4         4-7
Vertebrates       1        1         2-4

Divergence times on the figure (millions of years):
91, 173, 310, 450
Within aligned noncoding DNA of eutherians, need
to distinguish constrained DNA (purifying
selection) from neutral DNA.
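Each row of this slide gives the percentage of a class that aligns no further than a clade, so the percentage that aligns at least as far as a clade is a running sum from the most distant clade back. A small sketch using the pTRR and DNase HS numbers from this slide (promoters are omitted because they are given as ranges); the names are illustrative.

```python
CLADES = ["Primates", "Eutherians", "Marsupials", "Tetrapods", "Vertebrates"]

# Percentage of each class that aligns NO FURTHER than each clade, in the
# order of CLADES (values from this slide).
NO_FURTHER_THAN = {
    "pTRRs": [3, 71, 21, 4, 1],
    "DNase HSs": [11, 70, 14, 4, 1],
}


def at_least_as_far(percents: list[int]) -> list[int]:
    """Convert 'aligns no further than clade i' into 'aligns at least as far
    as clade i' by summing from the most distant clade back toward primates."""
    cumulative, running = [], 0
    for p in reversed(percents):
        running += p
        cumulative.append(running)
    return cumulative[::-1]


for cls, percents in NO_FURTHER_THAN.items():
    print(cls, dict(zip(CLADES, at_least_as_far(percents))))
```

For pTRRs this gives 97% aligning at least as far as eutherians but only 5% reaching tetrapods or beyond, which is the sense in which most pTRRs are conserved in eutherian mammals.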
21
Measures of conservation and constraint capture
only a subset of pTRRs
Measures compared: fraction overlapping an MCS;
phastCons (background rate corrected); composite
alignability (background rate corrected)
Annotations on the figure: aligns, but no inference
about purifying selection; allows a range of
constraint; stringent constraint
22
Different measures perform better on specific
functional regions
Figure axes: sensitivity vs. 1 - specificity
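Sensitivity plotted against 1 - specificity is a receiver operating characteristic (ROC) curve. A minimal sketch of how such a curve can be traced from any per-region score (phastCons, RP, alignability), assuming a labeled set of known functional regions (1) and background regions (0); the function names and scores are hypothetical.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs obtained by lowering a score threshold.

    labels: 1 for a known functional region, 0 for a background region.
    Tied scores are added together so each point is a reachable threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores (e.g. phastCons or RP) for four regions
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

Comparing curves for different scores on the same labeled regions is one way to see which measure works better for a given functional class.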
23
Examples of clade-specific pTRRs
24
Messages about evolutionary approaches to
predicting regulatory regions

25
Regulatory potential (RP) to distinguish
functional classes
26
Good performance of ESPERR for gene regulatory
regions (RP)
Francesca Chiaromonte
James Taylor
27
Messages about evolutionary approaches to
predicting regulatory regions

28
Conservation of predicted binding sites for
transcription factors
Binding site for GATA-1
29
Genes Co-expressed in Late Erythroid Maturation
G1E-ER cells: a proerythroblast line lacking the
transcription factor GATA-1. They can be rescued by
expressing an estrogen-responsive form of
GATA-1 (Rylski et al., Mol. Cell. Biol. 2003).
30
Predicted cis-Regulatory Modules (preCRMs) Around
Erythroid Genes
Yong Cheng, Ross, Yuepin Zhou, David
King; Ying Zhang, Joel Martin, Christine Dorman,
Hao Wang
31
preCRMs with conserved consensus GATA-1 BS tend
to be active on transfected plasmids
32
preCRMs with conserved consensus GATA-1 BS tend
to be active after integration into a chromosome
33
Examples of validated preCRMs
34
Correlation of Enhancer Activity with RP Score
35
Validation status for 99 tested fragments
36
preCRMs with High RP and Conserved Consensus
GATA-1 Tend To Be Validated
37
CACC box helps distinguish validated from
nonvalidated preCRMs
Ying Zhang
38
Messages about evolutionary approaches to
predicting regulatory regions

39
preCRMs with conserved consensus GATA-1 binding
sites are usually occupied by that protein (ChIP
assay)
40
Design of ChIP-chip for occupancy by GATA-1
  • Non-overlapping tiling array with 50-bp probes at
    100-bp resolution (NimbleGen)
  • Covered range
      • Mouse chr7:57,225,996-123,812,258 (70 Mbp)
  • Antibody against the ER portion of the
    GATA-1-ER protein in rescued G1E-ER4 cells

Yong Cheng, with Mitch Weiss and Lou Dore (CHOP)
and Roland Green (NimbleGen)
41
Signals in known occupied sites in the Hbb LCR
(HS1, HS2, HS3)
(1) Cluster of high signals; (2) hill shape of the
signals
42
Peak Finding Programs
  • TAMALPAIS
      • Mark Bieda from Peggy Farnham's lab
      • Focuses more on the cluster of signals
      • 4 thresholds based on the number of consecutive
        probes with signals in the 98th or 95th
        percentiles
  • MPEAK
      • Bing Ren's lab
      • Focuses more on the hill shape of the signal
      • 4 thresholds, for a series of probes with at
        least one that is 3, 2.5, 2 or 1 standard
        deviations above the mean
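A minimal sketch of the two peak criteria described above, applied to per-probe signals listed in genomic order. These are simplified illustrations written for this summary, not the TAMALPAIS or MPEAK code: the nearest-rank percentile, and treating a "series" as contiguous probes above the mean, are assumptions, and all names are hypothetical.

```python
import math
from statistics import mean, pstdev


def runs(flags: list[bool], min_len: int = 1) -> list[tuple[int, int]]:
    """(start, end) probe indices, end exclusive, of runs of True at least min_len long."""
    found, start = [], None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                found.append((start, i))
            start = None
    return found


def cluster_peaks(signal: list[float], q: float = 95, min_probes: int = 3):
    """Cluster criterion (TAMALPAIS-like, simplified): at least min_probes
    consecutive probes at or above the q-th percentile of all probe signals."""
    ranked = sorted(signal)
    cutoff = ranked[max(0, math.ceil(q * len(ranked) / 100) - 1)]  # nearest-rank percentile
    return runs([v >= cutoff for v in signal], min_probes)


def hill_peaks(signal: list[float], n_sd: float = 2.0):
    """Hill criterion (MPEAK-like, simplified): a series of contiguous probes
    above the mean that contains at least one probe >= mean + n_sd * SD."""
    mu, sd = mean(signal), pstdev(signal)
    return [(a, b) for a, b in runs([v > mu for v in signal])
            if max(signal[a:b]) >= mu + n_sd * sd]


# Hypothetical signals for ten probes in genomic order
signal = [0, 0, 0, 5, 6, 7, 0, 0, 0, 0]
print(cluster_peaks(signal, q=80))  # [(3, 6)]
print(hill_peaks(signal, n_sd=1))   # [(3, 6)]
```

Running each function at several settings (e.g. q = 98 or 95; n_sd = 3, 2.5, 2 or 1) mirrors the idea of reporting hits at four thresholds.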

43
ChIP-chip hits for GATA-1 occupancy
Technical replicates of ChIP-chip with antibody
against GATA1-ER
Figure: overlap of hits from Mpeak and TAMALPAIS
(275 and 276 hits in both replicates; region
counts 216, 60 and 59)
321 total ChIP-chip hits
44
ChIP-chip hits validate at a high rate
Validation was determined by quantitative PCR: 19 of
the 321 hits were tested, and 13 (about 70%) were
validated.
ChIP DNA
Validation rate is similar at different thresholds.
9 regions were hits in only one of the two
technical replicates; none were validated.
45
Association of WGATAR and conservation with
ChIP-chip Hits
  • 249 of the 321 hits (78%) have WGATAR motifs,
    the binding site for GATA-1
  • Of the GATA-1 binding motifs in those 249 hits,
    112 (45%) are conserved between mouse and at
    least one non-rodent species.
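WGATAR is written in IUPAC notation (W = A or T; R = A or G). A minimal sketch of scanning a hit sequence for the motif on both strands; the function name is hypothetical, and testing whether each match is conserved would additionally require a multiple alignment, which is not shown.

```python
import re

# WGATAR in IUPAC notation: W = A or T, R = A or G. The lookahead lets
# overlapping matches be reported.
WGATAR = re.compile(r"(?=([AT]GATA[AG]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIF_LEN = 6


def find_wgatar(seq: str) -> list[tuple[int, str]]:
    """0-based forward-strand start and strand ('+' or '-') of every WGATAR match."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in WGATAR.finditer(seq)]
    revcomp = seq.translate(COMPLEMENT)[::-1]
    # A match starting at j on the reverse complement covers forward
    # positions len(seq) - j - MOTIF_LEN .. len(seq) - j - 1.
    hits += [(len(seq) - m.start() - MOTIF_LEN, "-") for m in WGATAR.finditer(revcomp)]
    return sorted(hits)


print(find_wgatar("CCAGATAAGG"))  # [(2, '+')]
print(find_wgatar("GGTTATCTCC"))  # [(2, '-')]
```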

46
Expected and unexpected ChIP-chip hits
47
Distribution of ChIP-chip hits on 70Mb of mouse
chr7
Yong Cheng, Yuepin Zhou and Christine Dorman
48
Almost half the GATA-1 ChIP-chip hits increase
expression of a transgene in K562 cells
Figure categories: No GATA-1; GATA-1 occupied sites
by ChIP-chip (counts shown: 15, 6, 6)
24 of the 56 tested fragments with ChIP-chip hits
were validated (43%)
49
Conserved and nonconserved ChIP-chip hits can be
active as enhancers
50
Messages about evolutionary approaches to
predicting regulatory regions

51
Polymorphism as a transient phase of evolution
Slide from Dr. Hiroshi Akashi
52
Test of neutrality using polymorphism and
divergence data
53
Test for recent selection in human noncoding DNA
  • McDonald-Kreitman test
      • Use ancestral repeats as the neutral model
        (MKAR test)
  • Count polymorphisms in human using dbSNP126
  • Count divergence of human from
      • Chimpanzee (great ape, diverged from the human
        lineage 6 Myr ago)
      • Rhesus macaque (Old World monkey, diverged from
        the human lineage 23 Myr ago)
  • Tiled windows; most analysis on 10-kb windows
  • Compute p-value for neutrality by chi-square test
  • The ratio of the polymorphism-to-divergence
    ratios indicates the direction of inferred
    selection
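A minimal sketch of the test outlined above for one window, using hypothetical counts. Polymorphic sites (P) and fixed differences (D) in the window are compared with those in ancestral repeats in a 2x2 chi-square test. Under the usual McDonald-Kreitman reasoning, a ratio below 1 (relatively more divergence) is read as a sign of positive selection, and a ratio above 1 (relatively more polymorphism) as a sign of purifying selection against weakly deleterious variants. The slide does not say whether a continuity correction was used, so none is applied here; the function name is hypothetical.

```python
import math


def mkar_test(poly_win: int, div_win: int, poly_ar: int, div_ar: int):
    """McDonald-Kreitman-style test with ancestral repeats (AR) as the neutral model.

    Builds the 2x2 table
                 polymorphic  divergent
        window     poly_win    div_win
        AR         poly_ar     div_ar
    and returns (chi-square statistic, p-value with 1 df, ratio), where
    ratio = (poly_win / div_win) / (poly_ar / div_ar).
    """
    table = [[poly_win, div_win], [poly_ar, div_ar]]
    n = poly_win + div_win + poly_ar + div_ar
    row_totals = [sum(row) for row in table]
    col_totals = [poly_win + poly_ar, div_win + div_ar]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    ratio = (poly_win / div_win) / (poly_ar / div_ar)
    return chi2, p_value, ratio


# Hypothetical counts for one 10-kb window and for ancestral repeats
print(mkar_test(30, 10, 100, 100))
```

The p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail is erfc(sqrt(x / 2)).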

Heather Lawson, Anthropology, PSU
54
pTRR apparently under positive selection
55
A promoter distal to the beta-like globin genes
has a signal for recent purifying selection
56
Selection on a primate-specific promoter
57
The distal promoter is close to the locus control
region for beta-globin genes
58
Messages about evolutionary approaches to
predicting regulatory regions

59
Many thanks
PSU Database crew: Belinda Giardine, Cathy
Riemer, Yi Zhang, Anton Nekrutenko
Yong Cheng, Ross, Yuepin Zhou, David
King; Ying Zhang, Joel Martin, Christine Dorman,
Hao Wang
RP scores and other bioinformatic
input: Francesca Chiaromonte, James Taylor, Shan
Yang, Diana Kolbe, Laura Elnitski
Alignments, chains, nets, browsers, ideas: Webb
Miller, Jim Kent, David Haussler
Funding from NIDDK, NHGRI, Huck Institutes of
Life Sciences at PSU
60
Computing Regulatory Potential (RP)
Alignment            seq1  G T A C C T A C T A C G C A
                     seq2  G T G T C G - - A G C C C A
                     seq3  A T G T C A - - A A T G T A
Collapsed alphabet         1 2 1 3 4 5 7 7 6 8 3 6 3 9
  • A 3-way alignment has 124 types of columns (each
    row is A, C, G, T or a gap: 5^3 = 125, minus the
    all-gap column). Collapse these to a smaller
    alphabet with s characters (for example, 1-9).
  • Train two order-t Markov models for the
    probability that t alignment columns are followed
    by a particular column, using two training sets:
      • positive (alignments in known regulatory regions)
      • negative (alignments in ancestral repeats, a
        model for neutral DNA)
  • E.g., the frequency with which 3 4 is followed by 5:
      • 0.001 in regulatory regions
      • 0.0001 in ancestral repeats
  • The RP of any 3-way alignment is the sum of the log
    likelihood ratios of finding the strings of
    alignment characters in known regulatory regions
    vs. ancestral repeats.
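A minimal sketch of the scoring scheme described on this slide, working on sequences that have already been converted to the collapsed alphabet. Add-one pseudocounts, the natural logarithm, the toy training data, and the function names are assumptions made for illustration; the published RP/ESPERR method differs in details such as how the alphabet is built. In the slide's example, the transition "3 4 followed by 5" would contribute log(0.001 / 0.0001) = log 10 to the score.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Model = Callable[[tuple, int], float]


def train_markov(training: list[Sequence[int]], order: int, alphabet_size: int) -> Model:
    """Order-t Markov model over collapsed alignment symbols.

    Returns prob(context, symbol) = P(symbol | previous `order` symbols), with
    add-one pseudocounts so that unseen transitions keep a nonzero probability.
    """
    context_counts, transition_counts = Counter(), Counter()
    for seq in training:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            context_counts[context] += 1
            transition_counts[context + (seq[i],)] += 1

    def prob(context: tuple, symbol: int) -> float:
        return (transition_counts[context + (symbol,)] + 1) / (context_counts[context] + alphabet_size)

    return prob


def regulatory_potential(seq: Sequence[int], regulatory: Model, neutral: Model, order: int) -> float:
    """Sum over positions of log[P_regulatory(symbol | context) / P_neutral(symbol | context)]."""
    total = 0.0
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])
        total += math.log(regulatory(context, seq[i]) / neutral(context, seq[i]))
    return total


# Toy training sets in a 9-symbol collapsed alphabet (hypothetical data)
reg_model = train_markov([[3, 4, 5] * 10], order=2, alphabet_size=9)
ar_model = train_markov([[1, 2] * 15], order=2, alphabet_size=9)
print(regulatory_potential([3, 4, 5, 3], reg_model, ar_model, order=2))  # positive
print(regulatory_potential([1, 2, 1, 2], reg_model, ar_model, order=2))  # negative
```

Positive scores mean the alignment's column patterns look more like the regulatory training set than like ancestral repeats.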

61
Stage 1: reduced representations
ESPERR: Evolutionary Sequence and Pattern
Extraction using Reduced Representations
62
Stage 2: improve encoding
63
Train models for classification
64
Categories of Tested DNA Segments
65
Example that suggests turnover
GATA-1 BSs
66
Additional methods identify the CACC box as
distinctive for validation
Sets compared: all validated preCRMs vs. all
nonvalidated preCRMs
Methods: CLOVER (Zlab); hexamer counting; ELPH
(UMaryland); EKLF PWM (Dr. Perkins)
Background: mouse chr19 (42.8% CG), NCBI Build 30
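Hexamer counting is one of the methods listed above. A minimal sketch of one way such a comparison could work, counting overlapping 6-mers in a foreground set (e.g. validated preCRMs) and a background set and reporting log2 enrichment with a pseudocount. The toy sequences, pseudocount and function names are hypothetical, and the statistics actually used in the study are not specified on the slide.

```python
import math
from collections import Counter
from itertools import product

K = 6
ALL_HEXAMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def hexamer_counts(seqs: list[str]) -> Counter:
    """Count overlapping 6-mers made only of A/C/G/T across a set of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - K + 1):
            word = s[i:i + K]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def log2_enrichment(foreground: list[str], background: list[str], pseudo: float = 1.0) -> dict[str, float]:
    """log2(frequency in foreground / frequency in background) for every hexamer,
    with a pseudocount added to each of the 4**6 words in both sets."""
    fg, bg = hexamer_counts(foreground), hexamer_counts(background)
    fg_total = sum(fg.values()) + pseudo * len(ALL_HEXAMERS)
    bg_total = sum(bg.values()) + pseudo * len(ALL_HEXAMERS)
    return {w: math.log2(((fg[w] + pseudo) / fg_total) / ((bg[w] + pseudo) / bg_total))
            for w in ALL_HEXAMERS}


# Toy example: a CACC-rich "validated" set vs an AT-rich background (hypothetical data)
scores = log2_enrichment(["CCACACCCACACCC"], ["ATATATATATAT"])
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

With real data the background would be matched for base composition, which is why the slide notes the CG content of the mouse chr19 background.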
67
Using Galaxy to find predicted CRMs