Title: Deciphering Gene Regulatory Networks by in silico approaches


1
Deciphering Gene Regulatory Networks by in silico approaches
Sridhar Hannenhalli
Penn Center for Bioinformatics
Department of Genetics
University of Pennsylvania
2
Transcriptional Regulation
Transcription Start Site
Interactions and Modules
TF-DNA binding
3
Overview
  • Core promoter prediction
  • TF-DNA binding
  • TF-TF interactions
  • Transcriptional Modules
  • Applications
4
Overview
  • Core promoter prediction
  • TF-DNA binding: Identification, Representation, Discovery (motif-discovery), Search, Ambiguity/Redundancy
  • TF-TF interactions
  • Transcriptional Modules
  • Applications
5
Binding site identification
SELEX
ATACGGT ATACCGT ATCGGCA AAAGGCT
CONSENSUS A T A S G S T
ChIP-chip
Deletion/Mutation
Specificity
WEIGHT MATRIX (rows A, C, G, T; columns are positions 1 to 7)
A   1.2    0.0    0.96  -1.6   -1.6   -1.6    0.0
C  -1.6   -1.6    0.0    0.59   0.0    0.59  -1.6
G  -1.6   -1.6   -1.6    0.59   0.96   0.59  -1.6
T  -1.6    0.96  -1.6   -1.6   -1.6   -1.6    0.96
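The consensus and weight matrix on this slide can be reproduced from the four SELEX sites with a short sketch. The scoring convention here is an assumption, not something the slide states: natural-log odds against a uniform background of 0.25, with a pseudocount of 0.25 per base. That choice happens to give the slide's values (1.2, 0.96, 0.59, 0.0, -1.6) to rounding. The function names are illustrative.

```python
import math

BASES = "ACGT"
# Two-base IUPAC codes, used when two bases each account for half of a column.
IUPAC = {"AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K"}


def weight_matrix(sites, pseudocount=0.25, background=0.25):
    """Return {base: [weight per position]} as natural-log odds scores."""
    n_sites = len(sites)
    width = len(sites[0])
    matrix = {b: [] for b in BASES}
    for j in range(width):
        column = [s[j] for s in sites]
        for b in BASES:
            # Assumed convention: add a pseudocount of 0.25 to every base count.
            freq = (column.count(b) + pseudocount) / (n_sites + 4 * pseudocount)
            matrix[b].append(math.log(freq / background))
    return matrix


def consensus(sites):
    """Keep every base that reaches at least half of a column."""
    out = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        top = "".join(b for b in BASES if 2 * column.count(b) >= len(sites))
        out.append(IUPAC.get(top, top) if top else "N")
    return "".join(out)


sites = ["ATACGGT", "ATACCGT", "ATCGGCA", "AAAGGCT"]
pwm = weight_matrix(sites)
print(consensus(sites))
for b in BASES:
    print(b, [round(w, 2) for w in pwm[b]])
```

For the four sites above, `consensus` returns `ATASGST`, matching the slide, where S stands for C or G.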
6
Binding site search
  • TFs often bind to short and degenerate DNA
    sequences, leading to false positives
  • Evolutionary conservation (phylogenetic
    footprinting/shadowing) can help reduce the false
    positives
  • About half of the functional binding sites are
    not conserved
  • A combination of evolutionary conservation and
    binding site score can detect 70% of the
    experimentally verified binding sites at a false
    positive rate of 1/50kb per PWM (Levy and
    Hannenhalli, Mammalian Genome, 2002)
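A minimal sketch of the search step, assuming the weight matrix from the binding site identification slide with rows in the order A, C, G, T. The threshold value and the optional `conserved` mask (one boolean per base, for example derived from a multi-species alignment) are hypothetical inputs chosen for illustration. This is not the scoring scheme of Levy and Hannenhalli (2002).

```python
# Weight matrix copied from the binding site identification slide
# (row order A, C, G, T is an assumption).
PWM = {
    "A": [1.2, 0.0, 0.96, -1.6, -1.6, -1.6, 0.0],
    "C": [-1.6, -1.6, 0.0, 0.59, 0.0, 0.59, -1.6],
    "G": [-1.6, -1.6, -1.6, 0.59, 0.96, 0.59, -1.6],
    "T": [-1.6, 0.96, -1.6, -1.6, -1.6, -1.6, 0.96],
}
WIDTH = 7


def score(window):
    """Sum of position-specific weights for one window (uppercase ACGT only)."""
    return sum(PWM[base][j] for j, base in enumerate(window))


def scan(sequence, threshold, conserved=None):
    """Return (start, window, score) for windows scoring at or above threshold.

    If `conserved` is given, a window is kept only when every base in it is
    flagged as conserved.
    """
    hits = []
    for start in range(len(sequence) - WIDTH + 1):
        if conserved is not None and not all(conserved[start:start + WIDTH]):
            continue
        window = sequence[start:start + WIDTH]
        s = score(window)
        if s >= threshold:
            hits.append((start, window, round(s, 2)))
    return hits


print(scan("GGATACGGTCC", threshold=5.0))
```

Because the motif is short and degenerate, lowering the threshold quickly admits spurious windows. The conservation mask is one way to discard them, at the cost of missing functional sites that are not conserved.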

TRANSFAC/JASPAR PWM
Human genome
Multi-species conservation
7
Non-Independence of binding site positions
  • Bacteriophage Mnt prefers binding to C, instead
    of wild-type A, at position 16 when wild-type C
    at position 17 is changed to other bases. (Man
    and Stormo, 2001, NAR)
  • Barash, Elidan, Friedman, Kaplan, 2003, RECOMB
  • Osada, Zaslavsky and Singh, 2004, Bioinformatics

8
Binding site representation
ATACGGT ATACCGT CGCGGCA CGAGCCT
WEIGHT MATRIX (rows A, C, G, T; columns are positions 1 to 7)
A   1.2    0.0    0.96  -1.6   -1.6   -1.6    0.0
C  -1.6   -1.6    0.0    0.59   0.0    0.59  -1.6
G  -1.6   -1.6   -1.6    0.59   0.96   0.59  -1.6
T  -1.6    0.96  -1.6   -1.6   -1.6   -1.6    0.96
Assumption of positional independence
ATACGGT ATACCGT CGCGGCA CGAGCCT
A PSPA or Variable length Markov Model of binding
sites is superior to the PWM model
  • For 95 JASPAR PWMs, PSPAM is better in 48 cases
    and worse in 6 cases at a significance level of 0.05.

9
Conservation patterns in cis-elements reveal
inter-position dependence
Human  .ACCGTGT.ACCTTCT..
Chimp  .AGCGTGT.ACCTTGT..
Mouse  .TCGGTGA.TGCTTCT..
Rat    .CCCGTGA.AGCTTGT..
Dog    .TCGGTCT.ACCCTCT..
G G G G C
C C G C G
10
(Figure: N binding sites, numbered 1, 2, 3, ..., N, each with a pair of positions X and Y.)
Pr(X): probability of X using the standard tree Markov process
Pr(X | Y): probability of X dependent on the corresponding Y branches
Compensatory mutation S_{XY}: fraction of sites for which Pr(X | Y) > Pr(X)
Scope: |X - Y|
11
S_{X,X+1} for 79 vertebrate PWMs from JASPAR
  • Control-1: randomly select (i, j) pairs.
  • Control-2: randomly select i and then select j = i + s.
  • Control-3: construct PWM M_r with the same width as M
    by randomly sampling columns from the 79
    vertebrate PWMs in JASPAR.
  • Control-4: construct PWM M_r from M by randomly
    shuffling the compositions at each column (position).
12
S_{X,X+s} decreases with increasing scope s.
However, it remains significantly greater than the
respective Control-4 up to scope 6.
13
Functional relevance of positions with
compensatory mutation
14
Evans, Donahue, Hannenhalli, RECOMB-Comparative
Genomics 2006
15
Binding site Ambiguity/Redundancy
  • Several transcription factors have multiple distinct PWMs
  • Several distinct transcription factors have very
    similar PWMs

ACCGTGTTT ACCGACTTT ACCGTGAAT ACCGTGTTT TCCGTGTTT
TCAGTGTTT TCTGTGTTT TCGGTGTTT
PWM1
PWM
PWM2
16
Enhancing Positional Weight Matrices using
Mixture models
  • A mixture model allowing an arbitrary number of
    base PWMs

(Equation: given the mixture, the probability of observing sequence X_i = (X_{i1}, ..., X_{in}).)
Use the EM algorithm to estimate the subclasses. We use
k = 2 base-class PWMs (due to lack of data and lack
of knowledge of the appropriate number of classes).
Hannenhalli and Wang, Bioinformatics, 2005
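A minimal sketch of fitting a two-component PWM mixture by EM. It assumes the textbook mixture form, in which the probability of a site is the weighted sum over components of the product of per-position base probabilities. The random initialisation, the pseudocount of 0.25 and the fixed iteration count are illustrative choices, not details taken from Hannenhalli and Wang (2005).

```python
import math
import random

BASES = "ACGT"


def fit_pwm_mixture(sites, k=2, n_iter=50, pseudocount=0.25, seed=0):
    """Return (weights, pwms, responsibilities) for a k-component PWM mixture."""
    rng = random.Random(seed)
    width = len(sites[0])
    # Start from random soft assignments of sites to components.
    resp = []
    for _ in sites:
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])
    for _ in range(n_iter):
        # M-step: re-estimate mixing weights and per-component base frequencies.
        weights = [sum(r[c] for r in resp) / len(sites) for c in range(k)]
        pwms = []
        for c in range(k):
            columns = []
            for j in range(width):
                counts = {b: pseudocount for b in BASES}
                for site, r in zip(sites, resp):
                    counts[site[j]] += r[c]
                total = sum(counts.values())
                columns.append({b: counts[b] / total for b in BASES})
            pwms.append(columns)
        # E-step: posterior probability of each component for each site.
        resp = []
        for site in sites:
            joint = [
                weights[c] * math.prod(pwms[c][j][site[j]] for j in range(width))
                for c in range(k)
            ]
            total = sum(joint)
            resp.append([p / total for p in joint])
    return weights, pwms, resp


def mixture_probability(site, weights, pwms):
    """Probability of one site under the fitted mixture."""
    return sum(
        w * math.prod(pwm[j][site[j]] for j in range(len(site)))
        for w, pwm in zip(weights, pwms)
    )


sites = ["ACCGTGTTT", "ACCGACTTT", "ACCGTGAAT", "ACCGTGTTT",
         "TCCGTGTTT", "TCAGTGTTT", "TCTGTGTTT", "TCGGTGTTT"]
weights, pwms, resp = fit_pwm_mixture(sites, k=2)
print([round(w, 2) for w in weights])
```

With so few sites the result depends on the initialisation. In practice one would restart from several seeds and keep the fit with the highest likelihood.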
17
Sequence conservation of binding sites using
Mixture model
48
39
23
  • Based on 64 Vertebrate TF entries in JASPAR
    database

18
Subclass Dissimilarity vs Prediction Improvement
Less dissimilar
More dissimilar
19
39 36 30 23 15 13
64 57 44 32 20 16
Relative entropy between two base PWMs
20
Expression Coherence of target genes using
mixture model
EC of a set of genes is the fraction of
gene-pairs whose expression profiles across several
tissues/conditions are very similar
PWM1
PWM2
Is the intra-class EC higher than inter-class EC?
In 44 of the 55 (80%) cases, the average
expression coherence within subclass-PWM targets
was higher than the expression coherence across
subclass targets. In all but one case (98%), at
least one of the two subclass PWMs had a
coherence score higher than the cross-coherence
score.
Hannenhalli and Wang, Bioinformatics, 2005
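A minimal sketch of expression coherence as defined above. The slide does not say how "very similar" is decided, so the sketch assumes a Pearson correlation of at least 0.8. Both the similarity measure and the cutoff are placeholders. `cross_coherence` is a hypothetical helper for the inter-class comparison.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def expression_coherence(profiles, min_corr=0.8):
    """Fraction of gene pairs within one set whose profiles are similar."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(pearson(a, b) >= min_corr for a, b in pairs) / len(pairs)


def cross_coherence(profiles_1, profiles_2, min_corr=0.8):
    """Fraction of similar pairs with one gene taken from each set."""
    pairs = [(a, b) for a in profiles_1 for b in profiles_2]
    if not pairs:
        return 0.0
    return sum(pearson(a, b) >= min_corr for a, b in pairs) / len(pairs)


# Toy profiles across four conditions (made-up numbers).
targets = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
print(expression_coherence(targets))
```

Comparing `expression_coherence` for the targets of each subclass PWM against `cross_coherence` between the two target sets mirrors the intra-class versus inter-class question on the slide.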
21
LEU3 Dataset (Liu et al., 2002)
  • Free energy of binding available for 46 observed
    binding sites of LEU3 (Liu et al., 2002)
  • The two clusters from the EM algorithm have
    significantly different binding energies.

22
Yeast Reb1
  • Using the mixture modeling on the 15 known REB1
    sites from TRANSFAC, we find the last position to
    be such that
  • 1st subclass-PWM has Pr(G) = 0.85, Pr(T) = 0.15
  • 2nd subclass-PWM has Pr(G) = 0 and Pr(T) = 0.5

Tanay et al., 2004, GR; Wang and Warner, 1998, Mol
Cell Biol
23
Bi-clustering based modeling
Vertical partitioning
ACCGTCTCAA ACCGTGTGAA AGCGTGCCCT ACGGTGCCCA
TGGCCGCCGA TCGCACTCTT TGCCCCTGCT TGGCCCTCTT
ATACGGT ATACCGT CGCGGCA CGAGCCT
(Figure labels: III, I, IV)
Horizontal partitioning
ACCGTGTTT ACCGACTTT ACCGTGAAT ACCGTGTTT
TCCGTGTTT TCAGTGTTT TCTGTGTTT TCGGTGTTT
(Figure labels: II, V)
24
Context-dependent binding specificity
(Figure labels: X, Y, Z)
25
Binding site Ambiguity/Redundancy
  • Several transcription factors have multiple distinct PWMs
  • Several distinct transcription factors have very
    similar PWMs

26
TESS
27
32 Class
(Figure: two example weight matrices, identical to the weight matrix on the binding site identification slide)
80 Family
117 Subfamily
1034 factors
28
Once upon a time a transcription factor gene was
duplicated
DNA Binding Domain
Interaction Domain
Promoter
Conserved DBD
Divergent nDBD
Redundant paralogs
Divergent Expression
Divergent Promoter
29
Hypothesis: Homologous TF-pairs with similar DBD
have diverged in expression.
Control: Homologous nonTF-pairs; Homologous TF-pairs
with dissimilar DBD
D(X,Y): divergence between the expression levels E_X and E_Y of TF X and TF Y,
evaluated in each sample T_1, ..., T_i, ..., T_158
30
416 homologous TF-pairs (BLAST E-value < 1E-10),
125 with similar binding (p-value < 0.02)
TFs with similar binding are more similar
overall. Thus a greater expression divergence is
surprising.
In thyroid tissue the hypothesis holds
(Mann-Whitney p-value = 0.00156)
31
In Human: 416 homologous TF-pairs, 125 with similar
binding, in a total of 158 samples (Novartis)
p-value (MW test)   Human tissues, % (count)
0.1                 91.7% (145)
0.05                87.3% (138)
0.01                74.7% (118)
In Yeast: 219 homologous TF-pairs, 35 with similar
binding, in a total of 57 samples (Spellman)
p-value (MW test)   Yeast samples, % (count)
0.1                 49.1% (28)
0.05                33.3% (19)
0.01                1.8% (1)
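The per-sample comparison behind these tables can be sketched as follows. The sketch assumes that, for each tissue or sample, the expression divergence of similar-binding TF-pairs is compared against that of control pairs with a one-sided Mann-Whitney test. It uses the normal approximation without tie or continuity correction, so the p-values are only approximate for small groups. The input layout is hypothetical.

```python
import math


def mann_whitney_greater(x, y):
    """One-sided test that values in x tend to exceed values in y.

    Returns (U, p) with p from the normal approximation (no tie correction).
    """
    pooled = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, 0.5 * math.erfc(z / math.sqrt(2))


def fraction_of_samples_supporting(divergence_by_sample, alpha):
    """divergence_by_sample: list of (similar_binding_values, control_values)."""
    hits = sum(
        1 for similar, control in divergence_by_sample
        if mann_whitney_greater(similar, control)[1] <= alpha
    )
    return hits / len(divergence_by_sample)


# Two made-up samples: the first supports the hypothesis, the second does not.
toy = [([4, 5, 6], [1, 2, 3]), ([1, 2, 3], [4, 5, 6])]
print(fraction_of_samples_supporting(toy, alpha=0.05))
```

Applied to 158 tissues at thresholds 0.1, 0.05 and 0.01, a function like `fraction_of_samples_supporting` would yield a table of the same shape as the one on the slide.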
32
Overview
  • Core promoter prediction
  • TF-DNA binding
  • TF-TF interactions
  • Transcriptional Modules
  • Applications
33
Transcription Factor cooperation/interaction
Expression Coherence
Pilpel et al. (2001). Nat Genet; Banerjee and
Zhang (2003). NAR
Positional Coherence
Hannenhalli and Levy (2002). NAR.
Interaction-dependent binding
34
Interaction-dependent binding
ChIP-chip
Set of gene promoters bound by F
DNA binding motif M of F
Transcription Factor F
Can M discriminate between P and B?
Bound promoters (P)
Unbound promoters (B)
The answer is NO for a large fraction of
transcription factors
Perhaps binding of F depends (synergistic or
antagonistic) on other motifs
35
PWM based occupancy probability
PWM based occupancy probability
Binding probability (ChIP)
Interaction coefficient
  • The ChIP-chip data for a majority of TFs is
    better explained using interaction-dependent
    binding.
  • Almost all of the Yeast cell cycle interactions
    were detected at a 10% prediction rate
  • When applied to genome-wide CREB binding in rat,
    15 of the 18 detected interactions have varying
    degree of support.
  • Wang, Jensen, Hannenhalli RECOMB-Regulation 2005

36
Overview
  • Core promoter prediction
  • TF-DNA binding
  • TF-TF interactions
  • Transcriptional Modules
  • Applications
37
Co-regulated genes have common binding sites in
their promoters
Apoptosis Pathway
BCL2-antagonist(BAD)
68 TFs
37 TFs in common
B-cell CLL/lymphoma 2(BCL2)
89 TFs
AP-2, CREB, E2F, cMyc, NF-Kappa-b, c-ETS, Egr-1
etc.
(Venn diagram: 68 TFs for BAD, 89 TFs for BCL2, 37 in common, out of 374 TFs in total)
Hypergeometric p-val E-11
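The overlap significance can be checked with a hypergeometric tail probability. The sketch assumes that 374 is the total number of TFs considered and that the p-value is the probability of sharing at least 37 TFs by chance. Both are readings of the figure rather than statements on the slide.

```python
from math import comb


def hypergeom_tail(total, marked, drawn, overlap):
    """P(X >= overlap) when `drawn` items are sampled without replacement
    from `total` items, of which `marked` are of interest."""
    upper = min(marked, drawn)
    numerator = sum(
        comb(marked, k) * comb(total - marked, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(total, drawn)


# BAD: 68 TFs, BCL2: 89 TFs, 37 shared, 374 TFs overall (assumed universe).
p = hypergeom_tail(total=374, marked=89, drawn=68, overlap=37)
print(f"{p:.2e}")
```

By chance, about 68 x 89 / 374, or roughly 16, shared TFs would be expected, so an overlap of 37 lies far in the tail.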
38
Interacting proteins have greater similarity in
their promoter regions
Hannenhalli and Levy (2003). Mamm Genome
39
Transcriptional module discovery
TFs
Singular Value Decomposition
1 1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0 1
1 0 0 0 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 0 0
Genes
Clique enumeration in bipartite graphs
Cluster of genes and discriminating TF
Distance Matrix
K-means Clustering
40
Tissue-Specific Transcriptional Module
Tissue specificity by expression level (Schug et
al., 2005)
Binding prediction
Transcriptional-Module specific to a tissue type
Everett, Wang, Hannenhalli, ISMB 2006
41
Overview
  • Core promoter prediction
  • TF-DNA binding
  • TF-TF interactions
  • Transcriptional Modules
  • Applications
42
Transcriptional Regulation in Cardiac Myocytes
Frey N, Olson EN. Annu Rev Physiol.
2003;65:45-79.
43
Expression profiling in advanced heart failure
  • Large tissue bank from Temple and Penn
  • Failing explanted hearts (n = 173)
  • Non-failing hearts from unused donors (n = 16)
  • Each hybridized with an HU133A (n = 189)
  • Conservative analysis: RMA (Bioconductor), SAM

3000 dysregulated genes in advanced human HF
with FDR < 5%.
Is there any evidence that specific transcription
factors are directing these changes?
44
Transcriptional Genomics
45
Differentially expressed Genes (G)
Score(x) = freq(x) in G / freq(x) in B
Statistical significance is computed using 1000
random samplings of genes from the background set
Background Set (B)
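A minimal sketch of the enrichment score and its empirical p-value. It assumes that G is a subset of B, that each random sample has the same size as G, and that the p-value is the fraction of samples scoring at least as high as G. The `has_site` mapping (gene to whether its promoter contains a predicted site for factor x) is a hypothetical input.

```python
import random


def enrichment(has_site, foreground, n_samples=1000, seed=0):
    """Return (score, empirical p-value) for one factor x.

    has_site:   {gene: True/False} over the background set B
                (at least one gene must have a site).
    foreground: list of differentially expressed genes G, drawn from B.
    """
    genes = list(has_site)
    freq_b = sum(has_site.values()) / len(genes)

    def score(gene_set):
        freq_g = sum(has_site[g] for g in gene_set) / len(gene_set)
        return freq_g / freq_b

    observed = score(foreground)
    rng = random.Random(seed)
    # Count random gene sets of the same size that score at least as high.
    at_least = sum(
        1 for _ in range(n_samples)
        if score(rng.sample(genes, len(foreground))) >= observed
    )
    return observed, at_least / n_samples


# Toy data: 100 background genes, 20 with a site; 8 of 10 foreground genes have one.
has_site = {f"g{i}": i < 20 for i in range(100)}
foreground = [f"g{i}" for i in range(8)] + ["g50", "g51"]
print(enrichment(has_site, foreground))
```

With 1000 samples the smallest non-zero p-value is 0.001, which matches the three-decimal p-values in the table of enriched factors.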
46
Transcription Factors enriched in differentially
up-regulated genes
TRANSFAC ID Fold enrichment p-value Factor
M00471 1.70 0.000 TBP
M00318 1.63 0.001 Lentiviral_Poly_A
M00062 1.52 0.000 IRF-1
M00138 1.50 0.004 Octamer
M00291 1.48 0.000 Freac-3
M00403 1.48 0.001 aMEF-2
M00103 1.48 0.000 Clox
M00216 1.47 0.000 TATA
M01000 1.46 0.001 AIRE
M00109 1.46 0.000 C/EBPbeta
M00405 1.45 0.001 MEF-2
M00451 1.45 0.004 NKX3A
M00972 1.44 0.001 IRF
M00249 1.43 0.002 CHOPC/EBPalpha
M00102 1.43 0.002 CDP
M00302 1.43 0.000 NF-AT
M00729 1.42 0.003 Cdx-2
M00622 1.41 0.001 C/EBPgamma
M00078 1.41 0.005 Evi-1
M00407 1.40 0.003 RSRFC4
M00616 1.39 0.004 AFP1
M00310 1.35 0.000 APOLYA
M00770 1.35 0.002 C/EBP
M00485 1.34 0.002 Nkx2-2
M00432 1.34 0.004 TTF1
M00346 1.34 0.002 GATA-1
M00478 1.34 0.003 Cdc5
M00724 1.33 0.005 HNF-3alpha
M00699 1.32 0.002 ICSBP
M00394 1.31 0.002 Msx-1
M00088 1.28 0.005 Ik-3
M00238 1.27 0.005 Barbie_Box
47
What about early events?
  • The differentially upregulated genes have a
    greater number (32) of enriched TFs compared to
    downregulated genes (6).
  • The ischemic and idiopathic cases are consistent
  • Validation of GATA, MEF2, NKx, NFAT transcription
    factors in human heart failure
  • Potential role for FOX factors and IRF

Mice with infarcts and sham-operated controls were
sacrificed at varying times after surgery (1, 4,
8, 24 hrs, 8 wks). Analysis of differentially
co-regulated gene clusters reveals a consistent set
of transcription factors.
48
FOX factor Summary
  • FOX targets change substantially in advanced
    human HF and in early HF in mice.
  • FOX factors are present in human heart at
    physiologic levels: FOXP1, P4, C1, C2, J2
  • FOXP1 is localized to nuclei of human cardiac
    myocytes.
  • Do FOX factors mediate cardiac hypertrophy?

Hannenhalli et al. Circulation, 2006
49
Gene Regulation in Learning and Memory
Naïve (N), Conditioned Stimulus only (CS), Fear
Conditioned (FC)
Hippocampus Amygdala
Keeley et al. Memory and Learning, 2006
50
Immediate Early Gene Expression is Regulated by
Many Transcription Factors
http://web1.tch.harvard.edu/research/greenberg/old
site/Pathways.html
51
50 Most Significantly Regulated Genes were Used
for Further Analysis
Hippocampus
Amygdala
52
Hippocampus- and Amygdala-specific promoter
modeling
  • Hippocampus: CREB, E2F1, Pax4, Sp1, GATA1, AP2, ZF5, Nrf-1
  • Amygdala: CREB, E2F1, Pax4, Sp1, GATA1, AP2, ZF5, Ets1,
    Elk1, Myc/Max, USF

53
Promoter models were able to predict regulation
of less significant genes with some system
specificity
54
Overview
  • Core promoter prediction
  • TF-DNA binding
  • TF-TF interactions
  • Transcriptional Modules
  • Applications
55
Core Promoter: minimal DNA sequence required for
the assembly of the pre-initiation complex (100
bps flanking the TSS).
Goal: determine sequence properties responsible for
precise Pol-II localization.
56
(Figure: timeline of promoter prediction tools, 1990 to 2006: CpG island line, PromoterInspector, PromoterScan, Hannenhalli, PromFind, FirstEF, Promoter1.0, NNPP, TSSG, PSPA, CorePromoter, TATA, Claverie, Autogene, Dragon)
57
CpG Islands
Unmethylated GC-rich regions (experimental)
GC-rich regions (≥ 200 bp) on the genome with
high CG di-nucleotide frequency (computational)
Gardiner-Garden and Frommer, 1987
About half of all genes have a CpG island
overlapping the first exon.
Antequera and Bird, 1993
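The computational definition can be sketched with the thresholds commonly attributed to Gardiner-Garden and Frommer (1987): length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. The slide itself gives only the 200 bp figure, so the other two cutoffs are assumptions brought in for illustration.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (#C * #G) / length.
    obs_exp = (seq.count("CG") * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three criteria to a single candidate region."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```

A genome-wide scan would slide a window along the sequence and merge adjacent windows that pass. Only the per-window test is shown here.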
58
Categories of DNA sequence signals used in
promoter prediction
Generalization of Markov Models (Wang and
Hannenhalli, BMC BI, 2005)
Long-range sequence characteristics (10 kb)
TSS
Short genomic sub-regional signal, e.g. CpG
island (0.5-2 kb)
Specific cis elements (e.g. TATA)
59
Position Specific Propensity Analysis (PSPA)
PSPA based Model
Use ±100 bp around the TSS as training data
Wang and Hannenhalli, BBRC, 2006
60
Overlap between prediction tools
61
Carninci et al. (2006). "Genome-wide analysis of
mammalian promoter architecture and evolution."
Nat Genet 38(6): 626-635.
62
  • CpG-poor promoters have greater conservation and
    fewer aTSS, and are mostly involved in extracellular
    and stress-response activities.
  • By including position specific motifs and their
    co-occurrence, PSPA improves the Transcription
    Start site localization.
  • Many Position Specific elements are associated
    with target gene function.
  • There is little overlap among various
    state-of-the-art prediction tools.
  • Alternative promoters have tissue specific usage

63
Acknowledgement
  • Junwen Wang (PCBI, UPenn)
  • Larry Singh (PCBI, UPenn)
  • Li-San Wang (Biology, UPenn)
  • Shane Jensen (Statistics, Wharton, UPenn)
  • Perry Evans, Greg Donahue (Genomics and Comp Bio, UPenn)
  • Tom Cappola (Cardiology, UPenn)
  • Mike Keeley (Biology, UPenn)
  • Ted Abel (Biology, UPenn)