Network motifs: discovery and applications - PowerPoint PPT Presentation

Title: Slide 1 Author: Guy Last modified by: Guy Created Date: 3/30/2005 10:24:13 AM Document presentation format: Custom Company: Zinman Other titles – PowerPoint PPT presentation

Number of Views:294
Avg rating:3.0/5.0
Slides: 84
Provided by: guy80
Category:

less

Transcript and Presenter's Notes

Title: Network motifs: discovery and applications


1
Network motifs: discovery and applications
  • Guy Zinman
  • Seminar in Bioinformatics
  • Technion, Spring 2005

2
Outline
  • Theory of network motifs
  • Definition, Algorithm
  • Application to the E. coli transcription network
  • The dynamic behavior of the motifs
  • Finding active subnetworks
  • Simulated annealing
  • experiments

3
Network
4
Network
  • Dictionary definition
  • A group or system of (electric) components and
    connecting circuitry designed to function in a
    specific manner.
  • A network is the backbone of a complex system
  • Studying networks is similar to paleontology:
    learning about an animal from its backbone

5
Network motifs
  • The notion of motif, widely used for sequence
    analysis, is generalized to the level of
    networks.
  • Network Motifs are defined as patterns of
    interconnections that recur in many different
    parts of a network at frequencies much higher
    than those found in randomized networks.

6
Network motifs (cont.)
  • Such motifs are found in networks from
  • Biochemistry: transcriptional regulation networks
  • Neurobiology: neuron connectivity
  • Ecology: food webs
  • Engineering: electronic circuits
  • World Wide Web

7
Network motifs (cont.)
8
(No Transcript)
9
Schematic view of motif detection
  • Occurrence of the FFL motif

10
Random vs designed/evolved features
  • Large networks may contain information about
    design principles and/or evolution of the complex
    system
  • Which features are there for a reason?
  • design principles (e.g. feed-forward loops)
  • constraints (e.g. all nodes on the Internet
    must be connected to each other)
  • evolution, growth dynamics (e.g. network growth
    is mainly due to gene duplication)

11
Network motifs
  • Alon U. et al., Network Motifs: Simple Building
    Blocks of Complex Networks, Science, 2002.
  • Different motifs were found in different classes
    of networks.
  • The motifs reflect the underlying processes that
    generate each type of network.

12
Motifs detected
  • Two significant motifs
  • Both appeared numerous times in non-homologous
    gene systems that perform diverse biological
    functions

13
Motifs detected
14
Motifs detected
15
Main tasks for detecting network motifs
  • There are two main tasks in detecting network
    motifs
  • (1) generating an ensemble of proper random
    networks
  • (2) counting the subgraphs in the real network
    and in random networks.

16
The algorithm
  • Starting point: a graph with directed edges
  • Scan for n-node subgraphs (n = 3, 4) and count the
    number of occurrences
  • Compare to randomized graphs
  • (unlike a plain Erdős-Rényi graph, the randomization
    preserves the in-, out-, and mutual-edge degree of each node)

17
All 3-node connected subgraphs
  • 13 distinct (non-isomorphic) types of connected 3-node
    subgraph
  • There are
  • 199 connected 4-node subgraphs,
  • 9364 connected 5-node subgraphs
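
The counts on this slide can be checked by brute force. The sketch below is an illustration, not the enumeration used in the cited paper: it lists every directed graph on n labelled nodes (no self-loops), keeps the weakly connected ones, and groups them by isomorphism using a canonical relabelling. The function names are made up for this example.

```python
from itertools import permutations, product


def is_weakly_connected(n, edges):
    """True if the graph is connected when edge directions are ignored."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    stack, reached = [0], {0}
    while stack:
        v = stack.pop()
        for w in adj[v] - reached:
            reached.add(w)
            stack.append(w)
    return len(reached) == n


def count_connected_digraph_types(n=3):
    """Count isomorphism classes of weakly connected directed graphs on n nodes."""
    nodes = range(n)
    pairs = [(i, j) for i in nodes for j in nodes if i != j]
    seen = set()
    for bits in product([0, 1], repeat=len(pairs)):
        edges = frozenset(p for p, b in zip(pairs, bits) if b)
        if not is_weakly_connected(n, edges):
            continue
        # Canonical form: the smallest edge list over all node relabellings.
        canon = min(
            tuple(sorted((perm[i], perm[j]) for i, j in edges))
            for perm in permutations(nodes)
        )
        seen.add(canon)
    return len(seen)
```

Calling `count_connected_digraph_types(3)` gives 13 and `count_connected_digraph_types(4)` gives 199. The brute-force approach becomes slow for n = 5, since it visits 2^20 labelled graphs.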

18
Generation of randomized network
  • Algorithm A
  • Employ a Markov-chain algorithm based on starting
    with the real network and repeatedly swapping
    randomly chosen pairs of connections (X1 → Y1,
    X2 → Y2 is replaced by X1 → Y2, X2 → Y1) until
    the network is well randomized.
  • Switching is prohibited if either of the
    connections X1 → Y2 or X2 → Y1 already exists.
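
A minimal sketch of the switching procedure, assuming the network is given as a list of directed (source, target) pairs. The function name and the `n_switches` parameter are hypothetical, and rejecting swaps that would create a self-loop is an added assumption. This simple version preserves each node's in- and out-degree but does not separately track mutual edges.

```python
import random


def switch_randomize(edges, n_switches, rng=None):
    """Randomize a directed edge list by repeated pairwise target swaps."""
    rng = rng or random.Random()
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_switches):
        a, b = rng.sample(range(len(edge_list)), 2)
        (x1, y1), (x2, y2) = edge_list[a], edge_list[b]
        new1, new2 = (x1, y2), (x2, y1)
        # Prohibit the switch if either new connection already exists
        # (assumption: also prohibit switches that create a self-loop).
        if new1 in edge_set or new2 in edge_set or x1 == y2 or x2 == y1:
            continue
        edge_set -= {(x1, y1), (x2, y2)}
        edge_set |= {new1, new2}
        edge_list[a], edge_list[b] = new1, new2
    return edge_list
```

How many switches count as "well randomized" is left to the caller; a common rule of thumb is many times the number of edges.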

19
Generation of randomized network
  • Algorithm B
  • Each network was represented as a connectivity
    matrix M, such that $M_{ij} = 1$ if there is a
    connection directed from node i to node j, and 0
    otherwise.
  • The goal is to create a randomized connectivity
    matrix $M^{\mathrm{rand}}$, which has the same number of
    nonzero elements in each row and column as the
    corresponding row and column of the real
    connectivity matrix.

20
Generation of randomized network
  • $R_i = \sum_j M^{\mathrm{rand}}_{ij} = \sum_j M_{ij}$, $C_j = \sum_i M^{\mathrm{rand}}_{ij} = \sum_i M_{ij}$.
  • To generate the randomized networks, we start
    with an empty matrix $M^{\mathrm{rand}}$.
  • We then repeatedly choose at random a row m
    according to the weights $p_i = R_i / \sum_i R_i$ and a column
    n according to the weights $q_j = C_j / \sum_j C_j$.
  • If $M^{\mathrm{rand}}_{mn} = 0$, we set $M^{\mathrm{rand}}_{mn} = 1$.
  • We then set $R_m = R_m - 1$ and $C_n = C_n - 1$. If the
    entry (m, n) was previously entered into the
    randomized matrix, that is, if $M^{\mathrm{rand}}_{mn} = 1$, or if
    $m = n$, we choose a new (m, n).
  • This process is repeated until all $R_i = 0$ and $C_j = 0$.
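
The filling procedure can be sketched as follows, assuming M is a square 0/1 list of lists with an empty diagonal. The function name, the `max_tries` guard, and the restart when the process dead-ends (no valid entry left for the remaining row and column counts) are assumptions added for this illustration; the slides do not say how dead ends are handled.

```python
import random


def fill_randomized_matrix(M, rng=None, max_tries=10000):
    """Build a random 0/1 matrix with the same row and column sums as M."""
    rng = rng or random.Random()
    n = len(M)
    row_target = [sum(row) for row in M]
    col_target = [sum(M[i][j] for i in range(n)) for j in range(n)]
    while True:  # assumption: restart from scratch if the filling gets stuck
        R, C = row_target[:], col_target[:]
        Mrand = [[0] * n for _ in range(n)]
        tries = 0
        while sum(R) > 0 and tries < max_tries:
            # Rows and columns are drawn with weights equal to their
            # remaining counts.
            m = rng.choices(range(n), weights=R)[0]
            col = rng.choices(range(n), weights=C)[0]
            if m == col or Mrand[m][col] == 1:
                tries += 1  # rejected: choose a new (m, n)
                continue
            Mrand[m][col] = 1
            R[m] -= 1
            C[col] -= 1
        if sum(R) == 0:
            return Mrand
```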

21
Network motif detection
  • For each nonzero element (i, j) of M
  • Loop through all connected elements $M_{ik} = 1$,
    $M_{ki} = 1$, $M_{jk} = 1$, and $M_{kj} = 1$. This is
    recursively repeated with elements (i, k), (k,
    i), (j, k), and (k, j) until an n-node subgraph is
    obtained.
  • A table is formed that counts the number of
    appearances of each type of subgraph in the
    network, correcting for the fact that multiple
    submatrices of M can correspond to one isomorphic
    architecture owing to symmetries.
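
As a concrete special case of the counting step, the sketch below counts only one subgraph type, the feed-forward loop (X → Y, X → Z, Y → Z), in a 0/1 connectivity matrix. It is not the general recursive enumeration described above. It counts induced subgraphs, so triples with any extra edge among the three nodes are excluded; the function name is hypothetical.

```python
def count_feed_forward_loops(M):
    """Count induced feed-forward loops X -> Y, X -> Z, Y -> Z in matrix M."""
    n = len(M)
    count = 0
    for x in range(n):
        for y in range(n):
            if x == y or not M[x][y]:
                continue
            for z in range(n):
                if z in (x, y):
                    continue
                has_ffl_edges = M[x][z] and M[y][z]
                # Induced subgraph: none of the three reverse edges is present.
                no_extra_edges = not (M[y][x] or M[z][x] or M[z][y])
                if has_ffl_edges and no_extra_edges:
                    count += 1
    return count
```

The three roles in a feed-forward loop are all distinct, so each occurrence is found exactly once and no symmetry correction is needed for this particular type.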

22
Network motif detection
  • This process is repeated for each of the
    randomized networks. The number of appearances of
    each type of subgraph in the random ensemble is
    recorded, to assess its statistical significance.
  • The present concepts and algorithms are easily
    generalized to nondirected or directed graphs
    with several colors of edges and nodes,
    multipartite graphs, and so forth.

23
Criteria for Network Motif Selection
  • A subgraph is selected as a motif if the probability
    that it appears in a randomized
    network an equal or greater number of times than
    in the real network is smaller than P = 0.01.

Reminder: the p-value is the probability of obtaining a
result at least as extreme as the observed one when the
tested subject is not affected by the experiment (the null
hypothesis). If the p-value < 0.01, the subject is
considered to be affected (the null hypothesis is rejected).
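
The selection criterion can be estimated directly from the random ensemble. A minimal sketch, assuming `random_counts` holds the number of appearances of one subgraph type in each randomized network (both names are hypothetical):

```python
def empirical_p_value(real_count, random_counts):
    """Fraction of randomized networks with at least as many appearances."""
    hits = sum(1 for c in random_counts if c >= real_count)
    return hits / len(random_counts)


def is_motif(real_count, random_counts, threshold=0.01):
    """Apply the P < 0.01 criterion to one subgraph type."""
    return empirical_p_value(real_count, random_counts) < threshold
```

With an ensemble of 1000 random networks, the smallest nonzero value this estimate can take is 0.001.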
24
Run time complexity
  • The performance of this algorithm scales with the
    total number of n-node subgraphs in the network.
  • The number of subgraphs and the algorithm runtime
    also increase dramatically for subgraphs with n ≥ 5.

25
Sampling method for subgraph counting
  • Kashtan et al., Efficient sampling algorithm for
    estimating subgraph concentrations and detecting
    network motifs, Bioinformatics, 2004.
  • This algorithm samples subgraphs in order to
    estimate their relative frequency.
  • The runtime of the algorithm asymptotically does
    not depend on the network size.
  • Surprisingly, few samples are needed to detect
    network motifs reliably.

26
Subgraph sampling
  • Procedure description
  • pick a random edge from the network and then
    expand the subgraph iteratively by picking random
    neighboring edges until the subgraph reaches n
    nodes.
  • For each random choice of an edge, in order to
    pick an edge that will expand the subgraph size
    by one, prepare a list of all such candidate
    edges and then randomly choose an edge from the
    list.

27
Subgraph sampling
  • Finally, the sampled subgraph is defined by the
    set of n nodes and all the edges that connect
    between these nodes in the original network.
  • Finding n-node subgraphs for n ≥ 5 is much easier
    now.
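
The sampling procedure can be sketched as below, assuming the network is a list of directed (source, target) pairs. The function name is hypothetical. The sketch only draws one sample; it omits the step of weighting each sample by its sampling probability, which an unbiased concentration estimate would require.

```python
import random


def sample_subgraph(edges, n, rng=None):
    """Sample one connected n-node subgraph by random edge expansion."""
    rng = rng or random.Random()
    edges = list(edges)
    nodes = set(rng.choice(edges))  # start from a random edge
    while len(nodes) < n:
        # Candidate edges are those that grow the node set by exactly one node.
        candidates = [(u, v) for u, v in edges if (u in nodes) != (v in nodes)]
        if not candidates:
            return None  # the component is too small to reach n nodes
        nodes.update(rng.choice(candidates))
    # The sample is the n nodes plus every original edge among them.
    induced = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return frozenset(nodes), frozenset(induced)
```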

28
Comparing sampling method results with exhaustive
enumeration
29
Transcriptional Regulation Network of Escherichia
coli
  • Operon: a group of contiguous genes that are
    transcribed into a single mRNA molecule.
  • The transcriptional network is represented as a
    directed graph: each operon is a node, and
    edges represent direct transcriptional
    interactions.

30
Application to E. coli
  • Alon U. et al., Network motifs in the transcriptional
    regulation network of Escherichia coli, Nature
    Genetics, 2002.
  • Database - RegulonDB
  • contains interactions between Transcription
    Factors and the operons they regulate
  • Contains 577 interactions, 424 operons and 116
    TFs
  • 35 more TFs were added from literature
  • Previously described algorithm was run on this
    data (1000 random networks)

31
Significant motifs
  • Feedforward loop
  • found in 22 different systems,
  • 10 TFs and 40 operons
  • P-Val0.001

32
Concentration of FFL
33
Same in the yeast regulatory network
  • Young et al., Transcriptional Regulatory Networks
    in Saccharomyces cerevisiae, Science, 2002

34
  • Can you think of a possible role for this motif?

35
Dynamics for the FFL
36
  • Mangan et al., Structure and function of the
    feed-forward loop, PNAS, 2003.
  • Consider Sx and Sy as input signals: small
    molecules that activate or inhibit the
    activity of X and Y.

37
Coherency of FFLs
  • The FFL is coherent if the direct effect of the
    general TF on the effector has the same sign as
    its net indirect effect through the specific TF.
  • 85% of the FFLs found were coherent.
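
A small sketch of the coherency check, taking "coherent" to mean that the sign of the direct path X → Z equals the product of the signs along the indirect path X → Y → Z. Encoding activation as +1 and repression as -1 is an assumption of this example, as is the function name.

```python
def is_coherent(sign_xy, sign_yz, sign_xz):
    """True if the direct effect X -> Z matches the net effect via Y.

    Each sign is +1 for activation or -1 for repression.
    """
    return sign_xz == sign_xy * sign_yz
```

For example, `is_coherent(-1, -1, +1)` is true: X represses Y, Y represses Z, so the indirect effect of X on Z is positive, matching the direct activation.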

38
Significant motif
  • Single Input Motif (SIM)
  • A single transcription factor controls a set of
    operons.
  • All operons in a SIM are regulated
    with the same sign.
  • Appeared in 24 different systems

39
Dynamics for the SIM
40
Significant motif
  • Dense Overlapping Regulon (DOR):
    a layer of overlapping interactions between
    operons and a group of TFs, much denser than
    comparable structures in an Erdős-Rényi random
    graph

41
E. Coli network
42
DOR detection
  • Briefly
  • Define a (nonmetric) distance measure between
    operons k and j.
  • The operons were clustered.
  • DORs corresponded to clusters with more than C = 10
    connections, with a ratio of connections to TFs
    greater than R = 2.

43
mFinder
  • A software tool for estimating subgraph
    concentrations and detecting network motifs.
  • www.weizmann.ac.il/mcb/UriAlon/

44
Discussion
  • The concept of homology between genes based on
    sequence motifs has been crucial for
    understanding the function of uncharacterized
    genes.
  • Likewise, the notion of similarity between
    connectivity patterns in networks, based on
    network motifs, may be helpful in gaining insight
    into the dynamic behavior of newly identified
    gene circuits.

45
Discussion
  • Until now we considered only transcription
    interactions specifically manifested by
    transcription factors that bind regulatory sites.
  • This transcriptional network can be thought of as
    the slow part of the cellular regulation network
    (time scale of minutes).

46
Discussion
  • An additional layer of faster interactions, which
    include interaction between proteins (often
    subsecond timescale), contributes to the full
    regulatory behavior.

47
Finding active subnetworks
  • Ideker, T. et al., Discovering regulatory and signaling
    circuits in molecular interaction networks,
    Bioinformatics, 2002.
  • Integrates protein-protein and protein-DNA
    interactions with mRNA expression data, with the goal
    of better understanding the molecular mechanism
    behind the observed gene expression.
  • Uses a method of searching the network to find
    active subnetworks, i.e., connected sets of
    genes with unexpectedly high levels of
    differential expression under one or more
    perturbations.

48
Methodology
  • Using a molecular interaction network to analyze
    changes in expression over 20 perturbations to
    the yeast galactose utilization (GAL) pathway.
  • Determining which conditions significantly
    affected the gene expression in each active
    subnetwork.

49
The means
  • Combining a rigorous statistical measure for
    scoring subnetworks with a search algorithm for
    identifying subnetworks with high score.

50
Basic z-score calculation
  • To rate the biological activity of a particular
    subnetwork, begin with assessing the significance
    of differential expression for each gene.
  • The error model is provided by the VERA (Variability and
    ERror Assessment) program.
  • VERA estimates the parameters of a statistical
    model using the method of maximum likelihood.
  • Output p-values (pi), representing the
    significance of expression change.

51
Basic z-score calculation
  • Each $p_i$ is converted to a z-score
  • $z_i = \Phi^{-1}(1 - p_i)$
  • $\Phi^{-1}$: the inverse normal CDF (cumulative
    distribution function)
  • Smaller p-values correspond to larger z-scores
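
The conversion can be tried out with Python's standard library, where `statistics.NormalDist().inv_cdf` is the inverse of the standard normal CDF. The function name `p_to_z` is hypothetical, and p must lie strictly between 0 and 1.

```python
from statistics import NormalDist


def p_to_z(p):
    """Convert a p-value into a z-score via the inverse normal CDF."""
    return NormalDist().inv_cdf(1.0 - p)
```

A p-value of 0.5 maps to a z-score of 0, and a p-value of 0.025 maps to roughly 1.96.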

52
Scoring of Subnetworks
  • Aggregate z-score for an entire subnetwork A of k
    genes: $z_A = \frac{1}{\sqrt{k}} \sum_{i \in A} z_i$
  • Notice
  • $z_A$ is also distributed according to the
    standard normal distribution when the $z_i$ are
    independent standard normal variables.
  • Subnetworks of all sizes are comparable under
    this scoring system, independent of k.
  • A high $z_A$ indicates a biologically active
    subnetwork.
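
A minimal sketch of the aggregate score, assuming it is the sum of the k gene z-scores divided by the square root of k, the form used by Ideker et al. (2002). The division by √k is what keeps the score at unit variance for independent standard normal inputs, so subnetworks of different sizes stay comparable.

```python
from math import sqrt


def aggregate_z(z_scores):
    """Aggregate z-score of a subnetwork: sum of z-scores over sqrt(k)."""
    k = len(z_scores)
    return sum(z_scores) / sqrt(k)
```

For instance, four genes that each have z = 1 give an aggregate score of 2.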

53
Calibrating z against background distribution
  • Randomly sample gene sets of size k using a Monte
    Carlo approach, compute their scores $z_A$, and
    estimate the mean $\mu_k$ and standard deviation
    $\sigma_k$ for each k.
  • The corrected subnetwork score is $s_A = (z_A - \mu_k) / \sigma_k$
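
The calibration step can be sketched as follows, assuming the corrected score subtracts the background mean and divides by the background standard deviation for gene sets of the same size k. The function names, the default of 1000 samples, and the use of the population standard deviation are choices made for this illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def calibrate(all_z, k, n_samples=1000, rng=None):
    """Estimate the background mean and std of aggregate scores at size k."""
    rng = rng or random.Random()
    # Each sample is a random gene set of size k, scored like a subnetwork.
    scores = [sum(rng.sample(all_z, k)) / sqrt(k) for _ in range(n_samples)]
    return mean(scores), pstdev(scores)


def corrected_score(z_a, mu_k, sigma_k):
    """Subnetwork score relative to random gene sets of the same size."""
    return (z_a - mu_k) / sigma_k
```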

54
Scoring an example subnetwork
SA
55
Scoring over multiple conditions
  • Starting with a matrix of p-values (genes vs.
    conditions) and corresponding z-scores.
  • Producing m different aggregate scores, one for
    each condition, and sorting them.
  • Finding the probability that at least j of the m
    conditions had scores above zA(j)
  • Monte Carlo technique is used for estimating the
    mean and the standard deviation from random gene
    set of size k.
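
One reading of the multi-condition score, sketched under stated assumptions: sort the m per-condition aggregate scores in decreasing order, and for the j-th largest score compute the binomial probability that at least j of m conditions exceed it by chance, then convert that probability back to a z-score and keep the best j. The function name and the clamping constant `eps` are inventions of this example, and the Monte Carlo calibration described in the last bullet is not included.

```python
from math import comb
from statistics import NormalDist


def multi_condition_score(z_by_condition, eps=1e-12):
    """Best order-statistic score over m per-condition aggregate z-scores."""
    norm = NormalDist()
    m = len(z_by_condition)
    ordered = sorted(z_by_condition, reverse=True)
    best = float("-inf")
    for j, z in enumerate(ordered, start=1):
        p_single = 1.0 - norm.cdf(z)  # chance that one condition exceeds z
        # Binomial tail: at least j of the m conditions exceed z.
        p_tail = sum(
            comb(m, h) * p_single**h * (1.0 - p_single) ** (m - h)
            for h in range(j, m + 1)
        )
        # Keep the probability strictly inside (0, 1) for inv_cdf.
        p_tail = min(max(p_tail, eps), 1.0 - eps)
        best = max(best, norm.inv_cdf(1.0 - p_tail))
    return best
```

With a single condition the score reduces to the original z-score. A subnetwork that scores highly in two of three conditions is rated above one that scores equally highly in only one.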

56
Scoring over multiple conditions
57
Finding the maximal scoring
  • Problem
  • Finding the maximal scoring connected subgraph
    is NP-hard.

58
The Difficulty in Searching Global Optima
(Figure: significance score versus subnetwork, showing one global maximum and two local maxima.)
59
Rugged landscapes and local maxima problem
60
Monte Carlo random search
  • Known also as the Metropolis algorithm
  • A simulation technique for conformational
    sampling and optimization based on a random
    search for energetically favourable conformations
  • Finding global (or at least good local) maximum
    by biased random walk may take some luck

61
(Figure: significance score versus subnetwork, showing one global maximum and two local maxima.)
62
Climbing mountains easier: simulated annealing
To escape a local maximum, one needs
to allow locally unfavorable moves.
(Figure: significance score versus subnetwork, showing one global maximum and two local maxima.)
63
Introduction to simulated annealing
  • Simulated annealing (Kirkpatrick et al., 1983).
  • Mathematical method developed together with
    Monte Carlo techniques to avoid false maxima.
  • The method simulates slow cooling of a solidifying
    solution to form a single crystal.
  • Origin
  • The annealing process of heated solids
  • Intuition
  • By allowing occasional descents in the search
    process, we might be able to escape the trap of
    local maxima.
  • In our context
  • Allow nodes to be removed from the subsets, even
    if the resulting subnetwork's score is a (little)
    lower.


64
  • What can be an adverse effect of this method?

65
Consequences of the Occasional Descents
  • Desired effect: helps escape local optima.
  • Adverse effect: the search might leave the global
    optimum after reaching it.
  • So the result is not guaranteed to be optimal.
  • But here we don't care: any high-scoring
    subnetwork is suspected to be biologically
    significant.

66
Climbing mountains easier: simulated annealing
  • Defining a temperature function.
  • Increasing the effective temperature means a
    higher probability of accepting moves that
    increase the energy (here, lower the score). Thus, the
    likelihood of escaping from a local maximum may be tuned.

67
Control of Annealing Process
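Acceptance of a search step (Metropolis
Criterion)
Assume the change in performance (score) along the search
direction is $\Delta$.
Always accept an ascending step, i.e. $\Delta \ge 0$.
Accept a descending step ($\Delta < 0$) only if it passes a random
test, i.e. with probability $p = e^{\Delta / T}$, where T is the annealing temperature.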
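
A minimal sketch of the acceptance test for a maximization problem, assuming the standard Metropolis form in which a descending step of size delta < 0 is accepted with probability exp(delta / T). The function and parameter names are hypothetical.

```python
import math
import random


def accept_step(delta, temperature, rng=random):
    """Metropolis criterion for maximization.

    delta is the change in score; temperature must be positive.
    """
    if delta >= 0:
        return True  # ascending (or neutral) steps are always accepted
    # Descending steps pass a random test that gets harder as T drops.
    return rng.random() < math.exp(delta / temperature)
```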
68
Control of Annealing Process
Cooling Schedule
T, the annealing temperature, is the parameter
that controls the frequency of acceptance of
descending steps.
We gradually reduce the temperature T(k) from 1
toward 0.
The probability of accepting descending steps
decreases as T decreases.
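
One common way to build such a schedule is geometric decay, where each temperature is a fixed fraction of the previous one. The sketch below assumes that form and a small positive end temperature, since a geometric sequence approaches 0 but never reaches it; the function name and parameters are hypothetical.

```python
def geometric_schedule(t_start, t_end, n_steps):
    """Temperatures decaying geometrically from t_start to t_end (> 0)."""
    ratio = (t_end / t_start) ** (1.0 / (n_steps - 1))
    return [t_start * ratio**i for i in range(n_steps)]
```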
69
In our context
  • Input
  • Graph G = (V, E) of molecular interactions,
  • N: number of iterations
  • $T_i$: temperature function, which decreases from
    $T_{start}$ to $T_{end}$
  • Output
  • $G_w$: subgraph of G
  • Initialize $G_w$ by setting each node to an
    active/inactive state randomly (with p = 1/2).

70
Simulated Annealing Algorithm
  • For i = 1 to N do
  • Randomly pick a node v from V and toggle its
    state.
  • Compute the score $s_i$ for the working subgraph $G_w$
  • If $s_i > s_{i-1}$, keep v toggled
  • Else keep v toggled with probability $p = e^{(s_i - s_{i-1}) / T_i}$
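
The loop above can be sketched as follows. This is an illustration under several assumptions: the scoring function is passed in as a hypothetical `score(active_set)` callable, the temperature decays geometrically from `t_start` to `t_end`, and the best state seen so far is remembered (a convenience not stated on the slide). Connectivity of the active nodes is left to the scoring function.

```python
import math
import random


def anneal(nodes, score, n_iter, t_start=1.0, t_end=0.01, rng=None):
    """Simulated annealing over active/inactive node states."""
    rng = rng or random.Random()
    nodes = list(nodes)
    # Initialize each node as active with probability 1/2.
    active = {v for v in nodes if rng.random() < 0.5}
    current = score(active)
    best, best_score = set(active), current
    for i in range(n_iter):
        temp = t_start * (t_end / t_start) ** (i / max(n_iter - 1, 1))
        v = rng.choice(nodes)
        active ^= {v}  # toggle the state of v
        new = score(active)
        if new > current or rng.random() < math.exp((new - current) / temp):
            current = new  # keep v toggled
            if current > best_score:
                best, best_score = set(active), current
        else:
            active ^= {v}  # undo the toggle
    return best, best_score
```

A toy scoring function could be the aggregate z-score of the active nodes, returning 0 for an empty set.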

71
Heuristics for improved annealing
  • Look for M active subnetworks simultaneously.
  • M is a user defined variable
  • Maintaining multiple components can improve the
    efficiency of annealing.
  • Can be done by
  • multiple annealing runs
  • Or by
  • extending the annealing approach to maintain a
    graph state vector of the top M component scores.

72
Galactose metabolic flow
73
Results
Experiment 1: a small network of 362 interactions. Two
conditions in the expression data: gal80 deletion
vs. WT. Five significant subnetworks were found,
including 41 of the 77 significant genes.
74
Score and temperature vs. number of iteration
  • Temperature cooling is geometric from 1 to 0.
  • N
  • By the end of the run, each of the 5 subnetworks
    reaches a (local) maximum.

75
Evaluation of the subnetworks
Z-score distribution of the top 5 active networks.
Z-score distribution with real data
Z-score distribution with random data (scrambled
node z-scores)
76
Experiment 2
Results
  • Network consists of all known interactions: 7145
    protein-protein interactions from BIND and 317
    regulation interactions from TRANSFAC
  • Expression data includes 20 perturbations to
    genes in the Galactose pathway.
  • 7 active subnetworks found. The biggest consists
    of 340 genes.
  • Repeating annealing with the network above,
    generated 5 significant sub-sub-networks.
  • All results were evaluated with methods similar
    to what we have seen.

77
(No Transcript)
78
Discussion
79
Cytoscape
  • www.cytoscape.org

80
Summary
  • Theory of network motifs
  • Definition, Algorithm
  • Application to the E. coli transcription network
  • The dynamic behavior of the motifs
  • Finding active subnetworks
  • Simulated annealing
  • 2 experiments

81
References
  • S Shen-Orr, R Milo, S Mangan & U Alon,
  • Network motifs in the transcriptional regulation
    network of Escherichia coli.
  • Nature Genetics 31:64-68 (2002).
  • R Milo, S Shen-Orr, S Itzkovitz, N Kashtan, D
    Chklovskii & U Alon,
  • Network Motifs: Simple Building Blocks of Complex
    Networks.
  • Science 298:824-827 (2002).
  • T Ideker, O Ozier, B Schwikowski & A Siegel,
  • Discovering regulatory and signaling circuits in
    molecular interaction networks.
  • Bioinformatics 18:S233 (2002).

82
  • S Mangan & U Alon,
  • Structure and function of the feed-forward loop
    network motif.
  • PNAS 100:11980-11985 (2003).
  • N Kashtan, S Itzkovitz, R Milo & U Alon,
  • Efficient sampling algorithm for estimating
    subgraph concentrations and detecting network
    motifs.
  • Bioinformatics 20:1746-1758 (2004).
  • S Kirkpatrick, C D Gelatt & M P Vecchi,
  • Optimization by simulated annealing.
  • Science 220:671-680 (1983).

83
Thank you