Protein-Protein Interaction Network - PowerPoint PPT Presentation

1
Protein-Protein Interaction Network
  • Gautam Chaurasia
  • 08.07.04

2
Overview
  • Introduction.
  • Three Different Models
  • Structure of the protein-protein interaction
    network.
  • Non-power law.
  • Evolution of the network.
  • Power Law Random Graphs.
  • Detection of functional modules from protein
    interaction networks.
  • Clustering algorithm

3
Introduction
  • The network is viewed as a graph whose nodes
    correspond to proteins. Two proteins are
    connected by an edge if they interact.
  • The collection of all interactions between the
    proteins of an organism is called the interactome.
  • The yeast two-hybrid (Y2H) system is used to
    produce a comprehensive map of the protein-protein
    interaction network.
  • The network resembles a random graph in that it
    consists of many small subnets (groups of
    proteins that interact with each other but do not
    interact with any other protein) and one large
    connected subnet comprising more than half of all
    interacting proteins.

4
Structure of PPI Network
  • Yeast protein interaction network. (Uetz et al.
    2000)
  • A: A two-dimensional drawing of the entire
    network.
  • B: The giant (hub) component of this graph
    consists of 466 proteins.
  • C: A small section of the hub component, with
    gene or open reading frame names shown next to
    each node.

5
Structure of PPI Network
  • Degree
  • Described by the connectivity k of the node,
    which tells us how many links the node has to
    other nodes.
  • Degree distribution
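  • The degree distribution P(k) gives the
    probability that a selected node has exactly k
    links.
  • P(k) is obtained by counting the number of nodes
    N(k) that have exactly k links and dividing by
    the total number of nodes N: P(k) = N(k)/N.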
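
The counting definition of the degree distribution can be sketched in a few lines of Python. This is only an illustration: the function name and the toy protein labels are hypothetical, and nodes without any edge are not counted because they do not appear in an edge list.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)} for an undirected edge list, with P(k) = N(k) / N."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)               # N: nodes that appear in at least one edge
    counts = Counter(degree.values())   # N(k): number of nodes with exactly k links
    return {k: counts[k] / n_nodes for k in sorted(counts)}

# Toy network (hypothetical labels): A is linked to B, C and D.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D")]
print(degree_distribution(toy_edges))
```

In the toy network three of the four nodes have one link and one node has three links.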

6
ER Random Graph
  • ER Random Graphs
  • An ER random graph consists of n nodes and k
    edges, where any pair of nodes is equally likely
    to be connected by one of the k edges.
  • Alternatively, start with a given number of nodes N
    and add links randomly, connecting each pair of
    nodes with probability p,
  • which creates a graph with approximately
    pN(N-1)/2 randomly placed links.
  • The node degrees follow a Poisson distribution.
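
A minimal sketch of the "connect each pair with probability p" construction, assuming Python's standard random module. The function name and the parameter values are illustrative choices, not taken from the slides.

```python
import random
from itertools import combinations

def er_random_graph(n_nodes, p, seed=0):
    """Connect every pair of nodes independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n_nodes), 2) if rng.random() < p]

n_nodes, p = 200, 0.05
edges = er_random_graph(n_nodes, p)
expected_links = p * n_nodes * (n_nodes - 1) / 2   # p N (N - 1) / 2
mean_degree = 2 * len(edges) / n_nodes             # should be near p (N - 1)
print(len(edges), expected_links, mean_degree)
```

The number of links fluctuates around pN(N-1)/2 from one seed to the next, and the mean degree stays close to p(N-1).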

7
Scale-Free Network
  • Scale-free networks: the rich get richer.
  • Scale-free networks are characterized by a
    power-law degree distribution: the probability
    that a node has x links follows $P(x) \propto x^{-\gamma}$,
  • where γ > 0, so that a plot of log(frequency)
    against log(degree) shows a decreasing linear trend.
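
The linear trend on a log-log plot can be checked numerically with an ordinary least-squares fit. The sketch below is illustrative only: the function name is hypothetical, the exponent is called gamma in the code, and a straight-line fit on log-log axes is a rough diagnostic rather than a rigorous test for a power law.

```python
import math

def loglog_slope(degree_freq):
    """Least-squares slope of log(frequency) against log(degree)."""
    pts = [(math.log(k), math.log(f)) for k, f in degree_freq.items() if k > 0 and f > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Exact (unnormalised) power-law frequencies with an assumed exponent of 2.5.
gamma = 2.5
exact = {k: k ** -gamma for k in range(1, 51)}
print(loglog_slope(exact))   # close to -gamma for exact power-law input
```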

8
Model-I Non-Power Law-I
  • The essence of this model is the observation that
    parts of proteins, called domains, contain sites
    into which complementary parts of other proteins
    can bind.
  • These complementary parts are referred to as the
    positive and negative aspects of a domain.
  • Bipartite subgraphs: graphs comprising two
    disjoint sets of nodes in which each node in
    one set is connected to every node in the other
    set.

Fig. 2: A particular domain whose positive form is
present in three proteins (A, B and C) and whose
negative form is present in four proteins (W, X, Y
and Z).
9
Model-I Non-Power Law-II
  • We assume that there are n proteins and m domains,
    each with a positive and a negative form.
  • A domain may be any of the 2m types 1+, 1-, 2+,
    2-, ..., m+, m-.
  • Each of the n proteins contains each of the 2m
    possible domains independently with constant
    probability p.
  • Let X_i be the number of domains that the ith
    protein has. X_i is binomially distributed:
    $X_i \sim \mathrm{Bin}(2m, p)$.
  • All the X_i are independent and identically
    distributed.
  • Thus, the average number of sites per protein is
    λ = 2mp.
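
The domain model described above can be simulated directly. The following sketch assumes two proteins interact whenever one carries some aspect (d, +) and the other carries the complementary aspect (d, -). The function name and the small parameter values are illustrative, not those used in the presentation.

```python
import random
from itertools import combinations

def simulate_domain_model(n, m, p, seed=0):
    """Give each protein each of the 2m domain aspects with probability p,
    then link two proteins if they hold complementary aspects."""
    rng = random.Random(seed)
    proteins = []
    for _ in range(n):
        aspects = {(d, s) for d in range(m) for s in (+1, -1) if rng.random() < p}
        proteins.append(aspects)
    edges = []
    for i, j in combinations(range(n), 2):
        # (d, s) in protein i binds (d, -s) in protein j
        if any((d, -s) in proteins[j] for d, s in proteins[i]):
            edges.append((i, j))
    return proteins, edges

n, m, p = 300, 50, 0.01
proteins, edges = simulate_domain_model(n, m, p)
mean_sites = sum(len(a) for a in proteins) / n
print(mean_sites, 2 * m * p, len(edges))
```

The simulated mean number of aspects per protein should be close to 2mp, which is 1 for these parameters.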

10
Model-I Non-Power Law-III
  • Let Y_i be the number of interactions of the ith
    protein.
  • Given X_i = x, any other protein j fails to
    connect to i only if it does not contain any
    of the x complementary domain aspects, which
    happens with probability $q^x$.
  • Since there are n-1 such proteins, we have
    $Y_i \mid X_i = x \sim \mathrm{Bin}(n-1,\ 1 - q^x)$,
  • where q = 1-p. Hence, the unconditional
    distribution of Y_i is a binomial mixture of
    binomials:
    $f(y) = \Pr(Y_i = y) = \sum_{x=0}^{2m} \binom{2m}{x} p^x q^{2m-x} \binom{n-1}{y} (1 - q^x)^y q^{x(n-1-y)}$.

11
Model-I Non-Power Law-IV
  • Using an inclusion-exclusion-type expression
    (expanding $(1 - q^x)^y$ and summing over x) we get
    $f(y) = \binom{n-1}{y} \sum_{j=0}^{y} (-1)^j \binom{y}{j} \left(q + p\,q^{n-1-y+j}\right)^{2m}$.
  • Binomial distribution: describes the number of
    successes in an experiment with a fixed
    number of independent trials, each of which can
    only have two possible outcomes.
  • For example: tossing a coin 20 times to see how
    many tails occur.
  • Inclusion-exclusion: let A denote a finite set
    and let P1, ..., Pn be any given properties. We
    want to express the number of elements of A which
    have none of these properties in terms of numbers
    of elements which have some of these properties.
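
As a numerical check on the degree distribution of the domain model, the sketch below evaluates f(y) in two ways: as a binomial mixture of binomials, and through the expanded inclusion-exclusion-type sum. The second expression is derived here by expanding (1 - q^x)^y with the binomial theorem; it is not copied from the slides. Function names and the small parameter values are illustrative.

```python
from math import comb

def degree_pmf_mixture(y, n, m, p):
    """f(y) as a mixture: X ~ Bin(2m, p), then Y | X = x ~ Bin(n - 1, 1 - q**x)."""
    q = 1.0 - p
    total = 0.0
    for x in range(2 * m + 1):
        prob_x = comb(2 * m, x) * p**x * q**(2 * m - x)
        link = 1.0 - q**x   # chance that another protein binds one of the x aspects
        total += prob_x * comb(n - 1, y) * link**y * (1.0 - link)**(n - 1 - y)
    return total

def degree_pmf_inclusion_exclusion(y, n, m, p):
    """The same f(y) after expanding (1 - q**x)**y and summing over x."""
    q = 1.0 - p
    s = sum((-1)**j * comb(y, j) * (q + p * q**(n - 1 - y + j))**(2 * m)
            for j in range(y + 1))
    return comb(n - 1, y) * s

n, m, p = 8, 3, 0.2
mixture = [degree_pmf_mixture(y, n, m, p) for y in range(n)]
expanded = [degree_pmf_inclusion_exclusion(y, n, m, p) for y in range(n)]
print(mixture)
print(expanded)
```

For large n the alternating sum can lose floating-point precision, so the mixture form is the safer one to evaluate at realistic interactome sizes.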

12
Log-log plot of the distribution
  • f(y) is plotted for n = 6000 proteins, m = 1000
    domains and λ = 1, 2 (the average number of
    domains per protein).
  • The resulting graph shows clear non-linearity.

Fig.: Log-log plot of the distribution of vertex
degrees in the modelled interactome with 6000
proteins, 1000 domains and an average of 1 or 2
domains per protein, shown as solid and dotted
lines respectively.
13
Degree distribution of sampled sub-graphs
  • A total of 450 proteins were sampled at random.
  • The mean number of neighbors for each protein in
    this sample was 5.
  • The resulting graph has approximately the same
    number of vertices and edges as the Uetz
    datasets.

Fig.: The Ito and Uetz datasets are plotted in
black and blue, respectively. A straight-line
(power-law) fit is shown as a dotted line. The
distribution obtained by sampling from this
model with 6000 proteins, 1000 domains and an
average of 1 domain per protein is plotted in
red.
14
Degree distribution of sampled sub-graphs
  • A total of 1500 proteins were sampled at random.
  • The resulting graph shows that this model fits
    the data better than a power law does.
  • Fig.: The DIP dataset is plotted in black. The
    distribution obtained by sampling from this model
    with 6000 proteins, 1000 domains and an average
    of 2 domains per protein is plotted in red. A
    straight line (power law) fit is shown as a
    dotted line.

15
Conclusions
  • The degree distribution predicted by this model
    fits the data better than a power-law
    distribution does.
  • This model also fits the subnet better than
    the power law does.

16
Example
  • This model can be used to infer the existence of
    interactions not yet detected experimentally, by
    using the predicted bipartite structure of
    sub-graphs.

The figure strongly suggests that o-Raf1,
PLC-, RALGDS, AF-6, RLF and SUR-8 contain a motif
that interacts with a complementary motif in
R-Ras, Rap1A, KRAS2B, RIN, RIBB, N-Ras and H-Ras.
This would imply that for instance RLF and AF-6
should interact with Rap1A and R-Ras in order to
complete the bipartite graph.
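
The idea of completing a bipartite subgraph can be sketched as follows. The helper name is hypothetical and the observed edges are made-up toy data on abstract protein labels, not the Ras-related interactions of the figure.

```python
def missing_bipartite_edges(left, right, observed):
    """List the (left, right) pairs absent from `observed`, i.e. the
    interactions needed to make the bipartite subgraph complete."""
    seen = {frozenset(pair) for pair in observed}
    return [(a, b) for a in left for b in right if frozenset((a, b)) not in seen]

# Hypothetical example: proteins A, B, C carry the positive form of a domain,
# proteins W, X carry the negative form, and four interactions were observed.
positive = ["A", "B", "C"]
negative = ["W", "X"]
observed = [("A", "W"), ("A", "X"), ("B", "W"), ("C", "X")]
predicted = missing_bipartite_edges(positive, negative, observed)
print(predicted)
```

The pairs returned are candidates for interactions that have not yet been detected experimentally.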
17
Model-II
  • The Yeast Protein Interaction Network Evolves
    Rapidly and Contains Few Redundant Duplicate
    Genes.

18
Evolution of Function
  • Examples
  • Partially redundant duplicates
  • CLN1/2/3: involved in regulating the activity of
    the yeast cyclin-dependent kinase. Ks = 2.4, over
    200 Myr.
  • TPK1/2/3: catalytic subunits of the yeast cyclic
    AMP-dependent protein kinase. Ks = 1.31.
  • Diverged gene function
  • EDN vs. ECP: EDN has high RNase activity and acts
    as an antiretroviral agent,
  • whereas ECP is an antibacterial toxin.
  • dopa carboxylase and amd: the duplicates are
    expressed in different parts of the cell,
    and therefore have different biological functions.

19
Objective
  • Two main questions are addressed in this model:
  • At what rate does functional divergence occur
    after gene duplication, for a large sample of
    duplicated genes in a genome?
  • What effects do the products of duplicated
    genes have on the protein-protein interaction
    network?

20
Data for Analysis
  • The required protein-protein interaction data
    come from a large experiment
    (Uetz et al. 2000) using the yeast two-hybrid
    system (Fields and Song 1989).
  • 985 proteins, 899 interactions.
  • 45 self-interactions.
  • Data for duplicated genes were obtained from the
    University of Oregon and characterized using the
    fraction Ks.
  • Ks, the fraction of synonymous substitutions per
    synonymous site, measures the divergence between
    two genes: a smaller Ks means more similar genes
    and a more recent duplication.
  • Only gene pairs with Ks < 5 were considered for
    further analysis.
  • There were 9,059 such pairs among 6,000 genes.

21
Power Law Random Graphs-I
  • PL random graphs are random graphs whose degree
    probability distribution P(d) is proportional to
    $d^{-\tau}$ for some constant τ.
  • First, n = 6279 isolated nodes were generated,
    and a random integer d > 0 was generated for
    assignment to a node.
  • This random number d was computed from two
    quantities, r and γ,
  • where r is a random real number uniformly
    distributed in the interval (0, 1), and γ > 0
    is a constant.

22
Power Law Random Graphs-II
  • Second, this number d was accepted with
    probability $d^{-\tau}$.
  • The resulting distribution of d is a power law
    with a weighting function.
  • If d was discarded, a new d was generated
    according to the same prescription, and this
    process was repeated until a d was accepted.
  • Once d was accepted, it was assigned to a
    randomly chosen node.

23
Power Law Random Graphs-III
  • Another node was chosen at random (without
    replacement of the previously chosen nodes), an
    integer d was assigned to it in the same way, and
    this process was repeated until the sum S of
    all the integers assigned to the chosen nodes
    first exceeded 2k, where k is the number of edges.
  • The integer assigned to each node corresponds to
    the node's degree.
  • Nodes were then connected according to their
    assigned degrees until the number of edges was
    S/2 ≈ k.
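
The generation procedure on the last three slides can be sketched as rejection sampling followed by random pairing of edge ends. Two parts of the sketch are assumptions, because the transcript does not preserve them: the proposal formula for d (an exponential proposal, d = ceil(-ln(r)/gamma), is used here) and the wiring rule (random stub matching that drops self-loops and repeated edges). All function and parameter names are illustrative.

```python
import math
import random

def sample_degree(rng, tau, gamma):
    """Propose an integer d >= 1 and accept it with probability d**-tau."""
    while True:
        r = 1.0 - rng.random()   # uniform on (0, 1]
        # ASSUMPTION: exponential proposal; the original formula for d is not known here.
        d = max(1, math.ceil(-math.log(r) / gamma))
        if rng.random() < d ** -tau:
            return d

def power_law_random_graph(n, k, tau, gamma, seed=0):
    rng = random.Random(seed)
    nodes = list(range(n))
    rng.shuffle(nodes)           # visit nodes in random order, without replacement
    degree, total = {}, 0
    for v in nodes:
        if total > 2 * k:        # stop once the degree sum S first exceeds 2k
            break
        degree[v] = sample_degree(rng, tau, gamma)
        total += degree[v]
    # ASSUMPTION: wire nodes by pairing edge ends (stubs) at random.
    stubs = [v for v, d in degree.items() for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:               # drop self-loops; the set drops repeated edges
            edges.add((min(a, b), max(a, b)))
    return degree, edges

degree, edges = power_law_random_graph(n=500, k=100, tau=2.5, gamma=0.1, seed=1)
print(sum(degree.values()), len(edges))
```

Because self-loops and repeated edges are discarded, the final edge count can fall slightly below S/2.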

24
Interaction Network vs. Random Graph
  • Comparison of the protein contact network (n = 985
    nodes, k = 899 edges) with random graphs.
  • The PPI network has an excess of proteins with
    degree 1, but fewer proteins with a higher
    degree than the ER random graph.
  • In contrast, the degree distribution of the PPI
    network is consistent with the power-law random
    graph.

25
Duplications and Interactions
  • This figure illustrates the effect of gene
    duplication on gene products involved in protein
    interactions.

26
Divergence of Interactions
  • 20% of duplicate gene pairs with 0.5 < Ks < 1.0
    share an interaction partner, whereas 80% of genes
    have no common interaction partner with their
    duplicates approximately 100 Myr after
    duplication.
  • For Ks > 2, the fraction approaches the value
    expected for randomly chosen gene pairs.
  • The histogram shows the fraction of duplicate genes
    whose products have at least one interacting
    protein in common as a function of Ks.
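
The histogram described above can be computed as sketched below. The sketch assumes that only pairs in which both duplicates have at least one interaction are counted, and that a shared partner must be a third protein. The function name, the bin width and the toy data are all hypothetical.

```python
from collections import defaultdict

def shared_partner_fraction(interactions, duplicate_pairs, bin_width=0.5):
    """Fraction of duplicate pairs sharing an interaction partner, per Ks bin."""
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)
    totals, shared = defaultdict(int), defaultdict(int)
    for g1, g2, ks in duplicate_pairs:
        if not partners[g1] or not partners[g2]:
            continue                                   # both duplicates must interact
        bin_index = int(ks // bin_width)
        totals[bin_index] += 1
        common = (partners[g1] & partners[g2]) - {g1, g2}
        if common:
            shared[bin_index] += 1
    return {b * bin_width: shared[b] / totals[b] for b in sorted(totals)}

# Hypothetical toy data: a1/a2 both bind x; b1 and b2 have different partners.
interactions = [("a1", "x"), ("a2", "x"), ("b1", "y"), ("b2", "z")]
pairs = [("a1", "a2", 0.3), ("b1", "b2", 0.8)]
fractions = shared_partner_fraction(interactions, pairs)
print(fractions)
```

Each key of the result is the lower edge of a Ks bin.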

Interactions turn over every 200-300 Myr.
27
Divergence of Interactions
  • Only 57 of the most closely related duplicate
    gene pairs (0 < Ks < 0.5) for which both genes
    interact with other proteins share any protein
    interaction partner in the same subnet.
  • For 380 gene pairs with Ks > 0.5, the fraction of
    duplicate partners with shared interactions is
    < 20%.
  • For Ks > 1.5, the fraction is close to the value
    expected at random.

28
The Rate of Interaction Loss
  • The divergence in protein interaction after gene
    duplication is largely due to interaction loss.
  • 127 pairs with Ks < 2, where both duplicates
    engage in the protein-protein interaction network.
  • 920 interactions were present after duplication.
  • 429 of these have since been lost, at a rate of
    2.3 × 10^-3 per Myr.
  • Is this estimate low or high?
  • Interaction data noise leads to overestimates.
  • Young pairs and double losses lead to
    underestimates.

29
Divergence of Self-interactions
  • Loss or gain of interactions between a pair of
    paralogs due to self-interaction.

Self-Interactions and interactions between
products of duplicate genes.
30
Divergence of Self-interactions
  • A total of 25 paralogs.
  • Only a few conserved self-interactions were found.
  • New interactions
  • 13/25 new interactions, at a rate of 2.88 × 10^-6
    per Myr per pair (Ks = 1 corresponds to 100 Myr).

31
Conclusions
  • The protein-protein interaction network shows a
    power-law degree distribution.
  • There are a total of 6280 ORFs in the yeast genome,
    giving 1.97 × 10^7 possible pairwise interactions.
  • New interactions form at a slow rate per pair:
    they evolve at a rate of 2.88 × 10^-6 per protein
    pair per million years.
  • Extrapolating the above estimate to the entire yeast
    proteome thus yields (1.97 × 10^7) × (2.88 × 10^-6)
    ≈ 57 newly evolved interactions per
    million years.
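
The extrapolation can be reproduced with a few lines of arithmetic. The only assumption is that the number of possible pairwise interactions is the number of unordered pairs among the 6280 ORFs; the variable names are illustrative.

```python
from math import comb

n_orfs = 6280
possible_pairs = comb(n_orfs, 2)       # unordered pairs: 6280 * 6279 / 2
rate_per_pair_per_myr = 2.88e-6        # rate quoted on the slide
new_interactions_per_myr = possible_pairs * rate_per_pair_per_myr
print(possible_pairs, round(new_interactions_per_myr, 1))
```

The pair count is 19,716,060, which rounds to 1.97 × 10^7, and the product comes out just under 57.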

32
Model-III- Cluster Analysis
Detection of Functional Modules from Protein
Interaction Networks of S.cerevisiae.
33
Cluster Analysis
  • Cluster analysis (CA) is an obvious choice of
    methodology for the extraction of functional
    modules from protein interaction networks.
  • Clustering is defined as the grouping of objects
    based on their sharing discrete, measurable
    properties.
  • In functional genomics, clustering algorithms have
    been devised for multiple tasks, such as mRNA
    expression analysis and the detection of protein
    families.
  • The aim of this model is to detect biologically
    meaningful patterns in the entire known protein
    interaction network of S. cerevisiae.

34
Clustering Algorithm
  • The protein interaction data were obtained from
    the DIP database.
  • The network of proteins is first transformed into
    a weighted graph.
  • The weight attributed to each interaction
    reflects the level of confidence in it,
    represented by the number of experiments that
    support the interaction.
  • A score of 3.0 was assigned for the first
    instance of an interaction, and increased by 1 if
    the interaction was supported by another method or
    by 0.25 if the interaction had already been
    observed by that method.
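
The scoring rule can be sketched as a small function that takes the experimental methods reporting one interaction, in order. The function name and method labels are hypothetical, and returning 0.0 for an interaction with no reports is an assumption.

```python
def interaction_score(methods):
    """Confidence weight for one interaction from its list of supporting methods."""
    score = 0.0                     # ASSUMPTION: no reports means no weight
    seen = set()
    for index, method in enumerate(methods):
        if index == 0:
            score = 3.0             # first instance of the interaction
        elif method in seen:
            score += 0.25           # repeat observation by the same method
        else:
            score += 1.0            # support from a different method
        seen.add(method)
    return score

print(interaction_score(["Y2H"]))                 # 3.0
print(interaction_score(["Y2H", "Y2H"]))          # 3.25
print(interaction_score(["Y2H", "TAP", "Y2H"]))   # 4.25
```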

35
Clustering Algorithm
  • The resulting graph is a weighted network of
    proteins connected by edges.
  • This weighted graph is then converted into a line
    graph L(G), in which edges now represent nodes
    and nodes represent edges.
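
A minimal sketch of the line-graph conversion: every interaction becomes a node, and two interactions are linked when they share a protein. The weight of each new link is taken as the average of the two interaction scores, in line with the averaging step described in the talk. The function name and the toy scores are hypothetical.

```python
from itertools import combinations

def line_graph(scores):
    """scores: {(protein_u, protein_v): weight}. Returns the weighted line graph
    as {(interaction_1, interaction_2): averaged weight}."""
    result = {}
    for e1, e2 in combinations(sorted(scores), 2):
        if set(e1) & set(e2):                        # the interactions share a protein
            result[(e1, e2)] = (scores[e1] + scores[e2]) / 2
    return result

# Toy chain A-B-C-D with made-up interaction scores.
scores = {("A", "B"): 3.0, ("B", "C"): 4.0, ("C", "D"): 3.25}
print(line_graph(scores))
```

Comparing all pairs of interactions is quadratic in their number, which is fine for a sketch but slow for a full interactome.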

36
Clustering Algorithm
  • The scores for the original constituent
    interactions are then averaged and assigned to
    each edge.
  • The TribeMCL software, an algorithm for
    clustering graphs, was used to cluster the
    interaction network and recover clusters of
    associated interactions.
  • These clusters range in size from 2 to 292
    components (average size is 8.05), and form a
    scale-free protein network.

37
Results
  • A total of 1046 clusters were obtained.
  • In this analysis, each protein was on average
    present in 2.1 clusters.
  • Only 76 interactions and 146 proteins (representing
    < 1% of the total data), which were weakly
    connected to the main interaction network, were
    discarded by the clustering method.
  • The clusters found were classified in three
    categories according to the functional
    involvement of proteins in different mechanisms:
  • KEGG: regulatory and metabolic classifications
    (20).
  • GQFC: GeneQuiz automatic functional
    classification (45).
  • MIPS: cellular localization (48).

38
Validation of the Clustering Method-I
  • Scoring the clusters: clusters are validated by
    assessing the consistency of protein
    classification within an individual cluster.
  • This is measured, for each of the three
    classification schemes, by calculating the
    redundancy of each cluster j:
  • $R_j = 1 - \dfrac{-\sum_{s=1}^{n} p_s \log_2 p_s}{\log_2 n}$,
    where R_j is the redundancy of cluster j,
  • n represents the number of classes in the
    classification scheme,
  • p_s represents the relative frequency of
    class s in cluster j,
  • the numerator represents the information content
    in bits given by the entropy (H),
  • and the denominator is a normalizing factor
    representing the maximum entropy for the cluster
    j (Hmax).
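
The redundancy score can be sketched as one minus the normalized entropy of the class labels in a cluster, assuming Hmax = log2(n) for a scheme with n classes (n must be at least 2). The function name and the toy labels are hypothetical, and each protein is assumed to carry a single class label.

```python
import math
from collections import Counter

def cluster_redundancy(labels, n_classes):
    """R = 1 - H / Hmax, where H is the entropy (in bits) of the class labels
    in the cluster and Hmax = log2(n_classes)."""
    counts = Counter(labels)
    total = len(labels)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(n_classes)

# Toy clusters in a scheme with 4 classes.
print(cluster_redundancy(["transport"] * 5, 4))                   # homogeneous cluster
print(cluster_redundancy(["a", "b", "c", "d"], 4))                # evenly mixed cluster
```

A cluster whose proteins all fall in one class scores 1, and a cluster spread evenly over every class scores 0.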

39
Validation of the Clustering Method-II
Fig. Module validation using biological
classification schemes
40
Validation of the Clustering Method-III

Fig. Module validation using biological
classification schemes
41
Validation of the Clustering Method-IV
Fig. Module validation using biological
classification schemes
42
Example-I Cluster 55
  • Here, cluster 55 recovers a set of protein
    interactions (inset) that are involved in vacuolar
    transport and fusion from the ER via the
    pre-vacuolar compartment.

43
Examples-II clusters 32 and 86
  • Recovery of the signal transduction pathway
    controlling cell wall biogenesis, from the
    membrane protein (Fks1) to the transcription
    factors activated by this pathway (Swi4, Swi6 and
    Rlm1). The pathway was recovered as a set of two
    clusters connected by two proteins (Pkc1p and
    Smd3p), showing a one-to-many relationship.

44
Network of functional modules
  • This graph shows the connection between 40
    functional modules connected by shared proteins.

45
Conclusions
  • This model can be used to place poorly
    characterized proteins in their functional
    context according to their interacting partners
    within a module.
  • The predictive power of this model allows us to
    examine the organization and coordination of
    multiple complex cellular processes and determine
    how they are organized into pathways.
  • One-to-many relationships can be used for pathway
    discovery.

46
References
  • On the structure of protein-protein interaction
    networks. A. Thomas, R. Cannings, N.A.M. Monk and
    C. Cannings. Biochemical Society Transactions
    (2003) Volume 31, part 6.
  • The Yeast Protein Interaction Network Evolves
    Rapidly and Contains Few Redundant Duplicate
    Genes. Andreas Wagner. Mol. Biol. Evol.
    18(7):1283-1292 (2001).
  • Detection of Functional Modules From Protein
    Interaction Networks. Jose B. Pereira-Leal,
    Anton J. Enright and Christos A. Ouzounis.
    PROTEINS: Structure, Function, and Bioinformatics
    54:49-57 (2004).