Protein-Protein Interaction Network

- Gautam Chaurasia
- 08.07.04

Overview

- Introduction
- Three different models
- Structure of the protein-protein interaction network
- Non-power law
- Evolution of the network
- Power-law random graphs
- Detection of functional modules from protein interaction networks
- Clustering algorithm

Introduction

- The network is viewed as a graph whose nodes correspond to proteins. Two proteins are connected by an edge if they interact.
- The collection of all interactions between the proteins of an organism is called the interactome.
- The yeast two-hybrid (Y2H) system is used to produce a comprehensive map of the protein-protein interaction network.
- The network resembles a random graph in that it consists of many small subnets (groups of proteins that interact with each other but not with any other protein) and one large connected subnet comprising more than half of all interacting proteins.

Structure of PPI Network

- Yeast protein interaction network (Uetz et al. 2000).
- A: A two-dimensional drawing of the entire network.
- B: The giant (hub) component of this graph consists of 466 proteins.
- C: A small section of the hub component, with gene or open reading frame names shown next to each node.

Structure of PPI Network

- Degree
- Described by the connectivity k of a node, which tells us how many links the node has to other nodes.
- Degree distribution
- The degree distribution P(k) gives the probability that a selected node has exactly k links.
- P(k) is obtained by counting the number of nodes N(k) with k links and dividing by the total number of nodes N.
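A minimal sketch of how P(k) could be estimated from a list of interactions, assuming P(k) is taken as the fraction of nodes that have exactly k links. The function name `degree_distribution` and the toy edge list are hypothetical and only serve as illustration.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)} for an undirected graph given as (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)              # only nodes with at least one link are counted
    counts = Counter(degree.values())  # N(k): number of nodes with exactly k links
    return {k: n_k / n_nodes for k, n_k in sorted(counts.items())}

# Hypothetical toy network: a hub "A" linked to three proteins, plus one separate pair.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
p_k = degree_distribution(toy_edges)
print(p_k)
```

Here five of the six proteins have one link and the hub has three, so the mass of the distribution sits at k = 1.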

ER Random Graph

- ER random graphs
- An ER random graph consists of n nodes and k edges, where any pair of nodes is equally likely to be connected by one of the k edges.
- Equivalently, start with a given number of nodes N and add links randomly, connecting each pair of nodes with probability p, which creates a graph with approximately pN(N-1)/2 randomly placed links.
- The node degrees follow a Poisson distribution.
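The construction can be sketched as follows, assuming the variant in which each of the N(N-1)/2 node pairs is linked independently with probability p. The function name `er_graph`, the seed, and the values of N and p are illustrative choices, not taken from the slides.

```python
import random

def er_graph(n_nodes, p, seed=0):
    """Link each of the n(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    return [(i, j)
            for i in range(n_nodes)
            for j in range(i + 1, n_nodes)
            if rng.random() < p]

n_nodes, p = 500, 0.01
edges = er_graph(n_nodes, p)
expected_links = p * n_nodes * (n_nodes - 1) / 2   # 0.01 * 500 * 499 / 2 = 1247.5
mean_degree = 2 * len(edges) / n_nodes             # should be close to p * (n_nodes - 1)
print(len(edges), expected_links, mean_degree)
```

The number of links fluctuates around pN(N-1)/2 from run to run, and the mean degree stays close to p(N-1).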

Scale-Free Network

- Scale-free networks: the rich get richer.
- Scale-free networks are characterized by a power-law degree distribution: the probability that a node has x links follows $P(x) \sim x^{-\gamma}$, where $\gamma > 0$, so that a plot of log(frequency) against log(degree) shows a decreasing linear trend.
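The linear trend follows directly from taking logarithms of a power law. In the sketch below, the exponent symbol $\gamma$ and the normalizing constant $C$ are assumed notation:

```latex
P(x) = C\,x^{-\gamma}
\quad\Longrightarrow\quad
\log P(x) = \log C - \gamma \log x
```

On log-log axes this is a straight line with slope $-\gamma$, which is why a visibly curved log-log plot is evidence against a pure power law.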

Model-I Non-Power Law-I

- The essence of this model is the observation that parts of proteins, called domains, contain sites into which complementary parts of other proteins can bind.
- These complementary parts are referred to as the positive and negative aspects of a domain.
- This produces bipartite sub-graphs: graphs comprising two disjoint sets of nodes in which each node in one set is connected to every node in the other set.

Fig. 2. A particular domain whose positive form is present in three proteins A, B, and C, and whose negative form is present in four proteins W, X, Y, and Z.
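The situation in the figure can be written out in a few lines, assuming that every protein carrying the positive form interacts with every protein carrying the negative form. The variable names are illustrative.

```python
from itertools import product

positive = ["A", "B", "C"]        # proteins carrying the positive form of the domain
negative = ["W", "X", "Y", "Z"]   # proteins carrying the negative form

# Every positive-form protein binds every negative-form protein.
edges = list(product(positive, negative))
print(len(edges))  # 3 * 4 = 12 interactions
```

A single shared domain therefore contributes a complete bipartite block of 12 interactions to the network.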

Model-I Non-Power Law-II

- We assume that there are n proteins and m domains, each with a negative and a positive form.
- A domain may be any of the 2m types 1+, 1-, 2+, 2-, ..., m+, m-.
- Each of the n proteins contains each of the 2m possible domains with constant probability p.
- Let X_i be the number of domains that the ith protein has; X_i is binomially distributed, X_i ~ Bin(2m, p).
- All the X_i are independent and identically distributed.
- Thus, the average number of sites per protein is λ = 2mp.

Model-I Non-Power Law-III

- Let Y_i be the number of interactions of the ith protein.
- Given X_i = x, any other protein j fails to connect to i only if it does not contain any of the x complementary domain aspects, which happens with probability $q^x$.
- Since there are n-1 such proteins, we have $P(Y_i = y \mid X_i = x) = \binom{n-1}{y} (1 - q^x)^y (q^x)^{n-1-y}$,
- where q = 1 - p. Hence, the unconditional distribution of Y_i is a binomial mixture of binomials.
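A minimal numerical sketch of this mixture, assuming X_i ~ Bin(2m, p) and, given X_i = x, Y_i ~ Bin(n-1, 1 - q^x) with q = 1 - p. The function names and the small parameter values are hypothetical; the log-space pmf is an implementation choice made here to avoid overflow, not something stated in the slides.

```python
import math

def binom_pmf(k, n, p):
    """Binomial pmf, computed in log space so large n does not overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def degree_pmf(y, n, m, p):
    """P(Y_i = y): mixture over X_i ~ Bin(2m, p) of Bin(n - 1, 1 - q^x)."""
    q = 1.0 - p
    return sum(binom_pmf(x, 2 * m, p) * binom_pmf(y, n - 1, 1.0 - q ** x)
               for x in range(2 * m + 1))

n, m, lam = 200, 50, 1.0       # small hypothetical sizes for a quick check
p = lam / (2 * m)              # so that the mean number of domains is 2mp = lam
f = [degree_pmf(y, n, m, p) for y in range(n)]
print(sum(f))                  # total probability over y = 0 .. n-1
```

Plotting `f` against y on log-log axes for larger n and m is one way to look for the non-linearity the model predicts.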

Model-I Non-Power Law-IV

- Using an inclusion-exclusion type expression, the distribution of Y_i can be written in closed form.
- Binomial distribution: describes an experiment with a fixed number of independent trials, each of which can have only two possible outcomes.
- For example: tossing a coin 20 times to see how many tails occur.
- Inclusion-exclusion: let A denote a finite set and let P1, ..., Pn be any given properties. We want to express the number of elements of A which have none of these properties in terms of the numbers of elements which have some of these properties.
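One way to arrive at an inclusion-exclusion type expression is sketched below. It assumes $X_i \sim \mathrm{Bin}(2m, p)$ and, given $X_i = x$, $Y_i \sim \mathrm{Bin}(n-1,\, 1 - q^x)$ with $q = 1 - p$. This is a reconstruction under those assumptions and may differ in form from the expression shown on the original slide.

```latex
\begin{aligned}
f(y) &= \sum_{x=0}^{2m} \binom{2m}{x} p^{x} q^{2m-x}
        \binom{n-1}{y} \left(1 - q^{x}\right)^{y} q^{x(n-1-y)} \\
\left(1 - q^{x}\right)^{y} &= \sum_{j=0}^{y} (-1)^{j} \binom{y}{j} q^{xj} \\
f(y) &= \binom{n-1}{y} \sum_{j=0}^{y} (-1)^{j} \binom{y}{j}
        \left(q + p\,q^{\,n-1-y+j}\right)^{2m}
\end{aligned}
```

The last line follows by substituting the expansion into the first line and applying the binomial theorem to the sum over $x$.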

Log-log plot of the distribution

- f(y) is plotted for n = 6000 proteins, m = 1000 domains and λ = 1, 2.
- The resulting graph shows clear non-linearity.

Fig. Log-log plot of the distribution of vertex degrees in the modelled interactome with 6000 proteins, 1000 domains and an average of 1 or 2 domains per protein, shown as solid and dotted lines respectively.

Degree distribution of sampled sub-graphs

- A total of 450 proteins were sampled at random.
- The mean number of neighbors for each protein in this sample was 5.
- The resulting graph has approximately the same number of vertices and edges as the Uetz datasets.

Fig. The Ito and Uetz datasets are plotted in black and blue, respectively. A straight line (power law) fit is shown as a dotted line. The distribution obtained by sampling from this model with 6000 proteins, 1000 domains and an average of 1 domain per protein is plotted in red.

Degree distribution of sampled sub-graphs

- A total of 1500 proteins were sampled at random.
- The resulting graph shows that this model fits the data better than a power law does.

Fig. The DIP dataset is plotted in black. The distribution obtained by sampling from this model with 6000 proteins, 1000 domains and an average of 2 domains per protein is plotted in red. A straight line (power law) fit is shown as a dotted line.

Conclusions

- The degree distribution predicted by this model fits the data better than a power-law distribution does.
- The model also fits the subnet better than the power law does.

Example

- This model can be used to infer the existence of interactions not yet detected experimentally, by using the predicted bipartite structure of sub-graphs.

The figure strongly suggests that o-Raf1, PLC-, RALGDS, AF-6, RLF and SUR-8 contain a motif that interacts with a complementary motif in R-Ras, Rap1A, KRAS2B, RIN, RIBB, N-Ras and H-Ras. This would imply that, for instance, RLF and AF-6 should interact with Rap1A and R-Ras in order to complete the bipartite graph.

Model-II

- The Yeast Protein Interaction Network Evolves Rapidly and Contains Few Redundant Duplicate Genes.

Evolution of Function

- Examples
- Partially redundant duplicates
- CLN1/2/3: involved in regulating the activity of the yeast cyclin-dependent kinase. Ks = 2.4, over 200 Myr.
- TPK1/2/3: catalytic subunits of the yeast cyclic AMP-dependent protein kinase. Ks = 1.31.
- Diverged gene function
- EDN vs. ECP: EDN has high RNase activity and acts as an antiretroviral agent, whereas ECP is an antibacterial toxin.
- Dopa decarboxylase and amd: the duplicates are expressed in different parts of the cell, and therefore have different biological functions.

Objective

- Two main questions are addressed in this model:
- At what rate does functional divergence occur after gene duplication, for a large sample of duplicated genes in a genome?
- What effects do the products of duplicated genes have on the protein-protein interaction network?

Data for Analysis

- The protein-protein interaction data come from a large experiment (Uetz et al. 2000) using the yeast two-hybrid system (Fields and Song 1989).
- 985 proteins, 899 interactions.
- 45 self-interactions.
- Data for duplicated genes were obtained from the University of Oregon and are described using the fraction Ks.
- Ks, the fraction of synonymous substitutions per synonymous site, measures the divergence between two genes: the lower Ks, the more similar the genes.
- Only gene pairs with Ks < 5 were considered for further analysis.
- There were 9,059 such pairs with Ks < 5 among 6,000 genes.

Power Law Random Graphs-I

- PL random graphs are random graphs whose degree probability distribution P(d) is proportional to $d^{-t}$ for some constant t.
- First, n = 6279 isolated nodes were generated, and a random integer d > 0 was assigned to each node.
- This random number d was generated from r and g [equation missing in source], where r is a random real number uniformly distributed in the interval (0, 1), and g > 0 is a constant.

Power Law Random Graphs-II

- Second, this number d was accepted with probability $d^{-t}$.
- The resulting distribution of d is a power law with a weighting function.
- If d was discarded, a new d was generated according to the same prescription, and this process was repeated until a d was accepted.
- Once d was accepted, it was assigned to the randomly chosen node.

Power Law Random Graphs-III

- Another node was chosen at random (without replacement of the previously chosen node), an integer d was assigned to it in the same way, and this process was repeated until the sum S of all the integers assigned to the chosen nodes first exceeded 2k, where k is the number of edges.
- The integer assigned to each node corresponds to the node's degree.
- Nodes were then connected according to their assigned degrees until the number of edges was S/2 ≈ k.
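The procedure can be sketched roughly as follows. Several parts are assumptions made only for illustration: the proposal formula for d (the one on the slide is not preserved), the values of g and t, the random stub-pairing used to connect nodes, and the decision to drop self-loops and duplicate links. The function names are hypothetical.

```python
import math
import random

def sample_degree(rng, g, t):
    """Accept-reject sampling of a positive integer degree d."""
    while True:
        r = rng.random()
        # ASSUMPTION: illustrative proposal only; the slide's formula is not preserved.
        d = max(1, math.ceil(-math.log(1.0 - r) / g))
        if rng.random() < d ** (-t):      # accept d with probability d^(-t)
            return d

def power_law_random_graph(n_nodes, k_edges, g=0.1, t=2.5, seed=0):
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)                    # visit nodes at random, without replacement
    degree, total = {}, 0
    for node in order:
        if total > 2 * k_edges:           # stop once the sum S first exceeds 2k
            break
        degree[node] = sample_degree(rng, g, t)
        total += degree[node]
    # ASSUMPTION: connect nodes by pairing "stubs" (one per unit of degree) at random.
    stubs = [node for node, d in degree.items() for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for u, v in zip(stubs[0::2], stubs[1::2]):
        if u != v:                        # ASSUMPTION: drop self-loops and duplicates
            edges.add((min(u, v), max(u, v)))
    return degree, edges

degree, edges = power_law_random_graph(n_nodes=6279, k_edges=899)
print(len(degree), sum(degree.values()), len(edges))
```

Because self-loops and duplicate links are discarded in this sketch, the resulting number of edges can fall slightly below S/2.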

Interaction Network vs. Random Graph

- Comparison of the protein contact network (n = 985 nodes, k = 899 edges) with random graphs.
- The PPI network has an excess of proteins with degree 1, but fewer proteins with a higher degree, than the ER random graph.
- In contrast, the degree distribution of the PPI network is consistent with the power-law random graph.

Duplications and Interactions

- This figure illustrates the effect of gene duplication on gene products involved in protein interactions.

Divergence of Interactions

- 20% of duplicate gene pairs with 0.5 < Ks < 1.0 share an interaction partner, whereas 80% of genes have no common interaction partner with their duplicates approximately 100 Myr after duplication.
- For Ks > 2, this fraction approaches the value expected for randomly chosen gene pairs.

Fig. Histogram of the fraction of duplicate genes whose products have at least one interacting protein in common, as a function of Ks.

Interactions turn over every 200-300 Myr.

Divergence of Interactions

- Only 57% of the most closely related duplicate gene pairs (0 < Ks < 0.5) for which both genes interact with other proteins share any protein interaction partner in the same subnet.
- For 380 gene pairs with Ks > 0.5, the fraction of duplicate pairs with shared interactions is < 20%.
- For Ks > 1.5, this fraction is close to the value expected at random.

The Rate of Interaction Loss

- The divergence in protein interactions after gene duplication is largely due to interaction loss.
- 127 pairs with Ks < 2, where both duplicates engage in the protein-protein interaction network.
- 920 interactions were present after duplication,
- 429 of which have since been lost, at a rate of 2.3 x 10^-3 per Myr.
- Is this estimate low or high?
- Noise in the interaction data leads to overestimates.
- Young pairs and double losses lead to underestimates.

Divergence of Self-interactions

- Loss or gain of interactions between a pair of paralogs due to self-interaction.

Fig. Self-interactions and interactions between products of duplicate genes.

Divergence of Self-interactions

- A total of 25 paralogs.
- Only a few conserved self-interactions were found.
- New interactions
- 13/25 new interactions, at a rate of 2.88 x 10^-6 per Myr per pair (Ks = 1 corresponds to 100 Myr).

Conclusions

- The protein-protein interaction network shows a power-law degree distribution.
- There are 6280 ORFs in the yeast genome, giving 1.97 x 10^7 possible pairwise interactions.
- New interactions form at a slow rate per pair: 2.88 x 10^-6 per protein pair per million years.
- Extrapolating this estimate to the entire yeast proteome thus yields (1.97 x 10^7 x 2.88 x 10^-6) ≈ 57 newly evolved interactions per million years.
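The extrapolation is a two-step calculation that can be checked directly. The sketch below assumes the number of possible pairwise interactions is the number of unordered pairs among 6280 ORFs; the variable names are illustrative.

```python
from math import comb

n_orfs = 6280
rate_per_pair = 2.88e-6            # new interactions per protein pair per Myr

possible_pairs = comb(n_orfs, 2)   # 6280 * 6279 / 2 = 19,716,060, about 1.97e7
new_per_myr = possible_pairs * rate_per_pair
print(possible_pairs, round(new_per_myr))
```

The product comes to roughly 56.8, which rounds to the 57 new interactions per million years quoted on the slide.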

Model-III Cluster Analysis

Detection of Functional Modules from Protein Interaction Networks of S. cerevisiae.

Cluster Analysis

- Cluster analysis (CA) is an obvious choice of methodology for the extraction of functional modules from protein interaction networks.
- Clustering is defined as the grouping of objects based on their sharing discrete, measurable properties.
- In functional genomics, clustering algorithms have been devised for multiple tasks, such as mRNA expression analysis and the detection of protein families.
- The aim of this model is to detect biologically meaningful patterns in the entire known protein interaction network of S. cerevisiae.

Clustering Algorithm

- The protein interaction data were obtained from the DIP database.
- The network of proteins is first transformed into a weighted graph.
- The weight attributed to each interaction reflects the level of confidence in it, represented by the number of experiments that support the interaction.
- A score of 3.0 was assigned for the first instance of an interaction, and increased by 1 if the interaction was supported by another method, or by 0.25 if the interaction had already been observed by that method.
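The scoring rule can be sketched as a small function, assuming each interaction comes with an ordered list of the experimental methods that reported it. The function name and the method labels ("Y2H", "coIP") are hypothetical.

```python
def interaction_score(methods):
    """Score one interaction from the ordered list of methods that reported it."""
    score = 0.0
    seen = set()
    for method in methods:
        if not seen:
            score = 3.0       # first instance of the interaction
        elif method in seen:
            score += 0.25     # repeat observation by an already-seen method
        else:
            score += 1.0      # support from a different method
        seen.add(method)
    return score

print(interaction_score(["Y2H"]))                  # 3.0
print(interaction_score(["Y2H", "Y2H"]))           # 3.25
print(interaction_score(["Y2H", "coIP", "Y2H"]))   # 4.25
```

Under this rule, independent confirmation by a new method raises confidence four times as much as a repeat observation by the same method.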

Clustering Algorithm

- The resulting graph is a weighted network of proteins connected by edges.
- This weighted graph is then converted into a line graph L(G), in which nodes now represent interactions (the edges of G) and edges represent proteins shared between interactions (the nodes of G).
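A minimal sketch of the line-graph transformation, using the standard definition: each edge of G becomes a node of L(G), and two such nodes are linked when the original edges share an endpoint. Weights are left out, the input is assumed to contain no duplicate edges, and the function name and toy path are illustrative.

```python
from itertools import combinations

def line_graph(edges):
    """Nodes of L(G) are the edges of G; two are linked if they share an endpoint."""
    edges = [tuple(sorted(e)) for e in edges]
    return [(e1, e2) for e1, e2 in combinations(edges, 2) if set(e1) & set(e2)]

# Hypothetical path A - B - C - D: three interactions, two shared proteins (B and C).
g_edges = [("A", "B"), ("B", "C"), ("C", "D")]
lg_edges = line_graph(g_edges)
print(lg_edges)
```

Clustering L(G) groups interactions rather than proteins, so a protein that takes part in several interactions can end up in more than one cluster.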

Clustering Algorithm

- The scores of the original constituent interactions are then averaged and assigned to each edge.
- The TribeMCL software, an algorithm for clustering graphs, was used to cluster the interaction network and recover clusters of associated interactions.
- These clusters range in size from 2 to 292 components (average size 8.05), and form a scale-free protein network.

Results

- A total of 1046 clusters were obtained.
- In this analysis, each protein was on average present in 2.1 clusters.
- Only 76 interactions and 146 proteins (< 1% of the total data), which were weakly connected to the main interaction network, were discarded by the clustering method.
- The clusters found were classified in three categories according to the functional involvement of proteins in different mechanisms:
- KEGG: regulatory and metabolic classifications (20).
- GQFC: GeneQuiz automatic functional classification (45).
- MIPS: cellular localization (48).

Validation of the Clustering Method-I

- Scoring the clusters: clusters are validated by assessing the consistency of protein classification within an individual cluster.
- This is measured, for each of the three classification schemes, by calculating the redundancy of each cluster j: $R_j = 1 - \dfrac{-\sum_{s=1}^{n} p_s \log_2 p_s}{\log_2 n}$
- R_j is the redundancy of cluster j,
- n represents the number of classes in the classification scheme,
- p_s represents the relative frequency of class s in cluster j,
- the numerator represents the information content in bits given by the entropy (H),
- the denominator is a normalizing factor representing the maximum entropy for cluster j (H_max).
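A minimal sketch of the redundancy score, assuming it is defined as R_j = 1 - H / H_max, with H the Shannon entropy (in bits) of the class frequencies in the cluster and H_max = log2(n) for a scheme with n >= 2 classes. The function name and the class labels are hypothetical.

```python
import math
from collections import Counter

def redundancy(labels, n_classes):
    """R = 1 - H / H_max for the class labels of the proteins in one cluster."""
    counts = Counter(labels)
    total = len(labels)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(n_classes)

print(redundancy(["transport"] * 4, n_classes=4))      # all proteins in one class
print(redundancy(["a", "b", "c", "d"], n_classes=4))   # evenly spread over all classes
```

A cluster whose proteins all fall in one class scores 1, and a cluster spread evenly over all classes scores 0, so higher values indicate more functionally consistent clusters.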

Validation of the Clustering Method-II

Fig. Module validation using biological classification schemes

Validation of the Clustering Method-III

Fig. Module validation using biological classification schemes

Validation of the Clustering Method-IV

Fig. Module validation using biological classification schemes

Example-I Cluster 55

- Here, cluster 55 recovers a set of protein interactions (inset) that are involved in vacuolar transport and fusion from the ER via the pre-vacuolar compartment.

Example-II Clusters 32 and 86

- Recovery of the signal transduction pathway controlling cell wall biogenesis, from the membrane protein (Fks1) to the transcription factors activated by this pathway (Swi4, Swi6 and Rlm1). The pathway was recovered as a set of two clusters connected by two proteins (Pkc1p and Smd3p), showing a one-to-many relationship.

Network of functional modules

- This graph shows the connections between 40 functional modules linked by shared proteins.

Conclusions

- This model can be used to place poorly characterized proteins into their functional context according to their interacting partners within a module.
- The predictive power of this model allows us to examine the organization and coordination of multiple complex cellular processes and determine how they are organized into pathways.
- The one-to-many relationship can be used for pathway discovery.

References

- A. Thomas, R. Cannings, N.A.M. Monk, and C. Cannings. On the structure of protein-protein interaction networks. Biochemical Society Transactions (2003) Volume 31, part 6.
- Andreas Wagner. The Yeast Protein Interaction Network Evolves Rapidly and Contains Few Redundant Duplicate Genes. Mol. Biol. Evol. 18(7):1283-1292. 2001.
- Jose B. Pereira-Leal, Anton J. Enright, and Christos A. Ouzounis. Detection of Functional Modules From Protein Interaction Networks. PROTEINS: Structure, Function, and Bioinformatics 54:49-57 (2004).