Title: Use of Logic Relationships to Decipher Protein Network Organization Peter M. Bowers, Shawn J. Cokus, David Eisenberg, Todd O. Yeates
1 Use of Logic Relationships to Decipher Protein Network Organization
Peter M. Bowers, Shawn J. Cokus, David Eisenberg, Todd O. Yeates
- Presented by Krishna Balasubramanian
2 Contents
- Introduction
- Background
- Method Used - LAPP
- Results
- Observations
- Conclusion
- Future Work
3 Introduction
- A major focus of genome research:
  - Deciphering the networks of molecular interactions underlying cellular function.
- Developed a computational approach to:
  - Identify detailed relationships between proteins based on genomic data.
- The method reveals many previously unidentified higher-order relationships.
4 Background
- Patterns across multiple complete genomes have been used to infer biological interactions and functional linkages between proteins:
  - Two distinct proteins from one organism are genetically fused into a single protein in another organism.
  - Two proteins tend to occur in chromosomal proximity across multiple organisms.
  - Phylogenetic profile approach:
    - Detects functional relationships between proteins exhibiting statistically similar patterns of presence or absence.
    - Determines the pattern describing a protein's presence or absence by searching for its homologs across N organisms.
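A minimal Python sketch of how a phylogenetic profile can be represented, assuming the homolog search has already been run and its results are stored as a set of family names per organism. All organism and family names below are hypothetical placeholders, not data from the study.

```python
# Hypothetical homolog-search results: for each organism, the protein families
# with a detectable homolog. Names are placeholders, not data from the study.
HOMOLOG_HITS: dict[str, set[str]] = {
    "org1": {"famA", "famC"},
    "org2": {"famB", "famC"},
    "org3": {"famA", "famB", "famC"},
    "org4": set(),
    "org5": {"famB", "famC"},
}
ORGANISMS = sorted(HOMOLOG_HITS)  # one fixed organism order shared by every profile


def phylogenetic_profile(family: str) -> tuple[int, ...]:
    """Binary presence (1) / absence (0) vector of `family` across ORGANISMS."""
    return tuple(1 if family in HOMOLOG_HITS[org] else 0 for org in ORGANISMS)


profiles = {fam: phylogenetic_profile(fam) for fam in ("famA", "famB", "famC")}
for fam, vec in profiles.items():
    print(fam, vec)
```

Fixing a single organism order matters: profiles can only be compared position by position if every vector lists the N organisms in the same order.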
5 Background
- Original implementations sought to infer links between pairs of proteins with similar profiles.
- A subsequent variation on that idea linked proteins whose profiles were the negation of each other.
- Both are simple notions, with the presence of one protein implying the presence or absence of another.
- Such simple relationships cannot adequately describe the full complexity of cellular networks, which involve branching, parallel, and alternate pathways.
- Higher-order logic relationships, involving the pattern of presence/absence of multiple proteins, are expected because of:
  - The observed complexity of cellular networks.
  - Evolutionary divergence, convergence, and horizontal transfer events.
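The two pairwise notions above (similar profiles, negated profiles) can be sketched as follows. This is a simplification we assume for illustration: similarity is judged by counting mismatching organisms against a fixed `tolerance` (a hypothetical parameter), whereas the original implementations used statistical similarity measures.

```python
def mismatches(a, b):
    """Hamming distance between two equal-length binary profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same organisms")
    return sum(x != y for x, y in zip(a, b))


def pairwise_relation(a, b, tolerance=0):
    """Classify a pair of profiles as 'similar', 'complementary', or None.

    'similar': the profiles differ in at most `tolerance` organisms.
    'complementary': the profiles agree in at most `tolerance` organisms,
    i.e. one is (nearly) the negation of the other.
    """
    d = mismatches(a, b)
    if d <= tolerance:
        return "similar"
    if len(a) - d <= tolerance:
        return "complementary"
    return None


a = (1, 0, 1, 1, 0)
print(pairwise_relation(a, a))                # similar
print(pairwise_relation(a, (0, 1, 0, 0, 1)))  # complementary
print(pairwise_relation(a, (1, 1, 0, 0, 0)))  # None
```

Neither test can express a dependence on two other proteins at once, which is the gap the triplet analysis addresses.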
6 Method - LAPP
- Perform a complete analysis of the logic relations possible between triplets of phylogenetic profiles.
- Demonstrate the power of the resulting logic analysis of phylogenetic profiles (LAPP) to:
  - Illuminate relationships among multiple proteins.
  - Infer the coarse function of large numbers of uncharacterized protein families.
7 Logical Relationships to determine presence/absence of Proteins
- Venn diagrams and logic statements show the 8 distinct kinds of logic functions that describe the possible dependence of the presence of protein C on the presence of proteins A and B, jointly.
- Logic functions are grouped together if they are related by a simple exchange of proteins A and B.
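One way to see where the count of eight comes from, offered as our reading rather than the authors' derivation: of the 16 Boolean functions of two inputs, drop the 6 that ignore A or B (the two constants, A, NOT A, B, NOT B). The remaining 10 collapse to 8 classes when exchanging A and B is treated as equivalent (A AND NOT B pairs with NOT A AND B, and A OR NOT B pairs with NOT A OR B). The sketch below enumerates this.

```python
from itertools import product

# Input rows (a, b) in a fixed order; a truth table is the tuple of outputs per row.
ROWS = list(product((0, 1), repeat=2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]


def depends_on(table, index):
    """True if flipping input `index` (0 = A, 1 = B) changes the output for some row."""
    out = dict(zip(ROWS, table))
    for row in ROWS:
        flipped = list(row)
        flipped[index] = 1 - flipped[index]
        if out[row] != out[tuple(flipped)]:
            return True
    return False


def swap_ab(table):
    """Truth table of g(a, b) = f(b, a)."""
    out = dict(zip(ROWS, table))
    return tuple(out[(b, a)] for a, b in ROWS)


def logic_classes():
    """Two-input functions that use both A and B, merged under the exchange of A and B."""
    classes = set()
    for table in product((0, 1), repeat=4):
        if depends_on(table, 0) and depends_on(table, 1):
            classes.add(min(table, swap_ab(table)))  # canonical member of each A/B pair
    return sorted(classes)


for table in logic_classes():
    print(table)
print(len(logic_classes()), "classes")
```

The printed tuples are truth tables over the rows (0,0), (0,1), (1,0), (1,1); for example (0, 0, 0, 1) is A AND B and (0, 1, 1, 1) is A OR B.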
8 Logical Relationships to determine presence/absence of Proteins
- There are 8 possible logic relationships combining two phylogenetic profiles to match a third profile.
- E.g. 1: protein C might be present if and only if proteins A and B are both present.
  - The function of protein C is necessary only when the functions of proteins A and B are both present.
- E.g. 2: gene C may be present if and only if either A or B is present.
  - Different organisms use two different protein families in combination with a common third protein to accomplish some task.
- Several of the eight possible logic relationships are intuitively understood to describe commonly observed biological scenarios.
- However, a few of the logic relationships are not easily related to real biological situations.
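The two examples above can be checked directly on profiles: combine A and B organism by organism and compare the result with C. The profiles below are hypothetical, chosen only so that each example is matched exactly.

```python
from operator import and_, or_

# Hypothetical profiles across six organisms (1 = present, 0 = absent).
a = (1, 1, 0, 0, 1, 0)
b = (1, 0, 1, 0, 1, 0)


def combine(op, x, y):
    """Apply a two-input logic function organism by organism."""
    return tuple(op(i, j) for i, j in zip(x, y))


SCENARIOS = {
    "C needs both A and B (AND)": and_,
    "C works with either A or B (OR)": or_,
}


def explaining_scenarios(c, x, y):
    """Names of the example scenarios whose combined profile equals c exactly."""
    return [label for label, op in SCENARIOS.items() if combine(op, x, y) == c]


print(explaining_scenarios((1, 0, 0, 0, 1, 0), a, b))
print(explaining_scenarios((1, 1, 1, 0, 1, 0), a, b))
```

Real profiles rarely match exactly, which is why the method scores the fit statistically rather than requiring equality.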
9 Examples of LAPP based on Phylogenetic Profiles
Figure panels: Phylogenetic profiles; Biological examples of LAPP.
10 Examples of LAPP Contd
- Hypothetical phylogenetic profiles are used to illustrate the eight possible logic functions.
- Real biological examples illustrate the ternary relationships identified from actual phylogenetic profiles for the 4 most commonly observed logic types.
11 Identifying Protein Triplets
- Created a set of binary-valued vectors describing the presence or absence of each known protein family across 67 fully sequenced organisms.
- Categorized the complete set of proteins into 4873 distinct families called clusters of orthologous groups (COGs).
- Examined all triplet combinations of profiles and rank-ordered them according to how well the logical combination f(a, b) of two profiles predicted a third profile, c.
- Triplets were kept only if neither profile a nor b alone was predictive of c.
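A sketch of the triplet search, under a simplifying assumption: here the fit of f(a, b) to c is scored by counting mismatching organisms, whereas the study ranks triplets with the uncertainty coefficient on slide 12 and also applies the "neither a nor b alone" filter, which this sketch omits. The function labels are our own and are not the paper's type numbers; the profiles are hypothetical.

```python
from itertools import combinations

# The eight logic types as representative two-input functions on 0/1 values.
# Labels are our own; they are not the paper's type numbers.
LOGIC_FUNCTIONS = {
    "A AND B": lambda a, b: a & b,
    "A OR B": lambda a, b: a | b,
    "A XOR B": lambda a, b: a ^ b,
    "A XNOR B": lambda a, b: 1 - (a ^ b),
    "A NAND B": lambda a, b: 1 - (a & b),
    "A NOR B": lambda a, b: 1 - (a | b),
    "A AND NOT B": lambda a, b: a & (1 - b),
    "A OR NOT B": lambda a, b: a | (1 - b),
}


def best_fit(a, b, c):
    """Best (mismatches, function, input order) for predicting profile c from a and b."""
    best = None
    for name, f in LOGIC_FUNCTIONS.items():
        # Asymmetric types also need the swapped order (e.g. NOT A AND B).
        for order, (x, y) in (("a,b", (a, b)), ("b,a", (b, a))):
            predicted = tuple(f(i, j) for i, j in zip(x, y))
            miss = sum(p != t for p, t in zip(predicted, c))
            if best is None or miss < best[0]:
                best = (miss, name, order)
    return best


def rank_triplets(profiles):
    """Score every (pair a, b -> target c) combination; fewest mismatches first."""
    results = []
    for c_name, c in profiles.items():
        others = [n for n in profiles if n != c_name]
        for a_name, b_name in combinations(others, 2):
            miss, func, order = best_fit(profiles[a_name], profiles[b_name], c)
            results.append((miss, a_name, b_name, c_name, func, order))
    return sorted(results)


# Hypothetical profiles over six organisms; famC is exactly famA OR famB.
profiles = {
    "famA": (1, 1, 0, 0, 1, 0),
    "famB": (1, 0, 1, 0, 1, 0),
    "famC": (1, 1, 1, 0, 1, 0),
}
print(rank_triplets(profiles)[0])
```

With 4873 COGs the number of candidate triplets is very large, so a real implementation would need vectorized or bit-packed profile operations rather than this pure-Python loop.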
12 Identifying Protein Triplets
- Uncertainty coefficients were calculated for U(c|a), U(c|b), and the logically combined profile, U(c|f(a, b)):
  $$U(x \mid y) = \frac{H(x) + H(y) - H(x, y)}{H(x)}$$
- H is the entropy of the individual or joint distributions.
- U ranges from 0.0, where x is completely independent of y, to 1.0, where x is a deterministic function of y.
- Selected triplets whose individual pairwise uncertainty scores described protein profile c poorly (U(c|a) < 0.3 and U(c|b) < 0.3) but whose logically combined profile described c well (U(c|f(a, b)) > 0.6).
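A minimal sketch of the uncertainty coefficient and the selection thresholds above, assuming entropies are estimated from organism counts (plug-in estimate, in bits). Returning 0.0 when H(x) = 0 is our own convention for a constant profile, where U is undefined. The toy example uses XOR because it is the clearest case where neither input alone carries any information about c; it says nothing about which logic types are common.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy (bits) of the joint distribution of aligned binary profiles."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum((k / n) * log2(k / n) for k in counts.values())


def uncertainty(x, y):
    """U(x|y) = [H(x) + H(y) - H(x, y)] / H(x); 0 = independent, 1 = x determined by y."""
    hx = entropy(x)
    if hx == 0:
        return 0.0  # our convention: a constant profile is treated as uninformative
    return (hx + entropy(y) - entropy(x, y)) / hx


def passes_lapp_filter(a, b, c, f, pair_max=0.3, combined_min=0.6):
    """Keep a triplet if a and b alone describe c poorly but f(a, b) describes it well."""
    fab = tuple(f(i, j) for i, j in zip(a, b))
    return (uncertainty(c, a) < pair_max
            and uncertainty(c, b) < pair_max
            and uncertainty(c, fab) > combined_min)


def xor(i, j):
    return i ^ j


# Toy profiles over four hypothetical organisms: c = a XOR b.
a = (0, 0, 1, 1)
b = (0, 1, 0, 1)
c = (0, 1, 1, 0)
print(uncertainty(c, a), uncertainty(c, b))
print(passes_lapp_filter(a, b, c, xor))
```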
13 Example
- Synthesis of aromatic amino acids proceeds through the shikimate pathway.
- Logic analysis of 5 participating proteins shows that shikimate can be converted to the end product prephenate by one of two possible routes, leading to a type 7 logic relationship.
Example showing triplet and pairwise uncertainty coefficients, U.
14 Results
- When either one shikimate kinase protein family (protein A, COG1685) or an alternate shikimate kinase protein family (protein B, COG0703) is present in an organism, 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase (protein C, COG0128) must also be present (U = 0.85) to carry out the subsequent enzymatic step.
- The same type 7 logic relationship is also observed between the alternate shikimate kinase enzymes and the successive chorismate synthase (protein D, COG0082) and chorismate mutase (protein E, COG1605) enzymatic steps of the pathway.
- The ordering of the metabolic steps that follow shikimate kinase is predicted by the successive U coefficients: EPSP synthase (second step, U = 0.85) is most strongly linked to shikimate kinase, followed directly by chorismate synthase (third step, U = 0.66) and lastly by chorismate mutase (fourth step, U = 0.56).
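The step-ordering inference above amounts to sorting the linked enzymes by their U scores, under the heuristic that coupling to shikimate kinase weakens with distance along the pathway. The sketch uses only the U values stated on this slide.

```python
# U values stated on this slide for each enzyme linked (type 7) to the two
# alternate shikimate kinase families.
U_SCORES = {
    "EPSP synthase (COG0128)": 0.85,
    "chorismate synthase (COG0082)": 0.66,
    "chorismate mutase (COG1605)": 0.56,
}

# Heuristic: stronger coupling to shikimate kinase = earlier pathway step.
predicted_order = sorted(U_SCORES, key=U_SCORES.get, reverse=True)
for step, enzyme in enumerate(predicted_order, start=2):  # step 1 is shikimate kinase
    print(f"step {step}: {enzyme}")
```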
15 Results Contd
- Organisms synthesize chorismate and prephenate from shikimate using only one of two alternate pathways, consisting of the ordered enzymes either A-C-D-E or B-C-D-E.
- LAPP recovers 750,000 previously unknown relationships among protein families (U(c|f(a, b)) > 0.60, U(c|b) < 0.30, U(c|a) < 0.30).
- Validity was assessed by comparing the known annotations of the linked proteins.
- The ability to recover links between proteins annotated as belonging to a shared major functional category has been used widely to corroborate computational inferences of protein interactions.
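A sketch of the annotation check described above: count the fraction of linked triplets whose three families share at least one functional category. Categories are modeled as sets of COG category letters so that a family annotated with two categories is handled; the family names and annotations are hypothetical.

```python
def shares_category(triplet, categories):
    """True if all three families have at least one functional category in common."""
    return bool(set.intersection(*(categories[fam] for fam in triplet)))


def category_agreement(triplets, categories):
    """Fraction of triplets whose members share a functional category."""
    if not triplets:
        return 0.0
    return sum(shares_category(t, categories) for t in triplets) / len(triplets)


# Hypothetical annotations (N = cell motility, U = intracellular trafficking and secretion).
categories = {"fam1": {"N"}, "fam2": {"N", "U"}, "fam3": {"N"}, "fam4": {"U"}}
triplets = [("fam1", "fam2", "fam3"), ("fam1", "fam2", "fam4")]
print(category_agreement(triplets, categories))  # 0.5
```

On its own this fraction says little; it becomes evidence only when compared with the same fraction for randomly chosen triplets.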
16 Observations
- One of the most frequently observed triplet relationships relates three proteins belonging to the cell motility category, confirming that the triplet associations link proteins closely related in function.
- Other triplets involve two proteins from the motility category and a third protein from another COG category, producing recognizable horizontal and vertical bands in the histogram.
  - E.g. the category combinations NNU (COG category U, intracellular trafficking and secretion) and NNS (COG category S, unknown function) are also plentiful.
- Connections between these categories make intuitive sense and facilitate the placement of unannotated proteins within the context of specific cellular networks of interacting proteins.
Section taken from a 3-D histogram that describes
the frequency of observed logic relationships in
which protein A of the triplet is annotated as
belonging to the COG functional category N, cell
motility.
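The histogram slice above can be approximated by counting the category string of each triplet (A, B, C), keeping only triplets whose protein A is in category N. For simplicity this sketch assumes one category letter per family; the family names and annotations are hypothetical.

```python
from collections import Counter


def category_histogram(triplets, category, first="N"):
    """Count (A, B, C) category strings such as 'NNU' for triplets whose A is in `first`."""
    return Counter(
        "".join(category[fam] for fam in triplet)
        for triplet in triplets
        if category[triplet[0]] == first
    )


# Hypothetical single-letter annotations.
category = {"f1": "N", "f2": "N", "f3": "U", "f4": "S", "f5": "N", "f6": "J"}
triplets = [("f1", "f2", "f3"), ("f5", "f2", "f3"), ("f1", "f5", "f4"), ("f6", "f1", "f2")]
print(category_histogram(triplets, category))
```

The last triplet is skipped because its protein A is not annotated N.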
17 Observations
- LAPP yields a set of statistically significant ternary relationships that are distinct from, and more numerous than, those inferred using traditional pairwise analysis.
- A matrix of randomized phylogenetic profiles, containing the same individual and pairwise distributions as the native profiles, was used to assess the probability of observing a given uncertainty coefficient score by chance.
- Triplets with U > 0.60 are observed 10^2 times more frequently from the unshuffled vectors than from shuffled profiles, and 10^4 times more frequently when U > 0.80.
Plot of the cumulative number of protein triplets recovered at an uncertainty coefficient score greater than a given threshold.
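The cumulative plot and the real-versus-shuffled comparison reduce to counting scores above a threshold. The score lists below are hypothetical and only illustrate the arithmetic; they are not the study's data.

```python
def count_above(scores, threshold):
    """Number of triplets whose U exceeds `threshold`."""
    return sum(1 for s in scores if s > threshold)


def cumulative_curve(scores, thresholds):
    """Cumulative number of triplets above each threshold."""
    return {t: count_above(scores, t) for t in thresholds}


def enrichment(real_scores, shuffled_scores, threshold):
    """Ratio of real to shuffled triplets above `threshold` (inf if no shuffled triplet passes)."""
    shuffled = count_above(shuffled_scores, threshold)
    real = count_above(real_scores, threshold)
    return real / shuffled if shuffled else float("inf")


# Hypothetical U scores, for illustration only.
real = [0.90, 0.85, 0.70, 0.65, 0.62, 0.40]
shuffled = [0.61, 0.30, 0.20]
print(cumulative_curve(real, [0.6, 0.8]))  # {0.6: 5, 0.8: 2}
print(enrichment(real, shuffled, 0.6))     # 5.0
```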
18 Observations Contd
- A P value was calculated for each triplet relationship by enumerating all possible values of U that could be obtained from shuffled profiles while maintaining the individual and pairwise distributions.
- P = (number of trials that exceed the observed value of U) / (total number of trials).
- More than 98% of the identified triplets (U > 0.6) have P < 0.05, and more than 75% of the identified triplets have P < 0.005.
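A Monte Carlo sketch of the P value definition above. Note the substitution: this null simply shuffles profile c across organisms, which preserves c's individual distribution and the joint (a, b) distribution but not the c-a and c-b pairwise distributions that the authors' procedure maintains, and it samples random shuffles instead of enumerating them. `trials` and `seed` are hypothetical parameters.

```python
import random
from collections import Counter
from itertools import product
from math import log2


def entropy(*columns):
    """Shannon entropy (bits) of the joint distribution of aligned binary profiles."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum((k / n) * log2(k / n) for k in counts.values())


def uncertainty(x, y):
    """U(x|y) = [H(x) + H(y) - H(x, y)] / H(x); 0.0 for a constant x (our convention)."""
    hx = entropy(x)
    if hx == 0:
        return 0.0
    return (hx + entropy(y) - entropy(x, y)) / hx


def permutation_p_value(a, b, c, f, trials=1000, seed=0):
    """Fraction of shuffled-c trials whose U(c|f(a,b)) exceeds the observed U.

    Simplified null: shuffling c alone does NOT preserve the c-a and c-b
    pairwise distributions used in the study.
    """
    rng = random.Random(seed)
    fab = tuple(f(i, j) for i, j in zip(a, b))
    observed = uncertainty(c, fab)
    shuffled = list(c)
    exceed = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # keeps the number of 1s in c, breaks its link to a and b
        if uncertainty(tuple(shuffled), fab) > observed + 1e-12:  # tolerance for float noise
            exceed += 1
    return exceed / trials


# Toy example: 16 hypothetical organisms covering every (a, b) combination.
rows = list(product((0, 1), repeat=4))
a = tuple(r[0] for r in rows)
b = tuple(r[1] for r in rows)
c = tuple(i | j for i, j in zip(a, b))  # c = a OR b exactly
print(permutation_p_value(a, b, c, lambda i, j: i | j))
```

Because c is reproduced exactly here, its observed U is 1.0 and no shuffle can exceed it, so the estimated P is 0; with finite trials, reporting P < 1/trials is the honest reading.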
19 Observations
- The 8 distinct logic types occur with widely varying frequencies within the set of significant ternary relationships.
- This is consistent with our understanding of evolutionary and biological relationships.
- Logic types 1, 3, 5, and 7 are observed frequently in the biological data.
- Logic types 2, 4, and 8 are more difficult to relate to simple cellular logic and are observed only rarely.
Number of identified triplets (U > 0.6) for each of the eight logic function types for randomized (black) and real (gray) phylogenetic profiles.
20 Observations
The 50 highest-scoring relationships (U > 0.75) involving proteins from the cell motility and intracellular trafficking and secretion functional categories.
21 Observations contd
- Cell motility proteins are colored light blue, intracellular trafficking and secretion proteins are colored magenta, and proteins annotated as both are colored orange.
- Edges are shown between proteins A-C and B-C of each logic triplet, with each edge labeled according to the logic function type used to associate the protein families.
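The network figure can be represented as an adjacency map built from the triplets, drawing only the A-C and B-C edges described above and labeling each with its logic type. The family names and type number in the example are hypothetical.

```python
from collections import defaultdict


def build_logic_network(triplets):
    """Adjacency map from (A, B, C, logic_type) triplets.

    Only A-C and B-C edges are added, each labeled with the logic type
    that linked the families; no A-B edge is drawn.
    """
    graph = defaultdict(set)
    for a, b, c, logic_type in triplets:
        for partner in (a, b):
            graph[partner].add((c, logic_type))
            graph[c].add((partner, logic_type))
    return graph


network = build_logic_network([("famA", "famB", "famC", 7)])
print(sorted(network["famC"]))  # [('famA', 7), ('famB', 7)]
```

An adjacency map like this can be exported as a labeled edge list for any graph-drawing tool.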
22 Observations contd
- The linked proteins include adhesin proteins necessary for bacterial pathogenesis, chemotaxis proteins, and translocase proteins.
- The network contains previously unknown interactions that suggest mechanisms connecting bacterial pathogenesis and chemotaxis.
- CheZ, a chemotaxis dephosphorylase that regulates cell motility, is linked to the surface receptor and virulence factors adhesin AidA and Flp pilus-associated FimT.
23 Conclusion
- The new higher-order protein associations detected by LAPP provide a framework for understanding the complex logical dependencies that relate proteins to one another in the cell.
- LAPP is also useful in:
  - Modeling and engineering biological systems
  - Generating biological hypotheses for experimentation
  - Investigating additional protein properties
24 Future Work
- In all likelihood, logic relationships between proteins in the cell extend beyond ternary relationships to include much larger sets of proteins.
- The ideas underlying the logical analysis of phylogenetic profiles can be extended to the investigation of other kinds of genomic data:
  - Gene expression
  - Nucleotide polymorphism
  - Phenotype data
25 Questions?