Use of Logic Relationships to Decipher Protein Network Organization
Peter M. Bowers, Shawn J. Cokus, David Eisenberg, Todd O. Yeates

1
Use of Logic Relationships to Decipher
Protein Network Organization
Peter M. Bowers, Shawn J. Cokus, David Eisenberg, Todd O. Yeates
  • Presented by Krishna Balasubramanian

2
Contents
  • Introduction
  • Background
  • Method Used - LAPP
  • Results
  • Observations
  • Conclusion
  • Future Work

3
Introduction
  • Major focus of genome research
  • Deciphering networks of molecular interactions
    underlying cellular function.
  • Developed a computational approach to
  • Identify detailed relationships between proteins
    based on genomic data.
  • The method reveals many previously unidentified
    higher-order relationships.

4
Background
  • Patterns across multiple complete genomes have
    been used to infer biological interactions and
    functional linkages between proteins:
  • Two distinct proteins from one organism are
    genetically fused into a single protein in
    another organism.
  • Two proteins tend to occur in chromosomal
    proximity across multiple organisms.
  • Phylogenetic profile approach:
  • Detects functional relationships between proteins
    exhibiting statistically similar patterns of
    presence or absence.
  • Determines the pattern describing a protein's
    presence or absence by searching for its homologs
    across N organisms.
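
A minimal sketch of how a phylogenetic profile could be represented, assuming the homolog search has already been done. The organism names, family names, and the `homologs` table are hypothetical toy data, not values from the paper.

```python
# Hypothetical toy data: organism -> protein families with a detected homolog.
homologs = {
    "org1": {"famA", "famC"},
    "org2": {"famB", "famC"},
    "org3": set(),
    "org4": {"famA", "famB"},
}

# Fix one organism order so every profile uses the same positions.
organisms = sorted(homologs)


def profile(family):
    """Return the binary presence/absence vector of one family across N organisms."""
    return [1 if family in homologs[org] else 0 for org in organisms]


profiles = {fam: profile(fam) for fam in ("famA", "famB", "famC")}
for fam, vec in profiles.items():
    print(fam, vec)
```

Each profile is a vector of length N (here N = 4), with 1 marking presence and 0 marking absence of a homolog in that organism.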

5
Background
  • Original implementations sought to infer links
    between pairs of proteins with similar profiles.
  • A subsequent variation on that idea linked
    proteins if their profiles represented the
    negation of each other.
  • Simple notions - with the presence of one protein
    implying the presence or absence of another.
  • Such simple relationships cannot adequately
    describe the full complexity of cellular networks
    that involve branching, parallel, and alternate
    pathways.
  • Higher-order logic relationships involving a
    pattern of presence/absence of multiple proteins
    are expected due to:
  • Observed complexity of cellular networks.
  • Evolutionary divergence, convergence, and
    horizontal transfer events.

6
Method - LAPP
  • Perform a complete analysis of the logic relations
    possible between triplets of phylogenetic profiles.
  • Demonstrate the power of the resulting logic
    analysis of phylogenetic profiles (LAPP) to:
  • Illuminate relationships among multiple proteins.
  • Infer the coarse function of large numbers of
    uncharacterized protein families.

7
Logical Relationships to determine
presence/absence of Proteins
  • Venn diagrams and logic statements show the 8
    distinct kinds of logic functions that describe
    the possible dependence of the presence of C on
    the presence of A and B, jointly.
  • Logic functions are grouped together if they are
    related by a simple exchange of proteins A and B.

8
Logical Relationships to determine
presence/absence of Proteins
  • There are 8 possible logic relationships
    combining two phylogenetic profiles to match a
    third profile.
  • E.g., protein C might be present if and only if
    proteins A and B are both present.
  • Function of protein C is necessary only when the
    functions of proteins A and B are both present.
  • Gene C may be present if and only if either A or
    B is present.
  • Different organisms use two different protein
    families in combination with a common third
    protein to accomplish some task.
  • Several of the eight possible logic relationships
    are intuitively understood to describe commonly
    observed biological scenarios.
  • However, a few of the logic relationships are not
    easily related to real biological situations.
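
The count of eight can be checked by brute force. The sketch below enumerates all 16 two-input Boolean functions, keeps those whose output depends on both A and B, and groups functions related by exchanging A and B. The helper names are illustrative, and the code does not reproduce the paper's type numbering (1 to 8).

```python
from itertools import product

# All input pairs (a, b), in a fixed order: (0,0), (0,1), (1,0), (1,1).
inputs = list(product((0, 1), repeat=2))


def depends_on_both(table):
    """True if the output changes with A for some B, and with B for some A."""
    f = dict(zip(inputs, table))
    uses_a = any(f[(0, b)] != f[(1, b)] for b in (0, 1))
    uses_b = any(f[(a, 0)] != f[(a, 1)] for a in (0, 1))
    return uses_a and uses_b


def swapped(table):
    """Truth table of the same function with the roles of A and B exchanged."""
    f = dict(zip(inputs, table))
    return tuple(f[(b, a)] for a, b in inputs)


# Group each qualifying truth table with its A/B-exchanged partner.
classes = set()
for table in product((0, 1), repeat=4):
    if depends_on_both(table):
        classes.add(frozenset({table, swapped(table)}))

print(len(classes))  # 8
```

Of the 16 functions, two are constant and four depend on only one input. Six of the remaining ten are symmetric in A and B (AND, OR, XOR and their negations), and the other four fall into two exchange pairs, giving eight groups.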

9
Examples of LAPP based on Phylogenetic
Profiles
Figure panels: phylogenetic profiles; biological
examples of LAPP.

10
Examples of LAPP .. Contd
  • Hypothetical phylogenetic profiles are used to
    illustrate the eight possible logic functions.
  • Real biological examples are shown to illustrate
    the ternary relationships identified from actual
    phylogenetic profiles for the 4 most commonly
    observed logic types.

11
Identifying Protein Triplets
  • Created a set of binary-valued vectors describing
    the presence or absence of each of the known
    protein families across 67 fully sequenced
    organisms.
  • Categorized complete set of proteins into 4873
    distinct families called clusters of orthologous
    groups (COGs).
  • Examined all triplet combinations of profiles and
    rank-ordered them according to how well the
    logical combination f (a,b) of two profiles
    predicted a third profile, c.
  • Neither profile a nor b alone was predictive of c.

12
Identifying Protein Triplets
  • Uncertainty coefficients are calculated for
    $U(c \mid a)$, $U(c \mid b)$, and the logically
    combined profile $U(c \mid f(a,b))$.
  • $U(x \mid y) = [H(x) + H(y) - H(x, y)] / H(x)$
  • H is the entropy of the individual or joint
    distributions.
  • U can range between 1.0, where x is a
    deterministic function of y, and 0.0, where x is
    completely independent of y.
  • Selected triplets whose individual pairwise
    uncertainty scores described protein profile c
    poorly ($U(c \mid a) < 0.3$ and
    $U(c \mid b) < 0.3$) but whose logically combined
    profile described c well
    ($U(c \mid f(a,b)) > 0.6$).
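
A minimal sketch of the uncertainty coefficient U(x|y) = [H(x) + H(y) - H(x, y)] / H(x) and the triplet selection thresholds, using base-2 entropies estimated from counts. The profiles `a`, `b`, and `c` are hypothetical toy vectors over 8 organisms, and XOR is just one of the eight logic functions chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def uncertainty(x, y):
    """U(x|y) = [H(x) + H(y) - H(x, y)] / H(x). Assumes x is not constant."""
    hx = entropy(x)
    return (hx + entropy(y) - entropy(list(zip(x, y)))) / hx


# Hypothetical profiles: c is present exactly when one of a, b is present.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
c = [0, 1, 1, 0, 0, 1, 1, 0]

# Logically combined profile f(a, b), here with f = XOR.
f_ab = [x ^ y for x, y in zip(a, b)]

# Selection rule from the slide: poor pairwise scores, good combined score.
selected = (
    uncertainty(c, a) < 0.3
    and uncertainty(c, b) < 0.3
    and uncertainty(c, f_ab) > 0.6
)
print(uncertainty(c, a), uncertainty(c, b), uncertainty(c, f_ab), selected)
```

In this toy case neither a nor b alone says anything about c, while the combined profile determines it completely, so the triplet passes the filter.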

13
Example
  • Synthesis of aromatic amino acids proceeds
    through the shikimate pathway.
  • Logic analysis of 5 participating proteins shows:
  • Shikimate can be converted to the end product
    prephenate by one of two possible routes, leading
    to a type 7 logic relationship.

Example showing triplet and pairwise
uncertainty coefficients, U.

14
Results
  • When either one shikimate kinase protein family
    (protein A, COG1685) or an alternate shikimate
    kinase protein family (protein B, COG0703) is
    present in an organism, then
    5-enolpyruvylshikimate-3-phosphate (EPSP)
    synthase must also be present (protein C,
    COG0128) (U = 0.85) to carry out the subsequent
    enzymatic step.
  • The same type 7 logic relationship is also
    observed between alternate shikimate kinase
    enzymes and the successive chorismate synthase
    (protein D, COG0082) and chorismate mutase
    (protein E, COG1605) enzymatic steps of the
    pathway.
  • The ordering of the metabolic steps that follow
    shikimate kinase is predicted by the value of
    successive U coefficients, where EPSP synthase
    (second step, U = 0.85) is most strongly linked
    to shikimate kinase, followed directly by
    chorismate synthase (third step, U = 0.66) and
    lastly by chorismate mutase (fourth step,
    U = 0.56).

15
Results Contd
  • Organisms synthesize chorismate and prephenate
    from shikimate using only one of two possible
    alternate pathways, consisting of either ordered
    enzymes A-C-D-E or enzymes B-C-D-E.
  • LAPP recovers 750,000 previously unknown
    relationships among protein families
    ($U(c \mid f(a,b)) > 0.60$;
    $U(c \mid b) < 0.30$; $U(c \mid a) < 0.30$).
  • Validity assessed by comparing known annotations
    of the linked proteins.
  • The ability to recover links between proteins
    annotated as belonging to a major functional
    category has been used widely to corroborate
    computational inferences of protein interactions.

16
Observations
  • One of the most frequently observed triplet
    relationships relates three proteins belonging to
    the cell motility category, confirmation that the
    triplet associations link proteins closely
    related in function.
  • Other triplets involve two proteins from the
    motility category and a third protein of another
    COG category, producing recognizable horizontal
    and vertical bands in the histogram.
  • E.g. the category combinations NNU (COG category
    U, intracellular trafficking and secretion) and
    NNS (COG category S, unknown function) are also
    plentiful.
  • Connections between these categories make
    intuitive sense and facilitate placement of
    unannotated proteins within the context of
    specific cellular networks of interacting
    proteins.

Section taken from a 3-D histogram that describes
the frequency of observed logic relationships in
which protein A of the triplet is annotated as
belonging to the COG functional category N, cell
motility.

17
Observations
  • LAPP leads to a set of statistically significant
    ternary relationships that are distinct from and
    more numerous than the ones inferred using
    traditional pairwise analysis.
  • A matrix of randomized phylogenetic profiles,
    containing the same individual and pairwise
    distributions as the native profiles, was used to
    assess the probability of observing a given
    uncertainty coefficient score by chance.
  • Triplets with U > 0.60 are observed from the
    unshuffled vectors 10^2 times more frequently
    than from shuffled profiles, and 10^4 times more
    frequently when U > 0.80.

Plot of the cumulative number of protein triplets
recovered at an uncertainty coefficient score
greater than a given threshold.

18
Observations Contd
  • P value calculated for each triplet relationship
    by enumerating all possible values of U that
    could be obtained from shuffled profiles while
    maintaining the individual and pairwise
    distributions.
  • P = number of trials that exceed the observed
    value of U divided by the total number of trials.
  • More than 98% of the identified triplets
    (U > 0.6) have P < 0.05, and more than 75% of the
    identified triplets have P < 0.005.
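
The empirical P value can be sketched as a permutation test. This is a simplified stand-in, not the authors' procedure: it shuffles only profile c, which preserves the individual distribution of c but not the pairwise distributions the slides describe. It also counts ties (`>=`) rather than strict exceedances, so that a perfect observed score does not yield P = 0. The function name `empirical_p`, the toy profiles, and the trial count are assumptions for illustration.

```python
import random
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def uncertainty(x, y):
    """U(x|y) = [H(x) + H(y) - H(x, y)] / H(x). Assumes x is not constant."""
    hx = entropy(x)
    return (hx + entropy(y) - entropy(list(zip(x, y)))) / hx


def empirical_p(a, b, c, logic, trials=1000, seed=0):
    """Fraction of shuffles of c scoring at least as high as the real c.

    Simplification: only c is shuffled, so pairwise distributions between
    c and a (or c and b) are NOT preserved, unlike the published method.
    """
    rng = random.Random(seed)
    combined = [logic(x, y) for x, y in zip(a, b)]
    observed = uncertainty(c, combined)
    shuffled = list(c)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if uncertainty(shuffled, combined) >= observed:
            hits += 1
    return hits / trials


# Hypothetical profiles over 8 organisms; c follows a XOR b exactly.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
c = [0, 1, 1, 0, 0, 1, 1, 0]

p = empirical_p(a, b, c, lambda x, y: x ^ y)
print(p)
```

With only 8 organisms the smallest achievable P is limited by the number of distinct arrangements of c, which is one reason a real analysis relies on many more genomes (67 in the source).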

19
Observations
  • The 8 distinct logic types occur with widely
    varying frequencies within the set of significant
    ternary relationships.
  • This is consistent with our understanding of
    evolution and biological relationships.
  • Logic types 1, 3, 5, and 7 are observed
    frequently in the biological data.
  • Logic types 2, 4, and 8 are more difficult to
    relate to simple cellular logic and are observed
    only rarely.

Number of identified triplets (U > 0.6) for each
of the eight logic function types for
randomized (black) and real (gray) phylogenetic
profiles.

20
Observations
50 highest scoring relationships (U > 0.75)
involving proteins from the cell motility and
intracellular trafficking and secretion
functional categories.

21
Observations contd
  • Cell motility proteins are colored light blue,
    intracellular trafficking and secretion are
    colored magenta, and proteins annotated as both
    are colored in orange.
  • Edges are shown between proteins A-C and B-C of
    each logic triplet, with each edge labeled
    according to the logic function type used to
    associate the protein families.

22
Observations contd
  • The proteins linked include adhesin proteins
    necessary for bacterial pathogenesis, chemotaxis
    proteins, and translocase proteins.
  • Network contains previously unknown interactions
    that suggest mechanisms connecting bacterial
    pathogenesis and chemotaxis.
  • CheZ, a chemotaxis dephosphorylase that regulates
    cell motility, is linked to the surface receptor
    and virulence factors adhesin AidA and Flp
    pilus-associated FimT.

23
Conclusion
  • New higher-order protein associations detected by
    LAPP provide a framework to understand the
    complex logical dependencies that relate proteins
    to one another in the cell.
  • Also useful in:
  • Modeling and engineering biological systems
  • Generating biological hypotheses for
    experimentation
  • Investigating additional protein properties

24
Future Work
  • In all likelihood, logic relationships between
    proteins in the cell extend beyond ternary
    relationships to include much larger sets of
    proteins.
  • Ideas underlying the logical analysis of
    phylogenetic profiles can be extended to the
    investigation of other kinds of genomic data:
  • Gene expression
  • Nucleotide polymorphism
  • Phenotype data

25
Questions??