Integration of Full-Coverage Probabilistic Functional Networks with Relevance to Specific Biological Processes

1
Integration of Full-Coverage Probabilistic
Functional Networks with Relevance to Specific
Biological Processes
  • James, K., Wipat, A. & Hallinan, J.
  • School of Computing Science, Newcastle University
  • Data Integration in the Life Sciences 2009

2
Integrated functional networks
  • Bring together data from a wide range of sources
  • High-throughput (HTP) data are:
    • Large (one node per gene, multiple interactions
      per node)
    • Noisy (false positive rates of 20-90%)
    • Incomplete (to differing extents)
  • Assess the quality of each dataset against a gold
    standard
  • Edge weights reflect the combined probability that
    the edge actually exists
  • The network can be thresholded to highlight the
    most probable edges
  • Suitable for manual (interactive) or
    computational analysis
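Scoring a dataset against a gold standard can be made concrete with the log-likelihood score of Lee et al. (2004), the integration scheme this talk builds on: LLS = ln[(P(L|E) / P(¬L|E)) / (P(L) / P(¬L))], where L is a gold-standard linkage and E a dataset. The sketch below computes it from raw counts and adds a simple threshold filter. The function names, the dict-of-frozensets edge representation and the example counts are illustrative assumptions; the talk does not state which gold standard or counts were used.

```python
import math


def log_likelihood_score(pos_in_dataset: int, neg_in_dataset: int,
                         pos_in_standard: int, neg_in_standard: int) -> float:
    """LLS = ln[(P(L|E) / P(~L|E)) / (P(L) / P(~L))].

    pos_in_dataset / neg_in_dataset: dataset pairs that are gold-standard
    positives / negatives (posterior odds).
    pos_in_standard / neg_in_standard: totals in the gold standard (prior odds).
    """
    if min(pos_in_dataset, neg_in_dataset, pos_in_standard, neg_in_standard) <= 0:
        raise ValueError("all counts must be positive")
    posterior_odds = pos_in_dataset / neg_in_dataset
    prior_odds = pos_in_standard / neg_in_standard
    return math.log(posterior_odds / prior_odds)


def threshold(edges: dict, cutoff: float) -> dict:
    """Keep only edges whose integrated weight is at least `cutoff`."""
    return {pair: w for pair, w in edges.items() if w >= cutoff}
```

For example, a dataset with 30 gold-standard positives and 10 negatives, against a gold standard of 1,000 positives and 9,000 negatives, scores ln(27), roughly 3.3: its evidence is 27 times better than the prior odds.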

3
Dataset bias
  • Different experiment types provide different
    types of information
  • Overlap between datasets is usually low
    • 1% of synthetic lethal pairs physically interact
  • Genes involved in the same process may be
    transcribed together
    • e.g. ribosome biogenesis in yeast
  • Some types of interaction may provide more
    information about a particular biological process
    • Complex formation: Y2H
    • Signal transduction: phosphorylation
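One simple way to quantify the low overlap between datasets is the Jaccard index of their edge sets, treating interactions as undirected. The talk does not say how overlap was measured, so this measure, the function names and the gene pairs in the example are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def edge_set(pairs: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Normalise (a, b) pairs so that A-B and B-A count as the same edge."""
    return {frozenset(p) for p in pairs}


def jaccard_overlap(pairs_a, pairs_b) -> float:
    """|A & B| / |A | B| over undirected edge sets (0.0 if both are empty)."""
    a, b = edge_set(pairs_a), edge_set(pairs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With one dataset containing A-B, B-C and C-D and another containing B-A and D-E, only A-B is shared among four distinct edges, giving an overlap of 0.25.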

4
Bias in HTP datasets
From Myers and Troyanskaya, Bioinformatics 2007.
5
Bias Relevance
  • Most network analyses relate to a Process of
    Interest (PoI)
  • PFINs (probabilistic functional integrated
    networks) tend to be very large
  • Interactions with equal probability may differ in
    usefulness for a given PoI
  • Several attempts have been made to eliminate bias
    • These lose data
  • We aim to use bias instead
    • Relevance

6
Hypothesis
  • Functional annotations can be applied to
    probabilistic integrated functional networks to
    identify interactions relevant to a biological
    process of interest

7
Network integration
8
Network integration
9
Effect of D value
10
Relevance scoring
  • GO annotations
  • One-tailed Fisher's exact test to score
    over-representation of genes related to the PoI
  • PoI: GO term of interest plus all its descendants,
    excluding annotations inferred from electronic
    annotation (IEA)
  • Control network integrated in order of confidence
  • Relevance network integrated in order of
    relevance
  • We use the method of Lee et al. (2004), but the
    approach can be applied to any network and any
    data integration algorithm
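A one-tailed Fisher's exact test for over-representation is the upper tail of the hypergeometric distribution: the probability of seeing at least k PoI-annotated genes among a dataset's n genes, when K of the N background genes carry the annotation. The sketch uses only the standard library. Taking a dataset's genes to be those in its interactions, and the background to be all genes in the network, are assumptions; the talk does not define them.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-tailed Fisher's exact test, P(X >= k) under the hypergeometric.

    N: genes in the background
    K: background genes annotated to the PoI
    n: genes in the dataset being scored
    k: dataset genes annotated to the PoI
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the hypergeometric probabilities of k, k+1, ..., min(n, K) hits
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

Datasets could then be ranked by ascending p-value, so the most PoI-enriched dataset comes first; whether the talk used the p-value directly or a transform of it is not stated.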

11
Relevance scoring
12
Data sets
  • Saccharomyces cerevisiae data from BioGRID v.38
  • Split by PMID, duplicates removed
  • Datasets with > 100 interactions treated individually
    • 50 datasets, max 14,421 interactions
  • Datasets with < 100 interactions grouped by BioGRID
    experimental categories
    • 22 datasets, min 33 interactions
  • Gene Ontology terms
    • Telomere maintenance (GO:0000723)
    • Ageing (GO:0007568)
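The dataset construction on slide 12 can be sketched as: group interactions by PMID, remove duplicate (undirected) pairs, keep large publications as their own datasets and pool the rest by experimental system. The record layout, the `build_datasets` name and the dataset labels are hypothetical, and the slide does not say where a publication with exactly 100 interactions falls; here it is pooled.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def build_datasets(records: Iterable[Tuple[str, str, str, str]],
                   min_size: int = 100) -> Dict[str, Set[Edge]]:
    """records: (gene_a, gene_b, pmid, experimental_system) tuples."""
    per_pmid: Dict[str, Set[Tuple[Edge, str]]] = defaultdict(set)
    for gene_a, gene_b, pmid, system in records:
        # frozenset makes A-B and B-A the same edge, removing duplicates
        per_pmid[pmid].add((frozenset((gene_a, gene_b)), system))

    datasets: Dict[str, Set[Edge]] = defaultdict(set)
    for pmid, entries in per_pmid.items():
        edges = {edge for edge, _ in entries}
        if len(edges) > min_size:
            datasets["PMID:" + pmid] = edges  # large study: its own dataset
        else:
            for edge, system in entries:      # small study: pool by system
                datasets["system:" + system].add(edge)
    return dict(datasets)
```

Pooling small publications this way trades per-study resolution for enough interactions to score each pooled dataset reliably against the gold standard.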

13
Choice of D value
  • GO annotations
  • Assign function to nodes based on annotation of
    neighbour with highest weighted edge
  • Leave-one-out on full network
  • Construct Receiver Operating Characteristic (ROC)
    curve
  • Area Under Curve (AUC)
  • SE(W) using Wilcoxon statistic
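The evaluation on slide 13 can be sketched as follows: score each node from its highest-weighted neighbour, compute the ROC AUC as the Wilcoxon (Mann-Whitney) statistic, and estimate its standard error with the Hanley and McNeil (1982) formula for SE(W). Using the edge weight as the classifier score (0 when the top neighbour is unannotated) is an assumption, since the talk only names the assignment rule. Because a node's own label is never used to score it, leave-one-out needs no recomputation here. Function names are hypothetical, and self-loop edges are assumed absent.

```python
import math
from typing import Dict, FrozenSet, Set, Tuple


def neighbour_scores(edges: Dict[FrozenSet[str], float],
                     annotated: Set[str]) -> Dict[str, float]:
    """Score = weight of a node's heaviest edge if that neighbour is annotated."""
    best: Dict[str, Tuple[float, str]] = {}
    for edge, w in edges.items():
        a, b = tuple(edge)
        for node, other in ((a, b), (b, a)):
            if node not in best or w > best[node][0]:
                best[node] = (w, other)
    return {node: (w if other in annotated else 0.0)
            for node, (w, other) in best.items()}


def auc_with_se(scores: Dict[str, float],
                positives: Set[str]) -> Tuple[float, float]:
    """Return (AUC, SE) using the Wilcoxon statistic and Hanley-McNeil SE."""
    pos = [s for node, s in scores.items() if node in positives]
    neg = [s for node, s in scores.items() if node not in positives]
    n1, n2 = len(pos), len(neg)
    if n1 == 0 or n2 == 0:
        raise ValueError("need both positive and negative nodes")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    a = wins / (n1 * n2)
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a) + (n1 - 1) * (q1 - a * a) + (n2 - 1) * (q2 - a * a)) / (n1 * n2)
    return a, math.sqrt(var)
```

Comparing the AUC of the control and relevance networks against their standard errors is one way to judge whether a difference is statistically significant.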

14
Classifier output
15
ROC Curves
16
D value
17
D value
18
Ranking
19
Results
20
Evaluation - Clustering
  • MCL (Markov Cluster) algorithm
  • Considers network topology and edge weights
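MCL simulates flow on the weighted graph by alternating expansion (matrix powers of a column-stochastic matrix) and inflation (element-wise powers followed by renormalisation) until the matrix stops changing. The dense pure-Python sketch below is for small graphs only; the added self-loops and the default inflation of 2.0 are common choices but assumptions here, as the talk does not give its MCL settings.

```python
from typing import List, Set

Matrix = List[List[float]]


def _matmul(x: Matrix, y: Matrix) -> Matrix:
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _normalise_columns(m: Matrix) -> Matrix:
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(adjacency: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[Set[int]]:
    n = len(adjacency)
    # Add self-loops, then make each column a probability distribution
    m = _normalise_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                             for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):  # expansion: M to the power e
            expanded = _matmul(expanded, m)
        inflated = _normalise_columns([[v ** inflation for v in row]
                                       for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Non-zero rows are attractors; each lists the members of one cluster
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Inflation controls granularity: lower values give fewer, larger clusters, so cluster-size comparisons between networks are only meaningful at the same inflation setting.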

21
Results
22
Cluster annotation
23
Conclusions
  • Function assignment on relevance networks is
    statistically significantly better than on control
    networks, but probably not practically useful
    • Simplistic algorithm
    • Dependent on existing annotation
  • Clustering
    • Fewer, larger clusters
    • Clusters draw together genes of interest
  • Different GO terms perform differently
  • Relevance networks are better for interactive
    exploration
    • Related PoIs

24
Future work
  • Which GO terms work best with relevance?
    • Why?
  • Further exploration of experimental types and
    relevance
  • Implement algorithms in Ondex
  • Optimize function assignment / clustering
    algorithms
  • Extend technique to edges

25
Acknowledgements
  • Centre for Integrated Systems Biology of Ageing
    and Nutrition (CISBAN)
  • Newcastle Systems Biology Resource Centre
  • Research Councils of the UK
  • BBSRC SABR Ondex Project
  • Integrative Bioinformatics Research Group