Rich Probabilistic Methods for Gene Expression

Slides: 28
Provided by: get73
Learn more at: http://ai.stanford.edu
Transcript and Presenter's Notes

1
Rich Probabilistic Methods for Gene Expression
Eran Segal, Ben Taskar, Audrey Gasch, Nir Friedman, Daphne Koller
2
Outline
  • Motivation for richer models
  • PRMs for gene expression: Modeling, Learning, Inference
  • Results: Synthetic, Stress, Compendium

3
One Sided Clustering
  • Non-Parametric Clustering
  • Hierarchical Agglomerative
  • SVD
  • K-means
  • Parametric Clustering
  • Probabilistic Clustering

[Figure: AutoClass using expression levels. Nodes: Gene-cluster; Level-1, Level-2, ..., Level-n (experiments)]
4
One Sided Clustering
[Figure: genes-by-experiments matrix with genes grouped into Cluster 1 to Cluster 6; annotated regions: Undetected Separability, Undetected Similarity]
5
Basic Bi-Clustering
[Figure: genes-by-experiments matrix divided into a grid of bi-clusters C1 to C18; annotated regions: Detected Separability, Undetected Similarity]
6
Desired Clustering
  • Allow for non-grid clusters
  • Rows no longer correspond to genes (similarly
    for columns)

[Figure: genes-by-experiments matrix divided into non-grid clusters C1 to C15; annotated regions: Detected Separability, Detected Similarity]
7
Basic Bi-Clustering
[Figure: network for two genes and two experiments. Nodes: Clust(gene1), Clust(gene2), Clust(exp1), Clust(exp2), and expression nodes G1-E1, G1-E2, G2-E1, G2-E2]
Two-sided clustering (PLSA, Hofmann)
8
Outline
  • Motivation for richer models
  • PRMs for gene expression: Modeling, Learning, Inference
  • Results: Synthetic, Stress, Compendium

9
PRMs: Basic Bi-Clustering
[Figure: classes of objects and their attributes: Gene (Gene-cluster), Experiment (Exp. cluster), Expression (Level)]
Compact representation of two-sided clustering
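To make the two-sided clustering idea concrete, here is a minimal generative sketch. It assumes, purely for illustration, that each Level is Gaussian with a mean chosen by the pair (gene cluster, experiment cluster); the function name, the uniform range for the means, and the noise level are hypothetical and do not come from the slides.

```python
import random

def sample_two_sided(n_genes, n_exps, n_gene_clusters, n_exp_clusters, seed=0):
    """Sample an expression matrix from a toy two-sided clustering model."""
    rng = random.Random(seed)
    # Hidden cluster attribute for every gene and every experiment.
    gene_cluster = [rng.randrange(n_gene_clusters) for _ in range(n_genes)]
    exp_cluster = [rng.randrange(n_exp_clusters) for _ in range(n_exps)]
    # One mean level per (gene cluster, experiment cluster) pair (assumed range).
    mean = [[rng.uniform(-2.0, 2.0) for _ in range(n_exp_clusters)]
            for _ in range(n_gene_clusters)]
    # Each Level depends only on the cluster of its gene and of its experiment.
    level = [[rng.gauss(mean[gene_cluster[g]][exp_cluster[e]], 0.1)
              for e in range(n_exps)]
             for g in range(n_genes)]
    return gene_cluster, exp_cluster, mean, level

gene_cluster, exp_cluster, mean, level = sample_two_sided(6, 4, 3, 2)
```

The parameter table has one entry per pair of clusters regardless of how many genes and experiments there are, which is the sense in which the class-level model is a compact representation.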
10
PRMs: Relational Schema
  • Describes the types of objects and relations in
    the database

[Figure: relational schema. Classes: Gene, Experiment, Expression. Attribute labels: Cluster (for both Gene and Experiment), Binding Sites, Functional Classes, Mutation, Exp. Attributes, Exp. Level]
11
PRM for Compendium Data
  • Parameters for nodes
  • Structure over gene features

[Figure: PRM for compendium data. Gene (GCluster, GCN4, HSF, Lipid, Endoplasmatic); Array/Mutated Gene (ACluster, GCluster (of mutated gene), Lipid (of mutated gene)); Expression (Level)]
12
Resulting Bayesian Network
  • 3 genes, 2 mutation experiments

[Figure: ground Bayesian network. Per-gene nodes: GCluster, Endoplasmatic; per-experiment nodes: ACluster, Lipid; expression nodes: E1,1, E1,2, E2,1, E2,2, E3,1, E3,2]
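The step from class-level model to ground network can be sketched as follows. This is an illustrative unrolling only: the node names follow the slide labels, but the exact parent set of each Level node is an assumption, and `unroll` is a hypothetical helper.

```python
from itertools import product

def unroll(genes, experiments):
    """Instantiate class-level dependencies for concrete genes and experiments.

    Returns a dict mapping each ground node to the list of its parents.
    """
    parents = {}
    for g in genes:
        parents[f"GCluster({g})"] = []
        parents[f"Endoplasmatic({g})"] = []
    for e in experiments:
        parents[f"ACluster({e})"] = []
        parents[f"Lipid({e})"] = []  # Lipid (of mutated gene)
    # Assumed dependency: each Level has the attributes of its gene and
    # of its experiment as parents.
    for g, e in product(genes, experiments):
        parents[f"Level({g},{e})"] = [
            f"GCluster({g})", f"Endoplasmatic({g})",
            f"ACluster({e})", f"Lipid({e})",
        ]
    return parents

network = unroll(["g1", "g2", "g3"], ["e1", "e2"])
```

For 3 genes and 2 experiments this gives 6 gene-attribute nodes, 4 experiment-attribute nodes, and 6 Level nodes, all sharing the same class-level parameters.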
13
PRM Learning
[Figure: Data and Expert knowledge feed a Learner, which outputs a PRM over Gene (Gene-cluster), Experiment (Exp. cluster), and Expression (Level)]
  • PRMs can be learned from empirical data
  • Parameter estimation
  • Structure learning: learning the dependency
    structure
  • Can learn with missing data and hidden variables

14
PRM Learning
  • Goal: find a PRM structure that explains the data
    well
  • Define a scoring function to evaluate models
  • Bayesian score works best: Score(S : D) = log [P(D | S) P(S)],
    where P(D | S) is the marginal likelihood and P(S) is the prior
  • Automatically trades off fit to data (likelihood
    of data) with model complexity
  • Do heuristic search to find a high-scoring
    structure
  • The structure found is not necessarily the best one
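Written out, the Bayesian score splits into the two labeled terms. The parameter symbol \theta_S is introduced here for illustration and does not appear on the slide; the integral is the standard definition of a marginal likelihood.

```latex
\mathrm{Score}(S : D) \;=\; \log P(D \mid S) \;+\; \log P(S),
\qquad
P(D \mid S) \;=\; \int P(D \mid S, \theta_S)\, P(\theta_S \mid S)\, d\theta_S
```

Because the marginal likelihood averages the fit over all parameter settings rather than taking the best one, a structure with many parameters spreads its prior mass thinly and is penalized unless the data support the extra dependencies.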
15
Learning PRMs
  • Parameter estimation: EM, with approximate inference
    for the E-step
  • Structure learning, complete data: learning tree
    splits; avoiding local maxima
  • Structure learning, incomplete data: iterate
    until convergence (hard SEM): EM, hard
    assignment, structure learning
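The hard-assignment idea can be illustrated on plain one-sided clustering. The sketch below is not the PRM learner from the talk: it has no structure-learning step, and it assumes a fixed spherical Gaussian per cluster, in which case hard EM reduces to k-means. The function name and toy profiles are hypothetical.

```python
def hard_em(profiles, init_means, n_iter=20):
    """Hard-assignment EM for clustering expression profiles (toy version)."""
    means = [list(m) for m in init_means]
    assign = None
    for _ in range(n_iter):
        # Hard E-step: give each gene its single most likely cluster, which
        # under a fixed spherical Gaussian is the nearest mean.
        new_assign = [
            min(range(len(means)),
                key=lambda k: sum((x - m) ** 2 for x, m in zip(p, means[k])))
            for p in profiles
        ]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        # M-step: re-estimate each cluster mean from its assigned genes.
        for k in range(len(means)):
            members = [p for p, a in zip(profiles, assign) if a == k]
            if members:
                means[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign, means

profiles = [[0.0, 0.1], [0.2, -0.1], [2.0, 2.1], [1.9, 2.2]]
assign, means = hard_em(profiles, init_means=[[0.0, 0.0], [2.0, 2.0]])
```

Once the hidden cluster values are filled in by hard assignment, the data are complete, which is what lets a complete-data structure search run inside the loop.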

16
Context Specific Dependencies
[Figure: tree-structured CPD for Level. Internal nodes test GCluster = 0 (of gene), GCluster = 3 (of mutant), HSF > 2, ACluster = 4, and Endoplasmatic, each with true/false branches; the leaves are distributions over Level]
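A tree-structured CPD of this kind can be sketched as a small lookup. The tests reuse labels from the slide, but the arrangement of the tree and the leaf parameters (mean, standard deviation) are invented for illustration; `leaf`, `split`, and `lookup` are hypothetical helpers.

```python
def leaf(mean, std):
    """Leaf: parameters of the distribution over Level in this context."""
    return {"leaf": (mean, std)}

def split(test, if_true, if_false):
    """Internal node: a boolean test on the parent attributes."""
    return {"test": test, "true": if_true, "false": if_false}

def lookup(tree, context):
    """Follow the tests down to the leaf that applies to this context."""
    while "leaf" not in tree:
        tree = tree["true"] if tree["test"](context) else tree["false"]
    return tree["leaf"]

# Hypothetical tree: the numbers are made up, not learned from data.
tree = split(
    lambda c: c["GCluster"] == 0,
    if_true=split(lambda c: c["HSF"] > 2, leaf(1.5, 0.3), leaf(0.0, 0.5)),
    if_false=split(lambda c: c["ACluster"] == 4, leaf(-1.0, 0.4), leaf(0.2, 0.6)),
)

params = lookup(tree, {"GCluster": 0, "HSF": 3, "ACluster": 1})
```

In this toy tree, ACluster is never consulted when GCluster is 0, so Level depends on ACluster only in some contexts; that is what makes the dependency context specific.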
17
Learning Process
[Figure: PRM diagram (Gene: GCluster, HSF, GCN4, Lipid, Endoplasmatic; Array/Mutated Gene: ACluster, GCluster (of mutated gene), Lipid (of mutated gene); Expression: Level) shown next to the genes-by-experiments matrix]
18
Learning Process
[Figure: same PRM diagram and genes-by-experiments matrix as slide 17]
19
Learning Process
[Figure: same PRM diagram and genes-by-experiments matrix as slide 17, annotated "Gene Similarity"]
20
Learning Process
[Figure: same PRM diagram and genes-by-experiments matrix as slide 17, annotated "Experiment Similarity"]
21
Learning Process
[Figure: same PRM diagram and genes-by-experiments matrix as slide 17, annotated "Separability by TF"]
22
Learning Process
[Figure: same PRM diagram and genes-by-experiments matrix as slide 17, annotated "Attribute Dependencies" and "Induce Cluster Change"]
23
Learning Process
[Figure: same PRM diagram and genes-by-experiments matrix as slide 17, annotated "Achieved Desired Clustering"]
24
Outline
  • Motivation for richer models
  • PRMs for gene expression: Modeling, Learning, Inference
  • Results: Synthetic, Stress, Compendium

25
Synthetic Data: Recovering Structure
  • Synthetic data: 1000 genes, 90 arrays (12 types)
  • Parents recovered: simulated data 84.5 +/- 2.5;
    permuted data 56 +/- 2.5
  • Cluster recovery, simulated data: PRMs 98.4 +/- 1.07;
    Naïve Bayes 90.8 +/- 0.42
  • Cluster recovery, permuted data: PRMs 88.1 +/- 1.52;
    Naïve Bayes 76.7 +/- 1.42

26
Stress Data
  • 954 genes, 88 arrays (12 types)
  • Structure learning: 15 significant TFs, 7
    significant function categories
  • Cluster coherence: average variance reduction
    0.69 -> 0.61 in 3 iterations
  • Allowing annotation changes: average variance
    reduction 0.69 -> 0.56 in 3 iterations

27
Fragment of PRM for Yeast Stress Data (Gasch et al.)
[Figure: PRM fragment. Classes: Gene, Array, Expression. Attribute labels: GCluster, Carbon, AAM, Mig1, Condition, Level]
28
Result: Context-Specific Groupings
  • A grouping is a set of genes that behave the same
    within a certain context: a condition or a set
    of conditions
  • Breakdown of genes into clusters is different in
    different contexts

Yeast Stress Data (Gasch et al.)
29
Example Biological Result
  • Discovered a grouping of 17 genes
  • All induced in diauxic shift
  • All have ≥ 2 binding sites for the Mig1 transcription
    factor
  • Many not previously known to be regulated by Mig1
  • Context-sensitive groupings were key to
    identifying the cluster

30
Compendium Data: Results
  • Predict the array cluster of a particular gene
    mutation before performing the experiment
  • Can hope to do this because:
  • the array cluster depends on the gene cluster
  • the gene cluster is predicted from behavior in other
    arrays

[Figure: "Accuracy / Predicted" (y-axis, 0 to 1) against prediction confidence (x-axis, 0.2 to 1); series: correct predictions, total predicted. Annotation: 44 arrays predicted at 95% accuracy]
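The kind of curve described by the figure labels can be computed by keeping only predictions above a confidence threshold and measuring accuracy on those. The sketch below assumes each prediction comes with a confidence and a correctness flag; the function name and the toy data are hypothetical and are not the compendium results.

```python
def accuracy_at_confidence(predictions, threshold):
    """Return (number predicted, accuracy) among predictions at or above threshold.

    predictions: list of (confidence, is_correct) pairs.
    """
    kept = [ok for conf, ok in predictions if conf >= threshold]
    if not kept:
        return 0, None  # nothing is predicted at this confidence level
    return len(kept), sum(kept) / len(kept)

# Made-up example data, for illustration only.
toy = [(0.95, True), (0.9, True), (0.7, False), (0.6, True), (0.4, False)]
n_predicted, accuracy = accuracy_at_confidence(toy, 0.8)
```

Raising the threshold trades coverage for accuracy: fewer arrays are predicted, but a larger fraction of those predictions is correct.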
31
Future Directions
  • Handling time
  • Handling sequence data (TFs)
  • Incorporate structure information
  • Discovering pathways