Probabilistic Sparse Matrix Factorization presentation

Title: Probabilistic Sparse Matrix Factorization

1
Probabilistic Sparse Matrix Factorization

Delbert Dueck, Quaid Morris, Brendan Frey (Probabilistic and Statistical Inference Group)
Tim Hughes (Banting and Best Department of Medical Research)

2
Objective

Patterns in gene expression array data can help us understand gene regulation and predict the function of yet-uncharacterized genes.
Objective: to develop a method of probabilistic sparse matrix factorization (PSMF) and apply it to gene expression data to learn the hidden structure underlying the data.

3
Biological Background

Genes encode the basic information about an organism.
Genes tend to be highly expressed in tissues related to their functional role.
The mouse gene expression data are from Zhang, Morris, et al. (2004).
Gene expression is influenced by the presence of transcription factors (TFs).
Co-expressed genes are likely activated by the same TFs.
The activity of each gene can be explained by the activities of a small number of transcription factors.

4
Gene Expression Array Dataset
Entire data set X: a G × T matrix (G = 22709 genes, T = 55 tissues)
[Figure: the full expression matrix and a detail showing 100 genes across the T = 55 tissues]

5
Sparse Matrix Factorization

Gene expression data model:
Each gene's expression profile ($x_g$) is a linear combination (weighted by $y_{gc}$, $c \in s_g$) of a small number ($r_g \le N$) of the $C$ possible transcription factor profiles ($z_c$, $c \in s_g$).
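Written as an equation in the slide's own notation, the model says that each gene's profile is approximately a weighted sum of the profiles of its selected factors. This display is a reconstruction; the slide states the model only in words.

$$
x_g \approx \sum_{c \in s_g} y_{gc}\, z_c, \qquad s_g \subseteq \{1, \dots, C\}, \quad |s_g| = r_g \le N
$$

Here $x_g$ and every $z_c$ are profiles over the $T$ tissues, and $y_{gc}$ is the scalar weight of factor $c$ for gene $g$.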

6
Sparse Matrix Factorization
Matrix format (entire dataset)
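In matrix form, the whole dataset is $X \approx YZ$: $X$ is $G \times T$, $Y$ is a sparse $G \times C$ weight matrix whose row $g$ has at most $N$ nonzero entries $y_{gc}$ ($c \in s_g$), and $Z$ is $C \times T$ with one factor profile per row. The pure-Python sketch below is only an illustration under assumed toy sizes and random values; the helper names `sparse_factor_product` and `dense_product` are hypothetical. It stores $Y$ as one dictionary per gene and checks that the sparse product matches the ordinary dense product.

```python
import random

def sparse_factor_product(weights, Z, T):
    """Return X = Y Z with Y stored sparsely.

    weights[g] maps factor index c -> y_gc for c in s_g (every other y_gc is 0).
    Z[c] is the length-T profile z_c of factor c.
    """
    X = []
    for row in weights:
        x_g = [0.0] * T
        for c, y_gc in row.items():          # only the selected factors contribute
            for t in range(T):
                x_g[t] += y_gc * Z[c][t]
        X.append(x_g)
    return X

def dense_product(Y, Z):
    """Ordinary dense matrix product, used only to check the sparse version."""
    return [[sum(Y[g][c] * Z[c][t] for c in range(len(Z))) for t in range(len(Z[0]))]
            for g in range(len(Y))]

random.seed(0)
G, C, T, N = 6, 5, 4, 3                      # hypothetical toy sizes (real data: G = 22709, T = 55)
Z = [[random.gauss(0, 1) for _ in range(T)] for _ in range(C)]
weights = []
for g in range(G):
    r_g = random.randint(1, N)               # number of factors used by gene g
    s_g = random.sample(range(C), r_g)       # which factors gene g uses
    weights.append({c: random.gauss(0, 1) for c in s_g})

# Dense G x C weight matrix with at most N nonzeros per row.
Y = [[weights[g].get(c, 0.0) for c in range(C)] for g in range(G)]
X_sparse = sparse_factor_product(weights, Z, T)
X_dense = dense_product(Y, Z)
```

Storing only the selected weights means the cost per gene scales with $r_g \cdot T$ rather than $C \cdot T$.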
8
Probabilistic Sparse Matrix Factorization

To express the model as a probability distribution, assume:
varying levels of Gaussian noise in the data
nothing about the transcription factor weights
normally distributed transcription factor profiles
uniformly distributed factor assignments
multinomially distributed factor counts
Multiply these together to get the joint distribution.
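One way to write the resulting joint distribution is sketched below. The specific forms are assumptions filling in what the slide leaves unstated: a noise variance $\psi_g$ for each gene, standard normal priors on the factor profiles, each of the $r_g$ factor indices drawn independently and uniformly from the $C$ factors, and $P(r_g) = \nu_{r_g}$ for multinomial parameters $\nu_1, \dots, \nu_N$. The weights $y_{gc}$ get no prior and act as parameters. The symbols $\psi_g$ and $\nu_r$ are introduced here for illustration only.

$$
P(X, Z, S, R) = \prod_{c=1}^{C} \mathcal{N}(z_c;\, 0,\, I)\;
\prod_{g=1}^{G} \left[ \nu_{r_g} \left(\frac{1}{C}\right)^{r_g}
\prod_{t=1}^{T} \mathcal{N}\!\left(x_{gt};\, \sum_{c \in s_g} y_{gc}\, z_{ct},\, \psi_g\right) \right]
$$

Here $S = \{s_g\}$ and $R = \{r_g\}$ collect the factor assignments and factor counts of all genes.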

10
Factorized Variational Inference

Exact inference with P is intractable.
Approximate P by a simpler distribution, Q, and perform inference with Q instead.

12
Factorized Variational Inference

Parameterize Q so that it accounts for noise in the transcription factor profiles and for uncertainty in transcription factor selection.
Minimize the KL divergence between Q and P.
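Minimizing the KL divergence is usually carried out through the free energy $\mathcal{F}(Q) = \sum_H Q(H)\log\frac{Q(H)}{P(X,H)} = \mathrm{KL}\big(Q(H)\,\|\,P(H \mid X)\big) - \log P(X)$, where $H$ stands for all hidden variables. Since $\log P(X)$ does not depend on $Q$, lowering $\mathcal{F}$ lowers the KL divergence by exactly the same amount. The check below verifies this identity numerically on a made-up joint over four hidden states; the numbers are arbitrary and unrelated to the gene data.

```python
import math

# Hypothetical joint P(X = x_obs, H = h) for four hidden states h,
# evaluated at the observed data (so it sums to P(x_obs), not to 1).
joint = [0.10, 0.05, 0.02, 0.03]
p_x = sum(joint)                        # evidence P(X = x_obs)
posterior = [p / p_x for p in joint]    # exact posterior P(H | X)

def free_energy(q, joint):
    """F(Q) = sum_h Q(h) log(Q(h) / P(x, h)), skipping states with Q(h) = 0."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, joint) if qh > 0)

def kl(q, p):
    """KL(q || p) for two discrete distributions over the same states."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, p) if qh > 0)

q = [0.4, 0.3, 0.2, 0.1]                # an arbitrary approximating distribution
lhs = free_energy(q, joint)
rhs = kl(q, posterior) - math.log(p_x)
print(lhs, rhs)                         # equal up to rounding
```

Because the KL term is never negative, $\mathcal{F}(Q) \ge -\log P(X)$, with equality only when $Q$ is the exact posterior.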

14
Variational EM algorithm

Use coordinate descent on the free energy
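To show what coordinate descent on the free energy looks like, here is a generic sketch for a fully factorized (mean-field) Q on a made-up model with two binary hidden variables, not the PSMF model itself. With $q_2$ fixed, the free energy is minimized exactly by $q_1(h_1) \propto \exp\big(\mathbb{E}_{q_2}[\log P(x, h_1, h_2)]\big)$, and symmetrically for $q_2$; the joint table and starting point are arbitrary assumptions.

```python
import math

# Hypothetical joint P(x, h1, h2) at the observed x; log_joint[h1][h2].
log_joint = [[math.log(0.30), math.log(0.02)],
             [math.log(0.03), math.log(0.15)]]

def normalize_exp(logits):
    """Turn log-weights into a normalized distribution (numerically stable)."""
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [v / s for v in w]

def free_energy(q1, q2):
    """F = sum_{h1,h2} q1 q2 [log q1 + log q2 - log P(x, h1, h2)]."""
    f = 0.0
    for a in range(2):
        for b in range(2):
            w = q1[a] * q2[b]
            if w > 0:
                f += w * (math.log(q1[a]) + math.log(q2[b]) - log_joint[a][b])
    return f

q1, q2 = [0.5, 0.5], [0.2, 0.8]         # arbitrary starting point
history = [free_energy(q1, q2)]
for _ in range(10):
    # Coordinate 1: best q1 given q2.
    q1 = normalize_exp([sum(q2[b] * log_joint[a][b] for b in range(2)) for a in range(2)])
    history.append(free_energy(q1, q2))
    # Coordinate 2: best q2 given q1.
    q2 = normalize_exp([sum(q1[a] * log_joint[a][b] for a in range(2)) for b in range(2)])
    history.append(free_energy(q1, q2))
```

Each step minimizes the free energy exactly over one factor of Q with the other held fixed, so the recorded values never increase. In variational EM, the parameter updates of the M-step act as further coordinates of the same descent.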

15
Variational EM Free Energy
[Plot: free energy versus iteration]
16
Visualization
[Figure: PSMF results with $C = 50$ possible factors and at most $N = 3$ factors per gene; $P(r_g)$ = 0.55, 0.27, 0.18; genes sorted by primary transcription factor ($s_{g1}$)]
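Reading the three listed values of $P(r_g)$ as the probabilities of $r_g = 1, 2, 3$ (an interpretation, since the figure lists them without labels), they sum to one and give the expected number of factors per gene:

$$
\mathbb{E}[r_g] = 1(0.55) + 2(0.27) + 3(0.18) = 0.55 + 0.54 + 0.54 = 1.63
$$

So, on this reading, a typical gene is explained by fewer than two of the 50 possible factors.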
17
Results: p-value histograms

Genes can be partitioned into primary classes (i.e., genes sharing the same $s_{g1}$ value), secondary classes, and so on.
These classes are compared with annotated Gene Ontology biological process (GO-BP) categories to assess statistical significance.
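The slides do not say which test produced the p-values. A common choice for comparing a gene class with an annotation category is the one-sided hypergeometric test, assumed here. The sketch groups genes into primary classes by $s_{g1}$ and computes, for one class and one GO-BP category, the probability of an overlap at least as large as the observed one arising by chance. Gene names, factor indices, and the category are hypothetical, as are the helper names.

```python
from collections import defaultdict
from math import comb

def primary_classes(primary_factor):
    """Group gene ids by their primary factor index s_g1."""
    classes = defaultdict(set)
    for gene, s_g1 in primary_factor.items():
        classes[s_g1].add(gene)
    return dict(classes)

def hypergeom_sf(k, total, annotated, class_size):
    """P(overlap >= k) when class_size genes are drawn without replacement
    from `total` genes, `annotated` of which carry the GO-BP annotation."""
    upper = min(annotated, class_size)
    denom = comb(total, class_size)
    return sum(comb(annotated, i) * comb(total - annotated, class_size - i)
               for i in range(k, upper + 1)) / denom

# Hypothetical toy data: primary factor s_g1 for 10 genes, one GO-BP category.
primary = {"g0": 3, "g1": 3, "g2": 3, "g3": 7, "g4": 3,
           "g5": 7, "g6": 3, "g7": 1, "g8": 7, "g9": 1}
go_category = {"g0", "g1", "g2", "g4", "g6"}

classes = primary_classes(primary)
cls = classes[3]                        # genes whose primary factor is 3
overlap = len(cls & go_category)
p = hypergeom_sf(overlap, total=len(primary), annotated=len(go_category),
                 class_size=len(cls))
print(overlap, p)
```

Secondary classes would be formed the same way from $s_{g2}$, and in practice the p-values for many class and category pairs would need a multiple-testing correction.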

18
Results: mean $\log_{10}$ p-values
19
Results: count of significant p-values
20
Future directions: different choices of Q
Iterated conditional modes (point estimates)
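Iterated conditional modes replaces each distribution in Q with a point estimate and repeatedly sets one group of variables to its best value given the rest. The sketch below is a deliberately simplified, hypothetical version: one factor per gene ($r_g = 1$), squared error standing in for the log-probability (equivalent to equal noise levels and no priors), and random toy data. The function name `icm_one_factor` is illustrative, not from the source.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def icm_one_factor(X, C, sweeps=10, seed=0):
    """Hard-assignment coordinate descent for x_g ~ y_g * z_{s_g} (one factor per gene).

    Returns assignments s, weights y, profiles Z, and the squared error after each sweep.
    """
    rng = random.Random(seed)
    G, T = len(X), len(X[0])
    Z = [list(X[g]) for g in rng.sample(range(G), C)]   # initialize profiles from data rows
    s, y, errors = [0] * G, [0.0] * G, []
    for _ in range(sweeps):
        # Step 1: each gene takes the factor and weight that minimize its own error.
        for g, x in enumerate(X):
            best = None
            for c, z in enumerate(Z):
                zz = dot(z, z)
                if zz == 0.0:
                    continue
                w = dot(x, z) / zz                      # least-squares weight
                err = dot(x, x) - w * dot(x, z)         # residual ||x - w z||^2
                if best is None or err < best[0]:
                    best = (err, c, w)
            _, s[g], y[g] = best
        # Step 2: each profile is the least-squares fit to its assigned genes.
        for c in range(C):
            members = [g for g in range(G) if s[g] == c]
            denom = sum(y[g] ** 2 for g in members)
            if denom > 0.0:
                Z[c] = [sum(y[g] * X[g][t] for g in members) / denom for t in range(T)]
        errors.append(sum((X[g][t] - y[g] * Z[s[g]][t]) ** 2
                          for g in range(G) for t in range(T)))
    return s, y, Z, errors

rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(20)]   # toy data, 20 genes x 6 tissues
s, y, Z, errors = icm_one_factor(X, C=4)
```

Each step is an exact minimization over one block of variables, so the error after each sweep never increases; unlike the variational approach, though, it keeps no record of uncertainty in the factor assignments.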
21
Summary

Introduced probabilistic sparse matrix factorization (PSMF), in which each row of the data matrix is a linear combination of a small number of hidden factors selected from a larger set.
Described a variational inference algorithm for fitting the PSMF model.
Evaluated the model on a gene function prediction task.
