Probabilistic Sparse Matrix Factorization

About This Presentation
Title:

Probabilistic Sparse Matrix Factorization

Description:

xg (g=10056), a row vector of length T=55. Sparse Matrix Factorization ... Compare classes with annotated gene ontology (GO-BP) categories for statistical significance ...

Slides: 17
Provided by: delb151

Transcript and Presenter's Notes

1
Probabilistic Sparse Matrix Factorization
  • Delbert Dueck, Quaid Morris, Brendan Frey
    (Probabilistic and Statistical Inference Group)
  • Tim Hughes (Banting and Best Department of
    Medical Research)

2
Objective
  • Patterns in gene expression array data can be
    used to help understand gene regulation and
    predict the function of yet-uncharacterized genes
  • Objective: to develop a method of probabilistic
    sparse matrix factorization (PSMF) and apply it
    to gene expression data to learn the hidden
    structure underlying the data.

3
Biological Background
  • Genes encode basic information about an organism
  • They tend to be highly expressed in tissues
    related to their functional role
  • Mouse gene expression data is from Zhang, Morris,
    et al. (2004)
  • Gene expression is influenced by the presence of
    transcription factors (TFs)
  • Co-expressed genes are likely activated by the
    same TFs
  • The activity of each gene can be explained by the
    activities of a small number of transcription
    factors

4
Gene Expression Array Dataset
Entire data set X: G × T matrix (G = 22709 genes, T = 55 tissues)
[Figure: expression matrix; axis labels "G = 22709 genes", "100 genes", "T = 55 tissues"]
5
Sparse Matrix Factorization
  • Gene expression data model:
  • Each gene's expression profile (x_g) is
  • a linear combination (weighted by y_gc, c ∈ s_g)
  • of a small number (r_g ≤ N)
  • of C possible transcription factor profiles (z_c,
    c ∈ s_g)
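
The model on this slide can be sketched numerically. This is a minimal illustration, not the authors' code: the factor profiles `Z`, the assignment set `s_g`, and the weights `y_g` are made-up values, and only C = 50 and T = 55 are taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

C, T = 50, 55                      # C candidate factor profiles, T tissues
Z = rng.standard_normal((C, T))    # hypothetical factor profiles z_c, one per row

s_g = np.array([4, 17])            # hypothetical factor assignments for one gene (r_g = 2)
y_g = np.array([1.5, -0.7])        # hypothetical weights y_gc for c in s_g

# Noise-free part of the model: x_g = sum over c in s_g of y_gc * z_c
x_g = y_g @ Z[s_g]                 # row vector of length T
```

Only the r_g selected rows of `Z` contribute to `x_g`; every other factor has an implicit weight of zero, which is where the sparsity comes from.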

6
Sparse Matrix Factorization
Matrix format (entire dataset)
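Stacking the per-gene model over all G genes gives a matrix form. The following is a sketch under stated assumptions: Y collects the weights y_gc, Z stacks the profiles z_c as rows, and the noise term E is a name introduced here for illustration.

```latex
X = Y Z + E, \qquad
X \in \mathbb{R}^{G \times T},\quad
Y \in \mathbb{R}^{G \times C},\quad
Z \in \mathbb{R}^{C \times T},
\qquad
y_{gc} = 0 \ \text{for } c \notin s_g, \quad |s_g| = r_g \le N .
```

Each row of Y therefore has at most N nonzero entries, so Y is sparse when N is much smaller than C.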
8
Probabilistic Sparse Matrix Factorization
  • To express the model as a probability distribution, assume:
  • varying levels of Gaussian noise in the data
  • nothing about the transcription factor weights
  • normally distributed transcription factor
    profiles
  • uniformly distributed factor assignments
  • multinomially distributed factor counts
  • Multiply these together to get the joint distribution
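
One way to write out the product of these assumptions is sketched below. Several symbols are assumptions introduced here rather than taken from the slides: psi_g is a gene-specific noise standard deviation (one reading of "varying levels" of noise), the factor profiles are given zero mean and unit variance, nu_n denotes the probability that r_g = n, and s_gn denotes the n-th factor assigned to gene g.

```latex
\begin{aligned}
P(X, Y, Z, S, r) &= P(X \mid Y, Z, S, r)\, P(Y)\, P(Z)\, P(S \mid r)\, P(r), \\
P(X \mid Y, Z, S, r) &= \prod_{g=1}^{G} \prod_{t=1}^{T}
  \mathcal{N}\!\Big(x_{gt};\ \sum_{n=1}^{r_g} y_{g s_{gn}}\, z_{s_{gn} t},\ \psi_g^2\Big), \\
P(Y) &\propto 1, \\
P(Z) &= \prod_{c=1}^{C} \prod_{t=1}^{T} \mathcal{N}(z_{ct};\ 0,\ 1), \\
P(S \mid r) &= \prod_{g=1}^{G} \prod_{n=1}^{r_g} \frac{1}{C}, \\
P(r) &= \prod_{g=1}^{G} \nu_{r_g}.
\end{aligned}
```

The flat term for Y encodes "nothing assumed about the weights"; the remaining four terms correspond, in order, to the noise, profile, assignment, and count assumptions listed above.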

10
Factorized Variational Inference
  • Exact inference under P() is intractable
  • Approximate P() by a simpler distribution, Q(),
    and perform inference on that instead

12
Factorized Variational Inference
  • Parameterize Q()
  • Q() accounts for noise in the transcription factor
    profiles and uncertainty in transcription factor
    selection
  • Minimize the KL divergence between Q() and P()
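
The link between this KL divergence and the free energy used in variational methods can be written out as follows. This is the standard identity, sketched under two assumptions: H is shorthand introduced here for all hidden variables, and the divergence is taken from Q to the posterior P(H | X), the usual direction in variational inference.

```latex
\begin{aligned}
\mathrm{KL}\big(Q \,\|\, P(\cdot \mid X)\big)
  &= \mathbb{E}_{Q}\big[\log Q(H) - \log P(H \mid X)\big] \\
  &= \underbrace{\mathbb{E}_{Q}\big[\log Q(H) - \log P(X, H)\big]}_{F(Q)} + \log P(X).
\end{aligned}
```

Because log P(X) does not depend on Q, minimizing the free energy F(Q) over the parameters of Q is equivalent to minimizing the KL divergence.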

14
Variational EM algorithm
  • Use coordinate descent on free energy
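
The idea of coordinate descent, updating one block of variables while holding the others fixed, can be illustrated on a much simpler objective. The sketch below is not the authors' variational EM: it uses point estimates, allows exactly one factor per gene, and minimizes squared reconstruction error rather than the free energy. The function names `fit` and `objective` and the synthetic data are hypothetical.

```python
import numpy as np

def objective(X, Z, s, y):
    """Squared error of approximating each row x_g by y_g * z_{s_g}."""
    return float(np.sum((X - y[:, None] * Z[s]) ** 2))

def fit(X, C, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    G, T = X.shape
    Z = rng.standard_normal((C, T))              # random initial factor profiles
    history = []
    for _ in range(n_iter):
        # Block 1: with Z fixed, pick the best factor and weight for each gene.
        dots = X @ Z.T                           # (G, C) inner products x_g . z_c
        norms = np.maximum(np.sum(Z ** 2, axis=1), 1e-12)  # guard against zero rows
        s = np.argmax(dots ** 2 / norms, axis=1) # factor giving the largest error reduction
        y = dots[np.arange(G), s] / norms[s]     # least-squares weight for that factor
        # Block 2: with assignments and weights fixed, refit each factor profile.
        for c in range(C):
            members = s == c
            denom = np.sum(y[members] ** 2)
            if denom > 0:                        # leave unused factors unchanged
                Z[c] = (y[members] @ X[members]) / denom
        history.append(objective(X, Z, s, y))
    return Z, s, y, history

# Synthetic data: each gene is a scaled copy of one planted factor plus noise.
rng = np.random.default_rng(1)
G, T, C = 60, 55, 5
true_Z = rng.standard_normal((C, T))
true_s = rng.integers(0, C, size=G)
X = rng.uniform(0.5, 2.0, size=G)[:, None] * true_Z[true_s]
X = X + 0.1 * rng.standard_normal((G, T))

Z, s, y, history = fit(X, C)
```

Each block update is an exact minimization over its own variables, so the recorded objective in `history` should not increase from one iteration to the next, mirroring the way the free energy is driven down step by step.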

15
Variational EM
[Figure: free energy vs. iteration]
16
Visualization
Probabilistic sparse matrix factorization: C = 50
possible factors, N = 3 factors per gene (max)
P(r_g) = (.55, .27, .18)
Sorted by primary transcription factor (s_g1)
17
Results: p-value histograms
  • Genes can be partitioned into primary
    classes (i.e. same s_g1 value), secondary
    classes, etc.
  • Compare classes with annotated gene ontology
    (GO-BP) categories for statistical significance
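
One common way to score the overlap between a gene class and an annotation category is a hypergeometric tail probability. The slides do not say which test was used, so the choice of test here is an assumption, and the function name `hypergeom_pvalue` and the example counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(M, K, n, k):
    """P(overlap >= k) when n genes are drawn without replacement
    from M genes, K of which carry the annotation."""
    upper = min(K, n)
    total = comb(M, n)
    tail = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical example: 10 genes in total, 5 annotated, a class of 5 genes,
# all 5 of which are annotated.
p = hypergeom_pvalue(10, 5, 5, 5)
```

A small p-value indicates that the class contains more annotated genes than would be expected by chance; with many classes and categories tested, a multiple-testing correction would normally be applied before calling a result significant.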

18
Results: mean log10 p-values
19
Results: count of significant p-values
20
Future Directions: different Q()
  • Iterated conditional modes (point estimates)
21
Summary
  • Introduced probabilistic sparse matrix
    factorization (PSMF), in which each row of the
    data matrix is a linear combination of a small
    number of hidden factors selected from a larger set.
  • Described a variational inference algorithm for
    fitting the PSMF model.
  • Evaluated the model on a gene function prediction
    task.