Properties of sample covariance matrices from modern high-dimensional genomic data sets (RSS Manchester)

1
Properties of sample covariance matrices from
modern high-dimensional genomic data sets
RSS Manchester, October 2008
  • David Hoyle
  • Faculty of Medicine,
  • University of Manchester, UK
  • david.hoyle_at_manchester.ac.uk

2
Outline
  • Molecular Biology
  • PCA in Molecular Biology
  • Pathologies of large covariance matrices

3
Molecular Biology
Biological system
Data are complex, high-dimensional and noisy
4
Exploratory Analysis
  • Interested in detecting patterns in data

Covariance is important. Tools such as PCA are often used.
5
PCA
  • This is Principal Component Analysis (PCA)
  • Standard algorithm of multivariate statistics
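
For readers who want the algorithm spelled out, a minimal sketch of PCA via eigen-decomposition of the sample covariance matrix follows. The function name `pca`, the (n_samples, n_dims) data layout and the 1/n normalisation are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def pca(X, n_components):
    """PCA of an (n_samples, n_dims) data matrix via its sample covariance.

    Hypothetical helper for illustration only.
    """
    Xc = X - X.mean(axis=0)                   # centre each variable
    C = Xc.T @ Xc / X.shape[0]                # sample covariance, 1/n normalisation
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # re-order to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]    # leading principal directions
    scores = Xc @ components                  # data projected onto those directions
    return eigvals, components, scores
```

The full list of eigenvalues is returned because the later slides are concerned with the whole sample spectrum, not only the retained components.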

6
PCA in Molecular Biology (1)
  • PCA → Diagnostic, Prognostic

7
PCA in Molecular Biology (2)
  • PCA → Visualization, QA

8
PCA in Molecular Biology (3)
PCA → Correct for stratification within GWA studies
Price et al., Nature Genetics 38:904-909, 2006
9
Model Selection
  • How to select S?
  • Retain 90% of variance?
  • Elbow in scree plot?
  • Deviation beyond MP edge?
  • Sampling variation
  • Johnstone, Ann. Stat. (2001)

Asymptotically, the estimate of S is the number of sample eigenvalues beyond the edge
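
The "deviation beyond MP edge" criterion can be sketched as follows. For pure isotropic noise of variance sigma2, the Marchenko-Pastur (MP) bulk of a 1/n-normalised sample covariance has upper edge sigma2 * (1 + sqrt(n_dims / n_samples))^2. The function names are hypothetical, and the sketch assumes sigma2 is known, which in practice it usually is not.

```python
import numpy as np

def mp_upper_edge(sigma2, n_dims, n_samples):
    """Upper edge of the Marchenko-Pastur bulk for isotropic noise."""
    return sigma2 * (1.0 + np.sqrt(n_dims / n_samples)) ** 2

def count_beyond_edge(eigvals, sigma2, n_dims, n_samples):
    """Naive estimate of S: number of sample eigenvalues above the MP edge."""
    edge = mp_upper_edge(sigma2, n_dims, n_samples)
    return int(np.sum(np.asarray(eigvals, dtype=float) > edge))
```

At finite sample size the largest noise eigenvalue fluctuates around the edge, which is the sampling variation the slide attributes to Johnstone (2001), so a hard threshold like this one can miscount near the edge.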
10
Goals
  • To understand effects of high-dimensionality and
    small sample size on PCA
  • Can we use understanding to improve estimation
    algorithms for high-dimensional data ?

11
Approaches
  • Sample points are random vectors
  • Sample covariance is a random matrix
  • Use tools of Random Matrix Theory (RMT)

12
RMT
  • History in both Physics & Statistics
  • Wishart distribution (1928)
  • Semi-circle law of Wigner (1955)
  • Physics - paradigms for real systems
  • Statisticians interested in covariance
  • Communities have diverged

13
Physics, Statistics & RMT
  • Use physics techniques to study the statistics
    problem
  • Two aspects to focus upon
  • Accuracy of sample eigenvalues
  • Accuracy of sample eigenvectors

14
Statistical Physics of Learning
[Figure: vectors B and J, with overlap $R = J \cdot B$]
  • Statistical Mechanics of Learning, Engel & Van
    den Broeck
  • Advanced Mean Field Methods, Opper & Saad
  • Introduction to the Theory of Neural
    Computation, Hertz, Palmer, Krogh
  • Statistical Physics of Spin Glasses and
    Information Processing, Nishimori

15
Learning in PCA
$C = \sigma^2 I + \sigma^2 A B B^T$
Weak Signal-to-Noise
A 1 d
Or N large and p small
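
To make the model concrete, here is a small sketch that builds a population covariance with a single signal direction and draws Gaussian samples from it. It reads the slide's covariance as sigma2 * (I + A * B B^T) with B normalised to unit length; that reading, the Gaussian assumption and the function names are all assumptions made for illustration.

```python
import numpy as np

def spiked_covariance(B, A, sigma2):
    """Population covariance sigma2 * (I + A * B B^T), with B rescaled to unit length."""
    B = np.asarray(B, dtype=float)
    B = B / np.linalg.norm(B)
    return sigma2 * (np.eye(B.size) + A * np.outer(B, B))

def sample_spiked(n_samples, B, A, sigma2, rng):
    """Draw zero-mean Gaussian samples whose covariance is spiked_covariance(B, A, sigma2)."""
    B = np.asarray(B, dtype=float)
    B = B / np.linalg.norm(B)
    noise = rng.normal(size=(n_samples, B.size))   # isotropic part
    signal = rng.normal(size=(n_samples, 1))       # one latent coordinate along B
    return np.sqrt(sigma2) * (noise + np.sqrt(A) * signal * B)
```

The population covariance has one eigenvalue sigma2 * (1 + A) along B and sigma2 in every other direction, so A plays the role of a signal-to-noise ratio.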
16
Connection to Statistics
  • "Spiked covariance" model, Johnstone (2001)
  • Learning curve & phase transition obtained by D.
    Paul,
    Statistica Sinica 17 (2007), 1617-1642
  • Johnstone, Baik, El Karoui, Bai, Silverstein,
    Péché
  • In general

17
Learning in PCA
Large sample size scenario is easy:
T.W. Anderson, Ann. Math. Stat., 1963
ML estimators are okay for large sample sizes
18
Learning Theory
  • Asymptotic limit achieves perfect learning
  • But we don't have lots of data!
  • Study scenario where

19
Model
  • Data
  • Learning

20
Retarded Learning
Asymptotic theory
Replica Analysis: Reimann et al., J. Phys. A 29 (1996) 3521;
Hoyle & Rattray, Europhys. Lett. 62 (2003) 117
Variational Bounds: Herschkowitz & Opper, Phys. Rev. Lett. 86
(2001) 2174
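
The retarded-learning transition can be illustrated with the standard asymptotic result for a single-spike model, of the kind derived in the papers cited above. The squared overlap between the top sample eigenvector and the true direction B is zero until alpha * A^2 exceeds 1, and then grows towards 1. The notation is an assumption: alpha is taken to be the number of samples divided by the dimension, and A is the signal strength in a covariance of the form sigma2 * (I + A * B B^T). The function name is hypothetical.

```python
def squared_overlap(alpha, A):
    """Asymptotic squared overlap R^2 between top sample eigenvector and B.

    alpha = n_samples / n_dims (assumed notation), A = signal strength.
    """
    if alpha * A ** 2 <= 1.0:
        return 0.0                       # below the transition: nothing is learned
    return (alpha * A ** 2 - 1.0) / (A * (1.0 + alpha * A))
```

For example, with A = 1 the overlap stays at zero for alpha up to 1, and as alpha grows the expression tends to 1, which matches the statement that the asymptotic limit achieves perfect learning.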
21
Transition in top eigenvalue
  • Quantity maximized is Rayleigh quotient of sample
    covariance matrix
  • Maximum value of quotient gives top eigenvalue
  • Phase transition in the overlap R
    implies phase transition in the top eigenvalue
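
The link between the Rayleigh quotient and the top eigenvalue can be checked numerically. The sketch below uses hypothetical helper names and a small symmetric matrix standing in for a sample covariance.

```python
import numpy as np

def rayleigh_quotient(C, w):
    """Rayleigh quotient w^T C w / w^T w of a symmetric matrix C."""
    w = np.asarray(w, dtype=float)
    return float(w @ C @ w / (w @ w))

def top_eigenpair(C):
    """Largest eigenvalue of symmetric C and its eigenvector."""
    vals, vecs = np.linalg.eigh(C)       # ascending order
    return float(vals[-1]), vecs[:, -1]
```

Evaluating `rayleigh_quotient` at the eigenvector returned by `top_eigenpair` reproduces the top eigenvalue, and any other direction gives a value no larger than it.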

22
Top Eigenvalue
Hoyle & Rattray, Europhysics Letters 62 (2003), 117
Hoyle & Rattray, Phys. Rev. E 69 (2004), 026124
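
As a hedged illustration of the transition in the top eigenvalue, the standard asymptotic result for a single-spike covariance sigma2 * (I + A * B B^T) is sketched below. The notation is assumed, with alpha the number of samples divided by the dimension, and the function name is hypothetical. Below the transition the top sample eigenvalue sits at the bulk edge; above it, the eigenvalue separates from the bulk.

```python
import math

def expected_top_eigenvalue(alpha, A, sigma2=1.0):
    """Asymptotic top sample eigenvalue for a single spike of strength A.

    alpha = n_samples / n_dims (assumed notation).
    """
    if alpha * A ** 2 <= 1.0:
        # Spike too weak to detect: top eigenvalue sticks to the bulk edge.
        return sigma2 * (1.0 + 1.0 / math.sqrt(alpha)) ** 2
    # Spike detectable: eigenvalue detaches, biased above sigma2 * (1 + A).
    return sigma2 * (1.0 + A) * (1.0 + 1.0 / (alpha * A))
```

The two branches agree at alpha * A^2 = 1, so the top eigenvalue is continuous through the transition, and above it the sample eigenvalue overestimates the population value sigma2 * (1 + A).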
23
Replica Analysis - Eigenvectors
Hoyle & Rattray, Phys. Rev. E 75 (2007), 016101
24
Replica Analysis - Eigenspectra
Hoyle & Rattray, Phys. Rev. E 69 (2004), 026124
25
Replica Analysis - Eigenvectors
26
Summary
  • Understanding the properties of standard
    algorithms with high-dimensional data is
    important
  • Common areas between statistical inference &
    statistical physics
  • Considering the distinguished asymptotic limit
    is important
  • Can recover accurate estimates of population
    covariance eigenvalues
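
One way such a correction could work, sketched under assumptions and not necessarily the estimator used in the talk, is to invert the standard asymptotic relation lambda = sigma2 * (1 + A) * (1 + 1 / (alpha * A)) between an observed top sample eigenvalue and the signal strength A of a single-spike model. Here alpha is assumed to be the number of samples divided by the dimension, sigma2 is assumed known, and the function name is hypothetical.

```python
import math

def estimate_signal_strength(top_eigenvalue, alpha, sigma2=1.0):
    """Invert lambda = sigma2 * (1 + A) * (1 + 1/(alpha * A)) for A.

    Returns None when the eigenvalue is not beyond the bulk edge,
    since the relation cannot then be inverted.
    """
    t = top_eigenvalue / sigma2
    edge = (1.0 + 1.0 / math.sqrt(alpha)) ** 2
    if t <= edge:
        return None
    # The relation rearranges to alpha*A^2 - b*A + 1 = 0 with b as below.
    b = alpha * t - 1.0 - alpha
    # The larger root is the one satisfying alpha * A^2 > 1.
    return (b + math.sqrt(b * b - 4.0 * alpha)) / (2.0 * alpha)
```

With alpha = 4 and an observed top eigenvalue of 2.5 (sigma2 = 1), this gives A = 1, that is, a corrected population eigenvalue of sigma2 * (1 + A) = 2 in place of the inflated sample value 2.5.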

27
Thank you