Pathway Analysis - PowerPoint PPT Presentation

Slides: 35
Provided by: MarkRe80
Transcript and Presenter's Notes

1
Pathway Analysis
2
Goals
  • Characterize biological meaning of joint changes
    in gene expression
  • Organize expression (or other) changes into
    meaningful chunks (themes)
  • Identify crucial points in process where
    intervention could make a difference
  • Why? Biology is Redundant! Often sets of genes
    doing related functions are changed

3
Gene Sets
  • Gene Ontology
  • Biological Process
  • Molecular Function
  • Cellular Location
  • Pathway Databases
  • KEGG
  • BioCarta
  • Broad Institute

4
Other Gene Sets
  • Transcription factor targets
  • All the genes regulated by particular TFs
  • Protein complex components
  • Sets of genes whose protein products function
    together
  • Ion channel receptors
  • RNA / DNA Polymerase
  • Paralogs
  • Families of genes descended (in eukaryotic times)
    from a common ancestor

5
Approaches
  • Univariate
  • Derive summary statistics for each gene
    independently
  • Group statistics of genes by gene group
  • Multivariate
  • Analyze covariation of genes in groups across
    individuals
  • More adaptable to continuous statistics

6
Univariate Approaches
  • Discrete tests: enrichment for groups in gene
    lists
  • Select genes differentially expressed at some
    cutoff
  • For each gene group cross-tabulate
  • Test for significance (Hypergeometric or Fisher
    test)
  • Continuous tests: from gene scores to group
    scores
  • Compare distribution of scores within each group
    to random selections
  • GSEA (Gene Set Enrichment Analysis)
  • PAGE (Parametric Analysis of Gene Expression)

7
Multivariate Approaches
  • Classical multivariate methods
  • Multi-dimensional Scaling
  • Hotelling's T²
  • Informativeness
  • Topological score relative to network
  • Prediction by machine learning tool
  • e.g. random forest

8
Contingency Table (2 x 2)
                     Signif. Genes   NS Genes       Total
Group of Interest    k               n-k            n
Others               K-k             (N-n)-(K-k)    N-n
Total                K               N-K            N
P(k) = C(K,k) C(N-K,n-k) / C(N,n)   (hypergeometric probability of the table with margins fixed)
9
Categorical Analysis
  • Fisher's Exact Test
  • Condition on margins fixed
  • Of all tables with same margins, how many have
    dependence as or more extreme?
  • Hard to compute when n or k are large
  • Approximations
  • Binomial (when k/n is small)
  • Chi-square (when expected values > 5)
  • G² (log-likelihood ratio; compare to χ²)
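
As a minimal sketch of the exact test on the 2 x 2 table, the one-sided enrichment p-value can be computed directly from the hypergeometric distribution. The function name `hypergeom_enrichment_p` is hypothetical; the arguments follow the table's notation (k significant genes in a group of n, K significant genes among N in total). The slides do not say whether a one-sided or two-sided test was intended, so the one-sided "k or more" tail is an assumption.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) with all table margins held fixed.

    k: significant genes inside the group of interest
    n: size of the group of interest
    K: significant genes overall
    N: all genes tested
    """
    total = comb(N, n)          # ways to choose the group from all genes
    upper = min(n, K)           # the overlap cannot exceed either margin
    # Sum the probabilities of every table at least as extreme as the observed one.
    tail = sum(comb(K, j) * comb(N - K, n - j) for j in range(k, upper + 1))
    return tail / total

# Example: 3 of 3 group genes are significant, with 3 significant genes among 6.
p = hypergeom_enrichment_p(3, 3, 3, 6)
```

Python integers are arbitrary precision, so the sum stays exact even for large n or k; it simply becomes slow, which is where the approximations listed above come in.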

10
Issues in Assessing Significance
  • P-value or FDR?
  • Heuristic: only use FDR
  • If a child category is significant, how to assess
    significance of parent category?
  • Include child category
  • Consider only genes outside child category
  • What is appropriate Null Distribution?
  • Random sets of genes? Or
  • Random assignments of samples?
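
The slide does not name an FDR procedure. As one common choice (an assumption here, not something the slides state), the Benjamini-Hochberg adjustment turns a list of gene-group p-values into FDR-adjusted values. The function name `benjamini_hochberg` is hypothetical, and the procedure assumes the tests are independent or positively dependent, which nested categories may violate.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Example: adjusted values for four gene groups.
q = benjamini_hochberg([0.01, 0.02, 0.03, 0.04])
```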

11
Critiques of Discrete Approach
  • No use of information about size of change
  • Continuous procedures usually have twice the
    power of analogous discrete procedures on
    discretized continuous data
  • No use of covariation: knowing covariation
    usually improves power of test

12
(No Transcript)
13
(2003)
14
GSEA
  • Uses Kolmogorov-Smirnov (K-S) test of
    distribution equality to compare t-scores for
    selected gene group with all genes

15
Update Fixes a Problem
  • Sometimes ranks concentrated in middle
  • Hack: ad hoc weighting by scores emphasizes peaks
    at extremes
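
A rough sketch of the running-sum statistic behind GSEA, covering both the original K-S-style version and the score-weighted update. The function name `enrichment_score` and the `weight` parameter are hypothetical; the weighting scheme (hits step up in proportion to |score|^weight, misses step down uniformly) follows the commonly described form of the updated method and is an assumption about what the slide refers to. Setting `weight=0` recovers the unweighted, K-S-like statistic.

```python
def enrichment_score(scores, gene_set, weight=1.0):
    """Signed maximum deviation of the running sum over the ranked gene list.

    scores:   dict mapping gene -> score (e.g. a t-score)
    gene_set: collection of genes in the group being tested
    weight:   exponent on |score| for hits; 0 gives equal-sized steps
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g in gene_set for g in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_weights = [abs(scores[g]) ** weight if h else 0.0
                   for g, h in zip(ranked, hits)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all hit scores are zero; use weight=0")
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        # Step up on a hit, down on a miss; both kinds of step sum to 1 overall.
        running += w / total_hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

t_scores = {"a": 3.0, "b": 2.0, "c": 1.0, "d": -1.0, "e": -2.0, "f": -3.0}
es_top = enrichment_score(t_scores, {"a", "b"}, weight=0)
es_bottom = enrichment_score(t_scores, {"e", "f"}, weight=0)
```

Significance would still need a null distribution, obtained by permuting either sample labels or gene labels.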

16
(No Transcript)
17
Group Z- or T- Scores
  • Under Null Hypothesis, each gene's z-score (z_i)
    is distributed N(0,1)
  • Hence the sum over genes in a group G, divided by
    sqrt(|G|), is also N(0,1) if genes are independent
  • Identify which groups have highest scores
  • Same issues as discrete
  • Null Distribution: permute which indices?
  • Hierarchy
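
A small sketch of the group score idea, assuming the usual scaling of the summed z-scores by the square root of the group size (the slide's own formula was not transcribed). The names `group_z` and `gene_permutation_p` are hypothetical. The null used here permutes gene labels, which is only one of the two options the slide raises, and it ignores correlation between genes.

```python
import math
import random

def group_z(z, group):
    """Sum of member z-scores scaled by sqrt(group size)."""
    members = [z[g] for g in group]
    return sum(members) / math.sqrt(len(members))

def gene_permutation_p(z, group, n_perm=1000, seed=0):
    """Two-sided p-value against random gene sets of the same size."""
    rng = random.Random(seed)
    observed = abs(group_z(z, group))
    genes = list(z)
    size = len(group)
    exceed = 0
    for _ in range(n_perm):
        random_group = rng.sample(genes, size)
        if abs(group_z(z, random_group)) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (exceed + 1) / (n_perm + 1)

z_scores = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0}
score = group_z(z_scores, ["g1", "g2"])
p_perm = gene_permutation_p(z_scores, ["g1", "g2"], n_perm=200)
```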

18
Issues for Pathway Methods
  • How to assess significance?
  • Null distribution by permutations
  • Permute genes or samples?
  • How to handle activators and inhibitors in the
    same pathway?
  • Variance Test
  • Other approaches

19
Pathway Analysis of Genotype Data
20
The Pathways Proposal
  • Complex disease ensues from the malfunction of
    one or a few specific signaling pathways
  • Alternatives
  • Common variants of several genes in the pathway
    each contribute moderate risk
  • Rare de novo variants confer great risk and
    persist for generations in LD with typed markers
    within unidentified subpopulations of the study
    group

21
Approach 1 - Adaptation of GSEA
  • Order log-odds ratios or linkage p-values for all
    SNPs
  • Map SNPs to genes, and genes to groups
  • Use linkage p-values in place of t-scores in GSEA
  • Compare distribution of log-odds ratios for SNPs
    in group to randomly selected SNPs from the chip

22
Possible Association Models
  1. Each of several genes may have a variant that
    confers increased RR independent of other genes
  2. Several genes in the pathway contribute additively
    to the malfunction of the pathway
  3. There are several distinct combinations of gene
    variants that increase RR but only modest
    increases in risk for any single variant

23
Approach 2 - Combining p-values
  • 1. Compute gene-wise p-value
  • Select most likely variant - best p-value
  • Selected minimum p-value is biased downward
  • Assign gene-wise p-value by permutations
    (Westfall-Young)
  • Permute samples and compute best p-value for
    each permutation
  • Compare candidate SNP p-values to this null
    distribution of best p-values
  • 2. Combine p-values by Fisher's method
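
The two steps above can be sketched as follows, under stated assumptions. `min_p_adjusted` and `fisher_combine` are hypothetical names. The first takes SNP p-values that have already been recomputed for each permutation of the sample labels; how those are computed is outside this sketch. The second assumes the gene-wise p-values are independent and strictly positive, and uses the closed form of the chi-square tail for even degrees of freedom.

```python
import math

def min_p_adjusted(observed_snp_pvalues, permuted_snp_pvalues):
    """Gene-wise p-value: compare the best observed SNP p-value with the
    best p-value obtained in each permutation of the sample labels.

    permuted_snp_pvalues: one list of SNP p-values per permutation.
    """
    observed_min = min(observed_snp_pvalues)
    null_minima = [min(perm) for perm in permuted_snp_pvalues]
    as_extreme = sum(1 for m in null_minima if m <= observed_min)
    return (as_extreme + 1) / (len(null_minima) + 1)

def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df
    under the null, for k independent p-values."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Chi-square upper tail for even df 2k:
    #   exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total

gene_p = min_p_adjusted([0.01, 0.2], [[0.5, 0.3], [0.005, 0.9], [0.2, 0.4]])
combined = fisher_combine([0.05, 0.05])
```

With only three permutations the adjusted p-value is very coarse; in practice many more would be needed.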

24
Methods 2
  • Additive model: logit P(case) = b_0 + sum over i in G of b_i n_i
  • Where n_i is the number of B alleles of a
    SNP in gene i in the gene set G
  • Select subset of most likely SNPs
  • Fit by logistic regression (glm() in R)
  • Significance by permutations
  • Permute sample outcomes
  • Select genes and fit logistic regression again
  • Assess goodness of fit each time
  • Compare observed goodness of fit to the
    permutation distribution

25
Multivariate Approaches to Gene Set Analysis
26
Key Multivariate Ideas
  • PCA (Principal Components Analysis)
  • SVD (Singular Value Decomposition)
  • MDS (Multi-dimensional Scaling)
  • Hotelling's T²

27
PCA
PC1 lies along the direction of maximal
variation; PC2 lies at right angles to it, along the
direction of next-highest variation.
Figure: three correlated variables
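
To make the "direction of maximal variation" concrete, here is a minimal sketch that finds the first principal component by power iteration on the sample covariance matrix. Power iteration is a stand-in chosen to avoid external libraries, not necessarily what the presenter used, and `first_principal_component` is a hypothetical name. It assumes the starting vector is not orthogonal to the first component and that the data are not constant.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (unit direction, variance along it) for the leading component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Asymmetric start to reduce the chance of being orthogonal to PC1.
    v = [float(j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance explained by the component: v' C v.
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                   for a in range(p))
    return v, variance

# Three perfectly correlated variables: PC1 points along (1, 1, 1).
data = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (3.0, 3.0, 3.0)]
direction, var1 = first_principal_component(data)
```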
28
Multi-Dimensional Scaling
  • Aim to represent graphically the most
    information about relationships among samples
    with multi-dimensional attributes in 2 (or 3)
    dimensions
  • Algorithm
  • Transform distances into cross-product matrix
  • Initial PCA onto 2 (or 3) axes
  • Deform until better representation
  • Minimize strain measure
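
The first step of the algorithm above, turning distances into a cross-product matrix, can be sketched with the double-centering used in classical MDS. The function name `double_center` is hypothetical, and the sketch assumes a symmetric matrix of Euclidean distances. The later steps (projection onto 2 or 3 axes, iterative deformation) are not shown.

```python
def double_center(dist):
    """Convert a symmetric distance matrix into a cross-product (Gram) matrix.

    B[i][j] = -1/2 * (d_ij^2 - rowmean_i - rowmean_j + grandmean),
    where the means are taken over the squared distances.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Symmetry means column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)]
            for i in range(n)]

# Three samples lying on a line at positions 0, 1 and 3.
distances = [[0.0, 1.0, 3.0],
             [1.0, 0.0, 2.0],
             [3.0, 2.0, 0.0]]
gram = double_center(distances)
```

For Euclidean distances the result equals the matrix of inner products of the mean-centered coordinates, so its leading eigenvectors give the initial PCA-style layout.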

29
Separating Using MDS
Left: distributions of individual variables.
Right: MDS plot (in this case PCA).
30
Multivariate Approaches to Selection
  • Visualizing differences by MDS
  • Hotelling's T-squared

31
MDS for Pathways
  • BAD pathway
  • Normal
  • IBC
  • Other BC
  • Clear separation between groups
  • Variation differences

32
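Hotelling's T²
  • Compute distance between sample means using
    (common) metric of covariation:
    T² = (n1 n2 / (n1 + n2)) (m1 - m2)' S^-1 (m1 - m2)
  • Where m1, m2 are the group mean vectors, n1, n2 the
    group sizes, and S the pooled covariance matrix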
  • Multidimensional analog of t (actually F)
    statistic
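
A minimal two-sample sketch of the statistic for the special case of two genes, so that the covariance matrix can be inverted by hand. The function name `hotelling_t2` is hypothetical. It uses the standard pooled-covariance form and the standard conversion to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom, assuming both groups share one covariance matrix and that matrix is non-singular.

```python
def hotelling_t2(group1, group2):
    """Two-sample Hotelling's T-squared for 2 variables.

    Returns (t2, f) where f is the equivalent F statistic
    on (2, n1 + n2 - 3) degrees of freedom.
    """
    n1, n2 = len(group1), len(group2)
    p = 2

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(p)]

    def scatter(rows, m):
        # Sum of outer products of deviations from the group mean.
        return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows)
                 for b in range(p)] for a in range(p)]

    m1, m2 = mean(group1), mean(group2)
    s1, s2 = scatter(group1, m1), scatter(group2, m2)
    # Pooled covariance: combined scatter over combined degrees of freedom.
    s = [[(s1[a][b] + s2[a][b]) / (n1 + n2 - 2) for b in range(p)]
         for a in range(p)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    quad = sum(d[a] * inv[a][b] * d[b] for a in range(p) for b in range(p))
    t2 = n1 * n2 / (n1 + n2) * quad
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f

normal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
tumor = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
t2, f_stat = hotelling_t2(normal, tumor)
```

The group labels in the example are illustrative only. With more genes than samples the pooled covariance matrix becomes singular and the inversion step fails.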

33
Principles of Kong et al. Method
  • Normal covariation generally acts to preserve
    homeostasis
  • The transcription of genes that participate in
    many processes will be changed
  • The joint changes in genes will be most
    distinctive for those genes active in pathways
    that are working differently

34
Critiques of Hotelling's T²
  • Not robust to outliers
  • Assumes same covariance in each sample
  • S1 = S2? Usually not in disease
  • Small samples: unreliable S estimates
  • N < p