Liquid association for large scale gene expression and network studies

1
Liquid association for large scale gene
expression and network studies
  • Ker-Chau Li
  • Institute of Statistical Science
  • Academia Sinica

(Presentation at the Isaac Newton Institute for
Mathematical Sciences, workshop under the
programme Statistical Theory and Methods for
Complex, High-Dimensional Data, June 23-27, 2008)
2
Abstract
  • The fast-growing public repertoire of
    microarray gene expression databases provides
    individual investigators with unprecedented
    opportunities to study transcriptional activities
    for genes of their research interest at no
    additional cost. Methods such as hierarchical
    clustering, principal component analysis, gene
    network and others, have been widely used. They
    offer biologists valuable genome-wide portraits
    of how genes are co-regulated in groups. Such
    approaches have a limitation because it often
    turns out that the majority of genes do not fall
    into the detected gene clusters. If one has a
    gene of primary interest in mind and cannot find
    any nearby clusters, what additional analysis can
    be conducted? In this talk, I will show how to
    address this issue via the statistical notion of
    liquid association. An online biodata mining
    system has been developed in my lab to help
    biologists distil information from a web of
    aggregated genomic knowledge bases and data sources
    at multiple levels, including gene ontology,
    protein complexes, genetic markers, and drug
    sensitivity. The computational issue of liquid
    association and the challenges faced in the
    context of high p low n problems will be
    addressed.

3
Change, change, and change
  • Calculus is a subject about change
  • Life Science is about change
  • My entire talk is about change

4
Intuition of SIR (sliced inverse regression)
The dogma of regression teaches how output Y CHANGES
in response to CHANGES in input X
  • Regression models
  • Parametric
  • Multiple linear regression
  • Nonlinear regression
  • Wavelet
  • Nonparametric
  • Spline smoothing
  • Kernel smoothing
  • Semiparametric
  • Cox regression for survival analysis
  • Input data variables
  • X1: crime rate
  • X2: room size
  • X3: family income
  • Xp: air quality
  • p: the total number of input variables

Fails when the dimension is high
Output data variable Y: house price $= f(x_1, x_2, \ldots, x_p, \text{error})$
Dimension reduction on X
Principal Component Analysis (PCA)
Critical issue: danger of information loss
Information relevant to Y may not be contained in
the reduced variables because Y is not used in
the dimension reduction process
5
A reversal of the regression paradigm
Instead of asking how Y changes in response to
changes in X, ask how X CHANGES as Y CHANGES
  • Input data variables
  • X1, X2, X3, ..., Xp
  • p: the total number of input variables

Output data variable: $Y = f(b_1'X, b_2'X, \ldots, b_k'X, \text{error})$
Sliced means: $E(X_1 \mid Y),\ E(X_2 \mid Y),\ \ldots,\ E(X_p \mid Y)$
Conduct dimension reduction on $E(X_1 \mid Y), \ldots, E(X_p \mid Y)$
to obtain k projection variables, $k \ll p$
Regression on the k projection variables
A fundamental theory for resolving information
loss: a theorem in Li (1991, JASA) shows
that (i) dimension reduction after inverse
regression recovers the effective projection
variables $b_1'X, b_2'X, \ldots, b_k'X$, and (ii) there is no need to
specify the nonlinear function f.
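The inverse-regression recipe on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation from Li (1991): the function name `sir_directions`, the number of slices, and the simulated data are all made up for the example.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, k=1):
    """Hypothetical helper: estimate k projection directions by slicing on y."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whiten X so that the standardized inputs have identity covariance.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Slice the sample by sorted y and average the whitened inputs per slice.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of the slice-mean covariance, mapped back to X scale.
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :k]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)

# Simulated data: Y depends on X only through one direction, via an
# unknown nonlinear f (here exp) that is never given to the method.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = np.exp(X @ beta) + 0.1 * rng.normal(size=2000)

b_hat = sir_directions(X, y, n_slices=10, k=1)[:, 0]
alignment = abs(b_hat @ beta)   # |cosine| between estimated and true direction
print(round(alignment, 3))
```

The point of the sketch is point (ii) of the theorem: the link function is never specified, yet the direction is recovered up to sign.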
6
Regression is about change
  • Sir Francis Galton (1822-1911), half-cousin of
    Charles Darwin, was an English Victorian
    polymath, anthropologist, eugenicist, tropical
    explorer, geographer, inventor, meteorologist,
    proto-geneticist, psychometrician, and
    statistician. He was knighted in 1909.
  • Galton invented the use of the regression line
    (Bulmer 2003, p. 184), and was the first to
    describe and explain the common phenomenon of
    regression toward the mean, which he first
    observed in his experiments on the size of the
    seeds of successive generations of sweet peas.

Bivariate normal
Regression slope equals correlation after
variable standardization
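A quick numerical check of the statement that the regression slope equals the correlation once both variables are standardized. The simulated data and the helper name `standardize` are assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.6 * x + rng.normal(scale=0.8, size=5000)

def standardize(v):
    # Center to mean 0 and scale to standard deviation 1.
    return (v - v.mean()) / v.std()

xs, ys = standardize(x), standardize(y)
slope = np.polyfit(xs, ys, 1)[0]      # least-squares slope of ys on xs
corr = np.corrcoef(x, y)[0, 1]        # Pearson correlation of the raw data
print(slope, corr)
```

The two printed numbers agree up to floating-point error.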
7
Correlation and Changes Inside Correlation
The correlation coefficient had been used by Gauss,
Bravais, and Edgeworth. Its sweeping impact on data
analysis is due to Galton (1822-1911), "Typical
laws of heredity in man". Karl Pearson modified
and popularized its use. It is a building block in
multivariate analysis, of which clustering,
classification, and dimension reduction are recurrent
themes.

Liquid association is about the change of
correlation pattern inside the scatter diagram
of a pair of variables
8
Liquid association (LA): a new bioinformatics
tool for exploring gene expression data and
much beyond
Pearson correlation(X, Y)
Basis for clustering genes in microarray data: two
genes X, Y are likely to be functionally
associated if they share similar expression
profiles, as measured by the correlation coefficient.
The converse statement is not true: many
functionally associated genes are
uncorrelated in expression, owing to the complexity of
gene regulation, such as multiple functional roles
for most genes, role-changing as hidden cellular
state variables vary, etc.
LA
How does LA work? Instead of two genes,
three genes are considered at a time.
LA measures the CHANGE in
correlation between two genes X, Y as mediated
by a third gene Z.
Li (2002, PNAS) introduced a novel statistical
notion termed liquid association. "Liquid", as
opposed to "solid", is a metaphor for change.
9
How to alleviate the computational burden of scoring
a genome-wide total of order $N^3$ triplets of
genes? N = 6,000 (yeast): 36 billion;
N = 50,000 (human): 20 trillion
An enabling algorithm is derived from an elegant
theorem that offers a simple formula for
measuring LA under the setting of continuous
cellular state changes: "Genome-wide
co-expression dynamics: theory and application",
K.C. Li (2002, PNAS)
  • On-line LA system developed to aid
    integrative cancer biology study
  • Finding biomarkers and disease candidate genes
  • Gene/drug correlation
  • eQTL
  • Gene signature for clinical survival prediction
  • MicroRNA expression
  • Array CGH DNA copy number

LA helps elucidate gene regulation in
metabolic pathways (Li 2002): urea cycle
LA helps find disease candidate genes (Li et
al. 2007): multiple sclerosis
Examples of LA application:
http://kiefer.stat2.sinica.edu.tw/LAP3
10
Gene-expression data
          cond1  cond2  ...  condp
gene 1    x11    x12    ...  x1p
gene 2    x21    x22    ...  x2p
...
gene n    xn1    xn2    ...  xnp

12
Why does clustering make sense biologically?
The rationale is:
Genes with a high degree of expression similarity
are likely to be functionally related: they may form
a structural complex, may participate in common
pathways, or may be co-regulated by common
upstream regulatory elements.
Simply put,
profile similarity implies functional association
13
Proteins rarely work as a single unit

Homo-dimer, hetero-dimer, protein complex
Mitochondrial ATP synthase; E. coli ATP
synthase. These images depicting models of
ATP synthase subunit structure were provided by
John Walker. Some equivalent subunits from
different organisms have different names.
14
Example: scatterplot matrix of MCM1, MCM2,
MCM3, MCM4, MCM5, MCM6, MCM7
15
The tighter association among the six genes
MCM2, ..., MCM7 is in sharp contrast to the
association between each of them and MCM1. It
turns out that the gene products of MCM2, ..., MCM7
form a hexameric complex that binds
chromatin. It is part of the pre-replicative
complex, an assembly of proteins that forms at
origins of DNA replication between late M phase
and the G1/S transition and includes other
proteins believed to act in DNA replication
initiation.
16
MCM1 is a transcription factor that helps activate
gene expression
17
However, the converse is not true
  • The expression profiles of the majority of
    functionally associated genes are indeed
    uncorrelated
  • Microarray is too noisy
  • Biology is complex

18
Why no correlation?
  • Proteins rarely work alone
  • Proteins have multiple functions
  • Different biological processes or pathways have
    to be synchronized
  • Competing use of finite resources: metabolites,
    hormones, ...
  • Protein modification: phosphorylation,
    proteolysis, shuttle, ...
  • Transcription factors serving as both
    activators and repressors

19
Transcription factors: proteins that bind to DNA
Activator: X = TF, Y = target gene; correlation is
positive. Repressor: X = TF, Y = target gene;
correlation is negative
20
Some transcription factors can act as both
activators and repressors
Thyroid hormone receptors can be changed from
repressors to activators, depending on the
absence/presence of thyroid hormone
X = THR, Y = target gene: Corr may cancel out if
the hormone level fluctuates
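A toy simulation of this cancellation, assuming a made-up rule in which the receptor acts as an activator when the hormone level is above average and as a repressor otherwise. All variable names and numbers are hypothetical; the only purpose is to show an overall correlation near zero hiding two strong, opposite correlations.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000
z = rng.normal(size=n)   # hormone level (standardized), hypothetical
x = rng.normal(size=n)   # receptor (TF) expression, hypothetical

# Toy rule: target follows the TF when hormone is high, opposes it when low.
y = np.where(z > 0, 1.0, -1.0) * x + 0.3 * rng.normal(size=n)

corr_all = np.corrcoef(x, y)[0, 1]                  # pooled over all conditions
corr_high = np.corrcoef(x[z > 0], y[z > 0])[0, 1]   # hormone present
corr_low = np.corrcoef(x[z <= 0], y[z <= 0])[0, 1]  # hormone absent
print(round(corr_all, 3), round(corr_high, 3), round(corr_low, 3))
```

The pooled correlation is close to zero even though X and Y are tightly coupled within each hormone state.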
21
Going subtle: protein modification. Histone
inhibits transcription. To activate transcription,
the lysine side chain must be acetylated
(Weaver 2001).
22
A transcription factor can switch between
activator and repressor, depending on the
abundance level of thyroid hormone.
23
Math. modeling: a nightmare
[Diagram of cellular state, current and next. Labels: mRNA (observed); hidden: protein kinase; ATP, GTP, cAMP, etc.; localization (cytoplasm, nucleus, mitochondria, vacuolar); DNA methylation, chromatin structure; nutrients (carbon, nitrogen sources), temperature, water; FUNCTION; FITNESS]
Statistical methods become useful
24
What is LA? PLA?
  • Concept of mediator

25
Schematic illustration of LA
26
A Challenge
  • What genes behave like that?
  • Can we identify all of them?
  • N = 5878 ORFs
  • N choose 3 = 33.8 billion triplets to inspect
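The triplet counts quoted in this talk follow directly from the binomial coefficient. A short check, where the helper name `n_triplets` is introduced only for this example:

```python
from math import comb

def n_triplets(n_genes):
    # Number of unordered gene triplets among n_genes genes.
    return comb(n_genes, 3)

yeast_orfs = n_triplets(5878)     # the 5878 ORFs on this slide
yeast_round = n_triplets(6000)    # the rounded yeast figure
human_round = n_triplets(50000)   # the rounded human figure

print(f"{yeast_orfs / 1e9:.1f} billion")
print(f"{yeast_round / 1e9:.1f} billion")
print(f"{human_round / 1e12:.1f} trillion")
```

Each count is roughly N^3 / 6, which is why a brute-force scan needs a very cheap per-triplet score.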

27
Statistical theory for LA
  • X, Y, Z: random variables with mean 0 and variance
    1
  • $\mathrm{Corr}(X,Y) = E(XY) = E(E(XY \mid Z)) = E\,g(Z)$
  • $g(z) = E(XY \mid Z = z)$: an ideal summary of the association pattern
    between X and Y when $Z = z$
  • $g'(z)$ = derivative of $g(z)$
  • Definition. The LA of X and Y with respect to Z
    is $LA(X,Y \mid Z) = E\,g'(Z)$

28
  • One way to go about estimating LA is to apply
    nonparametric regression to estimate g(z)
  • But this would probably eat up too much
    computing time and would also face the issue of
    regularization, such as whether to apply a common
    smoothing parameter to all curves or not
  • An idea popped up because of my early work on SURE
    and cross-validation.

29
Applications of Stein's Lemma
Decision Theory
  • Nonparametric regression with Stein estimates
  • Connection of Stein's unbiased risk estimate
    with generalized cross-validation (Li 1984, Ann.
    Stat.)

30
Lemma 1. $E\,h'(X) = h(1) - h(0)$, $X \sim \mathrm{uniform}[0,1]$
  • h is differentiable
  • Fundamental theorem of calculus
  • Sir Isaac Newton
  • (1643-1727)
  • Gottfried Leibniz
  • (1646-1716)
  • from Wikipedia

31
Lemma. $E\,h'(X) = E\,X h(X)$, $X \sim \mathrm{Normal}(0,1)$
  • Stein's Lemma
  • Charles Stein
  • Integration by parts
  • Proof
  • Start from the right side
  • Write down the density of X
  • Integration by parts
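The three proof steps listed on this slide can be written out as follows. This is a sketch under the usual regularity assumption (not stated on the slide) that $h(x)\varphi(x) \to 0$ as $|x| \to \infty$, where $\varphi$ denotes the standard normal density.

```latex
\begin{aligned}
E\,[X h(X)]
  &= \int_{-\infty}^{\infty} x\,h(x)\,\varphi(x)\,dx,
     \qquad \varphi(x) = \tfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2},
     \quad \varphi'(x) = -x\,\varphi(x) \\
  &= -\int_{-\infty}^{\infty} h(x)\,\varphi'(x)\,dx \\
  &= -\Big[\,h(x)\,\varphi(x)\,\Big]_{-\infty}^{\infty}
     + \int_{-\infty}^{\infty} h'(x)\,\varphi(x)\,dx \\
  &= E\,h'(X).
\end{aligned}
```

The boundary term vanishes by the stated assumption, which leaves exactly the left side of the lemma.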

32
Statistical theory: LA
  • Theorem. If Z is standard normal, then
    $LA(X,Y \mid Z) = E(XYZ)$
  • Proof. By Stein's Lemma, $E\,g'(Z) = E\,g(Z)Z$
  • $= E(E(XY \mid Z)\,Z) = E(XYZ)$
  • Additional math. properties
  • bounded by third moment
  • 0, if jointly normal
  • transformation

33
Normality?
  • Convert each gene expression profile by taking the
    normal score transformation
  • $LA(X,Y \mid Z)$ = average of the triplet product of three
    gene profiles
  • $(x_1 y_1 z_1 + x_2 y_2 z_2 + \cdots) / n$
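A minimal sketch of this estimator: rank each profile, map the ranks to standard normal quantiles, and average the triple product. The function names, the rank-to-quantile convention r / (n + 1), the tie handling, and the simulated data are assumptions of this example rather than details given in the talk.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(v):
    """Replace values by standard normal quantiles of their ranks."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    ranks = np.argsort(np.argsort(v)) + 1   # ranks 1..n; ties broken by order
    inv_cdf = NormalDist().inv_cdf
    scores = np.array([inv_cdf(r / (n + 1)) for r in ranks])
    return scores / scores.std()            # mean is 0 by symmetry; set variance to 1

def liquid_association(x, y, z):
    """Average triple product of the normal-scored profiles."""
    xs, ys, zs = normal_scores(x), normal_scores(y), normal_scores(z)
    return float(np.mean(xs * ys * zs))

# Hypothetical data: in the first case Z sets the sign and strength of the
# X-Y relation; in the second case Y is unrelated to X and Z.
rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)
x = rng.normal(size=n)
y_mediated = z * x + 0.5 * rng.normal(size=n)
y_unrelated = rng.normal(size=n)

la_mediated = liquid_association(x, y_mediated, z)
la_null = liquid_association(x, y_unrelated, z)
print(round(la_mediated, 3), round(la_null, 3))
```

In the mediated case X and Y are essentially uncorrelated overall, yet the LA score is clearly positive; in the unrelated case it stays near zero.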

34
How does LA work in yeast?
  • Urea cycle/arginine biosynthesis

35
Yeast Cell Cycle(adapted from Molecular Cell
Biology, Darnell et al)
Most visible event
36
[Pathway diagram; labels: ARG1, Glutamate, ARG2]
37
[Pathway diagram, adapted from KEGG; labels: ARG1, X, Y, Head, Backdoor, "8th place negative"]
Compute $LA(X,Y \mid Z)$ for all Z
Rank and find leading genes
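Because the LA score is an average of triple products, scoring every candidate Z for a fixed pair (X, Y) reduces to one matrix-vector product. The sketch below assumes the expression matrix is already normal-score transformed (genes in rows, conditions in columns); the function names, the gene labels, and the planted mediator are hypothetical.

```python
import numpy as np

def la_scan(x, y, profiles):
    """LA(X, Y | Z) for every row Z of `profiles` (genes x conditions)."""
    n_conditions = profiles.shape[1]
    return profiles @ (x * y) / n_conditions

def leading_genes(scores, names, top=3):
    """Return the gene names at the positive and negative ends of the ranking."""
    order = np.argsort(scores)
    negative_end = [names[i] for i in order[:top]]
    positive_end = [names[i] for i in order[::-1][:top]]
    return positive_end, negative_end

# Simulated genome: 200 unrelated genes, one of which mediates the X-Y relation.
rng = np.random.default_rng(4)
n_genes, n_conditions = 200, 1000
profiles = rng.normal(size=(n_genes, n_conditions))
names = [f"gene{i}" for i in range(n_genes)]
names[17] = "MEDIATOR"

x = rng.normal(size=n_conditions)
y = profiles[17] * x + 0.5 * rng.normal(size=n_conditions)
y = (y - y.mean()) / y.std()

scores = la_scan(x, y, profiles)
positive_end, negative_end = leading_genes(scores, names)
print(positive_end, negative_end)
```

Both ends of the ranking are reported because a strongly negative score is as informative as a strongly positive one.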
38
Why negative LA? High CPA2: signal for
arginine demand. Up-regulation of ARG2
concomitant with down-regulation of CAR2
prevents ornithine from leaving the urea
cycle. When the demand is relieved, CPA2 is
lowered and CAR2 is up-regulated, opening up the
channel for ornithine to leave the urea cycle.

39
Other examples (see Li 2002)
  • X = GLN3 (transcription factor), Y = CAR1, Z = ARG4 (8th
    place, negative end)
  • Electron transport: X = CYT1 (cytochrome c1) gives
    ATP1 (11 times), ATP5 (subunits of ATPase)
  • Calmodulin: CMD1, NUF1 (binding target of CMD1),
    CMK1 (calmodulin-regulated kinase), YGL149W
  • Glycolysis genes: PFK1, PFK2 (6-phosphofructokinase)
  • CYR1 (adenylate cyclase), GSY1 (glycogen
    synthase), GLC2 (glucan branching),
    SCH9 (serine/threonine protein kinase; longevity)

40
Liquid association
A method for exploiting lack of correlation
between variables
41
LA related References
  • Li, K.C. (2002). Genome-wide co-expression
    dynamics: theory and application. Proceedings
    of the National Academy of Sciences, 99, 16875-16880.
  • Li, K.C., and Yuan, S. (2004). A functional
    genomic study on NCI's anticancer drug screen.
    The Pharmacogenomics Journal, 4, 127-135.
  • Li, K.C., Liu, C.T., Sun, W., Yuan, S.,
    and Yu, T. (2004). A system for enhancing
    genome-wide co-expression dynamics study.
    Proceedings of the National Academy of Sciences,
    101, 15561-15566.
  • Yu, T., Sun, W., Yuan, S., and Li, K.C.
    (2005). Study of coordinative gene expression at
    the biological process level. Bioinformatics, 21,
    3651-3657.
  • Yu, T., and Li, K.C. (2005). Inference of
    transcriptional regulatory network by two-stage
    constrained space factor analysis. Bioinformatics,
    21, 4033-4038.
  • Sun, W., Yu, T., and Li, K.C. (2007).
    Detection of eQTL modules mediated by activity
    levels of transcription factors. Bioinformatics,
    doi: 10.1093/bioinformatics/btm327
    (corresponding author: Li).
  • Yuan, S., and Li, K.C. (2007). Context-dependent
    clustering for dynamic cellular state modeling of
    microarray gene expression. Bioinformatics,
    doi: 10.1093/bioinformatics/btm457
    (corresponding author: Li).
  • Li, K.C., Palotie, A., Yuan, S., Bronnikov, D., Chen,
    D., Wei, X., Choi, O., Saarela, J., and Peltonen, L.
    (2007). Finding candidate disease genes by liquid
    association. Genome Biology (in press).

42
The human examples
43
Gene expression profile for NCI's 60 cell lines
  • For each cell line, the relative mRNA
    concentrations are measured by cDNA glass array.
  • Cell lines used in microarray experiment are
    without drug administration.
  • Ross D.T. et al. Systematic variation in gene
    expression patterns in human cancer cell lines.
    Nat. Genet. 24, 227-235 (2000)

44
NCI 60 Cell lines
  • OVARIAN (6): IGROV1, OVCAR-3, OVCAR-4, OVCAR-5, OVCAR-8, SK-OV-3
  • PROSTATE (2): DU-145, PC-3
  • LEUKEMIA (6): CCRF-CEM, HL-60, K-562, MOLT-4, RPMI-8226, SR
  • MELANOMA (8): LOXIMVI, M14, MALME-3M, SK-MEL-2, SK-MEL-28, SK-MEL-5, UACC-257, UACC-62
  • BREAST (8): BT-549, HS578T, MCF7, MCF7/ADF-RES, MDA-MB-231/ATCC, MDA-MB-435, MDA-N, T-47D
  • LUNG (9): A549/ATCC, EKVX, HOP-62, HOP-92, NCI-H226, NCI-H23, NCI-H322M, NCI-H460, NCI-H522
  • CNS (6): SF-268, SF-295, SF-539, SNB-19, SNB-75, U251
  • COLON (7): COLO205, HCC-2998, HCT-116, HCT-15, HT29, KM12, SW-620
  • RENAL (8): 786-0, A498, ACHN, CAKI-1, RXF-393, SN12C, TK-10, UO-31
45
How does LA work in cell-lines?
  • Alzheimer's disease hallmark gene
  • Amyloid-beta precursor protein (APP)

46
Alzheimer's disease
The brain tissue shows "neurofibrillary tangles"
(twisted fragments of protein within nerve cells
that clog up the cell), "neuritic plaques"
(abnormal clusters of dead and dying nerve cells,
other brain cells, and protein), and "senile
plaques" (areas where products of dying nerve
cells have accumulated around protein). Although
these changes occur to some extent in all brains
with age, there are many more of them in the
brains of people with AD. The destruction of
nerve cells (neurons) leads to a decrease in
neurotransmitters (substances secreted by a
neuron to send a message to another neuron). The
correct balance of neurotransmitters is critical
to the brain.
47
Amyloid beta peptide is the predominant component
of senile plaques in the brains of AD patients. It
is derived from amyloid-beta precursor protein
(APP) by consecutive proteolytic cleavage
by beta-secretase and gamma-secretase.
49
What is the physiological role of APP?
  • Cao X, Sudhof TC.
  • A transcriptionally active complex of APP with
    Fe65 and histone acetyltransferase Tip60.
  • Science. 2001 Jul 6;293(5527):115-20.

50
Abstract of Cao and Sudhof
  • Amyloid-beta precursor protein (APP), a widely
    expressed cell-surface protein, is cleaved in the
    transmembrane region by gamma-secretase.
    gamma-Cleavage of APP produces the extracellular
    amyloid beta-peptide of Alzheimer's disease and
    releases an intracellular tail fragment of
    unknown physiological function. We now
    demonstrate that the cytoplasmic tail of APP
    forms a multimeric complex with the nuclear
    adaptor protein Fe65 and the histone
    acetyltransferase Tip60. This complex potently
    stimulates transcription via heterologous Gal4-
    or LexA-DNA binding domains, suggesting that
    release of the cytoplasmic tail of APP by
    gamma-cleavage may function in gene expression.

51
Take X = APP, Y = APBP1
  • APBP1 encodes FE65
  • Find BACE2 from our short list of LA score
    leaders.
  • BACE2 encodes a beta-site APP-cleaving enzyme

52
Take X = APP, Y = HTATIP (HTATIP encodes Tip60)
  • Finds PSEN1 (second-place positive LA score
    leader), which encodes presenilin 1,
    a major component of gamma-secretase

54
Finding disease candidate genes by liquid
association. Ker-Chau Li, Aarno
Palotie, Shinsheng Yuan, Denis Bronnikov, Daniel
Chen, Xuelian Wei, Oi-Wa Choi, Janna Saarela,
and Leena Peltonen
55
Multiple Sclerosis
56
Multiple sclerosis
  • MS is a chronic neurological disease
    characterized by multicentric inflammation,
    demyelination and axonal damage, resulting in
    heterogeneous clinical features, including
    pareses, sensory symptoms and ataxia. The
    classical clinical features include disturbances
    in sensation and mobility. The typical age of
    onset is between 20 and 40 years, making MS one
    of the most common neurological diseases of young
    adults. Four genome-wide scans (US, UK, Canada,
    and Finland) have revealed several putative
    susceptibility loci, of which the loci on
    chromosomes 6p, 5p, 17q and 19q have been
    replicated in multiple study samples. More
    recently, Professors Aarno Palotie and Leena
    Peltonen's teams have generated a fine map of
    17q22-q24 (Saarela et al. 2002). They are now
    interested in the functional aspects of the genes
    in this region, using microarray technology.

57
Application: finding candidate genes for multiple
sclerosis
60
  • glutamate-induced excitotoxicity
  • SLC1A3 is highly expressed in various brain
    regions including cerebellum, frontal cortex,
    basal ganglia and hippocampus. It encodes a
    sodium-dependent glutamate/aspartate transporter
    1 (GLAST). Glutamate and aspartate are excitatory
    neurotransmitters that have been implicated in a
    number of pathologic states of the nervous
    system. Glutamate concentration in cerebrospinal
    fluid rises in acute MS patients, whilst the glutamate
    antagonist amantadine reduces MS relapse rate. In
    EAE, the levels of GLAST and GLT-1 (SLC1A2) are
    found to be down-regulated in the spinal cord at the peak
    of disease symptoms, and no recovery was observed
    after remission. We consider it highly encouraging
    that several lines of evidence, including both
    genetic association and gene expression
    association, are consistent with the
    glutamate-induced excitotoxicity hypothesis of
    the mechanisms resulting in demyelination and
    axonal damage in MS.

65
International MS whole genome association
study (2007)
  • Affymetrix 500K chip used to screen common genetic
    variants in 931 family trios.
  • Using the on-line supplementary information
    provided, we found that two SNPs, rs4869676 (chr5: 36641766)
    and rs4869675 (chr5: 36636676), with TDT
    p-values 0.0221 and 0.00399 respectively, are in
    the upstream regulatory region of the SLC1A3
    gene.
  • In fact, within the 1 Mb region around rs4869675,
    there are a total of 206 SNPs on the Affymetrix 500K
    chip. No other SNP has a p-value less than that
    of rs4869675.
  • The next most significant SNPs in this region are
    rs1343692 (chr5: 35860930) and rs6897932 (chr5:
    35910332, the identified MS susceptibility SNP in the
    IL7R exon).
  • The MS marker we identified, rs2562582 (chr5:
    36641117), is less than 5 kb from
    rs4869675, but was not on the Affymetrix chip.

66
A little bit late
  • IL7R was found by LA a long time ago! See
    the attached e-mail, sent more than two years
    ago, in 2005.
  • Begin forwarded message
    From: Ker Chau Li (local) <kcli_at_stat.ucla.edu>
    Date: March 28, 2005 10:17:51 AM PST
    To: Robert Yuan <syuan_at_stat.ucla.edu>,
    Aarno Palotie <APalotie_at_mednet.ucla.edu>, Daniel
    Chen <pharmacogenomics_at_yahoo.com>, Denis
    Bronnikov <denis_at_ucla.edu>, Palotie Leena
    <leena.peltonen_at_ktl.fi>
    Cc: Ker Chau Li (local) <kcli_at_stat.ucla.edu>
  • Subject: IL7R
  • (I thought this e-mail should have been sent
    out already but it has not.) I take X = SLC1A3,
    Y = MBP, Z = any gene, using 2002 Atlas data. Two
    genes are from the short list of genes with
    highest LA scores: IL7R (interleukin 7 receptor) and
    HLA-A.
  • IL7R is at 5p13. Interesting coincidence??
  • Other interesting findings include: GFAP
    (glial fibrillary acidic protein) on 17q21
    (Alexander disease); GRM3 (glutamate receptor,
    metabotropic); CDR1 (cerebellar degeneration-related
    protein 1); Ighg3 (immunoglobulin heavy constant
    gamma 3); Iglj3 (immunoglobulin lambda joining 3)

67
A2M
  • The output of a short list of 25 gene pairs with
    the best LA scores, each from the positive and the
    negative ends, is given in Additional data file 1
    (Table S1). The statistical significance of the
    results of this gene search procedure is discussed
    in Additional data file 2 (Supplementary Text 3).
    We find that the gene A2M (encoding
    α2-macroglobulin, a cytokine transporter and
    protease inhibitor) appears many times. We further
    find an interesting biologic functional association
    between A2M and MBP from some literature about the
    pathogenesis of MS. Following demyelination in
    human MS and rodent EAE, immunogenic MBP peptides
    are released into cerebrospinal fluid and serum
    (see Oksenberg and coworkers [2] for references),
    and A2M represents the major MBP-binding protein in
    human plasma [17]. A significant increase in
    α2-macroglobulin is found in plasma of MS patients
    [18]. Analogously, in rodent EAE, infusion of
    α2-macroglobulin significantly reduces disease
    symptoms [19].

68
Genome-wide LA scores, X = MBP
69
P-values by randomization. Each dot represents a
case of simulated X: highest corr vs. 20th
highest LA
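The procedure behind this plot is described in the paper's supplementary text and is not reproduced here. As a generic illustration of a randomization p-value for a single LA score, one can shuffle the conditions of Z and see how often the shuffled score is at least as extreme as the observed one. Everything below (function names, number of permutations, simulated data) is an assumption of this sketch.

```python
import numpy as np

def la_score(x, y, z):
    # Average triple product of standardized profiles.
    return float(np.mean(x * y * z))

def permutation_p_value(x, y, z, n_perm=999, seed=0):
    """Two-sided p-value from shuffling Z, which breaks any link to (X, Y)."""
    rng = np.random.default_rng(seed)
    observed = abs(la_score(x, y, z))
    hits = 0
    for _ in range(n_perm):
        if abs(la_score(x, y, rng.permutation(z))) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

# Hypothetical data: one triplet where Z mediates the X-Y relation, one where it does not.
rng = np.random.default_rng(6)
n = 500
z = rng.normal(size=n)
x = rng.normal(size=n)
y = z * x + 0.5 * rng.normal(size=n)
y = (y - y.mean()) / y.std()

p_mediated = permutation_p_value(x, y, z)
p_null = permutation_p_value(x, rng.normal(size=n), z)
print(p_mediated, p_null)
```

When many Z are scanned at once, as in a genome-wide search, the comparison would need to be made against the best scores of the randomized scans rather than against a single score.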
70
LAP website
71
Basic workflow is simple
72
User interface for browsing computation output
73
Acknowledgement
  • Mathematics in Biology (MIB), Institute of
    Statistical Science, Academia Sinica
  • Web-based Liquid association development team
  • Team leader Dr. Shin-sheng Yuan
  • IT specialists
  • Guan-I Wu, Hung-Wei Tseng, Shi-Hsien Yang, Yi-Wei
    Chen, Chang-Dao Chen, Ying-Fu Ho
  • Arabidopsis Gene Expression Analysis
  • Dr. Ai-Ling Hour

74
Acknowledgment
  • Biodata refining group, UCLA Statistics
  • http://kiefer.stat.ucla.edu/lap
  • Shinsheng Yuan (chief architect for website
    development, gene-drug)
  • Wei Sun (yeast segregation)
  • Ching-Ti Liu (yeast protein complex)
  • Tianwei Yu (stress, gene ontology)
  • Xuelian Wei (graphics, cancer, disease page),
    Tun-Hsian Yang (disease page)
  • Yijing Shen (clustering), Tongtong Wu (stress)
  • Jack Li (graph)

75
Lung Cancer project
  • National Taiwan University Hospital
  • Pan-Chyr Yang
  • Sung-Liang Yu
  • Hsuan-Yu Chen

76
Causal analysis
  • X, Y, Z
  • X -> Y, X -> Z
  • $Y = aX + b + \text{error}$
  • $Z = a'X + b' + \text{error}$
  • Partial correlation = corr(error, error)
  • X causes Y and Z if partial correlation = 0
  • (X = Coke sales, Y = eye disease incidence rate,
    Z = season)
  • Start with a pair of correlated genes Y, Z; find
    X to minimize the partial correlation
  • This is very different from LA.

77
A limited goal: remove the trend
  • A universal trend (affects all genes) could be
    artificial (due to chip technology imperfection)
  • A localized trend (affects a limited number of
    highly expressed genes) is likely to be
    biologically real
  • Partial correlation can be used to detrend
  • X = one gene, Y = one gene, Z = trend
  • X' = residual after regressing X on Z
  • Y' = residual after regressing Y on Z
  • Find the correlation between X' and Y'
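The detrending steps above amount to a partial correlation computed from regression residuals. A minimal sketch with simulated profiles, where the helper names and the data are assumptions of the example:

```python
import numpy as np

def residual(v, z):
    # Residual of v after a straight-line regression on z.
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

def partial_corr(x, y, z):
    # Correlation between what is left of x and y once z is regressed out.
    return float(np.corrcoef(residual(x, z), residual(y, z))[0, 1])

# Two hypothetical genes that share nothing except a common trend.
rng = np.random.default_rng(5)
n = 2000
trend = rng.normal(size=n)
gene_x = trend + rng.normal(size=n)
gene_y = trend + rng.normal(size=n)

raw = float(np.corrcoef(gene_x, gene_y)[0, 1])
detrended = partial_corr(gene_x, gene_y, trend)
print(round(raw, 3), round(detrended, 3))
```

The raw correlation is sizeable because both genes ride on the same trend, while the partial correlation is close to zero once the trend is removed.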

78
  • Maximizing the absolute value of the partial corr,
    given a pair of variables.
  • Partial corr = cosine(angle between two planes,
    the X,Z plane and the Y,Z plane)
  • Consider the prediction of Z from X, Y.
  • Fixing the error variance, the optimal Z
    should have as high a correlation as possible with
    (X + Y) (if X, Y are positively correlated) or with
    (X - Y) (otherwise).