Automated Causal Inference

Report on IHMC-CMU-Pitt Research: Full Report

NRA A2-37143: Automated Discovery Procedures for Gene Expression and Regulation from Microarray and Serial Analysis of Gene Expression Data

NCC 2-1295: Multi-Domain Network Learning Algorithms of Latent Variable Interpretation and Discovering Genetic Regulation

April 2001 to April 2002

http://www.phil.cmu.edu/projects/genegroup

Research Team

- William Buckles (Ph.D., Professor, Tulane)
- Tianjiao Chu (Ph.D. Student, Logic, Methodology and Computation, CMU)
- Greg Cooper (M.D., Ph.D., Associate Professor, School of Medicine, Pitt)
- David Danks (Ph.D., Research Scientist, IHMC)
- Clark Glymour (Ph.D., P.I., Senior Research Scientist and John Pace Scholar, IHMC; Alumni University Professor, CMU)
- Dan Handley (M.S. Student, Logic, Methodology and Computation, CMU)
- Subramani Mani (Ph.D. Student, Biomedical Informatics, Pitt)
- Rob O'Doherty (Ph.D., Assistant Professor, School of Medicine, Pitt)
- Dave Peters (Ph.D., Human Genetics, Pitt)
- Joseph Ramsey (Ph.D., Research Programmer, CMU)
- Jaime Robins (M.D., School of Public Health, Harvard)
- Raul Saavedra (Ph.D. Student, Computer Science, Tulane)
- Richard Scheines (Ph.D., Associate Professor, CMU)
- Nicoleta Servan (Ph.D. Student, Statistics, CMU)
- Ricardo Silva (Ph.D. Student, Computer Science, CMU)
- Peter Spirtes (Ph.D., Research Scientist, IHMC; Professor, CMU)
- Larry Wasserman (Ph.D., Professor, CMU)
- Frank Wimberly (Ph.D., Research Programmer, IHMC)
- Changwon Yoo (Ph.D. Student, Biomedical Informatics, Pitt)

Two Related Goals

- Investigating the prospects for more rapid and accurate determination of genetic regulatory networks using recently developed technologies (microarrays and SAGE)
- Investigating the prospects for determining the underlying components of measured phenomena, and the influences such components have on one another

Background on Genetics

- Proteins do most of the work in the cell
- Cell reproduction, metabolism, and responses to the environment are all controlled by proteins
- Each gene is a machine for constructing (approximately) a single protein
- The rate at which a gene constructs proteins is influenced by concentrations of regulator proteins

Gene Regulatory Networks

- Some genes manufacture proteins which control the rate at which other genes manufacture proteins (either promoting or suppressing)
- Hence some genes indirectly (via the proteins they create) regulate other genes, which in turn regulate the operation of the cell
- The system by which genes regulate each other is called the genetic regulatory network, and can be represented by a directed graph (which is a special case of a Bayes network)

Measuring Gene Expression Levels

- A gene's expression level is an approximate measure of the concentration of mRNA transcripts and a more indirect measure of the rate of synthesis of the corresponding proteins
- Recently developed technologies (microarrays and Serial Analysis of Gene Expression, or SAGE) allow thousands of gene expression levels to be measured simultaneously
- The kinds of measurement errors that these technologies introduce are not well understood
- The best way to use these tools to discover gene regulatory networks is not known

Relevance to NASA

- Gene expression in microgravity has been shown to differ significantly from expression in Earth gravity
- Understanding gene regulation in plants, animals and humans is likely to be important for long-term extraterrestrial habitation
- Determining regulatory structure is at present laborious, slow and costly
- Need for systematic study of the reliability and accuracy of scores of proposals for applying statistical/machine learning procedures to speed up the process

Background on Latent Structure Analysis

- Measurements are often of effects of other scientifically interesting variables not directly measured
- Number and identity of underlying causal or compositional variables may not be entirely known
- Measured effects can influence other measured effects (e.g., through between-channel signal leakage in multi-channel

Background on Latent Structure Analysis

- With no prior cluster information and with the possibility of measured-measured and latent-latent influences, none of the standard data analysis procedures (e.g., factor analysis, principal components, independent components) give reliable (i.e., asymptotically correct) information about all of:
- Number of latent variables
- Clustering of measured variables
- Causal or compositional relations among latent variables

Relevance to NASA

- NASA collects vast quantities of observational data on the Earth, the solar system and the cosmos, much of it spectral
- Need for automated, fast, reliable procedures for extracting relevant causal information from diverse datasets; procedures that integrate expert knowledge
- Inadequacy of current methods (model specific, clustering algorithms) for this task
- Principled procedures using Bayes network methods offer promising alternatives
- They have succeeded in other spectral applications
- (J. Ramsey, et al., "Automated Identification of Carbonate Composition from Reflectance Spectra," Data Mining and Knowledge Discovery, in press.)

Structure of the Projects

- Statistical Foundations
- Multiple testing problem
- Measurement error models
- Search Algorithms
- Different kinds of inputs
- Different assumptions about background knowledge
- Experiments
- Microarray
- SAGE
- Testing
- Application to known genetic regulatory networks
- Application to simulated data

First Year Results: Algorithms

- Many algorithms for inferring causal networks that have been applied to inferring gene regulatory networks assume the input is associations between measured features of individuals
- But microarrays and SAGE measure average gene expression levels over many cells rather than for a single cell
- What is the feasibility of inferring regulatory networks from associations between averages?
- Feasibility for linear and local-linear regulatory functions
- Impossibility for the mathematical form of the regulatory function of the sea urchin Endo 16 gene, one of the best established
- T. Chu, C. Glymour, R. Scheines and P. Spirtes, "A Statistical Problem for Inference to Regulatory Structure from Associations of Gene Expression Measurements with Microarrays," Bioinformatics, submitted.

First Year Results: Statistics

- Current methods for determining from SAGE measurements which genes are changing in response to experimental manipulations are incorrect
- Correct method requires estimating additional experimental parameters, and leads to the conclusion that many fewer genes are changing than had been previously thought
- T. Chu, "Computation of Variance in SAGE Measurements of Gene Expression," Technical Report, Logic, Methodology and Computation, 2002.
- Future plan: apply the new method to SAGE measurements of the response of genes to shear stress (data already gathered)

First Year Results: Statistics

- Standard techniques for testing whether a gene expression level has changed due to an experimental manipulation were not designed to be applied to test thousands of genes simultaneously
- Recent developments (False Discovery Rate tests) do allow simultaneous testing of thousands of genes
- Further improvements of the False Discovery Rate procedure have been made
- C. Genovese and L. Wasserman, "Bayesian and Frequentist Multiple Testing," CMU Department of Statistics Technical Report 764, April 2002.
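
The slide does not say which False Discovery Rate procedure was used. As an illustration only, the sketch below assumes the standard Benjamini-Hochberg step-up rule (not the refinements in the technical report cited above); the function name and the p-values are made up for the example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at false discovery rate q.

    Assumed procedure: Benjamini-Hochberg step-up. Sort the p-values,
    find the largest rank k with p_(k) <= q * k / m, reject ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten genes (illustrative, not from the study).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))  # [0, 1]
```

With these numbers a Bonferroni cutoff of 0.05 / 10 = 0.005 would keep only the first gene, while the step-up rule keeps two, which is the sense in which FDR control is less conservative when many genes are tested at once.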

First Year Results: Algorithms

- Implementation and testing (on simulated data) of a correct (under explicit assumptions) algorithm for causal clustering and for determining latent structure
- R. Silva, CMU Masters Thesis, Center for Automated Learning and Discovery
- Extension to time series of learning algorithms for dynamical Bayes Nets
- D. Danks, "Constraint-Based Learning Algorithm for Dynamical Bayes Nets," Conference on Uncertainty in Artificial Intelligence, submitted.
- Development and proof of correctness for an improved algorithm for inferring Bayes networks across distinct data sets with overlapping variable sets
- D. Danks, "Efficient Learning of Bayes Nets from Databases with Overlapping Variables," IHMC Technical Report, 2002.

First Year Results: Algorithms

- Development and testing of algorithms for maximizing information obtained from knockout experiments
- R. Silva, C. Glymour, D. Danks, "Inferring Genetic Regulatory Structure from First and Second Moments," Technical Report, Logic, Methodology and Computation, 2002.
- Development, implementation and testing of a genetic algorithm for linear Bayes networks (structural equation models)
- S. Harwood and R. Scheines, "Learning Linear Causal Structure Equation Models with Genetic Algorithms" (2001), Tech Report CMU-PHIL-128, submitted to Conference on Knowledge Discovery and Data Mining.
- S. Harwood and R. Scheines, "Genetic Algorithm Search over Causal Models" (2001), Tech Report CMU-PHIL-131, submitted to Conference on Uncertainty in Artificial Intelligence.
- Development of an algorithm for regulatory structure from mixed observational and knockout data

First Year Results: Testing

- Very few genetic regulatory networks are known, and even fewer details about the functional relationships among the genes are known
- How can the accuracy of a causal discovery algorithm be tested?
- Generate simulated data from made-up gene regulatory networks, so that the generating mechanism is known

First Year Results: Testing

- Implementation of a flexible program for generating simulated microarray data that allows the user to conveniently specify many different:
- Functional relationships between cells
- Measurement errors
- Averaging over different numbers of cells
- Gene regulatory network structures (including varying time lags)
- J. Ramsey and R. Scheines (2001), "Simulating Genetic Regulatory Networks," Technical Report CMU-PHIL-124.
- Implementation of half a dozen algorithms proposed in the literature for inferring regulatory structure from expression associations in microarray measurements (more to be implemented)

First Year Results: Experiments

- Fat cells from mice are treated with troglitazone, which increases the efficiency of the biological actions of insulin in diabetes and obesity
- Which genes are activated?
- Microarray chips used to make 47 measurements of gene expression level at 35 time points for 5355 genes

First Year Results: Experiments

- Normalize data to remove chip-to-chip effects
- Perform statistical tests to determine which genes are changing, adjusting for multiple tests

Comparing 20 genes that change most with 20 that change least

Current Work: Experiments

- Remove outlying genes
- Improve the test performed for whether a gene is changing over time
- Introduce clustering methods for data
- Use slower but more accurate measurement techniques (Northern Blots) to:
- Test the hypotheses about which genes change according to the microarray analysis
- Learn about errors in measurement when using microarrays

Gene Research Plans, May 2002 to May 2003

- Study statistical properties of multiple decisions and of conditional independence among averaged variables
- Develop new algorithms for optimal information extraction and implement algorithms proposed in the literature
- Implement simulator
- Laboratory SAGE and microarray study of expression under varying surface flows and drug treatments
- (Where we are)
- Test algorithms on real and simulated data
- Analyze data
- Make predictions
- (Where we will be)
- Knockout experiments
- Overall evaluation

Latent Structure Research Plans, 2002-2003

- Improve efficiency
- Test on large simulated data sets
- Prove asymptotic correctness
- Investigate non-linear generalizations

Supplementary Material Outline

- Discovering the Structure of Genetic Regulatory Networks
- Testing Algorithms: Simulator
- Analysis of Gene Expression Levels Averaged Over Many Cells
- Analysis of SAGE Data
- Latent Structure: Causal Clustering
- Experiments
- Experiment 1: Microarray analysis
- Experiment 2: SAGE analysis

Discovering the Structure of Genetic Regulatory Networks

Simplified Gene Regulatory Network

[Figure: network diagram with nodes Environment, G1 to G6, mRNA1 to mRNA6, and protein1 to protein6]

Still More Simplified

Two Strategies for Discovering Gene Regulatory Networks

- (Difference). Enhance or suppress specific genes and measure the changes in expression levels of other genes. Infer effects of the manipulated gene from differences in expression levels of other genes versus unmanipulated controls
- (Association). Use wild-type cells or cells with specific enhanced or suppressed levels of other genes. Infer effects from associations of expression levels of all genes

Measurement Techniques

- Microarray techniques allow measurements of relative mRNA concentrations from multiple tissue sources
- mRNA concentrations for thousands of genes can be measured simultaneously
- Measurements can be taken in time sequence, every few minutes
- Serial Analysis of Gene Expression (SAGE) allows estimation of concentrations of mRNA transcripts for essentially the entire genome; does not require prior knowledge of all genes

Difference Method

- Several examples of partial identification of part of the regulatory network for several species
- Limitations
- Laborious and expensive
- Each experiment can only tell us which genes are regulated by a manipulated gene, nothing about the pathway of regulation
- E.g., if gene A is suppressed and genes B and C change in consequence, the experiment does not distinguish among
- A → B → C
- A → C → B
- C ← A → B
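
A small sketch of why a single knockout cannot separate these alternatives. It reads the three candidates as the chain A → B → C, the chain A → C → B, and the fork C ← A → B (an assumed reading of the arrows), and the helper name `affected_by_knockout` is hypothetical. In every case the set of genes downstream of A is the same.

```python
def affected_by_knockout(graph, gene):
    """Genes downstream of `gene` in a directed graph given as {parent: [children]}."""
    seen, stack = set(), [gene]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Three candidate regulatory structures (assumed reading of the slide).
candidates = {
    "A -> B -> C": {"A": ["B"], "B": ["C"]},
    "A -> C -> B": {"A": ["C"], "C": ["B"]},
    "C <- A -> B": {"A": ["B", "C"]},
}

for name, graph in candidates.items():
    # Each structure predicts that suppressing A changes both B and C.
    print(name, sorted(affected_by_knockout(graph, "A")))
```

Separating the structures would take a further manipulation, for example suppressing B and checking whether C changes.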

Difference Method - Fundamental Problems

- How to make optimal multiple statistical decisions about expression differences
- How to efficiently extract all information from an experiment
- How to dynamically schedule experiments for maximal information

Association Method

- An example or two of recovery of regulatory structure previously established by Difference methods. No novel discoveries so far.
- Requires larger number of experimental repetitions
- Depends on statistical methods for implicitly or explicitly estimating conditional probability relations among cellular expression levels

Testing Algorithms - Simulator

Simulator

- User specifies
- Functional relationships between cells
- Measurement errors
- Averaging over different numbers of cells
- Gene regulatory network structures (including varying time lags)
- Type of experiment
- This provides a known structure to test algorithms on, under a variety of assumptions about how genes are related

Simulating MicroArray Data

- Tetrad 4 (www.phil.cmu.edu/projects/tetrad)

Network structure

Functional form

Parameters

Specifying the Network Structure

Specifying the Parameters

Data Output

- Cell by Cell Raw data

Aggregated Measurements

Simulating MicroArray Data

- Simulated correlation between genes 1 and 3, averaging over different numbers of cells (10, 100, and 1,000 cells/dish), over 450 time steps

Analysis of Gene Expression Levels Averaged Over Many Cells

Averaging and Association

- Goal is to discover the structure of a regulatory network from associations among expression levels of each pair of genes, and their associations conditional on values of other genes
- But we measure only concentrations (averages) formed from the mRNA of many cells
- For many systems, conditional associations are altered by averaging

The Endo 16 Regulatory Function

- Regulation of the Endo16 gene of the sea urchin (from C. Yuh, H. Bolouri, E. Davidson, "Genomic Cis-Regulatory Logic: Experimental and Computational Analysis of a Sea Urchin Gene," Science, 20 March 1998, 279: 1896-1902)

The Endo16 Regulatory Function

The Endo 16 Regulatory Function, Slightly More Algebraically

If (CG1 P) (B(t) G(t)) > 0, then Q(t) 2 (1 (F E CD) Z) (1 CG2 CG3 CG4) (CG1 P) (B(t) G(t))

Else Q(t) 2 (1 (F E CD) Z) (1 CG2 CG3 CG4) Otx(t)

and is Boolean sum

Conditional Independence Is Not Invariant in a Simplified Form of Endo 16 Regulation

- X takes values in a discrete set, say {0, 1, 2, 3, 4}
- Y = g(X), g nonlinear, say Y = X^2
- Z = a YW, a real, W Boolean (values in {0, 1}), with a Bernoulli distribution
- X → Y → Z ← W

Conditional Independence Is Not Invariant in a Simplified Form of Endo 16 Regulation

- X is independent of Z conditional on Y, but
- ΣX is not independent of ΣZ conditional on ΣY, where the sum is over values in n = 4 or more identically and independently distributed units
- For large n this result generalizes to all cases in which the range of X is finite (but not binary), g is polynomial, and W is as above
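
The failure of invariance can be checked by exact enumeration for n = 4. The sketch below makes several assumptions that the slide leaves open: X is uniform on {0, 1, 2, 3, 4}, Y = X^2, Z is read as the product a·Y·W with a = 1, and W is Bernoulli with P(W = 1) = 1/2. It fixes the summed Y at 4 and shows that the distribution of the summed Z still depends on the summed X.

```python
from fractions import Fraction
from itertools import product

N_CELLS = 4
X_VALUES = range(5)  # X in {0, 1, 2, 3, 4}, assumed uniform
A = 1                # assumed value of the coefficient a
# Assumed P(W = 1) = 1/2, so every (x, w) configuration is equally likely
# and conditional probabilities reduce to counting configurations.


def prob_sum_z_zero(sum_x, sum_y):
    """P(sum Z = 0 | sum X = sum_x, sum Y = sum_y) over N_CELLS i.i.d. cells."""
    hits = total = 0
    for xs in product(X_VALUES, repeat=N_CELLS):
        ys = [x * x for x in xs]
        if sum(xs) != sum_x or sum(ys) != sum_y:
            continue
        for ws in product((0, 1), repeat=N_CELLS):
            total += 1
            hits += sum(A * y * w for y, w in zip(ys, ws)) == 0
    return Fraction(hits, total)


# Both cases have sum Y = 4, but different sum X.
print(prob_sum_z_zero(sum_x=2, sum_y=4))  # 1/2  (cells are 2, 0, 0, 0 in some order)
print(prob_sum_z_zero(sum_x=4, sum_y=4))  # 1/16 (cells are 1, 1, 1, 1)
```

At the level of a single cell, Y determines X here, so X carries no extra information about Z once Y is known. After summing, the same total of Y can arise from different cell compositions, and the composition is what the summed X reveals.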

General Pessimistic Conclusion (not a Theorem)

- Conditional probability relations that hold among regulator and regulated gene transcript concentrations at the cellular level will not be preserved in probability relations as measured in microarrays taken from multiple cell sources
- They will be preserved for linear systems and locally linear systems (see Chu, et al.), but no regulatory systems are as yet known to have such a structure

Analysis of SAGE Data

Difference Strategy and SAGE

- Estimating whether expression levels of genes change in different environments, or with other genes removed, requires a comparison of expression levels across samples
- A decision must be made as to whether observed differences are or are not due to chance

SAGE and Variance

- Decisions as to whether differences in expression levels are or are not due to chance depend on the estimate of the variance of the underlying probability distribution
- Standardly, a multinomial model is used, which gives a very large variance, meaning decisions about the constancy of a gene's expression across environments cannot be reliably made

SAGE and Variance

- One step in SAGE measurements is an amplification of the amount of mRNA measured, through PCR
- The multinomial model does not correctly represent the statistics of PCR
- A correct estimate of variance requires an approximate estimate of the original total number of transcripts before PCR amplification
- Relevant measurements can easily be made
- These lead to a much lower estimate of the variance of SAGE estimates

Causal Clustering

The General Problem

- Given data on a number of variables, find features of the underlying processes that generated the data
- Example: spectral measurements of solar radiation intensities. Variables are intensities at each measured frequency

The Most Common Solution: Principal Components, Factor Analysis

- Explains data by new theoretical variables that are linear functions of linear combinations of measured variables
- Chooses theoretical variables to account for as much of the variance of measured variables as possible
- Theoretical variables are not unique; appropriate transformations will do as well
- Gives no clues to dependencies among real underlying factors; assumes they are independent of one another

General Problems with Clustering Algorithms

- Tend to give misleading results if some of the measured variables influence other measured variables (e.g., through signal leakage between channels)
- Assume no correlations among the underlying factors
- E.g., Independent Components algorithms

A New Approach: General Considerations

- For the time being, consider only linear models
- Think graphically and let the algebra take care of itself
- Be willing to make multiple hypothesis tests on the same data set
- Insist on computational tractability, but be adventurous
- Require asymptotic reliability under specifiable assumptions

Think Graphically

- A system represented by the equations
- Xi = ai T + ei, ai a real constant, ei random, i = 1, ..., m
- ei independent of ek for i not equal to k, is represented as
- T → X1, T → X2, ..., T → Xm-1, T → Xm
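
One consequence of this one-factor structure can be made concrete. Assuming, in addition to what the slide states, that T is independent of every error term, each off-diagonal covariance factors as ai · ak · Var(T). The sketch below builds the implied covariance matrix for made-up loadings; the function name and all numbers are illustrative assumptions.

```python
def implied_covariance(loadings, var_t=1.0, noise_vars=None):
    """Covariance matrix implied by Xi = ai * T + ei.

    Assumes T is independent of all ei and the ei are mutually independent.
    """
    m = len(loadings)
    if noise_vars is None:
        noise_vars = [1.0] * m
    return [
        [
            loadings[i] * loadings[k] * var_t + (noise_vars[i] if i == k else 0.0)
            for k in range(m)
        ]
        for i in range(m)
    ]


cov = implied_covariance([1.0, 2.0, 0.5, 3.0])  # hypothetical loadings a1..a4

# Products of covariances over disjoint pairs agree, because each equals
# a1 * a2 * a3 * a4 * Var(T)^2 under the one-factor model.
tetrad_difference = cov[0][1] * cov[2][3] - cov[0][2] * cov[1][3]
print(cov[0][1], tetrad_difference)  # 2.0 0.0
```

Constraints of this kind on the measured covariances are what make a one-factor model statistically testable without observing T.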

Causal Clustering

- Assumptions (for some while)
- Linear Systems
- Non recursive (acyclic graph)
- Independent noises or error terms
- Normal distributions of error variables
- Independent, identically distributed cases
- Faithfulness: vanishing partial correlations, if any, hold for all values of the linear coefficients

Input

- Values for variables X1, ..., Xn for a number of cases
- Significance level (α level) to be used in hypothesis tests
- Nothing else

Output

- Disjoint clusters of some of the observables; a set of directed acyclic graphs (DAGs) among theoretical variables, one variable for each cluster
- Each DAG determines a linear model
- Just write each variable (node) in the graph as a linear function of its parent variables in the graph and add an error term for each equation
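
The last step is mechanical, as the following sketch shows. The representation of a DAG as a parent dictionary and the coefficient naming scheme (`b_child_parent`, `e_child`) are assumptions made for the illustration.

```python
def linear_model_equations(parents):
    """Write one linear equation per node of a DAG given as {node: [parents]}."""
    equations = []
    for node, pars in parents.items():
        # One coefficient per parent, plus an error term for the equation.
        terms = [f"b_{node}_{p} * {p}" for p in pars] + [f"e_{node}"]
        equations.append(f"{node} = " + " + ".join(terms))
    return equations


# Hypothetical DAG among three latent variables: T0 -> T1, T0 -> T2, T1 -> T2.
dag = {"T0": [], "T1": ["T0"], "T2": ["T0", "T1"]}
for eq in linear_model_equations(dag):
    print(eq)
# T0 = e_T0
# T1 = b_T1_T0 * T0 + e_T1
# T2 = b_T2_T0 * T0 + b_T2_T1 * T1 + e_T2
```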

The True Graph

Purify Start of Round 1

Purify Round 1

- For each measured variable X, do a test of the one factor model, with latent common cause T0, and with all measured variables except X, against the one factor model with all measured variables including X (difference of chi squares)
- If the model without X is not rejected, put X in set Hold For 1

Purify Steps into Round 1

Purify End of Round 1

Washdown, Round 1

- Put all measured variables in Hold For 1 in a new cluster with a single common latent factor, T1
- Correlate the new factor with the previous latent factor, T0
- Empty Hold For 1

Washdown Round 1

Purify Round 2

- Repeat the Purify procedure on all measured variables remaining in the first cluster. Put any rejected variables in Hold For 1
- Apply the Purify procedure to all measured variables in the second cluster. Put any rejected variables in Hold For 2

Purify Round 2

Washdown, Round 2

- Add variables in Hold For 1 to the remaining variables in cluster T1
- Form a new cluster, with a new latent common cause T2, with the variables in Hold For 2
- Correlate all of the latent variables
- Empty Hold For 1 and Hold For 2

Washdown Round 2

Purify/Washdown Output (after 5 rounds)

Clean Up

- Remove any clusters with fewer than 3 observed variables
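
The control flow of the Purify, Washdown and Clean Up rounds can be sketched as follows. This is not the authors' implementation: the chi-square difference test is replaced by a caller-supplied predicate `rejects(x, members)`, and the toy predicate below (based on made-up true labels) exists only so that the loop can be run end to end.

```python
def purify_washdown(variables, rejects, max_rounds=5, min_size=3):
    """Control-flow sketch; rejects(x, members) stands in for the statistical test."""
    clusters = [list(variables)]  # cluster k has latent factor T_k
    for _ in range(max_rounds):
        holds = []  # holds[k] plays the role of "Hold For k+1"
        # Purify: test every variable against its current cluster.
        for k, members in enumerate(clusters):
            held = [x for x in members if rejects(x, members)]
            clusters[k] = [x for x in members if x not in held]
            holds.append(held)
        if not any(holds):
            break  # nothing rejected in this round: clustering is stable
        # Washdown: move held variables to the next cluster, creating it if needed.
        for k, held in enumerate(holds):
            if held:
                if k + 1 == len(clusters):
                    clusters.append([])
                clusters[k + 1].extend(held)
    # Clean Up: drop clusters with fewer than min_size observed variables.
    return [c for c in clusters if len(c) >= min_size]


# Toy stand-in for the test: a variable is rejected when its (made-up) true
# latent differs from that of the first variable in the cluster.
true_label = {"a1": "A", "a2": "A", "a3": "A",
              "b1": "B", "b2": "B", "b3": "B",
              "c1": "C", "c2": "C"}


def toy_rejects(x, members):
    return true_label[x] != true_label[members[0]]


print(purify_washdown(list(true_label), toy_rejects))
# [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

The two variables labelled C end up in a third cluster that is removed by Clean Up because it has fewer than three members.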

Determining Latent Structure

- For each pair of latent variables, Tj and Tk, and their measured effects, test the model in which there is a directed edge Tj → Tk against the model in which there is no directed edge
- If the model with a directed edge is not rejected, keep an undirected edge Tj - Tk
- If the model with a directed edge is rejected, remove the Tj - Tk undirected edge

MIMBuild Step 1: Testing Marginal Independencies

Testing T2 independent of T3

versus

Not significantly different (α = 0.05). Keep edge

χ² = 12.42, df = 9

χ² = 11.42, df = 8
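
The arithmetic behind "not significantly different" can be reproduced from the two fit statistics. The sketch assumes the usual nested-model comparison, in which the difference of the chi-square values is referred to a chi-square distribution whose degrees of freedom are the difference of the two df values; the helper name is hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


diff = 12.42 - 11.42  # difference of the two chi-square statistics
df_diff = 9 - 8       # difference of the degrees of freedom
p_value = chi2_sf_1df(diff)

print(round(diff, 2), df_diff, round(p_value, 3))  # 1.0 1 0.317
```

A p-value of about 0.32 is well above 0.05, which matches the slide's statement that the two models are not significantly different.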

Testing for Conditional Independence

- To test if Tj is independent of Tk conditional on Tm, form the complete graph among Tj, Tk and Tm (with measured variable effects) and test against the same model without the Tj - Tk edge
- Similarly for conditioning on multiple variables

MIMBuild Step N: Testing Independencies Conditioned on a Set of Size N

Another example: N = 3, testing T0 independent of T4 conditional on T1, T2, T3

versus

Orienting Edges

- If, for example, there is a structure T0 - T1 - T2 but no T0 - T2 edge, and the T0 - T2 edge was removed without conditioning on T1, orient T0 - T1 - T2 as T0 → T1 ← T2 (as a collider)
- Orient undirected edges adjacent to a collider node away from the collider
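
The collider rule can be sketched as a small function. The data structures are assumptions made for the illustration: `adjacent` maps each node to its neighbours in the undirected graph, and `sepset` records, for each removed edge, the conditioning set that was in use when it was removed.

```python
from itertools import combinations


def orient_colliders(adjacent, sepset):
    """Return directed edges (tail, head) implied by the collider rule.

    For each unshielded triple a - b - c (a and c not adjacent), orient
    a -> b <- c when b was not in the set conditioned on to remove a - c.
    """
    arrows = set()
    for b, neighbours in adjacent.items():
        for a, c in combinations(sorted(neighbours), 2):
            if c in adjacent[a]:
                continue  # a and c are adjacent: the triple is shielded
            if b not in sepset.get(frozenset((a, c)), set()):
                arrows.add((a, b))
                arrows.add((c, b))
    return arrows


# The slide's example: T0 - T1 - T2, with T0 - T2 removed without conditioning on T1.
adjacent = {"T0": {"T1"}, "T1": {"T0", "T2"}, "T2": {"T1"}}
sepset = {frozenset(("T0", "T2")): set()}
print(sorted(orient_colliders(adjacent, sepset)))  # [('T0', 'T1'), ('T2', 'T1')]
```

Had the T0 - T2 edge been removed only after conditioning on T1, the function would return no arrows and the triple would stay undirected.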


Final Outcome

Purify/Washdown/MIMBuild output

True graph

General Idea

- Measured variables are assigned to clusters by testing whether the one factor model fits the data better with them or without them
- Every rejected variable is tested on each succeeding cluster until it fits
- The latent structure is determined by the PC algorithm (Spirtes, et al. 1993), known to be asymptotically correct under the Faithfulness assumption, and (in this case) under the assumption that there are no unmeasured causes of the latent cluster factors

Generalizations

- Using another algorithm for latent structure, the FCI algorithm, the procedure can be applied when there may be unmeasured common causes of cluster latent factors
- Can be used with any distribution family for which there are good tests of conditional independence (not that there are many)
- The algorithm can be easily integrated with prior substantive knowledge about the actual structure
- For linear systems, can be generalized to latent structures with cyclic graphs (feedback systems)
- Improved performance expected if Bayesian search algorithms supplement constraint-based search, or with genetic algorithms

Limitations

- Only works for unmeasured causes having at least 3 unconfounded measured variables
- But if there is a known or suspected common cause of all measures (or any set of measures), it can be estimated and partialed out
- Does not give orientations of all edges
- Requires large sample sizes
- Computationally intensive
- No error probabilities are possible

Experiments

Experiment 1: Microarray Analysis

Background of the Experiment

- Fat cells from mice are treated with troglitazone, which is a member of the family of drugs known as thiazolidinediones (TZDs)
- TZDs are used in humans to increase the efficiency of the biological actions of insulin in diabetes and obesity
- Decreased insulin sensitivity is a hallmark of both diabetes and obesity
- The action is to activate the expression of specific genes
- At the end of a particular incubation the cells were quickly frozen to stop all biological processes in the cell

cDNA Microarray Analysis of the 3T3-L1 Adipocyte Response to Troglitazone

- 3T3-L1 pre-adipocytes cultured in vitro
- 3T3-L1 pre-adipocytes differentiated into mature adipocytes by addition of insulin and dexamethasone
- Mature adipocytes exposed to 10 µM Troglitazone for durations of between 15 minutes and 24 hours
- Cells harvested directly in Trizol reagent and total cellular RNA extracted by standard procedures

cDNA Microarray Analysis of the 3T3-L1 Adipocyte Response to Troglitazone

- First strand cDNA synthesized by Reverse Transcriptase in the presence of α-33P-dCTP
- cDNA hybridized to Research Genetics GF400 (mouse) Gene Filters using standard methods
- Hybridized signal captured using Storm (Molecular Dynamics) phosphorimager and gene-specific signal intensity extracted using Pathways 4™ software (Research Genetics)

Data Scheme

- 20 array chips with 47 measurements
- 3 uses for each chip: 20 for the first hybridization, 20 for the second hybridization, 7 for the third hybridization
- 3 treatments: control without DMSO, control with DMSO, test sample (drug + DMSO)
- 35 time points
- 5355 genes
- The data contains information about background, chromosomes, release plates, the coordinates of each spot on the plate, etc.

Normalization

- The data were log-transformed because
- it gives a better sense of the amount of variation
- the amount of variance in a gene expression level was proportional to the gene expression level
- Each chip was adjusted to have median zero in order to remove global chip-to-chip variations
- Outliers were removed because very high and very low gene intensities are not reliably measured
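
The first two normalization steps can be sketched in a few lines. The log base is not given on the slide, so base 2 is an assumption here, as are the function name and the intensity values; outlier removal is left out because the slide does not specify the cutoffs.

```python
import math
import statistics


def normalize_chip(intensities):
    """Log-transform one chip's intensities and shift them to median zero."""
    logged = [math.log2(v) for v in intensities]  # base 2 is an assumption
    centre = statistics.median(logged)
    return [v - centre for v in logged]


# Hypothetical raw intensities for five spots on one chip.
chip = [120.0, 480.0, 960.0, 30.0, 240.0]
normalized = normalize_chip(chip)
print(statistics.median(normalized))  # 0.0
```

After this step a value of 1 means "twice the chip's median intensity" on every chip, which is what makes measurements from different chips comparable.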

Determine the Effect of the Drug Treatment on the Gene Expression Level Over Time

- Compare 20 genes with highest variability in use-1 data with 20 genes with lowest variability
- Perform statistical tests of hypothesis that genes are not changing, adjusted for multiple testing problem

Are the Measurements for the Second Use Reliable?

- Chips are supposed to be re-usable
- However, the second measurement on each chip resembles the first measurement on that chip more closely than it resembles measurements taken at the same time point
- Figure in next slide shows close resemblance between different measurements on the same chip, taken at different times

Are the Measurements for the Second Use Reliable?

Concerns

- Is it an experimental error?
- Should we use the chips only once?
- Is at least the use-1 data set reliable?
- We are using other more reliable, but more expensive tests to evaluate these hypotheses

Future Plans

- Remove outlying genes
- Improve the test performed for data in use-1
- Clustering methods for data in use-1
- Check the data for use-2

Experiment 2: SAGE Analysis

Serial Analysis of Gene Expression (SAGE)

- Analysis of the effect of laminar shear stress on gene expression in the vascular endothelium
- Primary coronary artery endothelial cells (HCAEC) grown to confluency on glass microscope slides
- Slides placed in parallel plate flow chamber and cells exposed to laminar shear stress for 0, 4, 8, 12, 20 and 24 hours
- Cells harvested directly into Trizol reagent (Invitrogen) and total RNA extracted
- RNA used as substrate for construction of SAGE library and SAGE tags analyzed by automated DNA sequencing
- SAGE tag data analyzed using SAGE2000 software and gene expression measurement recorded for all genes present

Preliminary Clustering Analysis of Genes Regulated >2-fold

NB: Samples are clustered using the Pearson correlation. Red, yellow and blue bars indicate high, medium and low levels of gene expression, respectively.

Flow Loop

[Figure: flow loop and parallel plate flow cell; labels: flow chamber, reservoir, flow in, cells, flow out]