Data Analysis, Part 1: The Initial Questions of the AFCS

Madhu Natarajan, Rama Ranganathan

AFCS Annual Meeting 2003

Data Analysis The Initial Questions of the AFCS

- What are the goals of data analysis right now?

Our first general question: how complex is signaling in cells?

- What are the goals of the analysis?

- (1) Quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands. This experiment is designed to provide a response pattern for a ligand sampled at several points in the signaling network. It may or may not provide much information about specific mechanism.

- (2) Quantitative evaluation of the interactions between pairs of ligand responses, and an estimation of total interaction density. (The next talk.)

[Figure: output variables, including calcium time points, cAMP time points, ...]

Issues in addressing goal (1), the quantitative measurement of the similarity (or dissimilarity) of the responses to different ligands:

- A way of combining all the multivariate output data into general parameters that represent signaling.
- A way of collapsing the non-independent outputs: how many independent variables are there in a calcium trace?
- A formalism for calculating the similarity of ligand responses.

Quantitative measurement of similarity in ligand screen data

- Merging different types of data. How can we put all the experimental variables on a common scale and then create a sensible, unified representation of the complete dataset for each ligand?
- One approach is to make a Gaussian error model for the unstimulated value of each variable, then convert each variable for a ligand into the statistical significance of observing the value given the unstimulated value and error model.

[Figure: error model for one variable, labeled with the basal value, the observed value, and the standard deviation σ.]

So, we define a parameter S (for significance or signaling).

So for an observed value of 3.8 given a basal value of 1.5 and a standard deviation of 0.7, you get an S value of 3.29. But if the standard deviation was 1.4, then S is only 1.68.
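
The arithmetic of this example can be checked with a short sketch. The slide's formula for S is not reproduced in this transcript, so the code assumes S is the plain z-score, (observed - basal) / standard deviation; the function name `s_value` is hypothetical. Under that assumption the first case gives 3.29 as quoted, while the second gives about 1.64 rather than the quoted 1.68, so the original definition may differ slightly.

```python
def s_value(observed, basal_mean, basal_sd):
    """Assumed definition: distance from the basal mean in units of basal SD."""
    if basal_sd <= 0:
        raise ValueError("basal standard deviation must be positive")
    return (observed - basal_mean) / basal_sd


# The two cases from the slide: same observation, different basal noise.
s_tight = s_value(3.8, 1.5, 0.7)
s_loose = s_value(3.8, 1.5, 1.4)

print(f"sd = 0.7 -> S = {s_tight:.2f}")
print(f"sd = 1.4 -> S = {s_loose:.2f}")
```

Doubling the basal standard deviation halves S, which is the point of the example: the same raw change counts for less when the unstimulated value is noisier.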

Our observed variable and basal value get transformed into these new units of statistical significance.

Why do this? Every data element we collect (regardless of type, time scale, or method of collection) can now be put on a common basis for comparison, clustering, etc. The only assumption is that the basal value is normally distributed around its mean. It also provides a suitable measure for talking about the interaction of two ligands: the additivity of S.
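
As a rough illustration of the common-basis idea, the sketch below converts two different kinds of measurement to S using basal replicates. It assumes S is a z-score against the sample mean and sample standard deviation of the unstimulated replicates; the function name `to_s` and all numbers are hypothetical, not AfCS data.

```python
from statistics import mean, stdev


def to_s(observed, basal_replicates):
    """Express one observation in units of basal standard deviations."""
    mu = mean(basal_replicates)
    sd = stdev(basal_replicates)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("basal replicates show no variation")
    return (observed - mu) / sd


# Hypothetical unstimulated replicates for two unrelated assays.
calcium_basal = [100.0, 110.0, 90.0, 105.0, 95.0]  # fluorescence units
camp_basal = [1.0, 1.2, 0.8, 1.1, 0.9]             # arbitrary concentration units

s_calcium = to_s(140.0, calcium_basal)
s_camp = to_s(1.4, camp_basal)

# Rescaling the raw units leaves S unchanged, so S is dimensionless.
s_calcium_rescaled = to_s(0.140, [v / 1000.0 for v in calcium_basal])

print(f"calcium S = {s_calcium:.2f}, cAMP S = {s_camp:.2f}")
```

Because both results are in the same units, they can sit side by side in one response profile even though the raw assays share neither units nor dynamic range.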

Now what about all experimental variables?

The Experiment Space

A highly multi-dimensional space, but one that behaves just like three-dimensional space. Each variable gets an independent dimension, and so a complete single ligand dataset is one vector in this space.

- What can we learn from this representation?
- The response profile for each ligand is the final S vector.
- Differences between ligand responses have a natural meaning, and this preserves the dimensions along which the differences occur.
- So let's start constructing the experiment space for the B cell data.

[Figure: the difference vector between two ligand responses, labeled ΔS1,2.]
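
A minimal sketch of the difference between two ligand responses, assuming each response is a list of S values in a fixed dimension order and that overall dissimilarity is measured as ordinary Euclidean length (suggested by the remark that the space behaves like three-dimensional space, but still an assumption). The names `delta_s` and `distance` and the example vectors are hypothetical.

```python
from math import sqrt


def delta_s(s1, s2):
    """Per-dimension difference between two ligand response vectors."""
    if len(s1) != len(s2):
        raise ValueError("both vectors must cover the same dimensions")
    return [a - b for a, b in zip(s1, s2)]


def distance(s1, s2):
    """Euclidean length of the difference vector."""
    return sqrt(sum(d * d for d in delta_s(s1, s2)))


# Hypothetical S vectors over three experiment space dimensions.
ligand_1 = [3.0, 0.5, -1.0]
ligand_2 = [0.0, 0.5, 3.0]

diff = delta_s(ligand_1, ligand_2)
dist = distance(ligand_1, ligand_2)
print(diff, dist)
```

Keeping the difference as a vector rather than only its length shows which dimensions separate the two ligands: here the first and third differ, while the second is identical.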

Converting raw data to S variables

[Figure: responses to LPA, BLC, and AIG, each shown as raw fluorescence units versus time (sec) and as signaling units (S) versus time (sec).]
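
The conversion of a raw trace into signaling units might look like the sketch below. It assumes that the basal mean and standard deviation are estimated separately at each time point from unstimulated replicate traces, which the transcript does not state; `trace_to_s` and the numbers are hypothetical.

```python
from statistics import mean, stdev


def trace_to_s(stimulated, basal_traces):
    """Convert a stimulated time trace into S units, time point by time point."""
    s_trace = []
    for t, value in enumerate(stimulated):
        basal_at_t = [trace[t] for trace in basal_traces]
        mu = mean(basal_at_t)
        sd = stdev(basal_at_t)
        if sd == 0:
            raise ValueError(f"no basal variation at time index {t}")
        s_trace.append((value - mu) / sd)
    return s_trace


# Hypothetical fluorescence values at four time points.
basal_traces = [
    [10.0, 11.0, 10.0, 9.0],
    [12.0, 9.0, 11.0, 11.0],
    [11.0, 10.0, 12.0, 10.0],
]
stimulated = [11.0, 20.0, 15.0, 10.0]

s_trace = trace_to_s(stimulated, basal_traces)
print(s_trace)
```

Each time point becomes one S variable, which is why a long trace produces a large number of variables before any reduction.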

Converting all the raw data for one experiment type to S variables

[Figure: all traces for one experiment type plotted as S versus time (sec), from 0 to 600.]

Now, 200 separate variables for the calcium traces is clearly idiotic.

Data reduction: a cluster-based approach

[Figure: the calcium time axis, 0 to 600 sec, with time points grouped into clusters numbered 1 to 5.]

The first five dimensions: the reduced calcium response

[Figure: the reduced calcium dimensions 1 to 5 over time (sec), from 0 to 600.]
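
One way to picture the cluster-based reduction is sketched below. The transcript does not say which clustering algorithm was used, so this uses a simple greedy grouping by Pearson correlation across ligands as a stand-in; the function names, the 0.9 threshold, and the toy matrix are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_variables(s_matrix, threshold=0.9):
    """Group variables whose S values co-vary across ligands.

    s_matrix[ligand][variable]; returns a list of clusters, each a list of
    variable indices. A variable joins the first cluster whose first member
    it correlates with at or above the threshold.
    """
    n_vars = len(s_matrix[0])
    columns = [[row[j] for row in s_matrix] for j in range(n_vars)]
    clusters = []
    for j in range(n_vars):
        for members in clusters:
            if pearson(columns[members[0]], columns[j]) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def reduce_profile(s_vector, clusters):
    """Replace each cluster of variables by the mean of its S values."""
    return [sum(s_vector[j] for j in members) / len(members) for members in clusters]


# Hypothetical S values: three ligands (rows) by four time points (columns).
s_matrix = [
    [1.0, 1.1, 5.0, 4.9],
    [3.0, 3.2, 0.5, 0.4],
    [0.0, 0.1, 2.0, 2.1],
]
clusters = cluster_variables(s_matrix)
reduced_a = reduce_profile(s_matrix[0], clusters)
print(clusters, reduced_a)
```

In this toy case the four time points collapse to two dimensions, because the first two and the last two carry nearly the same information across ligands.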

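All the dimensions (minus gene expression)

[Figure: all experiment space dimensions except gene expression. Surviving panel labels: 1 to 5; 2.5, 5.0, 15, 30; and 0.5, 1, 3, 8, 20.]

Clustering the experiment space

[Figure: the ligand responses clustered over the same dimensions.]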
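
Grouping ligands by their position in the experiment space could be sketched as follows. The clustering method behind the slide is not given in the transcript; single-linkage grouping under a Euclidean distance cutoff is used here purely as an illustration, and `group_ligands`, the cutoff, and the profiles are hypothetical.

```python
from math import dist


def group_ligands(profiles, max_distance):
    """Single-linkage grouping of ligands by Euclidean distance.

    profiles maps a ligand name to its S vector. Two ligands end up in the
    same group if a chain of pairwise distances <= max_distance links them.
    """
    names = list(profiles)
    group_of = {name: i for i, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(profiles[a], profiles[b]) <= max_distance:
                old, new = group_of[b], group_of[a]
                for name in names:
                    if group_of[name] == old:
                        group_of[name] = new
    groups = {}
    for name in names:
        groups.setdefault(group_of[name], []).append(name)
    return list(groups.values())


# Hypothetical two-dimensional S profiles for four ligands.
profiles = {
    "ligand_A": [5.0, 0.0],
    "ligand_B": [4.5, 0.5],
    "ligand_C": [0.0, 6.0],
    "ligand_D": [0.2, 5.5],
}
groups = group_ligands(profiles, max_distance=1.0)
print(groups)
```

The result depends strongly on the cutoff, so in practice it would have to be chosen with the noise level of the S values in mind.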

Conclusions

- A simple transformation of raw data variables into dimensionless S variables (units of significance) permits construction of a unified experiment space of all data, regardless of source or differences in intrinsic dynamic range and signal to noise.
- A potentially serious danger is over-parameterization, the usage of many non-independent variables to represent a biological process (say, the inactivation of a calcium response). We suggest that this can be addressed through clustering variables over many ligand responses.
- 14 out of 32 ligands applied to the B cell showed some significant response in at least one of the 54 experiment space dimensions.
- Of the 14 with measurable responses, we discern 8 distinct patterns of response.
- The gene expression dataset will be integrated into the experiment space as soon as we clearly understand how to identify the gene clusters that should be collapsed into experiment space dimensions.
- What predictions seem reasonable for the double ligand screen?
  - Combinations of ligands that show similar response patterns might be expected to show interaction.
  - Combinations of ligands that are very different in response might show less or no interaction.
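
The count of responding ligands in the conclusions could be obtained along these lines. The significance cutoff used for the slides is not stated, so the threshold of 3.0, the two-sided test on |S|, the function name `responders`, and the example profiles are all assumptions.

```python
def responders(profiles, threshold):
    """Ligands with |S| at or above the threshold in at least one dimension."""
    return [
        name
        for name, s_vector in profiles.items()
        if any(abs(value) >= threshold for value in s_vector)
    ]


# Hypothetical S profiles over three experiment space dimensions.
profiles = {
    "ligand_A": [0.3, 4.1, -0.2],
    "ligand_B": [0.5, -0.8, 1.1],
    "ligand_C": [-3.6, 0.1, 0.4],
}
active = responders(profiles, threshold=3.0)
print(f"{len(active)} of {len(profiles)} ligands respond: {active}")
```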

Acknowledgements: Madhu Natarajan, Paul Sternweis, Elliott Ross, Mel Simon, Al Gilman