Statistics Tools in GeneSpring - PowerPoint PPT Presentation

Title: PowerPoint Presentation Author: train Last modified by: jjin Created Date: 11/15/2001 2:46:32 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 26

Provided by: tra760

1
Statistics Tools in GeneSpring

The Center for Bioinformatics
UNC at Chapel Hill
Jianping Jin Ph.D.
Bioinformatics Scientist
Phone (919)843-6015
E-mail jjin_at_email.unc.edu
Fax (919)966-6821

2
What Can GeneSpring Do?

Works with both Affymetrix and two-color data.
Views data graphically (classification, graph,
tree, scatter plot, Venn diagram).
Performs statistical analyses.
Annotates genes (updating from GenBank,
LocusLink, UniGene, biochemical pathways).

3
What statistical analyses does GS do?

Clustering
k-means (non-hierarchical)
Self-organizing map
Gene trees (hierarchical dendrograms).
Principal component analysis
t-test analyses (p-values)
Like a known gene or average of genes
Like a pattern drawn with the mouse
Genes with high confidence
Genes with relative expression in certain ranges
Pathway analysis finding genes that fit in a
certain place in a pathway.
Sequence analysis to automatically find
regulatory sequences.
Automatic functional annotation of sub-trees in
dendrograms.

4
Tree Clustering

Standard correlation
Smooth correlation
Change correlation
Upregulated correlation
Pearson correlation
Spearman correlation
Spearman confidence
Two-sided Spearman confidence
Distance

5
Notations to the Formulas

Result: the result of the calculation for genes
A and B.
n: the number of samples being correlated
over.
a: the vector $(a_1, a_2, a_3, \ldots, a_n)$ of
expression values for gene A.
b: the vector $(b_1, b_2, b_3, \ldots, b_n)$ of
expression values for gene B.
$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$
$|a| = \sqrt{a \cdot a}$

6
Standard Correlation

Equation: $a \cdot b / (|a|\,|b|)$
Also called Pearson correlation around zero.
Measures the angular separation of the expression
vectors for genes A and B.
Answers the question: do the peaks match up?
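
As a minimal sketch of the equation on this slide, the standard correlation can be computed as the cosine of the angle between two expression vectors. The function names and sample values are illustrative assumptions, not GeneSpring's API.

```python
import math

def dot(a, b):
    """a.b = a_1*b_1 + a_2*b_2 + ... + a_n*b_n"""
    return sum(x * y for x, y in zip(a, b))

def standard_correlation(a, b):
    """Uncentered correlation a.b / (|a||b|), where |a| = sqrt(a.a)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical expression vectors: gene_b is gene_a scaled by 2,
# so the two vectors point in the same direction.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]
print(standard_correlation(gene_a, gene_b))
```

Because the vectors are not centered, two genes whose peaks line up score close to 1 regardless of their overall magnitude.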

7
Pearson Correlation

Equation: $A \cdot B / (|A|\,|B|)$
Very similar to the standard correlation, except it
measures the angle between the expression vectors for
genes A and B around the mean of each vector.
A: subtract the mean of all elements in vector a
from the value of each element in a.
Do the same for b to make a vector B.
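
A short sketch of the centering step described on this slide: each vector has its own mean subtracted before the standard-correlation formula is applied. The function name and sample values are assumptions for illustration.

```python
import math

def pearson_correlation(a, b):
    """Center each vector on its own mean, then compute A.B / (|A||B|)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    A = [x - mean_a for x in a]
    B = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(A, B))
    norm_a = math.sqrt(sum(x * x for x in A))
    norm_b = math.sqrt(sum(y * y for y in B))
    return numerator / (norm_a * norm_b)

# Hypothetical profiles: same shape, different baseline.
print(pearson_correlation([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]))
```

A constant offset between two profiles does not change the result, which is the practical difference from the uncentered standard correlation.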

8
Spearman Confidence

r: the value of the Spearman correlation.
SC = 1 - (probability you would get a value of r
or higher by chance).
A measure of similarity, not a correlation.
SC is high when the Spearman correlation is high,
i.e. when the p-value is low.
Takes account of the number of sub-experiments in
your experiment set.

9
Two-sided Spearman Confidence

A measure of similarity, very similar to the
Spearman confidence.
Two-sided test of whether the Spearman correlation
is significantly greater than or less than zero.
Answers the question: which genes behave similarly
or opposite to a specific gene?
Probably not good for k-means/tree clustering.
Equation: 1 - (probability you would get a Spearman
correlation of r or higher, or -r or lower,
by chance).
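
The one-sided and two-sided Spearman confidence can be illustrated with an exact permutation test. This is a sketch under stated assumptions: the slides do not say how GeneSpring computes the chance probability, so enumerating all rank permutations is a stand-in that is only practical for small n, and the code assumes no tied values.

```python
from itertools import permutations

def ranks(values):
    """Rank values from 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rho_from_ranks(ra, rb):
    """Spearman correlation from two rank vectors without ties."""
    n = len(ra)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

def spearman_confidence(a, b, two_sided=False):
    """1 - P(correlation at least as extreme as observed, by chance)."""
    ra, rb = ranks(a), ranks(b)
    r = rho_from_ranks(ra, rb)
    extreme = total = 0
    for perm in permutations(range(1, len(a) + 1)):
        rp = rho_from_ranks(ra, perm)
        total += 1
        if two_sided:
            extreme += abs(rp) >= abs(r) - 1e-12
        else:
            extreme += rp >= r - 1e-12
    return 1 - extreme / total

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]
print(spearman_confidence(a, b))                  # one-sided
print(spearman_confidence(a, b, two_sided=True))  # two-sided
```

With more sub-experiments there are more possible rank orderings, so a perfect match becomes less likely by chance and the confidence rises. This is how the measure takes the number of sub-experiments into account.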

10
Distance

A measurement of dissimilarity, not a correlation
at all.
Euclidean distance between the expression profiles
(the values for each point in N-dimensional space)
of genes A and B.
Distance $= |a - b| / \sqrt{N}$, where N is the number of experiment points.
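
A minimal sketch of the distance formula on this slide, using an illustrative function name:

```python
import math

def distance(a, b):
    """Euclidean distance between profiles, divided by sqrt(N)."""
    n = len(a)
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return euclid / math.sqrt(n)

# Hypothetical profiles that differ by 1 at every experiment point.
print(distance([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]))
```

Dividing by the square root of N keeps the value comparable between experiment sets with different numbers of points.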

11
Special Case Correlations

Smooth correlation, Change correlation and
Upregulated correlation.
All three are modified versions of the standard
correlation.
They only make sense when the data are in a sequence,
such as before/after, a time series, or a drug series.

12
Smooth Correlation

Make a new vector A from a by interpolating the
average of each consecutive pair of elements of a.
Insert this new value between the old values.
Do this for each pair of elements that would be
connected by a line in the graph screen.
Do the same to make a vector B from b.
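
The interpolation step can be sketched as follows. The function name is hypothetical, and the sketch assumes every consecutive pair of points is connected by a line in the graph; the smoothed vectors A and B would then be fed into the standard correlation.

```python
def smooth(a):
    """Insert the average of each consecutive pair between the pair.

    Assumes a is non-empty and that all consecutive points are connected.
    """
    result = []
    for left, right in zip(a, a[1:]):
        result.append(left)
        result.append((left + right) / 2)
    result.append(a[-1])
    return result

print(smooth([1.0, 3.0, 5.0]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```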

13
Change Correlation

The opposite of what the smooth correlation looks for.
Considers only the change in expression level
between adjacent points.
Similar to the standard correlation, but uses an arc
tangent transformation of the ratio between adjacent
pairs of points to create the expression vector. Less
sensitive to outliers than using the ratio directly.
The value created between two values $a_i$ and $a_{i+1}$ is
$\operatorname{atan}(a_{i+1} / a_i) - \pi/4$
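
A sketch of the arc tangent transformation, followed by a standard correlation of the transformed vectors. The function names are illustrative, and the code assumes strictly positive expression values so that the ratio is defined.

```python
import math

def change_vector(a):
    """atan(a[i+1] / a[i]) - pi/4 for each adjacent pair (a[i] > 0 assumed)."""
    return [math.atan(nxt / cur) - math.pi / 4 for cur, nxt in zip(a, a[1:])]

def standard_correlation(a, b):
    dot_ab = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_ab / (norm_a * norm_b)

def change_correlation(a, b):
    return standard_correlation(change_vector(a), change_vector(b))

# Hypothetical profiles with identical fold changes between adjacent points.
print(change_correlation([1.0, 2.0, 4.0, 4.0], [10.0, 20.0, 40.0, 40.0]))
```

A ratio of 1 (no change) maps to 0, and because atan is bounded, a single very large ratio cannot dominate the vector the way the raw ratio would.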

14
Upregulated Correlation

Very similar to the change correlation, but it only
considers positive changes. All negative values
for the arc tangent are set to zero.
Make a new vector A from a by looking at the
change between each pair of elements of a.
The value created between two values $a_i$ and $a_{i+1}$
is $\max(\operatorname{atan}(a_{i+1} / a_i) - \pi/4,\ 0)$.
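
The clipping step can be sketched like this. The function name is hypothetical and strictly positive expression values are assumed.

```python
import math

def upregulated_vector(a):
    """max(atan(a[i+1] / a[i]) - pi/4, 0) for each adjacent pair."""
    return [
        max(math.atan(nxt / cur) - math.pi / 4, 0.0)
        for cur, nxt in zip(a, a[1:])
    ]

# Hypothetical profile: one decrease followed by two increases.
print(upregulated_vector([2.0, 1.0, 2.0, 4.0]))
```

The decrease from 2.0 to 1.0 contributes 0, so only the increases affect the correlation that is then computed on these vectors.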

15
Algorithm to Build Gene Tree

1. Determine if there is only one gene or subtree
left. If yes, go to step five.
2. Find the two closest genes/subtrees.
3. Merge these two into one subtree.
4. Return to step one.
5. Merge together branches where the distance
between sub-branches is less than the separation
ratio, subject to considering only genes that are
less than the minimum distance apart.
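
Steps one through four can be sketched as a simple agglomerative loop. Several choices here are assumptions not stated on the slide: the similarity measure is the Distance measure defined earlier, the distance between subtrees is the average over all gene pairs, and the final branch-merging step (separation ratio and minimum distance) is omitted.

```python
import math
from itertools import combinations

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) / math.sqrt(len(a))

def leaves(tree):
    """Gene names under a node; a node is a name or a (left, right) tuple."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def linkage(t1, t2, profiles):
    """Average pairwise distance between two subtrees (an assumption)."""
    pairs = [(g1, g2) for g1 in leaves(t1) for g2 in leaves(t2)]
    return sum(distance(profiles[g1], profiles[g2]) for g1, g2 in pairs) / len(pairs)

def build_gene_tree(profiles):
    nodes = list(profiles)
    while len(nodes) > 1:                      # step 1: stop at one subtree
        i, j = min(                            # step 2: find the closest pair
            combinations(range(len(nodes)), 2),
            key=lambda p: linkage(nodes[p[0]], nodes[p[1]], profiles),
        )
        merged = (nodes[i], nodes[j])          # step 3: merge into a subtree
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]                            # step 4 is the loop itself

# Hypothetical profiles: g1 and g2 are nearly identical, g3 is different.
profiles = {"g1": [1.0, 2.0, 3.0], "g2": [1.0, 2.0, 3.1], "g3": [9.0, 1.0, 5.0]}
print(build_gene_tree(profiles))
```

The nested tuples mirror the dendrogram: the innermost pair was merged first and is therefore the most similar.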

16
Algorithm to Build Tree

The minimum distance: how far down the tree
discrete branches are depicted. A higher number gives
more genes in a group, so groups are less specific.
The separation ratio: the correlation difference
between groups of clustered genes. Between 0 and 1.
Increasing the separation ratio increases the
branchiness of the tree.

17
Principal Components Analysis

Not a clustering method.
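PCA finds the most abundant building blocks of the
data: a set of expression patterns.
The 1st PC is obtained by finding the linear
combination of expression patterns that accounts for
the most variability in the data. The 2nd PC accounts
for the most of the remaining variability, and so on.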
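
To make the idea of the first principal component concrete, here is a small sketch using power iteration on the covariance matrix. This is a generic textbook approach chosen for brevity, not necessarily how GeneSpring computes PCA. The function name, the start vector, and the iteration count are assumptions, and the data must not be constant.

```python
import math

def first_principal_component(rows, iterations=200):
    """Unit vector along the direction of greatest variance in rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0 + 0.01 * j for j in range(d)]  # arbitrary start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical data: all variability lies along the (1, 1) direction.
print(first_principal_component([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
```

Later components could be found the same way after removing the variance already explained by the earlier ones.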

18
k-Means Clustering

Divides genes into a user-defined number (k) of
equal-sized groups, based on their expression
patterns.
Creates centroids at the average location of each
group of genes.
With each iteration, genes are reassigned to the
group with the closest centroid.
After all of the genes have been reassigned, the
location of the centroids is recalculated.
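
The iteration described on this slide can be sketched as follows. The initial split into equal-sized groups is done in list order, which is an assumption, as are the squared Euclidean distance, the stopping rule, and all names.

```python
def mean_profile(profiles):
    n = len(profiles)
    return [sum(p[j] for p in profiles) / n for j in range(len(profiles[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(profiles, k, max_iter=100):
    """Return a group index (0..k-1) for each profile."""
    # Initial assignment: k roughly equal-sized groups in list order.
    assignment = [i * k // len(profiles) for i in range(len(profiles))]
    for _ in range(max_iter):
        # Centroid = average location of each group's genes.
        centroids = []
        for g in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == g]
            centroids.append(mean_profile(members) if members else None)
        # Reassign every gene to the group with the closest centroid.
        new_assignment = [
            min(
                (g for g in range(k) if centroids[g] is not None),
                key=lambda g: sq_dist(p, centroids[g]),
            )
            for p in profiles
        ]
        if new_assignment == assignment:  # nothing moved: converged
            break
        assignment = new_assignment
    return assignment

# Hypothetical profiles forming two obvious groups.
profiles = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
print(k_means(profiles, 2))  # [0, 1, 0, 1]
```

The groups start out equal-sized but need not stay that way, since genes move freely to the nearest centroid on each pass.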

19
Self-Organizing Maps

Similar to k-means clustering.
Shows the relationship between groups in a 2-D map.
Best represents the variability of the data,
while still maintaining similarity between adjacent
nodes, e.g. point 1,2 is one unit away from 1,3.

20
What does t-test mean in GS?

Replicates: one-sample Student's t-test.
Comparisons for 2 groups: Student's two-sample
t-test.
Comparisons for multiple groups: one-way analysis
of variance (ANOVA).
Filtering genes: based on a one-sample t-test of
the mean expression level across replicates vs. a
reference value (Expression Percentage
Restriction).
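
For reference, the one-sample and two-sample Student's t statistics can be sketched with the standard library. Only the t statistic is computed here. Converting it to a p-value needs the t distribution, which the standard library does not provide. The function names and the pooled-variance (equal variance) form are assumptions about what "Student's" means on the slide.

```python
import math
import statistics

def one_sample_t(values, reference=0.0):
    """t statistic for the mean of replicates vs. a reference value."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - reference) / standard_error

def two_sample_t(x, y):
    """Student's two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled * (1 / nx + 1 / ny)
    )

# Hypothetical replicate measurements.
print(one_sample_t([1.0, 2.0, 3.0], reference=0.0))
print(two_sample_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```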

21
Filter Genes Analysis Tools

Global Error Model: filters out genes with large
standard deviations or error values.
Raw data filtering: gets rid of genes too close
to the background.
Sample to sample comparison: fold comparison among
different samples.
Statistical group comparison: filters out genes that
do not vary significantly across different groups.
Data File Restriction: based on other fields (P/S
call, +/- pairs).

22
Statistical Group Comparison

Finds genes with a statistically significant
difference in mean expression level across all groups.
For two groups: Student's two-sample t-test.
For multiple groups: ANOVA.
Non-parametric comparison: for each gene, the rank
order is used for analysis. Wilcoxon two-sample
test (Mann-Whitney U test) for two groups, the
Kruskal-Wallis test for multiple groups.
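
The rank-based idea behind the non-parametric comparison can be illustrated with the Mann-Whitney U statistic. This sketch computes only U, with ties given average ranks. The p-value lookup is omitted and the function names are illustrative.

```python
def average_ranks(values):
    """Ranks from 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Smaller of the two U statistics for groups x and y."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)

# Hypothetical expression values for one gene in two groups.
print(mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 0.0
```

A U of 0 means the two groups do not overlap at all in rank order, which is the strongest possible evidence of a difference for the given group sizes.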

23
Data Normalization

In two-color experiments, normalizing vs. the
control channel (green) for each gene.
Normalize each sample to itself or to a positive
control. This makes different samples comparable to
one another.
Normalizing each gene to itself removes the
differing intensity scales from multiple experiment
readings (highly recommended if not using a
two-color experiment).
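
A sketch of per-sample and per-gene normalization on a small matrix. Dividing by the median is an assumption made for illustration: the slide does not say which summary value is used. The matrix layout (rows are genes, columns are samples) and function names are also hypothetical.

```python
import statistics

def normalize_per_sample(matrix):
    """Divide each value by the median of its sample (column)."""
    n_samples = len(matrix[0])
    medians = [statistics.median(row[s] for row in matrix) for s in range(n_samples)]
    return [[row[s] / medians[s] for s in range(n_samples)] for row in matrix]

def normalize_per_gene(matrix):
    """Divide each value by the median of its gene (row)."""
    return [[v / statistics.median(row) for v in row] for row in matrix]

# Hypothetical intensities: the second gene reads 10x brighter overall.
raw = [[10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
print(normalize_per_gene(raw))  # [[0.5, 1.0, 1.5], [0.5, 1.0, 1.5]]
```

After per-gene normalization the two genes show the same relative pattern even though their raw intensity scales differ by a factor of ten.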

24
NCI-60 cell lines
25
DrugActivity_AT
