Clustering - PowerPoint PPT Presentation

About This Presentation
Title:

Clustering

Slides: 43
Provided by: YaelyM
Tags: clustering | data


Transcript and Presenter's Notes

Title: Clustering


1
Tutorial 8
  • Clustering

2
Clustering
  • General Methods
      • Unsupervised Clustering
      • Hierarchical clustering
      • K-means clustering
  • Expression data
      • GEO
      • UCSC
      • ArrayExpress
  • Tools
      • EPCLUST
      • MeV

3
Microarray - Reminder
4
Expression Data Matrix
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6
Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1
Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3
Gene 5 0.1 2.6 2.2 2.7 -2.1
Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9
  • Each column represents all the gene expression
    levels from a single experiment.
  • Each row represents the expression of a gene
    across all experiments.

5
Expression Data Matrix
  • Each element is a log ratio, log2(T/R), where
      • T is the expression level of the gene in the
        test sample
      • R is the expression level of the gene in the
        reference sample
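To make the definition concrete, here is a minimal Python sketch that turns one pair of expression levels into a matrix element. The intensities are invented for illustration. The sketch also assumes that T and R are already positive, comparable measurements.

```python
import math


def log_ratio(test_level: float, reference_level: float) -> float:
    """Return log2(T/R) for one gene in one experiment.

    > 0 means the gene is higher in the test sample (T > R),
    < 0 means it is lower (T < R), and 0 means T = R.
    """
    if test_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test_level / reference_level)


# Hypothetical intensities, not taken from the slides
print(log_ratio(400, 100))  # 2.0: four times higher in the test sample
print(log_ratio(100, 400))  # -2.0: four times lower
print(log_ratio(250, 250))  # 0.0: unchanged
```

On a log scale, a four-fold increase and a four-fold decrease give values of equal size and opposite sign (+2 and -2). This lets up- and down-regulation be compared directly.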

6
Microarray Data Matrix
Black indicates a log ratio of zero, i.e. T = R
Green indicates a negative log ratio, i.e. T < R
Grey indicates missing data
Red indicates a positive log ratio, i.e. T > R
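The color key can be read as a simple rule applied to each cell. The sketch below assumes one way a viewer might apply it. The color comes from the sign of the log ratio. Brightness grows with the ratio's magnitude up to a hypothetical `saturation` cut-off. The default of 3.0 is chosen only because the example values lie between -3 and 3.

```python
import math
from typing import Optional, Tuple


def cell_color(value: Optional[float], saturation: float = 3.0) -> Tuple[str, float]:
    """Map a log ratio to (color name, brightness in [0, 1]).

    Missing data -> grey; 0 -> black; positive -> red; negative -> green.
    Brightness is |value| / saturation, capped at 1 (an assumed display rule),
    so values close to 0 stay close to black.
    """
    if value is None or math.isnan(value):
        return ("grey", 1.0)
    if value == 0:
        return ("black", 0.0)
    brightness = min(abs(value) / saturation, 1.0)
    return ("red" if value > 0 else "green", brightness)


# Row for Gene 1 from the expression data matrix
for v in [-1.2, -2.1, -3, -1.5, 1.8, 2.9]:
    print(v, cell_color(v))
```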
7
Microarray Data Different representations
(Two plots of log ratio vs. experiment (Exp); T > R and T < R regions marked)
8
A real example
500 genes, 3 knockdown conditions: too complicated
to analyze without help
9
Microarray Data Clusters
10
  • How do we determine the similarity between two
    genes (for clustering)?

Patrik D'haeseleer, "How does gene expression
clustering work?", Nature Biotechnology 23, 1499-1501
(2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
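Clustering needs a number that says how alike two genes' rows are. The sketch below shows two widely used measures in plain Python. Euclidean distance compares absolute levels. Pearson correlation compares the shape of the profiles and is turned into a distance as 1 - r. The slides do not say which measure a given tool uses, so treat this choice as an assumption.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Correlation of two profiles: 1 = same shape, -1 = mirror image."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("correlation is undefined for a flat profile")
    return cov / (norm_a * norm_b)


def correlation_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """0 for perfectly correlated profiles, up to 2 for anti-correlated ones."""
    return 1.0 - pearson_correlation(a, b)


# Rows for Gene 1 and Gene 6 from the expression data matrix
gene1 = [-1.2, -2.1, -3, -1.5, 1.8, 2.9]
gene6 = [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9]
print(euclidean_distance(gene1, gene6))
print(correlation_distance(gene1, gene6))
```

Two genes can have the same shape but different magnitudes, for example [1, 2, 3] and [2, 4, 6]. Their correlation distance is 0, but their Euclidean distance is not. This is one reason why different measures can produce different clusters.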
11
Unsupervised Clustering
Hierarchical Clustering
12
Hierarchical Clustering
Genes with similar expression patterns are
grouped together and connected by a series of
branches (a dendrogram).
Leaves (shapes in our case) represent genes and
the length of the paths between leaves represents
the distances between genes.
13
Hierarchical clustering finds an entire hierarchy
of clusters.
If we want a certain number of clusters, we need
to cut the tree at the level that yields that number
(in this case, four).
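The procedure described above can be sketched as bottom-up (agglomerative) clustering. This is a minimal, unoptimized Python version. It rests on two assumptions that the slides do not make. Genes are compared by Euclidean distance. Clusters are compared by average linkage, where the distance between two clusters is the mean of all gene-to-gene distances between them. Other linkages, such as single or complete, are equally common and can change the tree.

```python
import math
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def euclidean(a: Profile, b: Profile) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1: List[int], c2: List[int], data: Sequence[Profile]) -> float:
    """Mean distance over all pairs (one member from each cluster)."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(data: Sequence[Profile], k: int):
    """Bottom-up hierarchical clustering stopped at k clusters.

    Every item starts as its own cluster, and the two closest clusters are
    merged repeatedly. `merges` records (left, right, height) for each join,
    i.e. the dendrogram built from the leaves upward. Stopping when k
    clusters remain gives the groups obtained by cutting the tree at the
    level that leaves k branches.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of items")
    clusters: List[List[int]] = [[i] for i in range(len(data))]
    merges: List[Tuple[List[int], List[int], float]] = []
    while len(clusters) > k:
        # Find the closest pair of current clusters
        best_d, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], data)
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        merges.append((clusters[best_i], clusters[best_j], best_d))
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]  # best_j > best_i, so best_i stays valid
    return clusters, merges


# Complete rows from the expression data matrix (Gene 5 is left out
# because its row has only five values)
genes = {
    "Gene 1": [-1.2, -2.1, -3, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    "Gene 3": [-2.5, 1.5, -0.1, -1.1, -1, 0.1],
    "Gene 4": [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
    "Gene 6": [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9],
}
names = list(genes)
clusters, merges = agglomerative(list(genes.values()), k=2)
for cluster in clusters:
    print([names[i] for i in cluster])
```

Calling `agglomerative(..., k=1)` runs the merging all the way to the root. The full `merges` list then describes the whole hierarchy.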
14
Hierarchical clustering result
Five clusters
15
K-means Clustering
An algorithm that partitions the data into K
groups.
(K = 4)
16
How does it work?
  1. k initial "means" (in this case k = 3) are
     randomly selected from the data set (shown in
     color).
  2. k clusters are created by associating every
     observation with the nearest mean.
  3. The centroid of each of the k clusters becomes
     the new mean.
  4. Steps 2 and 3 are repeated until convergence
     has been reached.
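The algorithm iteratively divides the genes into
K groups and recalculates the center of each group.
The result is a set of K clusters that (locally)
minimizes the distances of genes from their cluster
centers; because the initial means are random,
different runs can give different clusters.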
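The four k-means steps translate almost line for line into code. The sketch below is plain Python and makes two assumptions. Genes are compared with squared Euclidean distance to the means. The initial means are drawn by a seeded random generator, and the `seed` parameter is a hypothetical convenience for reproducible runs. Because the starting means are random, different seeds can converge to different clusters.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(data: Sequence[Point], k: int, seed: int = 0, max_iter: int = 100):
    rng = random.Random(seed)
    # Step 1: choose k observations at random as the initial means
    means = [list(p) for p in rng.sample(list(data), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: assign each observation to its nearest mean
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in data
        ]
        # Step 4: stop once no observation changes cluster (convergence)
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = centroid(members)
    return labels, means


# Complete rows from the expression data matrix
genes = [
    [-1.2, -2.1, -3, -1.5, 1.8, 2.9],    # Gene 1
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],   # Gene 2
    [-2.5, 1.5, -0.1, -1.1, -1, 0.1],    # Gene 3
    [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],   # Gene 4
    [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9], # Gene 6
]
labels, means = kmeans(genes, k=2)
print(labels)
```

A common practical remedy for the dependence on the starting means is to run several seeds and keep the result with the smallest total distance. This is not shown above.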
17
Different types of clustering, different results
18
How to search for expression profiles
  • GEO (Gene Expression Omnibus)
      • http://www.ncbi.nlm.nih.gov/geo/
  • Human Genome Browser
      • http://genome.ucsc.edu/
  • ArrayExpress
      • http://www.ebi.ac.uk/arrayexpress/

19
(No Transcript)
20
Searching for expression profiles in the GEO
Datasets - suitable for analysis with GEO tools
Expression profiles by gene
Probe sets
Microarray experiments
Groups of related microarray experiments
21
Clustering
Download dataset
Statistical analysis
22
Clustering analysis
23
24
The expression distribution for different lines
in the cluster
25
(No Transcript)
26
Searching for expression profiles in the Human
Genome browser.
27
Keratin 10 is highly expressed in skin
28
ArrayExpress
http://www.ebi.ac.uk/arrayexpress/
29
(No Transcript)
30
What can we do with all the expression profiles?
Clusters!
How?
EPCLUST
http://www.bioinf.ebc.ee/EP/EP/EPCLUST/
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
In the input matrix, each column should
represent a gene and each row should represent
an experiment (or individual).
Hierarchical clustering
Edit the input matrix: Transpose, Normalize, Randomize
K-means clustering
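The expression data matrix shown earlier has one gene per row, but the EPCLUST input described here expects one gene per column. The matrix therefore has to be transposed first. Below is a minimal Python sketch of the Transpose and Normalize edits. The slides do not show what EPCLUST's own Normalize option computes. Here it is assumed to mean scaling each gene to mean 0 and (population) standard deviation 1, which is one common choice. The Randomize option is not sketched.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(matrix: Matrix) -> Matrix:
    """Swap rows and columns (genes as rows <-> genes as columns)."""
    return [list(column) for column in zip(*matrix)]


def normalize_rows(matrix: Matrix) -> Matrix:
    """Z-score each row: subtract its mean, divide by its standard deviation.

    A row with no variation is mapped to all zeros instead of dividing by 0.
    """
    result = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
        result.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return result


# Two complete rows (genes) from the expression data matrix
genes_as_rows = [
    [-1.2, -2.1, -3, -1.5, 1.8, 2.9],   # Gene 1
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],  # Gene 2
]
# Normalize each gene first, then turn genes into columns for the upload
epclust_input = transpose(normalize_rows(genes_as_rows))
for experiment_row in epclust_input:
    print(experiment_row)
```

Normalizing before transposing keeps the scaling per gene. If the matrix were transposed first, the same per-row function would instead scale each experiment.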
38
Data
Clusters
39
40
Samples found in cluster
Graphical representation of the cluster
41
10 clusters, as requested
42
MultiExperiment Viewer (MeV)
http://www.tm4.org/mev/