Gene expression analysis

Provided by: YaelyM

1
Tutorial 7
  • Gene expression analysis

2
Gene expression analysis
  • Expression data
    • GEO
    • UCSC
    • ArrayExpress
  • General clustering methods
    • Unsupervised Clustering
      • Hierarchical clustering
      • K-means clustering
  • Tools for clustering
    • EPCLUST
    • MeV
  • Functional analysis
    • GO annotation

3
Gene expression data sources
Microarrays
RNA-seq experiments
4
Expression Data Matrix
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6
Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1
Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3
Gene 5 0.1 2.6 2.2 2.7 -2.1
Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9
  • Each column represents all the gene expression
    levels from a single experiment.
  • Each row represents the expression of a gene
    across all experiments.

5
Expression Data Matrix
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6
Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1
Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3
Gene 5 0.1 2.6 2.2 2.7 -2.1
Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9
  • Each element is a log ratio, log2(T/R).
  • T - the gene expression level in the test
    sample
  • R - the gene expression level in the
    reference sample
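
The log ratio defined above can be sketched in a few lines of Python. The intensity values are hypothetical and chosen only for illustration; the function name `log_ratio` is not from the slides.

```python
import math


def log_ratio(test_level, reference_level):
    """Return log2(T/R) for one gene in one experiment."""
    if test_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test_level / reference_level)


# Hypothetical intensities (T, R), not taken from the slides.
for t, r in [(800.0, 100.0), (100.0, 100.0), (25.0, 100.0)]:
    print(f"T={t} R={r} log2(T/R)={log_ratio(t, r)}")
```

A positive value means the gene is expressed more strongly in the test sample, a negative value means it is expressed more strongly in the reference sample, and zero means the two levels are equal.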

6
Expression Data Matrix
Black indicates a log ratio of zero, i.e. T = R
Green indicates a negative log ratio, i.e. T < R
Grey indicates missing data
Red indicates a positive log ratio, i.e. T > R
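
As a rough sketch, the colour coding on this slide can be written as a small Python function. It assumes a missing value is stored as `None` and returns a colour name only; a real heat map would typically also scale the colour intensity with the size of the log ratio, which is not modelled here.

```python
def heatmap_color(log_ratio):
    """Map one matrix element to the colour scheme described on the slide."""
    if log_ratio is None:      # assumption: missing data is stored as None
        return "grey"
    if log_ratio > 0:          # T > R
        return "red"
    if log_ratio < 0:          # T < R
        return "green"
    return "black"             # T = R


# Gene 1 from the expression data matrix, plus one missing value for illustration.
gene_1 = [-1.2, -2.1, -3, -1.5, 1.8, 2.9, None]
print([heatmap_color(value) for value in gene_1])
```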
7
Microarray Data: Different representations
[Figure: two plots of log ratio against experiment (Exp); positive values correspond to T > R, negative values to T < R]
8
How to search for expression profiles
  • GEO (Gene Expression Omnibus)
  • http://www.ncbi.nlm.nih.gov/geo/
  • Human genome browser
  • http://genome.ucsc.edu/
  • ArrayExpress
  • http://www.ebi.ac.uk/arrayexpress/

10
Searching for expression profiles in the GEO
Datasets - suitable for analysis with GEO tools
Expression profiles by gene
Probe sets
Microarray experiments
Groups of related microarray experiments
11
Clustering
Download dataset
Statistical analysis
12
Clustering analysis
13
Clustering
Download dataset
Statistical analysis
14
The expression distribution for different lines
in the cluster
15
Searching for expression profiles in the Human
Genome browser.
16
Keratin 10 is highly expressed in skin
17
ArrayExpress
http://www.ebi.ac.uk/arrayexpress/
22
How to analyze gene expression data
23
Unsupervised Clustering - Hierarchical Clustering

24
Hierarchical Clustering
Genes with similar expression patterns are
grouped together and are connected by a series of
branches (dendrogram).
[Figure: dendrogram with leaves in the order 2, 1, 6, 3, 5, 4]
Leaves (shapes in our case) represent genes and
the length of the paths between leaves represents
the distances between genes.
25
  • How to determine the similarity between two
    genes? (for clustering)

Patrik D'haeseleer, How does gene expression
clustering work?, Nature Biotechnology 23,
1499-1501 (2005),
http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
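
The slide leaves the question open. As a hedged illustration, two measures commonly used for expression profiles are Euclidean distance and a distance based on Pearson correlation; the sketch below computes both for Gene 1 and Gene 2 from the expression data matrix. The function names are illustrative and the choice of these two measures is an assumption, not something the slide specifies.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / math.sqrt(var_a * var_b)


def correlation_distance(a, b):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated."""
    return 1.0 - pearson_correlation(a, b)


# Gene 1 and Gene 2 from the expression data matrix.
gene_1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene_2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]
print("Euclidean distance:", euclidean_distance(gene_1, gene_2))
print("Correlation distance:", correlation_distance(gene_1, gene_2))
```

Euclidean distance is sensitive to the absolute expression levels, while the correlation-based distance compares only the shape of the two profiles across experiments.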
26
Hierarchical clustering finds an entire hierarchy
of clusters.
If we want a certain number of clusters, we need
to cut the tree at the level that yields that number
(in this case, four).
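
A minimal sketch of this idea in plain Python: clusters are merged bottom-up, and merging stops once the requested number of clusters remains, which gives the same groups as building the whole tree and cutting it at that level. Average linkage and Euclidean distance are assumptions made for the example, and the two-experiment profiles are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of genes")
    clusters = [[i] for i in range(len(profiles))]   # every gene starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]      # merge b into a
        del clusters[b]
    return clusters


# Hypothetical profiles for six genes measured in two experiments.
profiles = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [10.0, 0.0], [0.0, 10.0]]
clusters = agglomerate(profiles, n_clusters=4)
print(clusters)
```

Each inner list holds the row indices of the genes that ended up in the same cluster.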
27
Hierarchical clustering result
Five clusters
28
Unsupervised Clustering - K-means clustering
An algorithm to classify the data into K
groups.
K = 4
29
How does it work?
1. k initial "means" (in this case k = 3) are randomly
selected from the data set (shown in color).
2. k clusters are created by associating every
observation with the nearest mean.
3. The centroid of each of the k clusters becomes
the new mean.
4. Steps 2 and 3 are repeated until convergence has
been reached.
The algorithm iteratively divides the genes into
K groups and calculates the center of each group.
The result is a set of K groups chosen to minimize the
distances to the group centers; it can depend on the
randomly selected initial means.
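
The four steps above can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions (squared Euclidean distance, a fixed random seed, and a cluster that loses all its members keeps its previous mean), and the two-experiment data points are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of points."""
    return [sum(values) / len(points) for values in zip(*points)]


def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    means = rng.sample(points, k)                 # step 1: random initial means
    labels = None
    for _ in range(max_iterations):
        # Step 2: assign every observation to its nearest mean.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, means[c]))
                      for p in points]
        if new_labels == labels:                  # step 4: nothing changed
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes the new mean.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                           # an empty cluster keeps its mean
                means[c] = centroid(members)
    return labels, means


# Hypothetical profiles for six genes measured in two experiments.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, means = k_means(points, k=2)
print(labels)
print(means)
```

Because the initial means are random, different seeds can lead to different final groupings, so it is common to run the algorithm several times and compare the results.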
30
How should we determine K?
  • Trial and error
  • Take K as the square root of the number of genes
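
The second heuristic is a one-line calculation. The sketch below rounds to the nearest whole number, which is an assumption; the slide does not say how to round.

```python
import math


def rule_of_thumb_k(n_genes):
    """Starting value for K: the square root of the number of genes."""
    if n_genes < 1:
        raise ValueError("n_genes must be at least 1")
    return max(1, round(math.sqrt(n_genes)))


print(rule_of_thumb_k(100))
```

The value is only a starting point for the trial-and-error approach in the first bullet.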

31
Tools for clustering - EPCLUST
http://www.bioinf.ebc.ee/EP/EP/EPCLUST/
38
In the input matrix, each column should
represent a gene and each row should represent
an experiment (or individual).
Hierarchical clustering
Edit the input matrix: Transpose, Normalize, Randomize
K-means clustering
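
To make two of the matrix-editing options concrete, here is a small Python sketch of transposing and normalizing a matrix. It is not EPCLUST's implementation: "normalize" is assumed here to mean scaling each row to mean 0 and standard deviation 1, and the tool may define it differently. The small matrix is hypothetical.

```python
import math


def transpose(matrix):
    """Swap rows and columns, e.g. genes-by-experiments to experiments-by-genes."""
    return [list(column) for column in zip(*matrix)]


def normalize_rows(matrix):
    """Scale each row to mean 0 and standard deviation 1 (assumed definition)."""
    normalized = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        # A constant row has no spread, so it is mapped to zeros.
        normalized.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return normalized


# Hypothetical matrix: two genes (rows) measured in three experiments (columns).
genes_by_experiments = [[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]]
experiments_by_genes = transpose(genes_by_experiments)
print(experiments_by_genes)
print(normalize_rows(genes_by_experiments))
```

Transposing is what turns a matrix with genes in rows into the layout described on this slide, with genes in columns and experiments in rows.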
39
In the input matrix, each column should
represent a gene and each row should represent
an experiment (or individual).
Hierarchical clustering
40
Data
Clusters
41
In the input matrix, each column should
represent a gene and each row should represent
an experiment (or individual).
K-means clustering
42
Samples found in cluster
Graphical representation of the cluster
43
10 clusters, as requested
44
Tools for clustering - MeV
http://www.tm4.org/mev/
45
Gene expression function analysis
1007_s_at 1053_at 117_at 121_at 1255_g_at 1294_at
1316_at 1320_at 1405_i_at 1431_at 1438_at 1487_at
1494_f_at 1598_g_at
What can we learn from clusters?
46
Gene Ontology (GO)
http://www.geneontology.org/
The Gene Ontology project provides an ontology of
defined terms representing gene
product properties. The ontology covers three
domains:
47
Gene Ontology (GO)
  • Cellular Component (CC) - the parts of a cell or
    its extracellular environment.
  • Molecular Function (MF) - the elemental
    activities of a gene product at the molecular
    level, such as binding or catalysis.
  • Biological Process (BP) - operations or sets of
    molecular events with a defined beginning and
    end, pertinent to the functioning of integrated
    living units: cells, tissues, organs,
    and organisms.

48
The GO tree
49
GO sources
  • ISS - Inferred from Sequence/Structural Similarity
  • IDA - Inferred from Direct Assay
  • IPI - Inferred from Physical Interaction
  • TAS - Traceable Author Statement
  • NAS - Non-traceable Author Statement
  • IMP - Inferred from Mutant Phenotype
  • IGI - Inferred from Genetic Interaction
  • IEP - Inferred from Expression Pattern
  • IC - Inferred by Curator
  • ND - No Data available
  • IEA - Inferred from electronic annotation
50
Search by AmiGO
51
Results for alpha-synuclein
52
 
DAVID
http://david.abcc.ncifcrf.gov/
Functional Annotation Bioinformatics Microarray
Analysis
  • Identify enriched biological themes,
    particularly GO terms
  • Discover enriched functional-related
    gene/protein groups
  • Cluster redundant annotation terms
  • Explore gene names in batch 
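
As a hedged illustration of what "enriched" means for a GO term, the sketch below computes a plain hypergeometric p-value: the probability of seeing at least the observed number of annotated genes in a cluster by chance. This is a generic test, not necessarily the statistic DAVID reports, and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Hypothetical counts: 10 genes on the array, 5 carry the term,
# and all 5 genes in the cluster carry it.
p = enrichment_p_value(population=10, annotated=5, sample=5, overlap=5)
print(p)
```

A small p-value suggests that the term appears in the cluster more often than expected by chance; when many terms are tested, the p-values are usually corrected for multiple testing.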

53
annotation
classification
ID conversion
54
Functional annotation
Upload
Annotation options
57
Gene expression analysis
  • Expression data
    • GEO
    • UCSC
    • ArrayExpress
  • General clustering methods
    • Unsupervised Clustering
      • Hierarchical clustering
      • K-means clustering
  • Tools for clustering
    • EPCLUST
    • MeV
  • Functional analysis
    • GO annotation