Analysis of Gene Expression Data presentation

Title: Analysis of Gene Expression Data

1
Analysis of Gene Expression Data

Yoonsoo Pyon
ysp2_at_case.edu
Feb. 8th, 2008

2
MicroArray

What are they?
Microarrays allow thousands of expression analyses
to be performed concurrently.

3
DNA Chip Microarrays

Put a large number (100K) of cDNA sequences or
synthetic DNA oligomers onto a glass slide (or
other substrate) in known locations on a grid.
Label an RNA sample and hybridize it to the chip
Measure amounts of RNA bound to each square in
the grid
Make comparisons
Cancerous vs. normal tissue
Treated vs. untreated
Time course
Many applications in both basic and clinical
research

4
Goals of a Microarray Experiment

Find the genes that change expression between
experimental and control samples
Classify samples based on a gene expression
profile
Find patterns: groups of biologically related
genes that change expression together across
samples/treatments

5
Potential Microarray Applications

Drug discovery / toxicology studies
Mutation/polymorphism detection
Differing expression of genes over
Time
Tissues
Disease States
Sub-typing complex genetic diseases

6
cDNA Microarray Technologies

Spot cloned cDNAs onto a glass microscope slide
(usually PCR-amplified segments of plasmids)
Label 2 RNA samples with 2 different colors of
fluorescent dye - control vs. experimental
Mix two labeled RNAs and hybridize to the chip
Make two scans - one for each color
Combine the images to calculate ratios of amounts
of each RNA that bind to each spot

7
Spot your own Chip
Robot spotter
Ordinary glass microscope slide
8
cDNA Spotted Microarrays
9
Data Acquisition

Scan the arrays
Quantitate each spot
Subtract background
Normalize
Export a table of fluorescent intensities for
each gene in the array

10
MicroArray

Overview of image analysis
Grid finding
grid alignment
skew
Quantification of image
variable background
uneven hybridization

11
Image Analysis/Data Quantization

Feature (target/probe) segmentation
Data extraction and quantization of
Background
Feature
Correlation of feature identity and location
within image
Display of pseudo-color image

12
Image Segmentation

13
Normalization

Can control for many of the experimental sources
of variability (systematic, not random or gene
specific)
Bring each image to the same average brightness
Can use simple math or more sophisticated methods:
divide by the mean (whole chip or by sectors)
LOESS (locally weighted regression)
No sure biological standards
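
The "divide by the mean" option can be sketched in a few lines. This is a minimal illustration only: the function name, the two toy chips, and the choice of the average chip mean as the common target are assumptions, not part of the original slides.

```python
from statistics import mean


def normalize_to_common_mean(arrays, target=None):
    """Rescale each array so that its mean intensity equals a common target.

    `arrays` is a list of intensity lists, one per chip (hypothetical layout).
    If no target is given, the average of the per-chip means is used.
    """
    means = [mean(a) for a in arrays]
    if target is None:
        target = mean(means)
    # Dividing by the chip mean removes the overall brightness difference;
    # multiplying by the target puts every chip on the same scale.
    return [[x * target / m for x in a] for a, m in zip(arrays, means)]


# Toy data: chip_b is uniformly twice as bright as chip_a.
chip_a = [100.0, 200.0, 300.0]
chip_b = [200.0, 400.0, 600.0]
scaled_a, scaled_b = normalize_to_common_mean([chip_a, chip_b])
```

After scaling, both chips have the same average brightness, so the remaining differences are not due to overall scan intensity. Normalizing by sectors would apply the same function to each sector separately.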

14
Are the Treatments Different?

Analysis of microarray data has tended to focus
on making lists of genes that are up or down
regulated between treatments
Before making these lists, ask the
question "Are the treatments different?"
Use standard statistical methods to evaluate
expression profiles for each treatment (t-test or
F-test)
If there are differences, find the genes most
responsible
If there are not significant overall differences,
then lists of genes with large fold changes may
only reflect random variability.
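
As a rough illustration of the t-test mentioned above, the sketch below computes Welch's t statistic for one set of replicate measurements per treatment. Choosing the Welch (unequal variance) form, the function name, and the toy values are all assumptions made for the example; the slides do not specify which variant was intended.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two groups of replicate measurements."""
    # Standard error of the difference in means, without pooling variances.
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Toy replicate values for one expression measure under two treatments.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t_stat = welch_t(control, treated)
```

A large absolute value of `t_stat` suggests the treatments differ; turning it into a p-value requires the t distribution, which is left out of this sketch.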

15
Microarray Experiment Design

Type I (n = 2)
How is this gene expressed in target 1 as
compared to target 2?
Which genes show up/down regulation between the
two targets?
Type II (n > 2)
How does the expression of gene A vary over time,
tissues, or treatments?
Do any of the expression profiles exhibit similar
patterns of expression?

16
Basic Data Analysis

Fold change (relative increase or decrease in
intensity for each gene)
Set cutoff filter for low values (background
noise)
Cluster genes by similar changes - only really
meaningful across multiple treatments or time
points
Cluster samples by similar gene expression
profiles
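
The first two steps, fold change plus a cutoff for low values, can be sketched as follows. The function name, the gene labels, and the floor of 50 intensity units are hypothetical choices for the example, not values from the slides.

```python
from math import log2


def log2_fold_changes(control, treated, floor=50.0):
    """Return {gene: log2(treated / control)} for genes above the noise floor.

    Genes whose intensity is below `floor` in both samples are dropped,
    and single values below the floor are raised to it before dividing.
    """
    result = {}
    for gene in control:
        c, t = control[gene], treated[gene]
        if max(c, t) < floor:
            continue  # both values are in the background noise
        result[gene] = log2(max(t, floor) / max(c, floor))
    return result


control = {"geneA": 100.0, "geneB": 20.0, "geneC": 400.0}
treated = {"geneA": 400.0, "geneB": 30.0, "geneC": 100.0}
changes = log2_fold_changes(control, treated)
```

Working on the log2 scale makes a 4-fold increase and a 4-fold decrease symmetric (+2 and -2), which is convenient for the clustering steps that follow.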

17
Streamlined Affy Analysis
Normalize
Filter
Present/Absent, Minimum value, Fold change
Raw data
Classification
Significance
Clustering
Machine learning
t-test, Rank Product
Gene lists
18
Differential Expression

Type I analysis
Look for genes with vastly different expression
under different conditions
How do you measure vastly different?
What role should derived statistics play?

19
Type I Differential Expression
20
Multiple Test

In a microarray experiment, each gene (each probe
or probe set) is really a separate experiment
Yet if you treat each gene as an independent
comparison, you will always find some with
significant differences
(the tails of a normal distribution)

21
Multiple test

Bonferroni correction
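$\alpha_g / n$: the global significance level
divided by the number of tests
Too strict
Holm's stepwise correction
Sort the p-values in ascending order, $p_1 \le p_2 \le \dots \le p_n$.
If $p_1 < \alpha_g / n$, then test the remaining n-1 p-values
by comparing the next p-value: $p_2 < \alpha_g / (n-1)$.
If m is the largest integer for which
$p_i < \alpha_g / (n-i+1)$ holds for every $i \le m$,
then genes 1, ..., m are called
significantly differentially expressed.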
Still too strict
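
Both corrections are short enough to sketch directly. The function names, the default global level of 0.05, and the four toy p-values are assumptions for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag p-values below the global level divided by the number of tests."""
    n = len(pvalues)
    return [p < alpha / n for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm's step-down procedure; returns flags in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    significant = [False] * n
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        # Thresholds are alpha/n, alpha/(n-1), ..., alpha/1.
        if pvalues[i] < alpha / (n - rank):
            significant[i] = True
        else:
            break  # stop at the first failure; all larger p-values fail too
    return significant


pvals = [0.001, 0.015, 0.02, 0.5]
bonf_flags = bonferroni(pvals)
holm_flags = holm(pvals)
```

On this toy input Bonferroni keeps only the smallest p-value, while Holm keeps the first three, which shows why Holm is the less strict of the two while still controlling the same global error level.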

22
False Discovery

Statisticians call a false positive a "type I
error" or a "false discovery"
The expected number of false discoveries is equal
to the p-value cutoff of the t-test × the number of
genes in the array; the False Discovery Rate (FDR)
is that number as a fraction of all genes called
different
For a p-value cutoff of 0.01 × 10,000 genes = 100
falsely "different" genes
You cannot eliminate false positives, but by
choosing a more stringent p-value, you can keep
them manageable (try p = 0.001)
The number of false discoveries must be smaller
than the number of real differences that you find,
which in turn depends on the size of the
differences and the variability of the measured
expression values
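
The arithmetic on this slide can be checked with a small sketch. It assumes the worst case in which no gene is truly changed, so every gene is a chance to produce a false positive; the function names and the count of 300 called genes are hypothetical.

```python
def expected_false_positives(p_cutoff, n_genes):
    """Expected number of genes passing the cutoff by chance alone."""
    return p_cutoff * n_genes


def estimated_false_fraction(p_cutoff, n_genes, n_called):
    """Rough share of the called genes that are expected to be false."""
    return min(1.0, expected_false_positives(p_cutoff, n_genes) / n_called)


at_001 = expected_false_positives(0.01, 10_000)
at_0001 = expected_false_positives(0.001, 10_000)
# Hypothetical experiment: 300 genes pass the 0.01 cutoff.
share = estimated_false_fraction(0.01, 10_000, 300)
```

With 300 genes called at a cutoff of 0.01, roughly a third of the list would be expected to be false, which is why the gene list needs to be much longer than the expected number of false positives to be useful.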

23
Type I and Type II Errors
24
Higher-Level Microarray Data Analysis

Clustering and pattern detection
Data mining and visualization
Controls and normalization of results
Statistical validation
Linkage between gene expression data and gene
sequence/function/metabolic pathways databases
Discovery of common sequences in co-regulated
genes
Meta-studies using data from multiple experiments

25
Clustering

Identify co-regulated genes with microarray
experiments (assumption?)
Identify genes with similar expression
Grouping unknown genes with known genes may
provide insight into function of unknown genes
Only useful for genes with varying expression
levels

26
Types of Clustering

Hierarchical
Link similar genes, build up to a tree of all
Self Organizing Maps (SOM)
Split all genes into similar sub-groups
Finds its own groups (machine learning)
Principal Component Analysis (PCA)
Every gene is a dimension (vector), find a single
dimension that best represents the differences in
the data

27
Clustering

Pairwise similarity measure
Minkowski distance
if q = 1: Manhattan distance
if q = 2: Euclidean distance
Pearson's or Spearman's correlation coefficient
Handling missing values
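
The similarity measures listed above can be sketched in plain Python. The function names and toy profiles are assumptions for illustration; Spearman's coefficient (Pearson's applied to ranks) and missing-value handling are left out of this sketch.

```python
from math import sqrt


def minkowski(x, y, q):
    """Minkowski distance of order q between two expression profiles."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


manhattan = minkowski([0.0, 0.0], [3.0, 4.0], q=1)
euclidean = minkowski([0.0, 0.0], [3.0, 4.0], q=2)
same_shape = pearson([100.0, 200.0, 300.0], [10.0, 20.0, 30.0])
opposite = pearson([100.0, 200.0, 300.0], [30.0, 20.0, 10.0])
```

Distance measures are sensitive to the absolute expression level, whereas correlation only looks at the shape of the profile, so the two can rank the same pair of genes very differently.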

28
Clustering

Data transformation
Useful before computing pairwise similarity
Ex) x1 = (100, 200, 300), x2 = (10, 20, 30),
x3 = (30, 20, 10)
Divide each component xj of p-dimensional data
vector by its Euclidean norm
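
Applying this transformation to the three example vectors shows its effect. The helper names below are hypothetical; the vectors are the ones from the slide.

```python
from math import sqrt


def unit_scale(x):
    """Divide each component of x by the Euclidean norm of x."""
    norm = sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def euclidean(x, y):
    """Euclidean distance between two vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x1 = [100.0, 200.0, 300.0]
x2 = [10.0, 20.0, 30.0]
x3 = [30.0, 20.0, 10.0]

raw_12 = euclidean(x1, x2)  # large: x1 is ten times brighter than x2
raw_23 = euclidean(x2, x3)  # small: x2 and x3 have similar magnitude

u1, u2, u3 = unit_scale(x1), unit_scale(x2), unit_scale(x3)
scaled_12 = euclidean(u1, u2)  # near zero: same profile shape
scaled_23 = euclidean(u2, u3)  # clearly non-zero: opposite trend
```

Before scaling, x2 looks closer to x3 than to x1 even though x1 and x2 follow the same trend; after scaling, the ordering is reversed and matches the shape of the profiles.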

29
Hierarchical Clustering
30
Hierarchical Clustering

Requires
A dissimilarity measure between a pair of clusters
An update procedure for recalculating dissimilarities
after a merge
Weakness
Does not repair false joins of data points made in
a previous step
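
A minimal agglomerative sketch makes the two requirements concrete. Single linkage (the minimum pairwise distance) is used here as the cluster dissimilarity purely as an assumption; the slides do not say which linkage was intended, and the function names and toy points are hypothetical.

```python
from math import sqrt


def euclidean(x, y):
    """Euclidean distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single_linkage(a, b, points):
    """Dissimilarity between clusters a and b: closest pair of members."""
    return min(euclidean(points[i], points[j]) for i in a for j in b)


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j], points)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Update step: the merged cluster replaces its two parents.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
clusters, merges = agglomerate(points, n_clusters=2)
```

The loop only ever merges and never splits, which is the weakness noted above: a wrong join made early stays in the tree. Running it down to a single cluster and reading `merges` in order gives the full tree.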

31
K-means clustering
32
Self Organizing Maps
33
Classification vs. Clustering

Purpose
Clustering: to partition genes into
co-expression groups by a suitable optimization
method
Classification: to assign a given condition to
preexisting classes of conditions

34
Classification

How to sort samples into two classes based on
gene expression data
Cancer vs. normal
Cancer sub-types (benign vs. malignant)
Responds well to drug vs. poor response (e.g.,
tamoxifen for breast cancer)

35
Support Vector Machine (SVM)

Main idea: select the hyperplane that is most
likely to generalize to future data

36
Cross-validation

Holdout validation
K-fold cross-validation
Leave-one-out cross-validation
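
The three schemes differ only in how sample indices are split into training and held-out sets, which a short sketch can show. The function name and the interleaved fold assignment are assumptions; in practice samples are usually shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Interleaved assignment: sample i goes to fold i % k.
    folds = [indices[i::k] for i in range(k)]
    for held_out in folds:
        train = [i for i in indices if i not in held_out]
        yield train, held_out


# K-fold with k = 3 on six samples.
three_fold = list(k_fold_splits(6, 3))

# Leave-one-out is k-fold with k equal to the number of samples.
leave_one_out = list(k_fold_splits(6, 6))

# Holdout validation corresponds to using a single split only.
holdout_train, holdout_test = three_fold[0]
```

Each sample is held out exactly once across the folds, so every sample contributes to the error estimate while never being used to train the classifier that is tested on it.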

37
Reverse Engineering Genetic Networks

Reconstruction of the interactions in a
qualitative way from experimental data
Once determined, these networks can be used to
predict gene expression of corresponding genes
Can we reconstruct the qualitative interactions
of corresponding genes? No.
Time-dependent measurement
Knockout experiments

38
Gene Regulatory Networks
39
Network motif

Problem: dimensionality of the gene regulatory
network
Break this network down into small components
called network motifs and connect them into an
ensemble network
