Analysis of Gene Expression Data

1
Analysis of Gene Expression Data
  • Yoonsoo Pyon
  • ysp2_at_case.edu
  • Feb. 8th, 2008

2
Microarrays
  • What are they?
  • Allow thousands of expression analyses to be
    performed concurrently.

3
DNA Chip Microarrays
  • Put a large number (100K) of cDNA sequences or
    synthetic DNA oligomers onto a glass slide (or
    other substrate) in known locations on a grid.
  • Label an RNA sample and hybridize
  • Measure amounts of RNA bound to each square in
    the grid
  • Make comparisons:
    • Cancerous vs. normal tissue
    • Treated vs. untreated
    • Time course
  • Many applications in both basic and clinical
    research

4
Goals of a Microarray Experiment
  • Find the genes that change expression between
    experimental and control samples
  • Classify samples based on a gene expression
    profile
  • Find patterns: groups of biologically related
    genes that change expression together across
    samples/treatments

5
Potential Microarray Applications
  • Drug discovery / toxicology studies
  • Mutation/polymorphism detection
  • Differing expression of genes over:
    • Time
    • Tissues
    • Disease states
  • Sub-typing complex genetic diseases

6
cDNA Microarray Technologies
  • Spot cloned cDNAs onto a glass microscope slide
    • Usually PCR-amplified segments of plasmids
  • Label 2 RNA samples with 2 different colors of
    fluorescent dye: control vs. experimental
  • Mix two labeled RNAs and hybridize to the chip
  • Make two scans - one for each color
  • Combine the images to calculate ratios of amounts
    of each RNA that bind to each spot
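Combining the two scans into per-spot ratios can be sketched as below. This is a minimal illustration, assuming the intensities are already background-subtracted and positive; the function name `log2_ratios` and which channel is treated as experimental are hypothetical. Taking log2 makes a twofold increase (+1) and a twofold decrease (-1) symmetric around 0.

```python
import math


def log2_ratios(experimental, control):
    """Return log2(experimental / control) for each spot.

    Both lists are assumed to hold background-subtracted, positive
    intensities for the same spots in the same order.
    """
    if len(experimental) != len(control):
        raise ValueError("both scans must cover the same spots")
    return [math.log2(e / c) for e, c in zip(experimental, control)]


# Hypothetical intensities for three spots from the two scans.
ratios = log2_ratios([2000.0, 500.0, 1000.0], [1000.0, 1000.0, 1000.0])
print(ratios)  # [1.0, -1.0, 0.0]
```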

7
Spot your own Chip
Robot spotter
Ordinary glass microscope slide
8
cDNA Spotted Microarrays
9
Data Acquisition
  • Scan the arrays
  • Quantitate each spot
  • Subtract background
  • Normalize
  • Export a table of fluorescent intensities for
    each gene in the array
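The background-subtraction and export steps might look like the following sketch. The `(gene_id, foreground, background)` record layout, the tab-separated output, and the clamp value `floor=1.0` are illustrative assumptions, not part of any particular scanner's software; clamping keeps later ratios and logarithms defined when the local background exceeds the spot signal.

```python
def subtract_background(spots, floor=1.0):
    """Map each gene to foreground minus local background.

    `spots` is a list of (gene_id, foreground, background) tuples
    (a hypothetical layout). Differences below `floor` are clamped to it.
    """
    table = {}
    for gene_id, foreground, background in spots:
        table[gene_id] = max(foreground - background, floor)
    return table


def export_table(table):
    """Render the intensity table as tab-separated text, one gene per line."""
    return "\n".join(f"{gene}\t{value:.1f}" for gene, value in sorted(table.items()))


spots = [("geneA", 1500.0, 300.0), ("geneB", 250.0, 300.0)]
table = subtract_background(spots)
print(export_table(table))
```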

10
Microarray
  • Overview of image analysis
  • Grid finding
    • Grid alignment
    • Skew
  • Quantification of image
    • Variable background
    • Uneven hybridization

11
Image Analysis/Data Quantization
  • Feature (target/probe) segmentation
  • Data extraction and quantization of:
    • Background
    • Feature
  • Correlation of feature identity and location
    within image
  • Display of pseudo-color image

12
Image Segmentation

13
Normalization
  • Can control for many of the experimental sources
    of variability (systematic, not random or
    gene-specific)
  • Bring each image to the same average brightness
  • Can use simple math or more sophisticated methods:
    • Divide by the mean (whole chip or by sectors)
    • LOESS (locally weighted regression)
  • There are no reliable biological standards
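The "divide by the mean" option can be sketched in a few lines, either for the whole chip or separately per sector. The target mean of 1.0 and the per-spot sector labels are assumptions made for illustration; LOESS normalization is not shown.

```python
from statistics import mean


def normalize_by_mean(values, target=1.0):
    """Scale intensities so that their mean equals `target`."""
    m = mean(values)
    return [v * target / m for v in values]


def normalize_by_sector(values, sectors, target=1.0):
    """Apply mean normalization separately within each sector label."""
    out = [0.0] * len(values)
    for label in set(sectors):
        idx = [i for i, s in enumerate(sectors) if s == label]
        scaled = normalize_by_mean([values[i] for i in idx], target)
        for i, v in zip(idx, scaled):
            out[i] = v
    return out


print(normalize_by_mean([100.0, 200.0, 300.0]))  # [0.5, 1.0, 1.5]
print(normalize_by_sector([10.0, 30.0, 100.0, 300.0], ["a", "a", "b", "b"]))
```

Normalizing every chip (or sector) to the same target is one simple way to bring images to the same average brightness.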

14
Are the Treatments Different?
  • Analysis of microarray data has tended to focus
    on making lists of genes that are up or down
    regulated between treatments
  • Before making these lists, ask the
    question "Are the treatments different?"
  • Use standard statistical methods to evaluate
    expression profiles for each treatment (t-test or
    F-test)
  • If there are differences, find the genes most
    responsible
  • If there are not significant overall differences,
    then lists of genes with large fold changes may
    only reflect random variability.
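The per-gene step ("find the genes most responsible") can be sketched with Welch's two-sample t statistic, which does not assume equal variances in the two groups. The gene names and replicate values below are hypothetical; turning t into a p-value requires the t distribution (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`), which this standard-library-only sketch omits.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic comparing replicate values a and b for one gene."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical log-scale expression: (treated replicates, control replicates).
genes = {
    "geneA": ([5.1, 5.3, 5.2], [3.0, 3.2, 2.9]),
    "geneB": ([4.0, 4.4, 3.9], [4.1, 4.0, 4.3]),
}
scores = {g: welch_t(t, c) for g, (t, c) in genes.items()}
# Rank genes by the magnitude of their t statistic.
ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
print(ranked)
```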

15
Microarray Experiment Design
  • Type I (n = 2)
    • How is this gene expressed in target 1 as
      compared to target 2?
    • Which genes show up/down regulation between the
      two targets?
  • Type II (n > 2)
    • How does the expression of gene A vary over time,
      tissues, or treatments?
    • Do any of the expression profiles exhibit similar
      patterns of expression?

16
Basic Data Analysis
  • Fold change (relative increase or decrease in
    intensity for each gene)
  • Set cutoff filter for low values (background
    noise)
  • Cluster genes by similar changes - only really
    meaningful across multiple treatments or time
    points
  • Cluster samples by similar gene expression
    profiles
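The fold-change and low-value filter steps can be combined as in this sketch. The thresholds (`min_value=100`, a twofold cutoff), the rule of dropping a gene only when it is low in both samples, and flooring values at `min_value` before taking the ratio are assumptions chosen for illustration.

```python
import math


def fold_change_filter(treated, control, min_value=100.0, min_fold=2.0):
    """Return {gene: log2 fold change} for genes passing both filters.

    Genes below `min_value` in both samples are treated as background
    noise and dropped. Remaining values are floored at `min_value` so the
    ratio stays defined; only changes of at least `min_fold` in either
    direction are kept.
    """
    cutoff = math.log2(min_fold)
    hits = {}
    for gene, t in treated.items():
        c = control[gene]
        if t < min_value and c < min_value:
            continue  # background noise in both samples
        lfc = math.log2(max(t, min_value) / max(c, min_value))
        if abs(lfc) >= cutoff:
            hits[gene] = lfc
    return hits


# Hypothetical intensities.
treated = {"geneA": 800.0, "geneB": 50.0, "geneC": 420.0, "geneD": 100.0}
control = {"geneA": 200.0, "geneB": 40.0, "geneC": 400.0, "geneD": 400.0}
print(fold_change_filter(treated, control))  # {'geneA': 2.0, 'geneD': -2.0}
```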

17
Streamlined Affy Analysis
Flowchart: raw data → normalize → filter (Present/Absent, minimum value, fold change) → significance (t-test, Rank Product), clustering, or classification (machine learning) → gene lists
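The Rank Product method named on this slide can be sketched as follows, using its usual definition: within each replicate comparison, genes are ranked by fold change (rank 1 = largest increase), and a gene's rank product is the geometric mean of its ranks. The input layout is an assumption, ties are broken by input order, and the permutation test normally used to assess significance is omitted.

```python
import math


def rank_product(replicates):
    """Rank product for up-regulation.

    `replicates` is a list of dicts (one per replicate comparison)
    mapping gene -> fold change. Small results mean a gene is
    consistently among the most strongly up-regulated.
    """
    genes = list(replicates[0])
    log_sum = {g: 0.0 for g in genes}
    for rep in replicates:
        ordered = sorted(genes, key=lambda g: rep[g], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_sum[g] += math.log(rank)
    k = len(replicates)
    # Geometric mean of ranks, computed in log space.
    return {g: math.exp(s / k) for g, s in log_sum.items()}


# Hypothetical fold changes from two replicate comparisons.
reps = [
    {"geneA": 4.0, "geneB": 1.1, "geneC": 0.9},
    {"geneA": 3.5, "geneB": 0.8, "geneC": 1.2},
]
rp = rank_product(reps)
print(rp)
```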
18
Differential Expression
  • Type I analysis
  • Look for genes with vastly different expression
    under different conditions
  • How do you measure vastly different?
  • What role should derived statistics play?

19
Type I Differential Expression
20
Multiple Testing
  • In a microarray experiment, each gene (each probe
    or probe set) is really a separate experiment
  • Yet if you treat each gene as an independent
    comparison, you will almost always find some with
    "significant" differences by chance alone (the
    tails of a normal distribution)

21
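Multiple Testing
  • Bonferroni correction
    • Test each gene at level $\alpha_g/n$: the global
      level $\alpha_g$ divided by the number of tests n
    • Too strict
  • Holm's step-down correction
    • Sort the p-values so that $p_1 \le p_2 \le \dots \le p_n$.
      If $p_1 < \alpha_g/n$, compare the next p-value
      with $\alpha_g/(n-1)$, i.e. test $p_2 < \alpha_g/(n-1)$,
      and so on for the remaining p-values.
    • If m is the largest integer such that
      $p_k < \alpha_g/(n-k+1)$ for every $k \le m$, then
      genes 1, …, m are called significantly
      differentially expressed.
    • Still too strict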
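Both corrections can be sketched directly from the rules above, with $\alpha_g = 0.05$ as an example global level. The p-values are made up to show that Holm's procedure can call more genes than Bonferroni at the same global level.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices of tests called significant at p < alpha / n."""
    n = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p < alpha / n]


def holm(pvalues, alpha=0.05):
    """Holm's step-down procedure: compare the k-th smallest p-value
    (k = 1..n) with alpha / (n - k + 1) and stop at the first failure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    called = []
    for k, i in enumerate(order, start=1):
        if pvalues[i] < alpha / (n - k + 1):
            called.append(i)
        else:
            break  # every larger p-value is also not significant
    return sorted(called)


p = [0.010, 0.013, 0.030, 0.004]  # hypothetical per-gene p-values
print(bonferroni(p))  # [0, 3]
print(holm(p))        # [0, 1, 2, 3]
```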

22
False Discovery
  • Statisticians call false positives a "type I
    error" or a "false discovery"
  • The expected number of false discoveries is roughly
    the p-value cutoff of the t-test × the number of
    genes in the array (if no genes truly differ); the
    False Discovery Rate (FDR) is that number as a
    fraction of all genes called significant
  • For a p-value cutoff of 0.01 × 10,000 genes = 100
    falsely "different" genes
  • You cannot eliminate false positives, but by
    choosing a more stringent p-value, you can keep
    them manageable (try p < 0.001)
  • The expected number of false discoveries must be
    smaller than the number of real differences that
    you find, which in turn depends on the size of the
    differences and the variability of the measured
    expression values
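The arithmetic above can be sketched as a rough, conservative estimate that assumes no gene truly changes. The numbers of genes "called" at each cutoff are hypothetical, chosen only to show how the estimated FDR shrinks as the cutoff becomes more stringent.

```python
def expected_false_positives(p_cutoff, n_genes):
    """Genes expected to pass the cutoff by chance alone, assuming no
    gene is truly differentially expressed."""
    return p_cutoff * n_genes


def estimated_fdr(p_cutoff, n_genes, n_called):
    """Rough FDR: expected false positives as a fraction of the genes
    actually called significant at this cutoff (capped at 1)."""
    if n_called == 0:
        return 0.0
    return min(1.0, expected_false_positives(p_cutoff, n_genes) / n_called)


# 10,000 genes as in the example above; the n_called values are hypothetical.
for cutoff, n_called in [(0.01, 400), (0.001, 150)]:
    efp = expected_false_positives(cutoff, 10_000)
    fdr = estimated_fdr(cutoff, 10_000, n_called)
    print(f"p < {cutoff}: ~{efp:.0f} false positives, estimated FDR {fdr:.2f}")
```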

23
Type I and Type II Errors
24
Higher-Level Microarray Data Analysis
  • Clustering and pattern detection
  • Data mining and visualization
  • Controls and normalization of results
  • Statistical validation
  • Linkage between gene expression data and gene
    sequence/function/metabolic pathways databases
  • Discovery of common sequences in co-regulated
    genes
  • Meta-studies using data from multiple experiments

25
Clustering
  • Identify co-regulated genes from microarray
    experiments (this assumes that co-expressed genes
    are co-regulated)
  • Identify genes with similar expression
  • Grouping unknown genes with known genes may
    provide insight into function of unknown genes
  • Only useful for genes with varying expression
    levels

26
Types of Clustering
  • Hierarchical
    • Link similar genes and build up a tree of all
      genes
  • Self-Organizing Maps (SOM)
    • Split all genes into similar sub-groups
    • Finds its own groups (machine learning)
  • Principal Component Analysis (PCA)
    • Treat every gene as a dimension (vector) and find
      the single direction that best represents the
      differences in the data
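The PCA idea can be sketched with power iteration on the gene-by-gene covariance matrix, which converges to the direction of greatest variance (the first principal component). This is a standard-library-only illustration with made-up data; real analyses would normally use a library routine such as `numpy.linalg.svd`. The layout (rows = samples, columns = genes) is an assumption.

```python
import math
import random


def first_principal_component(samples, iterations=200, seed=0):
    """Unit vector along the direction of greatest variance.

    `samples` has one row per sample and one column per gene, so each
    gene is one dimension. Uses power iteration on the covariance matrix.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # positive random start
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical data: gene 2 tracks gene 1 at twice the scale; gene 3 is flat.
samples = [[1, 2, 5], [2, 4, 5], [3, 6, 5], [4, 8, 5]]
pc1 = first_principal_component(samples)
print([round(x, 4) for x in pc1])
```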

27
Clustering
  • Pairwise similarity measures
    • Minkowski distance,
      $d(x, y) = \left(\sum_j |x_j - y_j|^q\right)^{1/q}$
      • q = 1: Manhattan distance
      • q = 2: Euclidean distance
    • Pearson's or Spearman's correlation coefficient
  • Handling missing values
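The measures listed above can be written out directly. In this sketch, tied values get the average rank for Spearman's coefficient, and missing values (assumed to be stored as `None`) are handled by pairwise deletion, which is only one possible strategy (imputation is another); the helper names are hypothetical.

```python
import math


def minkowski(x, y, q):
    """(sum_j |x_j - y_j|^q)^(1/q); q=1 Manhattan, q=2 Euclidean."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def complete_pairs(x, y):
    """Pairwise deletion: keep only positions where neither value is None."""
    kept = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in kept], [b for _, b in kept]


print(minkowski([0, 0], [3, 4], 1), minkowski([0, 0], [3, 4], 2))  # 7.0 5.0
```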

28
Clustering
  • Data transformation
    • Useful before computing pairwise similarity
    • Example: x1 = (100, 200, 300), x2 = (10, 20, 30),
      x3 = (30, 20, 10)
    • Divide each component $x_j$ of a p-dimensional data
      vector by the vector's Euclidean norm
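Applying this transformation to the example vectors shows why it matters. Before normalization, x2 is closer in Euclidean distance to x3 (the opposite profile) than to x1 (the same profile at ten times the scale); afterwards x1 and x2 coincide. The helper name `unit_normalize` is hypothetical.

```python
import math


def unit_normalize(x):
    """Divide each component of x by the Euclidean norm of x."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


x1, x2, x3 = (100, 200, 300), (10, 20, 30), (30, 20, 10)

# Raw Euclidean distances: x2 looks closer to x3 than to x1.
print(math.dist(x1, x2), math.dist(x2, x3))

# After normalization, x1 and x2 (same profile shape) coincide.
u1, u2, u3 = (unit_normalize(x) for x in (x1, x2, x3))
print(math.dist(u1, u2), math.dist(u2, u3))
```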

29
Hierarchical Clustering
30
Hierarchical Clustering
  • Requires
    • A dissimilarity measure between pairs of clusters
    • An update procedure for recalculating
      dissimilarities after clusters are merged
  • Weakness
    • Cannot repair a false joining of data points
      made at a previous step
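A naive agglomerative version can be sketched with average linkage as the dissimilarity between clusters and full recomputation as the update procedure (efficient implementations use an update formula instead). A merge, once made, is never undone, which is the weakness noted above. Average linkage, Euclidean distance, and stopping at `k` clusters are choices made for this illustration.

```python
import math


def agglomerative(points, k):
    """Merge clusters bottom-up until only k remain.

    Returns the final clusters (lists of point indices) and the merge
    history, which records the tree built so far.
    """
    def avg_dist(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        history.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        # Earlier merges are never revisited, so mistakes cannot be repaired.
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)]
        clusters.append(merged)
    return clusters, history


points = [(0, 0), (0, 1), (10, 10), (10, 11), (11, 10)]
clusters, history = agglomerative(points, 2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3, 4]]
```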

31
K-means clustering
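As a hedged sketch of standard k-means (Lloyd's algorithm): each profile is assigned to its nearest centroid, each centroid moves to the mean of its members, and the two steps repeat until the centroids stop moving. Random initialization from the data with a fixed seed, Euclidean distance, and the iteration cap are assumptions here; in general, results can depend on the starting centroids.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Return (labels, centroids) after Lloyd iterations."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[c])  # keep an empty cluster's old centroid
        if new == centroids:
            break  # converged: centroids stopped moving
        centroids = new
    return assign(points, centroids), centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, 2)
print(labels)
```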
32
Self Organizing Maps
33
Classification vs. Clustering
  • Purpose
    • Clustering: to partition genes into
      co-expression groups by a suitable optimization
      method
    • Classification: to assign a given condition to
      one of a set of preexisting classes of conditions

34
Classification
  • How to sort samples into two classes based on
    gene expression data
  • Cancer vs. normal
  • Cancer sub-types (benign vs. malignant)
  • Responds well to drug vs. poor response (e.g.,
    tamoxifen for breast cancer)

35
Support Vector Machine (SVM)
  • Main idea: select the hyperplane that is most
    likely to generalize to future data (the
    maximum-margin separating hyperplane)

36
Cross-validation
  • Holdout validation
  • K-fold cross-validation
  • Leave-one-out cross-validation
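The three schemes differ only in how sample indices are split into training and test sets, which can be sketched as below. The 30% holdout fraction, the fixed seed, and contiguous k-fold folds (in practice the indices are usually shuffled first) are assumptions; a classifier such as an SVM would be trained on each `train` set and evaluated on the matching `test` set.

```python
import random


def holdout_indices(n, test_fraction=0.3, seed=0):
    """Holdout: one random split into training and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n, k):
    """K-fold: every index is in the test set of exactly one of k folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def leave_one_out_indices(n):
    """Leave-one-out: k-fold with k equal to the number of samples."""
    return k_fold_indices(n, n)


for train, test in k_fold_indices(7, 3):
    print("train", train, "test", test)
```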

37
Reverse Engineering Genetic Networks
  • Reconstruction of the interactions in a
    qualitative way from experimental data
  • Once determined, these networks can be used to
    predict gene expression of corresponding genes
  • Can we reconstruct the qualitative interactions
    of corresponding genes? No.
  • Time-dependent measurement
  • Knockout experiments

38
Gene Regulatory Networks
39
Network motif
  • Problem: the high dimensionality of gene
    regulatory networks
  • Break the network down into small components,
    called network motifs, and connect them into an
    ensemble network