Microarray - PowerPoint PPT Presentation


1
Microarray

Yuki Juan
NTUST
May 26, 2003

2
Content

Biology background of microarray
Design of microarray
The workflow of microarray
Image analysis of microarray
Data analysis of microarray
Discussion

3
The Biology Background of Microarray

The central dogma of life forms
DNA
RNA
Monitoring the expression of genes

4
Central Dogma

DNA Replication
--ACGCGA--
--TGCGCT--
RNA Transcription
--UGCGCU--
Protein Translation
--Cys-Ala--
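The slide's example can be reproduced in a few lines of Python. This is a minimal sketch: like the slide, it pairs bases position by position and ignores strand direction, and its codon table holds only the two codons this example needs (an assumption; a real translator would use all 64 codons of the standard genetic code).

```python
# Base pairing for replication (DNA to DNA) and transcription (DNA to RNA)
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Assumption: only the codons needed for the slide's example
CODON_TABLE = {"UGC": "Cys", "GCU": "Ala"}


def replicate(strand: str) -> str:
    """Complementary DNA strand, base by base."""
    return "".join(DNA_PAIR[b] for b in strand)


def transcribe(template: str) -> str:
    """mRNA complementary to a DNA template strand (A pairs with U)."""
    return "".join(RNA_PAIR[b] for b in template)


def translate(mrna: str) -> list[str]:
    """Read the mRNA one codon (three bases) at a time."""
    return [CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


dna = "ACGCGA"
print(replicate(dna))              # TGCGCT
print(transcribe(dna))             # UGCGCU
print(translate(transcribe(dna)))  # ['Cys', 'Ala']
```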

5
DNA
[Figure: central dogma diagram. DNA replicates; DNA is transcribed into RNA; RNA is translated into protein]
6
DNA

The double helix
stable
Nucleotide
A, T, G, C
Base pair
A-T
G-C
Oligonucleotide
short DNA (tens of nucleotides, or bps)

(http://www.nhgri.nih.gov/)
7
DNA Strand

DNA has a canonical orientation
read from 5' to 3'
antiparallel: one strand runs in the direction opposite
to its complement
5' TACTGAA 3'
3' ATGACTT 5'
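Because the strands are antiparallel, the complement of 5' TACTGAA 3' can be written two ways: aligned base by base, as on the slide (reading 3' to 5'), or reversed so that it also reads 5' to 3'. A short sketch:

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Complement base by base, keeping positions aligned (reads 3' to 5')."""
    return "".join(PAIR[b] for b in strand)


def reverse_complement(strand: str) -> str:
    """The same complementary strand, written in the canonical 5' to 3' direction."""
    return complement(strand)[::-1]


top = "TACTGAA"                 # written 5' to 3'
print(complement(top))          # ATGACTT (3' to 5', as on the slide)
print(reverse_complement(top))  # TTCAGTA (same strand, read 5' to 3')
```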

8
Hydrogen Bonds Make DNA Binding Specific
[Figure: two antiparallel strands (5' to 3' and 3' to 5') held together by hydrogen bonds between base pairs]
9
Hydrogen Bonds Make DNA Binding Specific

Base pairs are held together by hydrogen bonds.
These bonds let A pair specifically with T (U in RNA)
and C pair specifically with G.

10
RNA
[Figure: central dogma diagram. DNA replicates; DNA is transcribed into RNA; RNA is translated into protein]
11
RNA

Types
messenger RNA (mRNA)
ribosomal RNA (rRNA)
transfer RNA (tRNA)

A gene is expressed by transcribing DNA
into single-stranded mRNA

12
RNA (Detailed)
(http://www.nhgri.nih.gov/)
13
Reverse Transcription
[Figure: central dogma diagram (replication, transcription, translation) with reverse transcription added from RNA back to DNA]
Using the enzyme reverse transcriptase, we can convert RNA into
complementary DNA (cDNA).
14
The Southern Blot

Southern blotting is a basic DNA detection technique
that has been in use since the mid-1970s
A known strand of DNA is deposited on a solid
support (e.g. nitrocellulose paper)
An unknown mixture of DNA is labeled
(radioactively or fluorescently)
The unknown DNA solution is allowed to mix with the known
DNA (attached to the nitrocellulose paper), then the excess
solution is washed off
If a copy of the known DNA occurs in the unknown
sample, it will stick (hybridize), and the labeled
DNA will be detected on photographic film
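The detection step can be modeled, very roughly, as a string search: a labeled fragment sticks if it contains the antiparallel complement of the immobilized known DNA. This sketch assumes exact, perfect-match binding; real hybridization tolerates some mismatches and depends on temperature and salt concentration. The sequences are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Antiparallel complementary strand, written 5' to 3'."""
    return "".join(PAIR[b] for b in reversed(seq))


def hybridizes(probe: str, fragment: str) -> bool:
    """Assumption: binding only if the fragment contains an exact
    complementary stretch for the immobilized probe."""
    return reverse_complement(probe) in fragment


def detect(probe: str, labeled_sample: list[str]) -> list[str]:
    """Fragments that stay bound after the excess solution is washed off."""
    return [f for f in labeled_sample if hybridizes(probe, f)]


probe = "TACTGAA"                       # known DNA on the support
sample = ["GGTTCAGTACC", "ACGCGAACGT"]  # hypothetical labeled fragments
print(detect(probe, sample))            # ['GGTTCAGTACC']
```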

15
mRNA Levels Represent Gene Activity

When we measure the level of an mRNA, we are
monitoring the activity of its gene.
Thus, if we can measure the levels of all
mRNAs, we can study the expression of the whole
genome.
A microarray can collect the equivalent of over
10,000 blots in a single experiment,
which makes monitoring genome-wide activity
possible.

16
Content

Biology background of microarray
Design of microarray
The workflow of microarray
Image analysis of microarray
Data analysis of microarray
Discussion

17
Design of Microarray

Microarray in different context
The idea of microarray
Main types of array chips

18
mRNA Levels Compared in Many Different Contexts

Different tissues, same organism (brain v.
liver)
Same tissue, same organism (tumor v. non-tumor)
Same tissue, different organisms (wt v. mutant)
Time course experiments (development)
Other special designs (e.g. to detect spatial
patterns).

19
Idea of Microarray
[Figure: labeled cDNA from cell A and cell B is hybridized to the chip. The spot for geneX carries the sequence complementary to the colored cDNA from geneX; this spot shows red after scanning.]
20
Over 10,000 Hybridizations Can Be Done at One Time
21
Several Types of Arrays

Spotted DNA arrays
Developed by Pat Brown's lab at Stanford
PCR products of full-length genes (>100 nt)
Affymetrix gene chips
Photolithography technology from the computer
industry allows building many 25-mers
Ink-jet microarrays from Agilent
25- to 60-mers printed directly on glass slides
Flexible, rapid, but expensive

22
Array Fabrication: Spotting

Use PCR to amplify DNA
Robotic "pen" deposits DNA at defined coordinates
approximately 1-10 ng per spot
Experimentation with oligos (40- and 70-mers)

23
This machine can make 48 microarrays
simultaneously.
24
Array Fabrication: Photolithography

Light-activated synthesis
synthesize oligonucleotides on glass slides
10^7 copies per oligo in a 24 x 24 µm square
Use 20 pairs of different 25-mers per gene
Perfect match and mismatch

25
Array Fabrication: Photolithography
26
Affymetrix Microarrays
[Figure: raw image of a 1.28 cm chip]
10^7 oligonucleotides: half perfectly match the mRNA
(PM), half have one mismatch (MM). Raw gene
expression is the intensity difference PM - MM.
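The raw expression value for one gene can be sketched as the mean PM - MM difference over its probe pairs, with each mismatch probe serving as an estimate of nonspecific binding. The plain-mean rule and the intensities are assumptions for illustration; Affymetrix's own software has used more robust summaries.

```python
import numpy as np


def average_difference(pm, mm) -> float:
    """Raw expression of one gene: mean PM - MM over its probe pairs
    (a plain mean is an assumption; robust summaries are common)."""
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    if pm.shape != mm.shape:
        raise ValueError("need exactly one MM probe per PM probe")
    return float(np.mean(pm - mm))


# Hypothetical intensities for 4 of a gene's probe pairs (the slide uses 20)
pm = [1200, 950, 1100, 1300]
mm = [300, 250, 400, 350]
print(average_difference(pm, mm))  # 812.5
```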
27
Agilent cDNA Microarrays and Oligonucleotide
Microarrays

Agilent delivers printed 60-mer microarrays in
addition to 25-mer formats.
The inkjet process uses standard phosphoramidite
chemistry to deliver extremely small volumes
(picoliters) of the chemicals to be spotted.

28
Content

Biology background of microarray
Design of microarray
The workflow of microarray
Image analysis of microarray
Data analysis of microarray

29
The Workflow of Microarray
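[Figure: workflow diagram. Plate preparation and array fabrication produce the array; the sample goes through RNA extraction and cDNA synthesis and labeling to give labeled cDNA; the array and labeled cDNA go through hybridization to give the hybridized array, which is then scanned]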
30
cDNA Synthesis and Direct Labeling
31
Cy3 and Cy5 cDNA Hybridization onto the Chip
e.g. treatment vs. control, normal vs. tumor tissue
Sample loading
1. Loading from the corner of the cover slip: this is
time consuming and easily produces bubbles.
2. Loading the sample at the center of the array, then lowering
the slip smoothly: faster, and less likely to produce
bubbles than method 1.
3. Loading the sample at the side of the array, then
putting the slip on: the solution attaches to the
slip as soon as the slip touches it, and
spreads as the slip is slowly lowered.
[Figure: sample-loading methods 1, 2 and 3]
32
Scan
[Figure: scanned array. Green: down-regulated; red: up-regulated; yellow: equal level]
33
Content

Biology background of microarray
Design of microarray
The workflow of microarray
Image analysis of microarray
Data analysis of microarray
Discussion

34
Image analysis

Find the spots
Convert features into numeric data
Image normalization

35
The Algorithms

1. Find spots: locate each spot on
the microarray.
2. Cookie cutter algorithm
(1) Assume the distribution of pixel
intensities is a Gaussian curve
(2) Use the SD or IQR to separate the feature and
background pixels of each spot
(3) Calculate statistics for the pixel
population

36
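Interquartile Range (IQR)
[Figure: pixel-intensity distribution with the 25th, 50th and 75th percentiles and the IQR marked, and a boundary for rejection on each side; other labels: D, K·IQR/2, 1.42 IQR]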
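Rejecting outlier pixels with the IQR can be sketched as follows. The rejection rule (drop pixels more than K·IQR/2 from the median) and the default K = 3 are assumptions read loosely from the figure labels, not a documented algorithm. The helper `robust_sd` uses the Gaussian assumption from the previous slide: for a Gaussian, the IQR is about 1.349 standard deviations.

```python
import numpy as np


def iqr_filter(pixels, k=3.0):
    """Keep pixels within k * IQR / 2 of the median intensity.
    Both the rule and k = 3.0 are hypothetical choices."""
    pixels = np.asarray(pixels, dtype=float)
    q25, q50, q75 = np.percentile(pixels, [25, 50, 75])
    half_width = k * (q75 - q25) / 2
    return pixels[np.abs(pixels - q50) <= half_width]


def robust_sd(pixels):
    """Gaussian standard deviation estimated from the IQR (IQR ~ 1.349 SD)."""
    q25, q75 = np.percentile(np.asarray(pixels, dtype=float), [25, 75])
    return (q75 - q25) / 1.349


# Hypothetical pixels of one spot, including one bright dust artifact (500)
spot = [100, 102, 98, 101, 99, 103, 97, 100, 500]
kept = iqr_filter(spot)
print(len(kept), kept.mean())  # the 500 pixel is rejected
```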
37
Feature or Cookie
[Figure: a feature ("cookie") of diameter D, surrounded by an exclusion zone and then the local background]
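The cookie-cutter geometry can be sketched on a single image tile: pixels within D/2 of the spot center form the feature, a ring just outside is skipped as the exclusion zone, and the next ring estimates the local background. The ring widths, the use of the median for the background, and the test image are all hypothetical.

```python
import numpy as np


def cookie_cutter(image, cy, cx, d, exclusion=2.0, bg_width=3.0):
    """Background-subtracted mean intensity of one spot.
    exclusion and bg_width (in pixels) are hypothetical defaults."""
    image = np.asarray(image, dtype=float)
    yy, xx = np.indices(image.shape)
    r = np.hypot(yy - cy, xx - cx)        # distance of every pixel from center
    r_feat = d / 2
    feature = r <= r_feat                 # the "cookie"
    background = (r > r_feat + exclusion) & (r <= r_feat + exclusion + bg_width)
    bg_level = np.median(image[background])   # local background estimate
    return image[feature].mean() - bg_level


# Hypothetical 21 x 21 tile: flat background of 50, spot of 300 with diameter 6
img = np.full((21, 21), 50.0)
yy, xx = np.indices(img.shape)
img[np.hypot(yy - 10, xx - 10) <= 3] = 300.0
print(cookie_cutter(img, 10, 10, d=6))  # 250.0
```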
38
Data Quality

Irregular size or shape
Irregular placement
Low intensity

Saturation
Spot variance
Background variance

[Figure: example spots labeled artifact, misalignment, bad print, indistinguishable, saturated]
39
Convert Feature Into Numeric Value
[Table: one row per spot, with columns Systematic name, Gene function, Red intensity, Green intensity, Red b.g. (background), Green background, Red b.g.-corrected, Green b.g.-corrected, and (Red b.g.-corrected)/(Green b.g.-corrected)]
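For one spot, the background-corrected columns and the ratio column can be computed as below. The numbers are hypothetical, and returning NaN when the corrected green value is not positive is an assumed convention (such spots would normally be flagged rather than used).

```python
def spot_values(red, green, red_bg, green_bg):
    """Background-corrected red and green intensities and their ratio."""
    red_c = red - red_bg
    green_c = green - green_bg
    # Assumed convention: the ratio is undefined if green is not above background
    ratio = red_c / green_c if green_c > 0 else float("nan")
    return red_c, green_c, ratio


# Hypothetical spot: red 1500 on background 100, green 800 on background 100
print(spot_values(1500, 800, 100, 100))  # (1400, 700, 2.0)
```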
40
Data Normalization

Normalize data to correct for systematic variation from
Dye bias
Location bias
Intensity bias
Pin bias
Slide bias
Control vs. non-control spots

41
Data Normalization
[Figure: calibrated (red and green equally detected) vs. uncalibrated (red light under-detected)]
42
Data Normalization

Assumptions
Overall mean ratio should be 1
Most genes are not differentially expressed
Total intensities of the two dyes are equivalent
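The third assumption suggests the simplest (global) normalization: rescale one channel so that both dyes have the same total intensity. A minimal sketch, using hypothetical data in which the red channel is uniformly twice as bright:

```python
import numpy as np


def global_normalize(red, green):
    """log2 ratios after scaling green so both channels have the same
    total intensity (assumes most genes are not differentially expressed)."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    k = red.sum() / green.sum()       # scale factor for the green channel
    return np.log2(red / (k * green))


# Hypothetical spots where red is uniformly 2x too bright
red = [200, 400, 800, 1600]
green = [100, 200, 400, 800]
print(global_normalize(red, green))  # all zeros: the dye imbalance is removed
```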

43
Intensity Dependent Normalization
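Intensity-dependent normalization is usually done on an M-A plot: M = log2(R/G) is plotted against A = (1/2) log2(RG), a lowess curve is fitted through the cloud, and the curve is subtracted from M. The sketch below hand-rolls a basic lowess so it needs only NumPy; the window fraction frac = 0.3 is a hypothetical choice, and a library smoother (for example `lowess` in statsmodels' `statsmodels.nonparametric.smoothers_lowess` module) would add robustness iterations.

```python
import numpy as np


def lowess_fit(x, y, frac=0.3):
    """Local linear regression with tricube weights (a basic lowess
    without robustness iterations). Returns the fit at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # points per local window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]              # k nearest points in A
        h = dist[idx].max() or 1.0              # window half-width
        w = (1 - (dist[idx] / h) ** 3) ** 3     # tricube weights
        sw = np.sqrt(w)
        X = np.column_stack([np.ones(k), x[idx]])
        beta = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)[0]
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted


def intensity_normalize(red, green, frac=0.3):
    """Subtract the lowess trend of M = log2(R/G) against A = 0.5*log2(R*G)."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    m = np.log2(red / green)
    a = 0.5 * np.log2(red * green)
    return m - lowess_fit(a, m, frac)


# Hypothetical spots with a dye bias that grows with intensity
a = np.linspace(6, 14, 50)
bias = 0.2 * (a - 10)
red, green = 2 ** (a + bias / 2), 2 ** (a - bias / 2)
print(np.abs(intensity_normalize(red, green)).max())  # close to 0
```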
44
After Normalization
45
Additional Normalization

Pin dependent
Similar to the intensity-dependent fit.
Compute individual lowess fits for each pin group
Within slide normalization
After pin dependent normalization, log ratios for
each pin are centered around 0
Scale variance for each pin
Uses MAD (median absolute deviation)
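Scaling each pin group by its MAD can be sketched as below. It assumes the log ratios are already centered by the pin-dependent step; dividing each group by a_i = MAD_i / (geometric mean of all MADs) is one common choice of scale factor, assumed here, and the data are hypothetical.

```python
import numpy as np


def mad(x):
    """Median absolute deviation from the median."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


def scale_by_pin(m, pins):
    """Divide each pin group's (already centered) log ratios by
    a_i = MAD_i / geometric mean of all MADs (assumed scale factor)."""
    m = np.asarray(m, dtype=float)
    pins = np.asarray(pins)
    groups = np.unique(pins)
    mads = np.array([mad(m[pins == g]) for g in groups])  # must be nonzero
    geo_mean = np.exp(np.mean(np.log(mads)))
    out = m.copy()
    for g, s in zip(groups, mads):
        out[pins == g] = m[pins == g] / (s / geo_mean)
    return out


# Hypothetical slide: pin 2's log ratios are spread 4x wider than pin 1's
m = [-1, 0, 1, -4, 0, 4]
pins = [1, 1, 1, 2, 2, 2]
print(scale_by_pin(m, pins))  # both pin groups now have the same spread
```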

46
Additional Normalization

Dye swap
Combine relative expression levels without
explicit normalization
Compute a lowess fit c(A) of
$\frac{1}{2}\log_2\frac{RR'}{GG'}$ vs. $\frac{1}{2}\log_2(AA')$,
where primes mark the dye-swapped slide
Normalized ratio is
$\log_2(R/G) - c(A)$
where c(A) is the lowess prediction
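Under a dye swap, a dye bias c enters both slides with the same sign while the true log ratio m flips sign: M = m + c on the first slide and M' = -m + c on the swapped one. So (M + M')/2 estimates c, and M - c recovers m. The slide smooths c as a lowess function of intensity; this sketch keeps one c per spot without smoothing, which reduces to (M - M')/2, the route that combines the slides without explicit normalization. The data are hypothetical.

```python
import numpy as np


def dye_swap_normalize(m1, m2):
    """m1 = log2(R/G) on slide 1; m2 = log2(R'/G') on the dye-swapped slide.
    Returns the per-spot dye bias c and the normalized log ratio m1 - c."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    c = (m1 + m2) / 2   # = 0.5 * log2(R R' / (G G')): bias shared by both slides
    return c, m1 - c    # identical to (m1 - m2) / 2


# Hypothetical spots: true log ratios m, constant dye bias of 0.3
m = np.array([1.0, -1.0, 0.0, 2.0])
c, normalized = dye_swap_normalize(m + 0.3, -m + 0.3)
print(c, normalized)  # c is 0.3 everywhere; normalized equals m
```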

47
Content

Biology background of microarray
Design of microarray
The workflow of microarray
Image analysis of microarray
Data analysis of microarray
Discussion

48
Data analysis

Data filtering
Fold change analysis
Classification
Clustering
Future direction

49
Microarray Data Classification
[Figure: microarray chips; images scanned by laser; gene values collected into datasets; data mining and analysis; prediction for a new sample]
Example gene values:
Gene              Value
D26528_at           193
D26561_cds1_at      -70
D26561_cds2_at      144
D26561_cds3_at       33
D26579_at           318
D26598_at          1764
D26599_at          1537
D26600_at          1204
D28114_at           707
50
The Threshold of Spots

Filtering: remove genes with insufficient
variation
e.g. MaxVal - MinVal < 500 and MaxVal/MinVal < 5
for biological reasons
for feature reduction (algorithmic reasons)
Remove unreliable spots
saturated, non-uniform, too high background
Remove extreme signals
Statistical filtering (e.g. p-value < 0.01)
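The variation filter can be sketched on a genes x samples matrix. It follows the slide's example thresholds and combines them with "and" as written; raising values to a floor first is an assumption (the floor of 100 is hypothetical), needed because raw values can be zero or negative, which would make MaxVal/MinVal meaningless.

```python
import numpy as np


def variation_filter(data, floor=100.0, min_range=500.0, min_fold=5.0):
    """Boolean mask of genes (rows) to keep. A gene is removed when
    MaxVal - MinVal < min_range and MaxVal / MinVal < min_fold.
    floor is a hypothetical lower bound applied before the test."""
    x = np.maximum(np.asarray(data, dtype=float), floor)
    mx, mn = x.max(axis=1), x.min(axis=1)
    remove = (mx - mn < min_range) & (mx / mn < min_fold)
    return ~remove


# Hypothetical genes x samples
data = [
    [190, 210, 180],    # flat: removed
    [-70, 140, 1760],   # varies strongly: kept
    [30, 320, 120],     # small range and small fold change: removed
]
print(variation_filter(data))
```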

51
Microarray Data Analysis Types

Differential gene expression
Fold change analysis
Classification (Supervised)
identify disease
predict outcome / select best treatment
Clustering (Unsupervised)
find new biological classes / refine existing
ones
exploration

52
Differential Gene Expression

n-fold change
n typically > 2
May hold no biological relevance
Often too restrictive
2σ expression
Calculate the standard deviation σ
Genes whose expression lies more than 2σ away from the mean are
differentially expressed
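Both criteria are easy to compute on log2 ratios. In this sketch, σ is taken as the standard deviation of the log2 ratios of all genes on the array and "away" is measured from their mean; both are assumptions about the slide's wording, and the ratios are hypothetical.

```python
import numpy as np


def fold_change_genes(ratios, n=2.0):
    """Genes whose ratio is at least n-fold up or down."""
    lr = np.log2(np.asarray(ratios, dtype=float))
    return np.abs(lr) >= np.log2(n)


def two_sigma_genes(ratios):
    """Genes whose log2 ratio lies more than 2 standard deviations from
    the mean log2 ratio of all genes (assumed reading of 2σ)."""
    lr = np.log2(np.asarray(ratios, dtype=float))
    return np.abs(lr - lr.mean()) > 2 * lr.std()


# Hypothetical array: 17 unchanged genes and three changed ones
ratios = [1.0] * 17 + [2.5, 0.45, 8.0]
print(fold_change_genes(ratios).sum(), two_sigma_genes(ratios).sum())
```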

53
Fold Changes-Scatter Plot
54
Fold Changes Table
55
Classification: Multi-Class

Similar Approach
select top genes most correlated to each class
select best subset using cross-validation
build a single model separating all classes
Advanced
build separate model for each class vs. rest
choose model making the strongest prediction
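The "separate model for each class vs. rest" scheme can be sketched with any binary classifier. Here, purely as a simple stand-in, each binary model compares a sample's distance to its class centroid with its distance to the centroid of all other samples; the model with the largest margin makes the strongest prediction and wins. The data (samples x genes) are hypothetical.

```python
import numpy as np


def train_one_vs_rest(X, y):
    """One tiny binary model per class: (class centroid, centroid of the rest).
    The nearest-centroid rule is a stand-in for any binary classifier."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return {c: (X[y == c].mean(axis=0), X[y != c].mean(axis=0)) for c in np.unique(y)}


def predict(models, x):
    """Each model scores how much closer x is to its class than to the rest;
    the model making the strongest prediction wins."""
    x = np.asarray(x, dtype=float)
    scores = {c: np.linalg.norm(x - rest) - np.linalg.norm(x - own)
              for c, (own, rest) in models.items()}
    return max(scores, key=scores.get)


# Hypothetical samples x 2 genes, three classes
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["A", "A", "B", "B", "C", "C"]
models = train_one_vs_rest(X, y)
print(predict(models, [9, 0]))  # C
```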

56
Popular Classification Methods

Decision Trees/Rules
find the smallest gene sets, but also false positives
Neural Nets
work well if the number of genes is reduced
SVM
good accuracy, does its own gene selection, hard
to understand
K-nearest neighbor: robust for a small number of
genes
Bayesian nets: simple, robust

57
Multi-class Data Example

Brain data, Pomeroy et al., Nature 415, Jan.
2002
42 examples, about 7,000 genes, 5 classes
Selected the top 100 genes most correlated to each
class
Selected the best subset by testing subsets of 1, 2, ..., 20
genes, with leave-one-out cross-validation for each
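The selection procedure described for the brain data can be sketched as follows. The slide does not name the classifier, so this sketch assumes a 1-nearest-neighbor classifier; it also assumes that "subset size" means the number of top genes taken per class, and it re-selects genes inside each leave-one-out fold. The data are synthetic, not the Pomeroy et al. dataset.

```python
import numpy as np


def top_genes_per_class(X, y, k):
    """Union over classes of the k genes whose expression has the largest
    absolute Pearson correlation with that class's 0/1 indicator."""
    Xc = X - X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                       # guard against constant genes
    chosen = set()
    for c in np.unique(y):
        t = (y == c).astype(float)
        t -= t.mean()
        corr = (Xc * t[:, None]).mean(axis=0) / (sd * t.std())
        chosen.update(np.argsort(-np.abs(corr))[:k].tolist())
    return sorted(chosen)


def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (assumed);
    genes are re-selected on each training fold to avoid selection bias."""
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        genes = top_genes_per_class(X[train], y[train], k)
        dist = np.linalg.norm(X[train][:, genes] - X[i, genes], axis=1)
        hits += int(y[train][np.argmin(dist)] == y[i])
    return hits / len(y)


def best_subset_size(X, y, sizes=range(1, 21)):
    """Try each number of genes per class; keep the best LOOCV score."""
    scores = {k: loocv_accuracy(X, y, k) for k in sizes}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Synthetic data: 18 samples x 50 genes, 3 classes, one marker gene each
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 50))
y = np.repeat([0, 1, 2], 6)
for c in range(3):
    X[y == c, c] += 8.0
print(best_subset_size(X, y))
```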

58
Classification: Other Applications

Combining clinical and genetic data
Outcome / treatment prediction
Age, sex, and stage of disease are useful
e.g. if the data come from a male patient, the diagnosis is not ovarian cancer

59
Clustering

Goals
Find natural classes in the data
Identify new classes / gene correlations
Refine existing taxonomies
Support biological analysis / discovery
Different Methods
Hierarchical clustering, SOMs, etc.

60
SOM clustering

SOM: self-organizing maps
Preprocessing
filter away genes with insufficient biological
variation
normalize gene expression (across samples) to
mean 0, st. dev 1, for each gene separately.
Run SOM for many iterations
Plot the results
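The preprocessing and training steps above can be sketched in NumPy. The grid shape, iteration count, learning-rate and neighborhood schedules, and initialization from randomly chosen genes are all assumptions for illustration; the SOM implementations in GeneSpring and other tools may differ.

```python
import numpy as np


def standardize_genes(X):
    """Scale each gene (row) to mean 0 and standard deviation 1 across samples."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=1, keepdims=True)) / sd


def train_som(X, rows=1, cols=2, iters=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rectangular SOM (hypothetical settings); one weight vector per node."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.choice(len(X), rows * cols, replace=False)].copy()
    for t in range(iters):
        decay = 1 - t / iters
        lr = lr0 * decay                      # learning rate shrinks to 0
        sigma = sigma0 * decay + 0.01         # neighborhood shrinks too
        x = X[rng.integers(len(X))]           # one random gene profile
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching node
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)        # pull bmu and neighbors toward x
    return W


def assign(X, W):
    """Index of the nearest node for every gene."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)


# Hypothetical data: 10 rising and 10 falling genes over 4 samples
rng = np.random.default_rng(1)
rising = np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(0, 0.1, (10, 4))
X = standardize_genes(np.vstack([rising, rising[:, ::-1]]))
labels = assign(X, train_som(X))
print(labels)  # rising and falling genes land on different nodes
```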

61
SOM and K-Means by GeneSpring
62
Hierarchical Clustering

The most popular hierarchical clustering method
used in microarray data analysis is the so-called
agglomerative method, which
works with the data in a bottom-up manner.
Initially, each data point forms its own cluster, and
the algorithm works through the cluster sets by
repeatedly merging the two clusters that are the most
similar (have the shortest distance).
The algorithm involves computing the
distance or similarity matrix, which has
O(N^2) complexity, so it is not very efficient.
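A naive version of the agglomerative procedure is sketched below. The Euclidean distance and the two linkage options ("average" and "single") are choices made for illustration, since the slide does not fix a linkage. Building the distance matrix alone takes O(N^2) time and memory, and this naive merge loop is slower still, so practical tools use faster implementations.

```python
import numpy as np


def agglomerative(X, n_clusters=1, linkage="average"):
    """Bottom-up clustering: start with one cluster per point and repeatedly
    merge the two closest clusters. Returns (merge history, final clusters)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # N x N distances
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair = D[np.ix_(clusters[a], clusters[b])]
                d = pair.mean() if linkage == "average" else pair.min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]   # merge b into a
        del clusters[b]
    return merges, clusters


# Hypothetical 1-D expression values for five genes
merges, clusters = agglomerative([[0], [1], [10], [11], [25]], n_clusters=2)
print(clusters)  # [[0, 1, 2, 3], [4]]
```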

63
Hierarchical clustering
64
Future directions

Algorithms optimized for small samples (the no.
of samples will remain small for many tasks)
Integration with other data
biological networks
medical text
protein data
cost-sensitive classification algorithms
error cost depends on outcome (don't want to miss
treatable cancer), treatment side effects, etc.

65
Integrate biological knowledge when analyzing
microarray data (from Cheng Li, Harvard SPH)
Right picture: "Gene Ontology: tool for the
unification of biology", Nature Genetics 25, p. 25
66
Content

Biology background of microarray
Design of microarray
The workflow of microarray
Image analysis of microarray
Data analysis of microarray
Discussion

67
Microarray Potential Applications

Biological discovery
new and better molecular diagnostics
new molecular targets for therapy
finding and refining biological pathways
Mutation and polymorphism detection
Recent examples
molecular diagnosis of leukemia, breast cancer,
...
appropriate treatment for genetic signature
potential new drug targets

68
Microarray Limitations

Cross-hybridization of sequences with high
identity
Chip-to-chip variation
True measure of abundance?
Do mRNA levels reflect protein levels?
Generally, microarrays do not prove new biology; they simply
suggest genes involved in a process, a hypothesis
that requires traditional experimental
verification.
What fold change has biological relevance?
Need cloned ESTs or some sequence knowledge;
rare messages may go undetected
Expensive! Not every lab can afford to repeat
experiments.
The real limitation is bioinformatics

69
Additional Information

Review papers on microarray
Genomics, gene expression and DNA arrays (Nature,
June 2000)
Microarray: technology review (Nature Cell
Biology, Aug. 2001)
Magic of Microarray (Scientific American, Feb.
2002)
Molecular biology tutorial
http://www.lsic.ucla.edu/ls3/tutorials/

70
Biological data retrieval systems: Entrez
http://www.ncbi.nlm.nih.gov/Database/index.html

A retrieval system for searching a number of
inter-connected databases at the NCBI. It
provides access to
PubMed: the biomedical literature (MEDLINE)
GenBank: nucleotide sequence database
Protein: protein sequence database
Structure: three-dimensional macromolecular
structures
Genome: complete genome assemblies
PopSet: population study data sets
OMIM: Online Mendelian Inheritance in Man
Taxonomy: organisms in GenBank
Books: online books
ProbeSet: gene expression and microarray datasets
3D Domains: domains from Entrez Structure
UniSTS: markers and mapping data
SNP: single nucleotide polymorphisms
CDD: conserved domains
2. Entrez allows users to perform various
searches.