Microarray - PowerPoint PPT Presentation

Title:

Microarray

Description:

essential for autophagy. A_1_4. YAL008W. protein of unknown function. A_1_5. YAR062W. putative pseudogene. A_1_6. YBL087C. 60s large subunit ribosomal protein l23.e ... – PowerPoint PPT presentation

Slides: 71
Provided by: ntutEduT7

Transcript and Presenter's Notes

Title: Microarray


1
Microarray
  • Yuki Juan
  • NTUST
  • May 26, 2003

2
Content
  • Biology background of microarray
  • Design of microarray
  • The workflow of microarray
  • Image analysis of microarray
  • Data analysis of microarray
  • Discussion

3
The Biology Background of Microarray
  • The central dogma of life forms
  • DNA
  • RNA
  • Monitoring the expression of genes

4
Central Dogma
  • DNA Replication
  • --ACGCGA--
  • --TGCGCT--
  • RNA Transcription
  • --UGCGCU--
  • Protein Translation
  • --CYSALA--
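
The slide's example can be traced in a few lines of code. This is a minimal sketch, assuming the top strand (ACGCGA) is the template and using a toy codon table that holds only the two codons shown on the slide; the function names are illustrative and not part of the original deck.

```python
# Toy codon table: only the two codons that appear on the slide.
# The real genetic code has 64 codons; this subset is an assumption
# made to keep the sketch short.
CODON_TABLE = {"UGC": "Cys", "GCU": "Ala"}

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def replicate(strand):
    """Return the base-by-base complementary DNA strand."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


def transcribe(template):
    """Return the mRNA that is complementary to a DNA template strand."""
    return "".join(RNA_COMPLEMENT[base] for base in template)


def translate(mrna):
    """Read the mRNA three bases at a time and look up each codon."""
    return [CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


template = "ACGCGA"
print(replicate(template))              # TGCGCT
print(transcribe(template))             # UGCGCU
print(translate(transcribe(template)))  # ['Cys', 'Ala']
```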

5
DNA
[Diagram: DNA (replication) → transcription → RNA → translation → Protein]
6
DNA
  • The double helix
  • stable
  • Nucleotide
  • A, T, G, C
  • Base pair
  • A T
  • G C
  • Oligonucleotide
  • short DNA (tens of nucleotides, or bps)

(http://www.nhgri.nih.gov/)
7
DNA Strand
  • DNA has a canonical orientation
  • read from 5' to 3'
  • antiparallel: one strand runs in the direction
    opposite to its complement
  • 5' TACTGAA 3'
  • 3' ATGACTT 5'
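
Because the two strands are antiparallel, the partner of a sequence written 5' to 3' is its reverse complement. The sketch below illustrates this with the strands from the slide; the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq_5_to_3):
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5_to_3))


def hybridizes(probe, target):
    """True if two strands (both written 5' to 3') are perfectly complementary."""
    return probe == reverse_complement(target)


top = "TACTGAA"            # 5' to 3', as on the slide
bottom_3_to_5 = "ATGACTT"  # the slide draws this strand 3' to 5'

print(reverse_complement(top))                # TTCAGTA
print(hybridizes(top, bottom_3_to_5[::-1]))   # True
```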

8
Hydrogen Bonds Make DNA Binding Specific
[Figure: hydrogen bonds between two antiparallel strands, with the 5' and 3' ends labeled]
9
Hydrogen Bonds Make DNA Binding Specific
  • Base pairs are held together by hydrogen bonds.
  • This force lets A pair specifically with T (or U),
    and C with G.

10
RNA
[Diagram: DNA (replication) → transcription → RNA → translation → Protein]
11
RNA
  • Types
  • messenger RNA (mRNA)
  • ribosomal RNA (rRNA)
  • transfer RNA (tRNA)
  • A gene is expressed by transcribing DNA into
    single-stranded mRNA

12
RNA (Detailed)
(http://www.nhgri.nih.gov/)
13
Reverse Transcription
[Diagram: DNA (replication) → transcription → RNA → translation → Protein, with reverse transcription running from RNA back to DNA]
Using reverse transcriptase, we can convert RNA into
cDNA.
14
The Southern Blot
  • A basic DNA detection technique, known as the
    Southern blot, that has been in use for over 30 years
  • A known strand of DNA is deposited on a solid
    support (e.g. nitrocellulose paper)
  • An unknown mixture of DNA is labeled
    (radioactively or fluorescently)
  • The unknown DNA solution is allowed to mix with the
    known DNA (attached to the nitrocellulose paper), then
    the excess solution is washed off
  • If a copy of the known DNA occurs in the unknown
    sample, it will stick (hybridize), and the labeled
    DNA will be detected on photographic film

15
mRNA Represents Gene Function
  • When we measure the level of an mRNA, we are
    monitoring the activity of a gene.
  • Thus, if we can measure the levels of all
    mRNAs, we can study the expression of the whole
    genome.
  • A microarray yields the equivalent of over
    10,000 blots in a single experiment,
    which makes monitoring genome-wide activity
    possible.

16
Content
  • Biology background of microarray
  • Design of microarray
  • The workflow of microarray
  • Image analysis of microarray
  • Data analysis of microarray
  • Discussion

17
Design of Microarray
  • Microarray in different context
  • The idea of microarray
  • Main type of array chips

18
mRNA Levels Compared in Many Different Contexts
  • Different tissues, same organism (brain v.
    liver)
  • Same tissue, same organism (tumor v. non-tumor)
  • Same tissue, different organisms (wt v. mutant)
  • Time course experiments (development)
  • Other special designs (e.g. to detect spatial
    patterns).

19
Idea of Microarray
Cell A
Cell B
Labeled cDNA from geneX
Hybridization to chip
Spot of geneX with the sequence complementary to the
colored cDNA
This spot shows red color after scanning.
20
Over 10,000 Hybridizations Can Be Done at One
Time
21
Several Types of Arrays
  • Spotted DNA arrays
  • Developed by Pat Brown's lab at Stanford
  • PCR products of full-length genes (>100 nt)
  • Affymetrix gene chips
  • Photolithography technology from computer
    industry allows building many 25-mers
  • Ink-jet microarrays from Agilent
  • 25-60-mers printed directly on glass slides
  • Flexible, rapid, but expensive

22
Array Fabrication: Spotting
  • Use PCR to amplify DNA
  • Robotic "pen" deposits DNA at defined coordinates
  • approximately 1-10 ng per spot
  • Experimentation with oligos (40, 70 bp)

23
This machine can make 48 microarrays
simultaneously.
24
Array Fabrication: Photolithography
  • Light-activated synthesis
  • synthesize oligonucleotides on glass slides
  • 10^7 copies per oligo in a 24 x 24 µm square
  • Use 20 pairs of different 25-mers per gene
  • Perfect match and mismatch

25
Array Fabrication: Photolithography
26
Affymetrix Microarrays
Raw image, 1.28 cm
  • 10^7 oligonucleotides: half perfectly match the mRNA
    (PM), half have one mismatch (MM)
  • Raw gene expression is the intensity difference PM - MM
Agilent cDNA Microarrays and Oligonucleotide
Microarrays
  • Agilent delivers printed 60-mer microarrays in
    addition to 25-mer formats.
  • The inkjet process uses standard phosphoramidite
    chemistry to deliver extremely small volumes
    (picoliters) of the chemicals to be spotted.

28
Content
  • Biology background of microarray
  • Design of microarray
  • The workflow of microarray
  • Image analysis of microarray
  • Data analysis of microarray

29
The Workflow of Microarray
[Diagram: two tracks that meet at hybridization]
  • sample → RNA extraction → cDNA synthesis and labeling → labeled cDNA
  • plate → plate preparation → array fabrication → array
  • labeled cDNA + array → hybridization → hybridized array → scanning
30
cDNA Synthesis and Direct Labeling
31
Cy3 and Cy5 cDNA Hybridization onto the Chip
e.g. treatment / control, normal / tumor tissue
Sample loading
  • 1. Loading from the corner of the cover slip: this is
    time-consuming and easily produces bubbles.
  • 2. Loading the sample at the center of the array, then
    placing the slip smoothly: faster, with a lower chance
    of producing bubbles than method 1.
  • 3. Loading the sample at the side of the array, then
    putting the slip on: the solution attaches to the
    slip as soon as the slip touches it, and
    spreads along with the slip as the slip is
    slowly lowered.
32
Scan
  • Green: down-regulated
  • Red: up-regulated
  • Yellow: equal level
33
Content
  • Biology background of microarray
  • Design of microarray
  • The workflow of microarray
  • Image analysis of microarray
  • Data analysis of microarray
  • Discussion

34
Image analysis
  • To find a spot
  • Convert feature into numeric data
  • Image normalization

35
The Algorithms
  • 1. Find spots: find the location of each spot on
    the microarray.
  • 2. Cookie cutter algorithm
  • (1) Assume that the pixel intensities within a spot
    follow a Gaussian distribution
  • (2) Use the SD or IQR to separate the feature from the
    background of each spot
  • (3) Calculate statistics for the pixel
    population

36
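Interquartile Range (IQR)
[Figure: pixel intensity distribution marked at the 25th, 50th, and 75th percentiles, with the IQR spanning the 25th to 75th percentile and a boundary for rejection on each side; other labels: D, KIQR/2, 1.42 IQR]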
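
IQR-based pixel rejection can be sketched as follows. The 1.42 multiplier is taken from the figure label; placing the rejection boundaries 1.42 × IQR beyond the first and third quartiles is an assumption about how the figure should be read, and the pixel values are invented.

```python
import statistics


def iqr_bounds(pixels, k=1.42):
    """Return (low, high) rejection boundaries, k * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(pixels, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def reject_outliers(pixels, k=1.42):
    """Keep only the pixels that fall inside the rejection boundaries."""
    low, high = iqr_bounds(pixels, k)
    return [p for p in pixels if low <= p <= high]


# Hypothetical pixel intensities for one spot; 500 mimics a dust speck.
pixels = [100, 102, 98, 101, 99, 103, 97, 500]
kept = reject_outliers(pixels)
print(kept)
print(statistics.fmean(kept), statistics.stdev(kept))  # statistics of the cleaned pixels
```

The mean and standard deviation are then computed on the retained pixels only, which is step (3) of the cookie cutter algorithm.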
37
[Figure: a spot showing the feature (or cookie), the exclusion zone, and the local background; label: D]
38
Data Quality
  • Irregular size or shape
  • Irregular placement
  • Low intensity
  • Saturation
  • Spot variance
  • Background variance

[Figure: example spots showing an artifact, misalignment, a bad print, an indistinguishable spot, and a saturated spot]
39
Convert Feature Into Numeric Value
[Table columns: Systematic name, Gene function, Red intensity, Red b.g., Red b.g.-corrected, Green intensity, Green background, Green b.g.-corrected, (R. b.g.-c)/(G. b.g.-c)]
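
The last column of the table, the ratio of background-corrected intensities, can be derived from the others. This is a minimal sketch with made-up spot values and hypothetical gene names; returning None when a corrected signal is not above background is a design choice of this example, not something the slide specifies.

```python
import math


def corrected_ratio(red, red_bg, green, green_bg):
    """(R - R background) / (G - G background), or None if either signal is not above background."""
    r = red - red_bg
    g = green - green_bg
    if r <= 0 or g <= 0:
        return None
    return r / g


# (name, red intensity, red b.g., green intensity, green b.g.), all hypothetical
spots = [
    ("geneA", 1500.0, 100.0, 800.0, 100.0),
    ("geneB", 300.0, 100.0, 900.0, 100.0),
    ("geneC", 90.0, 100.0, 400.0, 100.0),
]

for name, red, red_bg, green, green_bg in spots:
    ratio = corrected_ratio(red, red_bg, green, green_bg)
    log_ratio = None if ratio is None else math.log2(ratio)
    print(name, ratio, log_ratio)
```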
40
Data Normalization
  • Normalize data to correct for variances
  • Dye bias
  • Location bias
  • Intensity bias
  • Pin bias
  • Slide bias
  • Control vs. non-control spots

41
Data Normalization
[Figure panels: calibrated (red and green equally detected); uncalibrated (red light under-detected)]
42
Data Normalization
  • Assumptions
  • Overall mean average ratio should be 1
  • Most genes are not differentially expressed
  • Total intensities of the two dyes are equivalent
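
Under the equal-total-intensity assumption, a single scaling factor brings the two channels into agreement. The sketch below shows this global normalization on invented intensities in which the red channel is under-detected by half; the function names are illustrative.

```python
def total_intensity_factor(red, green):
    """Factor k such that sum(red) == k * sum(green)."""
    return sum(red) / sum(green)


def normalized_ratios(red, green):
    """Red/green ratios after rescaling green so both channels have equal totals."""
    k = total_intensity_factor(red, green)
    return [r / (k * g) for r, g in zip(red, green)]


# Hypothetical background-corrected intensities.
red = [200.0, 400.0, 800.0, 600.0]
green = [400.0, 800.0, 1600.0, 1200.0]

print(total_intensity_factor(red, green))  # 0.5
print(normalized_ratios(red, green))       # [1.0, 1.0, 1.0, 1.0]
```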

43
Intensity Dependent Normalization
44
After Normalization
45
Additional Normalization
  • Pin dependent
  • Similar to intensity dependent fit.
  • Compute individual lowess fits for each pin group
  • Within slide normalization
  • After pin dependent normalization, log ratios for
    each pin are centered around 0
  • Scale variance for each pin
  • Uses MAD (median absolute deviation)
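
The centering and MAD scaling steps can be sketched as below. Two simplifications are assumed: each pin group is centered by subtracting its median instead of a per-pin lowess fit, and each group's scale factor is its MAD divided by the geometric mean of all MADs, which is one common convention and not stated on the slide.

```python
import math
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def normalize_pins(log_ratios_by_pin):
    """Center each pin group at 0, then rescale so all groups share the same MAD."""
    centered = {}
    for pin, values in log_ratios_by_pin.items():
        med = statistics.median(values)  # stand-in for the per-pin lowess fit
        centered[pin] = [v - med for v in values]

    mads = {pin: mad(values) for pin, values in centered.items()}
    if any(m == 0 for m in mads.values()):
        raise ValueError("a pin group has zero MAD and cannot be rescaled")

    geo_mean = math.exp(statistics.fmean([math.log(m) for m in mads.values()]))
    return {
        pin: [v / (mads[pin] / geo_mean) for v in values]
        for pin, values in centered.items()
    }


# Hypothetical log ratios for two pin groups with different offsets and spreads.
raw = {
    "pin1": [-1.0, 0.0, 1.0, 2.0, 3.0],
    "pin2": [-0.5, 0.0, 0.5, 1.0, 1.5],
}
normalized = normalize_pins(raw)
for pin, values in normalized.items():
    print(pin, statistics.median(values), mad(values))
```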

46
Additional Normalization
  • Dye swap
  • Combine relative expression levels without
    explicit normalization
  • Compute lowess fit for
  • log2(RR'/GG')/2 vs. log2(AA')/2
  • (primes denote the dye-swapped slide)
  • Normalized ratio is
  • log2(R/G) - c(A)
  • where c(A) is the lowess prediction
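
The dye-swap idea can be illustrated for a single gene. Writing M = log2(R/G) for the original slide and M' = log2(R'/G') for the swapped slide, and assuming the dye bias is the same on both slides, half the difference (M - M')/2 cancels the bias, while half the sum (M + M')/2 = log2(RR'/GG')/2 estimates it. The intensities below are constructed for illustration, and the lowess smoothing step is omitted.

```python
import math


def dye_swap_log_ratio(r, g, r_swap, g_swap):
    """(M - M') / 2: the log ratio with the shared dye bias cancelled."""
    m = math.log2(r / g)
    m_swap = math.log2(r_swap / g_swap)
    return (m - m_swap) / 2


def dye_bias_estimate(r, g, r_swap, g_swap):
    """(M + M') / 2 = log2(R R' / (G G')) / 2: the estimated dye bias."""
    m = math.log2(r / g)
    m_swap = math.log2(r_swap / g_swap)
    return (m + m_swap) / 2


# Constructed example: true log2 ratio 2.0, dye bias 0.5 on both slides.
r, g = 100.0 * 2 ** 2.5, 100.0             # original slide: M = 2.0 + 0.5
r_swap, g_swap = 100.0 * 2 ** -1.5, 100.0  # swapped slide: M' = -2.0 + 0.5

print(dye_swap_log_ratio(r, g, r_swap, g_swap))
print(dye_bias_estimate(r, g, r_swap, g_swap))
```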

47
Content
  • Biology background of microarray
  • Design of microarray
  • The workflow of microarray
  • Image analysis of microarray
  • Data analysis of microarray
  • Discussion

48
Data analysis
  • Data filtering
  • Fold change analysis
  • Classification
  • Clustering
  • Future direction

49
Microarray Data Classification
[Diagram: microarray chips → images scanned by laser → datasets → data mining and analysis; new sample → prediction]
Gene              Value
D26528_at         193
D26561_cds1_at    -70
D26561_cds2_at    144
D26561_cds3_at    33
D26579_at         318
D26598_at         1764
D26599_at         1537
D26600_at         1204
D28114_at         707
50
The Threshold of Spots
  • Filtering - remove genes with insufficient
    variation
  • Remove inadequate spots
  • saturated, non-uniform, background too high
  • Remove extreme signals
  • e.g. MaxVal - MinVal < 500 and MaxVal/MinVal <
    5
  • Statistical filtering (e.g. p-value < 0.01)
  • biological reasons
  • feature reduction for algorithmic reasons
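
The variation filter from the example thresholds can be written directly. A gene is removed only when both conditions hold (range below 500 and max/min ratio below 5). Clipping values at a small positive floor before taking the ratio is an assumption added here so that zero or negative readings do not break the division; the floor of 20 and the expression values are hypothetical.

```python
def has_sufficient_variation(values, min_diff=500.0, min_fold=5.0, floor=20.0):
    """False if MaxVal - MinVal < min_diff and MaxVal / MinVal < min_fold."""
    clipped = [max(v, floor) for v in values]  # assumed floor, see lead-in
    lo, hi = min(clipped), max(clipped)
    too_flat = (hi - lo) < min_diff and (hi / lo) < min_fold
    return not too_flat


# Hypothetical expression values across three samples.
genes = {
    "geneA": [100.0, 120.0, 110.0],  # flat: removed
    "geneB": [100.0, 900.0, 300.0],  # range of 800: kept
    "geneC": [30.0, 200.0, 60.0],    # small range but more than 5-fold: kept
}

kept = [name for name, values in genes.items() if has_sufficient_variation(values)]
print(kept)  # ['geneB', 'geneC']
```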

51
Microarray Data Analysis Types
  • Differential gene expression
  • Fold change analysis
  • Classification (Supervised)
  • identify disease
  • predict outcome / select best treatment
  • Clustering (Unsupervised)
  • find new biological classes / refine existing
    ones
  • exploration

52
Differential Gene Expression
  • n-fold change
  • n typically > 2
  • May hold no biological relevance
  • Often too restrictive
  • 2σ expression
  • Calculate standard deviation σ
  • Genes with expression more than 2σ away are
    differentially expressed
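
The two criteria can be compared on the same data. This sketch works on log2 ratios, measures "2σ away" from the mean of all log ratios, and uses the sample standard deviation; those choices and the values themselves are assumptions for illustration.

```python
import math
import statistics


def fold_change_hits(log2_ratios, n=2.0):
    """Genes whose expression changed at least n-fold in either direction."""
    cutoff = math.log2(n)
    return [gene for gene, m in log2_ratios.items() if abs(m) >= cutoff]


def two_sigma_hits(log2_ratios):
    """Genes whose log ratio lies more than 2 standard deviations from the mean."""
    values = list(log2_ratios.values())
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return [gene for gene, m in log2_ratios.items() if abs(m - mean) > 2 * sigma]


# Hypothetical log2(red/green) ratios for ten genes.
log2_ratios = {
    "g0": 0.1, "g1": -0.1, "g2": 0.0, "g3": 0.2, "g4": -0.2,
    "g5": 0.1, "g6": -0.1, "g7": 0.0, "g8": 1.1, "g9": 3.0,
}

print(fold_change_hits(log2_ratios))  # ['g8', 'g9']
print(two_sigma_hits(log2_ratios))    # ['g9']
```

On this toy data the 2-fold rule flags two genes while the 2σ rule flags only the most extreme one, which shows that the two criteria need not agree.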

53
Fold Changes-Scatter Plot
54
Fold Changes Table
55
Classification: Multi-Class
  • Similar Approach
  • select top genes most correlated to each class
  • select best subset using cross-validation
  • build a single model separating all classes
  • Advanced
  • build separate model for each class vs. rest
  • choose model making the strongest prediction

56
Popular Classification Methods
  • Decision Trees/Rules
  • find smallest gene sets, but also false positives
  • Neural nets
  • work well if number of genes is reduced
  • SVM
  • good accuracy, does its own gene selection, hard
    to understand
  • K-nearest neighbor - robust for a small number of
    genes
  • Bayesian nets - simple, robust

57
Multi-class Data Example
  • Brain data, Pomeroy et al., Nature 415, Jan
    2002
  • 42 examples, about 7,000 genes, 5 classes
  • Selected top 100 genes most correlated to each
    class
  • Selected best subset by testing subsets of 1, 2, ..., 20
    genes, with leave-one-out cross-validation for each
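
Leave-one-out cross-validation over growing gene subsets can be sketched as follows. The classifier here is a 1-nearest-neighbor rule chosen for brevity, the genes are assumed to be already ranked by relevance, and the six samples are invented; none of this reproduces the Pomeroy et al. analysis.

```python
import math


def nearest_neighbor_label(train, query):
    """Label of the training sample closest to the query (Euclidean distance)."""
    features, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


def leave_one_out_accuracy(samples, n_genes):
    """Hold out each sample in turn, classify it from the rest using the first n_genes."""
    reduced = [(features[:n_genes], label) for features, label in samples]
    correct = 0
    for i, (features, label) in enumerate(reduced):
        train = reduced[:i] + reduced[i + 1:]
        if nearest_neighbor_label(train, features) == label:
            correct += 1
    return correct / len(reduced)


# Hypothetical samples: (expression of 3 ranked genes, class label).
samples = [
    ((1.0, 5.0, 3.0), "A"),
    ((1.2, 1.0, 9.0), "A"),
    ((0.8, 3.0, 1.0), "A"),
    ((5.0, 4.0, 2.0), "B"),
    ((5.2, 2.0, 8.0), "B"),
    ((4.9, 0.5, 5.0), "B"),
]

for n_genes in range(1, 4):
    print(n_genes, leave_one_out_accuracy(samples, n_genes))
```

The subset size with the best held-out accuracy would then be selected, mirroring the "test each subset size" step on the slide.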

58
Classification: Other Applications
  • Combining clinical and genetic data
  • Outcome / Treatment prediction
  • Age, sex, and stage of disease are useful
  • e.g. if the data are from a male, it is not ovarian cancer

59
Clustering
  • Goals
  • Find natural classes in the data
  • Identify new classes / gene correlations
  • Refine existing taxonomies
  • Support biological analysis / discovery
  • Different Methods
  • Hierarchical clustering, SOMs, etc.

60
SOM clustering
  • SOM - self organizing maps
  • Preprocessing
  • filter away genes with insufficient biological
    variation
  • normalize gene expression (across samples) to
    mean 0, st. dev 1, for each gene separately.
  • Run SOM for many iterations
  • Plot the results
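
The per-gene normalization step can be shown on its own. The sketch below rescales one gene's profile across samples to mean 0 and standard deviation 1; using the population standard deviation is an assumption, and the profile is made up.

```python
import statistics


def standardize(profile):
    """Rescale one gene's expression profile to mean 0 and standard deviation 1."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    if sd == 0:
        raise ValueError("constant profile: filter it out before normalizing")
    return [(x - mean) / sd for x in profile]


# Hypothetical expression of one gene across eight samples.
profile = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standardize(profile))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A constant profile raises an error here, which is consistent with filtering out genes with insufficient variation before this step.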

61
SOM K Mean By GeneSpring
62
Hierarchical Clustering
  • The most popular hierarchical clustering method
    used in microarray data analysis is the so called
    agglomerative method
  • works with the data in a bottom-up manner.
  • Initially, each data point forms a cluster and
    the algorithm works through the cluster sets by
    repeatedly merging the two which are the most
    similar or have the shortest distance.
  • algorithm involves the computation of the
    distance or similarity matrix
  • O(N^2) complexity and thus is not very efficient.
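
The bottom-up procedure can be sketched in a few lines. This naive version recomputes cluster distances at every step and uses single linkage (the distance between the closest members); both the linkage choice and the sample points are assumptions, and practical implementations cache the distance matrix instead.

```python
import math


def single_linkage(cluster_a, cluster_b, points):
    """Distance between the two closest members of two clusters."""
    return min(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)


def agglomerate(points, n_clusters):
    """Start with one cluster per point and merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters


# Hypothetical two-dimensional expression profiles.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(agglomerate(points, 3))  # [[0, 1], [2, 3], [4]]
```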

63
Hierarchical clustering
64
Future directions
  • Algorithms optimized for small samples (the no.
    of samples will remain small for many tasks)
  • Integration with other data
  • biological networks
  • medical text
  • protein data
  • cost-sensitive classification algorithms
  • error cost depends on outcome (don't want to miss
    treatable cancer), treatment side effects, etc.

65
Integrate biological knowledge when analyzing
microarray data (from Cheng Li, Harvard SPH)
Right picture: "Gene Ontology: tool for the
unification of biology", Nature Genetics 25, p. 25
66
Content
  • Biology background of microarray
  • Design of microarray
  • The workflow of microarray
  • Image analysis of microarray
  • Data analysis of microarray
  • Discussion

67
Microarray Potential Applications
  • Biological discovery
  • new and better molecular diagnostics
  • new molecular targets for therapy
  • finding and refining biological pathways
  • Mutation and polymorphism detection
  • Recent examples
  • molecular diagnosis of leukemia, breast cancer,
    ...
  • appropriate treatment for genetic signature
  • potential new drug targets

68
Microarray Limitations
  • Cross-hybridization of sequences with high
    identity
  • Chip to chip variation
  • True measure of abundance?
  • Do mRNA levels reflect protein levels?
  • Generally, do not prove new biology - simply
    suggest genes involved in a process, a hypothesis
    that will require traditional experimental
    verification.
  • What fold change has biological relevance?
  • Need cloned EST or some sequence knowledge --
    rare messages may be undetected
  • Expensive!! Not every lab can afford to repeat
    experiments.
  • The real limitation is Bioinformatics

69
Additional Information
  • Review papers on microarray
  • Genomics, gene expression and DNA arrays (Nature,
    June 2000)
  • Microarray - technology review (Nature Cell
    Biology, Aug. 2001)
  • Magic of Microarray (Scientific American, Feb.
    2002)
  • Molecular biology tutorial
  • http://www.lsic.ucla.edu/ls3/tutorials/

70
Biological Data Retrieval Systems: Entrez
http://www.ncbi.nlm.nih.gov/Database/index.html
  • 1. A retrieval system for searching a number of
    inter-connected databases at the NCBI. It
    provides access to:
  • PubMed: the biomedical literature (Medline)
  • GenBank: nucleotide sequence database
  • Protein: sequence database
  • Structure: three-dimensional macromolecular
    structures
  • Genome: complete genome assemblies
  • PopSet: population study data sets
  • OMIM: Online Mendelian Inheritance in Man
  • Taxonomy: organisms in GenBank
  • Books: online books
  • ProbeSet: gene expression and microarray datasets
  • 3D Domains: domains from Entrez Structure
  • UniSTS: markers and mapping data
  • SNP: single nucleotide polymorphisms
  • CDD: conserved domains
  • 2. Entrez allows users to perform various
    searches.