Semester project: Microarrays and Statistics Part 2 of 2: Introduction to Microarrays - PowerPoint PPT Presentation

1
Semester project: Microarrays and Statistics
Part 2 of 2: Introduction to Microarrays
2
Alberto Macias-Duarte
BioME-iPC Fellow
Graduate Student
School of Natural Resources
University of Arizona
3
Definitions
GENETICS: the study of heritable traits (genes).
GENES: the information to make proteins.
PROTEINS: building blocks for the machines that perform life's functions within all cells.
GENOME: contains all the genes and more; a blueprint for the individual.
GENOMICS: tools to study ALL the genes of an individual (humans, animals, plants, fungi, bacteria).
4
Genes and chromosomes
6
Genes, Proteins and Molecular Machines
  • Our DNA contains our genes
  • Composed of four chemicals symbolized by A, T, C and G,
    which pair as A-T and C-G
  • The order of the letters determines the protein
    each gene makes
  • The types of proteins, and where and when they
    are made, make a person a person, a plant a plant

BUT proteins are not made directly from DNA
  • In many organisms (including us and plants) most of the DNA
    is not genes.

7
The Central Dogma of Molecular Biology
  • DNA codes for the production of RNA and RNA codes
    for the production of protein (polypeptides)
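
The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is only an illustration: the DNA sequence is hypothetical, the codon table is a small excerpt of the standard genetic code covering just the codons used here, and the function names are made up for this example.

```python
# Sketch of the DNA -> RNA -> protein flow. The sequence and the tiny
# codon table are illustrative assumptions, not data from the slides.

# Excerpt of the standard genetic code: only the codons used below.
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": "STOP",
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding (sense) DNA strand: T is replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGGCTTGGAAATAA"  # hypothetical coding strand
mrna = transcribe(dna)
print(mrna)             # AUGGCUUGGAAAUAA
print(translate(mrna))  # ['Met', 'Ala', 'Trp', 'Lys']
```

A real translation would need the full 64-codon table; the excerpt keeps the sketch short.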

8
The Central Dogma of Molecular Biology
9
5 stages to understanding genomes
1. Identify the order of the A, T, C, Gs
2. Determine where the genes are
3. Determine what each gene does
4. Determine where each gene is expressed and how it is controlled
5. Determine the nature and function of the non-genic DNA
10
Why is genome and gene information so important?
  • Provides information and tools to address a large
    number of biological questions, from ecosystems to
    molecular networks
  • Questions can be tackled now that could not be
    tackled before
  • Information can be used to improve human health,
    agriculture and environment

11
5 stages to understanding genomes
1. Identify the order of the A, T, C, Gs
In 2003 the human and first plant genome
sequences were announced
Explosion of genome sequences since that time!!!!
12
5 stages to understanding genomes
2. Determine where the genes are
13
These are often much larger than genes
14
What kinds of sequences are in non-coding regions?
  • Simple sequence repeats, e.g. (CAA)n
  • Transposable elements
    • DNA to DNA
    • DNA to RNA to DNA (retrotransposons)
  • Pseudogenes (pieces of genes that are non-functional)
  • Other
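
As an illustration of the first item, a simple sequence repeat such as (CAA)n can be located with a regular expression. The sketch below is a minimal example, not a genome-scale tool; the sequence, the function name `find_repeats` and the minimum of three copies are assumptions made for this example.

```python
import re


def find_repeats(sequence: str, unit: str = "CAA", min_copies: int = 3):
    """Return (start, copies) for each run of at least min_copies tandem units."""
    # Builds a pattern such as (?:CAA){3,} for the default arguments.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(sequence)
    ]


# Hypothetical sequence: four CAA copies, a spacer, then two CAA copies.
seq = "GGT" + "CAA" * 4 + "TTG" + "CAA" * 2 + "GG"
print(find_repeats(seq))  # [(3, 4)]
```

Only the first run is reported, because the second one has fewer than three copies.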

15
In Eukaryotes RNA from Genes Is Extensively
Processed to Form mRNA
  • mRNAs are the products of splicing the primary
    transcript

(Figure labels: exon, intron)
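
Splicing can be pictured as cutting the introns out of the primary transcript and joining the exons. The sketch below assumes the exon coordinates are already known; the transcript, the coordinates and the function name `splice` are hypothetical.

```python
def splice(primary_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments (0-based start, exclusive end) in order.

    Everything outside the listed exons, i.e. the introns, is dropped.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)


# Hypothetical primary transcript: exon 1 + intron + exon 2.
exon1 = "AUGGCU"
intron = "GUAAGUUUUCAG"  # made-up intron, written to start with GU and end with AG
exon2 = "UGGAAAUAA"
pre_mrna = exon1 + intron + exon2

exons = [(0, 6), (18, 27)]  # assumed exon coordinates within pre_mrna
mature = splice(pre_mrna, exons)
print(mature)  # AUGGCUUGGAAAUAA
```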
16
5 stages to understanding genomes
3. Determine what each gene does
17
Findings from comparing genomes
1. Many genes (50%) are very similar between diverse species: highly conserved domains.
2. The genes that are shared often carry out key functions that are similar in different species, i.e. transcription, protein synthesis.
3. We can hypothesize the function of a gene in one species if it shares domains with a gene of known function from another species.
4. The DNA sequences that are not genes are rarely shared between species, but there are common classes of sequences in many organisms, i.e. transposons, simple sequence repeats, etc.
5. Sizes of intergenic regions vary dramatically even between closely related species; they are a major contributor to differences in genome sizes.
18
5 stages to understanding genomes
4. Determine where each gene is expressed and how it is controlled
All the parts of animal bodies have the same DNA,
yet different tissues have distinct functions.
Only a subset of genes are functioning in each
tissue or organ.
How do different cells know which genes should
function and which should not and how are only
the correct genes expressed?
Field of Gene Regulation
19
A powerful tool to investigate gene regulation
  • Microarrays
  • Monitor the whole genome in a single experiment
  • Identify genes that are differentially expressed
    between two treatments, tissues, developmental
    stages or genotypes

20
cDNA
Expressed Sequence Tag
Sequence the cDNAs made from different tissues,
developmental stages or disease states, etc.
21
Microarrays
Each spot is a different gene immobilized onto a
slide. Can look at 50,000 genes in one
experiment. Ask which genes are active and
which genes are not under different conditions,
in different tissues, in healthy and diseased
individuals, in wild type and mutant plants
22
Genes more active in Tissue 1
Genes more active in Tissue 2
Genes active to same extent in both tissues
Courtesy of R. Elumalai
23
Monogenic Traits
  • Usually on a discrete scale (absence vs.
    presence; O, A, B blood types, etc.)
  • Determined by a single gene

24
Quantitative Traits
  • Usually on a continuous scale (weight, yield,
    height, drought tolerance, etc.)
  • Determined by many genes (polygenic)

25
Microarrays
26
Why do we need statistical inference in
microarray data analysis?
1. mRNA content is a variable trait
  • among individuals
  • among tissue types within individuals
  • among cells within tissue type
2. It is not practical to sample every individual,
tissue, or cell in the population of interest
27
Why do we need statistical inference in
microarray data analysis?
  • Statistics allow us to make statements about an
    entire population from a sample of that
    population
  • Uncertainty about the statement is included (the
    famous P-value)

28
Simple example of a microarray experiment and
data analysis
Gene expression of seedlings exposed to high
salinity in wheat (Triticum)
29
Microarray experiment: 2 treatments, 4 genes and 5
replicates
Treatment 1: Seedlings growing in saline soil
Treatment 2: Seedlings growing in normal soil
30
Microarray chip with 4 probes
Gene 1
Gene 2
Gene 3
Gene 4
31
2 Treatments and 6 replicates
32
Assignment of dyes to each treatment
Treatment 1: Seedlings growing in saline soil (Red Channel)
Treatment 2: Seedlings growing in normal soil (Green Channel)
33
Expression of genes under salinity stress
Treatment 1: Seedlings growing in saline soil (Red Channel)
Microarray
Treatment 2: Seedlings growing in normal soil (Green Channel)
36
Results for gene 1
37
Data for Gene 1
Fluorescence intensity units
38
Data for Gene 1
Do replicates look comparable?
39
Data for Gene 1
Normalization makes every replicate comparable
40
Data for Gene 1
One simple normalization is the Ratio R/G
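
A minimal sketch of why the ratio helps, using made-up numbers: the intensities below are hypothetical (the slide's data table is not part of this transcript) and were chosen so that the arrays differ only in overall brightness. The raw red and green values are not comparable across replicates, but their ratio is.

```python
# Hypothetical fluorescence intensities for one gene on five replicate arrays.
# These numbers are illustrative assumptions, not the data from the slides.
red = [1200.0, 2400.0, 600.0, 1800.0, 900.0]    # Treatment 1 (saline soil)
green = [1000.0, 2000.0, 500.0, 1500.0, 750.0]  # Treatment 2 (normal soil)


def channel_ratios(red_values, green_values):
    """Return the per-replicate ratio R/G."""
    return [r / g for r, g in zip(red_values, green_values)]


ratios = channel_ratios(red, green)
for replicate, ratio in enumerate(ratios, start=1):
    print(f"replicate {replicate}: R/G = {ratio:.2f}")  # 1.20 for every replicate
```

Dividing red by green cancels any factor that brightens or dims both channels of the same array.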
43
A twofold induction or repression of an
experimental sample, relative to the reference
sample, is indicative of a meaningful change in
gene expression
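
The twofold rule can be written as a small classifier on the ratio R/G. This is a sketch under stated assumptions: the function name, the labels and the choice to treat exactly twofold as a change are illustrative, not taken from the slides.

```python
def classify_fold_change(ratio: float, fold: float = 2.0) -> str:
    """Label a ratio R/G using a symmetric fold-change cutoff.

    ratio >= fold       -> "induced"   (at least twofold higher in the red channel)
    ratio <= 1 / fold   -> "repressed" (at least twofold lower in the red channel)
    anything in between -> "unchanged"
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio >= fold:
        return "induced"
    if ratio <= 1.0 / fold:
        return "repressed"
    return "unchanged"


for example_ratio in (2.5, 1.2, 0.4):  # hypothetical ratios
    print(example_ratio, classify_fold_change(example_ratio))
# 2.5 induced
# 1.2 unchanged
# 0.4 repressed
```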
44
Data for Gene 1
Further normalization: log2-transformation
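
One reason to take log2 of the ratio is symmetry: induction and repression of the same magnitude end up equally far from zero. A short sketch with hypothetical ratios:

```python
import math

# Hypothetical ratios R/G: fourfold up, twofold up, no change,
# twofold down, fourfold down.
ratios = [4.0, 2.0, 1.0, 0.5, 0.25]

log_ratios = [math.log2(ratio) for ratio in ratios]
print(log_ratios)  # [2.0, 1.0, 0.0, -1.0, -2.0]
```

On the raw scale repression is squeezed between 0 and 1 while induction runs from 1 upward; on the log2 scale no change is 0 and a twofold change is +1 or -1.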
45
Hypothesis test
Let M be the mean log2(Ratio) for the population of
all seedlings growing in saline soils compared to
seedlings growing in normal soils
How can you state the hypothesis of difference in
gene expression between the treatments in terms
of M?
46
Hypothesis test
How can you state the null hypothesis of no
difference in gene expression between the
treatments in terms of M?
47
Data for Gene 1
Does our M from the sample support the no-effect
hypothesis or the effect hypothesis?
Msample = -0.019
48
Data for Gene 1
How certain are we that it is or is not supportive?
Msample = -0.019
49
The famous P-value
  • The P-value is the probability that chance alone
    produces a result at least as extreme as the one
    observed, i.e. one that would lead us to reject the
    no-effect (null) hypothesis
  • Put another way, it is the probability that our
    conclusion of declaring an effect is wrong if the
    null hypothesis is true
  • P-values accepted as significant in science are usually < 0.05

50
Hypothesis testing
We cannot take many samples and calculate Msample
from each to see how likely or unlikely
Msample = -0.019 is to occur if M = 0 is true
51
Hypothesis testing
Instead, mathematics shows that the t-statistic
has a very well-known distribution, i.e., we can
know the probability of all values of t
52
Hypothesis testing
Let's do a little bit of work in MS Excel to
calculate the t-statistic and the P-value
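
The same calculation can be sketched in Python instead of a spreadsheet. The sketch below is a one-sample t-test of the mean log2(Ratio) against 0, written with the standard library only. The five log ratios are hypothetical (the slide's data table is not in this transcript), and the function names are illustrative. The P-value comes from numerically integrating the Student's t density with Simpson's rule, which stands in here for a spreadsheet or statistics-library function.

```python
import math
from statistics import mean, stdev


def t_statistic(values, mu0=0.0):
    """One-sample t: (sample mean - mu0) / (sample SD / sqrt(n))."""
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / math.sqrt(n))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 1 - 2 * (area under the density from 0 to |t|).

    The area is computed with Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


# Hypothetical log2(R/G) values for five replicates (NOT the slide's data).
log_ratios = [0.30, -0.45, 0.10, -0.20, 0.15]
t = t_statistic(log_ratios)
p = two_sided_p(t, df=len(log_ratios) - 1)
print(f"t = {t:.3f}, P = {p:.2f}")

# Check against the t value reported on the next slide, assuming 4 degrees
# of freedom (5 replicates): t = -0.065 gives P of about 0.95.
print(round(two_sided_p(-0.065, df=4), 2))  # 0.95
```

The degrees of freedom are the number of replicates minus one; 5 replicates is an assumption here, taken from the experiment description earlier in the deck.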
53
Hypothesis testing
It turns out that t = -0.065 and P = 0.95,
meaning that
the probability that an inference that M ≠ 0 is
wrong is 95% when actually M = 0, i.e. we have
more support for the hypothesis that M = 0
54
Hypothesis testing
and therefore we have no evidence that the levels of
expression for gene 1 differ between the two treatments. Gene 1
may not be involved in the response of wheat
seedlings to salinity
Treatment 1: Seedlings growing in saline soil
Treatment 2: Seedlings growing in normal soil
55
Thanks!
  • Questions?