Error Correction and Clustering Gene Expression Data Using Majority Logic Decoding
Humberto Ortiz-Zuazaga, Sandra Peña de Ortiz and Oscar Moreno
Departments of Biology and Computer Science, Rio Piedras Campus, University of Puerto Rico
Abstract
Microarrays allow researchers to simultaneously measure the expression of
thousands of genes. They give invaluable insight into the transcriptional
state of biological systems, and can be important in understanding
physiological as well as diseased conditions. However, analyzing data from
many thousands of genes with only a few replicates is very difficult. We have
devised a novel method for correcting errors in microarray experiments that
also clusters genes into groups and categorizes their measurements into
coarse divisions suitable for discrete reverse-engineering techniques. These
techniques are based on finite fields and algebraic coding theory. We test
the new techniques on a data set obtained from behavioral training
experiments on rats, and identify two novel genes that may be involved in
learning and memory.
Methods
Results
  • We have performed the analysis described above on
    the CTA data set. In this data set, there are 127
    consistent genes, which we divide into clusters
    by grouping together the genes that have the same
    set of calls in the 1 - 24 hour timepoints. This
    results in 23 clusters.
  • Dr. Sandra Peña's lab studies the role of CREB,
    a gene known to be required for long-term
    memory. We focus on the expression of this gene
    and on other genes with the same pattern of
    expression. CREB binds to a DNA element called
    the cAMP-response element (CRE) in its target
    genes and, in conjunction with a co-activator,
    initiates transcription of those genes.
  • We focus on the cluster labeled "000+". The
    consensus of the calls for these genes represents
    no change over the 1, 3, and 6 hour time points,
    followed by upregulation at the 24 hour
    timepoint. This cluster consists of genes whose
    expression most closely matches the expression
    profile of CREB. We investigated the genes in
    this cluster in depth, retrieving the gene
    information and sequence from the Ensembl Genome
    Browser version 32.
  • From Ensembl we obtained the genomic sequence for
    each gene: 1020 base pairs starting 800 base
    pairs upstream of the transcription start site.
    These sequences were then submitted to TESS to
    search for transcription factor binding sites. We
    looked for the CRE element, a DNA sequence that
    is the target site for CREB. Genes that have CRE
    in their upstream region are potential targets of
    regulation by CREB.
  • Two genes in particular caught our interest: Pmch
    and Calca. Both genes have CRE elements in their
    upstream regions. According to the Rat Genome
    Database, Pmch is a cyclic neuropeptide that
    induces hippocampal synaptic transmission. It
    seems to have an effect on appetite or metabolism
    and anxiety, and promotes synaptic transmission
    in the hippocampus [3]. Calca is principally a
    vasodilator, but seems to have a role in axonal
    regeneration or synaptogenesis [4]. Thus these
    genes exhibit a pattern of expression consistent
    with the expression of Creb1, have CRE elements
    upstream of their transcription start site, and
    seem to have a role in strengthening or creating
    new synapses.
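The clustering step described in the Results, grouping genes that share the same set of calls across the timepoints, can be sketched as a dictionary keyed by the call pattern. This is only an illustrative sketch, not the authors' code: the function name `cluster_by_calls`, the use of a four-character string such as `000+` (no change at 1, 3 and 6 hours, upregulation at 24 hours) as the cluster label, and the entries `geneA` and `geneB` are assumptions made for the example.

```python
from collections import defaultdict


def cluster_by_calls(calls_by_gene):
    """Group gene names by their call pattern.

    calls_by_gene maps a gene name to a string of calls, one character
    per timepoint ('+', '-' or '0'). Genes with identical strings end up
    in the same cluster, and the string itself labels the cluster.
    """
    clusters = defaultdict(list)
    for gene, calls in calls_by_gene.items():
        clusters[calls].append(gene)
    return dict(clusters)


# Creb1, Pmch and Calca share a pattern according to the text;
# geneA and geneB are hypothetical placeholders.
calls_by_gene = {
    "Creb1": "000+",
    "Pmch": "000+",
    "Calca": "000+",
    "geneA": "+000",
    "geneB": "0--0",
}

clusters = cluster_by_calls(calls_by_gene)
for label, genes in clusters.items():
    print(label, genes)
```

Because the key is the exact call string, two genes fall into the same cluster only when every timepoint call matches, which is why a modest number of consistent genes can still produce many small clusters.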
Discussion
  • We have developed a method for error correction
    of microarray experiments. The technique produces
    a clustering of genes and describes each gene as
    unchanged, upregulated, or downregulated, in
    accordance with biologists' natural description
    of expression levels. We applied these techniques
    to a microarray data set derived from a CTA
    experiment in rats, looking for genes that may be
    important in learning and memory processes. We
    found two genes, Pmch and Calca, that share an
    expression pattern with CREB, contain CRE in
    their upstream regions, and have demonstrated
    function related to synaptic plasticity. Pmch and
    Calca are strongly implicated as important genes
    for the formation of memories. We are now
    actively seeking confirmation of these genes'
    role in CTA and of their regulation by CREB as a
    result of CTA training.
Acknowledgements
  • The authors received partial support from a SCORE
    grant (S06GM08102), an INBRE grant (P20RR16470)
    and an IDeA Program grant (P20RR15565), from the
    National Institutes of Health, and National
    Science Foundation grant CNS-0540592.

Fig. 1 Conditioned Taste Aversion Task. CTA is an associative aversive
conditioning paradigm in which pairing gastrointestinal malaise (induced by
lithium chloride, LiCl, the unconditioned stimulus) with prior exposure to a
novel taste (the conditioned stimulus) may create a strong and long-lasting
aversion to the novel taste. CTA lends itself as an excellent model system to
study the dynamics of gene regulation in learning and memory because it is a
single-trial associative learning paradigm, which involves discrete regions
in the brain, including selected amygdala nuclei.

The gene profiling experiment was replicated five times. Four animals were
used per condition for each replicate. Thus, a total of twenty rats were used
per condition. Animals were sacrificed by decapitation at 1, 3, 6, and 24
hours after conditioning, and amygdala-enriched tissue punches were obtained
for RNA isolation. Hybridization, image capture and analysis were similar to
the procedures described in [1]. The data set thus obtained (the CTA data
set) is described in [2]. In summary, the data has two controls, the
pre-treatment group and the one hour saline group, and four time points, 1,
3, 6, and 24 hours after conditioning. Each array has 1185 genes, and we have
5 biological replicates of each array.

Error Correction and Clustering
We have developed a method for error correction and clustering of gene
expression data that proceeds in stages. First, we discretize each replicate
separately by comparing each expression value to the mean of the control
values. We choose an epsilon such that either the 1 h or 24 h timepoints are
within epsilon of the controls. We call a gene + if its expression is greater
than control + epsilon, - if it is less than control - epsilon, and 0
otherwise. The second stage uses majority logic decoding to summarize all the
repetitions into a single call for each timepoint. The third stage repeats
the discretization using a single control value for all repetitions, the
averaged control. The fourth stage discards extreme values and repeats the
discretization. Finally, we test whether at least two of the stages agree and
whether the calls across the 1, 3, 6, and 24 h timepoints have the
consecutive zeros property. We call genes that pass this test "consistent".
These heuristics try to capture biological knowledge about the behaviour of
the genes.
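The first two stages described above (per-replicate discretization into +, - and 0, followed by a majority vote across replicates at each timepoint) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the example expression values, the fixed `epsilon`, and the rule that a tied vote falls back to `0` are all assumptions made here. The third and fourth stages and the consecutive zeros test are not reproduced, because the text does not specify them in enough detail.

```python
from collections import Counter


def discretize(value, control, epsilon):
    """Call one expression value '+', '-' or '0' relative to a control."""
    if value > control + epsilon:
        return "+"
    if value < control - epsilon:
        return "-"
    return "0"


def majority_call(calls, tie="0"):
    """Return the most frequent call among replicates.

    Assumption: if the two most frequent calls are tied, return `tie`.
    """
    ranked = Counter(calls).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie
    return ranked[0][0]


def decode_gene(replicates, controls, epsilon):
    """Summarize one gene as a string with one call per timepoint.

    replicates: one list of expression values per replicate, ordered by
                timepoint (for example 1, 3, 6 and 24 h).
    controls:   the control mean for each replicate, in the same order.
    """
    per_replicate = [
        [discretize(v, control, epsilon) for v in values]
        for values, control in zip(replicates, controls)
    ]
    # zip(*per_replicate) regroups the calls by timepoint.
    return "".join(majority_call(column) for column in zip(*per_replicate))


# Hypothetical values for one gene: five replicates, four timepoints.
replicates = [
    [1.0, 1.1, 0.9, 2.0],
    [1.2, 1.0, 1.0, 1.9],
    [0.9, 1.0, 1.1, 2.2],
    [1.0, 1.6, 1.0, 1.1],  # a noisy replicate that disagrees with the rest
    [1.1, 0.9, 1.0, 2.1],
]
controls = [1.0, 1.0, 1.0, 1.0, 1.0]

print(decode_gene(replicates, controls, epsilon=0.3))  # 000+
```

In this example the fourth replicate alone would give `0+00`, but the vote across the five replicates outvotes it at both the 3 h and 24 h timepoints, which is the error-correcting effect of treating the replicates as a repetition code.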