An analysis of upstream sequences of Dictyostelium discoideum using a distributed computer system

Provided by: norioko
Norio Kobayashi¹, Mircea Marin², Takahiro Morio¹, Yoshimasa Tanaka¹, and Hideko Urushihara¹
¹ University of Tsukuba, Japan
² Johann Radon Institute for Computational and Applied Mathematics, Austria
  • 4. Development of a distributed system
  • We would like to apply our sequence analyzer to
    longer probe sequences.
  • cis-elements longer than 20 bases have already
    been identified in biological experiments.
  • Probe length 20 requires huge computational
    resources.
  • A distributed approach works well.
  • Each probe sequence can be processed by our
    program independently of the others.
  • System architecture (Figure 5)
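A short sketch can show why length 20 is expensive and why the work splits cleanly. There are 4^n probes of length n, so length 20 means about 1.1 × 10^12 probes, and each probe is analyzed independently. This is not the authors' system. `analyze(probe)` is a hypothetical callback that stands in for the per-probe sequence analyzer, and partitioning the probe space by fixed-length prefix is an assumption. A local thread pool plays the role of the coordinator and the remote sequence analyzers. The real system dispatched work over SOAP / Jini, which is not modeled here.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List

BASES = "ATGC"  # probe alphabet

def probe_count(n: int) -> int:
    """Number of distinct probe sequences of length n over {A, T, G, C}."""
    return len(BASES) ** n

def probes_with_prefix(prefix: str, n: int) -> Iterator[str]:
    """Enumerate every length-n probe that starts with `prefix`."""
    for suffix in itertools.product(BASES, repeat=n - len(prefix)):
        yield prefix + "".join(suffix)

def run_distributed(n: int, analyze: Callable[[str], bool],
                    prefix_len: int = 2, workers: int = 4) -> List[str]:
    """Coordinator sketch: one independent task per prefix (hypothetical)."""
    if not 0 <= prefix_len <= n:
        raise ValueError("prefix_len must be between 0 and n")
    prefixes = ["".join(p) for p in itertools.product(BASES, repeat=prefix_len)]

    def task(prefix: str) -> List[str]:
        # A worker analyzes every probe in its disjoint share of the probe space.
        return [p for p in probes_with_prefix(prefix, n) if analyze(p)]

    candidates: List[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hits in pool.map(task, prefixes):
            candidates.extend(hits)
    return candidates

print(probe_count(20))  # 1099511627776
print(sorted(run_distributed(4, lambda p: p.startswith("TTG"))))
```

Because the prefix tasks cover disjoint slices of the probe space, adding analyzers only means handing out more prefixes. Threads keep the sketch self-contained. For real CPU-bound speedups you would use separate processes or machines.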
  • 1. Introduction
  • An upstream sequence analysis using information
    science techniques
  • Our goal
  • Investigation of characteristic elements in
    upstream sequences as candidate cis-elements:
  • Life cycle stage specific
  • Location specific in upstream sequences
  • Stage and location specific
  • Our approach
  • Development of an upstream sequence database
  • Implementation of a sequence analyzer program
    that extracts characteristic elements from upstream
    sequences
  • Development of a distributed environment for
    collaborative computing with the program
  • 3. Upstream sequence analyzer
  • Development of a program that extracts
    statistically characteristic elements from
    upstream sequences as candidate cis-elements
  • Design
  • Let $S_n$ be the set of all possible sequences of
    length $n$ constructed from the characters A, T, G
    and C. For every probe $p \in S_n$:
  • Step 1. Extract the elements of the upstream
    sequences of the cDNA clones that are similar to
    $p$, and obtain a list $L_p$ of their positions.
  • Step 2. Obtain the statistical distribution of $p$
    on the upstream sequences from $L_p$. If the
    distribution satisfies a certain criterion, $p$ is
    regarded as a candidate cis-element.
  • Implementation in Java
  • Step 1 uses local alignment based on dynamic
    programming.
  • align(n) performs local alignments between each
    $p \in S_n$ and the upstream sequences.
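Step 1 relies on local alignment by dynamic programming. The sketch below is a minimal Smith-Waterman version. Several details are assumptions and not the authors' parameters: the scores (match +2, mismatch -1, gap -2), and the rule of reporting every upstream position whose best local alignment reaches a score threshold (loosely echoing the score thresholds in Figure 4). `local_alignment_hits` is a hypothetical name and is not the Java `align(n)` method.

```python
from typing import List, Tuple

def local_alignment_hits(probe: str, seq: str, threshold: int,
                         match: int = 2, mismatch: int = -1,
                         gap: int = -2) -> List[Tuple[int, int]]:
    """Smith-Waterman local alignment of `probe` against `seq`.

    Returns (end_index, score) for each 0-based position j in `seq` where the
    best local alignment ending at column j scores at least `threshold`.
    """
    n = len(seq)
    prev = [0] * (n + 1)       # DP row for the previous probe character
    best_end = [0] * (n + 1)   # best score of any alignment ending at column j
    for i in range(1, len(probe) + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            diag = prev[j - 1] + (match if probe[i - 1] == seq[j - 1] else mismatch)
            up = prev[j] + gap        # probe base aligned to a gap
            left = cur[j - 1] + gap   # sequence base aligned to a gap
            cur[j] = max(0, diag, up, left)  # 0 lets a new local alignment start
            best_end[j] = max(best_end[j], cur[j])
        prev = cur
    return [(j - 1, best_end[j]) for j in range(1, n + 1) if best_end[j] >= threshold]

# An exact 7-base match scores 7 * 2 = 14 and ends at index 9.
print(local_alignment_hits("TTGACAA", "CCCTTGACAACCC", threshold=14))  # [(9, 14)]
```

Keeping only two DP rows bounds memory by the length of the upstream sequence, which matters when every probe is aligned against thousands of upstream sequences.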

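The poster does not state the criterion used in Step 2. The sketch below is purely a hypothetical illustration of a location-specific test. It bins the hit positions from Step 1 within the 2,100-base window of Figure 1 (2,000 bases upstream plus 100 downstream, so index 2000 is the transcription initiation site). A probe is flagged when a single 100-base bin holds at least a given fraction of its hits. The bin width, `min_fraction` and `min_hits` are invented parameters, and stage specificity is not modeled.

```python
from typing import Iterable, List, Optional

UPSTREAM, DOWNSTREAM = 2000, 100   # window layout from Figure 1
BIN_WIDTH = 100                    # hypothetical bin width

def position_histogram(hit_positions: Iterable[int]) -> List[int]:
    """Count hits per bin; index 0 is the 5'-most base, index 2000 the TIS."""
    window = UPSTREAM + DOWNSTREAM
    counts = [0] * (window // BIN_WIDTH)
    for pos in hit_positions:
        if not 0 <= pos < window:
            raise ValueError(f"position {pos} lies outside the window")
        counts[pos // BIN_WIDTH] += 1
    return counts

def location_specific_bin(hit_positions: Iterable[int],
                          min_fraction: float = 0.3,
                          min_hits: int = 5) -> Optional[int]:
    """Return the index of the dominant bin, or None if no bin dominates."""
    positions = list(hit_positions)
    if len(positions) < min_hits:
        return None                # too few hits to judge
    counts = position_histogram(positions)
    peak = max(range(len(counts)), key=counts.__getitem__)
    return peak if counts[peak] / len(positions) >= min_fraction else None

print(location_specific_bin([1950, 1960, 1970, 1980, 1990, 100, 700]))  # 19
print(location_specific_bin(range(0, 2100, 100)))                        # None
```

In the first call, 5 of 7 hits fall in bin 19 (indices 1900 to 1999, the 100 bases just upstream of the TIS). In the second call, the hits are spread one per bin.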
[Figure 3 diagram: upstream sequences from 2,000 bases upstream to 100 bases downstream of the transcription initiation site, shown for life cycle stages V, A, S and C. Red: V-stage-specific element; green: location-specific element; blue: V-stage- and location-specific element.]
Figure 3. An illustration of statistically characteristic elements in upstream sequences.
[Figure 5 diagram: user client, coordinator, sequence analyzers on LAN 1, LAN 2 and LAN 3, and the upstream sequence database; requests and results are exchanged over the Web using SOAP / Jini.]
  • 2. Upstream sequence database
  • Acquisition of upstream sequences
  • Step 1. Sequence alignment between the genome
    sequence and cDNA contig sequences
  • The genome sequence of chromosome 2 (The
    Dictyostelium discoideum genome project)
  • 8,402 cDNA contig sequences (The Dictyostelium
    cDNA project in Japan)
  • We have obtained 2,152 upstream sequences
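After a cDNA contig has been aligned to the genome, its upstream sequence is a fixed window around the transcription initiation site (TIS). The sketch below is minimal and makes several assumptions. The TIS is a 0-based coordinate on the forward genome strand, and the contig's strand is known. The function names are hypothetical. It also uses a single TIS, whereas Figure 1 widens the window to cover the TISs of all concerned clones.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_window(genome: str, tis: int, strand: str,
                    up: int = 2000, down: int = 100) -> str:
    """Return `up` bases upstream plus `down` bases from the TIS onward,
    written 5'->3' on the transcribed strand; the TIS lands at index `up`.

    `tis` is the 0-based forward-strand index of the first transcribed base.
    """
    if strand == "+":
        start, end = tis - up, tis + down
    elif strand == "-":
        # On the minus strand, upstream lies at higher forward coordinates.
        start, end = tis - down + 1, tis + up + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(genome):
        raise ValueError("window extends past the end of the genome sequence")
    window = genome[start:end]
    return window if strand == "+" else reverse_complement(window)

genome = "AAAACCCCGGGGTTTT"
print(upstream_window(genome, 8, "+", up=4, down=2))  # CCCCGG
print(upstream_window(genome, 5, "-", up=4, down=2))  # CCGGGG
```

With the default `up=2000` and `down=100`, every window has the 2,100-base layout of Figure 1. Windows that would run off the end of the chromosome sequence are rejected rather than silently truncated.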

Figure 5. Architecture of the distributed system, which runs sequence analyzers in parallel.
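[Figure 1 diagram: a genome sequence (5′ to 3′) divided into an upstream part and a contig part, with cDNA clones aligned to the contig; positions α and β and positions -1 and 1 around the transcription initiation site.]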
  • 5. Results
  • The system extracted the candidate elements
    listed in Table 1 in 436 minutes.
  • TTGSSCAA is a known element, the Harwood element
    (TTGN{2,4}CAA: TTG, 2 to 4 arbitrary bases, CAA),
    which represses expression in prestalk cells in
    Dictyostelium.
  • GRGTGTAT partially matches a known element
    (DGKGKGDN{4,7}DGKGKGD) that regulates prestalk
    gene expression.
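The element notation uses IUPAC nucleotide codes: S = C or G, R = A or G, K = G or T, D = A, G or T, and N = any base. A short sketch shows why TTGSSCAA is an instance of the Harwood element. Written in regex-like form, that element is TTGN{2,4}CAA (TTG, two to four arbitrary bases, CAA), and every concrete sequence that TTGSSCAA denotes matches it. The helper names are hypothetical. Only the IUPAC code table is standard.

```python
import itertools
import re
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(pattern: str) -> str:
    """Turn IUPAC codes into regex classes; other characters (e.g. '{2,4}') pass through."""
    parts = []
    for ch in pattern:
        bases = IUPAC.get(ch)
        if bases is None:
            parts.append(ch)
        elif len(bases) == 1:
            parts.append(bases)
        else:
            parts.append("[" + bases + "]")
    return "".join(parts)

def expand(pattern: str) -> List[str]:
    """All concrete DNA sequences denoted by a fixed-length IUPAC pattern."""
    return ["".join(p) for p in itertools.product(*(IUPAC[c] for c in pattern))]

harwood = re.compile(iupac_to_regex("TTGN{2,4}CAA"))
instances = expand("TTGSSCAA")
print(harwood.pattern)  # TTG[ACGT]{2,4}CAA
print(instances)        # ['TTGCCCAA', 'TTGCGCAA', 'TTGGCCAA', 'TTGGGCAA']
print(all(harwood.fullmatch(s) for s in instances))  # True
```

The same `iupac_to_regex` helper could scan upstream sequences for a motif with `harwood.finditer(...)`, though overlapping occurrences would need a lookahead pattern.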

Figure 1. Structure of the upstream sequence for cDNA contigs. It spans 2,000 bases upstream and 100 bases downstream of the transcription initiation sites of all concerned cDNA clones (clones 1-3 in the diagram).
Table 1. Extracted elements with probe sequences of lengths 4-10.
  • 6. Conclusions
  • We have presented an upstream sequence analysis
    based on an information science approach.
  • We have developed an upstream sequence database,
    a sequence analyzer and a distributed environment.
  • The system extracted six candidate sequences, two
    of which are known elements.
  • These results confirm that our system is
    effective and practical.
  • Future work
  • We will run our system on probe sequences of
    lengths up to 20.
  • We would like to examine, through biological
    experiments, whether the extracted elements
    actually function as cis-elements.

Figure 4. Computation time for probe sequences of lengths 4-7 with score thresholds 0, 2, 4, 6 and 8.
Figure 2. Graphical user interfaces embedded in web pages. (A) Sequence information page, consisting of a viewer of the transcription initiation sites of the concerned cDNA clones. (B) Viewer for the details of upstream sequences in a list format.