An analysis of upstream sequences of Dictyostelium discoideum using a distributed computer system - PowerPoint presentation

Provided by: norioko

Category:

more less

Transcript and Presenter's Notes

Title: An analysis of upstream sequences of Dictyostelium discoideum using a distributed computer system No

1
An analysis of upstream sequences of Dictyostelium discoideum using a distributed computer system
Norio Kobayashi(1), Mircea Marin(2), Takahiro Morio(1), Yoshimasa Tanaka(1), and Hideko Urushihara(1)
(1) University of Tsukuba, Japan
(2) Johann Radon Institute for Computational and Applied Mathematics, Austria

4. Development of a distributed system
We would like to run our sequence analyzer on longer probe sequences.
cis-elements longer than 20 bases have already been identified in biological experiments.
Probe sequences of length 20 require huge computational resources.
A distributed approach works well, because each probe sequence can be processed by our program independently.
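To make the scale concrete: if every sequence of length n over A, C, G and T is used as a probe, there are 4^n probes, so length 20 means about 1.1 × 10^12 of them. The Java sketch that follows, with hypothetical class and method names, counts the probes and splits their indices into equal contiguous ranges, one per analyzer. It assumes the full probe set is enumerated, which the poster implies but does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

public class ProbeSpace {
    /** Number of probe sequences of length n over {A, C, G, T}: 4^n = 2^(2n), exact in a long for n <= 31. */
    static long probeCount(int n) {
        return 1L << (2 * n);
    }

    /**
     * Splits the probe indices [0, 4^n) into k contiguous ranges {start, end} (end exclusive)
     * whose sizes differ by at most one. total * k must fit in a long
     * (true for n = 20 and k below about 8,000,000).
     */
    static List<long[]> partition(int n, int k) {
        long total = probeCount(n);
        List<long[]> ranges = new ArrayList<>();
        for (int w = 0; w < k; w++) {
            ranges.add(new long[] {total * w / k, total * (w + 1) / k});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int n = 4; n <= 20; n += 4) {
            System.out.printf("n = %2d: %,d probes%n", n, probeCount(n));
        }
        // One range per (hypothetical) analyzer machine.
        for (long[] r : partition(20, 8)) {
            System.out.printf("analyzer range [%d, %d)%n", r[0], r[1]);
        }
    }
}
```

For n = 20 the count is 1,099,511,627,776, which is 4^10 (about a million) times the count for n = 10, the longest length in Table 1. Because probes are independent, the ranges need no communication until the results are merged.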
System architecture

1. Introduction
Upstream sequence analysis using information science techniques
Our goal
Investigation of characteristic elements in upstream sequences as candidate cis-elements:
life cycle stage specific,
location specific in upstream sequences,
stage and location specific.
Our approach
Development of an upstream sequence database
Implementation of a sequence analyzer program that extracts characteristic elements from upstream sequences
Development of a distributed environment for collaborative computing with the program

3. Upstream sequence analyzer
Development of a program that extracts statistically characteristic elements from upstream sequences as candidate cis-elements
Design
Let $\Sigma^n$ be the set of all possible sequences of length $n$ constructed from the characters A, T, G and C. For every probe sequence $s \in \Sigma^n$:
Step 1. Extract the elements of the upstream sequences of the cDNA clones that are similar to $s$, and obtain a list $L_s$ of their positions.
Step 2. Obtain the statistical distribution of $s$ on the upstream sequences from $L_s$. If the distribution satisfies a certain criterion, then $s$ is regarded as a candidate cis-element.
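Step 2 leaves the criterion unspecified. One possible test of location specificity, assumed here purely for illustration, is to bin the hit positions from Step 1 across the 2,100-base window (2,000 upstream plus 100 downstream bases, per Figure 1) and compare the bin counts with a uniform spread using a Pearson chi-square statistic. The bin width, the statistic, the cutoff and the class name are all hypothetical choices, not the authors' criterion.

```java
import java.util.Collections;
import java.util.List;

public class PositionDistribution {
    static final int WINDOW = 2100;  // 2,000 upstream + 100 downstream bases per clone (Figure 1)
    static final int BIN = 100;      // hypothetical bin width; gives 21 bins

    /** Counts hit positions (0-based offsets within the window, as collected in Step 1) per bin. */
    static int[] histogram(List<Integer> positions) {
        int[] counts = new int[WINDOW / BIN];
        for (int p : positions) {
            if (p >= 0 && p < WINDOW) {
                counts[p / BIN]++;
            }
        }
        return counts;
    }

    /** Pearson chi-square statistic of the bin counts against a uniform spread over all bins. */
    static double chiSquareVsUniform(int[] counts) {
        long total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double expected = (double) total / counts.length;
        double chi2 = 0.0;
        for (int c : counts) {
            double d = c - expected;
            chi2 += d * d / expected;
        }
        return chi2;
    }

    /** Hypothetical criterion: location-specific candidate if the statistic exceeds the cutoff. */
    static boolean isCandidate(List<Integer> positions, double cutoff) {
        return chiSquareVsUniform(histogram(positions)) > cutoff;
    }

    public static void main(String[] args) {
        // 21 hits all at offset 1950, i.e. about 50 bases upstream of the initiation site.
        List<Integer> clustered = Collections.nCopies(21, 1950);
        System.out.println(chiSquareVsUniform(histogram(clustered)));
    }
}
```

A cutoff could be taken from the chi-square distribution with (number of bins - 1) degrees of freedom; stage specificity could be tested analogously by comparing hit counts across life cycle stages.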
Implementation in Java
Step 1 uses local alignment based on dynamic programming.
align(n) performs local alignments between each probe sequence of length n and the upstream sequences.
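Step 1 is described only as local alignment by dynamic programming. A standard Smith-Waterman pass is one such method; the Java sketch that follows scores a probe against one upstream sequence and reports the positions where some local alignment reaches a score threshold, in the spirit of the thresholds in Figure 4. The scores (match +2, mismatch -1, gap -2), the linear gap penalty and the class name are assumptions; the actual align(n) may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalAlign {
    // Hypothetical scoring scheme; the poster does not state the scores it uses.
    static final int MATCH = 2, MISMATCH = -1, GAP = -2;

    /**
     * Smith-Waterman local alignment of probe against target. Returns the 0-based end positions
     * in target where some local alignment with the probe has a positive score >= threshold.
     */
    static List<Integer> hits(String probe, String target, int threshold) {
        int m = probe.length(), n = target.length();
        int[] prev = new int[n + 1];  // DP row i - 1
        int[] curr = new int[n + 1];  // DP row i (column 0 stays 0)
        int[] best = new int[n + 1];  // best[j]: max score of any alignment ending at target[j - 1]
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int s = probe.charAt(i - 1) == target.charAt(j - 1) ? MATCH : MISMATCH;
                int diag = prev[j - 1] + s;
                int up = prev[j] + GAP;       // gap in target
                int left = curr[j - 1] + GAP; // gap in probe
                curr[j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best[j] = Math.max(best[j], curr[j]);
            }
            int[] tmp = prev;  // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        List<Integer> ends = new ArrayList<>();
        for (int j = 1; j <= n; j++) {
            if (best[j] > 0 && best[j] >= threshold) {
                ends.add(j - 1);
            }
        }
        return ends;
    }

    public static void main(String[] args) {
        System.out.println(hits("TTGCCCAA", "ATATTGCGCAAGT", 10));
    }
}
```

Only two DP rows are kept, so memory is linear in the upstream sequence length; align(n) would call such a routine for every probe of length n and every upstream sequence.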

Figure 3. An illustration of statistically characteristic elements in upstream sequences. The region shown spans 2,000 bases upstream to 100 bases downstream of the transcription initiation site, for the V, A, S and C stages. Red: V stage specific element; green: location specific; blue: V stage and location specific.
(Figure 5 diagram labels: User client, Coordinator, Upstream sequence database, LAN 1, LAN 2, LAN 3, Web, Request, Result, Upstream sequences; protocol: SOAP / Jini.)

2. Upstream sequence database
Acquisition of upstream sequences
Step 1. Sequence alignment between the genome sequence and the cDNA contig sequences:
the genome sequence of chromosome 2 (from the Dictyostelium discoideum genome project) and
8,402 cDNA contig sequences (from the Dictyostelium cDNA project in Japan).
We obtained 2,152 upstream sequences.
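Once a cDNA contig is aligned to chromosome 2, its upstream sequence can be cut from the genome. The sketch that follows assumes that the transcription initiation site is the genome position aligned with the contig's 5' end and that the strand of the alignment is known; it returns the 2,000 upstream and 100 downstream bases described in Figure 1, reverse-complemented for minus-strand clones and clipped at the chromosome ends. The class and method names are hypothetical, and String.repeat in the example needs Java 11 or later.

```java
public class UpstreamWindow {
    static final int UPSTREAM = 2000, DOWNSTREAM = 100;  // window sizes from Figure 1

    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'G': sb.append('C'); break;
                case 'C': sb.append('G'); break;
                default:  sb.append('N'); // ambiguous or unknown base
            }
        }
        return sb.toString();
    }

    /**
     * Returns UPSTREAM bases before and DOWNSTREAM bases from (and including) the initiation
     * site, oriented 5' to 3' along the transcript. tss is a 0-based genome index (assumed to
     * be where the contig's 5' end aligns); plusStrand gives the alignment orientation.
     */
    static String extract(String genome, int tss, boolean plusStrand) {
        if (plusStrand) {
            int start = Math.max(0, tss - UPSTREAM);
            int end = Math.min(genome.length(), tss + DOWNSTREAM);
            return genome.substring(start, end);
        }
        // Minus strand: upstream lies at higher genome coordinates.
        int start = Math.max(0, tss - DOWNSTREAM + 1);
        int end = Math.min(genome.length(), tss + UPSTREAM + 1);
        return reverseComplement(genome.substring(start, end));
    }

    public static void main(String[] args) {
        String genome = "ACGT".repeat(1250);  // toy 5,000-base "chromosome"
        System.out.println(extract(genome, 2500, true).length());
        System.out.println(extract(genome, 2500, false).length());
    }
}
```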

Figure 5. Architecture of the distributed system, which runs sequence analyzers in parallel.
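Figure 5 shows a coordinator that distributes work to sequence analyzers on several LANs via SOAP / Jini. The sketch that follows imitates that pattern on one machine: a java.util.concurrent thread pool stands in for the remote analyzers, each task receives a contiguous slice of the 4^n probe indices, and the coordinator merges the accepted probes. The remote protocol is not modeled, the class name is hypothetical, and the per-probe test is a placeholder Predicate rather than the real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class Coordinator {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Decodes a probe index in [0, 4^n) into a sequence by reading it as a base-4 number. */
    static String decode(long index, int n) {
        char[] seq = new char[n];
        for (int i = n - 1; i >= 0; i--) {
            seq[i] = BASES[(int) (index & 3)];
            index >>= 2;
        }
        return new String(seq);
    }

    /** Runs one contiguous slice of probes per worker in parallel; merges results in index order. */
    static List<String> run(int n, int analyzers, Predicate<String> isCandidate) {
        long total = 1L << (2 * n);
        ExecutorService pool = Executors.newFixedThreadPool(analyzers);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int w = 0; w < analyzers; w++) {
                long start = total * w / analyzers;
                long end = total * (w + 1) / analyzers;
                futures.add(pool.submit(() -> {  // stand-in for a request to a remote analyzer
                    List<String> found = new ArrayList<>();
                    for (long i = start; i < end; i++) {
                        String probe = decode(i, n);
                        if (isCandidate.test(probe)) found.add(probe);
                    }
                    return found;
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.addAll(f.get());  // blocks until that slice is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for analyzers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an analyzer task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Toy criterion: accept probes of length 6 that contain "TTG".
        List<String> hits = run(6, 4, p -> p.contains("TTG"));
        System.out.println(hits.size() + " probes accepted, first: " + hits.get(0));
    }
}
```

Replacing each local task with a remote call would keep the same structure: the coordinator only needs to send a range and receive a list back.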
(Figure 1 diagram labels: Genome sequence, Upstream part, Contig part, 5' and 3' ends, Clone 1, Clone 2.)

5. Results
The system extracted the candidate elements listed in Table 1 in 436 minutes.
TTGSSCAA matches a known element, the Harwood element (TTGN{2,4}CAA, i.e. TTG, two to four arbitrary bases, CAA), which deactivates expression in prestalk cells in Dictyostelium.
GRGTGTAT partially matches a known element (DGKGKGDN{4,7}DGKGKGD) that regulates prestalk gene expression.
(IUPAC codes: S = C or G; R = A or G; K = G or T; D = A, G or T; N = any base.)
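The motifs in the results use IUPAC ambiguity codes. A small converter from such patterns to java.util.regex makes the comparison with known elements checkable. The sketch that follows treats {min,max} as repetition of the preceding code, so the Harwood element is written TTGN{2,4}CAA; it expands TTGSSCAA into its four concrete sequences and checks each against the known element. Class and method names are illustrative, not from the poster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class IupacPattern {
    /** Regex character class for each IUPAC nucleotide code. */
    static String classFor(char code) {
        switch (code) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': return "T";
            case 'R': return "[AG]";
            case 'Y': return "[CT]";
            case 'S': return "[CG]";
            case 'W': return "[AT]";
            case 'K': return "[GT]";
            case 'M': return "[AC]";
            case 'B': return "[CGT]";
            case 'D': return "[AGT]";
            case 'H': return "[ACT]";
            case 'V': return "[ACG]";
            case 'N': return "[ACGT]";
            default: throw new IllegalArgumentException("not an IUPAC code: " + code);
        }
    }

    /** Converts e.g. "TTGN{2,4}CAA" into a regex; "{min,max}" repeats the preceding code. */
    static Pattern toRegex(String iupac) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iupac.length(); i++) {
            char c = iupac.charAt(i);
            if (c == '{') {
                int close = iupac.indexOf('}', i);
                if (close < 0) throw new IllegalArgumentException("unclosed quantifier in " + iupac);
                sb.append(iupac, i, close + 1);  // copy the quantifier verbatim
                i = close;
            } else {
                sb.append(classFor(c));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Expands a degenerate sequence (no quantifiers) into all concrete A/C/G/T sequences. */
    static List<String> expand(String iupac) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (char c : iupac.toCharArray()) {
            String bases = classFor(c).replace("[", "").replace("]", "");
            List<String> next = new ArrayList<>();
            for (String prefix : out) {
                for (char b : bases.toCharArray()) next.add(prefix + b);
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern harwood = toRegex("TTGN{2,4}CAA");
        for (String s : expand("TTGSSCAA")) {
            System.out.println(s + " matches TTGN{2,4}CAA: " + harwood.matcher(s).matches());
        }
    }
}
```

Each instance of TTGSSCAA has exactly two bases between TTG and CAA, so all four fall within the element's two-to-four spacer; the same converter also accepts DGKGKGDN{4,7}DGKGKGD.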

Figure 1. Structure of the upstream sequence for cDNA contigs. It covers 2,000 bases upstream and 100 bases downstream of the transcription initiation site of every cDNA clone concerned.
Table 1. Extracted elements with probe sequences
of lengths 4-10

6. Conclusions
We have presented an upstream sequence analysis based on an information science approach.
We have developed an upstream sequence database, an analyzer and a distributed environment.
The system extracted 6 candidate sequences, two of which are known elements.
As a result, we confirmed that our system is effective and practical.
Future work
We will run our system on probe sequences of lengths up to 20.
We would like to examine, through biological experiments, whether the extracted elements actually function as cis-elements.

Figure 4. Computation time for probe sequences of lengths 4-7 with score thresholds 0, 2, 4, 6 and 8.
Figure 2. Graphical user interfaces embedded in web pages. (A) Sequence information page consisting of a viewer of the transcription initiation sites of the cDNA clones concerned. (B) Viewer for the details of upstream sequences in a list format.
