Combining the results of different motif discovery programs for de novo prediction of TFBS: A critical approach

Slides: 33

Provided by: adrianaen


Transcript and Presenter's Notes

1
Combining the results of different motif
discovery programs for de novo prediction of TFBS
A critical approach

Speaker: Thomas Engleitner

Question
Can we trust the results of tools for de novo
motif (TFBS) detection?
If not, how can we improve the results?

3
Introduction

Why de novo motif discovery?
Finding TFBS experimentally is time-consuming and expensive
Prediction tools not only identify TFBS in the input sequences but also
provide a PSSM (position-specific scoring matrix) that can be used to search
genome-wide for binding sites of a given TF
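As a sketch of how a PSSM supports a genome-wide search, the Python below builds a log2-odds matrix from aligned binding sites and scores every window of a sequence. The uniform background, the pseudocount of 0.5, the threshold and the toy sites are all assumptions for illustration, not settings or data from this study; real tools also scan the reverse strand and calibrate thresholds statistically.

```python
import math

BASES = "ACGT"

# Hypothetical toy sites, for illustration only (not the study's dataset).
TOY_SITES = ["TGACGTCA", "TGACGTCA", "TGACGCAA", "TTACGTCA"]

def build_pssm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PSSM from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    pssm = []
    for i in range(width):
        column = [s[i].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pssm

def scan(sequence, pssm, threshold):
    """Return (start, score) for each forward-strand window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pssm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

With the toy sites, `scan("GGGTGACGTCAGGG", build_pssm(TOY_SITES), 5.0)` reports a single window starting at position 3.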

4
Introduction

Many different computational approaches exist for identifying motifs in
biological sequences, e.g. HMMs, hexamer counts and EM algorithms
Correct prediction of eukaryotic TFBS is still a
hard problem in computational biology
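Hexamer counting, one of the approaches listed above, can be illustrated by comparing 6-mer frequencies in the input sequences with those in a background set. This is a minimal sketch with an assumed add-one pseudocount and a plain frequency ratio; it is not the scoring scheme of any particular published tool.

```python
from collections import Counter

def count_kmers(sequences, k=6):
    """Count every k-mer made only of A/C/G/T across all sequences (forward strand)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def enriched_kmers(foreground, background, k=6, pseudocount=1.0, top=5):
    """Rank foreground k-mers by smoothed foreground/background frequency ratio."""
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = max(sum(fg.values()), 1)
    bg_total = max(sum(bg.values()), 1)
    scored = []
    for kmer, n in fg.items():
        ratio = ((n + pseudocount) / fg_total) / ((bg[kmer] + pseudocount) / bg_total)
        scored.append((kmer, ratio))
    # Highest ratio first; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]
```

Over-represented hexamers are only candidates: the same caveat as for the other approaches applies, since short words are easily enriched by chance.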

5
Introduction

The detection rate of each tool on its own is poor
Tompa et al. suggest combining different tools
to improve the results of motif discovery
Hypothesis: TFBS reported by more than one tool
are more reliable
The best-ranked tools (according to Tompa et al.) are Meme,
MotifSampler and Weeder

Tompa M. et al. (2005), Assessing computational tools for
the discovery of transcription factor binding
sites, Nature Biotechnology 23(1):137-144
6
Preliminary Considerations
7
Sequence data constraints in this study

Validation / knowledge
The motif has to be experimentally validated
Appearance
The motif must appear one or more times in every
sequence of the dataset
Motif length
The motif must be sufficiently long

8
Sequence data constraints in this study

One motif that satisfies our constraints is the
cAMP response element (CRE)
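The appearance constraint can be checked mechanically before running any motif finder. The sketch below assumes the CRE half-site TGACG as the search string (the full palindromic CRE consensus is TGACGTCA) and looks for an exact copy on either strand. The function names and the example sequences are hypothetical and are not the study's data.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def missing_site(sequences, site="TGACG"):
    """Return the IDs of sequences lacking an exact copy of `site` on either strand.

    sequences: dict mapping sequence ID -> DNA string.
    An empty result means the appearance constraint holds for this exact site.
    """
    site = site.upper()
    missing = []
    for seq_id, seq in sequences.items():
        seq = seq.upper()
        if site not in seq and site not in reverse_complement(seq):
            missing.append(seq_id)
    return missing
```

Exact matching is deliberately strict; real CRE variants can deviate from the consensus, so a mismatch-tolerant search would be a natural extension.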

9
(No Transcript)
10
Sequence data constraints in this study

Test data set
7 human DNA sequences, each containing the CRE
For each sequence, both the CREB binding position and
the binding-site sequence are known

Next step: use the dataset as input for Meme,
MotifSampler and Weeder
Motifs reported by all tools that overlap by a
user-defined amount were taken and compared to
the known CRE
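One way to implement this consensus step is to keep only combinations of hits, one per tool, that lie in the same sequence and overlap pairwise by at least a chosen number of positions. The hit format `(sequence_id, start, end)` with half-open coordinates and the pairwise reading of "user-defined overlap" are assumptions; the slides do not say how overlap was measured.

```python
from itertools import combinations, product

def overlap(a, b):
    """Shared positions of two hits (sequence_id, start, end); 0 if in different sequences."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))

def consensus_hits(hits_by_tool, min_overlap):
    """Return every one-hit-per-tool combination whose hits overlap pairwise by >= min_overlap."""
    tools = sorted(hits_by_tool)
    result = []
    # Brute force over all combinations; fine for the handful of hits each tool reports.
    for combo in product(*(hits_by_tool[t] for t in tools)):
        if all(overlap(a, b) >= min_overlap for a, b in combinations(combo, 2)):
            result.append(dict(zip(tools, combo)))
    return result
```

Requiring pairwise overlap is stricter than requiring a common region shared by all three hits only in corner cases; either reading is defensible, and the choice should be reported alongside the threshold.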

Consensus-based approach
12
These consensus hits are then checked for overlap with
the known CREB binding site
13
First Result

None of the overlapping hits overlaps
the known CRE

Possible solution: parameter tuning
14

All programs have a wide variety of parameters
that can be changed by the user
Idea
Tune the parameters of each program so
that the TP (true positive) rate is maximized
But what counts as a TP hit for each program alone?
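The slides leave the TP definition open. A plausible working definition, assumed here, is that a hit counts as a TP when it shares at least `min_overlap` positions with the known CREB site in the same sequence, and the TP rate is taken as the fraction of reported hits that are TPs. Both choices, the function names and the coordinates in the example are illustrative assumptions.

```python
def classify_hits(hits, known_sites, min_overlap=1):
    """Split hits into TPs and FPs.

    hits: list of (sequence_id, start, end) with half-open coordinates.
    known_sites: dict sequence_id -> (start, end) of the experimentally known site.
    min_overlap should be at least 1, otherwise every hit counts as a TP.
    """
    tp, fp = [], []
    for hit in hits:
        seq_id, start, end = hit
        site = known_sites.get(seq_id)
        shared = 0 if site is None else max(0, min(end, site[1]) - max(start, site[0]))
        if shared >= min_overlap:
            tp.append(hit)
        else:
            fp.append(hit)
    return tp, fp

def tp_rate(hits, known_sites, min_overlap=1):
    """Fraction of reported hits that are TPs (0.0 when nothing is reported)."""
    if not hits:
        return 0.0
    tp, _ = classify_hits(hits, known_sites, min_overlap)
    return len(tp) / len(hits)
```

Other definitions (for example, the fraction of known sites hit at least once) are equally reasonable and can rank parameter settings differently, which is why the definition matters when tuning.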

15
TP/FP example
16
Results

Meme
Tested parameters
Number of motifs
Motif width
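Systematic parameter variation can be sketched as an exhaustive grid search. The `run_and_score` callable is a hypothetical stand-in: in practice it would run Meme with the given number of motifs and motif width and return the TP rate of that run. The grid values and the toy scorer below only exercise the search and say nothing about the real results.

```python
from itertools import product

def grid_search(run_and_score, grid):
    """Evaluate every parameter combination in `grid`; return best params, best score, all results."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, run_and_score(params)))
    # max() keeps the first combination among equal scores.
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results

# Hypothetical grid and scorer, for illustration only (not the study's settings or results).
TOY_GRID = {"n_motifs": [1, 3, 5], "motif_width": [6, 8, 10, 12]}

def toy_score(params):
    return 1.0 if params["motif_width"] == 8 else 0.1 * params["n_motifs"]
```

Keeping every `(params, score)` pair, not just the best one, is what makes it possible to see how strongly the result depends on the parameter choice.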

17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Results

MotifSampler
Tested parameters
Prior probability
Motif width
Number of motifs

21
(No Transcript)
22
(No Transcript)
23
Results

Weeder
Tested parameters
Motif width
Number of mutations
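To see why the number of mutations matters, the sketch below counts how many sequences contain an oligo with at most `max_mismatches` mismatches (Hamming distance). It is a simplified illustration of what an allowed-mutations parameter controls, not Weeder's actual algorithm; only the forward strand is scanned, and the example sequences are invented.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def sequences_with_oligo(sequences, oligo, max_mismatches):
    """Count sequences containing `oligo` with at most `max_mismatches` mismatches (forward strand)."""
    k = len(oligo)
    count = 0
    for seq in sequences:
        if any(hamming(seq[i:i + k], oligo) <= max_mismatches
               for i in range(len(seq) - k + 1)):
            count += 1
    return count
```

For the invented sequences in the test, the oligo TGACGTCA is found in one sequence with 0 or 1 mismatches but in two with 2 mismatches: allowing more mutations makes a candidate motif look more widespread, at the cost of specificity.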

24
(No Transcript)
25
(No Transcript)
26
Results

We have seen that the initial parameter settings
have a great influence on the results
For each program, the runs with the best TP rate were
selected and their TP hits were assigned to the
corresponding sequences

27
Results

Best hit per program:
MotifSampler: sequence X65568
Weeder: sequence X00274
Meme: sequence X65568
The second- and third-best hits do not report
the same sequences either
Conclusion: even with tuned parameters for
each program, the result is even worse!
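The comparison on this slide can be expressed as set operations on the sequences to which each program's best hit was assigned. The sequence IDs are the ones listed above (best hits only); the `agreement` function is an illustrative sketch.

```python
from itertools import combinations

# Best hit per program, as listed on this slide.
BEST_HITS = {
    "MotifSampler": {"X65568"},
    "Weeder": {"X00274"},
    "Meme": {"X65568"},
}

def agreement(best_hits):
    """Return (sequences reported by all tools, dict of pairwise shared sequences)."""
    tools = sorted(best_hits)
    common_all = set.intersection(*(best_hits[t] for t in tools))
    pairwise = {(a, b): best_hits[a] & best_hits[b] for a, b in combinations(tools, 2)}
    return common_all, pairwise
```

No sequence is shared by all three programs; only the Meme/MotifSampler pair agrees, on X65568, which is the "one sequence, two programs" result noted in the discussion.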

28
Discussion

Combining the output of three different programs
does not lead to better motif prediction
To address this, the parameters of each program
were varied systematically
We found that the choice of parameters has a great
influence on the overall result

29
Discussion

Even with the best parameter settings, the CRE motif
is identified in only one sequence of the dataset,
and only by 2 programs
Remember: normally the user knows little about the
motif length, its distribution within the dataset, etc.
De novo prediction of TFBS without any prior knowledge
is nearly impossible

30
Discussion

Using masked sequences does not improve the result
either (result not shown)
The same holds for another dataset of sequences
containing the hormone response element
(result not shown)

31
Take home message

Results of tools for de novo prediction of
TFBS are very sensitive to the initial parameters
Do not trust the motifs these tools report

Thank you for your attention.
Any questions?
