Combining the results of different motif discovery programs for de novo prediction of TFBS: A critical approach

Provided by: adrianaen

1
Combining the results of different motif
discovery programs for de novo prediction of TFBS
A critical approach
  • Speaker: Thomas Engleitner

2
  • Question
  • Can we trust the results of tools for de novo
    motif (TFBS) detection?
  • If not, how can we improve the results?

3
Introduction
  • Why de novo motif discovery?
  • Finding TFBS experimentally is time-consuming and
    expensive
  • Prediction tools not only identify TFBS in the
    input sequences but also provide a PSSM
    (position-specific scoring matrix) that can be used
    to search genome-wide for sites of a given TF
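
To make the PSSM idea concrete, here is a minimal sketch of how a log-odds matrix could be built from a few aligned sites and then used to scan a sequence. The example sites, the uniform background of 0.25, the pseudocount, and the score threshold are all illustrative assumptions, not data or settings from this study; the function names are hypothetical.

```python
import math

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from equal-length aligned binding sites."""
    width = len(sites[0])
    pssm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        scores = {}
        for base in "ACGT":
            # Pseudocounts avoid log(0) for bases never seen at this position.
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def scan(sequence, pssm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy sites resembling the CRE core TGACGTCA (made up for illustration).
sites = ["TGACGTCA", "TGACGTCA", "TGACGTAA", "TGACGCCA"]
pssm = build_pssm(sites)
hits = scan("AAATGACGTCATTT", pssm, threshold=8.0)
print(hits)
```

In practice the threshold would have to be chosen per matrix, and only one strand is scanned here; a genome-wide search would also score the reverse complement.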

4
Introduction
  • Many different computational approaches exist for
    the identification of motifs in biological sequences
  • Examples: HMMs, hexamer counts, EM algorithms
  • Correct prediction of eukaryotic TFBS is still a
    hard problem in computational biology

5
Introduction
  • The detection rate of each tool on its own is poor
  • Tompa et al. suggest combining different tools
    to improve the results of motif discovery
  • Hypothesis: TFBS reported by more than one tool
    are more reliable
  • The best-ranked tools (according to Tompa et al.)
    are MEME, MotifSampler, and Weeder

Tompa et al., Assessing computational tools for
the discovery of transcription factor binding
sites, Nature Biotechnology 23(1), 137-144 (2005)
6
Preliminary Considerations
7
Sequence data constraints in this study
  • Validation / knowledge:
  • The motif must be experimentally validated
  • Appearance:
  • The motif must appear one or more times in every
    sequence of the dataset
  • Motif length:
  • The motif must be sufficiently long

8
Sequence data constraints in this study
  • One motif that satisfies our constraints is the
    cAMP response element (CRE)

9
(No Transcript)
10
Sequence data constraints in this study
  • Test data set:
  • 7 human DNA sequences, each containing the CRE
  • For each sequence, the binding position of CREB
    and the sequence of the binding site are known

11
  • Next step: use the dataset as input for MEME,
    MotifSampler, and Weeder
  • Motifs that were reported by all tools and showed a
    user-defined overlap were selected and compared
    with the known CRE

Consensus-based approach
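
The consensus step can be sketched as follows. This is a minimal illustration, assuming each tool's hits have already been parsed into half-open (start, end) intervals per sequence; the data layout, the function name `consensus_hits`, the sequence IDs, and the coordinates are hypothetical and not taken from the study. A combination of hits counts as a consensus hit when the region shared by all tools is at least `min_overlap` bases long.

```python
from itertools import product

def consensus_hits(hits_by_tool, min_overlap):
    """hits_by_tool: {tool: {seq_id: [(start, end), ...]}} with half-open intervals.

    Returns (seq_id, {tool: interval}) for every combination of one hit per
    tool whose common region spans at least min_overlap positions.
    """
    tools = sorted(hits_by_tool)
    # Only sequences in which every tool reported something can qualify.
    shared_seqs = set.intersection(*(set(hits_by_tool[t]) for t in tools))
    results = []
    for seq_id in sorted(shared_seqs):
        for combo in product(*(hits_by_tool[t][seq_id] for t in tools)):
            common = min(end for _, end in combo) - max(start for start, _ in combo)
            if common >= min_overlap:
                results.append((seq_id, dict(zip(tools, combo))))
    return results

# Made-up coordinates for two hypothetical sequences.
hits = {
    "MEME":         {"seq1": [(10, 18)], "seq2": [(40, 48)]},
    "MotifSampler": {"seq1": [(12, 20)], "seq2": [(100, 108)]},
    "Weeder":       {"seq1": [(11, 19)], "seq2": [(42, 50)]},
}
consensus = consensus_hits(hits, min_overlap=4)
print(consensus)
```

Requiring one region common to all tools is stricter than pairwise overlap; which of the two the user-defined overlap meant in the original analysis is not stated.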
12
These hits are then checked for overlap with
the known binding site of CREB
13
First Result
  • None of the overlapping hits shows overlap with
    the known CRE

Possible solution: parameter tuning
14
  • All programs have a wide variety of parameters
    that can be changed by the user
  • Idea:
  • Tune the parameters of each program such
    that the TP (true positive) rate is maximized
  • But what is a TP hit for each program alone?
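
A parameter sweep of this kind can be sketched as a small grid search. The TP definition used here (a reported hit is a TP if it overlaps the known site by at least `min_overlap` bases, and the TP rate is the fraction of reported hits that are TPs) is an assumption made for illustration, since the exact definition is not given in the transcript. `fake_tool` is a stand-in for actually running one of the programs, and all names and coordinates are hypothetical.

```python
from itertools import product

def tp_rate(hits, known_sites, min_overlap=1):
    """hits: [(seq_id, start, end)]; known_sites: {seq_id: (start, end)}."""
    if not hits:
        return 0.0
    tp = 0
    for seq_id, start, end in hits:
        known = known_sites.get(seq_id)
        # Assumed TP criterion: overlap of at least min_overlap bases.
        if known and min(end, known[1]) - max(start, known[0]) >= min_overlap:
            tp += 1
    return tp / len(hits)

def grid_search(run_tool, grid, known_sites):
    """Try every parameter combination; return (best_rate, best_params)."""
    names = sorted(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        rate = tp_rate(run_tool(**params), known_sites)
        if best is None or rate > best[0]:
            best = (rate, params)
    return best

def fake_tool(width, n_motifs):
    """Stand-in for a real tool run: one hit near the site, the rest elsewhere."""
    hits = [("seq1", 100, 100 + width)]
    hits += [("seq1", 500 + 50 * i, 500 + 50 * i + width) for i in range(n_motifs - 1)]
    return hits

known = {"seq1": (104, 112)}
best_rate, best_params = grid_search(
    fake_tool, {"width": [4, 8, 12], "n_motifs": [1, 3]}, known
)
print(best_rate, best_params)
```

Note that tuning against known sites is only possible on a validated test set; it shows how sensitive a tool is to its settings rather than how a user without prior knowledge should choose them.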

15
TP/FP example
16
Results
  • MEME
  • Tested parameters:
  • Number of motifs
  • Motif width

17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Results
  • MotifSampler
  • Tested parameters:
  • Prior probability
  • Motif width
  • Number of motifs

21
(No Transcript)
22
(No Transcript)
23
Results
  • Weeder
  • Tested parameters:
  • Motif width
  • Number of mutations

24
(No Transcript)
25
(No Transcript)
26
Results
  • We have seen that the initial parameter settings
    have a great influence on the results
  • The runs with the best TP rate were selected,
    and their TP hits were assigned to the
    corresponding sequences

27
Results
  • MotifSampler: sequence X65568
  • Weeder: sequence X00274
  • MEME: sequence X65568
  • The second- and third-best hits do not fall in
    the same sequences either
  • Conclusion: even with tuned parameters for
    each program, the result is even worse!

28
Discussion
  • Combining the output of three different programs
    does not lead to better motif prediction
  • To address this, the parameters of each program
    were varied systematically
  • We found that the parameter choice has a great
    influence on the overall result

29
Discussion
  • Even when the run uses the best parameter
    settings, the CRE motif is identified by two
    programs in only one sequence of the dataset
  • Remember: normally the user does not know much
    about the motif length, its distribution within
    the dataset, etc.
  • De novo prediction of TFBS without any prior
    knowledge is nearly impossible

30
Discussion
  • Even when masked sequences were used, the result
    was no better (result not shown)
  • The same holds for another dataset containing
    sequences with the hormone response element
    (result not shown)

31
Take home message
  • Results of tools for de novo prediction of
    TFBS are very sensitive to the initial parameters
  • Do not trust those motifs that are reported

32
  • Thank you for your attention.
  • Any questions?