Promoter Panel - PowerPoint PPT Presentation

About This Presentation
Title:

Promoter Panel


Slides: 9
Provided by: johnwat
Tags: panel | promoter


Transcript and Presenter's Notes

Title: Promoter Panel


1
Promoter Panel
  • Review

2
Background: Promoters
  • In genetics, a promoter is a DNA sequence that
    enables a gene to be transcribed. It may be very
    long and may have multiple elements.
  • In geWorkbench, the Promoter Panel is used to
    discover potential transcription factor binding
    sites, based on known transcription factor
    binding profiles.

3
Background (contd.)
  • Available transcription factor binding profile
    databases:
  • TRANSFAC: most complete but commercial, about
    700 matrices.
  • JASPAR: open source. It now has 3 categories:
  • JASPAR CORE: 123 profiles.
  • JASPAR PHYLOFACTS: 174 profiles.
  • JASPAR FAM: familial profiles based on CORE.
  • geWorkbench uses 108 matrices from an older
    version of JASPAR CORE.

4
Background (contd.)

For sequence AAAGTA:
SCORE = 21/21 × 21/21 × 21/21 × 21/21 × 8/21 × 6/21 ≈ 0.108
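
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: it assumes the six fractions are per-position frequencies that are multiplied together (the operators did not survive in the transcript), and the variable names are illustrative.

```python
from fractions import Fraction

# Per-position factors copied from the slide for the sequence AAAGTA.
factors = [Fraction(21, 21)] * 4 + [Fraction(8, 21), Fraction(6, 21)]

# Multiply the factors exactly, without floating-point rounding.
score = Fraction(1)
for f in factors:
    score *= f

print(score)         # 16/147
print(float(score))  # roughly 0.1088, which the slide shows as 0.108
```

Under that reading, the four perfectly conserved positions contribute nothing to the penalty, and the score is driven entirely by the last two positions.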

5
Algorithm
  • Normalize the matrix; P(i) will be > 0.
  • The formula for the score is very simple:
    Σ log P(i)
  • Create a background sequence. There are two ways
    to create one.
  • Scan the background sequence to set up the
    threshold. For a background sequence of length
    1K, you get about 1000 − Matrix.length scores.
    The threshold is based on the P-value. For
    example, for a P-value of 0.05, the threshold is
    the lowest score among the top 5% of scores.
  • Scan the input sequence and report hits above the
    threshold.
  • Report results
  • In summary, the result is very stringent. A
    Bonferroni correction is used, so the P-value is
    really P-value/1K. Best for detecting enrichment
    of some patterns.
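
The steps above can be sketched in Python as follows. This is a minimal illustration, not the geWorkbench implementation: the pseudocount used to keep every P(i) above 0, the uniform random background, the toy count matrix, the choice to report windows at or above the cutoff, and all function names are assumptions made for the example.

```python
import math
import random

def normalize(counts, pseudocount=1.0):
    """Turn per-position base counts into probabilities that are all > 0."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        matrix.append({b: (n + pseudocount) / total for b, n in column.items()})
    return matrix

def score(matrix, window):
    """Sum of log P(i) over the positions of one window."""
    return sum(math.log(col[base]) for col, base in zip(matrix, window))

def window_scores(matrix, sequence):
    """Score every window of the matrix length along the sequence."""
    width = len(matrix)
    return [score(matrix, sequence[i:i + width])
            for i in range(len(sequence) - width + 1)]

def threshold(matrix, background, p_value):
    """Lowest score among the top p_value fraction of background scores."""
    ranked = sorted(window_scores(matrix, background), reverse=True)
    keep = max(1, int(p_value * len(ranked)))
    return ranked[keep - 1]

def scan(matrix, sequence, cutoff):
    """Return (position, score) for windows scoring at or above the cutoff."""
    return [(i, s) for i, s in enumerate(window_scores(matrix, sequence))
            if s >= cutoff]

# Toy 3-column count matrix favouring "AGT" (made up for illustration).
counts = [
    {"A": 18, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 18, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 18},
]
matrix = normalize(counts)

# Assumed background: 1K bases drawn uniformly at random with a fixed seed.
rng = random.Random(0)
background = "".join(rng.choice("ACGT") for _ in range(1000))

cutoff = threshold(matrix, background, p_value=0.05)
hits = scan(matrix, "CCAGTCC", cutoff)
print(hits)
```

Note that every matrix needs its own pass over the background to obtain its cutoff, which is why the background scan dominates the running time.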

6
Issues - Programmatic
  • The algorithm is not very efficient. For every
    TF, one scan of the background and input sequence
    is required. Most of the time is spent on
    scanning background sequences.
  • Do all tests on protein sequences. The Stop
    button doesn't work.
  • Different species.
  • The 13K background sequence? Different programs
    use different background sequences.
  • Module discovery is not correctly programmed?
  • Too stringent for finding hits; good for checking
    enrichment.
  • The All Sequences button is missing.
  • What can we do after we get the patterns? Saving
    results does not work properly.

7
Issues - GUI
  • The logo is of poor quality. It should provide
    more information and should be in a separate
    panel.
  • Separate parameters and results.
  • The TFBS should be marked with direction, 5' or
    3'.
  • Use updated Sequence Panel.
  • No Image snapshot function.
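
One way to attach a direction to each reported site is to test every window against both the motif and its reverse complement and label the hit by strand. The sketch below uses exact string matching as a stand-in for the matrix scan; the function names and the "+" / "-" labels are assumptions for illustration, not part of geWorkbench.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq, motif):
    """Exact-match stand-in for a matrix scan; returns (start, strand) pairs."""
    hits = []
    rc = reverse_complement(motif)
    width = len(motif)
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == motif:
            hits.append((i, "+"))  # site reads 5'->3' on the given strand
        if window == rc:
            hits.append((i, "-"))  # site reads 5'->3' on the opposite strand
    return hits

print(find_sites("TTAAGTCCACTTGG", "AAGT"))  # [(2, '+'), (8, '-')]
```

A palindromic motif would be reported on both strands at the same position, which a display would need to handle.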

8
Proposed fixes
  • Update JASPAR Profiles
  • Provide more information about the matrix. Use
    JASPAR logos for JASPAR CORE; use enoLogos
    instead of BioJava for user-defined matrices to
    get high-quality pictures.
  • Scan once only.
  • Get more information about 13K sequences. Cache
    the threshold for 13K.
  • Change the GUI.
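
The "cache the threshold" idea can be sketched as a small lookup keyed by matrix name and P-value, so that the background is scanned at most once per matrix and later requests reuse the stored cutoff. The class name, the in-memory dictionary, the toy matrix, and the toy background below are all assumptions; a real cache for the 13K background might instead be persisted to disk.

```python
import math

def window_scores(matrix, sequence):
    """Sum of log P(i) for every window of the matrix length."""
    width = len(matrix)
    return [sum(math.log(col[b]) for col, b in zip(matrix, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]

class ThresholdCache:
    """Remember the background threshold per (matrix name, P-value)."""

    def __init__(self, background):
        self.background = background
        self._cache = {}
        self.background_scans = 0  # how many times the background was scanned

    def get(self, name, matrix, p_value):
        key = (name, p_value)
        if key not in self._cache:
            self.background_scans += 1
            ranked = sorted(window_scores(matrix, self.background), reverse=True)
            keep = max(1, int(p_value * len(ranked)))
            self._cache[key] = ranked[keep - 1]
        return self._cache[key]

# Toy 2-column probability matrix and background (made up for illustration).
matrix = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
background = "ACGTAGGCTAAGCTAGAGTC" * 50

cache = ThresholdCache(background)
first = cache.get("toy", matrix, 0.05)
second = cache.get("toy", matrix, 0.05)  # served from the cache, no rescan
print(first == second, cache.background_scans)
```

Because the cutoff depends only on the matrix, the background, and the P-value, it stays valid across input sequences as long as those three do not change.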