Solving the Protein Threading Problem in Parallel (Nicola Yanev, Rumen Andonov)

Provided by: csg3
Learn more at: https://www.cs.gsu.edu

1
Solving the Protein Threading Problem in Parallel
Nicola Yanev, Rumen Andonov
  • Indrajit Bhattacharya
  • CMSC 838T Presentation

2
Motivation
  • Problem the paper is trying to solve
  • 3D structure prediction using threading
  • Is a given target sequence likely to fold to a 3D
    template core?
  • Find the alignment that minimizes some score
    function
  • NP-complete: no efficient exact algorithm is
    known
  • MAX-SNP-hard: cannot be approximated arbitrarily
    well in polynomial time unless P = NP
  • Why do we care?
  • 3D structure determines biological function of
    protein
  • Amino acid sequence (almost) uniquely determines
    3D structure
  • Threading is usually less accurate than
    comparative modeling but easier to solve

3
Talk Overview
  • Overview of talk
  • Motivation
  • Techniques
  • Evaluation
  • Related work
  • Observations

4
Techniques
  • Approach
  • Reduce the problem to some known theoretical
    problem of interest
  • In this case, network flow
  • Use existing tools for solving the theoretical
    problem efficiently
  • CPLEX
  • Explore possibilities for parallelizing the
    problem
  • Investigate the intrinsic hardness for real
    biological examples

5
Mathematical Formulation

6
Reduction to Network Flow: An Example

7
Reduction to Network Flow: Variables and Constraints
  • Standard Network Flow
  • Variable x_{i,t} for each segment-to-position
    assignment
  • Restricted to {0, 1}
  • With standard flow conservation constraints
  • Additional cost for non-local interactions
  • Variable z_{i,t,i',t'} for each non-local
    interaction
  • Restricted to {0, 1}
  • Constrained to sum to 1 for each non-local pair
    (i, i')
  • Upper bounded by the flow entering (i, t) and the
    flow leaving (i', t')
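As a rough illustration of what the x and z variables encode, here is a minimal Python sketch that scores a threading directly and finds the best one by exhaustive search. It is a toy, not the paper's method: the cost tables are made up, the names `threading_score` and `best_threading` are hypothetical, and feasibility is assumed to mean that segment positions are non-decreasing. Setting x_{i,t} = 1 corresponds to `t[i] == t`, and z_{i,t,i',t'} = 1 corresponds to both segments of an interacting pair taking those positions.

```python
from itertools import combinations_with_replacement

def threading_score(t, local_cost, pair_cost):
    """Score of threading t, where t[i] is the position given to segment i."""
    # Local terms: one per segment (the x_{i,t} part of the objective).
    score = sum(local_cost[i][t[i]] for i in range(len(t)))
    # Non-local terms: one per interacting pair (the z_{i,t,i',t'} part).
    for (i, j), table in pair_cost.items():
        score += table[t[i]][t[j]]
    return score

def best_threading(local_cost, pair_cost):
    """Exhaustive search over all threadings; only sensible for toy sizes."""
    m, n = len(local_cost), len(local_cost[0])
    # Assumption: feasible threadings are non-decreasing position tuples.
    candidates = combinations_with_replacement(range(n), m)
    return min((threading_score(t, local_cost, pair_cost), t) for t in candidates)

# Made-up costs: 3 segments, 3 positions, one non-local pair (0, 2).
local_cost = [[2, 0, 3], [1, 2, 0], [3, 1, 1]]
pair_cost = {(0, 2): [[0, 1, 2], [2, 0, 1], [1, 2, 0]]}

print(best_threading(local_cost, pair_cost))  # (2, (1, 2, 2))
```

The exhaustive search grows combinatorially with the number of segments and positions, which is why the talk turns to an integer-programming formulation instead.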

8
Drawbacks of Approach
  • Integer programming is hard to solve!
  • Relax to linear programming with variables in
    [0, 1]
  • Recover an integer solution using standard
    heuristics
  • Existing tools like CPLEX
  • Huge number of variables
  • For 36 segments and 81 positions, the IP has
    741264 rows, 360945 columns and 54145231 non-zero
    coefficients!
  • Need to reduce number of variables and
    constraints
  • Calls for parallelization if possible
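The relaxation step can be written out as follows. This is a sketch in assumed notation: the coefficient symbols c_{i,t} and d_{i,t,i',t'} and the value names v_IP and v_LP are not from the slides, and the flow and coupling constraints are abbreviated as "(x, z) feasible".

```latex
\begin{align*}
v_{\mathrm{IP}} &= \min \sum_{i,t} c_{i,t}\, x_{i,t}
   + \sum_{(i,i')} \sum_{t,t'} d_{i,t,i',t'}\, z_{i,t,i',t'}
   && \text{s.t. } (x,z) \text{ feasible},\quad x_{i,t},\, z_{i,t,i',t'} \in \{0,1\} \\
v_{\mathrm{LP}} &= \min \sum_{i,t} c_{i,t}\, x_{i,t}
   + \sum_{(i,i')} \sum_{t,t'} d_{i,t,i',t'}\, z_{i,t,i',t'}
   && \text{s.t. } (x,z) \text{ feasible},\quad 0 \le x_{i,t},\, z_{i,t,i',t'} \le 1 \\
v_{\mathrm{LP}} &\le v_{\mathrm{IP}}
\end{align*}
```

Because the relaxed problem only enlarges the feasible set, its optimum is a lower bound on the integer optimum. If the LP optimum happens to be integral, it is also optimal for the integer problem.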

9
Parallel Solution
  • Utilize special flow constraints
  • Split into sub-problems that may be solved in
    parallel
  • Split the k-th layer in the graph into r
    intervals
  • Force path for a sub-problem to pass through a
    particular interval in the layer
  • Pass the best objective bound found so far as a
    parameter to each sub-problem
  • A sub-task aborts when its dual objective value
    exceeds the current best bound
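The splitting scheme can be sketched in a few lines of Python. This is a toy under stated assumptions: the helper names (`split_layer`, `solve_subproblem`, `solve_by_splitting`) and the cost tables are hypothetical, a brute-force search stands in for the LP solver, and the sub-problems run one after another rather than on separate processors. What it does show is the structure: layer k is cut into r intervals, each sub-problem forces segment k into one interval, and the best value found so far is handed to the next sub-problem as a cut-off.

```python
from itertools import combinations_with_replacement

def score(t, local_cost, pair_cost):
    """Local costs plus non-local pair costs of threading t."""
    total = sum(local_cost[i][t[i]] for i in range(len(t)))
    return total + sum(tab[t[a]][t[b]] for (a, b), tab in pair_cost.items())

def split_layer(num_positions, r):
    """Cut positions 0..num_positions-1 into r contiguous (lo, hi) intervals, hi exclusive."""
    bounds = [j * num_positions // r for j in range(r + 1)]
    return [(bounds[j], bounds[j + 1]) for j in range(r) if bounds[j] < bounds[j + 1]]

def solve_subproblem(local_cost, pair_cost, k, interval, cutoff):
    """Best threading with segment k inside `interval`, or None if none beats `cutoff`."""
    m, n = len(local_cost), len(local_cost[0])
    lo, hi = interval
    best = None
    # Assumption: feasible threadings are non-decreasing position tuples.
    for t in combinations_with_replacement(range(n), m):
        if not lo <= t[k] < hi:
            continue  # the path must pass through this interval of layer k
        s = score(t, local_cost, pair_cost)
        if s < cutoff:
            cutoff, best = s, (s, t)
    return best

def solve_by_splitting(local_cost, pair_cost, k, r):
    """Solve all sub-problems, passing the best bound so far to each one."""
    best = (float("inf"), None)
    for interval in split_layer(len(local_cost[0]), r):
        result = solve_subproblem(local_cost, pair_cost, k, interval, cutoff=best[0])
        if result is not None:
            best = result
    return best

# Made-up costs: 3 segments, 3 positions, one non-local pair (0, 2).
local_cost = [[2, 0, 3], [1, 2, 0], [3, 1, 1]]
pair_cost = {(0, 2): [[0, 1, 2], [2, 0, 1], [1, 2, 0]]}

print(solve_by_splitting(local_cost, pair_cost, k=1, r=2))  # (2, (1, 2, 2))
```

Since every feasible threading places segment k in exactly one interval, the sub-problems partition the search space, and the minimum over them equals the optimum of the unsplit problem.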

10
Improving Parallel Solution
  • Drawback: the hardest sub-problem dominates!
  • Parallel strategy was found to be slower than the
    sequential!
  • Sub-problems can potentially become harder to
    solve
  • Many more difficult sub-problems than easy ones
  • Solution
  • Break the atomicity of the tasks
  • Each sub-task periodically checks the current
    best bound and updates its cut-off
  • Extra overhead is still small compared to task
    granularity
  • Now the easiest executing sub-task dominates!
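The idea of non-atomic sub-tasks can be illustrated with a small Python sketch in which each sub-task re-reads a shared best bound every few steps and tightens its own cut-off. Everything here is an assumption made for illustration: the names `SharedBound`, `run_subtask` and `solve_in_parallel` are hypothetical, a depth-first search with pruning stands in for the LP solver, Python threads stand in for separate processors, and the pruning rule is valid only because the made-up costs are non-negative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SharedBound:
    """Best objective value found so far, shared by all sub-tasks."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = float("inf")
        self.solution = None

    def read(self):
        with self._lock:
            return self.value

    def offer(self, value, solution):
        with self._lock:
            if value < self.value:
                self.value, self.solution = value, solution

def run_subtask(local_cost, pair_cost, k, interval, shared, check_every=4):
    """Depth-first search over threadings whose segment k lies in `interval`."""
    m, n = len(local_cost), len(local_cost[0])
    lo, hi = interval
    cutoff = shared.read()
    visited = 0
    stack = [((), 0)]  # (partial threading, partial local cost)
    while stack:
        t, partial = stack.pop()
        visited += 1
        if visited % check_every == 0:
            cutoff = min(cutoff, shared.read())  # periodic refresh of the cut-off
        if partial >= cutoff:
            continue  # prune: costs are non-negative, so this branch cannot improve
        i = len(t)
        if i == m:
            total = partial + sum(tab[t[a]][t[b]] for (a, b), tab in pair_cost.items())
            if total < cutoff:
                cutoff = total
                shared.offer(total, t)
            continue
        first = t[-1] if t else 0  # assumption: positions are non-decreasing
        for p in range(first, n):
            if i == k and not lo <= p < hi:
                continue
            stack.append((t + (p,), partial + local_cost[i][p]))
    return visited

def solve_in_parallel(local_cost, pair_cost, k, intervals, workers=2):
    shared = SharedBound()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_subtask, local_cost, pair_cost, k, iv, shared)
                   for iv in intervals]
        for f in futures:
            f.result()  # re-raise any exception from a sub-task
    return shared.value, shared.solution

# Made-up costs: 3 segments, 3 positions, one non-local pair (0, 2).
local_cost = [[2, 0, 3], [1, 2, 0], [3, 1, 1]]
pair_cost = {(0, 2): [[0, 1, 2], [2, 0, 1], [1, 2, 0]]}

print(solve_in_parallel(local_cost, pair_cost, k=1, intervals=[(0, 1), (1, 3)]))
```

The `check_every` parameter controls the trade-off mentioned on the slide: checking more often lets a good bound from an easy sub-task cut hard sub-tasks short sooner, at the cost of more synchronization.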

11
Evaluation
  • Experimental environment
  • Real protein sequences
  • ILOG CPLEX Callable Library
  • Sun UltraSPARC II, 450 MHz
  • Objective function coefficients generated from
    FROST
  • Maximum of 7 processors and 29 sub-problems
  • Evaluation results
  • Sequential version much faster than previous
    branch-and-bound results for the same problem
    formulation
  • Time taken comparable to PROSPECT
  • Splitting and parallelization significantly
    improve turnaround
  • Really tiny gap between relaxed LP and ILP
    solutions
  • Mostly integer solutions even for relaxed LP!

12
Result Tables
Comparison with branch-and-bound algorithm
  • Comment: self-threading results in significantly
    lower scores (as expected)

13
Result Tables
  • Comment: tiny relaxation gap (significance?)

14
Result Tables
Size of the LP formulation
  • Comment: LP problem size is still too large.

15
Result Tables
Performance with parallel sub-tasks
  • Comment: longer times with more sub-problems?

16
Related Work
  • Similar / previous approaches
  • Lathrop and Smith, 1998
  • Uses same cost function
  • Branch and bound algorithm for searching the
    space of threadings
  • Xu, Xu and Uberbacher, 1998
  • Divide and conquer algorithm
  • Xu, Li, Lin, Kim and Xu, 2003
  • Linear programming formulation
  • Solved using a branch-and-bound algorithm
  • None of the above suggests a parallelization scheme

17
Observations
  • Points of Interest
  • Mapping to a known problem of interest
  • Nicely utilizes particular constraints to break
    into independent subtasks
  • Threading of real amino acid sequences seems
    possible
  • Raises interesting questions about real-life
    protein threading being in P
  • Solver tailored for this particular problem may
    yield better results

18
Observations
  • Criticism
  • Not enough experiments with large numbers of
    subtasks and processors to show scaling
  • Prohibitively large number of variables and
    constraints
  • How accurate are the objective function
    coefficients?
  • What is the resolution of the objective function?
  • Threading onto multiple sequences for prediction
    still looks daunting
  • Not clear how to extend the idea for 3-way and
    more complex interactions
  • Improvements
  • Seems possible to break up the sub-tasks
    recursively

19
Thank you!