1
Parallelizing Incremental Bayesian Segmentation
(IBS)
  • Joseph Hastings
  • Sid Sen

2
Outline
  • Background on IBS
  • Code Overview
  • Parallelization Methods (Cilk, MPI)
  • Cilk Version
  • MPI Version
  • Summary of Results
  • Final Comments

3
Background on IBS
4
IBS
  • Incremental Bayesian Segmentation [1] is an
    online machine-learning algorithm designed to
    segment time-series data into a set of distinct
    clusters
  • It models the time series as the concatenation of
    processes, each generated by a distinct Markov
    chain, and attempts to find the most likely break
    points between the processes
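The score that drives these decisions (the compute_subsumed_marginal_likelihood() routine discussed later) is, in the standard Dirichlet-multinomial treatment of Markov chains described by Sebastiani and Ramoni [1], a closed-form marginal likelihood. A sketch of that standard form, with transition counts n_ij and Dirichlet hyperparameters alpha_ij assumed rather than taken from the slides:

    % Standard Dirichlet-Markov marginal likelihood (assumed form,
    % not copied from the slides): s states, row totals
    % n_i = \sum_j n_{ij} and \alpha_i = \sum_j \alpha_{ij}.
    P(D \mid M) = \prod_{i=1}^{s}
        \frac{\Gamma(\alpha_i)}{\Gamma(\alpha_i + n_i)}
        \prod_{j=1}^{s}
        \frac{\Gamma(\alpha_{ij} + n_{ij})}{\Gamma(\alpha_{ij})}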

5
Training Process
  • During the training phase of the algorithm, IBS
    builds a set of Markov matrices that it believes
    are most likely to describe the set of processes
    responsible for generating the time series
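As a concrete illustration of what "building a Markov matrix" means here, the sketch below maintains transition counts over a categorical alphabet. The alphabet size S and all names are illustrative assumptions, not the IBS source:

    #define S 8   /* assumed alphabet size (illustrative only) */

    /* One transition-count matrix per candidate process. */
    typedef struct {
        double count[S][S];   /* count[i][j]: transitions i -> j */
    } markov_matrix;

    /* Incrementally fold one observation into the matrix by
       bumping the (previous symbol -> current symbol) count;
       IBS maintains one such matrix per learned process. */
    static void markov_update(markov_matrix *m, int prev, int cur)
    {
        m->count[prev][cur] += 1.0;
    }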

6
Code Overview
7
High-Level Control Flow
  • main()
    • Loops through the input file
    • Runs break-point detection
    • For each segment:
      • check_out_process()
        • For each existing matrix:
          • compute_subsumed_marginal_likelihood()
        • Adds the segment to the set of matrices, or
          subsumes it (a code sketch of this skeleton
          follows)
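A minimal C sketch of that skeleton; the segment and matrix_set types and both helper functions are hypothetical stand-ins for the real IBS code:

    #include <stdio.h>

    /* Illustrative stand-ins; the real IBS types are richer. */
    typedef struct { int data[64]; int len; } segment;
    typedef struct { int n; /* set of Markov matrices */ } matrix_set;

    /* Hypothetical helpers (assumed, not the project's API). */
    int  next_segment(FILE *in, segment *seg);  /* break-point detection */
    void check_out_process(matrix_set *ps, segment *seg);

    int main(void)
    {
        matrix_set processes = { 0 };
        segment seg;

        /* main() loops through the input file; break-point
           detection yields one candidate segment at a time. */
        while (next_segment(stdin, &seg)) {
            /* Each segment is either added as a new Markov matrix
               or subsumed into the best-scoring existing one. */
            check_out_process(&processes, &seg);
        }
        return 0;
    }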

8
Parallelizable Computation
  • compute_subsumed_marginal_likelihood()
    • Depends only on a single matrix and the new
      segment
    • Produces a single score
  • The index of the best score must then be
    calculated (the serial loop sketched below)
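Serially, this is a single argmax loop over the existing matrices; a sketch using the call that appears on the Cilk slide, with num_processes and the score variables assumed:

    /* Serial baseline that both parallel versions replace: score
       the new segment against every existing matrix and remember
       the index of the best score. */
    double best_score = -1e300;   /* effectively -infinity */
    int    best_index = -1;
    int    i;

    for (i = 0; i < num_processes; i++) {
        double score = compute_subsumed_marginal_likelihood(proc,
                           get(processes, i),
                           copy_process_list(processes));
        if (score > best_score) {
            best_score = score;
            best_index = i;
        }
    }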

9
Code Status
10
Parallelization Methods
11
MPI
  • A library facilitating inter-process communication
  • Provides useful communication routines,
    particularly MPI_Allreduce, which simultaneously
    reduces data on all nodes and broadcasts the
    result (example below)
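A minimal, self-contained example of that routine: each rank contributes one value, and every rank receives the reduced result without a separate broadcast step.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double local, global_max;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each rank contributes one value; after the call, every
           rank holds the global maximum (reduce + broadcast in
           one step). */
        local = (double)rank;
        MPI_Allreduce(&local, &global_max, 1, MPI_DOUBLE, MPI_MAX,
                      MPI_COMM_WORLD);

        printf("rank %d sees global max %f\n", rank, global_max);
        MPI_Finalize();
        return 0;
    }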

12
Cilk
  • Originally developed by the Supercomputing
    Technologies Group at the MIT Laboratory for
    Computer Science
  • Cilk is a language for multithreaded parallel
    programming based on ANSI C that is very
    effective for exploiting highly asynchronous
    parallelism [3] (which can be difficult to write
    using message-passing interfaces like MPI)

13
Cilk
  • Specify the number of "processors" (worker
    threads) to create when running a Cilk job
  • There is no one-to-one mapping of worker threads
    to physical processors, hence the quotes
  • Work-stealing algorithm
    • When a processor runs out of work, it asks
      another processor, chosen at random, for work
      to do
  • Cilk's work-stealing scheduler executes any Cilk
    computation in nearly optimal time: on P
    processors, the expected running time is
    T_P = T_1/P + O(T_∞), where T_1 is the total
    work and T_∞ is the critical-path length (span)

14
Code Status
15
Cilk Version
16
Code Modifications
  • Keywords: cilk, spawn, sync
  • Convert any methods that will be spawned, or that
    will spawn other Cilk methods, into Cilk methods
    • In our case: main(), check_out_process(),
      compute_subsumed_marginal_likelihood()
  • The main source of parallelism comes from subsuming
    the current process with each existing process and
    choosing the subsumption with the best score (see
    the spawn-loop sketch after this list):

    spawn compute_subsumed_marginal_likelihood(proc,
        get(processes, i),
        copy_process_list(processes));
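A minimal sketch of that spawn loop in Cilk-5 syntax; the length() helper and the process/process_list types are assumed, and the real check_out_process() has more bookkeeping:

    cilk void check_out_process(process *proc, process_list *processes)
    {
        int i;
        int n = length(processes);   /* assumed helper */

        for (i = 0; i < n; i++) {
            /* Each call depends only on one matrix and the new
               segment, so all n calls may run in parallel. */
            spawn compute_subsumed_marginal_likelihood(proc,
                      get(processes, i),
                      copy_process_list(processes));
        }
        sync;  /* wait for all scores before choosing the best */

        /* ... add the segment as a new matrix, or subsume it
           into the best-scoring existing matrix ... */
    }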

17
Code Modifications
  • When updating global_score, we need to enforce
    mutual exclusion between worker threads:

    Cilk_lockvar score_lock;
    ...
    Cilk_lock(score_lock);
    ...
    Cilk_unlock(score_lock);
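A slightly fuller sketch of how these primitives fit together; Cilk_lock_init() must run before the first acquisition, and the record_score() helper and score-state layout are assumptions:

    /* Global best-score state shared by the worker threads. */
    Cilk_lockvar score_lock;
    double global_score;
    int    global_index;

    /* Call once (e.g., at the top of main) before the lock is
       first acquired. */
    void init_score_state(void)
    {
        Cilk_lock_init(score_lock);
        global_score = -1e300;   /* effectively -infinity */
        global_index = -1;
    }

    /* Called from each spawned scoring task; the critical section
       keeps concurrent updates from interleaving. */
    void record_score(double score, int index)
    {
        Cilk_lock(score_lock);
        if (score > global_score) {
            global_score = score;
            global_index = index;
        }
        Cilk_unlock(score_lock);
    }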

18
Cilk Results
Optimal performance was achieved using 2
processors (a trade-off between the overhead of
Cilk and the available parallelism of the program)
19
Adaptive Parallelism
  • The real intelligence is in the Cilk runtime
    system, which handles load balancing, paging, and
    communication protocols between running worker
    threads
  • Currently, the number of processors to run a Cilk
    job on must be specified up front
  • The goal is to eventually make the runtime system
    adaptively parallel by intelligently determining
    how many threads/processors to use
    • Fair and efficient allocation among all running
      Cilk jobs
  • The Cilk macroscheduler [4] uses the steal rate of
    a worker thread as a measure of its processor
    desire (if a Cilk job spends a substantial amount
    of its time stealing, then the job has more
    processors than it desires)

20
MPI Version
21
Code Modifications
  • check_out_process() first broadcasts the segment
    using MPI_Bcast()
  • Each process loops over all matrices, but only
    scores matrix i if (i % np == rank), where np is
    the number of MPI processes
  • Each process computes its local best score, and
    MPI_Allreduce() is used to reduce this
    information to the globally best score
  • Each process learns the index of the best matrix
    and performs the identical subsumption (sketched
    below)
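A condensed sketch of this pattern; using MPI_MAXLOC with the MPI_DOUBLE_INT pair type lets a single MPI_Allreduce return both the best score and its matrix index (the helper names, segment buffer, and count variables are assumed):

    /* (score, index) pairs for the MPI_MAXLOC reduction. */
    struct { double score; int index; } local, global;
    int rank, np, i;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* Every rank needs the new segment before scoring. */
    MPI_Bcast(seg, seg_len, MPI_INT, 0, MPI_COMM_WORLD);

    local.score = -1e300;   /* effectively -infinity */
    local.index = -1;
    for (i = 0; i < num_matrices; i++) {
        if (i % np == rank) {   /* this rank owns matrix i */
            double s = compute_subsumed_marginal_likelihood(proc,
                           get(processes, i),
                           copy_process_list(processes));
            if (s > local.score) { local.score = s; local.index = i; }
        }
    }

    /* Reduce-and-broadcast in one step: afterwards every rank
       holds the globally best score and the index that won. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC,
                  MPI_COMM_WORLD);

    /* All ranks now perform the identical subsumption at
       global.index. */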

22
MPI Results
The big improvement from 1 to 2 processors levels
off for 3 or more
23
Summary of Results
24
MPI vs. Cilk
25
Final Comments
26
MPI vs. Cilk
  • The MPI version was much more complicated,
    involved more lines of code, and was much more
    difficult to debug
  • The Cilk version required thinking about mutual
    exclusion, which MPI avoids
  • The Cilk version required fewer code changes, but
    was conceptually more complicated to reason about

27
References (Presentation)
  • [1] Paola Sebastiani and Marco Ramoni.
    Incremental Bayesian Segmentation of Categorical
    Temporal Data. 2000.
  • [2] Wenke Lee and Salvatore J. Stolfo. Data
    Mining Approaches for Intrusion Detection. 1998.
  • [3] Cilk 5.3.2 Reference Manual. Supercomputing
    Technologies Group, MIT Lab for Computer Science,
    November 9, 2001. Available online:
    http://supertech.lcs.mit.edu/manual-5.3.2.pdf
  • [4] R. D. Blumofe, C. E. Leiserson, and B. Song.
    Automatic Processor Allocation for Work-Stealing
    Jobs. (Work in progress)