1
Parallelizing Incremental Bayesian Segmentation
(IBS)
  • Joseph Hastings
  • Sid Sen

2
Outline
  • Background on IBS
  • Code Overview
  • Parallelization Methods (Cilk, MPI)
  • Cilk Version
  • MPI Version
  • Summary of Results
  • Final Comments

3
Background on IBS
4
IBS
  • Incremental Bayesian Segmentation [1] is an
    online machine-learning algorithm designed to
    segment time-series data into a set of distinct
    clusters
  • It models the time series as the concatenation of
    processes, each generated by a distinct Markov
    chain, and attempts to find the most likely break
    points between the processes
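The score that drives these decisions (the compute_subsumed_marginal_likelihood() routine discussed later) is, in the standard Dirichlet-multinomial treatment of Markov chains described by Sebastiani and Ramoni [1], a closed-form marginal likelihood. A sketch of that standard form, with transition counts n_ij and Dirichlet hyperparameters alpha_ij assumed rather than taken from the slides:

    % Standard Dirichlet-Markov marginal likelihood (assumed form,
    % not copied from the slides): s states, row totals
    % n_i = \sum_j n_{ij} and \alpha_i = \sum_j \alpha_{ij}.
    P(D \mid M) = \prod_{i=1}^{s}
        \frac{\Gamma(\alpha_i)}{\Gamma(\alpha_i + n_i)}
        \prod_{j=1}^{s}
        \frac{\Gamma(\alpha_{ij} + n_{ij})}{\Gamma(\alpha_{ij})}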

5
Training Process
  • During the training phase of the algorithm, IBS
    builds a set of Markov matrices that it believes
    are most likely to describe the set of processes
    responsible for generating the time series
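As a concrete illustration of what "building a Markov matrix" means here, the sketch below maintains transition counts over a categorical alphabet. The alphabet size S and all names are illustrative assumptions, not the IBS source:

    #define S 8   /* assumed alphabet size (illustrative only) */

    /* One transition-count matrix per candidate process. */
    typedef struct {
        double count[S][S];   /* count[i][j]: transitions i -> j */
    } markov_matrix;

    /* Incrementally fold one observation into the matrix by
       bumping the (previous symbol -> current symbol) count;
       IBS maintains one such matrix per learned process. */
    static void markov_update(markov_matrix *m, int prev, int cur)
    {
        m->count[prev][cur] += 1.0;
    }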

6
Code Overview
7
High-Level Control Flow
  • main()
    • Loops through the input file
    • Runs break-point detection
    • For each segment:
      • check_out_process()
        • For each existing matrix:
          • compute_subsumed_marginal_likelihood()
        • Adds the segment to the set of matrices, or
          subsumes it (a code sketch of this skeleton
          follows)
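A minimal C sketch of that skeleton; the segment and matrix_set types and both helper functions are hypothetical stand-ins for the real IBS code:

    #include <stdio.h>

    /* Illustrative stand-ins; the real IBS types are richer. */
    typedef struct { int data[64]; int len; } segment;
    typedef struct { int n; /* set of Markov matrices */ } matrix_set;

    /* Hypothetical helpers (assumed, not the project's API). */
    int  next_segment(FILE *in, segment *seg);  /* break-point detection */
    void check_out_process(matrix_set *ps, segment *seg);

    int main(void)
    {
        matrix_set processes = { 0 };
        segment seg;

        /* main() loops through the input file; break-point
           detection yields one candidate segment at a time. */
        while (next_segment(stdin, &seg)) {
            /* Each segment is either added as a new Markov matrix
               or subsumed into the best-scoring existing one. */
            check_out_process(&processes, &seg);
        }
        return 0;
    }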

8
Parallelizable Computation
  • compute_subsumed_marginal_likelihood()
    • Depends only on a single matrix and the new
      segment
    • Produces a single score
  • The index of the best score must then be
    calculated (the serial loop sketched below)
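Serially, this is a single argmax loop over the existing matrices; a sketch using the call that appears on the Cilk slide, with num_processes and the score variables assumed:

    /* Serial baseline that both parallel versions replace: score
       the new segment against every existing matrix and remember
       the index of the best score. */
    double best_score = -1e300;   /* effectively -infinity */
    int    best_index = -1;
    int    i;

    for (i = 0; i < num_processes; i++) {
        double score = compute_subsumed_marginal_likelihood(proc,
                           get(processes, i),
                           copy_process_list(processes));
        if (score > best_score) {
            best_score = score;
            best_index = i;
        }
    }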

9
Code Status
10
Parallelization Methods
11
MPI
  • A library facilitating inter-process communication
  • Provides useful communication routines,
    particularly MPI_Allreduce, which simultaneously
    reduces data on all nodes and broadcasts the
    result (example below)
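A minimal, self-contained example of that routine: each rank contributes one value, and every rank receives the reduced result without a separate broadcast step.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double local, global_max;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each rank contributes one value; after the call, every
           rank holds the global maximum (reduce + broadcast in
           one step). */
        local = (double)rank;
        MPI_Allreduce(&local, &global_max, 1, MPI_DOUBLE, MPI_MAX,
                      MPI_COMM_WORLD);

        printf("rank %d sees global max %f\n", rank, global_max);
        MPI_Finalize();
        return 0;
    }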

12
Cilk
  • Originally developed by the Supercomputing
    Technologies Group at the MIT Laboratory for
    Computer Science
  • Cilk is a language for multithreaded parallel
    programming based on ANSI C that is very
    effective for exploiting highly asynchronous
    parallelism [3] (which can be difficult to write
    using message-passing interfaces like MPI)

13
Cilk
  • Specify the number of "processors" (worker
    threads) to create when running a Cilk job
  • There is no one-to-one mapping of worker threads
    to physical processors, hence the quotes
  • Work-stealing algorithm
    • When a processor runs out of work, it asks
      another processor, chosen at random, for work
      to do
  • Cilk's work-stealing scheduler executes any Cilk
    computation in nearly optimal time: on P
    processors, the expected running time is
    T_P = T_1/P + O(T_∞), where T_1 is the total
    work and T_∞ is the critical-path length (span)

14
Code Status
15
Cilk Version
16
Code Modifications
  • Keywords: cilk, spawn, sync
  • Convert any methods that will be spawned, or that
    will spawn other Cilk methods, into Cilk methods
    • In our case: main(), check_out_process(),
      compute_subsumed_marginal_likelihood()
  • The main source of parallelism comes from subsuming
    the current process with each existing process and
    choosing the subsumption with the best score (see
    the spawn-loop sketch after this list):

    spawn compute_subsumed_marginal_likelihood(proc,
        get(processes, i),
        copy_process_list(processes));
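A minimal sketch of that spawn loop in Cilk-5 syntax; the length() helper and the process/process_list types are assumed, and the real check_out_process() has more bookkeeping:

    cilk void check_out_process(process *proc, process_list *processes)
    {
        int i;
        int n = length(processes);   /* assumed helper */

        for (i = 0; i < n; i++) {
            /* Each call depends only on one matrix and the new
               segment, so all n calls may run in parallel. */
            spawn compute_subsumed_marginal_likelihood(proc,
                      get(processes, i),
                      copy_process_list(processes));
        }
        sync;  /* wait for all scores before choosing the best */

        /* ... add the segment as a new matrix, or subsume it
           into the best-scoring existing matrix ... */
    }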

17
Code Modifications
  • When updating global_score, we need to enforce
    mutual exclusion between worker threads:

    Cilk_lockvar score_lock;
    ...
    Cilk_lock(score_lock);
    ...
    Cilk_unlock(score_lock);
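A slightly fuller sketch of how these primitives fit together; Cilk_lock_init() must run before the first acquisition, and the record_score() helper and score-state layout are assumptions:

    /* Global best-score state shared by the worker threads. */
    Cilk_lockvar score_lock;
    double global_score;
    int    global_index;

    /* Call once (e.g., at the top of main) before the lock is
       first acquired. */
    void init_score_state(void)
    {
        Cilk_lock_init(score_lock);
        global_score = -1e300;   /* effectively -infinity */
        global_index = -1;
    }

    /* Called from each spawned scoring task; the critical section
       keeps concurrent updates from interleaving. */
    void record_score(double score, int index)
    {
        Cilk_lock(score_lock);
        if (score > global_score) {
            global_score = score;
            global_index = index;
        }
        Cilk_unlock(score_lock);
    }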

18
Cilk Results
Optimal performance was achieved using 2
processors (a trade-off between the overhead of
Cilk and the available parallelism of the program)
19
Adaptive Parallelism
  • The real intelligence is in the Cilk runtime
    system, which handles load balancing, paging, and
    communication protocols between running worker
    threads
  • Currently, the number of processors to run a Cilk
    job on must be specified up front
  • The goal is to eventually make the runtime system
    adaptively parallel by intelligently determining
    how many threads/processors to use
    • Fair and efficient allocation among all running
      Cilk jobs
  • The Cilk macroscheduler [4] uses the steal rate of
    a worker thread as a measure of its processor
    desire (if a Cilk job spends a substantial amount
    of its time stealing, then the job has more
    processors than it desires)

20
MPI Version
21
Code Modifications
  • check_out_process() first broadcasts the segment
    using MPI_Bcast()
  • Each process loops over all matrices, but only
    scores matrix i if (i % np == rank), where np is
    the number of MPI processes
  • Each process computes its local best score, and
    MPI_Allreduce() is used to reduce this
    information to the globally best score
  • Each process learns the index of the best matrix
    and performs the identical subsumption (sketched
    below)
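A condensed sketch of this pattern; using MPI_MAXLOC with the MPI_DOUBLE_INT pair type lets a single MPI_Allreduce return both the best score and its matrix index (the helper names, segment buffer, and count variables are assumed):

    /* (score, index) pairs for the MPI_MAXLOC reduction. */
    struct { double score; int index; } local, global;
    int rank, np, i;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* Every rank needs the new segment before scoring. */
    MPI_Bcast(seg, seg_len, MPI_INT, 0, MPI_COMM_WORLD);

    local.score = -1e300;   /* effectively -infinity */
    local.index = -1;
    for (i = 0; i < num_matrices; i++) {
        if (i % np == rank) {   /* this rank owns matrix i */
            double s = compute_subsumed_marginal_likelihood(proc,
                           get(processes, i),
                           copy_process_list(processes));
            if (s > local.score) { local.score = s; local.index = i; }
        }
    }

    /* Reduce-and-broadcast in one step: afterwards every rank
       holds the globally best score and the index that won. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC,
                  MPI_COMM_WORLD);

    /* All ranks now perform the identical subsumption at
       global.index. */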

22
MPI Results
The big improvement from 1 to 2 processors levels
off for 3 or more
23
Summary of Results
24
MPI vs. Cilk
25
Final Comments
26
MPI vs. Cilk
  • The MPI version was much more complicated,
    involved more lines of code, and was much more
    difficult to debug
  • The Cilk version required thinking about mutual
    exclusion, which MPI avoids
  • The Cilk version required fewer code changes, but
    was conceptually more complicated to reason about

27
References (Presentation)
  • [1] Paola Sebastiani and Marco Ramoni.
    Incremental Bayesian Segmentation of Categorical
    Temporal Data. 2000.
  • [2] Wenke Lee and Salvatore J. Stolfo. Data
    Mining Approaches for Intrusion Detection. 1998.
  • [3] Cilk 5.3.2 Reference Manual. Supercomputing
    Technologies Group, MIT Lab for Computer Science,
    November 9, 2001. Available online:
    http://supertech.lcs.mit.edu/manual-5.3.2.pdf
  • [4] R. D. Blumofe, C. E. Leiserson, and B. Song.
    Automatic Processor Allocation for Work-Stealing
    Jobs. (Work in progress)