Processing-In-Memory Technology for Knowledge Discovery Algorithms

1
Processing-In-Memory Technology for Knowledge
Discovery Algorithms

Jafar Adibi et al., USC/ISI
Presented by Chen Qian

2
Outline

The goal of this paper
Link Discovery Algorithm
Processing in Memory (PIM)
Graph clustering simulations
LD on PIM Performance Analysis
Conclusion

3
Goal

To gain insight into whether processing-in-memory
(PIM) technology can be used to accelerate the
performance of link discovery algorithms

4
Link Discovery

LD is a new challenge in data mining whose
primary concerns are to identify strong links and
discover hidden relationships among entities and
organizations.
LD algorithms are data-intensive and highly parallel.

5
PIM

PIM chips that integrate processor logic into
memory devices offer a new opportunity for
bridging the growing gap between processor and
memory speeds, especially for applications with
high memory-bandwidth requirements.

6
PIM and LD

Parallel computing power has the potential to
provide dramatic computing speedups.
This paper evaluates LD algorithms on a PIM
workstation-class architecture.

7
Link Discovery

A major problem in LD is the discovery of hidden
organizational structure such as groups and their
members.
Visible relationships (groups): a class
Invisible: terrorist groups, drug cartels

8
Two functions

Group detection can be further broken down into
(1) discovering hidden members of known groups
(or group extension)
and (2) identifying completely unknown groups.

9
Algorithm in this paper

The KOJAK Group Finder.

10
KOJAK Group Finder

1. A seed generator outputs a set of seed groups
using deductive and abductive reasoning.

11
KOJAK Group Finder

2. A mutual information model finds likely new
candidates for each group, producing extended
groups.

12
KOJAK Group Finder

3. The mutual information model is used to rank
these likely members by how strongly connected
they are to the seed members.

13
KOJAK Group Finder

4. The ranked extended group is pruned using a
threshold to produce the final output.
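
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the KOJAK implementation: the seed group is taken as given, and a plain count of links to seed members stands in for the mutual information model. The names `extend_group`, `links`, and `threshold` are hypothetical.

```python
from collections import Counter

def extend_group(seed, links, threshold):
    """Extend a seed group with strongly connected candidates.

    seed: set of known members (step 1 output, assumed given).
    links: iterable of (a, b) pairs, one per observed action.
    threshold: minimum score a candidate needs to survive pruning.
    """
    score = Counter()
    for a, b in links:
        # Count only links that join a seed member to a non-member.
        if a in seed and b not in seed:
            score[b] += 1
        elif b in seed and a not in seed:
            score[a] += 1
    # Steps 2-3: candidates ranked by connection strength (ties by name).
    # The link count is a stand-in for mutual information (assumption).
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    # Step 4: prune the ranked list with the threshold.
    return [(name, s) for name, s in ranked if s >= threshold]

links = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("bob", "dan"), ("cat", "eve"), ("ann", "cat")]
result = extend_group({"ann", "bob"}, links, threshold=2)
```

Here `cat` is linked to the seed members three times and `dan` only once, so only `cat` survives a threshold of 2.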

14
Transforming the problem space into a graph

Each node represents an entity and each link
represents the set of actions (email, phone call,
etc.) between two entities.
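
One way to hold such a graph in memory is a map from entity pairs to per-type action counts. The sketch below assumes links are undirected and that each action arrives as a `(source, target, link_type)` tuple; both are assumptions for illustration, and `build_graph` is a hypothetical name.

```python
from collections import defaultdict

def build_graph(actions):
    """Map each unordered entity pair to a count of actions per link type."""
    graph = defaultdict(lambda: defaultdict(int))
    for source, target, link_type in actions:
        pair = frozenset((source, target))  # undirected link (assumption)
        graph[pair][link_type] += 1
    return graph

actions = [
    ("ann", "bob", "email"),
    ("bob", "ann", "email"),
    ("ann", "bob", "phone"),
    ("bob", "cat", "email"),
]
graph = build_graph(actions)
```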

15
Transforming the problem space into a graph

Graph Partitioning Problem

16
Link Discovery

LD algorithms are data-intensive and highly parallel.

17
PIM

18
PIM

However, limited memory bandwidth also slows
processing.
Processor speed increases at 60%/year.
Memory speed increases at 7%/year.
Increasing the CPU speed is not enough.
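
Taking the two growth rates as 60% and 7% per year, the gap compounds quickly. The short calculation below is only arithmetic on those two figures; it assumes both speeds start equal, and `speed_gap` is a hypothetical helper name.

```python
def speed_gap(years, cpu_growth=0.60, mem_growth=0.07):
    """Ratio of processor speed to memory speed after `years`, starting equal."""
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years

for years in (1, 5, 10):
    print(years, round(speed_gap(years), 1))
```

Under these assumptions the gap grows by roughly 1.5x each year, which is about 7x after five years and more than 50x after ten.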

19
PIM

PIM chips that integrate processor logic into
memory devices offer a new opportunity for
bridging the growing gap between processor and
memory speeds, especially for applications with
high memory-bandwidth requirements.

20
DIVA Key ideas

First smart-memory PIM that is:
Capable of executing independent threads of
control
Designed to support in-memory virtual addressing
Target applications: image processing and
multimedia (streaming)

21
DIVA system architecture

PIMs are separated by host memory and host
processor.

22
chip architecture

Two components. Two interconnects.

23
chip architecture

Fast processing can be performed inside the
memory (PIM units) without loading data into the
host processor.
Drawback:
the capability of the processing logic in a PIM
is limited.
The processing logic requires special design.

24
DIVA PIM

Can provide dramatic computing speedups for
parallel algorithms.

25
Parallel Graph Clustering Algorithm

Designed to take advantage of PIM machines.

26
Sum.

The data and computation are partitioned among
PIM nodes as follows:
Each PIM keeps a fraction of the PairTable, that
is, a subset of the pairs of nodes in the graph,
where a pair is a set of two nodes with at least
one link between them.
A PairTable is partitioned so that, for a given
node, all pairs containing that node are kept on
the same PIM.
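
The partitioning rule can be sketched as below. The round-robin assignment of nodes to PIMs is an assumption made for illustration (the slides do not say how nodes are assigned), and `partition_pair_table` is a hypothetical name. A pair is stored on the PIM that owns each of its two nodes, so it is duplicated whenever the owners differ.

```python
def partition_pair_table(pairs, num_pims):
    """Assign pairs to PIMs so that each node's pairs all live on one PIM."""
    nodes = sorted({n for pair in pairs for n in pair})
    # Round-robin node ownership (assumption, not from the slides).
    owner = {node: i % num_pims for i, node in enumerate(nodes)}
    tables = [set() for _ in range(num_pims)]
    for u, v in pairs:
        tables[owner[u]].add((u, v))  # copy kept with u's owner
        tables[owner[v]].add((u, v))  # copy kept with v's owner
    return owner, tables

pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
owner, tables = partition_pair_table(pairs, num_pims=2)
```

With this layout a PIM can evaluate any node it owns using only its local table.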

27
Sum.

Hence the whole graph is represented in a link
table whose size is twice the number of links
times (the number of all link types plus 2, for
the two nodes).
This duplication of information avoids
unnecessary cross communication among PIMs.

28
Sum.

At each iteration each PIM computes the node,
among the nodes in its subset, with highest
InOutRatio.
After that, all PIMs communicate to find the node
with highest InOutRatio across all PIMs.
Finally, at the end of each iteration, each PIM
adds the new global best node to its local copy
of the current group.
The algorithm iterates until the desired number
of new nodes has been added or there are no more
nodes to add.
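
The iteration above can be simulated sequentially as in the sketch below. The slides do not define InOutRatio, so the definition used here (links into the current group divided by links outside it, plus one) is an assumption, as are the names `in_out_ratio` and `grow_group`. Each inner list in `pim_nodes` plays the role of one PIM's subset of nodes.

```python
def in_out_ratio(node, group, neighbors):
    """Assumed definition: links into the group over links outside it."""
    inside = sum(1 for n in neighbors[node] if n in group)
    outside = len(neighbors[node]) - inside
    return inside / (outside + 1)  # +1 avoids division by zero (assumption)

def grow_group(seed, pim_nodes, neighbors, max_new):
    """Add up to max_new nodes, one global best node per iteration."""
    group = set(seed)
    added = []
    while len(added) < max_new:
        local_best = []
        for nodes in pim_nodes:  # each PIM scans only its own subset
            cands = [(in_out_ratio(n, group, neighbors), n)
                     for n in nodes if n not in group]
            cands = [c for c in cands if c[0] > 0]
            if cands:
                local_best.append(max(cands))
        if not local_best:  # no more nodes to add
            break
        _, best = max(local_best)  # global reduction across PIMs
        group.add(best)  # every PIM would update its local copy
        added.append(best)
    return added

neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
             "d": ["c", "e"], "e": ["d"]}
added = grow_group({"a", "b"}, [["a", "c", "e"], ["b", "d"]], neighbors, max_new=2)
```

Only the per-PIM best candidates cross PIM boundaries in each iteration, which is what keeps the communication step small relative to the local scan.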

29
Benchmarks

Measurements on two common bandwidth benchmarks
test the hypothesis that PIMs really do offer a
bandwidth advantage.
StreamAdd measures the performance of a stream
of floating point additions and updates.
RandomAccess is designed to stress the memory
system by performing updates to randomly selected
entries of a large table.
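
The two access patterns can be illustrated as below. This is a sketch of the kernels' general shape, not the benchmark code used in the paper; in particular the XOR update rule in `random_access` and the fixed seed are assumptions.

```python
import random

def stream_add(a, b):
    """Streaming kernel: one sequential pass of floating-point additions."""
    return [x + y for x, y in zip(a, b)]

def random_access(table, num_updates, seed=0):
    """Update randomly selected entries of a table in place."""
    rng = random.Random(seed)
    for _ in range(num_updates):
        i = rng.randrange(len(table))
        table[i] ^= i  # XOR update; the exact update rule is an assumption
    return table

sums = stream_add([1.0, 2.0], [0.5, 0.5])
table = random_access([0] * 16, num_updates=100)
```

The first kernel walks memory sequentially, while the second defeats caching by touching unpredictable locations, so together they probe both ends of memory-system behavior.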

30
Benchmarks

PIMs were able to deliver comparable bandwidth to
the host-only machine with just a single PIM, and
8X more bandwidth with 8 PIMs

31
Simulation

Simulation of the graph clustering algorithm.
Increasing the number of PIMs reduces the cost of
the overall computation.
However, the cost of communication also increases.

32
Simulation

A load imbalance problem exists.
The amount of work at each PIM varies
significantly, but the imbalance shrinks as the
number of seed members increases.

33
Experiment

Compared with Itanium-2

34
Experiment

For Mutual Information
Note that the Itanium-2 clock operates at 900
MHz while the PIM clock rate is 140 MHz, so the
number of cycles is used for comparison. The PIM
code executes 18% fewer cycles than the Itanium-2.
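
The clock difference explains why cycles rather than elapsed time are compared. The arithmetic below assumes the reported figure means 18% fewer cycles and uses a made-up cycle count purely for illustration; only the two clock rates come from the slide.

```python
def wall_time_seconds(cycles, clock_hz):
    """Elapsed time for a given cycle count at a given clock rate."""
    return cycles / clock_hz

ITANIUM_HZ = 900e6  # from the slide
PIM_HZ = 140e6      # from the slide

itanium_cycles = 1_000_000                # hypothetical count, illustration only
pim_cycles = itanium_cycles * (1 - 0.18)  # assumes "18" means 18% fewer cycles

ratio = (wall_time_seconds(pim_cycles, PIM_HZ)
         / wall_time_seconds(itanium_cycles, ITANIUM_HZ))
```

Under these assumptions the PIM would still take roughly five times longer in wall-clock time, so a time-based comparison would mostly reflect the clock rates rather than the architectures.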

35
Experiment

For Graph Clustering
Significant improvement. Far more promising than
that of mutual information

36
Experiment

For Multi-PIM
For both kernels, our PIMs were able to deliver
comparable bandwidth to the Itanium-2 with just a
single PIM, and 8X more bandwidth with 8 PIMs

37
Summary

What is most important?
LD?
PIM?
How to perform graph clustering in parallel and
manage the communication overhead.
There is still more potential.