Processing-In-Memory Technology for Knowledge Discovery Algorithms - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Processing-In-Memory Technology for Knowledge Discovery Algorithms


1
Processing-In-Memory Technology for Knowledge
Discovery Algorithms
  • Jafar Adibi et al., USC/ISI
  • Presented by Chen Qian

2
Outline
  • The goal of this paper
  • Link Discovery Algorithm
  • Processing in Memory (PIM)
  • Graph clustering simulations
  • LD on PIM Performance Analysis
  • Conclusion

3
Goal
  • To gain insight into whether processing-in-memory
    (PIM) technology can be used to accelerate the
    performance of link discovery algorithms

4
Link Discovery
  • LD is a new challenge in data mining whose
    primary concerns are to identify strong links and
    discover hidden relationships among entities and
    organizations.
  • Data-intensive and highly parallel.

5
PIM
  • PIM chips that integrate processor logic into
    memory devices offer a new opportunity for
    bridging the growing gap between processor and
    memory speeds, especially for applications with
    high memory-bandwidth requirements.

6
PIM and LD
  • Parallel computing power has the potential of
    providing dramatic computing speedups.
  • This paper evaluates LD algorithms on a PIM
    workstation-class architecture.

7
Link Discovery
  • A major problem in LD is the discovery of hidden
    organizational structure such as groups and their
    members.
  • Visible relationships (groups): a class
  • Invisible: terrorist groups, drug cartels

8
Two functions
  • Group detection can be further broken down into
    two tasks:
  • (1) discovering hidden members of known groups
    (group extension), and
  • (2) identifying completely unknown groups.

9
Algorithm in this paper
  • The KOJAK Group Finder.

10
KOJAK Group Finder
  • 1. A seed generator outputs a set of seed groups
    using deductive and abductive reasoning.

11
KOJAK Group Finder
  • 2. A mutual information model finds likely new
    candidates for each group, producing extended
    groups.

12
KOJAK Group Finder
  • 3. The mutual information model is used to rank
    these likely members by how strongly connected
    they are to the seed members.

13
KOJAK Group Finder
  • 4. The ranked extended group is pruned using a
    threshold to produce the final output.
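
Steps 2 to 4 can be sketched in a few lines. The slides do not say how the mutual information model is built, so this is only an illustration under loud assumptions: each entity is described by a hypothetical 0/1 participation vector over a list of events, a candidate's score is its mean mutual information with the seed members, and the names mutual_information, extend_group and the sample data are invented for the example. It is not the KOJAK implementation.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def extend_group(seed, activity, threshold):
    """Rank non-seed entities by mean MI with the seed members, then prune."""
    scores = {}
    for cand, vec in activity.items():
        if cand in seed:
            continue
        total = sum(mutual_information(vec, activity[s]) for s in seed)
        scores[cand] = total / len(seed)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Step 4: keep only candidates whose score reaches the threshold
    return [(c, s) for c, s in ranked if s >= threshold]

# Hypothetical data: 1 means the entity took part in event i
activity = {
    "a": [1, 1, 0, 0, 1, 0, 1, 0],
    "b": [1, 1, 0, 0, 1, 0, 1, 0],
    "c": [1, 1, 0, 0, 1, 0, 0, 0],
    "d": [0, 1, 0, 1, 0, 1, 0, 1],
}
extended = extend_group({"a", "b"}, activity, threshold=0.3)
```

With seed {"a", "b"}, entity "c" behaves almost like the seeds and survives pruning, while "d" scores lower and is dropped. The threshold value is arbitrary here.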

14
Mapping the problem space to a graph
  • Each node represents an entity and each link
    represents the set of actions (email, phone call,
    etc.) between two entities.
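
A minimal sketch of this representation, assuming undirected links and a per-type count of actions (the slide only says "set of actions"; the counting, the entity names and build_graph are assumptions made for illustration):

```python
from collections import defaultdict

# Hypothetical observations: (actor, other party, action type)
actions = [
    ("alice", "bob", "email"),
    ("bob", "alice", "phone"),
    ("alice", "bob", "email"),
    ("bob", "carol", "phone"),
]

def build_graph(actions):
    """Map each unordered pair of entities to a count of actions per type."""
    links = defaultdict(lambda: defaultdict(int))
    for src, dst, kind in actions:
        pair = frozenset((src, dst))  # undirected link between two entities
        links[pair][kind] += 1
    return links

graph = build_graph(actions)
```

Here graph has two links: alice-bob (two emails, one phone call) and bob-carol (one phone call).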

15
Mapping the problem space to a graph
  • Graph Partitioning Problem

16
Link Discovery
  • LD algorithms are data-intensive and highly parallel.

17
PIM
18
PIM
  • However, limited memory bandwidth also slows
    processing.
  • Processor speed increases at 60%/year.
  • Memory speed increases at 7%/year.
  • Increasing the CPU speed is not enough.
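
Reading the two rates as 60% and 7% per year, the gap compounds quickly. A small worked calculation, assuming both speeds start at parity (speed_gap is a name made up for this sketch):

```python
def speed_gap(years, cpu_growth=0.60, mem_growth=0.07):
    """Ratio of processor speed to memory speed after `years`, starting at 1:1."""
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years

for y in (1, 5, 10):
    print(y, round(speed_gap(y), 1))
```

The ratio grows by roughly a factor of 1.5 each year, which is why a faster CPU alone does not help a memory-bound workload.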

19
PIM
  • PIM chips that integrate processor logic into
    memory devices offer a new opportunity for
    bridging the growing gap between processor and
    memory speeds, especially for applications with
    high memory-bandwidth requirements.

20
DIVA Key ideas
  • The first smart-memory PIM that is
  • capable of executing independent threads of
    control, and
  • designed to support in-memory virtual addressing.
  • Target applications: image processing and
    multimedia (streaming)

21
DIVA system architecture
  • PIMs are separated by host memory and host
    processor.

22
Chip architecture
  • Two components. Two interconnects.

23
Chip architecture
  • Fast processing can be performed inside the
    memory (PIM units) without loading data into the
    host processor.
  • Drawback:
  • The capability of the processing logic in a PIM
    is limited.
  • The processing logic requires a special design.

24
DIVA PIM
  • Can provide dramatic computing speedups for
    parallel algorithms.

25
Parallel Graph Clustering Algorithm
  • Designed to take advantage of PIM machines.

26
Sum.
  • The data and computation are partitioned among
    PIM nodes as follows:
  • Each PIM keeps a fraction of the PairTable, that
    is, a subset of the pairs of nodes in the graph,
    where a pair is a set of two nodes with at least
    one link between them.
  • A PairTable is partitioned so that, for a given
    node, all pairs containing that node are kept on
    the same PIM.
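
One way to satisfy this placement rule is to store every pair once under each of its two endpoints, which doubles the table. The sketch below assumes integer node ids and a modulo placement rule; owner and partition_pair_table are hypothetical names, and the real system may assign nodes to PIMs differently.

```python
def owner(node, num_pims):
    """Hypothetical placement rule: assign a node to a PIM by its integer id."""
    return node % num_pims

def partition_pair_table(pairs, num_pims):
    """Give each PIM every pair that contains one of the nodes it owns."""
    tables = [[] for _ in range(num_pims)]
    for u, v in pairs:
        # Store the pair once from each endpoint's point of view
        tables[owner(u, num_pims)].append((u, v))
        tables[owner(v, num_pims)].append((v, u))
    return tables

pairs = [(0, 1), (0, 2), (1, 2), (2, 3)]
tables = partition_pair_table(pairs, num_pims=2)
```

Each local table lists pairs keyed by a node that PIM owns, so a PIM can examine all links of its own nodes without asking another PIM.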

27
Sum.
  • Hence the whole graph is represented in a link
    table of twice the number of links times the
    number of all link types plus 2 (for two nodes).
  • This duplication of information avoids
    unnecessary cross communication among PIMs.

28
Sum.
  • At each iteration each PIM computes the node,
    among the nodes in its subset, with highest
    InOutRatio.
  • After that, all PIMs communicate to find the node
    with highest InOutRatio across all PIMs.
  • Finally, at the end of each iteration, each PIM
    adds the new global best node to its local copy
    of the current group.
  • The algorithm iterates until a desired number of
    new nodes has been added or there are no more to
    add.
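
The loop can be simulated sequentially to show its shape: a local maximum per PIM, a global reduction, then an update of every local copy of the group. The slides do not define InOutRatio, so the definition used here (links into the current group divided by links outside it, plus one to avoid dividing by zero) is an assumption, as are the function names, the modulo node placement and the sample graph.

```python
def in_out_ratio(node, neighbors, group):
    """Assumed definition: links into the group over links outside it (+1)."""
    inside = sum(1 for m in neighbors[node] if m in group)
    outside = len(neighbors[node]) - inside
    return inside / (outside + 1)

def grow_group(neighbors, seed, num_pims, max_new):
    group = set(seed)
    # Each simulated PIM owns the nodes whose id maps to it
    owned = [[n for n in neighbors if n % num_pims == p] for p in range(num_pims)]
    for _ in range(max_new):
        # Local step: each PIM proposes its best non-member node
        local_best = []
        for nodes in owned:
            cands = [(in_out_ratio(n, neighbors, group), n)
                     for n in nodes if n not in group]
            if cands:
                local_best.append(max(cands))
        if not local_best:
            break
        # Global step: reduce the local winners to one global winner
        score, best = max(local_best)
        if score == 0:
            break  # no remaining node is linked to the group
        group.add(best)  # on real hardware every PIM updates its local copy
    return group

# Hypothetical adjacency lists
neighbors = {
    0: [1, 2, 3],
    1: [0, 2, 3],
    2: [0, 1, 4],
    3: [0, 1],
    4: [2, 5],
    5: [4],
}
result = grow_group(neighbors, seed={0, 1}, num_pims=2, max_new=2)
```

Only the global step needs communication between PIMs, and it exchanges one candidate per PIM per iteration.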

29
Benchmarks
  • Measurements on two common bandwidth benchmarks
    test the hypothesis that PIMs really do offer a
    bandwidth advantage.
  • StreamAdd measures the performance of a stream
    of floating point additions and updates.
  • RandomAccess is designed to stress the memory
    system by performing updates to randomly selected
    entries of a large table.
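
The two access patterns look roughly like this. The sketch only illustrates the memory behaviour (sequential versus scattered); it is not the benchmark code, it measures nothing, and the XOR update and function names are choices made for the example.

```python
import random

def stream_add(a, b):
    """Sequential pass: one addition and one store per element."""
    return [x + y for x, y in zip(a, b)]

def random_access(table, num_updates, seed=0):
    """Update randomly chosen entries; consecutive accesses share no locality."""
    rng = random.Random(seed)
    for _ in range(num_updates):
        i = rng.randrange(len(table))
        table[i] ^= i  # XOR update, chosen here for illustration
    return table

c = stream_add([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
t = random_access([0] * 16, num_updates=100)
```

stream_add walks memory in order, so it is limited by raw bandwidth, while random_access defeats caches and prefetching, so it is limited by how fast the memory system can serve unrelated addresses.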

30
Benchmarks
  • PIMs were able to deliver comparable bandwidth to
    the host-only machine with just a single PIM, and
    8X more bandwidth with 8 PIMs

31
Simulation
  • Simulation of the graph clustering algorithm.
  • Increasing the number of PIMs reduces the cost of
    overall computation.
  • However, the cost of communication also increases.

32
Simulation
  • A load imbalance problem exists.
  • The amount of work at each PIM varies
    significantly, but the imbalance decreases as the
    number of seed members increases.

33
Experiment
  • Compared with Itanium-2

34
Experiment
  • For Mutual Information
  • Note that the Itanium-2 clock operates at 900
    MHz while the PIM clock rate is 140 MHz, so the
    number of cycles is used for comparison. The PIM
    code executes 18% fewer cycles than the Itanium-2.
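
A short calculation shows why cycle counts rather than wall-clock time are compared. The two clock rates come from the slide; the cycle count is a made-up placeholder, not a measurement.

```python
ITANIUM2_HZ = 900e6  # clock rate from the slide
PIM_HZ = 140e6       # clock rate from the slide

def seconds(cycles, clock_hz):
    """Wall-clock time for a given number of cycles at a given clock rate."""
    return cycles / clock_hz

cycles = 1_000_000  # hypothetical, not a measured value
ratio = seconds(cycles, PIM_HZ) / seconds(cycles, ITANIUM2_HZ)
```

For the same number of cycles the PIM takes about 6.4 times longer in wall-clock time purely because of its slower clock, so counting cycles isolates the architectural effect from the clock-rate difference.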

35
Experiment
  • For Graph Clustering
  • Significant improvement, far more promising than
    for mutual information.

36
Experiment
  • For Multi-PIM
  • For both kernels, our PIMs were able to deliver
    comparable bandwidth to the Itanium-2 with just a
    single PIM, and 8X more bandwidth with 8 PIMs

37
Summary
  • What is most important?
  • LD?
  • PIM?
  • How to perform graph clustering in parallel and
    manage the communication overhead.
  • There is still more potential to exploit.

38
  • Questions?
  • Thank you!