Title: PARALLEL MODEL OF EVOLUTIONARY GAME DYNAMICS

1
PARALLEL MODEL OF EVOLUTIONARY GAME DYNAMICS

Amanda Peters
MIT 18.337
5/13/2009

2
Outline

Motivation
Model
GPU Implementation
Blue Gene Implementation
Hardware
Results
Future Work

3
Motivation

Why does cooperation evolve?
Examples
Total War vs. Limited War
Quorum Sensing Bacteria
Pathogens
Goal of the project
Create a computational model to test the role of behavioral strategies and related variables

4
Model

Focus on finding evolutionarily stable strategies
Five strategies
Mouse
Hawk
Bully
Retaliator
Prober-Retaliator
Payoffs
Win: 60
Seriously injured: -100
Each small injury: -2
Emerge from short game uninjured: 20
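
The payoff values above can be expressed as a small scoring routine. This is a minimal sketch, not the presenter's code: the `ContestOutcome` struct and `contest_payoff` function are hypothetical names, and the assumption that the listed payoffs simply add up for one contestant is ours.

```c
/* Payoff values taken from the slide. */
enum {
    PAYOFF_WIN = 60,
    PAYOFF_SERIOUS_INJURY = -100,
    PAYOFF_SMALL_INJURY = -2,          /* applied once per small injury */
    PAYOFF_SHORT_GAME_UNINJURED = 20
};

/* Hypothetical summary of one contestant's result in a single contest. */
typedef struct {
    int won;                   /* 1 if this contestant won */
    int seriously_injured;     /* 1 if seriously injured */
    int small_injuries;        /* number of small injuries received */
    int short_game_uninjured;  /* 1 if it left a short game uninjured */
} ContestOutcome;

/* Assumption: the individual payoffs are additive. */
int contest_payoff(ContestOutcome o)
{
    int total = 0;
    if (o.won)
        total += PAYOFF_WIN;
    if (o.seriously_injured)
        total += PAYOFF_SERIOUS_INJURY;
    total += o.small_injuries * PAYOFF_SMALL_INJURY;
    if (o.short_game_uninjured)
        total += PAYOFF_SHORT_GAME_UNINJURED;
    return total;
}
```

Under this additive reading, a winner who took three small injuries scores 60 - 6 = 54.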

5
Why parallelize it?

Reduce computational time
Enable trials of more strategies
Enable analysis of the roles of different variables
Introduce more actions to the action space

6
CUDA Implementation

Embarrassingly parallel code
Distribute rounds of the game to different threads
Only payoff array in global memory
Copy it back for post processing

7
Sample Code

__global__ void gameGPU(int player1, int player2,
                        float *d_payoff1, float *d_payoff2,
                        float *rand_si, int max_rounds)
{
    // Thread index
    const int tid = blockDim.x * blockIdx.x + threadIdx.x;
    // Total number of threads in grid
    const int THREAD_N = blockDim.x * gridDim.x;
    int max_moves = 500;
    // Each thread plays rounds tid, tid + THREAD_N, tid + 2 * THREAD_N, ...
    for (int round = tid; round < max_rounds; round += THREAD_N)
        play_round(player1, player2,
                   d_payoff1[round], d_payoff2[round],
                   rand_si[round], max_moves);
}
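
The loop in the kernel hands each thread the rounds tid, tid + THREAD_N, tid + 2 * THREAD_N, and so on. The host-side C sketch below emulates that indexing serially to check that every round is visited exactly once, whatever the ratio of threads to rounds. The function name `stride_covers_once` is hypothetical and the code is an illustration of the indexing pattern only, not part of the original implementation.

```c
#include <stdlib.h>

/* Emulates thread_n workers that each start at their own index and
   advance by thread_n. Returns 1 if every round in [0, max_rounds)
   is visited exactly once, 0 otherwise. */
int stride_covers_once(int thread_n, int max_rounds)
{
    if (thread_n <= 0)
        return 0;               /* no workers: nothing can be covered */
    if (max_rounds <= 0)
        return 1;               /* no rounds: trivially covered */

    int *visits = calloc((size_t)max_rounds, sizeof *visits);
    if (visits == NULL)
        return 0;

    for (int tid = 0; tid < thread_n; tid++)
        for (int round = tid; round < max_rounds; round += thread_n)
            visits[round]++;

    int ok = 1;
    for (int r = 0; r < max_rounds; r++)
        if (visits[r] != 1)
            ok = 0;

    free(visits);
    return ok;
}
```

Because the stride equals the total worker count, the pattern works both when there are more rounds than threads and when there are fewer.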

8
Blue Gene Implementation
9
System Overview
10
Design Fundamentals

Low Power PPC440 Processing Core
System-on-a-chip ASIC Technology
Dense Packaging
Ducted, Air Cooled, 25 kW Racks
Standard proven components for reliability and cost

12
BG/P
Blue Gene/L
System: 180/360 TF/s, 32 TB (for the original 64-rack system)
Rack: 32 node cards; 2.8/5.6 TF/s, 512 GB
Node card: 32 chips (4x4x2), 16 compute cards, 0-2 I/O cards; 90/180 GF/s, 16 GB
Compute card: 2 chips (1x2x1); 5.6/11.2 GF/s, 1.0 GB
Chip: 2 processors; 2.8/5.6 GF/s, 4 MB
13
Blue Gene/P
System: cabled 8x8x16; 1 PF/s, 144 TB
Rack: 32 node cards; 14 TF/s, 2 TB
Node card: 435 GF/s, 64 GB
Compute card: 1 chip, 20 DRAMs; 13.6 GF/s, 2.0 (or 4.0) GB DDR
Chip: 4 processors; 13.6 GF/s, 8 MB EDRAM

Key Differences
4 cores per chip
Speed bump
72 racks (8)
14
BG System Overview: Integrated system

Lightweight kernel on compute nodes
Linux on I/O nodes handling syscalls
Optimized MPI library for high speed messaging
Control system on Service Node with private control network
Compilers and job launch on Front End Nodes

15
Blue Gene/L interconnection networks

3 Dimensional Torus
Interconnects all compute nodes (65,536)
Virtual cut-through hardware routing
1.4 Gb/s on all 12 node links (2.1 GB/s per node)
Communications backbone for computations
0.7/1.4 TB/s bisection bandwidth, 67 TB/s total bandwidth
Global Collective Network
One-to-all broadcast functionality
Reduction operations functionality
2.8 Gb/s of bandwidth per link
Latency of tree traversal: 2.5 µs
23 TB/s total binary tree bandwidth (64k machine)
Interconnects all compute and I/O nodes (1024)
Low Latency Global Barrier and Interrupt
Round trip latency 1.3 µs
Control Network
Boot, monitoring and diagnostics
Ethernet
Incorporated into every node ASIC
Active in the I/O nodes (164)
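
The per-node torus figure quoted above follows from the link rate: 12 links at 1.4 Gb/s each, converted from bits to bytes. The short C sketch below reproduces that arithmetic; the function name `torus_node_bandwidth_GBps` is hypothetical.

```c
/* Aggregate torus bandwidth per node in gigabytes per second,
   given the number of links and the rate of each link in gigabits
   per second. */
double torus_node_bandwidth_GBps(int links, double link_gbps)
{
    return links * link_gbps / 8.0;   /* 8 bits per byte */
}
```

Calling `torus_node_bandwidth_GBps(12, 1.4)` gives approximately 2.1, matching the figure on the slide.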

16
C/MPI Implementation of Code

Static Partitioning of work units
work_unit = number_rounds / partition_size
Each node will get a chunk of the data
Loops that in serial iterate over the length of the game will now be split up to handle specific rounds
Bookkeeping Node
MPI Collectives to coalesce data
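
The static partitioning described above can be sketched as a function that maps a rank to its block of rounds. This is an illustration under stated assumptions: the `RoundRange` type and `rounds_for_rank` function are hypothetical names, and spreading the leftover rounds over the lowest ranks when `number_rounds` is not a multiple of `partition_size` is our choice, since the slide only gives the division.

```c
/* Half-open range of rounds [start, end) owned by one rank. */
typedef struct {
    int start;
    int end;
} RoundRange;

/* Splits number_rounds into partition_size contiguous chunks.
   Each rank gets work_unit rounds; the first (number_rounds %
   partition_size) ranks get one extra round so nothing is dropped. */
RoundRange rounds_for_rank(int rank, int partition_size, int number_rounds)
{
    int work_unit = number_rounds / partition_size;
    int remainder = number_rounds % partition_size;

    RoundRange r;
    r.start = rank * work_unit + (rank < remainder ? rank : remainder);
    r.end = r.start + work_unit + (rank < remainder ? 1 : 0);
    return r;
}
```

With 10 rounds over 4 ranks, the ranges are [0, 3), [3, 6), [6, 8) and [8, 10), so each rank can run the serial loop body over its own range without coordination.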

17
Pseudo Code

Foreach species
    Foreach species
        gamePlay(var1)
        MPI_Reduce(var1)
If (rank == 0) Calculate_averages()
If (rank == 0) Print_game_results()
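
One way to read the pseudo code is sketched below in plain C, with the ranks emulated serially so the data flow is easy to follow without an MPI installation. Everything here is an assumption made for illustration: `game_play_chunk` is a placeholder that returns a dummy payoff rather than simulating a contest, the inner loop over ranks stands in for the sum that `MPI_Reduce` would deliver to rank 0, and the constants are hypothetical.

```c
#define NUM_SPECIES 5     /* five strategies, as in the model */
#define NUM_RANKS   4     /* hypothetical partition size */
#define NUM_ROUNDS  1000  /* hypothetical number of rounds */

/* Placeholder for gamePlay on one rank's chunk of rounds.
   It adds a dummy payoff of (s1 - s2) per round so the averaged
   result is easy to verify by hand. */
static double game_play_chunk(int s1, int s2, int first_round, int last_round)
{
    double sum = 0.0;
    for (int round = first_round; round < last_round; round++)
        sum += (double)(s1 - s2);
    return sum;
}

/* For every pair of species, each emulated rank plays its chunk,
   the partial sums are combined (the role of MPI_Reduce with a sum),
   and the bookkeeping step averages over all rounds. */
void average_payoffs(double avg[NUM_SPECIES][NUM_SPECIES])
{
    int work_unit = NUM_ROUNDS / NUM_RANKS;

    for (int s1 = 0; s1 < NUM_SPECIES; s1++) {
        for (int s2 = 0; s2 < NUM_SPECIES; s2++) {
            double total = 0.0;  /* value rank 0 would hold after the reduce */
            for (int rank = 0; rank < NUM_RANKS; rank++)
                total += game_play_chunk(s1, s2,
                                         rank * work_unit,
                                         (rank + 1) * work_unit);
            avg[s1][s2] = total / NUM_ROUNDS;
        }
    }
}
```

In a real MPI program each rank would compute only its own chunk and the sum would arrive at rank 0 through the collective, so only rank 0 would hold meaningful averages to print.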

18
Results
19
Game Dynamics

Evolutionarily Stable Strategies
Retaliator
Prober-Retaliator
Result
Limited War is a stable and dominant strategy given individual selection

20
CUDA Implementation
97% time reduction
21
CUDA Implementation
22
Blue Gene Implementation
99% time reduction
23
Blue Gene Implementation
24
Future Directions