PARALLEL MODEL OF EVOLUTIONARY GAME DYNAMICS

1
PARALLEL MODEL OF EVOLUTIONARY GAME DYNAMICS
  • Amanda Peters
  • MIT 18.337
  • 5/13/2009

2
Outline
  • Motivation
  • Model
  • GPU Implementation
  • Blue Gene Implementation
  • Hardware
  • Results
  • Future Work

3
Motivation
  • Why does cooperation evolve?
  • Examples
  • Total War vs. Limited War
  • Quorum Sensing Bacteria
  • Pathogens
  • Goal of the project
  • Create computational model to test role of
    behavioral strategies and related variables

4
Model
  • Focus on finding evolutionarily stable strategies
  • Five strategies
  • Mouse
  • Hawk
  • Bully
  • Retaliator
  • Prober-Retaliator
  • Payoffs
  • Win 60
  • Seriously Injured -100
  • Small Injuries Each -2
  • Emerge from Short Game uninjured 20
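The payoff values above can be turned into a per-contest score. The following is a minimal sketch in C, not the presenter's code: the `Outcome` record and the `contest_payoff` function are hypothetical names, and it assumes the individual payoffs simply add up.

```c
#include <assert.h>

/* Payoff values taken from the slide. */
enum {
    PAYOFF_WIN = 60,
    PAYOFF_SERIOUS_INJURY = -100,
    PAYOFF_SMALL_INJURY = -2,  /* applied once per small injury */
    PAYOFF_SHORT_GAME = 20     /* emerging from a short game uninjured */
};

/* Hypothetical record of what happened to one contestant in one contest. */
typedef struct {
    int won;                   /* 1 if the contestant won */
    int seriously_injured;     /* 1 if seriously injured */
    int small_injuries;        /* number of small injuries received */
    int short_game_uninjured;  /* 1 if the game was short and uninjured */
} Outcome;

/* Assumption: the total payoff is the plain sum of the parts. */
int contest_payoff(Outcome o)
{
    assert(o.small_injuries >= 0);
    int total = 0;
    if (o.won) total += PAYOFF_WIN;
    if (o.seriously_injured) total += PAYOFF_SERIOUS_INJURY;
    total += o.small_injuries * PAYOFF_SMALL_INJURY;
    if (o.short_game_uninjured) total += PAYOFF_SHORT_GAME;
    return total;
}
```

Under this assumption, a winner who took three small injuries scores 60 - 6 = 54, and a loser who took one small injury and then a serious one scores -2 - 100 = -102.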

5
Why parallelize it?
  • Reduce computational time
  • Enable trials of more strategies
  • Enable analysis of the roles of different variables
  • Introduce more actions to the action space

6
CUDA Implementation
  • Embarrassingly parallel code
  • Distribute rounds of the game to different
    threads
  • Only payoff array in global memory
  • Copy it back for post processing

7
Sample Code
    __global__ void gameGPU(int player1, int player2,
                            float *d_payoff1, float *d_payoff2,
                            float *rand_si, int max_rounds)
    {
        // Thread index
        const int tid = blockDim.x * blockIdx.x + threadIdx.x;

        // Total number of threads in grid
        const int THREAD_N = blockDim.x * gridDim.x;

        int max_moves = 500;
        for (int round = tid; round < max_rounds; round += THREAD_N)
            play_round(player1, player2,
                       d_payoff1[round], d_payoff2[round],
                       rand_si[round], max_moves);
    }
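
The loop in the kernel gives each thread the rounds tid, tid + THREAD_N, tid + 2 * THREAD_N, and so on. The sketch below is a serial C stand-in for that indexing only, not CUDA code: it replays the loop for every emulated thread and checks that each round is handled exactly once. The function names and the `visits` array are hypothetical.

```c
#include <assert.h>

/* Serial stand-in for the kernel loop: emulated thread `tid` out of
   `thread_n` handles rounds tid, tid + thread_n, tid + 2*thread_n, ... */
void mark_rounds_for_thread(int tid, int thread_n, int max_rounds, int *visits)
{
    for (int round = tid; round < max_rounds; round += thread_n)
        visits[round] += 1;
}

/* Replays the loop for every emulated thread. `visits` must have room
   for max_rounds ints. Returns 1 if every round was visited exactly once. */
int covers_each_round_once(int thread_n, int max_rounds, int *visits)
{
    assert(thread_n > 0 && max_rounds >= 0);
    for (int r = 0; r < max_rounds; r++)
        visits[r] = 0;
    for (int tid = 0; tid < thread_n; tid++)
        mark_rounds_for_thread(tid, thread_n, max_rounds, visits);
    for (int r = 0; r < max_rounds; r++)
        if (visits[r] != 1)
            return 0;
    return 1;
}
```

The same coverage holds whether there are fewer threads than rounds (each thread loops several times) or more threads than rounds (the extra threads skip the loop body), which is why the launch configuration does not have to match max_rounds.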

8
Blue Gene Implementation
9
System Overview
10
Design Fundamentals
  • Low Power PPC440 Processing Core
  • System-on-a-chip ASIC Technology
  • Dense Packaging
  • Ducted, Air Cooled, 25 kW Racks
  • Standard proven components for reliability and
    cost

12
BG/P
Blue Gene/L
  • System: 180/360 TF/s, 32 TB (for the original 64-rack system)
  • Rack: 32 node cards, 2.8/5.6 TF/s, 512 GB
  • Node card: 32 chips (4x4x2), 16 compute cards, 0-2 I/O cards, 90/180 GF/s, 16 GB
  • Compute card: 2 chips (1x2x1), 5.6/11.2 GF/s, 1.0 GB
  • Chip: 2 processors, 2.8/5.6 GF/s, 4 MB
13
Blue Gene/P
  • System: cabled 8x8x16, 1 PF/s, 144 TB
  • Rack: 32 node cards, 14 TF/s, 2 TB
  • Node card: 435 GF/s, 64 GB
  • Compute card: 1 chip, 20 DRAMs, 13.6 GF/s, 2.0 (or 4.0) GB DDR
  • Chip: 4 processors, 13.6 GF/s, 8 MB EDRAM
  • Key Differences
  • 4 cores per chip
  • Speed bump
  • 72 racks (8)

14
BG System Overview: Integrated system
  • Lightweight kernel on compute nodes
  • Linux on I/O nodes handling syscalls
  • Optimized MPI library for high speed messaging
  • Control system on Service Node with private
    control network
  • Compilers and job launch on Front End Nodes

15
Blue Gene/L interconnection networks
  • 3 Dimensional Torus
  • Interconnects all compute nodes (65,536)
  • Virtual cut-through hardware routing
  • 1.4 Gb/s on each of the 12 node links (12 × 1.4 Gb/s = 16.8 Gb/s = 2.1 GB/s per node)
  • Communications backbone for computations
  • 0.7/1.4 TB/s bisection bandwidth, 67 TB/s total
    bandwidth
  • Global Collective Network
  • One-to-all broadcast functionality
  • Reduction operations functionality
  • 2.8 Gb/s of bandwidth per link
  • Latency of tree traversal 2.5 µs
  • 23 TB/s total binary tree bandwidth (64k machine)
  • Interconnects all compute and I/O nodes (1024)
  • Low Latency Global Barrier and Interrupt
  • Round trip latency 1.3 µs
  • Control Network
  • Boot, monitoring and diagnostics
  • Ethernet
  • Incorporated into every node ASIC
  • Active in the I/O nodes (164)

16
C/MPI Implementation of Code
  • Static Partitioning of work units
  • work_unit = number_rounds / partition_size
  • Each node will get a chunk of the data
  • Loops that in serial iterate over the length of
    the game will now be split up to handle specific
    rounds
  • Bookkeeping Node
  • MPI Collectives to coalesce data
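
The static partitioning described above can be sketched in plain C. This is an illustration under stated assumptions, not the presenter's code: the `WorkUnit` struct and `work_unit_for_rank` are hypothetical names, and the handling of a remainder (spreading leftover rounds over the lowest ranks) is an assumption, since the slide only gives work_unit = number_rounds / partition_size.

```c
#include <assert.h>

/* Contiguous range of rounds [first, first + count) owned by one rank. */
typedef struct {
    int first;
    int count;
} WorkUnit;

/* Base chunk size follows the slide: number_rounds / partition_size.
   Assumption: when the division is not exact, the first
   (number_rounds % partition_size) ranks each take one extra round. */
WorkUnit work_unit_for_rank(int rank, int partition_size, int number_rounds)
{
    assert(partition_size > 0 && number_rounds >= 0);
    assert(rank >= 0 && rank < partition_size);

    int base = number_rounds / partition_size;
    int extra = number_rounds % partition_size;

    WorkUnit w;
    w.count = base + (rank < extra ? 1 : 0);
    /* Ranks below `extra` are shifted by one round per preceding rank;
       the remaining ranks are shifted by all `extra` leftover rounds. */
    w.first = rank * base + (rank < extra ? rank : extra);
    return w;
}
```

In an MPI program, `rank` and `partition_size` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`, and each node would loop only over its own range of rounds.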

17
Pseudo Code
  • Foreach species
  • Foreach species
  • gamePlay(var1)
  • MPI_Reduce(var1)
  • If (rank == 0) Calculate_averages()
  • If (rank == 0) Print_game_results()
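
The reduce-then-average step in the pseudo code can be illustrated with a serial C stand-in. This sketch does not call MPI: `reduce_sum` only mimics what a sum reduction to rank 0 would deliver (in a real MPI program this would be `MPI_Reduce` with `MPI_SUM` and root 0), and `average_payoff` is a hypothetical version of the averaging step that assumes the average is taken per round.

```c
#include <assert.h>
#include <stddef.h>

/* Serial stand-in for a sum reduction to rank 0: adds up the partial
   payoff that each rank accumulated over its own rounds. */
double reduce_sum(const double *partial, size_t n_ranks)
{
    double total = 0.0;
    for (size_t r = 0; r < n_ranks; r++)
        total += partial[r];
    return total;
}

/* What rank 0 might do after the reduction. Assumption: the average
   is the reduced total divided by the total number of rounds. */
double average_payoff(const double *partial, size_t n_ranks, int number_rounds)
{
    assert(number_rounds > 0);
    return reduce_sum(partial, n_ranks) / number_rounds;
}
```

For example, partial sums of 120, -40, 60 and 60 from four ranks over 100 rounds reduce to 200 and give an average payoff of 2.0 per round.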

18
Results
19
Game Dynamics
  • Evolutionarily Stable Strategies
  • Retaliator
  • Prober-Retaliator
  • Result
  • Limited War is a stable and dominant strategy
    given individual selection

20
CUDA Implementation
97% time reduction
21
CUDA Implementation
22
Blue Gene Implementation
99% time reduction
23
Blue Gene Implementation
24
Future Directions
  • Investigate more behavioral strategies
  • Increase action space
  • CUDA implementation data management
  • Blue Gene implementation
  • Examine superlinearity
  • Test larger problem sizes
  • Optimize single node performance