An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines

1
An FPGA Implementation of the Ewald Direct Space
and Lennard-Jones Compute Engines
  • By David Chui
  • Supervisor: Professor P. Chow

2
Overview
  • Introduction and Motivation
  • Background and Previous Work
  • Hardware Compute Engines
  • Results and Performance
  • Conclusions and Future Work

3
  • 1. Introduction and Motivation

4
What is Molecular Dynamics (MD) simulation?
  • Biomolecular simulations
  • Structure and behavior of biological systems
  • Uses classical mechanics to model a molecular
    system
  • Newtonian equations of motion (F = ma)
  • Compute forces and integrate acceleration through
    time to move atoms
  • A large-scale MD simulation can take years to run
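The slides do not name an integrator, so as a hedged illustration of "compute forces, then integrate acceleration through time", here is one step of velocity Verlet, a common MD integrator. The function name `velocity_verlet_step` and the toy spring force used to exercise it are hypothetical; the `force` callback is the step the non-bonded compute engines in this talk are meant to accelerate.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def velocity_verlet_step(
    x: Vector,
    v: Vector,
    mass: float,
    dt: float,
    force: Callable[[Vector], Vector],
) -> Tuple[Vector, Vector]:
    """Advance positions x and velocities v by one time step dt.

    Sketch (hypothetical helper): half-kick, drift, recompute forces,
    half-kick. Acceleration comes from Newton's second law, a = F / m.
    For clarity the starting force is recomputed each call; real MD codes
    cache f_new from the previous step instead.
    """
    f_old = force(x)
    v_half = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f_old)]
    x_new = [xi + dt * vh for xi, vh in zip(x, v_half)]
    f_new = force(x_new)  # the expensive force evaluation in a real system
    v_new = [vh + 0.5 * dt * fi / mass for vh, fi in zip(v_half, f_new)]
    return x_new, v_new
```

For example, a single particle on a unit spring (force `lambda p: [-q for q in p]`, mass 1, `dt = 0.01`) keeps its total energy very close to its starting value of 0.5 over many steps, which is the kind of energy-conservation behaviour used later to judge precision.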

5
Why is this an interesting computational problem?
6
Motivation
  • Special-purpose computers built for MD simulation
    have become an attractive approach
  • FPGA technology
  • Reconfigurable
  • Low cost for system prototyping
  • Short turnaround time and development cycle
  • Latest technology
  • Design portability

7
Objectives
  • Implement the compute engines on FPGA
  • Calculate the non-bonded interactions in an MD
    simulation (Lennard-Jones and Ewald Direct Space)
  • Explore the hardware resource usage
  • Study the trade-off between hardware resources
    and computational precision
  • Analyze the hardware pipeline performance
  • Serve as components of a larger project in the
    future

8
  • 2. Background and Previous Work

9
Lennard-Jones Potential
  • Attraction due to instantaneous dipoles of
    molecules
  • Pairwise non-bonded interactions: O(N²)
  • Short-range force
  • Use a cut-off radius to reduce computations
  • Complexity reduced to close to O(N)
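A minimal sketch of the cut-off idea, assuming the standard 12-6 form V(r) = 4ε[(σ/r)¹² − (σ/r)⁶] and no periodic boundaries. The function name `lj_energy` is hypothetical. It works directly from r², since (σ/r)⁶ = (σ²/r²)³, so no square root is needed.

```python
import itertools
from typing import Sequence


def lj_energy(coords: Sequence[Sequence[float]], epsilon: float,
              sigma: float, r_cut: float) -> float:
    """Total Lennard-Jones energy over all pairs closer than r_cut.

    Sketch (hypothetical helper): single atom type, no periodic images.
    """
    r_cut2 = r_cut * r_cut
    s2 = sigma * sigma
    total = 0.0
    for a, b in itertools.combinations(coords, 2):
        r2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        if r2 >= r_cut2:
            continue  # beyond the cut-off: interaction ignored
        sr6 = (s2 / r2) ** 3  # (sigma / r)**6 computed from r**2 alone
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total
```

Two atoms at the potential minimum, r = 2^(1/6)σ, give an energy of −ε. The loop above still visits every pair; reaching near-O(N) cost also requires examining only nearby atoms, for example with cell lists.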

10
Lennard-Jones Potential of Argon gas
11
Electrostatic Potential
  • Attraction and repulsion due to the electric
    charges of particles (long-range force)
  • Reformulated using Ewald summation
  • Decomposed into Direct Space and Reciprocal Space
    parts
  • Direct Space computation similar to Lennard-Jones
  • Direct Space complexity close to O(N)
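Ewald summation splits the Coulomb kernel as 1/r = erfc(βr)/r + erf(βr)/r. The erfc term decays quickly, so it can be cut off just like Lennard-Jones (Direct Space); the smooth erf term is handled in Reciprocal Space. The sketch below computes only the direct-space energy, assuming units in which the Coulomb constant `k_e` is 1, no periodic images, and no self-energy or reciprocal terms; `beta` is the Ewald splitting parameter and the function name is hypothetical.

```python
import itertools
import math
from typing import Sequence


def ewald_direct_energy(coords: Sequence[Sequence[float]],
                        charges: Sequence[float], beta: float,
                        r_cut: float, k_e: float = 1.0) -> float:
    """Direct-space Ewald energy: sum over pairs of k_e*qi*qj*erfc(beta*r)/r.

    Sketch (hypothetical helper): pairs beyond r_cut are skipped, which is
    valid only because erfc(beta * r) is negligible there.
    """
    r_cut2 = r_cut * r_cut
    total = 0.0
    for (a, qa), (b, qb) in itertools.combinations(zip(coords, charges), 2):
        r2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        if r2 >= r_cut2:
            continue
        r = math.sqrt(r2)
        total += k_e * qa * qb * math.erfc(beta * r) / r
    return total
```

With `beta = 0` the erfc factor is 1 and the result reduces to a plain cut-off Coulomb sum; a larger `beta` shifts more of the work into Reciprocal Space and lets the direct-space cut-off shrink.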

12
Ewald Summation - Direct Space
13
Previous Hardware Developments
14
Recent work - FPGA based MD simulator
  • Transmogrifier-3 FPGA system
  • University of Toronto (2003)
  • Estimated speedup of more than 20 times over
    software, given better hardware resources
  • Fixed-point arithmetic, function table lookup,
    and interpolation
  • Xilinx Virtex-II Pro XC2VP70 FPGA
  • Boston University (2005)
  • Achieved a speedup of over 88 times over software
  • Fixed-point arithmetic, function table lookup,
    and interpolation

15
MD Simulation software - NAMD
  • Parallel runtime system (Charm++/Converse)
  • Highly scalable
  • Largest system simulated has over 300,000 atoms
    on 1000 processors
  • Spatial decomposition
  • Double-precision floating point
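The idea behind spatial decomposition can be sketched with a simple cell list, a standard technique shown here instead of NAMD's actual code: atoms are binned into cubes at least one cut-off wide, so every partner within the cut-off lies in the same cube or an adjacent one. The helper name is hypothetical and periodic boundaries are ignored.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cell_list_pairs(coords: Sequence[Sequence[float]],
                    r_cut: float) -> List[Tuple[int, int]]:
    """Return sorted index pairs (i, j), i < j, closer than r_cut.

    Sketch (hypothetical helper, no periodic boundaries): atoms are binned
    into cubic cells of side r_cut, so any partner within r_cut sits in the
    same cell or one of the adjacent cells.
    """
    cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, p in enumerate(coords):
        cells[tuple(math.floor(c / r_cut) for c in p)].append(idx)

    r_cut2 = r_cut * r_cut
    pairs = set()
    for key, members in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=len(key)):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            # .get() does not insert, so the dict is not modified mid-iteration
            for j in cells.get(neighbour, ()):
                for i in members:
                    if i < j:
                        r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                        if r2 < r_cut2:
                            pairs.add((i, j))
    return sorted(pairs)
```

Each atom is compared only with atoms in 27 nearby cells, so for a roughly uniform density the work grows close to linearly with the number of atoms, which is the O(N) behaviour quoted for the cut-off interactions.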

16
NAMD - Spatial Decomposition
17
  • 3. Hardware Compute Engines

18
Purpose and Design Approach
  • Implement the functionality of the software
    compute object
  • Calculate the non-bonded interactions given the
    particle information
  • Fixed-point arithmetic, function table lookup,
    and interpolation
  • Pipelined architecture
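To make "fixed-point arithmetic" concrete, here is a hedged sketch of signed fixed-point quantisation and multiplication. The helper names and the convention that `int_bits` includes the sign bit are assumptions for illustration, not the bit formats used in the compute engines.

```python
def to_fixed(x: float, int_bits: int, frac_bits: int) -> int:
    """Quantise x to a signed fixed-point integer, saturating on overflow.

    Sketch (hypothetical helper). Assumption: int_bits includes the sign
    bit, so the word is int_bits + frac_bits bits in two's complement.
    Python's round() rounds to nearest, with ties to even.
    """
    width = int_bits + frac_bits
    hi = (1 << (width - 1)) - 1
    lo = -(1 << (width - 1))
    q = round(x * (1 << frac_bits))
    return max(lo, min(hi, q))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two fixed-point numbers that share one format.

    The raw product carries 2 * frac_bits fractional bits; the arithmetic
    right shift drops frac_bits of them (rounding toward negative infinity),
    much as a hardware multiplier followed by bit selection would.
    The product is not saturated in this sketch.
    """
    return (a * b) >> frac_bits
```

With 4 integer and 8 fractional bits, 1.5 becomes 384 and 2.25 becomes 576; their fixed-point product converts back to exactly 3.375. The quantisation error of any in-range value is at most 2⁻⁹ here, which is the precision-versus-width trade-off the project studies.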

19
Compute Engine Block Diagram
20
Function Lookup Table
  • The function to be looked up is a function of
    r² (the squared separation distance between a
    pair of atoms)
  • Block floating point lookup
  • Function partitioned into segments with different
    precision
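One plausible reading of a segmented, block floating point lookup is sketched below: the r² range is split at powers of two, every segment gets the same number of bins (so bins are finest at short range, where the functions change fastest), the samples in a segment share one exponent and are stored as integer mantissas, and values between samples are linearly interpolated. Indexing by r² avoids a square root. The class name, segment scheme and parameters are assumptions, not the design on the slides.

```python
import math
from typing import Callable, List, Tuple


class SegmentedTable:
    """Lookup table for f(s), s = r**2, with linear interpolation.

    Sketch (hypothetical class). The range [2**e_min, 2**e_max) is split
    into power-of-two segments, each with bins_per_seg bins. Samples in a
    segment share one exponent (block floating point) and are stored as
    signed integer mantissas of mant_bits bits.
    """

    def __init__(self, f: Callable[[float], float], e_min: int, e_max: int,
                 bins_per_seg: int, mant_bits: int) -> None:
        self.e_min = e_min
        self.bins = bins_per_seg
        limit = 1 << (mant_bits - 1)
        self.segments: List[Tuple[float, float, float, List[int]]] = []
        for e in range(e_min, e_max):
            lo = 2.0 ** e                      # segment covers [lo, 2 * lo)
            width = lo / bins_per_seg
            samples = [f(lo + k * width) for k in range(bins_per_seg + 1)]
            peak = max(abs(y) for y in samples)
            shared_exp = math.frexp(peak)[1]   # peak < 2**shared_exp
            scale = 2.0 ** (mant_bits - 1 - shared_exp)
            mants = [max(-limit, min(limit - 1, round(y * scale))) for y in samples]
            self.segments.append((lo, width, scale, mants))

    def lookup(self, s: float) -> float:
        if s <= 0:
            raise ValueError("s must be positive")
        _, e = math.frexp(s)                   # s lies in [2**(e-1), 2**e)
        seg = (e - 1) - self.e_min
        if not 0 <= seg < len(self.segments):
            raise ValueError("s is outside the table range")
        lo, width, scale, mants = self.segments[seg]
        pos = (s - lo) / width
        k = min(int(pos), self.bins - 1)       # bin index inside the segment
        frac = pos - k                         # position inside the bin
        y = mants[k] + frac * (mants[k + 1] - mants[k])
        return y / scale
```

For example, `SegmentedTable(lambda s: s ** -3, -2, 4, 256, 24)` tabulates s⁻³ (that is, r⁻⁶, one Lennard-Jones term) for 0.25 ≤ r² < 16 and stays within about 10⁻⁴ relative error across that range. Changing `bins_per_seg` and `mant_bits` is one way to explore the table-size and precision trade-off.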

21
Function Lookup Table
22
Hardware Testing Configuration
23
  • 4. Results and Performance

24
Simulation Overview
  • Software model
  • Different coordinate precisions and lookup table
    sizes
  • Measure the error relative to a computation in
    double precision
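The two result plots that follow report total energy fluctuation and average total energy. A hedged sketch of how such metrics might be computed from an energy time series is shown below; the exact definitions used in the study are not given on the slides, and the helper names are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def rms_fluctuation(energies: Sequence[float]) -> float:
    """RMS fluctuation of a total-energy time series.

    Sketch (hypothetical helper): computed as sqrt(<(E - <E>)^2>), which
    equals sqrt(<E^2> - <E>^2).
    """
    mean = fmean(energies)
    return math.sqrt(fmean((e - mean) ** 2 for e in energies))


def relative_error(value: float, reference: float) -> float:
    """Relative error of a reduced-precision result against a reference."""
    return abs(value - reference) / abs(reference)
```

A comparison would run the same trajectory with the fixed-point model and with double precision, then compare `rms_fluctuation` and the mean energy of the two traces, for instance with `relative_error(fmean(fixed_trace), fmean(double_trace))`, where both trace names are hypothetical.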

25
Total Energy Fluctuation
26
Average Total Energy
27
Operating Frequency
28
Latency and Throughput
29
Hardware Improvement
  • Operating frequency
  • Place-and-route constraints
  • More pipeline stages
  • Throughput
  • More hardware resources
  • Avoid sharing of multipliers
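Both improvement paths map onto a simple cycle count for a pipeline: a higher clock shortens every cycle, while avoiding shared multipliers lowers the initiation interval (cycles between successive pairs entering). The sketch below assumes a hypothetical latency value; the peak rate is f / II pairs per second, so at the 80.0 MHz Lennard-Jones clock a fully pipelined engine (II = 1) could accept at most 80 million pairs per second.

```python
def pipeline_seconds(n_pairs: int, latency_cycles: int,
                     initiation_interval: int, f_mhz: float) -> float:
    """Time for a pipeline to finish n_pairs interactions.

    Sketch (hypothetical helper):
    latency_cycles: cycles from a pair entering to its result leaving.
    initiation_interval: cycles between successive pairs entering
    (1 = fully pipelined; 2 if, say, each multiplier is shared by two uses).
    """
    if n_pairs <= 0:
        return 0.0
    cycles = latency_cycles + (n_pairs - 1) * initiation_interval
    return cycles / (f_mhz * 1e6)
```

For a million pairs, latency is negligible and time is set almost entirely by the initiation interval: sharing multipliers so that II = 2 roughly doubles the run time at the same clock, which is why avoiding that sharing matters for throughput.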

30
Compared with previous work
  • Pipelined adders and multipliers
  • Block floating point memory lookup
  • Support different types of atoms
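Supporting different atom types can be illustrated by writing the Lennard-Jones energy as V = A/r¹² − B/r⁶ with per-type-pair coefficients A = 4εσ¹² and B = 4εσ⁶. The r-dependent factors are then the same for every pair and only the two coefficients change, so the same function tables can be shared. The Lorentz-Berthelot mixing rules and the helper names below are assumptions for this sketch, not necessarily what the hardware uses.

```python
import math
from typing import Dict, Tuple


def pair_coefficients(params: Dict[str, Tuple[float, float]]
                      ) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Build (A, B) for every ordered pair of atom types.

    Sketch (hypothetical helper): params maps a type name to
    (epsilon, sigma). Mixed pairs use the Lorentz-Berthelot rules,
    an assumption made only for this example.
    """
    table = {}
    for ti, (eps_i, sig_i) in params.items():
        for tj, (eps_j, sig_j) in params.items():
            eps = math.sqrt(eps_i * eps_j)
            sig = 0.5 * (sig_i + sig_j)
            table[(ti, tj)] = (4.0 * eps * sig ** 12, 4.0 * eps * sig ** 6)
    return table


def lj_from_r2(r2: float, a: float, b: float) -> float:
    """V = A / r**12 - B / r**6, written in terms of s = r**2."""
    inv_r6 = 1.0 / (r2 * r2 * r2)
    return a * inv_r6 * inv_r6 - b * inv_r6
```

For a single type with ε = σ = 1, the coefficients are (4, 4) and the energy at r² = 2^(1/3) is −1, the potential minimum; each mixed pair likewise reaches −ε at its own minimum.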

31
  • 5. Conclusions and Future Work

32
Hardware Precision
  • A combination of fixed-point arithmetic, function
    table lookup, and interpolation can achieve high
    precision
  • Results similar to double precision in RMS energy
    fluctuation and average energy
  • Coordinate precision of 7.41
  • Lookup table size of 1K
  • Block floating point memory
  • Data precision maximized
  • Different types of functions

33
Hardware Performance
  • Compute engines operating frequency
  • Ewald Direct Space 82.2 MHz
  • Lennard-Jones 80.0 MHz
  • Achieving 100 MHz is feasible with newer FPGAs

34
Future Work
  • Study different types of MD systems
  • Simulate computation error with different table
    lookup sizes and interpolation orders
  • Study hardware usage when storing data in on-chip
    block RAMs instead of external ZBT memory