Optimization of Loop Unrolling on Dense Vector-Matrix Multiplication - Parallel Processing

Description:

Optimization of Loop Unrolling on Dense Vector-Matrix Multiplication - Parallel Processing ... Conclusion - Result Table. 1.00. 2.23, 2.23. 1048.576. 0.66. 2.000 ...

1
Optimization of Loop Unrolling on Dense
Vector-Matrix Multiplication
- Parallel Processing
  • By Sumit Malhotra
  • Computer Science, Florida Tech
  • 767050340
  • Dr. Charles Fulton

2
Aim of project
  • To find the best loop unrolling parameters for
    different numbers of processors on a 5120 x 5120
    matrix.

3
Algorithm for Matrix Multiplication
  • m = n = 5120
  • for (i = 0; i < local_m; i += UNROLL2)
  •   for (j = 0; j < n; j += UNROLL)
  •     matrix multiplication
  • where UNROLL2 and UNROLL are the loop unrolling
    parameters for the i (row) and j (column) loops,
  • local_m = m/p, and p = number of processors.
  • Therefore, the size of the matrix on each processor
    will be local_m x n.
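
For reference, the loop nest above with no unrolling (both step sizes equal to 1) can be sketched as follows. This is only an illustration, not the presenter's code: the function name matvec_block, the double element type, and the flat row-major layout of local_A are assumptions, since the slides do not state them.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (no unrolling): y = local_A * x for one process's row block.
 * Assumed layout: local_A holds local_m rows of n doubles, row-major.
 * matvec_block is a hypothetical name used only for this sketch. */
void matvec_block(const double *local_A, const double *x, double *y,
                  size_t local_m, size_t n)
{
    assert(local_A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < local_m; i++) {
        double sum = 0.0;                    /* accumulator for row i */
        for (size_t j = 0; j < n; j++)
            sum += local_A[i * n + j] * x[j];
        y[i] = sum;
    }
}
```

Each process would call this on its own local_m x n block; the unrolled variants on the later slides should produce the same y.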

4
Size of matrix on each processor
  • Size of matrix when p = 1: 5120 x 5120
  • Size of matrix when p = 2: 2560 x 5120
  • Size of matrix when p = 4: 1280 x 5120
  • Size of matrix when p = 8: 640 x 5120
  • where p = number of processors.
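
The block sizes listed above follow from local_m = m/p. A small sketch of that arithmetic, assuming the rows are handed out as contiguous, equal-sized blocks in rank order (which is how MPI_Scatter distributes equal chunks); the helper names rows_per_process and first_row are hypothetical.

```c
#include <assert.h>

/* Rows per process when m rows are split evenly over p processes. */
int rows_per_process(int m, int p)
{
    assert(p > 0 && m % p == 0);   /* the slides only use p that divides m */
    return m / p;
}

/* First global row owned by `rank`, assuming contiguous row blocks
 * assigned in rank order. */
int first_row(int m, int p, int rank)
{
    assert(rank >= 0 && rank < p);
    return rank * rows_per_process(m, p);
}
```

For example, rows_per_process(5120, 8) gives the 640 rows shown for p = 8, and under the stated assumption rank 3 of 4 would start at global row 3840.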

5
Sample Code
  • UNROLL2 = UNROLL = 2
  • for (i = 0; i < local_m; i += UNROLL2)
  •   for (j = 0; j < n; j += UNROLL) {
  •     y[i]   += local_A[i][j]     * x[j]
  •             + local_A[i][j+1]   * x[j+1];
  •     y[i+1] += local_A[i+1][j]   * x[j]
  •             + local_A[i+1][j+1] * x[j+1];
  •   }

6
Sample Code
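  • UNROLL2 = 2, UNROLL = 4
  • for (i = 0; i < local_m; i += UNROLL2)
  •   for (j = 0; j < n; j += UNROLL) {
  •     y[i]   += local_A[i][j]     * x[j]
                + local_A[i][j+1]   * x[j+1]
  •             + local_A[i][j+2]   * x[j+2]
                + local_A[i][j+3]   * x[j+3];
  •     y[i+1] += local_A[i+1][j]   * x[j]
                + local_A[i+1][j+1] * x[j+1]
  •             + local_A[i+1][j+2] * x[j+2]
                + local_A[i+1][j+3] * x[j+3];
  •   }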
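
As a compilable illustration of the UNROLL2 = 2, UNROLL = 4 case, the sketch below unrolls two rows and four columns per iteration. It is not the presenter's code: the function name, the double element type, and the flat row-major layout are assumptions, and it keeps each row's sum in a local accumulator rather than updating y[i] directly, so y does not need to be zeroed first.

```c
#include <assert.h>
#include <stddef.h>

enum { UNROLL2 = 2, UNROLL = 4 };   /* row and column unroll factors */

/* y = local_A * x with the i loop unrolled by 2 and the j loop by 4.
 * Assumed layout: local_A holds local_m rows of n doubles, row-major.
 * Requires local_m % 2 == 0 and n % 4 == 0 (true for 5120 and its
 * per-process blocks), so no remainder loops are needed. */
void matvec_unrolled_2x4(const double *local_A, const double *x, double *y,
                         size_t local_m, size_t n)
{
    assert(local_m % UNROLL2 == 0 && n % UNROLL == 0);
    for (size_t i = 0; i < local_m; i += UNROLL2) {
        const double *r0 = local_A + i * n;   /* row i     */
        const double *r1 = r0 + n;            /* row i + 1 */
        double s0 = 0.0, s1 = 0.0;
        for (size_t j = 0; j < n; j += UNROLL) {
            s0 += r0[j] * x[j]         + r0[j + 1] * x[j + 1]
                + r0[j + 2] * x[j + 2] + r0[j + 3] * x[j + 3];
            s1 += r1[j] * x[j]         + r1[j + 1] * x[j + 1]
                + r1[j + 2] * x[j + 2] + r1[j + 3] * x[j + 3];
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
}
```

Unrolling the row loop lets each loaded x[j] be reused for two rows, while unrolling the column loop cuts loop overhead. Note that grouping the additions this way can change floating-point rounding slightly compared with a strictly sequential sum.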

7
Time Calculation
  • Start = clock()
  • Multiplication code // Computation
  • MPI_Gather() // Communication
  • End = clock()
  • Total Computation + Communication Time = End -
    Start
  • Start = clock()
  • MPI_Gather() // Communication
  • End = clock()
  • Communication Time = End - Start
  • Start = clock()
  • MPI_Scatter() // Communication
  • End = clock()
  • Scatter Time = End - Start
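
The Start/End pattern above can be wrapped in a small helper. The sketch below is an assumption-laden illustration: time_region and busy_work are hypothetical names, and busy_work merely stands in for the multiplication or MPI call being timed. Note that clock() reports processor time, so the difference must be divided by CLOCKS_PER_SEC to get seconds; MPI programs often use MPI_Wtime() instead when wall-clock time is wanted.

```c
#include <assert.h>
#include <time.h>

/* Run region(arg) once and return the elapsed CPU time in seconds,
 * computed as (End - Start) / CLOCKS_PER_SEC. */
double time_region(void (*region)(void *), void *arg)
{
    assert(region != NULL);
    clock_t start = clock();
    region(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical stand-in for the timed work (multiplication, gather, ...).
 * arg points to a long holding the number of iterations. */
void busy_work(void *arg)
{
    long iters = *(long *)arg;
    volatile double sink = 0.0;   /* volatile keeps the loop from being removed */
    for (long k = 0; k < iters; k++)
        sink += (double)k;
}
```

With a helper like this, the computation-only time could be estimated as the total time minus the separately measured gather time, which is what the three measurements on the slide make possible.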

8
Conclusion - Result Table