1
Optimization of Sparse Matrix Kernels for Data
Mining
  • Eun-Jin Im and Katherine Yelick
  • U.C. Berkeley

2
Outline
  • SPARSITY: performance optimization of sparse
    matrix-vector operations
  • Sparse Matrices in Data Mining Applications
  • Performance Improvements by SPARSITY for Data
    Mining Matrices

3
The Need for Optimized Sparse Matrix Codes
  • Sparse matrices are represented with indirect data
    structures (e.g., compressed sparse row storage,
    sketched below).
  • Sparse matrix routines are slower than their dense
    matrix counterparts.
  • Performance depends on the distribution of the
    nonzero elements of the sparse matrix.
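For concreteness, here is a minimal sparse matrix-vector multiply (y = A*x) over compressed sparse row (CSR) storage; the slides do not name a format, so the CSR layout and the identifier names here are illustrative assumptions rather than SPARSITY's actual code.

```c
/* Minimal CSR sparse matrix-vector multiply: y = A*x.
 * row_start[i] .. row_start[i+1] delimit row i's nonzeros;
 * col_idx[k] gives the column of the k-th stored value. */
void spmv_csr(int m, const int *row_start, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int k = row_start[i]; k < row_start[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];   /* indirect load of x */
        y[i] = sum;
    }
}
```

Every access to x goes through col_idx, so both the indexing overhead and the memory behavior depend on where the nonzeros fall, which is why performance is structure-dependent.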

4
The Solution: the SPARSITY System
  • A system that provides optimized C code for sparse
    matrix-vector operations
  • http://www.cs.berkeley.edu/~ejim/sparsity
  • Related work: ATLAS and PHiPAC for dense matrix
    routines, and FFTW for FFTs

5
SPARSITY optimizations (1): Register Blocking
  • Identify small dense blocks of nonzeros.
  • Use an optimized multiplication code for the
    particular block size.

[Figure: a 2x2 register-blocked matrix; the sparse matrix is tiled
into dense 2x2 blocks, with explicit zeros filled in where a block
is only partially dense.]
  • Improves register reuse and lowers indexing
    overhead.
  • Challenge: choosing the block size (see the sketch
    below).
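A sketch of what a generated 2x2 kernel could look like; the block-compressed layout (brow_start, bcol_idx, bval) and the names are assumptions for illustration, not SPARSITY's actual output.

```c
/* Sketch of a 2x2 register-blocked SpMV, y = A*x.  Each stored
 * block is a dense 2x2 tile, so four values share one column
 * index and the two partial sums stay in registers. */
void spmv_bcsr_2x2(int mb, const int *brow_start, const int *bcol_idx,
                   const double *bval, const double *x, double *y)
{
    for (int i = 0; i < mb; i++) {            /* loop over block rows  */
        double y0 = 0.0, y1 = 0.0;            /* register accumulators */
        for (int k = brow_start[i]; k < brow_start[i + 1]; k++) {
            const double *b = &bval[4 * k];   /* 2x2 tile, row-major   */
            int j = bcol_idx[k];              /* one index per 4 values */
            y0 += b[0] * x[j] + b[1] * x[j + 1];
            y1 += b[2] * x[j] + b[3] * x[j + 1];
        }
        y[2 * i]     = y0;
        y[2 * i + 1] = y1;
    }
}
```

Larger blocks amortize more index traffic but force more explicit zeros to be stored and multiplied, which is why choosing the block size is the hard part.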

6
SPARSITY optimizations (2): Cache Blocking
  • Keep part of the source vector in cache.

[Figure: the sparse matrix A is split into column blocks so that the
corresponding slice of the source vector x stays in cache while the
destination vector y is computed.]
  • Improves cache reuse of the source vector.
  • Challenge: choosing the block size (sketched below).
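One way to picture cache blocking, as a sketch under an assumed layout: split the matrix into column blocks, each stored as its own CSR submatrix, so the slice of x that a block touches stays in cache across all the rows. The struct and names below are illustrative, not SPARSITY's data structures.

```c
/* Sketch of cache-blocked SpMV over column blocks (assumed layout). */
typedef struct {
    int first_col;          /* first column covered by this block     */
    const int *row_start;   /* CSR row pointers of the submatrix      */
    const int *col_idx;     /* column indices, relative to first_col  */
    const double *val;      /* nonzero values of the submatrix        */
} cache_block;

void spmv_cache_blocked(int m, int nblocks, const cache_block *blk,
                        const double *x, double *y)
{
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int b = 0; b < nblocks; b++) {       /* one slice of x at a time */
        const double *xs = x + blk[b].first_col;
        for (int i = 0; i < m; i++)
            for (int k = blk[b].row_start[i]; k < blk[b].row_start[i + 1]; k++)
                y[i] += blk[b].val[k] * xs[blk[b].col_idx[k]];
    }
}
```

The block width must be chosen so that the x slice fits in cache alongside the y traffic; the cache-block-size table near the end of the deck shows the chosen widths tracking the L2 size.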

7
SPARSITY optimizations (3): Multiple Vectors
  • Better potential for data reuse
  • Loop-unrolled code multiplying across vectors is
    generated by a code generator.

[Figure: a single matrix element a_ij is multiplied against entry j
of each source vector (x_j1, x_j2, ...) to update the corresponding
destinations y_i1, y_i2, ...]
  • Allows reuse of matrix elements.
  • Challenge: choosing the number of vectors for loop
    unrolling (see the sketch below).
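A hedged sketch of a multiple-vector kernel, here unrolled across four vectors with an assumed interleaved layout (X[4*j + v] is entry j of vector v, and likewise for Y):

```c
/* Sketch of SpMV across multiple source vectors, unrolled by 4
 * (SPARSITY generates such kernels for several unrolling factors). */
void spmv_csr_4vec(int m, const int *row_start, const int *col_idx,
                   const double *val, const double *X, double *Y)
{
    for (int i = 0; i < m; i++) {
        double y0 = 0.0, y1 = 0.0, y2 = 0.0, y3 = 0.0;
        for (int k = row_start[i]; k < row_start[i + 1]; k++) {
            double a = val[k];                 /* loaded once ...    */
            const double *xr = &X[4 * col_idx[k]];
            y0 += a * xr[0];                   /* ... used 4 times   */
            y1 += a * xr[1];
            y2 += a * xr[2];
            y3 += a * xr[3];
        }
        double *yr = &Y[4 * i];
        yr[0] = y0; yr[1] = y1; yr[2] = y2; yr[3] = y3;
    }
}
```

Each matrix element and its column index are loaded once and used four times, which is the reuse of matrix elements the slide refers to.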

8
SPARSITY: automatic performance tuning
  • SPARSITY is a system for automatic performance
    engineering.
  • Parameterized code generation
  • Search combined with performance modeling selects:
    • the register block size,
    • the cache block size, and
    • the number of vectors for loop unrolling
      (a search sketch follows).
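The search step can be pictured as timing a candidate kernel for each block size and keeping the fastest. This is a simplified sketch: time_spmv_rxc is a hypothetical stand-in for SPARSITY's actual combination of machine profiles and a model of the matrix's block structure, which avoids timing every kernel on every matrix.

```c
/* Hypothetical sketch of the tuning search over register block sizes. */
#include <float.h>

double time_spmv_rxc(int r, int c);   /* assumed: seconds per r x c SpMV */

void pick_register_block(int *best_r, int *best_c)
{
    double best = DBL_MAX;
    for (int r = 1; r <= 4; r++)          /* candidate block heights */
        for (int c = 1; c <= 4; c++) {    /* candidate block widths  */
            double t = time_spmv_rxc(r, c);
            if (t < best) {
                best = t;
                *best_r = r;
                *best_c = c;
            }
        }
}
```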

9
Sparse Matrices from Data Mining Applications

  Collection     Algorithm  Dimension       Nonzeros  Density (%)  Avg. NZs/row
  Web documents  LSI        10000 x 255943  3.7M      0.15         371
  NSF abstracts  CD         94481 x 6366    7.0M      1.16         74
  Face images    EA         36000 x 2640    5.6M      5.86         155
10
Data Mining Algorithms
  • For text retrieval
  • Term-by-document matrix
  • Latent Semantic Indexing (Berry et al.)
    • Computes a singular value decomposition (SVD)
    • Blocked SVD uses multiple vectors
  • Concept Decomposition (Dhillon and Modha)
    • Matrix approximation by solving a least-squares
      problem
    • Also uses multiple vectors

11
Data Mining Algorithms
  • For image retrieval
  • Eigenface approximation (Li)
    • Used for face recognition
    • Pixel-by-image matrix
    • Each image has a multi-resolution hierarchy and is
      compressed with a wavelet transform.

12
Platforms Used in Performance Measurements

  Platform       Clock (MHz)  L2 cache  DGEMV (MFLOPS)  DGEMM (MFLOPS)
  MIPS R10000    200          2 MB      67              322
  UltraSPARC II  250          1 MB      100             401
  Pentium III    450          512 KB    87              328
  Alpha 21164    533          96 KB     83              550
13
Performance on Web Document Data
14
Performance on NSF Abstract Data
15
Performance on Face Image Data
16
Speedup
                 MIPS R10K  UltraSPARC II  Pentium III  Alpha 21164
  Web documents  3.8        5.9            2.0          2.7
  NSF abstracts  2.9        1.3            1.6          1.3
  Face images    4.7        5.1            2.6          4.5
17
Performance Summary
  • Performance is better when the matrix is denser
    (face images).
  • Cache blocking is effective for a matrix with a
    large number of columns (web documents).
  • Optimizing the multiplication with multiple vectors
    is effective.

18
Cache Block Size for the Web Document Matrix

  Platform       L2 cache  Block size, single vector  Block size, 10 vectors
  MIPS R10000    2 MB      10000 x 64K                10000 x 8K
  UltraSPARC II  1 MB      10000 x 32K                10000 x 4K
  Pentium III    512 KB    10000 x 16K                10000 x 2K
  Alpha 21164    96 KB     10000 x 4K                 10000 x 2K

  • The width of a cache block is limited by the size
    of the cache.
  • For multiple vectors, the loop-unrolling factor is
    10, except on the Alpha 21164, where it is 3.

19
Conclusion
  • Most matrices used in data mining are sparse.
  • Sparse matrix operations use memory inefficiently
    and need optimization.
  • The right optimization depends on the nonzero
    structure of the matrix.
  • The SPARSITY system effectively speeds up these
    operations.

20
Call for Contributions
  • Contact ejim@cs.berkeley.edu to donate your
    matrix! Thank you.