Simple Circuit - PowerPoint PPT Presentation

About This Presentation
Title:

Simple Circuit

Description:

Title: Simple Circuit Author: bsgreer Last modified by: chase Created Date: 3/18/2002 7:49:05 AM Document presentation format: Company – PowerPoint PPT presentation

Slides: 28
Provided by: bsgreer
Tags: circuit | simple

Transcript and Presenter's Notes

1
Performance Libraries: Math Kernel Library
March 2003
2
Agenda
  • ??
  • ??
  • MKL???
  • ????
  • ????????
  • ??
  • ?????
  • BLAS??

3
??
  • ??,??,??!
  • MKL?Intel??????????????
  • ??,???BLAS? FFT
  • ??
  • Solvers (BLAS, LAPACK)
  • ????/??? solvers(BLAS, LAPACK)
  • ????????? (dgemm)
  • PDEs, ????, ??, solid-state physics (FFTs)
  • General scientific, financial (vector transcendental functions: VML)
  • ????????Intel???????

4
?? don'ts
  • But don't use MKL on
  • Don't use MKL on small counts
  • Don't call vector math functions on small n

[Figure: (X Y Z W) vectors and a 4x4 transformation matrix]

????
????????IPP
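
For a problem as small as the 4x4 transformation matrix on this slide, a hand-written loop is usually enough, because the fixed cost of a library call can outweigh the arithmetic. A minimal sketch in plain C; the function name transform4 and the row-major layout are assumptions for illustration, not something the slides specify:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: out = M * v, where M is a 4x4 row-major matrix and
   v = (X, Y, Z, W). out must not alias v. */
void transform4(const double m[4][4], const double v[4], double out[4])
{
    assert(m != NULL && v != NULL && out != NULL && out != v);
    for (int i = 0; i < 4; i++) {
        double sum = 0.0;
        for (int j = 0; j < 4; j++)
            sum += m[i][j] * v[j];   /* 16 multiply-adds in total */
        out[i] = sum;
    }
}
```

With only 16 multiply-adds, a compiler can typically unroll and vectorize this loop on its own.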
5
MKL Contents
  • BLAS (Basic Linear Algebra Subprograms)
    • Level 1 BLAS: vector-vector operations
      • 15 function types
      • 48 functions
    • Level 2 BLAS: matrix-vector operations
      • 26 function types
      • 66 functions
    • Level 3 BLAS: matrix-matrix operations
      • 9 function types
      • 30 functions
    • Extended BLAS: Level 1 BLAS for sparse vectors
      • 8 function types
      • 24 functions

6
MKL Contents
  • LAPACK (Linear Algebra Package)
    • Solvers and eigensolvers: many hundreds of routines in total
    • Total user-callable and support routines: > 1000
  • FFTs (fast Fourier transforms)
    • One- and two-dimensional
    • With and without frequency ordering (bit reversal)
  • VML (Vector Math Library)
    • Set of vectorized transcendental functions
    • Most libm functions, but faster

7
MKL???
  • ??? MKL????Fortran??
  • ???????
  • BLAS, LAPACK ????????Fortran??
  • Cblas?? ????C/C?????BLAS

8
MKL??? - ??
  • ??Intel?CVF Fortran ???
  • ?? Linux ? Windows ????
  • ????????
  • ??????? 32-bit and 64-bit
  • ????????
  • ????? MKL Index

9
????????
  • ?????????????
  • ???????????? (??????)
  • CPU ???????????
  • Cache ?????????Cache?
  • ???? ?????????
  • Computer ??????????
  • System ????????? (??)

10
??
  • ??? MKL ???????,??
  • level 1, level 2 BLAS ????????( O(n) )
  • ???????????
  • Level 3 BLAS ( O(n^3) )
  • LAPACK ( O(n^3) )
  • FFTs ( O(n log n) )
  • VML? Depends on processor and function
  • ??????? OpenMP??
  • ??MKL?????????????????
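
One common way to see why the O(n^3) routines gain the most from threading is to compare arithmetic work with the amount of data touched. The sketch below uses the standard textbook operation counts for square operands of order n; these counts and the function names are assumptions added for illustration and do not come from the slides:

```c
#include <assert.h>

/* Flops per data element touched, for operands of order n (n > 0).
   Standard textbook counts, assumed here rather than taken from the slides. */

/* Level 1, dot product: 2n flops over 2n vector elements. */
double intensity_dot(double n)
{
    assert(n > 0.0);
    return (2.0 * n) / (2.0 * n);
}

/* Level 2, matrix-vector product: 2n^2 flops over n^2 + 2n elements. */
double intensity_gemv(double n)
{
    assert(n > 0.0);
    return (2.0 * n * n) / (n * n + 2.0 * n);
}

/* Level 3, matrix-matrix product: 2n^3 flops over 3n^2 elements. */
double intensity_gemm(double n)
{
    assert(n > 0.0);
    return (2.0 * n * n * n) / (3.0 * n * n);
}
```

For n = 300 these ratios are roughly 1, 2, and 200. Only the matrix-matrix case does enough work per element to keep several processors busy without being limited by memory traffic.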

11
??? MKL??
  • Assume the program calls an MKL function. Then what?
  • Two approaches:
    • Static link: all library objects are linked into the program binary
    • DLL: use without static linking, a frequent C approach

12
Static Link
  • Scenario 1: ifl, BLAS, Pentium III processor
  • ifl -o myprog myprog.f -static -L/opt/intel/mkl/lib/32 -lmkl_p3 -lpthread -lguide (Linux)

13
Dynamic Link
  • Scenario 2: C program uses BLAS but wants the optimal code determined at run time
  • ifl -o myprog myprog.f -L/opt/intel/mkl/lib/32 -lmkl -lpthread -lguide (Linux)

14
BLAS ??
  • 3 levels of functions, plus sparse
    • Level 1: vector-vector operations
    • Level 2: vector-matrix operations
    • Level 3: matrix-matrix operations
    • Sparse Level 1: operations on sparse vectors
  • Levels ???
    • Level 1 in the early 70s
    • Level 2 in the mid-80s, followed soon after by Level 3

15
BLAS Naming Conventions
  • General scheme: <precision><name><modifier>
  • <precision>: one or two letters
    • 1 letter implies input and output are the same type
      • s: single, d: double, c: single complex, z: double complex
    • 2 letters: input and output are different
      • sc, dz: complex in, real out (e.g. scnrm2, dzasum)
      • cs, zd: complex vector with a real scalar (e.g. csscal, zdscal)

16
BLAS Naming Conventions
  • <name>
    • g: general (ge: general, gb: general band)
    • s: symmetric (sy: symmetric, sp: symmetric packed, sb: symmetric band)
    • h: Hermitian (he: Hermitian, hp: Hermitian packed, hb: Hermitian band)
    • t: triangular (tr: triangular, tp: triangular packed, tb: triangular band)
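
The precision and matrix-type parts of a routine name can be decoded mechanically. The following is a small illustrative sketch, not part of MKL; the helper names blas_precision and blas_matrix_type are hypothetical, and only single-letter precision prefixes are handled:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: meaning of a single-letter precision prefix. */
const char *blas_precision(char p)
{
    switch (p) {
    case 's': return "single";
    case 'd': return "double";
    case 'c': return "single complex";
    case 'z': return "double complex";
    default:  return "unknown";
    }
}

/* Hypothetical helper: meaning of the two-letter matrix-type code that
   follows the precision prefix (for example the "ge" in "dgemv"). */
const char *blas_matrix_type(const char *code)
{
    static const char *const table[][2] = {
        {"ge", "general"},    {"gb", "general band"},
        {"sy", "symmetric"},  {"sp", "symmetric packed"},
        {"sb", "symmetric band"},
        {"he", "Hermitian"},  {"hp", "Hermitian packed"},
        {"hb", "Hermitian band"},
        {"tr", "triangular"}, {"tp", "triangular packed"},
        {"tb", "triangular band"}
    };
    assert(code != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strncmp(code, table[i][0], 2) == 0)
            return table[i][1];
    return "unknown";
}
```

For a name such as "dgemv", passing its first character to blas_precision and the remainder (the name plus one) to blas_matrix_type yields "double" and "general".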

17
[Figure: general band]
18
[Figure: symmetric band]
19
[Figure: Hermitian band]
20
[Figure: triangular band]
21
[Figure: packed]
22
BLAS Naming Conventions
  • Level 1 <modifier>
    • c: conjugated (cdotc), u: unconjugated (cdotu), g: Givens (srotg)
  • Level 2 <modifier>
    • mv: matrix-vector, sv: solve (vector operations), r: rank update, r2: rank-2 update
    • dger: double-precision general rank update
      • A = alpha * x * y^T + A
  • Level 3 <modifier>
    • mm: matrix-matrix, sm: solve (matrix operations), rk: rank-k update, r2k: rank-2k update
    • dsyr2k: double-precision symmetric rank-2k update
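
To make the dger entry concrete: a general rank-1 update computes A = alpha * x * y^T + A. The sketch below is an unoptimized reference written for illustration only. The name ref_rank1_update is hypothetical, column-major storage with leading dimension lda is assumed to match the Fortran BLAS convention, and both vectors are assumed to have unit stride:

```c
#include <assert.h>
#include <stddef.h>

/* Reference rank-1 update: A := alpha * x * y^T + A.
   A is m x n, column-major, leading dimension lda (assumed convention);
   x has m elements and y has n elements, both with stride 1. */
void ref_rank1_update(int m, int n, double alpha,
                      const double *x, const double *y,
                      double *a, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    for (int j = 0; j < n; j++)          /* column j of A */
        for (int i = 0; i < m; i++)      /* row i of A    */
            a[(size_t)j * lda + i] += alpha * x[i] * y[j];
}
```

A routine like this is mainly useful for checking the results of the optimized library call on small inputs.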

23
Matrix Multiplication
  • ??????
  • Roll your own
  • DDOT (level 1)
  • DGEMV (level 2)
  • DGEMM (level 3)
  • Because C is used, all is not pretty :-)

24
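Matrix Multiplication: Roll Your Own/Dot Product
Roll Your Own
    /* c (n x m) = a (n x kk) * b (kk x m) */
    for( i = 0; i < n; i++ )
      for( j = 0; j < m; j++ ) {
        temp = 0.0;
        for( k = 0; k < kk; k++ )
          temp += a[i][k] * b[k][j];
        c[i][j] = temp;
      }
ddot
    /* Row i of a (stride 1) dotted with column j of b (stride ldb).
       Fortran-style BLAS takes its arguments by reference. As on the slide,
       the length n assumes square matrices (n == m == kk). */
    incx = 1; incy = ldb;
    for( i = 0; i < n; i++ )
      for( j = 0; j < m; j++ )
        c[i][j] = DDOT( &n, a[i], &incx, &b[0][j], &incy );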
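
The stride arguments are what let a Level 1 routine walk down a column of a row-major C array. A plain C reference for a strided dot product, written here only to illustrate the idea; ref_dot is a hypothetical name, and restricting it to positive strides is a simplification:

```c
#include <assert.h>
#include <stddef.h>

/* Reference strided dot product: sum of x[i*incx] * y[i*incy], i = 0..n-1.
   Only positive strides are supported in this sketch. */
double ref_dot(int n, const double *x, int incx, const double *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[(size_t)i * incx] * y[(size_t)i * incy];
    return sum;
}
```

For a row-major matrix b with ldb elements per row, calling ref_dot with y pointing at b[0][j] and incy equal to ldb steps through column j, one row at a time.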

25
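Matrix Multiplication: DGEMV/DGEMM
dgemv
    /* One matrix-vector product per column of c. Arguments are passed by
       reference (Fortran-style BLAS). As on the slide, the dimensions
       assume square matrices (n == m == kk). */
    incx = 1; incy = ldb; alpha = 1.0; beta = 0.0; transa = 't';
    for( i = 0; i < n; i++ )
      dgemv( &transa, &m, &n, &alpha, a, &lda,
             &b[0][i], &ldb, &beta, &c[0][i], &ldc );
dgemm
    /* A single call. b and a are passed in swapped order, which suits
       row-major C arrays because BLAS assumes column-major storage. */
    alpha = 1.0; beta = 0.0; transa = 'n'; transb = 'n';
    dgemm( &transa, &transb, &m, &n, &kk, &alpha, b, &ldb,
           a, &lda, &beta, c, &ldc );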
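
For checking the library result on small inputs, it helps to have a reference for what the single dgemm call computes, namely C = alpha * A * B + beta * C. The sketch below is an unoptimized plain C version for row-major arrays; the name ref_gemm_rowmajor and its argument order are assumptions for illustration and do not mirror the BLAS interface:

```c
#include <assert.h>
#include <stddef.h>

/* Reference matrix multiply for row-major arrays:
   C (n x m) := alpha * A (n x kk) * B (kk x m) + beta * C.
   lda, ldb, ldc are the row lengths of the underlying arrays. */
void ref_gemm_rowmajor(int n, int m, int kk, double alpha,
                       const double *a, int lda,
                       const double *b, int ldb,
                       double beta, double *c, int ldc)
{
    assert(n >= 0 && m >= 0 && kk >= 0);
    assert(lda >= kk && ldb >= m && ldc >= m);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            double temp = 0.0;
            for (int k = 0; k < kk; k++)
                temp += a[(size_t)i * lda + k] * b[(size_t)k * ldb + j];
            double *cij = &c[(size_t)i * ldc + j];
            /* With beta == 0 the old contents of C are never read. */
            *cij = (beta == 0.0) ? alpha * temp : alpha * temp + beta * *cij;
        }
    }
}
```

Comparing this output with the optimized routine on a few small matrices is a quick way to catch transposed arguments or wrong leading dimensions.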
26
MKL?????? ????? vs DGEMM
2.2 GHz Intel Pentium 4 processor, 512 MB memory
27
MKL ?????????DGEMM
800 MHz Itanium processor, 4 MB cache, NEC Express5800