AltiVec Optimizations - PowerPoint PPT Presentation

Provided by: free59

Title: AltiVec Optimizations


1
AltiVec Optimizations
for Linux/PowerPC
2
  • What is AltiVec?
  • Where can it be used?
  • How does it work?
  • How is it used?
  • Is it worth it?
  • Why hasn't it been used so far?
  • Can it be integrated into Linux?
  • What's Debian's role?
  • AltiVec coding examples

3
  • AltiVec is a SIMD unit for PowerPC CPUs (G4, G5)
  • It's similar to other SIMDs (MMX/SSEx/3DNow!/etc.).
  • It's the most complete SIMD implementation
    available
  • It doesn't do things faster, but it does more
    things in the same time!
  • It's a consistent architecture that just works!
  • It has plenty of registers and it uses them
    wisely :-) (every instruction can use up to 3
    operands and the result is placed in a separate
    register)
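The three-operand point can be illustrated with a multiply-add. This is only a sketch: the helper name madd_floats is hypothetical, the AltiVec path assumes all arrays are 16-byte aligned, and a plain scalar loop is used when the compiler does not define __ALTIVEC__.

```c
#include <stddef.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for n floats.
 * On AltiVec builds all four arrays are assumed to be 16-byte aligned. */
void madd_floats(float *out, const float *a, const float *b,
                 const float *c, size_t n)
{
    size_t i = 0;
#ifdef __ALTIVEC__
    /* vec_madd reads three source registers and writes a separate result,
     * handling four floats per instruction. */
    for (; i + 4 <= n; i += 4) {
        vector float va = vec_ld(0, a + i);
        vector float vb = vec_ld(0, b + i);
        vector float vc = vec_ld(0, c + i);
        vec_st(vec_madd(va, vb, vc), 0, out + i);
    }
#endif
    /* Scalar path: the leftover tail on AltiVec, everything elsewhere. */
    for (; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

None of the three inputs is overwritten, so they stay available for later instructions.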

4
  • When power consumption is an important factor.
  • Where Raw Power is just not enough and other
    methods must be used to achieve good performance.
  • In security, using vectorized encryption
    algorithms
  • In databases using vectorized hashing
  • In video codecs, using vectorized
    encoding/decoding
  • In scientific computing (eg. linear algebra,
    bioinformatics)
  • Also, generic computing where lots of data
    processing gets done

5
Scalar vs. SIMD
6
How is it used?
  • Using C? Very easily.
  • No need to spend lots of time writing
    hand-written asm functions.
  • Code vectorization can be tricky but it's not
    that difficult.
  • Consider the following simple example in C code
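The slide's own example is not reproduced in this transcript. As a stand-in, here is a minimal sketch of the scalar versus SIMD idea: adding two arrays one element at a time, and four elements at a time. The function names are hypothetical, the vector path assumes 16-byte-aligned arrays, and it is only compiled when __ALTIVEC__ is defined.

```c
#include <stddef.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Scalar version: one addition per loop iteration. */
void add_scalar(unsigned int *dst, const unsigned int *a,
                const unsigned int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Vector version: four additions per loop iteration on AltiVec.
 * Assumes dst, a and b are 16-byte aligned on AltiVec builds. */
void add_vector(unsigned int *dst, const unsigned int *a,
                const unsigned int *b, size_t n)
{
    size_t i = 0;
#ifdef __ALTIVEC__
    for (; i + 4 <= n; i += 4) {
        vector unsigned int va = vec_ld(0, a + i);
        vector unsigned int vb = vec_ld(0, b + i);
        vec_st(vec_add(va, vb), 0, dst + i);
    }
#endif
    /* Remaining elements (or all of them without AltiVec). */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Both functions produce the same result; the vector loop simply runs a quarter as many iterations.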

7
Why hasn't it been used more?
  • It is used extensively in MacOS X throughout the
    whole OS, and lots of ready made libraries are
    available.
  • But it has had very limited adoption in Linux.
  • Very few applications in Linux are
    AltiVec-optimized.
  • No system-wide AltiVec optimizations.
  • On Linux, AltiVec is one of the most underused
    SIMD engines available.

8
Can we benefit from it?
  • YES! Thanks to funding from Genesi and support
    from Freescale, work has started on
    AltiVec-optimizing commonly used Linux
    components.
  • The TODO list is long, but we have started with
    the following:
  • libfreevec: GNU libC (functions mem*(), str*(),
    swab()).
  • Hashing Algorithms (Adler32, BDB hashing
    functions)
  • Sorting Algorithms (Insertion Sort, Merge Sort,
    Quick Sort)
  • TODO: C++ STL, libmcrypt (common encryption
    functions: AES, RC5, DES, 3DES, etc.), UMAC
    parallelizable hashing algorithm (a replacement
    for MD5), CRC8/16/24/32, etc.
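To see why a checksum like Adler32 can be vectorized at all, it helps to rewrite it over fixed-size blocks. The sketch below is portable C, not libfreevec's code: for a block of 16 bytes, the two running sums can be updated from a plain byte sum and a weighted byte sum, both of which are independent per-byte operations that a SIMD unit can compute in parallel. The function names and the block size of 16 are assumptions chosen to match a 128-bit register.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u   /* largest prime below 2^16 */
#define ADLER_BLOCK 16u    /* bytes per 128-bit vector register */

/* Reference version: two dependent additions per byte. */
uint32_t adler32_scalar(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Blocked version: per 16-byte block,
 *   a' = a + sum(d[j])
 *   b' = b + 16*a + sum((16 - j) * d[j])
 * The two inner sums have no dependency between bytes. */
uint32_t adler32_blocked(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i = 0;
    for (; i + ADLER_BLOCK <= len; i += ADLER_BLOCK) {
        uint32_t sum = 0, weighted = 0;
        for (uint32_t j = 0; j < ADLER_BLOCK; j++) {
            sum += data[i + j];
            weighted += (ADLER_BLOCK - j) * data[i + j];
        }
        b = (b + ADLER_BLOCK * a + weighted) % ADLER_MOD; /* uses old a */
        a = (a + sum) % ADLER_MOD;
    }
    for (; i < len; i++) {          /* tail shorter than one block */
        a = (a + data[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

On AltiVec, the inner loop would presumably become a multiply-sum of the data vector against a constant weight vector (16, 15, ..., 1); that mapping is a suggestion, not something the slides state.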

9
  • Debian has very good communication channels with
    upstream and an excellent BTS, so the AltiVec
    patches can be pushed upstream fairly quickly.
  • Debian is full of knowledgeable (and helpful :-)
    people that will provide the necessary feedback
    for this effort.
  • Debian has a very strong PowerPC community that
    will directly benefit from these optimizations.
    It will also make Debian the de facto
    distribution of choice for PowerPC users.

10
What about Performance?
  • memcpy() is 4x faster
  • memchr()/strnlen() is up to 10x faster
  • memset() is 10x faster
  • memfrob() is 24x faster!
  • swab() is 11x faster
  • hashing algorithms are from 2x to 7x faster
  • Adler32 is 2.5x faster
  • Zlib is 25% faster (compression) with only 2/6
    functions vectorized!
  • Insertion Sort is from 2x to 54x(!) faster!

11
Example: memcpy()
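The code from this slide is not part of the transcript. The following is a minimal sketch of how a 16-bytes-at-a-time copy might be structured, not libfreevec's implementation. The name copy_bytes is hypothetical. Because vec_ld and vec_st ignore the low four address bits, this sketch only takes the vector path when source and destination share the same offset within a 16-byte line, and falls back to a byte loop otherwise (and on compilers that do not define __ALTIVEC__).

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical memcpy-like helper; regions must not overlap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#ifdef __ALTIVEC__
    /* Vector path only if both pointers have the same misalignment. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 15) == 0) {
        /* Byte copy until the pointers reach a 16-byte boundary. */
        while (n > 0 && ((uintptr_t)d & 15) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Main loop: one aligned 16-byte load and store per iteration. */
        while (n >= 16) {
            vec_st(vec_ld(0, s), 0, d);
            s += 16;
            d += 16;
            n -= 16;
        }
    }
#endif
    /* Tail bytes, or the whole copy on the fallback path. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A fuller version would also cover mismatched alignments, for example by combining two aligned loads with a permute, which this sketch leaves out.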
12
Example: Insertion Sort
13
Example: Merge Sort
14
More about Sorting
  • Integer sorting is used in the kernel
    (filesystems) and in other applications.
  • AltiVec Insertion Sort in typical integer sorting
    is 4x faster than the scalar version.
  • Insertion Sort is up to 54x faster for char
    sorting than the scalar version!!
  • Next to vectorize: Quicksort. Estimated
    performance gain: 400%!

15
  • AltiVec is a very useful tool, which until
    recently has been greatly underused
  • The future: libfreevec 0.8 has been released,
    with 0.9 imminent. The goal is that with 1.0,
    many routines in GNU libC will have been replaced
    by high performance AltiVec replacements.
    Further, other libC projects (like dietlibc,
    uClibc, etc.) might benefit.
  • Other libraries will follow, like zlib, mcrypt,
    C++ STL, libdv, etc. The list is very long!

16
  • For more AltiVec information, check:
  • http://arstechnica.com/articles/paedia/cpu/p4andg4e2.ars/1
  • http://developer.apple.com/hardware/ve/quickstart.html
  • http://www.penguinppc.org/dev/altivec
  • http://arstechnica.com/01q2/p4andg4e/p4andg4e-1.html
  • http://www.freescale.com/files/32bit/doc/reports_presentations/LNXDEVSYS_PPT1.pdf
  • http://www.cs.ucdavis.edu/rogaway/umac/umac_full.pdf
  • http://www.ppczone.org/
  • http://www.freevec.org/ (libfreevec official site)