Implementation of Morton Layout for Large Arrays - PowerPoint PPT Presentation

1
Implementation of Morton Layout for Large Arrays
Bowling Green State University
  • Presented by Sharad Ratna Bajracharya
  • Advisor Prof. Larry Dunning

23rd April 2004
2
Outline
  • Introduction
  • Objectives
  • Implementation
  • Samples
  • Improvement
  • Recommendation
  • Conclusion

3
Introduction
  • Morton layout is a storage layout for two-dimensional arrays.
  • The performance of Morton layout is comparatively better than
    that of row-major or column-major array representation.

4
Introduction continues...
  • Reports analyzing the performance and efficiency of Morton
    layout:
  • "An exhaustive evaluation of row-major, column-major and Morton
    layouts for large two-dimensional arrays", Jeyarajan
    Thiyagalingam, Olav Beckmann, Paul H. J. Kelly.
  • "Is Morton layout competitive for large two-dimensional
    arrays?", Jeyarajan Thiyagalingam and Paul H. J. Kelly.
  • "Improving the performance of Morton layout by array alignment
    and loop unrolling", Jeyarajan Thiyagalingam, Olav Beckmann,
    Paul H. J. Kelly.

5
Introduction continues...
  • General row-major array representation
  • Row-major ordering assigns successive elements, moving across
    the rows and then down the columns, to successive memory
    locations.

       0  1  2  3
       4  5  6  7
       8  9 10 11
      12 13 14 15

6
Introduction continues...
  • Column-major array representation.

       0  4  8 12
       1  5  9 13
       2  6 10 14
       3  7 11 15

7
Introduction continues...
  • Morton layout is a compromise storage layout between the
    programming language mandated layouts such as row-major and
    column-major.

      (Row Major)        (Morton Storage Layout)
       0  1  2  3          0  1  4  5
       4  5  6  7          2  3  6  7
       8  9 10 11          8  9 12 13
      12 13 14 15         10 11 14 15

8
Introduction continues...
  • Morton storage layout works with almost equal overhead whether
    traversed row-wise or column-wise.
  • Morton layout works well with square two-dimensional arrays
    whose size is a power of 2, such as 2x2, 4x4, 8x8, etc.

9
Introduction continues...
  • For a non-square matrix, it wastes a lot of memory space
    (X marks an unused memory location).

      (Row Major)        (Morton Storage Layout)
       0  1  2  3          0  1  4  5
       4  5  6  7          2  3  6  7
       8  9 10 11          8  9  X  X
                          10 11

10
Introduction continues...
  • How does Morton layout work?
  • For any subscript of a two-dimensional array, such as
    array[2, 3]:
      Binary value of row 2 -> 1 0
      Binary value of col 3 -> 1 1
    Morton layout stores the element at location 1 1 0 1, i.e. the
    13th memory location.
  • Also known as Zip Fastening Array Layout.
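
The zipping step described on this slide can be sketched in C++ as follows. This is a minimal illustration, not the presenter's code: the function name mortonIndex and the 16-bit limit on subscripts are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Zip the bits of row and col together: row bits land in the odd bit
// positions and col bits in the even positions, as in the [2, 3] example.
// Assumption: subscripts fit in 16 bits so the address fits in 32 bits.
std::uint32_t mortonIndex(std::uint32_t row, std::uint32_t col) {
    assert(row <= 0xFFFF && col <= 0xFFFF);
    std::uint32_t address = 0;
    for (unsigned bit = 0; bit < 16; ++bit) {
        address |= ((col >> bit) & 1u) << (2 * bit);      // even position
        address |= ((row >> bit) & 1u) << (2 * bit + 1);  // odd position
    }
    return address;
}
```

Under these assumptions, mortonIndex(2, 3) gives 13, the 1 1 0 1 location from the example.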

11
Introduction continues...
  • Consider a large row-major array:

         1    2    3    4    5    6    7 ...  1000
      1001 1002 1003 1004 1005 1006 1007 ...  2000
      2001 2002 ...
       ...
      9001 9002 9003 9004 9005 9006 9007 ... 10000

  • Moving down a column of such an array touches elements that are
    1000 locations apart. The result is cache misses, page faults
    and poor performance.

12
Objectives
  • Improve cache miss and page fault characteristics of large
    arrays using Morton array layouts.
  • Reduce wasted memory in Morton layout.
  • Improve the extendibility of arrays.

13
Implementation
  • Interleaved bit patterns

       4 -> 0 1 0 0 -> 0 0 1 0 0 0 0
       9 -> 1 0 0 1 -> 1 0 0 0 0 0 1
      15 -> 1 1 1 1 -> 1 0 1 0 1 0 1   (Interleaved Bits)
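
One common way to produce such interleaved bit patterns without a loop is a sequence of shift-and-mask steps. The sketch below is an illustration only; the name spreadBits and the 16-bit input limit are assumptions, and the slides do not say which method the original class uses.

```cpp
#include <cassert>
#include <cstdint>

// Insert a zero bit between each pair of adjacent bits of x.
// Assumption: x fits in 16 bits, so the result fits in 32 bits.
std::uint32_t spreadBits(std::uint32_t x) {
    assert(x <= 0xFFFF);
    x = (x | (x << 8)) & 0x00FF00FFu;  // separate the two bytes
    x = (x | (x << 4)) & 0x0F0F0F0Fu;  // separate the nibbles
    x = (x | (x << 2)) & 0x33333333u;  // separate the bit pairs
    x = (x | (x << 1)) & 0x55555555u;  // separate the single bits
    return x;
}
```

For the three values on this slide, spreadBits returns 16 (0010000), 65 (1000001) and 85 (1010101).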

14
Implementation continues
  • Bit interleaved increment and decrement
  • Bit interleaved increment
      101 + 1 -> 1 0 0 0 1 + 1
      110     -> 1 0 1 0 0
    (Changes are in interleaved bits)
  • For any value a, bit interleaved increment is given by
      a + 1 (bit interleaved) = ((a | 0xAAAAAAAA) + 1) & 0x55555555
  • 0xAAAAAAAA = 1010...10101010 (32 bits)
  • 0x55555555 = 0101...01010101 (32 bits)

15
Implementation continues
  • Bit interleaved increment:
    a + 1 (bit interleaved) = ((a | 0xAAAAAAAA) + 1) & 0x55555555

          0 0 0 1   -> Bit interleaved 1 (0 1)
      OR  1 0 1 0
          1 0 1 1
      +         1
          1 1 0 0
      AND 0 1 0 1
          0 1 0 0   -> Bit interleaved 2 (1 0)
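
The increment formula translates almost directly into C++. The sketch below is a hedged illustration: the function names are invented for the example, and the second function (for bits stored in the odd positions, obtained by swapping the two masks) is an assumption that does not appear on the slides.

```cpp
#include <cassert>
#include <cstdint>

// Increment a value whose bits occupy the even positions (mask 0x55555555).
// OR-ing with 0xAAAAAAAA fills the gaps with ones so the carry can ripple
// across them; the final AND clears the gaps again.
std::uint32_t interleavedIncrement(std::uint32_t a) {
    assert((a & 0xAAAAAAAAu) == 0);  // only even positions may be set
    return ((a | 0xAAAAAAAAu) + 1u) & 0x55555555u;
}

// Assumed counterpart for a value stored in the odd positions.
std::uint32_t interleavedIncrementOdd(std::uint32_t a) {
    assert((a & 0x55555555u) == 0);  // only odd positions may be set
    return ((a | 0x55555555u) + 1u) & 0xAAAAAAAAu;
}
```

With this sketch, interleavedIncrement applied to 0001 (interleaved 1) returns 0100 (interleaved 2), as in the worked example.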

16
Implementation continues
  • More examples of bit interleaved increment

      0 0 0 0 0 + 1 -> 0 0 0 0 1
      0 0 0 0 1 + 1 -> 0 0 1 0 0
      0 0 1 0 0 + 1 -> 0 0 1 0 1
      0 0 1 0 1 + 1 -> 1 0 0 0 0
      1 0 0 0 0 + 1 -> 1 0 0 0 1
      1 0 0 0 1 + 1 -> 1 0 1 0 0

17
Implementation continues
  • Bit interleaved decrement. For example,
      100 - 1 -> 1 0 0 0 0 - 1
       11     -> 0 0 1 0 1
    (Changes are in interleaved bits)
  • For any value a, bit interleaved decrement is given by
      a - 1 (bit interleaved) = (a - 1) & 0x55555555
    where
  • 0x55555555 = 0101...01010101 (32 bits)

18
Implementation continues
  • Bit interleaved decrement:
    a - 1 (bit interleaved) = (a - 1) & 0x55555555

          0 1 0 0 0 0   -> Bit interleaved 4 (100)
      -             1
          0 0 1 1 1 1
      AND 0 1 0 1 0 1
          0 0 0 1 0 1   -> Bit interleaved 3 (11)
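
The decrement is even shorter in code, because the borrow ripples through the zero gap bits on its own. A minimal sketch, with a function name chosen for this example:

```cpp
#include <cassert>
#include <cstdint>

// Decrement a value whose bits occupy the even positions (mask 0x55555555).
// Subtracting 1 turns the gap bits below the lowest set bit into ones;
// the AND clears them again.
std::uint32_t interleavedDecrement(std::uint32_t a) {
    assert((a & 0xAAAAAAAAu) == 0);  // only even positions may be set
    return (a - 1u) & 0x55555555u;
}
```

Note that in this sketch decrementing 0 wraps around to 0x55555555, the largest interleaved value, because the arithmetic is unsigned.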

19
Implementation continues
  • More examples of bit interleaved decrement

      1 0 0 0 0 - 1 -> 0 0 1 0 1
      0 0 1 0 1 - 1 -> 0 0 1 0 0
      0 0 1 0 0 - 1 -> 0 0 0 0 1
      0 0 0 0 1 - 1 -> 0 0 0 0 0

20
Implementation continues
  • Morton layout array representation can be implemented in two
    ways.
  • The first method is to maintain a lookup table of bit
    interleaved array subscripts for address calculation. For
    example,
      0 -> 0 0 0 0
      1 -> 0 0 0 1
      2 -> 0 1 0 0
      3 -> 0 1 0 1

21
Implementation continues
  • For example, for any array subscript, viz. [2, 3]:
      Value of 2 (1 0) from lookup table -> 0100
      Value of 3 (1 1) from lookup table -> 0101
    To get the Morton layout address, shift ROW left by 1 bit and
    add COL:
      (0100 << 1) + 0101 = 1000 + 0101, that is,
          1 0 0 0
        + 0 1 0 1
          1 1 0 1   (zipped address)
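
A lookup-table version of the address calculation might look like the sketch below. The names buildLookupTable and mortonAddress are hypothetical, and filling the table with the interleaved increment is one possible choice, not necessarily the presenter's. The code uses bitwise OR to combine the two parts; because the row and column bits never overlap, this gives the same result as addition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Entry k holds k with a zero bit inserted between adjacent bits
// (0 -> 0000, 1 -> 0001, 2 -> 0100, 3 -> 0101, ...).
// Assumption: size is at most 65536 so shifted entries fit in 32 bits.
std::vector<std::uint32_t> buildLookupTable(std::uint32_t size) {
    assert(size <= 0x10000);
    std::vector<std::uint32_t> table(size);
    std::uint32_t interleaved = 0;
    for (std::uint32_t k = 0; k < size; ++k) {
        table[k] = interleaved;
        interleaved = ((interleaved | 0xAAAAAAAAu) + 1u) & 0x55555555u;
    }
    return table;
}

// Morton address: row entry shifted left by one bit, combined with col entry.
std::uint32_t mortonAddress(const std::vector<std::uint32_t>& table,
                            std::uint32_t row, std::uint32_t col) {
    assert(row < table.size() && col < table.size());
    return (table[row] << 1) | table[col];
}
```

With a four-entry table, mortonAddress(table, 2, 3) evaluates to 13 (1 1 0 1), the zipped address from this slide.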

22
Implementation continues
  • The second method of implementing the Morton array layout
    representation uses only bit interleaved increment and
    decrement, without a lookup table.

23
Implementation continues
  • Implemented in C++ as a two-dimensional array matrix class with
    Standard Template Library (STL) compatibility so as to make it
    generic, that is, it is not tied to any particular data
    structure or object type.
  • Internally, data are stored sequentially in an STL vector.

24
Implementation continues
  • Direct access to an element of the array matrix by array
    subscript is implemented using the lookup table.
  • Random access iterators are defined which make use of bit
    interleaved increment and decrement without using the lookup
    table.
  • Iterators are a generalization of pointers. They are objects
    that point to other objects.
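
The division of labour described here (lookup table for subscripts, increments for traversal) can be illustrated with a much simplified stand-in. Everything in this sketch is hypothetical: the class name MortonMatrix, its members, and the visitor-style forEachRowMajor, which replaces the real STL-compatible iterators only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-in for the matrix class described above.
template <typename T>
class MortonMatrix {
public:
    MortonMatrix(std::uint32_t rows, std::uint32_t cols)
        : rows_(rows), cols_(cols) {
        assert(rows > 0 && cols > 0 && rows <= 0x10000 && cols <= 0x10000);
        const std::uint32_t n = rows > cols ? rows : cols;
        std::uint32_t bits = 0;
        for (std::uint32_t k = 0; k < n; ++k) {
            table_.push_back(bits);
            bits = ((bits | 0xAAAAAAAAu) + 1u) & 0x55555555u;
        }
        // The last element has the highest address, so allocate up to it.
        data_.resize(static_cast<std::size_t>(address(rows - 1, cols - 1)) + 1);
    }

    // Direct access by subscript goes through the lookup table.
    T& operator()(std::uint32_t r, std::uint32_t c) {
        assert(r < rows_ && c < cols_);
        return data_[address(r, c)];
    }

    // Row-major traversal uses only interleaved increments, no table.
    template <typename Visitor>
    void forEachRowMajor(Visitor visit) {
        std::uint32_t rowBits = 0;  // row bits live in the odd positions
        for (std::uint32_t r = 0; r < rows_; ++r) {
            std::uint32_t colBits = 0;  // column bits live in the even ones
            for (std::uint32_t c = 0; c < cols_; ++c) {
                visit(data_[rowBits | colBits]);
                colBits = ((colBits | 0xAAAAAAAAu) + 1u) & 0x55555555u;
            }
            rowBits = ((rowBits | 0x55555555u) + 1u) & 0xAAAAAAAAu;
        }
    }

    std::size_t storageSize() const { return data_.size(); }

private:
    std::uint32_t address(std::uint32_t r, std::uint32_t c) const {
        return (table_[r] << 1) | table_[c];
    }

    std::uint32_t rows_;
    std::uint32_t cols_;
    std::vector<std::uint32_t> table_;
    std::vector<T> data_;
};
```

In this sketch a 3x4 matrix allocates 14 storage locations for its 12 elements, which shows the wasted space that the later slides try to reduce.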

25
Implementation continues
  • Different types of random access iterators are implemented to
    provide flexibility in using the matrix class, such as:
  • Row Major iterator
  • Column Major iterator
  • Diagonal iterator
  • Row iterator / Super row iterator
  • Column iterator / Super column iterator
  • Reverse Row Major iterator

26
Samples
  • Using Row Major Iterator

//Row Major sorting using STL sort()
mat1 = matori;
cout << mat1 << endl;
sort(mat1.begin(), mat1.end());
cout << "Sorted Data" << endl;
cout << mat1 << endl;

Original Data
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Sorted Data
-9 -9 -8 -8
-8 -8 -7 -6
-6 -5 -4 -4
-2 -2 -2 -1
 1  1  2  3
 5  5  6  7
27
Samples continues...
  • Using Column Major iterator

//Column Major sorting using STL sort()
mat1 = matori;
cout << mat1 << endl;
sort(mat1.cbegin(), mat1.cend());
cout << "Sorted Data" << endl;
cout << mat1 << endl;

Original Data
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Sorted Data
-9 -7 -2  2
-9 -6 -2  3
-8 -6 -2  5
-8 -5 -1  5
-8 -4  1  6
-8 -4  1  7
28
Samples continues...
  • Using super row iterator

//Row by row sorting using STL sort()
mat1 = matori;
cout << mat1 << endl;
for (riter = mat1.r2rbegin(); riter != mat1.r2rend(); ++riter)
    sort((*riter).begin(), (*riter).end());
cout << mat1 << endl;

Original Data
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Sorted Data
-9 -8 -1  6
-9 -8 -6 -2
-6 -5 -4 -2
-8 -4  2  3
-7 -2  1  5
-8  1  5  7
29
Samples continues...
  • Using super column iterator

//Column by column sorting using STL sort()
mat1 = matori;
cout << mat1 << endl;
for (citer = mat1.c2cbegin(); citer != mat1.c2cend(); ++citer)
    sort((*citer).begin(), (*citer).end());
cout << mat1 << endl;

Original Data
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Sorted Data
-8 -9 -9 -8
-2 -8 -8 -4
-2 -6 -7 -2
 2 -5 -6 -1
 5  1 -4  5
 6  3  1  7
30
Samples continues...
  • Using Resize function

//Resizing the matrix
mat1 = matori;
cout << mat1 << endl;
mat1.resize(8, 8, 0);
cout << mat1 << endl;

Original Data
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Resized Data
 6 -9 -8 -1  0  0
-8 -6 -9 -2  0  0
-2 -5 -6 -4  0  0
 2  3 -4 -8  0  0
-2  1 -7  5  0  0
 5 -8  1  7  0  0
 0  0  0  0  0  0
 0  0  0  0  0  0
31
Improvement
  • Morton array representation can be improved if we can utilize
    the space wasted for non-square matrices.
  • This can be achieved to some extent by using partially
    interleaved bit patterns.
  • A portion of the bits is interleaved and the remaining bits are
    left as they are. This helps in utilizing the wasted space.

32
Improvement continues
  • For example, let us consider a matrix of size 20 x 4 (actual
    required space: 80). Using Morton layout, it will require
        1000001010
      + 0000000101
        1000001111 -> 527 + 1 = 528 spaces
    With the modified version, it will require
        1001010
      + 0000101
        1001111 -> 79 + 1 = 80 spaces -> Improved!!!

33
Improvement continues
  • More details
      1000001010 -> 19 (row)
      0000000101 ->  3 (col)
      1000001111 -> 527 (Morton location)
    With the extra interleaving bits removed:
      1001010 -> 19 (row)
      0000101 ->  3 (col)
      1001111 -> 79 (Improved Morton)

34
Improvement continues
  • In the improved version, only N bits are interleaved, where N
    is the number of bits in the smaller of (rows - 1) and
    (columns - 1) in a rows x columns matrix.
  • For example, in a 20x4 matrix, the smaller dimension is 4 and
    4 - 1 = 3, which is 11 in binary; that is, N = 2, as 3 is
    represented by 2 bits (11).

35
Improvement continues
  • Interleaving N bits and leaving the remaining bits.
    For example, for rows:
      20 - 1 = 19 = 10011 -> 100 10 10 -> 2 bits are interleaved
      (N = 2 row interleaved bits).
    For columns:
      4 - 1 = 3 = 11 -> 000 01 01 -> 2 bits are interleaved
      (N = 2 column interleaved bits).
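
The partial interleaving for a tall matrix such as 20x4 can be sketched as below. This is an illustration under stated assumptions: the names spreadLowBits and partialMortonIndex are invented, the sketch covers only the case where the column subscript fits in the n interleaved bits, and it is not taken from the presenter's header file.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low n bits of x so that a zero sits between neighbours.
std::uint32_t spreadLowBits(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned bit = 0; bit < n; ++bit)
        out |= ((x >> bit) & 1u) << (2 * bit);
    return out;
}

// Partially interleaved address for a matrix with more rows than columns.
// Only the low n bits of row and col are zipped; the remaining high bits
// of row are placed, unchanged, above the 2n interleaved bits.
std::uint32_t partialMortonIndex(std::uint32_t row, std::uint32_t col,
                                 unsigned n) {
    assert(n >= 1 && n <= 8);
    assert(row <= 0xFFFF);
    assert(col < (1u << n));  // the column subscript must fit in n bits
    const std::uint32_t high = (row >> n) << (2 * n);
    return high | (spreadLowBits(row, n) << 1) | spreadLowBits(col, n);
}
```

For the 20x4 example with n = 2, partialMortonIndex(19, 3, 2) gives 79 (100 11 11), so the 80 elements occupy exactly 80 locations.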

36
Improvement continues
  • Bit interleaved increment/decrement still works.
  • For bit interleaved increment

          001 1010   -> Bit interleaved 7 (111)
      OR  000 0101   -> Bit mask
          001 1111
      +          1
          010 0000
      AND 111 1010   -> Bit mask (complement)
          010 0000   -> Bit interleaved 8 (1000)

37
Improvement continues
  • For bit interleaved decrement

          010 0000   -> Bit interleaved 8 (1000)
      -          1
          001 1111
      AND 111 1010   -> Bit mask
          001 1010   -> Bit interleaved 7 (111)
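
Both operations generalize to an arbitrary mask that marks which bit positions belong to the subscript. The sketch below is an assumption-laden illustration: the function names are invented, and it uses the 32-bit complement of the mask where the slide shows a 7-bit one, which gives the same result after the final AND.

```cpp
#include <cassert>
#include <cstdint>

// Increment a value whose bits live only in the positions set in mask.
// Filling every other position with ones lets the carry ripple across.
std::uint32_t maskedIncrement(std::uint32_t a, std::uint32_t mask) {
    assert((a & ~mask) == 0);
    return ((a | ~mask) + 1u) & mask;
}

// Decrement a value whose bits live only in the positions set in mask.
std::uint32_t maskedDecrement(std::uint32_t a, std::uint32_t mask) {
    assert((a & ~mask) == 0);
    return (a - 1u) & mask;
}
```

With the row mask 111 1010 (0x7A) from the 20x4 example, maskedIncrement turns 001 1010 (row 7) into 010 0000 (row 8), and maskedDecrement reverses the step.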

38
Improvement continues
  • The improved array location is calculated by adding the
    partially bit interleaved row and column.
        100 10 10 -> 19
      + 000 01 01 ->  3
        100 11 11 =  79
  • This method utilizes the wasted space to some extent, but it
    works no better than the original Morton layout for square
    matrices whose size is not a power of 2.

39
Improvement continues
  • Improvement for square matrices
  • Let us consider an N x N matrix and say we want n bits to be
    interleaved. There is no change in the remaining bits of the
    column bit patterns, but for the row bit patterns, the
    remaining bits will have special bit patterns which are
    multiples of ⌈N/2^n⌉. So, separate lookup tables are required
    for row and column bit patterns.
  • Row bit and column bit patterns are added to get the modified
    storage location.

40
Improvement continues
  • For example, a 17x17 matrix with n = 2 interleaved bits (actual
    289 spaces required).
  • Space required by normal Morton layout will be
        1000000000
      + 0100000000
        1100000000 -> 768 + 1 = 769
  • With the improved version, we have ⌈17/2^2⌉ = 5

      Row Lookuptable   Index   Col Lookuptable
      0000 0000           0     0000 0000
      0000 0010           1     0000 0001
      0000 1000           2     0000 0100
      0000 1010           3     0000 0101
      0101 0000           4     0001 0000
      0101 0010           5     0001 0001
      ...                ...    ...

    In the row lookup table, the upper bits change by 5 (101).
41
Improvement continues
  • For the 17x17 matrix,
  • 16 from the row lookup table will be 10100 0000
  • 16 from the col lookup table will be 00100 0000
  • Total space required will be
        10100 0000
      + 00100 0000
        11000 0000 -> 384 + 1 = 385 spaces required. Improved!!!
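
The two lookup tables for the square case could be generated as in the sketch below. The names SquareTables and buildSquareTables are hypothetical, and the code is an interpretation of the tables shown on the slides rather than the presenter's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Spread the low n bits of x so that a zero sits between neighbours.
std::uint32_t spreadLowBits(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned bit = 0; bit < n; ++bit)
        out |= ((x >> bit) & 1u) << (2 * bit);
    return out;
}

struct SquareTables {
    std::vector<std::uint32_t> row;
    std::vector<std::uint32_t> col;
};

// Build row and column tables for an N x N matrix with n interleaved bits.
// The upper part of a row entry advances in multiples of ceil(N / 2^n);
// the upper part of a column entry advances in steps of one.
SquareTables buildSquareTables(std::uint32_t N, unsigned n) {
    assert(n >= 1 && n <= 8 && N >= 1 && N <= 4096);
    const std::uint32_t tile = 1u << n;
    const std::uint32_t step = (N + tile - 1) / tile;  // ceil(N / 2^n)
    SquareTables t;
    for (std::uint32_t k = 0; k < N; ++k) {
        const std::uint32_t high = k >> n;
        t.row.push_back(((high * step) << (2 * n)) |
                        (spreadLowBits(k, n) << 1));
        t.col.push_back((high << (2 * n)) | spreadLowBits(k, n));
    }
    return t;
}
```

For N = 17 and n = 2 this reproduces the table entries listed for the 17x17 example, and row[16] + col[16] comes to 384, so 385 locations are needed.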

42
Improvement continues
  • This technique for square matrices still leaves some extra
    space, as shown in the example of the 17x17 matrix, although
    in some cases it works perfectly. Either way, it is an
    improvement over Morton layout for square matrices whose size
    is not a power of 2.

43
Improvement continues
  • Generalized improvement for both square and
    non-square matrices
  • Each row and column has its own partially interleaved bit
    pattern.
  • Either row or column, whichever is greater, will have some
    non-interleaved bits and some special bit patterns.
  • Different lookup tables for rows and columns are required to
    implement this.

44
Improvement continues
  • Let us consider an R x C matrix with n interleaved bits; then
    r = ⌈R/2^n⌉ and c = ⌈C/2^n⌉.
  • If r > c, the row will have i regular non-interleaved bits and
    some special bit patterns that are multiples of j, or vice
    versa.
  • If r > c: [Figure: bit layout for row and for column, with the
    n interleaved bits marked]

45
Improvement continues
  • For r > c, i is the value for which abs(r - c x 2^i) is the
    least, where i = 1, 2, 3, ..., and j = ⌈MAX(r/2^i, c)⌉
  • For c > r, i is the value for which abs(c - r x 2^i) is the
    least, where i = 1, 2, 3, ..., and j = ⌈MAX(r, c/2^i)⌉

46
Improvement continues
  • For example, consider a 70x13 matrix with n = 2 interleaved
    bits (actually 910 spaces required). Space required by normal
    Morton layout will be
        10000000100010
      + 00000001010000
        10000001110010 -> 8306 + 1 = 8307
    Here, R = 70, C = 13, r = ⌈70/2^2⌉ = 18 and c = ⌈13/2^2⌉ = 4.
    We have r > c.
      When i = 1, abs(r - c x 2^1) = 10
      When i = 2, abs(r - c x 2^2) = 2
      When i = 3, abs(r - c x 2^3) = 14
    Therefore i = 2 (only used by row in this case) and
    j = ⌈MAX(r/2^2, c)⌉ = 5

47
Improvement continues
  • Lookup tables for the 70x13 matrix:

      Row Lookuptable   Index   Col Lookuptable
      00000 00 0000       0     00000 00 0000
      00000 00 0010       1     00000 00 0001
      00000 00 1000       2     00000 00 0100
      00000 00 1010       3     00000 00 0101
      00000 01 0000       4     00001 00 0000
      00000 01 0010       5     00001 00 0001
      ...                ...    ...
      00000 11 1010      15
      00101 00 0000      16
      ...

    The upper five bits of the row entries change by 5 (101). The
    middle two bits are only used by the row, because row > col.
48
Improvement continues
  • For the 70x13 matrix,
  • 69 from the row lookup table will be 10100 01 0010
  • 12 from the col lookup table will be 00011 00 0000
  • Total space required will be
        10100 01 0010
      + 00011 00 0000
        10111 01 0010 -> 1490 + 1 = 1491 spaces. Improved!!!
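
The generalized scheme for the r > c case can be put together as in the sketch below. It is one reading of the slides, not the presenter's code: the names GeneralLayout, chooseLayout and generalIndex are invented, ties between candidate values of i are resolved in favour of the smaller i (the slides do not say), and the c > r case is left out.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low n bits of x so that a zero sits between neighbours.
std::uint32_t spreadLowBits(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned bit = 0; bit < n; ++bit)
        out |= ((x >> bit) & 1u) << (2 * bit);
    return out;
}

struct GeneralLayout {
    unsigned n;       // number of interleaved bits
    unsigned i;       // plain (non-interleaved) row bits
    std::uint32_t j;  // step of the special row bit pattern
};

std::uint64_t absDiff(std::uint64_t a, std::uint64_t b) {
    return a > b ? a - b : b - a;
}

// Pick i and j for an R x C matrix with r > c, following the rule above.
GeneralLayout chooseLayout(std::uint32_t R, std::uint32_t C, unsigned n) {
    assert(n >= 1 && n <= 8 && R <= 0x10000 && C >= 1 && C <= 0x10000);
    const std::uint32_t tile = 1u << n;
    const std::uint32_t r = (R + tile - 1) / tile;  // ceil(R / 2^n)
    const std::uint32_t c = (C + tile - 1) / tile;  // ceil(C / 2^n)
    assert(r > c);  // this sketch handles only the r > c case
    unsigned bestI = 1;
    std::uint64_t bestDiff = absDiff(r, static_cast<std::uint64_t>(c) << 1);
    for (unsigned i = 2; i <= 16; ++i) {
        const std::uint64_t diff =
            absDiff(r, static_cast<std::uint64_t>(c) << i);
        if (diff < bestDiff) { bestDiff = diff; bestI = i; }
    }
    const std::uint32_t rScaled = (r + (1u << bestI) - 1) >> bestI;
    return GeneralLayout{n, bestI, rScaled > c ? rScaled : c};
}

// Storage location: row entry plus column entry, computed directly.
std::uint32_t generalIndex(const GeneralLayout& g, std::uint32_t row,
                           std::uint32_t col) {
    const std::uint32_t rowTile = row >> g.n;
    const std::uint32_t top = (rowTile >> g.i) * g.j + (col >> g.n);
    const std::uint32_t plain = rowTile & ((1u << g.i) - 1u);
    return (top << (2 * g.n + g.i)) | (plain << (2 * g.n)) |
           (spreadLowBits(row, g.n) << 1) | spreadLowBits(col, g.n);
}
```

For the 70x13 matrix with n = 2 this sketch selects i = 2 and j = 5, and generalIndex for row 69, column 12 comes to 1490, in line with the 1491 locations computed on this slide.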

49
Recommendations
  • Devise more efficient algorithms to utilize the space wasted
    by the Morton array layout.
  • Devising an optimal compromise algorithm that works with both
    non-square and square matrices could be a new research paper
    or graduate research project.

50
Conclusion
  • The Morton array layout, and a variant of it that reduces the
    wasted space, were implemented in C++.
  • Improvements on Morton layout for non-square and square
    matrices were introduced.
  • An optimal algorithm is still to be researched.

51
Conclusion continues
  • The C++ header file of the Morton array layout matrix class can
    be downloaded and evaluated from
    http://www.sharad.info/cs691
  • For any defects or feedback regarding this header file, please
    email me at sharadb_at_bgnet.bgsu.edu

52
Any Questions ?
53
Thank You !