# A data locality optimizing algorithm

From AcaWiki

**Citation:** *Micheal E. Wolf, Monica S. Lam (1991/06/26) A data locality optimizing algorithm. ACM SIGPLAN Conference on Programming Language Design and Implementation (RSS)*

**DOI (original publisher):** 10.1145/113445.113449

**Semantic Scholar (metadata)**: 10.1145/113445.113449

**Sci-Hub (fulltext)**: 10.1145/113445.113449

**Internet Archive Scholar (fulltext)**: A data locality optimizing algorithm

**Download:** https://dl.acm.org/doi/pdf/10.1145/113445.113449

**Tagged:** Computer Science
(RSS) compilers (RSS)

# Summary (Abstract)

This paper introduces the unimodular framework to maximize data-locality in nested loops.

## Elsewhere

## Problem

- Many loop-optimizations. Which to apply?
- How to determine
*legality*and*advantage*of optimizations? - Greedy solution won't work. One might need to do optimization loop-skewing (no benefit) in order to do loop-tiling (big benefit).
- Exhaustive solution is too slow. Search-space could be infinite.

- How to determine

## Background

*Advantage*in this paper, comes from exploiting temporal and spatial locality in the cache. If the processor can reuse data from the cache, it doesn't have to go back/forth to main memory.**Self-temporal reuse**: The same element is reused by this reference. E.g. Reference 1 in the example below; iteration (i, j) and iteration (i, j+1) use the same A[i].**Self-spatial reuse**: Nearby elements are used by this reference. Superset of self-temporal reuse. E.g. Reference 2: iteration (i, j) uses A[j], while the A[j+1] iteration uses A[j+1], which is nearby.**Group-temporal reuse**: Same element is reused, by other references. E.g. Reference 2 and 3: iteration (i, j) uses A[j+1] by reference 2, and iteration (i, j+1) uses A[j+1] by reference 3.**Group-spatial reuse**: Nearby elements are reused by other references. Superset of group-spatial reuse. E.g. Reference 3 and 4. Iteration (i, j) uses A[j+2] by reference 4, while iteration (i, j+1) uses A[j+1] by reference 3, which is nearby.

for i from 0 to max_i: for j from 0 to max_j: A[i] /*1*/ += A[j] /*2*/ + A[j+1] /*3*/ + A[j+2] /*4*/

- This paper only considers three kinds of loop transformations
**Tiling**: one loop into a pair of loops, where the size of the inner-loop is statically fixed.- Helps if the inner-loop accesses fit in the cache.

for i from 0 to max_i: f(i) for i from 0 to max_i by stride: # stride is fixed at compile-time # so this could be unrolled. for j from i to i + stride: f(j)

**Skewing**: Turn a horizontal-then-vertical iteration into a diagonal-then-diagonal iteration.- Iterating the same nodes in a different order can help remove a loop-dependency. However, reordering is not always valid; iterations must still be evaluated after their dependents.

for i from 0 to max_i: for j from 0 to max_j: # go left-to-right, then top-to-bottom f(i, j) for sum_ij from 0 to max_i + max_j: for dif_ij from ... to ...: # go southwest-to-northeast, then northwest-to-southest f(sum_ij - dif_ij, sum_ij + dif_ij)

**Loop interchange**: Switch the order of two nested loops.- This can help exploit locality.

for i from 0 to max_i: for j from 0 to max_j: # if A is row-major, interchanging these two loops leads to greater spatial locality. A[j, i] += 1

for j from 0 to max_j: for i from 0 to max_i: A[j, i] += 1

## Solution

**Node**: A single iteration of the loop, represented as a vector (specifies values for each loop-counter).**Dependence vector**: Node*a + v*depends on node*a*.**Lexicographic order**:**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle (0,0,0) < (0,0,1) < (0,0,2) < \dots (0, 0, n-1) < (0,1,0) < (0,1,1) < (0,1,2) < \dots (0, m-1, n-1) < (1, 0, 0) < \dots < (k-1, m-1, n-1)}**- We assume that nodes are iterated in lexicographic order.
- Dependence vectors must be
**lexicographically greater**than the zero-vector (aka**lexicographically positive**). Otherwise*a + v*is evaluated before its dependency*a*is evaluated. For example, (i,j) can depend on (i-1,j+1) (previous row, next column), which has already been evaluated, but not (i+1,j-1), (next row, previous column).

**Loop transformation**: A matrix that maps nodes in the old loop to nodes in the new loop,**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \vec{v} \mapsto A\vec{v}}**. It must be**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle n \times n}**where*n*is the number of loops, have integer entries (because the input and output must be integers), and have a determinant of**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \pm 1}**(so it doesn't*stretch*the coordinates to skip numbers numbers, like**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \begin{pmatrix}2 & 0 \\ 0 & 2\\ \end{pmatrix}}**would). This class of matrices is called*unimodular*.- A loop transformation is
**legal**if all dependency vectors remain*lexicographically positive*after transformation.- For the special case of loop-interchange, two loops
*i*and*j*are**interchangeable**(aka*permutable*) if all dependence vectors are lexicographically positive, before reaching coordinate**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \min(i, j)}**. E.g. (0, 0, 1, 0, -1, 1, 0). Everything after the third loop is permutable, since the third coordinate makes the whole thing lexicographically positive.

- For the special case of loop-interchange, two loops
**Uniformly generated references**: a reference to array*A*is generated by matrix*H*if it can be written as**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \mathrm{A}[H\vec{i} + \vec{c}]}**, where*i*are the coordinates of the node and*c*is constant.- The authors If two references are generated by
*different*H-matrices, they might intersect, but it is rare. The authors are primarily concerned with reuse between references with the same H, aka**uniformly generated reference sets (UGRS)**

- The authors If two references are generated by
- Self-temporal reuse exists in a UGRS if there are two iterations
**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \vec{i}, \vec{i_2}}**such that**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle H(\vec{i} - \vec{i_2}) = 0}**. Then**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \vec{r} = \vec{i} - \vec{i_2}}**is the**reuse vector**(like the dependence vector). This is exactly**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \ker H}**. - Self-spatial reuse is harder. The authors say that temporal reuse has to exist in the
*inner-most loop*, because by the time the outer loops recur, the cache has been blasted by the inner-loop's accesses. So they define**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle H_S}**as*H*with the last row zeroed out. Self-spatial reuse is the**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \ker H_s}**, asking if there are any accesses to the same row (ignoring the column). - Group-temporal reuse is like self-temporal, except we are considering two references with different
**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle \vec{c},\vec{c_2}}**vectors. Reuse exists if there are infinite solutions to**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle H\vec{r} = \vec{c} - \vec{c_2}}**(for self-temporal, these are equal).*H*is invertible, so there is certainly one solution; There are infinite solutions if**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle (\mathrm{span} \vec{r} + \ker H) \cap L \neq \ker H \cap L}**where*L*is the vector-space of all nodes. - Group-spatial reuse is the same as group-temporal but with
**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://en.wikipedia.org/api/rest_v1/":): {\displaystyle H_s}**. - The authors find a formula to count the amount of reuse they detected with the previous methods.

- A loop transformation is

### Algorithm

- Separate the loop-nest into inner-loops which are all mutually interchangeable and outer-loops which are not.
- Some loops are not possible (I'm not sure why).

- Place the inner loops in an order that maximizes reuse, which can be counted according to a formula.
- This is grows with the number of loops, but the number of loops is generally small in practice.

## Evaluation

- The authors implement their algorithm in the Stanford SUIF compiler.
- The authors show speedup on LU decomposition and successive over-relaxation.