Channel: Timothy Lottes

GCN and Wavefront Occupancy

References: GCN Architecture Whitepaper | Sea Islands ISA

Review
A Compute Unit (CU) is partitioned into four separate SIMD units.
Each SIMD unit can hold between 1 and 10 wavefronts.
Once launched, wavefronts do not migrate across SIMD units.
The CU can decode and issue up to 5 instructions/clk for one SIMD unit.
It takes 4 clocks to issue across all four SIMD units.
The 5 instructions need to come from different wavefronts.
The 5 instructions need to be of different types.
Each SIMD unit has 512 scalar registers.
Each SIMD unit has 256 vector registers.
Each CU has 64KB of LDS space.
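
The issue rules above can be sketched as a small selection routine. This is only an illustrative model, not how the hardware arbiter is actually built: the function name pick_issue_group, the greedy list-order policy, and the instruction type labels are all assumptions made for the example. Each ready wavefront offers its next instruction, so the "different wavefronts" rule holds by construction, and the code only enforces the "different types" rule and the limit of 5.

```python
from typing import List, Tuple

MAX_ISSUE = 5  # per-clock issue limit for one SIMD unit, from the review list


def pick_issue_group(next_instrs: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Pick the instructions to issue this clock for one SIMD unit.

    next_instrs holds one (wave_id, instr_type) entry per ready wavefront.
    Selection is greedy in list order (a hypothetical policy).
    """
    chosen = []
    seen_types = set()
    for wave_id, instr_type in next_instrs:
        if instr_type in seen_types:
            continue  # a second instruction of the same type has to wait
        chosen.append((wave_id, instr_type))
        seen_types.add(instr_type)
        if len(chosen) == MAX_ISSUE:
            break
    return chosen


if __name__ == "__main__":
    # Two wavefronts both want a vector op, so only one of them issues.
    ready = [(0, "vector"), (1, "vector"), (2, "scalar")]
    print(pick_issue_group(ready))
```

With only two distinct instruction types among the ready wavefronts, the group holds two instructions no matter how many wavefronts are resident.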

Occupancy Table
Waves = Wavefronts per SIMD unit (4 SIMD units/CU)
Scalar = Maximum number of scalar registers/wavefront
Vector = Maximum number of 64-wide vector registers/wavefront
LdsW/I = Maximum amount of LDS space per vector lane (invocation) per wavefront, in 32-bit words
Issue = Maximum number of instructions that can issue per clock

Waves  Scalar  Vector  LdsW/I  Issue
1      128     256     64      1
2      128     128     32      2
3      128     85      21      3
4      128     64      16      4
5      102     51      12      5
6      85      42      10      5
7      73      36      9       5
8      64      32      8       5
9      56      28      7       5
10     51      25      6       5
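
The rows above appear to follow from the figures in the Review section by plain floor division. The sketch below reproduces them under that assumption. The constant and function names are made up for the example, the 128 cap on scalar registers is read off the table rather than derived, and any register allocation granularity in real hardware is not modeled.

```python
SIMDS_PER_CU = 4
SCALAR_REGS_PER_SIMD = 512
VECTOR_REGS_PER_SIMD = 256
LDS_BYTES_PER_CU = 64 * 1024
LANES_PER_WAVE = 64
MAX_SCALAR_PER_WAVE = 128  # cap taken from the table rows (assumption)
MAX_ISSUE = 5


def occupancy_row(waves):
    """Return (waves, scalar, vector, lds_words_per_lane, issue) for one SIMD unit."""
    scalar = min(MAX_SCALAR_PER_WAVE, SCALAR_REGS_PER_SIMD // waves)
    vector = VECTOR_REGS_PER_SIMD // waves
    # 64KB of LDS -> 32-bit words -> split over 4 SIMD units -> per lane -> per wave
    lds_words = LDS_BYTES_PER_CU // 4 // SIMDS_PER_CU // LANES_PER_WAVE // waves
    issue = min(waves, MAX_ISSUE)
    return (waves, scalar, vector, lds_words, issue)


if __name__ == "__main__":
    print("Waves  Scalar  Vector  LdsW/I  Issue")
    for w in range(1, 11):
        print("%-6d %-7d %-7d %-7d %d" % occupancy_row(w))
```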


Notes
Shaders can easily become issue limited when the number of wavefronts per SIMD unit is small. With fewer than 3 wavefronts per SIMD unit, the device cannot tri-issue a {vector, scalar, memory} group of operations. The effective issue rate drops further when some wavefronts are blocked during execution (for example, while waiting on memory). Pushing instruction-level parallelism too far, and raising register pressure at the expense of occupancy, can therefore result in low ALU utilization.
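
To make the trade-off concrete, the sketch below works backwards from a shader's register use to the occupancy it allows, and from there to an upper bound on issue width. It is a rough model built only from the table limits: max_waves and issue_limit are hypothetical helper names, the LDS limit is treated per lane as in the table, and the blocked-wavefront count is an input to experiment with rather than something the code measures.

```python
def max_waves(vgprs, sgprs, lds_words_per_lane=0):
    """Largest wavefront count per SIMD unit (1 to 10) that fits the given usage.

    Returns 0 if the usage does not fit even with a single wavefront.
    """
    for waves in range(10, 0, -1):
        fits_vector = vgprs <= 256 // waves
        fits_scalar = sgprs <= min(128, 512 // waves)
        fits_lds = lds_words_per_lane <= 64 // waves
        if fits_vector and fits_scalar and fits_lds:
            return waves
    return 0


def issue_limit(waves, blocked=0):
    """Upper bound on instructions/clk when `blocked` wavefronts cannot issue."""
    return min(max(waves - blocked, 0), 5)


if __name__ == "__main__":
    # A shader using 84 vector registers fits 3 wavefronts per SIMD unit.
    waves = max_waves(vgprs=84, sgprs=48)
    print(waves, issue_limit(waves), issue_limit(waves, blocked=1))
```

In this model a shader at 84 vector registers sits at 3 wavefronts per SIMD unit, so a single wavefront stalled on memory already rules out tri-issue.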
