Notes on AMD GCN ISA

From what I can tell from the AMD GCN ISA doc:

Predication
All vector operations are predicated by the EXEC bitmask. The EXEC bitmask provides 64 bits, one for each of the 64 lanes of a vector. The EXEC bitmask lives in scalar registers 126 and 127 (a pair of 32-bit registers). There is also a VCC, or vector condition code, bitmask in scalar registers 106 and 107. The VCC bitmask is set by vector compare instructions. There are special 1-bit status flags (EXECZ and VCCZ) which are set when the EXEC or VCC bitmask is zero. Finally there is an SCC, or scalar condition code, bit.

Predicates (bools) for the vector are stored in pairs of scalar registers. Predication of a sequence of instructions (an if() in shader code) involves saving the current EXEC bitmask into a pair of scalar registers and loading EXEC with the predicate (both done in a single instruction: S_{op}_SAVEEXEC_B64), then later restoring the saved EXEC (S_MOV_B64).
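To make that concrete, here is a rough CPU-side C model of the mechanism (mine, not from the doc; the helper names are made up), showing a vector op gated by EXEC and an if() lowered to the save/mask/restore pattern:

#include <stdint.h>

// 64-wide registers modeled as plain arrays, one element per lane.
typedef struct { float lane[64]; } vreg;

static uint64_t exec = ~0ull; // EXEC: one bit per lane.

// Any vector op only writes lanes whose EXEC bit is set.
static void v_add_f32(vreg* dst, const vreg* a, const vreg* b) {
  for (int i = 0; i < 64; ++i)
    if (exec & (1ull << i)) dst->lane[i] = a->lane[i] + b->lane[i];
}

// A per-lane compare produces the VCC bitmask (one bit per lane).
static uint64_t v_cmp_lt_f32(const vreg* a, const vreg* b) {
  uint64_t vcc = 0;
  for (int i = 0; i < 64; ++i)
    if ((exec & (1ull << i)) && a->lane[i] < b->lane[i]) vcc |= 1ull << i;
  return vcc;
}

// if (a < b) { dst += c; } lowered the GCN way:
static void predicated_if(vreg* dst, const vreg* a, const vreg* b, const vreg* c) {
  uint64_t vcc = v_cmp_lt_f32(a, b);  // V_CMP_LT_F32 writes VCC.
  uint64_t saved = exec; exec &= vcc; // S_AND_SAVEEXEC_B64: save EXEC, then EXEC &= VCC.
  if (exec) v_add_f32(dst, dst, c);   // Body only touches active lanes; skip it entirely when EXEC is zero (S_CBRANCH_EXECZ).
  exec = saved;                       // S_MOV_B64: restore the saved EXEC.
}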

At full occupancy of 40 wavefronts (and going with the published 8KB scalar register file number), each wavefront gets 51 scalar registers (and 25 vector registers). The peak of around 100 scalar registers/wavefront is reached at around 50% occupancy (with 52 vector registers/wavefront).
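Spelling out that arithmetic (only the 8KB scalar file size is from the doc; 4 SIMDs per CU with 10 wavefronts and 256 vector registers per SIMD are my assumed numbers):

#include <stdio.h>

int main(void) {
  // Assumed GCN compute unit layout: an 8KB scalar register file shared by up to
  // 40 wavefronts per CU, and 256 vector registers per SIMD shared by up to
  // 10 wavefronts per SIMD (4 SIMDs per CU).
  int sgprs_total = 8 * 1024 / 4; // 2048 32-bit scalar registers per CU.
  int vgprs_per_simd = 256;       // Assumed vector registers per SIMD.
  for (int waves_per_simd = 10; waves_per_simd >= 5; waves_per_simd -= 5) {
    int waves_per_cu = waves_per_simd * 4;
    printf("%2d waves/CU: %d scalar, %d vector registers per wavefront\n",
           waves_per_cu, sgprs_total / waves_per_cu, vgprs_per_simd / waves_per_simd);
  }
  return 0; // Prints roughly 51/25 at full occupancy and 102/51 at half.
}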

Vector Instructions
Either 32-bit or 64-bit total opcode length. The 32-bit form supports,

vdst = OP(vsrc0, vsrc1); // Vector source.
vdst = OP(ssrc0, vsrc1); // Common scalar source (typically a constant).
vdst = OP(-16 to 64, vsrc1); // Free special integer literal.
vdst = OP(0.0 or +/-{0.5,1.0,2.0,4.0}, vsrc1); // Free special floating point literal.
vdst = OP(imm32, vsrc1); // A 32-bit immediate which follows the instruction.
vdst = OP(LDS[M0], vsrc1); // Broadcast shared LDS value, M0 is special scalar register.

The 64-bit form supports a mix of any of the following (note there is no free saturate to {0.0 to 1.0}),

vdst = OMOD * OP(); // Free multiplier on result, OMOD = {0.5,1.0,2.0,4.0}.
vdst = clamp(OP(),-1.0,1.0); // Free clamp on result.
vdst = OP(neg(src0), abs(src1), neg(abs(src2))); // Free negate or absolute value on any input.
vdst = OP(src0, 0.0 or +/-{0.5,1.0,2.0,4.0}, src1); // Free special floating point literal on any input.
vdst = OP(src0, -16 to 64, src1); // Free special integer literal on any input.
vdst = OP(vsrc0, ssrc0, ssrc0); // At most one scalar register may be read across the three inputs.
vdst = OP(src0, LDS[M0], src2); // One broadcast shared LDS value for any of the three inputs.
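As a made-up illustration of why the two encodings matter (mine, not from the doc; OP stands in for any vector op), the same kind of math can fit the short or the long form depending on which modifiers it needs:

vdst = OP(4.0, vsrc1); // Fits the 32-bit form: a free literal plus a vector source.
vdst = 0.5 * OP(neg(vsrc0), ssrc0, 2.0); // Needs the 64-bit form: OMOD, an input negate, a scalar source, and a free literal.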

The V_MOVRELS and V_MOVRELD instructions enable indexed register file access, but note M0 is a scalar register, so the index is non-divergent across the entire wavefront. Anything divergent requires extra manual predication logic (sketched after the two forms below).

vdst = vsrc[M0]; // Indexed register load.
vdst[M0] = vsrc; // Indexed register store.
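For the divergent case, the usual pattern (my sketch, not from the doc) is a loop that peels off one index value per iteration: read the index wanted by the first active lane, handle every lane that wants that same index, drop those lanes, and repeat. A rough CPU-side C model:

#include <stdint.h>

// Divergent indexed register read modeled on the CPU: each lane wants its own
// index, but M0 (and therefore V_MOVRELS) only takes one index per instruction,
// so loop until every active lane has been serviced.
static void divergent_movrels(float dst[64], const float src[][64],
                              const uint32_t idx[64], uint64_t exec) {
  uint64_t remaining = exec;
  while (remaining) {
    int first = __builtin_ctzll(remaining); // Pick the first active lane (V_READFIRSTLANE-style).
    uint32_t m0 = idx[first];               // Load that lane's index into M0.
    for (int i = 0; i < 64; ++i)            // Lanes sharing this index...
      if ((remaining & (1ull << i)) && idx[i] == m0) {
        dst[i] = src[m0][i];                // ...do the indexed read,
        remaining &= ~(1ull << i);          // ...then drop out of the loop.
      }
  }
}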

Changing Materials the GCN Way
DX and GL are years behind in API design compared to what is possible on GCN. For instance, there is no need for the CPU to do any binding for a traditional material system with unique shaders/textures/samplers/buffers associated with geometry. Going to the metal on GCN, it would be trivial to pass a 32-bit index from the vertex shader to the pixel shader, then use that index with S_BUFFER_LOAD_DWORDX16 to fetch the constants, samplers, textures, buffers, and shaders associated with the material. Then do an S_SETPC to branch to the proper shader.
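A hedged sketch of what such a material record could look like (the layout and field names are mine, nothing the hardware or doc prescribes), sized so one S_BUFFER_LOAD_DWORDX16 pulls the whole thing into scalar registers:

#include <stdint.h>

// Hypothetical 64-byte (16-dword) material record fetched by material index.
typedef struct {
  uint32_t texture_descriptor[8]; // e.g. two 128-bit image resource descriptors.
  uint32_t sampler_descriptor[4]; // One 128-bit sampler descriptor.
  uint32_t constants_address[2];  // 64-bit address of this material's constants.
  uint32_t shader_address[2];     // 64-bit address to jump to with S_SETPC.
} MaterialRecord;                 // 16 dwords = 64 bytes total.

// Conceptually, the pixel shader prologue would do:
//   index  = 32-bit material index passed down from the vertex shader;
//   record = S_BUFFER_LOAD_DWORDX16(material_table, index * sizeof(MaterialRecord));
//   S_SETPC(record.shader_address); // branch to the material's shader code.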

This kind of system would use a set of uber-shaders grouped by material shader register usage, so all objects sharing a given uber-shader could get drawn in, say, one draw call (drawing as many views as you have GPU perf for). No need for traditional vertex attributes either: again, just one S_SETPC branch in the vertex shader, then manually fetch what is needed from buffers or textures...
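A minimal sketch of that attribute-less fetch (the vertex layout and names are made up; the point is just that the shader, not the input assembler, decides the format):

#include <stdint.h>

// Made-up packed vertex layout; with no fixed-function attributes each
// uber-shader group is free to pick its own format.
typedef struct { float position[3]; uint32_t packed_normal; float uv[2]; } PackedVertex;

// Conceptual uber vertex shader body after the S_SETPC branch: fetch straight
// from a buffer with the vertex id instead of declaring attributes.
static PackedVertex fetch_vertex(const PackedVertex* vertex_buffer, uint32_t vertex_id) {
  return vertex_buffer[vertex_id]; // Just a buffer load, no input assembler needed.
}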
