ISA Toolbox

For years now I have found that nearly everything I work on can be made better by leveraging ISA features which are not always exposed in all the graphics APIs. For example, currently working on a project now which could use the combination of the following,

(1.) From AMD_shader_trinary_minmax, max3(). Direct access to max of three values in a single V_MAX3_F32 operation. If the GPU has 3 read ports on the register file for FMA, might at well take advantage of that for min/max/median. AMD's DX driver shader compiler automatically optimizes these cases, for example "min(x,min(y,z))" gets transformed to "min3(x,y,z)".

(2.) Direct exposure of V_SIN_F32 and V_COS_F32, which have a range of +/- 512 PI and take normalized input. Avoids and extra V_MUL_F32 and V_FRACT_F32 per operation. Nearly all the time I use sin() or cos() I'm in range (no need for V_FRACT_F32). Nearly all the time I'm in the {0 to 1} range for 360 degrees, and need to scale by 2 PI only so code generation can later scale back by 1/2 PI. Portable fallback for machines without V_SIN_F32 and V_COS_F32 like functionality looks like,

float sinNormalized(float x) { return sin(x * 2.0 * PI); } float cosNormalized(float x) { return cos(x * 2.0 * PI); }

(3.) Branching if any or all of the SIMD vector want to do something. Massively important tool to avoid divergence. For example in a full screen triangle, if any pixel needs the more complex path, just have the full SIMD vector only do the complex path instead of divergently processing both complex and simple. API can be quite simple,

bool anyInvocations(bool x) bool allInvocations(bool x)

Example of how these could map in GCN (these scalar instructions execute in parallel with vector instructions, so low cost),

// S_CMP_NEQ_U64 x,0 // S_CBRANCH_SCCNZ if(anyInvocations(x)) { } // S_CMP_EQ_U64 x,-1 // S_CBRANCH_SCCNZ if(allInvocations(x)) { }

(4.) Quad swizzle for fragment shaders for cross-invocation communication is super useful. Given a 2x2 fragment quad as follows,

01 23

These functions would be quite useful (they map to DS_SWIZZLE_B32 in GCN),

// Swap value horizontally. type quadSwizzle1032(type x) // Swap value vertically. type quadSwizzle2301(type x)

For example one could simultaneously write out the results of a fragment shader to the standard full screen pass and write out the 1/2 x 1/2 resolution next smaller mip level at the same time using an extra image store. Just use the following to do a 2x2 box filter in the shader,

boxFilterColor = quadSwizzle1032(color) + color; boxFilterColor += quadSwizzle2301(boxFilterColor);

ISA Toolbox

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112