Quantcast
Channel: Timothy Lottes
Viewing all articles
Browse latest Browse all 434

Random Notes on Maxwell

$
0
0
Notes From GTX 980 Whitepaper | Maxwell Tuning Guide
GTX 980
16 geometry pipes
... One per SM
16 SMs
... 96KB shared memory per SM
... ?KB instruction cache per SM
... SMs divided into 4 quadrants
... Pair of quadrants sharing a TEX unit
... Each Quadrant
....... Issue 2 ops/clk per warp to different functional units
....... Supports up to 16 warps
....... Supports up to 8 workgroups
4 Memory Controllers
... 512KB per MC (2MB total)
... 16 ROPs per MC (64 total)

Only safe way to get L1 cached reads for read/write images is to run warp sized workgroups and work in global memory not shared by other workgroups. Hopefully an application can express this by typecasting to a read-only image before a read.

Using just shared memory, this GPU can run 64 parallel instances of a 24KB (data) computer without going out to L2.

(EDIT from Christophe's comments) This GPU has an insane untapped capacity for geometry: 16 pipes * more than 1GHz * 0.333 = maybe 5.3 million triangles per millisecond. Or enough for 2 single-pixel triangles per 1080p screen pixel per millisecond...

Viewing all articles
Browse latest Browse all 434

Trending Articles