Notes From GTX 980 Whitepaper | Maxwell Tuning Guide
GTX 980
16 geometry pipes
... One per SM
16 SMs
... 96KB shared memory per SM
... ?KB instruction cache per SM
... SMs divided into 4 quadrants
... Pair of quadrants sharing a TEX unit
... Each Quadrant
....... Issue 2 ops/clk per warp to different functional units
....... Supports up to 16 warps
....... Supports up to 8 workgroups
4 Memory Controllers
... 512KB per MC (2MB total)
... 16 ROPs per MC (64 total)
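The per-unit figures above multiply out to chip-wide totals. The following is a small sketch of that arithmetic, not anything taken from the whitepaper. The warp width of 32 threads is an assumption (the usual NVIDIA value, not stated in these notes), the 512KB per memory controller is assumed to be L2, and the function and constant names are made up for illustration.

```python
# Chip-wide totals derived from the per-unit figures in the notes.
SMS = 16
QUADRANTS_PER_SM = 4
WARPS_PER_QUADRANT = 16
WORKGROUPS_PER_QUADRANT = 8
THREADS_PER_WARP = 32   # assumption: standard NVIDIA warp width, not in the notes
MEMORY_CONTROLLERS = 4
L2_KB_PER_MC = 512      # assumption: the 512KB per MC is the L2 slice
ROPS_PER_MC = 16


def chip_totals():
    """Return a dict of whole-chip limits implied by the per-unit numbers."""
    quadrants = SMS * QUADRANTS_PER_SM
    warps = quadrants * WARPS_PER_QUADRANT
    return {
        "quadrants": quadrants,
        "resident_warps": warps,
        "resident_threads": warps * THREADS_PER_WARP,
        "resident_workgroups": quadrants * WORKGROUPS_PER_QUADRANT,
        "l2_kb": MEMORY_CONTROLLERS * L2_KB_PER_MC,
        "rops": MEMORY_CONTROLLERS * ROPS_PER_MC,
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
```

Under those assumptions the chip holds at most 1024 resident warps across 64 quadrants, with the 2MB and 64 ROP totals matching the figures in the list.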
The only safe way to get L1-cached reads for read/write images is to run warp-sized workgroups and to work only in global memory that is not shared with other workgroups. Hopefully an application can express this by typecasting to a read-only image before a read.
Using just shared memory, this GPU can run 64 parallel instances of a 24KB (data) computer without going out to L2: 96KB per SM / 24KB = 4 instances per SM (one per quadrant), times 16 SMs.
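The 64-instance figure comes from dividing up shared memory. This is a minimal sketch of that division, assuming shared memory can be split evenly between instances with no per-instance overhead. The helper name `shared_memory_instances` is hypothetical.

```python
# How many fixed-size "computers" fit in shared memory alone.
SMS = 16
SHARED_KB_PER_SM = 96


def shared_memory_instances(kb_per_instance):
    """Return (instances per SM, instances per chip) for a given data footprint.

    Assumes shared memory divides evenly with no per-instance overhead.
    """
    per_sm = SHARED_KB_PER_SM // kb_per_instance
    return per_sm, per_sm * SMS


if __name__ == "__main__":
    per_sm, per_chip = shared_memory_instances(24)
    print(f"24KB instances: {per_sm} per SM, {per_chip} per chip")
```

With a 24KB footprint this yields 4 instances per SM, which lines up with the 4 quadrants, and 64 across the chip.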
(EDIT from Christophe's comments) This GPU has an insane untapped capacity for geometry: 16 pipes * more than 1GHz * 0.333 triangles per clock per pipe = maybe 5.3 million triangles per millisecond. That is enough for over 2 single-pixel triangles per 1080p screen pixel per millisecond...
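The geometry estimate is easy to check by hand. The sketch below redoes the arithmetic. It takes 1GHz as a lower bound on the clock (the notes only say "more than 1GHz") and reads 0.333 as triangles per clock per pipe. The constant and function names are made up for illustration.

```python
# Back-of-envelope geometry throughput from the figures in the notes.
PIPES = 16
CLOCK_HZ = 1.0e9                  # lower bound: the notes say "more than 1GHz"
TRIS_PER_CLOCK_PER_PIPE = 0.333   # read as triangles per clock per pipe
PIXELS_1080P = 1920 * 1080


def triangles_per_ms():
    """Peak triangles per millisecond across all geometry pipes."""
    return PIPES * CLOCK_HZ * TRIS_PER_CLOCK_PER_PIPE / 1000.0


def triangles_per_pixel_per_ms():
    """Peak triangles per 1080p screen pixel per millisecond."""
    return triangles_per_ms() / PIXELS_1080P


if __name__ == "__main__":
    print(f"{triangles_per_ms() / 1e6:.2f} million triangles per ms")
    print(f"{triangles_per_pixel_per_ms():.2f} triangles per 1080p pixel per ms")
```

At exactly 1GHz this comes to about 5.3 million triangles per millisecond, or a little over 2.5 per 1080p pixel. A higher real clock only raises both numbers.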