
Location of "Filtering" in the Graphics Pipeline

This is a tangent related to the prior thread...

The hardware provides fast filtering from the texture unit, but only the worst-case filter: in terms of classic re-sampling filters, bilinear filtering has horrible quality. A proper filter would take many taps and would be phase adaptive: way too expensive for fixed-function texture fetch.
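To make "phase adaptive" concrete, here is a minimal 1D sketch (illustrative names, not any real API): bilinear blends 2 taps with weights (1-f, f), while even a modest step up such as Catmull-Rom takes 4 taps whose weights must be recomputed for every fractional phase f, which is exactly the per-tap work that is too expensive for fixed-function fetch.

```cpp
#include <cmath>

// Catmull-Rom weights for phase f in [0,1): 4 taps versus bilinear's 2,
// and the weights change with every sub-texel position.
static void catmullRomWeights(float f, float w[4]) {
    float f2 = f * f, f3 = f2 * f;
    w[0] = 0.5f * (-f3 + 2.0f * f2 - f);
    w[1] = 0.5f * ( 3.0f * f3 - 5.0f * f2 + 2.0f);
    w[2] = 0.5f * (-3.0f * f3 + 4.0f * f2 + f);
    w[3] = 0.5f * ( f3 - f2);
}

// Resample a 1D signal at continuous position x.
float resample1D(const float* texels, int count, float x) {
    int   i = (int)std::floor(x);
    float f = x - (float)i;                // the sampling "phase"
    float w[4];
    catmullRomWeights(f, w);
    float sum = 0.0f;
    for (int t = -1; t <= 2; ++t) {
        int idx = i + t;
        if (idx < 0) idx = 0;              // clamp addressing at borders
        if (idx >= count) idx = count - 1;
        sum += w[t + 1] * texels[idx];
    }
    return sum;
}
```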

Second issue: filtering components before shading is fundamentally flawed. This practice, too, exists as the evolution of a practical trade-off made to enable real-time graphics. The extent to which developers now attempt to work around this problem can be quite amazing (for example, CLEAN mapping).
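One way to see why pre-shade filtering is flawed: shading is non-linear, so filter-then-shade and shade-then-filter disagree, sometimes catastrophically. A toy numeric example (values purely illustrative) using a Blinn-Phong style specular term:

```cpp
#include <cmath>
#include <cstdio>

// Two sub-texel normals straddling a bump give n.h values of 1.0 and 0.5.
// Averaging the inputs first and shading once loses the highlight that
// shading each sample and then averaging would have kept.
int main() {
    const float e = 64.0f;                 // specular exponent
    float ndh0 = 1.0f, ndh1 = 0.5f;

    float filterThenShade = std::pow(0.5f * (ndh0 + ndh1), e);
    float shadeThenFilter = 0.5f * (std::pow(ndh0, e) + std::pow(ndh1, e));

    // Prints roughly 1e-8 vs 0.5: pre-filtering the input kills the highlight.
    std::printf("%g vs %g\n", filterThenShade, shadeThenFilter);
    return 0;
}
```

This non-linearity is exactly what workarounds like CLEAN mapping attempt to compensate for.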

An alternative to all of this is to shade at fixed positions in object space, then defer filtering into some final screen-space reconstruction pass. The object space positions get translated into world space then later view space, and sub-pixel precision view space coordinates get passed to the reconstruction pass. This provides one critical advantage: anti-aliasing quality becomes decoupled from sampling rate!
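Here is a sketch of what a single shaded sample might carry into the reconstruction pass; all types, names, and the matrix/viewport conventions are assumptions for illustration:

```cpp
#include <array>

using Mat4 = std::array<float, 16>;        // row-major 4x4

struct Float4 { float x, y, z, w; };

// Color is shaded once at a fixed object-space point; only the position
// runs through object -> world -> view -> screen, and reconstruction
// receives sub-pixel screen coordinates plus the finished color.
struct ShadedSample {
    float r, g, b;     // shaded in object/texture space
    float screenX;     // sub-pixel precision, never snapped to pixel centers
    float screenY;
    float viewZ;       // kept for projected sample size and occlusion
};

static Float4 mul(const Mat4& m, Float4 v) {
    return { m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w,
             m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w,
             m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w,
             m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w };
}

// objectToClip = projection * view * model, built once per object.
ShadedSample toScreen(Float4 objPos, float r, float g, float b,
                      const Mat4& objectToClip, float width, float height) {
    Float4 clip = mul(objectToClip, objPos);
    float invW = 1.0f / clip.w;
    ShadedSample s;
    s.r = r; s.g = g; s.b = b;
    s.screenX = (clip.x * invW * 0.5f + 0.5f) * width;   // viewport mapping
    s.screenY = (clip.y * invW * 0.5f + 0.5f) * height;
    s.viewZ   = clip.w;   // view-space depth for a standard projection
    return s;
}
```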

Traditionally, if one renders a scene with 8xMSAA using a high-quality resolve filter with around a 1-pixel radius, that resolve filter is going to use somewhere around 16 to, say, 24 taps (or sample reads) per output pixel. Anything which does re-sizing during the resolve needs to compute filter weights per tap (phase adaptive). The output of the filter is a weighted average. If one uses temporal AA with jittered rendering, typically 9 taps are required for registration (to remove the jitter), with anywhere from one to many more taps required for re-projection (depending on filter quality): ultimately an even more complex filter. If one completely avoids AA and attempts to make a frame look less ugly with MLAA/FXAA/etc, it is the same situation: many filter taps, and a complex filter which this time involves searching for edges.
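Whatever the source of the taps, the resolve itself reduces to the same small loop: compute a per-tap weight from distance to the pixel center, then take the normalized weighted sum. A minimal sketch, where the Gaussian-like kernel and 1-pixel radius are illustrative choices rather than any specific shipped filter:

```cpp
#include <cmath>
#include <vector>

struct Tap { float dx, dy; float r, g, b; };   // offset from pixel center

// Weighted-average resolve over ~16-24 taps per output pixel.
void resolvePixel(const std::vector<Tap>& taps,
                  float& outR, float& outG, float& outB) {
    const float radius = 1.0f;
    float sumW = 0.0f, r = 0.0f, g = 0.0f, b = 0.0f;
    for (const Tap& t : taps) {
        float d2 = t.dx * t.dx + t.dy * t.dy;
        if (d2 > radius * radius) continue;    // outside the filter support
        float w = std::exp(-2.0f * d2);        // weight falls off with distance
        sumW += w;
        r += w * t.r; g += w * t.g; b += w * t.b;
    }
    float inv = (sumW > 0.0f) ? 1.0f / sumW : 0.0f;
    outR = r * inv; outG = g * inv; outB = b * inv;
}
```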

The common element here in post-process anti-aliasing is a complex filter which sources many samples and has become increasingly expensive over the years.

So why limit those samples to shaded points at fixed positions on the screen?

Effectively, some amount of the cost of a non-fixed-position reconstruction filter is already being paid by some AA filter. More cost is paid buffering data across graphics pipeline stages, and even more is paid for the complexity introduced by filtering before shading. The reconstruction filter is going to do the same process of computing the distance of each sample to the pixel center to compute a sample weight, then taking a weighted average for the final output color, except this time taking into consideration the projected size of a sample.

The giant leap required is to transform from shading at interpolated positions in screen space on a triangle, to taking shaded samples in object space (or texture space, aka the texture shade cache talked about in prior posts), directly binning them into screen space, then doing frame reconstruction from the bins. Or more specifically: bin the shaded tiles from the texture cache into screen-space tiles, then for each screen-space tile, build up a per-pixel list of samples in local memory for reconstruction, then do reconstruction without going out to RAM again. That is one possible method; there are others.
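A very rough CPU-side sketch of that binning step, assuming 16x16 pixel tiles and with std::vector standing in for fast local memory (on a GPU this would be one workgroup per tile with the lists in shared memory):

```cpp
#include <vector>

constexpr int kTile = 16;                      // 16x16 pixel tiles

struct Sample { float x, y, viewZ, r, g, b; }; // sub-pixel screen coords

struct Tile {
    // One sample list per pixel in the tile; stands in for local memory.
    std::vector<Sample> perPixel[kTile * kTile];
};

void binIntoTile(Tile& tile, int tileX, int tileY, const Sample& s) {
    int px = (int)s.x - tileX * kTile;         // pixel within this tile
    int py = (int)s.y - tileY * kTile;
    if (px < 0 || px >= kTile || py < 0 || py >= kTile) return;
    // A real kernel would also splat the sample into every neighboring
    // pixel whose filter radius covers it, not just the pixel it lands in.
    tile.perPixel[py * kTile + px].push_back(s);
}
```

Reconstruction over each per-pixel list then looks like the weighted-average resolve sketched earlier, with the weights additionally scaled by each sample's projected size.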

What previously might have been 8xMSAA with massive amounts of sub-pixel triangles wasting huge amounts of GPU performance, and which still could not produce a perfectly anti-aliased scene, gets transformed into something where even fewer samples than pixels can produce a perfectly anti-aliased image with absolutely no artifacts in motion.

This is where I would expect the ultimate in engine design could land on current hardware once API restrictions are removed. Enter Vulkan: make sure enough of the ISA instructions for wave-level programming are available in SPIR-V, and perhaps someone will realize this kind of design. It is certainly a major challenge which would require some massive transformations of current engine design if anyone wanted to crawl to the destination. I expect the right way to enable such a move is to systematically bite off chunks of the problem and prove out optimized shader kernels...
