Looks like GTX Titan's specs are now out, so I can say a few things. Below I'm pulling numbers from Wikipedia without checking whether they are valid. Looking at the evolution of the big NVIDIA GPUs from Tesla to Kepler (big chips defined as somewhere near 250 W and 500 mm^2), and skipping the 55 nm die shrink of Tesla:
Kepler : 4500 Gflop/s : 288 GB/s : 187 Gtex/s : 40 Grop/s : 28 nm : 2013 Q1
Fermi : 1581 Gflop/s : 192 GB/s : 49 Gtex/s : 24 Grop/s : 40 nm : 2010 Q4
Tesla : 933 Gflop/s : 141 GB/s : 48 Gtex/s : 19 Grop/s : 65 nm : 2008 Q2
And the middle NVIDIA GPUs (middle chips defined as around 170 W and 300 mm^2):
Kepler : 2460 Gflop/s : 192 GB/s : 102 Gtex/s : 29 Grop/s : 28 nm : GTX 670
Fermi : 1263 Gflop/s : 128 GB/s : 52 Gtex/s : 13 Grop/s : 40 nm : GTX 560ti
Tesla : 804 Gflop/s : 111 GB/s : 41 Gtex/s : 16 Grop/s : 65 nm : GTX 260
It should be obvious from the difference between the middle and big chips for Fermi versus Kepler that GTX Titan is a big chip which still has a graphics soul. The big Fermi, on the other hand, is texel challenged compared to its middle brother, the GTX 560ti. Titan provides the raw performance for massive frame rates, massive super-sampling, or massive resolution. Assuming the Wikipedia numbers are correct, those who are nuts can probably do 3-way SLI and push 560 Gtex/s, which is 70 times the texel rate of the Xbox 360.
For those who want the best possible perf/pixel ratio for image quality on a single GPU, this might be as good as it gets for a while if the 4K PC display gold rush starts. A GTX Titan at the current typical 1080p provides a massive 90 texels/ms/pixel (an iPad 4, in comparison, has around 0.6 texels/ms/pixel). Taking the Tesla-to-Kepler trend of roughly 2x the perf every 2 years, if the next big GPU has to deal with 4x the number of pixels, then the effective perf/pixel could drop by 50%. As pixels-per-inch increases, those who invest in rendering engine tech which decouples shading rate from resolution and frame rate might have some big advantages.
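As a sanity check on the 70x, 90 texels/ms/pixel, and 50% figures above, here is the back-of-envelope arithmetic as a tiny C program. The spec constants are the same unverified Wikipedia numbers quoted in the tables, so treat the output with the same grain of salt.

#include <stdio.h>

int main(void) {
    const double titan_gtex   = 187.0e9;          /* texels/s, one GTX Titan */
    const double xbox360_gtex = 8.0e9;            /* texels/s, Xbox 360 Xenos */
    const double px_1080p     = 1920.0 * 1080.0;  /* pixels at 1080p */
    const double px_4k        = 3840.0 * 2160.0;  /* 4x the pixels of 1080p */

    /* 3-way SLI texel rate vs Xbox 360: ~560 Gtex/s, ~70x. */
    printf("3-way SLI: %.0f Gtex/s = %.0fx Xbox 360\n",
           3.0 * titan_gtex / 1e9, 3.0 * titan_gtex / xbox360_gtex);

    /* Texels per millisecond per pixel at 1080p on a single Titan: ~90. */
    printf("Titan at 1080p: %.0f texels/ms/pixel\n",
           titan_gtex / 1000.0 / px_1080p);

    /* 2x the perf in ~2 years, but 4x the pixels at 4K: perf/pixel halves to ~45. */
    printf("next big GPU at 4K: %.0f texels/ms/pixel\n",
           2.0 * titan_gtex / 1000.0 / px_4k);
    return 0;
}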
On the Topic of 70x the Performance of the Xbox 360
It is too bad the arcade of my youth died out. Arcade as in the custom machines too expensive to bring home, with the best graphics and audio, but still accessible by any kid with a quarter. The current version of those arcade machines would be a 3-way SLI GTX Titan: something which brings full movie-CG visual quality to a locked 60 Hz, 1080p, low-latency plasma display. Natively targeting this beast in the arcade form of direct hardware access is hard to describe. With 3-way SLI (AFR) at 60 Hz, each GPU gets around 48 ms per frame, which is enough for roughly 4 thousand texture fetches per pixel per frame.

The Kepler arch is awesome at MSAA and massive triangle counts, and an arcade engine built for the beast could do 8xSGSSAA all the time with sub-pixel sized geometry. The three 6 GB framebuffers would be fed from high-capacity SSDs. Streaming tech for unique texturing would be custom, to manage duplication of resources across the 3 GPUs, and the latency of 3-way AFR would be kept in check by smart engine design. The engine would be built on mixed texture space shading and forward rendering. This maximizes the view-independent computation and minimizes the view-dependent computation, so that camera position and dynamic object updates can be deferred as late as possible in the rendering of the frame (reducing latency as much as possible). The engine would pull the view-dependent data (like the current camera position) from pinned CPU memory right before the last bit of rendering (the forward rendering of the pre-shaded textures).

Content creation would be different. Artists would target near-film geometry budgets with physically based shading models. The old arcade model is perfect for content because the gamer's experience is so short there is no need to fill out hours of content. Production is more like that of a movie trailer instead of the full film.
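To make the late-latch idea concrete, below is a minimal C sketch (my illustration, not code from any shipping engine) of the view-dependent latch: an input thread keeps overwriting a small camera struct which conceptually lives in pinned CPU memory, and the render thread copies it at the last possible moment, right before submitting the forward pass over the pre-shaded textures. A plain pthread mutex keeps the sketch simple, and the CameraLatch/latch_write/latch_read names are made up for this example; a real engine would more likely use a lock-free scheme or a GPU-side late latch.

#include <pthread.h>
#include <stdio.h>

typedef struct {
    float pos[3];
    float yaw, pitch;
} CameraState;

typedef struct {
    pthread_mutex_t lock;
    CameraState cam;   /* in the real engine: a small pinned, GPU-visible buffer */
} CameraLatch;

/* Input/camera thread: publish the newest view-dependent state. */
void latch_write(CameraLatch* l, const CameraState* src) {
    pthread_mutex_lock(&l->lock);
    l->cam = *src;
    pthread_mutex_unlock(&l->lock);
}

/* Render thread: called immediately before the final forward pass is submitted. */
CameraState latch_read(CameraLatch* l) {
    CameraState out;
    pthread_mutex_lock(&l->lock);
    out = l->cam;
    pthread_mutex_unlock(&l->lock);
    return out;
}

int main(void) {
    CameraLatch latch = { PTHREAD_MUTEX_INITIALIZER, {{0.0f, 0.0f, 0.0f}, 0.0f, 0.0f} };

    /* In the real engine an input thread calls latch_write() at a high rate;
       here we fake a single update so the example runs standalone. */
    CameraState newest = {{1.0f, 2.0f, 3.0f}, 0.5f, 0.1f};
    latch_write(&latch, &newest);

    /* Per frame:
       1. View-independent work first: texture-space shading of materials.
       2. Late latch: read the camera as late as possible.                  */
    CameraState cam = latch_read(&latch);
    /* 3. Cheap view-dependent forward pass samples the pre-shaded textures
          using the just-latched camera.                                    */
    printf("forward pass uses camera at (%.1f, %.1f, %.1f), yaw %.2f\n",
           cam.pos[0], cam.pos[1], cam.pos[2], cam.yaw);
    return 0;
}

The point is the ordering inside the frame, not the synchronization primitive: everything view-independent happens before the latch, so only the cheap forward pass and scanout sit between sampling the camera and photons.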