Update - 2012-05-25
Our new paper Clustered Deferred and Forward Shading,
to appear at HPG 2012, is now available as a pre-print. The techniques presented extend tiled shading
using higher-dimensional tiles, called clusters. This is shown to improve performance in the presence of discontinuities, notably speeding up worst-case performance.
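One way to picture a higher-dimensional tile is a 2D screen tile plus a depth slice index. The sketch below is an illustration, not the paper's code: the `ClusterGrid` struct and its fields are hypothetical, depth is taken as a positive view-space distance (with an OpenGL-style camera this would be `-z`), and the slices are assumed to grow exponentially with distance, with the growth factor left as a free parameter.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical cluster grid: square screen tiles, each split along view-space
// depth into slices whose extent grows exponentially with distance.
struct ClusterGrid {
    int tileSize;     // tile width/height in pixels
    int tilesX;       // number of tiles horizontally
    int tilesY;       // number of tiles vertically
    float nearPlane;  // distance to the near plane (positive)
    float sliceScale; // assumed growth factor: slice k starts at near * sliceScale^k
    int numSlices;    // samples beyond the last slice are clamped into it
};

// Depth slice for a sample at positive view-space distance 'depth'.
inline int depthSlice(const ClusterGrid& g, float depth) {
    assert(depth >= g.nearPlane && g.sliceScale > 1.0f);
    int k = static_cast<int>(
        std::floor(std::log(depth / g.nearPlane) / std::log(g.sliceScale)));
    return k < g.numSlices ? k : g.numSlices - 1;
}

// Flattened cluster index: the 2D tile index extended with the depth slice.
inline uint32_t clusterIndex(const ClusterGrid& g, int px, int py, float depth) {
    int i = px / g.tileSize;
    int j = py / g.tileSize;
    int k = depthSlice(g, depth);
    return static_cast<uint32_t>((k * g.tilesY + j) * g.tilesX + i);
}
```

Two pixels in the same 2D tile but on either side of a depth discontinuity land in different clusters, so each cluster only needs the lights near its own depth.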
Update - 2012-04-12
The demo has been updated with separate handling of non-alpha tested geometry,
as discard had a very unfavourable impact on tiled forward performance. Also implemented is a depth min-max reduction, which
is done through a single-pass shader, with the resulting depth range buffer read back to the CPU for grid construction. It does
not scale particularly well with increasing MSAA level, but it works. MSAA can now be changed at run-time, as can the Pre-Z pass and the
depth range optimization for both algorithms (check out F1). The new version of the code is somewhat more complex, and therefore the original demo
is still available (below), as it may be easier to understand.
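As a CPU-side illustration of what the min-max reduction computes (the demo does this in a shader; this is not its code), the sketch below reduces a depth buffer to one `[min, max]` range per tile. The sample layout, the `DepthRange` struct and the choice to skip samples equal to the clear depth are assumptions. The inner loop visits every MSAA sample, which is one plausible reason the cost grows with the MSAA level.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct DepthRange { float minZ; float maxZ; };

// Reduce a depth buffer to one [min, max] range per tileSize x tileSize tile.
// Assumed layout: depth[(y * width + x) * samples + s], i.e. all MSAA samples
// of a pixel stored together. Samples equal to clearDepth are treated as
// background and ignored, so a tile with no geometry ends up with min > max.
std::vector<DepthRange> reduceTileDepthRanges(const std::vector<float>& depth,
                                              int width, int height, int samples,
                                              int tileSize, float clearDepth) {
    assert(static_cast<size_t>(width) * height * samples == depth.size());
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<DepthRange> ranges(static_cast<size_t>(tilesX) * tilesY,
                                   DepthRange{inf, -inf});
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            DepthRange& r = ranges[static_cast<size_t>(y / tileSize) * tilesX + x / tileSize];
            for (int s = 0; s < samples; ++s) {
                float d = depth[(static_cast<size_t>(y) * width + x) * samples + s];
                if (d == clearDepth) continue; // background sample
                r.minZ = std::min(r.minZ, d);
                r.maxZ = std::max(r.maxZ, d);
            }
        }
    }
    return ranges;
}
```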
The shots below show Tiled Forward in action. In the left shot,
the depth range optimization and Pre-Z pass are turned off, and in the right shot both are on. The frame rate jumps from 30 to 85 on a
GTX 480. Clearly visible, at least in the high-res versions, is the lower number of lights per tile.
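The drop in lights per tile comes from discarding lights whose depth extent misses the tile's depth range. A minimal sketch of that test, assuming spherical lights and a tile range already converted to positive view-space distances (the struct and function names are hypothetical):

```cpp
#include <cassert>
#include <vector>

// Tile depth range as positive view-space distances from the camera.
struct TileDepthRange { float minDepth; float maxDepth; };
// Only the light's view-space depth and radius matter for this test.
struct PointLight { float viewDepth; float radius; };

// A spherical light can only affect the tile if its depth interval
// [viewDepth - radius, viewDepth + radius] overlaps the tile's range.
inline bool overlapsDepthRange(const PointLight& l, const TileDepthRange& t) {
    assert(l.radius >= 0.0f);
    return l.viewDepth - l.radius <= t.maxDepth &&
           l.viewDepth + l.radius >= t.minDepth;
}

// Filter a tile's light list (lights already known to overlap it in 2D).
std::vector<int> cullByDepth(const std::vector<int>& tileLights,
                             const std::vector<PointLight>& lights,
                             const TileDepthRange& tile) {
    std::vector<int> kept;
    for (int idx : tileLights)
        if (overlapsDepthRange(lights[idx], tile)) kept.push_back(idx);
    return kept;
}
```

Without a Pre-Z pass the depth buffer is not available before shading, so this test cannot be applied, which is why the two options go together in the shots.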
In the paper, Tiled Forward Shading is shown to perform poorly, especially on the GTX 280, which was used for the majority of
the results. Recent interest in Tiled Forward, apparently sparked by the AMD
has brought out additional demo implementations (demo 1
and demo 2). These all report pretty good performance for Tiled Forward Shading,
and because of this I re-ran the tests used in the paper. The new performance graph is shown below. Note that this is on a GTX 480, and with a much later version of
CUDA and drivers, so differences from the published graph are to be expected. The graph shows only the time to compute shading: a full-screen pass
for tiled deferred, drawing the light spheres for deferred, and a scene rendering pass for tiled forward (excluding, for example, the G-Buffer pass, grid building and the Pre-Z pass).
New Tiled Forward Shading Performance Results
Interestingly, all of the published algorithms perform better, but the overall relationship remains the same. This means that Tiled Forward is still
less efficient than tiled deferred, even though it ought to perform the same number of lighting computations,
which is what we concluded in the paper. So, while the conclusions in the paper appear to remain valid, the newer hardware and software have made an enormous
difference in real-world performance: Tiled Forward is around 10x faster than in the GTX 280 results, which makes it a much more practical
real-time alternative, given the different trade-offs.
Download the demo with source! Below, to the left, is a screenshot from the demo;
the middle image shows the same view with the tiles overlaid. Tiles with a higher intensity contain more lights.
The demo does not use the depth buffer to cull lights, which is why the density is relatively uniform.
The image to the right shows a shot from the benchmark implementation used in the paper. Notice the tiles with
(geometry) discontinuities contain a larger number of lights.
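Without depth culling, a light is assigned to every tile its screen-space footprint touches, so the per-tile count, which the visualization maps to intensity, depends only on the 2D projection. A minimal sketch of such grid building, assuming each light's bounding rectangle has already been computed in pixels (the `ScreenRect` struct and `buildTileGrid` are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Screen-space bounding rectangle of a light volume in pixels, [x0, x1) x [y0, y1).
// Computing it from the light sphere is assumed to happen elsewhere.
struct ScreenRect { int x0, y0, x1, y1; };

// Insert each light into every tile overlapped by its rectangle. This is
// conservative: the rectangle also covers pixels the sphere does not reach.
std::vector<std::vector<int>> buildTileGrid(const std::vector<ScreenRect>& rects,
                                            int width, int height, int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(static_cast<size_t>(tilesX) * tilesY);
    for (int l = 0; l < static_cast<int>(rects.size()); ++l) {
        const ScreenRect& r = rects[l];
        // Clamp to the screen and skip lights that are entirely off-screen.
        int x0 = std::max(r.x0, 0), y0 = std::max(r.y0, 0);
        int x1 = std::min(r.x1, width), y1 = std::min(r.y1, height);
        if (x0 >= x1 || y0 >= y1) continue;
        for (int ty = y0 / tileSize; ty <= (y1 - 1) / tileSize; ++ty)
            for (int tx = x0 / tileSize; tx <= (x1 - 1) / tileSize; ++tx)
                tiles[static_cast<size_t>(ty) * tilesX + tx].push_back(l);
    }
    return tiles;
}
```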
Below is another set of screenshots from the demo, with a more distant view. The middle image displays all the light volumes with
additive blending, and, again, the image to the right shows the tiles. This time it is clear where there are no lights. Also visible,
by comparison with the middle image, is the conservative nature of tiling: clearly, pixels along the edges of the geometry are being