Ola Olsson

Tiled Shading (2011)

::download pdf:: ::bibtex:: ::publisher page::

Update - 2012-05-25

Our new paper Clustered Deferred and Forward Shading, to appear at HPG 2012, is now available as a pre-print. The techniques presented extend tiled shading using higher-dimensional tiles, called clusters. This is shown to improve performance in the presence of discontinuities, notably speeding up worst-case performance.
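As a rough illustration of what a higher-dimensional tile could look like, the sketch below extends a 2D tile index with a depth slice. The logarithmic depth subdivision, the function name `cluster_key` and its parameters are all assumptions made for this example; they are not taken from the paper.

```python
import math

def cluster_key(x, y, view_z, tile_size, near, slice_ratio):
    """Map a pixel and its positive view-space depth to (tile_x, tile_y, slice).

    Assumption: depth slices grow geometrically, so slice k covers
    [near * slice_ratio**k, near * slice_ratio**(k + 1)).
    """
    depth_slice = int(math.floor(math.log(view_z / near) / math.log(slice_ratio)))
    # Depths in front of the near plane are clamped into the first slice.
    return (x // tile_size, y // tile_size, max(depth_slice, 0))

# Two pixels in the same 2D tile but on either side of a depth
# discontinuity end up in different clusters.
print(cluster_key(70, 10, 1.5, 32, 1.0, 2.0))
print(cluster_key(70, 10, 5.0, 32, 1.0, 2.0))
```

With a plain 2D tile, both pixels would share one light list that has to cover the whole depth span between them.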

Update - 2012-04-12

The demo has been updated to handle non-alpha-tested geometry separately, as discard had a very unfavourable impact on tiled forward performance. Also implemented is a depth min-max reduction, done in a single-pass shader, with the resulting depth range buffer read back to the CPU for grid construction. It does not scale particularly well with increasing MSAA level, but it works. MSAA can now be changed at run-time, as can the pre-z pass and depth range for both algorithms (check out F1). The new version of the code is somewhat more complex, so the original demo is still available (below), as it may be easier to understand.
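The depth min-max reduction can be pictured with a small CPU-side sketch. This is not the demo's shader: the function name `tile_depth_ranges`, the list-of-rows depth buffer and the single depth value per pixel (no MSAA) are assumptions made to keep the example short.

```python
def tile_depth_ranges(depth, tile_size):
    """Reduce a 2D depth buffer (list of rows) to one (min, max) pair per tile."""
    height = len(depth)
    width = len(depth[0])
    # Round up so partially covered tiles at the edges are included.
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    empty = (float("inf"), float("-inf"))
    ranges = [[empty] * tiles_x for _ in range(tiles_y)]
    for y in range(height):
        for x in range(width):
            tx, ty = x // tile_size, y // tile_size
            lo, hi = ranges[ty][tx]
            d = depth[y][x]
            ranges[ty][tx] = (min(lo, d), max(hi, d))
    return ranges

depth = [[0.1, 0.2, 0.9, 0.8],
         [0.3, 0.4, 0.7, 0.6]]
print(tile_depth_ranges(depth, 2))
```

The result has one entry per tile, which is the kind of small buffer that is cheap to read back for grid construction.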

The shots below show Tiled Forward in action. In the left shot, the depth range optimization and Pre-Z pass are turned off; in the right shot, both are on. Frame rate jumps from 30 to 85 on a GTX 480. Clearly visible, at least in the high-res versions, is the lower number of lights per tile.
[Two screenshots: Tiled Forward with the depth range optimization and Pre-Z pass off (left) and on (right)]
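The drop in lights per tile comes from rejecting lights that lie entirely in front of or behind the geometry visible in a tile. A minimal sketch of that test follows, assuming each light is a sphere described by a positive view-space depth and a radius; the name `cull_by_depth` and the tuple layout are hypothetical.

```python
def cull_by_depth(light_indices, lights, z_min, z_max):
    """Keep lights whose depth extent [z - r, z + r] overlaps [z_min, z_max]."""
    kept = []
    for i in light_indices:
        z, radius = lights[i]
        if z + radius >= z_min and z - radius <= z_max:
            kept.append(i)
    return kept

# (view-space depth, radius) per light; values are made up for illustration.
lights = [(5.0, 1.0), (20.0, 2.0), (10.5, 1.0)]
print(cull_by_depth([0, 1, 2], lights, 9.0, 10.0))
```

Here only the third light survives: the first ends in front of the tile's nearest sample and the second starts behind its farthest one.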

In the paper, Tiled Forward Shading is shown to perform poorly, especially on the GTX 280, which was used for the majority of the results. Recent interest in Tiled Forward, apparently sparked by the AMD Leo demo, has brought out additional demo implementations (demo 1 and demo 2). These all report pretty good performance for Tiled Forward Shading, so I re-ran the tests used in the paper. The new performance graph is shown below. Note that this is on a GTX 480, with a much later version of CUDA and drivers, so differences from the published graph are to be expected. The graph shows only the time to compute shading: a full screen pass for tiled deferred, a lot of light spheres for deferred, and a scene rendering pass for tiled forward. It excludes, for example, the G-Buffer pass, grid building and the Pre-z pass.

New Tiled Forward Shading Performance Results

Interestingly, all of the published algorithms perform better, but the overall relationship remains the same. Tiled Forward is still less efficient than tiled deferred, even though it ought to perform the same number of lighting computations, which is what we concluded in the paper. So the conclusions in the paper appear to remain valid. In terms of real-world performance, however, the change has made an enormous difference: Tiled Forward is around 10x faster than in the results for the GTX 280, making it appear a much more practical real-time alternative, given the different trade-offs.

[Graph: Tiled Shading Performance]

Download demo with source! Below, the left image is a screenshot from the demo, and the middle image shows the same view with the tiles overlaid. Tiles with a higher intensity contain more lights. The demo does not use the depth buffer to cull lights, which is why the density is relatively uniform. The image to the right shows a shot from the benchmark implementation used in the paper. Notice that the tiles containing (geometry) discontinuities hold a larger number of lights.

[Three screenshots: demo view (left), demo view with tiles (middle), benchmark implementation from the paper with tiles (right)]
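The per-tile light counts in the tile views can be reproduced with a simple grid build that ignores depth entirely. The sketch below assumes each light has already been projected to an integer screen-space bounding rectangle `(x0, y0, x1, y1)` in pixels; that input format and the name `build_light_grid` are assumptions for illustration, not the demo's code.

```python
def build_light_grid(light_rects, width, height, tile_size):
    """Return grid[ty][tx] = list of indices of lights overlapping that tile."""
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    grid = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for index, (x0, y0, x1, y1) in enumerate(light_rects):
        # Clamp the rectangle to the screen and skip lights fully off-screen.
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, width - 1), min(y1, height - 1)
        if x0 > x1 or y0 > y1:
            continue
        # Insert the light into every tile its rectangle touches.
        for ty in range(y0 // tile_size, y1 // tile_size + 1):
            for tx in range(x0 // tile_size, x1 // tile_size + 1):
                grid[ty][tx].append(index)
    return grid

rects = [(0, 0, 10, 10), (20, 5, 40, 20), (100, 0, 120, 10)]
grid = build_light_grid(rects, 64, 32, 32)
print([[len(tile) for tile in row] for row in grid])
```

The printed counts correspond to the tile intensity in the screenshots. Because a light is added to every tile its rectangle touches, the assignment is conservative: some pixels in a tile will evaluate lights that do not actually reach them.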

Below is another set of screenshots from the demo, with a more distant view. The middle image displays all the light volumes with additive blending, and the right image again shows the tiles. This time it is clear where there are no lights. Comparing the right image to the middle one also reveals the conservative nature of tiling: pixels along the edges of the geometry are clearly being shaded unnecessarily.

[Three screenshots: distant demo view (left), light volumes with additive blending (middle), tiles (right)]

Tiled Shading (2011)

Ola Olsson and Ulf Assarsson

Journal of Graphics, GPU, and Game Tools


Abstract In this article we describe and investigate tiled shading. The tiled techniques, though simple, enable substantial improvements to both deferred and forward shading. Tiled Shading has been previously discussed only in terms of deferred shading (tiled deferred shading). We contribute a more detailed description of the technique, introduce tiled forward shading (a generalization of tiled deferred shading to also apply to forward shading), and a thorough performance evaluation. Tiled Forward Shading has many of the advantages of deferred shading, for example, scene management and light management are decoupled. At the same time, unlike traditional deferred and tiled deferred shading, full screen antialiasing and transparency are trivially supported. We also present a thorough comparison of the performance of tiled deferred, tiled forward, and traditional deferred shading. Our evaluation shows that tiled deferred shading has the least variable worst-case performance, and scales the best with faster GPUs. Tiled deferred shading is especially suitable when there are many light sources. Tiled forward shading is shown to be competitive for scenes with fewer lights, and is much simpler than traditional forward shading techniques. Tiled shading also enables simple transitioning between deferred and forward shading. We demonstrate how this can be used to handle transparent geometry, frequently a problem when using deferred shading. Demo source code is available online at the address provided at the end of this paper.

