SSAO

Introduction

In tutorial 4, we calculated the indirect illumination (from the environment) as two terms. One of the terms, the diffuse illumination, is the irradiance (energy per area) that comes from the whole hemisphere around the surface normal. There we assumed that all light from the hemisphere actually reaches the fragment, but there may be geometry that occludes (blocks) the sky (see the image below). In this project, we will estimate this hemispherical occlusion by looking at nearby depth values in the z-buffer, a method called Screen Space Ambient Occlusion (SSAO). The result is visualized in the image above.

SSAO Principle

In other words, we want to estimate how much of the hemisphere is occluded for each pixel, and then attenuate the diffuse indirect illumination accordingly. Traditional Ambient Occlusion (AO), where you shoot a bunch of rays to estimate the occlusion, has been a popular way of "faking" full global illumination calculations for a very long time, and was even used in the movie industry until quite recently. In video games, shooting a large number of rays per pixel is infeasible, but in 2007 the SSAO technique was developed and first used in a game ("Crysis").

[Images: Crysis SSAO; hemispherical SSAO; hemispherical SSAO]

The idea was to precompute a number (e.g., 16) of uniformly distributed directions \(s_i\) in a sphere. Then, for each pixel at position \(p\) we want to know how many of those directions are blocked from the light (if 50% are blocked, we will attenuate indirect illumination by 50%).

To do this, the people at Crytek tried a horrendous hack: they simply calculated \(p + s_i\) and projected that point onto the image plane. If whatever is in the depth buffer at the corresponding pixel is closer than the point, the direction is considered blocked (see the top image to the right). This turned out to be extremely cheap and to look surprisingly good, and today there are hardly any games that ship without some sort of SSAO technique.

There are a million different extensions and variations of this algorithm and, if you find a version you would rather try, ask an assistant if that's all right. Otherwise, we're going to do it the Crysis way, with two important improvements.

First, with the original algorithm, if \(p\) lies on a plane, 50% of the samples will always be covered. Therefore we will instead generate samples on a hemisphere and transform them using the pixel's normal (see the second image to the right).

Secondly, the depth map does not really contain enough information for what we want to test. Consider the third image on the right: three of the points will be considered "blocked" because they lie behind the depth map, but the corresponding objects in the depth map are actually far away, and might not block \(p\) at all. Therefore, once we have the depth of a "blocking" point, we decide whether it lies within the hemisphere and, if it doesn't, we ignore that point.

General overview

You may now go ahead and implement this however you choose to, but in this section we will suggest an overview of the steps you could take to get it working, along with some generally useful code snippets.

  1. Render the depth and normal to a framebuffer. Just render the scene once to a separate framebuffer using a different shader program which simply writes the view-space normal as the color (we have ssaoInput shader files for that; a minimal sketch of such a fragment shader is shown after the note below). The depth will be stored in the attached depth buffer.
  2. Compute the SSAO in a post-processing pass. This is the tricky part. You will render a fullscreen pass, and for every fragment you can fetch the depth and normal from the previous pass. The SSAO result is stored in a new texture. Some tips and details are given below.
  3. Render the scene with shading as usual, but scale the diffuse indirect term with the SSAO term. The output from the previous SSAO pass is passed in as a texture to your standard shader, and you simply read the SSAO value from that, for each fragment.

Note: since the framebuffers hold 16-bit float values (you can check that in fbo.cpp), you can just store the normal as floating-point numbers and they will retain all the information we need (which wouldn't be the case if they were 8-bit unsigned integers, as many textures loaded from a file are).
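For step 1, here is a minimal sketch of what such a normal-writing fragment shader could look like (the variable names and GLSL version here are assumptions, not necessarily those used in the provided ssaoInput shaders):

#version 420

// View-space normal interpolated from the vertex shader (assumed name)
in vec3 viewSpaceNormal;

layout(location = 0) out vec4 fragmentColor;

void main()
{
    // Write the view-space normal to the 16-bit float color attachment;
    // the depth buffer is filled automatically by the rasterizer.
    fragmentColor = vec4(normalize(viewSpaceNormal), 1.0);
}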

Implementation details

We will compute the hemispherical occlusion in a postprocess pass, saving the output to a dedicated framebuffer. For each pixel in the SSAO pass, we first determine the view-space position and view-space normal. The normal can be directly fetched from the pre-pass framebuffer texture, and the position can be computed from the depth buffer:

float fragmentDepth = texture(depthTexture, texCoord).r;

// Normalized device coordinates (NDC) of the fragment
vec4 ndc = vec4(texCoord.x * 2.0 - 1.0, texCoord.y * 2.0 - 1.0, 
                fragmentDepth * 2.0 - 1.0, 1.0);

// Transform to view space
vec3 vs_pos = homogenize(inverseProjectionMatrix * ndc);

With the helper function:

vec3 homogenize(vec4 v) { return vec3((1.0 / v.w) * v); }

From this position we want to sample the depth buffer in a surrounding volume to see if there are occluders. We start by establishing basis vectors for the hemisphere in view space. The normal can be used as one basis vector; the other two lie in the tangent plane (orthogonal to the normal), so we compute a tangent and a bitangent:

vec3 vs_tangent = perpendicular(vs_normal);
vec3 vs_bitangent = cross(vs_normal, vs_tangent);

With the helper function:

// Computes one vector in the plane perpendicular to v
vec3 perpendicular(vec3 v)
{
    vec3 av = abs(v); 
    if (av.x < av.y)
        if (av.x < av.z) return vec3(0.0f, -v.z, v.y);
        else return vec3(-v.y, v.x, 0.0f);
    else
        if (av.y < av.z) return vec3(-v.z, 0.0f, v.x);
        else return vec3(-v.y, v.x, 0.0f);
}

The local basis then consists of the tangent, the bitangent, and the normal:

mat3 tbn = mat3(vs_tangent, vs_bitangent, vs_normal); // local basis

Now, we will test \(N\) sample offsets in the depth buffer in a for loop:

int num_visible_samples = 0;
int num_valid_samples = 0;
for (int i = 0; i < nof_samples; i++) {
    // Transform a hemisphere sample into the local basis
    vec3 s = tbn * sampleHemisphereVolumeCosine(i);

    // Compute the view-space position of the sample
    vec3 vs_sample_position = vs_pos + s * hemisphere_radius;

    // Compute the ndc-coords of the sample
    vec3 sample_coords_ndc = homogenize(projectionMatrix * vec4(vs_sample_position, 1.0));

    // Sample the depth buffer at the sample's position
    // (remapping its ndc xy-coords from [-1, 1] to [0, 1], see the note below)
    float blocker_depth = texture(depthTexture, sample_coords_ndc.xy * 0.5 + 0.5).r;

    // Find the view-space coord of the blocker
    vec3 vs_blocker_pos = homogenize(inverseProjectionMatrix *
            vec4(sample_coords_ndc.xy, blocker_depth * 2.0 - 1.0, 1.0));

    // If the blocker is farther away than the sample position, then the sample
    // is valid and visible (view space looks down the negative z-axis)
    if (vs_blocker_pos.z < vs_sample_position.z) {
        num_valid_samples += 1;
        num_visible_samples += 1;
    }
    // Otherwise, if the sample is inside the hemisphere, i.e. the distance
    // from vs_pos to the blocker is smaller than hemisphere_radius,
    // the sample is valid, but occluded
    else if (distance(vs_pos, vs_blocker_pos) < hemisphere_radius) {
        num_valid_samples += 1;
    }
}

float visibility = 1.0;

if (num_valid_samples > 0)
{
    visibility = float(num_visible_samples) / float(num_valid_samples);
}

Note: After projecting and homogenizing a view-space sample point, you will have coordinates in the \([(-1,-1,-1),(1,1,1)]\) range. To sample a texture, you need to convert these to the \([(0,0,0),(1,1,1)]\) range.

To take the samples, we need to implement a function that generates samples in a hemisphere around the origin, oriented along the z axis. You can use the following function for that purpose. It starts by taking a sample on the hemisphere surface with a cosine distribution (i.e. points closer to the pole get a higher density of samples than points closer to the horizon). Then, it applies a random scaling to that, to get points at random positions in the volume, not only on the surface.

You are welcome to try different random distributions for this. If you look around, people use different functions to select these points.

Note that this function receives an index. This is because the randf function doesn't have state, so we need to give it extra information to receive different samples on each iteration of the loop.

const float M_PI = 3.14159265359;

vec3 sampleHemisphereVolumeCosine(float idx)
{
    vec3 r = randf(vec3(gl_FragCoord.xy, idx));
    vec3 ret;
    r.x *= 2.0 * M_PI;    // azimuth angle
    r.y = sqrt(r.y);      // disk radius, giving a cosine distribution
    r.y = min(r.y, 0.99); // stay strictly inside the horizon
    r.z = max(0.1, r.z);  // volume scaling, avoiding samples too close to p

    ret.x = r.y * cos(r.x);
    ret.y = r.y * sin(r.x);
    ret.z = sqrt(max(0.0, 1.0 - dot(ret.xy, ret.xy)));
    return ret * r.z;
}

GLSL doesn't have a built-in random function we can use to generate these samples, so we are going to use a hash function instead: it derives pseudo-random values from the fragment's screen coordinates, combined with an index so that we can get more than one random value for the same coordinates. For that, you can use the following functions:

// PCG random generator for 3 16-bit unsigned ints
uvec3 pcg3d16(uvec3 v)
{
	v = v * 12829u + 47989u;

	v.x += v.y * v.z;
	v.y += v.z * v.x;
	v.z += v.x * v.y;

	v.x += v.y * v.z;
	v.y += v.z * v.x;
	v.z += v.x * v.y;

	v ^= v >> 16u;
	return v;
}

// Conversion function to move from floats to uints, and back
vec3 pcg3d16f(vec3 v)
{
    uvec3 uv = floatBitsToUint(v);
    uv ^= uv >> 16u; // Make the info be contained in the lower 16 bits

    uvec3 m = pcg3d16(uv);

    // A simple alternative: return vec3(m & 0xFFFFu) / vec3(0xFFFFu);

    // Construct a float with half-open range [0,1) using the mantissa bits.
    // All zeroes yields 0.0, all ones yields the next smallest representable value below 1.0.
    // From https://stackoverflow.com/questions/4200224/random-noise-functions-for-glsl
    const uint ieeeMantissa = 0x007FFFFFu; // binary32 mantissa bitmask
    const uint ieeeOne      = 0x3F800000u; // 1.0 in IEEE binary32

    // Since the pcg3d16 function only randomizes the lower 16 bits, we use only those,
    // shifted up to be the highest of the 23 mantissa bits
    m <<= 7u;
    m &= ieeeMantissa;           // Keep only mantissa bits (fractional part)
    m |= ieeeOne;                // Add fractional part to 1.0

    vec3 f = uintBitsToFloat(m); // Range [1,2)
    return f - 1.0;              // Range [0,1)
}

#define randf pcg3d16f

There are lots of different random/hash functions that can be used for this, but this one is usually fast and good enough for our purposes.

We suggest that you visualize only the result from the SSAO pass until it works, and only then use it to attenuate the shading. It might also be helpful to add some global variables (e.g., number of samples, sample radius, and whether to use SSAO at all) and create GUI sliders for them.
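Once you get to the attenuation step, the standard shading pass could read the SSAO result like this (a sketch; ssaoTexture and diffuse_indirect are assumed names, not necessarily those used in your shader):

uniform sampler2D ssaoTexture;

// ... in the fragment shader of the shading pass:
vec2 screen_uv = gl_FragCoord.xy / vec2(textureSize(ssaoTexture, 0));
float ao_visibility = texture(ssaoTexture, screen_uv).r;

// Scale only the diffuse indirect term, not the direct illumination
diffuse_indirect *= ao_visibility;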

Improving performance

Precomputing samples

This should already produce good results, but executing random number generators in the shader can be a bit slow, and with a bit of extra work we can get equally good results at a lower cost.

We will precompute the sample points on the CPU. In the initialize() function, generate a list of random samples. For that, you can sample the unit hemisphere with the function labhelper::cosineSampleHemisphere(), and scale the samples with labhelper::randf() to place them inside the volume.

Then, you can send these to the shader with labhelper::setUniformSlow(ssaoOutputProgram, "samples", number_of_samples, samples), and receive them in the shader with uniform vec3 samples[<max_number_of_samples>].

Finally, in each iteration of the loop, instead of calling sampleHemisphereVolumeCosine(), take the i-th value from this uniform array.
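On the shader side, this could look as follows (a sketch, assuming a compile-time constant for the maximum array size):

#define MAX_SSAO_SAMPLES 256

uniform vec3 samples[MAX_SSAO_SAMPLES];
uniform int nof_samples;

// ... in the SSAO loop, replacing the call to sampleHemisphereVolumeCosine(i):
vec3 s = tbn * samples[i];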

Removing aliasing

Since we reuse these samples for every pixel, there will be a lot of aliasing in the final result. To get rid of this, a common trick is to rotate the samples around the normal with a random rotation for each pixel.

The basic idea here is to take a random angle \(\theta\) between \(0\) and \(2\pi\) once per fragment, create a rotation matrix around the Z axis, and apply that to the sample point before doing the tbn transform.

Since this is only done once per fragment, you can use the randf function we defined earlier; just use one of the three coordinates it returns.
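As a sketch, reusing the randf hash from above (variable names are assumptions):

// One random angle per fragment
float theta = 2.0 * M_PI * randf(vec3(gl_FragCoord.xy, 0.0)).x;

// Rotation around the z axis (note that GLSL matrix constructors take columns)
mat3 rotation = mat3(cos(theta), sin(theta), 0.0,   // first column
                     -sin(theta), cos(theta), 0.0,  // second column
                     0.0, 0.0, 1.0);                // third column

// ... in the SSAO loop:
vec3 s = tbn * (rotation * samples[i]);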

Another option to get the random rotation value is to create a texture with random values and sample a different texel for each pixel. This texture can be fairly small (e.g. 64x64 px) and set to GL_REPEAT to get a value for every pixel of the screen. Alternatively, you could sample it with texelFetch(rndTex, ivec2(gl_FragCoord.xy) % 64, 0).r, which fetches the texel corresponding to the screen position and takes care of the repetition explicitly (texelFetch ignores the wrap mode).

Optional

When you are done, you should have an SSAO solution that gives nice looking results if you take many samples per pixel.

In reality, you can really only afford a few samples per pixel per frame, so common practice is to generate noisy results with a few samples and then do a bilateral blur on the SSAO output before using it. Give that a try if you have time; a sketch is given below.
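Here is a minimal sketch of such a depth-aware (bilateral) blur, done as one extra fullscreen pass (the texture names and the weight falloff constant are assumptions you would need to tune):

uniform sampler2D ssaoTexture;
uniform sampler2D depthTexture;

float blurredSsao(vec2 texCoord)
{
    vec2 texel_size = 1.0 / vec2(textureSize(ssaoTexture, 0));
    float center_depth = texture(depthTexture, texCoord).r;
    float sum = 0.0;
    float weight_sum = 0.0;
    for (int x = -2; x <= 2; x++)
    {
        for (int y = -2; y <= 2; y++)
        {
            vec2 uv = texCoord + vec2(x, y) * texel_size;
            // Weight down neighbors whose depth differs from the center,
            // so occlusion does not bleed across object silhouettes
            float depth_diff = abs(texture(depthTexture, uv).r - center_depth);
            float w = max(0.0, 1.0 - 1000.0 * depth_diff);
            sum += texture(ssaoTexture, uv).r * w;
            weight_sum += w;
        }
    }
    return sum / max(weight_sum, 1e-5);
}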


When done, show your result to one of the assistants. Have the finished program running and be prepared to explain what you have done.