Many Shadows System
Introduction
When our group was setting goals for the engine, one topic always came up: realism. For us this meant that the lighting should be as accurate to real life as possible. In real life every light source casts shadows, so every light source in our engine should cast shadows too. The problem is that dynamic shadows are expensive to render in real time. The solution came in the form of shadow mapping, a tried and tested method for rendering shadows that isn't difficult to implement.
The technique works in the following steps:
- Place a camera at the light source, with a view matrix that matches the light.
- Run the vertex shader from that camera to construct the geometry and render it to a depth texture, in other words the shadow map.
- When the pixel shader runs the light calculations, transform the position of the pixel being shaded into the light's space so that it can be compared against the shadow map.
- Check whether the transformed position is in shadow and, if it is, multiply the color value in the pixel shader by a shadow value to darken the pixel.
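The last two steps can be sketched on the CPU in C++ to make the math concrete. This is only an illustration, not the engine's shader: `Vec4`, `Mat4`, `ShadowMap`, `shadowFactor`, the bias and the shadow value are all made-up names and numbers, and the sketch assumes Direct3D conventions (depth in the range 0 to 1, texture v pointing down).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal math types; a real engine would use its own.
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>;  // row-major, used as M * v

Vec4 transform(const Mat4& m, const Vec4& v) {
    return {
        m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3] * v.w,
        m[4] * v.x + m[5] * v.y + m[6] * v.z + m[7] * v.w,
        m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11] * v.w,
        m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15] * v.w,
    };
}

struct ShadowMap {
    int size;                  // width and height in texels
    std::vector<float> depth;  // size * size depth values as seen from the light
    float at(int x, int y) const {
        return depth[static_cast<std::size_t>(y) * size + x];
    }
};

// Returns the factor the lit color is multiplied by:
// 1.0 when the point is lit, shadowValue when it is in shadow.
float shadowFactor(const Mat4& lightViewProj, const ShadowMap& map,
                   const Vec4& worldPos,
                   float bias = 0.002f, float shadowValue = 0.3f) {
    assert(map.size > 0);
    // Step 3: move the shaded position into the light's clip space.
    const Vec4 clip = transform(lightViewProj, worldPos);
    if (clip.w <= 0.0f) return 1.0f;  // behind the light, treat as lit

    // Perspective divide, then map x/y from [-1, 1] to texture space [0, 1].
    const float u = (clip.x / clip.w) * 0.5f + 0.5f;
    const float v = -(clip.y / clip.w) * 0.5f + 0.5f;  // v points down
    const float depthFromLight = clip.z / clip.w;
    if (u < 0.0f || u >= 1.0f || v < 0.0f || v >= 1.0f ||
        depthFromLight < 0.0f || depthFromLight > 1.0f) {
        return 1.0f;  // outside the shadow map, treat as lit
    }

    // Step 4: something closer to the light was stored here, so we are in shadow.
    const int px = static_cast<int>(u * map.size);
    const int py = static_cast<int>(v * map.size);
    return (depthFromLight - bias > map.at(px, py)) ? shadowValue : 1.0f;
}
```

The small bias is a common way to stop a surface from shadowing itself because of limited depth precision; the right value depends on the scene.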
You can read more about shadow mapping here. It gets tricky when you need to calculate many lights and their shadows, because these steps have to be executed for every shadow map in the engine. That means running the vertex shader to construct the geometry of each model and then writing it to a depth texture, once per light.
The Solution
There are many different approaches to simulating shadows efficiently, but almost all of them include the use of static shadow maps or offline rendering. Whilst this is probably the best solution, I did not have the time to revise how the lights were implemented to support static shadows. Therefore, I implemented multiple smaller optimizations to render dynamic shadows for every light.
Threaded Culling System
The most common optimization is to only render what each shadow camera sees. This means running some form of simple distance culling to check whether a model is relevant to the shadow camera. This is trivial to do, but there is a lot we can improve on. One big improvement is to multi-thread it. Keep in mind while threading that you can't touch the same data from several threads without some form of thread safety. That is not an issue here, because each shadow camera holds its own list of relevant models. This allows us to dedicate a thread to culling the models for a single camera. That is what makes it thread-safe: each thread performs a job on data that is local to it.
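A minimal sketch of what such a per-camera distance cull might look like is shown here. The types `Vec3`, `Model`, `ShadowCamera` and the function `cullForCamera` are hypothetical, and the bounding-sphere test is an assumption about what "simple distance culling" means; the point is that the function only writes to the camera it was given.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical model description: a bounding sphere is enough for a distance test.
struct Model {
    Vec3 center;
    float radius;
};

struct ShadowCamera {
    Vec3 position;
    float range;                       // how far the light reaches
    std::vector<std::size_t> visible;  // indices of relevant models, owned by this camera
};

// Reads the shared model list and writes only to cam.visible, so one call
// per camera can run on its own thread without any locking.
void cullForCamera(ShadowCamera& cam, const std::vector<Model>& models) {
    assert(cam.range >= 0.0f);
    cam.visible.clear();
    for (std::size_t i = 0; i < models.size(); ++i) {
        const Model& m = models[i];
        const float dx = m.center.x - cam.position.x;
        const float dy = m.center.y - cam.position.y;
        const float dz = m.center.z - cam.position.z;
        const float reach = cam.range + m.radius;
        // Compare squared distances to avoid a square root per model.
        if (dx * dx + dy * dy + dz * dz <= reach * reach) {
            cam.visible.push_back(i);
        }
    }
}
```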
Step by step:
- Cull all the lights for the frame.
- Iterate over all the light types and access the shadow camera for each light.
- Queue a job on a thread pool that runs the culling function for that shadow camera.
- Wait for all of the jobs to complete, then run the vertex shader for each shadow camera.
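The queue-and-wait part of these steps can be sketched roughly as follows. The engine uses its own thread pool, which is not shown in this write-up, so `std::async` stands in for it here; `ShadowCamera`, `queueCullingJobs` and `waitForJobs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical, stripped-down shadow camera: only what the sketch needs.
struct ShadowCamera {
    float range;
    std::vector<std::size_t> visibleModels;  // written by exactly one job
};

using CullFn = std::function<void(ShadowCamera&)>;

// Starts one culling job per shadow camera. std::async stands in for the
// engine's thread pool. The cameras vector must not be resized while jobs run,
// because each job holds a reference to its camera.
std::vector<std::future<void>> queueCullingJobs(std::vector<ShadowCamera>& cameras,
                                                CullFn cull) {
    assert(cull != nullptr);
    std::vector<std::future<void>> jobs;
    jobs.reserve(cameras.size());
    for (ShadowCamera& camera : cameras) {
        jobs.push_back(std::async(std::launch::async,
                                  [&camera, cull] { cull(camera); }));
    }
    return jobs;
}

// Blocks until every job has finished; after this the cameras are safe to render.
void waitForJobs(std::vector<std::future<void>>& jobs) {
    for (std::future<void>& job : jobs) {
        job.get();
    }
    jobs.clear();
}
```

Because every job only touches its own camera, no mutex is needed; the only synchronization point is the wait before rendering.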
This is a good system for culling shadow casters asynchronously, but we can do more. The GPU is idle while all of the queued culling jobs are executing, and that is performance left on the table. One thought is to render from multiple threads, but the DirectX 11 immediate context cannot be used from several threads at once, and multi-threaded rendering would require deferred contexts. Instead, we can render the shadow cameras in batches, one light type at a time, while the culling jobs for the next light type are still running. While this won't be as efficient as using deferred contexts, it should speed up the process somewhat.
How it's done:
- Run the culling jobs for the spotlights.
- Run the culling jobs for the pointlights.
- Wait for the spotlight jobs to finish, then render each shadow camera.
- Run the culling jobs for the arealights.
- Wait for the pointlight jobs to finish, then render each shadow camera.
- Wait for the arealight jobs to finish, then render each shadow camera.
You can see a flowchart of this in the image below.
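The same ordering can be written down as a short C++ sketch. It is simplified to one culling job per light type (the engine queues one per shadow camera), `std::async` again stands in for the thread pool, and `LightGroup` and `renderShadowsInterleaved` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <future>

// Hypothetical grouping of all shadow cameras of one light type.
struct LightGroup {
    std::function<void()> cull;    // runs on a worker thread
    std::function<void()> render;  // runs on the thread that owns the immediate context
};

// Interleaves culling and rendering so the GPU gets work while later
// light types are still being culled.
void renderShadowsInterleaved(LightGroup& spot, LightGroup& point, LightGroup& area) {
    assert(spot.cull && point.cull && area.cull);
    assert(spot.render && point.render && area.render);

    std::future<void> spotJob = std::async(std::launch::async, spot.cull);
    std::future<void> pointJob = std::async(std::launch::async, point.cull);

    spotJob.get();   // spotlights are culled
    spot.render();   // render them while pointlights are still culling

    std::future<void> areaJob = std::async(std::launch::async, area.cull);

    pointJob.get();
    point.render();  // render pointlights while arealights are culling

    areaJob.get();
    area.render();
}
```

All rendering stays on one thread, which is what makes this workable with a single immediate context.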
Shadow Map LODs
When the pixel shader runs for any of the light types and reaches the shadow calculations, it needs to sample the light's depth texture. The higher the resolution of the shadow map, the more expensive it is to render and to sample. In my case, I created one shadow map with a fixed resolution for each shadow camera. This is not great, because a light that is far from the main camera gets exactly the same shadow map resolution as a light close to the main camera.
The solution to this problem is a single shadow map with LODs (levels of detail) that is shared between all of the shadow cameras. Based on the distance between the shadow camera and the main camera, we assign each shadow camera a different region of UV coordinates in the shadow map. This ensures that a shadow camera far from the main camera uses fewer pixels of the shadow map than a shadow camera close to the main camera. A shared shadow map can be constructed in many ways, depending on how you subdivide its area.
This is how my shared shadow map is subdivided. In my case this works great because of the number of shadows we cast.
The shared shadow map is an 8192 x 8192 depth texture and can support up to 184 shadow cameras. We can extend this to create more LODs if needed.
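To make the idea concrete, here is a C++ sketch of one possible subdivision. The exact layout used in the engine is not spelled out in the text, so the layout here is an assumption: 8 tiles of 2048, 16 of 1024, 32 of 512 and 128 of 256 pixels, which fills an 8192 x 8192 texture exactly and happens to add up to 184 tiles. The distance thresholds in `lodForDistance` are invented for illustration, and all names are hypothetical.

```cpp
#include <cassert>
#include <vector>

constexpr int kAtlasSize = 8192;

struct Tile { int x, y, size; };           // square region in pixels
struct UvRect { float u0, v0, u1, v1; };   // the same region in texture space

// Fills a rectangular part of the atlas with square tiles of one size.
void fillRegion(std::vector<Tile>& out, int x0, int y0,
                int width, int height, int tileSize) {
    assert(width % tileSize == 0 && height % tileSize == 0);
    for (int y = y0; y < y0 + height; y += tileSize) {
        for (int x = x0; x < x0 + width; x += tileSize) {
            out.push_back(Tile{x, y, tileSize});
        }
    }
}

struct Atlas {
    std::vector<Tile> lod[4];  // lod[0] = highest resolution
};

// Assumed layout, not necessarily the engine's.
Atlas buildAtlas() {
    Atlas atlas;
    fillRegion(atlas.lod[0], 0,    0,    4096, 8192, 2048);  // 2 x 4  = 8 tiles
    fillRegion(atlas.lod[1], 4096, 0,    4096, 4096, 1024);  // 4 x 4  = 16 tiles
    fillRegion(atlas.lod[2], 4096, 4096, 2048, 4096, 512);   // 4 x 8  = 32 tiles
    fillRegion(atlas.lod[3], 6144, 4096, 2048, 4096, 256);   // 8 x 16 = 128 tiles
    return atlas;
}

// Picks a LOD from the distance between shadow camera and main camera.
// The thresholds are placeholders and would need tuning per game.
int lodForDistance(float distance) {
    if (distance < 10.0f) return 0;
    if (distance < 25.0f) return 1;
    if (distance < 50.0f) return 2;
    return 3;
}

// Converts a tile to the UV rectangle the pixel shader should sample from.
UvRect toUv(const Tile& t) {
    const float inv = 1.0f / static_cast<float>(kAtlasSize);
    return UvRect{t.x * inv, t.y * inv, (t.x + t.size) * inv, (t.y + t.size) * inv};
}
```

With a layout like this, each shadow camera renders into its tile's viewport and the pixel shader remaps its shadow UVs into the tile's rectangle; when a LOD runs out of free tiles, a camera would have to fall back to a lower one.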
Summary
Shadows are expensive and complicated, but they are well worth it. I am glad we decided to set such high standards for the lighting. I've learnt much more about threading and how to avoid bottlenecks with the GPU. These kinds of threaded systems would not have been possible without my memory pool. For future work, I want to implement a static shared shadow map, because it would give a huge performance increase. DirectX 11 also has something called a deferred context, which would allow for multi-threaded rendering. After reading more about clustered shading, I found out that you can use the clusters to accelerate the shadow calculations as well. There are so many different optimizations to implement, and I find that fascinating.