Deferred Rendering Optimization: keep a copy of buffers for objects that don't move?

Hi there,

I have a question about an optimization to my Deferred Rendering setup.

So I’ve built a game, and the majority of the time the camera stays locked in the same position. As a result, a large portion of the meshes on screen are drawn every frame to exactly the same locations, with no changes to their data in the gBuffers.

A thought I had: I could keep a set of “static” gBuffers composed of all the models that are not animated and do not move, plus a set of “final” gBuffers. Then, whenever the camera doesn’t move, I could simply skip rendering the static models and copy the saved “static” gBuffers into the “final” gBuffers before rendering the moving meshes, hopefully increasing performance whenever the camera is stationary (the majority of the time).

When I did a rough implementation, I was disappointed to find performance approximately the same. The cost was largely the blit I do every frame for each of my color attachments (positions, normals, AlbedoSpec). (I know I still need to handle depth, but I wanted to see whether this was worthwhile before continuing.) I’m currently using OpenGL 3.3; I know that if I upgraded I could use glCopyImageSubData, which would probably be faster.
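For reference, the copy step in my rough implementation looked something like this (the FBO handles and dimensions stand in for my actual setup, and a GL loader header is assumed to be included):

    // Blit each static G-buffer attachment into the "final" G-buffer.
    void copyStaticGBuffer(GLuint staticFBO, GLuint finalFBO,
                           GLsizei width, GLsizei height)
    {
        glBindFramebuffer(GL_READ_FRAMEBUFFER, staticFBO);
        glBindFramebuffer(GL_DRAW_FRAMEBUFFER, finalFBO);

        // One blit per color attachment: positions, normals, AlbedoSpec.
        for (int i = 0; i < 3; ++i) {
            glReadBuffer(GL_COLOR_ATTACHMENT0 + i);
            glDrawBuffer(GL_COLOR_ATTACHMENT0 + i);
            glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                              GL_COLOR_BUFFER_BIT, GL_NEAREST);
        }

        // Still to do: the depth blit, which would look like
        //   glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
        //                     GL_DEPTH_BUFFER_BIT, GL_NEAREST);
        // (depth blits require GL_NEAREST and matching depth formats).
    }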

What I’m interested in is whether anybody has any thoughts on this. Are there any tricks I could try, or is it simply the case that copying those large screen-sized textures will always be slower than rendering the actual meshes? I know I could (and probably should) switch my positions buffer to a depth buffer and do that optimization, but for this it doesn’t seem like it would tip the scales much.

Thanks in advance and sorry if any of this is foolish or obvious,
John

I know I could (and probably should) switch my positions buffer to a depth buffer and do that optimization, but for this it doesn’t seem like that would tip the scales much.

How would it not “tip the scales much”? The cost of a copy operation is primarily driven by the amount of data that gets copied.

Positions would typically be stored as at least 16-bit floats, and you need three coordinates; but since you can’t actually have a 48-bit pixel, the implementation will likely give you 64-bit pixels instead.

So every frame, you are copying 8 bytes per pixel times the resolution of your scene. Plus, you still need to copy the static depth buffer (for depth testing against the dynamic meshes, and the like).

8 bytes per pixel times the resolution of the scene (roughly 16 MB per frame at 1920×1080) is a much larger number than the zero bytes you would be copying if you reconstitute the position from just the depth buffer.

When it comes to optimizations, always aim for the low-hanging fruit first. Whether your scene is static or dynamic, reconstituting the position from the depth buffer rather than reading from a texture is almost always a performance win.
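As a sketch only (names like gDepth and invProjection are placeholders, and a standard perspective projection is assumed), the reconstruction in the lighting shader boils down to an unproject through the inverse projection matrix:

    // Sketch only: view-space position reconstructed from the depth
    // buffer in the lighting shader. gDepth, invProjection and the
    // function name are placeholders; a perspective projection is assumed.
    const char* kReconstructPositionGLSL = R"GLSL(
        uniform sampler2D gDepth;    // depth attachment sampled as a texture
        uniform mat4 invProjection;  // inverse of the camera projection matrix

        vec3 viewPositionFromDepth(vec2 uv)
        {
            float depth = texture(gDepth, uv).r;  // stored depth in [0, 1]

            // Back to normalized device coordinates in [-1, 1] on all axes.
            vec4 ndc = vec4(uv * 2.0 - 1.0, depth * 2.0 - 1.0, 1.0);

            // Unproject, then divide by w to undo the perspective transform.
            vec4 view = invProjection * ndc;
            return view.xyz / view.w;
        }
    )GLSL";

That drops the positions attachment entirely; the depth buffer you already have becomes the only input.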

Also, note that by copying into the depth buffer, rather than rendering into a freshly cleared one, it is possible (but hardly certain) that you might lose certain internal depth optimizations, such as hierarchical-Z culling.

Yes. Rather than just guessing the right things to optimize, I would highly recommend you profile first. Then optimize the biggest bottleneck. That’ll give you the most bang for your buck.

By profile, I don’t necessarily mean pulling out Nsight or Perf Studio. I mean simply adding “switches” to your app that allow you to dynamically (at runtime preferably, to support fast iteration) enable and disable various pieces of your draw loop. For instance:

- turn on/off whole render passes and specific blits, or
- turn off specific types of objects that are rendered, or
- turn off some particularly expensive feature like dynamic shadow generation, or lighting, or specific types of post-processing.
With a good suite of these debug switches, you can pretty quickly get a sense of what aspects of your rendering are the most expensive, and so focus your time where it’s going to make the most difference to reducing your draw time (and thus increasing your app’s performance).
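A minimal sketch of what those switches might look like (the pass functions are placeholders for whatever your draw loop actually does):

    // Placeholders for your existing pass functions.
    void drawGeometry();
    void copyStaticGBuffer();
    void drawShadowMaps();
    void doLighting();
    void doPostProcess();

    // One bool per toggleable chunk of the draw loop.
    struct DebugSwitches {
        bool geometryPass = true;   // G-buffer fill
        bool staticCopy   = true;   // the blit discussed above
        bool shadowPass   = true;   // dynamic shadow generation
        bool lightingPass = true;   // deferred lighting
        bool postProcess  = true;   // bloom, tone mapping, etc.
    };

    DebugSwitches gSwitches; // flip these from your key-input handler

    void renderFrame()
    {
        if (gSwitches.geometryPass) drawGeometry();
        if (gSwitches.staticCopy)   copyStaticGBuffer();
        if (gSwitches.shadowPass)   drawShadowMaps();
        if (gSwitches.lightingPass) doLighting();
        if (gSwitches.postProcess)  doPostProcess();
    }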

That beats speculatively changing whatever’s easiest first, which is like just throwing dice.

Of course, for these to be any use, you need a reliable way to time your draw frames (in msec). Disable VSync, let your draw loop rip, and then look at the deltas in these frame times as you flip switches to see where all of your time is going.
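If you don’t have frame timing yet, a minimal CPU-side sketch like the following (called once per frame, right after the buffer swap) is enough to compare switch configurations; GPU timer queries (GL_TIME_ELAPSED, core in 3.3) give more precise per-pass numbers, but frame deltas are a fine start:

    #include <chrono>
    #include <cstdio>

    using Clock = std::chrono::steady_clock;
    Clock::time_point gLastFrame = Clock::now();

    // Call once per frame, right after the buffer swap.
    void reportFrameTime()
    {
        Clock::time_point now = Clock::now();
        double ms = std::chrono::duration<double, std::milli>(now - gLastFrame).count();
        gLastFrame = now;
        std::printf("frame: %.2f ms\n", ms); // or average over N frames
    }

Averaging over a few hundred frames smooths out the noise when comparing two configurations.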

Thanks in advance and sorry if any of this is foolish or obvious,

No, not at all. Good question!