PBuffers & Dynamic Cubemaps

This is more for information than a question (but perhaps someone can give some pointers re performance…)

I have finally gotten around to implementing dynamic cube maps in my current project (I’ve put a piccy here).

I initially went for the render-to-texture solution and found it very slow (GeForce3 Ti 200). So I then created a glCopyTexSubImage2D() path and found it even slower…? So I did some benchmarks to compare the different techniques.

In these tests I used the scene shown in the image above (very simple “map” with a single reflecting object). Collision detection has very little effect on performance, but to standardize the results I left the ball sitting stationary in the “Home” position.

The cube maps are updated every frame and stencil shadows are enabled in all the tests. The pbuffer is 64x64 pixels.

The tests I did (& the results) were…

1. Plain vanilla render, stencil shadows, no cube maps

  • 272 fps

2. Test 1 + just switching to the pbuffer, with no output to the pbuffer

  • 223 fps (~50 fps just to switch to the pbuffer?)

3. Test 2 + render a single side of the cube map (-Y) and don’t copy the data to the cube map (i.e. impact of rendering the scene to the pbuffer)

  • 220 fps

4. Test 3 + render-to-texture on

  • 180 fps

5. Test 3 + use glCopyTexSubImage2D()

  • 109 fps (Yikes!)

6. Test 2 + render two sides of the cube map (-Y, -X) and don’t copy the second side to the cube map (i.e. impact of rendering the scene to the pbuffer, copying 1 side to the cube map and then switching to another side)

  • 108 fps (Switching sides costs HEAPS - compare this to test 3 above)

7. Test 6 + render-to-texture on

  • 118 fps

8. Test 6 + use glCopyTexSubImage2D()

  • 73 fps

9. Render and copy all six sides using render-to-texture

  • 79 fps

10. Render and copy all six sides using glCopyTexSubImage2D()

  • 32 fps

11. Use the primary frame buffer (i.e. don’t use pbuffers) and render a single side of the cube map - update with glCopyTexSubImage2D().

  • 233 fps

12. Same as 11 but do two sides of the cube map.

  • 205 fps

13. Same as 11 but do all 6 sides

  • 138 fps
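A side note on reading these numbers: fps differences are not additive, so losing 50 fps at 272 fps is a much smaller cost than losing 50 fps at 100 fps. Converting to milliseconds per frame makes the steps directly comparable. The helper names below (frame_ms, added_cost_ms, print_costs) are made up for illustration; the fps values are the ones listed above.

```c
#include <stdio.h>

/* Milliseconds spent on one frame at a given frame rate. */
double frame_ms(double fps)
{
    return 1000.0 / fps;
}

/* Extra time per frame that a test adds over its baseline, in ms. */
double added_cost_ms(double baseline_fps, double test_fps)
{
    return frame_ms(test_fps) - frame_ms(baseline_fps);
}

/* Print the per-frame cost of a few steps from the list above. */
void print_costs(void)
{
    printf("switch to pbuffer (1 -> 2):        %.2f ms\n",
           added_cost_ms(272.0, 223.0));
    printf("render-to-texture (3 -> 4):        %.2f ms\n",
           added_cost_ms(220.0, 180.0));
    printf("glCopyTexSubImage2D (3 -> 5):      %.2f ms\n",
           added_cost_ms(220.0, 109.0));
    printf("second side in pbuffer (3 -> 6):   %.2f ms\n",
           added_cost_ms(220.0, 108.0));
}
```

Calling print_costs() from a main() prints the four costs. By this measure the switch to the pbuffer alone works out to roughly 0.8 ms per frame.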

My conclusion from all this is that I will not use pbuffers. What’s the point? In a game you make the window topmost so you won’t lose any of the framebuffer, and pbuffer performance on NVIDIA is woeful. I’ll leave it configurable just in case. I’ll also update a single side of the cube map each frame, rather than all six at once.
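For anyone wanting to try the one-side-per-frame idea, here is a minimal sketch of the bookkeeping, not the code from my project. It assumes a simple round-robin over the six faces; face_for_frame and face_target are hypothetical names. The one GL fact it relies on is that the six cube map face targets are consecutive enum values starting at GL_TEXTURE_CUBE_MAP_POSITIVE_X (0x8515), in the order +X, -X, +Y, -Y, +Z, -Z.

```c
/* Value of GL_TEXTURE_CUBE_MAP_POSITIVE_X; the five other face targets
   follow it consecutively (-X, +Y, -Y, +Z, -Z). */
#define CUBE_FACE_BASE  0x8515u
#define CUBE_FACE_COUNT 6u

/* Index (0..5) of the face to refresh on a given frame (round-robin). */
unsigned face_for_frame(unsigned long frame)
{
    return (unsigned)(frame % CUBE_FACE_COUNT);
}

/* Texture target to hand to glCopyTexSubImage2D() for that face. */
unsigned face_target(unsigned face_index)
{
    return CUBE_FACE_BASE + face_index;
}
```

Each frame you would render only the view for face_for_frame(frame) and copy it to face_target(...). The trade-off is that any given face can be up to five frames stale, which may show on fast-moving reflections.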

[This message has been edited by rgpc (edited 03-15-2003).]

yes this is no wonder, using pbuffers and uploading to cube map is terribly slow on nvidia cards, dunno about ati
anyway there are some cases one simply can't use the back buffer and this still remains a big unsolved issue
i found it very interesting to read the benchmarks tho
thanks for spreading your results!

If you haven’t already done so, try the 43.00 NVIDIA drivers. The speed of rendering to cube map has improved compared to older driver versions (in particular 41.09).
But it’s still not nearly as fast as copying directly from the framebuffer and not using any pbuffers. The pbuffer context switches are just too expensive.

anyone with an ati card to compare these results?

Originally posted by tellaman:
yes this is no wonder, using pbuffers and uploading to cube map is terribly slow on nvidia cards, dunno about ati

Yes, I always thought it was only an issue with render-to-texture, but just the magnitude of the speed decrease makes me think that pbuffers just aren’t practical for gamedev (just switching to the pbuffer wiped 20% off my frame rate).

But I guess if you were just using it to update textures you could switch to a pbuffer, update all the textures you needed to update, then switch back, and the 20% hit wouldn’t be too serious.
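A rough way to put numbers on that, as a sketch only: assume the switch is a fixed cost paid per round trip to the pbuffer and each texture update costs the same. That model is an assumption (my own test 6 suggests changing cube map sides inside the pbuffer is expensive too, so it may be optimistic), and the function names are made up.

```c
/* Per-frame cost in ms when every texture update makes its own round
   trip to the pbuffer: the switch is paid once per texture. */
double interleaved_ms(double switch_ms, double update_ms, int textures)
{
    return textures * (switch_ms + update_ms);
}

/* Per-frame cost in ms when all updates share a single round trip:
   the switch is paid once, then each update adds its own cost. */
double batched_ms(double switch_ms, double update_ms, int textures)
{
    return switch_ms + textures * update_ms;
}
```

Under this model the saving from batching is (textures - 1) * switch_ms per frame, so it only matters once you update more than one texture per frame.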

anyway there are some cases one simply can't use the back buffer and this still remains a big unsolved issue

My piccy shows one such case - at the very top you can see some “junk” which is caused by my window not being always on top. The Creative app bar is obscuring the window and eating some of the back buffer. Windowed mode is another (window partially off screen), and then there’s the case where you need to render an image larger than the frame buffer. But all these cases can probably be ignored for your average game (games generally run full screen, and you would sacrifice quality for speed and use lower-resolution dynamic textures).

i found it very interesting to read the benchmarks tho
thanks for spreading your results!

You’re welcome. I haven’t seen these types of tests before (no doubt they’re out there somewhere) and I thought people might find it interesting. It would be interesting to see the same tests done on an ATI card.

(Hey ATI, you can just send me a 9700 Pro and I’ll test it for you)

All in all it was a worthwhile exercise because it gave me some handy optimizations for the reflections etc.

[This message has been edited by rgpc (edited 03-16-2003).]