FBO performance: who is right?
Years ago, I saw Simon Green's presentation, which gave recommendations for FBO performance. The relevant slide is no. 29 in http://http.download.nvidia.com/developer/presentations/2005/GDC/OpenGL_Day/OpenGL_FrameBuffer_Object.pdf .
He recommends using one FBO and switching between textures.
But then Valve came along and told a very different story. The slides can be found here: https://developer.nvidia.com/sites/d...to%20Linux.pdf and the relevant one is no. 65. It says "Do not create a single FBO and then swap out attachments on it."
Now, I realize that (a) nothing replaces doing your own benchmarks and (b) Green's slides are years older than Valve's. So perhaps changes in the hardware mean that Green's preferred option is no longer the fastest one?
FBOs were designed to encapsulate the whole framebuffer state and provide a way to replace it all at once, so I would highly recommend having multiple FBOs instead of a single one whose attachments you swap. Why?
1. This is how the API was designed to be used
2. It usually results in fewer API calls
3. The driver doesn't have to re-validate framebuffer completeness and other internal state (these states are cached inside an FBO)
Sure, doing your own benchmarks is the best way to determine what's best for you. But even if the choice I've suggested is currently not the most efficient on some driver implementation, it is the one more likely to be optimized further, so in the long run it should be faster.
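To make the "fewer API calls" point concrete, here is a minimal sketch that only counts calls for the two strategies. It assumes each render target has one color and one depth texture. The gl* functions and GL types below are stand-ins defined in the sketch itself (they just increment a counter) so that it compiles without a GL context; the helper names calls_multi_fbo and calls_single_fbo_swap are made up for illustration. It says nothing about actual driver cost, which still has to be benchmarked.

```c
#include <assert.h>

/* Stand-ins for the OpenGL types, enums and entry points used below.
   In a real program these come from the GL headers; here each stub only
   counts how often it is called. */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef int GLint;
#define GL_FRAMEBUFFER       0x8D40u
#define GL_COLOR_ATTACHMENT0 0x8CE0u
#define GL_DEPTH_ATTACHMENT  0x8D00u
#define GL_TEXTURE_2D        0x0DE1u

static int g_api_calls = 0;

static void glBindFramebuffer(GLenum target, GLuint fbo) {
    (void)target; (void)fbo;
    ++g_api_calls;
}

static void glFramebufferTexture2D(GLenum target, GLenum attachment,
                                   GLenum textarget, GLuint tex, GLint level) {
    (void)target; (void)attachment; (void)textarget; (void)tex; (void)level;
    ++g_api_calls;
}

/* Strategy A: one FBO per render target, attachments set once at init time.
   Switching targets is a single bind per pass. */
static int calls_multi_fbo(const GLuint *fbos, int passes) {
    assert(passes >= 0);
    g_api_calls = 0;
    for (int i = 0; i < passes; ++i) {
        glBindFramebuffer(GL_FRAMEBUFFER, fbos[i]);
        /* ... draw pass i ... */
    }
    return g_api_calls;
}

/* Strategy B: one shared FBO, color and depth re-attached for every pass. */
static int calls_single_fbo_swap(GLuint fbo, const GLuint *color,
                                 const GLuint *depth, int passes) {
    assert(passes >= 0);
    g_api_calls = 0;
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    for (int i = 0; i < passes; ++i) {
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, color[i], 0);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                               GL_TEXTURE_2D, depth[i], 0);
        /* ... draw pass i ... */
    }
    return g_api_calls;
}
```

Under these assumptions, N passes cost N calls with one FBO per target and 1 + 2N calls with a single FBO. With only a color attachment per target the counts are nearly equal, so point 3 (re-validation) is likely the more important argument.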
(Off-topic: That's why I didn't really understand Valve's presentation about VAOs, as they recommend not using them. In practice VAOs are in fact faster if you have more than 1 or 2 attributes, and they should be preferred for the same reasons: that's what they are for, they need fewer API calls, and state can be cached in the object.)
Well, it doesn't really matter for anyone developing against recent core GL anyway.
Originally Posted by aqnuep
It matters. It is true that in Core Profile you need a VAO, but you can generate and bind one at application startup and never unbind it. Then, before each draw call, configure your arrays with GL_ARB_vertex_attrib_binding. This is a fast path for the NV driver.
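A rough sketch of that pattern, assuming all meshes share one interleaved layout (position and normal, three floats each), so the format only needs to be described once and each draw just rebinds the buffer. The entry points are the ones GL_ARB_vertex_attrib_binding provides, but here they are replaced by stand-in stubs that only record calls, so the sketch compiles without a GL context. init_vertex_state and prepare_draw are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the GL types and entry points; real code gets these from
   the GL headers or an extension loader. The stubs only record calls. */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef int GLint;
typedef int GLsizei;
typedef unsigned char GLboolean;
typedef ptrdiff_t GLintptr;
#define GL_FLOAT 0x1406u
#define GL_FALSE 0

static int g_vao_binds = 0;
static int g_buffer_binds = 0;
static GLuint g_bound_vertex_buffer = 0;

static void glGenVertexArrays(GLsizei n, GLuint *arrays) {
    for (GLsizei i = 0; i < n; ++i) arrays[i] = (GLuint)(i + 1);
}
static void glBindVertexArray(GLuint vao) { (void)vao; ++g_vao_binds; }
static void glEnableVertexAttribArray(GLuint index) { (void)index; }
static void glVertexAttribFormat(GLuint attrib, GLint size, GLenum type,
                                 GLboolean normalized, GLuint relative_offset) {
    (void)attrib; (void)size; (void)type; (void)normalized; (void)relative_offset;
}
static void glVertexAttribBinding(GLuint attrib, GLuint binding) {
    (void)attrib; (void)binding;
}
static void glBindVertexBuffer(GLuint binding, GLuint buffer,
                               GLintptr offset, GLsizei stride) {
    (void)binding; (void)offset; (void)stride;
    g_bound_vertex_buffer = buffer;
    ++g_buffer_binds;
}

/* Startup: create the one VAO, bind it for the lifetime of the application,
   and describe the (assumed) shared vertex layout on binding point 0. */
static GLuint init_vertex_state(void) {
    GLuint vao = 0;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    glEnableVertexAttribArray(0);                       /* position */
    glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, 0);
    glVertexAttribBinding(0, 0);

    glEnableVertexAttribArray(1);                       /* normal */
    glVertexAttribFormat(1, 3, GL_FLOAT, GL_FALSE, (GLuint)(3 * sizeof(float)));
    glVertexAttribBinding(1, 0);
    return vao;
}

/* Per draw: only the buffer behind binding point 0 changes. */
static void prepare_draw(GLuint vertex_buffer) {
    glBindVertexBuffer(0, vertex_buffer, 0, (GLsizei)(6 * sizeof(float)));
    /* glDrawArrays / glDrawElements would follow here. */
}
```

If meshes use different layouts, the glVertexAttribFormat calls would have to move into the per-draw path as well. Whether this is actually faster than switching VAOs depends on the driver.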
Originally Posted by thokra
Well, having just read the very brief passage in the Valve slides, I find a claim like "Slower than glVertexAttribPointer on all implementations" without any substantial data to back it up rather amusing - just like them calling a VAO a vertex attribute object. Someone clearly has a love for the spec terminology.
The idea of making more API calls and still reaching higher performance, even if true on all current implementations, just baffles me. It again leaves me wondering why the hell it is so difficult in OpenGL to have mandatory concepts perform as well as or better than code that apparently does exactly what those concepts are supposed to help you avoid.
I agree that binding as few VAOs as possible per frame is good, but binding a single VAO just to conform to the spec and then setting up your arrays every frame? That's simply messed up.
I believe the problem with using VAOs is the lack of distinction between "bind for modify" and "bind for draw". Because the driver has no idea what you're going to do, it has to set up all the state for every vertex attribute location (16 or 20 of them), whether that location is enabled or not, so that all the vertex queries and state are available to you for a modify operation. If a specialized "bind VAO for draw" call existed, the driver could skip setting up the state for all disabled locations and simply mark them as disabled.
That's probably why calling glVertexAttribPointer() on a small subset of locations is faster than a VAO switch, which sets up the state for all locations.
As far as I am aware, both are right; they are talking about two different scenarios.
Originally Posted by dv
If you switch between fewer render textures (of the same size) than the maximum number of FBO attachments, you can attach them all to a single FBO and switch quickly between them. This is a static attachment, typically done at init time, and it never changes.
What Valve is talking about is swapping out the attachments, which does not apply to the situation described above.
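One possible reading of the static-attachment approach, assuming the switch is done by selecting a color attachment with glDrawBuffer (the post itself does not name the call). As before, the GL types and functions are stand-in stubs that only record what was called, so the sketch compiles without a GL context. MAX_TARGETS, init_shared_fbo and select_target are hypothetical names; a real program would query GL_MAX_COLOR_ATTACHMENTS for the limit.

```c
#include <assert.h>

/* Stand-ins for the GL types, enums and entry points used below. */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef int GLint;
#define GL_FRAMEBUFFER       0x8D40u
#define GL_COLOR_ATTACHMENT0 0x8CE0u
#define GL_TEXTURE_2D        0x0DE1u
#define MAX_TARGETS 4 /* hypothetical limit, see GL_MAX_COLOR_ATTACHMENTS */

static int g_attach_calls = 0;
static GLenum g_draw_buffer = 0;

static void glBindFramebuffer(GLenum target, GLuint fbo) {
    (void)target; (void)fbo;
}
static void glFramebufferTexture2D(GLenum target, GLenum attachment,
                                   GLenum textarget, GLuint tex, GLint level) {
    (void)target; (void)attachment; (void)textarget; (void)tex; (void)level;
    ++g_attach_calls;
}
static void glDrawBuffer(GLenum buf) { g_draw_buffer = buf; }

/* Init time: attach every same-sized render texture to its own color
   attachment point. These attachments are never changed afterwards. */
static void init_shared_fbo(GLuint fbo, const GLuint *textures, int count) {
    assert(count > 0 && count <= MAX_TARGETS);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    for (int i = 0; i < count; ++i) {
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + (GLenum)i,
                               GL_TEXTURE_2D, textures[i], 0);
    }
}

/* Per pass: pick which attached texture receives the output.
   No attachment changes, so the FBO's attachment set stays the same. */
static void select_target(int index) {
    assert(index >= 0 && index < MAX_TARGETS);
    glDrawBuffer(GL_COLOR_ATTACHMENT0 + (GLenum)index);
}
```

Switching targets this way never touches the attachment list, which is the difference from the attachment swapping that Valve's slide warns against.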
By "switch quickly between them", do you mean attaching them once and then just selecting the one to be used with glDrawBuffer()?
Originally Posted by Mikkel Gjoel