As we all know, display lists have been deprecated since OpenGL 3.0. But their speed has never been matched, even though there is a bunch of new “speedy” stuff: instancing (for drawing huge numbers of similar objects), VAOs (for collecting state), UBOs (for collecting uniforms), bindless (for direct access to VBOs), etc.
Bindless graphics significantly reduces CPU cache misses. If we have a lot of VBOs in the scene, each of them has to be bound before drawing. Their IDs have to be translated into physical addresses, and that is the step bindless skips. BUT, if we have a lot of VBOs, we have A LOT OF FUNCTION CALLS, which makes our application CPU bound. Hundreds of thousands of function calls make the driver overhead enormous, and no extension, as far as I know, tries to solve that problem. Instancing is useful only for similar objects, and in many applications it is not suitable.
The only solution is to store the function calls in a “display list” and execute it with just one function call. I will illustrate this with a scene having more than 65K VBOs. Using a highly optimized approach with bindless access, the frame rate on a 9600GT is about 21 fps (up to 64 fps with view-frustum culling). Without bindless it is almost half that. What happens if we draw the scene with a single “old-fashioned” display list? The frame rate without culling is 152 fps. More than 7 times faster!!! Of course, this display list is not very useful. This is just an illustration.
The purpose of this long introduction is to draw attention to what I think is the greatest bottleneck in many applications today. In mine certainly!
Well, with this post I want to gather your opinions on a suggestion for a new revision of OpenGL, or an extension. I think it would be very useful to have a new kind of display list that stores a batch of commands that can be invoked with a single call. The list of commands can be restricted to: activating a shader program, setting uniforms, setting attributes, and activating and drawing a VBO. It would be even more efficient if each command had its own slot, so that changing one command would not affect the entire buffer.
Because this buffer object does not require data management (VBO reorganization or similar), maybe command buffer object (CBO) is a more suitable name for it.
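To make the slot idea a bit more concrete, here is a minimal CPU-only sketch in C++. It is not an OpenGL API and makes no GL calls: every name in it (CmdType, Command, CommandBuffer, FakeDevice, demo_draw_order) is made up for illustration, and FakeDevice merely records what a driver would have been asked to do. It only shows the two properties suggested above, assuming fixed-size slots: the whole batch is replayed by one call, and one slot can be overwritten without touching the rest of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical command kinds, mirroring the restricted list suggested in the post.
enum class CmdType : std::uint8_t { Nop, UseProgram, SetUniform, SetAttribute, DrawVbo };

// One fixed-size slot. 'target' stands for a program, uniform, attribute or VBO id;
// 'value' is a stand-in payload (a real design would need richer data).
struct Command {
    CmdType type;
    std::uint32_t target;
    float value;
};

// Stand-in for driver/GPU state: it only records what was executed.
struct FakeDevice {
    std::uint32_t program = 0;
    std::size_t state_changes = 0;
    std::vector<std::uint32_t> drawn;  // VBO ids in draw order
};

class CommandBuffer {
public:
    // Value-initialized slots have type == CmdType::Nop (enumerator value 0).
    explicit CommandBuffer(std::size_t slots) : slots_(slots) {}

    // Overwrite a single slot in place; all other slots stay untouched.
    void set(std::size_t slot, Command cmd) {
        assert(slot < slots_.size());
        slots_[slot] = cmd;
    }

    // The single "call" that replays every slot in order.
    void execute(FakeDevice& dev) const {
        for (const Command& c : slots_) {
            switch (c.type) {
            case CmdType::Nop:
                break;
            case CmdType::UseProgram:
                dev.program = c.target;
                ++dev.state_changes;
                break;
            case CmdType::SetUniform:
            case CmdType::SetAttribute:
                ++dev.state_changes;
                break;
            case CmdType::DrawVbo:
                dev.drawn.push_back(c.target);
                break;
            }
        }
    }

private:
    std::vector<Command> slots_;
};

// Builds a small buffer, executes it, patches one slot, then executes it again.
inline std::vector<std::uint32_t> demo_draw_order() {
    CommandBuffer cb(4);
    cb.set(0, {CmdType::UseProgram, 1, 0.0f});
    cb.set(1, {CmdType::SetUniform, 0, 0.5f});
    cb.set(2, {CmdType::DrawVbo, 10, 0.0f});
    cb.set(3, {CmdType::DrawVbo, 11, 0.0f});

    FakeDevice dev;
    cb.execute(dev);                          // first replay: VBOs 10 and 11
    cb.set(3, {CmdType::DrawVbo, 12, 0.0f});  // change only slot 3
    cb.execute(dev);                          // second replay: VBOs 10 and 12
    return dev.drawn;
}
```

In a real CBO the replay loop would live in the driver (or on the GPU), so the application would pay for one call per frame plus one small update per changed slot, instead of several calls per VBO.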