VBO performance analysis

I wrote a little benchmark to compare a VBO with a basic loop of glVertex3fv calls.
I’m a bit disappointed by the results:
glVertex3fv loop, 100,000 points: 59.7 fps
glVertex3fv loop, 1,000,000 points: 19.9 fps
VBO, 100,000 points: 62.5 fps
VBO, 1,000,000 points: 31.2 fps
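
Converting these fps figures into milliseconds per frame and points per second makes the comparison easier to read. The sketch below only does that arithmetic on the four numbers quoted above; the names RESULTS, frame_time_ms and points_per_second are made up for illustration.

```python
# Figures quoted in the post above: (method, points drawn per frame, fps).
RESULTS = [
    ("glVertex3fv loop", 100_000, 59.7),
    ("glVertex3fv loop", 1_000_000, 19.9),
    ("VBO", 100_000, 62.5),
    ("VBO", 1_000_000, 31.2),
]


def frame_time_ms(fps: float) -> float:
    """Milliseconds per frame at the given frame rate."""
    return 1000.0 / fps


def points_per_second(points: int, fps: float) -> float:
    """Vertices drawn per second: points per frame times frames per second."""
    return points * fps


if __name__ == "__main__":
    for method, points, fps in RESULTS:
        print(f"{method:<17} {points:>9,} points: "
              f"{frame_time_ms(fps):6.2f} ms/frame, "
              f"{points_per_second(points, fps) / 1e6:5.2f} M points/s")
```

Worked by hand: about 16.75 ms versus 16.00 ms per frame at 100,000 points, but 50.25 ms versus 32.05 ms at 1,000,000 points. At 100,000 points both paths push only about 6 million points per second, far below the 19.9 and 31.2 million they reach at 1,000,000 points, so vertex submission is not what limits the 100,000-point frames.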

For 100,000 points I got almost no gain from using a VBO (62.5 vs 59.7 fps). Isn’t that strange?

I also measured the time spent in my glrender function in both cases. With the glVertex3fv loop it is very slow (almost all the frame time is spent there).
But with the VBO it is very fast, so the time is not spent there. What does that mean?

That means the CPU is not the bottleneck ;)

One great advantage of VBOs (and to some degree normal vertex arrays, too) is that you save a lot of CPU time. If your CPU has nothing else to do, you won’t notice a difference because the GPU is the limiting factor.

As soon as the CPU has other work to do, like calculating physics or AI, you will notice a performance drop with glVertex calls because of the time “wasted” in the drawing function, whereas with VBOs the drawing function returns immediately and the GPU is busy drawing while the CPU does something else.
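
The reasoning in the last two paragraphs can be put into a very simplified model: if the driver buffers commands so that CPU and GPU work in parallel, a frame takes roughly as long as the slower of the two. The sketch below assumes exactly that (frame time = max of CPU time and GPU time) and uses hypothetical per-frame costs chosen only for illustration; none of these numbers come from the benchmark above.

```python
def frame_time(cpu_submit_ms: float, cpu_other_ms: float, gpu_ms: float) -> float:
    """Frame time when CPU and GPU overlap: the slower one sets the pace.

    Simplifying assumption: the driver buffers commands, so the GPU draws
    while the CPU does the rest of its per-frame work.
    """
    return max(cpu_submit_ms + cpu_other_ms, gpu_ms)


# Hypothetical per-frame costs, for illustration only.
GPU_DRAW_MS = 16.0          # GPU time to draw the scene (same for both paths)
IMMEDIATE_SUBMIT_MS = 15.0  # CPU time spent inside the glVertex3fv loop
VBO_SUBMIT_MS = 0.1         # CPU time to issue a single VBO draw call

if __name__ == "__main__":
    for other in (0.0, 5.0, 10.0):  # physics, AI, ... per frame
        imm = frame_time(IMMEDIATE_SUBMIT_MS, other, GPU_DRAW_MS)
        vbo = frame_time(VBO_SUBMIT_MS, other, GPU_DRAW_MS)
        print(f"other CPU work {other:4.1f} ms: "
              f"glVertex loop {imm:5.1f} ms/frame, VBO {vbo:5.1f} ms/frame")
```

With no other CPU work both paths come out at 16 ms per frame (GPU-bound); with 5 ms or 10 ms of extra CPU work the glVertex path grows to 20 ms and 25 ms while the VBO path stays at 16 ms, which is the effect described above.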

Thanks for this clear answer.

My next question is about mixing geometry drawn with VBOs and geometry drawn with glVertex3f.

For example, I draw in the following order:
1. a few vertices with glVertex3f
2. a big array of vertices using a VBO
3. a few vertices with glVertex3f

What’s going to happen? Will the third step wait for the VBO to finish?

I’m no expert on the hardware details, but AFAIK the CPU pushes glVertex geometry etc. into the same pipeline that the VBO data ends up in.

So what I expect is this:

  1. (glVertex stuff) CPU sends commands to the pipeline, GPU processes them really quickly, pipeline remains virtually empty.
  2. (VBO) CPU sends one command to the pipeline, but that places a lot of work into the pipeline. CPU can continue, but the GPU has a backlog.
  3. (more glVertex stuff) CPU sends commands to the pipeline, behind the backlog of the GPU.
  4. CPU does other stuff or waits for the buffer swap, while the GPU gets some time to finish its work.

In other words: all calls (except for synchronisation calls) return as soon as their commands have been buffered, but everything is still executed in the order it was issued.
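
The four steps above can be mimicked with a toy FIFO model: each call costs the CPU a little time and returns once its command is buffered, and the GPU executes commands strictly in issue order, starting each one only when it has been issued and the previous one has finished. This is a sketch under those assumptions; real drivers batch commands and limit how far ahead the CPU may run, and the Command/simulate names and all costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Command:
    name: str
    cpu_ms: float  # time the CPU spends issuing the command
    gpu_ms: float  # time the GPU spends executing it


def simulate(commands):
    """Toy model of a FIFO command buffer between CPU and GPU.

    Returns (timeline, cpu_done_ms, gpu_done_ms), where each timeline entry
    is (name, issued_at, gpu_start, gpu_end) in milliseconds.
    """
    cpu_time = 0.0
    gpu_free_at = 0.0
    timeline = []
    for cmd in commands:
        cpu_time += cmd.cpu_ms                # CPU issues the call and moves on
        start = max(cpu_time, gpu_free_at)    # waits behind any GPU backlog
        gpu_free_at = start + cmd.gpu_ms
        timeline.append((cmd.name, cpu_time, start, gpu_free_at))
    return timeline, cpu_time, gpu_free_at


# The three drawing steps from the question, with made-up costs.
FRAME = [
    Command("1. a few glVertex3f calls", cpu_ms=0.05, gpu_ms=0.05),
    Command("2. one VBO draw call", cpu_ms=0.01, gpu_ms=10.0),
    Command("3. a few glVertex3f calls", cpu_ms=0.05, gpu_ms=0.05),
]

if __name__ == "__main__":
    timeline, cpu_done, gpu_done = simulate(FRAME)
    for name, issued, start, end in timeline:
        print(f"{name:<27} issued {issued:5.2f} ms, GPU {start:5.2f}-{end:5.2f} ms")
    print(f"CPU done issuing at {cpu_done:.2f} ms, GPU idle at {gpu_done:.2f} ms")
```

In this model step 3 is issued at about 0.11 ms, so the CPU never waits for the VBO, but the GPU only starts step 3 once the VBO draw finishes at about 10.1 ms. The gpu_done value is roughly where a blocking call such as glFinish would return.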

So you save a lot of CPU time, and if the CPU is the bottleneck, that increases the rendering speed, while the results stay correct.

Note: if the driver falls back to software rendering at some point, the pipeline would of course have to be flushed (if it is used at all).

But even if the pipeline had to be flushed before the glVertex calls were issued, the VBO would still be faster, simply because it avoids the per-vertex function call overhead.
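
The function call overhead argument is easy to quantify by counting API calls per frame. The sketch assumes immediate mode draws all points inside one glBegin/glEnd pair, and a legacy client-state VBO path of glBindBuffer, glEnableClientState, glVertexPointer, glDrawArrays and glDisableClientState; the 10 ns per-call cost is a hypothetical figure, and real costs depend on the CPU and driver.

```python
# Hypothetical CPU cost of one OpenGL call; real values vary by driver.
CALL_OVERHEAD_NS = 10.0


def immediate_mode_calls(num_points: int) -> int:
    """glBegin + one glVertex3fv per point + glEnd."""
    return num_points + 2


def vbo_calls() -> int:
    """glBindBuffer, glEnableClientState, glVertexPointer, glDrawArrays and
    glDisableClientState: the same count regardless of the number of points."""
    return 5


def submit_cost_ms(calls: int) -> float:
    """CPU time spent on call overhead alone, ignoring any data copying."""
    return calls * CALL_OVERHEAD_NS / 1e6


if __name__ == "__main__":
    n = 1_000_000
    for label, calls in (("glVertex3fv loop", immediate_mode_calls(n)),
                         ("VBO", vbo_calls())):
        print(f"{label:<17} {calls:>9,} calls, ~{submit_cost_ms(calls):.5f} ms")
```

Even one pipeline flush costs the VBO path a single wait, whereas the glVertex path pays the per-call overhead a million times per frame.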

A question related to this topic: display lists behave the same way as VBOs in that sense, don’t they? I mean that glCallList is a non-blocking command, so the CPU can do its own work after the call while the GPU does its own.