Vertex Array Speed

When I use VBOs to draw terrain patches, my frame-rate drops by almost 50%, as compared to immediate mode triangle-strips.

The VBO vert-pointer array is created only once, but the vertex and texture data is updated every frame.

Is it normal for VBOs to run so slowly, when updated every frame?

How do I speed it up?

Is it normal for VBOs to run so slowly, when updated every frame?
Um, yes.

Unless you create your VBOs specifically to be updated every frame (with stream-draw), you’re going to get significant slowdown.

Can you give some details of the stream-draw method?

I’ve tried glBufferDataARB() with both GL_STREAM_DRAW_ARB and GL_STATIC_DRAW_ARB, but there is no difference in frame-rate between the two usage hints.

Is there any reason I should try vertex arrays without VBOs?

Thanks.

Can you post the code you use to update the contents of the VBO? (BufferData/BufferSubData/MapBuffer, and anything surrounding it?)

Do you replace the entire contents of the VBO every frame, or only part of the contents?

A VBO should be faster than immediate mode even when GL_STREAM_DRAW usage is used.

You can copy the new dataset with glBufferDataARB() every frame, or use the mapping technique: glMapBufferARB()/glUnmapBufferARB().
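For the first approach, a minimal sketch of re-uploading the whole dataset each frame could look like this (vbo, vertices and vertexCount are placeholder names, not from the original post):

```c
// assumed to exist in your code: the VBO id and the CPU-side vertex data
extern GLuint vbo;
extern float *vertices;   // 3 floats per vertex
extern int vertexCount;

glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
// re-specify the whole buffer; GL_STREAM_DRAW_ARB tells the driver
// the data will be replaced roughly once per draw
glBufferDataARB(GL_ARRAY_BUFFER_ARB,
                vertexCount * 3 * sizeof(float),
                vertices,
                GL_STREAM_DRAW_ARB);
```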

Here is a code snippet using map/unmap:

// the VBO must already be bound to GL_ARRAY_BUFFER_ARB
float *ptr = (float*)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
if(ptr)
{
    // update dataset with given pointer to vertex buffer
    updateMyVBO(ptr);
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB); // release VBO after use
}

You can copy new dataset with glBufferDataARB() every frame.
No. glBufferData allocates a new buffer, so it is slow. It is better to use glBufferSubData to overwrite the existing buffer.
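A hedged sketch of the glBufferSubData variant (the buffer is assumed to have been created once with glBufferDataARB at its final size; vbo, vertices and vertexCount are illustrative names):

```c
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
// overwrite the existing storage instead of reallocating it
glBufferSubDataARB(GL_ARRAY_BUFFER_ARB,
                   0,                                // byte offset into buffer
                   vertexCount * 3 * sizeof(float),  // byte size of data
                   vertices);
```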

Originally posted by Overmind:
[quote]You can copy new dataset with glBufferDataARB() every frame.
No. glBufferData allocates a new buffer, so it is slow.
[/quote]It allocates new memory only if it has to, because the buffer size or usage changed or because the buffer is still in use by the GPU. Otherwise it will likely reuse the existing memory.

Here is my code.

--once--
//
glGenBuffersARB( 1, &BufferName[0] );
glBindBufferARB(GL_ARRAY_BUFFER_ARB, BufferName[0]);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, 2000 * 3 * sizeof(double), terrain.grid_verts, GL_STREAM_DRAW_ARB);
//
glGenBuffersARB( 1, &BufferName[1] );
glBindBufferARB(GL_ARRAY_BUFFER_ARB, BufferName[1]);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, 1000 * 2 * sizeof(float), terrain.tex_coords, GL_STREAM_DRAW_ARB);
//
-- every frame --
//
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
//
glBindBufferARB(GL_ARRAY_BUFFER_ARB, BufferName[0]);
glVertexPointer(3, GL_DOUBLE, 0, (char *) NULL);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, terrain.ngrid_verts * 3 * sizeof(double), terrain.grid_verts, GL_STREAM_DRAW_ARB);
//
glBindBufferARB(GL_ARRAY_BUFFER_ARB, BufferName[1]);
glTexCoordPointer(2, GL_FLOAT, 0, (char *) NULL);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, terrain.ntex_coords * 2 * sizeof(float), terrain.tex_coords, GL_STREAM_DRAW_ARB);
//
glDrawElements(GL_TRIANGLE_STRIP, terrain.ngrid_tstrip, GL_UNSIGNED_SHORT, terrain.grid_tstrip);
//
glDisableClientState(GL_VERTEX_ARRAY); 
glDisableClientState(GL_TEXTURE_COORD_ARRAY);

The GL_DOUBLE type is not natively supported by most GPUs, so the driver has to convert the data on the CPU during the rendering command. That is almost certainly the cause of the slowdown you see.
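If the terrain data has to stay in doubles on the CPU side (e.g. for precision during generation), one workaround is to convert into a float staging array before uploading, so the GPU can consume the data natively. A minimal sketch, with illustrative names:

```c
#include <stddef.h>

/* Convert a double-precision vertex array to single precision.
   The caller provides the destination array. */
static void verts_to_float(const double *src, float *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = (float)src[i];
}
```

The float array is then what gets handed to glBufferDataARB, with GL_FLOAT in the matching glVertexPointer call.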

Komat, you were right about the GL_DOUBLE, but it didn’t solve the problem completely.

Here is my frame-rate for various scenarios:

With VBOs (GL_DOUBLE)… 188
With VBOs (GL_FLOAT)… 224
Without VBOs… 292

Are there any more enhancements you can think of?

Thanks.

I wouldn’t be so sure the driver optimizes away the allocation. It doesn’t even optimize away redundant state changes…

I would at least try using glBufferSubData, to see if it makes a difference.

Originally posted by Overmind:
I wouldn’t be so sure the driver optimizes away the allocation. It doesn’t ever optimize away redundant state change…

What the driver optimizes depends on the cost of the change compared with the cost of the check (complexity of the check, frequency of the calls, probability that the check will avoid additional work).

At least the Nvidia driver optimizes this, according to this paper.

The allocation can still happen if the GPU is using the buffer, because otherwise the driver would have to wait. When replacing the content of an entire buffer that is currently in use by the GPU, glBufferSubData has the opposite problem: unless the driver detects that the entire buffer content is being replaced and optimizes that by allocating new memory, it needs to wait for the GPU. In both cases there will be either an allocation or a wait.
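One common way to sidestep the wait is to "orphan" the buffer: re-specify it with a NULL data pointer first, which lets the driver detach the old storage the GPU is still reading and give you a fresh block. A sketch, with placeholder names (vbo, bufSize, vertices):

```c
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
// orphan: same size and usage, NULL data; the old storage stays alive
// for the GPU while the driver allocates or reuses a fresh block
glBufferDataARB(GL_ARRAY_BUFFER_ARB, bufSize, NULL, GL_STREAM_DRAW_ARB);
// now fill the fresh storage without stalling on the GPU
glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, bufSize, vertices);
```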


I would at least try using glBufferSubData, to see if it makes a difference.

That is a good idea.

An additional thing to try would be to manually double-buffer the VBOs, in case the GPU's use of the buffer forces the driver to allocate memory or wait.
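Manual double-buffering could be sketched like this (vbos, bufSize and vertices are assumed names; the two buffers would be created up front the same way as in the original init code):

```c
// two VBOs created once at startup; alternate between them
GLuint vbos[2];
int current = 0;

// per frame: flip to the buffer the GPU is least likely to still be using
current = 1 - current;
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbos[current]);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, bufSize, vertices, GL_STREAM_DRAW_ARB);
glVertexPointer(3, GL_FLOAT, 0, NULL);
/* ... glDrawElements(...) as before ... */
```

While the GPU reads from one buffer, the CPU fills the other, so neither side has to wait for the other.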