stream/dynamic VBOs and performance

What are “best practices” for using vertex buffer objects with dynamically updated data?

As I read the extension spec, glMapBuffer will always stall until all drawing commands that use that buffer have completed. If the driver is brain-dead it might even stall until ALL drawing is completed. Anyone know if it’s that bad?

glBufferSubData probably stalls until all drawing with the buffer completes. If I used only glDrawRangeElements, a really smart driver might be able to figure out that the drawing and the update don’t overlap and avoid the stall. However, unless someone can definitively confirm it, it doesn’t seem wise to count on this. Anyone know? And even if it can avoid the stall, it has the downside of copying data rather than letting me write it directly into a buffer.

From this, I’m concluding that unlike DirectX with its D3DLOCK_NOOVERWRITE flag, OpenGL offers no performance-friendly multi-vendor way to update only part of a vertex buffer object if any part of it has been used for drawing. Thus, each block of dynamically-updated data should be in its own buffer that’s either replaced completely or not modified at all. Correct?

There are several strategies I can see here:

1. Always allocate a new vertex buffer object of the correct size. Delete it when it’s no longer used. Update it as needed with glBufferData( …NULL… ) + glMapBuffer, which will cause a new allocation within the driver.

2a. Have a pool of unused vertex buffers sitting around. When I need new buffer space, grab the best-fit from that pool (or allocate if necessary) and discard/replace its contents. When I’m done with a buffer, it goes into a queue. Queued buffers get returned to the free pool after buffer swaps (or fences if available) complete. Replacing the contents of an existing buffer is done with glBufferData on that buffer.

2b. Same as (2a), but replacing the contents of an existing buffer is done by tossing the current buffer into the unused queue and grabbing a new one from the free pool.

My guess is that for dynamic data (used multiple times for each time it’s updated), approach 1 would be the best. For streamed data (written once, rendered once, and then discarded), 2a and 2b are equivalent and would be the best solution.
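To make strategy 2a concrete, here is a minimal sketch of the pool bookkeeping only, with no GL calls. The names StreamBufferPool, PooledBuffer and framesInFlight are hypothetical, and counting buffer swaps is an assumed stand-in for "the GPU has finished with this buffer"; a fence, where available, would be more precise.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>

// Hypothetical stand-in for a GL buffer object. Real code would store the
// name returned by glGenBuffersARB instead of a counter.
struct PooledBuffer {
    unsigned id;
    std::size_t capacity;
};

class StreamBufferPool {
public:
    // framesInFlight is an assumption: how many buffer swaps must pass
    // before a released buffer is treated as safe to overwrite.
    explicit StreamBufferPool(unsigned framesInFlight = 2)
        : framesInFlight_(framesInFlight) {}

    // Best fit: the smallest free buffer that holds `bytes`, else a new one.
    PooledBuffer acquire(std::size_t bytes) {
        assert(bytes > 0);
        auto it = free_.lower_bound(bytes);
        if (it != free_.end()) {
            PooledBuffer reused{it->second, it->first};
            free_.erase(it);
            return reused;
        }
        // Real code would create and size a GL buffer here.
        return PooledBuffer{nextId_++, bytes};
    }

    // The caller has issued its last draw with `b`; the GPU may still read it.
    void release(PooledBuffer b) { pending_.push_back(Pending{b, frame_}); }

    // Call once per buffer swap. Moves sufficiently old buffers to the free pool.
    void endFrame() {
        ++frame_;
        while (!pending_.empty() &&
               pending_.front().frame + framesInFlight_ <= frame_) {
            const PooledBuffer& b = pending_.front().buffer;
            free_.insert({b.capacity, b.id});
            pending_.pop_front();
        }
    }

    std::size_t freeCount() const { return free_.size(); }

private:
    struct Pending {
        PooledBuffer buffer;
        unsigned long frame;  // frame in which the buffer was released
    };

    unsigned framesInFlight_;
    unsigned long frame_ = 0;
    unsigned nextId_ = 1;
    std::multimap<std::size_t, unsigned> free_;  // capacity -> buffer id
    std::deque<Pending> pending_;                // oldest release first
};
```

Strategy 2b would use the same class: instead of refilling a buffer in place, release() it and acquire() another one.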

But I’m just speculating here. Does anybody know for sure?

“glMapBuffer will always stall until all drawing commands that use that buffer have completed.”

But BufferData(…NULL) does not stall, so that’s how you get around the stalling question.
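The sequence described here (respecify the store with a NULL pointer, then map and fill it) might look like the sketch below. BufferOps and streamUpload are hypothetical names; the three callbacks stand in for the real GL calls named in the comments so the logic can be read and exercised without a GL context. Whether the NULL respecification really avoids a stall depends on the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Hypothetical wrapper over the GL calls involved. With
// ARB_vertex_buffer_object and the buffer already bound they would be:
//   orphan(n) -> glBufferDataARB(GL_ARRAY_BUFFER_ARB, n, NULL, usage)
//   map()     -> glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB)
//   unmap()   -> glUnmapBufferARB(GL_ARRAY_BUFFER_ARB) == GL_TRUE
struct BufferOps {
    std::function<void(std::size_t)> orphan;
    std::function<void*()> map;
    std::function<bool()> unmap;
};

// Replaces the whole buffer with `bytes` bytes from `src`.
// Returns true once the data has been written and unmapped successfully.
bool streamUpload(const BufferOps& gl, const void* src, std::size_t bytes,
                  int maxAttempts = 2) {
    assert(src != nullptr && bytes > 0);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        // Ask for a fresh data store; pending draws keep using the old one.
        gl.orphan(bytes);
        void* dst = gl.map();
        if (dst == nullptr) {
            return false;  // mapping failed
        }
        std::memcpy(dst, src, bytes);
        // Unmap reports failure if the store was corrupted while mapped;
        // in that case the contents are undefined, so write them again.
        if (gl.unmap()) {
            return true;
        }
    }
    return false;
}
```

Because the whole store is replaced on every call, this fits data that is rewritten completely each time, not partial updates.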

I believe that, if you look under the covers, options 1, 2a and 2b end up doing pretty much the same work, except in some cases, the work’s done by the driver, and in others, it’s done by you.

FWIW, we use option 1 for pretty much everything, and it seems to do OK (if you have later driver versions where VBO support is actually reasonably well implemented).

Hi,

“But BufferData(…NULL) does not stall, so that’s how you get around the stalling question.”
Thanks for the hint. I just replaced my glMapBufferARB call with a glBufferDataARB call to update my VBO data, and there’s quite a big performance leap in my application: from 43 fps to 140 fps (Athlon XP, Radeon 9600 AGP 4x, Catalyst 4.5).

I am using a VBO to draw some procedural geometric data several times on screen. I didn’t notice any performance difference between a static VBO and a dynamic one.

Using a stream VBO raises the performance to 150 fps. (Can anyone explain why?)

Here’s the mapping/unmapping code:

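// Returns a write pointer for sub-buffer buffer_ind.
void* rdr_wgl_VertexStreamVBO::MapBuffer( int buffer_ind )
{
#ifdef MAPBUFFERS
	// Map the VBO once; later calls reuse the same mapping.
	// No glBufferDataARB( ..., NULL, ... ) precedes the map, so this call
	// may wait for pending draws that use the buffer.
	if (!map_counter)
	{
		glBindBufferARB( GL_ARRAY_BUFFER_ARB, bufferID );
		bufferadd = (char*) glMapBufferARB( GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB );
		map_counter++;
	}

	return (void*) (bufferadd + buffer[buffer_ind].offset);
#else
	// Write into the system-memory copy; UnMapBuffers() uploads it.
	return (void*) (vbodata + buffer[buffer_ind].offset);
#endif
}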


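void rdr_wgl_VertexStreamVBO::UnMapBuffers()
{
#ifdef MAPBUFFERS
	// Rebind first in case another buffer was bound since MapBuffer().
	glBindBufferARB( GL_ARRAY_BUFFER_ARB, bufferID );
	glUnmapBufferARB( GL_ARRAY_BUFFER_ARB );
	map_counter = 0;
#else
	// Respecify the whole data store from the system-memory copy.
	glBindBufferARB( GL_ARRAY_BUFFER_ARB, bufferID );
	glBufferDataARB( GL_ARRAY_BUFFER_ARB, sz_buffers, vbodata, convert_vertexstream_type_to_gl[(int) type] );
#endif
}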