Does swapping VBOs cause CPU-GPU sync?

I’m trying as hard as I can to avoid CPU-GPU synchronization.

I may need to issue a draw call with a particular VAO bound, swap the VBO in that VAO (so a different VBO feeds the same vertex attribute input of the vertex shader), and then draw again with the same VAO and the new VBO bound.

I need this for per-instance attributes. Instances are batched by texture and cubemap, so I need a VBO for every texture/cubemap combination that a particular model may use.

Will swapping the VBOs cause CPU-GPU sync?

Thanks.

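I’m not sure about synchronization, but can’t you just store all your data in a single VBO? Then use glDrawElementsBaseVertex() or glDrawArrays() with baseVertex / first as offsets into that VBO. (For per-instance attributes, i.e. those with a non-zero divisor, baseVertex and first don’t apply; the equivalent offset is the baseInstance parameter of glDrawArraysInstancedBaseInstance() / glDrawElementsInstancedBaseInstance(), available from GL 4.2.)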
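A minimal sketch of the bookkeeping behind the single-VBO suggestion, assuming every batch’s records are packed back-to-back in one buffer. The helper compute_batch_offsets is hypothetical (not a GL function); it only computes where each batch starts, which is what you would pass as the draw call’s offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of GL): given how many instances each
   batch has (e.g. one batch per texture/cubemap combination), compute
   where each batch starts when all batches are packed back-to-back in
   ONE buffer. offsets[i] is a record index, not a byte offset.
   Returns the total number of records the shared buffer must hold. */
static size_t compute_batch_offsets(const size_t *counts, size_t n,
                                    size_t *offsets)
{
    assert(n == 0 || (counts != NULL && offsets != NULL));
    size_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = running;   /* batch i starts after all earlier batches */
        running += counts[i];
    }
    return running;
}
```

For batch sizes {3, 5, 2} this yields starting records 0, 3 and 8 in a 10-record buffer. For per-instance data, offsets[i] would go into the baseInstance argument of glDrawElementsInstancedBaseInstance(); alternatively, multiply it by the record size to get a byte offset for rebinding the buffer.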

[QUOTE]Will swapping the VBOs cause CPU-GPU sync?[/QUOTE]

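No. Changing which buffer a VAO reads from is one of the cheaper state changes. This is best done with the separate attribute format API (glVertexAttribFormat(), glVertexAttribBinding() and glBindVertexBuffer(), core since GL 4.3), where switching the buffer is a single glBindVertexBuffer() call.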

What about mid-frame, when the same VAO has already been used for a draw call?

If it’s still cheap, I might just do that.

What Alfonse said. There may be some delay if the new VBO’s contents aren’t immediately accessible to the GPU, but the driver dealing with that isn’t CPU-GPU synchronization.

Where you need to be careful is with “updating” the content of VBOs. If done wrong, this can cause CPU-GPU synchronization. On mobile, the penalty for this is much, much higher than on desktop. Depending on the driver and your VBO update method, your draw thread can be blocked for a whole frame or two (because of the tile-based, sort-middle architecture most mobile GPUs use) while the GPU “catches up” with reads from that buffer object that are already in flight. This can cut your frame rate in half (or worse).

If I use glBufferSubData, it’ll be fine, right?

I know that glBufferData is super slow because it has to allocate a whole new buffer from GPU memory.

Luckily, I only call it when the number of instances exceeds the highest it has ever been.

No. There is a difference between “change buffers” and “change what is in the buffer”. What you’re talking about is the latter.

In order for glBufferSubData to work, the upload has to wait until all prior commands that read from the buffer have executed. This is usually done asynchronously by copying your array of data to temporary staging memory, then doing the upload when the buffer is actually ready for it.

You’re talking about streaming: frequently uploading data to the GPU. Efficient techniques for doing this are collated elsewhere.

[QUOTE=Geklmin;1291598]if I use glBufferSubData it’ll be fine right?

I know that glBufferData is super slow because it has to allocate a whole new buffer from gpu mem
[/QUOTE]
glBufferSubData() is more likely to cause CPU-GPU synchronisation.

As glBufferData() typically allocates a new block of memory, it’s fairly simple for it to decouple allocation and population of the new memory from deallocation of the existing memory, avoiding any need to wait until pending commands have completed.

For glBufferSubData() to do something similar would be more involved, so it’s less likely to happen. Instead, the driver will probably overwrite the previous data in place, which requires waiting until any pending commands that use that data have completed. If you need to replace part of a buffer without synchronisation, use glBufferData() to upload the new data into a separate buffer, then glCopyBufferSubData() to copy it into place on the GPU.

As already mentioned, no. And changing the content via a plain glMapBuffer() can cause the same problem.

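What you want to look at is 1) changing buffer content via persistent/coherent buffer maps (GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT, on storage allocated with glBufferStorage()), 2) changing buffer content by mapping the buffer unsynchronized with glMapBufferRange() and GL_MAP_UNSYNCHRONIZED_BIT, and/or 3) using buffer orphaning. For details, see the Buffer Object Streaming wiki page.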
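A minimal sketch of the offset arithmetic behind option 1, assuming one buffer created with glBufferStorage() (GL 4.4 / ARB_buffer_storage), mapped once with GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT, and split into one region per frame in flight. The helper names and the 256-byte alignment are hypothetical choices for illustration, not GL requirements. In a real program you would also insert a fence with glFenceSync() after the draws that read a region, and wait on it with glClientWaitSync() before writing that region again; those calls are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical layout helper: one persistently mapped buffer holding
   `regions` equal slices, one per frame in flight. Writes the aligned
   slice size to *region_size and returns the total size you would
   request from glBufferStorage(). The alignment is an assumption. */
static size_t ring_layout(size_t bytes_per_frame, size_t regions,
                          size_t align, size_t *region_size)
{
    assert(regions > 0 && region_size != NULL);
    *region_size = align_up(bytes_per_frame, align);
    return *region_size * regions;
}

/* Byte offset of the slice that frame number `frame` writes its instance
   data into and then draws from. The same slice comes around again
   `regions` frames later, which is when its fence must be checked. */
static size_t ring_region_offset(unsigned long long frame, size_t regions,
                                 size_t region_size)
{
    assert(regions > 0);
    return (size_t)(frame % regions) * region_size;
}
```

With 1000 bytes of instance data per frame, three regions and 256-byte alignment, each slice is 1024 bytes, the buffer is 3072 bytes, and frame 4 writes at offset 1024.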

If you’re stuck with an old GLES driver that supports none of these (I don’t think you’ve said whether you’re targeting GL or GLES), either 1) use client arrays, or 2) keep a ring buffer of VBOs and never update a buffer object (via glBufferSubData() or a plain map) until 2-3 frames after the last draw call you’ve submitted that references it.
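A minimal sketch of the bookkeeping for option 2, assuming three buffer objects (the hypothetical RING_SLOTS) and one acquire per frame. The buffer names would come from glGenBuffers(); here they are plain integers so only the slot-selection logic is shown. Each frame you would fill the acquired buffer with glBufferSubData() (or a plain map) and draw from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 3  /* assumption: 3 buffers, so each is reused every 3rd frame */

/* Hypothetical bookkeeping for a ring of VBOs. In a real program the
   handles would come from glGenBuffers(); here they are plain integers. */
typedef struct {
    unsigned int handle[RING_SLOTS];    /* buffer object names (GLuint) */
    int64_t      last_used[RING_SLOTS]; /* frame of last draw using it; -1 = never */
} VboRing;

static void ring_init(VboRing *r, const unsigned int handles[RING_SLOTS])
{
    for (int i = 0; i < RING_SLOTS; ++i) {
        r->handle[i] = handles[i];
        r->last_used[i] = -1;
    }
}

/* True if the slot's last referencing draw was submitted >= min_lag frames ago. */
static bool ring_slot_safe(const VboRing *r, int slot, int64_t frame,
                           int64_t min_lag)
{
    return r->last_used[slot] < 0 || frame - r->last_used[slot] >= min_lag;
}

/* Return the buffer to update and draw from in `frame`. Assumes one call
   per frame with increasing frame numbers; round-robin reuse then keeps
   every slot at least RING_SLOTS frames old, so the assert fires only if
   min_lag exceeds RING_SLOTS. */
static unsigned int ring_acquire(VboRing *r, int64_t frame, int64_t min_lag)
{
    assert(frame >= 0);
    int slot = (int)(frame % RING_SLOTS);
    assert(ring_slot_safe(r, slot, frame, min_lag));
    r->last_used[slot] = frame;  /* this frame's draw will reference it */
    return r->handle[slot];
}
```

A lag of 3 matches the upper end of the 2-3 frames mentioned above; the right value depends on how deeply the driver queues frames, and more slots cost memory but tolerate deeper queuing.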

[QUOTE]I know that glBufferData is super slow because it has to allocate a whole new buffer from gpu mem[/QUOTE]

This isn’t true in general. If you re-allocate the storage of a buffer object with the same size as before, you effectively “orphan” the old storage in favor of new storage. The driver can pool the old, no-longer-used storage blocks and recycle them for later allocation requests, so this can be very fast: what looks like a re-allocation on your side ends up being a cheap “give me the last memory block of that size that we’re not using anymore” on the driver side. Calling glBufferData with a NULL pointer and the same size (and usage hint) as before is one way of orphaning a buffer (a form of resource renaming). Again, see the wiki page for details.

The alternative is that the driver blocks your draw thread outright while it waits for every in-flight use of that buffer object to clear the pipeline (depending on your buffer usage and the driver, it may only block until the vertex work referencing that buffer object is complete; see the driver vendor’s documentation for how they handle this).

Wait… so it’s faster to call glBufferData than glBufferSubData?

It simplifies my code a lot, but… really?

And I will always be replacing VBO contents before using them, and always after the previous glfwSwapBuffers() call.

So here is my situation:
Before I draw my instanced mesh, I need to update the per-instance data in the VBO that holds the mesh’s instance data. Every instance is updated; in fact, the entire buffer is repopulated.

The VBO will not be used again until the next frame, after GLFW has swapped the buffers.

What should I use? glMapBufferRange? glBufferSubData? glBufferData?

There is an entire Wiki article on this subject. Please go read it.

I did! It didn’t really tell me anything.