Buffer Object Streaming
Buffer Object Streaming is the process of updating buffer objects frequently with new data while those buffers are in use. Streaming works like this: you modify a buffer object, then perform an OpenGL operation that reads from it. After issuing that operation, you modify the buffer object with new data, and then perform another OpenGL operation that reads from it.
Streaming is a modify/use cycle. There is typically a buffer swap (or an equivalent frame-changing process) between one modify/use cycle and the next.
OpenGL provides all the guarantees needed to make this process work correctly; making it work fast is the real problem. The biggest danger in streaming, and the one that causes the most problems, is implicit synchronization.
The OpenGL specification permits an implementation to delay the execution of drawing commands. This lets you issue many drawing commands and have OpenGL process them on its own schedule. Because of this, you may begin uploading new data to a buffer object while commands you issued earlier, which read from that buffer, have not yet executed. When this happens, OpenGL must ensure those commands still see the old data, which typically means halting the thread until all drawing commands that could be affected by your update have completed.
This implicit synchronization is the primary enemy when streaming vertex data.
There are a number of strategies to solve this problem. Some implementations handle certain strategies better than others, and each one has its own benefits and drawbacks.
The very first thing you should do is make sure that your buffer's usage hint is one of the STREAM hints (GL_STREAM_DRAW, GL_STREAM_READ, or GL_STREAM_COPY).
Explicit multiple buffering
This solution is simple: create two or more buffer objects of the same size. While OpenGL is using one buffer object, you can modify another. Depending on how much parallelism your implementation provides, you may need more than two buffers to make this work.
The principal drawback to this solution is that it requires using a number of different buffer objects. If you are using it to upload vertex data, you will therefore need more VAOs.
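A minimal sketch of the rotation in C, assuming three slots are enough for the implementation at hand. `BufferRing`, `RING_SIZE`, and `ring_advance` are hypothetical names, and plain `unsigned int` stands in for `GLuint` so the index logic runs without an OpenGL context; the GL calls themselves appear only in comments and in the note after the block.

```c
#include <stddef.h>

/* Hypothetical ring of equally sized buffer objects (and, for vertex data,
 * one VAO per buffer). In real code the names would come from glGenBuffers
 * and glGenVertexArrays; unsigned int stands in for GLuint here.
 * Assumption: RING_SIZE slots give the GPU enough slack. */
#define RING_SIZE 3

typedef struct {
    unsigned int buffers[RING_SIZE]; /* buffer object names */
    unsigned int vaos[RING_SIZE];    /* VAO that sources vertices from buffers[i] */
    size_t current;                  /* slot being modified and drawn this cycle */
} BufferRing;

/* Advance to the next slot at the start of a modify/use cycle. The slot
 * returned was last drawn RING_SIZE - 1 cycles ago, so the GPU has
 * hopefully finished reading it and modifying it should not stall. */
size_t ring_advance(BufferRing *ring) {
    ring->current = (ring->current + 1) % RING_SIZE;
    return ring->current;
}
```

Each modify/use cycle would call `ring_advance`, upload into `buffers[slot]` (for example with `glBufferSubData`), then bind `vaos[slot]` and draw. Raising `RING_SIZE` trades memory for more time before a slot comes around again while the GPU might still be reading it.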
Buffer re-specification
This solution is to reallocate the buffer object's storage before you start modifying it. There are two ways to do this.
The first way is to call glBufferData with a NULL pointer and the exact same size and usage hint the buffer had before. This allows the implementation to simply allocate new storage for that buffer object. Since allocating storage is likely faster than the implicit synchronization, this can give you a significant performance advantage. And since you passed NULL, if no synchronization was needed in the first place, the call can be reduced to a no-op. The old storage will still be used by the OpenGL commands that were issued previously.
You can do the same thing by calling glMapBufferRange with GL_MAP_INVALIDATE_BUFFER_BIT. This gives the implementation the freedom to orphan the previous storage and allocate new storage.
Obviously, these methods only work if the previous data in the buffer is irrelevant. Generally, streaming works best if it is done on a whole buffer rather than parts of a buffer, and if you overwrite all of the data in that buffer each time.
One problem with this method is that its effectiveness is implementation-dependent. Just because an implementation has the freedom to allocate new storage does not mean that it will.
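The bookkeeping behind orphaning can be modelled without a GL context. The following is a toy C model, not real OpenGL: each hypothetical `Storage` block is reference-counted, a pending draw holds a reference to the block it was issued against, and `orphan_and_write` stands in for `glBufferData(target, size, NULL, usage)` followed by a full overwrite. If no draw still references the block, it is reused in place, which corresponds to the no-op case described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: a storage block shared by the buffer object and by any
 * pending draws that were issued while it was attached. */
typedef struct {
    unsigned char *bytes;
    size_t size;
    int refs; /* 1 for the buffer object + 1 per pending draw */
} Storage;

typedef struct {
    Storage *store; /* storage currently attached to the buffer object */
} Buffer;

Storage *storage_new(size_t size) {
    Storage *s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->bytes = calloc(size, 1);
    if (s->bytes == NULL) abort();
    s->size = size;
    s->refs = 1;
    return s;
}

static void storage_release(Storage *s) {
    if (--s->refs == 0) {
        free(s->bytes);
        free(s);
    }
}

/* Issuing a draw: the pending command keeps the current block alive. */
Storage *issue_draw(Buffer *buf) {
    buf->store->refs++;
    return buf->store;
}

/* The draw finishes executing: it drops its reference. */
void complete_draw(Storage *s) { storage_release(s); }

/* Orphan-then-write. Same size as before, and the whole buffer is rewritten.
 * If a pending draw still references the block, attach fresh storage instead
 * of waiting; otherwise reuse the block (the "no-op" case). */
void orphan_and_write(Buffer *buf, const void *data, size_t size) {
    assert(size == buf->store->size);
    if (buf->store->refs > 1) {
        Storage *old = buf->store;
        buf->store = storage_new(size);
        storage_release(old); /* pending draws still hold the old block */
    }
    memcpy(buf->store->bytes, data, size);
}
```

Real drivers make this decision internally and are not obliged to orphan at all; the model only shows why a previously issued draw keeps seeing its data while the new data lands in a different block.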
Unsynchronized buffer mapping
The most dangerous form of streaming is to use glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT. This tells OpenGL not to do any implicit synchronization at all.
This does not mean that synchronization is unimportant. You get undefined results if you modify the buffer while any previously issued command that reads from it has not yet finished executing. What this flag allows you to do is synchronize manually, with a sync object.
If you place a fence (with glFenceSync) after all of the commands that read from a buffer, you can check whether this fence has completed before mapping the buffer. If it has not, you can postpone the update and perform some other important task in the meantime. If you have no other tasks to perform, you can also use the fence to force synchronization by waiting on it. Once the fence has completed, you can map the buffer freely, still using GL_MAP_UNSYNCHRONIZED_BIT in case the implementation isn't aware that the buffer can now be updated safely.
Instead of using the fence to force synchronization, you can also combine this method with buffer re-specification. If you run out of other tasks before the fence has completed, map the buffer with GL_MAP_INVALIDATE_BUFFER_BIT so that the implementation can allocate new storage. This way, you only respecify the buffer when you actually need to.
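The decision described in the last two paragraphs can be written as a small policy function. This is a hedged sketch in C: `UpdateAction` and `choose_update` are hypothetical names, and the fence check is assumed to be a non-blocking `glClientWaitSync` poll with a zero timeout, shown only in a comment so that the logic runs without a GL context.

```c
#include <stdbool.h>

/* Hypothetical names; each action maps to the GL calls in its comment. */
typedef enum {
    UPDATE_MAP_UNSYNCHRONIZED, /* fence done: map with GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT */
    UPDATE_DO_OTHER_WORK,      /* fence pending: run another task, then poll again */
    UPDATE_MAP_INVALIDATE,     /* out of work: map with GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT */
    UPDATE_WAIT_ON_FENCE       /* out of work, no re-specification: block in glClientWaitSync */
} UpdateAction;

/* fence_signaled is assumed to come from a non-blocking poll of the fence
 * placed after the last command that read the buffer, e.g.
 *   GLenum r = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);
 *   fence_signaled = (r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED);
 */
UpdateAction choose_update(bool fence_signaled, bool have_other_work,
                           bool allow_respecify) {
    if (fence_signaled)
        return UPDATE_MAP_UNSYNCHRONIZED;
    if (have_other_work)
        return UPDATE_DO_OTHER_WORK;
    /* Nothing else to do: either let the driver allocate new storage,
     * or force synchronization by waiting on the fence. */
    return allow_respecify ? UPDATE_MAP_INVALIDATE : UPDATE_WAIT_ON_FENCE;
}
```

Once the fence is known to be complete, it would be deleted with `glDeleteSync`, and a new fence placed with `glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)` after the next commands that read the buffer. Passing `GL_SYNC_FLUSH_COMMANDS_BIT` on the poll ensures the fence is actually submitted; without some flush, a loop that only polls is not guaranteed to see it signal.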