glBufferSubData + syncing is indeed very slow. If your target hardware supports it (OpenGL 4.4 or the ARB_buffer_storage extension), you can try persistently mapped buffers instead.

The idea is that you keep around a pointer to a block of...