I’m writing an app that does some complex character animation, which I’m computing on the CPU. I’ve implemented two methods of streaming the data into OpenGL:
1. Compute the values in a memory buffer, then call glBufferSubData.
2. Call glMapBuffer (with GL_WRITE_ONLY) and stream the results into it.

I would expect the latter to be faster since it is zero-copy. However, it is actually about 10% slower on my system (GF6600GT, 96.40 driver, Athlon XP 3000+). I’m guessing that there is a stall somewhere, although it makes essentially no difference if I call glBufferData with a NULL pointer just before glMapBuffer (to indicate that the current data may be discarded). Incidentally, the buffer is allocated as GL_STREAM_DRAW.
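To make the comparison concrete, here is a minimal sketch of the two paths, not the actual application code. The names `StreamHooks`, `AnimateFn`, `stream_by_copy` and `stream_by_map` are hypothetical. So that the sketch compiles without a GL context, the GL entry points are passed in as function pointers; the comments show which real call each one is assumed to wrap, with the buffer already bound to its target.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hooks standing in for the GL entry points (assumption: the
// vertex buffer is already bound to `target` when these are called).
struct StreamHooks {
    void  (*upload)(std::size_t bytes, const void* data); // glBufferSubData(target, 0, bytes, data)
    void  (*orphan)(std::size_t bytes);                   // glBufferData(target, bytes, NULL, GL_STREAM_DRAW)
    void* (*map)();                                       // glMapBuffer(target, GL_WRITE_ONLY)
    bool  (*unmap)();                                     // glUnmapBuffer(target); false if the store was lost
};

// Hypothetical animation callback: fills `count` floats at `dst`.
// For the mapped path it should only write to `dst`, never read from it.
using AnimateFn = void (*)(float* dst, std::size_t count);

// Method 1: compute into a system-memory scratch buffer, then hand it to
// the driver, which copies it into the buffer object.
inline void stream_by_copy(const StreamHooks& gl, AnimateFn animate,
                           float* scratch, std::size_t count)
{
    assert(scratch && count > 0);
    animate(scratch, count);
    gl.upload(count * sizeof(float), scratch);
}

// Method 2: discard the old contents, map the buffer, and compute the
// results straight into the mapping. Returns false if the map failed or
// the data store was lost, in which case the data must be regenerated.
inline bool stream_by_map(const StreamHooks& gl, AnimateFn animate,
                          std::size_t count)
{
    assert(count > 0);
    gl.orphan(count * sizeof(float));           // old data may be discarded
    float* dst = static_cast<float*>(gl.map());
    if (!dst) return false;
    animate(dst, count);                        // results go directly into the mapping
    return gl.unmap();
}
```

In a real program the four hooks would simply forward to the GL calls named in the comments; the indirection is only there to keep the sketch self-contained.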
What’s the recommended way to stream vertex data from the CPU without either double-copying or stalling waiting for the GPU to finish with the previous data?