Hey - I wasn’t sure whether this should go under the Windows section or maybe Drivers, but I’m a bit of a beginner, so here goes.
In my company’s application we have a pretty simple GL system for VBOs and geometry that was written about 8 years ago. We dynamically generate a lot of geometry and store it directly in VBOs, since it’s often transient and we want to save system memory. Occasionally we want to read the dynamically created geometry back from the VBO, and sometimes we only do partial updates. If a partial update is small enough, we use glBufferSubData; otherwise we orphan the old storage by calling glBufferData with a NULL data pointer and fill an entirely new VBO. When we do want to read from the VBO, we use glMapBuffer with GL_READ_ONLY.
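For context, the update path looks roughly like this (a simplified sketch, not our actual code: the bufferData/bufferSubData stand-ins are only there so the snippet compiles on its own, and PARTIAL_UPDATE_MAX is a made-up threshold, since "small enough" is app-specific):

```c
#include <stddef.h>

#define GL_ARRAY_BUFFER 0x8892   /* values from the GL headers */
#define GL_DYNAMIC_DRAW 0x88E8

/* Recording stand-ins for the driver entry points; the real app calls
   glBufferData/glBufferSubData directly. */
static int orphaned = 0;
static void bufferData(unsigned target, ptrdiff_t size,
                       const void *data, unsigned usage)
{
    (void)target; (void)size; (void)usage;
    if (data == NULL) orphaned = 1;   /* NULL data pointer = orphan */
}
static void bufferSubData(unsigned target, ptrdiff_t offset,
                          ptrdiff_t size, const void *data)
{
    (void)target; (void)offset; (void)size; (void)data;
}

/* Made-up threshold for what counts as a "small" partial update. */
#define PARTIAL_UPDATE_MAX (64 * 1024)

static void update_vbo(ptrdiff_t capacity, ptrdiff_t offset,
                       ptrdiff_t size, const void *data)
{
    if (size <= PARTIAL_UPDATE_MAX) {
        /* Small change: patch the existing store in place. */
        bufferSubData(GL_ARRAY_BUFFER, offset, size, data);
    } else {
        /* Big change: orphan the old storage, then refill. The driver
           recycles the old store once in-flight draws finish, so the
           upload doesn't stall waiting on the GPU. */
        bufferData(GL_ARRAY_BUFFER, capacity, NULL, GL_DYNAMIC_DRAW);
        bufferData(GL_ARRAY_BUFFER, capacity, data, GL_DYNAMIC_DRAW);
    }
}
```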
On Windows + ATI systems, the glMapBuffer call to read data was extremely slow. So slow that even the occasional reads can make the app unusable. I believe this is because the driver synchronizes (waits for all pending GPU work on the buffer) even though the map is read-only. I tried switching to glMapBufferRange with the flags GL_MAP_READ_BIT | GL_MAP_UNSYNCHRONIZED_BIT, which is technically illegal (the spec doesn’t allow GL_MAP_UNSYNCHRONIZED_BIT to be combined with GL_MAP_READ_BIT), and it massively sped up the app on Windows + ATI systems. But because it’s illegal, it broke on NVIDIA systems, where glMapBufferRange returned NULL, as it should.
So, is there an established way to declare an asynchronous read?
Here are a couple of approaches that seem to work in general, though I don’t know how robust they are. The first is to use glMapBufferRangeARB, which only seems to be exported on ATI systems (it is not part of the GL spec). I can query it with wglGetProcAddress, and if the result is NULL, just use glMapBufferRange without the unsynchronized bit. I don’t really know that it exists only on ATI systems, but it seems that way anecdotally.
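The probe would look something like this (a sketch: the loader is injected as a function pointer so the snippet stands alone, where the real app would pass wglGetProcAddress, and the fake loaders only illustrate the two driver behaviors I’ve seen):

```c
#include <stddef.h>
#include <string.h>

typedef void *(*proc_loader)(const char *name);

/* Dummy addresses standing in for the two mapping entry points. */
static int unsync_entry, plain_entry;

/* Fake loaders for illustration: one driver exports the vendor entry
   point (as the ATI drivers seem to), the other exports nothing. */
static void *fake_ati_loader(const char *name)
{
    return strcmp(name, "glMapBufferRangeARB") == 0 ? (void *)&unsync_entry
                                                    : NULL;
}
static void *fake_nv_loader(const char *name)
{
    (void)name;
    return NULL;
}

/* Ask the loader (wglGetProcAddress in the real app) for the vendor
   entry point; if it isn't there, fall back to plain glMapBufferRange
   without the unsynchronized bit. */
static void *pick_map_entry(proc_loader load, void *plain_map_range)
{
    void *p = load("glMapBufferRangeARB");
    return p ? p : plain_map_range;
}
```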
Alternatively, I can attempt the unsynchronized read, and if it fails (returns NULL, as it does on the NVIDIA systems), retry without the unsynchronized bit. I think this approach is technically safe (it should be in line with the GL spec, while still getting the ATI systems to read asynchronously), but I’m not 100% certain.
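That fallback fits in one small wrapper (again a sketch: the map function is passed in as a pointer so the snippet stands alone, where the real app would call glMapBufferRange directly, and the two fake drivers just model "accepts the illegal combination" vs. "rejects it with NULL"):

```c
#include <stddef.h>

#define GL_MAP_READ_BIT           0x0001  /* values from the GL headers */
#define GL_MAP_UNSYNCHRONIZED_BIT 0x0020

typedef void *(*map_range_fn)(unsigned target, ptrdiff_t offset,
                              ptrdiff_t length, unsigned access);

/* Fake drivers for illustration only. */
static char fake_store[16];
static void *lenient_map(unsigned t, ptrdiff_t o, ptrdiff_t l, unsigned a)
{
    (void)t; (void)o; (void)l; (void)a;
    return fake_store;                /* ATI-style: maps anyway */
}
static void *strict_map(unsigned t, ptrdiff_t o, ptrdiff_t l, unsigned a)
{
    (void)t; (void)o; (void)l;        /* NVIDIA-style: rejects the    */
    return (a & GL_MAP_UNSYNCHRONIZED_BIT) ? NULL : fake_store; /* combo */
}

/* Try the (technically illegal) unsynchronized read first; if the
   driver rejects it and returns NULL, retry as a plain synchronized
   read-only map. */
static void *map_for_read(map_range_fn map_range, unsigned target,
                          ptrdiff_t offset, ptrdiff_t length)
{
    void *p = map_range(target, offset, length,
                        GL_MAP_READ_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
    if (p == NULL)  /* strict driver: drop the unsynchronized bit */
        p = map_range(target, offset, length, GL_MAP_READ_BIT);
    return p;
}
```

One thing I’d watch for in the real version: check glGetError after the failed attempt, so the GL_INVALID_OPERATION it raises doesn’t get blamed on some later call.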