I have done some testing:
Setup: 50 objects of 7578 vertices each (484992 bytes of vertex data per object, i.e. 64 bytes per vertex)
- Mapping with:
glBufferDataARB(GL_ARRAY_BUFFER_ARB, dwSizeInBytes, NULL, GL_STREAM_DRAW_ARB);
pRet=glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY);
or
pRet=glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_READ_WRITE);
(I'm using STREAM_DRAW because I draw each object twice: once to create the shadow map and once to render the object itself. In my tests STREAM_DRAW is slightly faster than DYNAMIC_DRAW in this case; not the map itself, but the overall draw.)
A FastCopy from a system-memory array to the pointer returned by MapBuffer is then slightly faster with GL_READ_WRITE than with GL_WRITE_ONLY (which surprised me too). But the accumulated difference is small when mapping and copying all 50 objects (each with its own MapBuffer+FastCopy). That is near 25MB of data (50 × 484992 bytes = 24,249,600 bytes).
(FYI: by FastCopy I mean a copy routine that uses prefetching and MMX registers. If you are interested, I think there is a GDC presentation about it on AMD's website.)
- Before this test, I was writing the transformed (skinned) vertices into a system-memory buffer (I access that memory randomly, so I can't write directly into a buffer returned by MapBuffer) and then uploading it with:
glBufferDataARB(GL_ARRAY_BUFFER_ARB, dwSizeInBytes, pPtr, GL_STREAM_DRAW_ARB);
To my surprise, I discovered that this approach is slower than replacing that glBufferData call with:
glBufferDataARB(GL_ARRAY_BUFFER_ARB, dwSizeInBytes, NULL, GL_STREAM_DRAW_ARB);
void *pRet=glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY);
if (pRet) { /* glMapBufferARB returns NULL on failure */
    FastCopy(pRet, pSysMemBuffer, dwSizeInBytes);
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
}
(again, GL_READ_WRITE is slightly faster, but the difference is minimal)
The difference between the two methods (BufferData vs Map+FastCopy+Unmap) is noticeable.
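For anyone who wants to try something like the FastCopy mentioned above, here is a minimal sketch of the same idea. It is not AMD's routine: the name fast_copy is hypothetical, it uses SSE2 registers instead of MMX, and it assumes an x86 compiler that defines __SSE2__ (otherwise it falls back to plain memcpy). It prefetches the source ahead of the copy and uses non-temporal (streaming) stores, which are generally a good fit for the write-combined memory that drivers often hand back from a buffer map.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_prefetch */
#endif

/* Hypothetical stand-in for a "FastCopy": prefetch the source, move
 * 64 bytes per iteration through SSE2 registers, and use non-temporal
 * stores. Falls back to memcpy when SSE2 is not available. */
void fast_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

#if defined(__SSE2__)
    /* _mm_stream_si128 needs a 16-byte aligned destination, so copy
     * the leading bytes with memcpy until d is aligned. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    while (n >= 64) {
        /* Start fetching source data 256 bytes ahead, but only while
         * that address is still inside the source buffer. */
        if (n >= 64 + 256)
            _mm_prefetch((const char *)(s + 256), _MM_HINT_NTA);

        /* Unaligned loads: the source may have any alignment. */
        __m128i x0 = _mm_loadu_si128((const __m128i *)(s + 0));
        __m128i x1 = _mm_loadu_si128((const __m128i *)(s + 16));
        __m128i x2 = _mm_loadu_si128((const __m128i *)(s + 32));
        __m128i x3 = _mm_loadu_si128((const __m128i *)(s + 48));

        /* Non-temporal stores write around the cache. */
        _mm_stream_si128((__m128i *)(d + 0), x0);
        _mm_stream_si128((__m128i *)(d + 16), x1);
        _mm_stream_si128((__m128i *)(d + 32), x2);
        _mm_stream_si128((__m128i *)(d + 48), x3);

        d += 64;
        s += 64;
        n -= 64;
    }
    /* Order the streaming stores before any later stores (e.g. before
     * the buffer is unmapped). */
    _mm_sfence();
#endif

    /* Remaining tail bytes, or the whole copy without SSE2. */
    memcpy(d, s, n);
}
```

In the upload path above it would simply replace the copy call between map and unmap, e.g. fast_copy(pRet, pSysMemBuffer, dwSizeInBytes);. Whether it actually beats your compiler's memcpy on a given CPU is something to measure, not assume.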
Hope this helps.
(Note: I’m using a GF 8800GTX)