This means that, at least during the copy, the application is stalled and no new draw commands are fed to the driver in the meantime (at least in the current context/thread).
With client-side vertex arrays exactly the same thing must happen. glDrawElements won’t return until either
a) the draw call itself is done or
b) the driver has made a copy of the data first. The problem is that it is difficult for the driver to find out how much data the draw call will reference (until it scans all the indices), so the driver will probably do a).
So, glBufferData() is the better solution. The driver won’t have to guess how much data has to be copied, because we explicitly told it how many bytes to copy.
glMapBuffer/glUnmapBuffer is unlikely to help and may even be slower. In my experience, mapping is not faster than glBufferData for very small buffers. That is because, even though glMapBuffer gives you a pointer back, there’s NO guarantee that the data won’t get copied a second time. Some drivers might return
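To illustrate why the size is a problem with client-side arrays, here is a minimal sketch of the work a driver would have to do before it could copy the vertex data. The helper name referenced_vertex_bytes and the use of 16-bit indices are my assumptions for illustration, not anything a real driver exposes.

```c
#include <stddef.h>

/* Hypothetical helper (not a GL or driver function): returns how many bytes
   of a client-side vertex array a draw call can touch, given its 16-bit
   index list and the per-vertex stride in bytes. */
size_t referenced_vertex_bytes(const unsigned short *indices,
                               size_t index_count,
                               size_t vertex_stride)
{
    size_t max_index = 0;
    size_t i;

    if (index_count == 0)
        return 0;

    /* Every index has to be inspected; there is no shortcut. */
    for (i = 0; i < index_count; ++i) {
        if (indices[i] > max_index)
            max_index = indices[i];
    }

    /* Vertices 0..max_index may be read, so that many must be copied. */
    return (max_index + 1) * vertex_stride;
}
```

With glBufferData() this scan is not needed at all, because the byte count is passed in directly.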
a) a pointer to directly mapped GPU memory (where mapping is probably a heavyweight operation) or
b) a pointer to some system memory and the driver does a copy on unmapping
Reading the original post, I guess it’s about rendering many very small objects that have no more than 8 vertices per draw call. The memory bandwidth needed to copy this data around is probably negligible. He should think about using a vertex shader that creates the quads from a “unit quad” via a few shader parameters (passing position and size to the vertex shader). This results in a kind of instancing, and the problem of uploading data for each draw call would then just go away.
The calls to render such quads would then look like:
glBindBuffer(GL_ARRAY_BUFFER, unit_quad_vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, unit_quad_ibo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0); // index, size, type, normalized, stride, offset
for (int q=0; q<num_quads; ++q)
{
// pass position and size of the quad to the vertex shader via a constant (non-array) vertex attribute;
// quad_pos_size[q] is assumed to hold 4 floats (position and size)
glVertexAttrib4fv(POS_SIZE_ATTRIB, &quad_pos_size[q]);
glDrawElements(GL_TRIANGLES, ....)
}
This would allow rendering many, many quads (all from the same VBO data) with minimal effort.
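For what the vertex shader side could look like, here is a rough sketch, not tested against any particular driver. The attribute names unit_corner and pos_size, the layout (x, y, width, height), and the old-style GLSL 1.20 syntax are all my assumptions. The small C function below the shader string just mirrors the shader math on the CPU so the idea can be checked without a GL context.

```c
/* Hypothetical vertex shader source (GLSL 1.20 style). unit_corner is the
   per-vertex attribute from the unit-quad VBO, pos_size is the constant
   attribute set once per quad. Both names are made up for this sketch. */
const char *const UNIT_QUAD_VS_SRC =
    "#version 120\n"
    "attribute vec2 unit_corner;\n"
    "attribute vec4 pos_size;\n"
    "void main()\n"
    "{\n"
    "    vec2 p = pos_size.xy + unit_corner * pos_size.zw;\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * vec4(p, 0.0, 1.0);\n"
    "}\n";

/* Assumed layout of the per-quad data: x, y, width, height. */
typedef struct { float x, y, w, h; } QuadPosSize;
typedef struct { float x, y; } Vec2;

/* CPU-side mirror of the shader math: scale a unit-quad corner
   (each component 0 or 1) by the quad size, then offset by its position. */
Vec2 expand_unit_corner(Vec2 unit_corner, QuadPosSize q)
{
    Vec2 out;
    out.x = q.x + unit_corner.x * q.w;
    out.y = q.y + unit_corner.y * q.h;
    return out;
}
```

The attribute locations would still have to match the indices used in the draw loop (0 and POS_SIZE_ATTRIB), for example by calling glBindAttribLocation before linking the program.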