Small VBO problem with NVIDIA

I have an application that uses a single VBO to draw text.
Several times per frame I do the following:

[b]DrawText[/b]
dwSizeInBytes=dwNChars*4;
glBindBufferARB(GL_ARRAY_BUFFER_ARB, uBufferID);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, dwSizeInBytes, NULL, GL_STREAM_DRAW_ARB);
pRet=glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);

Fill the buffer from the pRet pointer
...

glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);

glTexCoordPointer(2, GL_FLOAT, ...);
glVertexPointer(2, GL_FLOAT, ...);

glDrawArrays(GL_QUADS, 0, iNRealChars*4);
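
A note on the sizes in the snippet above: it is abbreviated, so the real vertex layout is not shown. As a sketch only, assuming one interleaved vertex made of two position floats and two texture-coordinate floats, the byte size and the offsets for the gl*Pointer calls could be computed as below. The `TextVertex` struct and the helper names are hypothetical and not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved layout: 2 position floats + 2 texcoord floats. */
typedef struct TextVertex {
    float pos[2];
    float uv[2];
} TextVertex;

enum { VERTS_PER_CHAR = 4 }; /* one GL_QUADS quad per character */

/* Bytes needed in the buffer for n_chars characters. */
size_t text_buffer_size(size_t n_chars)
{
    /* Guard against overflow of the multiplication below. */
    assert(n_chars <= SIZE_MAX / (VERTS_PER_CHAR * sizeof(TextVertex)));
    return n_chars * VERTS_PER_CHAR * sizeof(TextVertex);
}

/* While a VBO is bound, the last argument of glVertexPointer and
 * glTexCoordPointer is a byte offset into the buffer, not an address. */
size_t text_pos_offset(void) { return offsetof(TextVertex, pos); }
size_t text_uv_offset(void)  { return offsetof(TextVertex, uv); }
```

With this layout the stride passed to both gl*Pointer calls would be `sizeof(TextVertex)`, and `glBufferDataARB` would receive `text_buffer_size(n)` as its size argument.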

The problem appears when I draw several short texts with the same number of chars (so dwSizeInBytes is identical in consecutive calls) and the app is running at a high frame rate. In some of those consecutive calls glMapBufferARB returns the same pRet value, and only one of the texts from those calls is drawn.

  • If I slow the system down, it works properly.
  • If the texts have different sizes (dwSizeInBytes will be different) it works.

Also, if I put:
glBufferDataARB(GL_ARRAY_BUFFER_ARB, 0, NULL, GL_STREAM_DRAW_ARB);
before
glBufferDataARB(GL_ARRAY_BUFFER_ARB, dwSizeInBytes, NULL, GL_STREAM_DRAW_ARB);
it works properly.

  • It works properly on ATI hardware.

It is a rare problem because both of these conditions have to hold:

  • You have to be re-using the same VBO with glMapBuffer.
  • The amount of data has to be the same in two consecutive passes.

Has anybody seen a similar problem? Am I doing something illegal?
It looks like a driver bug to me.

Thanks.

One thing I’d look into: do you really need to map such small buffers, and that often? I’d expect the mapping overhead to be considerably higher than simply uploading the data to the VBO with glBufferDataARB (and letting the driver handle any mapping internally).
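
As a rough sketch of that idea: build the quads in a plain CPU array and pass the pointer straight to glBufferDataARB instead of mapping. The function names, the four-floats-per-vertex layout (x, y, u, v) and the 16x16 ASCII glyph atlas are assumptions made up for illustration. The GL upload sits behind a hypothetical `TEXT_USE_GL` macro so the CPU part compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#ifdef TEXT_USE_GL
/* Assumes a platform whose headers declare the ARB entry points directly;
 * on Windows they would have to be fetched with wglGetProcAddress. */
#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>
#endif

enum { FLOATS_PER_VERTEX = 4, VERTS_PER_CHAR = 4 };

/* Writes one quad per character into `out` as (x, y, u, v) per vertex.
 * Stops early if the next quad would not fit in max_floats.
 * Returns the number of floats written. */
size_t build_text_quads(const char *text, float x, float y,
                        float w, float h,
                        float *out, size_t max_floats)
{
    const float cell = 1.0f / 16.0f; /* assumed 16x16 glyph atlas */
    size_t n = 0;

    assert(text != NULL && out != NULL);
    for (; *text != '\0'; ++text) {
        unsigned char c = (unsigned char)*text;
        float u0 = (float)(c % 16) * cell;
        float v0 = (float)(c / 16) * cell;
        size_t i;

        if (n + FLOATS_PER_VERTEX * VERTS_PER_CHAR > max_floats)
            break;
        {
            const float quad[16] = {
                x,     y,     u0,        v0,
                x + w, y,     u0 + cell, v0,
                x + w, y + h, u0 + cell, v0 + cell,
                x,     y + h, u0,        v0 + cell
            };
            for (i = 0; i < 16; ++i)
                out[n++] = quad[i];
        }
        x += w; /* advance to the next character cell */
    }
    return n;
}

#ifdef TEXT_USE_GL
/* Re-specifies the buffer store and uploads the data in a single call,
 * with no glMapBufferARB / glUnmapBufferARB pair. */
void upload_text(GLuint buffer_id, const float *data, size_t n_floats)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffer_id);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB,
                    (GLsizeiptrARB)(n_floats * sizeof(float)),
                    data, GL_STREAM_DRAW_ARB);
}
#endif
```

The draw call afterwards would use the returned count divided by `FLOATS_PER_VERTEX` as the vertex count. Whether this is actually faster than mapping for your text sizes is something you would have to measure.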

That said, it looks like it could be a problem in your OpenGL implementation (the NVIDIA ICD), but please don’t take my word for it: I haven’t checked it against the spec, even though it looks like an obvious ICD error to me.

Something else to look out for: from my limited testing, NVIDIA seems to have different batching “rules” than ATI. That bit me pretty hard when I was trying to keep the CPU offloaded while letting the GPU continue its work (*). So you could try adding a glFlush and see if it makes a difference.

(*) At the time, for that particular test, the NVIDIA driver seemed to flush commands to the card at a context switch, allowing the GPU to continue processing while the CPU was halted. On ATI I was at first baffled to see the CPU load rise dramatically, before I noticed my error: adding a glFlush before my (forced) switch to another thread brought the overall CPU load down to below 10% on ATI as well.