Dynamic/streaming VBOs: how to use them? How do they actually work?

In my VBO experiments I’ve found that while static VBOs are very efficient, I haven’t been able to bring streaming or dynamic VBOs up to the performance level of immediate mode or good old (even uncompiled) vertex arrays.
Further, it appeared that using glBufferSubData to update the whole dataset was about three times faster than using glBufferData, which is contrary to what the spec and nVidia’s VBO performance paper suggest (tested on a GF3 and an FX5900). Manually managing several VBO sets and filling them alternately from frame to frame with glBufferSubData was even faster (though still not as fast as plain vertex arrays). Isn’t glBufferData supposed to take care of CPU/GPU synchronization issues?
I also tried issuing a single glVertexPointer call, as hinted at in the nVidia paper, but the framerate didn’t change at all.
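For reference, the frame-to-frame round-robin I mean looks roughly like this (a minimal sketch: the buffer count, the names g_vbo/stream_frame, and the vertex format are mine, and the buffers are assumed to have been allocated once at startup with glBufferData):

```c
/* Round-robin streaming sketch: cycle through NUM_BUFFERS VBOs so the
 * driver is never asked to overwrite a buffer the GPU may still be
 * reading from a previous frame.  All names here are illustrative. */
#include <GL/gl.h>

#define NUM_BUFFERS 3            /* 2-3 buffers in flight */

static GLuint g_vbo[NUM_BUFFERS]; /* created with glGenBuffers at init;
                                     storage allocated once via glBufferData */
static int    g_frame = 0;

void stream_frame(const float *verts, GLsizeiptr bytes)
{
    /* pick this frame's buffer; it was last touched NUM_BUFFERS frames
     * ago, so the GPU should be done with it by now */
    GLuint vbo = g_vbo[g_frame % NUM_BUFFERS];
    g_frame++;

    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* overwrite the whole contents in place */
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts);

    /* GL_VERTEX_ARRAY is assumed enabled via glEnableClientState */
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);
    /* ... then glDrawRangeElements(...) as usual ... */
}
```

Whether this beats a single orphaned buffer presumably depends on how well the driver handles the glBufferSubData while the previous frame is still in flight.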

Another odd result was the very poor performance of ATI drivers with plain vertex arrays: it’s 20-30% slower to use glVertexPointer/glDraw(Range)Elements than to just loop through the array yourself and specify the data with glBegin/glVertex. I’m not sure what’s going on, but where nVidia drivers showed a 1:3 ratio between normal vertex arrays and static VBOs on a 5900, ATI drivers exhibited ratios of 1:10 and beyond on 9700/9800 models, turning normal vertex arrays into a major bottleneck.

I wasn’t able to find any demos of VBOs in a streaming situation (where all vertices get updated each frame); all the demos I found were using static VBOs… is that because it’s the only situation in which VBOs currently work?
Any link/URL to a streaming VBO demo would be most welcome, I want to leave VAR behind :)

(see http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011853 for the methodology/code that’s used)

Hi,
I used the following scheme for streaming vertex data, and it worked for me as advertised.
glVertexPointer() is the expensive call, because that’s where the synchronisation/fencing takes place.

glBindBuffer( … );

// allocate space for the new buffer (orphans the old contents)
glBufferData( …, size, 0, GL_STREAM_DRAW );

// get a pointer into the buffer’s memory
void *pointer = glMapBuffer( …, GL_WRITE_ONLY );

// fill mem with vertices
for( …

glUnmapBuffer( … );

glVertexPointer( …, BUFFER_OFFSET(0) ); // <<- this is the expensive one
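Filled in with concrete arguments (GL_ARRAY_BUFFER as the target, BUFFER_OFFSET as the usual integer-to-pointer cast macro, and a memcpy from a source array standing in for whatever actually generates the vertices), the scheme reads roughly like:

```c
/* Orphan-map-fill streaming sketch; names and vertex format are
 * illustrative, not taken from any particular demo. */
#include <string.h>
#include <GL/gl.h>

#define BUFFER_OFFSET(i) ((const GLvoid *)(i))

void stream_vertices(GLuint vbo, const float *src, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* re-specify the data store with a NULL pointer: the driver can hand
     * back fresh memory ("orphaning") instead of stalling until the GPU
     * is done with the old contents */
    glBufferData(GL_ARRAY_BUFFER, size, 0, GL_STREAM_DRAW);

    /* map, fill, unmap */
    float *dst = (float *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (dst) {
        memcpy(dst, src, (size_t)size);  /* or generate vertices in place */
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }

    glVertexPointer(3, GL_FLOAT, 0, BUFFER_OFFSET(0)); /* the expensive one */
}
```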

This is exactly what I’m doing… :(

Did you try dropping the VBO and issuing calls directly? (glVertex, etc.)

I remember reading an optimization presentation saying that the buffer size should be a multiple of some number to achieve optimal results. I can’t remember the exact number, though, nor the presentation slides I got it from.

N.

Originally posted by EG:
[b]This is exactly what I’m doing… :(

Did you try dropping the VBO and issuing calls directly? (glVertex, etc.)[/b]
At the time I did this, I got theoretical throughputs of 20 Mverts/s on a GF2, so I thought it was optimal.

>At the time I did this, I got theoretical throughputs of
>20 Mverts/s on a GF2, so I thought it was optimal.

hmm… I can achieve higher speeds only when I update part of the data (only the vertex coordinates or only the normals, for instance), but not if I re-specify everything… and only when using glBufferSubData.

Would you have a small demo, a link to a demo, or a test case, by any chance?

No demo, I’m afraid.

Have you tried with larger batches?
How large are the chunks you request with glBufferData?

My mesh is 60k vertices and beyond, with a vertex coordinate and a normal for each, and is essentially one huge indexed triangle strip.

I tried smaller batches (6k vertices) but that didn’t help…