I just downloaded the NEHE lesson45 which demonstrates VBO rendering. I am using a Quadro FX 1500 board and here comes the interesting thing:
When I activate VBOs I get 60 fps while rendering 526000 triangles. Using VAs I get 260 fps with the same demo!
Can anyone explain this effect to me? I thought VBOs (especially static ones) should definitely be faster (or at least not slower) than vertex arrays - but it seems I am wrong…
Didn’t notice you started a new topic, so I’m posting my answer here too:
Why so many triangles? The standard tutorial has only 32k triangles. Maybe you exceeded the maximum size of the buffer object. Bad performance with VBOs can be caused by too much or too little data in a single BO.
ok maybe that is the problem, but I expected to be able to render 500K triangles with one VBO efficiently. (Actually 500K vertices are only a few MB of data…)
(I decreased the number of pixels per vertex to 1.0 in the demo - just for evaluation)
There must be something wrong either in your fps code or in the number of rendered triangles. There’s no way you can get 260 fps @ 526k tris/frame on a Quadro 1500. That’d be nearly 137 MTris/second, with all the bus transfers going on for each rendering call.
Geforce 7800 GTX, Pentium Dual core 3 Ghz, 2 GB ram, and the settings you just posted above: 110 fps with VBOs, 42 fps without VBOs. Pretty much what I expected.
I have no idea why it reports a framerate of 260 fps without VBOs on your machine, but it has to be wrong. Maybe a driver bug, not rendering all the vertices ?
By the way, here is the proof that the number is wrong: a vertex is 20 bytes and you’ve got 1.7 million vertices per frame. If you were indeed rendering at 260 fps via system memory, that’d be a bus bandwidth of 1.7 * 20 * 260 = 8840 MB/sec, which is a lot more than even a PCI Express x16 bus can achieve.
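For anyone who wants to plug in their own numbers, here is a minimal sketch of that bandwidth estimate in C. The function name `required_bandwidth_mb_per_s` is made up for illustration, and it assumes every vertex is sent across the bus once per frame with no indexing or caching, which is the worst case for plain vertex arrays.

```c
/* Hypothetical helper (not part of the NeHe demo): estimates the bus
 * bandwidth in MB/s needed when every vertex is streamed from system
 * memory once per frame. Assumes 1 MB = 1,000,000 bytes. */
double required_bandwidth_mb_per_s(double vertices_per_frame,
                                   double bytes_per_vertex,
                                   double fps)
{
    return vertices_per_frame * bytes_per_vertex * fps / 1.0e6;
}
```

Calling `required_bandwidth_mb_per_s(1.7e6, 20.0, 260.0)` reproduces the 8840 MB/sec figure from the post. For comparison, a first-generation PCI Express x16 link has a theoretical peak of roughly 4000 MB/sec per direction, assuming that is the slot generation in use here.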
well actually I am using windowed mode WITH the internal fps counter and the external (FRAPS) - take a look at these screenshots - btw, I have a core 2 Duo 1.8 GHz and 2 GB RAM.
Actually the framerate is jittering between 230 and 270, which is why the two frame counters differ - but it is definitely extremely strange behaviour…
vsync is disabled in both cases - I took both screenshots with completely the same configuration.
I checked my driver settings and disabled “maximize texture memory” and well - now I get ~400 fps without VBOs, with the VBOs I get ~100 fps.
Everything measured with fraps. I really have no idea what’s going on here…
by the way- it’s not a good idea to calculate fps like in that example:
if( (SDL_GetTicks() - g_dwLastFPS) >= 1000 ) // When A Second Has Passed...
{
    g_dwLastFPS = SDL_GetTicks(); // Update Our Time Variable
    g_nFPS = g_nFrames; // Save The FPS
    g_nFrames = 0;
    ...
}
The time difference (SDL_GetTicks() - g_dwLastFPS) can, for instance, be 1500 ms (= 1.5 sec). In that case the condition is true, and if 100 frames were drawn, the fps will be set to 100, although the real rate is only 100/1.5 ≈ 67. The frame count has to be divided by the time that actually elapsed.
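A minimal sketch of a counter that divides by the elapsed time could look like this in C. The `FpsCounter` struct and `fps_counter_frame` function are hypothetical names, not part of the NeHe code; the timestamp is passed in as a parameter, so in the demo it would presumably come from SDL_GetTicks().

```c
#include <stdint.h>

/* Hypothetical replacement for the g_dwLastFPS / g_nFrames / g_nFPS globals. */
typedef struct {
    uint32_t last_ms;  /* timestamp of the last fps update, in milliseconds */
    uint32_t frames;   /* frames counted since the last update */
    double   fps;      /* most recently computed frame rate */
} FpsCounter;

/* Call once per rendered frame with the current time in milliseconds. */
void fps_counter_frame(FpsCounter *c, uint32_t now_ms)
{
    uint32_t elapsed = now_ms - c->last_ms;

    c->frames++;
    if (elapsed >= 1000) {
        /* Divide by the real interval instead of assuming exactly 1000 ms. */
        c->fps = c->frames * 1000.0 / elapsed;
        c->last_ms = now_ms;
        c->frames = 0;
    }
}
```

With 100 frames counted over 1500 ms this yields about 66.7 instead of 100. Reading the clock only once per call also avoids the small drift that comes from calling the timer twice, as the original snippet does.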
you are right - normally I am not using this for framecounting - I use a code similar to the one you proposed, but I also use fraps as a reference.
I assume it is a driver bug (or feature?) - I also cannot reproduce it on any GeForce card. If there’s someone with a Quadro card, please try it with the same driver version!
I modified the test to render the default 32K mesh 100 times (3.3M triangles) at 1024x1024 windowed mode and added logic to switch between modes as opposed to recompiling with a switch. On Vista I see what I would expect, but on XP I’m seeing similar differences (i.e., VAs are faster than VBOs). I also added logic to test display lists.
(1) Loop overhead, no draw
(2) VSYNC is non-functional on Vista Aero
VBOs and DLs behaved the same between Vista and XP with DLs being the clear winner for static data. There is a very odd anomaly with VAs on the Quadro cards on XP – and only an NVIDIA developer can answer that question.
I did notice that the NeHe test never calls glFlush() or glFinish() prior to swapping buffers. If I insert a glFinish() prior to calling SwapBuffers(), the VA frame rate is nearly identical to the DL frame rate and the NULL draw frame rate drops to 2750 fps.
Actually, if VBOs are used correctly, there is no measurable difference between VBOs and DLs on modern drivers (especially if you are using glDrawArrays(), not glDrawElements()).