VBO vs. Vertex Arrays on a Quadro FX 1500

I just downloaded NeHe lesson 45, which demonstrates VBO rendering. I am using a Quadro FX 1500 board, and here comes the interesting thing:

When I activate VBOs I get 60 fps while rendering 526,000 triangles. Using VAs I get 260 fps with the same demo!

Can anyone explain this effect to me? I thought VBOs (especially static ones) should definitely be faster (or at least not slower) than vertex arrays - but it seems as if I am wrong…
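
(For context, the two code paths in the lesson boil down to roughly this - a sketch from memory with illustrative names, not the exact NeHe source:)

    // Vertex array (VA) path: data lives in system memory, so every
    // glDrawArrays() call re-streams it across the bus.
    glEnableClientState( GL_VERTEX_ARRAY );
    glVertexPointer( 3, GL_FLOAT, 0, pVertices );       // client-side pointer
    glDrawArrays( GL_TRIANGLES, 0, nVertexCount );

    // VBO path: the same data was uploaded to the GPU once, so the
    // pointer argument becomes a byte offset into the bound buffer.
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, nVBOVertices );
    glVertexPointer( 3, GL_FLOAT, 0, (char*)NULL );     // offset 0 into the VBO
    glDrawArrays( GL_TRIANGLES, 0, nVertexCount );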

What drivers are you using?

The official (Quadro) ForceWare 91.85

Didn’t notice you started a new topic, so I’m posting my answer here too:

Why so many triangles? The standard tutorial has only 32K triangles. Maybe you exceeded the maximum size of the buffer object. The reason for bad VBO performance could be too much or too little data in a single buffer object.

OK, maybe that is the problem, but I expected to be able to render 500K triangles efficiently with one VBO. (Actually, 500K vertices are only a few MB of data…)

(I decreased the number of pixels per vertex to 1.0 in the demo - just for evaluation)
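
(For reference, the mesh really should fit comfortably in a single static buffer; the upload is a one-time glBufferDataARB() call along these lines - a minimal sketch with illustrative names, not the lesson's exact code:)

    // One-time upload of the whole mesh into a static VBO.
    // 526K triangles ≈ 1.58M vertices; even at 20 bytes each (position +
    // texcoord) that is ~32 MB, which is easily resident in video memory.
    GLuint nVBOVertices;
    glGenBuffersARB( 1, &nVBOVertices );
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, nVBOVertices );
    glBufferDataARB( GL_ARRAY_BUFFER_ARB,
                     nVertexCount * 3 * sizeof(float),  // position data, in bytes
                     pVertices,                         // source in system memory
                     GL_STATIC_DRAW_ARB );              // hint: write once, draw many
    // (texture coordinates go into a second buffer the same way)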

There must be something wrong either in your fps code or in the number of rendered triangles. There's no way you can get 260 fps at 526K tris/frame on a Quadro FX 1500. That'd be nearly 137 MTris/second, with all the bus transfers going on for each rendering call.

Y.

you can download the demo from NEHE - just change the code lines 26-28 like this:

#define MESH_RESOLUTION 1.0f
#define MESH_HEIGHTSCALE 1.0f
#define NO_VBOS

I was using FRAPS for the fps measurements (60 with VBOs / ~250 without).

GeForce 7800 GTX, Pentium dual-core 3 GHz, 2 GB RAM, and the settings you just posted above: 110 fps with VBOs, 42 fps without VBOs. Pretty much what I expected.

I have no idea why it reports a framerate of 260 fps without VBOs on your machine, but it has to be wrong. Maybe a driver bug, not rendering all the vertices?

Y.

By the way, I must point out that the proof the number is wrong is bandwidth: a vertex is 20 bytes, and you've got 1.7 million vertices per frame. If you were indeed rendering at 260 fps from system memory, that'd be a bus bandwidth of 20 × 1,700,000 × 260 ≈ 8,840 MB/sec - a lot more than even a PCIe x16 bus can achieve.

Y.

use fraps

Tested on an ATI X1950 Pro AGP:

11 fps w/o VBO
111 fps with VBO

With standard VAs it looks like the AGP bus or my rather slow CPU (Athlon XP 2700+) is the bottleneck.

My guess would be that there is probably something wrong with your drivers or system in general.

Why are you using FRAPS instead of running the app in windowed mode and taking the displayed fps into account?

Well, actually I am using windowed mode WITH the internal fps counter and the external one (FRAPS). By the way, I have a Core 2 Duo 1.8 GHz and 2 GB RAM.

Take a look at these screenshots:

Without VBOs: [screenshot]

With VBOs: [screenshot]

Actually the framerate is jittering between 230 and 270, which is why there is a difference between the two frame counters - but it is definitely an extremely strange behaviour… :confused:

fraps is pretty reliable.
are you sure the vbo version doesn’t have vsync enabled?

vsync is disabled in both cases - I took both screenshots with exactly the same configuration.
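
(In case it helps to rule that out in code rather than in the driver panel: on Windows, the WGL_EXT_swap_control extension lets you force vsync off - a sketch:)

    // Force vsync off via WGL_EXT_swap_control (Windows only).
    typedef BOOL (APIENTRY *PFNWGLSWAPINTERVALEXTPROC)( int interval );
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress( "wglSwapIntervalEXT" );
    if( wglSwapIntervalEXT )
        wglSwapIntervalEXT( 0 );    // 0 = swap immediately, no vsync wait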

I checked my driver settings and disabled "maximize texture memory", and now I get ~400 fps without VBOs; with VBOs I get ~100 fps.
Everything measured with FRAPS. I really have no idea what's going on here…

Just tested it on a Quadro FX 3500 (driver 2.0.2 NVIDIA 87.56), dual Xeon @ 3.0 GHz, under SUSE Linux 10.0.

with vbo: ~120 fps
without vbo: ~40 fps

(according to the window’s titlebar)

hardly any difference in fps between 640x480 and 1280x1024 resolution.

By the way - it's not a good idea to calculate fps like in that example:

if( (SDL_GetTicks() - g_dwLastFPS) >= 1000 )        // When A Second Has Passed...
{
    g_dwLastFPS = SDL_GetTicks();                   // Update Our Time Variable
    g_nFPS = g_nFrames;                             // Save The FPS
    g_nFrames = 0;
    ...
}

The time difference (SDL_GetTicks() - g_dwLastFPS) can, for instance, be 1500 ms (= 1.5 sec). In that case the condition is true, and if 100 frames were drawn, fps will be set to 100, although it is really only 100/1.5 ≈ 67.

A better way would look like this:

if( g_nFrames == 100 )                              // Every 100 Frames...
{
    // Measure how long the 100 frames actually took instead of
    // assuming the interval was exactly one second
    float dt = 0.001f * (float)(SDL_GetTicks() - g_dwLastFPS);
    g_dwLastFPS = SDL_GetTicks();
    g_nFPS = (int)(100.0f / dt);
    g_nFrames = 0;
    ...
}

You are right - normally I do not use this for frame counting. I use code similar to the one you proposed, but I also use FRAPS as a reference.

I assume it is a driver bug/feature (?) - I also cannot reproduce it on any other GeForce card. If there's someone with a Quadro card, please try it with the same driver version!

I modified the test to render the default 32K mesh 100 times (3.3M triangles) at 1024x1024 windowed mode and added logic to switch between modes as opposed to recompiling with a switch. On Vista I see what I would expect, but on XP I’m seeing similar differences (i.e., VAs are faster than VBOs). I also added logic to test display lists.

Quadro FX 3450

Vista, Driver 160.03

VA: 4 fps
VBO: 12 fps
DL: 20 fps
NULL(1): 85 fps(2)

XP, Driver 91.36

VA: 26 fps
VBO: 13 fps
DL: 21 fps
NULL(1): 4950 fps

(1) Loop overhead, no draw
(2) VSYNC is non-functional on Vista Aero

VBOs and DLs behaved the same between Vista and XP, with DLs being the clear winner for static data. There is a very odd anomaly with VAs on the Quadro cards on XP – and only an NVIDIA developer can answer that question.
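
For reference, the display list path I added is just the standard compile-once pattern (a sketch; the variable names are mine, not NeHe's):

    // Build once at init time; the driver captures the dereferenced
    // geometry and can keep it resident in video memory.
    glEnableClientState( GL_VERTEX_ARRAY );
    glVertexPointer( 3, GL_FLOAT, 0, pVertices );
    GLuint nMeshList = glGenLists( 1 );
    glNewList( nMeshList, GL_COMPILE );
        glDrawArrays( GL_TRIANGLES, 0, nVertexCount ); // arrays are read at compile time
    glEndList();

    // Per frame:
    glCallList( nMeshList );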

I did notice that the NeHe test never calls glFlush() or glFinish() prior to swapping buffers. If I insert a glFinish() prior to calling SwapBuffers(), the VA frame rate is nearly identical to the DL frame rate and the NULL draw frame rate drops to 2750 fps.
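
In other words, roughly:

    glFinish();             // block until the GPU has actually completed the frame
    SwapBuffers( hDC );     // hDC = the window's device context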

Actually, if VBOs are used correctly, there is no measurable difference between VBOs and DLs on modern drivers (especially if you are using glDrawArrays(), not glDrawElements()).
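
(For anyone following along, the distinction is roughly this - a sketch assuming an index buffer nVBOIndices exists:)

    // Non-indexed: vertices are consumed in order, no index fetch.
    glDrawArrays( GL_TRIANGLES, 0, nVertexCount );

    // Indexed: indices come from their own buffer object, adding an
    // indirection per vertex (but enabling vertex reuse/caching).
    glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, nVBOIndices );
    glDrawElements( GL_TRIANGLES, nIndexCount, GL_UNSIGNED_INT, (char*)NULL );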