VBO slower than immediate mode when Lighting is ON

I’ve encountered what looks to me like a performance oddity: with vertex lighting ON, immediate mode rendering is slightly faster than VBO or BuildList rendering; with lighting OFF, the situation reverses.

By “immediate” I mean specifying triangles with glBegin/glNormal/glVertex/glEnd, while “BuildList” uses a display list compiled from the “immediate” calls, and “VBO” uses straight indexed vertex arrays backed by VBO buffers.

Here are the figures I get (GF3, Det 45.23, AXP 1800+) for 28000 triangles in a tristrip (all visible, none culled), with a single omni light in the scene:

Lighting ON:

  • immediate mode : 220 FPS
  • buildlist/VBO : 200 FPS

Lighting OFF: (aka glDisable(GL_LIGHTING))

  • immediate mode : 340 FPS
  • buildlist : 520 FPS
  • VBO : 510 FPS

While the triangle rate with lighting OFF doesn’t look too bad, the VBO/BuildList performance with lighting ON is somewhat depressing… Any idea why VBOs would perform slower than immediate mode calls when the only difference is lighting being ON?
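For scale, those FPS figures translate into raw triangle throughput as follows (a quick back-of-the-envelope sketch in C; the helper name is made up, and it assumes all 28000 strip triangles are submitted every frame):

```c
/* Back-of-the-envelope triangle throughput implied by an FPS figure,
   assuming all 28000 strip triangles are submitted every frame. */
long tris_per_second(long fps)
{
    const long tris_per_frame = 28000;  /* strip size from the benchmark above */
    return tris_per_frame * fps;
}
/* tris_per_second(220) -> 6160000   immediate mode, lit
   tris_per_second(520) -> 14560000  display list, unlit */
```

So even the best case here stays below 15 Mtris/s.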

Try it on an ATI board or with Detonator 44.03.

How are you creating your VBOs? How are you managing them? How do you do your rendering?

Show us some code, or at least give us more details than FPS figures from some unknown program.

The VBO code is the vanilla one, initialized with

// vertex positions (static geometry)
glGenBuffersARB(1, @vboVerticesBuffer);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboVerticesBuffer);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, vertices.Count*SizeOf(TAffineVector), vertices.List, GL_STATIC_DRAW_ARB);

// normals
glGenBuffersARB(1, @vboNormalsBuffer);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboNormalsBuffer);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, normals.Count*SizeOf(TAffineVector), normals.List, GL_STATIC_DRAW_ARB);

// 32-bit triangle strip indices
glGenBuffersARB(1, @vboIndicesBuffer);
glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboIndicesBuffer);
glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, indices.Count*SizeOf(Integer), indices.List, GL_STATIC_DRAW_ARB);

and executed with

// bind the VBOs and point the arrays at them (offset 0 into each buffer)
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboVerticesBuffer);
glVertexPointer(3, GL_FLOAT, 0, nil);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboNormalsBuffer);
glNormalPointer(GL_FLOAT, 0, nil);
glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboIndicesBuffer);

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);

// draw the whole strip from the bound index buffer
glDrawElements(GL_TRIANGLE_STRIP, indices.Count, GL_UNSIGNED_INT, nil);

and finally the immediate mode code is

glBegin(GL_TRIANGLE_STRIP);
for i := 0 to indices.Count-1 do begin
  k := indices[i];
  glNormal3fv(@normals[k]);
  glVertex3fv(@vertices[k]);
end;
glEnd;

Note: “classic” vertex arrays and VBOs have exactly the same performance as soon as a light is ON, i.e. both are slower than immediate mode (despite the fact that “immediate” makes thousands of calls).
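To put a number on those “thousands of calls”: a strip of N triangles has N + 2 vertices, and each vertex in the loop above costs one glNormal3fv plus one glVertex3fv. A quick sketch in C (the helper name is made up):

```c
/* Number of GL entry points hit per frame by the immediate-mode loop:
   a strip of n_tris triangles has n_tris + 2 vertices, each issuing one
   glNormal3fv and one glVertex3fv, plus the glBegin/glEnd pair. */
long gl_calls_per_frame(long n_tris)
{
    return 2 * (n_tris + 2) + 2;
}
/* gl_calls_per_frame(28000) -> 56006 */
```

Over 56000 calls per frame, versus a handful for the VBO path.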

Is that Visual Basic?
If it is, then maybe something is going wrong because of the way the pointer parameter is abused in VBO calls…? Just a thought… I don’t know jack about Visual Basic, but I gather it doesn’t use pointers, so maybe its interface with a C DLL gets messed up by this extension.

>Is that Visual Basic?

That is Delphi code; it interfaces with OpenGL in exactly the same fashion as your C code, with pointers etc., though from experience I’d guess my ‘for’ loop is compiled more efficiently than your C equivalent.

Btw, on the “classic” vertex array performance, I have an addendum: as long as no VBO call of any kind has been made, performance is similar to “immediate” (and even slightly faster). Once VBOs have been used, the performance of “classic” vertex arrays matches that of the VBOs when lighting is ON (i.e. slower, even after the VBOs have been disposed of).
With lighting OFF, classic vertex arrays are faster than immediate mode, but not as fast as VBOs (as can be expected).

jesus, why mess with perfection.

Could it be your lighting?
Nvidia wants you NOT to use two-sided lighting, otherwise you hit a software path.
I think that’s it; the rest should be entirely hw accelerated.

The next suspect is the driver. I think the newer ones are better tuned for the FX cards at the expense of older ones.

This is just a guess, but perhaps nVidia’s hardware doesn’t like integer indices? Try using shorts and see what happens.
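Converting the indices could be sketched like this (plain C; `narrow_indices` is a hypothetical helper, and the range check matters because GL_UNSIGNED_SHORT can only address 65536 vertices):

```c
#include <stddef.h>

/* Narrow 32-bit indices to 16-bit ones so glDrawElements can be called
   with GL_UNSIGNED_SHORT instead of GL_UNSIGNED_INT.
   Returns 0 on success, -1 if any index doesn't fit in 16 bits. */
int narrow_indices(const unsigned int *src, unsigned short *dst, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (src[i] > 0xFFFFu)
            return -1;  /* mesh has more than 64k vertices: keep 32-bit indices */
        dst[i] = (unsigned short)src[i];
    }
    return 0;
}
```

The narrowed array would then be uploaded with glBufferDataARB as before and drawn with GL_UNSIGNED_SHORT.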

TwoSidedLighting isn’t used, and face culling on/off has no impact on performance (as expected, since all triangles are visible).

Shorts gave me no performance delta; the exact same performance figures.
(the actual meshes can end up with more than 64k vertices in a chunk, so short indices wouldn’t have been a convenient solution anyway)

I’ve made another test (not willingly at first, but the results were interesting): I fired up a proggy that ate 100% of the CPU time (a math calculation thing), then started the bench. Immediate mode performance dropped to about 170 FPS each time, while VBOs went down to 50 FPS… meaning that VBOs are transformed/lit on the CPU side???

Well, gotta wait for next driver release and hope for an improvement…