Benchmark results on VAs, VBOs & DLs

Hi!

I posted on this forum a while ago when I had some strange performance issues in my graphics engine.
http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/011373.html

I said I was going to benchmark the performance of VBOs, DLs and VAs with an increasing number of instances (calls to glDrawElements) to render.

Well, now I have done so…
The scene displays a fairly high number of instances of an object with 629 triangles. The object is a static mesh dynamically transformed (making it difficult to render several instances with a single call). The triangles are defined using GL_TRIANGLES and each vertex is 8 floats in size. I use a single interleaved VBO for each object.

The system is an Athlon 1.33 GHz with a GF3 ti 200 on 4x AGP.

The results:
Vertex Arrays:
instances   fps   M vertices/s
 100         28    5.3
 200         14    5.2
 400          7    5.2
 700          4    5.1
1000          3    5.1
1500          2    5.1
2000          1    5.1

Vertex Buffer Objects:
instances   fps   M vertices/s
 100        105   20.5
 200         55   20.7
 400         28   21.0
 700         16   21.1
1000         11   21.0
1500          7   21.0
2000          5   20.5

Display Lists:
instances   fps   M vertices/s
 100        103   19.5
 200         52   19.8
 400         26   20.0
 700         15   20.1
1000         10   20.0
1500          7   20.0
2000          5   20.0
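The "M vertices/s" column appears consistent with instances × 629 triangles × 3 vertices × fps, with fps rounded to a whole number. A minimal sketch of that arithmetic follows; the function name and the assumption that each GL_TRIANGLES triangle counts as three submitted vertices are mine, not stated in the post.

```c
#include <stdio.h>

/* Triangles per object, taken from the post. */
#define TRIANGLES_PER_INSTANCE 629
/* Assumption: with GL_TRIANGLES, each triangle counts as three submitted vertices. */
#define VERTICES_PER_TRIANGLE 3

/* Millions of vertices submitted per second for a given instance count and frame rate. */
double mverts_per_second(int instances, double fps)
{
    double per_frame = (double)instances * TRIANGLES_PER_INSTANCE * VERTICES_PER_TRIANGLE;
    return per_frame * fps / 1.0e6;
}
```

For example, mverts_per_second(100, 28) gives about 5.28, close to the 5.3 in the first Vertex Arrays row. Rows with very low frame rates deviate more because the fps values are rounded.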

The performance problems I once had must have been due to some issue with my implementation or possibly a driver issue as I have upgraded my drivers since then.

There was little information to gain from these results, but I post them anyway just in case someone is wondering about the overhead of calling VBOs or DLs.

Cheers!

[This message has been edited by neomind (edited 02-11-2004).]

Just for info’s sake, did you get these numbers w/ glDrawElements?

Yes, I used glDrawElements. I have also tried using glDrawRangeElements with no noticeable effect on the performance.

I guess I should add that lighting was ON in the benchmark (as I think most will use lights in real-world applications). One directional light source. GL_LIGHT_MODEL_LOCAL_VIEWER was disabled.

Shouldn't vertex/index arrays be faster than display lists?

I don’t see how they could be. DLs are stored in graphics memory, while vertex arrays are transferred from main memory for every use.

My description was a bit misleading though. Although I said I am using "an object", I am actually using about a hundred different versions of it (the final game will use several hundred different versions of the object). As far as OpenGL is concerned, these are as different as they could be; there is no reuse. In short, the benchmark does not call glDrawElements more than once for each call to glVertexPointer. This is indeed somewhat unfair, as VAs could have done better had I bothered to sort the data by VA before drawing it. For my own purposes, I have nothing to gain from sorting the data.
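To illustrate what sorting by VA would buy, here is a rough sketch that orders a draw queue by vertex array and counts how often the array pointers would need to be specified again. The DrawItem record and both function names are hypothetical and not taken from the engine discussed here.

```c
#include <stdlib.h>

/* Hypothetical draw record: which vertex array an instance uses. */
typedef struct {
    int array_id;     /* identifies the vertex array (one per mesh version) */
    int instance_id;  /* identifies the instance to draw */
} DrawItem;

/* qsort comparator: ascending by array_id. */
static int compare_by_array(const void *a, const void *b)
{
    const DrawItem *x = a;
    const DrawItem *y = b;
    return (x->array_id > y->array_id) - (x->array_id < y->array_id);
}

/* Group items that share a vertex array so they end up adjacent. */
void sort_by_array(DrawItem *items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_by_array);
}

/* Count how many times the array pointers change when drawing in order. */
size_t count_pointer_setups(const DrawItem *items, size_t count)
{
    size_t setups = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || items[i].array_id != items[i - 1].array_id)
            ++setups;
    }
    return setups;
}
```

If every instance uses a different array, as in the benchmark described above, sorting changes nothing: the count equals the number of instances either way.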

As for DLs, I am actually rather surprised that DLs are not the fastest way to draw static meshes. Given the restrictions that apply to DLs I think that they should be the most suitable format for driver-side optimizing. The driver should be able to always recompile DLs to the optimal internal format.

I just tried display lists vs. arrays, and arrays are faster. Are you sure you haven't got your results the wrong way around?
Did you have textures on your polys?
I can get your sort of FPS only with no textures.

No textures. I am sending texture coordinates though. Vertices contain (position,normal,texcoords).

And I am sure that DLs are much faster than VAs on my engine.
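For reference, an 8-float interleaved vertex of the kind described (position, normal, texcoords) could be laid out as in the sketch below. The field order in memory and all names are assumptions; the post only gives the components and the total size.

```c
#include <stddef.h>

/* Assumed layout: 3 floats position, 3 floats normal, 2 floats texcoords. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} Vertex;

/* Byte distance between consecutive vertices (the stride argument). */
size_t vertex_stride(void)   { return sizeof(Vertex); }

/* Byte offsets of each attribute inside one vertex. */
size_t normal_offset(void)   { return offsetof(Vertex, normal); }
size_t texcoord_offset(void) { return offsetof(Vertex, texcoord); }
```

With 4-byte floats the stride is 32 bytes. These values are what would be passed as the stride and (with a VBO bound) as the byte offsets to glVertexPointer, glNormalPointer and glTexCoordPointer.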

ah… no textures

by display list you mean:
model = glGenLists(1);
glNewList( model, GL_COMPILE );

right? these tend to be worse on my geforce 3… don't know why tho

yes

Originally posted by supagu:
[b]ah… no textures

by display list you mean:
model = glGenLists(1);
glNewList( model, GL_COMPILE );

right? these tend to be worse on my geforce 3… don't know why tho[/b]
If you do that every time you render something, it's no wonder that it's slow. Display list compilation will never be faster than the method you use to send geometry into the list (e.g. DrawElements), and it additionally incurs some overhead for the compilation process itself. What you should do is reuse the display list the next time you render the same mesh. This will only work for static meshes, of course.

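GLuint display_list = 0;

void
render_static_mesh()
{
    /* Compile the list once, on first use. */
    if (display_list == 0)
    {
        /* glGenLists takes a count and returns the first list name. */
        display_list = glGenLists(1);
        glVertexPointer(<…> );
        glEnableClientState(<…> );
        glNewList(display_list, GL_COMPILE);
        glDrawElements(<…> );
        glEndList();
    }
    /* Replay the compiled list on every call. */
    glCallList(display_list);
}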

You shouldn’t use display lists for dynamic meshes at all.

For comparison's sake: using GL_TRIANGLES + VBO + glInterleavedArrays, glDrawArrays

610,000 triangle model

GeforceFx 5600 Ultra
30 FPS, 18 M Tri/Sec

I guess vertex caching is really a big help. I wonder if it is worth it for me to try to reorganize the data for better performance and use glDrawElements.
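Reorganizing non-indexed triangle data for glDrawElements mostly means collapsing duplicate vertices into an index list. The sketch below is one simple way to do it, assuming 8 floats per vertex as in the original benchmark (the format of the 610,000 triangle model is not stated) and a hypothetical function name. It uses a quadratic search, so it is only meant to show the idea, not to be fast.

```c
#include <string.h>

/* Assumption: 8 floats per vertex (position, normal, texcoords). */
#define FLOATS_PER_VERTEX 8

/*
 * Collapse duplicate vertices. Writes the unique vertices to out_vertices
 * (which must have room for n vertices) and one index per input vertex to
 * out_indices. Returns the number of unique vertices.
 * Vertices are compared bitwise, so 0.0f and -0.0f count as different.
 */
size_t build_index_list(const float *in, size_t n,
                        float *out_vertices, unsigned int *out_indices)
{
    const size_t bytes = FLOATS_PER_VERTEX * sizeof(float);
    size_t unique = 0;

    for (size_t i = 0; i < n; ++i) {
        const float *v = in + i * FLOATS_PER_VERTEX;
        size_t j;

        /* Look for an identical vertex that was already emitted. */
        for (j = 0; j < unique; ++j) {
            if (memcmp(out_vertices + j * FLOATS_PER_VERTEX, v, bytes) == 0)
                break;
        }
        /* Not found: append it as a new unique vertex. */
        if (j == unique) {
            memcpy(out_vertices + unique * FLOATS_PER_VERTEX, v, bytes);
            ++unique;
        }
        out_indices[i] = (unsigned int)j;
    }
    return unique;
}
```

Two triangles sharing an edge go from six vertices to four, with the shared pair referenced twice in the index list. Note that vertices with different per-face normals do not compare equal, so flat-shaded meshes will see little or no reduction.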

Very nice to read that. The picture is somewhat similar to some benchmarking I did months ago.

Thank you for the time you spent; benchmarks of this kind are pretty useful to a lot of people (I just hope you did it correctly).

Apart from what I’ve already said about the conditions not being completely fair to VAs, I think these results are correct (I did the benchmarking the way I found it the most useful), but of course, if you really want something done right you should always do it yourself.

Anyway, it was really no problem for me to do this benchmark, as all of the required functionality is already in my graphics engine. The only thing I had to do was set up a suitable scene with some predefined camera movements and read the log produced.
Nice to know someone found the numbers useful though.

You can check out some more info at http://www.fl-tw.com/opengl/GeomBench/

I found on older ATI drivers that display lists, in Homeworld2 at least, were actually slower than vertex arrays! On nvidia cards they were much quicker. I’m not sure if ATI has changed their display list implementation in their newer drivers or not.

Originally posted by maximian:
[b]For comparison's sake: using GL_TRIANGLES + VBO + glInterleavedArrays, glDrawArrays

610,000 triangle model

GeforceFx 5600 Ultra
30 FPS, 18 M Tri/Sec

I guess vertex caching is really a big help. I wonder if it is worth it for me to try to reorganize the data for better performance and use glDrawElements.[/b]

Try using triangle strips, and make sure you are using arrays of floats if you are not using them already. The performance gain is around 50-60% compared to arrays of doubles.

I get ~32fps on my GFX5900 SE rendering
1,000,000 triangles

Thanks. Actually, I gave up on triangle strips. The normals do not look right if I set them per face. Also, I need to highlight individual triangles, i.e. color them. This is not possible, as far as I know, with triangle strips.

But I get 16 M tris/s with 1 directional light.
That is exactly half of your FX5900, which makes sense since the 5600 has about half the horsepower.