View Full Version : OpenGL geometry benchmark released..

09-23-2002, 06:36 AM

This is a pure OpenGL geometry (T&L) benchmark. Binaries and source code are available.. it should answer quite a few questions, like: does it make a difference whether a display list is compiled from immediate mode or from vertex arrays? What kind of difference can you expect from video / AGP VAR / VAO? Are compiled vertex arrays still useful today? Are short indices faster than long indices when using vertex arrays? Now you can look at the results yourself.. :)


09-23-2002, 07:09 AM
Are you sure it's correct? I get similar values for the immediate/CVA pairs. I would have thought CVA would be quicker in all cases?

09-23-2002, 07:24 AM
Hmm, I get much better results with my engine and the nvidia vertex array range demo. With this benchmark I'm at about 6 MPoly/s, versus 18 MPoly/s with nvidia's demo.


09-23-2002, 07:31 AM
Yes, I'm pretty sure it's correct. I get a (small) speed improvement with low tessellation on my Radeon 8500, which means it works. See the recent VAO thread and you'll even notice that the Radeon 9700 is a lot slower with CVAs.. ah well. I suppose you've got an NVidia card?


09-23-2002, 07:59 AM
I got 21 million tris per sec max. That's a lot more than I typically get, but it seems like a reasonable (pretty high, even) max since I have a GeForce 2MX. That was with stripped geometry in video mem. You'll probably need to run the tests for longer periods of time to get consistent results. I was surprised at how fast display lists were, btw. Something else that seems strange is that integer indices are faster than shorts in most of the tests.

Roderic (Ingenu)
09-23-2002, 08:12 AM
768k is very slow; is it one array?

I hope that's not the case, but I can't really see why it would be slow if you don't make arrays bigger than the implementation's maximum size.

65535 on my Radeon 5800.

09-23-2002, 08:59 AM
Ok, I had a first look at the log. Arath, I suppose you're the one with an Athlon 1800 + GF3? If that's right, I noticed that in the test where you got 6 MTris/sec with VAR, you benchmarked the 44k tris scene. I don't think that's enough.. you should try again with the 768k tessellation. Other people's configs are showing what's expected, so I don't think it's a bug in the code..

Harsman: longer tests, that's an idea.. except that if you want to try everything, it's already 10 mins with all the options now :)

Ingenu: no, in the 768k case it's a 64k vertex array which is shared by all the spheres. It's slow because.. well.. 768k is a lot of triangles :) You won't see anything smooth unless you run at > 20 MTris/sec.


09-23-2002, 09:57 AM
Ysaneya, you're right, my apologies; I did the benchmark and I got good perf.
By the way, sometimes when I run the program I get no test and no result, just back to Windows. It happens when I do only the VAR test (in VRAM), but it works sometimes; maybe the memory allocation doesn't work?


09-23-2002, 10:52 AM
Not sure. Since it tests all the combinations of options you've checked, you need to have at least one option checked per group, and at least one transfer method. If that's what you did, it's a bug and I'll look into it (if you can tell me exactly which options you had checked when it happens..). Thanks :)

Btw, I have added the first logs to the site (beware: it's already big..)


09-23-2002, 11:03 AM
The program crashes right after the last test. Even if I select only some options so that only one test is performed (different types), it crashes after that test.
There's no prompt about a connection or anything else.

PIII 933, 256 MB, WinXP, GeForce2Go 16 MB, Det 40.41


09-23-2002, 11:24 AM
Ok thanks, I'll try to debug it tomorrow. I've been working on this program for 2 days now, and I'm starting to get a bad headache :)


09-23-2002, 06:56 PM
You should use the swap_interval extension to override the user's vsync selection in the drivers. Anyway, all my results seem to make sense so far. Nice program.

EDIT: (deleted a couple of statements about the driver settings for vsync which I realized were incorrect once I though about them)

[This message has been edited by Nakoruru (edited 09-23-2002).]

09-23-2002, 08:12 PM
Nice and useful app! Good work!

Some time ago I tried, in the same way, to understand how geometry (T&L) performance changes when varying the various transfer and vertex format parameters.

I did a post on this subject but got no answer (http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/007347.html ). Probably I should have added something to download and test :-)

Just to add my 5 cents to the discussion, I noticed some things that could be useful:

1) When rendering high triangle-count scenes, the fact that we are rendering a really big scene, and not the same array ten times, affects performance a lot (and not only when your scene doesn't fit entirely in AGP or video memory :) )

2) The size of the vertex array blocks matters (best is from 1k to 7k vertices).

3) The length of the strips changes the speed only a little (it matters above 30 MTris/sec).

4) I could degrade performance to 30 MTris/sec by defeating the vertex cache (100 vertices per turn).

5) If I reorder vertices within a block, performance drops considerably, depending on how much (and especially how locally) I permute.



09-23-2002, 10:58 PM
Nakoruru: that makes sense, I'll disable vsync for the tests.

Funes: you've got excellent ideas, but I'm hesitant to add too many options. Every option I add doubles the full-test time (since it tries all the possible combinations).. by implementing everything, the full test would take many hours! Ideally I should restrict the tests to the important combinations, but how can I make sure no important one is missed? Any ideas are welcome..


09-23-2002, 11:55 PM
When I implemented this kind of benchmark a while ago, I made an 'interactive' program. It was a dialog box where the user could switch options on and off on the fly. For example, you had a slider for tessellation level, radio buttons to choose between immediate / VA / CVA ... This way the user can test only the things that interest them and get the results immediately (by displaying FPS and MTris/sec info). Problem is, it's not well suited to generating nice reports like your program does.

Another feature for your wishlist : it would be nice to have ATI_map_object_buffer support (this extension has been around for months but specs are still not publicly available. Maybe some ATI registered developers out there could send you these specs (or maybe this extension could be documented ^^)).

09-23-2002, 11:57 PM
Ysaneya: I know, the number of tests is exponential, but in my humble opinion the main problem is not running the tests (the night is long... :-) ), but browsing the whole mess of results. I think that just looking at the fastest combination is not very useful. What I would like to know is which factors are the most discriminating when rendering. In other words, let's assume you have 2^n timings, each one corresponding to enabling/disabling one of <n> different "things" (strips, CVA, short/long indices, cache-aware primitives, video/agp... etc). The most discriminating factor should be the one that minimizes the ratio between the sum of timings with that feature enabled and the sum of timings with that feature disabled.
This would probably be a good hint for everyone on which path to choose when starting to optimize your code...


09-24-2002, 12:38 AM
Originally posted by kehziah:
Another feature for your wishlist : it would be nice to have ATI_map_object_buffer support (this extension has been around for months but specs are still not publicly available. Maybe some ATI registered developers out there could send you these specs (or maybe this extension could be documented ^^)).

If you look in ATi's glATi.h file it's pretty obvious what the extension does and how to use it. It exports these functions:
void *glMapObjectBufferATI(GLuint buffer);
void glUnmapObjectBufferATI(GLuint buffer);

It's like Lock()/Unlock() on D3D vertex buffers.

09-24-2002, 12:42 AM
Usually not all combinations are interesting. If I have found from testing that display lists are faster than immediate mode, I might not care whether immediate mode is faster with strips, or with short indices, etc. Instead of ranking all possible combinations, you might rank each group separately and then combine the results, e.g. video mem VAR is fastest, shorts are faster than longs, strips are faster than lists. Then you can guess what the fastest mode is and do more exhaustive testing on that.

09-24-2002, 04:22 AM
Nice app, thanks.

I would like to see options to benchmark with texturing, and also using only 1 light instead of 3. I know this could make for a lot more combinations, but I think most apps have texture mapping and either 0 or 1 GL lights.

09-24-2002, 04:34 AM
I agree, great app.

Display lists actually just pip video VAR on my machine (although the difference is tiny...).. both at about 22 MTris/sec (GF3, Athlon 1800+)

09-24-2002, 06:33 AM
I agree with Adrian, a texture option (single and dual texture) would be nice, and a one-light option too.


09-24-2002, 07:53 AM
If you do add textures, make sure they are tiny, because you do not want memory bandwidth to interfere with the benchmark.

Remember, there is no such thing as the number two. It makes no sense to have 3 lights. In programming, 0, 1, and infinity are the only numbers ^_^ If you have 3, then you should allow selection of up to as many lights as the implementation supports.

Same with texture coordinates: support sending texture coordinates up to the implementation maximum. But, again, keep the actual textures small, because this app measures the cost of sending those coordinates, not the cost of the actual texturing.

Also, why not add support for texgen as well? Instead of flat texture coordinates, it might be cool to test the speed of different texgen modes. Hmm, although you may consider that a little less focused. The current app seems to be concerned only with transferring geometry, not with what happens when it gets there. The choice of 0 or 3 lights is the only exception.

Eventually it would be cool if you could load an arbitrary vertex program and pass it to the program to benchmark.

09-24-2002, 10:05 AM
I think you've all got good points, thank you ! :)

ATI_map_object_buffer is pointless here, since it's used to dynamically update a video vertex array. I fill the array during its creation, not when rendering with it, so it's not interesting. Maybe in the future, if I add support for a rendering backend that streams data from system memory..

I'll see if it's possible to group similar results into the same category. It's true that browsing the results is a real pain now..

I will also probably add support for 1 light and texture coordinates, since that's important information. Texgen modes too, quite probably. Also on my plate: rendering 768k different tris instead of the same array many times, plus an index randomizer to test cache performance. Give me a few days :)


09-24-2002, 11:13 AM
I got a few crashes as well, I think in the display lists + VAR; I haven't looked at the code, but perhaps you're asking for too much memory.
Texturing is nice (but of course you have to let the user decide which textures :) )
Also tex env mode, colors, blending, blah blah (btw, lighting ranks way down the list)

FWIW I tried to start something similar about a year ago (though much larger/more exhaustive, e.g. like glspec) but no one was interested

09-24-2002, 10:33 PM
Originally posted by zed:
FWIW i tried to start something similar about a year ago (though much larger/exaustive eg like glspec) but noone was interested

Well, maybe today it's a little more important because there are a few T&L boards available. Nvidia, ATI, Matrox, SiS and Trident are all back!
Thus, 10 years later, we're coming back to the era of dedicated co-processors! (muahaha, it's so funny) BUT.... I really think that some implementations have missed the point! Anyway, what I think is *not* important; a good bench app is much more relevant here! :)

[This message has been edited by Ozzy (edited 09-25-2002).]

09-24-2002, 11:20 PM
I don't think I'm asking for too much memory, but I admit I'm not doing a lot of error handling, and everything is uninitialized/reinitialized between each test (display lists are compiled, VAR is reallocated, etc.). Maybe some drivers don't like that..


09-24-2002, 11:53 PM
It crashes on my SGI!!!!
Is your code really clean?


09-25-2002, 05:13 AM

Letting the user decide textures and blending has nothing to do with benchmarking triangles per second. If the program wanted to benchmark fillrate and/or memory bandwidth, it would want to render only a few hundred really big overlapping triangles, and it would not even matter whether it used immediate mode or VAR.

It would be neat to have such a benchmark. I can imagine it letting you test all the fancy Z-buffer optimizations by allowing you to render front-to-back and back-to-front. Memory bandwidth would be tested with blending and big textures. It would be a much more complicated benchmark, because there are many more options that affect fill rate and memory bandwidth use.

If you combined geometry throughput and fill rate/memory bandwidth into a single test, however, you would end up with a dataset that is very difficult to analyse.

I bet that your choice of texture (big or small) would have absolutely no effect on the speed of this benchmark, because it is the texture coordinates being in the datastream that slow it down, not the memory bandwidth usage.

09-25-2002, 11:39 AM
Yes Nakoruru, I was talking about a more general app (like glspec).
About the textures:
I posted this a couple of months ago http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/006595.html