More speed tests

After all the surprising results from my first speed test program, which I explained earlier under the topic “speed with different geometry types”, I decided to run some more tests, because it seems to me that a lot of what people say to do or not to do is rubbish, on my computer with my programs anyway. My system is a P3 600, 128MB RAM, TNT2 M64 32MB, running windoze98 with the latest drivers from NV.

After the strange and disappointing results with display lists, triangle strips and glArrayElement, I decided to try glDrawElements. Using GL_QUADS (which I had found to be the fastest primitive), it was about 15% faster than the non-display-list version but still about 15% slower than the display list version, at about 70 FPS. Then I tried glDrawElements with tri-strips and this was a disaster: it ran at about 1/5 of the speed of the non-array version, at 20 FPS. So arrays weren’t much use.
So I looked at other ways of speeding up the rendering process. I decided to see how slow binding to a new texture was:
1 bind = 217 FPS
100 binds to the same texture = 217 FPS
10,000 binds to the same texture = 217 FPS
100 binds to different textures = 217 FPS
10,000 binds to 10,000 different textures = 212 FPS
The last one took about 5 minutes to load and the hard drive was chugging away, which made it look like it was using virtual memory. The actual texture was the same each time, though, just loaded 10,000 times into an array:
for (n = 0; n < 10000; n++) LoadTexture(&texture[n], "thetexture.TGA");
The texture was 256*156*24 bit!

This showed that binds don’t make it much slower, at least as long as the textures reside in the gfx card memory. The last result is debatable, but I ran it twice and those are the results.
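To put the bind numbers in perspective, it can help to convert FPS into frame time. The sketch below does that arithmetic for the 217 FPS and 212 FPS results, under the assumption (not something measured in the post) that the whole difference is caused by the extra binds. The function names are made up for illustration.

```c
#include <assert.h>

/* Convert a frame rate to the time spent on one frame, in milliseconds. */
double fps_to_frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Estimated cost of one extra bind in nanoseconds, ASSUMING the whole
   frame-time difference between the two runs is due to the extra binds. */
double per_bind_cost_ns(double base_fps, double loaded_fps, long extra_binds)
{
    assert(extra_binds > 0);
    double extra_ms = fps_to_frame_ms(loaded_fps) - fps_to_frame_ms(base_fps);
    return extra_ms * 1.0e6 / (double)extra_binds;
}
```

With the numbers from the post, per_bind_cost_ns(217.0, 212.0, 9999) comes out at roughly 11 nanoseconds per bind, or about 0.11 ms per frame in total. A figure that small may mean the driver does very little work for a bind when nothing is drawn with the texture afterwards.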

Next I built another speed tester to compare different geometry. When filling/texturing, all primitives ran at the same speed. Without filling the shapes, quads and polygons both ran at 78 FPS, while triangles, tri-strips and tri-fans ran at approximately 29 FPS each.

Then I tried clearing the buffers (glClear()), and the results depend heavily on the amount of drawing done between clears, ranging from 3.75 to 3375 ms during my test when clearing both the color and depth buffers. If I left out clearing the color buffer, it was only a little faster (5%).

Then I decided to do some maths tests and compare floats and doubles, mults and divs etc.:
float: 10,000 sqrt(100) = 1.34 ms
double: 10,000 sqrt(100) = 1.34 ms
double: 1 million sqrt(100) = 13.2 ms

float: 1 million sin(100) = 230 ms
float: 1 million cos(100) = 240 ms
(I later found out that the speed of sin/cos depends on the angle provided)
float: 1 million tan(100) = 319.2 ms

float: 1 million 100 / 10 = 10.0 ms
float: 1 million 100 * 10 = 10.0 ms
double: 1 million 100 / 10 = 10.0 ms
double: 1 million 100 * 10 = 10.0 ms

float: 1,000,000,000 100 * 10.01 = 10111.2 ms
double: 1,000,000,000 100 * 10.01 = 10069.1 ms

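This shows almost no difference between floats and doubles: in this last test the double version was actually about 0.4% faster, which is probably within measurement noise. This last test was repeated several times.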
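One thing worth checking with tests like these is whether the compiler has removed the arithmetic altogether, since an expression on constants such as 100 * 10 can be computed at compile time. Below is a minimal sketch of a timing loop that guards against that with volatile. It is not the program used for the numbers above, and the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Time `iterations` float multiplications; returns elapsed CPU time in ms.
   The volatile qualifiers force a real load, multiply and store on every
   pass, so the compiler cannot fold 100 * 10.01 or delete the loop. */
double time_float_mul_ms(long iterations, float *result_out)
{
    assert(iterations > 0);
    volatile float a = 100.0f;
    volatile float b = 10.01f;
    volatile float sink = 0.0f;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        sink = a * b;
    }
    clock_t end = clock();
    if (result_out != NULL) {
        *result_out = sink;
    }
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}

/* Same loop with doubles, so the two timings can be compared directly. */
double time_double_mul_ms(long iterations, double *result_out)
{
    assert(iterations > 0);
    volatile double a = 100.0;
    volatile double b = 10.01;
    volatile double sink = 0.0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        sink = a * b;
    }
    clock_t end = clock();
    if (result_out != NULL) {
        *result_out = sink;
    }
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Note that the volatile accesses add memory traffic to every iteration, so this measures the multiply plus a load and a store, not the multiply alone. Running each function several times and comparing the spread gives an idea of how much of a small difference is just noise.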

Then I wanted to test the speed of drawing one GL_POLYGON of 5 vertices against drawing the same shape as a triangle fan.
Untextured and unfilled:
polygon = 3.83 FPS
tri-fan = 1.248 FPS
Textured and filled:
polygon = 0.255 FPS
tri-fan = 0.246 FPS
This shows that the polygon is faster, maybe because of the greater number of vertices needed for the tri-fan (2 more). Maybe a tri-fan would be faster for a 20-vertex poly. It also shows that the main bottleneck is the fill rate.
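The “2 more” vertices would fit a fan built around a centre point (centre, the 5 corners, then the first corner again); that is an assumption, since the post does not say how the fan was built. A small sketch of the vertex counts for the different ways of drawing a convex polygon with n corners, using made-up function names:

```c
#include <assert.h>

/* GL_POLYGON: one vertex per corner. */
int polygon_vertex_count(int n)
{
    assert(n >= 3);
    return n;
}

/* Triangle fan anchored at one corner: also one vertex per corner. */
int corner_fan_vertex_count(int n)
{
    assert(n >= 3);
    return n;
}

/* Triangle fan anchored at an extra centre vertex:
   centre + n corners + the first corner repeated to close the fan. */
int centre_fan_vertex_count(int n)
{
    assert(n >= 3);
    return n + 2;
}

/* Independent triangles: n - 2 triangles with 3 vertices each. */
int separate_triangles_vertex_count(int n)
{
    assert(n >= 3);
    return 3 * (n - 2);
}

/* Extra vertices of the centre fan relative to the polygon, in percent. */
double centre_fan_overhead_percent(int n)
{
    return 100.0 * (centre_fan_vertex_count(n) - polygon_vertex_count(n))
           / polygon_vertex_count(n);
}
```

For 5 corners the centre fan sends 7 vertices instead of 5 (40% more); for 20 corners it sends 22 instead of 20 (10% more), so the relative overhead shrinks as the polygon grows.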

Now I am completely stuck for ways of optimising my rendering.

[This message has been edited by Tim Stirling (edited 05-05-2001).]

“Premature optimization is the root of all evil” (Donald Knuth).

One question: should I bother doing some of these ‘speed’ optimisations even though they run slower on my computer, given that they should be faster on other people’s computers? I suppose the proof of the pudding is in the tasting, so I should make these optimisations anyway and find out.

don’t worry about speed, it is the last thing to concern yourself with. going for performance is a beginner’s mistake, which i + most other ppl have made. my advice is to get the thing running (not fast), get it stable + looking good. once you’re at that stage u will have learnt a lot more than now + will have a greater understanding of how to increase the app’s performance.

Interesting reply. I thought that if I made ultra stable and fast basic rendering code first, then I could add some eye candy afterwards. I will make my octree first, then add collision detection (which I have done before and extended on pen and paper), and then look back at optimisation. I guess it doesn’t need to be as fast as Q3 just yet; getting something working and nearly complete so I can show it off is probably better.

what impresses ppl more: a nice screenshot, or that the app runs at 30fps instead of 20fps?

The question is how you’re doing your test. The main drawback of state changing is that it can be necessary to finish the current drawing process before the state change is made. A drawback of changing textures is that the texture cache doesn’t contain anything from the new texture.

So the speed you’ll get depends a lot on what you’re doing, and not just on the fact that a texture change happened, as it’s not the binding itself that is slow, but what happens around it. Which is why I won’t take these results at face value. I do believe that the penalty of state changes is exaggerated in most discussions, but I still think that this test may not be the best.

Regarding float vs. double, I believe that for straight calculations current CPUs can calculate them at the same speed. However, there are still the following differences:

‘double’ takes twice as much memory as ‘float’, which also means it takes more time to read and write it to memory.

GPU T&L units use floats, which means that doubles have to be converted to floats, and conversion costs time. For the same reason, you must use floats for accelerated vertex arrays.

SSE and 3DNow! accelerate floats. So even if you don’t have a GPU, but the ICD uses SSE or 3DNow!, you’ll get the conversion cost. And if you have a compiler that can optimize for SSE or 3DNow!, it will only be able to do this for floats. (SSE2 does support doubles, but it’ll be less efficient for them than for floats.)
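The memory and conversion points can be made concrete with a short sketch. The helper names are hypothetical, and the byte counts assume the usual 4-byte float and 8-byte double, which the C standard does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a vertex array with `components` values per vertex. */
size_t vertex_array_bytes(size_t vertex_count, size_t components,
                          size_t component_size)
{
    return vertex_count * components * component_size;
}

/* Narrow an array of doubles to floats: the kind of per-value conversion
   that has to happen somewhere if the hardware only accepts floats. */
void doubles_to_floats(const double *src, float *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i] = (float)src[i];
    }
}
```

For 10,000 vertices with 3 components each, that is 120,000 bytes as floats and 240,000 bytes as doubles on such platforms, plus one conversion per value every time double data is handed to float-only hardware.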

Regarding the “untextured and unfilled” test, I didn’t understand what you meant by this. What was rendered? I think that your conclusion that fill rate is what slowed you down may not be correct. It may be the amount of data transferred to the card, or even the number of calls you made for drawing.

Anyway, I mostly agree with zed. The main reason I agree with him is that you’re doing artificial tests, which may or may not be valid for what you’re really trying to do. Even if these benchmarks are valid, there are still two main reasons you shouldn’t care that much about them:

  1. This is the major reason: the speed gained by technical optimizations usually pales in comparison to speed gains by using the right algorithms and data structures. Yes, you may gain 50% or more speed by using better rendering code, but you’ll gain several times more speed by culling things, using LODs, and so on. So while it’s nice to know that using strips for some things may be faster, you’d better concentrate on other things first.

  2. The data you use and the way you render it will help determine which primitives and techniques you should use. If you get data from a 3D modeller, then you might have just triangles, and will find them hard to stripify. So it doesn’t matter how strips or quads or polygons perform. If you decide to use vertex shaders, then all the tests you made without them might not be as meaningful. For dynamic models, you won’t be able to use display lists, even if you just change the texture coords for them (as in the case of dynamic lighting using textures). Here the eye candy determines what you can and cannot use.
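As an illustration of the kind of algorithmic gain meant in point 1, here is a minimal sketch of bounding-sphere culling against a set of planes. It is not taken from any engine discussed in this thread; the Plane and Sphere types and the function names are hypothetical, and the planes are assumed to be normalised with their normals pointing into the visible volume.

```c
#include <assert.h>
#include <stddef.h>

/* Plane a*x + b*y + c*z + d = 0, with (a, b, c) a unit normal that
   points towards the visible side (an assumption of this sketch). */
typedef struct { float a, b, c, d; } Plane;

/* Bounding sphere around an object. */
typedef struct { float x, y, z, radius; } Sphere;

/* Signed distance from the sphere centre to the plane. */
float signed_distance(const Plane *p, const Sphere *s)
{
    return p->a * s->x + p->b * s->y + p->c * s->z + p->d;
}

/* Returns 1 if the sphere may be visible, 0 if it lies completely
   behind at least one plane and can be skipped without drawing. */
int sphere_visible(const Plane *planes, int plane_count, const Sphere *s)
{
    assert(planes != NULL && s != NULL && plane_count >= 0);
    for (int i = 0; i < plane_count; i++) {
        if (signed_distance(&planes[i], s) < -s->radius) {
            return 0;
        }
    }
    return 1;
}
```

An object whose sphere fails this test costs a few multiplies per plane instead of a full draw, which is why skipping invisible geometry tends to matter more than the choice of primitive.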

And as zed said, while fast is nice, this isn’t usually the first thing that people are looking for. Good looking games get much more attention and usually better reviews than bad looking ones (even if the latter run faster). If the game gives you just 10FPS on a TNT2 M64, the gamer can still upgrade to a GeForce and get playable speed. If the game looks bad, there’s nothing the gamer can do. Of course, if the game gives you 10FPS on a GeForce3, then you’ve got a problem, but technical optimizations aren’t that likely to help you then.

Thanks a lot for the advice!