Performance issues texbind / DisplayList / VBO sort

Hello,
I hope this isn't too basic for the advanced forum, but here we go.

I am currently trying to optimize our 3D engine and have created a list of what gets drawn. This list is sorted by texture binds. The sorting is only done once, so it is not the performance issue.
There is no “visibility” check, so all data gets sent to GL.
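To put a number on what the sort saves in the test described in this post, here is a minimal C++ sketch that counts texture binds for an unsorted and a sorted draw list. It does not call OpenGL at all. `DrawItem`, `countBinds`, `sortByTexture` and `buildGrid` are hypothetical names made up for illustration, and it assumes the renderer only rebinds when the texture actually changes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw-list entry: one mesh instance and the texture it needs.
struct DrawItem {
    unsigned texture;  // texture object id (what glBindTexture would receive)
    unsigned mesh;     // index of the mesh to draw
};

// Count the binds a renderer would issue if it only rebinds when the
// requested texture differs from the one currently bound.
std::size_t countBinds(const std::vector<DrawItem>& list) {
    std::size_t binds = 0;
    bool haveBound = false;
    unsigned bound = 0;
    for (const DrawItem& item : list) {
        if (!haveBound || item.texture != bound) {
            bound = item.texture;
            haveBound = true;
            ++binds;
        }
    }
    return binds;
}

// Stable sort keeps the original relative order of items sharing a texture.
void sortByTexture(std::vector<DrawItem>& list) {
    std::stable_sort(list.begin(), list.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.texture < b.texture;
                     });
}

// Build a scene shaped like the test: 3 meshes, each with its own texture,
// repeated `copies` times in submission order mesh 0, 1, 2, 0, 1, 2, ...
std::vector<DrawItem> buildGrid(unsigned copies) {
    std::vector<DrawItem> list;
    for (unsigned c = 0; c < copies; ++c)
        for (unsigned m = 0; m < 3; ++m)
            list.push_back({m + 1, m});
    return list;
}
```

With `buildGrid(400)` the unsorted list needs one bind per draw, and the sorted list needs one bind per texture. Whether that difference shows up in the frame rate depends on the card and driver.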

However, I get different kinds of results.
The test is 3 meshes, each with a single texture, at a total of 1500 polys, and all of them are drawn 400 times in a big grid.

On the ATI Mobility Radeon 9600 it makes no difference at all whether they are sorted by texture or not, nor whether I use VBOs or display lists for them.

On the Nvidia GeForce4 Ti 4200 it makes no difference whether I sort or not either, but it does matter whether I use display lists or VBOs. With VBOs I get a constant FPS no matter how many objects I see on screen. With display lists it gets faster the less is drawn on screen, and faster still if the whole model is in a single list with 3 tex binds, instead of 3 lists, one per tex bind.

so for ATI:
no matter if sorted or not, nor if VBO or DList
constant fps

Nvidia:
no matter if sorted, but performance is:
3 lists (external tex bind)
VBO (external tex bind)
1 list (containing the other 3 and tex binds) fastest
and when the lists are used it is dependent on how much is seen.

So I wonder: is sorting by texture actually a performance gain, and did Nvidia optimize their display lists so much that on their cards there is nothing better to use for static models?
Or is it that both cards are so fast that changing textures some 1200 times is no problem for them?

I don't run the latest drivers on the Nvidia card, so their VBO loss might be gone now, but the engine should run on older specs as well, so…

Ah, I found a thread about lists on ATI and Nvidia, so that explains some of it.

But in general, is sorting by texture a “should do”, or is it less needed?
The game will not be very poly heavy and is targeted at GeForce 1 and up.

The biggest reason for sorting by texture is that, if there is not enough memory for all the textures, switching between them is very expensive, since the driver will probably have to load the texture from system memory back into AGP or local memory.

If that isn’t a concern for you (don’t forget about lower-memory cards or people forcing multisampling, if those are things your target audience might have or do), then, as you found out, binding a texture isn’t that expensive. The fewer overall state changes you have, the better for performance, but if you’re happy with the performance you’re seeing, don’t worry about it.

Ideally, you want to reverse the sort order between each frame. That way, whatever is resident at the end of the last frame, will be used first at the beginning of the next frame.

If you don’t do this, once you hit the cliff, you page ALL the textures EVERY frame. That hurts. Lots.
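The cliff can be shown with a small simulation. This C++ sketch assumes texture memory behaves like a least-recently-used cache, which is only a model, since real drivers may evict differently. `TextureCache` and `countUploads` are hypothetical names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Toy model of texture memory: holds at most `capacity` textures and evicts
// the least recently used one. This policy is an assumption of the sketch.
class TextureCache {
public:
    explicit TextureCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Mark `texture` as used; returns true if it had to be uploaded (a miss).
    bool use(unsigned texture) {
        auto it = std::find(resident_.begin(), resident_.end(), texture);
        if (it != resident_.end()) {
            resident_.erase(it);
            resident_.push_front(texture);  // most recently used sits in front
            return false;
        }
        if (resident_.size() == capacity_) resident_.pop_back();  // evict LRU
        resident_.push_front(texture);
        return true;
    }

private:
    std::size_t capacity_;
    std::list<unsigned> resident_;
};

// Draw `frames` frames that each use textures 0..textureCount-1 once, in
// sorted order, and return the total number of uploads. If `alternate` is
// true, every second frame walks the sorted list backwards.
std::size_t countUploads(unsigned textureCount, std::size_t capacity,
                         unsigned frames, bool alternate) {
    TextureCache cache(capacity);
    std::size_t uploads = 0;
    for (unsigned f = 0; f < frames; ++f) {
        const bool reversed = alternate && (f % 2 == 1);
        for (unsigned i = 0; i < textureCount; ++i) {
            const unsigned texture = reversed ? textureCount - 1 - i : i;
            if (cache.use(texture)) ++uploads;
        }
    }
    return uploads;
}
```

With 4 textures and room for only 3, walking the list in the same order every frame uploads every texture every frame in this model. Alternating the direction uploads only the one texture that did not fit.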

Thanks,
I already suspected that the modern cards I have are too fast for simple optimizations like that to show. I need to get hold of a more low-end system somehow. No NV Riva 128 emulator around, eh? :wink: Though that might be a bit too low-end.

thx for the tip to flip the list order as well

I found another issue

When I move the color array client state out of the list generation, I can override the color arrays with a normal glColor before the list is called (nice!).

But I tried the same with the texcoord array, and it didn't work. Basically, I wanted to be able to use either texgen or the texcoord array and decide which one outside of the list.

So to see textured display lists I needed to include the texcoord glEnableClientState in the list generation.

The deal is, basically, that client state (array pointers and enables) is never compiled into a display list. Only the geometry you submit while compiling the list is compiled into it, no matter how you send it (arrays, immediate mode). Makes sense?

This also means that already compiled display lists do not react to later changes in client state. You can’t disable the texcoords saved in a display list with glDisableClientState before calling the display list.

The “current” attributes (current color, current texcoords, etc.) are undefined after being submitted through the array mechanism. That is, if you enable a color array, call glDrawElements, then disable the color array, the current color is undefined; it is not necessarily the last element of the color array. You have to fix that by calling glColor* (as you’ve seen).

I’m unsure at the moment about the interaction between explicit texcoords (which you can’t turn off once they are compiled into a display list) and texgen. One should override the other, but I’m at a loss right now as to which one that’ll be.