I have a scene with a lot of identical objects (let’s say a few thousand), each consisting of about 5 quads and 3 textures (about 256x256 each). Assuming that only a glTranslatef() is needed to place them (no scaling or rotation), is it faster to:
(a) draw each object one by one, calling glBindTexture() several times per object, plus one glPushMatrix(), glTranslatef(), glPopMatrix() per object; or
(b) draw all the segments that share a texture in one go, cycling through all the objects 3 times (once per texture), but calling glBindTexture() only 3 times overall?
In a test on an NVIDIA GeForce2 Ultra 64 MB, method (a) yielded 30 fps while method (b) yielded 20 fps. I suspect a good deal of the framerate drop in method (b) is due to the extra glPushMatrix() and glPopMatrix() calls (one pair per object per pass, so three times as many as in method (a)), and that the advantage of calling glBindTexture() less often is negated by the graphics card caching recently used textures (?).
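To make the trade-off concrete, here is a rough sketch of the two loop structures that only counts calls. It does not touch OpenGL at all: stub_bind_texture() and stub_push_translate_pop() are made-up stand-ins for glBindTexture() and the glPushMatrix()/glTranslatef()/glPopMatrix() sequence, and the CallCounts struct and function names are mine, just for illustration. It assumes each object needs one matrix sequence per visit and one bind per texture.

```c
#include <assert.h>

#define NUM_TEXTURES 3  /* textures per object, as in my scene */

/* Tallies for the two kinds of state change being compared. */
typedef struct {
    long binds;       /* stand-in for glBindTexture() calls */
    long matrix_ops;  /* stand-in for push/translate/pop sequences */
} CallCounts;

/* Hypothetical stubs: they only count, they do not call OpenGL. */
static void stub_bind_texture(CallCounts *c)       { c->binds++; }
static void stub_push_translate_pop(CallCounts *c) { c->matrix_ops++; }

/* Method (a): visit each object once, binding every texture inside. */
CallCounts count_method_a(long num_objects)
{
    CallCounts c = {0, 0};
    assert(num_objects >= 0);
    for (long obj = 0; obj < num_objects; obj++) {
        stub_push_translate_pop(&c);      /* place the object once */
        for (int tex = 0; tex < NUM_TEXTURES; tex++) {
            stub_bind_texture(&c);        /* bind, then draw that texture's quads */
        }
    }
    return c;
}

/* Method (b): one pass over all objects per texture. */
CallCounts count_method_b(long num_objects)
{
    CallCounts c = {0, 0};
    assert(num_objects >= 0);
    for (int tex = 0; tex < NUM_TEXTURES; tex++) {
        stub_bind_texture(&c);            /* bind once per pass */
        for (long obj = 0; obj < num_objects; obj++) {
            stub_push_translate_pop(&c);  /* place the object again on every pass */
        }
    }
    return c;
}
```

With an assumed 2000 objects, count_method_a(2000) gives 6000 binds and 2000 matrix sequences, while count_method_b(2000) gives 3 binds and 6000 matrix sequences. So (b) trades 5997 fewer binds for 4000 more matrix sequences; which side costs more is exactly what I am unsure about.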
Are there other factors to consider when evaluating the performance? I would also like to know whether these kinds of numbers would be similar across different hardware vendors, or differ because of driver-specific implementations.
tia