performance of glBindTexture() vs transforms

I have a scene with a lot of identical objects (let’s say a few thousand), each consisting of about 5 quads and 3 textures (about 256x256 each). Assuming that only a glTranslatef() is needed to place them (no scaling or rotation), is it faster to:

(a) draw each object one by one, calling glBindTexture() several times per object, plus one glPushMatrix(), glTranslatef(), glPopMatrix() per object.
(b) draw all the segments that share a texture in one go, cycling through all the objects 3 times (once for each texture), but calling glBindTexture() only 3 times overall.

In a test on an NVIDIA GeForce2 Ultra 64 MB, method (a) yielded 30 fps while method (b) yielded 20 fps. I suspect a large part of the framerate drop in method (b) is due to the increased number of glPushMatrix() and glPopMatrix() calls, and that the advantage of calling glBindTexture() less often is negated by the graphics card caching recently used textures (?).

Are there other factors to consider when evaluating the performance? I would also like to know whether these kinds of numbers would be similar across different hardware vendors, or would differ because of driver-specific implementations.

tia

Interesting numbers you are getting, but first of all, why only 30 FPS? What exactly are you rendering? There could be a lot of factors here, like texture sizes, OpenGL states such as lighting, immediate mode or otherwise, hardware and driver combo, …

As a general rule of thumb, push and pop should not impede performance significantly. Push and pop take effect almost “instantaneously”, while binding could cause much longer delays due to loading TMU state. Caching only becomes an issue once you actually use the texture.
Still, I’m not sure what you are doing. Are you pushing and popping the texture matrix, or something else?

V-man

Originally posted by V-man:
Interesting numbers you are getting, but first of all, why only 30 FPS? What exactly are you rendering? There could be a lot of factors here, like texture sizes, OpenGL states such as lighting, immediate mode or otherwise, hardware and driver combo, …

I’m not rendering only the objects I mentioned, but the other objects should not be affecting the render since they are all rendered before the objects under testing, and do not occlude them.

Texture sizes are 256x256. The tested object’s geometry is stored in a display list, all objects sharing that same display list.


As a general rule of thumb, push and pop should not impede performance significantly. Push and pop take effect almost “instantaneously”, while binding could cause much longer delays due to loading TMU state. Caching only becomes an issue once you actually use the texture.
Still, I’m not sure what you are doing. Are you pushing and popping the texture matrix, or something else?
V-man

I’m pushing and popping the Modelview matrix, since each individual object requires a translate. I’m basically testing whether for a simple object with several different textured parts (assuming no multitexturing), it is more efficient to draw all instances’ parts individually or to completely draw one object at a time.

To give an example, say I have 1000 of these objects, which have textured parts A, B and C.

What factors determine which of these 2 methods is faster?

1.
for (i = 0; i < 1000; i++) {
    glPushMatrix();
    glTranslatef(x, y, z);              /* this instance's position */
    glBindTexture(GL_TEXTURE_2D, a);
    drawpart(a);
    glBindTexture(GL_TEXTURE_2D, b);
    drawpart(b);
    glBindTexture(GL_TEXTURE_2D, c);
    drawpart(c);
    glPopMatrix();
}

2.
glBindTexture(GL_TEXTURE_2D, a);
for (i = 0; i < 1000; i++) {
    glPushMatrix();
    glTranslatef(x, y, z);
    drawpart(a);
    glPopMatrix();
}

glBindTexture(GL_TEXTURE_2D, b);
for (i = 0; i < 1000; i++) {
    glPushMatrix();
    glTranslatef(x, y, z);
    drawpart(b);
    glPopMatrix();
}

glBindTexture(GL_TEXTURE_2D, c);
for (i = 0; i < 1000; i++) {
    glPushMatrix();
    glTranslatef(x, y, z);
    drawpart(c);
    glPopMatrix();
}

Here drawpart() calls a display list that is shared by all objects. There is no scaling or rotation of the instances, only translation.
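To make the trade-off between the two methods concrete, here is a rough sketch in C that only counts the calls each method issues per frame. It does not touch OpenGL at all; the struct and function names (CallCounts, count_per_object, count_per_texture) are made up for illustration, and it assumes each drawpart() is exactly one display list call.

```c
#include <assert.h>

/* Per-frame call counts for one drawing strategy (hypothetical helper type). */
typedef struct {
    long binds;       /* glBindTexture calls */
    long matrix_ops;  /* glPushMatrix/glTranslatef/glPopMatrix triples */
    long list_calls;  /* display list calls made by drawpart() */
} CallCounts;

/* Method 1: draw each object completely before moving on to the next. */
CallCounts count_per_object(long objects, long parts)
{
    CallCounts c;
    assert(objects >= 0 && parts >= 0);
    c.binds = objects * parts;       /* one bind per part of every object */
    c.matrix_ops = objects;          /* one push/translate/pop per object */
    c.list_calls = objects * parts;
    return c;
}

/* Method 2: one pass over all objects for each texture. */
CallCounts count_per_texture(long objects, long parts)
{
    CallCounts c;
    assert(objects >= 0 && parts >= 0);
    c.binds = parts;                 /* one bind per texture overall */
    c.matrix_ops = objects * parts;  /* every object is revisited per texture */
    c.list_calls = objects * parts;
    return c;
}
```

For 1000 objects with 3 parts, method 1 comes out at 3000 binds and 1000 matrix triples, method 2 at 3 binds and 3000 matrix triples, with the same 3000 display list calls either way. The counts say nothing about what each call costs on a given driver, which is the actual question.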

The difference in FPS is quite large if you ask me. Are you using a loop with i<1000?

It would be nice if you could make a version that doesn’t use GL to transform. Just transform the vertices yourself and keep the modelview matrix as identity. You should get better numbers than (a).
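Something along these lines is what transforming on the CPU could look like for a pure translation. This is only a sketch: the Vec3 type and translate_vertices() are hypothetical names, and how the vertices are stored is an assumption, not something taken from the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex type; the real program may store vertices differently. */
typedef struct { float x, y, z; } Vec3;

/* Copy `count` vertices from src to dst, adding `offset` to each one.
 * For a pure translation this produces the same positions that
 * glTranslatef() would, so the modelview matrix can stay at identity. */
void translate_vertices(const Vec3 *src, Vec3 *dst, size_t count, Vec3 offset)
{
    size_t i;
    assert(count == 0 || (src != NULL && dst != NULL));
    for (i = 0; i < count; i++) {
        dst[i].x = src[i].x + offset.x;
        dst[i].y = src[i].y + offset.y;
        dst[i].z = src[i].z + offset.z;
    }
}
```

The translated vertices would then be submitted directly, for example with glVertex3fv() or a vertex array. Note that each instance now has its own vertex positions, so the single shared display list no longer applies as-is.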

Also, set your compiler options to optimize for speed rather than size.

I am planning to put together a series of tests myself sometime when I’m free.

V-man

Even better, stitch your quads into tristrips, using degenerate triangles to join them. This is really fast on GeForce hardware. Do you know the translations/rotations beforehand? If so, pre-transform the quads and store the transformed versions. Do as much “offline” as you can!
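As a rough illustration of the stitching, the sketch below builds one index list that joins several quads into a single triangle strip. It assumes each quad's four vertices are stored consecutively in strip (zigzag) order rather than GL_QUADS perimeter order; stitch_quad_strips() is a made-up name. Only quads that share a texture can go into the same strip.

```c
#include <assert.h>
#include <stddef.h>

/* Write triangle-strip indices for `quad_count` quads into `out`, where quad q
 * uses vertices 4q..4q+3 in strip order. Consecutive quads are joined by
 * repeating the last index of one quad and the first index of the next, which
 * produces zero-area (degenerate) triangles that are not rasterized.
 * `out` must have room for 6 * quad_count - 2 indices.
 * Returns the number of indices written. */
size_t stitch_quad_strips(size_t quad_count, unsigned int *out)
{
    size_t q, n = 0;
    assert(quad_count == 0 || out != NULL);
    for (q = 0; q < quad_count; q++) {
        unsigned int base = (unsigned int)(4 * q);
        if (q > 0) {
            out[n++] = base - 1;  /* repeat last vertex of the previous quad */
            out[n++] = base;      /* repeat first vertex of this quad */
        }
        out[n++] = base;
        out[n++] = base + 1;
        out[n++] = base + 2;
        out[n++] = base + 3;
    }
    return n;
}
```

For two quads this gives the indices 0 1 2 3 3 4 4 5 6 7. Because each quad contributes an even number of vertices, the two repeated indices keep the winding order of the following quad intact. The result could be drawn with glDrawElements(GL_TRIANGLE_STRIP, count, GL_UNSIGNED_INT, indices).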

Also, are you using mipmapping? This can be a speed gain if your texel/pixel ratio is getting big (i.e., when the objects move far away).
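To give a feel for the texel/pixel ratio, here is a deliberately simplified model of which mip level ends up being sampled. Real hardware picks the level per pixel from texture coordinate derivatives; approx_mip_level() is a hypothetical helper that only looks at how many screen pixels one edge of the texture covers.

```c
#include <assert.h>

/* Rough estimate of the mip level used when a texture `texture_size` texels
 * wide covers about `screen_pixels` pixels on screen. Each level halves the
 * texture size; we step down while the next level still has at least one
 * texel per pixel. Level 0 is the full-size texture. */
int approx_mip_level(unsigned int texture_size, unsigned int screen_pixels)
{
    int level = 0;
    assert(texture_size > 0 && screen_pixels > 0);
    while (texture_size > 1 && texture_size / 2 >= screen_pixels) {
        texture_size /= 2;
        level++;
    }
    return level;
}
```

Under this model a 256x256 texture that covers only 32 pixels would be read from level 3 (32x32), so far fewer distinct texels are fetched than when sampling the full-size image. In OpenGL the levels can be generated with gluBuild2DMipmaps() and selected by setting GL_TEXTURE_MIN_FILTER to one of the mipmap filters such as GL_LINEAR_MIPMAP_LINEAR.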

[This message has been edited by fresh (edited 07-30-2002).]