One large texture or multiple small textures?

Hi,

I have recently read that switching VBOs during rendering is bad, and that it is much better to have one larger VBO. Alright.
Is it equally bad to switch textures?

I use Texture Buffer Objects, so I can have a very large texture that I just bind once per frame.
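Here is a minimal sketch of how several small images could share one texture buffer so that it only needs binding once per frame. It assumes a hypothetical layout where images are tightly packed and row-major, with one texel per element. `PackedImage`, `pack_images` and `texel_index` are illustrative names, not part of OpenGL. Each image gets a base offset, and the resulting linear index is the kind of value a shader would pass to texelFetch on a samplerBuffer.

```c
#include <stddef.h>

/* Hypothetical description of one image packed into a texture buffer. */
typedef struct {
    size_t offset; /* index of the image's first texel in the buffer */
    size_t width;  /* texels per row */
    size_t height; /* number of rows */
} PackedImage;

/* Lay the images out back to back and fill in each offset.
   Returns the total number of texels the buffer must hold. */
size_t pack_images(PackedImage *imgs, size_t count)
{
    size_t next = 0;
    for (size_t i = 0; i < count; i++) {
        imgs[i].offset = next;
        next += imgs[i].width * imgs[i].height;
    }
    return next;
}

/* Row-major linear index of texel (x, y) inside one packed image. */
size_t texel_index(const PackedImage *img, size_t x, size_t y)
{
    return img->offset + y * img->width + x;
}
```

For example, packing 4x4, 8x2 and 2x2 images gives offsets 0, 16 and 32 and a 36-texel buffer. Texel (1, 1) of the third image then lives at index 35.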

Does calling glBindTexture many times per frame noticeably reduce performance? I prefer to ask here because the framework my company requires prevents me from using ‘raw’ OpenGL.

I know the answer is implementation dependent. I’m looking for advice based on real-world experience, e.g. recent ATI or nVidia driver implementations.

Cheers,
Fred

It’s not so much the texture changes (which are fast on modern hardware) that cause problems but the fact that each such state change will break the current batch.

Modern (and not so modern) hardware likes you to submit data in large batches, with as many triangles as possible in each batch. On a more primitive level it’s like the difference between:

// One batch: 10000 triangles inside a single glBegin/glEnd pair
glBegin (GL_TRIANGLES);
for (int i = 0; i < 10000; i++)
{
   glVertex3f (0, 0, 0);
   glVertex3f (0, 1, 0);
   glVertex3f (1, 0, 0);
}
glEnd ();

And:

// 10000 batches: a separate glBegin/glEnd pair for every triangle
for (int i = 0; i < 10000; i++)
{
   glBegin (GL_TRIANGLES);
   glVertex3f (0, 0, 0);
   glVertex3f (0, 1, 0);
   glVertex3f (1, 0, 0);
   glEnd ();
}

So when you need to intersperse lots of state or texture changes among your drawing commands, your batches become smaller and performance suffers as a result.
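To make the batching point concrete, here is a small C sketch that counts how many batches a given draw order produces, treating every texture change as the start of a new batch. It then reorders the items by texture, a technique often called state sorting. `DrawItem` and the helper functions are hypothetical. Sorting is only safe when draw order doesn't matter, which is usually not the case for blended, transparent geometry.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical draw item: the texture it needs and its triangle count. */
typedef struct {
    unsigned texture;
    unsigned triangles;
} DrawItem;

/* Count batches: a new batch starts whenever an item's texture differs
   from the previous item's texture (i.e. a bind would be needed). */
size_t count_batches(const DrawItem *items, size_t count)
{
    size_t batches = 0;
    for (size_t i = 0; i < count; i++) {
        if (i == 0 || items[i].texture != items[i - 1].texture)
            batches++;
    }
    return batches;
}

/* qsort comparator ordering items by texture id. */
static int by_texture(const void *a, const void *b)
{
    unsigned ta = ((const DrawItem *)a)->texture;
    unsigned tb = ((const DrawItem *)b)->texture;
    return (ta > tb) - (ta < tb);
}

/* Reorder items so everything using the same texture is drawn together. */
void sort_by_texture(DrawItem *items, size_t count)
{
    qsort(items, count, sizeof *items, by_texture);
}
```

Six items alternating between two textures produce six batches. After sorting, the same six items produce only two.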

The moral of the story is that you can’t just consider the implications of each element in isolation, but instead you need to look at how they all interact with each other and whether a design decision in one place is going to have impact elsewhere.

I have recently seen it is bad to switch VBOs during rendering - it is much better to have one, larger VBO.

Define “bad”. There are levels of “bad.”

First, the main cost with vertex buffer objects lies in changing the pointer bindings, i.e. glVertexAttribPointer and its cousins.

Second, this will only be a significant problem if you do it a lot, and even then only if it makes you CPU-bound.

Does calling glBindTexture many times per frame have the effect of noticeably reducing performance?

Can it? Yes. Does it? Well, that depends on how often you do it.
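One way to keep glBindTexture calls down without restructuring a whole renderer is to filter out redundant binds. Below is a minimal sketch that assumes a single texture unit and target. `TextureCache` and `cache_bind` are hypothetical names, and the counter stands in for the real glBindTexture call. The cache goes stale if other code, such as a company framework, binds textures behind its back. Whether this pays off also depends on whether the driver already ignores redundant binds, so it is worth measuring.

```c
#include <stddef.h>

/* Hypothetical texture-binding cache for one unit and one target.
   In a real renderer the marked line would call
   glBindTexture(GL_TEXTURE_2D, texture); here it only counts. */
typedef struct {
    unsigned bound;      /* texture currently bound (0 = GL's default) */
    size_t   real_binds; /* number of binds actually issued */
} TextureCache;

void cache_init(TextureCache *c)
{
    c->bound = 0;
    c->real_binds = 0;
}

void cache_bind(TextureCache *c, unsigned texture)
{
    if (texture == c->bound)
        return;          /* already bound: skip the redundant call */
    c->bound = texture;
    c->real_binds++;     /* glBindTexture(GL_TEXTURE_2D, texture) goes here */
}
```

For the bind sequence 3, 3, 3, 5, 5, 3, 3 the cache issues only three real binds.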

In any of these cases, you need to benchmark your own application. You shouldn’t front-load optimizations like this. It’s way too easy to convince yourself that X is going to be a problem, then spend days optimizing X, only to find out that, in practice, it did nothing for performance.
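This is a tiny timing helper of the kind that could check an assumption before you optimize for it. It is a sketch built on the standard C clock() function, which measures processor time used by the program. It therefore captures CPU-side cost, such as driver overhead when issuing calls. It does not capture time spent on the GPU, because OpenGL calls are queued asynchronously. OpenGL timer queries (glBeginQuery with GL_TIME_ELAPSED) are the usual tool for that. `average_ms` and the stand-in `sum_workload` are hypothetical names. Because clock() is coarse, average over many iterations.

```c
#include <time.h>

/* Average CPU time in milliseconds of `iterations` calls to `work(ctx)`.
   Returns 0.0 when iterations is not positive. */
double average_ms(void (*work)(void *), void *ctx, int iterations)
{
    if (iterations <= 0)
        return 0.0;
    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        work(ctx);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / iterations;
}

/* Stand-in workload for illustration: replace it with the code you
   actually want to compare (e.g. one large batch vs many small ones).
   ctx must point to an unsigned long long accumulator. */
void sum_workload(void *ctx)
{
    volatile unsigned long long *acc = ctx;
    for (unsigned long long i = 0; i < 100000; i++)
        *acc += i;
}
```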

Modern (and not so modern) hardware likes you to submit data in large batches, with as many triangles as possible in each batch.

Even this advice should be taken with a grain of salt and some benchmarking. APIs differ in how sensitive they are to batching, and for different reasons. D3D, for example, is far more sensitive to batching than OpenGL. In OpenGL, you can issue multiple draw calls with no state changes between them at far less cost to overall performance.

Yeah, that’s quite true. A few thousand DrawPrimitive calls will bring many systems to their knees, but the equivalent number of glDrawArrays calls has significantly less impact. That doesn’t mean you should ignore it completely, though. I believe the point that you should aim to maximize the number of triangles between state changes remains valid. This is as opposed to minimizing the number of draw calls, which is really a different thing and not necessarily related. In fairness, you made that point too.

Having large batches makes a huge difference to the framerate. The more work the graphics card does in one go, the better. In my case, every time you go back to the CPU you lose performance. For this reason, I don’t expect glBindTexture switches to do any good.

Issuing 100 glDrawElements calls of 1000 vertices each is much slower than issuing 10 glDrawElements calls of 10000 vertices each.
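Consecutive draws can sometimes be merged. This works when they share the same state and their index ranges sit next to each other in one index buffer, and it turns the 100 x 1000 case into the 10 x 10000 case. Below is a minimal sketch that assumes independent triangles (GL_TRIANGLES) drawn from a single shared index buffer. Strips or fans cannot be concatenated this way. `DrawRange` and `merge_ranges` are hypothetical names. Each merged range would become one glDrawElements call.

```c
#include <stddef.h>

/* Hypothetical draw range: a texture plus a span of a shared index
   buffer, i.e. what one glDrawElements call would consume. */
typedef struct {
    unsigned texture;
    size_t   first;  /* first index in the shared index buffer */
    size_t   count;  /* number of indices */
} DrawRange;

/* Merge neighbouring ranges that use the same texture and are contiguous
   in the index buffer. Works in place; returns the new number of ranges. */
size_t merge_ranges(DrawRange *r, size_t count)
{
    if (count == 0)
        return 0;
    size_t out = 0;
    for (size_t i = 1; i < count; i++) {
        DrawRange *last = &r[out];
        if (r[i].texture == last->texture &&
            r[i].first == last->first + last->count) {
            last->count += r[i].count;   /* extend the current draw */
        } else {
            r[++out] = r[i];             /* start a new draw */
        }
    }
    return out + 1;
}
```

Take 100 contiguous ranges of 1000 indices where each block of 10 shares a texture. merge_ranges reduces them to 10 ranges of 10000 indices each.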

Display lists seem faster than VBOs, even in the 10 glDrawElements / 10000 vertices configuration. My guess is that the driver’s internal implementation of display lists builds one very large VBO, resized on demand. That would explain why their performance always seems optimal.