Texture bind performance?

We encountered a strange performance slowdown in our object renderer on NVIDIA hardware. I didn't have mesh sorting by material id implemented yet; I knew this could be an issue, but I thought it would cost 1 ms at most. The actual slowdown was much bigger. I later found out that it was indeed caused by glBindMultiTextureEXT. So I ran a couple of tests: first using one big texture (2048x2048) for all meshes, then adding sorting by material id.
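Sorting by material id groups the draw list so that consecutive meshes share textures, which cuts the number of texture changes per frame. A minimal sketch of the idea, assuming a hypothetical `Mesh` record with a `material_id` field; it only counts material switches instead of issuing real GL calls:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mesh:
    name: str
    material_id: int  # hypothetical index into a material table


def count_material_switches(meshes: List[Mesh]) -> int:
    """Count how often the active material changes when drawing in this order."""
    switches = 0
    current: Optional[int] = None
    for mesh in meshes:
        if mesh.material_id != current:
            # A real renderer would bind this material's textures here.
            switches += 1
            current = mesh.material_id
    return switches


def sort_by_material(meshes: List[Mesh]) -> List[Mesh]:
    """Return a new draw list grouped by material id.

    sorted() is stable, so meshes keep their relative order within a group.
    """
    return sorted(meshes, key=lambda m: m.material_id)
```

With six meshes alternating between two materials, the unsorted order switches material six times, while the sorted order switches only twice.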

Objects in the test scene had about 400k tris, ~250 meshes and 42 textures.
All textures had the same format: diffuse DXT1, normal 3DC/ATI2, etc.
Times for my test scene (the final object pass), in ms:

| | NV460 (drv 306.97, 306.94, 310.33) | AMD 6850 (drv 12.10) |
|---|---|---|
| without sorting | 15.0 | 5.0 |
| one texture | 3.3 | 4.0 |
| sort by material | 6.8 | 4.3 |
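To put the table in perspective, here is a rough per-switch cost. It assumes (an assumption, since the exact bind count is not given) that in the unsorted case each of the ~250 meshes triggers one material switch, and that the one-texture run is the no-switch baseline:

$$
\frac{15.0\ \text{ms} - 3.3\ \text{ms}}{250} \approx 0.047\ \text{ms} \approx 47\ \mu\text{s}\ \ (\text{NV460}),
\qquad
\frac{5.0\ \text{ms} - 4.0\ \text{ms}}{250} = 0.004\ \text{ms} = 4\ \mu\text{s}\ \ (\text{AMD 6850})
$$

Under that assumption, each switch costs roughly ten times more on the NVIDIA card.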

Those numbers are pretty bad for NVIDIA. The weird thing is that we have no such issue in the terrain renderer, which uses a unique texture for every terrain tile (about 200 meshes/textures per frame), yet its performance is great. There is no difference in render states: the object renderer runs immediately after the terrain renderer, with no state changes in between. I have tried a lot of tests (a simple shader, non-DSA functions for texture binding, rendering without terrain, etc.) without luck: every time I started changing textures per mesh, I hit the wall. (I used two textures for this test, and the time was almost 15 ms.)

There is a workaround described in the Outerra post "OpenGL Notes #2: Texture bind performance".

Does anyone have an idea what can cause this large slowdown?

From what I remember of an Nvidia talk, changing the GL state will cause the GPU to be reprogrammed, and this introduces a bubble in the pipeline where it needs to wait for the previous work to finish before it can start on the new batch with the new state. In this case you’re varying the number of times that you switch the texture state, so your results seem consistent with what they were saying.
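If each real state change can stall the pipeline this way, it is also worth making sure that binds which would not change anything never reach the driver. A minimal sketch of such a filter, assuming a hypothetical `bind_fn(unit, texture)` callback that stands in for the real GL call (for example, a wrapper around glBindMultiTextureEXT):

```python
from typing import Callable, Dict


class TextureBindCache:
    """Forwards a bind only when it changes the texture bound to a unit."""

    def __init__(self, bind_fn: Callable[[int, int], None]) -> None:
        self._bind_fn = bind_fn          # hypothetical stand-in for the GL call
        self._bound: Dict[int, int] = {}  # texture unit -> texture name
        self.issued = 0                   # binds forwarded to bind_fn
        self.skipped = 0                  # redundant binds filtered out

    def bind(self, unit: int, texture: int) -> None:
        if self._bound.get(unit) == texture:
            self.skipped += 1
            return
        self._bind_fn(unit, texture)
        self._bound[unit] = texture
        self.issued += 1

    def invalidate(self) -> None:
        """Forget cached state, e.g. after other code changed bindings directly."""
        self._bound.clear()
```

The cache only removes exact repeats, so it helps most when combined with sorting by material, which turns scattered binds into repeats.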

Texture atlases or texture arrays are another way around this problem, if you run out of texture bindings.
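With an atlas, many meshes share one bound texture and each mesh samples its own sub-rectangle. A minimal sketch of the UV remapping, assuming a hypothetical `AtlasRegion` record in normalized coordinates:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AtlasRegion:
    """Sub-rectangle of the atlas in normalized [0, 1] coordinates (hypothetical)."""
    u0: float
    v0: float
    u1: float
    v1: float


def to_atlas_uv(uv: Tuple[float, float], region: AtlasRegion) -> Tuple[float, float]:
    """Map a mesh's local UV in [0, 1] into its region of the atlas."""
    u, v = uv
    return (region.u0 + u * (region.u1 - region.u0),
            region.v0 + v * (region.v1 - region.v0))
```

Atlases make GL_REPEAT wrapping awkward and can bleed neighbouring regions into the smaller mip levels. Texture arrays avoid both problems, because each layer has its own mip chain and the shader selects the layer through the third coordinate of a `sampler2DArray` lookup.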

2048 is not really a 'big' texture size on modern hardware. My guess is that the texture formats you are using are what is causing the slowdowns.

[QUOTE=malexander;1244694]From what I remember of an Nvidia talk, changing the GL state will cause the GPU to be reprogrammed, and this introduces a bubble in the pipeline where it needs to wait for the previous work to finish before it can start on the new batch with the new state. In this case you’re varying the number of times that you switch the texture state, so your results seem consistent with what they were saying.

Texture atlases or texture arrays are another way around this problem, if you run out of texture bindings.[/QUOTE]

Thanks for your answer, but it doesn't explain why this slowdown happens only in our object renderer. The terrain is a much worse case, and it runs at the same speed as with one 2048 texture. I should try to isolate this in a simple app sometime, but my workaround seems to be faster than simple material ordering; I'm just curious.

It doesn't matter how big the texture is once it has mipmaps. I tested various formats, compressed and uncompressed, and the result was the same.
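The reasoning: with mipmaps, the sampler reads from a level that roughly matches the on-screen footprint, with λ = log2(texels per pixel). A minimal sketch of that rough model, assuming uniform magnification across the mesh (real hardware computes λ per pixel from UV derivatives):

```python
import math


def mip_level_count(width: int, height: int) -> int:
    """Number of levels in a full mip chain: floor(log2(max(w, h))) + 1."""
    return int(math.floor(math.log2(max(width, height)))) + 1


def approx_mip_level(texture_size: int, screen_pixels: float) -> int:
    """Approximate level sampled when a square texture spans screen_pixels pixels.

    Returns floor(lambda) clamped to the chain, i.e. the finer of the two levels
    a trilinear filter would blend. screen_pixels must be positive.
    """
    texels_per_pixel = texture_size / screen_pixels
    lam = math.log2(texels_per_pixel)
    levels = mip_level_count(texture_size, texture_size)
    return min(max(int(math.floor(lam)), 0), levels - 1)
```

For example, a 2048x2048 texture covering 256 pixels on screen is sampled from level 3, the 256x256 level. Its per-pixel texture traffic is therefore about the same as that of a native 256x256 texture.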

If you are short on video memory, you might find that textures get swapped in and out of main memory.

I have a lot of free memory, and the workaround runs at full speed with the same set of textures.
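To check whether a texture set could plausibly exceed video memory, you can estimate its footprint from the block sizes of the compressed formats: 8 bytes per 4x4 block for DXT1/BC1 and 16 bytes for 3DC/ATI2/BC5. A minimal sketch, which ignores any driver padding or alignment:

```python
def compressed_mip_chain_bytes(width: int, height: int, block_bytes: int) -> int:
    """Bytes for a full mip chain of a block-compressed 2D texture.

    Each level is stored as 4x4 texel blocks; levels smaller than 4x4 still
    occupy one block. block_bytes: 8 for DXT1/BC1, 16 for 3DC/ATI2/BC5.
    """
    total = 0
    w, h = width, height
    while True:
        blocks_x = max(1, (w + 3) // 4)
        blocks_y = max(1, (h + 3) // 4)
        total += blocks_x * blocks_y * block_bytes
        if w == 1 and h == 1:
            break
        w = max(1, w // 2)
        h = max(1, h // 2)
    return total
```

By this estimate, a 2048x2048 DXT1 diffuse map plus a 2048x2048 3DC normal map take 2,796,216 + 5,592,432 = 8,388,648 bytes (about 8 MiB) with full mip chains.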

How big are your terrain meshes? Given the stats you provided, your object meshes average about 1600 tris (400k tris / 250 meshes). Are the terrain meshes larger? Perhaps they keep the GPU busy enough to hide the cost of changing textures. Are you accessing all of the textures in the fragment shader, or does the vertex shader access some?

I have tried rendering objects without terrain, and also a simple shader with one texture channel; all such tests gave the same result.