We encountered a strange performance slowdown in our object renderer on NVIDIA hardware. I hadn't implemented mesh sorting by material id yet; I knew this could be an issue, but I expected it to cost 1 ms at most. The actual slowdown was far bigger. Later I found out that it was indeed caused by glBindMultiTextureEXT. So I ran a couple of tests: first using one big texture (2048x2048) for all meshes, then adding sorting by material id.
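For readers unfamiliar with the idea, here is a minimal sketch of sorting by material id, not the actual renderer code. The `Mesh` struct, its fields, and both function names are hypothetical, and the sketch assumes one material id corresponds to one set of textures. It only counts how often the textures would have to be rebound for a given draw order; a real renderer would call glBindMultiTextureEXT and issue the draw call at the commented places.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mesh record; the field names are illustrative only.
struct Mesh {
    std::uint32_t materialId; // assumption: one material id == one texture set
    std::uint32_t meshId;
};

// Group meshes that share a material. A stable sort keeps the original
// submission order among meshes with the same material id.
void sortByMaterial(std::vector<Mesh>& meshes) {
    auto byMaterial = [](const Mesh& a, const Mesh& b) {
        return a.materialId < b.materialId;
    };
    std::stable_sort(meshes.begin(), meshes.end(), byMaterial);
    assert(std::is_sorted(meshes.begin(), meshes.end(), byMaterial));
}

// Count how many times the texture set would be rebound when the meshes
// are drawn in the given order, rebinding only when the material changes.
std::size_t countTextureBinds(const std::vector<Mesh>& meshes) {
    std::size_t binds = 0;
    bool nothingBound = true;
    std::uint32_t boundMaterial = 0;
    for (const Mesh& m : meshes) {
        if (nothingBound || m.materialId != boundMaterial) {
            // Real renderer: bind the material's textures here.
            boundMaterial = m.materialId;
            nothingBound = false;
            ++binds;
        }
        // Real renderer: issue the draw call for mesh m here.
    }
    return binds;
}
```

With a draw order whose material ids are 2, 1, 2, 0, 1, 2, this counts six rebinds before sorting and three after. Sorting does not remove the per-bind cost; it only reduces how often that cost is paid.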

Objects in the test scene had about 400k tris, ~250 meshes and 42 textures. All textures of a given type had the same format: diffuse DXT1, normal 3DC/ATI2, etc.
Times for my test scene (the final object pass), in ms:

                    NV460                           AMD 6850
                    (drv 306.97, 306.94, 310.33)    (drv 12.10)
without sorting     15.0                            5.0
one texture         3.3                             4.0
sort by material    6.8                             4.3

Those numbers are pretty bad for NVIDIA. The weird thing is that we have no such issue in the terrain renderer, which uses unique textures for every terrain tile, about 200 meshes/textures per frame, and the performance there is great. There is no difference in the render states: the object renderer runs immediately after the terrain renderer without any changes to the render states. I have tried a lot of tests (a simple shader, non-DSA functions for texture binding, rendering without terrain, etc.) without luck. Every time I started changing the textures per mesh, I hit the wall. (I used two textures for this test and the time was almost 15 ms.)

There is a workaround described at http://outerra.blogspot.sk/2012/11/o...rformance.html

Does anyone have an idea what could cause such a large slowdown?