Appalling performance on ATI hardware

Hey everyone,

I recently turned in my university project in programming, and since it’s unfinished but working on it was heaps and loads of fun, I’ll be finishing it in my free time.

Thing is, while the game runs great on NVIDIA GPUs, it struggles to get a two-digit framerate value on ATI hardware - values in the range 1-5 are quite common; it’s a tiny bit better with the latest Radeon generation, but certainly the complexity of the graphics does not justify the poor performance.

Unfortunately, I don’t have access to an ATI-equipped system for long enough, or with the right tools, to properly debug the game. I was hoping someone could point me in the direction of a solution.

I suspect the problem lies in the performance of glDrawElements - the game uses the GLSL pseudo-instancing technique (as described in the NVIDIA paper here) and may be making tens of thousands of glDrawElements calls per frame. While NVIDIA explicitly states that this function is very efficient on their hardware, there might be something in ATI’s implementation (VBO locking or something, I don’t know) that causes a dramatic slowdown. Any hints?

Of course I’m rendering geometry out of a STATIC_DRAW VBO, with a very simple programmable pipeline. The problem also existed, however, when I was using the fixed pipeline and the matrix stack, and the switch to instancing greatly improved performance on NVIDIA hardware.

The game can be downloaded here:
http://www.sourceforge.net/projects/ac130

Thanks in advance for any help.

Hi, I’ve used the same technique on ATI hardware (in my case a good old Radeon 9600) and it worked just fine. So the technique itself shouldn’t be the issue with ATI, or at least it wasn’t back then.

Your game runs fine on my system (dual-core AMD, ATI HD 4670).
The framerates are off the charts too.


210 FPS, 12921 tris/12526 verts, 22/28 terrain patches culled (per frame)
217 FPS, 12921 tris/12526 verts, 22/28 terrain patches culled (per frame)
216 FPS, 12921 tris/12526 verts, 22/28 terrain patches culled (per frame)
216 FPS, 12545 tris/12216 verts, 22/28 terrain patches culled (per frame)
225 FPS, 10934 tris/11314 verts, 22/26 terrain patches culled (per frame)
201 FPS, 13319 tris/13243 verts, 22/28 terrain patches culled (per frame)
218 FPS, 12383 tris/12433 verts, 21/26 terrain patches culled (per frame)
200 FPS, 14599 tris/14377 verts, 23/29 terrain patches culled (per frame)
135 FPS, 25542 tris/24951 verts, 24/36 terrain patches culled (per frame)
84 FPS, 43496 tris/42864 verts, 26/46 terrain patches culled (per frame)
66 FPS, 49615 tris/51512 verts, 16/35 terrain patches culled (per frame)
72 FPS, 45178 tris/47117 verts, 11/28 terrain patches culled (per frame)
74 FPS, 43126 tris/45699 verts, 13/28 terrain patches culled (per frame)
125 FPS, 29648 tris/28471 verts, 19/33 terrain patches culled (per frame)
170 FPS, 23743 tris/21281 verts, 23/37 terrain patches culled (per frame)
243 FPS, 11404 tris/10254 verts, 16/22 terrain patches culled (per frame)
274 FPS, 10991 tris/9394 verts, 10/18 terrain patches culled (per frame)
259 FPS, 11707 tris/10418 verts, 19/27 terrain patches culled (per frame)
214 FPS, 15998 tris/14776 verts, 31/39 terrain patches culled (per frame)
208 FPS, 16722 tris/15457 verts, 31/40 terrain patches culled (per frame)
232 FPS, 14006 tris/12675 verts, 32/40 terrain patches culled (per frame)
231 FPS, 14126 tris/12938 verts, 31/39 terrain patches culled (per frame)
178 FPS, 18908 tris/18433 verts, 28/37 terrain patches culled (per frame)
156 FPS, 20733 tris/20855 verts, 18/27 terrain patches culled (per frame)
149 FPS, 21511 tris/21716 verts, 19/28 terrain patches culled (per frame)
139 FPS, 24553 tris/24813 verts, 19/29 terrain patches culled (per frame)
144 FPS, 23394 tris/23655 verts, 19/28 terrain patches culled (per frame)
155 FPS, 22202 tris/21980 verts, 18/28 terrain patches culled (per frame)
172 FPS, 17733 tris/18054 verts, 18/25 terrain patches culled (per frame)
113 FPS, 26676 tris/28568 verts, 13/22 terrain patches culled (per frame)
130 FPS, 22255 tris/23885 verts, 12/18 terrain patches culled (per frame)
143 FPS, 19569 tris/21158 verts, 7/13 terrain patches culled (per frame)
129 FPS, 23849 tris/25371 verts, 10/18 terrain patches culled (per frame)
133 FPS, 23186 tris/24506 verts, 14/22 terrain patches culled (per frame)
157 FPS, 20800 tris/20497 verts, 20/29 terrain patches culled (per frame)
177 FPS, 18194 tris/17497 verts, 20/29 terrain patches culled (per frame)


As I said, it gets a bit better on more recent ATI hardware, but it’s unbearably slow on older ATI GPUs, and I have no clue what might be the cause of this.

Thanks for testing it out, though.

Are you using NPOT textures maybe?

Nope, all the content is procedurally generated, so textures are strictly POT. I actually took extra care to make this game as friendly to a wide range of hardware as possible, sticking to GL 1.4 with only the most popular extensions (multitexturing, VBO, FBO, shaders). I also thought it could be a funny FBO colour attachment internal format (I did have an FBO completeness check failing on some hardware at early stages of development due to a GL_LUMINANCE attachment), but it’s GL_RGBA now, so it should be fine.

Any suggestions on how to debug this? I tried sending a profiling build to people with these issues, but I can’t see any obvious bottlenecks in the gprof outputs I’m receiving from them, apart from glDrawElements calls taking about 7 times longer on their machines than on mine. That figure is relative to the game’s constant-time code, i.e. the average glDrawElements time divided by the average time of one of the game’s constant-time functions…
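For anyone wanting to reproduce that comparison, here is a minimal sketch in C of the normalisation described above: the average draw-call time divided by the average time of a constant-cost reference function measured on the same machine. The function names and the sample arrays are hypothetical and are not taken from the game's source; the per-call timings would have to come from whatever profiler output is available.

```c
#include <stddef.h>

/* Mean of n timing samples (any consistent unit, e.g. ms per call).
 * Returns 0.0 for an empty sample set. */
static double mean_time(const double *samples, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; ++i)
        sum += samples[i];
    return sum / (double)n;
}

/* Average draw-call time divided by the average time of a constant-cost
 * reference function from the same run. Dividing by the reference cancels
 * out differences in raw CPU speed between machines.
 * Returns -1.0 if the reference mean is not positive. */
static double normalized_draw_cost(const double *draw, size_t n_draw,
                                   const double *ref, size_t n_ref)
{
    double ref_mean = mean_time(ref, n_ref);

    if (ref_mean <= 0.0)
        return -1.0;
    return mean_time(draw, n_draw) / ref_mean;
}
```

Computing this ratio once per machine and then dividing one machine's ratio by another's gives a "times slower" figure like the one quoted above, assuming the reference function really does cost the same amount of work everywhere.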

it struggles to get a two-digit framerate value on ATI hardware

Which ATI hardware?

I suspect the problem lies in the performance of glDrawElements - the game uses the GLSL pseudo-instancing technique (as described in the NVIDIA paper here)

From the paper:

As a corollary, if there is not “efficient in-lining of persistent vertex attributes” (and the only reason you would expect this to be true on NVIDIA hardware is because they say so), then one would not expect this technique to be particularly fast.

I don’t think it’s the number of glDrawElements calls. I think it’s the fact that you’re combining glDrawElements calls with glVertexAttrib (or their fixed-function equivalent). That is, you’re relying on a hardware/driver-based optimization technique on hardware/drivers that do not conform to the expected requirements. You’ll need to add a rendering path that uses uniforms or whatever to pass the data.
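A rough sketch of what such a uniform-based path could look like, in C. Everything here is hypothetical and not from the game's code: the `Instance` and `DrawBackend` types, and the idea of routing the two GL calls through function pointers so the loop can be dry-run without a GL context. In a real backend, `set_model_uniform` would presumably wrap `glUniformMatrix4fv(location, 1, GL_FALSE, m16)` (or the ARB shader-objects equivalent on GL 1.4) and `draw_mesh` would wrap the existing `glDrawElements` call.

```c
#include <stddef.h>

/* Hypothetical per-instance data: one column-major 4x4 model matrix. */
typedef struct {
    float model[16];
} Instance;

/* Hypothetical indirection over the two calls made per instance.
 * A real backend would call glUniformMatrix4fv and glDrawElements here. */
typedef struct {
    void (*set_model_uniform)(void *ctx, const float *m16);
    void (*draw_mesh)(void *ctx);
    void *ctx;
} DrawBackend;

/* Uniform-based path: upload the instance matrix as a uniform, then draw,
 * instead of passing it through persistent vertex attributes.
 * Returns the number of draw calls issued (0 on invalid arguments). */
static size_t draw_instances_uniform_path(const DrawBackend *backend,
                                          const Instance *instances,
                                          size_t count)
{
    size_t i, drawn = 0;

    if (!backend || !backend->set_model_uniform || !backend->draw_mesh ||
        !instances)
        return 0;

    for (i = 0; i < count; ++i) {
        backend->set_model_uniform(backend->ctx, instances[i].model);
        backend->draw_mesh(backend->ctx);
        ++drawn;
    }
    return drawn;
}

/* Counting backend for dry runs: records how many calls a frame would make. */
typedef struct {
    size_t uniform_uploads;
    size_t draws;
} CallCounter;

static void count_uniform(void *ctx, const float *m16)
{
    (void)m16; /* matrix contents are irrelevant when only counting */
    ((CallCounter *)ctx)->uniform_uploads++;
}

static void count_draw(void *ctx)
{
    ((CallCounter *)ctx)->draws++;
}
```

The game could then pick between this path and the existing glVertexAttrib path at startup, for example from a config option. Whether the uniform path is actually faster on the affected ATI drivers would still need to be measured on that hardware.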
