Hi! My name is Morgan Johansson and I am working on (among other things) an OpenGL-based graphics engine for a game with a very high triangle count.
The last few days I have been optimizing the code and I found something strange - I seem to have a black hole in my code draining performance.
The program is multithreaded (through SDL) and rendering has its own thread. Profiling has shown me that 99.91% of the time in the demo program is spent waiting for rendering to finish. Nearly all of the CPU-intensive tasks (of my creation) are performed in the remaining 0.09%.
At first I simply thought the graphics card was the limiting factor, but that is not the case. Moving from an Intel 865 integrated chip to a GeForce 3 or an ATI FireGL X1 gives no more than twice the framerate (from 20 to 40 fps).
The scene is a rendering of 300 objects of 550 triangles each (though only 6 geometries). This is currently displayed using vertex arrays (glDrawElements with GL_TRIANGLES). Each vertex has position, normal and texture coordinates in floats. There is a single texture on the triangles.
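To put rough numbers on the scene described above, here is a back-of-the-envelope sketch. The helper names are illustrative only and are not part of the engine, and the two-float texture coordinate is an assumption, since the post only says the coordinates are floats.

```python
# Back-of-the-envelope figures for the scene described above.
# All names are illustrative; none of them come from the engine itself.

FLOAT_BYTES = 4


def triangles_per_frame(instances, triangles_per_instance):
    """Total triangles submitted in one frame."""
    return instances * triangles_per_instance


def indices_per_frame(triangles):
    """GL_TRIANGLES without strips: three indices per triangle."""
    return 3 * triangles


def vertex_size_bytes(texcoord_floats=2):
    """Position (3 floats) + normal (3 floats) + texture coordinates.

    Two texture-coordinate floats is an assumption, not stated in the post.
    """
    return (3 + 3 + texcoord_floats) * FLOAT_BYTES


def triangles_per_second(triangles, fps):
    """Triangle throughput at a given framerate."""
    return triangles * fps


tris = triangles_per_frame(300, 550)
print("triangles per frame:", tris)
print("indices per frame:", indices_per_frame(tris))
print("bytes per vertex:", vertex_size_bytes())
for fps in (20, 40):
    print(fps, "fps ->", triangles_per_second(tris, fps), "triangles/s")
```

With 300 instances of 550 triangles this gives 165,000 triangles per frame, which is 3.3M triangles/second at 20 fps.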
So far I have tried the following with no or very little effect on the framerate:
- Turned off texturing and disabled all calls that upload textures to graphics memory or activate them.
- Decreased the number of triangles in each object to 260.
- Used display lists for all the drawing (6 lists in total).
- Switched between matrix loading and calls to glTranslate etc.
The only thing that seems to have any effect on the framerate is decreasing the number of instances drawn of the six meshes.
Some statistics I have:
- I only get a vertex processing rate of 5-10M vertices/second on a GeForce 3 (Athlon 1.33 GHz). As I don't use strips, that is about 2-3M triangles/second.
- I change material settings 9 times each frame.
- I do one push, load, pop on the modelview matrix for each instance.
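The statistics above can be turned into per-frame and per-instance figures with a little arithmetic. This is only a sketch under stated assumptions: the function names are made up for illustration, and the per-instance time simply divides the whole frame time evenly across instances, which ignores any fixed per-frame cost.

```python
# Rough arithmetic on the statistics listed above.
# Function names are hypothetical and used only for illustration.


def triangle_rate(vertex_rate):
    """Without strips, every triangle costs three vertices."""
    return vertex_rate / 3


def per_frame_state_calls(instances, material_changes):
    """One push, one load and one pop on the modelview matrix per instance,
    plus the material changes made each frame."""
    matrix_calls = 3 * instances
    return matrix_calls + material_changes


def microseconds_per_instance(fps, instances):
    """Average frame time spent per instance, assuming the frame time is
    spread evenly over all instances (a simplification)."""
    frame_time_us = 1_000_000 / fps
    return frame_time_us / instances


for rate in (5_000_000, 10_000_000):
    print(rate, "vertices/s ->", round(triangle_rate(rate)), "triangles/s")

print("state calls per frame:", per_frame_state_calls(300, 9))

for fps in (20, 40):
    print(fps, "fps ->", round(microseconds_per_instance(fps, 300), 1),
          "microseconds per instance")
```

Comparing the per-instance time at different instance counts is one way to check whether the cost grows with the number of draw calls rather than with the number of triangles.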
I realize that there are plenty of things I can do to boost performance. But what I would like to know is where I lose performance. It seems to me it is probably either some CPU-intensive task hidden in the drivers or some bus that isn't fast enough.
Any help with this problem is appreciated! Sorry about the lengthy post, I wanted to describe the problem in detail.
Cheers,
Morgan Johansson