Regarding performance (CPU and GPU)

Mukund · May 26, 2011, 2:14am

Hello Everyone,

Well, this might sound trivial, but it's just something I'm curious to know, and I'm not sure how to search for it either. Here is my question:

Are there any resources I can read to learn exactly how the CPU sends data to the GPU for processing?

In other words, how can I tell whether a program's poor performance is due to:

1. The CPU sending data to the GPU too slowly
2. The GPU itself being slow
3. The program itself being badly written (doing too many unnecessary things)

So, how can I be sure what the problem is? Are there any methods I can use to check this?

A while back, in one of my earlier posts, ZBuffer asked me to check whether the FPS increased when the window was made smaller.
I would like to know what that tells me about performance, the issues behind it, etc.

I hope I'm being clear. Please let me know if I need to elaborate.

Thanks in advance.

danbartlett · May 26, 2011, 4:28am

You could try gDEBugger to locate where the bottleneck is, or to see whether you're making redundant state changes, issuing too many small batches, etc. You can turn off various stages of the rendering pipeline to see whether that has any effect. You can also check for unnecessary state changes, but that check isn't perfect; for example, it doesn't detect state changes that have no effect on rendering:


glColor4f(1, 0, 0, 1); // never used: the color changes again before any vertex is issued, yet this is not reported as unused
glColor4f(0, 1, 0, 1);
glVertex3f(1, 0, 0);

but it will detect a state change that doesn't change a value from its current value:

glColor4f(1, 0, 0, 1);
glColor4f(1, 0, 0, 1); //redundant state change detected
glVertex3f(1, 0, 0);
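One common way to avoid the redundant change in the second snippet is to shadow the current state on the CPU and skip calls that would not change it. The sketch below is a minimal illustration under that assumption; ColorCache and cached_color4f are hypothetical names, and the real glColor4f() call is left as a comment so the logic can be compiled and tested without a GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side copy of the current GL color, used to skip
 * redundant glColor4f() calls of the kind gDEBugger reports. */
typedef struct {
    float rgba[4];
    int valid;          /* 0 until the first color has been set */
    unsigned submitted; /* number of calls actually forwarded to GL */
} ColorCache;

void color_cache_init(ColorCache *c)
{
    assert(c != NULL);
    c->valid = 0;
    c->submitted = 0;
}

/* Returns 1 if the color was forwarded, 0 if it was redundant. */
int cached_color4f(ColorCache *c, float r, float g, float b, float a)
{
    assert(c != NULL);
    if (c->valid && c->rgba[0] == r && c->rgba[1] == g &&
        c->rgba[2] == b && c->rgba[3] == a)
        return 0; /* same value as the current state: skip the call */
    c->rgba[0] = r;
    c->rgba[1] = g;
    c->rgba[2] = b;
    c->rgba[3] = a;
    c->valid = 1;
    c->submitted++;
    /* A real renderer would call glColor4f(r, g, b, a) here. */
    return 1;
}
```

Like gDEBugger, this only catches the second pattern (setting the same value twice), not the first (a value overwritten before any vertex uses it). It also assumes nothing else changes the GL color behind the cache's back, for example via glPopAttrib().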

gDEBugger is a good starting place though.

I think it would be great if there were a tool that recorded all your OpenGL calls and gave you alternative versions of the frame, with reordered/restructured OpenGL calls, that could potentially give a better frame rate.

BionicBytes · May 26, 2011, 4:56am

At a simple level, this can be fairly easy to determine (depending on how easily your engine lets you set up the tests).
The first thing to know is that you should be careful how you record your timings. FPS is not a good measure, because it is the reciprocal of frame time, so the same change in FPS can represent very different amounts of work. VSync must also be turned off for all testing, since it caps the frame rate at the display's refresh rate and hides real differences (perhaps turn it off automatically when DEBUG is defined as a compiler directive).
A better way to measure performance is the time a frame (and/or individual sections of it) takes to render, measured with a high-resolution timer.
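To see why frame time is the better unit, here is a small C sketch (the function names are hypothetical) that converts FPS to milliseconds per frame. A drop from 60 to 50 FPS costs about 3.3 ms per frame, while a drop from 30 to 20 FPS costs about 16.7 ms, even though both look like "a 10 FPS drop".

```c
#include <assert.h>

/* Convert frames per second to milliseconds per frame. */
double fps_to_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra milliseconds per frame implied by a frame rate change from
 * `before` FPS to `after` FPS (positive means the frame got slower). */
double fps_drop_cost_ms(double before, double after)
{
    return fps_to_ms(after) - fps_to_ms(before);
}
```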

On Windows platforms there is an API for this:
QueryPerformanceFrequency
QueryPerformanceCounter
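Here is a minimal sketch of such a timer in C. On Windows it uses QueryPerformanceFrequency/QueryPerformanceCounter; elsewhere it falls back to POSIX clock_gettime(CLOCK_MONOTONIC), a swapped-in equivalent that the thread doesn't mention. The names now_ms, time_section_ms and busy_work are hypothetical. One caveat: OpenGL calls return before the GPU has finished the work, so a CPU timer wrapped around draw calls only captures GPU time if the timed section ends with glFinish() (or if you use a GPU timer query instead).

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime() in <time.h> */
#endif
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Monotonic timestamp in milliseconds. */
double now_ms(void)
{
#ifdef _WIN32
    static LARGE_INTEGER freq; /* counter ticks per second, queried once */
    LARGE_INTEGER count;
    if (freq.QuadPart == 0)
        QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return (double)count.QuadPart * 1000.0 / (double)freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
#endif
}

/* Time one call of `section`. For OpenGL work, the section should end
 * with glFinish(), otherwise only CPU submission time is measured. */
double time_section_ms(void (*section)(void))
{
    assert(section != 0);
    double start = now_ms();
    section();
    return now_ms() - start;
}

/* Stand-in workload for testing; replace with a real frame or section. */
static volatile double sink;
void busy_work(void)
{
    for (int i = 0; i < 100000; i++)
        sink += (double)i * 0.5;
}
```

In practice you would time many frames and average them rather than trusting a single measurement.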

Now that you have a consistent way to measure performance, you can answer your first two questions (number 3 is too general to even attempt).
Suppose the scene takes, for example, 30 ms to render. If you then replace a few meshes with much higher-resolution meshes (many more vertices), keeping everything else exactly the same, the difference in render time tells you whether sending vertex attributes is affecting your overall rendering speed. If the render time stays close to 30 ms (in this example), then it's a fair bet that sending geometry is not the bottleneck; neither is processing it on the GPU.
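Deciding whether a render time "stays close to 30 ms" is easier if you average many frames per run and compare against an explicit tolerance. The sketch below assumes you have already collected per-frame times (for example with a high-resolution timer); mean_ms and roughly_unchanged are hypothetical helpers, and the 5% tolerance used in the test is an arbitrary choice, not a recommended value.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n frame-time samples in ms; averaging smooths out
 * per-frame noise before two runs are compared. */
double mean_ms(const double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Returns 1 if test_ms is within `tolerance` (a fraction, e.g. 0.05
 * for 5%) of baseline_ms, i.e. the change made no meaningful difference. */
int roughly_unchanged(double baseline_ms, double test_ms, double tolerance)
{
    double diff = test_ms - baseline_ms;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tolerance * baseline_ms;
}
```

With a 30 ms baseline, a heavy-mesh run averaging 31 ms would count as unchanged at 5% tolerance (geometry is probably not the bottleneck), while 45 ms clearly would not.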

For question number two (the GPU being too slow):
There are two simple cases: 1) vertex processing (transform) and 2) fragment processing (fill).
If you render the scene a second time using identical geometry but only a basic fragment shader (outputting plain white, for example), then a drop in scene render time indicates that the GPU is spending a lot of time shading your triangles. If swapping in the simpler shader doesn't reduce the frame time, then shading is not your bottleneck (at this point).
This can be backed up by rendering to a smaller viewport: fewer pixels are being written, so if you are fill-bound, frame times should decrease.
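This is the same idea as ZBuffer's shrink-the-window test. Halving both dimensions (for example with glViewport(0, 0, width / 2, height / 2)) cuts the pixel count to a quarter, so a fill-bound frame should get noticeably faster. The sketch below models that under an idealised assumption: fill time scales linearly with pixel count, with the same overdraw and shaders. The function names are hypothetical.

```c
#include <assert.h>

/* Pixels covered by a w x h viewport. */
unsigned long viewport_pixels(unsigned long w, unsigned long h)
{
    return w * h;
}

/* Predicted fill time after resizing, assuming fill time scales
 * linearly with pixel count (same overdraw, same shaders). */
double predicted_fill_ms(double full_fill_ms,
                         unsigned long full_w, unsigned long full_h,
                         unsigned long w, unsigned long h)
{
    unsigned long full = viewport_pixels(full_w, full_h);
    assert(full > 0);
    return full_fill_ms * (double)viewport_pixels(w, h) / (double)full;
}
```

Only the fill portion shrinks; vertex processing and CPU work stay the same. So if the total frame time barely moves when the viewport shrinks, you are probably not fill-bound.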

Please understand that a program doesn't have a single bottleneck; it has several. At any one time you will be limited by one or more of them, and tweaking those until they are no longer the main bottleneck just means the bottleneck has shifted somewhere else.

As an example, the CPU is mostly responsible for scene management, which in turn feeds the rendering pipeline. If the CPU is too slow at deciding which objects to draw, the engine could well be limited by this function. One way to test that is to bypass the scene manager, manually arrange for the exact same batches of objects to be placed directly in front of the camera, and see whether the render time is affected. Assuming the scene looks the same in both cases, the difference in render time is the cost of your scene management. Of course, the scene manager is supposed to be doing a useful job by eliminating objects outside the view frustum, and that should pay for itself with better performance than having no scene management at all. It's worth a thought…
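A typical job a scene manager does is frustum culling. The sketch below shows one common approach (bounding spheres tested against the frustum planes), which may not be what your engine does. Plane, Sphere, sphere_in_frustum and cull_objects are hypothetical, and the planes are assumed to be normalized with normals pointing into the frustum.

```c
#include <assert.h>
#include <stddef.h>

/* Plane a*x + b*y + c*z + d = 0, normalized, normal pointing inward. */
typedef struct { float a, b, c, d; } Plane;
/* Bounding sphere of one object. */
typedef struct { float x, y, z, radius; } Sphere;

/* Returns 1 if the sphere is at least partly inside every plane. */
int sphere_in_frustum(const Plane *planes, size_t n, Sphere s)
{
    assert(planes != NULL);
    for (size_t i = 0; i < n; i++) {
        const Plane *p = &planes[i];
        float dist = p->a * s.x + p->b * s.y + p->c * s.z + p->d;
        if (dist < -s.radius)
            return 0; /* entirely outside this plane: cull it */
    }
    return 1;
}

/* Writes indices of visible objects into `out`; returns how many. */
size_t cull_objects(const Plane *planes, size_t n_planes,
                    const Sphere *objects, size_t n_objects, size_t *out)
{
    size_t visible = 0;
    for (size_t i = 0; i < n_objects; i++)
        if (sphere_in_frustum(planes, n_planes, objects[i]))
            out[visible++] = i;
    return visible;
}
```

Timing a call like cull_objects() on its own with a CPU timer gives you the CPU cost of that step directly. The bypass test described above then tells you whether the objects it removes save more GPU time than the culling costs.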

Mukund · May 26, 2011, 8:57am

Thanks a lot, BionicBytes and Dan Bartlett. That was really helpful.