Avoiding "round-trip" API calls and performance



Sergey K.
03-28-2007, 03:37 AM
It is said that calls such as glGetFloatv, glGetIntegerv, glIsEnabled, glGetError and glGetString require a slow, round-trip transaction between the application and the renderer. Currently I'm doing frustum culling with spheres, and before rendering each object I extract the modelview matrix from OpenGL:


glGetFloatv( GL_MODELVIEW_MATRIX, &CurrentModelView );

(only modelview, since projection is changed outside of the main loop)

Is this OK in general, or is it better to track the modelview matrix in the application (by multiplying transformation matrices)?
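For context, here is a minimal sketch of what the culling test could look like once the modelview matrix is tracked on the application side. The names `Mat4`, `Vec3`, `Plane`, `transformPoint` and `sphereVisible` are hypothetical, not from the original post. It assumes eye-space frustum planes with unit-length normals pointing inward, and a modelview matrix without scale (otherwise the radius would need scaling as well).

```cpp
#include <array>

using Mat4 = std::array<float, 16>;  // column-major, same layout as GL_MODELVIEW_MATRIX

struct Vec3 { float x, y, z; };

// Plane a*x + b*y + c*z + d = 0; points with a positive result are inside.
// (a, b, c) is assumed to be unit length.
struct Plane { float a, b, c, d; };

// Transforms a point (w = 1) by a column-major 4x4 affine matrix.
inline Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return { m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12],
             m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13],
             m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14] };
}

// Returns false only if the sphere lies completely outside at least one plane.
inline bool sphereVisible(const std::array<Plane, 6>& frustum,
                          const Vec3& center, float radius) {
    for (const Plane& p : frustum) {
        const float dist = p.a * center.x + p.b * center.y + p.c * center.z + p.d;
        if (dist < -radius) return false;
    }
    return true;
}
```

The object-space sphere center would be passed through `transformPoint` with the tracked modelview matrix, so no glGetFloatv call is needed per object.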

Zengar
03-28-2007, 04:21 AM
I would do all the matrix stuff myself and load it into GL via glLoadMatrix. The glGetFloatv call will almost certainly stall the CPU.
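A rough sketch of what "doing the matrix stuff myself" could look like: an application-side stack that mirrors glPushMatrix, glPopMatrix and glMultMatrix. `Mat4`, `MatrixStack` and the helper functions are hypothetical names, and the column-major layout is chosen so the result can be handed to glLoadMatrixf unchanged.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major, the layout glLoadMatrixf expects

inline Mat4 makeIdentity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

inline Mat4 makeTranslation(float x, float y, float z) {
    Mat4 m = makeIdentity();
    m[12] = x; m[13] = y; m[14] = z;
    return m;
}

// Returns a * b; b is applied to a vertex first, as with glMultMatrixf.
inline Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a[k * 4 + row] * b[col * 4 + k];
            r[col * 4 + row] = sum;
        }
    }
    return r;
}

class MatrixStack {
public:
    MatrixStack() : stack_(1, makeIdentity()) {}

    // Duplicates the top entry, like glPushMatrix.
    void push() {
        const Mat4 copy = stack_.back();
        stack_.push_back(copy);
    }

    // Discards the top entry, like glPopMatrix.
    void pop() {
        assert(stack_.size() > 1 && "pop without matching push");
        stack_.pop_back();
    }

    // Post-multiplies the top entry, like glMultMatrixf.
    void mult(const Mat4& m) { stack_.back() = multiply(stack_.back(), m); }

    // Current matrix, readable at any time without asking GL.
    const Mat4& top() const { return stack_.back(); }

private:
    std::vector<Mat4> stack_;
};
```

Before drawing an object, the current matrix would be uploaded with something like `glLoadMatrixf(stack.top().data());`, and the same `top()` value can feed the culling code.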

Sergey K.
03-28-2007, 05:14 AM
I have about 100 objects in the scene, and the scene graph transform hierarchy for each object could be up to 3 matrices deep. This will result in 300 matrix multiplications. Is that still OK? I'm really worried about it.

knackered
03-28-2007, 05:58 AM
Represent your rotation as a quaternion, your translation as a vec3f, and your scale as a vec3f.
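One way to read this suggestion, as a sketch: keep the three components per node and build the 4x4 matrix only when it is needed. `Vec3`, `Quat` and `composeTRS` are hypothetical names; the quaternion is assumed to be unit length, and the order scale, then rotate, then translate is an assumption too.

```cpp
#include <array>

using Mat4 = std::array<float, 16>;  // column-major, as glLoadMatrixf expects

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };   // assumed to be unit length

// Builds M = T * R * S: scale first, then rotate, then translate.
inline Mat4 composeTRS(const Vec3& t, const Quat& q, const Vec3& s) {
    const float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    const float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    const float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;

    Mat4 m{};
    // Column 0: rotated X axis, scaled by s.x.
    m[0] = (1.0f - 2.0f * (yy + zz)) * s.x;
    m[1] = (2.0f * (xy + wz)) * s.x;
    m[2] = (2.0f * (xz - wy)) * s.x;
    // Column 1: rotated Y axis, scaled by s.y.
    m[4] = (2.0f * (xy - wz)) * s.y;
    m[5] = (1.0f - 2.0f * (xx + zz)) * s.y;
    m[6] = (2.0f * (yz + wx)) * s.y;
    // Column 2: rotated Z axis, scaled by s.z.
    m[8]  = (2.0f * (xz + wy)) * s.z;
    m[9]  = (2.0f * (yz - wx)) * s.z;
    m[10] = (1.0f - 2.0f * (xx + yy)) * s.z;
    // Column 3: translation.
    m[12] = t.x; m[13] = t.y; m[14] = t.z; m[15] = 1.0f;
    return m;
}
```

Building the matrix directly like this takes far fewer operations than three general 4x4 multiplications.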

CrazyButcher
03-28-2007, 05:58 AM
Well, those multiplications were done before, just by GL?
When you do them yourself, just add some sort of dirty flag, so that if stuff remains static you don't recalculate everything each frame. That would actually be faster than relying on the GL matrices.
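A minimal sketch of the dirty-flag idea, assuming a simple parent/child scene graph. The `Node` class and its members are hypothetical, and `recomputeCount` exists only to make the effect visible. A node's world matrix is recomputed lazily, and only after its own or an ancestor's local transform has changed.

```cpp
#include <array>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major

inline Mat4 makeIdentity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Returns a * b for column-major matrices.
inline Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a[k * 4 + row] * b[col * 4 + k];
            r[col * 4 + row] = sum;
        }
    }
    return r;
}

class Node {
public:
    int recomputeCount = 0;  // instrumentation for this example only

    void addChild(Node& child) {
        child.parent_ = this;
        children_.push_back(&child);
        child.markDirty();
    }

    void setLocal(const Mat4& local) {
        local_ = local;
        markDirty();
    }

    // Recomputes the cached world matrix only if something above changed.
    const Mat4& world() {
        if (dirty_) {
            world_ = parent_ ? multiply(parent_->world(), local_) : local_;
            dirty_ = false;
            ++recomputeCount;
        }
        return world_;
    }

private:
    // A change to this node invalidates the whole subtree below it.
    void markDirty() {
        dirty_ = true;
        for (Node* c : children_) c->markDirty();
    }

    Node* parent_ = nullptr;
    std::vector<Node*> children_;
    Mat4 local_ = makeIdentity();
    Mat4 world_ = makeIdentity();
    bool dirty_ = true;
};
```

For a mostly static scene, calling `world()` every frame then costs one branch per node instead of a matrix multiplication.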

Zengar
03-28-2007, 06:09 AM
I guess that matrix operations in the driver are done on CPU anyway, so you don't lose anything. 300 matrix multiplications is nothing, modern CPUs can do so much more :-)

sqrt[-1]
03-28-2007, 06:15 AM
I actually don't believe that these "get" functions stall anything. I mean, they are just retrieving CPU state. (occlusion queries excepted)

Obviously a local math class would be faster, but for 100 objects I would not worry about it.

Zengar
03-28-2007, 06:38 AM
Yes, but IMHO they probably must wait for pending operations to complete. If you issue several matrix operations, glGet has to return the result of the last one, but that operation is not guaranteed to have completed at that point. I can imagine that glGet calls glFinish.

Sergey K.
03-28-2007, 07:36 AM
Originally posted by CrazyButcher:
well those multiplications were done before, just by GL?
Yep. Packed the matrices into the stack with glPushMatrix(), glMultMatrix() and then at the proper points extracted a modelview matrix for the current object

Rob Barris
03-28-2007, 09:50 AM
For performance, avoid ever asking GL questions. Strive to make all of your GL calls the ones that return 'void'.

If that means foregoing conveniences such as the matrix stack, and potentially having to keep shadow copies of certain bits of state that you sent to GL so you can recall them cheaply later, then that's what you have to do.
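As an illustration of such a shadow copy, here is a small sketch that caches enable/disable state so the application never needs glIsEnabled. `CapabilityCache` is a hypothetical class; the GL call is injected as a callback so the sketch stays independent of any GL header. It assumes every state change goes through the cache.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

class CapabilityCache {
public:
    // In a real renderer the setter would call glEnable(cap) or glDisable(cap).
    using Setter = std::function<void(unsigned cap, bool on)>;

    explicit CapabilityCache(Setter setter) : setter_(std::move(setter)) {}

    // Forwards to GL only when the requested state differs from the shadow copy
    // (or when the capability has never been set through this cache).
    void set(unsigned cap, bool on) {
        auto it = state_.find(cap);
        if (it != state_.end() && it->second == on) return;
        setter_(cap, on);
        state_[cap] = on;
    }

    // True once the capability has been set through this cache at least once.
    bool known(unsigned cap) const { return state_.count(cap) != 0; }

    // Answers from the shadow copy instead of asking the driver.
    bool isEnabled(unsigned cap) const {
        assert(known(cap) && "state was never set through the cache");
        return state_.at(cap);
    }

private:
    Setter setter_;
    std::unordered_map<unsigned, bool> state_;
};
```

A side effect of this design is that redundant state changes are filtered out before they reach the driver.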

Korval
03-28-2007, 10:28 AM
I imagine that most glslang hardware no longer has an actual matrix stack in hardware (if any implementation ever did have one). So asking for the matrix probably won't hurt anything.

You could always profile it to see for yourself.
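A crude way to do that, as a sketch: time the call on the CPU side over many iterations. `averageMicroseconds` is a hypothetical helper. It only measures how long the calling thread is blocked, which is the quantity in question here, and says nothing about GPU time.

```cpp
#include <chrono>

// Runs fn() the given number of times and returns the average wall-clock
// time per call in microseconds.
template <typename Fn>
double averageMicroseconds(Fn&& fn, int iterations) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

With a current GL context, the callable would wrap the `glGetFloatv( GL_MODELVIEW_MATRIX, ... )` call, ideally issued between draw calls so that the driver has work pending when it is asked.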

knackered
03-28-2007, 12:05 PM
If a display list is somehow queued in the pipeline then getting a current matrix, or any state, is going to have to call glFinish. Virtually any state can be stored in a display list.

V-man
03-29-2007, 05:38 PM
Calling glFinish is not necessary for the driver. The driver could just keep track of the matrices in a large array on the CPU side. The problem is that glGet functions are simply slow.

If you are going to use your own matrix code, it had better be SSE-optimized.

Also, D3D has something called a pure device. The D3D layer doesn't store any state and any calls to Get functions are illegal. It is for better performance.

Sergey K.
03-29-2007, 11:45 PM
With my own transformation stack implemented I've got about 5-7% increased FPS, but GPU idle dropped to almost zero and driver sleep time went up to 20-23 ms. Seems I still have problems somewhere.

Komat
03-30-2007, 01:54 AM
Originally posted by Sergey K.:
With my own transformation stack implemented I've got about 5-7% increased FPS, but GPU idle dropped to almost zero and driver sleep time went up to 20-23 ms.

It is possible that the GPU speed is now the bottleneck.

Overmind
03-30-2007, 06:46 AM
Originally posted by Sergey K.:
GPU idle dropped to almost zero

That's usually a good sign. Now you are rendering at full speed without any nasty bottlenecks at the CPU or data submission level ;)

Chances are that the driver sleep time will decrease when you give the CPU more to do. The driver only queues a few frames; after that it'll sleep in SwapBuffers. This sleep time is freely available to do what you like on the CPU.

To further increase performance, you have to look for bottlenecks on the GPU side (geometry, fragment, ...).

knackered
03-30-2007, 03:57 PM
Originally posted by V-man:
Calling glFinish is not necessary for the driver. The driver could just keep track of the matrices in a large array on the CPU side.

Yes, it could run through the display list at compile time and store the resultant matrix state.
But I can't see it being optimized enough to do that to avoid the special case of someone calling glGetFloatv. It is more likely it would call glFinish for every get.

Korval
03-30-2007, 05:19 PM
Why is it that nobody (besides Mesa3D) has a nice, GL-compliant, open-source (MIT or BSD license) matrix stack implementation out there?

oBFusCATEd
04-04-2007, 11:43 AM
Originally posted by Sergey K.:
With my own transformation stack implemented I've got about 5-7% increased FPS, but GPU idle dropped to almost zero and driver sleep time went up to 20-23 ms. Seems I still have problems somewhere.

Sorry for the off-topic, but how did you measure these parameters?

Sergey K.
04-04-2007, 12:51 PM
Using nvAPI and NVPerfSDK from nVidia.

Look here: http://developer.nvidia.com/object/nvperfsdk_home.html

knackered
04-04-2007, 02:08 PM
Originally posted by Korval:
Why is it that nobody (besides Mesa3D) has a nice, GL-compliant, open-source (MIT or BSD license) matrix stack implementation out there?

You can use mine, Korval. I'll make a special license for you.

V-man
04-04-2007, 05:46 PM
Originally posted by knackered:

Originally posted by V-man:
Calling glFinish is not necessary for the driver. The driver could just keep track of the matrices in a large array on the CPU side.

Yes it could run through the display list at compile time and store the resultant matrix state.
But I can't see it being optimized enough to do that to avoid the special case of someone calling glGetFloatv. It is more likely it would call glFinish for every get.

Getting matrices is different from getting the framebuffer.
If you call glReadPixels, then the driver has to wait for the GPU to finish.
If you call glGetFloat, then the driver will just compute the matrix, or retrieve it from RAM, and memcpy it to your memory.

For example :

glRotate(....);
DrawMyModel();
glRotate(....);
glGetFloat(....);


Even if the GPU hasn't finished rendering the model, glGetFloat will be able to return the matrix to you immediately.

Zengar
04-05-2007, 01:02 AM
Well, not really. Imagine you have the command buffer, where different matrix operations are located together with rendering operations. If you call glGet, the driver has to return the result of the last matrix operation. Basically, this means that the buffer has to be processed to the point where no matrix operations are pending, which in turn means waiting for the remaining (rendering) operations. Of course, you can optimize this, but I don't think the driver developers would bother. BTW, this is another reason we need LP soon :-)