
View Full Version : glGetXXX cost 15 millisecond! Why?



linghuye
03-10-2008, 04:53 AM
I have recently written a test application to profile the performance of my OpenGL render engine.
The test application draws a mob model 1000 times, using a GLSL vertex shader for the skeletal animation calculations. The output is correct, but the FPS is low.

After doing some profiling, I found that a single glGetIntegerv(GL_VIEWPORT, vp) call cost me 15 milliseconds, so I searched the internet and found someone saying that:
glGetIntegerv(GL_VIEWPORT, vp) causes the CPU to wait for the GL command buffer to drain.

But after deleting this glGetIntegerv call, the FPS was still low, and this time I found that a glGetFloatv(GL_TRANSPOSE_MODELVIEW_MATRIX, v) call cost me 15 milliseconds. This is driving me crazy.

Are there some tricky things about glGetXXX that I must be careful of? Why is it so slow that it costs 15 milliseconds?
And I did not use multithreading in this application, so why does a trivial glGetXXX call cause the CPU to wait?

Test environments:
1. Pentium 4 D CPU 2.8 GHz, GeForce 7950GT, driver 169.21, WinXP SP2
2. Core 2 6300 1.8 GHz, ATI 1650XT, driver 8.3, WinXP SP2

Any tip would be appreciated.

Nicolai de Haan Brøgger
03-10-2008, 05:19 AM
I don't know if there's a trick to make the call return faster (I doubt it). A good idea is to avoid those calls whenever possible (NVIDIA and ATI both mention that in their performance papers).

However, I am sure your CPU(s) can do it faster, and if you only need the viewport and the transposed modelview matrix, the code is very simple and would run lightning fast. If you need the inverse, it gets a bit worse, but still much better than 15 ms. You can look up Gauss-Jordan elimination for computing the inverse of a matrix, but you can probably also find source code in your language.
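As a minimal sketch of that idea: if you already track the 4x4 modelview matrix yourself (in column-major order, as GL stores it), the transpose is a trivial loop — no driver round trip needed. The function name and layout here are illustrative, not from any particular engine:

```c
/* Transpose a 4x4 matrix stored as 16 floats in column-major
   (OpenGL) order. dst and src must not alias. */
static void mat4_transpose(float dst[16], const float src[16])
{
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            dst[row * 4 + col] = src[col * 4 + row];
}
```

Sixteen loads and stores — nanoseconds on any CPU from this era, versus a 15 ms pipeline stall for glGetFloatv(GL_TRANSPOSE_MODELVIEW_MATRIX, v).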

-NiCo-
03-10-2008, 05:42 AM
You can try calling glFlush() once in a while, e.g.

for (unsigned int i = 0; i < 1000; ++i)
{
    drawModel();
    glFlush();
}

Zengar
03-10-2008, 06:09 AM
The answer is simple: the Get operations are slow and should not be used. There is no reason to use them in the first place anyway (if you want performance). You can track the data you need (like matrices) yourself. The call has to wait because all pending operations have to finish before the correct state can be returned.

despoke
03-10-2008, 06:23 AM
Could it be that the glGetXXX() you are calling implicitly implies a glFinish()? Thus the driver stalls until the GPU is back in sync with the CPU. In my own engine, I record all my matrices & render states locally, so I never have to call any glGetXXX() function. Also, driver calls are bad for CPU performance :o)
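A minimal sketch of the "record state locally" idea, assuming a single GL context. The `StateCache` struct and function names are illustrative; in real code the wrapper would also issue the actual glViewport call (left as a comment here so the sketch stands alone):

```c
#include <string.h>

/* Shadow copy of GL state the app changes itself, so it never
   has to query the driver with glGetIntegerv/glGetFloatv. */
typedef struct {
    int   viewport[4];      /* x, y, width, height */
    float modelview[16];    /* column-major, as GL stores it */
} StateCache;

static StateCache g_state;

/* Wrapper: update the shadow copy, then call the real GL entry
   point (commented out so the sketch compiles without a context). */
static void cache_viewport(int x, int y, int w, int h)
{
    g_state.viewport[0] = x;
    g_state.viewport[1] = y;
    g_state.viewport[2] = w;
    g_state.viewport[3] = h;
    /* glViewport(x, y, w, h); */
}

/* Replacement for glGetIntegerv(GL_VIEWPORT, vp): a plain copy,
   no driver round trip, no pipeline stall. */
static void cached_get_viewport(int vp[4])
{
    memcpy(vp, g_state.viewport, sizeof g_state.viewport);
}
```

The only rule is discipline: all state changes must go through the wrappers, or the shadow copy goes stale.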

knackered
03-10-2008, 06:37 AM
In my experience this cost was introduced when NVIDIA put their driver in its own thread. Every GL call is pushed onto the driver thread's queue, including all glGet's. So you'll be waiting for all pending commands to be flushed before it gets to your glGet, at which point the result is passed back to the app's blocked thread. Only then can you continue to add more GL commands to the driver queue; meanwhile the GPU is completely idle.
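That description can be modeled with a toy command queue: state-changing calls merely enqueue and return immediately, but a Get must drain everything queued before it can answer. The names and structure are illustrative, not the actual driver:

```c
#include <stddef.h>

enum { QUEUE_MAX = 64 };

/* Toy model of a threaded driver's command queue. */
typedef struct {
    int pending[QUEUE_MAX];  /* queued state-change commands */
    size_t count;
    int state;               /* the "GL state" a Get would read */
    size_t drained;          /* commands a Get had to execute first */
} ToyDriver;

/* Cheap path: append to the queue and return at once. */
static void toy_set_state(ToyDriver *d, int value)
{
    if (d->count < QUEUE_MAX)
        d->pending[d->count++] = value;
}

/* Expensive path: every pending command must execute before the
   returned state is correct -- this is the glGet stall. */
static int toy_get_state(ToyDriver *d)
{
    for (size_t i = 0; i < d->count; ++i) {
        d->state = d->pending[i];
        d->drained++;
    }
    d->count = 0;
    return d->state;
}
```

The asymmetry is the whole point: sets are fire-and-forget, while a single Get serializes the producer and consumer.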

CatDog
03-10-2008, 10:48 AM
The answer is simple: the Get operations are slow and should not be used.
Nice. Where is that official list of deprecated OpenGL commands? I'm reading this twice a week here: "should not be used (anymore)".

CatDog

ZbuffeR
03-10-2008, 11:00 AM
List of deprecated OpenGL features :

indexed color mode
single buffered rendering
gluBuildMipmaps (anything glu* should not be used in production anyway)
feedback and selection modes
accum buffer
pbuffers (use FBO instead)
pre-shader systems such as texture combiners, etc.
anything not VBO for vertex operations
anything not PBO for pixel operations
glGet*

Anything left?

CatDog
03-10-2008, 11:07 AM
Nice again. Now someone please go to that "SDK (http://www.opengl.org/sdk/docs/man/)" and add a

** DO NOT USE **

tag to all related entry points.

CatDog

V-man
03-10-2008, 11:17 AM
List of deprecated OpenGL features :

indexed color mode
single buffered rendering
gluBuildMipmaps (anything glu* should not be used in production anyway)
feedback and selection modes
accum buffer
pbuffers (use FBO instead)
pre-shader systems such as texture combiners, etc.
anything not VBO for vertex operations
anything not PBO for pixel operations
glGet*

Anything left?

The accum buffer is alright; it has been hardware accelerated for a few years now.
I would add glDrawPixels, glBitmap, wglUseFontBitmaps, and glaux.

-NiCo-
03-10-2008, 11:21 AM
I would add parts of the imaging subset (histogram, convolution, color matrix?)

mfort
03-10-2008, 11:27 AM
I am sure that not all Get commands are slow.
I was not able to use glPushAttrib to store the currently bound FBO, so I am using:
glGetIntegerv(GL_FRAMEBUFFER_BINDING_EXT, &oldFrameBuffer);

I tested it quite a lot to be sure it doesn't slow down the rendering, and found that it takes 0.003 ms, measured with performance counters.
(NVIDIA, WinXP)

knackered
03-10-2008, 12:39 PM
No, I can confirm that glGetIntegerv(GL_VIEWPORT, vp) at least is prohibitively slow. I was trying to make a simple GUI library that could be dropped into any GL app completely transparently. Alas, it was not to be: the user has to pass this information to the GUI library whenever they change the viewport. What a shame ;)

thinks
03-10-2008, 05:09 PM
glGetXXX can be quite useful at startup to query the driver for constants (and, of course, extensions). But at that stage things are not really performance bound...

Nicolai de Haan Brøgger
03-10-2008, 06:24 PM
a) Don't rely on GL_PROXY_TEXTURE
b) I would remove "anything not VBO..." but add "immediate mode for vertex operations" (because display lists can be the right solution in some cases)

linghuye
03-10-2008, 06:52 PM
Thank you all so much for your replies; I learned a lot.
I will try caching the render state locally from now on.

linghuye
03-10-2008, 10:32 PM
I tried removing all the glGetFloatv/glGetIntegerv calls and caching the matrices I need, but now the call to

::SwapBuffers(m_hDeviceDC);

costs me 15 ms. This 15 ms cost is like a ghost haunting this application.

I tried calling glFlush before SwapBuffers; it made no difference.

According to my profiling data, each animated model draw batch costs only 0.004 ms (just a post to the driver thread?), so the final 15 ms is maybe the present step of the multithreaded driver.

Any idea?

lodder
03-11-2008, 01:04 AM
I tried removing all the glGetFloatv/glGetIntegerv calls and caching the matrices I need, but now the call to

::SwapBuffers(m_hDeviceDC);

costs me 15 ms. This 15 ms cost is like a ghost haunting this application.

I try calling glFlush before SwapBuffers, no difference.

It sounds like you are using Vertical Synchronisation.



List of deprecated OpenGL features :

gluBuildMipmaps (anything glu* should not be used in production anyway)

Anything left ?
So how would you create mipmaps then?

linghuye
03-11-2008, 02:19 AM
It sounds like you are using Vertical Synchronisation.
Nope, I turn off vertical synchronisation with wglSwapIntervalEXT(0), and if I draw only 10 mob models the FPS can reach more than 1000 on my machine. If I draw 1000, the FPS drops to 21 and SwapBuffers costs 15 ms.


So how would you create mipmaps then?
I create the model's mipmapped texture by feeding each mipmap level's image data with calls to
glCompressedTexImage2DARB(GL_TEXTURE_2D, i, srcImageFormat, nWidth, nHeight, 0, psize[i], pdata[i]);
and I am sure nothing calls gluBuild2DMipmaps.

I added a glFinish call just before SwapBuffers and found that glFinish now costs 15 ms. It seems the GPU is slow computing the bone-animation vertex shader asynchronously, and when I call glFinish, the CPU waits for it.

I also tested this app with my DirectX render engine, drawing the same 1000 models; it reaches 30 FPS. It seems DirectX does not have this 15 ms problem, so it can reach 1000 / (1000 / 21 - 15) ≈ 30 FPS.
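The arithmetic in that last line can be checked directly: at 21 FPS a frame takes 1000/21 ≈ 47.6 ms, and removing the 15 ms stall leaves ≈ 32.6 ms per frame, i.e. roughly 30 FPS:

```c
/* Frame-time arithmetic from the post: what FPS would remain
   if a fixed per-frame stall were removed? */
static double fps_without_stall(double fps, double stall_ms)
{
    double frame_ms = 1000.0 / fps;          /* ~47.6 ms at 21 FPS */
    return 1000.0 / (frame_ms - stall_ms);   /* ~30.7 at a 15 ms stall */
}
```

This supports the diagnosis: the DirectX number is consistent with the same GPU workload minus the 15 ms CPU-side stall.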

ZbuffeR
03-11-2008, 02:52 AM
List of deprecated OpenGL features :
gluBuildMipmaps (anything glu* should not be used in production anyway)
So how would you create mipmaps then?

Hardware accelerated :
glTexParameteri( GL_TEXTURE_2D, GL_GENERATE_MIPMAP_SGIS, GL_TRUE );

Xmas
03-11-2008, 03:05 AM
Hardware accelerated :
glTexParameteri( GL_TEXTURE_2D, GL_GENERATE_MIPMAP_SGIS, GL_TRUE );
Better yet, use glGenerateMipmapEXT.

Nicolai de Haan Brøgger
03-11-2008, 08:35 AM
Hardware accelerated :
glTexParameteri( GL_TEXTURE_2D, GL_GENERATE_MIPMAP_SGIS, GL_TRUE );
Better yet, use glGenerateMipmapEXT.

Why is it better?

-NiCo-
03-11-2008, 08:55 AM
I wouldn't say it's better, but it does give you more control over when and if mipmaps should be generated.

Nicolai de Haan Brøgger
03-11-2008, 10:58 AM
I wouldn't say it's better, but it does give you more control over when and if mipmaps should be generated.

OK, in that sense! As I read the spec, the only difference between the two, the GL_GENERATE_MIPMAP parameter and glGenerateMipmapEXT, is that you can postpone the generation of the mipmap chain to some suitable time after having specified (possibly implicitly) the base level. That is, the when. But the if is the same (either we set the parameter or use the extension, or we don't). Please let me know if I've understood this the wrong way :)

-NiCo-
03-11-2008, 11:11 AM
Ehm, that's basically it. As for the if, I was referring to the fact that with GL_GENERATE_MIPMAP_SGIS you need an operation that changes the base level of your texture (e.g. at initialization) before it actually starts allocating mipmap space, while you don't need such an operation with glGenerateMipmapEXT. But like you said, it boils down to them being the same for the if-case :)