compiling up front

Ludde
11-28-2011, 10:50 AM
I have an application that uses around 500-1000 shader programs.
I tried to compile the shaders at startup instead of when needed.
The problem is that there is a huge performance hit when doing that, around 40% lower fps.
I have a GTX 580 (Windows 7 64-bit) and tried driver versions 285.62 and 290.36; there was no difference between them.

The only difference I can see is that when I use the lazy (compile when needed) pattern, I upload the uniforms from the material shortly after the compile/link step.

Do you know if there are any usage patterns or guidelines for this type of scenario?
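
The two strategies being compared can be sketched as follows. This is a minimal illustration, not the application's actual code: `ProgramCache` and `compile_fn` are hypothetical names, and `compile_fn` stands in for whatever wraps the real compile and link calls.

```python
from typing import Callable, Dict, Hashable, Iterable


class ProgramCache:
    """Caches linked programs by key; compile_fn does the real GL work."""

    def __init__(self, compile_fn: Callable[[Hashable], int]):
        # Hypothetical callback: compiles and links one program,
        # returning its handle.
        self._compile_fn = compile_fn
        self._programs: Dict[Hashable, int] = {}

    def precompile(self, keys: Iterable[Hashable]) -> None:
        """Up-front strategy: build every program at startup."""
        for key in keys:
            self.get(key)

    def get(self, key: Hashable) -> int:
        """Lazy strategy: build a program the first time it is requested."""
        program = self._programs.get(key)
        if program is None:
            program = self._compile_fn(key)
            self._programs[key] = program
        return program

    def __len__(self) -> int:
        return len(self._programs)
```

With this shape, the only difference between the two runs is whether `precompile` is called at startup, which helps isolate compile timing as the variable under test.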

BionicBytes
11-28-2011, 11:10 AM
I can only think that the driver has decided to 'release' some of the pre-compiled shaders from GPU memory - based on the fact that they weren't used immediately.
The lazy method has the advantage that the GL driver knows they are needed and they are in the optimum memory location.
There is nothing I know of that can influence driver memory optimisation or shader compiling.

Ludde
11-28-2011, 11:42 AM
Ok thanks.
Just to clarify, the performance is constantly lower even when the camera is standing still and all needed shaders are active/in use.

The Little Body
11-28-2011, 05:35 PM
Perhaps your 500-1000 shaders could be factored into a smaller number of more generic shaders that use uniform variables to differentiate between them?

Ludde
11-29-2011, 02:45 AM
Yes, they are already permutations generated from a set of around 20 shader templates. But maybe reducing the number of shaders by moving static branching to dynamic branching would help.
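
As a rough sketch of what "permutations from templates" and "static to dynamic" mean here: each static flag becomes a `#define` injected into a template, so n flags give 2^n programs. The flag names and the template below are hypothetical, chosen only for illustration.

```python
from itertools import product
from typing import Dict, Iterator, Sequence

# Hypothetical template; USE_FOG is a static (compile-time) branch.
TEMPLATE = """#version 330 core
out vec4 color;
void main() {
#if USE_FOG
    color = vec4(0.5);
#else
    color = vec4(1.0);
#endif
}
"""


def with_defines(template: str, defines: Dict[str, int]) -> str:
    """Insert #define lines right after the template's #version line."""
    first, _, rest = template.partition("\n")
    if not first.startswith("#version"):
        raise ValueError("template must start with a #version line")
    lines = [f"#define {name} {value}" for name, value in sorted(defines.items())]
    return "\n".join([first, *lines, rest])


def permutations(flags: Sequence[str]) -> Iterator[Dict[str, int]]:
    """Yield every on/off combination of the given static flags."""
    for values in product((0, 1), repeat=len(flags)):
        yield dict(zip(flags, values))
```

Each flag that is turned into a uniform-driven `if` instead of a `#define` halves the number of programs generated from that template, at the cost of a runtime branch in the shader.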

But the problem is that there seems to be a usage pattern regarding when to compile that influences performance.

And it's not the number of shaders that is the main problem, because when I compare performance I move around to "cache up" to the same number of shaders before doing the comparison.

I forgot to mention that I use the core profile and tested this on versions 3.3 to 4.2.

tksuoran
11-29-2011, 06:33 AM
Just a random thought - could the compile time state affect the compilation? You could verify this by resetting all OpenGL state during compilation, both in lazy and up front cases, and see if that would make any difference.
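
One way to set up that experiment is to force a fixed baseline before every compile, in both the lazy and the up-front path. The sketch below is an assumption-laden illustration: it resets only a small subset of state, the choice of which state matters is a guess, and `gl` is expected to be an object exposing PyOpenGL-style names (for example the `OpenGL.GL` module with a current context). `compile_fn` is a hypothetical callback that compiles and links one program.

```python
def reset_state(gl) -> None:
    """Put a small, illustrative subset of GL state into a fixed baseline.

    This is not a complete reset; which state (if any) influences the
    driver's shader compilation is implementation-specific.
    """
    for cap in ("GL_BLEND", "GL_DEPTH_TEST", "GL_CULL_FACE", "GL_SCISSOR_TEST"):
        gl.glDisable(getattr(gl, cap))
    gl.glUseProgram(0)                          # no program bound
    gl.glBindVertexArray(0)                     # no vertex array bound
    gl.glBindFramebuffer(gl.GL_FRAMEBUFFER, 0)  # default framebuffer


def compile_in_baseline(gl, compile_fn, key):
    """Run the (hypothetical) compile_fn only after resetting state."""
    reset_state(gl)
    return compile_fn(key)
```

The caller has to re-establish its own render state afterwards. If both paths compile through `compile_in_baseline` and the fps gap remains, compile-time state is less likely to be the cause.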

aqnuep
11-29-2011, 07:06 AM
Yes, very good point by tksuoran. A lot of OpenGL state can in fact affect the resulting shader. The reason pre-compilation is slower is likely that the driver has to recompile your shader because of some state differences between the point where you compiled the shader and the point where you used it.

Ludde
11-29-2011, 07:34 AM
Thank you for your help.

Interesting... but if the affected shaders are recompiled, shouldn't the performance gradually get better, especially if you hold the camera still? I tested and waited several minutes and the performance was exactly the same. It took 30-40 seconds to pre-compile all the shaders, so 3-4 minutes should be enough, I believe.

aqnuep, you mentioned that "A lot of OpenGL states can in fact affect the resulting shader". Do you have more information on this?

aqnuep
11-29-2011, 03:49 PM
Actually it heavily depends on the GL implementation and the hardware. What first comes to mind is the many deprecated features that are sometimes handled by driver-baked shaders (AFAICT), like alpha testing, quads as input primitives, point sprites, line stipple and many others. But I think in practice there are a lot of other, more general use cases where this might be needed.

The performance might not increase over time because the recompiled version is not reused, though this is pure speculation. It would help to know which states you set or what type of rendering you perform.

Foobarbazqux
11-29-2011, 05:09 PM
Eric Lengyel says you are wrong about alpha testing. Please stop spreading misinformation.
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=261995#Post261995

Ludde
11-30-2011, 01:21 AM
AFAIK I don't use any deprecated features. I'm doing my best to stay away from them, and it actually helps in the long run. For me the code becomes much cleaner.

The thing that worries me is that performance could actually depend on WHEN you compile (and/or on the general state at that time). If it were a minor performance degradation I would accept it, but when fps drops from 45 to 28 it becomes a problem.
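
For scale, the numbers quoted above can be converted to a relative drop and to per-frame time. This is plain arithmetic on the two fps figures from this thread, nothing measured separately:

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


lazy_fps, upfront_fps = 45.0, 28.0

# Relative fps loss when compiling up front.
fps_drop = (lazy_fps - upfront_fps) / lazy_fps

# Additional time spent on every frame.
extra_ms = frame_time_ms(upfront_fps) - frame_time_ms(lazy_fps)

print(f"fps drop: {fps_drop:.1%}")              # fps drop: 37.8%
print(f"extra frame time: {extra_ms:.1f} ms")   # extra frame time: 13.5 ms
```

That matches the "around 40%" figure from the first post and amounts to roughly 13.5 ms of extra work per frame.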

aqnuep
11-30-2011, 01:48 AM
I know that topic well, believe me, but Eric Lengyel never said that this is also true for the Evergreen Radeon GPUs, for example. I'm not 100% sure whether any of the newer Radeon or Intel GPUs support fixed-function (FF) alpha testing, but considering it was removed, I really believe there are some that don't.