Update on the issue: many small shaders vs. few big ones? Passing uniforms?

In the context of the open source project Terasology I have inherited a number of shaders that look like this:

#define CHROMATIC_ABERRATION

#ifdef BLOOM
uniform float bloomFactor;

uniform sampler2D texBloom;
#endif

#if defined (CHROMATIC_ABERRATION)
uniform vec2 aberrationOffset = vec2(0.0, 0.0);
#endif

uniform sampler2D texScene;

#ifdef VIGNETTE
uniform sampler2D texVignette;
uniform vec3 inLiquidTint;
uniform bool swimming; // used by the vignette branch in main() below
#endif

#ifdef LIGHT_SHAFTS
uniform sampler2D texLightShafts;
#endif

void main() {

#if !defined (CHROMATIC_ABERRATION)
    vec4 color = texture2D(texScene, gl_TexCoord[0].xy);
#else
    float r = texture2D(texScene, gl_TexCoord[0].xy - aberrationOffset).r;
    vec2 ga = texture2D(texScene, gl_TexCoord[0].xy).ga;
    float b = texture2D(texScene, gl_TexCoord[0].xy - aberrationOffset).b;

    vec4 color = vec4(r, ga.x, b, ga.y);
#endif

#ifdef LIGHT_SHAFTS
    vec4 colorShafts = texture2D(texLightShafts, gl_TexCoord[0].xy);
    color.rgb += colorShafts.rgb;
#endif

#ifdef BLOOM
    vec4 colorBloom = texture2D(texBloom, gl_TexCoord[0].xy);
    color += colorBloom * bloomFactor;
#endif

#ifdef VIGNETTE
    float vig = texture2D(texVignette, gl_TexCoord[0].xy).x;

    if (!swimming) {
        color.rgb *= vig;
    } else {
        color.rgb *= vig * vig * vig;
        color.rgb *= inLiquidTint;
    }
#endif

    gl_FragData[0].rgba = color.rgba;
}

That is, a single shader takes care of a number of unrelated effects. My understanding is that this is a simple example of an ubershader (?).

As Terasology is meant to be designed from the ground up for modding, I’m thinking of breaking this type of shader into smaller ones, one for each group of related ifdef blocks. This would enable modders to insert their own shaders and renderings into the chain that leads to the image shown on screen. This, however, of course leads to an increased number of glUseProgram() calls and an increased number of separate, if computationally simpler, renderings.

I imagine this is a typical trade-off: code simplicity and flexibility on one side and performance on the other. I’ve read a number of old threads on this topic and I wonder what the current state of affairs is. Is frequently changing shaders still a problem, to the point that ubershaders still have a considerable performance advantage, all else being equal? Or are there other considerations to make before breaking up this kind of shader?

Bonus question: what’s the situation with passing uniforms to a shader? Does it make sense to pass them only if the value has changed or is that not worth the effort?

Kind regards, Manu

[QUOTE=emanuele3d;1285082]I have inherited a number of shaders that look like this:

#ifdef BLOOM
...
#if defined (CHROMATIC_ABERRATION)
...
#ifdef VIGNETTE
...
#ifdef LIGHT_SHAFTS
...
}

That is, a single shader takes care of a number of unrelated effects. My understanding is that this is a simple example of an ubershader (?). [/QUOTE]

That’s one form, yes, though not the easiest to read or maintain.

Are they really unrelated? The implication from how this is written is that you can enable multiple of these effects for a single compiled shader. Are you saying you’d never do this?

…I’m thinking of breaking this type of shader into smaller ones, one for each group of related ifdef blocks. … This, however, of course leads to an increased number of glUseProgram() calls and an increased number of separate, if computationally simpler, renderings.

Does it? You said these are unrelated effects. That would imply that you’d never enable more than one of these effects per compiled shader. Thus for N effects, you’d have N compiled shaders (compiled from one source shader). If you split the source up into separate shaders, you’d still have N compiled shaders (but now compiled from N separate source shaders). Is this correct? If so, there’d be no more glUseProgram()s than before.

Ubershaders have tended not to be used anymore for some years now. Small shaders will generally outperform an ubershader for the same task, for several reasons: an ubershader will generally have a lot of functions, loops and variables, which makes it hard for the compiler to optimize them all.

Bonus question: what’s the situation with passing uniforms to a shader? Does it make sense to pass them only if the value has changed or is that not worth the effort?

Consider Uniform Buffer Objects.
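The setup is roughly like this (a C sketch; the block name "PostFxParams" and the application-side variables are invented for illustration, and it assumes GL 3.1+ or GL_ARB_uniform_buffer_object):

// Assumes a linked 'program' whose fragment shader declares, std140-style:
//   layout(std140) uniform PostFxParams { vec2 aberrationOffset; float bloomFactor; };
GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferData(GL_UNIFORM_BUFFER, 4 * sizeof(float), NULL, GL_DYNAMIC_DRAW);

// Tie the program's block and the buffer to the same binding point (0 here).
GLuint blockIndex = glGetUniformBlockIndex(program, "PostFxParams");
glUniformBlockBinding(program, blockIndex, 0);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);

// Update once, in one call; every program bound to binding point 0 sees the new values.
// aberrationOffsetX/Y and bloomFactor are assumed application-side floats.
float params[4] = { aberrationOffsetX, aberrationOffsetY, bloomFactor, 0.0f };
glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(params), params);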

Thank you @Silence, that answers my question.

Thank you for the advice. Unfortunately I’m currently writing against OpenGL 2.1 and UBOs were only introduced in 3.1. Would I be able to fake UBOs by using glTexSubImage1D to update sections of data available to shaders?
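I’m imagining something along these lines (just a sketch; relies on float textures via GL_ARB_texture_float, names invented):

// Pack shared parameters into a 1-D float texture: a "poor man's UBO".
GLuint paramTex;
glGenTextures(1, &paramTex);
glBindTexture(GL_TEXTURE_1D, paramTex);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage1D(GL_TEXTURE_1D, 0, GL_RGBA32F_ARB, 64, 0, GL_RGBA, GL_FLOAT, NULL); // 64 vec4 "slots"

// Update a sub-range whenever the values change:
float params[4] = { bloomFactor, aberrationOffsetX, aberrationOffsetY, 0.0f };
glTexSubImage1D(GL_TEXTURE_1D, 0, /*first texel*/ 0, /*texel count*/ 1, GL_RGBA, GL_FLOAT, params);

// GLSL 1.20 side, sampled with texture1D:
//   uniform sampler1D texParams;
//   vec4 p = texture1D(texParams, 0.5 / 64.0);   // slot 0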

[QUOTE=Dark Photon;1285086]
Does it? You said these are unrelated effects. That would imply that you’d never enable more than one of these effects per compiled shader. Thus for N effects, you’d have N compiled shaders (compiled from one source shader). If you split the source up into separate shaders, you’d still have N compiled shaders (but now compiled from N separate source shaders). Is this correct? If so, there’d be no more glUseProgram()s than before.[/QUOTE]

I’m not following your reasoning here. In the example above each of the four effects can be independently enabled/disabled. This would lead to 2^4=16 possible combinations (I think?). As far as I understand right now our ShaderManager compiles and stores all those combinations at startup. When I enable a shader program the ShaderManager calls glUseProgram with the appropriate shader depending on what effects have been enabled. So, for the shader above, every frame only one program is enabled and only one render is made with it.
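Concretely, by “compiles and stores all those combinations” I mean something along these lines (a C sketch, not our actual Java ShaderManager; names invented):

// Assumes <string.h>, a current GL context and fragmentSource holding the text of
// the shader above (without the hard-coded #define at the top); vertex shader
// handling is omitted for brevity.
const char *flags[4] = { "BLOOM", "CHROMATIC_ABERRATION", "VIGNETTE", "LIGHT_SHAFTS" };
GLuint programs[16];

for (int mask = 0; mask < 16; ++mask) {
    char defines[256] = "";
    for (int i = 0; i < 4; ++i) {
        if (mask & (1 << i)) {
            strcat(defines, "#define ");
            strcat(defines, flags[i]);
            strcat(defines, "\n");
        }
    }
    // Prepend the #define block to the shared source and compile this variant.
    const char *sources[2] = { defines, fragmentSource };
    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 2, sources, NULL);
    glCompileShader(fs);

    programs[mask] = glCreateProgram();
    glAttachShader(programs[mask], fs);
    glLinkProgram(programs[mask]);
}

// Per frame: glUseProgram(programs[currentEffectMask]); followed by a single quad draw.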

If I broke that shader into four separate shaders I’d have fewer shaders to compile at startup but N shaders to enable and N separate renderings to execute, where N is the number of enabled effects. From what Silence says this is not an issue anymore and is probably better, as simpler shaders are easier to optimize driver-side. Do you concur?

I was also thinking that smaller shaders would allow for some reuse. For example, our rendering engine uses blurring functionality for a number of tasks, e.g. in the context of the bloom effect and in the context of the Depth of Field effect. I’m also noticing a number of straightforward compositing operations, alpha-based or not. If enabling/disabling many small shaders per frame is not too much of an issue, I’d move simple, frequently used operations (such as compositing two images together) into their own shaders and change their inputs/outputs, separating the generation of an effect from its application. Bad idea?
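To make that concrete, the kind of reusable pass I have in mind would boil down to something like this (a sketch; function and uniform names invented, drawFullScreenQuad() standing in for our display-listed quad):

// A tiny additive-composite pass; the same shader could serve bloom, light shafts,
// etc., just by changing which textures and strength it is fed.
const char *compositeFrag =
    "uniform sampler2D texBase;\n"
    "uniform sampler2D texEffect;\n"
    "uniform float strength;\n"
    "void main() {\n"
    "    vec4 base   = texture2D(texBase,   gl_TexCoord[0].xy);\n"
    "    vec4 effect = texture2D(texEffect, gl_TexCoord[0].xy);\n"
    "    gl_FragData[0] = base + effect * strength;\n"
    "}\n";

void compositePass(GLuint program, GLuint baseTex, GLuint effectTex, float strength) {
    glUseProgram(program);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, baseTex);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, effectTex);
    glUniform1i(glGetUniformLocation(program, "texBase"),   0);
    glUniform1i(glGetUniformLocation(program, "texEffect"), 1);
    glUniform1f(glGetUniformLocation(program, "strength"),  strength);
    drawFullScreenQuad(); // hypothetical helper drawing the full-screen quad
}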

Compiling multiple shader variants is less necessary with modern GPUs. You can just use one shader with uniforms. The implementation may transparently compile specialisations if there’s an advantage to doing so.
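For instance, rather than compiling 16 variants you could in principle keep a single program that branches on uniform flags (a sketch; uniform names invented):

/* GLSL side (fragment shader), e.g.:
 *     uniform bool bloomEnabled;
 *     ...
 *     if (bloomEnabled) color += texture2D(texBloom, gl_TexCoord[0].xy) * bloomFactor;
 * Application side, once per frame, instead of picking one of 16 programs: */
glUseProgram(postProgram);
glUniform1i(glGetUniformLocation(postProgram, "bloomEnabled"),    bloomOn ? 1 : 0);
glUniform1i(glGetUniformLocation(postProgram, "vignetteEnabled"), vignetteOn ? 1 : 0);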

The main factor for one shader versus multiple shaders is whether it lets you merge draw calls. If you’re having to split up rendering in order to change shaders, that’s going to have an overhead. But that also applies to splitting up rendering to change uniforms (which may end up actually changing the shader code anyway).

Being modding-friendly is going to cost in performance terms. In a sense, efficiency boils down to working out exactly how much flexibility you really need and providing no more than that. And that is almost the exact opposite of “open ended”.

If you’re targeting GL 2.1 then you’re not targeting modern GPUs, so current best practice isn’t going to be relevant for you. You really need to optimize around older hardware instead.

That’s not entirely correct, mhagain. I’m targeting 2.1 just because I’ve inherited a complex (for my experience) piece of otherwise working code and I have to tread very carefully not to break it. Yet.

When I adopted the rendering engine of Terasology (its original developer has moved on) it consisted of two big classes, each a few thousand lines of very intricate code, with a plethora of supporting classes. It is now structured as a node-oriented architecture, each node representing a relatively atomic step in the rendering process, taking inputs and producing outputs - e.g. one node produces light shafts, another ambient occlusion, and some nodes put those different outputs together. This has been a major shift in the rendering engine’s architecture; it is still in progress and it is oriented toward eventually moving to higher OpenGL releases.

So, it’s not like we are not targeting modern GPUs. We just haven’t had the human resources to switch to more modern OpenGL specifications yet. If it were up to us we’d develop multiple renderers in parallel, capable of supporting old and new hardware, high performance and not, and exploit the quirks and capabilities of the main GPU brands in the best possible way. As an open source project with a smallish crew and even fewer people interested in the rendering/OpenGL aspects, those are just not realistic propositions for us. Small incremental changes are the only way forward for me and we haven’t quite reached the point yet where heading to OpenGL 3.1 or 4.x is realistically feasible. We’ll get there though.

Currently the most complex shaders are in the post-production part of the rendering process, mostly rendering to a full-screen quad and taking a number of buffers filled in previous steps as inputs. The shader in my first post is an example of that, and splitting it into four would also increase the draw calls from one to four. That being said, those draw calls just draw a quad, stored in a display list. In that context I imagine shader execution would still remain the biggest item, with perhaps input texture sampling (currently happening only once) becoming more of a concern?
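Concretely, I’m picturing a chain along these lines (a sketch; names invented, FBO and texture creation omitted):

// Ping-pong between two FBOs: each enabled effect reads the previous result and
// writes to the other attachment, drawing the same quad each time.
GLuint fbos[2], colorTex[2];       // assumed created via ARB_framebuffer_object
int src = 0, dst = 1;

for (int i = 0; i < numEnabledEffects; ++i) {
    glBindFramebuffer(GL_FRAMEBUFFER, fbos[dst]);
    glUseProgram(effectPrograms[i]);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, colorTex[src]);
    glUniform1i(glGetUniformLocation(effectPrograms[i], "texScene"), 0);
    glCallList(fullScreenQuadList); // the display-listed quad
    src = dst;
    dst = 1 - dst;
}
glBindFramebuffer(GL_FRAMEBUFFER, 0); // final result ends up in colorTex[src]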

But I understand what you are saying in terms of flexibility vs performance. I guess given the nature of our project we will prioritize flexibility/moddability - within reason.

[QUOTE=emanuele3d;1285101]That’s not entirely correct, mhagain. I’m targeting 2.1 just because I’ve inherited a complex (for my experience) piece of otherwise working code and I have to tread very carefully not to break it. Yet.

When I adopted the rendering engine of Terasology (its original developer has moved on) it consisted of two big classes, each a few thousand lines of very intricate code, with a plethora of supporting classes. It is now structured as a node-oriented architecture, each node representing a relatively atomic step in the rendering process, taking inputs and producing outputs - e.g. one node produces light shafts, another ambient occlusion, and some nodes put those different outputs together. This has been a major shift in the rendering engine’s architecture; it is still in progress and it is oriented toward eventually moving to higher OpenGL releases.

So, it’s not like we are not targeting modern GPUs. We just haven’t had the human resources to switch to more modern OpenGL specifications yet. If it were up to us we’d develop multiple renderers in parallel, capable of supporting old and new hardware, high performance and not, and exploit the quirks and capabilities of the main GPU brands in the best possible way. As an open source project with a smallish crew and even fewer people interested in the rendering/OpenGL aspects, those are just not realistic propositions for us. Small incremental changes are the only way forward for me and we haven’t quite reached the point yet where heading to OpenGL 3.1 or 4.x is realistically feasible. We’ll get there though.[/QUOTE]

So you’re certainly using FBOs. FBOs were introduced in GL 3.0.

What I’d like to say is that you are most certainly already using extensions that were made part of OpenGL in newer versions. You can be stuck with an OpenGL 2.0 context and still use newer functionality like FBOs or UBOs. Just use them if they are available; use something else, or simple uniforms, if they are not. Or simply raise the hardware requirement so that it is 3.1 instead of 2.x.
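Checking availability is cheap, e.g. (a crude sketch using the GL 2.x extension string; assumes <string.h>):

// On a 3.0+ context you would iterate glGetStringi(GL_EXTENSIONS, i) instead.
const char *ext = (const char *)glGetString(GL_EXTENSIONS);
int hasUBO = (ext != NULL) && (strstr(ext, "GL_ARB_uniform_buffer_object") != NULL);
int hasFBO = (ext != NULL) && (strstr(ext, "GL_ARB_framebuffer_object") != NULL);
// if (hasUBO) { /* use glBindBufferBase & friends */ } else { /* plain glUniform* calls */ }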

Bonus question: what’s the situation with passing uniforms to a shader? Does it make sense to pass them only if the value has changed or is that not worth the effort?

Uniform variables are stored in the program object; the program object won’t “forget” them, even if you use other programs in between.
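For example (a sketch with invented program names):

// Values set with glUniform* live in the program object, not in global GL state.
glUseProgram(blurProgram);
glUniform1f(glGetUniformLocation(blurProgram, "radius"), 0.5f); // stored in blurProgram

glUseProgram(compositeProgram);   // ... render something else ...

glUseProgram(blurProgram);        // "radius" is still 0.5 here; no need to set it again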

Thank you for the advice. Unfortunately I’m currently writing against OpenGL 2.1 and UBOs were only introduced in 3.1. Would I be able to fake UBOs by using glTexSubImage1D to update sections of data available to shaders?

sure

[QUOTE=Silence;1285107]So you’re certainly using FBOs. FBOs were introduced in GL 3.0.

What I’d like to say is that you are most certainly already using extensions that were made part of OpenGL in newer versions. You can be stuck with an OpenGL 2.0 context and still use newer functionality like FBOs or UBOs. Just use them if they are available; use something else, or simple uniforms, if they are not. Or simply raise the hardware requirement so that it is 3.1 instead of 2.x.[/QUOTE]

That’s true, Silence. We use GL_ARB_framebuffer_object, GL_ARB_texture_float and GL_ARB_half_float_pixel on top of OpenGL 2.1 - again something I inherited. I guess I could start using more extensions to smooth the migration to more modern OpenGL releases.

Aaah. Thank you for reminding me of this. I thought that might be the case. That being said, some shaders get used multiple times with different uniform values. And as we are heading toward modders being able to inject their own effects into the rendering process, it could occur that rendering node X uses the blur shader with value 0.5 and rendering node Y, not knowing about the other node, attempts to set that uniform to the same 0.5 value. In that context I’m wondering if it’s worth it for the CPU-side shader object to prevent the OpenGL call that sets the uniform, or if setting uniforms is really not going to slow things down much.
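In code, I mean something like this in our CPU-side shader wrapper (a C-style sketch of what the Java wrapper would do; names invented):

// Skip the GL call when the cached value is unchanged (exact float comparison is
// intentional: only identical values count as redundant).
typedef struct { GLint location; float cached; int valid; } CachedUniform1f;

void setUniform1fCached(CachedUniform1f *u, float value) {
    if (u->valid && u->cached == value)
        return;                       // redundant update, skip the driver call
    glUniform1f(u->location, value);
    u->cached = value;
    u->valid  = 1;
}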

Also thank you for the confirmation about the glTexSubImage1D idea.