GLSL; return statement ...



tmason
02-20-2015, 11:26 AM
Hello,

Simple question about GLSL and fragment shaders. May I return the "final color" early if I don't need to do any further processing in the void main function?

Consider the following example:



#version 330 core

out vec4 finalColor;

uniform float drawWireframe;
uniform vec4 materialColor;

vec4 fancyLightingFunction(vec4 colorToProcess) {
    // Fancy lighting function here...
}

void main() {
    if (drawWireframe == 1.0) finalColor = materialColor;

    // May I call "return" here?

    finalColor = fancyLightingFunction(materialColor);
}



Thank you for your time.

Agent D
02-20-2015, 12:00 PM
You have to understand how the processor architecture that this is run on works. Branches in general are very bad for performance.

Shader executions are basically divided into groups, where each group has a single instruction decoder that controls a bunch of
parallel ALUs with register files and local memory attached to them. It's like an extremely wide SIMD architecture. There is only one instruction
fetch and execution unit, so diverging control flow within a group is usually implemented by executing both branches and flagging the individual
"cores" on whether the results should be used or not.

Having a return statement in your shader code that is only taken by a few shader executions within a group will at best not influence performance
at all. The code is still executed, but the results are ignored for those executions that hit the return statement. Only if all shader executions in the entire group hit the return
statement can the whole group finish early.

It is better if you simply use different shader programs for your wireframe rendering, rather than cluttering it with branches.

tmason
02-20-2015, 12:20 PM
I see; that makes sense.

I was trying to avoid shader switching but if that hurts performance then I'll go ahead with using different shaders.



GClements
02-20-2015, 07:49 PM
Agent D wrote:
> Having a return statement in your shader code that is only taken by a few shader executions within a group will at best not influence performance at all. The code is still executed, but the results are ignored for those that hit the return statement, unless all shaders in the entire group hit the return statement, so the entire group could finish early.

If you look at the sample code, the condition expression is uniform (i.e. involves only uniforms and constants), so the value will be the same for all fragments within a group. Modern hardware will actually branch here. Older hardware lacks the ability to branch in the conventional manner, but an implementation may compile distinct variants of the shader for each case, and select the appropriate one prior to execution.
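For reference, GLSL does permit "return" inside main(); it simply ends the invocation as if the end of the function had been reached. Below is a minimal sketch of the shader from the question with the early exit added. The body of fancyLightingFunction is a placeholder assumption, not the original poster's lighting code.

```glsl
#version 330 core

out vec4 finalColor;

uniform float drawWireframe;
uniform vec4 materialColor;

// Placeholder (assumption): stands in for the real lighting code.
vec4 fancyLightingFunction(vec4 colorToProcess) {
    return colorToProcess;
}

void main() {
    // The condition depends only on a uniform, so every fragment in a
    // draw call takes the same path.
    if (drawWireframe == 1.0) {
        finalColor = materialColor;
        return; // output already written; skip the lighting
    }

    finalColor = fancyLightingFunction(materialColor);
}
```

Without the return, the original code would always overwrite finalColor with the lit result, so the wireframe branch would have no visible effect.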

OTOH, creating distinct variants of the shader yourself ensures that this will happen. You can use the preprocessor to simplify the process, replacing the condition with e.g. "#ifdef WIREFRAME ... #endif" then switching between "#define WIREFRAME\n" and an empty string. glShaderSource() takes an array of strings rather than a single string, which makes it easy to dynamically insert, remove or replace arbitrary chunks of source code.
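A sketch of the preprocessor approach, showing the wireframe variant as the compiler would see it after the strings are concatenated. The macro name WIREFRAME follows the example above; the lighting function body is a placeholder assumption. One detail to watch: the #version directive must come before anything other than comments and whitespace, so the "#define WIREFRAME\n" string has to be inserted after the version line, for example by passing the version line, the optional define, and the shader body as three separate strings.

```glsl
#version 330 core
// The next line is the optional string supplied by the host;
// pass an empty string instead to build the lit variant.
#define WIREFRAME

out vec4 finalColor;

uniform vec4 materialColor;

// Placeholder (assumption): stands in for the real lighting code.
vec4 fancyLightingFunction(vec4 colorToProcess) {
    return colorToProcess;
}

void main() {
#ifdef WIREFRAME
    // Wireframe variant: flat material color, no lighting code compiled in.
    finalColor = materialColor;
#else
    finalColor = fancyLightingFunction(materialColor);
#endif
}
```

Each variant is compiled and linked into its own program object, so no branch remains in either one.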

Alfonse Reinheart
02-20-2015, 10:01 PM
GClements wrote:
> If you look at the sample code, the condition expression is uniform (i.e. involves only uniforms and constants), so the value will be the same for all fragments within a group. Modern hardware will actually branch here. Older hardware lacks the ability to branch in the conventional manner, but an implementation may compile distinct variants of the shader for each case, and select the appropriate one prior to execution.

I believe the older hardware you refer to would be NVIDIA's pre-8xxx line. Since they're quite literally 10 years old this June, I'd say it's probably a moot point to consider them.

Also, changing programs is a fairly heavyweight option. Odds are good that any internal change will be faster than your external one, as long as it doesn't provoke a recompile.

mbentrup
02-21-2015, 01:26 AM
By the way, you seem to use a float uniform where a bool uniform would be more appropriate. If the driver knows that there are only 2 possible input values it can compile two shader variants to avoid dynamic branching (if this is a useful optimization on the given hardware). For a float there are millions of possible values, so it's more difficult for the driver to optimize this.
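A minimal sketch of the same shader with a bool uniform, assuming the uniform keeps the name drawWireframe and the lighting function is a placeholder. Whether a driver actually specializes the shader for the two values is implementation-dependent; the change mainly makes the intent explicit.

```glsl
#version 330 core

out vec4 finalColor;

// Set from the host with glUniform1i(location, 0) or glUniform1i(location, 1).
uniform bool drawWireframe;
uniform vec4 materialColor;

// Placeholder (assumption): stands in for the real lighting code.
vec4 fancyLightingFunction(vec4 colorToProcess) {
    return colorToProcess;
}

void main() {
    if (drawWireframe) {
        finalColor = materialColor;
        return;
    }

    finalColor = fancyLightingFunction(materialColor);
}
```

This also avoids comparing a float against 1.0 for exact equality.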

tmason
02-26-2015, 08:43 AM

Thank you (and everyone else) for all of the feedback.

Since I am a beginner, maybe my thinking isn't correct on this, but I will share it in hopes of hearing what experts have to say on the matter:

I wanted to use one (1) shader for drawing such that I don't have to constantly switch shaders CPU-side. The shader would use float-based uniforms to choose whether to draw:

Wireframe
Skybox (using a sampler2D)
Standard shading (with different texture channels for ambient, diffuse, specular, emissive, etc.)

Within my standard shading model I plan to have the capability to calculate multiple lights and be able to position them, etc.
I would also have the capability to turn lighting "on" or "off" such that I can just show the ambient/diffuse textures without lighting calculations.



My approach was the one shown as an example in my OP: if a "drawWireframe" uniform is equal to 1.0, use the wireframe section of code in my shader, but if that uniform is 0.0, do something else.

In my mind this was simpler than using multiple shaders, as I am just altering the uniforms in memory rather than changing shaders, which from what I have read you should avoid if you can.

Let me know if this makes sense and if my thinking is correct here.

Thank you.

Alfonse Reinheart
02-26-2015, 08:58 AM
tmason wrote:
> I wanted to use one (1) shader for drawing such that I don't have to constantly switch shaders CPU-side.

You have 3 shader scenarios. That's not "constantly" switching.

Many actual applications switch between dozens, if not hundreds, of different shaders every frame. That you've managed to boil it down to 3 is really good enough.

It's generally not a good idea to avoid a shader switch if the different scenarios have wildly different resource needs. For example, if one shader form needs 4 textures and another only uses 2, those should probably be two separate shaders, not governed by an internal switch.