Trade conditionals for switching shaders

nickels · August 26, 2010, 7:21am

I recently implemented much of the quake shader set, and ended up with vertex and pixel shaders that are loaded with conditionals involving uniform variables:

for (int i = 0; i < uNumDeforms; ++i) {
    if (deform[i] == WAVE) {
        …
    } else if (deform[i] == MOVE) {
        …
    }
}

You get the idea.

I am investigating compiling each quake ‘shader’ into a GLSL shader that uses the #define mechanism to turn these conditionals into compile-time constants (thanks Dark Photon for the idea in a recent thread).

The problem is that now, instead of:
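As a rough illustration of the #define idea, here is a minimal C sketch that builds a preprocessor preamble for one quake shader. The struct, its fields, the macro names (NUM_DEFORMS, USE_WAVE, USE_MOVE) and the "#version 120" line are all assumptions made for the example; none of them come from the actual code discussed in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-shader flags; the names are illustrative only. */
typedef struct {
    int num_deforms;  /* how many vertex deforms the quake shader uses */
    int has_wave;     /* nonzero if a WAVE deform is present */
    int has_move;     /* nonzero if a MOVE deform is present */
} ShaderFlags;

/* Writes a "#define" preamble into out.
 * Returns the number of characters written, or -1 if it does not fit. */
int build_preamble(const ShaderFlags *f, char *out, size_t cap)
{
    assert(f != NULL && out != NULL);
    int n = snprintf(out, cap,
                     "#version 120\n"            /* assumed GLSL version */
                     "#define NUM_DEFORMS %d\n"
                     "%s%s",
                     f->num_deforms,
                     f->has_wave ? "#define USE_WAVE 1\n" : "",
                     f->has_move ? "#define USE_MOVE 1\n" : "");
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The preamble and the shared shader body could then be passed as two strings to glShaderSource, which concatenates the strings it is given. In that arrangement the body must not contain its own #version line, and the GLSL code would test the macros with #ifdef instead of branching on uniforms.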

glUseProgram(qshader);

for (faces…

glUniform…

end for

I will have:
for faces

glUseProgram(face->shader);

end face

So I am wondering about the performance hit from switching programs more often. Is it likely to be smaller than the possible hit from the conditionals? Do I need to try to be careful about sorting faces by shader, etc.?

This project is going to be a lot of work, I would hate to go for it and find out the results are no better! Thanks.

Alfonse_Reinheart · August 26, 2010, 8:15am

Do I need to try to be careful about sorting faces by shader, etc.?

Yes, but to be fair, you should have been doing that before. Sort by program, then by the parameters/textures you use with that program. If you have to call glUniform for every face, you’re doing something very wrong.
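A minimal sketch of that sort order in C, assuming each face is described by a small record holding its GLSL program handle and quake shader index. The DrawItem struct and the function names are hypothetical. The helper count_program_binds only counts how many times a draw loop would need to call glUseProgram; it makes no GL calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical draw record; field names are illustrative only. */
typedef struct {
    unsigned program;  /* GLSL program handle used to draw the face */
    unsigned shader;   /* quake shader index (textures, parameters) */
    unsigned face;     /* face index, used as a final tie-breaker */
} DrawItem;

static int cmp_uint(unsigned a, unsigned b) { return (a > b) - (a < b); }

/* Orders by program first, then by quake shader, then by face. */
static int compare_items(const void *pa, const void *pb)
{
    const DrawItem *a = pa;
    const DrawItem *b = pb;
    int c = cmp_uint(a->program, b->program);
    if (c == 0) c = cmp_uint(a->shader, b->shader);
    if (c == 0) c = cmp_uint(a->face, b->face);
    return c;
}

void sort_items(DrawItem *items, size_t n)
{
    assert(items != NULL || n == 0);
    if (n > 1)
        qsort(items, n, sizeof *items, compare_items);
}

/* Counts the program changes a draw loop over items would perform. */
size_t count_program_binds(const DrawItem *items, size_t n)
{
    size_t binds = 0;
    for (size_t i = 0; i < n; ++i)
        if (i == 0 || items[i].program != items[i - 1].program)
            ++binds;
    return binds;
}
```

With the list sorted this way, a draw loop only needs to bind a program or upload uniforms when the program or shader field differs from the previous item.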

nickels · August 26, 2010, 11:22am

Each quake shader has a different texture or set of textures, and possibly vertex deformations and texture mods, so some kind of update is unavoidable with their system…
There will likely be multiple surfaces per quake shader, so if I map quake shaders 1 to 1 onto GLSL shaders, maybe I will be good.
From your suggestion, I should sort these surfaces by common shader.
And what you’re telling me is that the time to qsort the faces will be less than the extra time spent in glUniform and glUseProgram.

That’s a good data point, I appreciate it.

Just trying to be careful not to over-optimize!

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” Donald Knuth

mbentrup · August 27, 2010, 12:26am

I made a quake shader -> GLSL translator that used the one-program-per-shader approach, and it worked reasonably well. I kept the generated GLSL programs in a hash table so that I could reuse a GLSL program when another shader compiled to the same GLSL code (a very common case), and I kept the shader indexes sorted by GLSL program (the Q3 engine sorts by shader index anyway).

The performance was ok, though in the end you could have hundreds of programs for a complex map.
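The hash table of generated programs could look roughly like the C sketch below, which keys the cache on the generated GLSL text so that identical source compiles only once. This is not the translator's actual code: the table size, the djb2 hash, the CompileFn callback and the stub_compile stand-in (which just hands out increasing handles so the sketch runs without a GL context) are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 1024  /* power of two; assumed large enough for one map */

typedef struct {
    char *source;      /* owned copy of the GLSL text; NULL marks an empty slot */
    unsigned program;  /* handle returned by the compile callback */
} CacheSlot;

/* Must be zero-initialized before first use. */
typedef struct {
    CacheSlot slots[CACHE_SLOTS];
    size_t count;
} ProgramCache;

/* Hypothetical callback: compiles and links source, returns a nonzero handle. */
typedef unsigned (*CompileFn)(const char *source);

static unsigned long hash_source(const char *s)
{
    unsigned long h = 5381;  /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns the cached program for source, compiling it on first use.
 * Returns 0 if the table is full or memory runs out. */
unsigned cache_get(ProgramCache *c, const char *source, CompileFn compile)
{
    assert(c != NULL && source != NULL && compile != NULL);
    size_t i = hash_source(source) & (CACHE_SLOTS - 1);
    while (c->slots[i].source != NULL) {          /* linear probing */
        if (strcmp(c->slots[i].source, source) == 0)
            return c->slots[i].program;           /* same GLSL text: reuse */
        i = (i + 1) & (CACHE_SLOTS - 1);
    }
    if (c->count >= CACHE_SLOTS - 1)
        return 0;                                 /* keep one slot empty */
    size_t len = strlen(source) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;
    memcpy(copy, source, len);
    c->slots[i].source = copy;
    c->slots[i].program = compile(source);
    c->count++;
    return c->slots[i].program;
}

/* Stand-in compiler so the sketch can run without OpenGL. */
unsigned stub_compile_calls = 0;
unsigned stub_compile(const char *source)
{
    (void)source;
    return ++stub_compile_calls;
}
```

In a real renderer the callback would wrap glCreateShader, glCompileShader and glLinkProgram; comparing the full source text on a hash hit avoids treating two different shaders as equal when their hashes collide.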

nickels · August 27, 2010, 7:27am

Thanks for the background. I didn’t realize that the Q3 engine was sorting by shader, interesting.

I was thinking about maybe starting with autogen for the obvious things such as:

Number of stacked textures in the shader
sky/no sky
additive / fog / regular surface

I worry about 1 to 1 qshader to GLSL shader, because it doesn’t ultimately seem like a scalable approach.

Did you see a decent speedup using generated shaders, or did you not try the if () approach first?
Thanks!

nickels · August 28, 2010, 4:37pm

Thanks for the help, everyone. So I finally decided on recompiling shaders based on:

number of texture layers, isSky, isFog, isAdditive

This removed a number of if statements and reduced the buffer sizes of texcoords and other values for shaders that do not use the maximum number of textures (8).
I also reorganized the drawing so it is sorted by a) GLSL program, b) quake shader (= textures, texmod params, etc…), c) face.

The main gain came from reducing the number of textures from the maximum to the actual count. I believe this saving is largely due to the reduction in varying parameters.

Results: reduced gbuffer draw time from 18 milliseconds back to about 3.5!!!

nickels · September 1, 2010, 7:06pm
