Hardcoded values vs. variables: performance

Hello,
does anybody have any experience with which is better:

- using several (e.g. 15) precompiled fragment shaders with constant values, or
- using the same fragment program but changing some variables (e.g. 10) several times while rendering?

Maybe it even depends on the GPU chipset or something else, but maybe there is a clear answer as to which way performs better?

Hm, the critical times are somewhere in the runtime costs of glUseProgram and glUniform… I think.
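For illustration, the two options would look roughly like this. This is only a sketch: the program handles, the uniform name `u_params` and `drawGeometry()` are made up, and it assumes a GL 2.0 context with entry points loaded, e.g. via GLEW.

```c
#include <GL/glew.h>  /* sketch assumes GL 2.0 entry points loaded via GLEW */

extern void drawGeometry(void);  /* placeholder for the real draw call */

/* Option A: one precompiled program per variant, constants baked in. */
void drawSwitchingPrograms(const GLuint *programs, int numVariants)
{
    for (int i = 0; i < numVariants; ++i) {
        glUseProgram(programs[i]);        /* full program switch per variant */
        drawGeometry();
    }
}

/* Option B: one shared program, a handful of uniforms updated per variant. */
void drawUpdatingUniforms(GLuint program, const GLfloat params[][4],
                          int numVariants)
{
    glUseProgram(program);                /* bound once */
    GLint loc = glGetUniformLocation(program, "u_params");
    for (int i = 0; i < numVariants; ++i) {
        glUniform4fv(loc, 1, params[i]);  /* update values, no program switch */
        drawGeometry();
    }
}
```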

Thanks very much,
Paul

A very powerful tool you could use is gDEBugger.

I tried this tool last week, and wow, I was able to find my bottleneck very easily.

Try both versions of your program and see what gDEBugger tells you.
This can give you a first impression of what’s better.

BUT, this may not be the same on other hardware, so check it with various graphics cards (if you can).

(Theoretical questions like “What is better?” are important, but the most important thing is “What will you get in practice?” :smiley: .
Remember also one of the most important programming guidelines: “Make it simple.” :wink: )

Maybe it even depends on the GPU chipset or something else, but maybe there is a clear answer as to which way performs better?
Well, look at it this way. Either one is a state change, thus provoking a certain degree of slowdown. On some hardware, shaders can be paged out of video memory, so changing a shader has a certain extra cost attached on top of that. On other hardware, all the uniforms may need to be updated if you change just one. But that’s OK, because the uniforms would all be updated anyway: glslang stores uniform state as part of the compiled program object, so you pay that upload cost when changing shaders too.

In short, it’s best to just update uniforms. On most hardware, there’s no such thing as a constant anyway; the compiler will turn constants into constant uniform values. So you may as well make them variable uniforms.

It depends on how the GPU’s instruction set works.
If you have an immediate value, it’s possible to encode the immediate value as part of the instruction. I’m going to assume that they don’t do this.

If GPUs use 32 bits for each instruction, then it’s possible they use 8 bits for register file addressing. It’s possible for them to have 256 vec4 uniforms.

I think both give the same performance.

Using explicit constants generally creates faster shaders. It may not be a big difference, but sometimes it’s a win. If the constants in question are common values like 1, 0, 0.5, 2.0, etc., the compiler will often be able to shave off some instructions. Also, constants can often be combined, shaving off further instructions.
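For example, something along these lines (illustrative GLSL held in C source strings; whether the folding actually happens depends on the compiler):

```c
/* With literals the compiler can fold the whole scale away; with uniforms
   it has to keep both multiplies. Illustrative only. */
static const char *fragWithConstants =
    "void main() {\n"
    "    vec4 base = gl_Color;\n"
    "    gl_FragColor = base * 2.0 * 0.5;\n" /* can fold to just 'base' */
    "}\n";

static const char *fragWithUniforms =
    "uniform float scaleA;\n"
    "uniform float scaleB;\n"
    "void main() {\n"
    "    vec4 base = gl_Color;\n"
    "    gl_FragColor = base * scaleA * scaleB;\n" /* two real multiplies */
    "}\n";
```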

Now for a new one…
is there any experience out there with how smart the compiler is?

I just found the same code in ATI’s “RenderMonkey” playground running at vastly different speeds between OGL and DX9, forcing me to think the OGL compiler does not do much optimizing…

Thanks,
Paul

Paul, the degree to which the compiler optimizes code is up to the hardware vendor. Each hardware vendor creates their own optimizing compiler. There is no uniform “OGL” compiler that every video card manufacturer ships with their driver; rather, each vendor has the opportunity to optimize for their own unique architecture. Remember, the language is young and so are the compilers. As they mature, they’ll get better and more optimized. The more developers use the language, and the more developers let the hardware vendors know they think it is important, the more effort the vendors will put into the compiler.

Humus - are you suggesting that it would be faster to swap out programs instead of changing uniforms? This may be correct in some narrow case(s); however, uniforms are designed to be used for exactly this purpose. I find it hard to believe that swapping out a program/shader could be faster than updating a uniform variable.

The question, in fact, is whether the program is “exchanged” (loaded into some other memory) or maybe just “switched” (GPU binaries all sitting in VRAM), as glUseProgram suggests…
If it were called “glSwitchProgram” or, on the other hand, “glLoadProgram”, that would be much more obvious.

But I think that may depend on hardware too…
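In API terms, the split looks like this; whether the linked binary then stays resident on the GPU is hidden inside the driver (a sketch, assuming GL 2.0 entry points and made-up function names):

```c
#include <GL/glew.h>  /* assumes GL 2.0 entry points are loaded */

/* Init time: the expensive work (compile + link) happens once. */
GLuint makeProgram(GLuint vertShader, GLuint fragShader)
{
    GLuint prog = glCreateProgram();
    glAttachShader(prog, vertShader);
    glAttachShader(prog, fragShader);
    glLinkProgram(prog);   /* driver builds its GPU binary here */
    return prog;
}

/* Per frame: glUseProgram only binds an already linked object. Whether
   that bind is a cheap "switch" (binary resident in video memory) or a
   costly "load" (paged back in) is up to the implementation. */
void useVariant(GLuint prog)
{
    glUseProgram(prog);
}
```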

Originally posted by dronus:
Now for a new one…
is there any experience out there with how smart the compiler is?

I had a shader that was a close shave for SM2 hardware. On ATI, to my surprise, it was able to execute in hardware.
The HLSL version of the same shader would not fit. I changed the code around, but that only cost more instructions.
I figured that either ATI’s GLSL compiler was aggressive, or D3D’s system was at a loss because it doesn’t know the hardware well enough.

“But I think that may depend on hardware too…”

Of course it does.

Originally posted by kingjosh:
Humus - are you suggesting that it would be faster to swap out programs instead of changing uniforms?
No. I’m suggesting that a shader that has constants written out explicitly often compiles to fewer instructions, and thus runs faster than the equivalent that uses uniforms. If the number of permutations is fairly low and you’re limited by shading, rather than by the API overhead of setting shaders/uniforms, then this may be an optimization, but your mileage may vary.
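One common way to get such a small set of permutations without maintaining 15 copies of the source is to prepend #defines to a shared body and compile each variant once up front. A sketch, assuming GL 2.0 entry points and a made-up macro name CONST_VALUE:

```c
#include <stdio.h>
#include <GL/glew.h>

/* Bakes one constant into a shared fragment source via the GLSL
   preprocessor; assumes the body has no #version line of its own. */
GLuint compileVariant(const char *sharedBody, float constValue)
{
    char header[64];
    snprintf(header, sizeof(header), "#define CONST_VALUE %g\n", constValue);

    const char *sources[2] = { header, sharedBody };
    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(shader, 2, sources, NULL);  /* header + body concatenated */
    glCompileShader(shader);
    /* ...attach to a program, link it, and check the info logs... */
    return shader;
}
```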

Originally posted by dronus:
I just found the same code in ATI’s “RenderMonkey” playground running at vastly different speeds between OGL and DX9, forcing me to think the OGL compiler does not do much optimizing…
There will be differences, sometimes favoring D3D but, in my experience, equally often favoring OpenGL. In OpenGL the IHV writes the front-end, while in D3D it’s provided by the runtime and the high-level shader compiler is provided by D3DX. Sometimes the HLSL compiler may be able to spot optimizations that our optimizer misses, so D3D could gain there; on the other hand, when HLSL misses an optimization, the lost semantics may make it hard for our optimizer to figure it out, while this problem does not exist in OpenGL.

Originally posted by dronus:
…
I just found the same code in ATI’s “RenderMonkey” playground running at vastly different speeds between OGL and DX9, forcing me to think the OGL compiler does not do much optimizing…

It’s not the compiler… (I think)

Those (quite extreme) slowdowns always appeared when I used render targets and GLSL on my old Radeon 9500 Pro.
