GLSL Branching and Clamp function
07-01-2009, 07:50 AM
It is advisable to use branching sparingly in shader code.
But GLSL built-in functions such as clamp, min, or max inherently use an "if" to determine the final value.
My question: if we replace clamp(x, 0.0, 1.0) with
x = (x < 0.0) ? 0.0 : (x > 1.0) ? 1.0 : x;
will this be faster than clamp()?
Similarly, if min(), max(), or step() are expanded using "if", will there be a slowdown? If yes, then why are such built-in functions fast?
07-01-2009, 08:13 AM
Your own functions will generally not be faster, since the built-in functions are optimized. You could write a 3-component dot product yourself as well, but as a built-in function on "dedicated" hardware it takes far fewer cycles to execute. Avoid writing your own version whenever a built-in exists (reflect falls into the same category, as do many others, such as ftransform). Besides, on a different GPU the speed of the built-in functions might increase.
07-01-2009, 08:34 AM
I see. So there is some sort of hardware logic associated with the built-in functions?
For example, clamp() may be implemented directly in logic gates, right?
Exactly. Clamping has been available since day one (or two...) in programmable shading hardware. Long long before branching got introduced and supported in hardware.
07-01-2009, 08:46 AM
thank you def and _NK47, now I understand.
07-01-2009, 09:54 AM
Not only are clamp/min/max implemented in hardware without any branching, but if you have simple code like this:
var2 = 7;
var3 = var4-5.0;
There will be no branching either, thanks to conditional execution (a flag specifying whether the instruction should be executed).
x86 CPUs have CMOVcc instructions that do the same (but are limited to "mov"), and classic ARM CPUs have the same kind of condition flags on nearly every instruction.
Also, if real branching is done on all GPU cores at the same instruction (coherent branching), it only takes 2 GPU cycles. Coherent branching is obviously guaranteed if you loop uniform_N times. The slowness with uniform looping comes mostly from the extra loop-preparation instructions that compilers still don't optimize well enough.
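To make the conditional-execution idea concrete, here is a rough CPU-side sketch in C. It is not what any particular GPU or compiler emits: select_f and both_sides are hypothetical helpers, and the code assumes float is 32 bits wide. Both sides of the condition are computed, and the condition only decides which result is kept, so no jump is needed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return a if cond is nonzero, else b,
   using a bit mask instead of a jump. Assumes 32-bit float. */
static float select_f(int cond, float a, float b) {
    uint32_t ua, ub, mask, ur;
    float r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    mask = (uint32_t)0 - (uint32_t)(cond != 0); /* all ones if cond, else 0 */
    ur = (ua & mask) | (ub & ~mask);
    memcpy(&r, &ur, sizeof r);
    return r;
}

/* Both candidate values are evaluated unconditionally;
   the comparison only selects which one is returned. */
static float both_sides(float v, float w) {
    float then_val = 7.0f;
    float else_val = w - 5.0f;
    return select_f(v > 0.0f, then_val, else_val);
}
```

For example, both_sides(1.0f, 10.0f) keeps the first value (7.0) and both_sides(-1.0f, 10.0f) keeps the second (5.0). The cost is that both sides are always evaluated, which is why this only pays off for short, simple bodies.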
07-01-2009, 10:34 AM
This is interesting information. I never knew that.
Does such a conditional-execution flag exist on GPUs from AMD and NVIDIA?
OK, and if I use an if-else pair rather than just an "if", will conditional execution still take place?
Thank you again.
07-01-2009, 11:22 AM
Cg's command-line compiler cgc will generate an assembly listing for your inspection. Otherwise I think you're pretty much at the mercy of vendor perf documents and good old-fashioned testing.
09-20-2009, 02:06 PM
Do they really use if clauses? This is some basic math we learn at college. Should it not be faster than an if clause and a compare operation?
max(x,y) = 1/2 * (x + y + |x - y|)
min(x,y) = 1/2 * (x + y - |x - y|)
Someone compared the original C++ max with these functions and achieved double the performance with the above equations.
09-20-2009, 02:13 PM
Ouch :) . No need for arithmetic like that.
GPU hardware is not as ridiculous as a 386 CPU. The silicon logic for min/max/clamp is really simple; it had just been missing from Intel CPUs until SSE came along.
09-20-2009, 09:59 PM
Not to mention that this implementation of min/max is prone to overflow and/or precision issues...
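The overflow and precision problems can be reproduced with a small C sketch of the formulas above. The helpers max_abs and min_abs are hypothetical names, doubles are used instead of shader floats, and the volatile temporaries are only there to force each intermediate to be rounded to double.

```c
#include <math.h>
#include <float.h>

/* max(x,y) = 1/2 * (x + y + |x - y|), hypothetical helper. */
static double max_abs(double x, double y) {
    volatile double sum = x + y;         /* may overflow to infinity */
    volatile double diff = fabs(x - y);
    return 0.5 * (sum + diff);
}

/* min(x,y) = 1/2 * (x + y - |x - y|), hypothetical helper. */
static double min_abs(double x, double y) {
    volatile double sum = x + y;         /* may round away the smaller value */
    volatile double diff = fabs(x - y);
    return 0.5 * (sum - diff);
}
```

For small values the formulas agree with fmax and fmin, e.g. max_abs(3.0, 5.0) is 5.0 and min_abs(3.0, 5.0) is 3.0. But max_abs(DBL_MAX, DBL_MAX) overflows to infinity while fmax returns DBL_MAX, and min_abs(1e16, 1.0) no longer returns 1.0 because the 1.0 is lost when added to 1e16.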