How costly is a sqrt in a fragment shader?
Mukund
10-20-2011, 03:02 AM
Hello all,
This is my fragment shader:
varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;
void main(void)
{
    float dis = sqrt((fragPos.y - lightPos.y) * (fragPos.y - lightPos.y)
                   + (fragPos.x - lightPos.x) * (fragPos.x - lightPos.x));
    if (dis <= 5.0)
        gl_FragColor = color * vec4(1.0, 1.0, 1.0, 1.0);
    else
        gl_FragColor = color * vec4(0.5, 0.5, 0.5, 1.0);
}
Can anyone please tell me how costly this is?
I just want to check for a circular region and assign colors. Please let me know if there is a better way to do this.
Thanks a lot!
aqnuep
10-20-2011, 03:55 AM
It shouldn't be that expensive. Reciprocal square root is actually a single instruction on modern GPUs, so sqrt() should take no more than two; however, it requires a division, which may be a bit expensive.
You could use inversesqrt() instead and change your comparison, but I don't think you should be afraid of the square root calculation. The branch (the if) is actually much more costly.
Mukund
10-20-2011, 04:17 AM
Thanks for the reply aqnuep.
>>Actually the branching (if) is much more costly.
Well, I need to do the check for every fragment. Any idea how I can make it better?
Thanks!
mbentrup
10-20-2011, 04:23 AM
aqnuep is absolutely correct, except sqrt() is usually implemented as inversesqrt() + multiplication, not inversesqrt() + division, so it should be quick.
One thing you should keep in mind though is that sqrt is not vectorized on most GPUs, so computing a sqrt(vec4) requires 4*2 instructions.
aqnuep
10-20-2011, 04:35 AM
>> aqnuep is absolutely correct, except sqrt() is usually implemented as inversesqrt() + multiplication, not inversesqrt() + division, so it should be quick.
Yes, you are right. Usually, if the GLSL compiler is smart enough, it may figure out that no division or multiplication is needed at all, or that a single multiplication is enough. It is also true that sqrt(vec4) will most likely require four instructions for the reciprocal square roots, but it may not require four instructions for the multiplication.
V-man
10-20-2011, 05:16 AM
This is 4 subtractions, 2 multiplications, 1 addition, 1 inverse square root and 1 reciprocal (because sqrt might be an inverse square root followed by a 1/x).
TOTAL = 9 clock cycles
float dis = sqrt((fragPos.y - lightPos.y) *
(fragPos.y - lightPos.y)
+(fragPos.x - lightPos.x) *
(fragPos.x - lightPos.x)
);
This is 1 subtraction, 1 dot product, 1 inversesqrt.
TOTAL = 3 clock cycles
vec2 result = fragPos.xy - lightPos.xy;
float result2 = dot(result, result);
float dis = inversesqrt(result2); // note: this is 1.0 / distance, not the distance
and then you change your "if (dis <= 5.0)"
Mukund
10-20-2011, 05:29 AM
Thanks V-man, aqnuep, mbentrup.
@V-man
>> and then you change your "if (dis <= 5.0)"
I didn't quite get you. Change that to what?
Thanks!
aqnuep
10-20-2011, 06:05 AM
Maybe you can play with some math like min/max/clamp/ceil/floor to get the values 0.5 or 1.0 based on whether dis is greater than 5.0 or not.
Most probably even multiple ALU instructions will be faster than a conditional.
Mukund
10-20-2011, 06:27 AM
Hmm, I came up with this one:
float val = step(dis, 5.0);
gl_FragColor = mix( color * vec4(0.5, 0.5, 0.5, 1.0),
color * vec4(1.0, 1.0, 1.0, 1.0),
val);
Is this better? step() would internally have to do a comparison, right? So is some loss of cycles inevitable, or can it be avoided?
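A C model of GLSL's step() and mix(), following their specification definitions (step(edge, x) is 0.0 when x < edge and 1.0 otherwise; mix(x, y, a) is x * (1 - a) + y * a), can confirm that this version selects the same factor as the original if. As suspected, step() still performs a comparison; the hope is that it compiles to a compare-and-select rather than a jump, though that is up to the driver. The `_model` functions and other names are hypothetical stand-ins.

```c
#include <assert.h>

/* Hypothetical model of GLSL step(edge, x): 0.0 if x < edge, else 1.0. */
float step_model(float edge, float x) {
    return x < edge ? 0.0f : 1.0f;
}

/* Hypothetical model of GLSL mix(x, y, a) = x * (1 - a) + y * a. */
float mix_model(float x, float y, float a) {
    return x * (1.0f - a) + y * a;
}

/* Mukund's version for one color channel. step(dis, 5.0) uses dis as the
 * edge, so it is 1.0 exactly when 5.0 >= dis, i.e. dis <= 5.0. */
float shade_step_mix(float channel, float dis) {
    float val = step_model(dis, 5.0f);
    return mix_model(channel * 0.5f, channel * 1.0f, val);
}

/* The original branching version, for comparison. */
float shade_branch(float channel, float dis) {
    return (dis <= 5.0f) ? channel * 1.0f : channel * 0.5f;
}

/* With val exactly 0.0 or 1.0, mix returns one argument unchanged. */
void check_step_mix(void) {
    for (float dis = 0.0f; dis <= 10.0f; dis += 0.125f)
        assert(shade_step_mix(0.8f, dis) == shade_branch(0.8f, dis));
}
```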
aqnuep
10-20-2011, 06:28 AM
FYI, Groovounet just posted on Twitter about an ALU technique that can be used for conditional elimination: http://developer.amd.com/documentation/articles/pages/New-Round-to-Even-Technique.aspx
Groovounet
10-20-2011, 06:41 AM
Ahah: many people seem to have missed that the built-in function mix also has a version with a bool type.
genType mix(genType, genType, genBType);
So this is enough:
gl_FragColor = color * mix(
vec4(0.5, 0.5, 0.5, 1.0),
vec4(1.0, 1.0, 1.0, 1.0),
bvec4(dis <= 5.0)); // the bool overload takes a bvec4 matching the vec4 arguments
Mukund
10-20-2011, 07:23 AM
@aqnuep: Thanks!
@Groovounet: Well, GLSL version 1.20 (the one I'm using) doesn't seem to support it. But yeah, I didn't know we could use mix that way from 4.0 onwards.
Thanks!
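In GLSL 1.20 the same selection can still be written with the float-weighted mix(), by converting the condition with the float(bool) constructor, which yields 1.0 for true and 0.0 for false: mix(x, y, float(dis <= 5.0)). The C sketch below (hypothetical names) models both the bool overload mentioned above and this float workaround, and checks that they choose the same factor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the GLSL mix() overload with a bool selector:
 * returns y where a is true and x where it is false (no interpolation). */
float mix_bool_model(float x, float y, bool a) {
    return a ? y : x;
}

/* Hypothetical model of the float-weighted mix() available in GLSL 1.20. */
float mix_model(float x, float y, float a) {
    return x * (1.0f - a) + y * a;
}

/* Bool-selector form: pick the factor with the comparison directly. */
float factor_bool_mix(float dis) {
    return mix_bool_model(0.5f, 1.0f, dis <= 5.0f);
}

/* GLSL 1.20 form: float(dis <= 5.0) turns true/false into 1.0/0.0. */
float factor_float_mix(float dis) {
    return mix_model(0.5f, 1.0f, (float)(dis <= 5.0f));
}

void check_mix_forms(void) {
    for (float dis = 0.0f; dis <= 10.0f; dis += 0.125f)
        assert(factor_bool_mix(dis) == factor_float_mix(dis));
}
```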
Groovounet
10-20-2011, 08:15 AM
Ahhh, I didn't realize that you were using GLSL 1.20.
Ilian Dinev
10-20-2011, 11:21 AM
But then again, GPUs generally have predicated execution :)
So the fastest version should be:
varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;
void main(void)
{
    vec2 tmp = fragPos.xy - lightPos.xy;
    float disSq = dot(tmp, tmp);        // squared distance, no sqrt needed
    float col = 0.5;
    if (disSq <= 5.0 * 5.0) col = 1.0;  // compare against the squared radius
    gl_FragColor = vec4(color.xyz * vec3(col), 1.0);
}
Ilian Dinev
10-20-2011, 11:39 AM
And if some GPUs can do a single-cycle compare against 0.0 and a conditional move, then:
varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;
// for scalar-ISA gpus
void main(void)
{
    vec2 tmp = fragPos.xy - lightPos.xy; // 2 fsub = 2 cycles
    float col = 0.5; // mov, 1 or 0 cycles, see below
    float disSq = tmp.x*tmp.x + (tmp.y*tmp.y - 25.0); // fmad, fmad = 2 cycles.
    if (disSq <= 0.0) col = 1.0; // 1 cycle. Some GPUs might merge the "col = 0.5" above into this instruction.
    gl_FragColor = vec4(color.xyz * vec3(col), 1.0); // 3 fmul, 1 mov = 4 cycles.
}
Things get funny when some GPUs can do an fmul and an fmad together in a single cycle, though :)
V-man
10-21-2011, 03:54 AM
"gl_FragColor = vec4(color.xyz * vec3(col), 1.0); // 3 fmul, 1 mov = 4 cycles. "
That should be 1 MUL and 1 MOV
gl_FragColor.xyz = color.xyz * col.xxx;
gl_FragColor.w = 1.0;
and modern hardware supports multiple direct writes to gl_FragColor.
Ilian Dinev
10-21-2011, 11:10 AM
>> // for scalar-ISA gpus
>> void main(void)
:)
yuriks
10-28-2011, 07:47 AM
No one seems to have mentioned it, but op can simply square 5 and compare with 25 instead, doing away with the sqrt entirely. :P
sqrt[-1]
10-28-2011, 05:42 PM
>> No one seems to have mentioned it, but op can simply square 5 and compare with 25 instead, doing away with the sqrt entirely. :P
I think Ilian Dinev's code above does this.