How costly is a sqrt in a fragment shader?

Mukund

10-20-2011, 04:02 AM

Hello all,

This is my fragment shader:

varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;

void main(void)
{
    float dis = sqrt((fragPos.y - lightPos.y) *
                     (fragPos.y - lightPos.y)
                   + (fragPos.x - lightPos.x) *
                     (fragPos.x - lightPos.x)
                    );

    if (dis <= 5.0)
        gl_FragColor = color * vec4(1.0, 1.0, 1.0, 1.0);
    else
        gl_FragColor = color * vec4(0.5, 0.5, 0.5, 1.0);
}

Can anyone please tell me how costly this is?

I just want to check for a circular region and assign colors. Please let me know if there is a better way to do the same.

Thanks a lot!

aqnuep

10-20-2011, 04:55 AM

It shouldn't be that expensive. Reciprocal square root is actually only a single instruction on modern GPUs, which means that sqrt() should take no more than two; however, it requires a division, which may be a bit expensive.

You may use inversesqrt() instead and change your comparison, but I don't think you should be afraid of the square root calculation. Actually, the branching (if) is much more costly.

Mukund

10-20-2011, 05:17 AM

Thanks for the reply aqnuep.

>>Actually the branching (if) is much more costly.

Well, I need to do the check for every fragment. Any idea how I can make it better?

Thanks!

mbentrup

10-20-2011, 05:23 AM

aqnuep is absolutely correct, except sqrt() is usually implemented as inversesqrt() + multiplication, not inversesqrt() + division, so it should be quick.

One thing you should keep in mind though is that sqrt is not vectorized on most GPUs, so computing a sqrt(vec4) requires 4*2 instructions.

aqnuep

10-20-2011, 05:35 AM

>>aqnuep is absolutely correct, except sqrt() is usually implemented as inversesqrt() + multiplication, not inversesqrt() + division, so it should be quick.

Yes, you are right. If the GLSL compiler is smart enough, it may figure out that no division or multiplication is needed at all, or that a single multiplication is enough. It is also true that sqrt(vec4) will most probably require 4 instructions for the reciprocal square roots, but it may not require 4 instructions for the multiplications.

V-man

10-20-2011, 06:16 AM

This is 4 subtractions, 2 multiplications, 1 addition, 1 inverse square root, 1 reciprocal (because sqrt might be an inverse square root followed by a 1/x).

TOTAL = 9 clock cycles

float dis = sqrt((fragPos.y - lightPos.y) *
                 (fragPos.y - lightPos.y)
               + (fragPos.x - lightPos.x) *
                 (fragPos.x - lightPos.x)
                );

This is 1 subtraction, 1 dot product, 1 inversesqrt.

TOTAL = 3 clock cycles

vec2 result = fragPos.xy - lightPos.xy;
float result2 = dot(result, result);
float dis = inversesqrt(result2);

and then you change your "if (dis <= 5.0)"

Mukund

10-20-2011, 06:29 AM

Thanks V-man, aqnuep, mbentrup.

@V-man

>> and then you change your "if (dis <= 5.0)"

I didn't quite get you. Change that to what?

Thanks!
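For what it's worth, here is a sketch of what the changed comparison could look like with the inversesqrt() version above. Since the variable then holds 1/distance rather than the distance, the test "dis <= 5.0" flips to "1/distance >= 1/5". The names d and invDis are made up here for illustration, and the behaviour when the fragment sits exactly on the light (zero distance, where inversesqrt() is undefined) is an assumption that would need checking on real hardware.

```glsl
varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;

void main(void)
{
    vec2 d = fragPos.xy - lightPos.xy;      // hypothetical name
    float invDis = inversesqrt(dot(d, d));  // 1 / distance (undefined for zero distance)

    // distance <= 5.0  is equivalent to  1/distance >= 1/5 = 0.2 for positive distance
    if (invDis >= 0.2)
        gl_FragColor = color * vec4(1.0, 1.0, 1.0, 1.0);
    else
        gl_FragColor = color * vec4(0.5, 0.5, 0.5, 1.0);
}
```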

aqnuep

10-20-2011, 07:05 AM

Maybe you can play with some math like min/max/clamp/ceil/floor to get the values 0.5 or 1.0 based on whether dis is greater than 5.0 or not.

Most probably even multiple ALU instructions will be faster than a conditional.
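One possible reading of that suggestion, as a rough sketch only: ceil() and clamp() can turn "dis - 5.0" into exactly 0.0 or 1.0, which then selects between the two brightness factors without an if. The names outside and shade are invented for this example, and whether it actually beats the branch depends on the compiler and GPU.

```glsl
varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;

void main(void)
{
    float dis = distance(fragPos.xy, lightPos.xy);

    // ceil(dis - 5.0) is <= 0.0 inside the circle and >= 1.0 outside,
    // so the clamp yields exactly 0.0 or 1.0.
    float outside = clamp(ceil(dis - 5.0), 0.0, 1.0);

    float shade = 1.0 - 0.5 * outside;      // 1.0 inside, 0.5 outside
    gl_FragColor = color * vec4(shade, shade, shade, 1.0);
}
```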

Mukund

10-20-2011, 07:27 AM

Hmm, I came up with this one:

float val = step(dis, 5.0);
gl_FragColor = mix( color * vec4(0.5, 0.5, 0.5, 1.0),
                    color * vec4(1.0, 1.0, 1.0, 1.0),
                    val);

Is this better? step would internally have to do a comparison, right? So is a loss of cycles inevitable, or can it be avoided somehow?

aqnuep

10-20-2011, 07:28 AM

FYI, Groovounet just posted on twitter about an ALU technique that can be used for conditional elimination: http://developer.amd.com/documentation/articles/pages/New-Round-to-Even-Technique.aspx

Groovounet

10-20-2011, 07:41 AM

Ahah: many people seem to have missed that the built-in function mix also has a version that takes a bool type.

genType mix(genType, genType, genBType);

So this is enough:

gl_FragColor = color * mix(
    vec4(0.5, 0.5, 0.5, 1.0),
    vec4(1.0, 1.0, 1.0, 1.0),
    bvec4(dis <= 5.0)); // the selector has to be a bvec4 to match the vec4 operands

Mukund

10-20-2011, 08:23 AM

@aqnuep: Thanks!

@Groovounet: Well, GLSL version 1.20 (the one I'm using) doesn't seem to support it. But yeah, I didn't know we could use mix that way from 4.0 onwards.

Thanks!

Groovounet

10-20-2011, 09:15 AM

Ahhh, I didn't realize that you were using GLSL 1.20.

Ilian Dinev

10-20-2011, 12:21 PM

But then again, GPUs generally have predicated execution :)

So, fastest version should be:

varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;

void main(void)
{
    vec2 tmp = fragPos.xy - lightPos.xy;
    float disSq = dot(tmp,tmp);
    float col = 0.5;
    if (disSq <= 5.0 * 5.0) col = 1.0;
    gl_FragColor = vec4(color.xyz * vec3(col), 1.0);
}

Ilian Dinev

10-20-2011, 12:39 PM

And if some GPUs can do a single-cycle compare against 0.0 and a conditional move, then:

varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;

// for scalar-ISA gpus
void main(void)
{
    vec2 tmp = fragPos.xy - lightPos.xy; // 2 fsub = 2 cycles
    float col = 0.5; // mov, 1 or 0 cycles, see below
    float disSq = tmp.x*tmp.x + (tmp.y*tmp.y - 25.0); // fmad, fmad = 2 cycles.
    if (disSq <= 0.0) col = 1.0; // 1 cycle . Some gpus might merge-in the above "col = 0.5" execution in here.
    gl_FragColor = vec4(color.xyz * vec3(col), 1.0); // 3 fmul, 1 mov = 4 cycles.
}

Things get funny when some GPUs can do an fmul and an fmad together in a single cycle, though :)

V-man

10-21-2011, 04:54 AM

"gl_FragColor = vec4(color.xyz * vec3(col), 1.0); // 3 fmul, 1 mov = 4 cycles. "

That should be 1 MUL and 1 MOV

gl_FragColor.xyz = color.xyz * col; // vec3 * float; a scalar cannot be swizzled as col.xxx in GLSL 1.20
gl_FragColor.w = 1.0;

and modern hardware supports multiple direct writes to gl_FragColor.

Ilian Dinev

10-21-2011, 12:10 PM

// for scalar-ISA gpus

void main(void)

:)

yuriks

10-28-2011, 08:47 AM

No one seems to have mentioned it, but op can simply square 5 and compare with 25 instead, doing away with the sqrt entirely. :P

sqrt[-1]

10-28-2011, 06:42 PM

>>No one seems to have mentioned it, but op can simply square 5 and compare with 25 instead, doing away with the sqrt entirely. :P

I think Ilian Dinev's code above does this.
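Putting the thread's suggestions together, a GLSL 1.20 version with no sqrt and no if might look roughly like the sketch below. It compares the squared distance against 25.0 and uses step() and the float version of mix() to pick the brightness. The names d, distSq, inside and shade are made up for this example, the original alpha is kept as color.a, and whether this is actually faster than the predicated if above is not something the thread measured.

```glsl
varying vec4 color;
varying vec3 fragPos;
uniform vec4 lightPos;

void main(void)
{
    vec2 d = fragPos.xy - lightPos.xy;
    float distSq = dot(d, d);               // squared distance, no sqrt needed

    // step(edge, x) returns 1.0 when x >= edge, so this is 1.0 when distSq <= 25.0
    float inside = step(distSq, 25.0);

    float shade = mix(0.5, 1.0, inside);    // 0.5 outside the circle, 1.0 inside
    gl_FragColor = vec4(color.rgb * shade, color.a);
}
```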
