Optimize this: (X<0) ? mix( a,b,X ) : mix( a,b,X*Y )

result = (X<0) ? mix( a,b,X ) : mix( a,b,X*Y );

I'm sure there's something simple I'm missing that could speed this up.

ta zed

result = mix(a, b, mix(X, X*Y, step(X, 0)));

should work, although I have no idea if it’s faster…

Sometimes optimizing one-liners in isolation doesn’t buy you as much as you might think. It might be better to look at the output of one of the vendor tools to get a better view of the overall picture. Though a little shader massaging can go a long way, it’s tough to drill this down without some careful testing and some real optimized output to analyze. Least that’s been my experience…

What about this, to minimize the duplicated code when both branches are evaluated:

Z = (X<0) ? X : X*Y ;
result = mix( a,b,Z );

Cheers guys.

@Zbuffer Yes, yours is ~10% quicker, and simple really.
@Lord crc Yours is also quicker but didn't seem to give the same results.

@modus True. I was thinking along the lines of some trick for avoiding the branch altogether (as we know they’re not good). Looks like no one spotted it (perhaps it doesn't exist).

I really should stop posting before going to bed…

It should be (building on ZbufferR’s version):

Z = mix(X, X*Y, step(0, X));
result = mix(a, b, Z);

I.e. I mixed up the order of the parameters for step(): the actual signature is step(edge, x). For some reason I find it more logical that it’s step(x, edge). Guess I’m weird that way :slight_smile:
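A small sketch of why the argument order matters here. GLSL defines step(edge, x) as 0.0 when x < edge and 1.0 otherwise, so the edge comes first. The helper name `selectFactor` is made up for illustration; X, Y, a and b are assumed to be floats as in the original question.

```glsl
// step(edge, x) returns 0.0 if x < edge, otherwise 1.0 (edge is the FIRST argument).
// selectFactor is a hypothetical helper name, not something from the thread.
float selectFactor(float X, float Y)
{
    // s == 0.0 when X < 0.0, s == 1.0 when X >= 0.0
    float s = step(0.0, X);

    // mix(X, X*Y, s) == X*(1.0 - s) + (X*Y)*s
    //   X <  0.0 -> X      (same as the true branch of the ternary)
    //   X >= 0.0 -> X*Y    (same as the false branch)
    return mix(X, X * Y, s);
}

// Usage, with a, b, X, Y assumed to be floats as in the question:
// float result = mix(a, b, selectFactor(X, Y));
```

With the arguments swapped, step(X, 0.0) gives 1.0 for X <= 0.0, which selects X*Y for negative X. That would explain results that differ from the ternary. One caveat: unlike the ternary, the mix() form still multiplies the unselected value by 0.0, so an infinite or NaN X*Y could leak into the result.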

I’m weird that way too.

I agree with crc and modus on this one

Probably the spec writer was the weird one after all :slight_smile:

One note about this optimization. I am not sure about the validity of the results below, so feel free to criticize/correct me:

I made a simple frag shader with the requested computations (w/ ZBuffer’s opt), and ran it through cgc with the profile gp4fp.


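#version 120

// Inputs from the original expression, supplied as uniforms
uniform float X;
uniform float Y;
uniform float a;
uniform float b;

void main()
{
    // Select the blend factor first, then call mix() once
    float Z = (X < 0.0) ? X : X*Y;
    float result = mix(a, b, Z);
    gl_FragColor = vec4(result);
}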

It gave 9 instructions.

I replaced the mix() function with a homemade one, which resulted in … 8 instructions.


float mixit(float v1, float v2, float v3)
{
    return v1 - v3*v1 + v3*v2;
}

OK, I know that the test is for Nvidia and one specific profile (I was too bored to do it for 100 different configs), but … does it mean that using the ready-made functions costs more?? (see also reflect etc.)

PS. I hope this is not considered hijacking since, hey, you may be able to reduce your computations by one op! :smiley:

EDIT : Whoops! corrected a typo.

Certainly what you want is v1 - v3*v1 + v3*v2.
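For comparison, here are three algebraically equivalent ways to write the same blend, as a rough sketch. The function names are hypothetical. Which one compiles to the fewest instructions depends on the compiler and profile, so no instruction counts are claimed here.

```glsl
// All three return the same value in exact arithmetic; floating-point rounding
// may differ slightly between them. Function names are hypothetical.

// Form given in the GLSL spec for mix(x, y, a): x*(1-a) + y*a
float mixSpec(float v1, float v2, float v3)
{
    return v1 * (1.0 - v3) + v2 * v3;
}

// Expanded form from the post above: v1 - v3*v1 + v3*v2
float mixExpanded(float v1, float v2, float v3)
{
    return v1 - v3 * v1 + v3 * v2;
}

// Factored form: v1 + v3*(v2 - v1)
// (a subtract followed by a multiply-add, on hardware that has one)
float mixFactored(float v1, float v2, float v3)
{
    return v1 + v3 * (v2 - v1);
}
```

Each would need to be run through the same tool and profile, and ideally timed, before concluding anything about cost.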

Yes, I see what you mean about the step() function being backwards.
mix() is as well: mix(x, y, a).
I'd think about that as lerping from x to y,
not as in GLSL: x*(1−a) + y*a.

Also (from the fixed pipeline) stencil is backwards as well.


"It gave 9 instructions. I replaced the mix() function with a homemade one, which resulted in … 8 instructions."

Also, I don't know if counting instructions is the best way of measuring; FPS is a better indication. I've seen longer, obfuscated GPU code outperform shorter, more logical code. Something as simple as the order of execution can make a big difference.
This is part of the reason for my other recent post in this forum: what program are people using to benchmark shaders?
It would be nice to have an app that lets you run 2 shaders side by side so you can compare the execution rate.

Not sure what you mean there. mix() is lerping from x to y: if a is 0 you get x, if it’s 1 you get y.
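Spelling out the endpoints from the spec formula, as a quick check of that statement:

```latex
\begin{aligned}
\operatorname{mix}(x, y, a) &= x\,(1 - a) + y\,a \\
\operatorname{mix}(x, y, 0) &= x \\
\operatorname{mix}(x, y, 1) &= y \\
\operatorname{mix}(x, y, \tfrac{1}{2}) &= \tfrac{1}{2}\,(x + y)
\end{aligned}
```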

How can stencil be backwards?

I think the stepping and clamping functions are probably just being true to their RenderMan roots, so there likely is some reason to this parameter ordering madness after all (weirdness is still in the eye of this beholder).
