result = (X<0) ? mix( a,b,X ) : mix( a,b,X*Y );
I'm sure there's something simple I'm missing that could speed this up.
Ta, zed
result = mix(a, b, mix(X, X*Y, step(X, 0)));
should work, although I have no idea if it's faster...
Sometimes optimizing one-liners in isolation doesn't buy you as much as you might think. It might be better to look at the output of one of the vendor tools to get a better view of the overall picture. Though a little shader massaging can go a long way, it's tough to drill down into this without careful testing and some real optimized output to analyze. At least that's been my experience...
What about this, to minimize code when both branches are evaluated:
Z = (X<0) ? X : X*Y;
result = mix( a,b,Z );
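Why this rewrite is safe: the branch now only picks the blend factor, and mix() is evaluated once on whichever value was picked, so the result matches the original two-call form. A minimal sketch in Python, assuming a hand-written `mix()` that follows the GLSL spec definition (x*(1-a) + y*a); `original` and `hoisted` are hypothetical helper names for illustration, and this says nothing about GPU instruction counts.

```python
def mix(x, y, a):
    # Emulates GLSL mix() for scalars: x * (1 - a) + y * a
    return x * (1.0 - a) + y * a


def original(a, b, X, Y):
    # Original form: the branch chooses between two full mix() calls.
    return mix(a, b, X) if X < 0.0 else mix(a, b, X * Y)


def hoisted(a, b, X, Y):
    # Hoisted form: the branch only chooses the blend factor Z,
    # so mix() runs exactly once.
    Z = X if X < 0.0 else X * Y
    return mix(a, b, Z)


if __name__ == "__main__":
    for X in (-1.0, -0.25, 0.0, 0.5, 2.0):
        print(X, original(0.2, 0.9, X, 0.75), hoisted(0.2, 0.9, X, 0.75))
```

Both functions end up calling mix() with the same arguments, so they agree exactly for every input.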
Cheers, guys.
@Zbuffer Yes, yours is ~10% quicker, and simple really.
@Lord crc Yours is also quicker, but it didn't seem to give the same results.
@modus True. I was thinking along the lines of some trick for avoiding the branch altogether (as we know branches are not good); looks like no one spotted it (perhaps it doesn't exist).
I really should stop posting before going to bed...
It should be (building on ZbufferR's version):
Z = mix(X, X*Y, step(0, X));
result = mix(a, b, Z);
I.e., I mixed up the order of the parameters for step(). For some reason I find step(x, edge) more logical. Guess I'm weird that way.
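To see why the argument order matters: GLSL's step(edge, x) returns 0.0 when x < edge and 1.0 otherwise. So step(0, X) is 0 for negative X (mix picks X) and 1 otherwise (mix picks X*Y), which reproduces the ternary, while step(X, 0) is 1 for X <= 0 and picks the wrong operand. A small Python sketch, assuming hand-written emulations of step() and mix() per the GLSL spec; the helper names (`reference_Z`, `branchless_Z`, `swapped_Z`) are hypothetical.

```python
def step(edge, x):
    # Emulates GLSL step(edge, x): 0.0 if x < edge, else 1.0
    return 0.0 if x < edge else 1.0


def mix(x, y, a):
    # Emulates GLSL mix() for scalars: x * (1 - a) + y * a
    return x * (1.0 - a) + y * a


def reference_Z(X, Y):
    # The branching version: Z = (X < 0) ? X : X*Y
    return X if X < 0.0 else X * Y


def branchless_Z(X, Y):
    # Corrected order: step(0, X) is 0 for X < 0 (selects X)
    # and 1 for X >= 0 (selects X*Y).
    return mix(X, X * Y, step(0.0, X))


def swapped_Z(X, Y):
    # Swapped order: step(X, 0) is 1 for X <= 0, so it selects X*Y
    # for negative X and X for positive X, the opposite of the ternary.
    return mix(X, X * Y, step(X, 0.0))


if __name__ == "__main__":
    for X in (-1.0, -0.25, 0.0, 0.5, 2.0):
        print(X, reference_Z(X, 0.75), branchless_Z(X, 0.75), swapped_Z(X, 0.75))
```

With a blend factor of exactly 0.0 or 1.0, mix() returns one operand unchanged (for finite inputs), so the corrected version matches the ternary exactly; the swapped one only agrees at X = 0.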
I'm weird that way too.
I agree with crc and modus on this one
Probably the spec writer was the weird one after all
One note about this optimization. I am not sure about the validity of my results below, so feel free to criticize/correct me:
I made a simple frag shader with the requested computations (w/ ZBuffer's opt) and ran it through cgc with the profile gp4fp.
Code:
#version 120
uniform float X;
uniform float Y;
uniform float a;
uniform float b;
void main()
{
    // Condition on X, matching the original (X<0) test
    float Z = (X < 0.0) ? X : X*Y;
    float result = mix(a, b, Z);
    gl_FragColor = vec4(result);
}
It gave 9 instructions.
I replaced the mix() function with a homemade one, which resulted in .. 8 instructions.
Code:
float mixit(float v1, float v2, float v3)
{
    // mix(v1, v2, v3) = v1*(1 - v3) + v2*v3, expanded by hand
    return v1 - v3*v1 + v3*v2;
}
OK, I know that the test is for Nvidia and a specific profile (too bored to do it for 100 different configs), but does this mean that using ready-made functions costs more? (See also reflect() etc.)
PS. I hope this is not considered hijacking since, hey, you may be able to reduce your computations by one op!
EDIT: Whoops! Corrected a typo.
Certainly what you want is v1 - v3*v1 + v3*v2.
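A quick check that the hand-written version is just mix() with the product expanded: x*(1-a) + y*a = x - a*x + a*y. The Python sketch below assumes the GLSL spec definition of mix(); `mix_spec`, `mixit` and `mix_lerp` are illustrative names, and `mix_lerp` (x + a*(y - x)) is one more algebraically equal arrangement added for comparison. None of this says what instructions cgc or any other compiler will emit; the three forms can also differ in the last rounding bits.

```python
import math


def mix_spec(x, y, a):
    # GLSL spec definition of mix(): x * (1 - a) + y * a
    return x * (1.0 - a) + y * a


def mixit(v1, v2, v3):
    # The hand-written version from the thread: v1 - v3*v1 + v3*v2
    return v1 - v3 * v1 + v3 * v2


def mix_lerp(x, y, a):
    # Another algebraically equal arrangement: x + a * (y - x)
    return x + a * (y - x)


if __name__ == "__main__":
    for x, y, a in [(0.2, 0.9, 0.0), (0.2, 0.9, 0.5), (0.2, 0.9, 1.0), (-3.0, 5.0, 0.3)]:
        # Equal up to rounding, not necessarily bit-for-bit
        print(mix_spec(x, y, a), mixit(x, y, a), mix_lerp(x, y, a),
              math.isclose(mixit(x, y, a), mix_spec(x, y, a), abs_tol=1e-12))
```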