ARB FP - Strange multiply issue. (driver bug?)

After a normalization cube map lookup, I unpack the value with:

MAD result, texValue, 2, -1;
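
For reference, this is the arithmetic the MAD is expected to perform per component. It is only a minimal Python sketch, assuming 8-bit cube map texels that the hardware normalizes to [0, 1]; the function name `unpack` is made up for illustration.

```python
def unpack(texel_byte):
    """Emulate MAD result, texValue, 2, -1 for one 8-bit texel component."""
    tex_value = texel_byte / 255.0   # assumed: hardware maps bytes to [0, 1]
    return tex_value * 2.0 - 1.0     # MAD computes a * b + c


# Print the unpacked value for a few sample texel bytes.
for byte in (0, 64, 128, 255):
    print(byte, unpack(byte))
```

A card that ignores the instruction would hand back the raw [0, 1] value instead, so no component could ever come out negative.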

However, on the latest drivers the results I get are wrong:
Geforce FX 5600 - Can behave as if the instruction is ignored.
ATI 9800 - Negative values are messed up.
Geforce 6800 - No problems.
Intel Integrated - No problems.

However, if I disable the OPTION ARB_precision_hint_fastest; on Nvidia, I get expected results. (I have not tried this on ATI, but I assume ATI mostly ignores this flag.)

If I change the code to be:

MAD result, texValue, 2.0001, -1;

I get expected results on all cards.
(If I do the same operation on something that does not come from a texture, I also seem to get expected results.)
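
As a sanity check that the 2.0001 workaround does not change the output in any visible way, here is a rough Python sketch comparing both scale factors over all 8-bit texel values. It assumes 8-bit texels normalized to [0, 1], and the names are illustrative only.

```python
def mad(a, b, c):
    """a * b + c, the operation MAD performs per component."""
    return a * b + c


# Assumed: 8-bit texels normalized to [0, 1].
tex_values = [byte / 255.0 for byte in range(256)]

# Largest difference between the workaround scale and the exact scale.
max_deviation = max(
    abs(mad(v, 2.0001, -1.0) - mad(v, 2.0, -1.0)) for v in tex_values
)

# Spacing between adjacent 8-bit values after unpacking to [-1, 1].
quantization_step = 2.0 / 255.0

print(max_deviation, quantization_step)
```

The deviation grows with the texel value and peaks at about 0.0001 for a texel of 1.0, far below the roughly 0.0078 spacing between adjacent unpacked 8-bit values.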

I am assuming Nvidia/ATI have an “optimization” that recognizes the MAD x, x, 2, -1; type of instruction and special-cases it (e.g. for unpacking bump maps, which seems to work fine). However, from a cube map the results are just wrong.

It could be that I am doing something wrong, as it is strange that both Nvidia and ATI have a similar bug. Just wondering if anyone else has the same problem?

Driver versions : ATI - Cat 4.8 (will try 4.10 soon)
Nvidia - 66.81

I’m just taking a stab in the dark here but what if you used:

MAD result, texValue, 2.0, -1;

The R300 still has many of the old ps1.4 style modifiers. This means that it can pack operations like that into a 2x_bias modifier, so it is expected that these values are treated specially. It shouldn’t behave differently than with 2.0001, however, except for being faster. We’ve had some issues like this before. It may be the same bug that affected my old VolumetricLightingII demo, which broke down recently. Changing a 1.2 to 1.0 fixed it that time.
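
To illustrate why folding the MAD into a modifier should be harmless when done correctly, here is a small Python sketch. It assumes the modifier computes 2 * (x - 0.5), as the ps1.4 _bx2 source modifier does; whether the driver maps the MAD exactly this way is an assumption, and the function names are made up.

```python
import math


def mad_unpack(x):
    """Explicit MAD form: x * 2 - 1."""
    return x * 2.0 - 1.0


def bias_scale_unpack(x):
    """Assumed modifier form: subtract 0.5, then scale by 2."""
    return 2.0 * (x - 0.5)


# Check agreement over all 8-bit texel values normalized to [0, 1].
tex_values = [byte / 255.0 for byte in range(256)]
forms_agree = all(
    math.isclose(mad_unpack(v), bias_scale_unpack(v), abs_tol=1e-12)
    for v in tex_values
)
print(forms_agree)
```

Both forms are algebraically identical, so any visible difference between 2 and 2.0001 points at how the special case is applied rather than at the math itself.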

rgpc:

Yeah, I have tried 2.0 and 2.000, and I even tried a constant parameter with 2.0 in it, all with the same result. If I pass in 2.0 as a uniform and use it, i.e.

MAD result, texValue, dummy2.x, -1;

I get expected results. (It seems the driver has some advanced logic that tries to optimize all multiplies by a literal 2.)

OK, I just tried Cat 4.10 and there is no change. If I have some spare time I’ll put together a demo and send it to Nvidia/ATI.