Is MUL+ADD faster than MAD on NV40?

The following very simple Cg code:

float4 fmain(varying in float3 v) : COLOR
{
    float3 vec = v * 2.0 - 1.0;
    return vec.xyzz;
}

when compiled with -profile fp40, produces:

PARAM c[1] = { { 2, 1 } };
TEMP R0;
TEMP RC;
TEMP HC;
OUTPUT oCol = result.color;
MULR R0.xyz, fragment.texcoord[0], c[0].x;
ADDR oCol, R0.xyzz, -c[0].y;
END

but when compiled with -profile fp30, it produces:


MADR o[COLR], f[TEX0].xyzz, {2, -1}.x, {2, -1}.y;
END
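
As a sanity check that the two listings compute the same value (so the only open question is speed), here is a minimal Python sketch that emulates each instruction sequence on the CPU. The function names are hypothetical and exist only for this illustration; it says nothing about how NV40 actually schedules the instructions.

```python
# Hypothetical CPU emulation of the two compiled listings (illustration only).

def swizzle_xyzz(v):
    """Expand a 3-component vector to 4 components with the .xyzz swizzle."""
    x, y, z = v
    return (x, y, z, z)

def fp40_mul_add(texcoord, c=(2.0, 1.0)):
    """Emulate the fp40 output: MULR followed by ADDR."""
    # MULR R0.xyz, fragment.texcoord[0], c[0].x;
    r0 = tuple(t * c[0] for t in texcoord)
    # ADDR oCol, R0.xyzz, -c[0].y;
    return tuple(r - c[1] for r in swizzle_xyzz(r0))

def fp30_mad(texcoord, c=(2.0, -1.0)):
    """Emulate the fp30 output: a single MADR."""
    # MADR o[COLR], f[TEX0].xyzz, {2, -1}.x, {2, -1}.y;
    return tuple(t * c[0] + c[1] for t in swizzle_xyzz(texcoord))

v = (0.0, 0.5, 1.0)
print(fp40_mul_add(v))  # (-1.0, 0.0, 1.0, 1.0)
print(fp30_mad(v))      # (-1.0, 0.0, 1.0, 1.0)
```

Both paths give the same result for this input, so the fp40 output looks like a pure instruction-selection choice by the compiler rather than a change in what is computed.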

Is the MUL+ADD sequence really better optimized for NV40 than the single MAD?