vertex/fragment program performance

Hi guys!
Could you please tell me whether there is a performance difference on NV30, in VP2.0 and FP1.0,
if I use, for example,
MUL R1.x,R0.x,c[1].x;
and
MUL R1.xxxx,R0.xxxx,c[1].xxxx;
?

If you’re asking about the performance difference between a scalar operation and a vector operation, there usually isn’t one.

However, the fragment programs of an R300-based card can co-issue many scalar operations with 3-vector operations, so that they execute simultaneously. It has been speculated that NV30-based cards can do this as well, though there is no proof of it yet.

The difference is that the second line is infinitely slower: it wouldn’t compile, because .xxxx is not a valid optional mask.
In a swizzle postfix, on the other hand, .x is just an abbreviation for .xxxx.
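To illustrate the swizzle half of that, here is a small Python model of how a source swizzle suffix might be expanded. It is a sketch under assumptions, not NVIDIA’s compiler: the helper names `expand_swizzle` and `read_source` are hypothetical, and the "one or four components" rule is my reading of the NV program grammars. A single component is replicated to all four lanes, so reading `R0.x` gives the same vector as reading `R0.xxxx`.

```python
# Hypothetical model of NV-style source swizzles (not a real compiler).
# Assumption: a swizzle names either one component or exactly four.
COMPONENTS = "xyzw"

def expand_swizzle(suffix: str) -> str:
    """Expand a source swizzle suffix to its four-component form."""
    s = suffix.lstrip(".")
    if s == "":
        return COMPONENTS          # no suffix means .xyzw
    if any(c not in COMPONENTS for c in s):
        raise ValueError(f"bad component in swizzle {suffix!r}")
    if len(s) == 1:
        return s * 4               # .x is shorthand for .xxxx
    if len(s) == 4:
        return s
    raise ValueError(f"swizzle must have 1 or 4 components: {suffix!r}")

def read_source(reg, suffix):
    """Return the 4-vector an instruction sees when reading reg with suffix."""
    return tuple(reg[COMPONENTS.index(c)] for c in expand_swizzle(suffix))

r0 = (2.0, 3.0, 5.0, 7.0)
print(read_source(r0, ".x"))     # (2.0, 2.0, 2.0, 2.0)
print(read_source(r0, ".xxxx"))  # (2.0, 2.0, 2.0, 2.0)
```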

If you’re interested in the differences between
MUL R1.x,R0.x,c[1].x;
and
MUL R1,R0.x,c[1].x;
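there are some. For example, the first form writes only R1.x and leaves R1.y, R1.z and R1.w free for other values, while the second overwrites all four components, so you have three fewer components left. That can make a difference in long programs.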
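The write-mask behaviour can be sketched in Python as well. This is an illustrative model, not a description of the hardware: `mul` is a hypothetical helper that takes already-swizzled 4-vectors and writes only the components named in the destination mask, with an empty mask meaning all of `.xyzw`.

```python
# Hypothetical model of a masked MUL (illustrative, not real hardware).
COMPONENTS = "xyzw"

def mul(dst, mask, a, b):
    """Return the new value of dst after MUL dst<mask>, a, b.

    a and b are already-swizzled 4-vectors. Only components named in
    mask are written; the rest of dst keeps its previous contents.
    """
    mask = mask.lstrip(".") or COMPONENTS  # no mask means write .xyzw
    return tuple(
        a[i] * b[i] if c in mask else dst[i]
        for i, c in enumerate(COMPONENTS)
    )

r1 = (0.0, 10.0, 20.0, 30.0)  # R1.yzw already hold other values
r0_x = (2.0,) * 4             # R0.x, i.e. R0.xxxx
c1_x = (4.0,) * 4             # c[1].x

print(mul(r1, ".x", r0_x, c1_x))  # (8.0, 10.0, 20.0, 30.0)
print(mul(r1, "", r0_x, c1_x))    # (8.0, 8.0, 8.0, 8.0)
```

With `.x` the other three components of R1 survive and can keep holding other values; without a mask they are overwritten with copies of the result.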

Aside from that, there’s not much to add to the previous post.

[This message has been edited by Relic (edited 10-08-2003).]

Originally posted by Relic:
The difference is that the second line is infinitely slower: it wouldn’t compile, because .xxxx is not a valid optional mask.

Do you mean as an input or as an output? My guess is that it is perfectly valid as an input and not valid as an output.
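That guess can be expressed as two small checks. This is a sketch based on my reading of the NV program grammars, not a real assembler, and `is_valid_swizzle` / `is_valid_write_mask` are hypothetical names. The assumption is that a source swizzle names one component or exactly four (repeats allowed), while a destination mask lists components in xyzw order, each at most once.

```python
# Hypothetical input-vs-output check (assumed grammar, not a real assembler).
COMPONENTS = "xyzw"

def is_valid_swizzle(suffix: str) -> bool:
    """Source operand: none, one component, or exactly four (repeats allowed)."""
    s = suffix.lstrip(".")
    return len(s) in (0, 1, 4) and all(c in COMPONENTS for c in s)

def is_valid_write_mask(suffix: str) -> bool:
    """Destination operand: components in xyzw order, each at most once."""
    s = suffix.lstrip(".")
    if any(c not in COMPONENTS for c in s):
        return False
    indices = [COMPONENTS.index(c) for c in s]
    # Strictly increasing indices mean no repeats and canonical order.
    return all(a < b for a, b in zip(indices, indices[1:]))

for suffix in (".x", ".xxxx", ".xz", ".zx"):
    print(suffix, is_valid_swizzle(suffix), is_valid_write_mask(suffix))
# .x True True
# .xxxx True False
# .xz False True
# .zx False False
```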