vertex/fragment program performance

Roman_Grigoriev · October 7, 2003, 10:33pm

Hi guys!
Could you please tell me whether there is a performance difference on NV30 in vp2.0 and fp1.0
if I use, for example,
MUL R1.x,R0.x,c[1].x;
and
MUL R1.xxxx,R0.xxxx,c[1].xxxx;
?

Korval · October 7, 2003, 10:52pm

If you’re asking about the performance difference between a scalar operation and a vector operation, there usually isn’t one.

However, an R300-based card’s fragment programs can co-issue many scalar operations with 3-vector operations, such that they happen simultaneously. It has been speculated that an NV30-based card can do so as well, though there is no proof of this as of yet.

Relic · October 8, 2003, 3:11am

The difference is that the second line is infinitely slower. It wouldn’t compile because .xxxx is not a valid optional mask.
.x is just an abbreviation for .xxxx in a swizzle postfix.

If you’re interested in the diffs of
MUL R1.x,R0.x,c[1].x;
and
MUL R1,R0.x,c[1].x;
there are some. For example, the second form leaves you three fewer components of R1 for other values. It can make a difference in long programs.
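
To make the swizzle versus write-mask distinction concrete, here is a toy Python model of the semantics as described in this post. It is only a sketch, not a real assembler or driver API: the helper names `swizzle`, `is_valid_mask` and `mul` are made up for illustration, and the mask rule assumed here is that a destination mask is a subset of x, y, z, w in that order with no repeats.

```python
# Toy model of register semantics (hypothetical helpers, not a real API).
COMPONENTS = "xyzw"

def swizzle(reg, suffix=""):
    """Read a source operand. A one-letter suffix is replicated to all lanes."""
    if suffix == "":
        suffix = COMPONENTS            # no suffix: read x, y, z, w unchanged
    elif len(suffix) == 1:
        suffix = suffix * 4            # .x on a source is shorthand for .xxxx
    if len(suffix) != 4 or any(c not in COMPONENTS for c in suffix):
        raise ValueError("bad swizzle: ." + suffix)
    return [reg[COMPONENTS.index(c)] for c in suffix]

def is_valid_mask(mask):
    """Assumed rule: a mask lists distinct components in x, y, z, w order."""
    return mask == "".join(c for c in COMPONENTS if c in mask)

def mul(dst, mask, a, b):
    """Component-wise multiply. Only the masked lanes of dst are overwritten."""
    if not is_valid_mask(mask):
        raise ValueError("bad write mask: ." + mask)
    out = list(dst)
    for i, c in enumerate(COMPONENTS):
        if mask == "" or c in mask:    # empty mask means write all four lanes
            out[i] = a[i] * b[i]
    return out

if __name__ == "__main__":
    R0 = [2.0, 5.0, 7.0, 9.0]
    c1 = [3.0, 1.0, 1.0, 1.0]
    R1 = [-1.0, -1.0, -1.0, -1.0]      # pretend these hold other live values
    # MUL R1.x, R0.x, c[1].x;  only R1.x changes, y/z/w stay usable
    print(mul(R1, "x", swizzle(R0, "x"), swizzle(c1, "x")))
    # MUL R1, R0.x, c[1].x;    all four components of R1 are overwritten
    print(mul(R1, "", swizzle(R0, "x"), swizzle(c1, "x")))
```

In this model, passing "xxxx" as the mask raises an error, while "xxxx" as a source swizzle is accepted and behaves the same as "x".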

Aside from that, there’s not much to add to the previous post.

[This message has been edited by Relic (edited 10-08-2003).]

vincoof · October 8, 2003, 9:51am

Originally posted by Relic:
The difference is that the second line is infinitely slower. It wouldn’t compile because .xxxx is not a valid optional mask.

Do you mean as an input or as an output? My guess is that it is perfectly valid as an input and not valid as an output.