I was looking at the line-by-line parser option.
The (small) problem with your approach is:
What would happen if the second instruction required the alpha combiner for something else (for example, a co-issued instruction)?
You mean if the second instruction were the one using the alpha component?
Answer: it would use the blue component, i.e. r0.b.
And if the alpha had to be mapped to an rgb instruction (in register combiners you can use an alpha component as an input of the rgb stage), you would use r0.rgb, whose components all hold the same dot3 result.
[This message has been edited by vincoof (edited 01-29-2003).]
Sorry I wasn't clear. What I mean is, what if the second instruction did NOT use the alpha component we want to save, but did use the alpha combiner for something else. For example:
dp3 r0.rgba, t0, t1
mul r0.rgb, r0, t0
+add t0.a, c0.a, c0.a
add r0.a, r0, t1
There is no way to save the dot product result to the alpha channel in the second combiner, and no way to access that data in the third combiner as it has been overwritten.
It is also no good to do the second instruction in the third combiner and the third in the fourth, and use the second combiner to do the "mov r0.a, r0.b". This is because we then cannot guarantee that an 8-instruction program will fit in 8 combiners.
I realize this would be a very rare situation, but it can occur.
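To make the failure concrete, here is a rough Python sketch of the four-instruction example above, mapped one instruction (plus its co-issue) per combiner. The register model (dicts with an 'rgb' tuple and an 'a' float), the function names and the sample values are all made up for illustration; it is not how any real driver or parser works.

```python
def dot3(u, v):
    """Three-component dot product."""
    return sum(x * y for x, y in zip(u, v))


def run_line_by_line(t0, t1, c0, old_r0_alpha=0.0):
    """Hypothetical model: one instruction (plus its co-issue) per combiner.

    Returns (dot product, value that ends up in r0.a).
    """
    r0 = {"rgb": (0.0, 0.0, 0.0), "a": old_r0_alpha}
    t0 = dict(t0)  # copy so the caller's register is not modified

    # Combiner 1: dp3 r0.rgba, t0, t1
    # The rgb portion computes the dot product; in this model it cannot
    # reach r0.a, so alpha keeps its previous content.
    d = dot3(t0["rgb"], t1["rgb"])
    r0["rgb"] = (d, d, d)

    # Combiner 2: mul r0.rgb, r0, t0  /  +add t0.a, c0.a, c0.a
    # rgb overwrites the dot product; alpha is busy with the co-issued add.
    r0["rgb"] = tuple(x * y for x, y in zip(r0["rgb"], t0["rgb"]))
    t0["a"] = c0["a"] + c0["a"]

    # Combiner 3: add r0.a, r0, t1
    # r0.a never received the dot product, so the sum is wrong.
    r0["a"] = r0["a"] + t1["a"]
    return d, r0["a"]


t0 = {"rgb": (0.5, 0.5, 0.0), "a": 0.0}
t1 = {"rgb": (0.5, 0.5, 0.0), "a": 0.25}
c0 = {"rgb": (0.0, 0.0, 0.0), "a": 0.125}
d, got = run_line_by_line(t0, t1, c0)
print("wanted r0.a =", d + t1["a"], "got r0.a =", got)
```

The third combiner reads r0.a, but the only copies of the dot product were in r0.rgb and the mul has already replaced them.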
You mean that:
mul r0.rgb, r0, t0
add t0.a, c0.a, c0.a
should fit in a single combiner (mul for the rgb stage and add for the alpha stage)?
Well, according to the "line-by-line" concept, I would put them in 2 different combiners.
But if you really want to allow the program to "group" instructions, then obviously dot3_rgba is a real problem as you described above.
Though there is still the other solution (that I don't like, but oh well): with your shader language you could specify that "all instructions use 1 combiner except the dot3_rgba instruction, which uses 2 combiners".
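For what it's worth, that costing rule is easy to sketch. The snippet below is only an illustration in Python: `combiners_needed` and `MAX_COMBINERS` are hypothetical names, the parsing is deliberately naive, and it assumes that a destination without a write mask writes all four components and that a line starting with '+' shares the previous instruction's combiner.

```python
MAX_COMBINERS = 8  # general combiner stages assumed available


def combiners_needed(program):
    """Count combiners under the rule: every instruction costs one combiner,
    except a dp3 that writes alpha, which costs two. Lines starting with '+'
    are co-issued and share the previous instruction's combiner."""
    total = 0
    for line in program:
        line = line.strip()
        if not line or line.startswith("+"):
            continue
        op, dest = line.replace(",", " ").split()[:2]
        # No write mask means all four components are written (assumption).
        mask = dest.partition(".")[2] or "rgba"
        total += 2 if (op == "dp3" and "a" in mask) else 1
    return total


program = [
    "dp3 r0.rgba, t0, t1",
    "mul r0.rgb, r0, t0",
    "+add t0.a, c0.a, c0.a",
    "add r0.a, r0, t1",
]
print(combiners_needed(program), "of", MAX_COMBINERS)
```

The drawback shows up as soon as a program has eight instructions and one of them is a dp3 writing alpha: the count comes out at nine, which is over the limit.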
[EDIT] I've read pixel shader terminology again and now I see exactly your problem. Even though this problem is not likely to appear easily, it *can* happen. Please let me apologize for wasting your time because of my bad pixel shader knowledge.
Originally posted by vincoof:
Please let me apologize for wasting your time.
That's OK. The discussion has helped me to think further about the problem, even though I do not see a solution.
The thing is, sometimes you can play with the capabilities of register combiners. For instance, take the co-issued instruction you presented:
dp3 r0.rgba, t0, t1
mul r0.rgb, r0, t0
+add t0.a, c0.a, c0.a
add r0.a, r0, t1
There is a solution:
stage 1 :
- rgb : compute dot product
- alpha : discard
stage 2 :
- rgb : compute mul
- alpha : set c0.a in A, set two (e.g. from a constant) in B, set r0.b in C and set one in D, then output "AB" to t0.a and "CD" to r0.a
stage 3 :
- rgb : discard
- alpha : compute add
But I doubt such a trick is always possible.
Still, when detected, such a trick can save a combiner stage for a significant number of co-issues.
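A quick way to check the three-stage mapping on paper is to model it. The Python below is just a sketch under a few assumptions: the alpha portion is reduced to two products AB and CD, every input of a stage is read before any output of that stage is written, and the value two is treated as a plain number (whether and how the hardware can supply it is outside this model). All names are made up for the example.

```python
def dot3(u, v):
    """Three-component dot product."""
    return sum(x * y for x, y in zip(u, v))


def alpha_stage(a, b, c, d):
    """Alpha portion of one combiner: returns the two products AB and CD."""
    return a * b, c * d


def run_three_stages(t0, t1, c0):
    """Returns (t0.a, r0.a) after the three stages described above."""
    # Stage 1: rgb computes the dot product, alpha output is discarded.
    dp = dot3(t0["rgb"], t1["rgb"])
    r0_rgb = (dp, dp, dp)

    # Stage 2: inputs are read before any output is written (assumption),
    # so the alpha portion still sees the dot product in r0.b.
    ab, cd = alpha_stage(c0["a"], 2.0, r0_rgb[2], 1.0)
    r0_rgb = tuple(x * y for x, y in zip(r0_rgb, t0["rgb"]))  # mul r0.rgb, r0, t0
    t0_a = ab  # "AB" -> t0.a, same value as c0.a + c0.a
    r0_a = cd  # "CD" -> r0.a, the saved dot product

    # Stage 3: rgb discarded, alpha computes add r0.a, r0, t1.
    r0_a = r0_a + t1["a"]
    return t0_a, r0_a


t0 = {"rgb": (0.5, 0.5, 0.0), "a": 0.0}
t1 = {"rgb": (0.5, 0.5, 0.0), "a": 0.25}
c0 = {"rgb": (0.0, 0.0, 0.0), "a": 0.125}
t0_a, r0_a = run_three_stages(t0, t1, c0)
print("t0.a =", t0_a, "r0.a =", r0_a)
```

With these sample values r0.a ends up as the dot product plus t1.a and t0.a as twice c0.a, in three stages instead of four.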
Btw, I've heard of a GF3/4 pixel shader bug that forbids the use of a co-issued instruction in the last (8th) instruction. Could it be a GeForce limitation related to that dot product problem?
Originally posted by vincoof:
btw, I've heard of a GF3/4 pixel shader bug that forbids usage of co-issued instruction in the last (8th) instruction. Could it be a GeForce limitation that is representative of that dot product problem?
That seems to be fixed in later drivers, so it doesn't appear to be a hardware limitation.
At the moment, I think I will just say that "The dp3 instruction writes to the r, g and b components only", and let the program do any moving into alpha that it requires.
Unless someone can tell me a better solution...
Since every combiner can do two dot products simultaneously, you can calculate your dot product twice in the same combiner and output it to two different registers. This way the result can 'survive' in the second copy if anything changes the first. Of course you need a free register for this, but remember that you have eight registers that you can write to (primary, secondary, spare0, spare1, tex0-tex3).
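As a rough illustration of the two-copies idea, here is a small Python model. It assumes the rgb portion can be set up so that both of its products are dot products, and it picks spare0 and spare1 as the two destinations purely for the example; the function and variable names are hypothetical.

```python
def dot3(u, v):
    """Three-component dot product."""
    return sum(x * y for x, y in zip(u, v))


def rgb_stage_two_dots(a, b, c, d):
    """RGB portion with both products configured as dot products.

    Returns the AB and CD results, each replicated across r, g and b.
    """
    ab = dot3(a, b)
    cd = dot3(c, d)
    return (ab, ab, ab), (cd, cd, cd)


regs = {
    "tex0": (0.5, 0.5, 0.0),
    "tex1": (0.5, 0.5, 0.0),
    "spare0": (0.0, 0.0, 0.0),
    "spare1": (0.0, 0.0, 0.0),
}

# Same inputs on A/B and C/D, so both outputs hold the same dot product.
regs["spare0"], regs["spare1"] = rgb_stage_two_dots(
    regs["tex0"], regs["tex1"], regs["tex0"], regs["tex1"]
)

# A later instruction overwrites spare0.rgb ...
regs["spare0"] = tuple(x * y for x, y in zip(regs["spare0"], regs["tex0"]))

# ... but the copy in spare1 still holds the dot product.
print(regs["spare0"], regs["spare1"])
```

The cost is one extra register for as long as the second copy has to stay alive.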
I just realized that unless you co-issue a lerp or mad with the next rgb instruction after your dot product, you only need the A and B inputs to do this alpha operation. Then you can use C and D to just move the dot product result from the blue to the alpha component.
Kuba
[This message has been edited by coop (edited 01-30-2003).]