DOT3_RGBA in register combiners

NV_register_combiners can easily compute a dot product such as “tex0 dot tex1” and put the result into the r, g and b components of, for example, spare0. This is similar to the DOT3_RGB texture env mode.
Does anyone know how to compute the dot product in a single general combiner and store the result in all four components (r, g, b AND a) of the destination register? In other words, can I do a “DOT3_RGBA” in a single general combiner?
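To make the question concrete, here is a minimal C sketch of a simplified software model (not the GL API) of what the RGB portion of a general combiner does with a dot product: the inputs go through an EXPAND_NORMAL-style mapping, and the scalar result is written to r, g and b of the destination, leaving alpha untouched. The `Reg` type and the function names are hypothetical, and clamping and hardware precision are ignored.

```c
#include <assert.h>

/* Hypothetical software model of a combiner register (four float components). */
typedef struct { float r, g, b, a; } Reg;

/* Models the EXPAND_NORMAL input mapping: [0,1] -> [-1,1]. */
static float expand_normal(float x) { return 2.0f * x - 1.0f; }

/* Models the RGB portion computing "tex0 dot tex1" with a dot-product output.
 * The scalar result is replicated into r, g and b only; alpha keeps its old
 * value (DOT3_RGB-like behaviour). Clamping and precision are ignored. */
static void rgb_portion_dot3(Reg *dst, const Reg *a, const Reg *b)
{
    assert(dst && a && b);
    float d = expand_normal(a->r) * expand_normal(b->r)
            + expand_normal(a->g) * expand_normal(b->g)
            + expand_normal(a->b) * expand_normal(b->b);
    dst->r = d;
    dst->g = d;
    dst->b = d;
    /* dst->a is intentionally not written. */
}
```

For example, with tex0 = tex1 = (1.0, 0.5, 0.5) the expanded vectors are (1, 0, 0), so spare0.rgb becomes 1.0 while spare0.a keeps whatever it held before.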

Thanks

Paul

[This message has been edited by bakery2k (edited 01-29-2003).]

Wrong information has been deleted.

Diapolo

[This message has been edited by Diapolo (edited 01-29-2003).]

This will not work, since the alpha part of the combiner cannot do dot products. Quote from the spec:

CombinerOutputNV(GLenum stage, GLenum portion, GLenum abOutput, GLenum cdOutput, GLenum sumOutput, GLenum scale, GLenum bias, GLboolean abDotProduct, GLboolean cdDotProduct, GLboolean muxSum);

 If the <portion> parameter is ALPHA, specifying a non-FALSE value
 for either of the parameters <abDotProduct> or <cdDotProduct>,
 generates an INVALID_VALUE error.
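The quoted rule can be expressed as a tiny validation function. This is a hedged C model of only that one error check, not of the driver's full parameter validation; the enum names are hypothetical stand-ins rather than the real GL constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real API uses GL_RGB / GL_ALPHA and GL_INVALID_VALUE. */
typedef enum { PORTION_RGB, PORTION_ALPHA } Portion;
typedef enum { MODEL_NO_ERROR, MODEL_INVALID_VALUE } ModelError;

/* Models the single rule quoted from the spec: requesting a dot product
 * (abDotProduct or cdDotProduct) on the ALPHA portion is an error. */
static ModelError check_combiner_output(Portion portion, bool abDotProduct, bool cdDotProduct)
{
    assert(portion == PORTION_RGB || portion == PORTION_ALPHA);
    if (portion == PORTION_ALPHA && (abDotProduct || cdDotProduct))
        return MODEL_INVALID_VALUE;
    return MODEL_NO_ERROR;
}
```

So a dot product can only reach alpha by computing it in the RGB portion and moving it over afterwards.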

Damn, you are absolutely right: the alpha portion can only do Mult / Mult / Mux or Mult / Mult / Sum. Sorry, then!

Diapolo

While I was working on a DirectX pixel shader to register combiners translator for my engine, I faced the same problem and couldn’t find a real solution (without using another combiner stage, that is).
In DirectX pixel shaders, the dp3 instruction replicates the result to all four components. The only way to emulate this behavior in my translator was to look ahead for instructions that read the dp3 result from the alpha component and make those instructions read from the blue component instead.
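The look-ahead rewrite can be sketched in C over a hypothetical, simplified instruction representation (the structs, field names and register numbers are made up for illustration, not taken from the translator mentioned here). This naive version stops scanning at the next write to the dp3 destination and does not detect the case where blue is overwritten before an alpha read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction representation.
 * comp is 'a' when a source is read as alpha, 'b' for blue, 0 for rgb. */
typedef struct { int reg; char comp; } Src;
typedef struct { int is_dp3; int dst; Src src[3]; int nsrc; } Instr;

/* After each dp3, redirect alpha reads of its destination register to the
 * blue component, which holds the replicated dot product in combiners. */
static void redirect_dp3_alpha_reads(Instr *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!prog[i].is_dp3)
            continue;
        int r = prog[i].dst;
        for (size_t j = i + 1; j < n; j++) {
            for (int k = 0; k < prog[j].nsrc; k++)
                if (prog[j].src[k].reg == r && prog[j].src[k].comp == 'a')
                    prog[j].src[k].comp = 'b';
            if (prog[j].dst == r)
                break; /* sources are read before the write, so stop after this one */
        }
    }
}
```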

Believe it or not, a pixel shader to register combiners translator is exactly what I am working on!
I also had the idea of looking ahead and reading from the blue component, but realised that this may cause a problem with a program such as:

dp3 r0.rgba, t0, t1
mul r0.rgb, r0, t2
add r0.a, r0, t3

In this case, reading from the blue component of r0 in the 3rd instruction would be no good, as it has been modified in the 2nd instruction.
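Detecting this case amounts to checking whether the dp3 destination's rgb is written again while an alpha read of the dp3 result is still pending. A minimal C sketch over a hypothetical instruction representation (field names and register numbers are illustrative assumptions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction summary; comp is 'a' for an alpha read, 0 for rgb. */
typedef struct { int reg; char comp; } Src;
typedef struct { int dst; int writes_rgb; int writes_a; Src src[3]; int nsrc; } Instr;

/* Returns 1 if every later alpha read of the dp3 destination can safely be
 * redirected to blue, 0 if blue is overwritten while such a read is pending. */
static int blue_redirect_is_safe(const Instr *prog, size_t n, size_t dp3_index)
{
    assert(prog != NULL && dp3_index < n);
    int reg = prog[dp3_index].dst;
    int blue_clobbered = 0;
    for (size_t j = dp3_index + 1; j < n; j++) {
        for (int k = 0; k < prog[j].nsrc; k++)
            if (blue_clobbered && prog[j].src[k].reg == reg && prog[j].src[k].comp == 'a')
                return 0; /* alpha read would see the new blue, not the dot product */
        if (prog[j].dst == reg) {
            if (prog[j].writes_a)
                return 1; /* alpha redefined: later alpha reads no longer need the dp3 */
            if (prog[j].writes_rgb)
                blue_clobbered = 1;
        }
    }
    return 1;
}
```

For the three-instruction program above (registers numbered arbitrarily), the mul writes r0.rgb before the add reads r0.a, so the check returns 0; writing the mul to another register instead makes the redirect safe again.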

My translator also handles such a case by using another combiner stage to duplicate the dp3 result into the alpha component. I spent a lot of time on it and couldn’t find a better way. If you find one, let me know.

You can grab the source of my translator from the CVS repository of my project http://xengine.sourceforge.net.
The translator is in /src/Renderers/RendererOGL13/XFragmentShaderOGL13_DirectXPSA.cpp and /include/Renderers/RendererOGL13/XFragmentShaderOGL13_DirectXPSA.h.
Good luck.

[This message has been edited by Asgard (edited 01-29-2003).]

In the register combiners, you can map a blue component to the alpha stage.
So, when you compute an RGB dot product into an RGB register, say spare0.rgb, you can feed the blue component of that result (i.e. spare0.b) into the alpha stage of the next combiner stage.
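In the real API this choice is the componentUsage argument of glCombinerInputNV, which accepts GL_ALPHA or GL_BLUE for the alpha portion, e.g. glCombinerInputNV(GL_COMBINER1_NV, GL_ALPHA, GL_VARIABLE_A_NV, GL_SPARE0_NV, GL_UNSIGNED_IDENTITY_NV, GL_BLUE). The C code below is only a simplified software model of that step with hypothetical names, ignoring clamping and precision.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Reg;

/* Hypothetical mirror of the alpha-portion componentUsage choice. */
typedef enum { USE_ALPHA, USE_BLUE } AlphaUsage;

static float alpha_portion_input(const Reg *src, AlphaUsage usage)
{
    assert(src);
    return usage == USE_BLUE ? src->b : src->a;
}

/* Next stage, alpha portion: A = spare0.b (BLUE usage), B = 1, A*B -> spare0.a.
 * Afterwards spare0 holds the dot product in all four components. */
static void copy_blue_to_alpha(Reg *spare0)
{
    float a = alpha_portion_input(spare0, USE_BLUE);
    float b = 1.0f; /* constant one, e.g. GL_ZERO with GL_UNSIGNED_INVERT_NV */
    spare0->a = a * b;
}
```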

Vincoof: That’s exactly what we have been discussing above.

Asgard:
Thanks, that’s very useful.
Now I need to make a decision.
I am not trying to produce an exact copy of the DirectX pixel shader functionality, so I have some flexibility.

Either I code a solution like yours, accepting that a dp3 may need more than one combiner, so some 8-instruction programs will not fit.

Or, I could simply state in the language specification that dp3 instructions cannot write to an alpha channel. If the program requires the alpha to hold the results of a dp3, the mov instruction could be programmed explicitly, possibly co-issued with the next instruction. However, this reduces the number of instructions available even in the case where a simple “read from blue” would be adequate. (EDIT: Although I suppose the “read from blue” could be made explicit in the shader instead).
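The trade-off between the two options shows up when counting general combiner stages. This C sketch assumes one stage per instruction slot (a co-issued pair shares a slot) and, for the first option, one extra stage for each dp3 that writes alpha; the struct fields are hypothetical and the limit of 8 is assumed to match the 8-instruction budget discussed here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMBINER_STAGES 8 /* assumed budget, matching 8-instruction programs */

/* Hypothetical per-instruction summary for stage counting. */
typedef struct { int is_dp3; int writes_a; int coissued_with_prev; } Instr;

/* Option 1: emulate DirectX-style dp3, paying an extra "copy blue to alpha"
 * stage for every dp3 whose destination includes alpha. */
static int stages_option1(const Instr *p, size_t n)
{
    assert(p != NULL || n == 0);
    int stages = 0;
    for (size_t i = 0; i < n; i++) {
        if (!p[i].coissued_with_prev)
            stages++;
        if (p[i].is_dp3 && p[i].writes_a)
            stages++;
    }
    return stages;
}

/* Option 2: dp3 may not write alpha; such programs are rejected, and any
 * needed move to alpha must be written explicitly by the shader author. */
static int program_valid_option2(const Instr *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i].is_dp3 && p[i].writes_a)
            return 0;
    return 1;
}
```

With eight instructions where the first is a dp3 to .rgba, option 1 needs 9 stages and no longer fits, while option 2 rejects the program until the author rewrites it.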

D3D does neither of these: an 8-instruction program that would also require the extra move instruction works fine.
This, and the fact that the DOT3_RGBA texture env mode exists, makes me think the hardware is capable, but the feature is not exposed in the register combiners interface.
(@nVidia: Is this correct?)

[This message has been edited by bakery2k (edited 01-29-2003).]

Excuse me for insisting, but I don’t see why mapping from the blue component is not enough.
For instance, in your pixel shader example:

dp3 r0.rgba, t0, t1
mul r0.rgb, r0, t2
add r0.a, r0, t3

It can be done in two combiner stages:
first stage:
  • rgb: map t0.rgb to A and t1.rgb to B, and output “A dot B” into r0.rgb
  • alpha: discard
second stage:
  • rgb: map r0.rgb to A and t2.rgb to B, and output “A mul B” into r0.rgb
  • alpha: map r0.b to A, one to B, t3.a to C and one to D, and output “AB+CD” into r0.a
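The two-stage mapping can be checked with a small numeric C model. It assumes identity input mappings, no clamping, and that every input of a stage is read before that stage writes its outputs, so the alpha portion of the second stage still sees the dot product in r0.b. The register values are made-up test data.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Reg;

static float dot3(const Reg *x, const Reg *y) { return x->r * y->r + x->g * y->g + x->b * y->b; }

/* dp3 r0.rgba, t0, t1 / mul r0.rgb, r0, t2 / add r0.a, r0, t3 in two stages. */
static Reg two_stage(Reg t0, Reg t1, Reg t2, Reg t3)
{
    Reg r0 = {0.0f, 0.0f, 0.0f, 0.0f};

    /* Stage 1, rgb: A = t0.rgb, B = t1.rgb, "A dot B" -> r0.rgb. Alpha: discard. */
    float d = dot3(&t0, &t1);
    r0.r = r0.g = r0.b = d;

    /* Stage 2: read all inputs first, then write both portions. */
    Reg in = r0;
    r0.r = in.r * t2.r;               /* rgb: A = r0.rgb, B = t2.rgb, A*B */
    r0.g = in.g * t2.g;
    r0.b = in.b * t2.b;
    r0.a = in.b * 1.0f + t3.a * 1.0f; /* alpha: A = r0.b, B = 1, C = t3.a, D = 1, AB+CD */
    return r0;
}
```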

And if you want your pixel shader parser to be a line-by-line translator, you’ll need 3 stages:
first stage: same as above
second stage:
  • rgb: same as the rgb of the second stage above
  • alpha: copy r0.b to r0.a
third stage:
  • rgb: discard
  • alpha: compute the add as in the alpha of the second stage above, but reading r0.a (since r0.b has been overwritten by the mul)
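The three-stage, line-by-line version can also be checked with a small numeric C model, under stated assumptions (identity input mappings, no clamping, inputs of a stage read before its outputs are written, made-up register values). Stage 3 reads the copy in r0.a, because stage 2 has already replaced r0.b with the mul result.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Reg;

/* Line-by-line variant: stage 2 alpha copies r0.b to r0.a, stage 3 alpha does the add. */
static Reg three_stage(Reg t0, Reg t1, Reg t2, Reg t3)
{
    Reg r0 = {0.0f, 0.0f, 0.0f, 0.0f};

    /* Stage 1 (dp3): rgb "t0 dot t1" -> r0.rgb; alpha discarded. */
    float d = t0.r * t1.r + t0.g * t1.g + t0.b * t1.b;
    r0.r = r0.g = r0.b = d;

    /* Stage 2 (mul + copy): inputs are read before either portion writes. */
    Reg in = r0;
    r0.r = in.r * t2.r;
    r0.g = in.g * t2.g;
    r0.b = in.b * t2.b;
    r0.a = in.b; /* alpha: A = r0.b, B = 1, A*B -> r0.a */

    /* Stage 3 (add): rgb discarded; alpha: A = r0.a, B = 1, C = t3.a, D = 1, AB+CD. */
    r0.a = r0.a * 1.0f + t3.a * 1.0f;
    return r0;
}
```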

Or maybe I’m missing something, in which case please pardon me; I’d be glad to know.

I was looking at the line-by-line parser option.
The (small) problem with your approach is: what would happen if the second instruction required the alpha combiner for something else (for example, a co-issued instruction)?

You mean if the second instruction was the one using the alpha component?
Answer: it would use the blue component, i.e. r0.b.
And if the alpha had to be mapped to an rgb instruction (in register combiners you can use an alpha component as an input to the rgb stage), you would use r0.rgb, all of whose components hold the same dot3 result.

[This message has been edited by vincoof (edited 01-29-2003).]

Sorry I wasn’t clear. What I mean is, what if the second instruction did NOT use the alpha component we want to save, but did use the alpha combiner for something else. For example:

dp3 r0.rgba, t0, t1

mul r0.rgb, r0, t0
+add t0.a, c0.a, c0.a

add r0.a, r0, t1

There is no way to save the dot product result to the alpha channel in the second combiner, and no way to access that data in the third combiner, as it has been overwritten.
It is also no good to do the second instruction in the third combiner and the third in the fourth, using the second combiner for the “mov r0.a, r0.b”, because then we cannot guarantee that an 8-instruction program will fit in 8 combiners.

I realize this would be a very rare situation, but it can occur.

You mean that:
mul r0.rgb, r0, t0
add t0.a, c0.a, c0.a

should fit in a single combiner (mul for the rgb stage and add for the alpha stage)?
Well, according to the “line-by-line” concept, I would put them into 2 different combiners.

But if you really want to allow the program to “group” instructions, then obviously dot3_rgba is a real problem as you described above.

There is still the other solution, though (one I don’t like, but oh well): in your shader language, you could specify that “all instructions use 1 combiner except the dot3_rgba instruction, which uses 2 combiners”.

[EDIT] I’ve read up on pixel shader terminology again and now I see exactly your problem. Even though this problem is unlikely to show up often, it can happen. Please let me apologize for wasting your time because of my limited pixel shader knowledge.

[This message has been edited by vincoof (edited 01-29-2003).]

Originally posted by vincoof:
Please let me apologize for wasting your time.

That’s OK. The discussion has helped me to think further about the problem, even though I do not see a solution.

The thing is, sometimes you can play with the capabilities of register combiners. For instance, take the co-issued instruction you presented:

dp3 r0.rgba, t0, t1

mul r0.rgb, r0, t0
+add t0.a, c0.a, c0.a

add r0.a, r0, t1

There is a solution:
stage 1:
  • rgb: compute dot product
  • alpha: discard
stage 2:
  • rgb: compute mul
  • alpha: set c0.a in A, set two (e.g. from a constant) in B, set r0.b in C and set one in D, then output “AB” to t0.a and output “CD” to r0.a
stage 3:
  • rgb: discard
  • alpha: compute add
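The key point of stage 2 is that, with the sum output discarded, the alpha portion can send AB and CD to two different registers. Here is a hedged C model of just that alpha portion, assuming (as the post does) that a value of two is available for B; whether such a constant is actually representable depends on the combiner input ranges, which this model does not check. Names and values are illustrative.

```c
#include <assert.h>

/* Outputs of one alpha portion when the sum output is discarded. */
typedef struct { float ab; float cd; } AlphaOutputs;

/* Models an alpha portion computing A*B and C*D separately
 * (products only: the alpha portion cannot do dot products). */
static AlphaOutputs alpha_portion_ab_cd(float a, float b, float c, float d)
{
    AlphaOutputs out = { a * b, c * d };
    return out;
}
```

With c0.a = 0.25 and a dot product of 0.5 in r0.b, AB gives t0.a = 0.5 (the co-issued add c0.a + c0.a) and CD gives r0.a = 0.5 (the dp3 result copied to alpha), both in the same stage.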

But I doubt such a trick is always possible.
Still, when it can be detected, such a trick can save a combiner stage for a significant number of co-issues.

BTW, I’ve heard of a GF3/4 pixel shader bug that forbids a co-issued instruction as the last (8th) instruction. Could it be a GeForce limitation related to this dot product problem?

[This message has been edited by vincoof (edited 01-29-2003).]

Originally posted by vincoof:
btw, I’ve heard of a GF3/4 pixel shader bug that forbids usage of co-issued instruction in the last (8th) instruction. Could it be a GeForce limitation that is representative of that dot product problem ?

That seems to be fixed in later drivers, so it doesn’t appear to be a hardware limitation.
At the moment, I think I will just say that “The dp3 instruction writes to the r, g and b components only”, and let the program do any moving into alpha that it requires.
Unless someone can tell me a better solution…

Since every combiner can do two dot products simultaneously, you can calculate your dot product twice in the same combiner and output it to two different registers. This way the result can ‘survive’ in the second copy if anything changes the first. Of course, you need a free register for this, but remember that you have eight registers that you can write to (primary, secondary, spare0, spare1, tex0-tex3).
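In the real API this would roughly be glCombinerOutputNV(stage, GL_RGB, GL_SPARE0_NV, GL_SPARE1_NV, GL_DISCARD_NV, GL_NONE, GL_NONE, GL_TRUE, GL_TRUE, GL_FALSE), with the same inputs on A/B and C/D and the sum output discarded; the choice of spare0/spare1 is just an example. The C code below is a simplified software model of that stage (hypothetical names, no clamping or input mappings).

```c
#include <assert.h>

typedef struct { float r, g, b, a; } Reg;

static float dot3(const Reg *x, const Reg *y) { return x->r * y->r + x->g * y->g + x->b * y->b; }

/* RGB portion with abDotProduct and cdDotProduct both enabled: A/B and C/D get
 * the same inputs, so the same dot product lands in two registers. */
static void dot3_twice(Reg *ab_out, Reg *cd_out, const Reg *x, const Reg *y)
{
    assert(ab_out && cd_out && ab_out != cd_out);
    float ab = dot3(x, y); /* A dot B */
    float cd = dot3(x, y); /* C dot D, same inputs */
    ab_out->r = ab_out->g = ab_out->b = ab;
    cd_out->r = cd_out->g = cd_out->b = cd;
}
```

If a later instruction overwrites spare0.rgb, spare1.b still holds the dot product and can be fed to an alpha portion with blue component usage.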

I just realized that unless you co-issue a lerp or mad with the next rgb instruction after your dot product, you only need the A and B inputs for that alpha operation. Then you can use C and D simply to move the dot product result from the blue to the alpha component.

Kuba

[This message has been edited by coop (edited 01-30-2003).]