Register Combiners: The equations.

LO.
I’m looking at the register combiner extension at the mo, and I seem to understand it all, except what equations are done on the variables.

gcc1rgb = [ Argb[r]*Brgb[r], Argb[g]*Brgb[g], Argb[b]*Brgb[b] ]
gcc2rgb = [ Argb[r]*Brgb[r] + Argb[g]*Brgb[g] + Argb[b]*Brgb[b],
            Argb[r]*Brgb[r] + Argb[g]*Brgb[g] + Argb[b]*Brgb[b],
            Argb[r]*Brgb[r] + Argb[g]*Brgb[g] + Argb[b]*Brgb[b] ]
gcc3rgb = [ Crgb[r]*Drgb[r], Crgb[g]*Drgb[g], Crgb[b]*Drgb[b] ]
gcc4rgb = [ Crgb[r]*Drgb[r] + Crgb[g]*Drgb[g] + Crgb[b]*Drgb[b],
            Crgb[r]*Drgb[r] + Crgb[g]*Drgb[g] + Crgb[b]*Drgb[b],
            Crgb[r]*Drgb[r] + Crgb[g]*Drgb[g] + Crgb[b]*Drgb[b] ]
gcc5rgb = gcc1rgb + gcc3rgb
gcc6rgb = gcc1rgb or gcc3rgb
gcc1a = Aa * Ba
gcc2a = Ca * Da
gcc3a = gcc1a + gcc2a
gcc4a = gcc1a or gcc2a

This is what the NV spec says, but it looks horrendously complicated.

Can anyone tell me what is going on here?

Then what’s all this bit about?

If the <abDotProduct> parameter is FALSE, then
    if <portion> is RGB, out1rgb = max(min((gcc1rgb + cbiasrgb) * cscalergb, 1), -1)
    if <portion> is ALPHA, out1a = max(min((gcc1a + cbiasa) * cscalea, 1), -1)
otherwise <portion> must be RGB and
    out1rgb = max(min((gcc2rgb + cbiasrgb) * cscalergb, 1), -1)
If the <cdDotProduct> parameter is FALSE, then
    if <portion> is RGB, out2rgb = max(min((gcc3rgb + cbiasrgb) * cscalergb, 1), -1)
    if <portion> is ALPHA, out2a = max(min((gcc2a + cbiasa) * cscalea, 1), -1)
otherwise <portion> must be RGB so
    out2rgb = max(min((gcc4rgb + cbiasrgb) * cscalergb, 1), -1)
If the <muxSum> parameter is FALSE, then
    if <portion> is RGB, out3rgb = max(min((gcc5rgb + cbiasrgb) * cscalergb, 1), -1)
    if <portion> is ALPHA, out3a = max(min((gcc3a + cbiasa) * cscalea, 1), -1)
otherwise
    if <portion> is RGB, out3rgb = max(min((gcc6rgb + cbiasrgb) * cscalergb, 1), -1)
    if <portion> is ALPHA, out3a = max(min((gcc4a + cbiasa) * cscalea, 1), -1)

Any help understanding this would be appreciated.

cheers,
Nutty

[This message has been edited by Nutty (edited 07-18-2001).]

It takes a while to really understand what is going on in register combiner code. So, just walk through it slowly.

gcc1rgb is an element-wise multiply. Basically, it takes the variables A and B and multiplies their corresponding elements together. Call that A*B.

gcc2rgb is a vector dot-product of A and B. It copies the scalar result into all 3 components of the output vector. Call that dot(A, B).

gcc3rgb and gcc4rgb are the same as gcc1&2, but for the variables C and D.
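
To make the arithmetic concrete, here is a minimal sketch of those products in plain Python. The helper names `mul3` and `dot3` and the sample values are made up for illustration; they are not part of the extension, and the real work is done by the hardware.

```python
def mul3(a, b):
    """Element-wise product of two RGB triples (the gcc1rgb / gcc3rgb form)."""
    return [a[i] * b[i] for i in range(3)]


def dot3(a, b):
    """Dot product copied into all three channels (the gcc2rgb / gcc4rgb form)."""
    d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return [d, d, d]


# Hypothetical sample inputs.
A = [1.0, 0.5, 0.0]
B = [0.5, 0.5, 1.0]

print(mul3(A, B))  # [0.5, 0.25, 0.0]
print(dot3(A, B))  # [0.75, 0.75, 0.75]
```

The same two helpers cover C and D: just pass C and D instead of A and B.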

The part about max(min(etc…)) is basically saying that the output (after adding the bias and multiplying by the scale) is clamped to the range -1 to 1 (unfortunately, the nVidia guys couldn’t be nearly that lucid).
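
As a rough illustration of that clamp, the max(min(...)) expression can be written for a single component like this. The function name `combiner_output` and the bias and scale values used below are only examples chosen for the sketch.

```python
def combiner_output(value, bias=0.0, scale=1.0):
    """Compute max(min((value + bias) * scale, 1), -1) for one component."""
    return max(min((value + bias) * scale, 1.0), -1.0)


print(combiner_output(0.75))                        # 0.75 (already in range)
print(combiner_output(0.75, bias=-0.5, scale=2.0))  # 0.5
print(combiner_output(0.9, scale=4.0))              # 1.0 (clamped from 3.6)
print(combiner_output(-0.9, scale=4.0))             # -1.0 (clamped from -3.6)
```

For an RGB output the same expression is applied to each of the three components separately.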

A single general register combiner stage can perform 2 dot products, 2 vector multiplies, one of each, or a single “muxsum”, which is AB+CD.

To determine which the register combiner does, you set the values of abDotProduct, cdDotProduct and muxSum. If you want dot(A, B) or dot(C, D), you set the appropriate DotProduct flag to true. If you want a muxSum, you must set it to true and both DotProducts to false. If you just want AB and CD, set all the flags to false. If you want dot(A, B) and C*D, set abDotProduct to true, but cdDotProduct to false.

Simple.

The nVidia presentations on the subject are good. I definitely recommend looking at Cass’ nvparse wrapper for the register combiners. It is much easier to understand, and I find that if I need to write combiner code longhand, I can think of how I would do it with nvparse and easily translate it. But of course you don’t need to with nvparse.

Anyway, the nvparse_for_regcom (sp?) presentation distributed with the SDK docs has all you need to know. It explains all the equivalent nvparse code to the specification cr*p in your post and shows that there really isn’t much you need to know to write simple combiner code.

Hope that helps.

Nice one guys.

I should’ve spotted that was a dot product, but when it’s written out all mathematically, it looks all weird!

Thanks again!

Nutty

Thanks, I was wondering about those equations as well.
I found out that the new 12.41 drivers have given me loads more extensions on my TNT2 M64. When I first got the card I had about 19 extensions, I think; that went up to about 23, then 26, and now it is 33, rising with each new driver. Thanks, Nvidia. I know most of this new stuff won’t be hardware accelerated, but it will be good to mess about with until I get my GeForce3. I have tried to get some information and sample code for some of them, but I can’t find much info at all.

Here are some new extensions I have:

GL_EXT_draw_range_elements
GL_NV_evaluators
WGL_EXT_swap_control
GL_WIN_swap_hint
GL_NV_texture_env_combine4
GL_EXT_texture_lod_bias

These I think are the more important ones or at least ones that I can’t get any info for.

What do these do? Is there any sample code or good documentation?

The only one of these that I’ve used is the first, 'cept it’s part of OpenGL 1.2 anyway. glDrawRangeElements is almost the same as glDrawElements, with the addition of a range of legal values for its indices. It’s supposed to be faster since I guess the card knows it doesn’t have to access vertices outside the range you’re specifying. I use it instead of DrawElements, but you’d have to ask Matt if it is actually faster. I’ve never benchmarked the two against each other.
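
The extra arguments are the smallest and largest index the call will touch, as in glDrawRangeElements(mode, start, end, count, type, indices). A small sketch of how you might work those two values out from an index list follows; the helper name `index_range` and the sample indices are invented for the example, and a real engine would more likely track the range when it builds the mesh.

```python
def index_range(indices):
    """Return (start, end): the smallest and largest vertex index in the list."""
    return min(indices), max(indices)


# Hypothetical index list for two triangles.
indices = [4, 5, 6, 6, 5, 7]
start, end = index_range(indices)
print(start, end, len(indices))  # 4 7 6
```

Here start and end would be 4 and 7, and count is still the number of indices (6), exactly as it would be for glDrawElements.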

Hope that helps.

Texture env combine 4 is the same as texture env combine, except that ADD and ADD SIGNED are slightly different.

ADD = (ARG0 * ARG1) + (ARG2 * ARG3)
ADD SIGNED is the same as above with -0.5 at the end.
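
Written out for a single colour component, those two modes look roughly like this. The function name `combine4_add` is made up for the sketch, and the final clamp to the 0 to 1 range is my assumption about how the texture environment result is treated, not something stated above.

```python
def combine4_add(arg0, arg1, arg2, arg3, signed=False):
    """ADD / ADD SIGNED as described above, for one colour component."""
    value = arg0 * arg1 + arg2 * arg3
    if signed:
        value -= 0.5  # ADD SIGNED subtracts one half at the end
    # Assumption: the result is clamped to [0, 1].
    return max(0.0, min(value, 1.0))


print(combine4_add(0.5, 1.0, 0.5, 0.5))               # 0.75
print(combine4_add(0.5, 1.0, 0.5, 0.5, signed=True))  # 0.25
```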

P.S. Oh yeah, you can also specify the texture of another texture unit as an argument, instead of only the texture of the current combine stage.

Nutty

[This message has been edited by Nutty (edited 07-19-2001).]

Thanks a lot. First I will try glDrawElements to speed up my engine; I guess it should then be easy to convert it to use glDrawRangeElements. Hopefully this will speed the engine up a lot, but I think that it is fillrate limited: I only display about 2000 triangles at a time, yet frame rates are about 30 FPS!

Korval,

"If you want a muxSum, you must set it to true and both DotProducts to false"

I believe the sum (AB+CD) is done when muxSum is FALSE; otherwise it does the OR (mux) thing.

According to the spec, and by the fact that my code works…
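
To pin down the flag behaviour, here is a sketch of the RGB half of one general combiner stage, following the equations quoted at the top of the thread with bias 0 and scale 1. The function and parameter names are invented for the example. The rule used for the "or" case (pick AB when the SPARE0 alpha is below 0.5, otherwise CD) is my reading of the spec and should be treated as an assumption. The sketch also computes the third output even when a dot product is selected, which real hardware may not allow.

```python
def clamp(x):
    """Clamp one component to the range -1 to 1."""
    return max(min(x, 1.0), -1.0)


def general_combiner_rgb(A, B, C, D, ab_dot=False, cd_dot=False,
                         mux_sum=False, spare0_alpha=0.0):
    """Return (out1rgb, out2rgb, out3rgb) for one stage, bias 0 and scale 1."""
    def mul(x, y):
        return [x[i] * y[i] for i in range(3)]

    def dot(x, y):
        d = sum(x[i] * y[i] for i in range(3))
        return [d, d, d]

    gcc1 = mul(A, B)
    gcc3 = mul(C, D)
    out1 = dot(A, B) if ab_dot else gcc1   # gcc2rgb or gcc1rgb
    out2 = dot(C, D) if cd_dot else gcc3   # gcc4rgb or gcc3rgb
    if mux_sum:
        # ASSUMPTION: the "or" selects AB when SPARE0 alpha < 0.5, else CD.
        out3 = gcc1 if spare0_alpha < 0.5 else gcc3
    else:
        out3 = [gcc1[i] + gcc3[i] for i in range(3)]  # gcc5rgb = AB + CD
    return ([clamp(v) for v in out1],
            [clamp(v) for v in out2],
            [clamp(v) for v in out3])


# Hypothetical inputs.
A, B = [0.5, 0.5, 0.5], [1.0, 0.0, 0.5]
C, D = [0.5, 1.0, 0.0], [0.5, 0.5, 1.0]

print(general_combiner_rgb(A, B, C, D)[2])                # [0.75, 0.5, 0.25]
print(general_combiner_rgb(A, B, C, D, mux_sum=True)[2])  # [0.5, 0.0, 0.25]
print(general_combiner_rgb(A, B, C, D, ab_dot=True)[0])   # [0.75, 0.75, 0.75]
```

With muxSum left FALSE the third output is the sum AB+CD, which matches what the spec text says and what my working code does.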

Now that I have combiner code up and working, it ain’t nearly as complicated as I initially thought it was.

Cheers guys!

Nutty