difference between register combiners and fragment programs

default · March 27, 2003, 9:16am

Hi, I’m just wondering about the differences between fragment programs and register combiners. Are combiners used only when combining multiple textures in the same pass? It seems to me that fragment programs can do everything register combiners can do. Am I correct in this? If so, do fragment programs replace the need to use register combiners? Thanks.

Mark_Kilgard · March 27, 2003, 9:34am

Fragment programs are more powerful than register combiners, but fragment programs (whether through the multi-vendor ARB_fragment_program extension or the more powerful NV_fragment_program extension) require the more recent GeForce FX GPUs for hardware acceleration. Users of earlier GeForce GPUs can enable the extensions via software emulation for development purposes, but software emulation is too slow for production use.

If you want to target the widest range of NVIDIA GPUs, use NV_register_combiners. If you want all the latest functionality, use ARB_fragment_program, or better yet, NV_fragment_program.

If you program with Cg (C for Graphics), you can write your programs in Cg and then simply indicate with your target profile whether to target register combiners or fragment programs. The “fp20” profile targets register combiners and texture shaders; the “arbfp1” profile targets ARB_fragment_program; and the “fp30” profile targets NV_fragment_program.

One problem is how to deal with this variety of profiles supported by various GPU generations. One solution is writing your Cg shader programs such that a single program can be recompiled, via conditional compilation, to target a basic profile such as fp20 for basic rendering but use the fp30 or arbfp1 profiles if more capable hardware is detected. This can allow you to adapt the shader quality to the available hardware.
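The fallback idea above can be sketched as a small selection routine. This is only an illustration in Python: the profile names come from the post, but `pick_fragment_profile` and the extension sets attached to each profile are assumptions made for the example, not part of the Cg runtime.

```python
# Hypothetical helper: choose a Cg fragment profile name from the GL extension string.
# The required-extension sets are assumptions based on what each profile is said to target.
PROFILE_REQUIREMENTS = [
    ("fp30", {"GL_NV_fragment_program"}),
    ("arbfp1", {"GL_ARB_fragment_program"}),
    ("fp20", {"GL_NV_register_combiners", "GL_NV_texture_shader"}),
]


def pick_fragment_profile(extension_string):
    """Return the first (most capable) profile whose extensions are all present."""
    available = set(extension_string.split())
    for profile, required in PROFILE_REQUIREMENTS:
        if required <= available:  # every required extension is advertised
            return profile
    return None  # no programmable fragment path found
```

The list is ordered from most to least capable, so a shader compiled conditionally for each profile would pick up the richest path the hardware advertises and fall back to fp20 otherwise.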

The CgFX file format is really useful for this purpose. You can write a shader with CgFX and write multiple “techniques” in Cg embedded within the CgFX file for various GPU capabilities.

I hope this helps.

Mark

Arath · March 27, 2003, 10:06am

Mark, the Cg solution doesn’t work with the ATI 8500 at this time (and don’t tell me that one can use ARB_texture_env_combine). BTW, is the final combiner the only difference between DX8.1 pixel shaders and register combiners? I mean, nvparse can parse either code for a GF3, so is there any difference in terms of capabilities?

Arath

jra101 · March 27, 2003, 11:01am

Register combiners are a bit more powerful than DX8 pixel shaders, as they directly expose the GeForce3+ hardware.

vincoof · March 27, 2003, 12:07pm

Jason, correct me if I’m wrong, but NV_register_combiners doesn’t expose the dot3_rgba functionality that is available in DX8.1 pixel shaders.

jra101 · March 27, 2003, 12:12pm

dp4 in ps1.2+ is essentially the same as this:

dp3 r0.rgb, t0, t1
mad r0.a, t0.a, t1.a, r0.b

It’s just internally changed into multiple instructions (it counts as 2 instructions).
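To check the arithmetic of that two-instruction expansion, here is a minimal Python sketch. The function names are made up for the illustration, and it ignores the range clamping and fixed-point precision of real hardware.

```python
def dp3(a, b):
    """3-component dot product over the r, g, b channels."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def mad(a, b, c):
    """Multiply-add: a * b + c."""
    return a * b + c


def dp4_two_step(t0, t1):
    # dp3 r0.rgb, t0, t1  -> the scalar result is replicated into r, g and b
    r0_b = dp3(t0, t1)
    # mad r0.a, t0.a, t1.a, r0.b  -> add the product of the alpha channels
    return mad(t0[3], t1[3], r0_b)


def dp4_reference(t0, t1):
    """Direct 4-component dot product for comparison."""
    return sum(x * y for x, y in zip(t0, t1))
```

Note that in this expansion only r0.a ends up holding the full 4-component result; r0.rgb still holds the 3-component one.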

Coriolis · March 27, 2003, 12:25pm

You can do a 3-component dot product whose result gets stored in all components of rgba with register combiners. If you use two combiners, you can also do a full 4-component dot product.

vincoof · March 27, 2003, 10:45pm

Originally posted by Coriolis:
You can do a 3-component dot product whose result gets stored in all components of rgba with register combiners.

With two combiners I clearly see how, but I’d be glad to know how with one combiner.

Jason, if the dot4 instruction can be emulated using 2 combiners, imagine a pixel shader that calls 8 consecutive dot4 instructions: does that mean GF3-4 need 16 combiner stages to emulate such a shader?


imported_Asgard · March 27, 2003, 11:52pm

You can’t have 8 dp4s in a D3D 8.1 pixel shader version 1.2 or 1.3.
Quote from the Direct3D documentation:

For pixel shader version 1.2 and 1.3, dp4 counts as two arithmetic instructions.

Regards.
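The slot arithmetic behind that limit can be written out as a small Python sketch. The cost of two per dp4 is from the quoted documentation; the budget of eight arithmetic instruction slots for ps 1.x is an assumption stated here, and the function names are hypothetical.

```python
ARITHMETIC_SLOTS = 8  # assumed arithmetic instruction limit for ps 1.x
DP4_COST = 2          # per the quoted documentation for ps 1.2 and 1.3


def max_dp4_count(slots=ARITHMETIC_SLOTS, cost=DP4_COST):
    """Largest number of dp4 instructions that fits in the slot budget."""
    return slots // cost


def fits(num_dp4, other_arithmetic=0, slots=ARITHMETIC_SLOTS, cost=DP4_COST):
    """True if the shader's arithmetic instructions stay within the budget."""
    return num_dp4 * cost + other_arithmetic <= slots
```

Under those assumptions a shader can hold at most four dp4 instructions, so the 8-dp4 case never has to be emulated.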

imported_Asgard · March 28, 2003, 1:17am

On the topic of reg combiners/D3D pixel shaders: Does anybody know how to properly implement the D3D pixel shader ‘cmp’ instruction with register combiners?

system · March 28, 2003, 7:43am

In reg combiners, I think all you can do is use mux to get a cmp-like function.

imported_Asgard · March 28, 2003, 8:02am

Hmm, can you please be a bit more specific on how to do this? I’m looking for a good (or at least very close) replication of cmp’s functionality.
If D3D pixel shaders can do it with two instructions, there should be a way to do it in two combiners, but I don’t see how. Or maybe D3D pixel shaders use some GF3/4 feature not available through register combiners…

jra101 · March 28, 2003, 8:13am

On the topic of reg combiners/D3D pixel shaders: Does anybody know how to properly implement the D3D pixel shader ‘cmp’ instruction with register combiners?

You would use the mux operation. Here’s an example that chooses between two textures depending on the value stored in the alpha of the primary color:

!!RC1.0
{
  alpha
  {
    spare0 = col0.a;
  }
}
{
  rgb
  {
    discard = tex1.rgb;
    discard = tex0.rgb;
    spare0 = mux();
  }
}

If col0.a is greater than 0.5, tex0 is stored in spare0; otherwise tex1 is used.
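For readers who don't know nvparse syntax, the selection logic of the two stages can be modelled in a few lines of Python. This is a sketch of the behaviour as described in the post, not of the hardware; the function names are hypothetical, and what happens at exactly 0.5 is left as described above rather than checked against the extension spec.

```python
def mux(ab, cd, spare0_alpha):
    """Model of the combiner mux: pick cd when spare0's alpha is above 0.5, else ab."""
    # Threshold follows the post's wording; the exact-0.5 case is an assumption.
    return cd if spare0_alpha > 0.5 else ab


def select_texture(col0_alpha, tex0_rgb, tex1_rgb):
    # First combiner, alpha portion: spare0 = col0.a
    spare0_alpha = col0_alpha
    # Second combiner, rgb portion: the first discard (tex1) is AB, the second (tex0) is CD
    return mux(tex1_rgb, tex0_rgb, spare0_alpha)
```

The order of the two discard lines matters: the first one feeds the AB product and the second feeds CD, which is why tex1 is listed first in the script.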

imported_Asgard · March 28, 2003, 8:18am

Thanks Jason, that emulates D3D’s cnd instruction, but not the cmp instruction (which performs a component-wise comparison >= 0). I don’t really see a good way to do this (even when using more than one combiner stage).

vincoof · March 28, 2003, 9:40am

With enough combiner stages you could subtract the two colors to compare, add 0.5, and then use the mux() instruction for each component.
For a four-component, component-wise comparison you need at least 5 combiner stages, but it is still possible.
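The bias-then-mux idea can be sketched in Python to see that it reproduces cmp for non-zero inputs. Everything here is an illustrative model: the function names are hypothetical, the [-1, 1] clamp stands in for the combiner's value range, and the mux threshold follows the "greater than 0.5" wording used earlier in the thread. Since mux selects on a single alpha value, each component gets its own selection step.

```python
def clamp(x, lo=-1.0, hi=1.0):
    """Stand-in for the combiner's limited value range (assumed to be [-1, 1])."""
    return max(lo, min(hi, x))


def cmp_reference(src0, src1, src2):
    """D3D-style cmp: per component, src1 where src0 >= 0, else src2."""
    return tuple(b if a >= 0.0 else c for a, b, c in zip(src0, src1, src2))


def cmp_via_mux(src0, src1, src2):
    """Emulation: bias each component by 0.5, then mux on that one value."""
    out = []
    for a, b, c in zip(src0, src1, src2):
        selector = clamp(a + 0.5)  # the sign test becomes a 0.5 threshold test
        out.append(b if selector > 0.5 else c)  # one mux per component
    return tuple(out)
```

With the strict threshold used here the two functions disagree when a component of src0 is exactly zero, so how closely the emulation matches cmp depends on how the real mux treats a selector of exactly 0.5.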

jra101 · March 28, 2003, 1:31pm

Originally posted by Asgard:
Thanks Jason, that emulates D3D’s cnd instruction, but not the cmp instruction (which performs a component-wise comparison >= 0). I don’t really see a good way to do this (even when using more than one combiner stage).

This is ps1.2 (NV25+) functionality that isn’t exposed in OpenGL. You’ll need to do it using multiple muxes.

imported_Asgard · March 28, 2003, 2:08pm

Jason,
Thanks for the info. And I thought I was just too stupid to find a solution with two combiners.
Cheers.

jra101 · March 28, 2003, 2:12pm

I was thinking the same thing; I had to ask around to figure out what was up.

vincoof · March 31, 2003, 12:33am

If OpenGL vendor-specific extensions directly expose the hardware functionality, how come some D3D features aren’t available in OpenGL? Do we need some kind of NV_register_combiners3 extension?

Korval · March 31, 2003, 8:30am

If OpenGL vendor-specific extensions directly expose the hardware functionality, how come some D3D features aren’t available in OpenGL? Do we need some kind of NV_register_combiners3 extension?

Like what? What does D3D expose that OpenGL does not? If you’re talking about that one opcode (the name of it slips my mind), bear in mind that it takes 2 instruction slots in D3D, which is exactly what RCs would take.