PixelShader Questions

Hi

If I have a gfx card that supports ARB_fragment_program, does it make sense NOT to use that extension and instead use ARB_texture_env_combine or register combiners, etc. (if possible)?

I mean, are those other (older) extensions hardwired and therefore always present (and fast), or do they get translated into fragment programs by the driver?

Especially on the GeForce FX I am very unsure, because fragment programs “unlock” 12 additional texture units and 4 additional texcoords, so it wouldn’t surprise me if it’s a good thing not to use fragment programs whenever possible.

Thanks,
Jan.

On Radeons, everything will be roughly the same speed, fillrate-wise. Changing ARB_fp programs is still a bit slower than changing fixed function state. This may or may not be a problem, depending on how you generate your shader code (JIT or “on level load”).
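For what it’s worth, a minimal sketch of the “on level load” approach, assuming the ARB_fragment_program entry points have already been obtained through your platform’s GetProcAddress mechanism (the shader source here is purely illustrative):

    #define GL_GLEXT_PROTOTYPES   /* or load the entry points manually on Windows */
    #include <GL/gl.h>
    #include <GL/glext.h>
    #include <string.h>

    /* Compile once at level load; per frame, only bind. */
    static GLuint load_fragment_program(const char *src)
    {
        GLuint prog;
        glGenProgramsARB(1, &prog);
        glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
        glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(src), src);
        return prog;
    }

This way the potentially expensive compile in glProgramStringARB happens up front, and the per-frame cost is only the glBindProgramARB switch.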

You’re right about the GeForce FX line, though. In some models there are still integer units augmenting the floating point units. While it’s possible that the integer units get used even for ARB_fp, it’s more work for the driver to figure out the correct circumstances, and it may not work well at all depending on the driver version.

NV_fragment_program is the way to get best performance out of NV3x (if only because it explicitly supports integer, half float and full float data types).
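To illustrate the data type point, here is a tiny, purely illustrative NV_fragment_program snippet; the mnemonic suffix selects the precision (R = 32-bit float, H = 16-bit half, X = 12-bit fixed):

    const char *nv_fp =
        "!!FP1.0\n"
        "TEX H0, f[TEX0], TEX0, 2D;\n"  /* sample into a half-precision temp */
        "MULH H0, H0, f[COL0];\n"       /* half multiply; MULR/MULX would be float/fixed */
        "MOVR o[COLR], H0;\n"           /* write the final color */
        "END\n";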

If you don’t need floating point data at all, you can just stick to fixed function (or NV_register_combiners + NV_texture_shader{2}). It’s harder to program for, but it again reduces pressure on the driver.

Another thing you should think about is atomicity of shader objects. If you need lots of similar shaders with only a few differences, you need a full shader program for each ‘version’. If you can achieve the same by flipping only a few bits in a fixed function setup, this will most likely be more efficient.
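A hedged illustration of that atomicity point (the combine enums are from ARB_texture_env_combine; the program handles are hypothetical):

    /* Fixed function: a shader 'variant' is one enum flipped. */
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE_ARB);
    glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB_ARB, GL_MODULATE);        /* variant A */
    /* ... draw ... */
    glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB_ARB, GL_INTERPOLATE_ARB); /* variant B */

    /* ARB_fp: each variant is a whole separate program object. */
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, progVariantA);
    /* ... draw ... */
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, progVariantB);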

I guess a big performance difference comes from the fact that with fragment programs you can do a lot more in one pass than with the fixed function path, especially with 16 texture units, so you simply need fewer rendering passes.

ARB_texture_env_combine is quite restricted anyway; even in two passes you cannot do really “state of the art” things (only diffuse+specular bump mapping with a gloss map, but no specular exponent above 1). RC is a lot better than ARB in these respects (well, I am quite surprised, I thought you would know all about that).
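For concreteness, the missing exponent is a single POW instruction in ARB_fp, for which the combine environment has no counterpart. A minimal, hypothetical snippet, assuming N·H was routed into texcoord 1 by earlier setup:

    const char *spec_fp =
        "!!ARBfp1.0\n"
        "PARAM shininess = { 32.0, 0.0, 0.0, 0.0 };\n"
        "TEMP spec;\n"
        "POW spec.x, fragment.texcoord[1].x, shininess.x;\n"  /* (N.H)^32 */
        "MOV result.color, spec.x;\n"
        "END\n";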

I guess the rendering speed itself does not differ at all, no matter whether you use ARB, RC or FP (as long as you stay within certain limitations, like a maximum of two general combiners with RC, etc.).

I would use fp anyway, if a) my card supported it, b) I were able to, and c) I were sure that my client also has FX/Radeon cards, which is not the case.

Jan

Originally posted by JanHH:
even in two passes you cannot do really “state of the art” things (only diffuse+specular bump mapping with a gloss map, but no specular exponent above 1)

No, you can’t? This: http://www.hut.fi/~ikuusela/images/goodspecular.jpg http://www.hut.fi/~ikuusela/images/inengine.jpg

…is all done with the ARB path and 2 texture units, in three passes per light; without shadows it could be done in two. The specular isn’t exactly raised to an exponent, but I can choose the sharpness of the highlight freely, and the result looks OK to me.
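I can’t say what Ilkka’s passes do exactly, but one classic way to sharpen a highlight without a true exponent is multiplicative framebuffer blending: re-drawing the same (N·H) term with a multiply blend raises its power by one per pass. A sketch, with drawSpecularTerm() as a hypothetical helper:

    /* Pass 1: write the raw specular term (e.g. a DOT3 result). */
    drawSpecularTerm();

    /* Further passes: dst = src * dst, so each pass multiplies the
       term in once more, giving (N.H)^k after k passes. */
    glEnable(GL_BLEND);
    glBlendFunc(GL_DST_COLOR, GL_ZERO);
    drawSpecularTerm();   /* (N.H)^2 */
    drawSpecularTerm();   /* (N.H)^3 */
    glDisable(GL_BLEND);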

-Ilkka

Originally posted by JanHH:
(well, I am quite surprised, I thought you would know all about that)

Yes, I DO know what is possible with the different extensions, and I know their limitations. The question is only whether more recent hardware implements the “old stuff” by replacing it with a fragment program, which would mean it does not matter which extension I use, or whether it still supports those old extensions in REAL hardware, so that I would get a speedup by using “the old stuff” instead of “the new stuff” whenever possible.

Well, if I understand zeckensack correctly, I don’t have to bother about all that on Radeon 9500+, but I should bother on GeForce FX cards.

Thanks,
Jan.

Whether you care about Radeon cards or not is your choice; I would make it dependent on what audience you are writing for. Does this even matter when using ARB_vp and ARB_fp? As far as I understand it, both chipsets support them, so the program should run on both!?

And regarding your initial question: most of us do not know at all what the driver is doing and what happens inside the hardware. I really guess that on the GeForce FX the driver converts everything into the same sort of instructions, whether it be ARB, RC or FP, whatever those may look like.

Jan

Originally posted by JanHH:
Whether you care about Radeon cards or not is your choice; I would make it dependent on what audience you are writing for. Does this even matter when using ARB_vp and ARB_fp? As far as I understand it, both chipsets support them, so the program should run on both!?
Yes, it will run on both.

Aren’t there NVIDIA driver guys around here who could answer Jan’s question?

But I really think that inside the hardware there is one way of programming the fragment shading pipeline, and ARB, RC and FP are all translated to that. I cannot imagine that it would be useful to have three co-existing rendering paths in hardware if you can emulate the less powerful ones with the newest one. Also, the chip isn’t designed for OpenGL only but also for D3D, so it has to support the D3D shader stuff as well… So I really think there is ONE way the hardware does it, and the driver does the translation work. These drivers must be very complicated pieces of software…

Jan

The way I understand it is that ATI is using one path while NV is using three. NV supports 12-, 16- and 32-bit modes while ATI does 24-bit. If I were coding for NVIDIA I would use their IHV extensions for speed. ATI has that nice 1.4 shader while NV has to use the less flexible texture shaders. But the kicker is that texture shaders are still faster even if multipassed; John Carmack’s '03 .plan file has details. ATI did a really good job with ARB shaders, imo, and NV40 will have to compete with that. ATI’s new lineup is around the corner as well, as is Doom 3. It’s going to be interesting to watch.

On the FX5600, RC is “emulated” via fragment programs with fixed-point precision. I mean, both of them deliver the same performance, as my tests show. However, CineFX 1.0 cards do have RC units, if one believes what people say.
My opinion is to use fragment programs only when you really need them. I guess NVIDIA had the same guideline while designing CineFX, and that’s why the card was such a fiasco. I like NV_FP otherwise (what about the RFL instruction?)
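For anyone wondering: RFL computes a reflection vector in a single NV_fragment_program instruction, something that otherwise costs several DP3/MAD instructions. A tiny illustrative use, with the inputs assumed set up beforehand:

    const char *nv_rfl =
        "!!FP1.0\n"
        "TEX H0, f[TEX0], TEX0, 2D;\n"  /* H0 = normal from a normal map */
        "RFLH H1, H0, f[TEX1];\n"       /* H1 = f[TEX1] reflected about H0 */
        "MOVR o[COLR], H1;\n"
        "END\n";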

JD, I do not understand even half of what you are saying.

What are 12-, 16- and 32-bit modes, and what is 24-bit mode?

which “ihv extensions” do you mean?

what is a “nice 1.4 shader”?

what are “less flexible texture shaders”, in comparison to the above?

Either I am too far behind (which might be the case) or you are too advanced.

Jan

Originally posted by JanHH:
What are 12-, 16- and 32-bit modes, and what is 24-bit mode?

Precision. 12-bit fixed point, and 16-, 32- and 24-bit floats.


which “ihv extensions” do you mean?

NV_fragment_program


what is a “nice 1.4 shader”?

DirectX pixel shader version 1.4, for ATI hardware. It was more powerful than all the other stuff (support for indirections etc.). I guess it was for the Radeon 8500 (not sure).


what are “less flexible texture shaders”, in comparison to the above?

NV_texture_shader

thx

But this sounds like “ATI is better than NVIDIA”!?

Jan

I don’t think it would be a good idea to start another ATI vs. NVIDIA battle now.

I’m confused now (or rather, more confused than I already was).

Basically, NVIDIA screwed up as far as ARB fragment programs go. The GeForce FX needs reduced precision so its hardware can be speedy. As far as older cards go, I think the 8500 is comparable to the GF3/GF4, and any lower version of ATI hardware is worse than the equivalent NV hardware. The 5900SE/XT is the only GeForce FX card in the series that makes sense buying. It’s a high end card shoved pricewise into the mid range market and is no doubt the cash cow for NV.

Well, with the FX5200, FX5600, and FX5800, floating point performance, regardless of precision, is kinda sucky. The best speed from those cards comes from using the fixed point format (12-bit integer) in combination with floating point, with the majority of instructions being FX12, of course. With the FX5700 and FX5900, however, one can use floating point formats much more freely. As long as you keep the register usage down, performance should be decent. All FP16 does for those cards is allow them to use more registers without the performance hit FP32 incurs.
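For illustration, the closest ARB_fp itself gets to requesting reduced precision is its precision hint option, and keeping the TEMP count down addresses the register pressure; a minimal sketch:

    const char *fast_fp =
        "!!ARBfp1.0\n"
        "OPTION ARB_precision_hint_fastest;\n"  /* let the driver drop below fp32 */
        "TEMP c;\n"                             /* few temporaries = less register pressure */
        "TEX c, fragment.texcoord[0], texture[0], 2D;\n"
        "MUL result.color, c, fragment.color;\n"
        "END\n";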

EDIT: Basically, in the FX5700 and FX5900, it’s not the precision that slows the cards down, but the way NVidia designed the registers.

[This message has been edited by Ostsol (edited 01-03-2004).]

The 5900SE/XT is the only GeForce FX card in the series that makes sense buying.

Actually, the 5700 cards are pretty good, too, compared to the 9600XT cards.

EDIT: Basically, in the FX5700 and FX5900, it’s not the precision that slows the cards down, but the way NVidia designed the registers.

It’s not even that, really. With the NV35+ based cards, the performance difference mostly comes down to R300 cards having dual texture/opcode instruction issuing, though the register thing doesn’t help either.

[This message has been edited by Korval (edited 01-03-2004).]

Korval, I agree. The 5900 is 256-bit on the memory bus while the 5700 is only 128-bit; the 5700 has higher core clocks (good) but costs as much as, if not more than, the 5900SE/XT, so it’s meh.

Anand had a good 5700 review that explained the changes in the hardware. I don’t remember the details.