Fragment shader length: ATI vs nVidia

Hi, I read some people saying it was easy to produce code that exceeds the maximum length of the fragment shading unit on ATI boards when coding in GLSL, which would force CPU-based execution of the code.
But how much shorter is it than nVidia’s fragment shading unit? Are there any benchmarks I could consult to get more precise info? Thanks a lot.

If the GLSL implementations are based on the previous fragment_program extensions, querying GL_MAX_PROGRAM_INSTRUCTIONS_ARB, defined in ARB_fragment_program, should give you the maximum number of instructions on your chip.
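
Something along these lines should do it (an untested sketch; it assumes a current GL context and that the ARB_fragment_program entry points have been loaded, e.g. via wglGetProcAddress/glXGetProcAddress):

```c
#include <stdio.h>
#include <GL/gl.h>
#include <GL/glext.h> /* ARB_fragment_program tokens */

/* Query the overall fragment program instruction limit. Assumes
 * glGetProgramivARB has already been loaded. */
void print_fp_limit(void)
{
    GLint maxInstructions = 0;
    glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                      GL_MAX_PROGRAM_INSTRUCTIONS_ARB,
                      &maxInstructions);
    printf("Max fragment program instructions: %d\n", maxInstructions);
}
```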

nVidia’s NV3x generation is able to run shaders with a length of 1024 instructions.
ATI’s R3xx generation is able to execute shaders with a length of about 94 (or 96, I don’t know right now) instructions. Theoretically, ATI R3xx hardware supports a technique called F-Buffer which allows splitting shaders up and executing them in multiple passes, but there’s no driver support yet…

All right, thank you guys. One more question: using GLSL, how can I figure out the actual length (asm equivalent) of a shader I’ve just written? How can I get feedback from the API after the compilation step?

There’s no way to get the actual length of the shader you have written. The only feedback you will get is the InfoLog.
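
Fetching it looks roughly like this with the ARB_shader_objects interface that GLSL currently lives in (untested sketch; assumes the handle has already been through glCompileShaderARB):

```c
#include <stdio.h>
#include <stdlib.h>
#include <GL/gl.h>
#include <GL/glext.h> /* ARB_shader_objects tokens */

/* Print whatever the driver left in the info log after compilation. */
void print_info_log(GLhandleARB shader)
{
    GLint logLength = 0;
    glGetObjectParameterivARB(shader, GL_OBJECT_INFO_LOG_LENGTH_ARB,
                              &logLength);
    if (logLength > 1) {
        GLcharARB *log = (GLcharARB *)malloc(logLength);
        glGetInfoLogARB(shader, logLength, NULL, log);
        printf("InfoLog:\n%s\n", log);
        free(log);
    }
}
```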

So I guess I’m in for some sort of time-consuming trial-and-error process… Who said that GLSL was better than ASM :smiley: ?

R4x0 will have longer shaders (I don’t know exactly, but at least >= 256 fp instructions),
NV4x - 24K(!) fp instructions…
To Corrail:
R3x0 - 64 fp instructions.

Only 64? I thought it was about 94…
Thanks for the info!

I believe it’s actually 64 ALU instructions and 32 texture instructions, for a total maximum of 96 instructions. NV3x does not make this distinction, so the maximum of 1024 is regardless of instruction type.
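
If you want to check the split on your own board, ARB_fragment_program exposes separate ALU and TEX queries (untested sketch, same loading assumptions as the query above):

```c
#include <stdio.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Per-type limits; these pnames are only valid for the
 * GL_FRAGMENT_PROGRAM_ARB target. */
void print_fp_split(void)
{
    GLint maxAlu = 0, maxTex = 0;
    glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                      GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB, &maxAlu);
    glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                      GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB, &maxTex);
    printf("ALU limit: %d, TEX limit: %d\n", maxAlu, maxTex);
}
```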

– Tom

What the R3xx really provides internally is 32 triple opcodes. Each opcode may include a texture addressing operation, a 3-vector operation, and/or a scalar operation. So, using clever operation reordering, you can get 96. But, technically, you only have 32 of each type of instruction.

So I guess I’m in for some sort of time-consuming trial-and-error process
That’s the process everyone has to go through when working on limited systems. When people use machines without virtual memory and only 16MB of RAM, they keep building stuff until it runs out of memory, then they start cutting things. The only difference is that you aren’t aware of whether or not any particular cut will be meaningful until you actually do it.

No, you can do 64 vector operations. The 96-instruction figure only includes “normal” fragment program ALU ops and texture ops (64 ALU, 32 TEX). In practice, you might be able to fit more instructions by utilising the separate scalar unit, but that isn’t guaranteed.

In real apps, usually 1-4 TEX instructions and a lot of ALU instructions are used. For this reason I didn’t count TEX instructions together with ALU instructions.

That’s the process everyone has to go through when working on limited systems. When people use machines without virtual memory and only 16MB of RAM, they keep building stuff until it runs out of memory, then they start cutting things. The only difference is that you aren’t aware of whether or not any particular cut will be meaningful until you actually do it.
For any language you can get an idea of the size of the binary code once it’s compiled. I guess GLSL just gets compiled by the driver to ARB asm instructions, so we should be able to get better feedback from the compiler. I know GLSL was designed to be higher level and to abstract away from the hardware, but c’mon man, we’re talking real-time operation here :wink: so we need some way to figure out (even roughly) how many instructions the shaders get to eat…

I guess GLSL just gets compiled by the driver to ARB asm instructions
Then you guessed wrong. Glslang is compiled directly to the hardware opcodes.

but c’mon man, we’re talking real-time operation here so we need some way to figure out (even roughly) how many instructions the shaders get to eat
Why do you think I argued against a high-level shading language at this juncture? It’s too soon to be abstracting away stuff. But no, the ARB (mostly 3DLabs) just had to have their language incorporated into the driver…
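
For what it’s worth, one rough workaround today: hand-port the shader to an ARB_fragment_program string and let the driver tell you how many native instructions it compiled to (untested sketch; error checking omitted, and the count won’t exactly match what the glslang compiler would generate):

```c
#include <stdio.h>
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Load a !!ARBfp1.0 program string and report the driver’s native
 * instruction count. Assumes the ARB entry points are loaded. */
void report_native_count(const char *src)
{
    GLuint prog = 0;
    GLint native = 0;

    glGenProgramsARB(1, &prog);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB,
                       GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(src), src);
    glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                      GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB, &native);
    printf("Native instructions: %d\n", native);
}
```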
