Fragment program on ATI FireGL X1

Hi,

I have a problem with a fragment program on an ATI card. When I load the following program, the routine tells me “native resource exceeded” or something like that.
On a GeForce FX it works fine. I read the spec section on texture indirections, but I did not understand it very well; maybe that is the problem. I know reading assembler code is rather boring, but I would appreciate any help!

!!ARBfp1.0

PARAM u1 = program.local[1];
PARAM u0 = program.local[0];
PARAM u2 = program.local[2];
PARAM u3 = program.local[3];
PARAM c0 = {2, 2, 2, 1};
PARAM c1 = {0.5, 0.5, 0.5, 0};
PARAM c2 = {2.7182817, -5.5599999, 0, 0};
TEMP R0;
TEMP R1;
TEMP R2;
TEMP R3;
TEMP H0;
TEX R0.xyz, fragment.texcoord[1].xyxx, texture[1], 2D;
TEX R1.w, fragment.texcoord[3].xyzx, texture[3], 3D;
ADD R0.xyz, R0.xyzx, -c1.x;
MUL R0.xyz, c0.x, R0.xyzx;
MUL R0.w, c2.y, R1.w;
POW R0.w, c2.x, R0.w;
MOV H0.w, c1.w;
DP3 R1.x, fragment.texcoord[5].xyzx, R0.xyzx;
DP3 R0.x, fragment.texcoord[6].xyzx, R0.xyzx;
MOV R1.y, R0.x;
TEX R1, R1.xyxx, texture[2], 2D;
TEX R2.xyz, fragment.texcoord[0].xyxx, texture[0], 2D;
MUL R0.xyz, R1.xyzx, R2.xyzx;
MUL R1.xyz, R1.w, u1.xyzx;
MAD R1.xyz, R0.xyzx, u0.xyzx, R1.xyzx;
MOV R0.x, u3.x;
ADD R0.x, c1.w, -R0.x;
CMP H0.x, R0.x, c0.w, H0.w;
ADD H0.x, -H0.x, c0.w;
TEX R2.w, fragment.texcoord[3].xyzx, texture[3], 3D;
TEX R3.w, fragment.texcoord[0].xyxx, texture[0], 2D;
MOV R1.w, R3.w;
ADD R0.x, c0.w, -R2.w;
CMP R0.x, -H0.x, R0.w, R0.x;
MUL R0.x, R0.x, u2.x;
MUL R0.xyz, R1.xyzx, R0.x;
MOV R1.xyz, R0.xyzx;
MOV result.color, R1;
END

# 28 instructions, 4 R-regs, 1 H-regs.
# End of program

Since the FireGL X1 is based on the R300 chip, I thought there should be no problem with this program.
Thanks for any help!

[This message has been edited by mako (edited 09-18-2003).]
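
On the texture indirection question: as far as I understand the counting rule in the ARB_fragment_program spec, a program is split into "nodes", and a TEX starts a new node (one more indirection) when its coordinate is a temporary already written in the current node, or when its result register was already read or written by an ALU instruction in the current node. Here is a rough Python sketch of that rule. The function name `count_tex_indirections` is made up, the parser is deliberately naive (whole registers only, no KIL, no `_SAT` handling), and the driver's native count may well differ from this API-level count.

```python
def count_tex_indirections(program):
    """Count API-level texture indirections in ARBfp source (simplified sketch)."""
    temps = set()        # names declared with TEMP
    written = set()      # temporaries written in the current node (TEX or ALU)
    alu_touched = set()  # temporaries read or written by ALU ops in the current node
    indirections = 1     # even a program without dependent reads has one node
    for raw in program.splitlines():
        line = raw.split("#")[0].strip().rstrip(";").strip()
        if not line or line.startswith("!!"):
            continue
        op, _, rest = line.partition(" ")
        if op == "TEMP":
            temps.update(name.strip() for name in rest.split(","))
            continue
        if op in ("PARAM", "ATTRIB", "OUTPUT", "ALIAS", "OPTION", "END"):
            continue
        # Reduce each operand to its register name: drop sign, mask and swizzle.
        names = [o.strip().lstrip("-").split(".")[0] for o in rest.split(",")]
        dst, srcs = names[0], names[1:]
        if op in ("TEX", "TXP", "TXB"):
            coord = srcs[0]
            if coord in written or dst in alu_touched:
                indirections += 1          # start a new node
                written.clear()
                alu_touched.clear()
            if dst in temps:
                written.add(dst)
        else:
            alu_touched |= {n for n in names if n in temps}
            if dst in temps:
                written.add(dst)
    return indirections


# Hypothetical test program: the second TEX uses R1, which an ALU op just wrote.
EXAMPLE = """
!!ARBfp1.0
TEMP R0, R1;
TEX R0, fragment.texcoord[0], texture[0], 2D;
MUL R1, R0, R0;
TEX R0, R1, texture[1], 2D;
MOV result.color, R0;
END
"""
print(count_tex_indirections(EXAMPLE))  # 2
```

What the driver itself reports can be read back with glGetProgramivARB, using GL_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB and GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB.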

You use unneeded swizzles all over the place. While the program should still run, this increases pressure on the compiler. xyzx and xyxx probably count as arbitrary swizzles, something R300 doesn’t really handle natively. Maybe this causes the front end to ‘panic’.
Have you tried simplifying the program?
I.e.:

!!ARBfp1.0

PARAM u1 = program.local[1];
PARAM u0 = program.local[0];
PARAM u2 = program.local[2];
PARAM u3 = program.local[3];
PARAM c0 = {2, 2, 2, 1};
PARAM c1 = {0.5, 0.5, 0.5, 0};
PARAM c2 = {2.7182817, -5.5599999, 0, 0};
TEMP R0;
TEMP R1;
TEMP R2;
TEMP R3;
TEMP H0;
TEX R0.xyz, fragment.texcoord[1], texture[1], 2D;
TEX R1.w, fragment.texcoord[3], texture[3], 3D;
ADD R0.xyz, R0, -c1.x;
MUL R0.xyz, c0.x, R0;
MUL R0.w, c2.y, R1.w;
POW R0.w, c2.x, R0.w;
MOV H0.w, c1.w;
DP3 R1.x, fragment.texcoord[5], R0;
DP3 R0.x, fragment.texcoord[6], R0;
MOV R1.y, R0.x;
TEX R1, R1, texture[2], 2D;
TEX R2.xyz, fragment.texcoord[0], texture[0], 2D;
MUL R0.xyz, R1, R2;
MUL R1.xyz, R1.w, u1;
MAD R1.xyz, R0, u0, R1;
MOV R0.x, u3;
ADD R0.x, c1.w, -R0.x;
CMP H0.x, R0.x, c0.w, H0.w;
ADD H0.x, -H0.x, c0.w;
TEX R2.w, fragment.texcoord[3], texture[3], 3D;
TEX result.color.w, fragment.texcoord[0], texture[0], 2D;
ADD R0.x, c0.w, -R2.w;
CMP R0.x, -H0.x, R0.w, R0.x;
MUL R0.x, R0, u2;
MUL result.color.xyz, R1, R0.x;
END

# 28 instructions, 4 R-regs, 1 H-regs.
# End of program

This simplification does not change the output.
More could be done; this is only a first pass at removing unneeded swizzles.

Amazing! It really works! Thanks for your quick response. I compiled the code using NVIDIA’s Cg compiler, because I am not that experienced with fragment programs. I now see that Cg’s output is not very well optimized.
Thanks again!

ATI may be interested in this so that they can fix it. It is, after all, a valid program. The swizzles didn’t actually do anything.
There were two cases:
1) Swizzling in a component that cannot influence the result because of the destination write mask.
2) Swizzling components into texture coords beyond what is needed (i.e. TEX <…>,2D only needs the xy coords and ignores the rest anyway).
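
Both cases come down to one check: a source swizzle does nothing if every component the instruction actually consumes stays in its own slot. A minimal Python sketch of that check follows. The helper names `expand_swizzle` and `swizzle_is_redundant` are made up for illustration, and the caller has to say which components are consumed (the write mask for component-wise ops, `xy` for a 2D TEX).

```python
COMPONENTS = "xyzw"


def expand_swizzle(swizzle):
    """Expand an ARBfp source swizzle suffix to four components.

    No suffix means the identity 'xyzw'; a single component is replicated.
    """
    if swizzle == "":
        return "xyzw"
    if len(swizzle) == 1:
        return swizzle * 4
    if len(swizzle) != 4:
        raise ValueError("swizzle must have 0, 1 or 4 components")
    return swizzle


def swizzle_is_redundant(swizzle, needed):
    """True if dropping the swizzle leaves every needed component unchanged."""
    full = expand_swizzle(swizzle)
    return all(full[COMPONENTS.index(c)] == c for c in needed)


# Case 1: ADD R0.xyz, R0.xyzx, ... -> the swizzled-in w slot is masked out.
print(swizzle_is_redundant("xyzx", "xyz"))  # True
# Case 2: TEX ..., fragment.texcoord[1].xyxx, ..., 2D -> only xy are consumed.
print(swizzle_is_redundant("xyxx", "xy"))   # True
# A broadcast such as c1.x feeding an xyz write is NOT redundant.
print(swizzle_is_redundant("x", "xyz"))     # False
```

Note that this only covers component-wise instructions and texture lookups; DP3, for example, always consumes xyz of both sources regardless of the write mask.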

You should mail devrel@ati.com, pointing them to this thread, and mention your driver version.

Edit: never mind. I just did the email thing.

[This message has been edited by zeckensack (edited 09-18-2003).]

Just for kicks, here’s a fully optimal version (wrt instruction count and temporaries). Scalars have been moved to the alpha channel to exploit co-issue (helps ATI, shouldn’t hurt NVIDIA; this may already be optimized to work this way inside the driver). Untested.

!!ARBfp1.0

PARAM u1 = program.local[1];
PARAM u0 = program.local[0];
PARAM u2 = program.local[2];
PARAM u3 = program.local[3];
PARAM c0 = {2, 2, 2, 1};
PARAM c1 = {0.5, 0.5, 0.5, 0};
PARAM c2 = {2.7182817, -5.5599999, 0, 0};
TEMP R0;
TEMP R1;
TEX R0.xyz, fragment.texcoord[1], texture[1], 2D;
TEX R0.w, fragment.texcoord[3], texture[3], 3D;
# (R0 - 0.5) * 2 == R0 * 2 - 1, so the addend is -c0.w (1), not -c1.x (0.5).
MAD R0.xyz, R0, c0.x, -c0.w;
MUL R0.w, R0.w,c2.y;
POW R0.w, c2.x, R0.w;
DP3 R1.x, fragment.texcoord[5], R0;
DP3 R1.y, fragment.texcoord[6], R0;
TEX R0.xyz, fragment.texcoord[0], texture[0], 2D;
TEX R1, R1, texture[2], 2D;
MUL R0.xyz, R1, R0;
MUL R1.xyz, R1.w, u1;
MAD R1.xyz, R0, u0, R1;
TEX R1.w, fragment.texcoord[3], texture[3], 3D;
TEX result.color.w, fragment.texcoord[0], texture[0], 2D;
SUB R1.w,c0.w,R1.w;
CMP R1.w,-u3.x,R1.w,R0.w;
# With a .w write mask an unswizzled u2 would supply u2.w; the original multiplies by u2.x.
MUL R1.w, R1.w, u2.x;
MUL result.color.xyz, R1, R1.w;
END

I have to say the compiler overlooked a couple of fairly obvious things. Is that a current version? Quite horrible, for my tastes …