Does this answer your question?
The short answer is that I reorganized the ARB assembler instructions of my fragment program by hand, using my newly acquired ideas and knowledge, and that eliminated the texture indirection error.
Oh, before that: in my earlier response to Jwatte I forgot to mention that I only checked with the Cg compiler. I know next to zip about GLSL, which I think his code was. The Cg compiler outputs the same thing whether the HLSL is written in a temporary-heavy fashion or an indirection-heavy fashion; it favored the indirection-heavy organization in its fragment output code either way.
I load up my fragment and vertex programs on my own and use the ARB extensions to control them in OGL, just the way they do it in the ATI Simple Shader demo, the one with the cute little elephant.
You should be able to copy this code and use it straight away with Cg or the ARB_ extensions. My hardware is a Radeon 9800; I have no idea whether it would run, or what the result would be, on anything else, but it works famously on mine.
!!ARBfp1.0
# ARB_fragment_program generated by NVIDIA Cg compiler
# cgc version 1.1.0003, build date Jul 7 2003 11:55:19
# command line args: -profile arbfp1 -entry ps_main
#vendor NVIDIA Corporation
#version 1.0.02
#profile arbfp1
#program ps_main
#semantic Texture0
#var sampler2D Texture0 : : texunit 0 : -1 : 1
#var float4 inDiffuse : $vin.COLOR0 : COLOR0 : 0 : 1
#var float2 tex : $vin.TEXCOORD0 : TEXCOORD0 : 1 : 1
#var float4 ps_main : $vout.COLOR0 : COLOR0 : -1 : 1
PARAM c0 = {0.11111111, 0.0024999999, 0, 0};
PARAM c1 = {-0.0024999999, 0, 0, 0.0024999999};
PARAM c2 = {0, -0.0024999999, -0.0024999999, 0.0024999999};
PARAM c3 = {0.0024999999, 0.0024999999, 0.0024999999, -0.0024999999};
TEMP R0;
TEMP R1;
TEMP R2;
TEMP R3;
TEMP R4;
TEMP R5;
TEMP R6;
TEMP R7;
TEMP R8;
MOV R0.xy, fragment.texcoord[0];
ADD R1.xy, R0, c0.yzyy;
ADD R2.xy, R0, c1.xyxx;
ADD R3.xy, R0, c1.zwzz;
ADD R4.xy, R0, c2.xyxx;
ADD R5.xy, R0, c2.yzyy;
ADD R6.xy, R0, c2.zwzz;
ADD R8.xy, R0, c3.zwzz;
ADD R7.xy, R0, c3.xyxx;
TEX R0, R0, texture[0], 2D;
TEX R1, R1, texture[0], 2D;
TEX R2, R2, texture[0], 2D;
TEX R3, R3, texture[0], 2D;
TEX R4, R4, texture[0], 2D;
TEX R5, R5, texture[0], 2D;
TEX R6, R6, texture[0], 2D;
TEX R7, R7, texture[0], 2D;
TEX R8, R8, texture[0], 2D;
ADD R0,R0,R1;
ADD R0,R0,R2;
ADD R0,R0,R3;
ADD R0,R0,R4;
ADD R0,R0,R5;
ADD R0,R0,R6;
ADD R0,R0,R7;
ADD R0,R0,R8;
MUL result.color, R0, c0.x;
END
# 26 instructions, 2 R-regs, 0 H-regs.
# End of program
Btw, ignore the # comments: this is modified from the original compiler output, so they are all pretty irrelevant anyway.
Jes