Problems with dot3 Cg shader on different profiles

Hi,

I’ve implemented a dot3 Cg shader and it works fine on my GeForce 6800 GT with vp40/fp40. But I’ve got several problems running that shader on older hardware …

After some tests I’ve figured out that the fragment program is the part that will not run. Here is my code:

// input structure 
struct vert2frag 
{ 
    float2 TextureCoords      :TEXCOORD0; 
    float3 TangentSpaceLightPos   :TEXCOORD1; 
}; 

void main(const vert2frag IN, out float3 oColor:COLOR, 
        float4 AmbientLight, 
        float4 LMs : TEXCOORD4, 
        const uniform sampler2D NormalMap, 
        const uniform sampler2D DecalMap ) 
{ 
   // normalize Light 
   float3 normTangentSpaceLightPos = normalize(IN.TangentSpaceLightPos); 

   // Do Dot3 calculation (N.L)  (multiplication for more reflection) 
   float3 nmap= 2*(tex2D(NormalMap,IN.TextureCoords)-0.5).rgb; 
        
    //adding decal color (multiplication for better color results) 
   oColor = dot(nmap.xyz, normTangentSpaceLightPos)*tex2D(DecalMap,IN.TextureCoords); 
}

On fp30 I get no compile errors, but I see nothing. I’ve checked the code and it is being rendered … On fp20 I get a compile error saying that normalize will not work … but I can’t believe that this simple and necessary function is not supported …

Does anyone know what is happening and could help?
Thanks in advance,
Christian

What driver version are you using? Also can you post an assembly output of your fragment program in both fp30 and fp40?

Have you tried the ARBfp1 profile?

Also, fp20 does not support a fragment-level normalize function. On that level of hardware, the only per-fragment normalization you are going to do is with a normalization cubemap lookup. I think it also helps to work with the register combiners directly for a while first, so you become familiar with what that hardware can do. Otherwise it is very easy to write a high-level shader that is not compatible with fp20 because the programmer does not know the limitations. Having used register combiners a lot beforehand really helped me when writing fp20 Cg shaders. I also notice you are using TEXCOORD4 as one of the inputs. Under fp20 you only have 4 texture units (TEXCOORD0 to TEXCOORD3); TEXCOORD4 would be a 5th unit.

Also, you will get better performance if you use the lower texture coordinate units first. So instead of using TEXCOORD4 in your fragment program, use TEXCOORD2 (since 0 and 1 are already being used). It probably won’t make a huge difference here, but it’s just good practice because every little bit helps.
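To make the cubemap suggestion concrete, here is a rough, untested sketch of a dot3 fragment program that replaces normalize() with a normalization cubemap lookup. It assumes the application has built and bound such a cubemap (each texel storing the range-compressed unit vector for its direction); the names NormalizeCube, DecalCoords, NormalCoords and the expand() helper are placeholders, not part of the original shader. Each texture is fetched with the texture coordinate set that matches its unit, and the light vector sits on TEXCOORD2. It has not been compiled against fp20, so treat it as a starting point only.

```cg
// Hypothetical helper: map a range-compressed [0,1] value back to [-1,1].
float3 expand(float3 v)
{
    return (v - 0.5) * 2;
}

struct vert2frag
{
    float2 DecalCoords          : TEXCOORD0;  // fetches DecalMap (unit 0)
    float2 NormalCoords         : TEXCOORD1;  // fetches NormalMap (unit 1)
    float3 TangentSpaceLightPos : TEXCOORD2;  // unnormalized light vector
};

float4 main(vert2frag IN,
            uniform sampler2D   DecalMap      : TEXUNIT0,
            uniform sampler2D   NormalMap     : TEXUNIT1,
            uniform samplerCUBE NormalizeCube : TEXUNIT2) : COLOR
{
    // The cubemap lookup returns the normalized direction, range-compressed.
    float3 lightDir = expand(texCUBE(NormalizeCube, IN.TangentSpaceLightPos).rgb);

    // Per-pixel normal from the normal map.
    float3 normal = expand(tex2D(NormalMap, IN.NormalCoords).rgb);

    // Clamped N.L modulated by the decal color.
    return saturate(dot(normal, lightDir)) * tex2D(DecalMap, IN.DecalCoords);
}
```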

EDIT: Typos…

-SirKnight

Thanks for your help, I will test it on fp20 following your hints …

On fp30 I got this assembly output:

!!FP1.0
# cgc version 1.3.0001, build date Aug  4 2004 10:01:10
# command line args: -profile fp30
# source file: Dot3_fs.cg
#vendor NVIDIA Corporation
#version 1.0.02
#profile fp30
#program main
#semantic main.AmbientLight : COLOR
#semantic main.SpecularLight : COLOR
#semantic main.DecalMap
#semantic main.NormalMap
#var float2 IN.TextureCoords : $vin.TEX0 : TEX0 : 0 : 1
#var float3 IN.TangentSpaceLightPos : $vin.TEX1 : TEX1 : 0 : 1
#var float3 IN.HalfAngle :  :  : 0 : 0
#var float4 oColor : $vout.COL : COL : 1 : 1
#var float4 AmbientLight : COLOR :  : 2 : 0
#var float4 SpecularLight : COLOR :  : 3 : 0
#var sampler2D DecalMap :  : texunit 0 : 4 : 1
#var sampler2D NormalMap :  : texunit 1 : 5 : 1
TEX   R1.xyz, f[TEX0], TEX1, 2D;
TEX   R0, f[TEX0], TEX0, 2D;
DP3R  R1.w, f[TEX1], f[TEX1];
RSQR  R1.w, R1.w;
ADDR  R1.xyz, R1, {-0.5}.x;
MULR  R2.xyz, R1.w, f[TEX1];
MULR  R1.xyz, R1, R2;
DP3R  R1.x, R1, {2}.x;
MULR  o[COLR], R1.x, R0;
END

On fp40 I got this:

!!ARBfp1.0
OPTION NV_fragment_program2;
# cgc version 1.3.0001, build date Aug  4 2004 10:01:10
# command line args: -profile fp40
# source file: Dot3_fs.cg
#vendor NVIDIA Corporation
#version 1.0.02
#profile fp40
#program main
#semantic main.AmbientLight : COLOR
#semantic main.SpecularLight : COLOR
#semantic main.DecalMap
#semantic main.NormalMap
#var float2 IN.TextureCoords : $vin.TEX0 : TEX0 : 0 : 1
#var float3 IN.TangentSpaceLightPos : $vin.TEX1 : TEX1 : 0 : 1
#var float3 IN.HalfAngle :  :  : 0 : 0
#var float4 oColor : $vout.COL : COL : 1 : 1
#var float4 AmbientLight : COLOR :  : 2 : 0
#var float4 SpecularLight : COLOR :  : 3 : 0
#var sampler2D DecalMap :  : texunit 0 : 4 : 1
#var sampler2D NormalMap :  : texunit 1 : 5 : 1
#const c[0] = 0.5 2
PARAM c[1] = { { 0.5, 2 } };
TEMP R0;
TEMP R1;
TEMP RC;
TEMP HC;
DP3R  R0.x, fragment.texcoord[1], fragment.texcoord[1];
RSQR  R0.w, R0.x;
TEX   R0.xyz, fragment.texcoord[0], texture[1], 2D;
ADDR  R0.xyz, R0, -c[0].x;
MULR  R1.xyz, R0.w, fragment.texcoord[1];
MULR  R0.xyz, R0, R1;
DP3R  R0.x, R0, c[0].y;
TEX   R1, fragment.texcoord[0], texture[0], 2D;
MULR  result.color, R0.x, R1;
END

I’ve checked the driver: for the fp30 tests I have the newest ForceWare beta driver (75.90).

I’ve also checked the shader on a notebook with an ATI Mobility Radeon 9700 using arbvp1/arbfp1, and I get the same problem as with fp30: no compile error, but nothing is rendered when the shader is used.

Bye,
Christian

Don’t you usually need to saturate the dot(nmap.xyz, normTangentSpaceLightPos) call?
(I fail to see how it would affect the result in this case, but if you do further processing you do not want to be working with negative values.)

Yeah you should change the last line to this:

oColor = max( dot(nmap.xyz, normTangentSpaceLightPos), 0 ) * tex2D(DecalMap,IN.TextureCoords);

Well, the assembly code looks virtually identical. I was just wondering whether a Cg compiler bug had generated some weird instruction, though I doubted it.

Your fragment program looks fine to me so it’s either a bug in your beta drivers or something in your main code is doing something weird. I’m not quite sure atm.

-SirKnight

Thanks for your help, I will change the last instruction.

OK, I will check my application code again. What I can’t understand is that the shader runs perfectly on my GeForce 6800 GT, but when I change the profile from best (vp40/fp40) to vp30/fp30 I get the same problem on that machine too …

Another fp20 limitation is you cannot fetch from multiple textures using a single texture coordinate. If your shader uses the same texture coordinate to fetch from the decal and normal maps, you need to have your vertex shader write this texture coordinate to two TEXCOORD outputs.
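As an illustration of that workaround, a minimal vertex program sketch that copies one incoming texture coordinate into two TEXCOORD outputs, so the decal and normal map fetches each get their own coordinate set. The parameter names (ModelViewProj, oDecalCoords, oNormalCoords) are made up for this example, and the matrix is assumed to be supplied by the application.

```cg
void main(float4 position : POSITION,
          float2 texCoord : TEXCOORD0,

          // Assumed to be set by the application each frame.
          uniform float4x4 ModelViewProj,

          out float4 oPosition     : POSITION,
          out float2 oDecalCoords  : TEXCOORD0,   // used to fetch the decal map
          out float2 oNormalCoords : TEXCOORD1)   // used to fetch the normal map
{
    oPosition = mul(ModelViewProj, position);

    // Same value written twice: one coordinate set per texture fetch.
    oDecalCoords  = texCoord;
    oNormalCoords = texCoord;
}
```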

The problem is solved! The failure was not in the fragment program …

I was using glstate.matrix.mvp in the vertex program, which appears to be incompatible with the vp30/fp30 profiles …

Now I pass the matrix from the application as a float4x4 parameter and everything works fine …

Thanks for your help
Christian

Yes, that glstate struct only works with the arbvp1 and arbfp1 profiles.
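For anyone hitting the same issue, a minimal sketch of the fix described above: the vertex program takes the modelview-projection matrix as an ordinary uniform instead of reading glstate. The parameter name ModelViewProj is an assumption, not the original poster's code. On the application side, one way to fill it is the Cg GL runtime call cgGLSetStateMatrixParameter with CG_GL_MODELVIEW_PROJECTION_MATRIX and CG_GL_MATRIX_IDENTITY, issued after the OpenGL matrices for the object have been set.

```cg
// Before (depends on the profile tracking OpenGL state):
//     oPosition = mul(glstate.matrix.mvp, position);

void main(float4 position : POSITION,
          float2 texCoord : TEXCOORD0,

          // Hypothetical name; filled in by the application every frame.
          uniform float4x4 ModelViewProj,

          out float4 oPosition : POSITION,
          out float2 oTexCoord : TEXCOORD0)
{
    // After: explicit uniform, independent of glstate support.
    oPosition = mul(ModelViewProj, position);
    oTexCoord = texCoord;
}
```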