passing spotDirection from VS to FS works on ATI, not on nVidia?

10-25-2005, 12:26 AM
Hello folks, got a strange problem over here:

I've written a shader that performs per-pixel lighting (PPL) and bump/offset mapping (or parallax, call it whatever you prefer ;) ). It works fine on my ATI 9800 Pro, but not really on the nVidia FX 5200 at the office.

I store the spot direction of my light in a varying in my VS, and I use the interpolated value in my FS for the light calculations. I need to do this so that I can transform my light vector to tangent space in the VS. The issue is that the interpolation doesn't seem to work on nVidia, and worse, I even get some strange behavior at the "edges" of my spot: the surfaces become black (as if the exponent or the cutoff were wrong, or something like that?)

One other thing: on my 9800 Pro with a 3000+ Barton, my PPL spot shader (even without bump mapping) is darn slow when rendering a Q3-format level (IBSP), even though rendering is optimized with the BSP. Is that normal, or am I doing something wrong?


10-25-2005, 03:26 AM
To provide a little more detail, here's the code (vertex shader first, then fragment shader):


// Vertex shader
varying vec4 diffuse, ambient, globalAmbient;
varying vec3 normal, lightDir, halfVector, sd;
varying float dist;

void main()
{
    vec3 v_Normal = normalize(gl_NormalMatrix * gl_Normal); // normal to eye space
    vec3 v_Tangent = normalize(gl_NormalMatrix * gl_MultiTexCoord2.xyz); // tangent to eye space
    vec3 v_Binormal = normalize(gl_NormalMatrix * gl_MultiTexCoord3.xyz); // binormal to eye space

    // Listed in column-major order, so the rows are T, B, N:
    // multiplying by this matrix takes an eye-space vector to tangent space.
    mat3 tangentBasis = mat3(
        v_Tangent.x, v_Binormal.x, v_Normal.x,
        v_Tangent.y, v_Binormal.y, v_Normal.y,
        v_Tangent.z, v_Binormal.z, v_Normal.z);

    vec4 ecPos;
    vec3 aux;

    // Eye-space vector from the vertex to the light, and its length
    ecPos = gl_ModelViewMatrix * gl_Vertex;
    aux = vec3(gl_LightSource[0].position - ecPos);
    lightDir = aux;
    dist = length(aux);

    normal = normalize(gl_NormalMatrix * gl_Normal); // normal to eye coordinates
    halfVector = normalize(gl_LightSource[0].halfVector.xyz);

    // Move the lighting vectors from eye space to tangent space
    lightDir = tangentBasis * lightDir;
    halfVector = tangentBasis * halfVector;
    sd = tangentBasis * gl_LightSource[0].spotDirection;

    /* Compute the diffuse, ambient and globalAmbient terms */
    diffuse = gl_FrontMaterial.diffuse * gl_LightSource[0].diffuse;
    ambient = gl_FrontMaterial.ambient * gl_LightSource[0].ambient;
    globalAmbient = gl_LightModel.ambient * gl_FrontMaterial.ambient;

    /* pass texture coords to fragment shader */
    gl_TexCoord[0] = gl_MultiTexCoord0;

    gl_Position = ftransform();
}

// Fragment shader
varying vec4 diffuse, ambient, globalAmbient;
varying vec3 normal, lightDir, halfVector, sd;
varying float dist;

//uniform sampler2D tex;
uniform sampler2D decalMap;
uniform sampler2D normalMap;

void main()
{
    vec3 n, l, halfV;
    vec4 texel;
    float NdotL, NdotHV;
    float att;
    float spotEffect;

    /* The ambient term will always be present */
    vec4 color = globalAmbient;

    /* Fetch the tangent-space normal from the normal map (t is flipped)
       and expand it from [0,1] to [-1,1] */
    vec2 tuv = vec2(gl_TexCoord[0].s, -gl_TexCoord[0].t);
    n = 2.0 * (texture2D(normalMap, tuv).rgb - 0.5);
    n = normalize(n);
    l = normalize(lightDir);

    /* compute the dot product between normal and ldir */
    NdotL = max(dot(n, l), 0.0);

    if (NdotL > 0.0)
    {
        /* cosine of the angle between the spot axis and the light-to-fragment vector */
        spotEffect = dot(normalize(sd), normalize(-l));
        if (spotEffect > gl_LightSource[0].spotCosCutoff)
        {
            spotEffect = pow(spotEffect, gl_LightSource[0].spotExponent);
            att = spotEffect / (gl_LightSource[0].constantAttenuation +
                                gl_LightSource[0].linearAttenuation * dist +
                                gl_LightSource[0].quadraticAttenuation * dist * dist);

            color += att * (diffuse * NdotL + ambient);

            halfV = normalize(halfVector);
            NdotHV = max(dot(n, halfV), 0.0);
            color += att * gl_FrontMaterial.specular *
                     gl_LightSource[0].specular *
                     pow(NdotHV, gl_FrontMaterial.shininess);
            //color = vec4(n.x, n.y, n.z, 1.0);
        }
    }

    texel = texture2D(decalMap, gl_TexCoord[0].st);
    color *= texel;

    gl_FragColor = color;
}

Here's a screenshot describing the issue. Notice the black borders. This artefact DOESN'T show up on the ATI board...

Screenshot (http://www.hardtopnet.net/misc/nVidia%20FX5200%20issue%20with%20shader%20and%20spot.jpg)

10-25-2005, 09:00 AM
Your vertex shader uses 9 varying parameters. Try redeclaring one of your vec3 varyings as a vec4 and putting the "dist" parameter in the w component of that vec4.
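
The packing suggested above might look roughly like the following vertex shader. This is only a minimal sketch: the varying name "lightDirDist" is made up for illustration, and the tangent-space transform and the other varyings from the original shader are left out for brevity.

```glsl
// Vertex shader (sketch). "lightDirDist" is a hypothetical name,
// not something from the original shader.
varying vec4 lightDirDist; // xyz = vector to the light, w = distance to the light

void main()
{
    // Eye-space vector from the vertex to light 0
    vec4 ecPos = gl_ModelViewMatrix * gl_Vertex;
    vec3 aux = vec3(gl_LightSource[0].position - ecPos);

    // One vec4 varying replaces the separate vec3 "lightDir" and float "dist"
    lightDirDist = vec4(aux, length(aux));

    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}
```

In the fragment shader the same varying would be declared as a vec4, with normalize(lightDirDist.xyz) taking the place of the old light vector and lightDirDist.w taking the place of dist.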

10-25-2005, 11:45 AM
I converted the code tonight, it's working on my ATI, and I'll give it a try tomorrow on the FX5200 at the office.

10-25-2005, 10:58 PM
hehe, who better than a guy from nVidia to help me solve an nVidia issue? :) Thanks, it worked. I had declared only 8 varyings myself, but I assume the predefined one (gl_TexCoord[0]) has to be counted as well.

The "black borders" issue still shows up, but that could well be a bug with these boards.

http://www.hardtopnet.net/misc/nVidia%20FX5200%20issue%20with%20shader%20and%20spot.jpg

BTW, if I wanted to use more than one light, I'd be stuck with my 9 varyings, I guess? Would my only option be to render in multiple passes with blending?


10-25-2005, 11:31 PM
You can do many more lights (until you hit the fragment shader limits), because the code you are writing is far from optimal.

For instance:
/* Compute the diffuse, ambient and globalAmbient terms */
diffuse = gl_FrontMaterial.diffuse * gl_LightSource[0].diffuse;
ambient = gl_FrontMaterial.ambient * gl_LightSource[0].ambient;
globalAmbient = gl_LightModel.ambient * gl_FrontMaterial.ambient;

You burn up a lot of varyings for nothing. These (and others) could all be uniforms accessed directly in the fragment shader.

E.g. rip out the varyings above and just use the following in the fragment shader:

gl_FrontLightProduct[0].specular (use this instead of doing the multiply in the fragment shader)



There are many other changes you could make to further decrease instruction counts and varying usage.
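
As a rough sketch of the uniform-based approach described above, a fragment shader could read the precomputed material-times-light products directly instead of receiving them as varyings. This assumes the standard GLSL 1.10 built-in uniform state (gl_FrontLightProduct, gl_LightModel, gl_FrontMaterial); the spot cone, attenuation, specular term and decal texture from the original shader are omitted to keep it short.

```glsl
// Fragment shader (sketch): only the light vector is still a varying.
varying vec3 lightDir;        // tangent-space light vector from the vertex shader
uniform sampler2D normalMap;

void main()
{
    // Tangent-space normal, expanded from [0,1] to [-1,1]
    vec3 n = normalize(2.0 * (texture2D(normalMap, gl_TexCoord[0].st).rgb - 0.5));
    vec3 l = normalize(lightDir);
    float NdotL = max(dot(n, l), 0.0);

    // Takes the place of the "globalAmbient" varying
    vec4 color = gl_LightModel.ambient * gl_FrontMaterial.ambient;

    // gl_FrontLightProduct[0] holds front material * light 0 colours,
    // so it takes the place of the "ambient" and "diffuse" varyings
    color += gl_FrontLightProduct[0].ambient
           + gl_FrontLightProduct[0].diffuse * NdotL;

    gl_FragColor = color;
}
```

Compared with the original shader, this frees three vec4 varyings (diffuse, ambient, globalAmbient), which leaves room for the vectors of additional lights.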

10-26-2005, 01:06 AM
Indeed, these are not optimized at all; I was just trying to get the shaders to work properly. Now I guess I could improve them a lot.

Just one thing I was wondering: with the fixed-function pipeline I get about 50-60 FPS on the FX5200, whereas I get only 5-6 FPS with my shaders. I know I should get even more with the FFP (my C++ code needs optimization too), but 5-6 FPS, even with unoptimized shaders, does seem REALLY slow, doesn't it? Am I missing something? The same performance hit also occurs on my ATI Radeon 9800 Pro. :confused:

PS: maybe I should start another thread for this rather sensitive topic :)

10-26-2005, 02:25 AM
Well, unoptimized is unoptimized, so expect slow frame rates.

Also, you CANNOT compare against fixed function as you are doing lighting calculations Per-PIXEL. Fixed function does the same calculations Per-VERTEX.

So if you want to do a fair comparison:
a) Write optimized GLSL code
b) Do all lighting calculations in the vertex shader. (In fact, don't even use a fragment shader if you don't need it)
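
To illustrate point b), a per-vertex version of the lighting could be sketched as below. It assumes a positional light 0 and only computes the ambient and diffuse terms; the spot cone, attenuation and specular term are left out, so this is an illustration of the idea rather than a drop-in replacement for the original shader.

```glsl
// Vertex shader (sketch): lighting evaluated once per vertex, like the FFP does.
void main()
{
    // Eye-space normal and direction to light 0 (assumed positional)
    vec3 n = normalize(gl_NormalMatrix * gl_Normal);
    vec4 ecPos = gl_ModelViewMatrix * gl_Vertex;
    vec3 l = normalize(vec3(gl_LightSource[0].position - ecPos));

    float NdotL = max(dot(n, l), 0.0);

    // The lit colour is interpolated across the triangle by the rasterizer
    gl_FrontColor = gl_LightModel.ambient * gl_FrontMaterial.ambient
                  + gl_FrontLightProduct[0].ambient
                  + gl_FrontLightProduct[0].diffuse * NdotL;

    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}
```

With no fragment shader attached, fixed-function texturing is applied to the interpolated gl_FrontColor, which is the like-for-like case to time against the FFP.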

Zulfiqar Malik
10-26-2005, 05:39 AM
I believe he was talking about the "GL_ARB_texture_env_dot3" extension when he mentioned FFP (or am I wrong?). Anyway, the GPU Programming Guide on nVidia's website is a good place to start. It will give you some rules of thumb that usually work for most shaders. But, as Michael Abrash would say, "The best optimizer is between your ears" :) . Keep trying, these things take time to learn.

10-26-2005, 05:52 AM
Thanks for your answers guys.

Actually I was NOT talking about the DOT3 extension. I managed to write my own per-pixel lighting shader to perform bump mapping (as shown in the image), and also parallax mapping (as you noticed, its code is not in the shader I posted, but it's working here).

Of course I know that every single normal and light vector has to be interpolated across the whole face, which is much heavier than computing them per-vertex.

But I thought the performance hit was really too great for such a "small" shader, compared to what I've seen in demos or games. Even if I'm definitely not a GLSL guru ;) (not at all... :D ), I was hoping to see the shader run, if not smoothly, at least at more than a tenth of the original speed...

I've seen a guy who wrote a shader with 40+ lights, bump and parallax mapping, and everything was running fine, although the geometry was quite small. In my code I optimize the culling by using a BSP, so I shouldn't be performing calculations on such a great number of faces, I guess. :confused: