Additional calculation messes up result

hi everybody!

I’m currently facing a really weird problem in a fragment shader.
I calculate the light emitted from two kind-of neon lights (so they’re not “real” OpenGL light sources).
I’m passing the needed information to the vertex shader via attributes.
when I calculate the lighting of only one of the neon lights, everything’s fine.
when I calculate both, the shader hangs and I have to kill the application.
when I calculate both, but do not use the second result (meaning the calculated variable is NOT used anywhere), it does not hang, but it messes up the result.
how is it possible that it influences the result?

can it be a problem with too many varying variables? I use quite a few, but I get no warnings from the compiler.

thanks for answers in advance.

Not sure about the messed-up results, but the “hang” sounds like it could just be that it fell back to software rendering.

hmm, that’s actually possible.
which raises the question: what would cause that? there’s nothing spectacular going on…

The GLSL compiler is smart enough to remove unused calculations. So even if you calculate something in the shader and don’t use it, the compiler will remove that code.
Maybe you can post your shader code here and your hw specification (OS, gfx card, driver version). It might be a driver bug.

yooyo

o.k., here goes the fragment shader code.
what I have is a spline that acts as a neon light.
all the vectors, vertices, and other data that make up the light are packed into those varyings like base, dir, etc…

[b]
varying vec3 N;
varying vec3 L;
varying vec3 P;
varying float fogZ, g, g1;
varying vec3 base, dir, start;
varying vec3 base1, dir1, start1;

void main (void)
{
float dist, dist1;
[/b]

the next part calculates the distance from the fragment to a line segment of the glowing strip.
much of the preprocessing is done in the vertex shader.
dist and dist1 are then simply linear intensity values of the light.
if I remove one of the two calculations, everything’s fine.

[b]
dist = 1.0 - ((length(cross(vec3(dir), vec3(start))) / length(dir)) / 3.0);
dist = clamp(g*dist, 0.0, 1.0);
dist *= (1.0-dot(N, dir.xyz));

dist1 = 1.0 - ((length(cross(dir1.xyz, start1.xyz)) / length(dir1.xyz)) / 3.0);
dist1 = clamp(g1*dist1, 0.0, 1.0);
dist1 *= (1.0-dot(N, dir1.xyz));

< standard lighting calculation goes here >

vec4 color = gl_FrontMaterial.ambient * gl_LightSource[0].ambient +
             gl_FrontMaterial.diffuse * gl_LightSource[0].diffuse * diffuse +
             gl_FrontMaterial.specular * gl_LightSource[0].specular * spec;
[/b]

this does fog calculations and adds the color components of the light strips (one is green, the other one is red)


gl_FragColor = (1.0-f) * gl_Fog.color + f * (color + vec4(0, 1, 0, 1)*dist + vec4(1, 0, 0, 1)*dist1);
}

my configuration is:

  • radeon9800 pro
  • gl version: 2.0.5279 winxp release
  • win xp, sp2
  • catalyst 5.8 (the latest driver, that is; I also suspected a driver bug)

I’m no expert on ATI, but it looks like you’ve hit hardware limits on the number of varyings. Try putting fogZ, g, g1 into one vec3.

yooyo

I already tried that, didn’t help :-/

I just checked my NV-6800GT. It can interpolate up to 32 floats (8 * vec4).

If speed is not important, try to squeeze out extra varyings by interpolating N, L and P as vec4 and using their .w components for fogZ, g and g1.
Then change base, dir and start from vec3 to vec4 and use their .w components for base1.xyz.

After this squeezing you end up with 8 varyings. The code may then work in hw, but you have to deal with unpacking.

btw… this squeezing is not good for performance.
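A minimal sketch of what that packed layout could look like (same variable names as styx’s shader; dir1 and start1 keep their own slots, giving 8 varyings in total — this is just an illustration of the idea, not tested on the card in question):

[b]
varying vec4 N;     // N.xyz = normal,       N.w = fogZ
varying vec4 L;     // L.xyz = light vector, L.w = g
varying vec4 P;     // P.xyz = position,     P.w = g1
varying vec4 base, dir, start;  // .w components carry base1.x, base1.y, base1.z
varying vec3 dir1, start1;

void main (void)
{
    // unpack once at the top, then use the .xyz parts as before
    float fogZ  = N.w;
    float g     = L.w;
    float g1    = P.w;
    vec3  base1 = vec3(base.w, dir.w, start.w);
    // … rest of the shader unchanged, using N.xyz, base.xyz, etc.
}
[/b]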

yooyo

Originally posted by yooyo:
I just checked my NV-6800GT. It can interpolate up to 32 floats (8 * vec4).

If speed is not important, try to squeeze out extra varyings by interpolating N, L and P as vec4 and using their .w components for fogZ, g and g1.
Then change base, dir and start from vec3 to vec4 and use their .w components for base1.xyz.
No! What good is a high level language if you’re stuck doing work for the compiler all the time?? According to the GLSL specification, putting three floats into a vec3 should make no difference. I’m not going to quote the spec here, but it’s on page 83 in the last paragraph (Section 2.15.3 - Shader Variables). This is also the case for uniform variables.
You might have to do this on some cards, but it would be against the specification to report one number (for example ) and not allow for 16 vec2s, 16 floats plus 8 vec2s, etc. It’s not natural to pack components of multiple unrelated variables into one just so it can fit.
GLSL is a high level language and should be treated as such. Developers shouldn’t have to do work that the compilers should be doing. In fact, the compiler should be able to optimize naturally written GLSL code to each implementer’s hardware. If certain hardware can only interpolate vec4 varyings, the driver should pack them behind the scenes, not force developers to code in an unnatural way!
I’m not an expert on ATI hardware; hopefully they do follow the spec on this issue. Your shader has 30 varying floats as written, so this shouldn’t be the problem.

@kingjosh:

You are right, but unfortunately compilers may fail in this case. I just suggested trying to squeeze the varyings. If it works… then blame the driver developers. If not… well, I just gave it a shot.

yooyo

Originally posted by kingjosh:
No! What good is a high level language if you’re stuck doing work for the compiler all the time?? According to the GLSL Specification, putting three floats into a vec3 should make no difference. I’m not going to quote the spec here, but it’s on page 83 in the last paragraph (Section 2.15.3 - Shader Variables). This is also the case for uniform variables.
You should probably reread the spec.

OpenGL Specification 2.0 - Section 2.15.3

When an attribute variable declared as a float, vec2, vec3 or vec4 is bound to a generic attribute index i, its value(s) are taken from the x, (x, y), (x, y, z), or (x, y, z, w) components, respectively, of the generic attribute i.

The compiler thus cannot put a scalar varying in the w component (or any other unused component) of an attribute.

You would lose some performance if the compiler merged varyings, since the swizzling costs an instruction, so the compiler decides not to do it for you.

Originally posted by kingjosh:
You might have to do this on some cards, but it would be against the specification to report one number (for example ) and not allow for 16 vec2s, 16 floats plus 8 vec2s, etc.
There’s nothing saying it must run in hardware. It must work, that’s the only thing guaranteed. But if the compiler can’t take care of it, it may run in software, and if that’s an issue for you, you may have to work around it, regardless of all these "should"s.

Originally posted by al_bob:
You should probably reread the spec.

OpenGL Specification 2.0 - Section 2.15.3

When an attribute variable declared as a float, vec2, vec3 or vec4 is bound to a generic attribute index i, its value(s) are taken from the x, (x, y), (x, y, z), or (x, y, z, w) components, respectively, of the generic attribute i.

The compiler thus cannot put a scalar varying in the w component (or any other unused component) of an attribute.
Don’t confuse attributes and varyings.

As for how to get it to run in hardware, as mentioned, try packing varyings together. If any of the varyings can be mapped to the [0, 1] range, or can be packed into that range, you can probably use gl_Color and gl_SecondaryColor. In the worst case, perhaps some of those varyings can be computed in the fragment shader instead of the vertex shader.
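A minimal sketch of the gl_Color trick, assuming g and g1 from the shader above are already in [0, 1] (note that the built-in color varying may be clamped and interpolated at lower precision than a generic varying):

[b]
// vertex shader: smuggle two [0, 1] scalars through the built-in color varying
gl_FrontColor = vec4(g, g1, 0.0, 0.0);

// fragment shader: recover them, freeing up one generic varying slot
float g  = gl_Color.x;
float g1 = gl_Color.y;
[/b]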

Originally posted by al_bob:
You should probably reread the spec.
Uh, you should probably start reading it at the beginning. I believe you’ll find sections 4.3.4 and 4.3.6 of particular interest.

Humus, why can’t the compiler pack these behind the scenes if that’s what is required to run in hardware?

Originally posted by Humus:
If any of the varyings can be mapped to the [0, 1] range, or can be packed into that range, you can probably use gl_Color and gl_SecondaryColor.
Please, do not do this. This type of coding degrades the integrity of GLSL. Wait for better compilers or get a card that can handle more varyings.

Originally posted by kingjosh:
Humus, why can’t the compiler pack these behind the scenes if that’s what is required to run in hardware?
Dunno. Probably just not implemented.

Originally posted by kingjosh:
Please, do not do this. This type of coding degrades the integrity of GLSL . Wait for better compilers or get a card that can handle more varyings.
The integrity of GLSL, what the heck is that even supposed to mean? It’s not forbidden to write semi-ugly GLSL code. You sound a bit like those computer-science academics who live in a world where everything is object oriented, no class data is public, everything is in Hungarian notation, no function names contain abbreviations, compilers automagically produce optimal code regardless of input, and there’s capital punishment for the use of goto. I, on the other hand, live in the real world, where hardware has limited capabilities and compilers are still just pieces of software that simply can’t optimally map the infinite number of combinations of statements to hardware. If the use of goto gives me a significant speedup on x86 in a critical piece of code, I’ll go for it. If gl_Color allows me to use another vec4 varying, despite the data passed not actually being a color as the name implies, I’ll go for it. When the compiler handles the situation better, the code can be cleaned up if needed.

Frankly, I don’t think lessons on how to write nice GLSL code are what styx came here for. I think he primarily wants to get his code running in hardware. GLSL may be a high level language, but that doesn’t mean you can just forget that there’s hardware under the hood. Just like in C++, if you write code that’s close to the hardware, you’ll achieve better performance. Laying things out more explicitly can often result in better performance; I don’t consider that bad practice at all. In fact, I consider it good practice. If you have a scale and a bias, it’s not bad practice to put them in a vec2 instead of two floats. That way they’re more likely to end up in the same constant register, and the code will likely run faster.
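To illustrate that last point with a hypothetical example (the names here are made up for the sketch):

[b]
// two separate uniforms may land in two constant registers:
uniform float scale;
uniform float bias;

// packed together, they are more likely to share one register:
uniform vec2 scaleBias;   // .x = scale, .y = bias
// usage: v * scaleBias.x + scaleBias.y
[/b]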

I apologize if I hit a sore spot, Humus. I realize that styx didn’t come for a lesson; I was hoping the point would be read by his hardware vendor.

IMHO, compilers should be better at optimizing code than the average developer is expected to be. You’re right: if this particular developer wants to run on his hardware, he’ll have to pack his varyings. My only point is that he shouldn’t have to.

Originally posted by Humus:
GLSL may be a high level language, but that doesn’t mean you can just forget that there’s hardware under the hood.
That is, in fact, the point of a high level language. Indeed, not having to do nonsense like this was one of the selling points of integrating a high level compiler into an OpenGL driver.

The compiler ought to be doing this kind of stuff. The reason we agreed (or rather the ARB agreed; I never did) to sacrifice a bunch of shader compilation/linking performance to put a high level compiler into the driver was very specific: to allow compilers to better optimize the compiled result for their hardware. That was its purpose.

To not do this is a violation of that agreement and, to my mind, smacks of fraud. We gave up quite a bit for this advantage; if we aren’t getting it because ATi is lazy, screw them. Maybe developers will start putting “nVidia only” stickers on their products.

Or, even worse, the inconsistency between glslang implementations means that developers simply can’t afford to ship a product that relies on the language, and they abandon it.

Originally posted by kingjosh:
IMHO, compilers should be better at optimizing code than the average developer is expected to be.
The problem is that the day GLSL was introduced, developer expectations grew by several orders of magnitude in one big step. Compilers keep improving, but a compiler is a very big piece of software, and as developer expectations keep growing as well, there will still be cases where those expectations aren’t met. I’m not saying the compiler shouldn’t have to take care of this case; I’m saying it’s a limitation in the current compiler. No vendor’s compiler is optimal in all cases, and none ever will be.