Bug in Vertex Shader (NVIDIA)?

Hi Again!

I’m still getting strange results with my vertex shader. Maybe it’s just me not finding some stupid mistake, but I can’t think of any explanation for the behaviour I am observing.

Vertex shader source:

00: #pragma optimize(off)
01: const uniform vec3 u_sec_size;   // sector size
02: const uniform vec3 u_sec_pos;    // sector position
03: #define SIZE u_sec_size.x
04: #define UNIT u_sec_size.y
05: #define IDX  gl_MultiTexCoord0.x
06: void main()
07: {
08:   vec4 pos = gl_Vertex;
09:   float my_size = 9.0;
10: //if (SIZE == 9.0) my_size = SIZE;
11:   float x = mod(IDX, my_size);
12:   if (x == my_size && my_size == 9.0)
13:     gl_FrontColor = gl_BackColor = vec4(0.0,  0.0, 0.0, 1.0);
14:   else
15:     gl_FrontColor = gl_BackColor = vec4(0.8,  0.8, 0.8, 1.0);
16: //pos[0] = x * UNIT + u_sec_pos.x;
17:   gl_Position = gl_ModelViewProjectionMatrix * pos;
18: }

Rendering with the shader as stated above yields this image, which is the result that I am expecting. Uncommenting line 10 produces this image. I really don’t see why this line of code should change the resulting image, or why mod() returns the result I’m obviously getting here. Maybe someone could shed some light on this issue. I’m somewhat stuck with this problem…

Card/driver configuration

Thanx for any suggestions!

Best regards,
Martin

You should never compare floating-point numbers for equality because of precision issues. Instead of X == Y, use | X - Y | < e, where e is a very small number. I guess that would be your problem.

I changed the equality test to an epsilon-based one, but the result is still exactly the same.

00: #pragma optimize(off)
01: const uniform vec3 u_sec_size;   // sector size
02: const uniform vec3 u_sec_pos;    // sector position
03: #define SIZE u_sec_size.x
04: #define UNIT u_sec_size.y
05: #define IDX  gl_MultiTexCoord0.x
06: void main()
07: {
08:   vec4 pos = gl_Vertex;
09:   const float eps = 0.01;
10:   float my_size = 9.0;
11: //if (abs(SIZE - 9.0) < eps) my_size = SIZE;
12:   float x = mod(IDX, my_size);
13:   if (abs(x - my_size) < eps && abs(my_size - 9.0) < eps)
14:     gl_FrontColor = gl_BackColor = vec4(0.0, 0.0, 0.0, 1.0);
15:   else
16:     gl_FrontColor = gl_BackColor = vec4(0.8, 0.8, 0.8, 1.0);
17: //pos[0] = x * UNIT + u_sec_pos.x;
18:   gl_Position = gl_ModelViewProjectionMatrix * pos;
19: }

Why would mod(x, y) ever return a value close to y (assuming y > 0, in my case being close to 9.0)? The values should be in the range [eps…(y-1+eps)] with abs(eps) being some small value, right?

The thing that confuses me most is that mod() seems to work well as long as I do not touch SIZE in line 11 (which should be a no-op really…).

I cannot think of any side effects causing the observed anomalies but maybe I’m missing something.

More suggestions anyone?

Regards,
Martin

Why would mod(x, y) ever return a value close to y (assuming y > 0, in my case being close to 9.0)? The values should be in the range [eps…(y-1+eps)] with abs(eps) being some small value, right?

Wrong. The mod operation works on float values, so the result can be very close to y: the function calculates ( x – y ∗ floor (x/y) ), where y is a float value. The “-1” in your range expression should be the minimal representable difference, which for floats may be a really small number.
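To make that formula concrete, here is a small CPU-side sketch in Python (not shader code) that evaluates x - y * floor(x / y) with every intermediate rounded to single precision. The helper names `f32` and `glsl_mod` are made up for this illustration, and GPU hardware does not necessarily round like IEEE 754, so treat it as an approximation of the behaviour rather than what the driver actually does.

```python
import math
import struct

def f32(v):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def glsl_mod(x, y):
    """x - y * floor(x / y), with each step rounded to float32."""
    x, y = f32(x), f32(y)
    q = math.floor(f32(x / y))
    return f32(x - f32(y * q))

# Largest float32 below 9.0 (the float32 spacing in [8, 16) is 2**-20).
just_below_9 = f32(9.0 - 2.0 ** -20)

print(glsl_mod(9.0, 9.0))           # exact inputs give 0.0
print(glsl_mod(just_below_9, 9.0))  # input a hair under 9.0 comes back unchanged
```

So the upper end of the result range is y minus one representable step, not y - 1, and an epsilon test against y will match such values.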


The thing that confuses me most is that mod() seems to work well as long as I do not touch SIZE in line 11 (which should be a no-op really…).

As long as you use a constant value in the calculation, the compiler can apply various optimizations, such as special-cased code that uses values precalculated on the CPU with much higher precision during shader compilation (in your situation it likely calculates the 1/y for the mod operation on the CPU). If the calculation depends on a uniform, the compiler is likely to avoid optimizing for special cases and will probably generate generic code operating with GPU precision. There may also be a problem in the compiler, so one of those variants may be compiled incorrectly.

Wrong. The mod operation works on float values, so the result can be very close to y: the function calculates ( x – y * floor (x/y) ), where y is a float value. The “-1” in your range expression should be the minimal representable difference, which for floats may be a really small number.

The problem occurs when x and y are both close to 9. Assuming different epsilons for the parameters and the calculation results, some naive calculations yield:

As long as you use a constant value in the calculation, the compiler can apply various optimizations, such as special-cased code that uses values precalculated on the CPU with much higher precision during shader compilation (in your situation it likely calculates the 1/y for the mod operation on the CPU).

I see. So the GPU can do a faster multiply when calculating the mod() value per vertex.

If the calculation depends on a uniform, the compiler is likely to avoid optimizing for special cases and will probably generate generic code operating with GPU precision. There may also be a problem in the compiler, so one of those variants may be compiled incorrectly.

So when I compile the shader without the uniform access, the epsilons in the calculation implicitly change, leading to a different result. I also disabled optimization using the #pragma optimize preprocessor directive, but maybe the compiler just ignores that one.

Actually what I wanted to do was calculate vertex position and texture coordinates from one single vertex attribute (an index) to save resources. Setting aside any possible compiler problems, is there a reliable way to do ‘integer-style’ operations with floating point arithmetic (maybe using some bias) anyway? Otherwise I will have to forget my idea and store some more per-vertex attributes, which I was trying to avoid. I really wonder how useful mod() is anyway when mod(9+eps,9+eps) yields 9+eps, but maybe I am not yet familiar enough with the world of floating point arithmetic.

How big is the chance for that being some problem with the compiler?

Best regards,
Martin


Actually what I wanted to do was calculate vertex position and texture coordinates from one single vertex attribute (an index) to save resources. Setting aside any possible compiler problems, is there a reliable way to do ‘integer-style’ operations with floating point arithmetic (maybe using some bias) anyway?

You can do that using various biases, scales and floor/fract operations (unlike the mod operation, at least one of them is native to current hardware, and the other is either native or implemented using the one that is). You have to make sure that your values stay in a range where epsilon differences and the precision of individual instructions do not influence the result. Also, some values may not be what you initially expect. For example, sampling from an RGBA8 texture will yield values that are not ( X / 255.0 ) as one might expect.
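As a rough illustration of the bias idea, the following CPU-side Python sketch (an emulation, not shader code) splits an index into a column and a row using only a multiply, a floor and a subtract. The names `f32`, `sloppy_rcp` and `decode_index` are hypothetical, and `sloppy_rcp` is a deliberately imprecise reciprocal that I made up to stand in for whatever the hardware does; the actual error on a given GPU is an assumption here.

```python
import math
import struct

def f32(v):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def sloppy_rcp(y, bits=16):
    """Hypothetical imprecise reciprocal: 1/y truncated to `bits` significand bits."""
    m, e = math.frexp(1.0 / y)
    return math.ldexp(math.floor(m * 2.0 ** bits) / 2.0 ** bits, e)

def decode_index(idx, size):
    """Split a float index into (column, row) on a grid that is `size` wide."""
    idx, size = f32(idx), f32(size)
    # The +0.5 bias moves the quotient away from integer boundaries, so a
    # slightly-too-small reciprocal can no longer push floor() one step down.
    row = math.floor(f32(f32(idx + 0.5) * sloppy_rcp(size)))
    col = f32(idx - f32(size * row))
    return col, float(row)

print(decode_index(9.0, 9.0))  # (0.0, 1.0)
```

This only holds while the index and the products stay exactly representable; single precision stores integers exactly up to 2^24, and the bias has to be larger than the accumulated error of the multiply.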


Otherwise I will have to forget my idea and store some more per-vertex attributes which I was trying to avoid. I really wonder how useful mod() is anyway when mod(9+eps,9+eps) yields 9+eps but maybe I am not yet familiar enough with the world of floating point arithmetic.

The problem you see is probably caused by dividing by a value that is not a power of two, together with the limited precision of the function that calculates the division.
On most current hardware there is no instruction that calculates ( X / Y ); there is an instruction that approximates ( 1 / Y ), likely with lower precision than some other calculations. If Y is not a power of two, its reciprocal may not be exactly representable (even if it is calculated precisely), so additional rounding may happen. Combined, this may cause ( Y * ( 1.0 / Y ) ) to be lower than 1.0, so the floor returns a lower value than expected and you get an invalid result.
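The effect can be reproduced on the CPU with a small Python sketch (an emulation, not what the driver generates). Here `approx_rcp` is a hypothetical stand-in that truncates 1/y to 20 significand bits; the real precision and rounding of a GPU reciprocal are not known from this thread, so the numbers are only illustrative. `f32` and `mod_via_rcp` are likewise names invented for the example.

```python
import math
import struct

def f32(v):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def exact_rcp(y):
    """Reciprocal correctly rounded to float32."""
    return f32(1.0 / y)

def approx_rcp(y, bits=20):
    """Hypothetical low-precision reciprocal: 1/y truncated to `bits` bits."""
    m, e = math.frexp(1.0 / y)
    return math.ldexp(math.floor(m * 2.0 ** bits) / 2.0 ** bits, e)

def mod_via_rcp(x, y, rcp):
    """x - y * floor(x * rcp(y)), with each step rounded to float32."""
    x, y = f32(x), f32(y)
    q = math.floor(f32(x * rcp(y)))
    return f32(x - f32(y * q))

print(mod_via_rcp(9.0, 9.0, exact_rcp))   # 0.0
print(mod_via_rcp(9.0, 9.0, approx_rcp))  # 9.0, because 9 * rcp(9) < 1
```

With the truncated reciprocal, 9 * (1/9) lands just below 1.0, floor() returns 0 and the "remainder" is the full 9.0, which matches the symptom described in the question.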


How big is the chance for that being some problem with the compiler?

Probably not too big. On nVidia cards you can retrieve the assembly generated from the GLSL by use of NVemulate or NVShaderPerf tools so you can see what the compiler tries to do in both cases.
