ATI problems with variable length for loops?



BrianDFS
07-12-2010, 07:33 AM
I'm currently using GLSL 1.5 and have noticed on two separate occasions that ATI cannot seem to handle for loops where the count is a uniform variable instead of a pre-defined constant.

For example, a blur where the blur radius is passed in as a uniform rather than hard coded to some fixed radius.

The effect of this appears to be that no iterations of the for loop get executed at all.

However, NVidia seems to handle this fine.

Has anyone else experienced this issue?

Aleksandar
07-12-2010, 12:11 PM
Maybe I'm wrong, but "uniform flow control" is part of the GLSL 4.xx.x specification. ATI is pretty consistent in spec implementation, so if you have declared GLSL 1.5, variable loop length might not be allowed. Uniform flow control also requires new hardware. Did you try it on the same class of hardware (D3D11/GL4.0)?

BrianDFS
07-12-2010, 12:24 PM
If that's the case, then my bad as I must have missed that. However, I have certainly found ATI to be consistent with their implementation of the spec.

I have not been able to test this on ATI DX11/GL4 class hardware. The NVidia hardware I've been testing on is DX11 though, so that may explain why it's working there.

Just so I'm clear though -- GL3.2 does not allow for variable loop length? i.e. it's a 4.0+ only feature?

Ilian Dinev
07-12-2010, 01:04 PM
Sounds fishy. If there's no loop control of any type, then it means there's no support for dynamic branching at all.
My bet is on a driver bug.

Btw, it's the third similar report here for the past week or so.

frank li
07-12-2010, 06:52 PM
Only indexing a sampler array with a variable index is a 4.0+ feature. It needs hardware support. The driver can handle all flow control cases except that.

If you still have problems, please post the shader.

Aleksandar
07-13-2010, 03:01 AM
I'm sorry. It's my mistake. Loops controlled by uniforms are allowed on SM4 hardware (I have tried it on a GF8600). I thought that loops still needed to be unrolled by the compiler, in which case the number of iterations must be known at compile time. That is still a limitation of mobile devices and OpenGL ES, but obviously not of desktop GPUs.

And to finish, one historical fact: the ATI Radeon 9500 (R300) did not support loops in fragment shaders at all. :) I bet that your card is not that old, and if it supports SM4, there is probably a bug, just as Ilian said.

BrianDFS
07-13-2010, 05:05 AM
Thanks for the feedback guys. And yes, it still does not work.

The shader itself is very simple. It's literally a for loop fetching some texels and computing an average. The count is defined by a uniform variable rather than a constant. I don't have the code in front of me at the moment, but can post it tonight when I get home.

frank li
07-13-2010, 07:05 PM
That's just the case I mentioned: only indexing a sampler array with a variable index is a 4.0+ feature. You need to run it on the HD5000+ series. Otherwise the driver will report something like "indirect index to sampler array is not supported on the asic".

BrianDFS
07-14-2010, 03:15 PM
Just so we're clear on terminology -- it's a for loop fetching texels from a texture (single sampler), not a sampler array.

Here's the code:

void main()
{
    vec4 C = texture( TEXTURE_0, LerpUV );

    vec2 V = DF_ComputePixelVelocity( TEXTURE_1, TEXTURE_2, LerpUV ) * MotionBlurInfo.y;

    vec2 BlurUV = LerpUV + V;

    const int NUM_SAMPLES = 8; // If this comes from a uniform, it doesn't work

    for( int i = 1; i < NUM_SAMPLES; i++, BlurUV += V )
        C.rgb += texture( TEXTURE_0, BlurUV ).rgb;

    C.rgb /= float( NUM_SAMPLES );

    OutC = vec4( C.rgb, 1 );
}

frank li
07-15-2010, 12:25 AM
The shader looks good to me. I tried compiling it with the DF_ComputePixelVelocity call commented out, and it works. Which hardware and driver are you using?

BrianDFS
07-15-2010, 07:15 AM
Yes, that shader as it's posted there will indeed work. If you look at the comment on the line where I assign NUM_SAMPLES, you'll see that it breaks (i.e. does not work) if the NUM_SAMPLES value comes from a uniform variable instead of an inline constant. Note, however, that even when NUM_SAMPLES is assigned a value from a uniform, the ATI shader log indicates that everything compiled and linked successfully.

I'm running either the latest or one version older than the latest ATI drivers.

frank li
07-16-2010, 02:22 AM
The bug can be reproduced now. It's related to uniform blocks and was fixed recently. You will have to wait about three months for the new driver.
Two workarounds avoid the failure:
1. Do not use a variable loop length, as you said.
2. Use a plain (non-block) uniform instead of a uniform block.

Sorry for the inconvenience.

hound
07-16-2010, 06:14 AM
I'm not sure it is related to uniform blocks.

I'm also trying to use a for loop with a uniform condition (specified in the general block) in a fragment shader (with a sampler2DArray), with #version 400 defined, on a HD5770 and it doesn't work. It compiles fine, but just doesn't iterate the loop at all. If I use a constant expression, it loops ok.




in vec2 TexCoords;

uniform sampler2D DefaultDetailTexture;

const int MaxDetailTextures = 12;

uniform int NumDetailTextures;
uniform sampler2DArray AlphaTextureArray;
uniform sampler2DArray DetailTextureArray;

void main(void) {
    float alpha_accum = 0.0;
    vec3 detail_color = vec3(0.0);
    for (int i = 0; i < MaxDetailTextures; ++i) {
        //if (i >= NumDetailTextures || alpha_accum >= 1.0) {
        //    break;
        //}

        float alpha = texture(AlphaTextureArray, vec3(TexCoords, i)).r;

        detail_color += min(alpha, 1.0 - alpha_accum) * texture(DetailTextureArray, vec3(TexCoords, i)).rgb;
        alpha_accum = min(1.0, alpha_accum + alpha);
    }
    detail_color += (1.0 - alpha_accum) * texture(DefaultDetailTexture, TexCoords).rgb;

    gl_FragColor = vec4(detail_color, 1.0);
}



If I use NumDetailTextures instead of MaxDetailTextures as the loop bound, or uncomment the condition with the break in it, it doesn't work.

frank li
07-19-2010, 07:01 PM
It's definitely not related to uniform block.

Are the results the same (both wrong) when you use NumDetailTextures or uncomment the if condition?

hound
07-21-2010, 11:34 AM
Yep. In both cases the shader compiles and runs, but doesn't seem to execute any code in the loop.

BrianDFS
07-23-2010, 06:25 AM
So what's the status on this Frank? Have you guys been able to pinpoint exactly what the problem is? I think it's clear at this point that there is definitely a problem with the compiler and at the very least, with for loops that loop based off uniforms rather than constants.

frank li
07-26-2010, 02:09 AM
I just got a chance to take a look at it. I can't reproduce it. Below is some of my test code:


glGenTextures(2, tex);

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 4, 4, 0, GL_RGBA, GL_UNSIGNED_BYTE, _texture1);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D_ARRAY, tex[1]);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA, 4, 4, 2, 0, GL_RGBA, GL_UNSIGNED_BYTE, _texture2);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

GLint loc; // glGetUniformLocation returns a GLint; -1 means the name was not found
loc = glGetUniformLocationARB(p, "DefaultDetailTexture");
glUniform1iARB(loc, 0);
loc = glGetUniformLocationARB(p, "AlphaTextureArray");
glUniform1iARB(loc, 1);
loc = glGetUniformLocationARB(p, "DetailTextureArray");
glUniform1iARB(loc, 1);
loc = glGetUniformLocationARB(p, "NumDetailTextures");
glUniform1iARB(loc, 8);

No matter how I change the shader, the results shown on AMD and NVIDIA are the same. My environment is HD5770 + Vista + Cat10.6.

If you still have problems, please send your program to me at frank.li@amd.com

hound
07-26-2010, 10:44 AM
Well, I tried making a minimal example, and found that I couldn't reproduce the bug either. Then I went back to my original code and put the uniform loop conditional back in, and it worked fine.

Not a clue what I was doing wrong. Sorry for the waste of time.

hound
07-28-2010, 06:24 AM
It turns out the problem is still here, but it's happening intermittently.

When I turn my computer on, the uniform conditional in my main program doesn't work. I then run my attempt at a minimal reconstruction of the problem and that does work. Then I go back to the main program and run it, and it now works fine!

Not really got a clue how to track this down further.

BrianDFS
07-29-2010, 07:53 AM
It has not been intermittent for me -- it has been 100%. I sent Frank a copy of the shader program source that was doing it for me. Not sure if he was able to test with that or not.

For now I'm getting by with hard-coding constants instead of using uniforms. Unfortunately, that is not (and cannot be) a permanent solution.

frank li
07-29-2010, 06:34 PM
I'm sorry that you are not aware of my previous reply, which was:

The bug can be reproduced now. It's related to uniform blocks and was fixed recently. You will have to wait about three months for the new driver.
Two workarounds avoid the failure:
1. Do not use a variable loop length, as you said.
2. Use a plain (non-block) uniform instead of a uniform block.

Sorry for the inconvenience.

I think I sent you a mail about it. Maybe you missed it.

CortS
07-31-2010, 01:34 PM
I'd like to chime in with a similar problem which has been 100% reproducible on my laptop (Radeon 3450, Catalyst 10.7, Windows 7). It's a ghetto multiple-point-light shader that uses neither textures nor uniform blocks. When I use a literal value for the loop count, the shader works correctly -- though the lighting model is crap, as designed :) If I use the lightCount uniform instead (set to the same value), not only is the fragment shading incorrect, but the vertex shader seems to be affected as well; my geometry gets collapsed to the YZ plane.

The same shader works correctly in either scenario on my desktop (Radeon 5850, Windows 7, Catalyst 10.7).

Here's the code:

#version 150 core

in vec3 outVS_WorldPos;
in vec3 outVS_WorldNormal;
out vec4 outFS_FragColor0;

#define MAX_NUM_LIGHTS 256
struct Light
{
    vec4 pos; // XYZ=pos, W=falloff distance
    vec4 color;
};
uniform Light lights[MAX_NUM_LIGHTS];
uniform int lightCount;

void main(void)
{
    vec4 outColor = vec4(0.1,0.1,0.1,1); // base ambient level
    // Broken on 3450; replace lightCount with literal 256 and it works...
    for(int iLight=0; iLight<lightCount; ++iLight)
    {
        vec3 lightPos = lights[iLight].pos.xyz;
        float falloffDistance = lights[iLight].pos.w;
        vec4 lightColor = lights[iLight].color;

        vec3 toLight = lightPos-outVS_WorldPos;
        float distToLight = length(toLight);
        float attenuation = 1.0 - smoothstep(0, falloffDistance, distToLight);
        outColor.xyz += clamp(dot(outVS_WorldNormal, normalize(lightPos-outVS_WorldPos)),0,1)
            * attenuation * lightColor.xyz;
    }
    outFS_FragColor0.xyz = outColor.xyz;
}

Is this the same bug, Frank?

BrianDFS
08-02-2010, 08:08 AM
Frank, sorry about that. Somehow I did indeed miss that message. Thanks for the update as well as the suggested workaround. I'm looking forward to the fix.

frank li
08-03-2010, 01:43 AM
It's not the same bug. If you query the max uniform components on the Radeon 3450, you will find the limit is 1024, which means you can use at most 256 vec4 uniforms.

So look at the shader,
#define MAX_NUM_LIGHTS 256
struct Light
{
    vec4 pos; // XYZ=pos, W=falloff distance
    vec4 color;
};
uniform Light lights[MAX_NUM_LIGHTS];

512 vec4 uniforms are declared (256 lights with 2 vec4s each, i.e. 2048 components), so the result is unexpected in this case. That's the root cause.
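The arithmetic above can be checked before the shader is ever compiled. What follows is only a rough sketch in plain C, not driver or AMD code: the function names and the `other_components` parameter are made up for illustration, and `max_components` is assumed to be whatever the application read back from the driver (for a fragment shader, presumably by querying `GL_MAX_FRAGMENT_UNIFORM_COMPONENTS` with `glGetIntegerv`).

```c
#include <assert.h>
#include <stdbool.h>

/* Components consumed by one GLSL vec4. */
enum { COMPONENTS_PER_VEC4 = 4 };

/*
 * Uniform components needed by an array of structs whose members are all
 * vec4s. For the Light struct in this thread, vec4s_per_element is 2
 * (pos and color). Hypothetical helper, not part of any GL API.
 */
int uniform_components_needed(int array_length, int vec4s_per_element)
{
    assert(array_length >= 0 && vec4s_per_element >= 0);
    return array_length * vec4s_per_element * COMPONENTS_PER_VEC4;
}

/*
 * True if the array plus the shader's remaining uniforms fit the budget.
 * other_components covers everything else in the shader (for example,
 * 1 for the lightCount int). max_components is assumed to come from the
 * driver; 1024 is the figure quoted for the Radeon 3450 above.
 */
bool uniforms_fit(int array_length, int vec4s_per_element,
                  int other_components, int max_components)
{
    int needed = uniform_components_needed(array_length, vec4s_per_element)
                 + other_components;
    return needed <= max_components;
}
```

With these helpers, `uniform_components_needed(256, 2)` gives 2048, twice the 1024 quoted for the 3450, while an array of 127 lights plus `lightCount` would fit.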

CortS
08-04-2010, 01:00 PM
*facepalm*
Durp.
Thanks Frank :)

hound
08-12-2010, 04:11 PM
Sorry to bring this thread up again, but would this bug also affect geometry shaders? I've been trying to run this, and having the same problem:




#version 400

layout(max_vertices = 96) out;

uniform int RES_R; // works fine with: const int RES_R = 32;
uniform float Rg, Rt;

flat out int layer;
flat out float r;
flat out vec4 dhdH;

void main() {

    for (int i = 0; i != RES_R; ++i) {
        float rl = float(i) / (float(RES_R) - 1.0);
        rl = rl * rl;
        rl = sqrt(Rg * Rg + rl * (Rt * Rt - Rg * Rg)) + (i == 0 ? 0.01 : (i == RES_R - 1 ? -0.001 : 0.0));

        float dmin = Rt - rl;
        float dmax = sqrt(rl * rl - Rg * Rg) + sqrt(Rt * Rt - Rg * Rg);
        float dminp = rl - Rg;
        float dmaxp = sqrt(rl * rl - Rg * Rg);

        gl_Position = gl_in[0].gl_Position;
        gl_Layer = i;
        EmitVertex();

        gl_Position = gl_in[1].gl_Position;
        gl_Layer = i;
        EmitVertex();

        layer = i;
        r = rl;
        dhdH = vec4(dmin, dmax, dminp, dmaxp);

        gl_Position = gl_in[2].gl_Position;
        gl_Layer = i;
        EmitVertex();

        EndPrimitive();
    }
}

frank li
08-12-2010, 08:32 PM
Could you try adding the input and output topology along with max_vertices, as in
layout(triangles) in;
layout(triangle_strip) out;
?

hound
08-13-2010, 04:46 AM
Yep. Just tried with:

layout(triangles) in;
layout(triangle_strip, max_vertices = 96) out;

and still getting the same problem.

frank li
08-15-2010, 07:30 PM
The shader works for me. Do you mean the geometry shader doesn't work in your program? Could you please try to narrow down the application and send it to me? Thanks.

vindoctor2
12-10-2010, 03:10 PM
I really wish companies would stop bragging about whose drivers are out first, and instead brag about having the fewest bugs in their drivers! This bug still exists today.

uniform int samplesin= 4;
.....
for(int i=0; i < samplesin; ++i)
{

}

If I hard code samplesin so that it is not a uniform, all works fine. It's December now! Please fix.

SDK INFO: GL_VERSION = 3.3.10317 Compatibility Profile/Debug Context
SDK INFO: GL_VENDOR = ATI Technologies Inc.
SDK INFO: GL_RENDERER = ATI Radeon HD 3400 Series
SDK INFO: GL_SHADING_LANGUAGE_VERSION = 3.30
SDK INFO: GLEW_VERSION = 1.5.7

frank li
12-12-2010, 07:14 AM
I think the bugs reported above are fixed. Could you please provide more details to reproduce your problem? We would like to fix them soon.
Sorry for the inconvenience.

vindoctor2
12-13-2010, 08:39 PM
Thanks! It's as simple as what I posted... What I do now is, if I detect an ATI card, I use a #define in my shader for the loop count instead of setting a uniform (uniformi). My coworker just got a higher end card, so it will be interesting to see if this issue exists only on the lower end hardware. BTW, all my other uniforms get set properly. Also, we have multiple shaders and the problem exposes itself in all of them.

I don't have time to make a test app, but if you have a specific question I can answer that. BTW, we use #version 330 core in our shaders.. not sure if that matters or helps.
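A minimal sketch of that vendor-dependent workaround, assuming the application already has the vendor string (presumably from `glGetString(GL_VENDOR)`) and prepends a generated preamble to the shader body. The function name `build_shader_preamble` and the macro/uniform name `NUM_SAMPLES` are hypothetical, chosen only for illustration; the `#version 330 core` line is the one mentioned above and has to stay first in the source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes a shader preamble into out (capacity out_size bytes).
 * vendor is assumed to be the GL_VENDOR string. If it starts with "ATI",
 * the loop count is baked in as a compile-time #define; otherwise it is
 * declared as a uniform the application sets at run time.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int build_shader_preamble(const char *vendor, int num_samples,
                          char *out, size_t out_size)
{
    int written;

    assert(out != NULL && out_size > 0);

    if (vendor != NULL && strncmp(vendor, "ATI", 3) == 0) {
        /* Workaround path: constant loop bound. */
        written = snprintf(out, out_size,
                           "#version 330 core\n#define NUM_SAMPLES %d\n",
                           num_samples);
    } else {
        /* Normal path: loop bound supplied as a uniform. */
        written = snprintf(out, out_size,
                           "#version 330 core\nuniform int NUM_SAMPLES;\n");
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

The shader body would then loop with `i < NUM_SAMPLES` in both cases and must not carry its own `#version` line. Since `glShaderSource` accepts an array of strings, the preamble and the body can be passed as two separate entries without concatenating them by hand.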

Porting our stuff to additionally work with ATI from the NVIDIA world has been a frustrating process, but we are making headway. If I can just figure out why the ATI drivers deadlock on glMapBuffer sometime after a glDrawElements, life will get better. Another strange find... a shader that compiles with no errors can sometimes make glDrawElements throw an invalid operation! Go figure. BTW, I did find we had an ARB enabled in the shader, and once it was removed all worked great and glDrawElements did not create an OpenGL error! I should write a book or at least a blog to help others in this process once I'm done.

BTW, the deadlock would cause the Windows 7 OS to report that it has recovered from the ATI driver...

frank li
12-14-2010, 10:35 PM
Okay, I will ask some questions about the loop expression.
1. Is the uniform "samplesin" used to indirectly index a sampler array? We have a limitation on that. The feature is supported on HD5xxx and above.
2. In which shader stage do you use the loop? Vertex? Fragment? Geometry? Do you use a uniform block?
3. Is there any error/warning message reported from the ATI's compiler?

For your deadlock problem, you could use glGetError to locate where the error is.

Thanks
Frank

dukey
12-15-2010, 03:48 AM
The hardware might not be capable of loops. Loops with static lengths simply get unrolled, whilst those with dynamic lengths fail to compile.
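To make the unrolling distinction concrete, here is a CPU-side analogy in plain C (not shader code, and not a statement about what any particular GPU or driver does). The function names are hypothetical. With a bound known at compile time, the loop can be flattened into straight-line code; with a bound that is only known at run time, a real loop with a branch has to be emitted. Note that earlier in the thread, loops controlled by uniforms were reported to work on SM4-class hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Run-time bound: the loop and its exit test must exist in the output. */
float average_dynamic(const float *samples, int count)
{
    float sum = 0.0f;

    assert(samples != NULL && count > 0);
    for (int i = 0; i < count; ++i)
        sum += samples[i];
    return sum / (float)count;
}

/* Bound fixed at 4: the straight-line form an unrolling compiler can emit. */
float average_unrolled4(const float *samples)
{
    assert(samples != NULL);
    return (samples[0] + samples[1] + samples[2] + samples[3]) / 4.0f;
}
```

Both functions compute the same average when the count is 4; only the first can handle a count that arrives as data.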

vindoctor2
12-21-2010, 03:12 PM
I do not get any shader compile errors, but what you are saying seems to make sense with what I'm seeing. This really surprises the heck out of me as I would have thought for loops for an OpenGL 3.2 supported card would be expected.

Since I do not get any opengl errors, I would still consider this a bug on the ATI/AMD side. I'm also very troubled... does any API call exist so I can determine what AMD/ATI cards can support a for loop while others cannot? If this is an ati/amd hardware limitation, my nvidia card from 2006 can handle such....which is why I'm still having a hard time accepting this is an AMD/ATI hardware limitation and not a driver bug.

vindoctor2
12-21-2010, 03:57 PM
No GL error exists, and the deadlock is a hard deadlock where the Windows OS says it recovered from the AMD/ATI driver crash. Even if I use the latest gDEBugger (version 5.8), it deadlocks.

I have found out how to get around one deadlock, where if I use "0" for the multisampled FBOs a deadlock does not occur, and the geometry does indeed show up. I currently do not know how to get round this issue btw. I can probably tell your ati/amd customers that multisampling will not be available on certain AMD/ATI cards(until we figure this out).

As for the shaders.. I have tried uniform int as well as to sneak it into a vec4 with some other data.

#version 330 core
precision highp float;
uniform int loop;
out vec4 outcolor;

void main()
{
    ivec2 pt = whereintexture   // placeholder for the texel coordinate
    vec3 info;
    vec4 yadda;

    for(int i=0; i < loop;++i)
    {
        info = texelFetch( positionTex, pt,i ).xyz;

        info does stuff with yadda   // placeholder for the accumulation
    }

    outcolor = yadda/loop;
}

frank li
12-21-2010, 07:10 PM
Usually the compiler reports an error when the hardware doesn't support a feature.
There is a known issue with texelFetch on a multi-sampled depth texture on some specific hardware. Could you please send your program to me at frank.li@amd.com? It would help to resolve your problem.

Thanks for your feedback
Frank