nvidia GLSL shader converting to AMD issues?



GLpeon
05-12-2011, 11:33 AM
On nvidia cards these GLSL fragment shaders work fine, but on AMD I get extreme performance hiccups, and I thought maybe I did something wrong in the conversion process, since the shaders initially produced tons of compile errors.
This is with the 11.5 drivers on a Radeon 6870.

I changed

normal_final_view += bumpmapping_factor*(2.0 * texture2D(texNormalMap, gl_TexCoord[0].st) - 1.0);
to

normal_final_view += bumpmapping_factor*(2.0 * vec3(texture2D(texNormalMap, gl_TexCoord[0].st) - 1.0));
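
For what it's worth, moving the vec3() constructor inside the parentheses changes the arithmetic from 2t - 1 to 2(t - 1). A conversion that keeps the original expansion intact would swizzle the lookup down to three components before the math; a sketch, assuming normal_final_view is a vec3:

```glsl
// Swizzle first, then expand [0,1] -> [-1,1]; same math as the
// original nvidia line, but with matching vec3 types throughout.
normal_final_view += bumpmapping_factor * (2.0 * texture2D(texNormalMap, gl_TexCoord[0].st).rgb - 1.0);
```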


vec4 LevelOfGrey(vec4 colorIn)
{
return colorIn.r * 0.299 + colorIn.g * 0.587 + colorIn.b * 0.114;
}
to

vec4 LevelOfGrey(vec4 colorIn)
{
return vec4(colorIn.r * 0.299 + colorIn.g * 0.587 + colorIn.b * 0.114);
}



for(int i=-kernel_size; i<=kernel_size; i++)
{
vec4 value = texture2D(texScreen, uv + vec2(pas.x*i, 0.0));
int factor = kernel_size+1 - abs((float)i);
to

for(int i=-kernel_size; i<=kernel_size; i++)
{
vec4 value = texture2D(texScreen, uv + vec2(pas.x*i, 0.0));
int factor = kernel_size+1 - abs(i);
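
As an aside, abs() on integers only became core in GLSL 1.30; on earlier #version declarations abs() takes only floats, so abs(i) may itself be rejected by a strict compiler. A variant that avoids the C-style cast while staying float-only, as a sketch:

```glsl
// Portable pre-1.30 form: convert explicitly in both directions,
// since abs() here accepts only floating-point arguments.
int factor = kernel_size + 1 - int(abs(float(i)));
```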

and finally


float mask5_2[] = { 1.0/5.0, 1.0/5.0, 1.0/5.0, 1.0/5.0, 1.0/5.0};
float mask3_2[] = { 1.0/3.0, 1.0/3.0, 1.0/3.0};
...
vec4 convolH(float tab[], int size)
{
float stepX = 1.0/screenWidth;
vec4 color = 0;
int k = (size/2);
int ind = 0;
for(int i=-k; i<=k; i++)
color += tab[ind++] * texture2D(texScreen, gl_TexCoord[0].st + vec2(i*stepX, 0));
return color;
}
...
color = convolH(mask5_2,5);
...
color = convolH(mask3_2,3);


to


float mask5_2[5] = float[5](1.0/5.0, 1.0/5.0, 1.0/5.0, 1.0/5.0, 1.0/5.0);
float mask3_2[3] = float[3](1.0/3.0, 1.0/3.0, 1.0/3.0);

...
...
vec4 convolV(int size)
{
float stepY = 1.0/screenHeight;
vec4 color = vec4(0);
int k = (size/2);
int ind = 0;
if (size == 5)
{
float tab[5] = mask5_2;
for(int i=-k; i<=k; i++)
color += tab[ind++] * texture2D(texScreen, gl_TexCoord[0].st + vec2(0, i*stepY));
return color;
}
else if (size ==3)
{
float tab[3] = mask3_2;
for(int i=-k; i<=k; i++)
color += tab[ind++] * texture2D(texScreen, gl_TexCoord[0].st + vec2(0, i*stepY));
return color;
}
return vec4(0.0);
}


...

color = convolV(5);
...
color = convolV(3);
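
The rewrite above matches what the spec requires: array parameters to a function must carry an explicit size, so the generic convolH(float tab[], int size) can't exist as-is. A conformant variant that still passes the mask in, sketched under the assumption that screenWidth and texScreen are declared as in the original (convolH5 is a hypothetical name, one overload per mask size):

```glsl
// Conformant helper: the parameter array is explicitly sized,
// as GLSL requires, so each kernel size gets its own overload.
vec4 convolH5(float tab[5])
{
    float stepX = 1.0 / screenWidth;
    vec4 color = vec4(0.0);
    int ind = 0;
    // 5-tap kernel: offsets -2..2 around the current texel.
    for (int i = -2; i <= 2; i++)
        color += tab[ind++] * texture2D(texScreen,
                 gl_TexCoord[0].st + vec2(float(i) * stepX, 0.0));
    return color;
}

// usage:
// color = convolH5(mask5_2);
```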



Once those were all fixed, the shaders compile in AMD land, but the FPS swings from a smooth 60 down to 1, then back up to 60, then down to 1 again.
On Nvidia hardware, it is solid 60 all the way through.

Did I screw something up trying to convert the shaders?

This is my first jump into GLSL, in case you haven't guessed. :)
Thanks for any info on this.

BTW, I thought GLSL was a standard, so why in the world is nvidia breaking it with their own dialect? From what I read in the spec, you aren't supposed to declare unsized array parameters like vec4 convolH(float tab[], int size) [unless I read the spec wrong, which is always possible ;)]

Alfonse Reinheart
05-12-2011, 01:54 PM
BTW, I thought GLSL was a standard, so why in the world is nvidia breaking it with their own dialect?

That's what they do. NVIDIA compiles GLSL with a modified version of its Cg compiler, so non-conformant GLSL code often slips through.

alder
05-12-2011, 11:00 PM
Probably best to try gDEBugger or similar tool to get a better understanding how the "slow" frames differ from the "fast" frames. Perhaps you are doing an occasional state change that happens to be expensive on AMD's implementation.

In one of AMD's whitepapers they explained that on their new hardware it is better to think scalar than vector.
For example if the compiler is silly this:

normal = (factor * 2.0) * texture2D(map, coord).xyz - factor;
might be faster than this:

normal = factor * (2.0 * vec3(texture2D(map, coord) - 1.0));
Depending on what you do with the return value of LevelOfGrey(vec4 colorIn), returning a single float can be better than a vec4.
You should always pick only the components you need rather than using vec4 everywhere. If you are not going to blend, you can drop the alpha channel and save a lot of operations, and so on.
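
A scalar version of the greyscale helper from the first post, as a sketch (dot() collapses the three multiply-adds into a single operation):

```glsl
// Scalar greyscale: return only the luminance, not a full vec4.
float LevelOfGrey(vec4 colorIn)
{
    // Rec. 601 luma weights, same constants as the original shader.
    return dot(colorIn.rgb, vec3(0.299, 0.587, 0.114));
}
```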

AMD GPU Shader Analyzer can give some ideas how many cycles different versions of your shaders would need.

GLpeon
05-14-2011, 11:51 AM
I tried those utilities, and it seems that with VSYNC off I don't get a huge impact on framerates, but with it on I get a massive slowdown.
I'll keep playing around with the calls so I can narrow down which shader is causing the massive hit.

Thanks for the info.