Shader in software and worse

splat · January 15, 2005, 11:33pm

Hi, I have written a shader that should do per-pixel lighting with parallax bump mapping and shadow mapping.
I’d like to sample the depth map at least 6 times. There is a second depth map, in which the objects are rendered slightly bigger, that I sample only once; I need the value from this read to offset the reads on the first map and get some kind of soft shadows. That’s why the reads from the first map depend on the read from the second.
This works well in software mode on my Radeon 9800 Pro,
and in hardware when I do only 2 reads instead of 6.
But without jittering it does not look very good. If I sample a noise texture (3 times, to get some different noise values; 6 reads for the 6 samples of the depth map would be even better [the depth maps are cube maps, so I need 3 noise values per lookup]), the depth map lookups depend on 2 other samples.
And now software mode produces garbage: everything seems to be in shadow. I’m pretty sure this is not my fault, because when I do only 2 lookups everything is OK (I get 2 slightly offset half-shadows with noisy edges; the offset grows as the light source gets nearer to the object).
But if I add another similar read, software mode no longer gives the desired results (only with 2 dependencies; with one, everything is OK).

What do you think about this? Is the driver required to compute the same result when it runs the shader in software mode on the CPU?
Is this a known bug? I have Catalyst 4.12.
Thanks for trying to understand my confused English.

imported_jwatte · January 17, 2005, 5:49pm

The Radeons have limitations on the length of dependent reads. They can only support a dependency chain of 4. Sometimes, you can convince the compiler that your chain really is shorter. For example, using a bunch of temporaries, maybe you could do something like this:

uniform sampler2D A;  // displacement texture
uniform sampler2D B;  // texture sampled at the displaced coordinates

float foo( vec2 uv ) {
  // One read from A; every read from B depends only on this value.
  vec2 displacement = texture2D( A, uv ).xy;
  vec2 a = uv + displacement + vec2( 0.1, 0.0 );
  vec2 b = uv + displacement + vec2( 0.0, 0.0 );
  vec2 c = uv + displacement + vec2( -0.1, 0.0 );
  vec2 d = uv + displacement + vec2( 0.0, -0.1 );
  vec2 e = uv + displacement + vec2( 0.0, 0.1 );
  vec2 f = uv + displacement + vec2( -0.05, 0.05 );
  float r0 = texture2D( B, a ).r;
  float r1 = texture2D( B, b ).r;
  float r2 = texture2D( B, c ).r;
  float r3 = texture2D( B, d ).r;
  float r4 = texture2D( B, e ).r;
  float r5 = texture2D( B, f ).r;
  return (r0+r1+r2+r3+r4+r5)/6.0;
}

This might be enough to tell the compiler you really only have a dependency depth of 2.

Also, I have vague recollections about only being able to sample the same texture unit 4 times, so you may need to bind the B texture to two separate units, and call it C for two of those samples.
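
A minimal sketch of that second idea, assuming the application binds the same texture object to a second texture unit and exposes it as a sampler named C. The sampler names and the four/two split are only illustrative, and whether the per-unit limit really exists on this hardware is the vague recollection above, not something verified here:

```glsl
uniform sampler2D A;  // displacement texture
uniform sampler2D B;  // texture to be sampled six times
uniform sampler2D C;  // assumption: the same texture as B, bound to another unit

float foo(vec2 uv) {
  // single read from A that all later reads depend on
  vec2 base = uv + texture2D(A, uv).xy;

  // four reads go through sampler B
  float sum = texture2D(B, base + vec2( 0.1,  0.0)).r;
  sum += texture2D(B, base).r;
  sum += texture2D(B, base + vec2(-0.1,  0.0)).r;
  sum += texture2D(B, base + vec2( 0.0, -0.1)).r;

  // the remaining two reads go through sampler C
  sum += texture2D(C, base + vec2( 0.0,  0.1 )).r;
  sum += texture2D(C, base + vec2(-0.05, 0.05)).r;

  return sum / 6.0;
}
```

On the application side this would mean binding the texture once per unit (glActiveTexture followed by glBindTexture) and setting each sampler uniform to its unit index with glUniform1i.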

splat · January 18, 2005, 7:15am

Thank you very much. I tried to implement it the way you described, but my code was already mostly like that.
I think I should post some code:

 
vec4 extract = vec4(1.0, 0.01, 0.0001, 0.000001);

float shadowCube(vec3 v, samplerCube t) {
 vec4 col = textureCube(t, v).rgba;
 return  dot(col, extract);
}

const vec3 o1 = vec3(1.0, 0.5, 0.3); 
const vec3 o2 = vec3(0.3, 1.0, 0.5);
const vec3 o3 = vec3(0.5, 0.3, 1.0); 
const vec3 o4 = vec3(1.0, 1.0, 0.5);
const vec3 o5 = vec3(0.5, 1.0, 1.0);
const vec3 o6 = vec3(1.0, 0.5, 1.0);
 
float shadow(vec3 v, float l) {

 float res = 0.1;
 float d = shadowCube(v, shadowmap2);
// vec3 noise1 = texture2D(noise, vec2(d*10.0, d*20.0)*v.xy).rgb;
// vec3 noise2 = texture2D(noise, vec2(d*15.0, d*25.0)*v.yz).rgb;
// vec3 noise3 = texture2D(noise, vec2(d*25.0, d*35.0)*v.zx).rgb;
 vec3 noise1 = texture2D(noise, v.xy).rgb;
 vec3 noise2 = texture2D(noise, v.yz).rgb;
 vec3 noise3 = texture2D(noise, v.zx).rgb;
 float delta = (l+1.0) / (d+1.0) - 1.0;
 delta *= delta;
 noise1 = (noise1-0.5)*0.5; 
 noise2 = (noise2-0.5)*0.5;
 noise3 = (noise3-0.5)*0.5;

 vec3 offset1 = v + (o1 + noise1)* delta;
 vec3 offset2 = v + (o2 + noise2)* delta; 
 vec3 offset3 = v + (o3 + noise3)* delta; 
 vec3 offset4 = v + (o4 + noise1*noise2) * delta;
 vec3 offset5 = v + (o5 + noise2*noise3) * delta;
 vec3 offset6 = v + (o6 + noise1*noise3) * delta;

 float weight = 1.0 / 2.0;

 float d1 = shadowCube(offset1, shadowmap1);
 float d2 = shadowCube(offset2, shadowmap1);
 float d3 = shadowCube(offset3, shadowmap1);
 float d4 = shadowCube(offset4, shadowmap1);
 float d5 = shadowCube(offset5, shadowmap1);
 float d6 = shadowCube(offset6, shadowmap1);
 float d0 = shadowCube(v, shadowmap1); 


 if (d1 >= l)
  res += weight;
// else 
//  res += weight * 0.1 * pow(l+1.0, 3.0); 
 if (d2 >= l)
  res += weight;
// else 
//  res += weight * 0.1 * pow(l+1.0, 3.0); 
/*
 if (d3 >= l)
  res += weight;
// else 
//  res += weight * 0.1 * pow(l+1.0, 3.0); 
 if (d4 >= l)
  res += weight;
// else 
//  res += weight * 0.1 * pow(l+1.0, 3.0); 
 if (d5 >= l)
  res += weight;
// else 
//  res += weight * 0.1 * pow(l+1.0, 3.0); 
 if (d6 >= l)
  res += weight;
// else 
//  res += weight * 0.1 * pow(l+1.0, 3.0); 
 if (d0 >= l)
  res += weight;
// else 
//  res += weight * 0.1 * pow(l+1.0, 3.0); 
*/
 return res * 0.909091; 
}

This way it works in software with correct results. If I move the " /* " four lines down, so that one more "if … " is active, software mode gives wrong results.
And if I use
“vec3 noise1 = texture2D(noise, vec2(d*10.0, d*20.0)*v.xy).rgb;”
instead of
“vec3 noise1 = texture2D(noise, v.xy).rgb;”,
which I would prefer, only ONE if clause works correctly (in software). I can comment out everything except one if statement; it doesn’t matter which one I keep, what matters is that I can use at most 1 (or 2 with the other noise code).
You see, I don’t use any sampler more than once after all.
A #pragma optimize(off) doesn’t change anything.

The else parts should be included; they are commented out only for testing, but that doesn’t seem to change anything either.
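
For comparison, the chain of if statements in the listing could also be written without branches using the built-in step(). This is only a sketch: the function name litFraction is hypothetical, the weight of 1/6 assumes all six offset samples are active, and whether this form changes how the Catalyst compiler counts dependent reads is untested.

```glsl
// Sketch only: same accumulation as the if chain, without branches.
// step(edge, x) returns 1.0 when x >= edge and 0.0 otherwise,
// so step(l, d) is 1.0 exactly when the original test (d >= l) passes.
float litFraction(float l, float d1, float d2, float d3,
                  float d4, float d5, float d6) {
  float lit = step(l, d1) + step(l, d2) + step(l, d3)
            + step(l, d4) + step(l, d5) + step(l, d6);
  // 0.1 ambient base plus a weight of 1/6 per lit sample, then the same
  // 0.909091 (= 1/1.1) scale as the listing, so the maximum is about 1.0
  return (0.1 + lit / 6.0) * 0.909091;
}
```

Inside shadow(), the values d1 to d6 would be passed in after the six shadowCube lookups, in place of the if statements.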

splat · February 5, 2005, 6:39am

Does really nobody have an idea how to make the code friendlier to the Catalyst 5.1 GLSL compiler?
