View Full Version : GLSL performance with per-pixel bump mapping = slow?



hardtop
10-26-2005, 10:54 PM
Hello folks,

I know this was discussed in another post, but I thought it could be better to put it in a new post.

Although doing per-pixel lighting leads to increased calculations, and thus performance hits, when I apply this bump-mapping shader my performance drops from about 150fps to 15fps... that seems like an awful lot! :confused:

Here is the code:

VS:

varying vec4 lightDir;
varying vec3 normal, halfVector, spotDir;

void main()
{
    vec3 tempLight;

    // compute TBN matrix for tangent space
    vec3 v_Normal = normalize(gl_NormalMatrix * gl_Normal);               // normal to eye space
    vec3 v_Tangent = normalize(gl_NormalMatrix * gl_MultiTexCoord2.xyz);  // tangent to eye space
    vec3 v_Binormal = normalize(gl_NormalMatrix * gl_MultiTexCoord3.xyz); // binormal to eye space

    mat3 tangentBasis = mat3( // in column-major order
        v_Tangent.x, v_Binormal.x, v_Normal.x,
        v_Tangent.y, v_Binormal.y, v_Normal.y,
        v_Tangent.z, v_Binormal.z, v_Normal.z);

    // compute light vector
    vec4 ecPos;
    vec3 aux;

    ecPos = gl_ModelViewMatrix * gl_Vertex;
    aux = vec3(gl_LightSource[0].position - ecPos);
    tempLight = aux;

    // compute normal and half vector
    normal = normalize(gl_NormalMatrix * gl_Normal); // vertex normal to eye coordinates
    halfVector = normalize(gl_LightSource[0].halfVector.xyz);

    // convert coordinates to tangent space
    tempLight = tangentBasis * tempLight;
    halfVector = tangentBasis * halfVector;
    spotDir = tangentBasis * gl_LightSource[0].spotDirection;

    // pass texture coords to fragment shader
    gl_TexCoord[0] = gl_MultiTexCoord0;

    // put distance in w component of light
    lightDir = vec4(tempLight, 0.0);
    lightDir.w = length(aux);

    // convert vertex position
    gl_Position = ftransform();
}

FS:

varying vec4 lightDir;
varying vec3 normal, halfVector, spotDir;

uniform sampler2D decalMap;
uniform sampler2D normalMap;

void main()
{
    vec3 n, l, halfV;
    vec4 texel;
    float NdotL, NdotHV;
    float att;
    float spotEffect;
    float dist;

    // retrieve material parameters
    vec4 color = gl_FrontLightModelProduct.sceneColor;
    vec4 ambient = gl_FrontLightProduct[0].ambient;
    vec4 diffuse = gl_FrontLightProduct[0].diffuse;
    vec4 specular = gl_FrontLightProduct[0].specular;

    // compute normal from normal map
    vec2 tuv = vec2(gl_TexCoord[0].s, -gl_TexCoord[0].t);
    n = 2.0 * (texture2D(normalMap, tuv).rgb - 0.5);
    n = normalize(n);

    // compute light
    l = normalize(lightDir.xyz);
    dist = lightDir.w;

    NdotL = max(dot(n, l), 0.0);

    if (NdotL > 0.0)
    {
        spotEffect = dot(normalize(spotDir), normalize(-l));
        if (spotEffect > gl_LightSource[0].spotCosCutoff)
        {
            spotEffect = pow(spotEffect, gl_LightSource[0].spotExponent);
            att = spotEffect / (gl_LightSource[0].constantAttenuation +
                gl_LightSource[0].linearAttenuation * dist +
                gl_LightSource[0].quadraticAttenuation * dist * dist);

            color += att * (diffuse * NdotL + ambient);

            halfV = normalize(halfVector);
            NdotHV = max(dot(n, halfV), 0.0);
            color += att * specular * pow(NdotHV, gl_FrontMaterial.shininess);
        }
    }

    // apply texture
    texel = texture2D(decalMap, gl_TexCoord[0].st);
    color *= texel;

    // set fragment color
    gl_FragColor = color;
}

I know this is not optimal code, but suffering such a great performance hit makes me wonder whether I missed something...

Another point to mention is that I can't compute the lighting entirely in my FS because I'm not using a directional light but a spotlight, whose vectors need to be interpolated per-fragment. Anybody got a clue?

Thanks fellows!

HardTop

Korval
10-27-2005, 07:06 AM
You neglected to mention what hardware/drivers you're using.

In any case, my guess would be the if-statements. Remember that on most hardware both sides of a conditional are always executed, and what's going on inside those conditionals looks pretty heavy for certain hardware.

ChiefWiggum
10-27-2005, 04:14 PM
Yeah, you'd need to mention what hardware/drivers you're using. But just off the top of my head, that normalization of -l isn't required, because you already normalized l :) It won't account for the massive FPS drop though... Give us some more info and maybe we can help!

hardtop
10-27-2005, 11:38 PM
yeah you're right, I noticed it after having posted the message :)

About my hardware, I test on 2 different configurations:

AMD Barton 3000+ / 512 Mb DDR / ATI Radeon 9800 Pro
P4 2.4Ghz / 512 Mb DDR / GeForce FX 5200 crap

On both PC's I get the same performance hit. I get 5 FPS on the 5200 where I had 50 when not running the shaders; and I get 15 FPS on the Radeon where I had 150 FPS, so that's about the same proportion.

Cheers

HardTop

Zulfiqar Malik
10-28-2005, 01:17 AM
First of all, there is no need for redundant assignments (and yes, they might be costing you GPU cycles; use the original variable names).

vec4 color = gl_FrontLightModelProduct.sceneColor;
vec4 ambient = gl_FrontLightProduct[0].ambient;
vec4 diffuse = gl_FrontLightProduct[0].diffuse;
vec4 specular = gl_FrontLightProduct[0].specular;

Secondly, if you are using uncompressed textures there is usually no need to normalize the normal fetched from the normal map.


vec2 tuv = vec2(gl_TexCoord[0].s, -gl_TexCoord[0].t);

Do this negation in the vertex shader, so that these three lines

vec2 tuv = vec2(gl_TexCoord[0].s, -gl_TexCoord[0].t);
n = 2.0 * (texture2D(normalMap, tuv).rgb - 0.5);
n = normalize(n);

become

n = 2.0 * (texture2D(normalMap, gl_TexCoord[0].st).rgb - 0.5);

Likewise, the light direction can usually be normalized in the vertex shader. There is usually no need to do this per-fragment (normalization is EXPENSIVE!).
I fail to understand why spotDir is passed as a varying. It's constant and should be passed as a uniform, and then there is no need to normalize it per-fragment.
Pre-shader-model-3.0 hardware cannot handle branch statements, so they are pretty much useless there. You can zero out the results instead by multiplying with, say, NdotL. For SM3.0-capable hardware, use "discard".
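As a sketch of that zero-out approach, the nested ifs in the fragment shader posted above could become straight-line code. This is only a sketch: `inSpot` and `atten` are made-up names, the rest follows the posted shader, and `step()` is standard GLSL.

```glsl
// Branchless sketch of the diffuse/spot term, using the names from
// the posted fragment shader. step(edge, x) returns 0.0 when x < edge
// and 1.0 otherwise, so fragments outside the spot cone contribute
// nothing, with no branch taken.
float NdotL   = clamp(dot(n, l), 0.0, 1.0);
float spotCos = dot(normalize(spotDir), -l);   // l is already normalized
float inSpot  = step(gl_LightSource[0].spotCosCutoff, spotCos);
float spotEffect = inSpot * pow(max(spotCos, 0.0),
                                gl_LightSource[0].spotExponent);
float atten = spotEffect / (gl_LightSource[0].constantAttenuation +
                            gl_LightSource[0].linearAttenuation * dist +
                            gl_LightSource[0].quadraticAttenuation * dist * dist);
color += atten * (diffuse * NdotL + ambient);
```

The specular term can be masked the same way. On pre-SM3.0 parts both sides of the original if were executed anyway, so this form shouldn't cost anything extra there.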


att = spotEffect / (gl_LightSource[0].constantAttenuation +
gl_LightSource[0].linearAttenuation * dist +
gl_LightSource[0].quadraticAttenuation * dist * dist);

There is no need to calculate the denominator per-pixel. The only variable it contains is "dist", which is a varying. You can calculate the entire denominator term in the vertex shader, pass it as a varying to the fragment shader, and be sure the result will be the same!

There are a few other optimizations that can be done, but I will leave those up to you.

Like I told you earlier, read the GPU Programming Guide from nVidia's website.

hardtop
10-28-2005, 04:06 AM
Great! Thanks a lot for the info. I don't always figure out what can be:

computed in the VS and interpolated to the FS
calculated for each fragment in the FS
handled as a uniform, staying the same all the way

I'll give that a try.
Thanks!

Zulfiqar Malik
10-28-2005, 06:15 AM
So much so that you can actually compute
1.0 / (denominator term)
in the vertex shader and pass that as a varying to the fragment shader, so that you do the expensive division per-vertex and only a relatively inexpensive multiplication per-fragment.
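A minimal sketch of that arrangement (the varying name `invAtt` is invented here; `aux` is the eye-space light vector from the shader posted earlier):

```glsl
// Vertex shader side: one reciprocal per vertex.
varying float invAtt;
// inside main():
//   float dist = length(aux);
//   invAtt = 1.0 / (gl_LightSource[0].constantAttenuation +
//                   gl_LightSource[0].linearAttenuation * dist +
//                   gl_LightSource[0].quadraticAttenuation * dist * dist);

// Fragment shader side: the per-pixel division becomes a multiply:
//   att = spotEffect * invAtt;
```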

Humus
10-28-2005, 04:26 PM
Originally posted by Zulfiqar Malik:
First of all, there is no need for redundant assignments (and yes they might be costing you GPU cycles, use the original variable names).

The compiler should handle that gracefully. I doubt this would ever cost any extra.


Originally posted by Zulfiqar Malik:
For SM3.0 capable hardware use "discard".

Don't! Discard kills HiZ optimizations and doesn't actually early-out the shader. For SM3.0 hardware, it's better to use if-statements to branch to the end of the shader.

Humus
10-28-2005, 04:30 PM
Originally posted by hardtop:
NdotL = max(dot(n,l),0.0);

Use this instead:
NdotL = clamp(dot(n,l),0.0,1.0);

That should map to a single DP3_SAT instruction, while the first one maps to DP3 and MAX (unless the compiler is smart and figures out that both vectors are normalized so a dot product can't return values above 1.0).

Humus
10-28-2005, 04:33 PM
Originally posted by Zulfiqar Malik:
the expensive division

A scalar division isn't that expensive. It's a RCP and a MUL, both single-cycle. A vector-by-vector division is expensive though, as each component needs its own RCP, so the total cost for a vec4 is five instructions.

Zulfiqar Malik
10-28-2005, 09:33 PM
Originally posted by Humus

The compiler should handle that gracefully. I doubt this would ever cost any extra.

Oh really? Or should I say, an nVidia compiler would handle it gracefully! I have seen ATI's compiler behave worse than this scenario. In one particular case I declared a few CONSTANTS in a fragment shader. The nVidia compiler generated proper code, but ATI's compiler did not inline the values of those constants, resulting in a huge shader that brought my 9700pro to its knees. I don't know the situation right now, as I haven't done any shader-related stuff for a couple of months.



Originally posted by Humus

Don't! Discard kills HiZ optimizations and doesn't actually early-out the shader. For SM3.0 hardware, it's better to use if-statements to branch to the end of the shader.

Thanks for clearing that up. I have never actually tested the use of discard because I didn't have SM3.0-capable hardware until recently, but I read in some presentations that discard can help save fragment shader instructions. Can you tell me why discard won't early out of the shader? The specs state that it should, or is it driver-dependent?



Originally posted by Humus

A scalar division isn't that expensive. It's a RCP and a MUL. Both are single cycle. A vector by vector division is expensive though as all components need their own RCP, so the total cost for a vec4 is five instructions.

True! But isn't just one MUL better than a MUL and an RCP :) ? I might sound too primitive, but I have written shaders for early hardware, and for large scenes I had to take literally every clock cycle into account. Although I agree that in this particular case it might not result in a spectacular increase in performance :) .

Korval
10-29-2005, 02:27 AM
Don't! Discard kills HiZ optimizations and doesn't actually early-out the shader.

I find this disappointing (about the HiZ deactivation; the other is expected). Is this only with discard, or do other discard-like effects (alpha test, etc.) also deactivate HiZ?


Can you tell me why discard won't early out of the shader? The specs state that it should, or is it driver dependent?

It doesn't early-out because it can't.

Each fragment is not being processed independently; 4-fragment blocks are processed simultaneously, each block running the same program, running the same opcode at the same time. So if you have a conditional discard, it is more efficient to simply set a flag saying not to write that fragment and continue processing, because the other fragments in the 4-fragment block may not have taken the discard. It is also because of these fragment-quads so to speak that conditional branching is difficult.

Humus
10-29-2005, 06:23 PM
Originally posted by Zulfiqar Malik:
Oh really, or should I say an nVidia compiler would handle it gracefully! I have seen ATI's compiler behave worse than this scenario. In one particular scenario I declared a few CONSTANTS in a fragment shader. The nVidia compiler generated proper code, but ATI's compiler did not inline the value of those constants, resulting in a huge shader that brought my 9700pro to its knees.

Well, I've not seen anything like that happen since pretty much the first driver release to support GLSL, so I don't know how you managed.


Originally posted by Zulfiqar Malik:
But I read in some presentations that discard can help save fragment shader instructions. Can you tell me why discard won't early out of the shader? The specs state that it should, or is it driver dependent?

The spec doesn't say how it should be done. As long as it's functionally equivalent it's within spec, and I don't think any hardware really early-outs at discard. It will kill the fragment, but the entire shader will still be executed. On the R520, if all pixels within a quad are killed, it will at least stop sampling textures, unlike previous generations. To really early-out you have to use dynamic branching.


True! But isn't just one MUL better than a MUL and RCP :) ?

Yes, obviously. :) Just pointing out that it's not that expensive. Lots of people are used to division being very slow on the CPU, so they assume it's slow on GPUs too.

Humus
10-29-2005, 06:32 PM
Originally posted by Korval:
I find this disappointing (about the HiZ deactivation; the other is expected). Is this only with discard, or do other discard-like effects (Alpha test, etc) also deactivate HiZ?

Alpha test also deactivates HiZ. Depth and stencil test don't.
Note that it only disables HiZ for passes that use it, so if you later turn alpha test off, HiZ will be enabled again. You don't have to be paranoid about using it at all; just keep in mind that when you do, things will be slower.

Zulfiqar Malik
10-29-2005, 08:35 PM
Originally posted by Humus

Well, I've not seen anything like that happen since pretty much the first driver release to support GLSL, so I don't know how you managed.

*sigh*. This was just one example, my friend; ATI drivers have given me countless sleepless nights, with my colleagues always on my a** telling me to make the shaders more efficient on their machines as well. Hours and hours of debugging, only to find out the compiler was screwing around :( . It was barely usable in the beginning, but eventually got better. No offense intended, just something I experienced first-hand over several months.

Most of the time my cheap 5700 Ultra was giving twice the performance of a 9700pro.

Zulfiqar Malik
10-29-2005, 08:40 PM
Originally posted by Humus

The spec doesn't say how it should be done. As long as it's functionally equivalent it's within spec, and I don't think anyway hardware really early-outs at discard. It will kill the fragment, but the entire shader will still be executed. On the R520, if all pixels within a quad are killed, it will at least stop sampling textures, unlike previous generations. To really early-out you have to use dynamic branching.

Thanks for the info. So does this mean it would actually be better to use branch statements even on older hardware, since there the code will be executed anyway, while on modern hardware with branch support it will perform better? That would be pretty good, since one could then provide a single shader that performs reasonably optimally on all sorts of hardware.

hardtop
10-30-2005, 04:02 AM
Hello folks,

I've just tested the hints collected here, and the code seems much cleaner, but I don't notice much performance increase...

To the question "why do you pass spotDir as a varying?": I do the same for lightDir and lightPos. I pass them as varyings because I must transform them into tangent space before passing them to the FS. I can't figure out what would be cheaper GPU-wise (not because there is no way, but because I'm a n00b at shader programming ;) )

Thanks for the clue

HardTop

Zulfiqar Malik
10-30-2005, 05:05 AM
I don't understand why lightPos and spotDir need to be in tangent space. The only thing you need for per-pixel lighting is the light direction vector in tangent space, so that it can be dotted with the normal from the normal map. The attenuation can be done in world/eye space.
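A vertex-shader sketch of that minimal setup (assuming, as in the shaders posted in this thread, that the tangent and binormal arrive in gl_MultiTexCoord2/3; the varying names tsLightDir and lightDist are invented here):

```glsl
varying vec3  tsLightDir; // light direction in tangent space
varying float lightDist;  // eye-space distance, for attenuation

void main()
{
    // Build the TBN basis in eye space, as in the original shader.
    vec3 n = normalize(gl_NormalMatrix * gl_Normal);
    vec3 t = normalize(gl_NormalMatrix * gl_MultiTexCoord2.xyz);
    vec3 b = normalize(gl_NormalMatrix * gl_MultiTexCoord3.xyz);
    mat3 tbn = mat3(t.x, b.x, n.x,   // rows are T, B, N, so tbn * v
                    t.y, b.y, n.y,   // yields (dot(T,v), dot(B,v), dot(N,v))
                    t.z, b.z, n.z);

    // Only the light direction is rotated into tangent space; the
    // distance is a scalar and needs no transform at all.
    vec3 ecPos   = vec3(gl_ModelViewMatrix * gl_Vertex);
    vec3 toLight = gl_LightSource[0].position.xyz - ecPos;
    lightDist  = length(toLight);
    tsLightDir = tbn * toLight;

    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}
```

The fragment shader then only has to normalize tsLightDir and dot it with the normal-map sample, and can use lightDist directly for the attenuation term.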



Originally posted by hardtop

I've just tested the hints collected here, and the code seems much cleaner, but I don't notice much performance increase...

That's strange! Such a compact fragment shader (at least the one I have in mind :) ) should give you plenty of performance. Maybe your vertex shader is choking the vertex processor? Can you post the entire source code, i.e. vertex and fragment shaders? On another note, you can look at some per-pixel lighting code (there is plenty of lighting-related source code available online) and test your application's performance with that.

hardtop
10-30-2005, 11:15 AM
I convert spotDir, lightDir and lightPos to tangent space because I need to perform dot products between spotDir and lightDir (for the spot cutoff effect) and between the half vector and the normal (for the specular component), but I might be doing something wrong. Here are the full original shaders (for bump mapping):


varying vec4 lightDir;
varying vec3 normal, halfVector, spotDir;

void main()
{
    vec3 tempLight;

    // compute TBN matrix for tangent space
    vec3 v_Normal = normalize(gl_NormalMatrix * gl_Normal);               // normal to eye space
    vec3 v_Tangent = normalize(gl_NormalMatrix * gl_MultiTexCoord2.xyz);  // tangent to eye space
    vec3 v_Binormal = normalize(gl_NormalMatrix * gl_MultiTexCoord3.xyz); // binormal to eye space

    mat3 tangentBasis = mat3( // in column-major order
        v_Tangent.x, v_Binormal.x, v_Normal.x,
        v_Tangent.y, v_Binormal.y, v_Normal.y,
        v_Tangent.z, v_Binormal.z, v_Normal.z);

    // compute light vector
    vec4 ecPos;
    vec3 aux;

    ecPos = gl_ModelViewMatrix * gl_Vertex;
    aux = vec3(gl_LightSource[0].position - ecPos);
    tempLight = aux;

    // compute normal and half vector
    normal = normalize(gl_NormalMatrix * gl_Normal); // vertex normal to eye coordinates
    halfVector = normalize(gl_LightSource[0].halfVector.xyz);

    // convert coordinates to tangent space
    tempLight = tangentBasis * tempLight;
    halfVector = tangentBasis * halfVector;
    spotDir = tangentBasis * gl_LightSource[0].spotDirection;

    // pass texture coords to fragment shader
    gl_TexCoord[0] = gl_MultiTexCoord0;

    // put distance in w component of light
    lightDir = vec4(tempLight, 0.0);
    lightDir.w = length(aux);

    // convert vertex position
    gl_Position = ftransform();
}

varying vec4 lightDir;
varying vec3 normal, halfVector, spotDir;

uniform sampler2D decalMap;
uniform sampler2D normalMap;

void main()
{
    vec3 n, l, halfV;
    vec4 texel;
    float NdotL, NdotHV;
    float att;
    float spotEffect;
    float dist;

    // retrieve material parameters
    vec4 color = gl_FrontLightModelProduct.sceneColor;
    vec4 ambient = gl_FrontLightProduct[0].ambient;
    vec4 diffuse = gl_FrontLightProduct[0].diffuse;
    vec4 specular = gl_FrontLightProduct[0].specular;

    // compute normal from normal map
    vec2 tuv = vec2(gl_TexCoord[0].s, -gl_TexCoord[0].t);
    n = 2.0 * (texture2D(normalMap, tuv).rgb - 0.5);
    n = normalize(n);

    // compute light
    l = normalize(lightDir.xyz);
    dist = lightDir.w;

    NdotL = max(dot(n, l), 0.0);

    if (NdotL > 0.0)
    {
        spotEffect = dot(normalize(spotDir), normalize(-l));
        if (spotEffect > gl_LightSource[0].spotCosCutoff)
        {
            spotEffect = pow(spotEffect, gl_LightSource[0].spotExponent);
            att = spotEffect / (gl_LightSource[0].constantAttenuation +
                gl_LightSource[0].linearAttenuation * dist +
                gl_LightSource[0].quadraticAttenuation * dist * dist);

            color += att * (diffuse * NdotL + ambient);

            halfV = normalize(halfVector);
            NdotHV = max(dot(n, halfV), 0.0);
            color += att * specular * pow(NdotHV, gl_FrontMaterial.shininess);
        }
    }

    // apply texture
    texel = texture2D(decalMap, gl_TexCoord[0].st);
    color *= texel;

    // set fragment color
    gl_FragColor = color;
}

Please note these shaders are not yet optimized according to your advice; I am still testing the whole thing, so I'm posting what I know runs all right on my machine.

hardtop
10-30-2005, 11:42 AM
instead of computing the spotDir in tangent space in the VS, and then pass it to the FS, I tried this:


spotE = dot(normalize(gl_LightSource[0].spotDirection), normalize(-tempLight));

in the VS. tempLight is basically the light pos in eye space. My FS remains the same except the following lines:


spotEffect = dot(normalize(spotDir), normalize(-l));
if (spotEffect > gl_LightSource[0].spotCosCutoff)

become this:


if (spotE > gl_LightSource[0].spotCosCutoff)

where spotE is a varying calculated in the VS.

This works (in a way), but I notice no perf increase, and worse, while the bump is correct, the light is now computed per-vertex and interpolated. So if I have a big quad in front of me and I illuminate its center, I'll see no light; I have to "illuminate a vertex". No good.

shader development is definitely not that easy :eek:

hardtop
10-30-2005, 11:59 AM
another test: when I perform 2 simplistic operations in my shaders:


gl_Position = ftransform();

in the VS and


gl_FragColor = vec4(1.0,0.0,0.0,1.0);

in the FS, I get 3x more FPS than with the full bump thing (45 FPS instead of 15), BUT it's still 3x less than with fixed function (140-150 FPS). So my bump shader needs optimization, but I think there is something else going on.

hardtop
10-30-2005, 12:04 PM
OK, my mistake. I just figured out that I was using 2 shaders (one for Blinn-Phong per-pixel specular illumination, the other for bump mapping), and when I disable the Blinn one, I get 30-33 FPS with the bump shader (not much optimized, so I think I can gain some more FPS). Maybe I should test one at a time...

Zulfiqar Malik
10-30-2005, 07:42 PM
Originally posted by hardtop

OK my mistake, I just figured out that I was using 2 shaders (one for blinn-phong PPL-specular illumination, the other for bump mapping) and when I disable the blinn one, I get 30-33 FPS with the bump shader (not much optimized, so I think I can gain some more FPS). Maybe I should test one at a time...

So does that mean that your problem has been solved?



Originally posted by hardtop

another test: when I perform 2 simplistic operations in my shaders:

code:

gl_Position = ftransform();

in the VS and

code:

gl_FragColor = vec4(1.0,0.0,0.0,1.0);

in the FS, I get 3x more FPS than with the full-bump thing (this means 45FPS instead of 15) BUT it's still 3x less than with fixed function (140-150FPS), so my bump shader needs optimization, but I think there is something else.

Something is definitely wrong! The compiler should fold that shader down to almost nothing (the entire fragment shader compiles to just one MOV instruction), and you should be getting a gazillion FPS on that!

hardtop
10-30-2005, 11:36 PM
Actually the problem is the FPS loss, which in this case has been *partially* solved: I'm using 2 shaders, and I think the other one (the Blinn-Phong illumination one) is being applied to a much larger polygon count. By disabling the Blinn shader, my bump one gives me 35 FPS instead of 15. But 35 compared to 150 is still not satisfactory.

As you said, even with just one instruction in the VS and one in the FS, I get a noticeable perf hit (50 vs 150 FPS), which cannot be normal.

I'm not through with this issue yet... :( The next thing I'll test is to put a polygon counter into my app, so that I know how many polygons my shader is currently being applied to.

hardtop
10-31-2005, 12:42 AM
OK I added polygon count, here are the results:

When I tested the app and gave you FPS figures, my BSP tree culling was displaying around 6000 tris. :eek: It seems really weird to suffer such a slowdown with such a small polygon count :confused:

To summarize:


fixed function pipeline - 6000 tris - 150 FPS
bump mapping + blinn - 6000 tris - 15 FPS
bump mapping alone - 6000 tris - 35 FPS
one instr. shader - 6000 tris - 50 FPS


by "one instr. shader", I mean the shader where I perform just the transformation in the VS and the color to red in the FS.

mogumbo
10-31-2005, 07:43 AM
I have always had better luck with tiny shaders, getting even better performance than the fixed-function pipeline, so your performance numbers seem strange to me. Is it possible you are binding the shader too many times? How often do you call glUseProgramObject? That call can hurt performance if you do it a lot.

hardtop
10-31-2005, 07:55 AM
Indeed, I change shaders (from Blinn to bump) many times per frame, according to the "user-defined material" currently being rendered.

I tried setting the shader (just Blinn, for the test) once and for all, and there I got a 25-30% increase. But it's still 25 FPS instead of 150, and I need to change shaders at least once per frame... I know I'll have to re-order the way I draw my faces so that I only change shaders once per "material" type, but even then, 25 FPS is still not enough. :eek:

hardtop
11-02-2005, 11:59 PM
By the way... I know this is a little bit off-topic, but I know we have a friend from Pakistan here. I just wanted to make sure you were all right after that terrible earthquake which shook Pakistan and Islamabad.

Zulfiqar Malik
11-03-2005, 12:15 AM
The damage has been colossal, but things seem to be working out after a lot of support from the international and local community. It really is great to see amazing people selflessly helping others whom they don't know and who don't even share their ideology. Hats off to them!

hardtop
11-06-2005, 06:37 AM
OK, I finally sorted things out:

I now test which shader to use before binding: if a shader is already active, I don't reactivate it... obvious, but I never thought to do it.
I now calculate the attenuation denominator in the VS.
I made some small, frequently-called functions inline.
I display the polygon count; I usually have 5000-6000 tris per frame.

To render my scene, I switch between 2 shaders: one for the Blinn illumination, and one for the bump mapping.
Without shaders, I had about 140 FPS.
With my shaders and the modifications I made based on your advice, I now have 80 FPS.

Definitely a performance increase! I guess it's normal to get half the FPS when performing per-pixel calculations instead of per-vertex. I know I'll have to perform much more optimization, but it's already encouraging. One last note: I perform the tests with a single texture, at 800x600 resolution, windowed mode. With 1024x768 fullscreen, the results are almost the same.

Thanks a LOT for your help, folks!

HardTop