Problems with Nvidia GTX460

Hi,

I have just bought a new 1GB GTX460 card after reading a lot of rave reviews about it. So far I am very disappointed. I downloaded the latest driver (258.96; OS: XP Pro SP2) and ran my usual benchmark tests, Ozone3D Soft Shadows and Fur.

I compared the results to my slowest GT200 based card, the GTX260. Here are the results:

                          GTX460    GTX260

Soft shadows                3814      8865
Fur (1280x1024, no AA)      5830      4837
Fur (1280x1024, 8x MSAA)    1732      2844

These results are very disappointing, to say the least, but the real reason I opened this topic is that a shader of mine that otherwise works fine does not behave as expected on the GTX460. The shader is based on GLSL 1.2/GL 2.1.

Also, the planar reflections, shadows and bump maps the shader otherwise supports have vanished; they simply do not appear. I do not yet know the reason, but it seems that the card or the driver does not support the same number of texture units as the previous generations (I need eight, and there are supposed to be 64). The features that do not work all use texture units >= 4.
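For reference, the limits the driver actually reports can be queried at runtime; a minimal C++ sketch (assuming a current GL context and a loader such as GLEW for the GL 2.x tokens):

#include <GL/glew.h>
#include <cstdio>

// Print the texture-unit limits the driver reports.
// Assumes a current GL 2.1 context already exists.
void printTextureUnitLimits()
{
    GLint fragUnits = 0, combinedUnits = 0, fixedFuncUnits = 0;

    // Units usable by the fragment shader (the limit that matters for samplers).
    glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, &fragUnits);
    // Combined limit across the vertex and fragment stages.
    glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &combinedUnits);
    // Classic fixed-function limit (commonly only 4) - not the shader limit.
    glGetIntegerv(GL_MAX_TEXTURE_UNITS, &fixedFuncUnits);

    printf("fragment image units: %d\n", fragUnits);
    printf("combined image units: %d\n", combinedUnits);
    printf("fixed-function units: %d\n", fixedFuncUnits);
}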

Is this a known limitation of the current driver? Is GL 2.1/GLSL 1.2 support limited? I apologize if this is the wrong place to ask for help with this.

Update: the shader compiles (both the vertex and the fragment shader), but linking them returns an error. I did not notice this earlier because all the simplified versions (without options like shadows/reflections, etc.) work.
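In case anyone wants to reproduce this, the link error text can be read back from the program info log; a minimal C++ sketch (GLEW or another loader is assumed for the GL 2.0 entry points; prog is a program object with its compiled shaders already attached):

#include <GL/glew.h>
#include <cstdio>
#include <vector>

// Link a program and dump the driver's link log if it fails.
bool linkAndReport(GLuint prog)
{
    glLinkProgram(prog);

    GLint linked = GL_FALSE;
    glGetProgramiv(prog, GL_LINK_STATUS, &linked);

    GLint logLen = 0;
    glGetProgramiv(prog, GL_INFO_LOG_LENGTH, &logLen);
    if (logLen > 1)
    {
        std::vector<char> log(logLen);
        glGetProgramInfoLog(prog, logLen, NULL, log.data());
        fprintf(stderr, "link log:\n%s\n", log.data());
    }
    return linked == GL_TRUE;
}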

Can’t speak to those cards, but in our testing (with slightly higher-end cards of the same generations) a GTX480 totally smokes a GTX285.

Perhaps you can isolate your perf issue to a short GLUT test program you can post, so others can confirm/refute your results?
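If it helps, something as small as this hypothetical skeleton is usually enough - just drop the shader setup and draw calls under test into display():

#include <GL/glut.h>
#include <cstdio>

// Minimal GLUT program for isolating a rendering/perf problem.

static int frames = 0;
static int lastMs = 0;

static void display()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // TODO: bind the shader / draw the geometry under test here.
    glutSolidTeapot(1.0);

    glutSwapBuffers();

    // Crude FPS counter, printed once per second.
    ++frames;
    int now = glutGet(GLUT_ELAPSED_TIME);
    if (now - lastMs >= 1000)
    {
        printf("%.1f fps\n", frames * 1000.0 / (now - lastMs));
        frames = 0;
        lastMs = now;
    }
}

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutInitWindowSize(1280, 1024);
    glutCreateWindow("repro");
    glutDisplayFunc(display);
    glutIdleFunc(glutPostRedisplay);   // redraw as fast as possible
    glutMainLoop();
    return 0;
}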

Thanks for the reply.

Yes, that is what I expected, too. Maybe the tests I use are wrong.

Right now my main problem is that my “übershader” with all features switched on does compile, but linking the vertex and fragment shaders fails. Other shaders, most of them stripped-down versions of the “übershader”, still work.

I think this has to be a bug, maybe in the Cg compiler. This shader works with all previous Nvidia card generations since the G80. Can you tell me how I can get help from Nvidia?

Thanks.

Several NVidia guys read these forums. If you post a compelling test case, they’ll often pull it right out of the forums and work it, following up here with status.

However, I believe the official “bug report” form is on http://nvdeveloper.nvidia.com.

In either case, you’ll want a short test program that illustrates the problem for reproduction/testing purposes. So I’d cook that up first, and post it in both places so other folks can try it.

I ran into a similar problem with the Nvidia 256+ drivers when using a loop to access the gl_LightSource[] array. My shader compiled and linked fine with the previous 197.xx and lower drivers; with the new ones I would get an internal compiler error plus a lot of assembly when attempting to link the shader.

My lighting loop was something like this:

for (int i = 0; i < gl_MaxLights; i++)
   // some code that accessed various gl_LightSource[i] members

I worked around the issue by unrolling the light loop. This affected all Nvidia hardware I tested (Quadro, GeForce DX10/DX9), so I posted a problem report on Nvidia’s nvdeveloper site. I don’t think it’s been resolved yet, though.

Since the GTX460 requires 258.xx, you may be running into the same problem, and perhaps this workaround will work for you as well. If your issue is the same problem, you may want to submit your own report to Nvidia too.

I had a similar issue when moving from GeForce 8/9 to the 460, with gl_ClipDistance… unrolling the loop by hand made the compiler happy. I was so convinced it was a driver bug (since GeForce 8/9 took it with no issues), but it was MY bug. In the GLSL spec, if you use some of the built-in array types (like gl_ClipDistance), you need to either declare the size of the array or write the source code so that the GLSL compiler can figure out the maximum index you access.
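For reference, the “declare the size” fix is just an explicit redeclaration of the built-in array; a hypothetical example (GLSL in a C++ string; the size of 2 and the clip-plane math are placeholders):

// Vertex shader illustrating the spec rule above: redeclare gl_ClipDistance
// with an explicit size so the compiler knows the maximum index written.
const char* clipVS = R"(
    #version 130

    out float gl_ClipDistance[2];   // explicit size for the built-in array

    void main()
    {
        vec4 eyePos = gl_ModelViewMatrix * gl_Vertex;
        gl_Position = gl_ProjectionMatrix * eyePos;

        for (int i = 0; i < 2; ++i)                 // constant bound
            gl_ClipDistance[i] = dot(gl_ClipPlane[i], eyePos);
    }
)";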

Yes, this sounds very much like what I could have. Unfortunately I have several loops and it would be very hard to unroll all of them but I will try.

Are you sure you only had the problem with the vertex shader and the fragment shader was OK?

“In the GLSL spec, if you use some of the built-in array types (like gl_ClipDistance), you need to either declare the size of the array or write the source code so that the GLSL compiler can figure out the maximum index you access.”

I just tried ‘uniform struct gl_LightSource[gl_MaxLights];’ (and [8]) in the shader and it unfortunately made no difference. Thanks for the tip, though :)

Edit: I figured a loop with a constant bound was good enough for the compiler to determine the index limits; is this not the case?

Fragment info
-------------
Internal error: assembly compile error for fragment shader at offset 19551:
-- error message --
line 493, column 1:  error: binding in multiple relative-addressedarrays
-- internal assembly text --
!!NVfp4.0
  // much assembly follows....

Also, it was the fragment shader with the lighting loop, not the vertex shader. Sorry for being a little vague there.

“error: binding in multiple relative-addressedarrays”

This is exactly the error message I am getting. Do you know what it means? I have no idea. Also, how do you unroll a loop over the light parameters if you pass the number of lights in your scene as a uniform? Do you copy the code a number of times (maxlights times), each guarded by an “if” statement like this:

if (numlights > 0)
{
   // shade with light 0
}
if (numlights > 1)
{
   // shade with light 1
}

I’m pretty sure that because it’s an internal compiler error, the message will only make sense to developers at NVidia. I could hazard a few guesses at its meaning but I don’t think it’d help much.

Yep, that’s the way you’d unroll a uniform loop. Not pretty. I used a macro to hide all the nastiness:


#define APPLY_LIGHT(i)   \
   if(i < numLights)     \
      // per-light shading code for gl_LightSource[i] goes here

APPLY_LIGHT(0)
APPLY_LIGHT(1)
    :
APPLY_LIGHT(7)

I believe I had to do that rather than use a function because, at the time, we were working around an OSX 10.5 issue that didn’t allow indexing into the light array from within function calls or loops. But I’d try a function first - appending ‘\’s gets tedious after a while :)

Thanks, I will try a few things. I have more than one loop, though, and several conditions inside them, because every light can be per-vertex or per-pixel.

How the hell can they screw up a compiler that worked for the past 3-4 years?

I filed a bug report on nvdeveloper two weeks ago, so far there has been no response.

Since then I have implemented the workaround suggested by malexander (thanks once again) and unrolled the for… loops that caused the compiler error. Now all my shaders compile and link without errors.

Two odd things remain. When I switch on the shadow mapping option in the application, it takes a long time (~2-3 seconds) before anything appears on screen. After that, rendering is ‘normal’ and fast. I suspect that the shader that handles shadow mapping is being recompiled (which is slow). Is this possible even though the only thing happening is a glUseProgram?

The other is bump mapping, which is handled by the same shader as shadow mapping. Bump mapping does not work at all. I do not know what I can do, because there is nothing wrong with the shader; it works with previous Nvidia GTX cards.

Use ARB_timer_query and/or CPU timers to see where all your time is getting lost. To check your assumption first, time the period from just before reconfiguring your shaders to just after rendering the first batch following the first glUseProgram.
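A sketch of that measurement (C++; GL_TIME_ELAPSED comes from ARB_timer_query, and prog/drawFirstBatch() are placeholders for your shadow-mapping program and its first draw):

#include <GL/glew.h>
#include <chrono>
#include <cstdio>

void drawFirstBatch();   // placeholder: the app's first shadow-mapped draw

// Time the first draw after glUseProgram with a GPU timer query and a CPU
// clock; a large CPU time with a small GPU time points at driver work
// (e.g. a deferred recompile) rather than actual rendering cost.
void timeFirstUse(GLuint prog)
{
    GLuint query = 0;
    glGenQueries(1, &query);

    auto cpuStart = std::chrono::steady_clock::now();

    glBeginQuery(GL_TIME_ELAPSED, query);
    glUseProgram(prog);
    drawFirstBatch();
    glEndQuery(GL_TIME_ELAPSED);

    glFinish();                     // force completion so the CPU number is honest

    auto cpuEnd = std::chrono::steady_clock::now();

    GLuint64 gpuNs = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &gpuNs);

    double cpuMs = std::chrono::duration<double, std::milli>(cpuEnd - cpuStart).count();
    printf("CPU: %.1f ms, GPU: %.3f ms\n", cpuMs, gpuNs / 1.0e6);

    glDeleteQueries(1, &query);
}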

Thanks, I will certainly do that. So far only my gut feeling tells me it is a shader-recompile issue; the same application/shaders with older cards and older drivers don’t do this at all.

In the meantime, can I ask how you handle the - sometimes really long - time it takes to compile all your shaders? I use 15 shaders, and compiling them takes maybe 30-35 seconds before the application can start. And I guess 15 shaders are not that many.

Well, there’s an interesting discussion on this going on in this thread, with more detail.

But basically if you know all your shader permutations beforehand, you can just take the compile hit for them during your user’s first run of the app and stash them off on disk with ARB_get_program_binary. Then just load up the binaries next time and hand them to GL.
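A rough sketch of the save/load halves (C++; assumes ARB_get_program_binary is exposed by your loader, and the file handling is deliberately simplified):

#include <GL/glew.h>
#include <cstdio>
#include <vector>

// Save a linked program's driver-specific binary to disk.
// Hint: call glProgramParameteri(prog, GL_PROGRAM_BINARY_RETRIEVABLE_HINT,
// GL_TRUE) before linking so the driver keeps the binary around.
void saveProgramBinary(GLuint prog, const char* path)
{
    GLint length = 0;
    glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &length);
    if (length <= 0)
        return;

    std::vector<char> blob(length);
    GLenum format = 0;
    glGetProgramBinary(prog, length, NULL, &format, blob.data());

    if (FILE* f = fopen(path, "wb"))
    {
        fwrite(&format, sizeof(format), 1, f);     // remember the vendor format
        fwrite(blob.data(), 1, blob.size(), f);
        fclose(f);
    }
}

// Reload the binary next run; returns false if the driver rejects it
// (different GPU/driver), in which case you compile from source as usual.
bool loadProgramBinary(GLuint prog, const char* path)
{
    FILE* f = fopen(path, "rb");
    if (!f)
        return false;

    fseek(f, 0, SEEK_END);
    long size = ftell(f) - (long)sizeof(GLenum);
    fseek(f, 0, SEEK_SET);

    GLenum format = 0;
    fread(&format, sizeof(format), 1, f);

    std::vector<char> blob(size > 0 ? size : 0);
    fread(blob.data(), 1, blob.size(), f);
    fclose(f);

    glProgramBinary(prog, format, blob.data(), (GLsizei)blob.size());

    GLint ok = GL_FALSE;
    glGetProgramiv(prog, GL_LINK_STATUS, &ok);
    return ok == GL_TRUE;
}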

If you really can’t have that slowdown the first run either, then it gets sticky. If you think you’re spending a lot of your compile/link time linking the very same vert/frag shaders (or ones with compatible interfaces) to each other in different permutations, you can use ARB_separate_shader_objects to reduce some of the overhead and hope that does it, to the detriment of cross-shader optimization.
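For completeness, the separate-shader-objects route looks roughly like this (a sketch; vsSrc/fsSrc are placeholder GLSL source strings):

#include <GL/glew.h>

// Build single-stage programs once, then mix and match them on a pipeline
// object instead of linking every vert/frag permutation into its own program.
GLuint makePipeline(const char* vsSrc, const char* fsSrc)
{
    // Each call compiles and links a one-stage program.
    GLuint vertProg = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vsSrc);
    GLuint fragProg = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsSrc);

    GLuint pipeline = 0;
    glGenProgramPipelines(1, &pipeline);
    glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT,   vertProg);
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fragProg);

    // At draw time: glBindProgramPipeline(pipeline) instead of glUseProgram().
    return pipeline;
}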

If you know something about your vendor’s binary shader implementation, you can precompile those permutations (or the ones you expect your users will need) and deliver them with your application. But that’s getting into undocumented vendor voodoo.

And you can also try to compile the shaders in a background thread on a separate core and hope you don’t hit any weird lock/context swap/reentrancy issues talking to GL at the same time with multiple threads. This is an area I definitely don’t know enough about. Hopefully someone else will chime in with the “this is how you do it and it works everywhere with no perf hit” solution.
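If you want to experiment with that, the rough shape on Windows might be something like this sketch (heavily hedged: compileAllShaders() is a placeholder for your own shader setup, and whether this behaves well is entirely up to the driver):

#include <windows.h>
#include <GL/glew.h>
#include <atomic>
#include <thread>

std::atomic<bool> g_shadersReady(false);

void compileAllShaders();   // placeholder: builds every program the app needs

// Compile the shaders on a worker thread using a second context that shares
// objects with the main one. Call this from the main thread while mainRC is
// still current.
void startBackgroundCompile(HDC hdc, HGLRC mainRC)
{
    HGLRC workerRC = wglCreateContext(hdc);
    wglShareLists(mainRC, workerRC);     // share programs/textures between contexts

    std::thread([hdc, workerRC]()
    {
        wglMakeCurrent(hdc, workerRC);   // bind the worker context to this thread

        compileAllShaders();             // compile + link everything here
        glFinish();                      // make sure the driver is really done

        wglMakeCurrent(NULL, NULL);
        wglDeleteContext(workerRC);

        g_shadersReady = true;           // main thread polls this each frame
    }).detach();
}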

I saw that the Mafia II demo has these files in its shader folder (around 16 MB each):

ShaderCache[ATF4].sc
ShaderCache[ATI_].sc
ShaderCache[NV__].sc
ShaderCache[UNKN].sc

So I guess they precompiled the shaders.

Unfortunately I cannot use ARB_get_program_binary or ARB_separate_shader_objects yet. My current application is based on GL 2.1 and GLSL 1.2.

I do plan to rewrite it based on GL 4.1 and GLSL 1.5, but that is going to take months, and I am reluctant to even start until I feel the benefits in rendering speed are substantial enough.

Compiling the shaders in a different thread might be a solution for the current version. Maybe the application can start in fixed-function ‘mode’ until all shaders are compiled and linked in a separate thread, and the switch to the appropriate shader takes place only when that is finished. Is this possible, or can I expect weird things to happen this way?
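Conceptually I imagine the render loop would just check a flag, something like this sketch (the flag and the two draw paths are placeholders):

#include <atomic>

extern std::atomic<bool> shadersReady;   // set by the compile thread when done

void drawFixedFunction();                // placeholder: no-shader fallback path
void drawWithShaders();                  // placeholder: normal shader path

void renderFrame()
{
    if (shadersReady.load())
        drawWithShaders();               // glUseProgram(...) and friends
    else
        drawFixedFunction();             // glUseProgram(0), fixed-function state
}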

Nvidia responded to my bug report and says that the problem is fixed in the 260 drivers. I tried the recent 260.63 beta drivers and it works. You might want to give them a spin and see if they solve your issue as well.

Thanks for letting me know. I had already downloaded the new driver but had not installed it yet. Now I shall, and I will see if all of my problems with the new line go away.

It looks like all my problems with the GTX460 are gone!

What a difference a new driver makes…