nVidia FP uniforms driver optimization lags



Jackis
01-22-2007, 02:22 PM
Hi all!

Some time ago we wrote about a very strange problem: the driver stalls for a dramatic amount of time when FP uniforms are changed (or set for the very first time) and some geometry is then drawn. We got no real comments, apart from other people complaining about much the same problem and the official advice to pre-render everything (which does not work well in most cases).

But the problem has now been found and localized: nVidia drivers don't like exact values such as ±0.0f, ±0.5f and ±1.0f in FP uniform constants! Changing even one bit of the mantissa of these "magic" values fixes almost all our problems. To all appearances, given the "constant" nature of FP uniforms, the driver decides it can optimize the shader to make it much faster and more powerful (sic), and it creates a unique shader implementation for this subset of FP uniform values!

So please bear in mind that some optimizations may take place right in the middle of your application's execution.

Hope that helps someone who has wasted more than one week trying to pin down why lags sometimes occur.
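
The workaround described above (moving a uniform slightly off the "magic" values) could be sketched in C as follows. This is only a minimal sketch: the helper names and the `ZERO_SUBSTITUTE` constant are hypothetical and not from the original test code, and it assumes 32-bit IEEE 754 floats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a small non-zero stand-in for an exact zero. One ULP above
   zero would be a denormal, which hardware may flush back to zero. */
#define ZERO_SUBSTITUTE 1.0e-6f

/* True for the values reported in this thread as triggering the stall.
   Note that -0.0f == 0.0f compares true in C. */
bool is_magic_uniform(float v)
{
    return v == 0.0f || v == 0.5f || v == -0.5f || v == 1.0f || v == -1.0f;
}

/* Hypothetical helper: returns v unchanged unless it is a magic value,
   in which case it returns a nearby value with the same sign. */
float avoid_magic_uniform(float v)
{
    uint32_t bits;

    assert(sizeof bits == sizeof v);   /* assumes a 32-bit float */
    if (!is_magic_uniform(v))
        return v;

    memcpy(&bits, &v, sizeof bits);
    if ((bits & 0x7FFFFFFFu) == 0u)    /* +0.0 or -0.0: keep the sign */
        return (bits & 0x80000000u) ? -ZERO_SUBSTITUTE : ZERO_SUBSTITUTE;

    bits -= 1u;                        /* one ULP toward zero, sign unchanged */
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

The result would then be passed to whatever call sets the uniform, for example `glUniform1f(location, avoid_magic_uniform(value))`. Whether such a small perturbation is acceptable depends on how the shader uses the value.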

Jeff Russell
01-22-2007, 04:59 PM
Thank you, this is useful information.

Zengar
01-22-2007, 08:49 PM
I would advise you to submit a test case to Nvidia; they should fix it.

Jackis
01-22-2007, 11:26 PM
By the way, depending on the shader's complexity, this lag varies from 50 ms to 200 ms, which is unacceptable by any measure.

Don't Disturb
01-23-2007, 11:09 AM
Thanks from me also, a 50ms pause while my app is running would screw things up spectacularly!

Dark Photon
01-24-2007, 02:39 AM
And please post a link to that test program on the forum too. I think many of us would like to try that test program to confirm it (lots of quick, free test data for you).

Jackis
01-24-2007, 08:53 AM
By the way,

The same goes for GLSTATE uniform semantics. If you want to use state uniforms directly in your fragment shader, bear all these magic values in mind.
Even for glstate.light[0].position.

Tested on GeForceFX, GeForce6 and GeForce7 with ForceWare 93.71 (the latest official drivers).

Jackis
05-16-2007, 10:06 AM
Okay, back to this topic...

I did not have enough time to make a test app earlier, but now I'm ready to post it.

I hoped that this bug would be fixed, but it has not been fixed yet, so we've made a small test application. It creates some VBOs with the same shader, which is copied a number of times to force the effect to appear: it renders 200 quads with 200 copies of the same shader, each with a unique VBO.
Keys 0, 1, 2, 3, 4 and 5 change one uniform, which enters the lighting calculation as a simple additive value ('H' displays a help dialog).
0 - uniform is 0.300 (default)
1 - uniform is 0.000 exactly
2 - uniform is 0.001 exactly
3 - uniform is 0.500 exactly
4 - uniform is 0.999 exactly
5 - uniform is 1.000 exactly
After you press a key, the program measures the duration of the next frame.
As you can see, when we set this uniform to one of the "dangerous" values (0, 0.5, 1) for the first time, we get a big lag.
Nothing special: the shader is very simple (if it were more complicated, the delay would be much worse, but it's enough to see that the lag really occurs).

Link to the test program with sources: http://slil.ru/24377623

By the way, the NV30 and G80 generations are free from this issue, so it happens on all GeForce6 and GeForce7 chips.
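
For readers who cannot fetch the test program, the "measure the next frame" step could look roughly like the sketch below. It is not the code from the linked sources: `now_seconds`, `time_frame` and `fake_frame` are hypothetical names, and the frame itself is a stand-in callback. A real callback would set the uniform, draw, and then call something like `glFinish()` so that the driver's work is included in the measured interval.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. TIME_UTC is not a
   monotonic clock, so a system clock adjustment could distort one sample. */
double now_seconds(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);

    assert(base == TIME_UTC);
    (void)base;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs one frame through the supplied callback and returns its duration
   in seconds. */
double time_frame(void (*render_frame)(void *), void *ctx)
{
    double start = now_seconds();

    render_frame(ctx);
    return now_seconds() - start;
}

/* Stand-in for a real frame: it only counts how often it was called. */
void fake_frame(void *ctx)
{
    int *calls = ctx;

    ++*calls;
}
```

Calling `time_frame` once right after changing the uniform, and once more on a later frame, gives the two numbers to compare.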

Zengar
05-16-2007, 02:26 PM
I can confirm your results, I see it on my 7900gs too. OS is Vista with latest beta drivers.

kon
05-16-2007, 02:56 PM
Tested on my 6800GO and I get the big delay only for the value 0.0 (key 1)!
Btw, why does the text in the menu bar change after pressing it a second time?

Jackis
05-17-2007, 02:17 AM
The text in the title bar changes because it shows the previous uniform value, the current one, and the duration of the next frame after the uniform changed.

zed
05-18-2007, 03:08 PM
What about one of the built-in uniforms,
e.g. the light diffuse color,
or a vertex attribute,
e.g. glColor?

Jackis
05-20-2007, 07:44 AM
Vertex attributes certainly don't produce this result.

Built-in uniforms behave like ordinary uniforms (as I said above about GLSTATE uniform semantics).

Dark Photon
05-24-2007, 06:02 PM
Confirmed here on NVidia GeForce 6800 Ultra AGP8X (1.0-9773 drivers) with Athlon 64 3500+ on WinXP:


val = 0.300 -> delay = 0.014
val = 0.000 -> delay = 0.572
val = 0.001 -> delay = 0.014
val = 0.500 -> delay = 0.649
val = 0.999 -> delay = 0.014
val = 1.000 -> delay = 0.650

I look forward to the NVidia explanation on that one.

I'd try this on the higher end cards at work (various GeForce 7s & 8s) but we only run Linux on those.
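
To put the GeForce 6800 Ultra numbers above in proportion, here is a small sketch that compares each delay with the 0.300 baseline. The values are copied from the list above (presumably seconds, although the post does not state the unit). The function names and the "10x slower" threshold used in the usage note are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct sample {
    float  value;   /* uniform value that was set */
    double delay;   /* measured duration of the next frame */
};

/* Measurements copied from the post above (GeForce 6800 Ultra). */
const struct sample samples_6800[] = {
    {0.300f, 0.014}, {0.000f, 0.572}, {0.001f, 0.014},
    {0.500f, 0.649}, {0.999f, 0.014}, {1.000f, 0.650},
};
const size_t samples_6800_count = sizeof samples_6800 / sizeof samples_6800[0];

/* How many times slower a frame was than the baseline frame. */
double slowdown(double delay, double baseline)
{
    assert(baseline > 0.0);
    return delay / baseline;
}

/* Stores in out[] the indices of all samples at least `factor` times slower
   than the baseline and returns how many were found. out[] must have room
   for n entries. */
size_t find_stalls(const struct sample *s, size_t n, double baseline,
                   double factor, size_t *out)
{
    size_t found = 0;

    for (size_t i = 0; i < n; ++i) {
        if (slowdown(s[i].delay, baseline) >= factor)
            out[found++] = i;
    }
    return found;
}
```

With the first sample as the baseline and a factor of 10, this flags exactly the entries for 0.000, 0.500 and 1.000, each of which is roughly 40 to 46 times slower than a normal frame.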

def
05-25-2007, 01:23 AM
(Not) confirmed on Geforce 8800GTX 768mb ForceWare 158.22 on WindowsXP 32bit.

All measurements are 0.014 with vsync enabled, 0.002 without vsync. But I do see a delay visually switching to 0.0 and 0.5...

Same results on a Geforce 8800GTS 640mb ForceWare 160.03 on WindowsXP 32bit.

Jackis
05-25-2007, 05:07 AM
def,

There is nothing strange about the 8800: I don't see any lags on it either, nor on NV30.

It doesn't happen on GeForceFX (the driver is not as optimized as it is for NV40) or on the 8-series (they have "real" FP uniforms instead of "fake" ones, so the optimization is not necessary).

nVidia says that their driver has a routine that optimizes the FP in place when a uniform is changed. They say this optimization should be very fast, but as you can see, that is not always true. They don't want users to have control over this process, but I think it would sometimes be very convenient to have a switch to turn it off.
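
To make the idea concrete, here is a toy model in C of value-specialized shader variants. It is emphatically not nVidia's implementation: every name is hypothetical, and the grouping of values into variants is an assumption. It is only meant to be consistent with the observation that the lag appears the first time a "dangerous" value is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical variant classes: one generic shader plus one specialized
   version per "magic" value. */
enum variant {
    VARIANT_GENERIC,
    VARIANT_ZERO,
    VARIANT_HALF,
    VARIANT_ONE,
    VARIANT_COUNT
};

struct shader_variants {
    bool compiled[VARIANT_COUNT]; /* which variants already exist */
    int  compile_count;           /* recompiles triggered so far */
};

/* Assumption: the generic variant is built when the shader is loaded. */
void shader_variants_init(struct shader_variants *sv)
{
    for (int i = 0; i < VARIANT_COUNT; ++i)
        sv->compiled[i] = false;
    sv->compiled[VARIANT_GENERIC] = true;
    sv->compile_count = 0;
}

/* Maps a uniform value to the variant this toy driver would want. */
enum variant classify_uniform(float v)
{
    if (v == 0.0f)                return VARIANT_ZERO;  /* also matches -0.0f */
    if (v == 0.5f || v == -0.5f)  return VARIANT_HALF;
    if (v == 1.0f || v == -1.0f)  return VARIANT_ONE;
    return VARIANT_GENERIC;
}

/* Returns true when setting v would trigger a (slow) recompile in this
   model, i.e. when the matching variant has not been built yet. */
bool set_uniform_triggers_compile(struct shader_variants *sv, float v)
{
    enum variant k = classify_uniform(v);

    assert(k < VARIANT_COUNT);
    if (sv->compiled[k])
        return false;
    sv->compiled[k] = true;
    ++sv->compile_count;
    return true;
}
```

In this model, setting 0.3 or 0.001 never costs anything, the first 0.0 or 1.0 triggers a compile, and repeating the same value is free afterwards.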

Bugspray
05-25-2007, 12:28 PM
GeForce 8800GTX - Linux-x86_64 driver 100.14.06 - Dell 690 (2 Dual core 3.2 Xeons)

Key 0: val = 0.300 -> delay = 0.001
Key 1: val = 0.000 -> delay = 0.518
Key 2: val = 0.001 -> delay = 0.001
Key 3: val = 0.500 -> delay = 0.537
Key 4: val = 0.999 -> delay = 0.159
Key 5: val = 1.000 -> delay = 0.539

Cycling through the keys again, the delays all ranged from .200 to .400 sec.

Korval
05-25-2007, 01:35 PM
Originally posted by Jackis:
they have "real" FP uniforms instead of "fake" ones, so the optimization is not necessary

The pre-G80 cards have real FP uniforms too. It's just that there are instructions that can be used to eliminate the uniform if it is a specific constant.

Longs Peak is scheduled to have the ability to define certain uniforms as constants, so that such optimizations will be performed only if the user specifies that the uniform is const.

Jackis
05-28-2007, 04:09 AM
Originally posted by Korval:
The pre-G80 cards have real FP uniforms too.

Are you really sure about that? I have different information on this.

PkK
12-08-2008, 06:24 AM
We should lobby to get an application that changes uniforms into the next Spec suite to penalize such driver behaviour.


Philipp

Dark Photon
12-13-2008, 05:15 PM
Originally posted by PkK:
We should lobby to get an application that changes uniforms into the next Spec suite to penalize such driver behaviour.

Second that! And changes them to specific values that some drivers think they can be slick about optimizing away, to hell with first-render frame breakage!

bertgp
12-15-2008, 06:49 AM
Originally posted by PkK:
We should lobby to get an application that changes uniforms into the next Spec suite to penalize such driver behaviour.

Originally posted by Dark Photon:
Second that! And changes them to specific values that some drivers think they can be slick about optimizing away, to hell with first-render frame breakage!


Well, I have to disagree to some degree... The driver can give a significant performance improvement in some cases by recompiling a shader for specific predicates (uniforms, texture formats, etc.)

I don't think anybody can be against those really _if_ they don't cause other glitches. The driver could be compiling shaders all the time for all I care, as long as it is done in another lower priority thread. Compiling is a CPU only task anyway so this could be done without hampering the GPU.

Zengar
12-15-2008, 07:56 AM
Originally posted by bertgp:
I don't think anybody can be against those really _if_ they don't cause other glitches. The driver could be compiling shaders all the time for all I care, as long as it is done in another lower priority thread. Compiling is a CPU only task anyway so this could be done without hampering the GPU.

Exactly what I was going to say...

Jackis
12-15-2008, 11:02 AM
Originally posted by bertgp:
I don't think anybody can be against those really _if_ they don't cause other glitches. The driver could be compiling shaders all the time for all I care, as long as it is done in another lower priority thread. Compiling is a CPU only task anyway so this could be done without hampering the GPU.

Originally posted by Zengar:
Exactly what I was going to say...

Totally agree, but unfortunately we do have glitches now. Luckily, newer generations of nVidia cards show no such visible lag.

[EDIT] By the way, PkK, what made you revive this topic? Sudden render delays? I ask because almost 2 years have passed :)