NV freeze bug: GL_MULTISAMPLE + glPolygonOffset still not fixed
I just bumped into the following bug:
Hardware: GeForce 8800 GT
Driver version: 319.32
OS: Linux-x86 (Linux Mint 13/Ubuntu 13)
Steps to reproduce:
1) have GL_MULTISAMPLE enabled
2) render geometry with GL_POLYGON_OFFSET_FILL enabled
3) then draw any geometry with GL_POLYGON_OFFSET_FILL disabled
4) the application and the OS freeze (~1 sec) intermittently, depending on scene content and view angle
Workarounds:
* disable GL_MULTISAMPLE before rendering any geometry with GL_POLYGON_OFFSET_FILL, and re-enable it afterwards
* disable GL_MULTISAMPLE after rendering any geometry with GL_POLYGON_OFFSET_FILL, and don't re-enable it until the next glClear
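The first workaround could be wrapped in a helper along these lines. This is a minimal sketch, not code from my project: `GlState`, `draw_offset_pass` and the `mock_*` backend are hypothetical names. In a real renderer the `GlState` members would be thin wrappers around glEnable, glDisable, glIsEnabled and glPolygonOffset; the mock backend only tracks the two enables so the call ordering can be checked without a GL context. The enum values are the standard ones from the GL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard GL enum values (gl.h / glext.h), defined here only so the
   sketch compiles without GL headers. */
#ifndef GL_POLYGON_OFFSET_FILL
#define GL_POLYGON_OFFSET_FILL 0x8037
#endif
#ifndef GL_MULTISAMPLE
#define GL_MULTISAMPLE 0x809D
#endif

/* Hypothetical indirection layer. In a real renderer each member would be
   a thin wrapper around glEnable, glDisable, glIsEnabled, glPolygonOffset. */
typedef struct {
    void (*enable)(unsigned cap);
    void (*disable)(unsigned cap);
    bool (*is_enabled)(unsigned cap);
    void (*polygon_offset)(float factor, float units);
} GlState;

/* Workaround: keep GL_MULTISAMPLE off while polygon-offset geometry is
   drawn, then restore whatever multisample state was active before. */
void draw_offset_pass(const GlState *gl, float factor, float units,
                      void (*draw)(void *ctx), void *ctx)
{
    assert(gl != NULL && draw != NULL);

    bool ms_was_on = gl->is_enabled(GL_MULTISAMPLE);
    if (ms_was_on)
        gl->disable(GL_MULTISAMPLE);

    gl->enable(GL_POLYGON_OFFSET_FILL);
    gl->polygon_offset(factor, units);
    draw(ctx);
    gl->disable(GL_POLYGON_OFFSET_FILL);

    if (ms_was_on)
        gl->enable(GL_MULTISAMPLE);
}

/* Mock backend: tracks only the two enables. A draw issued while both
   caps are on is counted as a "bad" draw (the pattern that freezes). */
static bool mock_ms = false, mock_offset = false;
static int mock_draws = 0, mock_bad_draws = 0;

static void mock_set(unsigned cap, bool on)
{
    if (cap == GL_MULTISAMPLE)
        mock_ms = on;
    else if (cap == GL_POLYGON_OFFSET_FILL)
        mock_offset = on;
}
static void mock_enable(unsigned cap) { mock_set(cap, true); }
static void mock_disable(unsigned cap) { mock_set(cap, false); }
static bool mock_is_enabled(unsigned cap)
{
    if (cap == GL_MULTISAMPLE)
        return mock_ms;
    if (cap == GL_POLYGON_OFFSET_FILL)
        return mock_offset;
    return false;
}
static void mock_polygon_offset(float factor, float units)
{
    (void)factor;
    (void)units;
}
static void mock_draw(void *ctx)
{
    (void)ctx;
    mock_draws++;
    if (mock_ms && mock_offset)
        mock_bad_draws++;
}

static const GlState mock_gl = {
    mock_enable, mock_disable, mock_is_enabled, mock_polygon_offset
};
```

The second workaround would instead call `disable(GL_MULTISAMPLE)` once before the first offset pass and only re-enable it after the next glClear. Neither version is guaranteed to avoid the freeze on every driver; the helper just enforces the state ordering that worked for me.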
I tried to write a minimal repro program, but it turns out things are a bit more complex: the GPU apparently has to be under some specific load (although that is trivial to arrange). For example, I can toggle the bug by opening at least one browser tab playing a YouTube video (even when paused). I suspect this has to do with hardware video decoding. Once the bug occurs I can reproduce it very reliably, and I can trigger it at will by enabling/disabling GL_MULTISAMPLE, enabling/disabling GL_POLYGON_OFFSET_FILL, or increasing/decreasing GPU load (e.g. opening/closing YouTube tabs in Firefox). That's on Linux. On Windows, I found that running two instances of my application was the trigger (though probably any realtime GL/DX app would do). I am very confident it is not a bug in my program or OS: my program runs fine on a variety of platforms, including Windows, Linux and ARM (Raspberry Pi, FYI).
I previously ran into this same bug on Windows ~1 year ago, with an entirely different codebase. Both that application and my current one draw simplified collision geometry on top of the visible geometry for collision-detection debugging. This typically causes *a lot* of z-fighting over most of the viewport, hence the use of glPolygonOffset. So the best bet for reproducing this bug seems to be drawing a lot of coplanar geometry with varying properties.
Increased screen coverage and scene complexity also appear to trigger the bug more frequently. During regular gameplay I typically get a hang at intervals of anywhere between 0 and 10 seconds (varies wildly, but consistently so). When I lock the camera in a static scene, the behavior is all-or-nothing: if one frame freezes, every subsequently rendered frame freezes too (FPS drops to ~1), and if it doesn't, no subsequent frame does (FPS maxes out). In any case, this is nothing like 'microstutter': typically the entire machine locks up for one second every few frames.
Now, this is actually an older bug. According to the SVN log of my older project, it was reported (by someone else) in 2005 and confirmed by Michael Gold (nVidia) in 2006. At the time it was believed to be related to negative arguments passed to glPolygonOffset while FSAA was enabled. However, I can reproduce the bug with positive and negative arguments alike.
Some possibly useful notes:
* the freeze is always of the same duration, approx 1 second (my guess is that the driver catches an infinite loop with a time-out)
* it doesn't matter whether the values passed to glPolygonOffset are positive, negative or zero (any combination results in the freeze)
* depth testing is not a factor: it also happens when the framebuffer has no depth buffer, or when GL_DEPTH_TEST is disabled for *all* geometry
* the primitive type of the geometry drawn on top (GL_LINES, GL_TRIANGLES etc.) doesn't matter
* curiously, the bug still happens even when the polygon-offset geometry is clipped!
* occurs with 2xAA and up
* occurs likewise when all geometry is rendered with glPolygonMode(GL_FRONT_AND_BACK, GL_LINE)
* same behavior with GLSL and the fixed-function pipeline alike (not a shader issue)
* there's no flashing or apparent graphics corruption (it appears nothing is cleared or rendered to the framebuffer during the freeze)
My hunch is that polygon-offset geometry clipped by the near plane (or otherwise behind the eye origin) is somehow not entirely rejected by the GPU, and that subsequent non-offset geometry projected and rasterized to the same region causes some... bad stuff to happen. Perhaps a glitch in hierarchical z-culling, since the bug occurs even if no offset geometry is rasterized at all (in my case it is all behind the camera/eye origin). But then again, the bug also occurs without a depth buffer being present (so no early-z culling should be involved). And what does GPU load have to do with any of that?
Things I have NOT tried (just thinking out loud):
* does it happen when rendering to a multisampled FBO?
* just curious whether NV_DEPTH_CLAMP has any influence
* measure the freeze duration
* read back framebuffer after freeze to see if anything was rendered at all (i.e. does buffer swap occur at all?)
* WebGL DoS!?
Hope that helps. Any ideas to narrow it down further are welcome.
Last edited by remdul; 07-31-2013 at 03:06 PM.
Found the thread on Gamedev.net where this bug was first reported:
Has anyone at nVidia read this? Remember, this bug also affects Windows, not just the Linux driver...
It would also be trivial to exploit this bug in WebGL: simply add a few iframes with YouTube videos, and continuously render geometry that triggers the bug to a WebGL canvas.
I don't see how the browser could catch this, as the entire OS hangs. By the way, whose responsibility is it to catch such exploits: the browser (plugin) or the driver?
Last edited by remdul; 08-05-2013 at 02:41 AM.
I think it would be best to file a bug report: http://nvidia-submit.custhelp.com/app/ask.