7800/7900 GTX: driver bug or hardware problem? (Edit: sample included)

I have an application that only seems to ‘crash’ on a 7800 GTX and on two 7900 GTX cards that I have, with every official and leaked driver I can get to work. The 7800 GT and two 6800 GTs that I’ve tested don’t crash.

I’ll explain why I use the term crash loosely.

The only things displayed are an ocean surface rendered with GLSL and a skybox.

When the bug manifests, it looks as if every other triangle is skipped, or triangles extend to infinity. The skybox textures are speckled with a few green or pink texels. Most of the time the monitor then flickers or loses the video signal, followed by a BSOD with an infinite loop error in nv4_disp.dll.

Sometimes after a few seconds of the triangle/texture corruption, everything clears up, with the caveat that performance is halved, usually even after an application restart.

This bug only occurs when the ocean mesh is being drawn with the GLSL vertex and pixel shaders applied, and is much more likely to happen when the mesh is rendered with VBOs or display lists.

The bug is much more likely to occur as I zoom far away from the ocean, so that more vertices are visible in the viewport, or if I scale the mesh in the vertex shader so that the triangles are very large relative to the viewport.

If I severely underclock the 7800 and 7900 GTX cards, the crash doesn’t happen. I disabled OpenGL multithreading optimizations via a registry setting, but it didn’t help.

It sounds like faulty hardware to me, but it happens on three different cards in two different boxes. (Both CPUs are Athlon X2 4400s, one on an A8N SLI Deluxe, the other on an A8N SLI 32.)

I’m going to post a bare-bones executable that isolates the problem soon.

Any advice would be greatly appreciated.

After modifying my code to use float textures as a render target, I got problems similar to the ones you reported. Usually the screen flickers, the system stops responding, and I have to reboot. But the problems don’t always occur; sometimes it works just fine.

Okay, here’s a sample application:
http://www.effloresce.com/cat/opengl.org/CrashTest.rar

Run tide.exe, bring down the console with two ~ tilde presses, and type “script tests/crash”.

Zoom far away from the ocean with the right mouse button, but look straight down as you do so.

I tried your program. Yes, it did crash my PC :)

May I ask what card, CPU and driver set you’re using?

Try lowering the app’s memory requirements, e.g. turn off antialiasing and use a smaller screen size and texture size.
VBOs and display lists also consume memory.
I’ve seen things similar to what you describe when I’m using quite a bit of the card’s memory (no excuse, though).

zed, I noticed that the crash happens more frequently if the framebuffer is large. I can’t reduce VBO space due to the nature of the application.

Hi CatAtWork,

We’re trying to repro with your app now. I’ll let you know what we find.

Thanks -
Cass

That’s very much appreciated. Let me know if you need the entire source tree.

Hmm, we’ve not been able to repro yet with 7900.
Can you provide more details about the system and drivers you’re using?

Hopefully opengl_fan will report his system configuration.

Here’s mine:

Motherboards:

A8N SLI and A8N SLI32. PEG mode on and off, no overclocking, both LAN controllers enabled.

Crashes in both SLI and single-GPU configurations.

Crashes regardless of the dual-core hotfixes.

Crashes on XP and XP 64.

Windows XP Professional (5.1, Build 2600) Service Pack 2 (2600.xpsp.050928-1517)
AMD Athlon™ 64 X2 Dual Core Processor 4400+
Memory: 3072MB RAM
DirectX Version: DirectX 9.0c (4.09.0000.0904)

==========
EVGA 7900 GTX
Video Bios Version: 5.71.22.12.01
IRQ: 18
Bus: PCI Express x16
Memory: 512MB
Forceware: every single 8x.xx version I have found that works on 7900s.
INF-modding the WHQL 78 and 81.85 drivers didn’t work.

Clock frequency from Coolbits (crashes without Coolbits, too): 650 core, 800 mem.

Product: “G71 Board - p348h10”

Minor addendum:
Does NOT crash on a 6600 w/ 67.66 drivers on an A8N SLI.

Sorry for the late report :)

Here is mine, very similar to yours :)

MB: A8N SLI Premium.
OS: XP Pro (SP2)
CPU: AMD Athlon™ 64 X2 Dual Core 4400+
Memory: 2048MB RAM

Video Cards:
EVGA 7900 GT (256M) (Two)
Video Bios Version: 5.71.22.14.04
Forceware Version: 84.56

Excellent, thanks!

The new beta 90-series Forceware drivers appear to have fixed the crashing on my 7800 GTX. I’ll report back on the two 7900 GTX cards tomorrow.

The 7900 GTX system hard locks. :( I’m going to assume that the eVGA 7900s are faulty for now.

EDIT: The system locks during the Canyon test in 3DMark06, and Cozmo kindly pointed out that other people are having problems with their eVGA 7900s as well. Ugh.

Originally posted by CatAtWork:
The 7900 GTX system hard locks. :( I’m going to assume that the eVGA 7900s are faulty for now.

EDIT: The system locks during the Canyon test in 3DMark06, and Cozmo kindly pointed out that other people are having problems with their eVGA 7900s as well. Ugh.
That’s bad news. We only bought the cards three weeks ago :(

CatAtWork, can you please clarify…
…does your 7900 GTX hard lock only running 3DMark06, or does it also hard lock running your OpenGL application from your first post in this thread?

If the hard lock produces a crash minidump, please PM me to let me know where I can get a copy of the crash minidump. Thanks.

My two 7900 GTXes hard lock running both 3DMark06 and my application.

Sometimes they can complete 3DMark06 and my application, but then they crash the system on exit.

If they don’t BSOD on exit, they sometimes flicker the display while giving me intermittent mouse control (5 seconds of blank screen, 0.5 seconds of screen and mouse control, 1 second of screen without mouse control), so occasionally I can shut down the system.

Would a minidump from XP Pro x64 be okay? I don’t really want to bring down my current workstation if I can avoid it.

x64 would be fine. Please also tell me which display driver version this is run against, in case the minidump is missing the identifying information.

I’m still trying to get a BSOD to produce a minidump. The system is hanging, but it’s not producing BSODs.

Here are two Minidumps from two weeks ago; they stem from an infinite loop in nv4_disp, but I’m not sure what driver version I was using at the time.

http://www.effloresce.com/cat/opengl.org/Minidump20060516.rar