View Full Version : nVidia crash (cause yet unidentified)



andras
07-17-2005, 08:55 AM
I'm trying to render into an FBO-attached texture. Everything seems fine (no GL errors, the FBO is complete), but when I call glDrawArrays, the computer hangs so badly that I have to perform a hard reset (which makes debugging a bit hard). If I comment out the glDrawArrays call, it doesn't crash (but obviously I can't see anything). I've also tried calling glDrawArrays with a count of 0, so it shouldn't render anything, and it still crashes!!

I don't know if this is related, but I suspect it is: I had a problem yesterday where I got an FBO error (see this thread (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=013610) ), which I could get rid of, but it still doesn't feel right.

Any ideas?

This is on nV GF6600/77.72

andras
07-17-2005, 09:50 AM
Well, as you can see in the other thread (http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=013610) , the FBO problem seems to be solved, yet the crash is still here. I'll try to isolate it a little more, just takes some time (have to reboot after every change :) ).. will report back soon.

andras
07-17-2005, 10:42 AM
Ok, I think I'm getting closer. Even though it's the glDrawArrays call that actually crashes, the problem seems to be rooted in the VBO setup before the draw call. I have a VBO that is originally set up to hold 4k bytes. But then, at one point, I only put 32 bytes into it with this call: glBufferData(GL_ARRAY_BUFFER, 32, data, GL_DYNAMIC_DRAW); And this brings my system to a proper halt at the next glDrawArrays that sources data from this VBO. Another interesting thing: I tried setting the data pointer to NULL, thinking that the driver would still allocate memory and just not copy anything, but then glDrawArrays gave me an access violation exception.. The spec says that in this case the contents will be undefined, but to me this means that there's just random (but valid) junk inside the VBO.

Well, I'm hoping that someone from nVidia will comment on this, and in the mean time I'll try to find a workaround.
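
One thing worth checking with the 32-byte upload: glBufferData replaces the buffer's whole data store, so after that call the VBO holds 32 bytes, not the original 4k. A draw call that sources more vertices than fit in 32 bytes would then read past the end of the store. Below is a minimal C sketch of a bounds check that could run before glDrawArrays. The name draw_fits_in_buffer and its parameters are hypothetical (not part of OpenGL), and nothing in this thread confirms that an out-of-range read was the actual cause of the hang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenGL function.
 * Returns 1 if drawing `count` vertices starting at `first` stays inside
 * a buffer store of `buffer_size` bytes, 0 otherwise.
 *   offset    - byte offset passed as the "pointer" to glVertexAttribPointer
 *   stride    - effective stride in bytes; for tightly packed data
 *               (stride 0 in GL) pass elem_size here
 *   elem_size - bytes per vertex for this attribute, e.g. 2 * sizeof(float)
 */
int draw_fits_in_buffer(size_t buffer_size, size_t offset, size_t stride,
                        size_t elem_size, size_t first, size_t count)
{
    assert(stride > 0 && elem_size > 0);

    if (count == 0)
        return 1; /* no vertices should be fetched at all */

    /* Index of the last vertex read, then the byte just past its data. */
    size_t last = first + count - 1;
    size_t end = offset + last * stride + elem_size;

    return end <= buffer_size;
}
```

With two floats per vertex (8 bytes, tightly packed), a 32-byte store has room for exactly four vertices, so a check like this would reject any draw that reaches a fifth.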

andras
07-17-2005, 12:06 PM
Ehh, this thing just doesn't want to go away. Now I try to use map/unmap instead of bufferdata. Here's what happens: I actually have two VBOs, if I only map/unmap any one of them (don't write anything), then it's fine. But as soon as I map/unmap one and then map/unmap the other too (still no writes), then it crashes. When I say map/unmap I actually mean bind, map, unmap, bind(0).

But at this point, the code is pretty much the same as what I have everywhere else, yet this is the only place it crashes..

I'm starting to run out of ideas here... :(

andras
07-17-2005, 03:36 PM
Finally, I have managed to change the absolute system halt to a simple access violation crash. This enabled me to quickly yank out GL calls, until the crash totally disappeared. I've logged every GL call using GLIntercept, from app start until the crash. Now I have two very similar files, one working (http://web.interware.hu/bandi/download/this_works.txt) , one crashing (http://web.interware.hu/bandi/download/this_crashes.txt) , the difference is minimal, and I can't see anything wrong with it.. Can you?

EDIT: I've just trimmed down the logs even more. I don't think I can make it any shorter..

sqrt[-1]
07-17-2005, 04:58 PM
In the one that crashes it seems you are sourcing the VBO data from two different VBOs.

glBindBuffer(GL_ARRAY_BUFFER,2)
glVertexAttribPointer(0,2,GL_FLOAT,false,0,0x0000)
glEnableVertexAttribArray(0)
glBindBuffer(GL_ARRAY_BUFFER,0)
glBindBuffer(GL_ARRAY_BUFFER,3)
glVertexAttribPointer(1,2,GL_FLOAT,false,0,0x0000)
glEnableVertexAttribArray(1)
glBindBuffer(GL_ARRAY_BUFFER,0)
glDrawArrays(GL_QUADS,0,0) GLSL=6

This is very unusual (not even sure if it is valid). You also do an un-bind of the VBO before the draw call.

The other obvious thing is that the glDrawArrays has zero primitives to render.

andras
07-17-2005, 05:58 PM
Originally posted by sqrt[-1]:
In the one that crashes it seems you are sourcing the VBO data from two different VBOs.
Yes, but I bind them to two different locations. I have two different vertex attributes in my shader.

Originally posted by sqrt[-1]:
This is very unusual (not even sure if it is valid). You also do an un-bind of the VBO before the draw call.
Well, I have to unbind it, how else could I set the next attribute otherwise? :)

Originally posted by sqrt[-1]:
The other obvious thing is that the glDrawArrays has zero primitives to render.
Heh, I set it to zero prims intentionally, for debugging purposes. It shouldn't (and doesn't) make any difference.

Next? ;)
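
On whether sourcing attributes from two VBOs and unbinding before the draw is valid: glVertexAttribPointer records the buffer bound to GL_ARRAY_BUFFER at the time of the call, separately for each attribute, so later changes to the binding don't affect pointers that were already set. The toy model below (plain C, no real GL; the ToyGL struct and all toy_* names are made up for illustration) replays the setup calls from the crashing log to show which buffer each attribute ends up with.

```c
#include <assert.h>

#define TOY_MAX_ATTRIBS 16

/* Toy model of the relevant client state. This is NOT real OpenGL. */
struct ToyGL {
    unsigned array_buffer;                    /* current GL_ARRAY_BUFFER binding */
    unsigned attrib_buffer[TOY_MAX_ATTRIBS];  /* buffer captured by each pointer call */
    int      attrib_enabled[TOY_MAX_ATTRIBS]; /* 1 if the array is enabled */
};

/* Models glBindBuffer(GL_ARRAY_BUFFER, buffer). */
void toy_bind_buffer(struct ToyGL *gl, unsigned buffer)
{
    gl->array_buffer = buffer;
}

/* Models glVertexAttribPointer: the current binding is captured per attribute. */
void toy_vertex_attrib_pointer(struct ToyGL *gl, unsigned index)
{
    assert(index < TOY_MAX_ATTRIBS);
    gl->attrib_buffer[index] = gl->array_buffer;
}

/* Models glEnableVertexAttribArray. */
void toy_enable_vertex_attrib_array(struct ToyGL *gl, unsigned index)
{
    assert(index < TOY_MAX_ATTRIBS);
    gl->attrib_enabled[index] = 1;
}

/* Replays the setup calls from the crashing log, up to the draw call. */
void toy_replay_crashing_log(struct ToyGL *gl)
{
    toy_bind_buffer(gl, 2);
    toy_vertex_attrib_pointer(gl, 0);
    toy_enable_vertex_attrib_array(gl, 0);
    toy_bind_buffer(gl, 0);
    toy_bind_buffer(gl, 3);
    toy_vertex_attrib_pointer(gl, 1);
    toy_enable_vertex_attrib_array(gl, 1);
    toy_bind_buffer(gl, 0);
}
```

After the replay, attribute 0 still points at buffer 2 and attribute 1 at buffer 3, even though the current binding is back to 0. In this model the intermediate bind to 0 between the two attributes changes nothing, since binding buffer 3 directly would give the same result.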

andras
07-17-2005, 06:38 PM
Hmm, this is interesting. For the VBO that is attached to the crashing draw call, I don't know the exact size required at creation time, so I give it 1024k to be safe, but I'm actually only using a fraction of that (approx 70k). So if I cheat and give it only 128k when creating the VBO, then it doesn't crash! Of course, now I have to put everything back in and see if it still runs, maybe I'm just getting lucky again...

andras
07-17-2005, 07:09 PM
So far everything seems to work nicely, with the smaller VBOs, although I'll have to do some more thorough testing, but I'll leave that for tomorrow..
enough debugging for today! :)

Any words from nVidia?

Java Cool Dude
07-17-2005, 09:32 PM
Send me your source (if you don't mind) and we'll take a look at it.

andras
07-18-2005, 06:31 AM
Umm, can't you reproduce the problem using the intercepted GL calls? Do you need only some source text, or do you have to be able to compile and run it?? I don't know about the latter, because our system is pretty complicated, with external databases and all kinds of other stuff, you can't just "run" it out of the box...

EDIT: I might be able to create a smaller, stand-alone test case, but it will take some time..

andras
07-18-2005, 01:42 PM
Originally posted by Java Cool Dude:
Send me your source (if you don't mind) and we'll take a look at it.
Ok, I've created a reasonably small test case, which I can send to you. We've tested it on three different computers: one PCIe x16 GF6600 256MB, one PCIe x16 GF6600GT 128MB and one AGP 8x Quadro FX 3000 256MB, all using 77.72 WHQL drivers. Both 6600s crash the same way, but the Quadro works fine.

All computers running WinXP Pro SP2.

andras
07-21-2005, 06:23 PM
Ok, I'm starting to get a bit paranoid.. I've just switched my nVidia GF6600 to an ATI X800 with Catalyst 5.7 and I'm experiencing very similar crashes! This tells me that the bug must be in my code somewhere, but I've stepped through the code a thousand times and can't see anything wrong... Also, the crash happens inside the driver, after a totally legal call! Again, I've intercepted all the OpenGL calls, so I have two new files: one working (http://web.interware.hu/bandi/download/this_works.txt) and one crashing (http://web.interware.hu/bandi/download/this_crashes.txt) .

Please take a look at it (it's pretty short), and let me know if you see anything wrong!! Anything! I've been struggling with this for almost a week now, and I'm totally stuck...

Thank you!

EDIT: I've also put up a commented (http://web.interware.hu/bandi/download/this_crashes_commented.txt) log, and removed all the wgl* calls for clarity. Please take a look!

yooyo
07-22-2005, 05:19 AM
As far as I can see, you use two vertex attributes in the first shader and only one in the second. You didn't disable the unused attribute array for the second shader, and this might cause the problem.

In your case, you have the attribute "position" in both shaders, but it can end up at a different attribute location in each shader, depending on how many attributes the shader has. This requires setting up the attribute pointers again after every shader switch, and this is bad (slow).

A better approach is to bind the attribute "position" using glBindAttribLocation before linking the program, and make sure that the same attribute is assigned to the same channel in all shaders. In this case, you set up the pointers once and later just enable/disable the client state.

Finally... you use attributes 1 & 2, but (I'm not sure if this is a bug or a feature) attribute 0 should be reserved for vertex position (gl_Vertex). That is, the hardware needs something to trigger vertex shader execution, and this is done by writing something into channel 0 (attribute gl_Vertex, glVertexXXX calls). If this is true, then there is nothing to trigger shader execution, and the GPU locks up because there is no shader input data.

yooyo
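
The "disable what the next shader doesn't use" advice can be sketched as a small state-sync step run on every shader switch. This is only an illustration: sync_attrib_arrays, AttribArrayFn and the record_* functions are hypothetical names, and the bitmasks are an assumed way for the application to remember which arrays it enabled. In a real program the two callbacks would be thin wrappers around glEnableVertexAttribArray and glDisableVertexAttribArray; the recording versions here just stand in for them so the logic can be checked without a GL context.

```c
#include <assert.h>

/* Callback type for "enable array i" / "disable array i". */
typedef void (*AttribArrayFn)(unsigned index);

/* Stand-ins for the GL calls: they only record which indices were touched. */
unsigned g_enable_log = 0;
unsigned g_disable_log = 0;

void record_enable(unsigned index)  { g_enable_log  |= 1u << index; }
void record_disable(unsigned index) { g_disable_log |= 1u << index; }

/* Hypothetical helper: make the set of enabled attribute arrays match
 * what the next shader actually uses.
 *   enabled_mask - bit i set = array i is currently enabled
 *   wanted_mask  - bit i set = the next shader reads attribute i
 *   max_attribs  - number of attribute slots to consider (at most 32)
 * Returns the new enabled mask for the slots below max_attribs.
 */
unsigned sync_attrib_arrays(unsigned enabled_mask, unsigned wanted_mask,
                            unsigned max_attribs,
                            AttribArrayFn enable_fn, AttribArrayFn disable_fn)
{
    assert(max_attribs <= 32);
    assert(enable_fn != 0 && disable_fn != 0);

    unsigned result = enabled_mask;
    for (unsigned i = 0; i < max_attribs; ++i) {
        unsigned bit = 1u << i;
        int wanted = (wanted_mask & bit) != 0;
        int enabled = (enabled_mask & bit) != 0;

        if (wanted && !enabled) {
            enable_fn(i);       /* newly needed array */
            result |= bit;
        } else if (!wanted && enabled) {
            disable_fn(i);      /* stale array left over from the last shader */
            result &= ~bit;
        }
    }
    return result;
}
```

For the case described above (first shader uses attributes 0 and 1, second shader uses only 0), calling sync_attrib_arrays(0x3, 0x1, 16, ...) disables array 1 and leaves array 0 alone.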

andras
07-22-2005, 06:43 AM
Originally posted by yooyo:
As far as I can see, you use two vertex attributes in the first shader and only one in the second. You didn't disable the unused attribute array for the second shader, and this might cause the problem.
Yooyo!! You are the man!!!! Disabling the arrays fixed it on the ATI. Now I have to check the nVidia too, but I guess it'll work there as well!

Hey, is disabling the arrays required by the GL standard?? Or is this a driver bug? I mean, one would think that if a location is not used by the shader, then it's ignored..

Originally posted by yooyo:
A better approach is to bind the attribute "position" using glBindAttribLocation before linking the program, and make sure that the same attribute is assigned to the same channel in all shaders. In this case, you set up the pointers once and later just enable/disable the client state.
Yeah, I might do this later, but performance is not an issue at the moment.

Originally posted by yooyo:
Finally... you use attributes 1 & 2, but (I'm not sure if this is a bug or a feature) attribute 0 should be reserved for vertex position (gl_Vertex). That is, the hardware needs something to trigger vertex shader execution, and this is done by writing something into channel 0 (attribute gl_Vertex, glVertexXXX calls). If this is true, then there is nothing to trigger shader execution, and the GPU locks up because there is no shader input data.
Yes, I don't use gl_Vertex, I only use custom attributes, but I don't think that's a problem, because I've had my program working with lots of different shaders before and never got the crash until now..

THANKS DUDE!!!

Andras

EDIT: It's working on the nVidia card too!! Well, it seems this was it! Thanks a lot!

andras
07-22-2005, 08:02 AM
I've been looking at the specs, and they're pretty vague about EnableClientState/DisableClientState.. They don't really say what these calls do or what the requirements are; basically they say nothing except that the calls enable or disable the arrays.
I know that this made sense in the non-shader era, when everything that's enabled is used, but now that we have shader attributes and locations, it could use a more detailed description. Anyway, the lesson is that it's better to be safe and disable everything that's not used.