VBO index array crashes on nVidia

Hi

I am hunting down bugs in my app, and it now seems to be related to index arrays in video memory, so I am opening a new thread for this.

I have two VBOs: one for data, one for indices. If I bind both and issue a draw call, it runs flawlessly on ATI, but crashes immediately on nVidia.

If I don't put the index array into a VBO, it also runs as expected on nVidia.

Here is my code:

Setup:
------

glGenBuffers (1, &m_uiVertexBufferID);
glBindBuffer (GL_ARRAY_BUFFER_ARB, m_uiVertexBufferID);
glBufferData (GL_ARRAY_BUFFER_ARB, m_uiElementSize * uiVertexCount, (void*) &m_pInterleavedData[0], GL_STATIC_DRAW);

glGenBuffers (1, &m_uiIndexBufferID);
glBindBuffer (GL_ELEMENT_ARRAY_BUFFER, m_uiIndexBufferID);
glBufferData (GL_ELEMENT_ARRAY_BUFFER, iIndices * sizeof (int), (void*) &m_pIndices[0], GL_STATIC_DRAW);


Bindbuffer:
-----------

glBindBuffer (GL_ELEMENT_ARRAY_BUFFER, m_uiIndexBufferID);
glBindBuffer (GL_ARRAY_BUFFER_ARB, m_uiVertexBufferID);

for (int i=1; i < max; ++i)
{
	glEnableVertexAttribArray (i);
	glVertexAttribPointer (i, ...);
}

// array 0 is set LAST, as nVidia advises in their paper about VBOs
glEnableVertexAttribArray (0);
glVertexAttribPointer (0, ...);



Render:
-------

Bindbuffer ();

// this crashes
glDrawElements (GL_TRIANGLES, uiCount, GL_UNSIGNED_INT, BUFFER_OFFSET (uiFirstVertex * sizeof (int)));

// this works just fine
glBindBuffer (GL_ELEMENT_ARRAY_BUFFER, 0);
glDrawElements (GL_TRIANGLES, uiCount, GL_UNSIGNED_INT, &m_pIndices[uiFirstVertex]);

It seems as if the driver doesn't like the indices to lie in video memory.

This is on a Geforce 7600 GT, Forceware 93.71.

On an ATI Radeon X1600 Mobility it runs just fine. I tried several different setups, changed the order of execution, etc., but nothing made a difference. The moment I put the index data into a VBO, it crashes on nVidia.

Thanks,
Jan.

Is this true: uiFirstVertex + uiCount <= iIndices ?

Are the glVertexAttribPointer calls specified correctly?

Have you tried the same code using plain vertex arrays - skip the glBindBuffer/glBufferData stuff and just use real memory pointers instead of NULL? Does your code work then?
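
Something like this, just as a sanity check (only a sketch - the component count / type for attribute 0 are assumptions here, use whatever your real layout is):

// no buffer objects bound -> plain client-side arrays
glBindBuffer (GL_ARRAY_BUFFER_ARB, 0);
glBindBuffer (GL_ELEMENT_ARRAY_BUFFER, 0);

glEnableVertexAttribArray (0);
glVertexAttribPointer (0, 3, GL_FLOAT, GL_FALSE, m_uiElementSize, &m_pInterleavedData[0]);

glDrawElements (GL_TRIANGLES, uiCount, GL_UNSIGNED_INT, &m_pIndices[uiFirstVertex]);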

Try drivers 84.21
http://www.nvidia.com/object/winxp_2k_84.21.html

One thing: if you don't use glDrawRangeElements, the driver may have to copy your indices back to system memory to examine them.
I'm not sure if this is entirely true, but it's the assumption I've always made… hence I never use glDrawElements.
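
Something along these lines (sketch only - uiMinVertex/uiMaxVertex stand for the real smallest and largest vertex indices referenced by that range):

// tells the driver up front which vertex range the indices can touch
glDrawRangeElements (GL_TRIANGLES, uiMinVertex, uiMaxVertex, uiCount, GL_UNSIGNED_INT,
                     BUFFER_OFFSET (uiFirstVertex * sizeof (unsigned int)));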

Why would the driver want to examine them?

@martinsm: I check for uiFirstVertex + uiCount <= iIndices, and it is true.

The glVertexAttribPointer calls should be correct; right now I only use position, and it works if the indices are not in a VBO, so it should be OK.

I did try it without VBOs, with simple vertex arrays, though I didn't test that on nVidia, only on ATI. Maybe I'll try it again later.

@mfort: Why should I use this driver? Is there any significant difference?

@knackered: I know; I just use glDrawElements right now to rule out the possibility that I might specify wrong lower and upper bounds.

Jan.

Originally posted by Jan:
I did try it without VBOs, with simple vertex arrays, though I didn't test that on nVidia, only on ATI. Maybe I'll try it again later.
And did it work?

@Jan: Yes, I had to downgrade NV drivers to 8x due to PBO-related crashes in the 9x drivers; 8x was OK. Probably they changed something. I believe VBOs are implemented the same way as PBOs.

I had this issue a couple of times (although I didn't try ATI). Both times I had messed up the glVertexPointer stuff (saying I had 3 components per element when I actually uploaded 4, causing a memory overrun, or similar). It worked fine without index VBOs and crashed right away with them.

Originally posted by martinsm:
Why would the driver want to examine them?
I think it needs to know which vertices the indices reference, and this information becomes part of the command stream to the GPU for it to DMA the correct block of vertices if necessary.

Sometimes I saw a similar thing: a floating-point exception in nvoglnt.dll on a draw call (independent of glDrawElements or glDrawRangeElements). Unfortunately, this was really hard to reproduce; the crash is not stable. But all the indices and vertices were fine.

Masking floating-point exceptions helps, of course, and everything draws as usual, but sometimes, when unmasking them, the exception occurs.

But, as I understand, it is not your case.

Originally posted by knackered:
Originally posted by martinsm:
Why would the driver want to examine them?
I think it needs to know which vertices the indices reference
Analyzing the indices on the CPU during a call that is not recorded into a display list would be counterproductive. When using glDrawElements, the driver can always assume that all vertices inside the buffer are referenced by the indices when it needs that information somewhere.

In the past I had some VBO-related crashes. I am not sure if I remember it correctly, but the crash happened when several VBOs containing different amounts of vertices (based on stride and buffer size) were bound simultaneously during a draw call.

Does this happen on the first run through that code?

for (int i=1; i < max; ++i)
{
  glEnableVertexAttribArray (i);
  glVertexAttribPointer (i, ...);
}
glEnableVertexAttribArray(0);
glVertexAttribPointer (0, ...);

If max changes, do you also have matching code that disables the arrays which are no longer used, or are there arrays left enabled that you don't use in following runs?

BTW, if you want to set attrib 0 last, just invert the loop counter => two lines of code eliminated.
The indices should really be unsigned int.
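
The inverted loop would look roughly like this (just a sketch; iPreviousMax stands for whatever attribute count the previous run used):

// counting down means generic attribute 0 is specified last
for (int i = max - 1; i >= 0; --i)
{
	glEnableVertexAttribArray (i);
	glVertexAttribPointer (i, ...);
}

// disable anything a previous run enabled that this run no longer uses
for (int i = max; i < iPreviousMax; ++i)
	glDisableVertexAttribArray (i);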

A piece of paranoia: I've seen weird things happen with macros:

BUFFER_OFFSET (uiFirstVertex * sizeof (int))

Do not put whitespace between the macro name and the bracket if the macro is defined as BUFFER_OFFSET(_X).
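
For reference, the definition most VBO examples use looks like this - if yours is different, that could be worth checking:

// turns a byte offset into the pointer-sized value glDrawElements expects
#define BUFFER_OFFSET(i) ((char *)NULL + (i))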

Originally posted by Komat:
Analyzing the indices on the CPU during a call that is not recorded into a display list would be counterproductive. When using glDrawElements, the driver can always assume that all vertices inside the buffer are referenced by the indices when it needs that information somewhere.
Well, that gem was touted on these forums some time ago, when people were discouraged from putting indices into a VBO (or VAR, maybe). The reasoning was that the CPU did indeed walk the indices to gather the range information.

I wasn't able to try it with downgraded drivers yet, though I don't think trying would make much sense: it is most certainly not a driver issue (else Doom 3 etc. wouldn't run on that hardware), and the app needs to run on all drivers anyway, since it is for university and I have no control over the drivers installed there. So even if it were a driver issue, I would need to work around it.

It doesn’t seem to be related to floating-point exceptions.

@Komat: I heard one can bind several VBOs simultaneously; however, I don't know how that can actually be achieved. I don't use it - all my data is in one VBO (and the indices are in a second one).

@Relic: This happens every time I bind the VBO. AFAIK I need to set up all the arrays every time I bind a new VBO, since glBindBuffer resets all vertex array state. Or does it set the vertex array state to how it was when the same VBO was bound last? In my opinion the spec quite clearly states the opposite.
I know I could invert the loop; that was just a quick and dirty way to make sure array 0 was set last.
About the paranoia: I changed it to what you said; however, there seems to be no difference.

So far I haven't been able to get it to work on nVidia. I think for now I will work around it by leaving the indices in system memory, and in a few days I will write a small test application to try to make it work on nVidia. I have looked at and rearranged the code a hundred times now and cannot find anything that might make it break. But then, we have all been there before. I don't believe it's a driver bug.

Thanks for your suggestions,
Jan.

binding several buffers:-

for (int i=1; i < max; ++i)
{
	glBindBuffer (GL_ARRAY_BUFFER_ARB, m_uiVertexBufferID[i]);
	glEnableVertexAttribArray (i);
	glVertexAttribPointer (i, ...);
}

The bind command simply states that any gl**Pointer functions from now on refer to offsets within this buffer.
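
In other words, each attribute latches the buffer that was bound when its pointer was specified; binding something else afterwards does not touch it. Roughly:

glBindBuffer (GL_ARRAY_BUFFER_ARB, m_uiVertexBufferID);
glVertexAttribPointer (0, ...);        // attrib 0 now sources from m_uiVertexBufferID

glBindBuffer (GL_ARRAY_BUFFER_ARB, 0); // does NOT reset attrib 0's pointer or its buffer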

Well, that gem was touted on these forums some time ago, when people were discouraged from putting indices into a VBO (or VAR, maybe). The reasoning was that the CPU did indeed walk the indices to gather the range information.
First, it was only ever nVidia who suggested this. ATi was decidedly neutral on the subject.

Second, I would hope by now that nVidia has dealt with whatever hardware issue provoked this rather silly usage pattern.

thanks for clearing that up, korval.
much appreciated.

OK, I did some thorough tests; here are the results:

On ATI everything works just fine.

On nVidia, using VBOs for vertex data always works fine. Putting index data into a VBO works fine for a limited amount of data: speed is OK, it's stable and all.

However, in my real-world app I have 2 million vertices and around 700K triangles. Putting that much data AND the corresponding indices into VBOs crashes on nVidia.

That is, for now I put the vertex data into ONE VBO and the index data into ONE VBO. I don't split the data into smaller chunks.

So I made a test app. The test app uses quads. One quad = 4 vertices = 2 triangles = 6 indices (unsigned int).

With 260K quads it runs "fast" and stable on a Geforce 7600 with the latest official drivers (from November). I render it using one draw call.

With more than 260K quads it crashes when the indices are in a VBO. It works fine, even with 1M quads, when the indices are in RAM.

So I switched from rendering triangles to rendering quads, decreasing the number of indices needed per quad. The limit stays at 260K quads, so that does not seem to matter.

My vertex data consists of position and color information. I then increased the VBO size by putting fake texcoords and normals into the array. I don't fetch that data from within the shader (which just passes the color through), but it more than doubles the per-vertex data. The limit stayed at 260K quads; it did not decrease. So I assume it is not an issue of memory consumption.

So what I tried was:

  1. varying the number of vertices
  2. varying the number of indices
  3. varying the size of the per-vertex data
  4. disabling on-GPU indices

2 and 3 seem to have no effect. 4 fixes the problem completely. 1 is the problem: if you have few enough vertices, it works; if you reach some limit, it crashes.
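
If I end up having to keep the indices in a VBO anyway, splitting the one big call into smaller ranged chunks is probably the first thing I would try - no idea yet whether that dodges the limit. Roughly like this (untested sketch; uiIndicesPerChunk is just a made-up batch size, and it assumes the quads' vertices are laid out sequentially, so each chunk only touches a bounded vertex range):

const unsigned int uiIndicesPerChunk = 6 * 60000;   // 60K quads per call, well below the limit

for (unsigned int uiStart = 0; uiStart < uiCount; uiStart += uiIndicesPerChunk)
{
	unsigned int uiChunk = uiCount - uiStart;
	if (uiChunk > uiIndicesPerChunk)
		uiChunk = uiIndicesPerChunk;

	// with 4 sequential vertices and 6 indices per quad, the vertex range
	// touched by this chunk is easy to compute
	unsigned int uiMinVertex = (uiStart / 6) * 4;
	unsigned int uiMaxVertex = ((uiStart + uiChunk) / 6) * 4 - 1;

	glDrawRangeElements (GL_TRIANGLES, uiMinVertex, uiMaxVertex, uiChunk, GL_UNSIGNED_INT,
	                     BUFFER_OFFSET (uiStart * sizeof (unsigned int)));
}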

I uploaded my test app here.

So you can try it yourself. I included the source file containing the buffer setup and the rendering code. It is a bit complicated because of the several options and the quick-and-dirty implementation, but I am pretty sure it is error-free. If you see an error, please tell me. It would be cool if you could give me some feedback on whether it runs on your hardware.

If you don't see anything after loading, you need to look around using the mouse until you see the hippy quads.

Any other hints / advice is very welcome.
Jan.