EXT_multi_draw_arrays + ARB_vertex_buffer_object = insania

I have a problem with EXT_multi_draw_arrays combined with ARB_vertex_buffer_object.

SPECS: card=NVidia QuadroFX1000 driver=56.72

I have two paths in my code: one uses multiple glDrawElements calls, while the other (the EXT_multi_draw_arrays path) builds up a dynamic array of pointers to each index list together with their counts.
At the end of the scene traversal, if using the EXT_multi_draw_arrays path, I issue a glMultiDrawElementsEXT call with the dynamic index arrays.
VBO is enabled, initialised with a 1024x1024 array of vertices (type=signed short) (the indices are NOT in a VBO).
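Roughly, the multi-draw path looks like this (a simplified sketch, not my actual code; the function names, array sizes and primitive type are illustrative):

/* assumes <GL/gl.h>, <GL/glext.h> and the extension entry points are already set up */
#define MAX_LISTS 4096

static const GLvoid *indexPtrs[MAX_LISTS];   /* pointer to each client-side index list */
static GLsizei       indexCounts[MAX_LISTS]; /* number of indices in each list         */
static GLsizei       numLists = 0;

/* called during scene traversal for every visible patch */
static void queuePatch(const GLuint *indices, GLsizei count)
{
    indexPtrs[numLists]   = indices;
    indexCounts[numLists] = count;
    ++numLists;
}

/* called once at the end of the traversal */
static void flushPatches(GLuint vertexVBO)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vertexVBO);    /* vertices live in the VBO */
    glVertexPointer(3, GL_SHORT, 0, (const GLvoid *)0); /* signed short positions   */
    glEnableClientState(GL_VERTEX_ARRAY);

    /* indices are plain client-side arrays, NOT in a buffer object */
    glMultiDrawElementsEXT(GL_TRIANGLES, indexCounts, GL_UNSIGNED_INT,
                           indexPtrs, numLists);

    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    numLists = 0;
}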
This works fine until I increase the number of vertices in the VBO by a small amount (not dynamically; I re-run the program with bigger dimensions) and add some more indices to each index list to connect them. Then huge areas of triangles start flickering on and off in an apparently random fashion, even though no data is being changed from frame to frame.
I don’t have exact figures, but bear in mind I’m well within the unsigned int index limit.
The weird thing is, if I don’t use EXT_multi_draw_arrays OR don’t use VBO it works fine.
So, to clarify:

EXT_multi_draw_arrays + normal vertex array = normal behaviour
normal glDrawElements + ARB_vertex_buffer_object = normal behaviour
EXT_multi_draw_arrays + ARB_vertex_buffer_object = mad behaviour beyond a certain vertex/index count

Any ideas? Has anyone experienced this before?

Thanks Adrian.
By the way, the VBO is created using GL_STATIC_DRAW_ARB, and I glBufferSubData the vertices into it at initialisation (3 subdata calls, for reasons probably not of interest to anyone).
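For what it’s worth, the initialisation looks roughly like this (a sketch; the block pointers and sizes are placeholders, not my real layout):

/* sketch: create the VBO and fill it in three pieces */
static GLuint createGridVBO(GLsizeiptrARB totalBytes,
                            const GLshort *blockA, GLsizeiptrARB bytesA,
                            const GLshort *blockB, GLsizeiptrARB bytesB,
                            const GLshort *blockC, GLsizeiptrARB bytesC)
{
    GLuint vbo;
    glGenBuffersARB(1, &vbo);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);

    /* reserve storage for the whole grid up front, with no initial data */
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, totalBytes, NULL, GL_STATIC_DRAW_ARB);

    /* three separate uploads into consecutive regions of the buffer */
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0,               bytesA, blockA);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, bytesA,          bytesB, blockB);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, bytesA + bytesB, bytesC, blockC);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    return vbo;
}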

Well, I still haven’t figured it out. I’ve put guard code everywhere, checking for GL errors all over the shop… been trying to work out whether there are vertex/index count limits I’m breaking… and I’m not.
Odd thing: I’m hitting some kind of performance threshold with VBO when dealing with 6 MB or more of vertex data. A Google search throws up similar problems people were having last year when the extension was introduced, but I would have thought that would have been fixed by now.
Of the two, VBO versus multi_draw_arrays, multi_draw_arrays is giving me the better performance boost… which is obviously wrong.

Sorry knackered, I checked my code and it was glDrawRangeElementsEXT I was having problems with; glMultiDrawElementsEXT is working fine (with VAR) for me.

If you use signed short (16-bit) for indices, it covers only 32767 vertices. But you are trying to render a 1024x1024 grid, and that takes 1M vertices. Try changing the indices to unsigned int (32-bit).

yooyo

No, I’m using unsigned ints for indices and signed shorts for the actual vertices (I scale them down using the modelview matrix).

You are using signed shorts for vertices?! That is not an accelerated format. The driver must convert the signed shorts to the GPU’s native format (floats?) and pass them through the vertex pipeline. Maybe it’s a driver bug if your vertices are not being transformed correctly.

yooyo

Unlikely. Most (all?) NVIDIA transform hardware supports signed shorts for positions.

All we can do is wait until an NVIDIA employee posts to this thread, right?

Of course it’s accelerated. Never heard such rot.
It’s a standard technique to reduce memory footprint and bandwidth usage - totally legitimate and common.
I await with bated breath an advanced response to this question… yooyo, you are excused from helping me; just go about your business.

Originally posted by zeckensack:
All we can do is wait until an NVIDIA employee posts to this thread, right?
Yeah right, as if.
I bet I’ve had loads of useful replies that have been deleted by dorbie as part of his efforts to drive me insane. He just leaves me bickering with yooyo about semantics before eventually hitting the ‘close thread’ button.

Since I’ve done a lot of work on the open-source R100 and R200 drivers for Linux, I can say with some authority that anything other than GLfloat is not likely to get a hardware accelerated TCL path. I’d really be surprised if Nvidia or later ATI chips were any different.

Basically, you want to use GLfloat for all position-type data (glVertex, glNormal, glTexCoord, etc.) and either GLfloat or GLubyte for all color-type data. Other formats may or may not hit a hardware TCL path on a given platform. You can be pretty sure that if GL_NV_half_float is supported, GLhalf will also get a hardware TCL path.
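As a rough illustration of that “safe” combination (a sketch; the interleaved struct is just an example layout, and the matching glEnableClientState calls are omitted):

/* assumes <GL/gl.h> is included */
typedef struct {
    GLfloat pos[3];    /* positions: float                 */
    GLfloat normal[3]; /* normals: float                   */
    GLfloat uv[2];     /* texture coordinates: float       */
    GLubyte rgba[4];   /* colours: float or unsigned byte  */
} Vertex;

static Vertex verts[1024]; /* example array */

static void setSafePointers(void)
{
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), verts[0].pos);
    glNormalPointer(GL_FLOAT, sizeof(Vertex), verts[0].normal);
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), verts[0].uv);
    glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex), verts[0].rgba);
}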

Originally posted by knackered:
Originally posted by zeckensack:
All we can do is wait until an NVIDIA employee posts to this thread, right?
Yeah right, as if.

I wouldn’t recommend using shorts for vertex positions, but this does sound weird. If you send me your example I can take a look at it.

-Simon (firstinitial+lastname@nvidia.com)

I’d really be surprised if Nvidia or later ATI chips were any different.
Then be surprised. R300 hardware can handle all kinds of other formats (with specific alignment requirements). They have a document on their website that specifies which ones. NV30 offers some alternative formats as well.

What is TCL?

You have a 1024x1024 grid of vertices, which is exactly 1,048,576 vertices. VBO is something like an extension of VAR, and VAR on your hardware has a limit of 1,048,576 vertices. When you add a few more vertices, you are pushing over that limit. Please check EXT_draw_range_elements and take a look at the Proposal/Motivation section…
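You can query the recommended limits that EXT_draw_range_elements defines (they are also core GL 1.2 state); a quick sketch:

/* assumes <stdio.h>, <GL/gl.h> and <GL/glext.h> are included */
static void checkRecommendedLimits(void)
{
    GLint maxVerts = 0, maxIndices = 0;
    glGetIntegerv(GL_MAX_ELEMENTS_VERTICES_EXT, &maxVerts);  /* recommended vertex range per draw */
    glGetIntegerv(GL_MAX_ELEMENTS_INDICES_EXT, &maxIndices); /* recommended index count per draw  */

    /* exceeding these is allowed, but you may fall off the fast path;
       compare them against the grid size and each index list's length */
    printf("recommended per-draw limits: %d vertices, %d indices\n", maxVerts, maxIndices);
}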

Maybe I’m wrong again…

yooyo

All I will say at this moment, as far as the short vertex format goes, is “jwatte told me to do it”. He advised, well… it must be a few years ago now, that I use signed shorts for model vertex data and use the modelview matrix to rationalise the numbers, which seemed like a bloody good idea at the time. What I didn’t realise was that, in this “optimise for Doom3” climate, the hardware vendors have started dropping legitimate vertex formats to gain microseconds on the competition. OpenGL is rapidly turning into Glide. Either Glide or a GL miniport.
NVidia, why don’t you just give us a header with all the un-optimised GL types removed?
Save lots of confusion, I think.
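For anyone unfamiliar with the trick, it boils down to something like this (an illustrative sketch under my own assumptions, not jwatte’s exact recipe):

/* assumes <GL/gl.h> is included */

/* quantise float positions in [-extent, +extent] into signed shorts */
static void quantisePositions(const GLfloat *src, GLshort *dst, int vertCount, GLfloat extent)
{
    const GLfloat s = 32767.0f / extent;
    int i;
    for (i = 0; i < vertCount * 3; ++i)
        dst[i] = (GLshort)(src[i] * s);
}

/* at draw time, undo the quantisation scale via the modelview matrix */
static void drawQuantised(GLfloat extent)
{
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
    glScalef(extent / 32767.0f, extent / 32767.0f, extent / 32767.0f);
    /* ... glVertexPointer(3, GL_SHORT, ...) and the draw calls go here ... */
    glPopMatrix();
}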

the hardware vendors have started dropping legitimate vertex formats to gain microseconds on the competition.
No, they haven’t. They never really supported them to begin with. The drivers just emulated them, because things like VBO didn’t exist, so a copy had to happen anyway.

Old hardware could only handle float values. Only with modern hardware do we see the ability to use non-float vertex attributes.

knackered:

It’s not “optimize for Doom3”, it’s “pick which paths you want in hardware”. When you were advised to use signed shorts, it was probably when TCL (transform, clip, and light) was done in software. The level of optimization for different paths was relatively equal in that case. When you only have enough chip space to implement a subset in hardware, you pick the common case. For 90% or more of commercial apps, the common case is single-precision floats.

The other formats are still supported; they’re just not as fast.

The other formats are still supported; they’re just not as fast.
Another misconception.

ATi hardware supports plenty of non-float formats just fine; they have a document available that specifies which ones. However, there are some alignment restrictions that make it a bit awkward to use some of the smaller formats. nVidia hardware supposedly supports some non-float formats too, but they haven’t told us which ones.
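As a concrete (hypothetical) example of the kind of alignment issue meant here: a three-component signed-short position is 6 bytes, so it typically gets padded so that each vertex starts on a 4-byte boundary:

/* hypothetical interleaved layout */
typedef struct {
    GLshort pos[3];  /* 6 bytes of position                        */
    GLshort pad;     /* 2 bytes of padding to keep 4-byte alignment */
    GLubyte rgba[4]; /* colour                                      */
} PackedVertex;      /* stride = 12 bytes                           */

/* e.g. glVertexPointer(3, GL_SHORT, sizeof(PackedVertex), basePointer); */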

@knackered

I’m just curious… did you fix the bug?

yooyo

Korval:

Do you have a link to that document? I’d be really interested in reading it. :) It might help with the open-source driver.