glVertexAttribDivisorARB

I read the instanced_arrays spec here a few times, and it seems simple enough:
http://opengl.org/registry/specs/ARB/instanced_arrays.txt

The important part seems to be calling glVertexAttribDivisorARB() to turn a vertex attribute into a per-instance attribute. Here is my code:

glEnableVertexAttribArray(index);
glVertexAttribDivisorARB(index, 1);

I get an unhandled memory exception when glVertexAttribDivisorARB() is called. The same thing happens if I use 0 instead of 1 (which should just be the default mode).

Has anyone used this extension successfully?
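For what it's worth, one common cause of crashes like this is calling an entry point whose extension isn't actually advertised; on some drivers wglGetProcAddress can hand back a non-NULL pointer for an entry point that isn't really supported. A minimal sketch of a token-exact extension-string check (the helper name is mine, not from any spec):

```c
#include <string.h>

/* Return 1 if `name` appears as a whole space-delimited token in the
 * extension string `extlist` (as returned by glGetString(GL_EXTENSIONS)),
 * 0 otherwise.  A plain strstr() is not enough: it would also match
 * "GL_ARB_instanced_arrays" inside a hypothetical longer name. */
int has_extension(const char *extlist, const char *name)
{
    size_t len = strlen(name);
    const char *p = extlist;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == extlist || p[-1] == ' ');
        int ends_token   = (p[len] == ' ' || p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len;
    }
    return 0;
}
```

Only if has_extension(exts, "GL_ARB_instanced_arrays") returns 1 would I trust the glVertexAttribDivisorARB pointer at all.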

What hardware are you using?

It’s not advertised on my G80 (driver 182.06).

GeForce 8800 GTS.

The function pointer isn’t null.

Guess it needs to brew a little longer :wink:

Everything I have been told during the development of the two styles of instanced rendering for OpenGL indicates that stream frequency division was a native hardware feature on DX9-class parts such as the GeForce 6 and 7, but not on the 8 and later. The intent of the attribute divisor extension was that it would be exported on parts that support it natively, but not on the newer parts which do not. Restated: it’s meant to be a GL2.1-only extension. GL3-capable parts don’t have hardware attribute frequency divisors any more… I am told.

(I have no idea what happens when you run a Windows DX9 app which uses that functionality on a newer part, there must be some kind of emulation occurring at the driver level if the behavior is not natively supported.)

My expectation would be that this extension is not going to work on G80 or later for these reasons. The non-NULL function pointer is more likely a driver bug than a sign that support is coming.

What is the correct way to perform instanced rendering in OpenGL? ATI supports the instanced_arrays extension, and NVIDIA supports the bindable uniform extension. What is the correct way to upload the instance matrices to the GPU for instanced rendering? I have also tried a uniform array of mat4s, but could never get a straight answer on how to determine the maximum array size:
http://www.opengl.org/discussion_boards/…true#Post250239

How about a texture buffer?

http://www.opengl.org/registry/specs/ARB/texture_buffer_object.txt

You could use instanced arrays for GL2 and texture buffers for GL3.
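On the GL3 path, the vertex shader fetches each instance's matrix out of the buffer texture by index. A sketch in GL3-style GLSL (#version 140; on the older extension path the equivalents were texelFetchBuffer and EXT_draw_instanced's gl_InstanceID) — this assumes the application packed one mat4 per instance into a GL_RGBA32F buffer texture, four texels per matrix, column by column:

```glsl
#version 140

uniform samplerBuffer instanceMatrices; // 4 RGBA32F texels per instance
uniform mat4 viewProjection;

in vec4 position;

void main()
{
    // Each instance owns 4 consecutive texels: the 4 columns of its mat4.
    int base = gl_InstanceID * 4;
    mat4 model = mat4(texelFetch(instanceMatrices, base + 0),
                      texelFetch(instanceMatrices, base + 1),
                      texelFetch(instanceMatrices, base + 2),
                      texelFetch(instanceMatrices, base + 3));
    gl_Position = viewProjection * (model * position);
}
```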

I would prefer a correct/official method that someone has actually put some thought into. Instanced rendering isn’t an obscure technique.

I just tested and I can verify that the max matrix array size is NOT given by GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB/4, as that causes a linking error.

Apparently ATI just added the bindable uniform extension, but their implementation crashes my program. I guess they are moving in the same direction, though.

Or he could use instanced arrays only (once it is ready, if at all) and let the driver determine what technique to use to emulate this functionality under the hood (let’s say texture buffers would be one of them).

Anyway, using a functionality that will not be supported on newer hardware is not very promising…

AFAIK there is no “official” way, sadly. Also, you can’t just use GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB/4 matrices: because of packing, most GPUs won’t be able to fit all the data in there. Most drivers also need some uniforms for themselves to upload data behind the scenes, I suppose.

I was just thinking: doesn’t GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB actually give you the number of FLOATS? In that case it needs to be GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB/16 anyway, since a mat4 is 16 floats.

In one project I calculate the number to use like this:
(GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB - 100) / 16. But that also depends on how many other uniforms you need.
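Spelled out as code, with the parentheses it needs (a sketch; the 100-component reserve and the helper name are mine, not from any spec):

```c
/* How many mat4 uniforms fit in the vertex shader, leaving `reserved`
 * components free for other uniforms and driver-internal use.
 * `max_components` is the value queried via
 * glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB, ...).
 * A mat4 occupies 16 float components (four vec4 columns). */
int max_mat4_uniforms(int max_components, int reserved)
{
    int usable = max_components - reserved;
    if (usable < 0)
        usable = 0;
    /* Note: (max - reserved) / 16, NOT max - reserved / 16. */
    return usable / 16;
}
```

With max_components = 1024 and reserved = 100 this gives 57 matrices.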

Jan.

http://opengl.org/registry/specs/ARB/draw_instanced.txt is the more recent style for GL3 capable hardware. It doesn’t exist on pre-GL3 hardware.

http://opengl.org/registry/specs/ARB/instanced_arrays.txt corresponds to the frequency divisor capability available on pre-GL3 hardware.

There is a split in extensions because there is a split in hardware capability.

[QUOTE]Anyway, using a functionality that will not be supported on newer hardware is not very promising…[/QUOTE]

It all depends on what your user base is like and how many code paths you want to develop and test. Some developers want a way to leverage the capabilities of each generation; this is one of those odd corners where generation N had a feature that N+1 lacks.

The techniques I investigated are the following:

Uniform Array
+The easiest to use.
+No extensions to check.
-No way to determine max size.
-Much smaller capacity than bindable uniforms (~60 mat4s)

Instanced Arrays
I was told once this was the proper way to do instancing, but obviously this functionality is no longer supported.

Bindable uniform
+Large capacity (~1024 mat4s)
+Precisely determinable maximum size.
-As of yesterday, supported by ATI, but so far just crashes.

So I am going to do as I did before I started looking at this again: use a bindable uniform on NVIDIA cards and just disable instancing on ATI cards until their bindable uniforms are working. The determinable maximum size is the most important advantage of bindable uniforms.

Yep, exactly the same situation in d3d9/10 - stream freq divisor API in 9, input layout objects in 10.

The closer the fit to the hardware the better, even if it means an occasional departure from a tried-and-true path.

I just looked at what the “input layout objects” do, and they seem to be pretty much the same thing as the freq divisor, scoped down to only support per-instance data. Let’s say you want to use the Instanced Arrays approach to provide one stream of data (e.g. instance positions) which varies per instance. Input layout objects do just that (and only that).

Instanced Arrays used in the same manner as the DX10 input layout objects also have the nice bonus of not imposing the limit on the number of instances per batch that the draw_instanced extension imposes via the max size of uniform arrays.

When I tested the draw_instanced extension, I also noticed that accessing a uniform array with gl_InstanceID is quite slow. I suspect that the DX10 input layout object does not suffer from this slowdown.