I get an unhandled memory exception when glVertexAttribDivisorARB() is called. The same thing happens if I use 0 instead of 1 (which should just be the default mode).
Everything I have been told during the development of the two styles of instanced rendering for OpenGL indicates that stream frequency division was a native hardware feature on DX9 class parts such as GeForce 6 and 7, but not 8 and later. The intent of the attribute divisor extension was that it would be exported on parts that supported it natively, but that it would not be present on the newer parts which do not. Restated - it’s meant to be a GL2.1 extension only. GL3 capable parts don’t have hardware attribute frequency divisors any more… I am told.
(I have no idea what happens when you run a Windows DX9 app which uses that functionality on a newer part; there must be some kind of emulation occurring at the driver level if the behavior is not natively supported.)
My expectation would be that this extension is not going to work on G80 or later for these reasons. Seeing a non-NULL function pointer may indicate a bug rather than a sign of hope for that to be fixed.
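Since a non-NULL function pointer evidently isn't a reliable sign, one defensive option is to look for the extension name in the extension string before ever calling the entry point. A minimal sketch, assuming a GL 2.x context where glGetString(GL_EXTENSIONS) returns a space-separated list; has_extension is a hypothetical helper name, not part of GL:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
   space-separated token in `ext_list`, 0 otherwise.
   `ext_list` is meant to be the result of glGetString(GL_EXTENSIONS). */
static int has_extension(const char *ext_list, const char *name)
{
    size_t len;
    const char *p;

    if (ext_list == NULL || name == NULL || *name == '\0')
        return 0;

    len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Reject substring hits such as "GL_ARB_instanced_arrays_foo". */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len;
    }
    return 0;
}
```

The idea would be to call glVertexAttribDivisorARB() only when has_extension(extensions, "GL_ARB_instanced_arrays") returns 1, and to treat an unadvertised but non-NULL pointer as unusable.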
What is the correct way to perform instanced rendering in OpenGL? ATI supports the instanced_arrays extension, and NVidia supports the bindable uniform extension. What is the correct way to upload the instance matrices to the GPU for instanced rendering? I have also tried a uniform array of mat4’s, but could never get a straight answer on how to determine the maximum array size: http://www.opengl.org/discussion_boards/…true#Post250239
Apparently ATI just added the bindable uniform extension, but their implementation crashes my program. I guess they are moving in the same direction, though.
Or he could use instanced arrays only (once it is ready, if at all) and let the driver determine what technique to use to emulate this functionality under the hood (let’s say texture buffers would be one of them).
Anyway, using a functionality that will not be supported on newer hardware is not very promising…
AFAIK there is no “official” way, sadly. You also can’t just use GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB/4 matrices: because of packing, most GPUs won’t be able to fit all the data in there. Most drivers also need some uniforms for themselves to upload data behind the scenes, I suppose.
I was just thinking, doesn’t GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB actually give you the number of floats? In that case, it needs to be GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB/16 anyway.
In one project I calculate the number to use with something like this:
(GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB-100)/16. But that also depends on how many other uniforms you need.
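To make the arithmetic concrete, here is a rough sketch of that estimate as a function. It assumes the queried limit counts individual float components (so a mat4 costs 16) and that the caller chooses how many components to hold back for other uniforms and for the driver. The function and parameter names are made up for illustration, and the result is a heuristic, not a guarantee that the shader will link:

```c
/* Hypothetical helper: estimate how many mat4 uniforms fit in a budget.
   max_components      - value queried with
                         glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB, ...)
   reserved_components - floats held back for other uniforms and the driver
                         (the 100 used above is just one poster's guess) */
static int max_mat4_instances(int max_components, int reserved_components)
{
    int available = max_components - reserved_components;

    if (available <= 0)
        return 0;
    return available / 16; /* 16 floats per mat4, rounded down */
}
```

With a limit of 1024 components and 100 reserved this gives 57 matrices, which is in the same ballpark as the ~60 mat4s people report for plain uniform arrays.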
“Anyway, using a functionality that will not be supported on newer hardware is not very promising…”
It all depends on what your user base is like and how many code paths you want to develop and test. Some developers want a way to leverage the capabilities of each generation, and this is one of those odd corners where generation N had a feature that N+1 lacks.
Uniform Array
+The easiest to use.
+No extensions to check.
-No way to determine max size.
-Much smaller capacity than bindable uniforms (~60 mat4s).
Instanced Arrays
I was told once this was the proper way to do instancing, but obviously this functionality is no longer supported.
Bindable uniform
+Large capacity (~1024 mat4s).
+Precise determinable maximum size.
-As of yesterday, supported by ATI, but so far just crashes.
So I am going to do as I did before I started looking at this again: Use a bindable uniform on NVidia cards and just disable instancing on ATI cards, until their bindable uniforms are working. The determinable size is the most important advantage of these.
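Roughly, that decision could be sketched like this. Everything here is hypothetical naming: the "stable" flag stands for whatever per-vendor check the application maintains, and the byte limit is assumed to come from glGetIntegerv(GL_MAX_BINDABLE_UNIFORM_SIZE_EXT, ...), which as far as I understand reports the size in bytes:

```c
enum instancing_path {
    INSTANCING_DISABLED,
    INSTANCING_BINDABLE_UNIFORM
};

struct instancing_plan {
    enum instancing_path path;
    int max_instances; /* mat4s per batch, 0 when instancing is disabled */
};

/* Hypothetical helper: pick a code path from three caller-supplied facts.
   has_bindable_uniform       - GL_EXT_bindable_uniform is advertised
   bindable_uniform_is_stable - application-maintained flag (e.g. cleared
                                for drivers known to crash)
   max_bindable_uniform_bytes - queried GL_MAX_BINDABLE_UNIFORM_SIZE_EXT */
static struct instancing_plan plan_instancing(int has_bindable_uniform,
                                              int bindable_uniform_is_stable,
                                              int max_bindable_uniform_bytes)
{
    struct instancing_plan plan = { INSTANCING_DISABLED, 0 };

    if (has_bindable_uniform && bindable_uniform_is_stable &&
        max_bindable_uniform_bytes >= 64) {
        plan.path = INSTANCING_BINDABLE_UNIFORM;
        /* A mat4 is 16 floats of 4 bytes each, i.e. 64 bytes. */
        plan.max_instances = max_bindable_uniform_bytes / 64;
    }
    return plan;
}
```

A limit of 65536 bytes works out to 1024 matrices, which would match the ~1024 mat4s figure above.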
I just looked at what the DX10 “input layout objects” do, and it seems to be pretty much the same thing as the frequency divisor, scoped down to support only per-instance data. Let’s say you want to use the Instanced Arrays approach to provide one stream of data (e.g. instance positions) which varies per instance. The input layout object does just that (and only that).
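For anyone unsure what the divisor actually changes, this is how I read the fetch rule in ARB_instanced_arrays, written out as plain C rather than GL calls. The function name is made up for illustration:

```c
/* Hypothetical illustration of the attribute fetch rule:
   divisor == 0 : the attribute advances per vertex (ordinary behaviour)
   divisor  > 0 : the attribute advances once every `divisor` instances */
static unsigned attrib_element_index(unsigned divisor,
                                     unsigned vertex,
                                     unsigned instance)
{
    if (divisor == 0)
        return vertex;
    return instance / divisor; /* integer division rounds down */
}
```

So with a divisor of 1, every vertex of instance 3 reads element 3 of the per-instance stream, which is the only case the DX10-style per-instance layout needs.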
Instanced Arrays used in the same manner as the DX10 input layout objects also have the nice bonus of not imposing the limit on the number of instances per batch that the draw_instanced extension imposes via the max size of uniform arrays.
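To put a number on that limit: with the uniform-array approach the instances have to be split into batches no larger than the array, so the draw call count grows with the instance count. A small sketch with hypothetical names:

```c
/* Hypothetical helper: number of draw calls needed when each batch can
   hold at most max_per_batch instances (rounds up). */
static int batches_needed(int instance_count, int max_per_batch)
{
    if (instance_count <= 0 || max_per_batch <= 0)
        return 0;
    return (instance_count + max_per_batch - 1) / max_per_batch;
}
```

With a cap of 57 matrices per batch, 1000 instances need 18 draw calls, whereas a per-instance vertex stream has no such cap from uniform storage.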
When I tested the draw_instanced extension, I also noticed that accessing a uniform array with gl_InstanceID is quite slow. I suspect that the DX10 input layout object does not suffer from this slowdown.