Suggesting: add a GL_INSTANCE_ARRAY_BUFFER

obidobi · May 30, 2011, 7:26am

I’m missing a way to use an instance index array when drawing instances.

Let’s say I have 1000 instances of a static geometry.
I have a buffer with the static positions of each instance.

Instead of using glVertexAttribDivisor to determine how often the index of the instance attribute is incremented, I want real control by using an instance index array!

Let’s say I only want to draw 800 of the 1000 instances, or maybe draw them in a different order. If I could bind a GL_INSTANCE_ARRAY_BUFFER with indices into the instance attributes, this would be a piece of cake.

Why not add a new buffer type, GL_INSTANCE_ARRAY_BUFFER, that one could bind and use with glDraw…Instanced?
Just using a primcount and then incrementing through the instance attributes is so limiting.

Or have I missed some way to draw subsets of instances with just one draw call without having to alter the instance attribute buffers?

aqnuep · May 30, 2011, 7:30am

You can do the same thing by binding a buffer texture and using it as the instance data source in the vertex shader (or in other shaders if you wish), with the GLSL built-in gl_InstanceID selecting the appropriate data from the buffer.

Alfonse_Reinheart · May 30, 2011, 9:29am

“Just using a primcount and then incrementing through the instance attributes is so limiting.”

It’s limiting because instancing is an optimization. If you did what you suggest, drawing with instances would take longer, and therefore no longer be an optimization.

Right now, all instancing does is just loop over some data and bump a number. It’s fast and cheap, in terms of hardware. For what you want, every instance would now have to read from memory. This is more complicated and expensive.

“You can do the same thing by binding a buffer texture and using it as the instance data source in the vertex shader (or in other shaders if you wish), with the GLSL built-in gl_InstanceID selecting the appropriate data from the buffer.”

Well, that’s also not exactly conducive to performance.

obidobi · May 30, 2011, 11:32am

Yes, but it would be only slightly more expensive and massively more useful.

From the specification of glDrawArraysInstanced:


        if (mode or count is invalid)
            generate appropriate error
        else {
            for (i = 0; i < primcount; i++) {
                instanceID = i;
                DrawArrays(mode, first, count);
            }
            instanceID = 0;
        }

Let’s say I bind a GL_INSTANCE_ARRAY_BUFFER of type GL_UNSIGNED_SHORT. That would work as an indexer for the instance attributes:


        if (mode or count is invalid)
            generate appropriate error
        else {
            /* instanceIndexCount: number of entries in the bound index array */
            for (j = 0; j < instanceIndexCount; j++) {
                instanceID = instanceIndexArray[j];
                DrawArrays(mode, first, count);
            }
            instanceID = 0;
        }

Instead of GL just incrementing i, it would step through instanceIndexArray and use the value found there. Of course there would be one more byte, short, or int fetch from memory per instance. But think of how dynamic the control over which instances to draw would be.
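The difference between the two loops can be sketched on the CPU. This is only an illustration of the proposal, not real OpenGL: `draw_arrays_stub`, `draw_instanced`, and `draw_instanced_indexed` are hypothetical names, and the stub merely records which instanceID each per-instance draw would see.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-instance DrawArrays call:
   it only records the instanceID that each "draw" would see. */
#define MAX_DRAWS 16
static int drawn_ids[MAX_DRAWS];
static size_t drawn_count = 0;

static void draw_arrays_stub(int instance_id)
{
    assert(drawn_count < MAX_DRAWS);
    drawn_ids[drawn_count++] = instance_id;
}

/* Current behaviour: instanceID simply counts 0 .. primcount-1. */
static void draw_instanced(int primcount)
{
    for (int i = 0; i < primcount; i++)
        draw_arrays_stub(i);
}

/* Proposed behaviour (assumption: the bound index array is a list of
   unsigned shorts): instanceID is read from the array instead. */
static void draw_instanced_indexed(const unsigned short *indices, size_t n)
{
    assert(indices != NULL || n == 0);
    for (size_t j = 0; j < n; j++)
        draw_arrays_stub(indices[j]);
}
```

Calling `draw_instanced(3)` records the IDs 0, 1, 2, while `draw_instanced_indexed` with the array {7, 2, 5} records 7, 2, 5, which is the subset-and-reorder control being asked for.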

I could just update a short index array each frame to decide the order of the instances or which of them to draw.

If I wanted to do that now, I would have to update the entire position instance attribute buffer so that it includes only the instances I want to draw, in the order I want them drawn. If I used many more instance attributes, there would be even more data to update. At some point it would probably be more efficient to use multiple draw calls.
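To put rough numbers on that argument, here is a small sketch of the per-frame upload size in both cases. The helper names are hypothetical, and the figures assume one position of three floats per instance and 16-bit indices; neither layout is stated in the thread beyond "positions" and GL_UNSIGNED_SHORT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes to re-upload if the instance attributes themselves are
   compacted or reordered each frame. */
static size_t attribute_upload_bytes(size_t visible, size_t bytes_per_instance)
{
    assert(bytes_per_instance > 0);
    return visible * bytes_per_instance;
}

/* Bytes to re-upload if only a 16-bit instance index list changes. */
static size_t index_upload_bytes(size_t visible)
{
    return visible * sizeof(uint16_t);
}
```

With 800 visible instances and 4-byte floats, `attribute_upload_bytes(800, 3 * sizeof(float))` is 9600 bytes against 1600 bytes for `index_upload_bytes(800)`, and the gap grows with every additional instance attribute.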

The indexing would work just like vertex attribute indexing: step through incrementally if GL_ELEMENT_ARRAY_BUFFER isn’t bound, otherwise use the indices. I think a GL_INSTANCE_ARRAY_BUFFER would make drawing multiple instances of the same geometry vastly more useful.

aqnuep · May 31, 2011, 4:33am

That is not quite true. From a performance point of view there is little to no difference between using ARB_instanced_arrays and using ARB_draw_instanced with a buffer texture as the source of the instance data.

In fact, I tested it last year in one of my demos. On older GPUs (e.g. my old HD2600XT) instanced arrays were a little faster, but only by about 10%; on newer GPUs (e.g. my current HD5770) there was no performance difference between the two.

Of course, if you really want full flexibility, you need to add one more level of indirection to the buffer texture access, and that has its cost. However, I wouldn’t call it a show stopper from a performance point of view.

obidobi · May 31, 2011, 5:27am

So your suggestion is something like:

Have an instance attribute buffer containing the instance indexes. Then use those to fetch the data from a texture.

Being able to bind an instance index buffer and then have the data fed to the vertex shader through attributes would be much cleaner :). It feels like it should be slightly faster too.

aqnuep · June 1, 2011, 2:27am

Something like that, however, to be more precise, we need the following to achieve what you want:

#1 Have one or more buffers containing the data corresponding to the instances.
#2 Have another buffer that contains a list of indices into the instance data buffer(s).

You can attach buffer(s) #1 as buffer textures and you can attach buffer #2 either as an instanced array (the indices will be available in the shader in the form of an attribute) or a buffer texture (the indices can be fetched from the buffer texture based on the value of gl_InstanceID). In both cases you can finally fetch the instance data from the buffer(s) #1 based on the determined index.
AFAIK there is no other way to solve this problem on current hardware.
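The two-step lookup described above can be sketched on the CPU as follows. This is not shader code: `InstancePos` and `fetch_instance` are hypothetical names, the arrays stand in for buffers #1 and #2, and `gl_instance_id` plays the role of gl_InstanceID. In an actual vertex shader the second step would presumably be a texelFetch on a samplerBuffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-instance record stored in buffer #1. */
typedef struct { float x, y, z; } InstancePos;

/* Stand-in for the per-instance work in the vertex shader:
   step 1: read the index for this instance from buffer #2,
   step 2: fetch the instance data from buffer #1 at that index. */
static InstancePos fetch_instance(const InstancePos *instance_data, size_t data_count,
                                  const unsigned short *index_list, size_t index_count,
                                  size_t gl_instance_id)
{
    assert(gl_instance_id < index_count);
    unsigned short idx = index_list[gl_instance_id];
    assert(idx < data_count);
    return instance_data[idx];
}
```

Drawing with a primcount equal to the length of the index list then visits exactly the listed instances, in the listed order, while buffer #1 stays untouched.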

You are right that it may be more convenient to simply bind an instance index buffer; however, I’m unsure whether OpenGL is meant to provide higher-level APIs that hide the underlying implementation. Modern OpenGL is about exposing hardware capabilities in the most general way possible, and I don’t think they would want to bloat the API with redundant functionality.
As for performance, I’m not convinced that any API abstraction would provide a benefit, since from the hardware’s point of view the two approaches would be equivalent.