Indexed Instancing

Jan · May 12, 2010, 3:02pm

AFAIK there is an OpenGL extension that lets you bind a vertex-stream that is not sampled per vertex, but per instance, though at the moment I can’t find it. The functionality definitely exists in D3D11.

Though the way D3D implements it, it is mostly useless. There is a drawcall similar to glDrawElementsInstanced, except that you can specify not only the number of instances to render, but also which instance to start with.

So you could have a vertex-stream that defines the world-matrices for 100 instances, and then you could just say “render instances 23 till 37”.

The problem with this approach is that it is still very coarse. It does not work well with viewfrustum-culling, because the instances that survive culling are rarely one contiguous range.
It would be much more interesting if I could instead specify something like an index-buffer, but for which instances to use.

So I could store all the instances on the GPU, with their modelview-matrices etc., cull them on the CPU, and just fill an index-buffer that lists which instances to draw. That way I could then render instances 23, 26, 29, 33, 37 in one drawcall without resorting to dynamically filled uniforms.
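To make the idea concrete, here is a minimal CPU-side sketch of the culling step, written in Python purely for illustration. It assumes each instance has a bounding sphere and that the frustum is given as planes whose normals point inwards; the names `Instance`, `sphere_visible` and `build_instance_indices` are made up for this example and are not part of any API.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    center: tuple   # bounding-sphere centre (x, y, z) in world space
    radius: float   # bounding-sphere radius


def sphere_visible(inst, planes):
    """Return True if the sphere is at least partly inside every plane.

    Assumption: each plane is (nx, ny, nz, d) with a unit normal pointing
    into the frustum, so a point p is inside when dot(n, p) + d >= 0.
    """
    x, y, z = inst.center
    return all(nx * x + ny * y + nz * z + d >= -inst.radius
               for nx, ny, nz, d in planes)


def build_instance_indices(instances, planes):
    """Collect the indices of all instances that survive culling."""
    return [i for i, inst in enumerate(instances)
            if sphere_visible(inst, planes)]


# Toy "frustum": only the slab -10 <= x <= 10, to keep the example short.
planes = [(1.0, 0.0, 0.0, 10.0), (-1.0, 0.0, 0.0, 10.0)]
instances = [
    Instance((-20.0, 0.0, 0.0), 1.0),
    Instance((0.0, 0.0, 0.0), 1.0),
    Instance((10.5, 0.0, 0.0), 1.0),
    Instance((30.0, 0.0, 0.0), 1.0),
]
visible = build_instance_indices(instances, planes)
```

The resulting `visible` list is the only thing that would have to be uploaded each frame; the per-instance data itself stays untouched on the GPU.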

I don’t know whether there are already built-in shader variables to handle such cases, but if such a drawcall is made, one would need two different variables: gl_InstanceID that just counts which instance is currently rendered, and maybe something like gl_InstanceIndex, that would then say “although you are the 3rd instance to be rendered, you are actually instance 29 in the vertex-array”.
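The difference between the two variables can be emulated on the CPU. The sketch below is only an illustration in Python: `draw_instanced_range` mimics the "start instance plus count" style of call, `draw_instanced_indexed` mimics the proposed call, and both function names as well as `gl_InstanceIndex` are hypothetical. Each returned pair is (counter, element fetched from the instance stream).

```python
def draw_instanced_range(first_instance, instance_count):
    """Contiguous range: the counter runs from 0, the fetched element
    is simply offset by the start instance."""
    return [(instance_id, first_instance + instance_id)
            for instance_id in range(instance_count)]


def draw_instanced_indexed(instance_indices):
    """Proposed behaviour: the counter (gl_InstanceID) still runs from 0,
    but the fetched element (the hypothetical gl_InstanceIndex) comes
    from an arbitrary index list."""
    return list(enumerate(instance_indices))


# "Render instances 23 till 37" is 15 consecutive instances.
range_draw = draw_instanced_range(23, 15)

# The culled list from the example above, in one call.
indexed_draw = draw_instanced_indexed([23, 26, 29, 33, 37])
```

In `indexed_draw` the third entry pairs counter 2 with instance 29, which is exactly the "3rd instance rendered, but actually instance 29" case.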

IMO such functionality would be extremely useful, because one often renders many identical objects, only in different locations. Till now instancing is very cumbersome to use, because one usually has to dynamically manage data just to define WHICH instances to render. With such a simple method one would only need to dynamically create the index-buffer, and maybe change the instance vertex-streams, when the position of an instance changes.

Jan.

Jan · May 12, 2010, 3:11pm

Here is the extension that I meant:

http://www.opengl.org/registry/specs/ARB/instanced_arrays.txt

Dark_Photon · May 12, 2010, 5:44pm

What about something like this: serializing instance attributes for ARB_instanced_arrays?

Jan · May 12, 2010, 6:19pm

Interesting, indeed, but it does not invalidate what I proposed.

The problem with this technique is that it does the frustum-culling on the GPU. Therefore it is much more limited. I cannot implement every wicked optimization structure to run on the GPU. So, yes, it’s one possibility, but again it is only a work-around to get instancing to actually do the thing that I proposed above, only with more restrictions.

Jan.

BionicBytes · May 13, 2010, 4:26am

“So I could store all the instances on the GPU, with their modelview-matrices etc., cull them on the CPU, and just fill an index-buffer that lists which instances to draw. That way I could then render instances 23, 26, 29, 33, 37 in one drawcall without resorting to dynamically filled uniforms.”

You can achieve something like this using two Texture Buffer Objects: one to store all per-instance data, the other to be filled per frame with the visible indexes.
The advantage of TBOs is that the arrays can be huge, and my testing shows that uploading index data to a buffer object is very quick - see the “instancing sucks?” thread on the advanced forum.
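A rough sketch of the two-buffer layout, emulated on the CPU in Python. It assumes one 4x4 float matrix per instance and 32-bit unsigned indices; all function names are invented for the example, and `fetch_matrix` only stands in for the lookup a vertex shader would do (presumably with `texelFetch` on two buffer samplers, using gl_InstanceID to read the index first).

```python
import struct

FLOATS_PER_INSTANCE = 16                      # assumption: one 4x4 matrix
INSTANCE_BYTES = FLOATS_PER_INSTANCE * 4      # 64 bytes per instance


def translation_matrix(x):
    """Column-major 4x4 matrix translating by x along the X axis."""
    m = [0.0] * 16
    m[0] = m[5] = m[10] = m[15] = 1.0
    m[12] = x
    return m


def pack_instance_data(matrices):
    """Static buffer: all matrices back to back, uploaded once."""
    return b"".join(struct.pack("<16f", *m) for m in matrices)


def pack_visible_indices(indices):
    """Per-frame buffer: one 32-bit index per visible instance."""
    return struct.pack(f"<{len(indices)}I", *indices)


def fetch_matrix(instance_data, visible_indices, instance_id):
    """Emulate the shader side: instance counter -> index -> matrix."""
    (index,) = struct.unpack_from("<I", visible_indices, 4 * instance_id)
    return struct.unpack_from("<16f", instance_data, INSTANCE_BYTES * index)


instance_data = pack_instance_data(
    [translation_matrix(float(i)) for i in range(100)])
visible_indices = pack_visible_indices([23, 26, 29, 33, 37])
third_matrix = fetch_matrix(instance_data, visible_indices, 2)
```

With 100 instances the static buffer is 6400 bytes, while the per-frame index upload for five visible instances is only 20 bytes, which is in line with the index upload being cheap.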

You could also use Uniform Buffer Objects instead - but they seem to have much smaller sizes and as a result take more CPU cycles to dynamically populate the Buffer Objects.

I agree it would be nice to have something more black-box rather than having to manage all the Buffer objects - but at least their usage is flexible.
By the way, I had been playing with ARB_instanced_arrays and I wanted to store a modelmatrix as part of the instance data. The annoying thing is that
the generic vertex attribute pointers only allow a maximum of 4 components per vertex stream, so it takes 4 vertex streams to specify a matrix as a per-instance attribute.
More useful would be to use just one glVertexAttribPointer call, but allow for any number of components (not just 1, 2, 3, 4, BGRA), e.g. 16 - so that we can send a modelmatrix.
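For reference, the four-stream workaround boils down to the following layout arithmetic, sketched here in Python. It assumes tightly packed column-major float matrices; `matrix_attribute_layout` and `base_location` are hypothetical names. Each returned entry would correspond to one glVertexAttribPointer call with those values, followed by glVertexAttribDivisorARB(location, 1) to make the attribute advance per instance.

```python
FLOAT_SIZE = 4  # bytes in one GL_FLOAT


def matrix_attribute_layout(base_location, columns=4, components=4):
    """Describe the attribute pointers needed for one matrix per instance.

    One entry per matrix column: each column is a separate attribute
    with at most 4 components, all sharing the same stride.
    """
    stride = columns * components * FLOAT_SIZE
    return [
        {
            "location": base_location + col,            # consecutive slots
            "size": components,                         # 4 floats per column
            "stride": stride,                           # bytes per matrix
            "offset": col * components * FLOAT_SIZE,    # byte offset of column
        }
        for col in range(columns)
    ]


# Assumption for the example: the matrix attribute starts at location 3.
layout = matrix_attribute_layout(3)
```

On the shader side a mat4 attribute occupies four consecutive locations anyway, so the cost is mostly in the four separate pointer setups rather than in the data layout.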