Instancing

yalmar · September 5, 2011, 7:14pm

Hi,

I need to display thousands of objects. All of them use the same mesh model, but each has its own orientation and size. Currently, I’m displaying 1000 objects using display lists: I iterate over all the objects and apply each one’s transformation before drawing it. The application runs at about 5-15 fps on a GTS 450, which I consider too slow.
The mesh has 2503 vertices and 4968 faces.

Is there any way to improve this performance?

Some people are talking about glDrawElementsInstancedEXT, but I have never seen a concrete example of it.

help me please.

Alfonse_Reinheart · September 5, 2011, 7:47pm

The mesh has 2503 vertices and 4968 faces.

That’s 1000 objects, with ~5000 faces per object. That gives a rough estimate of 5,000,000 triangles per frame. Multiplied by 15 frames, that gives 75Mtri/sec. On a mid-grade GPU. I’m not sure you’re going to get that much more out of your GTS 450.
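As a quick check of that estimate, here is a minimal sketch of the arithmetic in C. The function name is made up for illustration; the inputs are the figures from the original post.

```c
#include <stdint.h>

/* Triangles submitted per second for `instances` copies of a mesh with
   `faces_per_mesh` triangles, drawn at `fps` frames per second.
   (Hypothetical helper, only here to show the arithmetic.) */
static uint64_t triangles_per_second(uint64_t instances,
                                     uint64_t faces_per_mesh,
                                     uint64_t fps)
{
    return instances * faces_per_mesh * fps;
}
```

With the exact face count, triangles_per_second(1000, 4968, 15) is 74,520,000, which is the ~75Mtri/sec quoted above; at the low end of 5 fps it is 24,840,000.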

Instancing may help, but the purpose of instancing is to reduce setup overhead. Generally, it helps most when you’re rendering rather more than 1000 objects.

You didn’t say if you were using GL 3.x or not, so I’ll assume you are.

Instancing can generally be done one of two ways. One way is by having a uniform buffer or buffer texture or some other form of storage contain your per-instance data (in your case, orientation and size, packed as small as you possibly can). glDrawElementsInstanced will repeatedly draw the same sequence of triangles, but it will bump an instance count each time it draws a new one. This count is a per-vertex input to your vertex shader named gl_InstanceID. You use that to select which values from the per-instance storage to pick. So it would be something like this:


struct InstanceData {
  vec3 qtOrientation;  //Recover the fourth component with a square-root
  float size;
};

const int MAX_NUM_INSTANCES = 1024;

layout(std140) uniform InstanceArray{
  InstanceData instances[MAX_NUM_INSTANCES];
};

void main()
{
  vec3 qtOrientation = instances[gl_InstanceID].qtOrientation;
  float size = instances[gl_InstanceID].size;

  //Do stuff with orientation and size.
}

Another alternative uses the same draw call, but uses vertex attributes rather than uniforms or buffer textures to store the per-instance data. This uses glVertexAttribDivisor. Essentially, if you set the divisor to the number of vertices you pass to glDrawElementsInstanced, then the attribute will only advance to the next value when an instance is finished.

The benefit of the divisor approach is that the per-instance data are ordinary vertex attributes, so they can be stored more compactly (quaternions can be squeezed down to signed shorts, sizes could be bytes or whatever). Also, there are limits on the sizes of uniform buffers and buffer textures; the buffer behind a vertex attribute has no comparable size limit. There could be performance improvements as well.

The downside of the divisor approach is that the volume of per-instance data must fit within the space of 16 vertex attributes. And some of those have to be taken up by your actual mesh data.

danbartlett · September 5, 2011, 10:05pm

You could try creating lower level-of-detail (LOD) versions of the model for when the object is further away, and then choose the appropriate LOD depending on the distance from camera + the size of the object.
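A rough sketch of that selection in C, assuming three LOD levels and using size divided by distance as a stand-in for on-screen size. The function name and both thresholds are invented for illustration and would need tuning for a real scene.

```c
/* Hypothetical LOD picker: a larger size/distance ratio means the object
   covers more of the screen and deserves a more detailed mesh.
   Returns 0 (full detail) through 2 (coarsest). Thresholds are made up. */
static int choose_lod(float distance, float size)
{
    if (distance <= 0.0f) return 0;   /* at or behind the camera: be safe */
    float ratio = size / distance;    /* rough proxy for on-screen size */
    if (ratio > 0.10f) return 0;
    if (ratio > 0.02f) return 1;
    return 2;
}
```

A large object far away can therefore still get the detailed mesh, while a small one at the same distance drops to a coarser level.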

What does your mesh model consist of, is it a single draw call, or does it consist of multiple sub-meshes with different materials applied to each?

BionicBytes · September 6, 2011, 4:25am

You could try creating lower level-of-detail (LOD) versions

As soon as you have LOD models, then you are creating more (smaller) batches. This rather defeats the purpose of instancing and there won’t be any payback for all the effort coding it up.
However, assuming he ignores instancing as he’s only rendering 1000 instances anyway, then generally speaking having a LOD system is a good idea and will give more performance increase than any form of instancing alone.

knackered · September 6, 2011, 1:09pm

you can LOD at a coarser level than per instance. But like everyone’s saying, 1000 instances is not many instances.

if you set the divisor to the number of vertices you pass to glDrawElementsInstanced

that’s not true - you set the divisor to how many instances must be drawn before the attribute advances by 1. The special value of 0 means advance per vertex, not per instance (i.e. behave as a standard vertex attribute).
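To make the divisor rule concrete, here is a small C model of which array element an attribute reads. It is only a sketch of the rule as stated above, not driver code; the function name is hypothetical and it assumes no base-instance offset.

```c
/* Index of the element fetched from an attribute array for a given vertex
   of a given instance, following the divisor rule described above.
   Assumes no base instance offset. */
static unsigned attrib_element(unsigned vertex_index,
                               unsigned instance_id,
                               unsigned divisor)
{
    if (divisor == 0)
        return vertex_index;          /* ordinary per-vertex attribute */
    return instance_id / divisor;     /* advances once per `divisor` instances */
}
```

With a divisor of 1, every vertex of instance 3 reads element 3, which is what you want for one orientation and size per object; that corresponds to calling glVertexAttribDivisor(attribIndex, 1), where attribIndex stands for whatever attribute location you use. With the divisor set to the vertex count (2503 here), all 1000 instances would read element 0.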