glDrawElements vs glDrawElementsInstanced: When to switch?

paradoxresolved · January 31, 2016, 8:14am

Hello OpenGL experts!

I recently asked about this topic as an add-on to another (related) post, but I thought the answers to this might help other struggling would-be OpenGL coders.

This issue regards the relative cutoff between glDrawElements and glDrawElementsInstanced, or simply the cutoff between when you draw things as a single object versus batch them and send them through a single DrawInstanced command.

I’ve heard that if you have more than, say, 1000 objects then it is better to instance draw them using something like glDrawElementsInstanced. Naturally, I assume this cutoff point is a soft one, such that it’s not like 999 is ok but 1000 is right out. That said, what factors affect this? Specifically:

 How does the poly size of the object in question affect this cutoff?  Each card will vary, of course, but is there a general formula that will put us in the ballpark?

 What about if you have, say, 500 of one type of object and 500 of another?  What if your scene contains 20 different types of objects, and each one appears 200 times?

 Which would be faster: a glDrawElements call for an object with 1,000,000 polys or a glDrawElementsInstanced call for 1,000 objects each having 1,000 polys?

Thanks!

Alfonse_Reinheart · January 31, 2016, 10:29am

How does the poly size of the object in question affect this cutoff? Each card will vary, of course, but is there a general formula that will put us in the ballpark?

The closest to a general recommendation I’ve seen is somewhere between 100 and 1000 triangles per instance.

What about if you have, say, 500 of one type of object and 500 of another? What if your scene contains 20 different types of objects, and each one appears 200 times?

Then I would say that it’s not worth worrying about.

Instanced rendering is a performance optimization. Therefore, you should only employ it when performance is known or expected to be a problem. And generally speaking, 1000 draw calls per frame is only going to be a performance problem on the lowest-end CPUs, unless you’re doing horrific state changes between instances (changing vertex formats, changing programs, etc).
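To put rough numbers on the two scenarios from the question, one can count draw calls under each approach. The sketch below is only a counting model, not OpenGL code: `MeshType`, `drawCallsPerObject`, and `drawCallsInstanced` are hypothetical names, and it assumes one glDrawElements call per object versus one glDrawElementsInstanced call per distinct mesh type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scene description: one entry per distinct mesh type,
// recording how many copies of that mesh appear in a frame.
struct MeshType {
    std::size_t copies;
};

// Assumption: one glDrawElements call per object,
// so the number of draw calls equals the total object count.
std::size_t drawCallsPerObject(const std::vector<MeshType>& scene) {
    std::size_t calls = 0;
    for (const MeshType& m : scene) {
        calls += m.copies;
    }
    return calls;
}

// Assumption: one glDrawElementsInstanced call per mesh type that
// actually appears, so the number of draw calls equals the type count.
std::size_t drawCallsInstanced(const std::vector<MeshType>& scene) {
    std::size_t calls = 0;
    for (const MeshType& m : scene) {
        if (m.copies > 0) {
            ++calls;
        }
    }
    return calls;
}
```

Under these assumptions, 500 + 500 objects means 1000 calls without instancing and 2 with it; 20 types of 200 objects means 4000 versus 20. Whether the saving matters still depends on the per-call CPU cost and the state changes between calls, which this model does not capture.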

Which would be faster: a glDrawElements call for an object with 1,000,000 polys or a glDrawElementsInstanced call for 1,000 objects each having 1,000 polys?

Instanced rendering exists to lower the number of draw calls and state changes issued by the CPU and thereby lower the CPU overhead of rendering. Both of these operations represent a single draw call with a single set of state, so the CPU overhead of each is… a single draw call.
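As a minimal sketch of why the two cases look the same from the CPU side, the snippet below records the `count` and instance-count arguments each call would receive and checks that both cover the same number of triangles. `DrawSummary` and `trianglesDrawn` are hypothetical helpers; the sketch assumes GL_TRIANGLES, 32-bit indices, and three indices per triangle, and the GL calls appear only in comments because they need a live context.

```cpp
#include <cstdint>

// Hypothetical summary of the arguments passed to one draw call.
struct DrawSummary {
    std::uint64_t indexCount;     // the 'count' argument (indices per instance)
    std::uint64_t instanceCount;  // 1 for a plain glDrawElements call
};

// Assumes GL_TRIANGLES: every 3 indices form one triangle.
constexpr std::uint64_t trianglesDrawn(DrawSummary d) {
    return (d.indexCount / 3) * d.instanceCount;
}

// One mesh with 1,000,000 triangles:
//   glDrawElements(GL_TRIANGLES, 3000000, GL_UNSIGNED_INT, nullptr);
constexpr DrawSummary oneBigMesh{3000000, 1};

// 1,000 instances of a 1,000-triangle mesh:
//   glDrawElementsInstanced(GL_TRIANGLES, 3000, GL_UNSIGNED_INT, nullptr, 1000);
constexpr DrawSummary instancedMeshes{3000, 1000};

// Both are a single call submitting the same amount of geometry.
static_assert(trianglesDrawn(oneBigMesh) == trianglesDrawn(instancedMeshes),
              "both draws cover the same number of triangles");
```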

So the only thing that remains is whether the use of per-instance data (either via gl_InstanceID or instance arrays) will improve or inhibit performance relative to not doing that. It is generally assumed that instancing has some form of cost associated with it. Instanced arrays aren’t free; if they’re fetched through the same cache as per-vertex inputs, then they are taking up some quantity of cache space that could be used for other per-vertex parameters. Using gl_InstanceID to fetch uniforms from a UBO has a minor cost as well, compared to static addressing of the UBO in non-instancing cases. Not to mention, you must now use a bigger UBO, so there’s some initial overhead in copying so much data to constant registers.
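To give a feel for the "bigger UBO" point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each instance needs one 4x4 float matrix stored in a std140 uniform block array indexed by gl_InstanceID; the helper names are hypothetical. The 16384-byte figure is the minimum that the OpenGL specification requires for GL_MAX_UNIFORM_BLOCK_SIZE; real drivers often report more, and the actual value can be queried with glGetIntegerv.

```cpp
#include <cstddef>

// Assumption: one mat4 per instance. In a std140 array each mat4
// occupies 4 columns * 16 bytes = 64 bytes.
constexpr std::size_t kBytesPerInstance = 64;

// Smallest GL_MAX_UNIFORM_BLOCK_SIZE an implementation may report.
constexpr std::size_t kMinGuaranteedUboBytes = 16384;

// Bytes of uniform storage needed for a given instance count.
constexpr std::size_t uboBytesNeeded(std::size_t instances) {
    return instances * kBytesPerInstance;
}

// How many instances fit in a single uniform block of the given size.
constexpr std::size_t instancesPerBlock(std::size_t blockBytes) {
    return blockBytes / kBytesPerInstance;
}
```

Under these assumptions, 1,000 instances need 64,000 bytes, while a block of the minimum guaranteed size holds only 256 matrices, so the data may have to be split across several draws or moved to instanced arrays or a different buffer type.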

Will you be able to tell the difference? There’s no way to know.