Instancing Performance

09-11-2009, 01:21 PM
I've implemented and tested geometry instancing on a bunch of simple, 4-polygon tree models. I'm packing the tree positions into a texture buffer. My 285 takes close to 16 ms per frame to render about 10 batches of 5,000 trees.

I've also tried merging the tree geometry within each batch, so I have the same number of batches, but I'm just calling good ol' glDrawElements. With each tree group compiled into a display list, the frame time drops to just 5 ms; without display lists, it's about 10 ms.
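To put the three timings side by side, here is a small C sketch that turns them into triangle throughput. It assumes each "4 polygon" tree is 4 triangles (consistent with the 20,000 triangles per 5,000-instance call mentioned later in the thread) and uses the approximate frame times as reported; the names are illustrative only.

```c
/* Scene figures as reported in the thread (frame times are approximate). */
enum {
    BATCHES = 10,           /* batches (draw calls) per frame */
    TREES_PER_BATCH = 5000,
    TRIS_PER_TREE = 4       /* assumption: "4 polygon" means 4 triangles */
};

/* Total triangles submitted per frame; identical for every draw path. */
long triangles_per_frame(void)
{
    return (long)BATCHES * TREES_PER_BATCH * TRIS_PER_TREE;
}

/* Effective throughput in triangles per second for a frame time in ms. */
double tris_per_second(long tris, double frame_ms)
{
    return (double)tris * 1000.0 / frame_ms;
}
```

With 200,000 triangles per frame this works out to roughly 12.5 M triangles/s for the instanced path (16 ms), 20 M for plain glDrawElements (10 ms) and 40 M with display lists (5 ms). Since the triangle count is the same in all three cases, the spread suggests the cost lies in how the work is submitted rather than in the triangles themselves.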

So, in this simple case, I've found that instancing performs worse than straight GL calls. Have others reached a similar conclusion? Is there a batch size or model polygon count for which instancing would outperform regular GL calls for static objects?

Alfonse Reinheart
09-11-2009, 02:10 PM
First, what kind of instancing are you using?

Second, what hardware and drivers are you using? Have you tested with other hardware?

09-11-2009, 05:02 PM
GeForce GTX 285. I don't see any reason to test with a lesser card.

I'm using the instancing described in EXT_draw_instanced. I store the per-instance data in a texture buffer object (EXT_texture_buffer_object) and access it in the vertex shader with gl_InstanceID.
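A minimal sketch of this setup, assuming a GL 3.1 context where both extensions are available in core form (texelFetch on a samplerBuffer, gl_InstanceID, glTexBuffer, glDrawElementsInstanced). With the EXT versions the entry points carry an EXT suffix and the GLSL lookup is texelFetchBuffer. The shader text, the uniform names instanceData and viewProj, and the one-RGBA32F-texel-per-tree layout are hypothetical choices for illustration. The GL calls are left in a comment because they need a live context; the packing code and a CPU stand-in for the shader's fetch are plain C.

```c
#include <stdlib.h>

/* Hypothetical GLSL 1.40 vertex shader: each instance reads its tree
 * position from a buffer texture, indexed by gl_InstanceID. */
const char *const tree_vs =
    "#version 140\n"
    "uniform samplerBuffer instanceData;\n"
    "uniform mat4 viewProj;\n"
    "in vec3 position;\n"
    "void main() {\n"
    "    vec4 inst = texelFetch(instanceData, gl_InstanceID);\n"
    "    gl_Position = viewProj * vec4(position + inst.xyz, 1.0);\n"
    "}\n";

/* Pack count positions (xyz triples) into RGBA32F texels: x, y, z and an
 * unused w. Returns a malloc'd array of 4 * count floats, or NULL. */
float *pack_instances(const float *xyz, int count)
{
    float *texels = malloc(sizeof(float) * 4 * (size_t)count);
    if (texels == NULL)
        return NULL;
    for (int i = 0; i < count; ++i) {
        texels[4 * i + 0] = xyz[3 * i + 0];
        texels[4 * i + 1] = xyz[3 * i + 1];
        texels[4 * i + 2] = xyz[3 * i + 2];
        texels[4 * i + 3] = 0.0f; /* spare slot, e.g. for a per-tree scale */
    }
    return texels;
}

/* CPU stand-in for texelFetch(instanceData, gl_InstanceID). */
const float *fetch_instance(const float *texels, int instance_id)
{
    return texels + 4 * instance_id;
}

/* Upload and draw (needs a GL 3.1 context, so not executed here):
 *   glBindBuffer(GL_TEXTURE_BUFFER, buf);
 *   glBufferData(GL_TEXTURE_BUFFER, 4 * count * sizeof(float), texels, GL_STATIC_DRAW);
 *   glBindTexture(GL_TEXTURE_BUFFER, tex);
 *   glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, buf);
 *   glDrawElementsInstanced(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, 0, count);
 */
```

Packing four floats per tree keeps each instance to a single texel fetch; a three-component format would save memory but RGBA32F is the simplest layout to index.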

Alfonse Reinheart
09-11-2009, 05:28 PM
Quote: "My 285 takes close to 16 ms a frame to render about 10 batches of 5,000 trees."

Are you rendering 5,000 instances in a single draw call, and you're making 10 of them, or are you rendering 10 instances in 5,000 draw calls?
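The two readings differ sharply in how much work each draw call carries. A quick C sketch of the arithmetic, using the thread's figures and assuming 4 triangles per tree; the struct and function names are illustrative.

```c
/* One way of splitting the same 50,000 trees into draw calls. */
struct draw_plan {
    long draw_calls;         /* calls issued per frame */
    long instances_per_call; /* trees drawn by each call */
};

/* Triangles submitted by a single draw call. */
long tris_per_call(struct draw_plan p, long tris_per_instance)
{
    return p.instances_per_call * tris_per_instance;
}

/* Triangles submitted per frame across all calls. */
long tris_per_frame(struct draw_plan p, long tris_per_instance)
{
    return p.draw_calls * tris_per_call(p, tris_per_instance);
}
```

Reading A, {10, 5000}, gives 20,000 triangles per call; reading B, {5000, 10}, gives only 40. Both total 200,000 triangles per frame, but B pays the per-call cost 5,000 times instead of 10.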

Also, since you have a GL 3.x-capable card, are you using VAOs for your vertex data?

09-11-2009, 05:45 PM
Yes: 5,000 instances per draw call, and I'm making 10 calls. That's only 20,000 triangles per call, though. I need to test with a more complex model.

I don't have the code in front of me now, but I doubt it's using VAOs.

Alfonse Reinheart
09-11-2009, 06:00 PM
Wait, each instance is only 4 triangles? Then the per-instance overhead is what's going to dominate performance. You should have at least enough triangles per instance to fill up the post-transform cache.
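One rough way to see the point about the post-transform cache is to model it as a small FIFO of recently shaded vertex indices and count cache misses (vertex shader invocations) for one instance. The FIFO policy, the 24-entry size used in the example, the assumption that cached results are not reused across instances, and the crossed-quad layout of the 4-triangle tree are all hypothetical; real hardware differs and the thread gives no such figures.

```c
#include <stddef.h>

#define MAX_CACHE 64 /* upper bound for the simulated cache size */

/* Count vertex-shader invocations for an indexed triangle list when the
 * post-transform cache is modeled as a FIFO of cache_size entries.
 * Every index that is not currently cached counts as a miss. */
int fifo_cache_misses(const unsigned *indices, size_t count, int cache_size)
{
    unsigned cache[MAX_CACHE];
    int filled = 0; /* number of valid entries */
    int head = 0;   /* oldest entry, replaced next once the cache is full */
    int misses = 0;

    if (cache_size > MAX_CACHE)
        cache_size = MAX_CACHE;
    if (cache_size < 1)
        return (int)count; /* no cache: every vertex is shaded */

    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (int j = 0; j < filled; ++j) {
            if (cache[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (hit)
            continue;
        ++misses;
        if (filled < cache_size) {
            cache[filled++] = indices[i];
        } else {
            cache[head] = indices[i];
            head = (head + 1) % cache_size;
        }
    }
    return misses;
}
```

For a hypothetical crossed-quad tree with indices {0,1,2, 0,2,3, 4,5,6, 4,6,7}, fifo_cache_misses returns 8 with a 24-entry cache: 2.0 shaded vertices per triangle, with only a third of the cache ever used. A large, well-ordered mesh can approach roughly 0.5 vertices per triangle, so in this model larger instances both spread the fixed per-instance cost over more triangles and make better use of the cache.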

09-12-2009, 02:17 AM
I am not entirely certain whether this is "normal behavior", or whether it still holds today, but when I used instancing I found that rendering up to some number x of instances was fine, while rendering more than x instances became slower again.

I rendered more complex models though.