I’ve implemented and tested geometry instancing on a bunch of simple, 4-polygon tree models. I’m packing the tree positions in a texture buffer. My GeForce 285 takes close to 16 ms per frame to render about 10 batches of 5,000 trees.
I’ve also merged all the tree geometry together, so I have the same number of batches, but I’m just calling the good ol’ glDrawElements. If the tree groups are compiled into display lists, the frame time is just 5 ms. If not, then the frame time is about 10 ms.
So, in this simple case, I’ve found that instancing performs worse than straight GL calls. Have others reached a similar conclusion? Is there a batch size or model polygon count for which instancing would outperform regular GL calls for static objects?
GeForce 285. I don’t see any reason to test with a lesser card.
I’m using the instancing described in EXT_draw_instanced. I store the per instance data in a texture buffer object (EXT_texture_buffer_object) and access it in the vertex shader with gl_InstanceID.
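For anyone following along, here is a rough sketch of what that kind of vertex shader could look like, kept as a C string so it can be handed to glShaderSource. This is not the actual shader from my project: the sampler name `treePositions`, the one-RGBA32F-texel-per-tree layout, and the use of the fixed-function attributes are all assumptions for illustration, and I haven’t compiled it against a driver.

```c
#include <string.h>

/* GLSL 1.20 vertex shader source, stored as a C string.
 * Assumptions (hypothetical, not taken from the real project):
 *   - the texture buffer is bound to a samplerBuffer named treePositions
 *   - each tree occupies one RGBA32F texel with its position in xyz
 *   - the tree mesh uses the built-in gl_Vertex / gl_Color attributes
 * gl_InstanceID and texelFetchBuffer come from GL_EXT_gpu_shader4,
 * which EXT_draw_instanced depends on. */
static const char *kTreeInstanceVS =
    "#version 120\n"
    "#extension GL_EXT_gpu_shader4 : require\n"
    "uniform samplerBuffer treePositions;\n"
    "void main()\n"
    "{\n"
    "    /* One texel per instance, indexed by the instance number. */\n"
    "    vec4 offset = texelFetchBuffer(treePositions, gl_InstanceID);\n"
    "    vec4 worldPos = gl_Vertex + vec4(offset.xyz, 0.0);\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * worldPos;\n"
    "    gl_FrontColor = gl_Color;\n"
    "}\n";

/* Each batch would then be issued with something like:
 *   glDrawElementsInstancedEXT(GL_TRIANGLES, indexCount,
 *                              GL_UNSIGNED_SHORT, 0, 5000);
 * where indexCount is the index count of a single tree. */
```

The point of the sketch is that every vertex of every instance does one extra buffer fetch, which is part of the per-instance cost being compared against the merged-geometry path.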
Yes, 5,000 instances in a single draw call and I’m making 10 calls. It’s only 20,000 triangles per call though. I need to test with a more complex model.
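To put the numbers in this thread side by side, here is the arithmetic as a small C sketch. The function names are made up for illustration, and the inputs are the approximate figures quoted above (about 10 batches, frame times of roughly 16, 10 and 5 ms), so treat the results as ballpark values.

```c
#include <stdio.h>

/* Triangles submitted by one draw call. */
static long triangles_per_call(long instances, long tris_per_instance)
{
    return instances * tris_per_instance;
}

/* Throughput in millions of triangles per second for a frame time in ms. */
static double mtris_per_second(long tris_per_frame, double frame_ms)
{
    return (double)tris_per_frame / (frame_ms * 1000.0);
}

/* Prints the three cases discussed in the thread. */
static void print_thread_numbers(void)
{
    long per_call  = triangles_per_call(5000, 4);   /* 5,000 trees x 4 triangles */
    long per_frame = per_call * 10;                 /* about 10 batches per frame */

    printf("triangles per call:  %ld\n", per_call);
    printf("triangles per frame: %ld\n", per_frame);
    printf("instanced (16 ms):      %.1f Mtris/s\n", mtris_per_second(per_frame, 16.0));
    printf("glDrawElements (10 ms): %.1f Mtris/s\n", mtris_per_second(per_frame, 10.0));
    printf("display lists (5 ms):   %.1f Mtris/s\n", mtris_per_second(per_frame, 5.0));
}
```

That works out to about 200,000 triangles per frame, or roughly 12.5, 20 and 40 million triangles per second for the three paths.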
I don’t have the code in front of me now, but I doubt it’s using VAOs.
Wait, each instance is only 4 triangles? The per-instance overhead is what’s going to dominate performance there. You should at least have enough triangles per instance to fill up the post-transform cache.
I am not entirely certain whether this is “normal behavior” or whether it still holds today, but I think when I used instancing I found that rendering up to some number x of instances was fine, and rendering more than x instances became slower again.