I am wondering how staticmeshes in UT2004 are implemented. These are identical meshes that are rotated, positioned, scaled, and sheared to fit into different places. They make up all the fine details of those nice high-poly maps.
My understanding is that it is much faster to merge all meshes into a single one than to set the matrix and draw the vertex buffer over and over (once per instance). However, that means you have a ton of redundant vertex data, and it is harder to work with. Is there any way to render instances as fast as they would draw if they were merged into one giant mesh?
In my experience you get excellent throughput if you repeatedly render the same mesh while only changing the transformation matrix in between each draw call. But it depends on a number of factors, such as the size of the mesh in question.
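For the era this thread dates from, that approach is just a fixed-function loop. Here's a sketch (the handles and counts — `meshVBO`, `instanceTransforms`, `indexCount`, etc. — are hypothetical names, not anything from UT2004):

```cpp
// Bind the mesh's buffers once, then draw it repeatedly,
// changing only the modelview matrix between draw calls.
glBindBuffer(GL_ARRAY_BUFFER, meshVBO);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, meshIBO);
for (int i = 0; i < numInstances; ++i) {
    glPushMatrix();
    glMultMatrixf(instanceTransforms[i]);  // per-instance rotation/position/scale/shear
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
    glPopMatrix();
}
```

The point is that only the matrix changes between calls; no buffer rebinds, no state thrash.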
D3D9 has instancing, where you set up a second vertex stream containing the per-instance transforms. That gets you down to one draw call for all your instances. You may want to look into that.
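For reference, the D3D9 mechanism is `SetStreamSourceFreq`. A sketch, assuming SM3.0-class hardware and hypothetical names (`device`, `meshVB`, `instanceVB`, and the vertex structs are placeholders):

```cpp
// Stream 0: the mesh geometry, replayed numInstances times.
device->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | numInstances);
device->SetStreamSource(0, meshVB, 0, sizeof(MeshVertex));

// Stream 1: one transform per instance, stepped forward once per instance
// (e.g. a 4x4 matrix packed as four float4 texcoord attributes).
device->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1u);
device->SetStreamSource(1, instanceVB, 0, sizeof(InstanceTransform));

device->SetIndices(meshIB);
device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertexCount, 0, triCount);

// Reset the frequencies afterwards, or later draws will misbehave.
device->SetStreamSourceFreq(0, 1);
device->SetStreamSourceFreq(1, 1);
```

The vertex shader then rebuilds the world matrix from the stream-1 attributes instead of reading it from a constant register.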
Originally posted by halo:
[b]I am wondering how staticmeshes in UT2004 are implemented. These are identical meshes that are rotated, positioned, scaled, and sheared to fit into different places. They make up all the fine details of those nice high-poly maps.
My understanding is that it is much faster to merge all meshes into a single one, than to set the matrix and draw the vertex buffer over and over (for all identical meshes). However, that means you have a ton of redundant vertex data, and it is harder to work with.[/b]
If I understood what you’re saying, you have one huge vertex buffer instead of multiple smaller ones. Yup, this alone complicates memory management IMO. And obviously you end up with duplicated mesh data, so not only is granularity worse, you also spend more total memory.
If that’s a win, it’s a win because it removes the need to switch vertex buffer bindings and you end up issuing fewer draw calls. I wouldn’t be so sure that this tradeoff is viable for an OpenGL-based renderer.
I wouldn’t even believe that this is viable for a DirectX Graphics based renderer if you go all-out on the technique: if you brute-force static mesh rendering this way, you give up the ability to cull individual meshes. To get around that, you’d either have to make your index buffer(s) dynamic, or rebind index buffers instead of vertex buffers. Then you’re effectively back to square one, plus crud.
Trading a gob of redundant vertex transforms for a few matrix updates is a pretty wacky solution IMO. If you’re going to do this under OpenGL, I’d suggest at least trying the “naive” approach first. Benchmark it. My gut feeling tells me that it’s going to be faster, and simpler to boot.
Originally posted by halo: Is there any way to render instances as fast as they would draw if they were merged into one giant mesh?
“Faster” doesn’t quite cover it all IMO. You’re saving time spent in the driver, which is a negligible cost in OpenGL anyway. OTOH you’re doing a lot of flat-out superfluous work on the hardware side, transforming the same geometry over and over. I wouldn’t call that “faster”.
It may be faster when you do the comparison on DirectX Graphics. But that doesn’t mean much.