Staticmeshes

I am wondering how staticmeshes in UT2004 are implemented. These are identical meshes that are rotated, positioned, scaled, and sheared to fit into different places. They make up all the fine details of those nice high-poly maps.

My understanding is that it is much faster to merge all meshes into a single one than to set the matrix and draw the vertex buffer over and over (once per identical mesh). However, that means you have a ton of redundant vertex data, and it is harder to work with. Is there any way to render instances as fast as they would draw if they were merged into one giant mesh?
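Here is a minimal CPU-side sketch of what "merging" means. It assumes each placement is a 3x4 affine matrix: a 3x3 rotation/scale/shear part plus a translation column. The names `transform_point` and `merge_instances` are hypothetical and are not UT2004's or any engine's actual code. Every instance gets its own pre-transformed copy of the vertices, so the merged buffer grows linearly with the number of instances:

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical layout: a 3x4 affine matrix stored as three rows (a, b, c, t).
# The 3x3 part holds rotation/scale/shear; the last column is the translation.
Affine = Tuple[Tuple[float, float, float, float], ...]


def transform_point(m: Affine, p: Vec3) -> Vec3:
    """Apply a 3x4 affine matrix to a point (implicit w = 1)."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)


def merge_instances(
    vertices: Sequence[Vec3], indices: Sequence[int], transforms: Sequence[Affine]
) -> Tuple[List[Vec3], List[int]]:
    """Bake every placement of one mesh into a single vertex/index buffer."""
    merged_vertices: List[Vec3] = []
    merged_indices: List[int] = []
    for m in transforms:
        base = len(merged_vertices)  # first vertex of this instance's copy
        # Redundant data: a full, pre-transformed copy of the mesh per instance.
        merged_vertices.extend(transform_point(m, v) for v in vertices)
        # Offset the shared index list so it points into this copy.
        merged_indices.extend(base + i for i in indices)
    return merged_vertices, merged_indices
```

Merging a 3-vertex triangle with two transforms yields 6 vertices and the indices `[0, 1, 2, 3, 4, 5]`. Vertex storage is `len(vertices) * len(transforms)`, which is the redundancy mentioned above. The payoff is that the whole set can be drawn with one call.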

In my experience you get excellent throughput if you repeatedly render the same mesh while changing only the transformation matrix between draw calls. But it depends on a number of factors, such as the size of the mesh in question.
D3D9 has instancing: you set up a second vertex stream that contains the per-instance transforms, which reduces all your instances to a single draw call. You may want to look into that.
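To illustrate what that second stream does, here is a CPU emulation in Python. It is not real D3D9 code. In D3D9 itself, you configure the streams with `IDirect3DDevice9::SetStreamSourceFreq` and then issue a single `DrawIndexedPrimitive` call. The geometry stream is flagged `D3DSTREAMSOURCE_INDEXEDDATA` with the instance count, and the transform stream is flagged `D3DSTREAMSOURCE_INSTANCEDATA`. The hypothetical `draw_instanced` function only mimics the fetch order: the mesh is replayed once per instance, while the transform stream advances once per instance.

```python
from typing import Iterator, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Affine = Tuple[Tuple[float, float, float, float], ...]  # hypothetical 3x4 affine, rows (a, b, c, t)


def transform_point(m: Affine, p: Vec3) -> Vec3:
    """Apply a 3x4 affine matrix to a point (implicit w = 1)."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)


def draw_instanced(
    vertices: Sequence[Vec3], indices: Sequence[int], instance_stream: Sequence[Affine]
) -> Iterator[Vec3]:
    """Emulate one instanced draw call on the CPU.

    Stream 0 (vertices + indices) holds a single copy of the mesh and is replayed
    for every instance. Stream 1 (instance_stream) advances once per instance, so
    all vertices of an instance see the same transform.
    """
    for m in instance_stream:  # stream 1: one transform per instance
        for i in indices:      # stream 0: the same mesh every time
            yield transform_point(m, vertices[i])
```

Only one copy of the mesh plus one matrix per instance is stored, yet the result is the same set of transformed triangles that a merged buffer would contain.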

Wow, that is far superior to doing it other ways.

Why doesn’t OpenGL have that?

UT2004 would not use instancing; it was made too early (unless a later patch added it).

This was discussed recently in an OpenGL thread. There are a few points to consider:

  • On current hardware, instancing geometry of more than about 1,000 polygons is actually a performance loss.

  • The speed gain of instancing over one big buffer (as seen in a Humus demo) is real but not massive (maybe 20-30%?).

  • Draw-call overhead is more of a D3D problem than an OpenGL one, so reducing the number of draw calls matters less in OpenGL.

Could you point me to that thread, please? I looked a few pages back but am not sure which one you mean.

Thanks!

I can’t seem to find the one I was thinking of, but a search turned up these:

http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=012239

http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011887

http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011889

Instancing was discussed in the ARB_super_buffers thread.

Originally posted by halo:
I am wondering how staticmeshes in UT2004 are implemented. These are identical meshes that are rotated, positioned, scaled, and sheared to fit into different places. They make up all the fine details of those nice high-poly maps.

My understanding is that it is much faster to merge all meshes into a single one than to set the matrix and draw the vertex buffer over and over (once per identical mesh). However, that means you have a ton of redundant vertex data, and it is harder to work with.
If I understood what you’re saying, you have one huge vertex buffer instead of multiple smaller ones. Yup, this alone complicates memory management IMO. And obviously you end up with duplicated meshes, so not only is granularity worse, you also spend more total memory.

If that’s a win, it’s a win because it removes the need to switch vertex buffer bindings and you end up issuing fewer draw calls. I wouldn’t be so sure that this tradeoff is viable for an OpenGL-based renderer.

I’m not even convinced this is viable for a DirectX Graphics-based renderer if you go all-out on the technique. If you brute-force static mesh rendering this way, you give up the ability to cull individual meshes. To get around that, you’d have to either make your index buffer(s) dynamic or rebind index buffers instead of vertex buffers. Then you’re effectively back to square one, plus extra crud.
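Here is a minimal sketch of the "dynamic index buffer" workaround. It assumes the merged buffer stores instances contiguously, each with the same number of indices, which holds when every instance is a copy of one mesh. It also assumes a per-instance visibility flag comes from some culling pass that is not shown. `build_visible_indices` is a hypothetical helper. In a real renderer, its result would have to be re-uploaded to the index buffer whenever visibility changes.

```python
from typing import List, Sequence


def build_visible_indices(
    merged_indices: Sequence[int], indices_per_instance: int, visible: Sequence[bool]
) -> List[int]:
    """Rebuild a dynamic index buffer that references only visible instances.

    Assumes instance k occupies merged_indices[k * n : (k + 1) * n], where
    n = indices_per_instance (true when every instance is a copy of one mesh).
    """
    if len(visible) * indices_per_instance != len(merged_indices):
        raise ValueError("visibility flags do not match the merged index buffer")
    out: List[int] = []
    for k, is_visible in enumerate(visible):
        if is_visible:
            start = k * indices_per_instance
            out.extend(merged_indices[start:start + indices_per_instance])
    return out
```

The vertex buffer stays bound, but the index list is now per-frame work on the CPU plus an upload. That extra cost is what the post means by being back to square one.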

Trading matrix updates for a gob of redundant vertex transforms is a pretty wacky solution IMO. If you’re going to do this under OpenGL, I’d suggest at least trying the “naive” approach first: draw the same vertex buffer once per instance, changing only the matrix in between. Benchmark it. My gut feeling tells me that it’s going to be faster, and simpler to boot.

Originally posted by halo:
Is there any way to render instances as fast as they would draw if they were merged into one giant mesh?
“Faster” doesn’t quite cover it all IMO. You’re saving time spent in the driver, which is a negligible cost in OpenGL anyway. OTOH you’re doing a lot of flat-out superfluous work on the hardware side. I wouldn’t call that “faster”.
It may be faster when you do the comparison on DirectX Graphics. But that doesn’t mean much.

With DX, it is crucial that you merge things. At least, it is in Blitz3D.