What is that bottleneck

There is especially one thing that kills frame rate : it’s when we render trees. We tried two approaches, first being as in IL-2 Sturmovik (several planes rendered as layers using alpha test), second is for a closer view and we render our trees with simple geometry (we use 800 polys meshes, no billboards, only a lot of alpha tested planes for leaves). Both approaches cuts frame rate by 4 or so (from 90fps down to 10-30).

I suppose the only thing that could speed up everything would be to render front to back. Is there anything else that could improve frame rate for this specific problem ?

Another track I follow is batching :
All the planes of the 800 polys trees are different meshes, and thus are rendered with different glDrawElements, fragment / vertex program switch between each (even if they are the same). But, the frame rate is changing a lot depending on the point of view, so I suppose it comes from overdraw. Of course, it’s planned to batch everything correctly some day.

So about that front to back sorting … I try to cull each mesh at each frame, so I get a Z information after culling, if the mesh is not culled (frustum culled). I could sort according to that Z value, and I’m wondering what algorithm would be the most efficient.

1- Brute force quick sort
2- Some kind of hash table depending on Z
3- BSP on bounding volumes (geometry is static)

All suggestions are really welcome,
SeskaPeel.

I forgot to mention that our 800 polys trees have a lot of intersecting planes. Sorting these planes depending on viewpoint could become heavy computation.

I do not intend to have a perfect sort, a rough one will perfectly do the trick, as long as it is not too computationnaly heavy.

SeskaPeel.

Originally posted by SeskaPeel:
[b]There is especially one thing that kills frame rate : it’s when we render trees. …

1- Brute force quick sort
2- Some kind of hash table depending on Z
3- BSP on bounding volumes (geometry is static)

All suggestions are really welcome,
SeskaPeel.[/b]

The front-to-back thing may help if you’re fill limited. I’d suggest determining that first, because the transform/state issue may in fact be the first bottleneck. Spatial sorting may make that worse, not better, if your batches get smaller for better sorting.

On sorting, I’m not sure about #2 – I’ve never heard of such a thing but would be interested if others have. Maybe related to this is simple spatial binning, where you have N distance buckets and throw objects into those. If you go that way, you may only need to more finely sort the closest buckets.

For brute force sorting, do a google search on “radix sort.” It’s related conceptually to the spatial binning above but more general purpose.

For a more finessed approach, consider modifying your cull routine to do a small/fast sort per node of your culling tree (sorting bounding volumes of child nodes or objects). If it’s a quadtree, the sort per node is a no-brainer. If you can get your overall cull result to be close to being sorted on output without too much cost, then one or two passes of a simple sort (bubble, etc…) may be sufficient.

The third approach I’ve used (with mixed results) is the temporally-coherent sort. It’s mainly good for slow moving cameras. Keep the sorted list of visible objects from frame to frame, adding or removing objects as they change visibility (preferably inserting in the right spot). Do few passes of a bubble sort per frame to keep the list in order. But if you’re not doing back-to-front alpha, the update of the list can be fairly lazy. Z-order barely changes for forward motion and only changes significantly after 90 degrees of rotation.

Avi