uses of occlusion queries.

I’m designing an occlusion culling system for a game at the moment. It needs to be pretty general, since I am aiming for seamless transitions from indoor areas to large outdoor ones.
I thought about doing portals, but I dunno… I’m currently favouring a PVS system: just precompute the visibility of “cells” from all visitable points on a map, and store it for reference at run time.
But then I thought, why not just do it on the fly using ARB_occlusion_query? Occlusion queries are pretty cheap as far as I know; it’s just that their latency is pretty poor, but this can be worked around pretty easily with a good scheduling system, say by sending out the occlusion query a frame before its result is needed.
So yeah, my question is: do you guys think this would be an efficient way to do run-time hierarchical occlusion queries? Like, test the bounding boxes of “cells”, then if a cell is visible, proceed to cull the things inside the cell using just frustum culling.
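
To make the scheduling idea concrete, here is a minimal sketch of "issue the query one frame early". It does not touch OpenGL at all: `box_is_visible` and `in_frustum` are hypothetical callbacks standing in for a real bounding-box occlusion query and a real frustum test, and the `Cell` class is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    name: str
    objects: list
    visible: bool = True            # assume visible until a query says otherwise
    pending: Optional[bool] = None  # answer to the query issued last frame


def render_frame(cells, box_is_visible, in_frustum):
    """Consume last frame's query results, draw, then issue new queries."""
    drawn = []
    for cell in cells:
        if cell.pending is not None:
            # This result was requested a whole frame ago, so reading it
            # should not stall (an assumption about the real GPU latency).
            cell.visible = cell.pending
        if cell.visible:
            # Inside a visible cell, fall back to plain frustum culling.
            drawn.extend(o for o in cell.objects if in_frustum(o))
        # Stand-in for issuing an occlusion query on the cell's bounding box.
        cell.pending = box_is_visible(cell)
    return drawn
```

The price of the delay is that visibility is always one frame stale: a cell that has just become hidden is drawn for one extra frame, and a cell that has just become visible shows up one frame late unless it is handled some other way.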

A lot of this depends on your scene content. If you can afford the CPU and can build the data structures then culling on the CPU is best. There are many types of PVS tests; picking one is a bit of an art and somewhat arbitrary. Packages like dPVS exist to save you the work. They’re effectively free (as in beer) for non-commercial use.

Interiors culling exteriors and vice versa (through doors and windows) can benefit a lot from simple portals (the eyepoint forms a frustum with the hole made by the door or window), and this is almost always compatible with existing code you have.
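
A rough sketch of that portal frustum, assuming a convex portal polygon and an eyepoint that does not lie in the portal's plane. All function names here are made up for illustration, and the test is a conservative bounding-sphere check rather than anything exact.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


def portal_planes(eye, portal):
    """Inward-facing unit normals of the planes through the eye and each portal edge."""
    centroid = tuple(sum(v[i] for v in portal) / len(portal) for i in range(3))
    to_centre = sub(centroid, eye)
    planes = []
    for i in range(len(portal)):
        a = sub(portal[i], eye)
        b = sub(portal[(i + 1) % len(portal)], eye)
        n = normalize(cross(a, b))
        if dot(n, to_centre) < 0:  # flip so the normal points into the frustum
            n = (-n[0], -n[1], -n[2])
        planes.append(n)
    return planes


def sphere_visible_through(eye, planes, centre, radius):
    """True if the bounding sphere is at least partly inside every side plane."""
    rel = sub(centre, eye)
    return all(dot(n, rel) >= -radius for n in planes)
```

Note that this only builds the side planes. A fuller version would also use the portal's own plane as a near plane, so that objects between the eye and the doorway are not mistaken for objects seen through it.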

Remember that GPUs these days draw a crap load of geometry IF you send it right, and a simple front-to-back sort saves you the fill through hardware z-buffer optimizations. It doesn’t make a lot of sense to split the scene too much for PVS unless that’s what you’ve already got to play with through state, animation or whatever. So be careful that you don’t lose simply through splitting the scene. You almost certainly want to be testing reasonably large chunks of geometry these days, and that in turn may mean a hierarchical system is of less interest. OTOH state changes can work to split your scene, and culling can also eliminate these.
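
The "simple sort" could be as little as ordering large chunks by distance from the eye before submitting them, so that near geometry fills the z-buffer first and far geometry is rejected early. A minimal sketch, where the `centre` key and the dictionary layout are assumptions made up for this example:

```python
def front_to_back(eye, chunks):
    """Return chunks ordered nearest-first by squared distance to the eye."""
    def dist_sq(chunk):
        return sum((c - e) ** 2 for c, e in zip(chunk["centre"], eye))
    return sorted(chunks, key=dist_sq)
```

Sorting per chunk rather than per triangle keeps the CPU cost small, which fits the advice above about working with reasonably large pieces of geometry.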

I have a demo on my site that renders portals with GL_ARB_occlusion_query:
http://esprit.campus.luth.se/~humus/?page=3D&id=44

Not sure how representative it is for what you’re going to do, but it may give you a hint how well it works out in practice.

First of all, you need to be heavily geometry limited for occlusion culling to make sense. Unless you have actually benchmarked and found this to be the case, don’t bother with occlusion culling.

Occlusion queries have two main problems:

  1. If you request the result prematurely, you will force a synchronisation between CPU and GPU that kills parallelism and is VERY costly.

  2. It is in general not possible to have simultaneous up- and downstream transfers on the AGP bus, so if you’re AGP limited, you’ll see a hit from interrupting the GPU’s vertex hoovering by requesting small, four-byte occlusion counts.
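
One way around the first problem is to poll whether a result is ready and never block on it. With ARB_occlusion_query that means checking `GL_QUERY_RESULT_AVAILABLE_ARB` via `glGetQueryObjectivARB` before reading `GL_QUERY_RESULT_ARB`. The sketch below only simulates that pattern: `SimulatedQuery` and its tick counter are invented stand-ins, not a real GL binding.

```python
class SimulatedQuery:
    """Stand-in for a GPU occlusion query whose result arrives after a delay."""

    def __init__(self, samples_passed, ready_at_tick):
        self.samples_passed = samples_passed
        self.ready_at_tick = ready_at_tick

    def available(self, tick):
        # Mirrors polling the "result available" flag instead of blocking.
        return tick >= self.ready_at_tick


def collect_results(queries, tick, last_known):
    """Read only finished queries; keep the previous answer for the rest."""
    visible = dict(last_known)
    for name, query in queries.items():
        if query.available(tick):
            visible[name] = query.samples_passed > 0
    return visible
```

Anything whose query has not finished simply keeps its last known visibility, so the CPU never waits on the GPU.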

For these reasons, if you can spare the CPU (which you generally can if you’re geometry limited!) it’s often more efficient to do occlusion culling on the CPU just like Dorbie said. Read the dPVS manual here for an excellent overview of various occlusion culling algorithms and visibility in general.

If you DO want to try to use occlusion queries (after all, they are easier to implement than a complete software occlusion system) you need to hide the latency of the queries. This paper describes a good method for that, which in fact is extremely similar to the one I used in my Master’s thesis, “Optimal occlusion testing”, which also dealt with hiding the occlusion query latency.

I’d say having some sort of hierarchical scene partitioning is critical to good performance on complex scenes, no matter what occlusion culling scheme you use, hardware or software. Otherwise you’ll quickly run into the fixed overhead cost of queries. Of course, it is pretty likely that you will never ever be geometry limited on modern hardware, so occlusion culling is probably not needed.
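
To illustrate why the hierarchy helps with the fixed per-query cost, here is a small sketch that counts how many visibility tests a traversal performs. `SceneNode` and the `is_visible` callback are hypothetical; the callback stands in for whichever test is used, hardware query or software.

```python
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    name: str
    children: list = field(default_factory=list)


def visible_leaves(node, is_visible, stats):
    """Collect visible leaf names, skipping whole subtrees that fail the test."""
    stats["queries"] += 1
    if not is_visible(node):
        return []            # one test rejects everything below this node
    if not node.children:
        return [node.name]
    found = []
    for child in node.children:
        found.extend(visible_leaves(child, is_visible, stats))
    return found
```

With three groups of four leaves and two of the groups hidden, this performs 8 tests (root, three groups, four leaves) where a flat list would need 12. The saving grows with scene size, though when nearly everything is visible the interior nodes are pure overhead.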

Thanks for the advice and the links to papers, I’ll make sure to check them out.
However, I’m kind of sceptical of you saying that you won’t be geometry limited. If you’re not, then just throw more polygons at the scene until you are. You have to remember, all the pixel shader stuff like bump maps and all, it isn’t as good as good ole polygons :)

Although it’s not a popular opinion, I really recommend doing a software culling solution until we fix the GPU/CPU/MEM latency problem. It’s just too much hassle right now to break the synchronization between the two. Check out the threads by YannL over @ gamedev.net for a more detailed write-up.

HOM and dPVS have some good takes on it, and of course, my paper talks specifically about the benefits of creating a software rasterizer to create your depth buffer for occlusion tests.
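
As a very rough picture of the software depth buffer idea (not the method from the paper, which is not reproduced here), the sketch below fills a tiny depth buffer from occluders and tests a bounding rectangle against it. It cheats heavily: occluders are axis-aligned screen-space rectangles at constant depth instead of rasterized triangles, and all names are invented for this example.

```python
def make_depth_buffer(width, height, far=1.0):
    """Low-resolution depth buffer, initialised to the far plane."""
    return [[far] * width for _ in range(height)]


def draw_occluder(buf, x0, y0, x1, y1, depth):
    """Write an occluder covering pixels [x0, x1) x [y0, y1) at constant depth."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            buf[y][x] = min(buf[y][x], depth)


def rect_is_occluded(buf, x0, y0, x1, y1, nearest_depth):
    """True only if every covered pixel already holds something closer.

    nearest_depth is the closest depth of the tested object's bounding
    volume, which keeps the test conservative.
    """
    pixels = [buf[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return bool(pixels) and all(d < nearest_depth for d in pixels)
```

Because everything stays on the CPU, the answer is available immediately and there is no GPU readback to wait for.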

~Main

Colt “MainRoach” McAnlis
Programmer
http://www.badheat.com/sinewave

Originally posted by MrShoe:
Thanks for the advice and the links to papers, I’ll make sure to check them out.
However, I’m kind of sceptical of you saying that you won’t be geometry limited. If you’re not, then just throw more polygons at the scene until you are. You have to remember, all the pixel shader stuff like bump maps and all, it isn’t as good as good ole polygons :)

Well, you said this was for a game and in that case, you need someone (preferably a talented artist) to actually MODEL all of those polygons. My point was just that content creation is likely to become a bottleneck long before transform rate is. But you obviously know your bottlenecks better than I do :) The common exception to the above rule is landscapes, but then you need to optimise for many visible polygons (i.e. use LOD), since that is fairly common and typically the worst case.