GL_NV_occlusion_query and portal rendering

Hi, I changed my portal renderer from the basic
front-facing portal -> draw adjacent cell
to this:
draw the cell.
for each portal of the cell: launch an occlusion query test.
for each pixel-visible portal: draw adjacent cell.
It dropped from 70 fps to about 5 fps!
Can someone suggest a reason for this tremendous decrease?
Thank you!

What card are you using? AFAIK this feature is HW-accelerated on GeForce3/4.

kon

I assume when you say “launch an occlusion query test”, you mean you render the bounding box of the neighbouring cell, and not the whole geometry of the cell? (with occlusion query switched on)

So it is a GeForce4 Ti 4400.

For the occlusion query, I draw the portals themselves (which are convex polygons).

While I can’t say I have even looked at the specs for the occlusion query extension, I’m just going to make some wild speculation based on other knowledge.

My suspicion is that the occlusion query works similarly to a fence, in that you can tell it to check the polygon and give it an id tag, then you can do other stuff and come back later and query it by the id (“by the way, what was the result of that polygon I wanted checked for occlusion?”). If this is the case, I suspect what you are doing is requesting the card to test the polygon, sitting around and waiting for the results, and then proceeding once you get your answer. This effectively results in a very nasty (i.e. slow) pipeline stall. What you need to do is find some way to make the query, do some other stuff (maybe AI or physics, or doing some drawing for a different part of your world) then come back once the occlusion test is done and pick up where you left off.

Originally posted by LordKronos:
What you need to do is find some way to make the query, do some other stuff (maybe AI or physics, or doing some drawing for a different part of your world) then come back once the occlusion test is done and pick up where you left off.

Sure, but doing this only adds work to the CPU, so that will not make it faster… and I get about 5 fps! while GL draws between 2 and 10 polygons/portals per cell!
But perhaps occlusion_query is not suited to portal rendering?
I don’t see a way to avoid waiting for the query results after I have drawn one cell.
Has anybody experimented with GL_NV_occlusion_query and got a good frame rate?

Originally posted by isobel:
Has anybody experimented with GL_NV_occlusion_query and got a good frame rate?

Yes, and LordKronos pretty much summed up how to achieve good performance: keep your GPU and CPU working in parallel. My engine is doing anywhere from 4 to ~1000 queries per frame. Another way to gain performance, in addition to keeping the work parallel, is to not perform queries every frame, but every other frame, or every Nth frame. Whatever floats your boat.
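A minimal sketch of the every-Nth-frame idea, assuming the engine keeps the last known answer for each portal and reuses it on the frames where no query is issued. The class and its method names are hypothetical, not taken from any engine mentioned in this thread, and the pixel count is whatever the application read back from its own queries.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical helper: remembers the last query result per portal.
class PortalVisibilityCache {
public:
    // period = 1 queries every frame, 2 every other frame, and so on.
    explicit PortalVisibilityCache(std::uint32_t period) : period_(period) {
        assert(period > 0);
    }

    // True on the frames where fresh occlusion queries should be issued.
    bool shouldQuery(std::uint64_t frame) const { return frame % period_ == 0; }

    // Record the pixel count read back from a finished query.
    void store(int portalId, unsigned pixelCount) {
        visible_[portalId] = pixelCount > 0;
    }

    // Last known answer; portals never queried count as visible (conservative).
    bool isVisible(int portalId) const {
        auto it = visible_.find(portalId);
        return it == visible_.end() || it->second;
    }

private:
    std::uint32_t period_;
    std::unordered_map<int, bool> visible_;
};
```

The trade-off is that a stale "not visible" answer can hide a sector for up to N-1 frames after the camera moves, so the period probably wants to stay small.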

Originally posted by isobel:
Sure, but doing this only adds work to the CPU, so that will not make it faster… and I get about 5 fps!

Well, maybe I should have said it the other way around. Do whatever rendering you have to do, then if you still have time do some other processing. Your big holdup is the graphics pipeline, so keeping it as full as possible helps. Think of the graphics pipeline like an assembly line. You put 1 car on the line and by the time that one car is completely built, you have 100 more already started. What you are doing is equivalent to putting a car on the line and waiting until it comes off to start the next car. Highly inefficient.
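To put rough numbers on the assembly-line picture, here is a toy cost model. It is only a sketch: it assumes a fixed submission cost per query and a fixed latency before a result can be read back, and both function names and all figures are made up for illustration, not measured on any card.

```cpp
#include <cassert>

// Toy cost model; every number fed into it is hypothetical.
// drawUs:    microseconds to submit one portal and its query
// latencyUs: microseconds before a query result can be read back

// Issue one query, wait for its result, then issue the next.
long long stallPerQueryUs(long long queries, long long drawUs, long long latencyUs) {
    assert(queries >= 0 && drawUs >= 0 && latencyUs >= 0);
    return queries * (drawUs + latencyUs);
}

// Issue all queries back to back, then wait once for the last result.
long long batchedUs(long long queries, long long drawUs, long long latencyUs) {
    assert(queries >= 0 && drawUs >= 0 && latencyUs >= 0);
    if (queries == 0) return 0;
    return queries * drawUs + latencyUs;
}
```

With 100 queries, 10 microseconds per submission and 2000 microseconds of latency (invented figures), the first function returns 201000 and the second 3000. The latency is paid once instead of once per query.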

while GL draws between 2 and 10 polygons/portals per cell!

Whoa nellie!!! There’s your problem… for 2 reasons. The first problem is that telling the graphics card to process a batch of 2-10 polys is pretty wasteful. That’s going to hurt your performance a lot, but that isn’t your primary problem here.

The primary problem is that it sounds to me like you are treating your world as a set of pure convex hulls. That is NOT the way to go these days. What you want to do is break your world into sectors. Each sector has a bounding hull that is convex, but the polys in the sector are not convex. This way you can get a couple hundred to a couple thousand polys per convex hull. Thus for each portal you test, you end up drawing hundreds or thousands of polys. That will speed things up a lot.

Now, here are some other optimizations you can tie into that. As someone mentioned, you can rebuild your visibility tree less often than once per frame. Or, maybe you can update the tree every frame, but build the tree from the occlusion tests you performed last frame.

If you must build an up-to-date tree every frame, here is how I would do it. Think of your world as “levels” from your current point of view. The first level is the sector you are in and all the portals that lead to another sector. The second level is all the sectors led to by a first-level portal, and the portals out of those sectors (which lead to your third level, etc). When you start the frame, draw your first-level sectors and build a list of your second-level sectors and second-level portals. Immediately ask the card to do occlusion tests for all of your second-level portals. Then, while it is processing, render your second-level polys. When you are done, query for the occlusion results and use these results to build a list of your third-level sectors and third-level portals. Then you query for the occlusion of your third-level portals, render the third-level polys while waiting, get the results of the occlusion, use them to build the fourth… you get the idea. Now, this will not be 100% perfect. You may get false negatives (where the card says a portal is not occluded, yet when you get around to drawing what is behind the portal it is occluded). However, you will NOT get false positives (where it will say a portal is occluded but it actually isn’t). Thus the only downside to this is that you will occasionally take a performance hit for drawing polys that aren’t visible, but that hit will be small compared to what you are seeing now, and I wouldn’t worry about it.
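The level-by-level scheme can be sketched roughly as below. This is an illustration under assumptions, not the poster's code: `Portal`, `Sector`, `Gpu` and `renderByLevels` are hypothetical names, and the `Gpu` hooks stand in for the real GL work (with NV_occlusion_query, `issueQuery` would wrap the portal draw in `glBeginOcclusionQueryNV`/`glEndOcclusionQueryNV`, and `pixelCount` would call `glGetOcclusionQueryuivNV` with `GL_PIXEL_COUNT_NV`). As a small variation on the description above, the sketch applies the same issue/draw/read step to every level, starting with the sector the viewer is in.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Portal { int id; int targetSector; };
struct Sector { std::vector<Portal> portals; };

// Hypothetical hooks standing in for the real GL work.
struct Gpu {
    std::function<void(int)> issueQuery;      // begin query, draw portal polygon, end query
    std::function<unsigned(int)> pixelCount;  // read the result back (may block)
    std::function<void(int)> drawSector;      // draw the sector's geometry
};

// Returns the sectors in the order they were drawn.
std::vector<int> renderByLevels(const std::vector<Sector>& world, int startSector,
                                const Gpu& gpu) {
    assert(startSector >= 0 && static_cast<std::size_t>(startSector) < world.size());
    std::vector<bool> seen(world.size(), false);
    std::vector<int> drawOrder;
    std::vector<int> current{startSector};
    seen[startSector] = true;

    while (!current.empty()) {
        // 1. Issue the queries for every portal leaving this level.
        std::vector<Portal> pending;
        for (int s : current)
            for (const Portal& p : world[s].portals)
                if (!seen[p.targetSector]) {
                    gpu.issueQuery(p.id);
                    pending.push_back(p);
                }

        // 2. Draw this level while the GPU works through the queries.
        for (int s : current) {
            gpu.drawSector(s);
            drawOrder.push_back(s);
        }

        // 3. Only now read results back and build the next level.
        std::vector<int> next;
        for (const Portal& p : pending)
            if (!seen[p.targetSector] && gpu.pixelCount(p.id) > 0) {
                seen[p.targetSector] = true;
                next.push_back(p.targetSector);
            }
        current = std::move(next);
    }
    return drawOrder;
}
```

Because each level's portals are tested before that level's own geometry is in the depth buffer, a portal hidden only by its own level can still pass. That is the occasional overdraw described above.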

Let’s try to be precise :)
My world is preprocessed as follows:
Starting from a .map file (from Quake/CS: a bunch of convex polyhedra), I compute a sector/portal representation automatically.
Sectors are generally NOT convex, and I do not compute a sector’s convex hull. To know whether a particular sector has to be drawn, I draw the portal leading to that sector (first the portals of the sector the viewpoint is in, then recursively) with occlusion_query: if at least one pixel passes the depth/stencil test, I draw the sector the portal leads to.
Basic recursive 2D clipping on portals works fine and is fast. I tried occlusion query. The first experiment was very slow (5 fps).
Then I tried what you propose, in 2 different ways:
1: when rendering a sector, start an occlusion_query on each of its portals, then render the sector’s geometry, then check the results of the occlusion queries and call the procedure recursively according to these results.
2: the method you propose.
Both methods give much better FPS, and perform about equally. But neither gives better FPS than the original method (2D clipping of portals on the CPU).
Using occlusion queries may become interesting once the program has more work to do on the CPU, which is not the case at the moment (no AI, no “game” code, …)
If you have other ideas, they are welcome!
Thanks.

The trick to getting good performance with NV_occlusion_query and portal rendering is to structure your code to

  • initialize the depth buffer with the static stuff in the scene
  • perform your portal queries
  • do all the CPU-related stuff you can
    – (including CPU-based portal culling)
  • check portal queries (for additional portal culling)
  • render scene

You have to structure things to work around the latency of the occlusion query.
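The frame order listed above could be sketched like this. It is only an outline under assumptions: `FrameHooks`, `runFrame` and every hook name are hypothetical placeholders for the application's own rendering and culling code, and the sketch simply keeps a portal when it survives both the CPU culling and the occlusion query.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical hooks; the names are illustrative, not from any real engine.
struct FrameHooks {
    std::function<void()> drawStaticDepth;    // fill the depth buffer with the static scene
    std::function<void(int)> issueQuery;      // draw one portal inside an occlusion query
    std::function<bool(int)> cpuKeeps;        // CPU-side culling: true if the portal survives
    std::function<unsigned(int)> pixelCount;  // read back a query result
    std::function<void(const std::vector<int>&)> renderScene;
};

// Runs one frame and returns the portals that survived both tests.
std::vector<int> runFrame(const std::vector<int>& portals, const FrameHooks& h) {
    assert(h.drawStaticDepth && h.issueQuery && h.cpuKeeps && h.pixelCount && h.renderScene);

    h.drawStaticDepth();
    for (int p : portals) h.issueQuery(p);  // all queries in flight, nothing read yet

    std::vector<int> survivors;             // CPU work overlaps the query latency
    for (int p : portals)
        if (h.cpuKeeps(p)) survivors.push_back(p);

    std::vector<int> visible;               // results are read as late as possible
    for (int p : survivors)
        if (h.pixelCount(p) > 0) visible.push_back(p);

    h.renderScene(visible);
    return visible;
}
```

Reading results only for portals that already survived the CPU test also means fewer readbacks, though whether that matters in practice would need measuring.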

Thanks -
Cass