Good strategy for nv_occlusion_query and Octree data structure

Hi there,

I am looking for a good strategy to hide the nv_occlusion_query latency when displaying groups of triangles organized within an octree structure.

For the time being, I have been successful in hiding the latency by issuing the queries in one frame and processing the results one frame later. The time spent drawing the data simply hides the latency.
The problem is obviously that I get a one-frame delay, which is clearly visible when a large occluder disappears in the following frame.

Thanks

Eric

Originally posted by e1mffel:
[b]Hi there,

I am looking for a good strategy to hide the nv_occlusion_query latency when displaying groups of triangles organized within an octree structure.[/b]

The latency is annoying, but it’s generally sub-frame and there’s not much choice.

Assuming you’re rendering your octree by traversing front-to-back and your intended granularity of vis-testing is at the octree block/cell level, you can try organizing your algorithm more or less as follows:

Cull:

  1. generate linear list of potentially visible octree cells in f-to-b depth order.

Draw:

for each i (in list of cells)

  1. render simple test geom for cell[i+n] w/nv-occlusion and w/o writing to the fb
  2. if (cell[i] is visible or no test result) render details for cell[i] normally

Tune ‘n’ to be as small as possible (>0) such that your test results come in just in time.
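
As a rough illustration of the loop above, here is a small Python simulation (not real OpenGL code). The names `draw_with_lookahead`, `latency` and `occluded` are made up for this sketch. It assumes a query result becomes available a fixed number of loop steps after it is issued, and that the set of hidden cells is known in advance. In a real renderer the query would be issued with glBeginOcclusionQueryNV/glEndOcclusionQueryNV and polled with glGetOcclusionQueryuivNV, and the latency would vary.

```python
def draw_with_lookahead(cells, n, latency, occluded):
    """Simulate the draw loop: at step i, issue a query for cells[i + n],
    then draw cells[i] unless a finished query reports it hidden.

    cells    -- cell ids in front-to-back order
    n        -- look-ahead distance (> 0)
    latency  -- assumed number of steps before a query result is ready
    occluded -- assumed set of cell ids whose test geometry is hidden
    """
    issued_at = {}  # cell index -> step at which its query was issued
    drawn = []
    for i in range(len(cells)):
        j = i + n
        if j < len(cells):
            issued_at[j] = i  # step 1: test geometry for cell[i + n]
        result_ready = i in issued_at and (i - issued_at[i]) >= latency
        if result_ready and cells[i] in occluded:
            continue  # test finished and says hidden: skip the details
        drawn.append(cells[i])  # step 2: visible, or no result yet
    return drawn


cells = ["a", "b", "c", "d", "e"]
print(draw_with_lookahead(cells, n=2, latency=2, occluded={"d"}))
print(draw_with_lookahead(cells, n=1, latency=2, occluded={"d"}))
```

With `n` at least as large as the assumed latency, the hidden cell is skipped. With `n` too small, the result never arrives in time and every cell is drawn conservatively, which is the case the tuning advice is meant to avoid. Note that the first `n` cells are never tested in this form of the loop.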

Personally, I don’t prefer the render/test/render approach due to frequent state changes and batching issues, but if you’re already rendering using your octree, that overhead may already be factored into your pipeline.

You also have the opportunity to use the octree itself as a test object, i.e., render the cell walls instead of the contents or their individual bounding boxes. With a very sparse database, this won’t buy much. But if you have whole cells hidden behind buildings, for example, it may be sufficient and much faster/cheaper. I don’t know if it would be beneficial to render all of the cells this way or just the leaf nodes (or any with geometry). You’ll have to test.

Avi
www.realityprime.com

[This message has been edited by Cyranose (edited 09-23-2003).]

Avi,

Thanks for answering so quickly. I understand your method and was thinking of doing the same, but I am also trying to benefit from the octree cells, which can be used to cut down the number of tests and draws. As you mention, the efficiency depends on the model being displayed. In my case, it is not possible to predict. I am writing this code to display the results of scientific computations. Sometimes there is a very large overdraw ratio when displaying the exterior and interior of buildings; an occlusion test on the cell is then very beneficial. Sometimes I only display cut planes within the 3D volume, and then it is not efficient.

I could try to implement your method with a list that is a mix of octree cells and packs of triangles, but I do not think it will be efficient.

I also tried issuing the occlusion test while drawing the actual set of triangles (testing and displaying at the same time). Unfortunately, after testing on my laptop (Radeon Mobility 9000), it seems to be much slower than testing the bounding boxes first and then displaying.

Which batching issues and state changes did you mean? Also, are frequent state changes costly CPU-wise?

Eric

Originally posted by e1mffel:
[b]Avi,

Which batching issues and state changes did you mean? Also, are frequent state changes costly CPU-wise?

Eric[/b]

On modern cards, it’s beneficial to pool objects of similar state (texture, lighting, shader) for rendering in large chunks using something like glDrawElements or the multi-draw equivalent. Assuming the vertices live in fast memory (agp, videomem, etc…), you can send off one call and the card will go render asynchronously, meaning the CPU and GPU can work better in parallel. Any time either the CPU or GPU is waiting on the other, you’re not running at peak.

I wouldn’t worry about state changes on the CPU, but realize that rendering lots of small objects separately with state changes in between can add up and quickly become the bottleneck.

Anyway, what this means for an octree is that it might be simple to traverse the tree and just render each cell as you find it to be visible, but for a deep tree with lots of cells, the cost can be prohibitive. It may be better to coalesce objects of similar state into drawing pools (built statically, or slowly repacked as opposed to rebuilt every frame). In fact, it’s sometimes faster to draw those pools whole than it is to iterate and draw only the visible objects.
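
To make the pooling idea concrete, here is a minimal Python sketch. The object names and state keys are invented for the example, and `build_pools` is a hypothetical helper, not part of any API. It only shows the grouping step: objects sharing a state key end up in one pool, so the state would be set once per pool rather than once per object.

```python
from collections import defaultdict


def build_pools(objects):
    """Group (name, state_key) pairs into pools keyed by state.

    Each pool could then be submitted with one state change and one
    large draw call instead of one call per object.
    """
    pools = defaultdict(list)
    for name, state_key in objects:
        pools[state_key].append(name)
    return dict(pools)


# Hypothetical scene: five objects, three distinct states.
scene = [
    ("wall", "brick"),
    ("roof", "tile"),
    ("floor", "brick"),
    ("door", "wood"),
    ("chimney", "brick"),
]
pools = build_pools(scene)
print(len(scene), "objects in", len(pools), "pools")
```

Here five per-object state changes collapse into three, one per pool. Whether drawing a pool whole beats drawing only its visible members depends on the data and has to be measured.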

I am already using vertex_buffer_object and have my sets of triangles grouped by type (i.e. textured the same way etc.). Each group contains about 3000 triangles, which is large enough for the vertex_buffer_object. I am displaying about 3 million triangles per frame. Out of those 3 million, roughly 1.5 million are overlaid. Of course I am not expecting a real frame rate, but for the moment I am getting about 5 fps on my laptop (T40 with Mobility 9000) with about 1100 occlusion tests. This is with the results retrieved at the next frame. I am quite sure I can do better and will be trying to profile the code.

Eric

Originally posted by e1mffel:
[b]I am already using vertex_buffer_object and have my sets of triangles grouped by type (i.e. textured the same way etc.). Each group contains about 3000 triangles, which is large enough for the vertex_buffer_object. I am displaying about 3 million triangles per frame. Out of those 3 million, roughly 1.5 million are overlaid. Of course I am not expecting a real frame rate, but for the moment I am getting about 5 fps on my laptop (T40 with Mobility 9000) with about 1100 occlusion tests. This is with the results retrieved at the next frame. I am quite sure I can do better and will be trying to profile the code.

Eric[/b]

That’s a tough one. Most techniques for using NV_occlusion results within the same frame can roughly double the number of pixels tested (though fewer may be written to the fb), since they render test objects of equal or greater size earlier in the cycle.

Correct me if I’m wrong, but it sounds like you’re enabling the occlusion test and rendering actual objects one frame, then skipping the object the next frame if it isn’t visible this frame. Is that right?

You may already be doing this, but using your current approach, you might try the following:

  1. Render all objects which were visible last frame (in depth order) w/vis tests (your 1100 vis tests).
  2. Test bboxes for objects or oct-cells which were invisible last frame, w/vis test results expected for later this frame.
  3. In the same order as #2, for any result from #2 that’s visible (its test should be done by now), render the actual object now and put it on the visible list for next frame.
  4. Any object from #1 that failed the vis test this frame gets removed from the visible list for next frame, but its bbox will still get tested then, so no popping.

Note: these steps can be interleaved and arranged more efficiently.

Does that make sense? Anyway, I can’t find the reference right now, but I recall seeing a paper online on large model visualization using HW occlusion.

This is my current flow for drawing the tree:

0- Generate n = (nbr of blocks of triangles + nbr of octree cells) occlusion queries. This is done every time a new set of groups of triangles with identical “states” is added to the tree.

1- Frustum-cull the tree: flag all octree cells, and the groups of triangles belonging to each cell, according to whether they intersect the frustum. Order the 8 child cells according to the point of view.

2- Go through the tree from the root. Check the current octree cell. If the cell has a pending query (i.e. a query was submitted at the previous frame and not yet retrieved), retrieve the result and mark the cell as occluded if the query says so. If the cell was previously occluded, send a new occlusion query. If the cell was visible, send a query only every 5 frames.

2.1- if the octree cell is occluded, stop there
2.2- if the octree cell is not occluded, display the sets of triangles which are within the cell.
2.2.1- For each group of triangles, check if there is a pending query. If yes, retrieve it (it was also submitted at the previous frame) and mark the set of triangles (roughly 3000 triangles) as occluded or not. Display the set if visible.

2.2.2- If the cell has branches, go through the branches in accordance with the point of view i.e. front first, back last.

That’s about it. Basically, this method is very simple: a query submitted at frame i is only retrieved at frame i+1. The frame lag helps mask the latency.
The drawback is that one has to generate a large number of occlusion queries even if most of them are not used.
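
The per-cell bookkeeping in step 2 can be sketched as a small Python state machine. This is a simulation under stated assumptions, not the actual code: `CellState`, `update_cell` and `retest_interval` are hypothetical names, and the query result is modelled as a boolean that is always ready exactly one frame after submission.

```python
class CellState:
    """Occlusion bookkeeping for one octree cell."""

    def __init__(self):
        self.occluded = False        # last known result
        self.pending = None          # result of the query sent last frame
        self.frames_since_query = 0


def update_cell(state, occluded_now, retest_interval=5):
    """Advance one frame; return True if the cell's contents are drawn.

    occluded_now -- assumed true visibility of the cell this frame
    """
    # Retrieve the query submitted during the previous frame, if any.
    if state.pending is not None:
        state.occluded = state.pending
        state.pending = None

    state.frames_since_query += 1
    # Occluded cells are re-tested every frame, visible cells only
    # every retest_interval frames.
    if state.occluded or state.frames_since_query >= retest_interval:
        state.pending = occluded_now
        state.frames_since_query = 0

    return not state.occluded


cell = CellState()
# The cell is hidden for six frames, then uncovered for two.
history = [update_cell(cell, occ) for occ in [True] * 6 + [False] * 2]
print(history)
```

In this run the cell keeps being drawn for five frames after it becomes hidden (it is only re-tested every 5 frames while marked visible), and it stays undrawn for one frame after it is uncovered. That second effect is the one-frame popping described at the start of the thread.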

I understand your method; the only problem is that I seem to be stalling the pipeline after sending the queries on the bounding boxes. What you call “later this frame” does not seem to be late enough in my case.
Of course, one can also trim down the checking of visibility of step 1 by only checking every 5 frames for the visible items.

I think the reference you are looking for is related to the group at Chapel Hill, which is working on the “large model walking” project. The usual samples they work with are the tanker and the plant. They also make intensive use of LOD. I have not implemented that yet in my code, as I need to find a simple way of decimating the mesh on the fly (there have been several articles that I remember). The number of pixels returned by the query could be useful here.

Coming back to step 1 of your method, I tried the following two ways of performing the occlusion test:

method 1:

  • disable buffer
  • begin occlusion query
  • draw bounding box
  • end occlusion query
  • enable buffer and restore the other related state
  • draw real object

method 2:

  • do not disable buffer
  • begin occlusion query
  • draw real object
  • end occlusion query

I found method 2 to be very slow compared to method 1. Have you experienced such a thing?

Eric

The triangles are simply displayed and every 5 frames an occlusion query is