
View Full Version : GL_OCCLUSION_TEST_HP = frustum culling ?



Alessandro_dup1
02-22-2004, 03:13 AM
It seems to me they do the same thing right ? Which of the two is supposed to be faster ?

DopeFish
02-22-2004, 03:28 AM
Frustum culling will remove everything that's outside of the view frustum from being drawn.

Occlusion Query lets you know if there were any fragments of an object actually rendered to the frame buffer (and depending on the exact extension, the number of fragments drawn is returned).
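
For reference, frustum culling is a pure CPU-side test. A minimal sketch in C, assuming the six frustum planes have already been extracted (e.g. from the combined modelview-projection matrix) and stored as inward-facing plane equations; `Plane` and `sphere_in_frustum` are illustrative names, not part of any API:

```c
/* One frustum plane: ax + by + cz + d = 0, with the normal pointing
   into the frustum. */
typedef struct { float a, b, c, d; } Plane;

/* Returns 1 if a bounding sphere is at least partly inside all six
   planes, 0 if it is completely outside any one of them. */
int sphere_in_frustum(const Plane planes[6],
                      float x, float y, float z, float r)
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * x + planes[i].b * y
                   + planes[i].c * z + planes[i].d;
        if (dist < -r)   /* entirely behind this plane: culled */
            return 0;
    }
    return 1;
}
```

No GPU round-trip is involved, which is why it is so much cheaper than a query when visibility in the depth buffer is not needed.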

Alessandro_dup1
02-22-2004, 03:31 AM
"Occlusion Query lets you know if there were any fragments of an object actually rendered to the frame buffer (and depending on the exact extension, the number of fragments drawn is returned)."

Ok, and then i render what's visible and discard what is not. So it seems to me the same thing as the frustum culling method.

bunny
02-22-2004, 03:37 AM
Fragments that fail the Z test won't get drawn. Make sense yet?

Jared
02-22-2004, 05:44 AM
if you insist, yes, they both have the same result, with occlusion query doing a lot more and having a much higher price. high enough to often be useless even for occlusion culling. if you're thinking about using it to get around frustum culling, don't complain about horrible performance. it's like taking photos of millions of pebbles just to figure out which of them have a certain color, when you could just as well look at them and see for yourself.

Alessandro_dup1
02-22-2004, 06:45 AM
In my case, i have a forest of about 10,000 trees. I already applied a discrete LOD scheme to them, so the closest trees have about 10,000 triangles and the most distant just 100. Now i'd like to exclude from rendering the ones that are outside the field of view. I currently implemented occlusion queries, testing whether trees are visible against a bounding cube. It works, but it doesn't give the performance i would like. So that was my question: would frustum culling give much better performance instead?

Adrian
02-22-2004, 07:00 AM
Frustum culling will be faster if that's all you need. The advantage of occlusion culling is that it also allows you to not draw objects that are hidden behind other objects.

I wouldn't use HP_occlusion_test; it has been superseded by ARB_occlusion_query: http://oss.sgi.com/projects/ogl-sample/registry/ARB/occlusion_query.txt
The advantages over the hp version are in the spec.

[This message has been edited by Adrian (edited 02-22-2004).]

DopeFish
02-22-2004, 07:14 AM
Originally posted by penetrator:
Ok, and then i render what's visible and discard what is not. So it seems to me the same thing of the frustum culling method.

Occlusion queries aren't just for culling. While they can be used to cull hidden objects, they can also be used for other effects, such as sizing a lens flare dependent on how much of a light source is visible.
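
The flare idea can be sketched without any GL calls. Assuming a query mechanism (NV/ARB, not HP) reports how many samples of a test quad passed the depth test, the visible fraction scales the flare; `flare_scale` is a hypothetical helper, not part of any extension:

```c
/* Scale a lens flare by the fraction of its occlusion-test quad that
   passed the depth test. samples_passed would come from an NV/ARB
   occlusion query; total_samples is the pixel count of the test quad. */
float flare_scale(unsigned samples_passed, unsigned total_samples,
                  float max_scale)
{
    if (total_samples == 0)
        return 0.0f;
    float fraction = (float)samples_passed / (float)total_samples;
    if (fraction > 1.0f)   /* clamp against overestimated quads */
        fraction = 1.0f;
    return fraction * max_scale;
}
```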

Tom Nuydens
02-22-2004, 11:49 PM
Originally posted by Jared:
if you insist, yes they both have the same result with occlusion query doing a lot more and having a much higher price. high enough to be often useless even for occlusion culling.

I disagree. It is true, however, that you need to be pretty clever about how you use occlusion queries. You have to keep in mind that

(a) You have to transform and rasterize a bounding volume for the query, so the object you're culling had better be significantly more expensive to draw than the bounding volume itself;
(b) The queries can have rather high latency, so you should structure your code so as to avoid having to go idle while waiting for the results to come back.

If used correctly, occlusion queries work really, really well in my experience. That said, they could and should still be combined with some sort of hierarchical CPU-based culling method to achieve optimal results.
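
A sketch of the pattern described above, using ARB_occlusion_query and consuming each query's result one frame late so the pipeline never drains. This is not compilable as-is: it assumes a current GL context, loaded extension entry points, and placeholder helpers `DrawObject()`/`DrawBoundingBox()` plus an `obj[]` array with a query name from glGenQueriesARB:

```c
/* Frame N: draw objects whose query from frame N-1 passed, and issue
   new queries against bounding boxes without waiting for results. */
for (int i = 0; i < numObjects; ++i) {
    GLuint passed = 1;              /* assume visible on the first frame */
    if (obj[i].hasResult)
        glGetQueryObjectuivARB(obj[i].query, GL_QUERY_RESULT_ARB, &passed);

    if (passed)
        DrawObject(i);              /* the real geometry */

    /* Issue next frame's query: rasterize the bounding box invisibly. */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, obj[i].query);
    DrawBoundingBox(i);
    glEndQueryARB(GL_SAMPLES_PASSED_ARB);
    glDepthMask(GL_TRUE);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    obj[i].hasResult = GL_TRUE;
}
```

By the time frame N+1 reads GL_QUERY_RESULT_ARB, the result is almost always already available, so the read rarely stalls.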

Penetrator, could you provide some more details about how exactly you implemented your occlusion culling?

-- Tom

Adrian
02-23-2004, 12:49 AM
Originally posted by DopeFish:
Occlusion queries arent just for culling. Whilst they can be used to cull hidden objects, it can also be used for other effects such as sizing a flare dependant on how much of a light object is visible.

It's worth pointing out that the HP occlusion test doesn't provide that functionality, only the NV and ARB versions do.


Originally posted by Tom Nuydens:
(b) The queries can have rather high latency, so you should structure your code so as to avoid having to go idle while waiting for the results to come back.

The HP version only provides a 'stop and wait' model. To take full advantage of CPU/GPU parallelism he should use ARB_occlusion_query.

The ARB occlusion query exists in the current WHQL NVidia drivers, so I don't know why it hasn't been added to the list of extensions here: http://developer.nvidia.com/object/nvidia_opengl_specs.html

I've read a lot of posts about occlusion queries causing 'bubbles' in the pipeline. I'm not sure how this effect manifests itself or how big an impact it has. As far as I can tell, occlusion queries, if used optimally, have a fill-rate impact and little else.

Tom Nuydens
02-23-2004, 02:33 AM
Originally posted by Adrian:
The HP version only provides a 'Stop and Wait' model. To take full advantage of cpu/gpu parallelism he should use the ARB_OCCLUSION_QUERY.

Absolutely, I should have made that clear.


Originally posted by Adrian:
I've read a lot of posts about occlusion queries causing 'bubbles' in the pipeline. I'm not sure how this effect manifests itself and how big an impact it has. As far as I can tell occlusion queries, if used optimally, have a fill rate impact and little else.

AFAIK this simply refers to the effect of finishing an occlusion query before the result is available. Doing so causes your pipeline to go idle until all pending commands have been executed, i.e. it's equivalent to calling glFinish(). As you point out, proper use of occlusion queries will minimize this effect and the overhead beyond the inevitable fill rate cost should be negligible.

-- Tom

harsman
02-23-2004, 03:10 AM
AGP can't handle simultaneous upstream and downstream transfers AFAIK, so transferring the query result from the video card to system memory could very well cause a pipeline bubble, since the card cannot pull data from memory during that time.

This will of course be pretty insignificant compared to forcing GPU-CPU synchronization by ending a query prematurely.

Adrian
02-23-2004, 03:31 AM
Originally posted by harsman:
AGP can't handle simultaneous upstream and downstream data AFAIK, so transfering the query result from the video card to sys mem could very well cause a pipeline bubble as the card cannot hoover data from memeory during that time.

Presumably this bottleneck will disappear with PCI Express?

Jared
02-23-2004, 04:42 AM
that's why i said often. if you can pick a few good occluders and apply pretty much all the tips they give in the specs, it might not be too bad. i tried a lot of things with them, but in the end the gain was minimal and sometimes it was even slower (obviously, in some scenes doing occlusion culling is just wasted effort, no matter the method).
in the end, test-rendering the bounding boxes took about as much time as just really rendering the objects. i even tried just drawing two lines connecting the corners. so it would either have meant writing everything around that extension or just ignoring it.

but i guess the real problem i have with it is that there are often much easier ways you don't think about. so i consider it more of a "last chance" for situations where other methods won't work.

harsman
02-23-2004, 05:48 AM
Originally posted by Adrian:
Presumably this bottleneck will disappear with PCI Express?

Probably. PCI Express is supposed to have a separate upstream link. But since moving data from GPU to CPU is hardly the common case, I suspect the drivers won't do a stellar job performance-wise anyway.

Alessandro_dup1
02-23-2004, 06:07 AM
Originally posted by Tom Nuydens:
Penetrator, could you provide some more details about how exactly you implemented your occlusion culling?

This is the routine that renders the trees:

glDepthMask(GL_FALSE);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_OCCLUSION_TEST_HP); /* note: the test must be enabled before drawing the proxy cube */
glutSolidCube(size*0.4f);
glDisable(GL_OCCLUSION_TEST_HP);
glDepthMask(GL_TRUE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glGetBooleanv(GL_OCCLUSION_TEST_RESULT_HP, &isVisible);
if (isVisible == GL_TRUE)
{
    glCallList(pine_tree);
}

V-man
02-23-2004, 06:09 AM
If you are storing all of your stuff on the card, then PCI Express or higher AGP speeds won't help, but I think a lot of people will benefit.
Main RAM will be like having extended memory.
I don't remember reading about PCI Express memory, but I'm sure it will have something like AGP memory.

As for glReadPixels, probably it will suck just as much as it does today if it already does.

Adrian
02-23-2004, 06:48 AM
Originally posted by V-man:
As for glReadPixels, probably it will suck just as much as it does today if it already does.

"ATI thinks PCI Express will allow multiple high performance graphics adaptors, allow new applications using the backchannel bandwidth, and increase the bar on graphics performance" http://www.theinquirer.net/?article=8000

By 'backchannel' are they referring to readback? The term 'bidirectional bandwidth' is also used frequently in articles about PCI Express. This must mean something, or is it just marketing fluff?

I hoped NVidia/ATI would be clearer about what (if anything) PCI Express means to developers, particularly regarding readback speed. I've read a number of the technical documents and I still don't really know what it's going to mean in real terms.

Tom Nuydens
02-23-2004, 11:41 PM
Originally posted by Jared:
so it would either have meant to write everything around that extension or just ignoring it.

Well yes, you'd have to design your engine around the extension to get good results out of it, but in many cases it's worth the effort IMHO.

Penetrator, unfortunately the code you posted is pretty close to the worst-case scenario :)

For starters, as has been mentioned more than once before, don't use the HP extension -- use the NV or ARB one. One way to avoid the latency of the queries is to retrieve and use the results in the next frame, not the current one. Because your bounding boxes somewhat overestimate the size of your trees anyway, popping will hopefully be insignificant.

Next, I wonder what order you're drawing your trees in? The best way to do it is front to back; back to front is the worst case. Presumably you're somewhere in between (i.e. random order)?
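
The front-to-back ordering can be done with a plain distance sort on the CPU. A sketch, with `Tree` as a hypothetical per-tree record and the camera position passed in; squared distance avoids a sqrt per comparison:

```c
#include <stdlib.h>

/* Hypothetical per-tree record: world-space position. */
typedef struct { float x, y, z; } Tree;

static float g_camX, g_camY, g_camZ;  /* camera position for the compare */

static float dist2(const Tree *t)
{
    float dx = t->x - g_camX, dy = t->y - g_camY, dz = t->z - g_camZ;
    return dx * dx + dy * dy + dz * dz;
}

/* qsort comparator: nearest tree first. */
static int cmp_front_to_back(const void *pa, const void *pb)
{
    float da = dist2((const Tree *)pa), db = dist2((const Tree *)pb);
    return (da > db) - (da < db);
}

void sort_front_to_back(Tree *trees, size_t n,
                        float camX, float camY, float camZ)
{
    g_camX = camX; g_camY = camY; g_camZ = camZ;
    qsort(trees, n, sizeof(Tree), cmp_front_to_back);
}
```

Drawing in this order lets near trees fill the depth buffer first, so queries (and the Z test generally) reject far geometry as early as possible.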

Furthermore, you mention having 100-poly LODs for far away trees. You're likely to be fillrate-limited for those, so doing an occlusion query (and rasterizing a bounding box) might cost you more than just rendering the tree in those cases. For these objects, having good CPU-based hierarchical frustum culling could help.
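
One cheap way to act on that: only issue a query when the object is clearly more expensive than rasterizing its proxy box, and just draw cheap LODs directly. A sketch; the threshold is illustrative, not a measured constant:

```c
/* An occlusion query only pays off when the object costs clearly more
   than transforming and rasterizing its bounding box. The cutoff here
   is an illustrative guess, to be tuned per application. */
#define QUERY_TRIANGLE_THRESHOLD 1000

int worth_querying(int triangle_count)
{
    return triangle_count > QUERY_TRIANGLE_THRESHOLD;
}
```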

-- Tom

Alessandro_dup1
02-25-2004, 01:07 PM
Thank you Tom, i'm going to try both frustum culling and the Arb occlusion extension. I will post some results later ...

rolfstenholm
02-28-2004, 07:53 AM
I don't think that GL_OCCLUSION_TEST_HP or the NVidia or ARB extension will work. The reason is that the test itself fails many times. The only thing this test promises is that objects which are visible always return true.
I have personally tested the accuracy on a GeForce FX 5200 MX and found that all the test was good for was acting as a costly version of frustum culling.
I do not think the test is even useful, even ignoring speed issues.
What I used to evaluate the test was a binary tree with front-to-back rendering using GL_OCCLUSION_TEST_HP. In a binary tree it doesn't make much difference if the test is fast; it only needs to be accurate.
I checked my results by rendering all parts of the binary tree in wireframe and counting the number of polygons clipped.
What I found was that it managed to eliminate perhaps half of all polygons, and the test basically failed to clip distant polygons. The only thing this test clipped appeared to be polygons that would have failed a frustum culling test.
My advice would be to skip that test; at least on my hardware it failed completely to be of practical use.
Has anyone else done a similar test on another GPU? Maybe the GeForce FX 5200 MX has a poor implementation of the extension?

bunny
02-28-2004, 08:44 AM
I've tried the ARB extension and also my own software rendered solution, using SSE instructions to render 4 pixels at a time. For the city scene I'm rendering, the ARB extension was completely unusable on my Radeon 9700 Pro (about 1/4 of the frame rate). The sw solution OTOH worked incredibly well.

Some details:
http://www.gamedev.net/community/forums/topic.asp?topic_id=209249

[This message has been edited by bunny (edited 02-28-2004).]

Adrian
02-29-2004, 01:26 PM
Originally posted by rolfstenholm:
the test itself fails many times.

I've used the ARB/NV versions extensively and they work 100% as advertised on my 5900u.

Tom Nuydens
02-29-2004, 11:55 PM
I've seen the NV extension run on Radeon 9000/9700 and on GF 3/4/FX and all worked exactly as intended. If you're not getting the results or the performance you expected, check your code. If the hints given earlier in this thread don't suffice, try searching these forums -- there's been plenty of discussion on the proper usage patterns of occlusion queries.

-- Tom