occlusion query not 'almost free'



zed
02-14-2005, 10:34 PM
i decided to try out occlusion queries at the same time i lay down the depth/ambient pass. from what i read this should come at very little cost, but i lose about 10% framerate (this is just with the following code, ie without reading any results back), with roughly 200-300 draw calls

glBeginQuery(GL_SAMPLES_PASSED, queriesID );
// draw geometry
glEndQuery(GL_SAMPLES_PASSED);

why such a great loss?
* opengl shaders are always enabled, ie im not doing a pass with colormask off, could this have something to do with it? (though the example in the spec that i follow does it my way)
* queriesID is a number that ive chosen, but ive also tried with glGenQueries(..)

since from my testing with glGetQueryObjectivARB 95-100% of the stuff i send is visible, its really not worth implementing occlusion query testing,

what have other ppl experienced?

from the spec.
-----

// First rendering pass plus almost-free visibility checks
glDisable(GL_BLEND);
glDepthFunc(GL_LESS);
glDepthMask(GL_TRUE);
// configure shader 0
for (i = 0; i < N; i++) {
    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[i]);
    // render object i
    glEndQueryARB(GL_SAMPLES_PASSED_ARB);
}

rgpc
02-15-2005, 01:10 AM
The first question that leaps to mind for me is how are you drawing your primitives? I would expect (I haven't used occlusions queries) that if the data is being pushed across the AGP bus then you would expect a performance hit. But that's just a guess...

EG
02-15-2005, 04:19 AM
Never found occlusion queries to be entirely free, however IME the performance loss when querying happens approx. a whole frame after the queried geometry is more in the range of a handful of percents at worst (for a few dozen different queries in a frame, for scenes that are more CPU/geometry/AGP limited than fillrate/fragment limited).

Do you reuse the same queriesID for several glBeginQuery in the same frame?
Did you try cycling several IDs over several frames?
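
One way to picture cycling IDs over several frames is a small ring of query slots, so that a result is only read a couple of frames after its query was issued. The sketch below is only the index arithmetic, under assumptions: `FRAMES_IN_FLIGHT`, `MAX_OBJECTS` and both function names are hypothetical, and the returned slot is meant to index an array of IDs obtained from glGenQueriesARB.

```c
#include <assert.h>

/* Hypothetical sizes, chosen only for illustration. */
#define FRAMES_IN_FLIGHT 3u
#define MAX_OBJECTS 512u

/* Slot in a flat array of FRAMES_IN_FLIGHT * MAX_OBJECTS query IDs
 * that `object` uses when its query is issued during `frame`. */
unsigned query_slot_for_issue(unsigned frame, unsigned object)
{
    assert(object < MAX_OBJECTS);
    return (frame % FRAMES_IN_FLIGHT) * MAX_OBJECTS + object;
}

/* Slot whose result is old enough to read during `frame`: the one
 * issued FRAMES_IN_FLIGHT - 1 frames earlier. Returns -1 while the
 * ring is still filling up and no result is old enough yet. */
int query_slot_for_read(unsigned frame, unsigned object)
{
    assert(object < MAX_OBJECTS);
    if (frame < FRAMES_IN_FLIGHT - 1u)
        return -1;
    return (int)query_slot_for_issue(frame - (FRAMES_IN_FLIGHT - 1u), object);
}
```

With these sizes, a slot written in frame 0 is read in frame 2 and not overwritten until frame 3, so reading never has to wait on a query issued in the current frame.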

zed
02-15-2005, 09:59 AM
Originally posted by rgpc:
The first question that leaps to mind for me is how are you drawing your primitives? I would expect (I haven't used occlusions queries) that if the data is being pushed across the AGP bus then you would expect a performance hit.
im already drawing the data (as its an ambient/depth pass), thus im not actually drawing any extra data

Originally posted by EG:
Never found occlusion queries to be entirely free, however IME the performance loss when querying happens approx. a whole frame after the queried geometry is more in the range of a handful of percents at worst.
the thing is im not actually asking for any results back, the 10% loss comes just from adding the two lines
glBeginQuery(GL_SAMPLES_PASSED, queriesID );
glEndQuery(GL_SAMPLES_PASSED );
actually now i think about it its even worse than 10% (as i was ignoring the other stuff in the render frame, eg light pass, alpha passes etc), more likely closer to a 20% loss.
perhaps its something to do with me doing more than a handful of occlusion queries, then again 200-300 aint really that many, are they?
also perhaps it has something to do with me having an occlusion query buffer of 1500-5000 IDs (ie each mesh gets an ID at the start), then again i did try with glGenerate/deleteIds..(..) to generate free ids per frame but it still runs slow

cass
02-15-2005, 02:58 PM
Hi Zed, what hardware and driver are you using?

zed
02-15-2005, 07:28 PM
gffx5900 70.90

LogicalError
02-15-2005, 10:22 PM
How much is your performance loss in milliseconds?
FPS is such a bad way to measure performance since it's non linear..
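
To make the non-linearity concrete, here is a minimal C sketch that converts a frame rate into a frame time and a frame-rate drop into a cost in milliseconds. The function names are hypothetical and the figures in the note below are illustrative, not measurements from this thread.

```c
#include <assert.h>

/* Frame time in milliseconds for a given frame rate. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra milliseconds per frame when the rate drops
 * from fps_before to fps_after. */
double frame_time_cost_ms(double fps_before, double fps_after)
{
    return frame_time_ms(fps_after) - frame_time_ms(fps_before);
}
```

The same 10 fps drop costs about 0.26 ms per frame when going from 200 to 190 fps, but about 3.33 ms when going from 60 to 50 fps, which is why a frame-time difference is easier to compare across scenes than an fps difference.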

zed
02-18-2005, 06:12 PM
a couple of hours testing and im still not much further,
i wrote a simple glut app (but no difference between occlusion + non occlusion) (ill add shaders later see if that makes a difference)

but anyways in my app, the problem also is in immediate mode,
from all the materials i tested the only one that didnt show a difference between occlusion + non occlusion was standard pipeline wireframe tris.
even standard pipeline fill tris seemed to be 2-3% slower

with some of the materials (esp using glsl) theres a huge difference ~50% slowdown.

im downloading new drivers 71.81 to see if they make a difference, if not ill play around with the glut app a bit more


Originally posted by LogicalError:
How much is your performance loss in milliseconds?
FPS is such a bad way to measure performance since it's non linear..
aye milliseconds are just as bad as fps

zed
02-18-2005, 11:07 PM
new drivers no better, a further couple of hours testing no further, i couldnt replicate the problem in a glut app,
i believe its a driver problem that only exhibits itself in the particular way that im doing the rendering,
im willing to send the app cass

cass
02-20-2005, 07:29 AM
Please do send me a small app that illustrates the problem, if you don't mind, Zed.

zed
02-20-2005, 02:48 PM
Originally posted by cass:
Please do send me a small app that illustrates the problem, if you don't mind, Zed.
ive sent it to your ru address (not the nvidia one)

sorry like i mentioned i couldnt replicate the problem in a glut app, so i sent a simplified version of my program.
the space key toggles occlusion on/off (1-6 changes screen resolution)
like i said its not too important at the moment for me as less than 10% of meshes are being occluded but this might change dramatically in the future, as im looking into doing occlusion of light volumes.

thanks for looking into it cass

SeskaPeel
02-21-2005, 09:51 AM
cass or zed, you would be so kind to let us know how this ends, I'm considering the same approach at this time ...

Thanks,
SeskaPeel.

cass
02-21-2005, 04:43 PM
Sure - I'll follow up with what the issue is/was for Zed's performance drop.

cass
02-23-2005, 09:05 AM
Zed, do you have any idea why you can't repro the slow-down in a GLUT program? That seems fishy.

zed
02-23-2005, 04:37 PM
yeah i know it does, but i have no idea, my apps engine is big and complicated unfortunately, thus its very hard to replicate the same situation in a glut app.
u tried the program i assume? did it not show a difference between the 2 states?
on my computer (5900XT) occlusion on runs ~200fps, occlusion off ~300fps, the only difference is the inclusion of those 2 statements,

perhaps its something unrelated like the pointsize being set to say 20.0, and this is somehow causing the rendering to go down a different path, whenever occlusion is on.

i assume u have some driver app that lets u ignore the 2 statements, perhaps run it with that enabled thus glBeginQuery(GL_SAMPLES_PASSED, queriesID ); are ignored, if the framerate goes up perhaps that will shed light on the problem

zed
03-01-2005, 05:51 PM
any further info cass, at least confirmation that its a bug in the driver that is gonna be fixed at a later stage, so i can retain the occlusion culling in the pipeline

cass
03-04-2005, 07:44 AM
Hi Zed, no confirmation yet. Still in the queue.
I'll let you know.

Thanks -
Cass

Obli
03-09-2005, 04:05 AM
Since I'm also interested in how this turns out, I would be glad if the response could be posted on the forum... maybe as a new thread for obvious reasons.
Thank you!

zed
03-09-2005, 08:47 AM
SeskaPeel, Obli are u also seeing a loss of framerate when u use occlusion query?
im willing to send ppl a sample app that tests for the speed loss, perhaps im the only person seeing this? ie it would help cass out if it happens on a wide variety of nvidia hardware.

billybolluxbouncingballs at yahoo.co.uk

Obli
03-13-2005, 05:24 AM
Sorry, I am not actually using them, just planning their use in a thing I'm doing here.
I could, however, help with the testing if you need this information.
I sent a mail to the address you specified. Maybe we shall consider it as a faster/alternative way to work/communicate on this?

jide
03-15-2005, 03:25 AM
zed, I've just started looking at what occlusion query is, so I read the spec of this extension. Although I haven't finished, I'm almost sure you aren't using queries in an efficient manner.

There are many interesting things in the spec that might reveal you missed some steps. I'm still not sure, as I have only known about them since yesterday. But you never know...

Hope this helps.

knackered
03-15-2005, 03:48 AM
:D

jide
03-15-2005, 04:06 AM
Originally posted by knackered:
:D
And I'm still not sure because he posted only a few lines of code. It's only one part, not the whole.

For example, we don't know how he creates the queries, how he manages the bounding volumes, or how he manages the CPU side...

jide
03-15-2005, 04:15 AM
Originally posted by zed:

on my computer (5900XT) occlusion on runs ~200fps occlusion off ~300fps, the only difference is the inclusion of those 2 statements,
Well, I read almost all the thread. :)

I think, if I read the spec correctly, that this is not right. There should be more than two statements of difference.

Here is a code example from the spec:



GLuint queries[N];
GLuint sampleCount;
GLint available;
GLuint bitsSupported;

// check to make sure functionality is supported
glGetQueryiv(GL_QUERY_COUNTER_BITS_ARB, &bitsSupported);
if (bitsSupported == 0) {
    // render scene without using occlusion queries
}

glGenQueriesARB(N, queries);
...
// before this point, render major occluders
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_FALSE);
// also disable texturing and any fancy shaders
for (i = 0; i < N; i++) {
    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[i]);
    // render bounding box for object i
    glEndQueryARB(GL_SAMPLES_PASSED_ARB);
}

glFlush();

// Do other work until "most" of the queries are back, to avoid
// wasting time spinning
i = N*3/4; // instead of N-1, to prevent the GPU from going idle
do {
    DoSomeStuff();
    glGetQueryObjectivARB(queries[i],
                          GL_QUERY_RESULT_AVAILABLE_ARB,
                          &available);
} while (!available);

glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_TRUE);
// reenable other state, such as texturing
for (i = 0; i < N; i++) {
    glGetQueryObjectuivARB(queries[i], GL_QUERY_RESULT_ARB,
                           &sampleCount);
    if (sampleCount > 0) {
        // render object i
    }
}

So either I'm totally wrong about occlusion queries (which could be true because I'm just beginning with them), or you are.

As I understand this sample code, using occlusion queries takes more than 2 additional function calls.

Can someone say whether I'm wrong or not?

zed
03-15-2005, 08:17 AM
i was using them more like in the second/third example in the spec (since im drawing objects with multiple passes)

ie this code (with about 200 draw commands)

// configure shader 0
for (i = 0; i < N; i++) {
    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[i]);
    // render object i
    glEndQueryARB(GL_SAMPLES_PASSED_ARB);
}

true, for occlusion to be useful u have to also query it.
but for me just adding the above 2 lines drops the framerate from 300fps to 200fps, thus somehow just adding those 2 lines throws the rendering onto a slow path.

in the captain courgette game i posted u can turn occlusion on/off, this is using occlusion with query as well (thus the occlusion path uses fewer draw commands) though even here occlusion on is still slower than occlusion off

yooyo
03-15-2005, 09:12 AM
Why do you render all faces? Instead, after the first z-fill pass, you can render only the bounding boxes of octree nodes (or objects) using occlusion queries.
This should remove a lot of octree nodes or objects from further passes.

yooyo

jide
03-15-2005, 09:50 AM
yes, as yooyo and I understand it, you may have forgotten to issue the occlusion tests with bounding boxes.
If you draw all your full objects to issue your occlusion queries, then the gpu has much more work to do to test whether samples pass or fail.

Also, try out the single pass.

zed
03-15-2005, 07:03 PM
Originally posted by yooyo:
you can render only bounding boxes of octree (or objects) using occlusion query
i can see many cases where this can be slower, and esp in my case where 95+% of actual geometry is NOT occluded, using BBs will give an even worse result.

anyways yooyo jide, youre missing the main point of why i posted this topic, which is:
enabling occlusion query as specified in the spec (if u are already drawing the object) should come at little or no cost. BUT in my app it comes at a big cost (either from a bug in the drivers or falling off the fast path)

// pieces from the spec follow

In multipass rendering situations, however, occlusion queries can
almost always save fill rate, because wrapping an object with an
occlusion query is generally cheap. See "Usage Examples" for an
illustration.

im using occlusion query like the following part in the spec

// First rendering pass plus almost-free visibility checks
...
...

Jan
03-16-2005, 01:50 AM
Actually, does anyone know something about the status of NV_conditional_render (or so) ? I thought occlusion queries could become one of the easiest and most efficient optimizations with this extension, but there seems to be no plan to implement it, yet.

Jan.

Roquqkie
03-18-2005, 05:37 AM
Hmmm...

I still have mixed feelings towards hardware occlusion-queries.

I think an *extremely* efficient software-rasterizer (only render bounding-volume geometry to a software 32bit z-buffer) combined with Hierarchical Occlusion Maps (HOMs) implemented with a 1-frame dependency is a better choice.

Since render-calls are asynchronous you can render your scene while rendering the next frame's HOM. This will lead to almost free occlusion-culling, given that you have an efficient technique to check projected bounding-volumes against your HOM in screen-space.

Best regards,
Roquqkie
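
The screen-space check mentioned above can be sketched in C as a conservative test of a projected bounding rectangle against a software depth buffer. This is a simplification under stated assumptions: it scans a single flat depth buffer rather than a hierarchy of occlusion maps, smaller depth values are taken to be closer to the camera, and `rect_is_occluded` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Conservative test of the screen-space rectangle [x0, x1) x [y0, y1)
 * against a software depth buffer (smaller value = closer).
 * box_near_z is the closest depth of the projected bounding volume.
 * Returns true only when every covered pixel already holds something
 * closer than the box. A rectangle that lies entirely outside the
 * buffer covers no pixels and is therefore reported as occluded. */
bool rect_is_occluded(const float *zbuf, int width, int height,
                      int x0, int y0, int x1, int y1, float box_near_z)
{
    assert(zbuf != NULL && width > 0 && height > 0);

    /* Clamp the rectangle to the buffer bounds. */
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;
    if (x1 > width) x1 = width;
    if (y1 > height) y1 = height;

    for (int y = y0; y < y1; ++y) {
        for (int x = x0; x < x1; ++x) {
            if (zbuf[(size_t)y * (size_t)width + (size_t)x] >= box_near_z)
                return false; /* the box may be visible at this pixel */
        }
    }
    return true;
}
```

A hierarchical version would test coarse, conservatively downsampled levels first and only descend to finer levels when the coarse test is inconclusive.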

Adrian
03-19-2005, 01:08 AM
Originally posted by Jan:
Actually, does anyone know something about the status of NV_conditional_render (or so) ? I thought occlusion queries could become one of the easiest and most efficient optimizations with this extension, but there seems to be no plan to implement it, yet.

Jan.
I'm interested in this too. It's been almost a year since we were told more information was coming "soon".

http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011740

bChambers
03-19-2005, 09:21 AM
Originally posted by zed:
enabling occlusion query as specified in the spec (if u are already drawing the object) should come at little or no cost. BUT in my app it comes at a big cost (either from a bug in the drivers or falling off the fast path)
How complex is your scene (you seem to stress that it isn't very)? The impact of enabling a feature like this would actually be inversely proportional to the scene complexity.

To take an analogy from assembly programming, if you add 5 cycles to a 50 cycle loop, you're increasing the length of the loop by 10%. If you add 1 cycle to a 3 cycle loop, you're increasing the length by 33%.

IOW, since your scene is simple the overhead is relatively large. When your scene is more complex (ie, you're getting sub-50 fps) you probably won't notice the overhead (might drop by 1 or 2 fps).

...Chambers
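
The loop analogy above reduces to one line of arithmetic, sketched here in C. The function name is hypothetical; the units cancel, so the same helper works for cycles or milliseconds.

```c
#include <assert.h>

/* Percentage by which a fixed added cost lengthens a base cost.
 * Both arguments must use the same unit (cycles, ms, ...). */
double overhead_percent(double base_cost, double added_cost)
{
    assert(base_cost > 0.0);
    return 100.0 * added_cost / base_cost;
}
```

With the figures from the analogy, 5 cycles added to a 50 cycle loop gives 10%, and 1 cycle added to a 3 cycle loop gives about 33%.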

Obli
03-24-2005, 12:51 AM
Originally posted by Adrian:
I'm interested in this too. It's been almost a year since we were told more information was coming "soon".
http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011740
I also was interested and I still am.
Maybe it's time to get an update from the vendor? After all, NV will be working on proto-GF8 right now.
By the way, how many people are really using occlusion queries? I'm quite curious.

Bryan Dudash
04-05-2005, 08:33 PM
Hi all,

So I've been pointed to this thread. I'm part of NVIDIA's developer technology group. And I figured I'd comment a little.

Occlusion queries aren't completely free, but shouldn't have that much overhead. zed, I got your sample application from Cass and ran some tests. I was able to set up a similar system to yours (AMD, FX5900U, 71.83, which is the latest beta driver; not sure where you got the 71.90... :p ) and am able to repro a 1.16ms increase in frame time between "using occlusion" and not. I also ran the sample on a GF6800U and did not see any difference between the two modes. It is possible that something wonky is going on here, and I've filed an internal bug. We'll see if we can't get to the bottom of this. Not sure how long it will take though. Just thought I'd post and mention that for now it seems to not occur on GF6xxx series.

Also, any more info on this issue that you have would of course be helpful. :D

Thanks!

-B

zed
04-06-2005, 10:00 AM
"repro a 1.16ms increase" doesnt sound much but thats from 4-5ms total i take it, so it is a sizable chunk

heres the test app if anyone wants to try it (spacebar changes between occlusion on/off)
http://motueka.homeip.net/kea_occ.rar
it would perhaps be helpful to know if it affects other gffx cards eg gf5200,gf5700 or is it just restricted to 5900.
this has become a bit more important for me now, since ive made the camera angle more horizontal (and thus meshes have a higher chance of being occluded), also ive started using occlusion for light coronas as well

Adrian
04-06-2005, 10:05 AM
I get the error
'The procedure entry point SDL_HasRDTSC could not be located in the dynamic link library SDL.dll'

zed
04-06-2005, 08:10 PM
you must be using a different version of sdl (sorry i should have included it)
anyways sdl.dll and sdl_mixer.dll are included here
http://motueka.homeip.net/CC_ver2.rar ~600kb

Adrian
04-06-2005, 11:17 PM
Thanks, but now I get the error
"textures/terrain/heights_terrain.tga texture not found"

The textures folder was in the first download but not the second, regardless though, that particular texture isnt in either download.

zed
04-06-2005, 11:53 PM
sorry i should have explained better
http://motueka.homeip.net/kea_occ.rar is the version to run
this -> http://motueka.homeip.net/CC_ver2.rar is only so u can get sdl.dll sdl_mixer

but to make it even more straightforward ive uploaded a new version of http://motueka.homeip.net/kea_occ.rar with sdl.dll + sdl_mixer.dll included

Adrian
04-07-2005, 08:22 AM
The heights_terrain.tga texture is not in *any* of the downloads.

zed
04-07-2005, 10:07 AM
Originally posted by Adrian:
The heights_terrain.tga texture is not in *any* of the downloads.
true (but neither are most of the textures since i removed them to keep the download small), they arent needed to run this demo ->
http://motueka.homeip.net/kea_occ.rar
(i just downloaded it, uncompressed it and ran it, so it does work. like i said it may complain about a lot of missing textures but thats intended, it will still run)

if u really insist in having the textures then check here :)
http://motueka.homeip.net/captain_courgette.html

Jan
04-07-2005, 01:46 PM
I WANT MY MONEY BACK !

The demo runs on my ATI-card! Well, it has errors, the font-texture is used for everything, so it looks quite bad (i insisted on having all textures), but it doesn't crash my pc and not even the app itself crashes !

I am very disappointed in you.

Jan.

zed
04-07-2005, 05:51 PM
this is turning into a farce :) oh well it is friday afternoon and ive gotten the afternoon + tomorrow off work
jan does pressing the spacebar and turning occlusion on/off change the fps rate? (+ yes the alphabet texture used everywhere is intended, if u r a masochist then run the other demo CC_ver2 with the textures, it will crash the computer if u have an ati machine)

Jan
04-08-2005, 01:51 AM
Well, there are two nervous numbers on the bottom right, which might be FPS. But since they get overdrawn by some crazy line, i am not sure.

Anyway, pressing space-bar seems not to affect them.

Jan.

zed
04-08-2005, 10:34 AM
Originally posted by Jan:
Well, there are two nervous numbers on the bottom right, which might be FPS. But since they get overdrawn by some crazy line, i am not sure.
min/max fps and polys/second polys/frame, yeah it does sort of jump around a bit cause everything is running much faster than intended (im just drawing everything with a single material instead of multiple materials)

Originally posted by Jan:
Anyway, pressing space-bar seems not to affect them.
thanks for that jan, so it does seem as if gffx cards do have an issue with occlusion that doesnt affect ati or gf6xxx. ill have to dig deeper