Occlusion Query Speed

Hello.
I want to discard some geometry using occlusion queries. It’s about 10-20 objects with 3,000 vertices each. How fast are occlusion queries? Is it faster to always draw this stuff, or is it faster to test it with occlusion queries first? Am I going to “find” more fps using queries?

Assuming each object is rendered once per frame, you’re going to be transforming between 30,000 and 60,000 vertices a frame. Modern cards can transform 600 million vertices a second, so I don’t think you need to worry about a vertex transform bottleneck.
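To put numbers on that estimate, here is a minimal C++ sketch of the arithmetic. The function name is made up for illustration, and the 600 million figure is the quoted peak rate from the post, not a measurement.

```cpp
#include <cassert>

// Time in milliseconds needed to transform `vertices` vertices at a given
// transform rate. The rate is an assumption supplied by the caller (here,
// the peak figure quoted in the post), not something this code measures.
double transformTimeMs(double vertices, double verticesPerSecond) {
    assert(verticesPerSecond > 0.0);
    return vertices / verticesPerSecond * 1000.0;
}
```

For the worst case in the question (20 objects x 3,000 vertices = 60,000 vertices), `transformTimeMs(60000.0, 600e6)` comes out at roughly 0.1 ms, a small fraction of a 16.7 ms frame at 60 fps. Real throughput will be lower than the peak figure, but there is a lot of headroom.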

With occlusion queries you potentially have an additional draw call per object, with its CPU overhead, plus additional fill rate overhead.

You’re better off spending time optimising the drawing of your objects, e.g. stripifying your objects to reduce the number of draw calls, and using VBOs.

Well… I use VBOs already. I added occlusion queries and have more fps now. Before, when some object was behind a wall, it was still drawn. Now, with occlusion queries, it’s not drawn and I get more fps. :wink:

Oh, the old ‘looking at wall’ example.
Hardly the general case though, is it?
Test the performance improvements you get with an average view of the scene. Say you were on the ‘farcry’ terrain, standing halfway up a hill, looking towards the horizon, with about 20-30% of the horizon occluded by the hill… now do your performance measurements. Each house in the distance is having its bounding box drawn, tagged with an occlusion query, in addition to being drawn at whatever LOD a house reduces to at 15km (a box, I imagine). You get the idea. This is just an example, don’t start getting pedantic about farcry’s LOD scheme.
Anyway, as Adrian has said (or implied), the main bottleneck these days is the number of batches being drawn, not the size of the batches. The more batches you draw (e.g. one glDrawRangeElements call = 1 batch), the more so-called CPU limited you become… in other words, you’re making more work for the driver as it manages your batches, while the graphics card waits for the CPU to give it the next batch to draw.
Now, if you start using occlusion queries like there’s no tomorrow - drawing query bounding boxes of every bucket and bottle of beer lying around - you’re creating more batches, with precious little work in each batch.
The best way of using occlusion queries is to test the cumulative bounding volume of a large number of batches, thus potentially saving yourself a lot of redundant draw calls. So, a portal scheme, or a geomipmapping scheme for terrain, where you test a high level terrain chunk before recursing, but stop doing occlusion tests below a certain level of recursion.
You get the idea, I’m sure.
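A rough C++ sketch of that "test high up, stop testing lower down" idea. Everything here is hypothetical: the `Node` layout, the function name, and the `boundsVisible` callback, which stands in for whatever occlusion query you run on a node's bounding volume.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical scene hierarchy stored as a flat array; node 0 is the root.
struct Node {
    std::vector<int> batches;   // ids of the draw calls owned by this node
    std::vector<int> children;  // indices of child nodes in the same array
};

// Appends the batches that should be drawn to `drawList`.
// `boundsVisible(index)` stands in for an occlusion query on the node's
// bounding volume. Nodes deeper than `maxQueryDepth` are not tested at all;
// they are drawn whenever their tested ancestors were visible.
void collectVisible(const std::vector<Node>& tree, int index, int depth,
                    int maxQueryDepth,
                    const std::function<bool(int)>& boundsVisible,
                    std::vector<int>& drawList, int& queriesIssued) {
    assert(index >= 0 && index < static_cast<int>(tree.size()));
    if (depth <= maxQueryDepth) {
        ++queriesIssued;
        if (!boundsVisible(index)) return;  // one query culls the whole subtree
    }
    const Node& node = tree[index];
    drawList.insert(drawList.end(), node.batches.begin(), node.batches.end());
    for (int child : node.children) {
        collectVisible(tree, child, depth + 1, maxQueryDepth, boundsVisible,
                       drawList, queriesIssued);
    }
}
```

The point of `maxQueryDepth` is that one rejected query near the top saves many draw calls, while queries near the leaves cost about as much as the batches they might save.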

You might want to read this:
http://www.gamedev.net/community/forums/topic.asp?topic_id=153497

Yann’s own hand-optimized ASM rasterizer seems like a very nice idea just for occlusion queries:

  • no bothering about anything like geometry batches
  • very simple drawing algorithm (just filling z buffer)
  • clever engine design allows for CPU / GPU parallelism (so you get occlusion culling for free - while GPU draws geometry batches from previous rendering frame :slight_smile: )

Occlusion culling for free? I don’t think so. In my experience, the CPU is plenty busy enough without giving it occlusion rasterizing jobs to do.

Why not? When, for instance, you’re already using some kind of early z-test optimization (which I think is getting more and more popular) or shadowing algorithms based on the contents of the z-buffer (shadow volumes), you get occlusion queries (almost) for free. Of course they can’t eliminate the need for good CPU-based scene management (like an octree or BSP tree), but they give a nice performance boost at almost no cost.

Originally posted by knackered:

Anyway, as Adrian has said (or implied), the main bottleneck these days is the number of batches being drawn, not the size of the batches. The more batches you draw (e.g. one glDrawRangeElements call = 1 batch), the more so-called CPU limited you become… in other words, you’re making more work for the driver as it manages your batches, while the graphics card waits for the CPU to give it the next batch to draw.

I thought batching was a problem for our dear DX programmers, and that OGL drivers didn’t suffer from it, or at least that it wasn’t as noticeable as it is in D3D. Am I wrong? Should I start to worry about batching?

Toni

Well, just set up a vertex buffer with a cube, then set up a vertex buffer with 200,000 cubes.
Now draw the single cube vertex buffer in a loop 200,000 times, and draw the 200,000 cube vertex buffer once.
Time them and compare, and you have your answer.
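If it helps, a minimal C++ timing harness for that comparison could look like the sketch below. The function name is made up, and the two draw routines you pass in are yours to write. One assumption to keep in mind: GL draw calls return before the GPU has finished, so each timed routine should end with glFinish(), otherwise you only measure how long it takes to queue the work.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `repeats` times and returns the average wall-clock time per
// run in milliseconds. `work` is a caller-supplied routine, for example
// "draw the single-cube buffer 200,000 times, then glFinish()" in one test
// and "draw the 200,000-cube buffer once, then glFinish()" in the other.
double averageMs(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}
```

Call it once per variant with the same `repeats` and compare the two averages.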

I’m using the following code path:

  1. zfill
  2. turn on occlusion query
  3. render the objects’ bounding boxes
  4. do something else (upload textures, …)
  5. collect occlusion query results
  6. render scene w/o occluded objects

This works fine for me with octree data.
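The ordering of those six steps can be sketched in C++ like this. The `FrameHooks` struct and `renderFrame` are hypothetical stand-ins for the renderer, not real API; the comments name the standard OpenGL query calls (glBeginQuery, glEndQuery, glGetQueryObjectuiv) that each hook would presumably wrap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical hooks into the renderer, one per step of the code path.
struct FrameHooks {
    std::function<void()> zFill;                         // 1. depth-only pass
    // 2-3. glBeginQuery(GL_SAMPLES_PASSED, id), draw object i's box,
    //      glEndQuery(GL_SAMPLES_PASSED)
    std::function<void(std::size_t)> issueBoxQuery;
    std::function<void()> doOtherWork;                   // 4. texture uploads etc.
    // 5. glGetQueryObjectuiv(id, GL_QUERY_RESULT, &samples)
    std::function<unsigned(std::size_t)> samplesPassed;
    std::function<void(std::size_t)> drawObject;         // 6. real geometry
};

// Runs one frame and returns how many objects were actually drawn.
std::size_t renderFrame(std::size_t objectCount, const FrameHooks& hooks) {
    hooks.zFill();
    for (std::size_t i = 0; i < objectCount; ++i) {
        hooks.issueBoxQuery(i);
    }
    // Unrelated work goes here so the GPU has time to finish the queries
    // before the results are read back.
    hooks.doOtherWork();
    std::size_t drawn = 0;
    for (std::size_t i = 0; i < objectCount; ++i) {
        if (hooks.samplesPassed(i) > 0) {  // at least one box fragment visible
            hooks.drawObject(i);
            ++drawn;
        }
    }
    return drawn;
}
```

Issuing all queries first and reading them back only after step 4 is what keeps the CPU from stalling on each result.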

Also, you can use NVX_conditional_render. With this extension you don’t have to collect the query results yourself. Before the second pass, just set the OQ id from the first pass and the driver will decide whether or not to render the tris.

yooyo

Originally posted by knackered:
Time them and compare, and you have your answer.
Perhaps I should put it another way. Of course a high number of batches must slow down rendering. What I meant was that in D3D you have to be very careful with batching and batch sizes, but I thought you didn’t have to be as careful in OpenGL, or, put another way, that you didn’t get as great a benefit from batching in OpenGL as you do in D3D. So, since I don’t usually draw the same object 200,000 times, and if I’m right that the benefit from batching isn’t as high as in D3D, perhaps batching isn’t as necessary as it is in D3D.
Of course it’s desirable to batch, as you will get better performance, but perhaps you can live without it, and as far as I know you can’t live without batching in D3D.

Err, just benchmark it. As I say, you’ll get your answer. Shouldn’t take you more than 20 minutes to do a direct compare between cubes in d3d and cubes in gl, and then you’ll know for sure without having to believe anyone. Make sure you use VBO in your gl test.
My renderer has both gl and d3d9 implementations, and as far as my tests go, batching is just as important in gl as it is in d3d.