Thread: Occlusion Culling with FBO+PBO+glReadPixels_async

  1. #11
    Senior Member OpenGL Guru
    Join Date: May 2009
    Posts: 4,537
    That's not what that means. GL_QUERY_COUNTER_BITS is the number of bits of the sample counter within a query object, that is, how many samples a single query can count. If it returns 24, then a particular occlusion query object can only count up to 16,777,216 samples. So if you render more samples than that within a single query run, the SAMPLES_PASSED count will be undefined.

    It has nothing to do with how many query objects you can get, nor does it affect how many query objects can be active at any one time. As long as you don't try to begin a query while that query is active, you're fine.
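
    For reference, a minimal sketch (assuming a current GL context; the variable name is purely illustrative) of how an application can read that limit:

    Code:
    GLint counterBits = 0;
    // Ask how many bits the SAMPLES_PASSED counter of a query object has.
    glGetQueryiv(GL_SAMPLES_PASSED, GL_QUERY_COUNTER_BITS, &counterBits);
    // counterBits == 0 would mean the implementation cannot count samples at all;
    // counterBits == 24 means a single query counts only about 16.7 million samples
    // before the SAMPLES_PASSED result becomes undefined.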

  2. #12
    Junior Member Newbie
    Join Date: Nov 2011
    Posts: 24
    I've tried Conditional_Render...but the result is worse.
    Logic: with conditional_render I don't need a "First Pass" to loop through objects, wait for query results, set an "occluded" variable to "true" and verify it later during the Rendering Pass...so I suppose the queries can be issued in the same "Final Rendering Pass". Is that right?

    I've done this test with the SAME world/objects/camera of other tests in first page of this thread.


    1. for each 3D object:
      1. disable glColorMask(...)
      2. disable glDepthMask(...)
      3. BeginQuery(...)
      4. glDraw() object
      5. EndQuery()
      6. enable glColorMask(...)
      7. enable glDepthMask(...)
      8. glBeginConditionalRender(..., GL_QUERY_WAIT)
      9. glDraw() object
      10. glEndConditionalRender()
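
    A minimal sketch of that loop (assuming one query object per 3D object created beforehand with glGenQueries, and drawObject() being my own draw helper):

    Code:
    for (int i = 0; i < objectCount; ++i) {
        // occlusion test: rasterize the object without writing color or depth
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);
        glBeginQuery(GL_SAMPLES_PASSED, queries[i]);
        drawObject(i);
        glEndQuery(GL_SAMPLES_PASSED);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);

        // real draw, executed only if the query counted at least one sample
        glBeginConditionalRender(queries[i], GL_QUERY_WAIT);
        drawObject(i);
        glEndConditionalRender();
    }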



    With NV_conditional_render (core since OpenGL 3.0) I get 98fps, vs 120fps with ARB_occlusion_query and 144fps with my PBO+FBO+glReadPixels technique.

    I've also tested it with other glBeginConditionalRender parameters: GL_QUERY_NO_WAIT and even WAIT/NO_WAIT combinations with
    GL_QUERY_BY_REGION_*. Same results.

    Maybe I'm using it the wrong way...please confirm.

    However, with conditional render I'm not able to set any "occluded" variable to exclude hierarchical object children (which could be contained INSIDE it, e.g. organized by an octree), because my application gets no result back from the occlusion test: everything is left to the GPU, which simply skips the draw when the previous query failed. From the application we cannot know whether a parent is occluded and then exclude its children...

  3. #13
    Advanced Member Frequent Contributor
    Join Date: Apr 2010
    Location: Germany
    Posts: 821
    Unless I overlooked something, you seem to neglect the fact that you can also batch multiple bounding boxes (or whatever test geometry you're using) into a single cumulated object and first test that. If this doesn't yield any passed samples, you can throw away the complete batch all at once. Unless your logic for computing the box minimum and maximum is expensive, this can substantially improve OC time with hardware occlusion queries, especially in scenes with high depth complexity. In your case, you could throw out a whole town with a single occlusion query. Not to mention loads and loads of vegetation and all the other good stuff in a virtual world.

    EDIT: The algorithm then becomes:

    a) frustum cull the scene
    b) render large occluders to lay out some depth
    c) group potentially visible objects in spatially coherent areas
    d) for all groups:
    1. render the biggest bounding volume
    2. if any samples pass, split the volume into subvolumes
    3. render subvolumes
    4. goto 2
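
    A rough sketch of step d), assuming an application-side Group type that knows its children, plus drawBoundingVolume()/drawContents() helpers (all names are illustrative, not something from this thread):

    Code:
    void cullGroup(const Group& group, GLuint query)
    {
        // render the group's bounding volume with color/depth writes disabled
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);
        glBeginQuery(GL_SAMPLES_PASSED, query);
        drawBoundingVolume(group);
        glEndQuery(GL_SAMPLES_PASSED);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);

        GLuint samples = 0;
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);   // CPU round trip
        if (samples == 0)
            return;                        // whole group occluded, skip all children

        if (group.children.empty())
            drawContents(group);           // leaf group: render the actual objects
        else
            for (const Group& child : group.children)
                cullGroup(child, query);   // refine into subvolumes
    }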


    Of course, the worst-case complexity is higher, since you would have to test every single object anyway but introduce additional queries for the enclosing bounding volumes. However, you can track the number of tests per frame, and if the number of queries surpasses the number of objects for multiple frames, you can switch to your current approach dynamically. Just don't forget to switch back every once in a while to see if you benefit.

    BTW, if an object consists of only a few faces and its shader is simple enough, consider not testing it at all, as rendering it will not incur significant cost - neither for vertex nor for fragment processing.

    All in all, occlusion culling isn't an easy task as it depends on many factors like depth complexity, camera movement speed/direction/orientation, shader and mesh complexity (cost/benefit ratio), changes of scene geometry (i.e. frame 0 is fine, frame 10 isn't because something big moved relative to the camera) and so on.
    Last edited by thokra; 02-06-2013 at 05:52 AM.

  4. #14
    Senior Member OpenGL Guru Dark Photon
    Join Date: Oct 2004
    Location: Druidia
    Posts: 2,847
    Perhaps you should consider just doing an end-run around explicit hardware occlusion queries, conditional-render predicated or not, and reading from a depth texture directly for your occlusion test. There are nice benefits there for batching (no more shuffling around a bazillion little IDs). Check out the MIPed depth buffer (Hi-Z) approach used in March of the Froblins and on rastergrid's site, among other places.
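
    In a nutshell, you build a depth mip chain where every level holds the maximum depth of the 2x2 texels below it (a max reduction, so plain glGenerateMipmap won't do), then test an object's screen-space bounds against the appropriate level. A host-side sketch of the reduction loop, assuming a max-downsample fragment shader (hiZProgram), an FBO, a depth texture depthTex bound to the active unit with numLevels mip levels, and <algorithm> for std::max - all names are illustrative:

    Code:
    glUseProgram(hiZProgram);                  // fragment shader writes the max of 2x2 depth samples
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glEnable(GL_DEPTH_TEST);                   // depth writes require the depth test to be enabled
    glDepthFunc(GL_ALWAYS);                    // we only want to overwrite depth, not test it
    glDepthMask(GL_TRUE);
    for (int level = 1; level < numLevels; ++level) {
        int w = std::max(1, baseWidth  >> level);
        int h = std::max(1, baseHeight >> level);
        glViewport(0, 0, w, h);
        // sample only the previously written level
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, level - 1);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL,  level - 1);
        // write into the current level
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                               GL_TEXTURE_2D, depthTex, level);
        drawFullScreenTriangle();              // application helper
    }
    // restore the full mip range before sampling the chain in the culling pass
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, numLevels - 1);
    glDepthFunc(GL_LESS);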

  5. #15
    Junior Member Newbie
    Join Date: Nov 2011
    Posts: 24
    Quote Originally Posted by thokra
    you can batch multiple bounding boxes into a single cumulated object and first test that
    In this case I need to waste many CPU cycles doing matrix multiplications CPU-side.
    I'm using a UniformBufferObject to pass the matrices to the shader once (which then does all the multiplications, GPU skinning, etc...), but this way I push all that work back onto the CPU to build a VBO which, as it is, I also cannot re-use for the final render.
    Anyhow, "merging" VBOs for this purpose is not a solution, because a character inside a house is "never" visible from outside (assuming no windows or open doors), so merging both VBOs is useless: the character just needs to be excluded from the occlusion_query and rendering phases, not computed during the occlusion test.
    I've got a hierarchy like the one below:
    • Terrain
      • House
        • Saloon
          • Character 1 watching TV
          • Sofa
          • Television
          • Stairs

        • Bedroom
          • Bed
          • Lamp
          • Character 2 taking a bath


    And if the House test fails, I'm not interested in testing its children.
    An example would be a dungeon inside/behind a mountain wall: I can group the whole dungeon in a single hierarchical structure and decide not to test its children if this parent object is behind the mountain that is in front of me.
    Your way, I would have to merge all VBOs into one big one and test occlusion on that whole VBO...and I don't think that's a good solution.

    Quote Originally Posted by thokra
    If this doesn't yield any passed samples, you can throw away the complete batch all at once
    Yes, I understand your idea....but merging many objects into one big VBO is not a solution. Imagine a big house with a lot of furniture or characters inside it: should I merge all of them into the test VBO? What a waste of CPU...
    And this merged VBO is not re-usable during the real render phase, for obvious reasons I don't need to explain.

    Quote Originally Posted by thokra
    you could throw out a whole town with a single occlusion query
    But if the single-and-big test passes, then I have to render all those objects...or execute additional occlusion_queries for each of its children. That doesn't seem to be the best idea.

    Quote Originally Posted by thokra
    b) render large occluders to lay out some depth
    How can I know, CPU-side, if the object will be BIG on screen once rendered? A small stone could be bigger (in rendered pixels) than a house if the house is 10 miles farther away than the stone. To know whether an object will be big or small (in screen pixels) I would need too many calculations about its zDepth, bounding coords and how much of the object is still on screen.
    I would also need to sort objects from front to back...and that is not simple in many cases.

  6. #16
    Advanced Member Frequent Contributor
    Join Date: Apr 2010
    Location: Germany
    Posts: 821
    Quote Originally Posted by tdname
    In this case I need to waste many CPU cycles doing matrix multiplications CPU-side.
    Quote Originally Posted by tdname
    Yes, I understand your idea....but merging many objects into one big VBO is not a solution. Imagine a big house with a lot of furniture or characters inside it: should I merge all of them into the test VBO? What a waste of CPU...
    And this merged VBO is not re-usable during the real render phase, for obvious reasons I don't need to explain.
    You are aware that you only need to compute the min and max of all enclosed objects, right? You need exactly one unit cube in a buffer object, which you can then scale in the vertex shader to cover the space designated by your min/max. No uploads, no nothing. That's very cheap compared to a multitude of occlusion queries. Where did you read that you should merge VBOs for this purpose? I never mentioned that you should move any of the real objects anywhere. You only need to set a few uniforms and render the scaled unit cubes.
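
    To make that concrete, a sketch (using GLM for the matrix math; bbMin/bbMax come from the CPU-side min/max pass and the unit cube VAO, with a 36-index element buffer, is created once at startup - all names are illustrative):

    Code:
    // Map a unit cube spanning [0,1]^3 onto the group's axis-aligned bounding box.
    glm::vec3 extent = bbMax - bbMin;
    glm::mat4 boxModel = glm::translate(glm::mat4(1.0f), bbMin) *
                         glm::scale(glm::mat4(1.0f), extent);

    glUseProgram(bboxProgram);                 // trivial position-only shader
    glUniformMatrix4fv(uModelLoc, 1, GL_FALSE, glm::value_ptr(boxModel));

    glBeginQuery(GL_SAMPLES_PASSED, groupQuery);
    glBindVertexArray(unitCubeVAO);            // the single shared unit cube, uploaded once
    glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_SHORT, nullptr);
    glEndQuery(GL_SAMPLES_PASSED);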

    Quote Originally Posted by tdname
    And if the House test fails, I'm not interested in testing its children.
    That's the whole point. The house's bounding box naturally encloses what's inside the house. You test the house and if it fails, everything inside is out. However, there are vast areas in your scene which are spatially coherent, i.e. close to each other, but disjoint in nature. A tree will not contain another tree. Still, if they are close, you can simply combine their bounding volumes, check that single volume, and maybe cull both objects with a single occlusion query.
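
    Combining two volumes that way is just a component-wise min/max, e.g. for AABBs (AABB here is an illustrative struct using GLM, not something from this thread):

    Code:
    struct AABB { glm::vec3 min, max; };

    // Smallest box enclosing both a and b: component-wise min of the minima, max of the maxima.
    AABB merge(const AABB& a, const AABB& b)
    {
        return { glm::min(a.min, b.min), glm::max(a.max, b.max) };
    }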

    Quote Originally Posted by tdname
    How can I know, CPU-side, if the object will be BIG on screen once rendered?
    Is your terrain not big? If there's only the terrain functioning as a large occluder, so be it. Nobody's talking about spatially small objects.

    Quote Originally Posted by tdname
    But if the single-and-big test passes, then I have to render all those objects...or execute additional occlusion_queries for each of its children. That doesn't seem to be the best idea.
    It doesn't? Well, if you can reduce the number of occlusion queries by 50% instead of 100%, isn't that still a good idea? As I already said above, you may well get worse performance if you don't have a lot of occlusion. Otherwise this should almost always be a win. You don't go directly from the big occlusion test object to all enclosed objects. You test one object, then 2 (or 4 or 8 ...) and so on. If you enclose 100 objects and only need to do 20 queries instead of a full hundred, that's a big win.

    EDIT: BTW, you know what bounds multiple disjoint objects in a nice way? A spatial data structure like an octree or quadtree. This is actually the cheapest way of testing large sections. Using what you already have, you can start at some level below the root (because smaller objects most likely won't intersect multiple big cells) and hierarchically refine the queries until you reach your maximum depth. If you actually reach maximum depth, you would need to test the contents of the whole cell, or manually group objects in the cell and do it hierarchically again as described above.

    The topic is very extensive and fun to play with. However, I just remembered that you don't use any spatial data structure in your application, and that's not good. It ruins some perfect optimization opportunities, especially when doing frustum and occlusion culling.
    Last edited by thokra; 02-06-2013 at 07:05 AM.

  7. #17
    Junior Member Newbie
    Join Date: Nov 2011
    Posts: 24
    Quote Originally Posted by thokra
    Where did you read that you should merge VBOs for this purpose?
    You said: "batch multiple bounding boxes into a single cumulated object". To me that reads as "merge all VBOs into a single big VBO" (a VBO formed by bounding_boxes).
    However, to scale that cube I need to do extra calculations to transform a cube (with dimensions [1,1,1] at position [0,0,0]) to dimensions [1.2, 0.5, 4.7] at position [x1, y1, z1]. Yeah, it's "simple" and quicker than merging everything into one single VBO as I had understood, but it seems like a dirty technique.

    In my case I'm using interleaved VBOs with the first 8 vertices (as an offset) being the bounding_box min/max: "12345678VNTP,VNTP,VNTP,..." where 12345678 are V1,V2,V3,...V8, the bounding_box vertices, and VNTP is the interleaved format Vertex/Normal/Texcoord/Pickid.
    This way I've already stored the bounding_boxes and the picking values (for my method) in one single upload to the GPU.
    When I want to draw the bounding_box I just call glDrawArrays(GL_QUADS, 0, 8), and when I want to draw the complete object I call glDrawArrays(GL_TRIANGLES, 8, verticesCount).
    So I've already got the full bounding box coords stored on the GPU, and I just need to pass the translation/rotation matrix, which is the same pre-calculated matrix the CPU already stores and re-uses for the original object.
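
    In code, the setup looks roughly like this (a sketch; attribute locations, verticesCount and objectVBO are arbitrary names, glEnableVertexAttribArray calls are assumed done elsewhere, and GL_QUADS requires a compatibility profile):

    Code:
    // Interleaved layout: position (3 floats) + normal (3) + texcoord (2) + pickId (1) = 9 floats/vertex.
    const GLsizei stride = 9 * sizeof(GLfloat);
    glBindBuffer(GL_ARRAY_BUFFER, objectVBO);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride, (void*)0);                      // Vertex
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, stride, (void*)(3 * sizeof(GLfloat)));  // Normal
    glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, stride, (void*)(6 * sizeof(GLfloat)));  // Texcoord
    glVertexAttribPointer(3, 1, GL_FLOAT, GL_FALSE, stride, (void*)(8 * sizeof(GLfloat)));  // Pickid ("Color0")

    glDrawArrays(GL_QUADS, 0, 8);                  // bounding-box vertices stored at the front
    glDrawArrays(GL_TRIANGLES, 8, verticesCount);  // the actual mesh follows the first 8 vertices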

    (PS: if I want to do a Picking Rendering Pass (using my technique with glReadPixels) I just activate the VBO's "Color0" attribute and switch to a different shader (which assigns colors according to Pickid), without any additional upload to the GPU)

    Quote Originally Posted by thokra
    You test the house and if it fails everything inside is out
    But with NV_conditional_render you are unable to do this.
    The application knows nothing about whether the previous occlusion test returned a visible or a completely occluded object, so you cannot use an IF statement (on what "occluded == true" variable?) to process or discard its children.
    Using ARB_occlusion_query is the closest working procedure to avoid this ConditionalRender limit.

    Quote Originally Posted by thokra
    Nobody's talking about spatially small objects
    Is a house a big or a small object? It depends on how close you are to it.
    If I put my face right in front of a wall, I can't see other houses, trees or other things, because the biggest object is the house and not the others. So it's all relative. In this example the house could be the biggest occluder, and not the terrain.
    Imagine a close-quarters FPS: most of the rendered pixels are objects and not terrain.
    I'm trying to manage all those cases and not just solve one simple, specific situation.

    Quote Originally Posted by thokra
    you don't use any spatial data structure in your application and that's not good
    Yes, I know, and I prefer to test the differences between techniques in this environment.
    Using a sort algorithm, octrees or another spatial arrangement, I might not be able to really test many different approaches.
    In fact, if I order objects from front to back and use octrees, I'm sure an ARB_occlusion_query is more precise and maybe quicker than my FBO+PBO+glReadPixels method, but to order and use octrees you need initial CPU work.

    Normally, how many things would you have to do to......
    1. know which object the cursor is on
    2. get the zDistance from the camera for many rendered pixels of that object
    3. discard objects whose pixel count is below a threshold (5px, for example, could mean the object is too small to be worth rendering)
    4. pass occlusion culling calculations to another thread (impossible with occlusion_queries, which must run in the main thread)
    5. know if a rendered object is near a viewport edge ("who is interested in this?"; it's not my problem)

    ......?

    I'll try to answer:
    1. raycast using mouse coords, convert those coords to world_space and intersect with objects to find the first one by its zDepth
    2. zDistances are calculated during frustum culling using bounding_box or sphere coords/radius, not for every single pixel
    3. for this point you need an additional ARB_occlusion_query call; NV_conditional_render cannot help
    4. maybe possible with multiple OpenGL contexts, but I've not tested it, so I'll consider it impossible
    5. maybe with additional calculations using the frustum planes (am I right?)...but I'm not sure, nor am I much interested to know


    • something more is required:
      • sorting objects from front to back
      • a zDepth pass to intersect with the mouse raycast


    I think it depends on what you need.
    I started with Color Picking Selection and quickly improved its efficiency into this FBO+PBO+glReadPixels method.
    Single-pixel precision is not so important to me, and having all the data within a single readback is sweeter than implementing 10 different tricks to get the same information in the end.

  8. #18
    Advanced Member Frequent Contributor
    Join Date: Apr 2010
    Location: Germany
    Posts: 821
    Quote Originally Posted by tdname
    You said: "batch multiple bounding boxes into a single cumulated object". To me that reads as "merge all VBOs into a single big VBO" (a VBO formed by bounding_boxes).
    You don't get it. I never said anything about moving any data. I said you should batch-process multiple objects, which simply means you don't look at single objects but at a collection and try to make a decision for that collection. In the case of bounding boxes this only means determining the minimum and maximum extents in space. That alone is enough to compute a bounding box. So far, this is all CPU work. If you allocate a single unit cube at application startup, you can render this cube for any bounding box in your application. All you need to do is properly transform it in the vertex shader. You don't have to allocate any new buffers and you don't transfer any data - at all. Got it now?

    Quote Originally Posted by tdname
    But with NV_conditional_render you are unable to do this.
    Who said anything about conditional rendering? I was always talking about hardware occlusion queries only - which makes sense, since in my suggested approach you still need round trips to the CPU. However, you can minimize the work the application has to do and the time it has to wait for query results by batching effectively.

    Quote Originally Posted by tdname
    Is a house a big or a small object? It depends on how close you are to it. If I put my face right in front of a wall, I can't see other houses, trees or other things, because the biggest object is the house and not the others. So it's all relative. In this example the house could be the biggest occluder, and not the terrain. Imagine a close-quarters FPS: most of the rendered pixels are objects and not terrain.
    First of all, the case you depict is very simple: You can use a bounding sphere as a coarse estimate and simply make your decision a function of distance and sphere radius. This way you can determine whether an object is a large occluder or not. This is merely an augmentation of your frustum culling step. This will handle even the smallest objects properly.
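
    A sketch of such a heuristic, projecting the bounding-sphere radius into an approximate on-screen size (fovY, viewportHeight and the pixel threshold are illustrative parameters; needs <cmath>):

    Code:
    // Approximate vertical screen coverage of a bounding sphere, in pixels.
    float projectedPixelRadius(float radius, float distance, float fovY /*radians*/, float viewportHeight)
    {
        if (distance <= radius)
            return viewportHeight;                       // camera inside the sphere: treat as huge
        float angularRadius = std::asin(radius / distance);
        return angularRadius / fovY * viewportHeight;    // fraction of the vertical FOV, in pixels
    }

    // Treat an object as a large occluder if its sphere covers more than some threshold.
    bool isLargeOccluder(float radius, float distance, float fovY, float viewportHeight)
    {
        return projectedPixelRadius(radius, distance, fovY, viewportHeight) > 100.0f;  // tune per app
    }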

    Second, in the close-quarters scenario you depict, a spatial data structure immediately comes in handy. One simple approach is to render only the cell the camera is currently in, which naturally discards many, many other cells. And this does not even mean you need to apply some sophisticated batching, as long as the cells in your data structure are sufficiently small. So, again, theoretically not a problem.

    Quote Originally Posted by tdname
    I'm trying to manage all those cases and not just solve one simple, specific situation.
    Good luck with that. You can't handle every possible case. I would go as far as to suggest that perfect (as in 100% correct) occlusion culling and high performance are mutually exclusive.
