PDA

View Full Version : 3D tile visibility



ugluk
07-22-2010, 08:30 AM
Currently I am using the following system for visibility determination of some 3D tiles, that cover the whole z = 0 plane:

- make an AABB of the frustum in world coordinates
- check for intersection with z = 0 plane, if not don't draw anything
- make AABBs of the tiles in world coordinates and check for frustum-AABB intersection in world space, if there's no intersection don't draw anything

By doing everything in world coordinates I only need to transform the frustum AABB from modelview to world space, otherwise I'd have to transform the tile AABBs into modelview coordinates.

Anyway, despite all the optimizations, my app still runs too slowly on old machines (about 20 fps) and I'd like to hear some ideas. Maybe I should be using spheres or calculate frustum-plane intersections exactly? The frustum is the linear perspective one.

Dark Photon
07-22-2010, 09:09 AM
Anyway, despite all the optimizations, my app still runs too slowly on old machines (about 20 fps) and I'd like to hear some ideas. Maybe I should be using spheres or calculate frustum-plane intersections exactly? The frustum is the linear perspective one.
Yeah, spheres are dirt cheap. Depending on your frustum, can also be a win to do a sphere-sphere intersect (one being your frustum's bsphere) first before doing a proper sphere-frustum intersect (which often involves a bunch of sphere-plane tests).

Also, having a hierarchy to your spheres/boxes such as you get from BVH or spatial partitioning scheme is usually a huge win.

carsten neumann
07-22-2010, 11:29 AM
Hierarchical culling is definitely the way to go, otherwise you only get an improvement if the render time of a tile is larger than the time to do the intersection test(s). Depending on the complexity of each tile that may be difficult to achieve.

ugluk
07-22-2010, 12:57 PM
Ok, but in what situations are then AABBs useful?

nickels
07-22-2010, 01:29 PM
I use AABB for 'broad-phase' collision detection with good results. I use the sweep and prune algorithm: Basically you keep three sorted lists of all the AABB's endpoints, by dimension (X,Y,Z).
After objects move, you sweep back through each list and do a bubble or insertion sort (I forget which...). While swapping, if a right endpoint crosses a left, then the two objects intersect in that dim. If a left end crosses a right, then you remove an intersection.
You then take the intersection of all three dimension's intersectees to get your potentiall set of intersecting objects.

If you only care about objects intersecting the frustrum, you could only add objects to the intersect pool if one of them is the frustrum....

works very fast for lots of objects on modern hardware....not sure what hope for older :p

I also have a higher level k-d tree each of whose leaves contains one of the sweep and prune sets described.... As objects intersect the outside world they are added to the adjoining k-d boxes and deleted when the leave....
I'm not sure about the frustrum, though. I haven't included it yet because: 1) It moves very fast, so many swaps?? and 2) it is very big and hence causes touching all groups of objects and enlarges the intersection test, above.

I also use view space AABB to do light occlusion culling and light screen space determination. I am a big fan of AABB !!

Is there a similar optimization like s & p for bounding spheres?

ugluk
07-23-2010, 03:21 AM
I'd like to know, as both spheres and AABBs are rather simple, in what situation to use which.

Dark Photon
07-23-2010, 06:01 AM
I'd like to know, as both spheres and AABBs are rather simple, in what situation to use which.
When you get a net perf advantage by doing so.

Spheres are cheap to test against but provide rather course bounds. So they're good "top level" tests. OBB (and sometimes AABB, with proper choice of space) can provide a tighter fit, but are more expensive.

It all depends on what inner operation you're protecting with the BVH. If it's a cheap operation, it may not be worth a lot of careful culling.

ugluk
07-24-2010, 01:06 AM
In other words, bench it and see? For now I can give these results:

- no culling: 9 fps
- culling, frustum - tile AABB: 20 fps

The other option is:

- culling, frustum - tile sphere

I thought maybe the rule of thumb was: on an old crappy comp, use spheres, on the newer ones, use AABB or OBB.

The problem is see with the frustum sphere is the need to calculate the bounding sphere for each frustum change. Frustum has 8 points, how do I select the 4 that are needed to calculate the bounding sphere directly? Can I just choose 2 points on the near and 2 on the far plane, like this:



+----
| |
----+


+ marks the spot for calculating sphere.

Dark Photon
07-24-2010, 07:51 AM
In other words, bench it and see? For now I can give these results:

- no culling: 9 fps
- culling, frustum - tile AABB: 20 fps
fps is non-linear. Use ms/frame. Says you're going from 111 ms/frame (no culling) to 50 ms/frame.


The other option is:
Oh no, there are lots of other options.


- culling, frustum - tile sphere
That's one. And by frustum I assume you mean culling against 6 planes that define the frustum, right?

So what are some other permutations?

1) frustum - tile-sphere, and then if it passes do frustum - tile-AABB
2) frustum-bphere - tile-bsphere
3) frustum-bsphere - tile-bsphere, and then if it passes do one or more of the other more expensive tests (frustum - tile-bsphere, frustum-bsphere - tile-AABB, frustum - tile-AABB).


I thought maybe the rule of thumb was: on an old crappy comp, use spheres, on the newer ones, use AABB or OBB.
It depends on the expense of the operation you're trying to protect and the expense of protecting it. That depends on more than just your CPU and GPU perf.


The problem is see with the frustum sphere is the need to calculate the bounding sphere for each frustum change.
No, this is very cheap. And even then, your frustum as defined in eye-space almost never changes. And if you cull in eye-space, then consequently the frustum bsphere almost never changes. Right? Even then, compared to culling the scene, this is a nothing in terms of cost.


Frustum has 8 points, how do I select the 4 that are needed to calculate the bounding sphere directly?
For simple symmetric frusta, I think you can usually get an exact answer by taking the midpoint of a frustum diagonal, and then just expand the bsphere by each frustum corner just to make sure.

But for the general problem of computing a bounding sphere for a set of points cheaply, google "Welzl's minimum bounding sphere".

Some other links to explore:

* View Frustum Culling Tutorial (http://www.lighthouse3d.com/opengl/viewfrustum/) (Lighthouse3D)
* Frustum Culling (http://www.flipcode.com/archives/Frustum_Culling.shtml) (Flipcode)
* Welzls Minimum Sphere Algorithm Demystified (http://www.gamedev.net/community/forums/topic.asp?topic_id=487375)

ugluk
07-25-2010, 12:26 PM
fps is non-linear. Use ms/frame. Says you're going from 111 ms/frame (no culling) to 50 ms/frame.


The problem is my that my timer is crap. It gives different results every time. I'd need to run some kind of average. Do you know of a good timing source for Linux? Anyway, the fps's I give here are just inverted timing values, i.e. 1/(time_between_the_frames_in_seconds), so I don't see a problem with them.



That's one. And by frustum I assume you mean culling against 6 planes that define the frustum, right?

So what are some other permutations?

1) frustum - tile-sphere, and then if it passes do frustum - tile-AABB
2) frustum-bphere - tile-bsphere
3) frustum-bsphere - tile-bsphere, and then if it passes do one or more of the other more expensive tests (frustum - tile-bsphere, frustum-bsphere - tile-AABB, frustum - tile-AABB).


Yes, by frustum I mean testing against all the 6 planes. As far as other permutations are concerned; I first calculate the set of tiles that fit inside a frustum AABB, and then check them all against the exact frustum, just to be sure they really are within and not just within the frustum AABB. So the frustum AABB already provides a good starting point. I suppose I need to test every possibility and arrive at a decision based on the tests.



It depends on the expense of the operation you're trying to protect and the expense of protecting it. That depends on more than just your CPU and GPU perf.


On newer cards I get more fps, if I don't do any exact frustum-tile AABB overlap tests at all, the cards are so fast, they don't mind clipping. But on older cards this is not the case. It would be interesting to find out, if the old card just isn't capable of gulping down all the geometry coming in at that resolution, despite the cull.

On a new card, I could do instancing, alas.



For simple symmetric frusta, I think you can usually get an exact answer by taking the midpoint of a frustum diagonal, and then just expand the bsphere by each frustum corner just to make sure.


This reminds me of the Ritter sphere algorithm. I kind of doubt it this gives the minimum sphere, as the sphere center is limited to the diagonal and there are more than 1 frustum diagonals, but only one minimum sphere is possible.



But for the general problem of computing a bounding sphere for a set of points cheaply, google "Welzl's minimum bounding sphere".

Some other links to explore:

* View Frustum Culling Tutorial (Lighthouse3D)
* Frustum Culling (Flipcode)
* Welzls Minimum Sphere Algorithm Demystified


One might argue, that Welzl's algorithm is not about calculating the minimum sphere, but to find out which points define it. There are up to 4 of them and if one knew in advance which 1 to 4 of the 8 were "right", then one would not need to run the Welzl's algorithm, but could directly calculate the minimum sphere.

Maybe I could make 1 experimental run of Welzl's algorithms, see what points it picks and then designate those points as always "right".

I'll just have to do some benching, I suppose.

PS: frustum - tile bsphere test also involves 6 planes and is not much cheaper than frustum - tile AABB test, I think.

ZbuffeR
07-25-2010, 05:01 PM
fps is non-linear. Use ms/frame. Says you're going from 111 ms/frame (no culling) to 50 ms/frame.
Anyway, the fps's I give here are just inverted timing values, i.e. 1/(time_between_the_frames_in_seconds), so I don't see a problem with them.
A little digression to precise the problem with FPS versus mulliseconds, say you have 3 sequential parts in a frame rendering :
A + B + C = total frame render time
Let's say :
1) 10ms + 3ms + 5ms = 18 ms = 55fps
But A is very variable, depending on the frame. In situation 2 you see instead :
2) 1ms + 3ms + 5ms = 9 ms = 111fps
Now you optimize C, reduced to 3 ms.
1) 10ms + 3ms + 3ms = 16 ms = 62fps (+7fps,+13%)
2) 1ms + 3ms + 3ms = 7 ms = 167fps (+112fps, +50%)

So you end seeing different rendering variations, whereas looking at the ms you would see in both cases the more intuitive "render time reduced by 2ms".

(of course parallelism between GPU and CPU operations makes the matter even more complicated)

Dark Photon
07-31-2010, 04:01 PM
The problem is my that my timer is crap. It gives different results every time. I'd need to run some kind of average. Do you know of a good timing source for Linux?
(Sorry for the delay. Away for the week.)

For Linux, especially for timing something as course as a whole frame, clock_gettime(CLOCK_MONOTONIC) is good. If you're not running NTP, then gettimeofday() is also a good choice:



timeval tv ;
gettimeofday ( &tv, 0 ) ;
double sec = tv.tv_sec + tv.tv_usec / 1000000.0;

Call before and after each frame, subtract, and then just average these delta times over a half a second or so and update the time you print then (every half second) -- e.g. "10.5 ms/frame". That way the result is readable.

The precision of gettimeofday() is 1 us, and nowadays on any half-recent kernel and half-recent x86 CPU, the accuracy should be 1 us too (from what I gather).

The only undesirable thing about gettimeofday() is that if you're running NTP or something which dynamically "monkeys with your system time" to keep it in-sync with an external clock base, the returned time can jump around a little. In these cases, it's better to use clock_gettime(CLOCK_MONOTONIC), which is guaranteed to be monotonically increasing (i.e. not "jump around"), and has higher precision as well.

For really precise measurements when timing elapsed time submitting draw commands to the GPU, use ARB_timer_query, which'll actually time elapsed time on the GPU with very fine precision (which is probably what you want).

ugluk
08-02-2010, 09:16 AM
For really precise measurements when timing elapsed time submitting draw commands to the GPU, use ARB_timer_query, which'll actually time elapsed time on the GPU with very fine precision (which is probably what you want).


Yeah and it's portable. Doesn't work with old GLs not having the extension :( How about Windows, what is good to use there? What is the best portable timer, other than the GL extension?

UPDATE:
I've tried clock_gettime() and the results are much better than using clock() alone. They are variable though. I guess, I'd need some kind of running average.