VBOs and culling?



sammie381
08-06-2009, 06:18 AM
I would like to use VBOs and culling together, but I don't know the best way to do it. I was thinking about using
frustum culling because the camera's view could change every frame. This is not a terrain (just general polys),
so I can't store my data in tiles. I have to cull it per triangle.

Or if someone could point me to a code example that uses VBOs and culling, that would be ok.

zeoverlord
08-06-2009, 10:00 AM
Well, the thing is that your graphics card does that automatically.
I would suggest that you do something called macro culling. It's basically the same, except you do it on objects and not triangles.

The way you do it is pretty simple: essentially you take the bounding box vertices and pass them through the projection+modelview matrix, then remove any object whose corners all fall outside the same side of the view volume (in clip space, all below -w or all above +w on one axis).
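
A minimal sketch of the bounding-box test described above, not anyone's actual code from this thread. It assumes a column-major 4x4 matrix that is already projection * modelview, and an axis-aligned box given by its min and max corners. The names Vec3, Vec4, Mat4, toClip and boxOutsideFrustum are made up for illustration. The comparison is done in clip space against -w and +w, before the perspective divide, so corners behind the camera do not flip sign.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

// Column-major 4x4 matrix, the layout OpenGL uses: element(row, col) = m[col * 4 + row].
using Mat4 = std::array<float, 16>;

// Transform a point (implicit w = 1) by the combined projection * modelview matrix.
Vec4 toClip(const Mat4& m, const Vec3& p) {
    return {
        m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12],
        m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13],
        m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14],
        m[3] * p.x + m[7] * p.y + m[11] * p.z + m[15],
    };
}

// Returns true when all 8 corners are outside the same clip plane,
// which means the box cannot be visible and the object can be skipped.
bool boxOutsideFrustum(const Mat4& mvp, const Vec3& lo, const Vec3& hi) {
    assert(lo.x <= hi.x && lo.y <= hi.y && lo.z <= hi.z);
    int outside[6] = {0, 0, 0, 0, 0, 0};
    for (int i = 0; i < 8; ++i) {
        // Bits of i select min or max on each axis, enumerating the 8 corners.
        const Vec3 corner{(i & 1) ? hi.x : lo.x,
                          (i & 2) ? hi.y : lo.y,
                          (i & 4) ? hi.z : lo.z};
        const Vec4 c = toClip(mvp, corner);
        if (c.x < -c.w) ++outside[0];
        if (c.x >  c.w) ++outside[1];
        if (c.y < -c.w) ++outside[2];
        if (c.y >  c.w) ++outside[3];
        if (c.z < -c.w) ++outside[4];
        if (c.z >  c.w) ++outside[5];
    }
    for (int count : outside) {
        if (count == 8) return true;
    }
    return false;
}
```

The test is conservative: a box that straddles a corner of the frustum may be kept even though nothing of it is visible, which only costs a wasted draw call.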

sammie381
08-06-2009, 01:35 PM
Thanks, I know OpenGL does its own culling, but my intention is to cut down on the number of triangles I send to the GPU,
since memory is limited. The difference could be huge.

I suppose I have to create some sort of space partitioning structure of bounding boxes.
The scene is static, only the camera is moving. My only concern is that the VBO would then be updated every frame. I suppose that's ok?

Ilian Dinev
08-06-2009, 02:09 PM
You won't be updating the static VBOs every frame!!!
Never do back-face culling on the cpu!

Here's how modern graphics works: you have a VBO for each mesh (1-6000 tris). You upload those meshes on startup (OpenGL will send them to VRAM). Then, on every frame, you tell the gpu: "use this texture, this shader, and draw this whole mesh". Notice "whole mesh". The gpu will transform+cull the mesh so fast, that your cpu-based culling would have computed only 1%-10% of the triangles in the same time, not to mention the PCIe bandwidth you'd waste to send the geometry would be making the game even slower.
Modern gpus can compute and cull 300-700 million triangles/s. On a 3GHz cpu, you can do less than 5% of that.

Of course, it won't be nice to tell the gpu to draw the _whole_ scene. You'd want to keep some of that processing power invested in the fragment-shaders. Thus, segment your scene into medium-sized meshes, of 5000-60000 triangles, and do frustum-culling on them. Or even better, group objects in octree-like fashion, where you can cull whole collections of such medium-sized meshes with just one frustum-culling test.
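
A rough sketch of the octree-like grouping suggested above, under stated assumptions: each node stores a bounding box that encloses everything beneath it, and meshIds are hypothetical handles to meshes whose VBOs were uploaded once at startup. The frustum test itself is passed in as a callback so the traversal stays independent of how the frustum is represented. Node, Aabb and collectVisible are illustrative names only.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct Aabb { float min[3]; float max[3]; };

struct Node {
    Aabb bounds;                                  // encloses this node and all of its children
    std::vector<int> meshIds;                     // hypothetical handles to static meshes
    std::vector<std::unique_ptr<Node>> children;  // up to 8 for an octree, any count here
};

// Any frustum-vs-box test can be plugged in; it returns true if the box may be visible.
using VisibilityTest = std::function<bool(const Aabb&)>;

// Appends the ids of all potentially visible meshes to 'out'.
// A single failed test on a node rejects its whole subtree.
void collectVisible(const Node& node, const VisibilityTest& isVisible, std::vector<int>& out) {
    if (!isVisible(node.bounds)) return;
    out.insert(out.end(), node.meshIds.begin(), node.meshIds.end());
    for (const auto& child : node.children) {
        assert(child != nullptr);
        collectVisible(*child, isVisible, out);
    }
}
```

Per frame, only the resulting list of ids changes. The vertex data in the VBOs stays untouched, and each visible mesh is drawn whole.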

sammie381
08-06-2009, 02:24 PM
Thanks, but as I mentioned, memory is limited, which forces me to do my own culling. And if I add culling, then I don't see how I can take
advantage of static VBOs. I will need to update them every frame. And the number of triangles is over a million.

dletozeun
08-06-2009, 05:28 PM
What you need is not backface culling. As others said, it is already performed by the hardware, so don't waste your time doing it yourself. What you need is frustum culling, which is what zeoverlord described. I think you will easily find some papers on "frustum culling", "occlusion culling" or "contribution culling".

sammie381
08-06-2009, 06:26 PM
Read my first post, I never mentioned it. :)

Ilian Dinev
08-06-2009, 06:39 PM
If VRAM is the problem, then fret not - it's OpenGL's task to dynamically stream those meshes on demand. SysRAM always keeps a copy of each static VBO.
If you don't trust the GL implementation to nicely manage the vtx-data, you can use a streaming-VBO to do fire-and-forget streaming.
Doing the backface culling on cpu is generally silly, even if you're short on VRAM and RAM, and even if the gpu is rather old. (unless you work at InsomniacGames/NaughtyDog)

You can always partition static geometry nicely, to make those triangle-groups of 1k-60k tris, and frustum-cull whole groups.

A cpu simply cannot process 1+ million tris every frame at 60fps, so forget backface-culling and let the GPU pull data via DMA.

P.S. we're mentioning per-triangle backface-culling on cpu as being bad enough, because frustum-culling each and every triangle on cpu is even worse (I thought it's obvious).

_NK47
08-07-2009, 03:15 AM
If memory is an issue you can use bounding spheres consisting of 4 floats (position, radius) rather than OBBs to perform frustum culling. While they are processed faster, they generally fit meshes less tightly. An additional thing is a scene graph to mimic the relationship between objects, where a parent node holds a bounding volume that encloses all of its child nodes. If the parent node isn't seen you can cull the whole tree at once.
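
A small sketch of the bounding-sphere test, assuming the six frustum planes have already been extracted, have unit-length normals, and point inward (a point is inside when n·p + d >= 0). How the planes are obtained is left out, and Plane, Sphere and sphereInFrustum are illustrative names.

```cpp
#include <array>
#include <cassert>

// Plane in the form n.p + d = 0, with a unit-length normal pointing into the frustum.
struct Plane { float nx, ny, nz, d; };

// 4 floats per object: center position plus radius (16 bytes).
struct Sphere { float x, y, z, radius; };

// Returns false only when the sphere lies completely behind at least one plane.
bool sphereInFrustum(const std::array<Plane, 6>& planes, const Sphere& s) {
    assert(s.radius >= 0.0f);
    for (const Plane& p : planes) {
        const float signedDistance = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (signedDistance < -s.radius) return false;  // fully outside this plane
    }
    return true;  // inside or intersecting; draw it
}
```

That is one dot product per plane, compared with transforming eight corners for a box, which is where the speed advantage comes from.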

Note that every graphics programmer actually deals with 2 processors at a time, namely the CPU and the GPU. Try to avoid doing things on the CPU that the GPU does a whole lot faster.

sammie381
08-07-2009, 01:27 PM
Thanks, I think I'm gonna have many problems with this one. :( I can't load the whole scene in system memory or GPU memory.

Really need a lightweight loader, renderer, everything. :)

zeoverlord
08-09-2009, 04:19 AM
In that case you have to look into dynamic streaming of your data and various LOD techniques. Maybe you could also check whether some of the data can be reused.

The VRAM can hold a pretty impressive amount of data, so if you're running out of space it must be huge.

sammie381
08-10-2009, 02:29 PM
Well, I think most of my memory problems come from the fact that I have to do some pre-processing before I can load the data
into GPU memory. The data on file is not in a GPU-friendly format. It's just hard to avoid unnecessary memory allocations
when you're trying to re-arrange and sort the data into something more manageable. :)

For example, with 1GB of GPU memory, I think I can get about 10 million triangles into memory no problem.
With 3 vertices/normals/uvs per triangle, that's 30 million verts/normals/uvs. At 12 bytes per vert/normal and
8 bytes per uv, that comes out to 30*12 + 30*12 + 30*8 = 960 million bytes.
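
The same arithmetic as a small compile-time check, assuming 4-byte floats and no vertex sharing between triangles. The constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Per-vertex attribute sizes, assuming 32-bit floats.
constexpr std::uint64_t kPositionBytes = 3 * sizeof(float);  // x, y, z
constexpr std::uint64_t kNormalBytes   = 3 * sizeof(float);  // nx, ny, nz
constexpr std::uint64_t kUvBytes       = 2 * sizeof(float);  // u, v
constexpr std::uint64_t kVertexBytes   = kPositionBytes + kNormalBytes + kUvBytes;

// Memory for a triangle list with 3 unshared vertices per triangle.
constexpr std::uint64_t unindexedBytes(std::uint64_t triangles) {
    return triangles * 3 * kVertexBytes;
}

static_assert(sizeof(float) == 4, "the estimate assumes 4-byte floats");
static_assert(unindexedBytes(10000000) == 960000000, "10 million triangles = 960 million bytes");
```

That works out to 96 bytes per triangle, so 10 million triangles fit under 1GB with a little room left for textures or other buffers.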