checkout my attempt to maximize geometry throughput



bobwall
01-25-2006, 06:16 PM
Hello. I recently wrote a volume rendering program for my graphics class, and I think I came up with something that other people can use.

See http://www.people.virginia.edu/~yz5d/projects/maximizing%20geometry%20throughput.html

However, my implementation is not perfect, and I'd like suggestions/comments. One major bug that I can't figure out is why I still occasionally see random triangles being drawn. I know this is caused by overwriting a set of shared vertices while they are still in use, but I thought I had dealt with that.

dorbie
01-25-2006, 10:47 PM
Your graphics hardware/driver is effectively running asynchronously.

Your lack of a buffer lock means you have to synchronize the rendering with the updates so you don't corrupt volatile data.

It also depends on how you create the VBO, and that has performance implications too. For example, STATIC_DRAW describes usage, and such a buffer is likely to be stored on the card, where a lock is expensive; storing it elsewhere can slow rendering (although PCI Express clears things up a bit). Locking a large VBO is only expensive if it's stored on the graphics card. Otherwise the lock is a memory synchronization mechanism rather than a copy, which is not necessarily expensive but can appear so depending on the exact execution order. Ultimately it should work itself out unless you're doing something inadvisable w.r.t. updates.

Clearly you need to sync, because you have a real mess there w.r.t. asynchronous updates, and VBOs are the mechanism for that. You just need to tell the driver how you're using the VBO so it can store it in the appropriate memory and manage synchronization. Your job is to avoid the stall by doing something else interesting after you issue the draw, instead of something silly like immediately screwing around with the buffer data. Ultimately you have no direct control over the number of queued commands other than to stall, and stalling on a lock is better than corrupting data. Heck, it's even advisable, but if you want to reduce the likelihood of a stall, create multiple buffers (a glFinish alternative would be a sledgehammer by comparison and inevitably more expensive than a lock).

If you don't use VBOs, there can be hidden penalties, like memory copies that you don't see but that the driver has to do because it must assume application data is volatile, precisely to avoid the kind of problem you've created with your trick.
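
To make the "create multiple buffers" suggestion concrete, here is a minimal sketch of round-robin buffer selection. It is only a CPU-side model with no OpenGL calls: the class name BufferRing, its methods, and the draw-id bookkeeping are all hypothetical and are not taken from the code under discussion. In a real renderer, knowing which draws have finished would have to come from a fence or from the driver's own synchronization when the buffer is mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a ring of vertex buffers. Each slot remembers the id
// of the last draw call that read from it (0 means "never drawn from").
class BufferRing {
public:
    explicit BufferRing(std::size_t count) : last_draw_(count, 0), next_(0) {
        assert(count > 0);
    }

    // Returns the next buffer index in round-robin order. `completed` is the
    // id of the newest draw the GPU is known to have finished. `would_stall`
    // is set to true if the chosen buffer is still referenced by a pending
    // draw, i.e. writing to it now would force a wait (or corrupt data).
    int acquire(std::uint64_t completed, bool& would_stall) {
        const std::size_t index = next_;
        would_stall = last_draw_[index] > completed;
        next_ = (next_ + 1) % last_draw_.size();
        return static_cast<int>(index);
    }

    // Records that draw call `draw_id` reads from buffer `index`.
    void mark_drawn(int index, std::uint64_t draw_id) {
        assert(index >= 0 && static_cast<std::size_t>(index) < last_draw_.size());
        last_draw_[static_cast<std::size_t>(index)] = draw_id;
    }

private:
    std::vector<std::uint64_t> last_draw_;  // last draw id per buffer
    std::size_t next_;                      // next slot to hand out
};
```

The more buffers in the ring, the longer the GPU has to finish with a slot before it comes around again, which is why extra buffers reduce the chance of stalling on a lock.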

bobwall
01-26-2006, 04:27 AM
Oh, I forgot to mention, I fixed the flickering problem with a quick fix:
if (best_index == -1 || (first && writing_shared_vertices)) in OpenGLRenderer::Allocate, but I'm not sure how bad it is for performance.

Hey dorbie, I think you're misunderstanding my design. If you look at the code, you'll see that I do lock/unlock the VBO before I use it.

The problem I think is happening is as follows:
1. primitive buffer is flushed when attempting to fill indices.
2. some more indices are filled
3. request to set new shared vertices hoses previous shared vertices before the indices using them in #2 get flushed.

But I guard against this problem by not considering the buffer that contains active shared vertices when choosing the best fit. The problem goes away when I don't map the buffer with discard.
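
As a rough illustration of that guard (not the actual code from OpenGLRenderer::Allocate, which isn't shown in this thread), a best-fit search that skips the buffer holding the active shared vertices might look something like this. BufferSlot, choose_best_fit and the active_shared parameter are made-up names for the sketch; only the best_index == -1 convention comes from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one vertex buffer in the pool.
struct BufferSlot {
    std::size_t capacity;  // total bytes in the buffer
    std::size_t used;      // bytes already filled
};

// Returns the index of the buffer that leaves the least free space after
// taking `needed` bytes, skipping index `active_shared` (the buffer whose
// shared vertices are still referenced by unflushed indices). Pass -1 for
// `active_shared` if no buffer is protected. Returns -1 if nothing fits.
int choose_best_fit(const std::vector<BufferSlot>& pool,
                    std::size_t needed,
                    int active_shared) {
    int best_index = -1;
    std::size_t best_leftover = 0;
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (static_cast<int>(i) == active_shared) continue;  // never reuse it
        assert(pool[i].used <= pool[i].capacity);
        const std::size_t free_bytes = pool[i].capacity - pool[i].used;
        if (free_bytes < needed) continue;  // too small
        const std::size_t leftover = free_bytes - needed;
        if (best_index == -1 || leftover < best_leftover) {
            best_index = static_cast<int>(i);
            best_leftover = leftover;
        }
    }
    return best_index;
}
```

A result of -1 would be the point where the caller has to flush or allocate a fresh buffer instead of overwriting one that is still in use.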

BTW, what are w.r.t updates?

Jan
01-26-2006, 11:44 AM
w.r.t. == "with regard/respect to"

Internet Acronyms Dictionary (http://www.gaarde.org/acronyms/?lookup=wrt)

Jan.

bobwall
01-27-2006, 05:46 PM
I fixed the problem of random triangles being drawn. This was due to the grievous mistake of carrying shared vertices, which should never be done. New source is posted too (not optimized, though).

bobwall
02-04-2006, 10:33 AM
I found that my previous test program spends a lot of time filling vertices/indices.

I've revamped the test program to minimize the fill time, and the triangles/sec count has now more than quintupled across the board for uncached geometry!

bobwall
02-08-2006, 06:54 AM
I added several forgotten dependencies to the test program and made it easier to compile on Linux. Sorry if you couldn't compile it earlier.