Primitive kill or Conditional Arrays

This is an idea I’ve been toying with for a while, but I’m not sure of the best form for it, so I’m looking for suggestions.

One of the big performance tips for modern hardware is to minimize the number of draw calls. This means aggregating objects of similar state into big arrays which can each be bound and rendered whole.

The problem, of course, is that if such an array contains 1000 relatively static objects, only 200 of them may be visible at any given time and it could be wasteful or even visually wrong to draw them all.

The following work-arounds seem to work:

  1. Determine if/when it’s cheaper to iterate over the 200 visible objects vs. drawing the whole buffer (note: some objects may intentionally be “turned off” and must be “zeroed out” when drawing the whole buffer).

  2. Rebuild and repack these buffers frequently, so invisible objects get “garbage collected” out of the aggregated arrays.

  3. Use glMultiDraw and trick the count array into drawing 0 elements for invisible prims or objects.

None of these are optimal for a number of reasons. glMultiDraw, for example, can be a pain when you have lots of prims in each object or visibility group – you’d want one bit to control the entire subset.
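To make work-around 3 concrete, here is a minimal CPU-side sketch in C. It assumes each object owns a contiguous run of entries in the count array that would be handed to glMultiDrawArrays. The names `ObjectRange` and `apply_visibility` are made up for illustration and are not part of any API. Note how one object-level flag still costs a write per primitive, which is the pain point described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: each object owns a contiguous run of primitives
   in the parallel first[]/count[] arrays passed to glMultiDrawArrays. */
typedef struct {
    size_t first_prim;  /* index of the object's first entry in count[] */
    size_t num_prims;   /* number of count[] entries the object owns */
} ObjectRange;

/* Copy full_count[] into live_count[], zeroing every primitive that
   belongs to an invisible object. Returns how many entries stay non-zero. */
size_t apply_visibility(const ObjectRange *objects,
                        const unsigned char *visible,
                        size_t num_objects,
                        const int *full_count,
                        int *live_count)
{
    size_t live = 0;
    assert(objects && visible && full_count && live_count);
    for (size_t i = 0; i < num_objects; ++i) {
        size_t end = objects[i].first_prim + objects[i].num_prims;
        for (size_t p = objects[i].first_prim; p < end; ++p) {
            /* One object-level flag fans out to one write per primitive. */
            live_count[p] = visible[i] ? full_count[p] : 0;
            if (live_count[p] != 0)
                ++live;
        }
    }
    return live;
}
```

The resulting `live_count` array would then take the place of the count argument in something like `glMultiDrawArrays(GL_TRIANGLE_STRIP, first, live_count, total_prims)`, where `first` and `total_prims` are whatever the app already maintains.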

And though VBOs may reduce the cost of using more, smaller buffers, they don’t seem to solve the problem of minimizing the overall number of draw calls, unless I’m missing something obvious.

There are a lot of ways to approach this, but I’ll start with a simple one: an array of count/conditional pairs that can be bound or passed along in an analog to glDrawElements.

The counts (prim, vertex, or index) could be derived from object or group totals and the conditional could be as simple as an on/off bit, perhaps both of type short by default. The idea is to keep it simple and fast for the driver to iterate and for the app to adjust as needed.
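Since no such entry point exists, the closest thing today is to emulate it on the app side. The sketch below is only one possible reading of the idea: it walks an array of count/conditional pairs (both shorts, as suggested) and merges adjacent enabled entries into contiguous index ranges. `CondCount`, `DrawRange` and `build_draw_ranges` are hypothetical names, and merging neighbours like this is only valid for independent primitives such as GL_TRIANGLES, not strips or fans.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical count/conditional pair; both fields are shorts. */
typedef struct {
    unsigned short count;    /* indices owned by this object or group */
    unsigned short enabled;  /* 0 = kill, non-zero = draw */
} CondCount;

/* A contiguous run of indices that survived the conditional. */
typedef struct {
    size_t first;  /* offset of the run's first index */
    size_t count;  /* number of indices in the run */
} DrawRange;

/* Walk the pairs in buffer order and emit one range per run of adjacent
   enabled entries. Returns the number of ranges written to out[]. */
size_t build_draw_ranges(const CondCount *pairs, size_t num_pairs,
                         DrawRange *out, size_t out_capacity)
{
    size_t offset = 0, num_ranges = 0;
    int run_open = 0;
    for (size_t i = 0; i < num_pairs; ++i) {
        if (pairs[i].count == 0)
            continue;  /* empty entries neither draw nor split a run */
        if (pairs[i].enabled) {
            if (run_open) {
                out[num_ranges - 1].count += pairs[i].count;
            } else {
                assert(num_ranges < out_capacity);
                out[num_ranges].first = offset;
                out[num_ranges].count = pairs[i].count;
                ++num_ranges;
                run_open = 1;
            }
        } else {
            run_open = 0;  /* a killed entry ends the current run */
        }
        offset += pairs[i].count;
    }
    return num_ranges;
}
```

Each range could then go to glDrawElements with the matching offset, or all of them to a single glMultiDrawElements call. The point of the proposal is that the driver would do this walk itself, so the app only flips the `enabled` field.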

Where it gets even more interesting is when the conditional part of that array can be tied to driver or HW-side tests, such as occlusion results or even user-defined server-side bounding volume tests, if future OpenGL versions supported such a thing.

Caveat: if the app is sending dynamic vertices each frame, then it’s better to repack the data on the fly than to use something like this. This discussion is mainly for the case of relatively static verts that are not all visible all the time, such as a dense city.
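For comparison, the repacking alternative (work-around 2, or the dynamic case in the caveat) might look roughly like this when applied to an index buffer. It is a sketch under assumptions: 16-bit indices, one contiguous index run per object, and the hypothetical names `PackedObject` and `repack_indices`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-object record: where its indices live in the full
   index array, and whether it should be drawn this frame. */
typedef struct {
    size_t first_index;
    size_t num_indices;
    int visible;
} PackedObject;

/* Copy the indices of visible objects into dst, back to back.
   Returns the number of indices written. */
size_t repack_indices(const PackedObject *objects, size_t num_objects,
                      const unsigned short *src,
                      unsigned short *dst, size_t dst_capacity)
{
    size_t written = 0;
    for (size_t i = 0; i < num_objects; ++i) {
        if (!objects[i].visible)
            continue;  /* invisible objects are dropped from the batch */
        assert(written + objects[i].num_indices <= dst_capacity);
        memcpy(dst + written, src + objects[i].first_index,
               objects[i].num_indices * sizeof *dst);
        written += objects[i].num_indices;
    }
    return written;
}
```

The packed result can be drawn in one call, e.g. `glDrawElements(GL_TRIANGLES, (GLsizei)written, GL_UNSIGNED_SHORT, dst)`, at the price of touching every visible index each time visibility changes.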

Thoughts?

Avi
www.realityprime.com

Draw calls in OGL are fairly inexpensive, so reducing their number is much much less of an issue than in (say) D3D.

Originally posted by al_bob:
Draw calls in OGL are fairly inexpensive, so reducing their number is much much less of an issue than in (say) D3D.

Define “fairly.”

I’ve conducted tests with large aggregated buffers (in this case, static AGP- and video-resident vertices) and I can get up to 25% more tris/second by drawing a buffer whole vs. iterating per object. It was a while ago, but I recall the setup as 10000 objects with 12-100 shared verts each, on a GF3, using glDrawElements.

Doing the iteration may or may not have as big an impact if you’re T&L bound, but it makes a huge difference if you’re CPU or API limited.

Avi
www.realityprime.com

[This message has been edited by Cyranose (edited 08-28-2003).]