VBO Performance on nv3x hardware

Hello
I am having severe performance problems with a point-based renderer. All the vertex data (3 floats for position and 4 ubytes for color) is stored in one VBO. The indices are stored in a separate VBO.
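For reference, the vertex layout described above could look roughly like the following. This is only a sketch: the names `PointVertex` and `vertexBufferBytes` are made up for illustration, and it assumes a platform where `float` is 4 bytes, so that one vertex occupies 16 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout for the vertex described in the post:
// 3 floats for position followed by 4 unsigned bytes for color.
struct PointVertex {
    float x, y, z;            // position, 12 bytes when float is 4 bytes
    std::uint8_t r, g, b, a;  // color, 4 bytes
};

// Assumption: 4-byte floats and no extra padding, giving a 16-byte stride.
static_assert(sizeof(PointVertex) == 16, "expected a tightly packed 16-byte vertex");

// Bytes needed to store `count` vertices in a single buffer.
constexpr std::size_t vertexBufferBytes(std::size_t count) {
    return count * sizeof(PointVertex);
}
```

With this layout the color starts at byte offset 12 and the stride between vertices is 16 bytes, which is what the pointer setup for an interleaved array would use.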
Now the problem:
Everything runs wonderfully for datasets with a few hundred thousand points, but a new dataset with roughly 2 million points renders extremely slowly on a GeForce FX 5950 with 256 MB RAM and the currently available drivers (it is even slower than on a GeForce3 with 64 MB RAM). At present the only workaround is to use plain vertex arrays instead of VBOs when the dataset is too large. For the aforementioned 2-million-cell dataset this path is much faster than the VBO path, while for smaller sets the VBOs are much faster. Another strange thing is that if plain glDrawArrays is used, performance with the VBO is just fine again, even with the large dataset. So it seems that rendering primitives from a VBO that contains around 2 million vertices through an index array takes a severe performance hit.
I know that VBOs are subject to some limitations (for example, on a 128 MB R3xx anything larger than 32 MB goes into AGP memory instead of video memory, which slows down rendering), but I'm quite sure I didn't violate any of them: alignment should be OK, and the VBO size shouldn't be a problem either, since only 24 MB are used for the vertex data.
So I'm wondering whether this is some strange driver bug that stalls part of the vertex processing pipeline when large VBOs are used in combination with index arrays (by the way, drawing just 100,000 or so cells from the large dataset with an index array is terribly slow as well), and whether there is a proper workaround for it.

How are you accessing your vertices with your indices? Maybe the access pattern is too random? If possible, I would suggest writing a small FIFO queue for your vertices and reorganizing your triangles to maximize cache usage.

Ex.: if you have 100000 vertices, drawing a triangle with indices 0, 1, 99999 might be slower than the same triangle with indices 0, 1, 2.
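To get a feeling for how an index order interacts with a small FIFO cache, a rough model like the one below can be used. It is only a sketch: the function name `countFifoCacheMisses` and the cache size are assumptions for illustration, and real hardware caches may behave differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many indices miss a simulated FIFO cache that holds at most
// `cacheSize` entries. A hit leaves the FIFO order unchanged; a miss
// appends the index and evicts the oldest entry when the cache is full.
std::size_t countFifoCacheMisses(const std::vector<std::uint32_t>& indices,
                                 std::size_t cacheSize) {
    std::deque<std::uint32_t> fifo;
    std::size_t misses = 0;
    for (std::uint32_t index : indices) {
        if (std::find(fifo.begin(), fifo.end(), index) != fifo.end()) {
            continue;  // hit: the vertex is still in the cache
        }
        ++misses;
        fifo.push_back(index);
        if (fifo.size() > cacheSize) {
            fifo.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}
```

Comparing the miss count of the current index order against a reorganized one gives a quick estimate of whether reordering is worth the effort.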

Y.

You’ll hit a really slow path if your vertex buffer is larger than 6 MB. You’ll have to split your data into multiple vertex buffers and render them separately.
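A minimal sketch of how the splitting could be planned, assuming the 6 MB figure above and a fixed vertex stride. `VertexRange` and `splitIntoChunks` are hypothetical names, and the code only computes the ranges; creating and filling the actual buffers is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A contiguous run of vertices that fits into one buffer.
struct VertexRange {
    std::size_t first;  // index of the first vertex in the run
    std::size_t count;  // number of vertices in the run
};

// Splits `vertexCount` vertices of `vertexStride` bytes each into ranges
// whose byte size does not exceed `maxBufferBytes`.
// Returns an empty list if not even one vertex fits into a buffer.
std::vector<VertexRange> splitIntoChunks(std::size_t vertexCount,
                                         std::size_t vertexStride,
                                         std::size_t maxBufferBytes) {
    std::vector<VertexRange> ranges;
    if (vertexStride == 0 || maxBufferBytes < vertexStride) {
        return ranges;
    }
    const std::size_t perChunk = maxBufferBytes / vertexStride;
    for (std::size_t first = 0; first < vertexCount; first += perChunk) {
        ranges.push_back({first, std::min(perChunk, vertexCount - first)});
    }
    return ranges;
}
```

For example, `splitIntoChunks(2000000, 16, 6 * 1024 * 1024)` yields six ranges, each of which would go into its own vertex buffer.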

Since it's a point-based renderer using an orthographic projection, I'm drawing GL_POINTS primitives, so sorting them for cache efficiency isn't really necessary :wink: (sorry I didn't mention this).
Still, thanks for the fast reply.

And is 6 MB really the maximum size for VBOs to be fast on NV3x-class hardware? That sounds awfully small :frowning: (on my Radeon 9700 Pro I was able to use VBOs with a size of up to 32 MB without any performance hit).
The problem with splitting the vertex data into separate VBOs is that the points have to be drawn depth-sorted, and that is simply easier when everything can be accessed at once :slight_smile:

Silly question: why do you need indices if you’re drawing points?

-Won

Originally posted by Won:
Silly question: why do you need indices if you’re drawing points?

The points have to be sorted back to front because they are mostly semi-transparent.
Using indices is the best way to draw them sorted (otherwise I would have to sort the vertex data and upload it again, which would be a huge overhead and far too slow).
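
The index-based sorting described above might be sketched like this. It is an illustration only: `Vec3` and `sortBackToFront` are hypothetical names, and it assumes an orthographic view, so the depth of a point is just its position projected onto the view direction.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Returns the point indices ordered back to front along `viewDir`
// (the direction the camera looks in). The vertex data itself is not
// moved; only the index list changes.
std::vector<std::uint32_t> sortBackToFront(const std::vector<Vec3>& positions,
                                           const Vec3& viewDir) {
    // Depth along the view direction: larger means farther from the viewer.
    std::vector<float> depth(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        depth[i] = positions[i].x * viewDir.x +
                   positions[i].y * viewDir.y +
                   positions[i].z * viewDir.z;
    }

    std::vector<std::uint32_t> order(positions.size());
    std::iota(order.begin(), order.end(), 0u);

    // Farthest points first; stable_sort keeps equal depths in input order.
    std::stable_sort(order.begin(), order.end(),
                     [&depth](std::uint32_t a, std::uint32_t b) {
                         return depth[a] > depth[b];
                     });
    return order;
}
```

Only the resulting index list has to be re-uploaded when the view changes, for example before a `glDrawElements` call with `GL_POINTS`, while the vertex data stays where it is.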