VBO+Vertex Arrays: AGP Memory access speed

Hi,

I am currently considering adding VBO support to my metaball routine. That is, at each cube of the grid, I append a small number of vertices and indices to a stream that will be rendered once. I think STREAM_DRAW is the correct usage hint to adopt, and mapping the buffer as WRITE_ONLY is correct too.

The issue is accessing the buffer object's memory from within my routines. Considering the characteristics of AGP memory, I should write only contiguous blocks of memory to take advantage of the AGP write-combining mechanism, as that area of memory is uncacheable.

If I use normal vertex arrays, without any stride, in the same memory buffer, I will have to write a small amount of data here and there at each iteration of the algorithm, which will slightly decrease performance. A solution would be to use interleaved arrays, but the format has to match an internal format of the video card to get optimal performance… And I would still have 2 areas of AGP memory to access at each iteration: the index array and the element array.

On the other hand, I could append my streams in central “CPU-cacheable” memory, as is done now, and then let the driver copy the data, which means I have no potential performance gain from using VBO compared to vertex arrays…

Any tips or ideas to get maximum performance?

regards,

A solution would be to use interleaved arrays, but the format has to match an internal format of the video card to get optimal performance… And I would still have 2 areas of AGP memory to access at each iteration: the index array and the element array.

There are 2 types of interleaved arrays. One of them is to use an explicit interleaved format. That is, you use the interleaved functionality in the vertex array specification.

You should probably avoid that because it’s a lot of trouble, and completely unnecessary. Instead, you should use the stride to create implicit interleaving. That is, you set up the stride of the arrays such that your vertex attributes are interleaved. This is hardly difficult, and is the common method for interleaving vertex data.
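As a rough sketch of what implicit interleaving with a stride could look like, the C snippet below computes the stride and per-attribute byte offsets for one vertex struct. The `Vertex` layout (position, normal, packed color) and the names `VertexLayout` and `vertex_layout` are assumptions made for illustration, not something taken from the posts above. The GL calls are shown only in a comment, since they need a live context and a bound buffer object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vertex layout: floats for everything, color packed in a uint32. */
typedef struct {
    float    position[3];
    float    normal[3];
    uint32_t color;        /* 4 x 8-bit channels */
} Vertex;

/* Stride and byte offsets that the array pointers would be set up with. */
typedef struct {
    size_t stride;
    size_t position_offset;
    size_t normal_offset;
    size_t color_offset;
} VertexLayout;

static VertexLayout vertex_layout(void)
{
    VertexLayout l;
    l.stride          = sizeof(Vertex);   /* distance between consecutive vertices */
    l.position_offset = offsetof(Vertex, position);
    l.normal_offset   = offsetof(Vertex, normal);
    l.color_offset    = offsetof(Vertex, color);
    /* Sanity check: the last attribute must fit inside one stride. */
    assert(l.color_offset + sizeof(uint32_t) <= l.stride);
    return l;
}

/* With a buffer object bound, the same stride is passed to every array and
 * the "pointer" argument is the byte offset into the buffer (sketch only,
 * not compiled here):
 *
 *   glVertexPointer(3, GL_FLOAT, (GLsizei)l.stride, (const GLvoid *)l.position_offset);
 *   glNormalPointer(GL_FLOAT, (GLsizei)l.stride, (const GLvoid *)l.normal_offset);
 *   glColorPointer(4, GL_UNSIGNED_BYTE, (GLsizei)l.stride, (const GLvoid *)l.color_offset);
 */
```

Writing one whole `Vertex` at a time then touches a single contiguous region per vertex, rather than one small write per attribute array.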

In any case, as long as your vertex data doesn’t contain formats the hardware can’t handle (floats for everything, but colors can be in a uint32) you should get all the performance you could.

The stuff about using particular vertex formats for optimal performance is, generally, no longer true with modern drivers.

On the other hand, I could append my streams in central “CPU-cacheable” memory, as is done now, and then let the driver copy the data, which means I have no potential performance gain from using VBO compared to vertex arrays…

Actually, depending on how VBOs are implemented with STREAM_DRAW, this could be slower than straight vertex arrays.

I am coming to the conclusion that I can append my indices in central memory, and append my vertex data to a stream in AGP memory by mapping the buffer directly. This way I can avoid an extra data copy for the vertices, while letting the driver do its job with glDrawElements.

Perhaps it’s not too bad a solution, I guess.

Thanks for the hints anyway.
regards,

Data from a mapped buffer is transferred to AGP memory as soon as I call glUnmapBuffer, am I right?
And once I have mapped the buffer to a pointer, can I change the returned address of this pointer so that it points to an existing array?

If you map a buffer, you’re likely to be writing TO the AGP memory itself.

Therefore, align your writes on at least 64-byte boundaries, and write at least 64-byte chunks at a time (with sizes rounded up to a multiple of 64 as well).

Yes, it’s faster to write a few more 0 bytes to pad up to 64 bytes, than it is to not write those last few bytes; especially if you make a small write (on Pentium III, use “32” instead of “64”).
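One way to follow this advice is to collect small writes in an ordinary cacheable staging block and only copy whole 64-byte chunks to the destination, in ascending order, zero-padding the last one. The sketch below is a minimal C version of that idea. The names `WcStream`, `wc_init`, `wc_append` and `wc_finish` are hypothetical, and it assumes the destination pointer (for example, one returned by mapping a buffer) is already 64-byte aligned and that its capacity is a multiple of 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WC_CHUNK 64   /* use 32 on Pentium III, per the post above */

typedef struct {
    unsigned char *dst;              /* write-only destination memory */
    size_t         capacity;         /* size of dst, multiple of WC_CHUNK */
    size_t         written;          /* bytes already copied to dst */
    unsigned char  stage[WC_CHUNK];  /* cacheable staging block */
    size_t         staged;           /* bytes currently held in stage */
} WcStream;

static void wc_init(WcStream *s, void *dst, size_t capacity)
{
    /* Assumption: the caller hands us an aligned, chunk-sized destination. */
    assert(((uintptr_t)dst % WC_CHUNK) == 0);
    assert(capacity % WC_CHUNK == 0);
    s->dst = (unsigned char *)dst;
    s->capacity = capacity;
    s->written = 0;
    s->staged = 0;
}

/* Append bytes; dst is only ever touched in full, sequential chunks.
 * Returns 1 on success, 0 if the destination is full. */
static int wc_append(WcStream *s, const void *data, size_t size)
{
    const unsigned char *src = (const unsigned char *)data;
    while (size > 0) {
        size_t room = WC_CHUNK - s->staged;
        size_t n = size < room ? size : room;
        memcpy(s->stage + s->staged, src, n);
        s->staged += n;
        src += n;
        size -= n;
        if (s->staged == WC_CHUNK) {
            if (s->written + WC_CHUNK > s->capacity)
                return 0;
            memcpy(s->dst + s->written, s->stage, WC_CHUNK);
            s->written += WC_CHUNK;
            s->staged = 0;
        }
    }
    return 1;
}

/* Zero-pad the last partial chunk and write it out.
 * Returns the total number of bytes written to dst. */
static size_t wc_finish(WcStream *s)
{
    if (s->staged > 0 && s->written + WC_CHUNK <= s->capacity) {
        memset(s->stage + s->staged, 0, WC_CHUNK - s->staged);
        memcpy(s->dst + s->written, s->stage, WC_CHUNK);
        s->written += WC_CHUNK;
        s->staged = 0;
    }
    return s->written;
}
```

In use, `wc_init` would be called with the mapped pointer, `wc_append` once per generated vertex, and `wc_finish` just before unmapping. Whether a plain `memcpy` of a full chunk actually yields a single combined burst depends on the CPU and compiler, so this is a starting point to measure rather than a guarantee.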