System memory use of client side vertex arrays and VBOs

I have a 600MB vertex array (40 million points) that I normally render with a single DrawArrays call.

[ul]
[li]If I split the call into small 64k batches by adjusting the first-index parameter, i.e. DrawArrays(POINTS, first_index, 65536), dramatically less system memory is used under both the NVIDIA and ATI drivers (see the snippet below). This makes sense: there is probably a window or temporary buffer that the driver reads the vertices from. Maybe the vertex data has to be converted into a correct internal format for alignment purposes?[/li]
[li]I don’t see the high system memory use with static VBOs, which also makes sense since the entire array is stored on the card.[/li]
[/ul]

I run close to Windows XP’s 2GB address space limit when I use client side arrays with this data in a 32-bit executable. Should I manage the batch size manually on this platform, or should I rely on the driver instead?

I think you shouldn’t rely on client side storage anyway. If you’re targeting GPUs with at least 1GB of VRAM and the rest of your data doesn’t approach the ~400 MB that would be left over, why bother storing 40 million points in client memory?

I wish I could target GPUs with 1GB of RAM. I can only count on client side arrays being available, and this will run on integrated graphics in most cases.

Hmm, you could always try a dynamic VBO: upload a portion of the data that fits in the available memory, render it, then update the buffer with the next portion. You would have to benchmark whether this approach runs well in your case and whether the implementation you’re running on handles VBO data transfers differently from plain vertex arrays.
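
Something roughly like this, untested; the chunk size is a guess you’d want to tune, it assumes plain xyz floats, and GLEW here just stands in for whatever loader gives you the buffer-object entry points:

[code]
#include <GL/glew.h>   /* placeholder: any loader that exposes the VBO entry points */

/* Stream a large client-side array through one reusable GL_DYNAMIC_DRAW VBO:
   orphan the buffer, upload a chunk, draw it, repeat. */
static void draw_points_streamed(const GLfloat *verts, GLint point_count)
{
    const GLsizei chunk_points = 1 << 20;   /* ~1M points (~12MB) per upload; tune to taste */
    const GLsizeiptr chunk_bytes = chunk_points * 3 * sizeof(GLfloat);

    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, chunk_bytes, NULL, GL_DYNAMIC_DRAW);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);   /* offset into the bound VBO */

    for (GLint first = 0; first < point_count; first += chunk_points)
    {
        GLsizei count = (point_count - first < chunk_points)
                      ? (GLsizei)(point_count - first) : chunk_points;

        /* Orphan the old storage so the driver doesn't have to stall, then copy in the next chunk. */
        glBufferData(GL_ARRAY_BUFFER, chunk_bytes, NULL, GL_DYNAMIC_DRAW);
        glBufferSubData(GL_ARRAY_BUFFER, 0, count * 3 * sizeof(GLfloat),
                        verts + (size_t)first * 3);
        glDrawArrays(GL_POINTS, 0, count);
    }

    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glDeleteBuffers(1, &vbo);
}
[/code]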

I’d try this: chunk it up into 600 1MB GL_STATIC_DRAW VBOs. The implementation should handle swapping them in and out of VRAM if there isn’t enough. This will make the driver consume 600MB of system RAM all the time (the driver is supposed to keep such VBOs’ data intact), but it lets the driver do the memory management and acceleration for you.
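
A rough sketch of what I mean (sizes, names and the missing error checking are placeholders, xyz floats assumed, and again GLEW just stands in for your extension loader):

[code]
#include <GL/glew.h>   /* placeholder: any loader that exposes the VBO entry points */

/* Split the point cloud across many ~1MB GL_STATIC_DRAW VBOs at load time,
   then just bind and draw each one per frame; the driver decides what sits in VRAM. */
typedef struct { GLuint vbo; GLsizei points; } Chunk;

static GLsizei create_static_chunks(const GLfloat *verts, GLint point_count,
                                    Chunk *chunks, GLsizei max_chunks)
{
    const GLsizei points_per_vbo = (GLsizei)((1 << 20) / (3 * sizeof(GLfloat)));  /* ~1MB each */
    GLsizei n = 0;

    for (GLint first = 0; first < point_count && n < max_chunks; first += points_per_vbo, ++n)
    {
        GLsizei count = (point_count - first < points_per_vbo)
                      ? (GLsizei)(point_count - first) : points_per_vbo;

        glGenBuffers(1, &chunks[n].vbo);
        glBindBuffer(GL_ARRAY_BUFFER, chunks[n].vbo);
        glBufferData(GL_ARRAY_BUFFER, count * 3 * sizeof(GLfloat),
                     verts + (size_t)first * 3, GL_STATIC_DRAW);
        chunks[n].points = count;
    }
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    return n;
}

/* Per frame: bind each chunk and draw it. */
static void draw_static_chunks(const Chunk *chunks, GLsizei n)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    for (GLsizei i = 0; i < n; ++i)
    {
        glBindBuffer(GL_ARRAY_BUFFER, chunks[i].vbo);
        glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);
        glDrawArrays(GL_POINTS, 0, chunks[i].points);
    }
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
[/code]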