Texture buffers are much slower than uniform buffers

Texture buffers are nice because they allow an enormous amount of random-access data. I have been using them to store object 4x4 matrices persistently in video memory, in order to reduce the amount of data being fed across the PCIe bus each frame.
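Roughly, the setup looks like this (a minimal sketch with placeholder names, using glm for the matrix type; the matrices are uploaded once and then fetched by ID in the vertex shader):

```cpp
// Sketch: store N object matrices in a buffer texture (TBO) so shaders can
// fetch them by integer ID with texelFetch(). Names like matrixTex are
// placeholders, not actual code from my engine.
#include <GL/glew.h>
#include <glm/glm.hpp>
#include <vector>

GLuint CreateMatrixTexBuffer(const std::vector<glm::mat4>& matrices, GLuint& texOut)
{
    GLuint buf = 0;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_TEXTURE_BUFFER, buf);
    glBufferData(GL_TEXTURE_BUFFER,
                 matrices.size() * sizeof(glm::mat4),
                 matrices.data(),
                 GL_STATIC_DRAW);   // most of this data never changes again

    glGenTextures(1, &texOut);
    glBindTexture(GL_TEXTURE_BUFFER, texOut);
    // Each mat4 occupies 4 consecutive RGBA32F texels.
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, buf);
    return buf;
}

// Matching vertex-shader fetch (GLSL), reconstructing a column-major mat4:
//   uniform samplerBuffer matrixTex;
//   mat4 fetchMatrix(int id) {
//       return mat4(texelFetch(matrixTex, id * 4 + 0),
//                   texelFetch(matrixTex, id * 4 + 1),
//                   texelFetch(matrixTex, id * 4 + 2),
//                   texelFetch(matrixTex, id * 4 + 3));
//   }
```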

However, texture buffers appear to incur a severe performance penalty and are much slower than uniform buffers, which do not allow nearly as much storage space. I believe shader storage buffer objects (SSBOs) are just a different interface to the same underlying buffer storage as texture buffers, and they appear to operate at the same speed.

Is there any type of memory or buffer that allows random access and provides lots of storage?

The GPU is a GeForce GTX 1080.

SSBOs and buffer textures do “allow random access”.

I have been using them to store object 4x4 matrices persistently in video memory, in order to reduce the amount of data being fed across the PCIe bus each frame.

Um… how does that work, exactly? If the CPU needs to update those matrices, then it needs to update those matrices. The bandwidth usage isn’t caused by whether they’re “persistent” or not; it’s caused by the CPU needing to update the data. So unless you’re using GPU processes to update the data, you’re still using bandwidth.

However, texture buffers appear to incur a severe performance penalty and are much slower than uniform buffers, which do not allow nearly as much storage space.

How much is “severe”? And what is the difference in how you’re using/uploading to them? Are you properly double-buffering your texture buffers, or are you overwriting data that is potentially still in use by in-flight rendering commands?

The idea is that 99% of objects in a scene are not in any motion at all, so that data can just be left in a texture buffer on the GPU and accessed with an integer ID. If you have hundreds of thousands of objects, this can make a big difference. I got better results by uploading the IDs in a uniform buffer and leaving the matrices in a texture buffer.
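Here is roughly what that split looks like (again just a sketch with placeholder names; it assumes the fetchMatrix helper from my earlier snippet): only the small ID list is uploaded each frame, while the matrices stay resident in the buffer texture.

```cpp
// Sketch: per frame, upload only the visible objects' integer IDs to a small
// UBO; each instance looks up its ID and then fetches its matrix from the
// resident buffer texture. "ObjectIDs" and idUbo are placeholder names.
#include <GL/glew.h>
#include <cstdint>
#include <vector>

void UploadVisibleIDs(GLuint idUbo, const std::vector<int32_t>& visibleIDs)
{
    glBindBuffer(GL_UNIFORM_BUFFER, idUbo);
    // One small update: a few bytes per object instead of 64 bytes per matrix.
    glBufferSubData(GL_UNIFORM_BUFFER, 0,
                    visibleIDs.size() * sizeof(int32_t),
                    visibleIDs.data());
}

// Matching GLSL (std140 pads a bare int array to 16 bytes per element,
// so the IDs are packed into ivec4s):
//   layout(std140) uniform ObjectIDs { ivec4 ids[256]; };
//   uniform samplerBuffer matrixTex;
//   ...
//   int id = ids[gl_InstanceID / 4][gl_InstanceID % 4];
//   mat4 model = fetchMatrix(id);
```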

However, on the AMD R9 200 I am testing with, this seems much slower than just uploading the matrices in a uniform buffer, at least with a small set of objects.

I am not doing any double-buffering. I guess this means you would use two buffers and switch back and forth between them each frame?

The idea is that 99% of objects in a scene are not in any motion at all, so that data can just be left in a texture buffer on the GPU and accessed with an integer ID.

So long as the moving/not-moving objects either use separate buffers or use rigidly separated sections of the same buffer, fine. But making changes to a buffer that is in use is almost always a bad idea. At least, not without some form of multiple buffering.

When you make any (non-persistent-mapped) changes to a buffer, if those changes manipulate regions of the buffer that prior rendering commands are using, then the implementation must impose some form of synchronization. That is, the GPU (at least) must wait until those prior commands have completed before doing the transfer. That’s bad.
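To make that concrete, this is the sort of pattern that triggers the wait (placeholder names, not anyone’s actual code):

```cpp
// Sketch of the implicit-sync hazard: the glBufferSubData call below touches
// the same bytes the draw just issued may still be reading, so the driver has
// to wait for that draw (or copy the data aside) before the update can land.
#include <GL/glew.h>
#include <glm/glm.hpp>

void OverwriteInUseRegion(GLuint matrixUbo, GLsizei indexCount, const glm::mat4& newMatrix)
{
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, matrixUbo);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr); // reads matrixUbo

    // Same region, same frame: an implicit synchronization point.
    glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(glm::mat4), &newMatrix);
}
```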

Multiple buffering, whether via two actual buffers or two regions of the same buffer, allows you to build rendering data for the next frame while commands issued by the previous frame can still read their data. This keeps the implementation from having to synchronize things. And persistent-mapped buffers can make that even faster and more predictable, since the implementation can’t do the synchronization for you (so you have to do it yourself).
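A minimal sketch of that scheme, assuming GL 4.4’s glBufferStorage and placeholder names throughout: the CPU writes region (frame % 2) while the GPU reads the other, and a fence per region keeps you from scribbling over data the GPU hasn’t finished with.

```cpp
// Sketch: one persistently-mapped buffer split into two regions, used in
// alternation, with explicit fence synchronization (since the driver no
// longer synchronizes for you).
#include <GL/glew.h>
#include <glm/glm.hpp>
#include <cstring>
#include <vector>

struct DoubleBufferedUbo
{
    GLuint     buffer = 0;
    glm::mat4* mapped = nullptr;               // stays mapped for the app's lifetime
    GLsync     fence[2] = { nullptr, nullptr };
    size_t     regionSize = 0;                 // bytes per region; keep it a multiple
                                               // of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT
    void Create(size_t maxMatricesPerFrame)
    {
        regionSize = maxMatricesPerFrame * sizeof(glm::mat4);
        glGenBuffers(1, &buffer);
        glBindBuffer(GL_UNIFORM_BUFFER, buffer);
        const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
        glBufferStorage(GL_UNIFORM_BUFFER, 2 * regionSize, nullptr, flags);
        mapped = static_cast<glm::mat4*>(
            glMapBufferRange(GL_UNIFORM_BUFFER, 0, 2 * regionSize, flags));
    }

    // Write this frame's data into the region the GPU is *not* reading,
    // then bind that region for the draws that follow.
    void BeginFrame(unsigned frame, const std::vector<glm::mat4>& matrices)
    {
        const unsigned region = frame % 2;
        if (fence[region])   // wait until the GPU finished with this region (two frames ago)
        {
            glClientWaitSync(fence[region], GL_SYNC_FLUSH_COMMANDS_BIT, GLuint64(1000000000));
            glDeleteSync(fence[region]);
            fence[region] = nullptr;
        }
        std::memcpy(mapped + region * (regionSize / sizeof(glm::mat4)),
                    matrices.data(), matrices.size() * sizeof(glm::mat4));
        glBindBufferRange(GL_UNIFORM_BUFFER, 0, buffer, region * regionSize, regionSize);
    }

    // Call after this frame's draws have been submitted.
    void EndFrame(unsigned frame)
    {
        fence[frame % 2] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    }
};
```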

Thanks for the tips! There are some strange little nuances I am coming to grips with. I get much better performance if I do as few buffer updates as possible (one per frame per buffer). I have double-buffering enabled and I’m alternating data buffers each frame, but I don’t see any change in performance from that.
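For anyone who finds this later, the batching I mean looks roughly like this (a sketch with placeholder names; it assumes the changed matrices occupy one contiguous range of the buffer):

```cpp
// Sketch: collect all of a frame's changed matrices in a CPU-side staging
// vector, then push them with a single glBufferSubData per buffer per frame
// instead of one call per object.
#include <GL/glew.h>
#include <glm/glm.hpp>
#include <vector>

struct FrameStaging
{
    std::vector<glm::mat4> dirty;           // filled as objects move during the frame
    GLintptr               firstSlot = 0;   // index of the first changed matrix

    void Flush(GLuint matrixBuffer)
    {
        if (dirty.empty())
            return;
        glBindBuffer(GL_TEXTURE_BUFFER, matrixBuffer);
        glBufferSubData(GL_TEXTURE_BUFFER,
                        firstSlot * sizeof(glm::mat4),
                        dirty.size() * sizeof(glm::mat4),
                        dirty.data());
        dirty.clear();
    }
};
```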