glMapBufferRange() behaves strangely

Hi folks,

I’ve experienced some odd behaviour using glMapBufferRange() on GF9600GT (GL 3.3) hardware with recent drivers.

I’ve created a pretty large IBO (approx 30000 elements).

To speed up mapping parts of the IBO range (usually 50-100 elements at a time), I use glMapBufferRange().

Interesting thing is that passing the flag
GL_MAP_INVALIDATE_RANGE_BIT
causes a massive (~99%) frame-rate drop while my system is under some CPU stress, such as a GCC job running on all cores. That alone wouldn't sound too odd, but when I pass
GL_MAP_INVALIDATE_BUFFER_BIT
there is no noticeable frame drop at all under the same conditions!
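Roughly, the update path looks like this (a minimal sketch; the names ibo, firstElement, elementCount and the GLushort index type are placeholders, not my exact code):

```c
#include <GL/glew.h>   /* or whichever GL loader the project uses */
#include <string.h>

void update_ibo_range(GLuint ibo, GLintptr firstElement,
                      GLsizeiptr elementCount, const GLushort *src)
{
    GLintptr   offset = firstElement * (GLintptr)sizeof(GLushort);
    GLsizeiptr length = elementCount * (GLsizeiptr)sizeof(GLushort);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);

    /* Variant A: invalidate only the mapped range (huge frame drop under CPU load) */
    void *dst = glMapBufferRange(GL_ELEMENT_ARRAY_BUFFER, offset, length,
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);
    /* Variant B: GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT
       -> no noticeable frame drop under the same conditions                        */

    if (dst) {
        memcpy(dst, src, (size_t)length);
        glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);
    }
}
```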

Without the extra CPU load, neither the CPU usage nor the game's framerate differs between the two flags.

The spec states that GL_MAP_INVALIDATE_RANGE_BIT discards only the requested range of the buffer, while GL_MAP_INVALIDATE_BUFFER_BIT invalidates the whole buffer.

So shouldn’t INVALIDATE_RANGE_BIT be faster?

Could someone shed some light on the issue?

I’m confused :(

It’s often quicker to just give you a new chunk of memory and throw away the old one when it’s no longer being used; allocating memory is a cheap and fast operation with a more or less fixed overhead, irrespective of how much you allocate.

When you’re invalidating only a range, however, your driver has to perform some additional acrobatics, as the rest of the buffer could in theory still be in use. This is what causes the performance drop.
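To put it another way: GL_MAP_INVALIDATE_BUFFER_BIT lets the driver do internally what the classic "orphaning" idiom did explicitly. A rough sketch of that idiom, where bufferSize and the GL_DYNAMIC_DRAW hint are just assumed values, not anything from your code:

```c
/* Classic orphaning: tell the driver the old contents are garbage so it can
 * hand back a fresh allocation instead of waiting for the GPU to finish.
 * bufferSize is assumed to be the full size of the IBO in bytes.           */
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, bufferSize, NULL, GL_DYNAMIC_DRAW); /* orphan */
void *dst = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_WRITE_ONLY);
/* ... write indices, then glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER) ...       */
```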

OK, I see your point. However, this is not what some coders would expect, especially such a huge performance impact under CPU stress, since IBOs are supposed to be handled by the GPU.

Btw, what about memory fragmentation over time? Does the GPU driver usually take care of that?

Regards
Saski

I’d assume it’s a non-issue, but I don’t know the technical details of how drivers handle it. This kind of usage pattern was very common with vertex and index buffers in D3D9, for example, so it’s something GPU vendors will have been aware of and tuning for since at least 2002. But that’s a guess.

The huge performance drop you’re seeing when your CPU is also busy elsewhere suggests that your driver is also going through a software-emulated path when you call glMapBufferRange() with GL_MAP_INVALIDATE_RANGE_BIT. That’s also a guess, though.

Incidentally, and as a general rule, updating a resource (or partial resource) that’s still in use, then using it again, then updating it again, and so on is a performance killer with other resource types too; it’s not just buffer objects.
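To make that concrete, something like this is the pattern being warned about (purely illustrative; the names are made up and it assumes the IBO is already bound):

```c
/* Anti-pattern: update a range, draw from it, update the next range, draw,
 * and so on within one frame. Each map touches a range the GPU may still be
 * reading from, which can force the driver to synchronise.                 */
void worst_case_pattern(int chunkCount, GLsizeiptr chunkBytes,
                        const GLushort *const *newIndices)
{
    for (int i = 0; i < chunkCount; ++i) {
        GLintptr offset = (GLintptr)i * chunkBytes;

        void *dst = glMapBufferRange(GL_ELEMENT_ARRAY_BUFFER, offset, chunkBytes,
                                     GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);
        if (!dst)
            continue;
        memcpy(dst, newIndices[i], (size_t)chunkBytes);           /* update...       */
        glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);

        glDrawElements(GL_TRIANGLES, (GLsizei)(chunkBytes / sizeof(GLushort)),
                       GL_UNSIGNED_SHORT, (const void *)offset);  /* ...draw, repeat */
    }
}
```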

When you’re invalidating only a range, however, your driver has to perform some additional acrobatics, as the rest of the buffer could in theory still be in use. This is what causes the performance drop.

It isn’t so much acrobatics as just what it would usually do if you didn’t set the INVALIDATE_BUFFER_BIT at all. Basically, if you’re rendering from the same part of the buffer that you’re updating, INVALIDATE_BUFFER_BIT may not be helpful to you.

Then again, it might be. It really all depends on the driver and the implementation. It’s possible that some drivers will allocate a piece of memory for you to write to, then DMA it up asynchronously later. Of course, if you’re going to immediately use it after unmapping the buffer, it’s still going to be slower since the GPU will have to wait for the DMA.

Btw, what about memory fragmentation over time? Does the GPU driver usually take care of that?

It will take care of it in the sense that there’s nothing you can do about it one way or another :wink: GPU memory management is done at the behest of the driver (and, in more recent OS versions, the OS itself).

However, if you do find that you are deeply concerned about memory fragmentation (and the only legitimate reason to be is actual profiling data telling you that fragmentation is a problem), you could always allocate twice the buffer object space and switch between the two halves, as sketched below. It’s like double-buffering, only you have to do it manually.
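A minimal sketch of that ping-pong scheme, assuming the IBO was created with twice the per-frame storage (the struct and all the names here are just for illustration):

```c
#include <string.h>   /* plus your usual GL loader header */

/* The IBO is assumed to hold 2 * halfSize bytes. Each frame we write into
 * the half the GPU was not reading from last frame.                        */
typedef struct {
    GLuint     ibo;
    GLsizeiptr halfSize;   /* bytes per half                       */
    int        current;    /* which half gets written this frame   */
} DoubleBufferedIBO;

/* Copy this frame's indices into the "free" half and return the byte offset
 * to use as the indices pointer in glDrawElements.                         */
GLintptr upload_indices(DoubleBufferedIBO *db, const void *src, GLsizeiptr bytes)
{
    db->current ^= 1;                                   /* flip halves      */
    GLintptr offset = db->current ? db->halfSize : 0;

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, db->ibo);
    void *dst = glMapBufferRange(GL_ELEMENT_ARRAY_BUFFER, offset, bytes,
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);
    if (dst) {
        memcpy(dst, src, (size_t)bytes);
        glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);
    }
    return offset;
}
```

Since each half sits out a full frame before being rewritten, the driver should rarely have to stall on a range that is still in flight.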