glPrioritizeTextures, ...in practice

Let’s say you have some FBO render targets (textures). What’s the best way to make sure the driver keeps them hot in GPU memory and does “not” try to swap them off the board?

Got a problem where, on one vendor’s driver, the driver is (apparently; this is a guess) kicking the shadow maps off the board and dropping performance down to 1Hz. If I delete and realloc the render targets at run-time (nasty kludge; it hangs for a bit and breaks frame; unacceptable), the frame rate is nominal for a little bit… but it’ll drop back down to 1Hz again later. Rinse, repeat.
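Roughly, the delete/realloc kludge looks like this (just a sketch with placeholder names like shadowTex, shadowFBO, SHADOW_RES, NUM_SPLITS; not the actual engine code):

/* Nasty kludge: throw the old shadow map away and rebuild it from scratch. */
glDeleteTextures(1, &shadowTex);

glGenTextures(1, &shadowTex);
glBindTexture(GL_TEXTURE_2D_ARRAY, shadowTex);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_DEPTH_COMPONENT24,
             SHADOW_RES, SHADOW_RES, NUM_SPLITS, 0,
             GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, NULL);

/* Re-attach to the FBO, since the old attachment is gone
   (layer 0 shown here; re-attach whichever layer you render to). */
glBindFramebuffer(GL_FRAMEBUFFER, shadowFBO);
glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, shadowTex, 0, 0);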

What I really want is to lock the texture(s) in GPU memory, and come hell or high water, they’re not moving. Even if GPU memory gets tight.

Back in the day, glPrioritizeTextures() might have been an option. But I’ve always heard this is a no-op. Besides that, it was deprecated in GL 3.0 and removed from 3.2 core, so that pretty well suggests it was/is a legacy no-op. Right?
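For reference, the legacy call was just a hint array over texture objects; something like this (texture names are placeholders), for whatever little it ever did:

/* Legacy GL 1.1 texture prioritization -- deprecated in 3.0, removed from 3.2 core.
   Priorities are clamped to [0, 1]; 1.0 means "please keep this resident". */
GLuint   texs[2]  = { shadowTex, reflectionTex };   /* placeholder names */
GLclampf prios[2] = { 1.0f, 1.0f };
glPrioritizeTextures(2, texs, prios);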

Looking for other options… In NVidia’s GPU Programming Guide (GeForce 8 edition, the last one), it’s suggested that textures created earliest are least likely to be bumped off the board, i.e. that it’s all about creation order.

Going to try this. Can anyone confirm this behavior? Is this just a feature of NVidia’s drivers? Are other vendors doing this too?

Really want to lock the texture(s) in GPU memory… But how?

As far as I’m aware, texture prioritization was never anything more than a hint to the driver.

What I’d try doing is drawing a 1x1 pixel somewhere in the bottom-left or other corner of the window, using each texture, each frame. Just pick a region that’s going to be overdrawn by something else (GUI elements or whatever) later on that frame; that should keep the textures “hot”.

If there’s no such available region, draw it on something else appropriate; the point is to just make sure that the texture is used every frame so that the driver thinks it needs to be kept resident.
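A minimal sketch of the idea, assuming a compatibility context and plain 2D textures (an array texture would need a trivial shader that samples one layer instead of fixed-function texturing):

/* Touch every texture once per frame by drawing a 1x1-pixel quad that will
   be overdrawn later.  textures[], numTextures, winWidth and winHeight are
   placeholders; assumes current GL state allows a throwaway draw. */
glViewport(0, 0, 1, 1);                       /* bottom-left pixel */
glEnable(GL_TEXTURE_2D);
for (int i = 0; i < numTextures; ++i)
{
    glBindTexture(GL_TEXTURE_2D, textures[i]);
    glBegin(GL_QUADS);
        glTexCoord2f(0, 0); glVertex2f(-1, -1);
        glTexCoord2f(1, 0); glVertex2f( 1, -1);
        glTexCoord2f(1, 1); glVertex2f( 1,  1);
        glTexCoord2f(0, 1); glVertex2f(-1,  1);
    glEnd();
}
glDisable(GL_TEXTURE_2D);
glViewport(0, 0, winWidth, winHeight);        /* restore */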

That loading order seems a mite influenced by D3D, by the way, where the general recommendation is to load default pool resources before managed pool resources (under D3D default pool resources are always in “driver optimal” memory whereas managed pool resources may be swapped in or out as required). The sorting bit seems slightly dubious (but presumably the author has a reason for it), but I don’t see how it could prevent a render target from being swapped out under OpenGL if the driver’s resource management scheme determines that it should be swapped out.

Whooo… I know there might be no other possibility, but having to do such a thing should make people ask the ARB for something new. Or am I the only one thinking this way?

Here’s the thing. The big shadow map array texture that’s being kicked off (or moved someplace where it is darn inefficient to access) is already being rendered to (written) and read every single frame, many, many times!

So frequency of use (being read or written) is apparently not a criterion being used (best guess given the evidence I’ve got).

Initial results indicate that just creating your render target textures first does not shield them from this problem.

After creation, I even bound them to an FBO and did a glClear() to try and force them to be allocated on the GPU.
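Roughly speaking, the bind-and-clear step was something like this (fbo/shadowTex are placeholder names; a sketch, not the exact engine code):

/* Attach the freshly created depth array and clear it, hoping that forces
   the driver to commit real GPU storage right away. */
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, shadowTex, 0, 0);
glDrawBuffer(GL_NONE);                        /* depth-only FBO */
glReadBuffer(GL_NONE);
glClear(GL_DEPTH_BUFFER_BIT);
glBindFramebuffer(GL_FRAMEBUFFER, 0);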

Still getting skippage with the shadow maps apparently being kicked off the board when GPU memory gets tight, even though they’re probably the most used texture (reads and writes) every frame.

Any ideas?

Any info reported by debug info extension?

Do you want to say the vendor/card/OS and driver version?

How big is this texture (memory size wise - and have you verified that it is what you think?) and how much memory is on the card?

You mean ARB_debug_output? I haven’t seen that in a driver yet.

Do you want to say the vendor/card/OS and driver version?

NVidia, GeForce GTX285 1GB, Linux, 260.19.44.

Trying to kick these to the curb and go 1.5GB GTX480s or better, but not quite there yet.

How big is this texture (memory size wise - and have you verified that it is what you think?) and how much memory is on the card?

The shadow map (2D depth texture array) in question is 64MB, only 6% of the total GPU memory (1GB). However, if the driver is selecting the biggest fish to kick off the board, this texture is definitely the biggest. Have re-verified the size several times, analytically and empirically.
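The analytic side of that is just a multiply; for example, a hypothetical layout (the real resolution/layer count may differ, the arithmetic is the same) lands exactly on 64 MiB:

/* Hypothetical: 2048 x 2048, 4 layers, 32 bits per depth texel. */
size_t bytes = 2048u * 2048u * 4u /*layers*/ * 4u /*bytes per texel*/;
/* = 67,108,864 bytes = 64 MiB */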

Wish I had some insight into what the driver was doing, and how I can influence it.

You mean ARB_debug_output? I haven’t seen that in a driver yet.
You have to create a debug context; then it is available (at least since the R280 drivers). Starting with the R285 drivers the extension reports useful information, but I have not yet seen performance issues being reported regarding preemption of textures and buffers.
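A minimal sketch of what that looks like under GLX (assuming glXCreateContextAttribsARB has already been resolved via glXGetProcAddress, and that the callback signature matches GLDEBUGPROCARB in your glext.h):

/* ARB_debug_output callback. */
static void APIENTRY debugCallback(GLenum source, GLenum type, GLuint id,
                                   GLenum severity, GLsizei length,
                                   const GLchar *message, const GLvoid *userParam)
{
    fprintf(stderr, "GL debug: %s\n", message);
}

/* Ask for a debug context via GLX_ARB_create_context; dpy/fbConfig are
   whatever you already use for context creation. */
int attribs[] = {
    GLX_CONTEXT_FLAGS_ARB, GLX_CONTEXT_DEBUG_BIT_ARB,
    None
};
GLXContext ctx = glXCreateContextAttribsARB(dpy, fbConfig, NULL, True, attribs);

/* Once the context is current, register the callback. */
glDebugMessageCallbackARB((GLDEBUGPROCARB)debugCallback, NULL);
glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB);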

It has been available since R259.09, at least on Windows. There is no reason for it not to be supported in R260 for Linux.

Do you remember the NVX_gpu_memory_info extension? Why don’t you keep track of what is going on with memory consumption? Unfortunately, NVX_gpu_memory_info doesn’t have the ability to retrieve the largest free memory block (like AMD’s equivalent does). Maybe you have heavy memory fragmentation and the driver decides to evict the largest block. This is just a guess.
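If you do go digging, the queries are just glGetIntegerv with the NVX enums; everything is reported in KB (the defines below are the values from the NVX_gpu_memory_info spec, in case your headers don’t have them yet):

#define GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX   0x9048
#define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#define GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX           0x904A
#define GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX           0x904B

GLint totalKB, freeKB, evictions, evictedKB;
glGetIntegerv(GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX,   &totalKB);
glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &freeKB);
glGetIntegerv(GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX,           &evictions);
glGetIntegerv(GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX,           &evictedKB);
printf("GPU mem: %d KB free of %d KB, %d evictions (%d KB evicted)\n",
       freeKB, totalKB, evictions, evictedKB);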

As well as the 64MB allocated for the actual depth data, there could also be extra space being allocated for early-Z testing, e.g. this tip from ATi’s Depth-In-Depth white paper:

Use as few depth buffers as possible

Something to keep in mind, especially on pre-HD 2000 series hardware, is that HiZ utilizes an on-chip buffer to store some of its info. This buffer is limited, so if you create a lot of depth buffers, some may not be able to use HiZ. It’s recommended that you create your most important depth buffers first, and keep the number of buffers to a minimum. Reuse as much as you can. If you’re doing most of your drawing to render targets (rather than the backbuffer) it may be better not to create the backbuffer depth buffer with the device. If you still need a backbuffer depth buffer, create it later to avoid having it take HiZ space from other more important depth buffers. With the Radeon HD 2000 series, HiZ can use off-chip memory, so more depth buffers can be used than before.

I’m not sure how much extra is allocated; I guess they could make reasonable use of about an extra 1.5× the depth texture size. This shouldn’t be enough to slow it down to a crawl though, just prevent you from using HiZ if memory is tight, but maybe it’s just the combined size it has a problem with. Does splitting it up help? Does it have similar problems with large color texture arrays?

I don’t think you’ll really find a proper solution without help from the vendor.

Ah! I will have to try that later – thanks. I’ve always treated glxinfo as the final word on what extensions are supported, but apparently it’s not so.

…but I have not yet seen performance issues being reported regarding preemption of textures and buffers.

Oh well. I’ll still give it a shot.

Actually, I do. I love that extension, especially the evicted number. Finally a single metric that says when you’re blowing past GPU memory capacity!

As I mentioned earlier, GPU memory in this scenario is tight. From this extension, I know that GPU memory consumption exceeds GPU memory capacity by ~50MB (5%). However, I know that my working set size is a good bit smaller than GPU memory, as evidenced by the fact that deleting and reallocating the shadow map at run-time temporarily clears up the performance problem.

I really would just like a way to bump up the priority of my render targets so they’re the last thing that’d ever be kicked off the board. The driver’s free to move them around to defragment memory, but it should avoid kicking them off the board entirely, like the plague!

Unfortunately, NVX_gpu_memory_info doesn’t have the ability to retrieve the largest free memory block (like AMD’s equivalent does). Maybe you have heavy memory fragmentation and the driver decides to evict the largest block. This is just a guess.

You could be right there. Though I’m not deleting anything (only creating), who knows how the driver is allocating memory.

Long term solution is of course to prune the database back so it doesn’t use more memory than exists on the GPU (on top of the engine’s needs). But you know how it goes with developers and artists: the latter often aren’t reporting to the former, and occasionally get a little wild with the size of their art assets :) So, just looking for ways to make this kind of situation degrade gracefully when it happens (versus right now, where perf goes from excellent to totally unusable [a few Hz]).