Out of memory problem...

Hello,

I need some experts to shed some light on the following problem I am having.

I have a volume rendering application (how popular are these nowadays :wink: ) which is rendering the following data:

512x512x512 RGB texture.

  1. I upload this as GL_RGB, so that's 24 bits per voxel, right?

  2. So the dataset is 384 MB, right?

I then use a 1x256 lookup table to compute the alpha values on the fly in a shader.
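For reference, this is roughly how such a 1x256 alpha lookup table can be created (a sketch of the idea only, not my actual code; the helper name is made up and GLEW is assumed for the headers):

```cpp
// Sketch: upload a 256-entry alpha lookup table as a 1D texture so the
// shader can classify each voxel's RGB value on the fly.
#include <GL/glew.h>

GLuint createAlphaLUT(const unsigned char alpha[256])
{
    GLuint lut = 0;
    glGenTextures(1, &lut);
    glBindTexture(GL_TEXTURE_1D, lut);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    // One alpha byte per entry: 256 bytes total, negligible next to the volume.
    glTexImage1D(GL_TEXTURE_1D, 0, GL_ALPHA8, 256, 0,
                 GL_ALPHA, GL_UNSIGNED_BYTE, alpha);
    return lut;
}
```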

  3. Am I correct in saying that a 512 MB graphics card should easily be able to handle this?

My problem is that uploading the 3D texture will sometimes fail: glGetError reports GL_OUT_OF_MEMORY.

This problem only occurs on an NVIDIA Quadro 4500FX (drivers 84.21); on my 7900GTX at home (drivers 91.31) I have no issues whatsoever and never get the out-of-memory error.

  4. What is going on? Bad drivers? Memory fragmentation?

  5. Also, how can I tell how many GPU resources my shader program will take?

Many thanks, and sorry for so many questions, but I am out of patience with this problem.

ut.

  1. I upload this as GL_RGB, so that's 24 bits per voxel, right?

No. Because of memory alignment, RGB textures are converted to RGBA when uploaded to GPU memory.

  2. So the dataset is 384 MB, right?

No. Because of 1) it's 512 MB.
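A quick back-of-the-envelope check (just a sketch; the exact figure depends on how the driver pads the internal format):

```cpp
// Sketch: byte count of a 512^3 volume at 3 vs. 4 bytes per voxel.
#include <cstdio>

int main()
{
    const double voxels = 512.0 * 512.0 * 512.0;  // 134,217,728 voxels
    const double mib    = 1024.0 * 1024.0;
    std::printf("GL_RGB  (3 bytes/voxel): %.0f MB\n", voxels * 3 / mib); // 384 MB
    std::printf("GL_RGBA (4 bytes/voxel): %.0f MB\n", voxels * 4 / mib); // 512 MB
    return 0;
}
```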

  3. Am I correct in saying that a 512 MB graphics card should easily be able to handle this?

No, because of 1) and 2)

  4. What is going on? Bad drivers? Memory fragmentation?

No. The 3D texture does not fit into GPU memory (because GPU memory is used for the framebuffer and other things too), so the only way for the GPU to render the volume is to fetch it from AGP/PCIe memory.
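If you want a hint about where the texture ended up, the old glAreTexturesResident query is one thing to try (a sketch; drivers are free to answer optimistically, so treat the result as advisory only):

```cpp
// Sketch: ask the driver whether the 3D texture is currently resident in
// video memory, which would distinguish the AGP/PCIe fallback case.
#include <GL/glew.h>
#include <cstdio>

void checkResidency(GLuint volumeTex)  // volumeTex: the 3D texture object id
{
    GLboolean resident = GL_FALSE;
    glAreTexturesResident(1, &volumeTex, &resident);
    std::printf("volume texture %s resident\n", resident ? "is" : "is NOT");
}
```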

  5. Also, how can I tell how many GPU resources my shader program will take?

You can’t, since NVIDIA won’t tell you how they store shaders in GPU memory.

  • Klaus

Hey, you copied my answers. :wink:
This is the deleted thread.

No, the other one is the deleted thread :smiley:

  • Klaus

Thank you very much Klaus & Relic for the replies; this clears up some of my misunderstandings.

Damn, I thought splitting the data as RGB plus a lookup table for alpha would have reduced the memory requirements, but as you point out it doesn't.

How would you approach this, then, to get the highest-resolution 3D texture possible on a 512 MB card?

I can think of four options:

  1. Try texture compression, although I will have to be careful with lossy methods, and success will depend on the input data.

  2. Try adding support for non-power-of-two textures, so I can have some in-between resolutions between 256^3 and 512^3. Do non-power-of-two textures work with texture3D? I've never tried them. (See the extension-check sketch after this list.)

  3. Reduce the number of bits used for the data type… (yuck).

  4. Move to a texture-memory-independent rendering approach, i.e. bricks.
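For option 2, I guess something like this would tell me up front what the driver claims to support (just a sketch; checkTextureLimits is a made-up name and GLEW is assumed for the headers):

```cpp
// Sketch: check for non-power-of-two texture support and the maximum
// 3D texture dimension the implementation accepts.
#include <GL/glew.h>
#include <cstdio>
#include <cstring>

void checkTextureLimits()
{
    const char* ext  = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
    const bool  npot = ext && std::strstr(ext, "GL_ARB_texture_non_power_of_two");

    GLint max3d = 0;
    glGetIntegerv(GL_MAX_3D_TEXTURE_SIZE, &max3d);

    std::printf("NPOT textures: %s, max 3D texture size: %d\n",
                npot ? "yes" : "no", max3d);
}
```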

ut.

Previously posted by Relic in response to the original message:

1.) No. They are downloaded as GL_RGBA8 according to that table: http://developer.nvidia.com/object/nv_ogl_texture_formats.html

2.) No. 512 MB.

3.) No. That's one contiguous(!) block of memory and that's not easily handled.
PCI-Express memory can hold it, as seen on your 7900GTX.

4.) “Nvidia Quadro 4500FX (drivers 84.21), however on my home 7900GTX (drivers 91.31)”
You're comparing apples to oranges. What about trying the same new drivers on both?

5.) You mean shader microcode size? Just don't care.

The simplest solution would be to create three 3D textures, one for each color channel, and then combine them in the shader. This allows the driver, if necessary, to move one of the smaller textures into AGP space. However, since it is three separate textures with a total size of 384 MB, it will more than likely fit in VRAM and be fast enough.
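Something along these lines (only a sketch; it assumes tightly packed interleaved RGB bytes, and the helper name is made up):

```cpp
// Sketch: split an interleaved RGB volume into three GL_LUMINANCE8
// 3D textures, one per channel, to be recombined in the shader.
#include <GL/glew.h>
#include <vector>

void uploadAsThreeLuminanceVolumes(const unsigned char* rgb, int dim,
                                   GLuint tex[3])
{
    const size_t voxels = size_t(dim) * dim * dim;
    std::vector<unsigned char> channel(voxels);

    glGenTextures(3, tex);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);

    for (int c = 0; c < 3; ++c)            // 0 = R, 1 = G, 2 = B
    {
        for (size_t i = 0; i < voxels; ++i)
            channel[i] = rgb[i * 3 + c];   // de-interleave one channel

        glBindTexture(GL_TEXTURE_3D, tex[c]);
        glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        // 1 byte per voxel per texture: 3 x 128 MB = 384 MB for a 512^3 volume.
        glTexImage3D(GL_TEXTURE_3D, 0, GL_LUMINANCE8, dim, dim, dim, 0,
                     GL_LUMINANCE, GL_UNSIGNED_BYTE, channel.data());
    }
}
```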

Thanks akaTONE for your suggestion.

I now upload my 512^3 RGB data as three GL_LUMINANCE 3D textures.

Meaning the total size on the GPU should now be 384 MB.

I've tested this on my 7900GTX with the NVPerfKit drivers (85.96), and I monitor the OGL vidmem usage and OGL AGP/PCI-E usage.

The driver tells me that I am using 260 MB of video memory and 130 MB of PCI-E memory. This, of course, is slowing my rendering big time.

Why is it not storing it all in video memory? I only require ~390 MB of memory and the GPU has 512 MB. That's 122 MB of free video RAM!

Note my memory figures are a bit bigger than they should be, as I also have 2 small FBOs (512^2 and 256^2) and 3 2D textures (512^2). Also, I am using quite a long shader program, which has 5 texture lookups, if statements and a bit of maths :wink: …

Is all this “other” stuff requiring so much memory that it prevents my data from being uploaded fully into VRAM?
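Rough arithmetic for that “other” stuff (my own sketch; it assumes RGBA8 colour plus a packed 24/8 depth-stencil buffer per FBO and no multisampling, none of which comes from the driver):

```cpp
// Sketch: estimate the memory used by the two FBOs and the three 2D textures.
#include <cstdio>

int main()
{
    const double mib    = 1024.0 * 1024.0;
    const double fbo512 = 512.0 * 512 * (4 + 4) / mib;  // colour + depth-stencil
    const double fbo256 = 256.0 * 256 * (4 + 4) / mib;
    const double tex2d  = 3 * 512.0 * 512 * 4 / mib;    // three RGBA8 2D textures
    std::printf("FBOs + 2D textures: ~%.1f MB\n", fbo512 + fbo256 + tex2d);
    return 0;  // roughly 5.5 MB -- the volume data dwarfs everything else here
}
```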

I will try cutting out the “other” stuff and see if this changes anything…

ut.

You mean you now have three GL_LUMINANCE8 textures for the RGB channels, and two of them are stored in video memory and one in PCI-E memory.
So that suggestion worked.
It needs three times the texture lookups, so of course it is slower.

You're not considering that video memory is used for a lot of other stuff: front, back and depth buffers, multisample buffers, GDI bitmaps, VBOs, etc.

IIRC, there is a switch in the Display Control Panel -> Performance and Quality -> Advanced Settings which says “Maximize Texture Memory”. Give that a try.

Do you know of any way to check how much memory is used up by this “other stuff” (GDI, etc., i.e. things external to my OGL app)?

Surely the speed hit I am getting is mainly due to the use of PCI-E memory rather than the 3 texture3D lookups?

ut.

But you said the GF7900GTX could work with the RGBA data before, and there is no way a single 512 MB texture could have been in video memory; it must have been in PCI-E, and you are comparing against that speed.

As I said before, you need contiguous chunks of memory, and you probably didn't get a big enough one due to fragmentation or I don't know what.

There is no way to query that AFAIK, and due to possible memory fragmentation the most interesting information would be the biggest contiguous free block, not the sum of free or allocated space.

Read Klaus's websites for advanced methods of dealing with such limitations. Now we're back to bricking again.

While this may be impractical, could the 512^3 volume be split up into, e.g., 8 bricks of 256^3 RGB (implicit A)?

Sure, the sum would still be enough to force the driver to “swap out” video memory to PCI-E, but then I suspect the driver would swap out a whole texture that doesn't fit and be able to replace that exact memory with the one needed. This would, however, probably require some careful consideration of render order, to let the driver swap textures in and out as few times as possible.

Also, as it would reduce the contiguous-memory requirement, it might become possible to fit one 256^3 RGB(A) brick (64 MB) into a chunk that previously could not hold a 512^3 luminance texture (128 MB).
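Something like this is what I mean (only a sketch; it assumes tightly packed interleaved RGB source data, and a real renderer would add a one-voxel overlap between bricks to avoid filtering seams):

```cpp
// Sketch: cut a 512^3 RGB volume into 2x2x2 bricks of 256^3, each with its
// own 3D texture, so the driver only needs brick-sized contiguous blocks
// and can swap bricks in and out individually.
#include <GL/glew.h>

void uploadBricks(const unsigned char* rgb, int dim /* 512 */, GLuint brick[8])
{
    const int b = dim / 2;  // brick edge length, 256
    glGenTextures(8, brick);

    // Tell GL how to step through the full volume when reading a sub-block.
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glPixelStorei(GL_UNPACK_ROW_LENGTH, dim);
    glPixelStorei(GL_UNPACK_IMAGE_HEIGHT, dim);

    int i = 0;
    for (int z = 0; z < 2; ++z)
        for (int y = 0; y < 2; ++y)
            for (int x = 0; x < 2; ++x, ++i)
            {
                glPixelStorei(GL_UNPACK_SKIP_PIXELS, x * b);
                glPixelStorei(GL_UNPACK_SKIP_ROWS,   y * b);
                glPixelStorei(GL_UNPACK_SKIP_IMAGES, z * b);

                glBindTexture(GL_TEXTURE_3D, brick[i]);
                glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
                glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
                // GL_RGB8 request; per the thread this is stored as RGBA,
                // so expect ~64 MB per 256^3 brick.
                glTexImage3D(GL_TEXTURE_3D, 0, GL_RGB8, b, b, b, 0,
                             GL_RGB, GL_UNSIGNED_BYTE, rgb);
            }

    // Reset the unpack state so later uploads are unaffected.
    glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
    glPixelStorei(GL_UNPACK_SKIP_ROWS, 0);
    glPixelStorei(GL_UNPACK_SKIP_IMAGES, 0);
    glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
    glPixelStorei(GL_UNPACK_IMAGE_HEIGHT, 0);
}
```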

++luck;

Here are some more of my findings:

As said/recommended above, I switched to 3 GL_LUMINANCE 3D textures as opposed to 1 GL_RGB texture, since GL_RGB gets converted internally to GL_RGBA on NVIDIA hardware, resulting in wasted texture memory (a problem for my big datasets).

However, the 3 x GL_LUMINANCE approach still caches over the PCI-E bus, meaning I'm getting a big slowdown. As described earlier, this is due to the overhead of other stuff in memory and/or memory fragmentation, i.e. I can't get the contiguous chunks I need.

I've also tried using compressed textures.

If I use GL_COMPRESSED_RGB_ARB for the single 3D texture approach, the card does not cache over the PCI-E bus and renders nice and fast; the driver reports my 512^3 dataset using about 70 MB of texture memory.

This seems great; however, it will sometimes just fail, i.e. I just get a white cube, which I think means there is no 3D texture in memory…

I have tried this approach on two identical Dells (Quadro 4500FX) with the same latest drivers, and it works erratically, i.e. sometimes yes, sometimes no.

On my 7900GTX at home with the latest drivers, however, this works fine.

I have two questions here:

  1. Why does it fail so erratically?
  2. How can I catch when it is failing? (See the sketch further down.)

I have also tried GL_COMPRESSED_LUMINANCE_ARB for the 3 x luminance approach. When I check the driver perfmon, for a 512^3 dataset this results in no saving of texture memory, i.e. I still have ~256 MB in texture memory and 128 MB in PCI-E.

  1. Why is GL_COMPRESSED_LUMINANCE_ARB not saving me anything?
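To catch the failure (question 2 above) and to see whether the driver really compressed anything in the GL_COMPRESSED_LUMINANCE_ARB case, I guess something like this right after the glTexImage3D call would help (only a sketch; the helper name is made up):

```cpp
// Sketch: after uploading with a GL_COMPRESSED_* internal format, check
// glGetError and then ask whether the texture was actually compressed.
#include <GL/glew.h>
#include <cstdio>

bool verifyCompressedUpload()
{
    const GLenum err = glGetError();  // catches GL_OUT_OF_MEMORY, etc.
    if (err != GL_NO_ERROR)
    {
        std::printf("upload failed, glGetError() = 0x%x\n", err);
        return false;
    }

    GLint compressed = GL_FALSE, bytes = 0, fmt = 0;
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_COMPRESSED, &compressed);
    glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_INTERNAL_FORMAT, &fmt);
    if (compressed)
        glGetTexLevelParameteriv(GL_TEXTURE_3D, 0,
                                 GL_TEXTURE_COMPRESSED_IMAGE_SIZE, &bytes);

    // If 'compressed' comes back false for GL_COMPRESSED_LUMINANCE_ARB, the
    // driver has silently fallen back to an uncompressed format, which would
    // explain why no texture memory is saved in that case.
    std::printf("compressed=%d, internal format=0x%x, size=%d bytes\n",
                compressed, static_cast<unsigned>(fmt), bytes);
    return compressed != GL_FALSE;
}
```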

Thanks in advance for all the suggestions/help,
ut.

Hi,

I can’t see why there is no better memory management in the graphics drivers and GPUs. A graphics board is not an extremely complex thing; everything is controlled and managed by the driver. Every “malloc” is managed by the driver, so an “out of memory” should not happen that easily. Why don’t graphics boards use virtual memory and paging (which are well known from any real operating system and CPU around)?

Even if it’s not as easy as I make it sound, the GPU “designers” should take some ideas that are taken for granted in other areas (e.g. good old CPUs with MMUs, etc.). The bad stability (e.g. ATI+Linux), bad engineering (e.g. horribly designed cooling fans) and wrong policies (e.g. drivers optimized for speed, but not stability) lead lots of people not to take the “fabless firms” seriously.

Originally posted by mlb:
Hi,

I can’t see why there is no better memory management in the graphics drivers and GPUs. A graphics board is not an extremely complex thing; everything is controlled and managed by the driver. Every “malloc” is managed by the driver, so an “out of memory” should not happen that easily. Why don’t graphics boards use virtual memory and paging (which are well known from any real operating system and CPU around)?
Is whatever you are smoking legal? The GPU is the MOST complex device in your PC. They don’t use Virtual Memory and paging because that would make them SLOW. We have GPU’s because we want the extra performance that they provide. Virtual memory is for data that is not frequently accessed - and a 3D texture that is going to be drawn onto the current display is going to be accessed every frame. Even reading it from system RAM is generally going to be too slow, let alone from virtual memory.

Originally posted by mlb:
Even if it’s not as easy as I make it sound, the GPU “designers” should take some ideas that are taken for granted in other areas (e.g. good old CPUs with MMUs, etc.). The bad stability (e.g. ATI+Linux), bad engineering (e.g. horribly designed cooling fans) and wrong policies (e.g. drivers optimized for speed, but not stability) lead lots of people not to take the “fabless firms” seriously.
Lay off the sauce! You obviously have no understanding at all of what a GPU actually is. The CPU designers are playing catch-up (dual core? How many “cores” are on today’s GPUs?). If you don’t like the cooling fan on your card then buy a better brand. If you don’t like ATI with Linux then try nVidia (or maybe a better Linux distribution). Again, a GPU is a performance device, and the majority of users worldwide have no issues with stability.

They don’t use Virtual Memory and paging because that would make them SLOW.
A common misconception. Virtual memory wouldn’t make them slow. Indeed, some (Carmack, but I don’t like appeals to authority) suggest that it would make them faster. And it’s not an unreasonable suggestion.

If you’re not using the top mip level of a texture, it’s wasting space while it sits on the GPU. Having a virtual memory system would allow the GPU to hold only the texture data that are actually needed, which means you’re more likely to find your texture resident, which means you can use more textures without a significant performance penalty.

The upload would be somewhat slow, and the GPU would have to wait for it to complete before continuing, but it wouldn’t be so bad. Certainly compared to the alternatives.

What we have now is a thrashing system. If you consistently try to use more textures in a frame than you have memory for, you thrash a whole bunch, hurting performance. Plus, the CPU (driver) has to get involved, so you’re hurting CPU performance too.

Now, the current plain memory approach is faster for the best-case: all textures in use are resident. But if you dare use more textures than you have memory for, you’re immediately screwed far more than virtual memory would hurt you.

So, in effect, you’re looking at a situation as something like this:

Plain memory, no thrash: 100%.
Cache memory, no thrash: 99.9% (the VM cost when not paging in is virtually negligible).
Plain memory, thrash: 25%.
Cache memory, thrash: 75% (bringing in less data and not involving the CPU. Much.).

And do note that, because the cache case stores less data per texture (only what is needed), it’s going to thrash much later (in terms of number of textures) than the plain memory case.

The virtual memory approach gives you, the developer, the right to break the cardinal rule of GPU development: never use more textures than you have memory for.

dual core? How many “cores” are on today’s GPUs?
1

To be fair, all the parallelism and so forth in a GPU doesn’t stop it from still rendering one triangle at a time. And there’s no concurrency, because GPUs are specifically designed to make race conditions impossible. That’s the primary reason you can’t read from the pixel you’re currently writing.

A true core must be a completely independent processing unit. And GPUs only have one independent processing unit: the GPU. It may have many different pieces, some in parallel with one another. But that’s not much different from the pipelines in modern CPUs.

While I agree with your general idea that “mlb” is being rather unfair in his exaggerated portrayal of the graphics card industry, you’re simply doing the same thing, but from the other direction.

> The GPU is the MOST complex device in your PC.

Oh, really? :rolleyes:

> They don’t use Virtual Memory and paging because that would make them SLOW.

Before we continue, have a look at this:
http://en.wikipedia.org/wiki/Virtual_memory

> We have GPU’s because we want the extra performance that they provide.

… and we have Ferraris because we want the extra power that they provide :smiley:

> Virtual memory is for data that is not frequently accessed - and a 3D texture that is going to be drawn onto the current display is going to be accessed every frame.

That’s a good example, but in favour of virtual memory: imagine you render only a single 3D-textured triangle; why should you have the complete texture in graphics memory? It would be a waste of memory…

> Even reading it from system RAM is generally going to be too slow, let alone from virtual memory.

Wait, the idea of “virtual memory” might not be as simple as you think; virtual memory is not only about paging to hard disk.

> You obviously have no understanding at all of what a GPU actually is.

Maybe, but I know what virtual memory is :wink:

P.S. Even NVIDIA themselves laugh about their “FX Flow” cooling fans, so I won’t have to take back my criticism of the bad engineering on this point.

> While I agree with your general idea that “mlb” is being rather unfair in his exaggerated portrayal of the graphics card industry,

The bad stability of the ATI Linux stuff is not my problem, and not my invention. I use NVIDIA stuff, and their OpenGL implementation is rather good.

As was said (Korval), with the current method, if you have too many textures or VBOs, then you get a big slowdown.

With VM applied to the mipmap levels, it could be better. If your character moves around the scene too much, then some of those mipmaps will need to be moved to VRAM and some back to RAM.
If you have a few huge 3D textures for your voxel renderer, and you rotate the view, again …

You have to admit that the “One True Solution” is to add more VRAM.
Additionally, texture compression is a good idea (compress float maps, compress normal maps)

wrong policies (e.g. drivers optimized for speed, but not stability)
Gaming cards or the high-end stuff?

> As was said (Korval), with the current method, if you have too many textures or VBOs, then you get a big slowdown.

Exactly, that’s one of the motivations for virtual memory: you could do “manual” paging based on texture objects if it didn’t cause that slowdown.

> You have to admit that the “One True Solution” is to add more VRAM.

:smiley: No, it’s not a question of the absolute size of VRAM.

> gaming cards or the high end stuff?

Bad stability (like ATI+Linux) is never OK, no matter the price of the product, and the stability is a software problem in the first place (Windows vs. Linux in the ATI case).