Mipmap levels and video memory

I think it’s common knowledge nowadays that all consumer cards store the whole mipmap chain in video memory as soon as the texture is used at all. In other words, if you’ve got a 4096x4096x4 mipmapped texture and only an area of 64x64 pixels is visible on screen, the whole 80 MB of data will be uploaded into video memory, even though the rasterizer only needs to access about 16 KB of it. I know that in practice there are other considerations (trilinear filtering needs access to the two closest mipmap levels), but why the hell can’t the drivers manage this memory better? The number of textures you could have in your scenes would increase tremendously, and performance should even increase (no need to upload all this data from system memory if it isn’t yet in video memory). So is there a specific reason why this isn’t possible?
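(For scale, a quick back-of-the-envelope calculation. This throwaway C snippet is purely illustrative, not from any driver or engine; note that the “80 MB” above is really closer to 85 MB for the full chain.)

    #include <stdio.h>

    /* Bytes used by a square mipmapped texture at 4 bytes per texel. */
    static unsigned long mip_chain_bytes(unsigned long size)
    {
        unsigned long total = 0;
        for (;;) {
            total += size * size * 4;
            if (size == 1)
                break;
            size /= 2;
        }
        return total;
    }

    int main(void)
    {
        printf("full 4096 chain : %lu bytes (~85 MB)\n", mip_chain_bytes(4096));
        printf("64x64 level only: %lu bytes (16 KB)\n", 64UL * 64UL * 4UL);
        return 0;
    }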

Y.

Well, it is possible, and 3Dlabs cards already do this. They perform demand paging of texture data and view video memory as a “cache” of AGP memory. Future consumer (ATI/NVIDIA) cards might do this soon, too. It’ll become more worthwhile once superbuffers factor most memory operations into orthogonal operations. Carmack suggested it a while ago (add a TLB to the GPU), but I think that’s essentially what the AGP GART already is. With PCI Express cards, it’ll be slightly different.

-Won

Yes, there’s a reason drivers can’t manage this under some circumstances.

At any instant your code could generate fragments that require that high-resolution image. OpenGL has no idea when or whether this might happen. You made that texture, hopefully for a reason. The performance penalty for this could be severe and would defeat any potential optimization.

About the only thing you can do is use MIN LOD to give the driver a chance to ignore the higher-resolution levels when you don’t need them. Some implementations may consider it; I had a lengthy debate with a driver developer at NVIDIA over this some time ago, and he basically said that they ignore it and that it’s a bad idea. Maybe he’s right (I don’t think so, at least for smart developers). There are things implementations can do to manage texture memory (and some do, AFAIK), like storing tiled fragment groups and paging them in when they aren’t resident. I don’t think they like discussing this kind of stuff because it’s proprietary and some of it is patented. It always falls short of the potential for applications to be smart about their paging requirements, but if the drivers ignore the few mechanisms we have to hint at memory priorities, it’s a bit of a downer.
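(For concreteness, a minimal sketch of the MIN LOD hint dorbie describes, using the OpenGL 1.2 / SGIS_texture_lod parameter. The helper function and texture id are placeholders, and whether a driver uses this to limit residency is entirely implementation-dependent.)

    #include <GL/gl.h>

    /* Tell GL that LOD values below 4 will never be used for this texture,
     * i.e. levels finer than (base level + 4) will never be sampled.  A
     * driver *could* treat that as permission not to keep the finest
     * levels resident, but nothing in the spec requires it to. */
    static void hint_skip_finest_levels(GLuint tex)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_LOD, 4.0f);
    }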

[This message has been edited by dorbie (edited 02-06-2004).]

I agree there might be some tricky situations where the full mipmap hierarchy is needed, but I’m sure a smart driver could detect them, and you must admit that’s not something you do for 99% of your textures. Unless I’m missing something, it could be implemented on the current generation of graphics cards, or even older ones. I mean, the tendency today is to add more and more transistors, to double the onboard memory, etc., and to me it just seems this field of optimization has been left abandoned. Wouldn’t you like to “automagically” multiply the number of textures you can use in your scenes by 10?

Y.

but I’m sure a smart driver could detect them, and you must admit that’s not something you do for 99% of your textures.

Glslang gives a fragment program the ability to compute the fragment’s LOD bias on a per-fragment basis. This, pretty much, kills any hope of allowing the driver to have any clue as to whether or not any particular mip LOD will be used.
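(To make that concrete, here is a hypothetical fragment shader, shown as a C string and not taken from anyone’s code. Because the bias argument to texture2D() is computed per fragment at run time, the set of mip levels that will actually be sampled is unknowable before rasterization.)

    /* Hypothetical GLSL fragment shader source; "detailFactor" is assumed
     * to be written by a matching vertex shader. */
    static const char *frag_src =
        "uniform sampler2D tex;\n"
        "varying float detailFactor;\n"
        "void main() {\n"
        "    float bias = detailFactor * 4.0;  /* per-fragment LOD bias */\n"
        "    gl_FragColor = texture2D(tex, gl_TexCoord[0].xy, bias);\n"
        "}\n";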

The video-memory-as-cache mechanism could be quite useful, and I’m surprised that major vendors don’t implement it. It would have to be done in hardware, as a software version of this mechanism would rely on the hardware firing some kind of interrupt to the driver, and therefore potentially stalling more than it needs to.

But perhaps card makers feel that this mechanism would actually be slower. Maybe they’ve done some hardware tests and found that it isn’t as good as one might think.

I think this is something that might make it into the next DirectX standard, the one which will be released along with Longhorn.
There have been some discussions about using a virtual memory system. The effect would be that video memory, AGP memory and system memory could all be used in exactly the same way. The driver would then manage textures, vertex data, or anything else you throw at it in the best possible way.

This is similar to what we already have with system memory: you can’t really tell when you’re using RAM or the page file… Now I know you guys are OpenGL fans, but if it makes it into the DirectX standard, then the developers have to make sure that the hardware is not the limiting factor. So then it can’t be too hard to get it into OpenGL as well! And that’s nice. Instead of dealing with 80 MB chunks, a texture could be split by the memory subsystem into pieces of maybe 4 KB… If you’re interested, take a look at this:

Beyond3D article
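(Just to visualize what that kind of paging implies: a purely hypothetical toy structure, unrelated to any real driver or to the DirectX proposal. Splitting one texture’s mip chain into 4 KB pages gives the memory manager a residency flag per page instead of an all-or-nothing texture.)

    #include <stdlib.h>

    #define PAGE_SIZE 4096   /* the "maybe 4 KB" pieces mentioned above */

    /* Toy per-texture page table: one residency flag per 4 KB page. */
    typedef struct {
        size_t         num_pages;
        unsigned char *resident;   /* 1 = currently in video memory */
    } TexPageTable;

    static TexPageTable make_page_table(size_t mip_chain_bytes)
    {
        TexPageTable t;
        t.num_pages = (mip_chain_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
        t.resident  = calloc(t.num_pages, 1);   /* nothing resident yet */
        return t;
    }

    /* An ~85 MB mip chain works out to roughly 22,000 such pages. */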

Glslang gives a fragment program the ability to compute the fragment’s LOD bias on a per-fragment basis. This, pretty much, kills any hope of allowing the driver to have any clue as to whether or not any particular mip LOD will be used.

But again, how often do you think that feature will be used? The driver could easily parse the shader to find the mipmap bias instruction and then flag the bound texture as needing all the mipmaps.

Y.

Ysaneya, even if you exclude glslang, the only way I can see the driver handling the easier cases is by doing software vertex transforms to calculate eye-space z values. That would be prohibitively expensive.

There are basically only two ways that the driver could do it all by itself. One is the 3dlabs way, which has already been mentioned. What I didn’t see mentioned is the fact that 3dlabs has at least one patent on the technology. I suspect that’s why no other vendor has used that technique.

The other would be for the driver to calculate the lambda value for every polygon that uses a given texture. In order to do that, not only would the driver have to transform and clip all of the polygons, it would practically have to rasterize them (to compute the min/max rho values).

Do you hear that flushing sound? It’s performance going bye-bye…
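(For the curious, this is roughly the per-fragment arithmetic the driver would have to replicate in software; it follows the OpenGL spec’s level-of-detail formula, and the function and its inputs are hypothetical. Doing this on the CPU for every fragment of every triangle is exactly the software rasterization cost being described.)

    #include <math.h>

    /* Approximate GL LOD computation for one fragment.  (dudx, dvdx) and
     * (dudy, dvdy) are the texture-coordinate derivatives across one pixel;
     * tex_w/tex_h are the base level dimensions. */
    static float fragment_lambda(float dudx, float dvdx,
                                 float dudy, float dvdy,
                                 float tex_w, float tex_h)
    {
        float rho_x = sqrtf(dudx * tex_w * dudx * tex_w +
                            dvdx * tex_h * dvdx * tex_h);
        float rho_y = sqrtf(dudy * tex_w * dudy * tex_w +
                            dvdy * tex_h * dvdy * tex_h);
        float rho   = (rho_x > rho_y) ? rho_x : rho_y;  /* spec takes the max */
        return log2f(rho);  /* lambda picks the mip level(s) to sample */
    }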

The only reason I know this is because I just finished looking at doing something like this in the open-source 3D drivers for XFree86. As a side note, there is a 3rd way to tackle some of this, but it requires a little hardware support and a little application support. It’s called clipmaps. Check out the SGI extension spec and the original paper on the subject. It’s a pretty clever idea.
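(A rough sketch of the clipmap idea, with made-up numbers and no relation to the SGIX_clipmap API itself: levels larger than the clip size keep only a fixed-size window resident, so memory grows with the clip size rather than with the full texture size.)

    /* Toy clipmap arithmetic: levels above the clip size keep only a
     * clip_size x clip_size window resident. */
    static size_t clipmap_resident_bytes(unsigned base_size,        /* e.g. 8192 */
                                         unsigned clip_size,        /* e.g. 1024 */
                                         unsigned bytes_per_texel)  /* e.g. 4    */
    {
        size_t total = 0;
        unsigned level;
        for (level = base_size; level >= 1; level /= 2) {
            unsigned kept = (level < clip_size) ? level : clip_size;
            total += (size_t)kept * kept * bytes_per_texel;
            if (level == 1)
                break;
        }
        return total;  /* 8192^2 RGBA8: ~341 MB full chain vs ~17 MB clipped to 1024 */
    }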

[This message has been edited by idr (edited 02-06-2004).]

Originally posted by Ysaneya:
But again, how often do you think that feature will be used? The driver could easily parse the shader to find the mipmap bias instruction and then flag the bound texture as needing all the mipmaps.

Even without using GLslang, the driver has no idea where the transformed vertices or the computed texture coordinates will end up. A triangle may be small, but if its texture coordinates are close together, you may need a higher-resolution mipmap anyway.

If the driver has to compute vertex positions and texture coordinates, then you might as well be doing software vertex processing.

You’re right, I didn’t consider that point.

However, an extension to let the user specify the min & max mipmap levels (with clamping when outside that range) could be useful. It would require hardware assistance and would no longer work “automatically”, but that’d be better than nothing…

Y.

Originally posted by Ysaneya:
You’re right, I didn’t consider that point.

However, an extension to let the user specify the min & max mipmap levels (with clamping when outside that range) could be useful. It would require hardware assistance and would no longer work “automatically”, but that’d be better than nothing…

Y.
It’s already there: http://oss.sgi.com/projects/ogl-sample/registry/SGIS/texture_lod.txt

The interface could be similar to this extension; I’ve read the spec, but I haven’t found any reference to the driver being allowed to NOT upload the mipmaps outside that range to video memory… which is the whole point of the discussion.

Y.

Similar!!! The extension exists, as has been pointed out, and it was also mentioned earlier in this thread. Drivers optimizing for memory residency using this extension should be transparent. Clients never get to strictly specify the hardware policy of implementations like this in OpenGL; it’s generally anathema to its objectives. At best they get to hint. It seems to me that there’s more than enough capability to clamp both with and without residency implications, using the two sets of tokens.

Specifically, based on my reading of the spec, LEVEL should be used as the residency hint and LOD for the MIP level computation. I’ve yet to figure out what the purpose of this redundancy is if the intent is anything else, bearing in mind that this predates shaders etc. (maybe in case it’s affected by the contemporaneous LOD bias?)
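(For anyone following along, a minimal sketch of the two sets of tokens being distinguished here, using the OpenGL 1.2 names; the values are arbitrary examples, and how an implementation maps them to residency is, as dorbie says, up to the driver.)

    #include <GL/gl.h>

    static void clamp_levels_and_lod(GLuint tex)
    {
        glBindTexture(GL_TEXTURE_2D, tex);

        /* LEVEL tokens: restrict which mipmap levels the texture may use
         * at all; the natural candidate for a residency hint. */
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 2);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 10);

        /* LOD tokens: clamp the lambda used for level selection, without
         * saying anything about which levels must exist. */
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_LOD, 0.0f);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAX_LOD, 8.0f);
    }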

That extension is about mipmap levels, but if I read the Beyond3D article correctly, the idea is to treat textures as just another piece of memory.
So the driver is not restricted to whole mipmap levels, but could choose to cache a subsection of a level in VRAM.

But I question the usefulness of this mechanism. I’m sure it is useful in the case of 3D textures, but most games typically use 2D textures and, in rare cases, 1D textures. You might as well store the whole thing in VRAM, or just store the whole thing in AGP.

Well, you’ve lost me; I don’t see any problem with this extension fulfilling the needs of application-directed MIP level residency. This is what was being discussed, at least in part.

As for keeping subsections of larger individual texture levels resident, this may already be done on a demand basis on some cards, based on a tiled layout. I can’t be sure, but I filed a patent (with Chris Migdal) on a texture residency scheme based on indirection through a virtual memory table, including MIP level redirection, with the potential for explicit application control, and the examiners sent me some related (but different) art in the form of a patent filed by 3DLabs for a tiled video memory paging architecture a while back. If memory serves, it wasn’t clear from the filing whether it covered framebuffer or texture residency, but it was a while ago.

[This message has been edited by dorbie (edited 02-11-2004).]

Lost you? I think you understood, judging from your post.

Originally posted by dorbie:
Well, you’ve lost me; I don’t see any problem with this extension fulfilling the needs of application-directed MIP level residency. This is what was being discussed, at least in part.

If that’s what the app wants to do, that is fine.
The talk about DirectX Next (or whatever it’s called) at Beyond3D is about something else, and you mentioned that too. So I guess 3Dlabs have implemented this for textures (and why not, since they make workstation cards).

For an ordinary home PC that will be running games, I question the usefulness of this.
Let’s not forget that “Direct3D is for games”

Take a look at Ysaneya’s first post:
4096x4096x4 mipmapped texture (uncompressed)
makes 87 MB

For a game, I think this is one insane texture dimension. For a pro app, fine, it makes sense.

There are two totally different things being discussed here. One is the patented 3dlabs technique that will apparently be in DirectX Next. That is 100% transparent to the application. It’s the same way that paging memory out to disk is transparent to applications. The other is the much older SGI extension.

The SGI extension is really only helpful if you have a few very large textures. The original paper talks about using a 1 meter resolution texture of the entire planet. The 3dlabs technique is applicable to anything that uses a lot of texture memory, be it a few huge textures or many normal sized textures.

From reading a bunch of the recent posts, I think there is a lot of confusion over which is which.

For a game, I think this is one insane texture dimension. For a pro app, fine, it makes sense.

Huh, no. A game like Battlefield 1942 (which is starting to be… old) uses 8192x8192, split into 64 1024x1024 chunks. Obviously it wouldn’t work at all if it weren’t compressed, but still…

In my own game I use 8192x8192 too, because I have a pretty large terrain. However, most of the time the camera moves “slowly”, and only a few chunks around it need the full-resolution textures. For far-away chunks that take up 50 pixels on screen, why should I have to use the full 1024x1024 mipmap chain, when I KNOW filtering will only use mipmaps of up to, say, 64x64?

In the near future I will implement a caching mechanism to manage textures that way, but I’ve been wondering: why couldn’t the driver do it, so that everybody benefits?
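(Something along these lines, as a hypothetical sketch rather than Ysaneya’s actual plan, is what such an application-side cache boils down to: derive each chunk’s finest useful mip level from its projected size on screen, and only keep that level and coarser ones resident.)

    #include <math.h>

    /* Hypothetical helper: given a chunk's rough on-screen size in pixels,
     * pick the finest mip level worth keeping resident for its base_size
     * texture (level 0 = base_size, level 4 of a 1024 texture = 64). */
    static int finest_level_needed(float pixels_on_screen, int base_size)
    {
        int level;
        if (pixels_on_screen < 1.0f)
            pixels_on_screen = 1.0f;
        level = (int)floorf(log2f((float)base_size / pixels_on_screen));
        return (level < 0) ? 0 : level;
    }

    /* e.g. finest_level_needed(50.0f, 1024) == 4, i.e. the 64x64 level,
     * matching the estimate in the post above. */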

Y.

>>>A game like Battlefield 1942 (which is starting to be… old) uses 8192x8192, split into 64 1024x1024 chunks. Obviously it wouldn’t work at all if it weren’t compressed, but still…<<<

From what you just said, it’s not a problem for BF 1942. Therefore, it is not a problem for any game in its class. Therefore, this technology is not a must.

Also, not every image in a game’s texture folder is a texture. Some are height maps.
In that case you will still require large amounts of RAM, but for geometry.
So a game like this can have a super-large terrain with repeating textures.

I don’t know what that game does really.

For the most part (over 90%), games use 128x128, 256x256, or 512x512 textures.