Megatextures?

JC: What we’re doing in Quake Enemy Territory is sort of our first cut at doing that over a simple case, where you’ve got a terrain model that has these enormous, like 32000 by 32000, textures going over them. Already they look really great. There’s a lot of things that you get there that are generated ahead of time, but as the tools are maturing for allowing us to let artists actually go in and improve things directly, those are going to be looking better and better.

Enemy Territory: Quake Wars:…This is due at least in part to the huge and surprisingly detailed outdoor areas that are possible, thanks to an all-new “megatexture” mapping technology developed by id Software programming guru John Carmack. The megatexture is essentially one huge, continuous texture map that can stretch all the way to the horizon, without any need for fog or other effects to mask a limited draw distance or texture tiles that repeat and show seams at the edges.
Can somebody explain how megatextures work?

yooyo

OK, I really don’t know for sure (and I doubt anyone else does besides id and the engine licensees), but to me it sounds an awful lot like Clip-Maps, which are nothing all that new (well, for games maybe). Chapter 2 of GPU Gems 2 talks all about them, and from that, this MegaTexture thing sounds similar. Too bad someone at QuakeCon didn’t ask John Carmack when he was taking questions at the end of his talk.

I think there is a paper about them, but I don’t have it. I do remember seeing them mentioned on vterrain.org.

-SirKnight

What we’re doing in Quake Enemy Territory is sort of our first cut at doing that over a simple case
That part was after his talk about texture virtualization, which means that instead of having texture objects, you simply have one texture memory space and the texture coordinates look into it.

Texture virtualization would help developers quite a bit:

-You won’t have to bind samplers to the shader program; you simply read from one texture space. This means you could read from as many textures as you want, until the swapping in and out of the texture space kills performance (just like swapping lots of memory pages to and from the hard drive kills CPU performance).

Of course, you would still need to pass the base address of a texture if you wanted to read from it in a shader and did not have an object with texture coordinates pointing into it.

-All textures can be non-power of 2.

-You can maximise batching on the application side. As long as geometry is rendered with the same shaders, you can batch it. Right now, you have to bind different textures even when you are using the same states and shaders to render the geometry.

This also implies that artists can customize different sides of a wall without killing batching.
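
A rough sketch of that batching point, in C with plain OpenGL (the Surface struct and the function names are made up for illustration, and the single big texture space is approximated with an ordinary atlas, since true virtualization doesn’t exist yet):

#include <GL/gl.h>

/* hypothetical per-surface data, for illustration only */
struct Surface { GLuint texture; GLsizei count; const GLuint *indices; };

/* today: every texture change forces its own draw call */
void draw_today(const struct Surface *s, int n)
{
    for (int i = 0; i < n; ++i) {
        glBindTexture(GL_TEXTURE_2D, s[i].texture);   /* breaks the batch */
        glDrawElements(GL_TRIANGLES, s[i].count, GL_UNSIGNED_INT, s[i].indices);
    }
}

/* with one texture space (or, today, an atlas as a workaround),
   everything sharing the same shader goes down in a single call */
void draw_single_space(GLuint atlas, GLsizei total, const GLuint *indices)
{
    glBindTexture(GL_TEXTURE_2D, atlas);              /* bound once */
    glDrawElements(GL_TRIANGLES, total, GL_UNSIGNED_INT, indices);
}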

AFAIK the maximum texture size is 2k on ATI and 4k on nVidia. I don’t understand how one can create these ‘megatextures’. In my applications I use images larger than 4k by 4k, but I use tiling, usually at 512 by 512.

I’m REALLY waiting for virtual graphics card memory, too. I have to deal with geometry data that is many times larger than main memory (and graphics card memory). We have to do a lot of data management, and it is very hard, almost impossible, to tell the hardware exactly how to manage the data you put into the VBOs.
Virtual memory would relieve us of many headaches and would probably perform better.

Originally posted by Tzupy:
AFAIK the maximum texture size is 2k on ATI and 4k on nVidia. I don’t understand how one can create these ‘megatextures’. In my applications I use images larger than 4k by 4k, but I use tiling, usually at 512 by 512.
It really only works well for something like a terrain where you can still get batches of a reasonable size.

You simply tell your artist: you have 32k by 32k to draw into.

Then you pass it through a piece of software that splits the texture down to sizes the video card supports and assigns the resulting textures and texture coordinates to the right pieces of the terrain.
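
Something like this, roughly (a minimal sketch; MEGA_SIZE, TILE_SIZE and read_mega_block are invented names, and the 2k tile size is just an example of a card limit):

#include <GL/gl.h>

#define MEGA_SIZE 32768                 /* the 32k x 32k source image */
#define TILE_SIZE 2048                  /* whatever the card's limit is */
#define TILES_PER_SIDE (MEGA_SIZE / TILE_SIZE)

/* invented helper: returns the pixels of one TILE_SIZE x TILE_SIZE block */
extern const unsigned char *read_mega_block(int tx, int ty);

GLuint tiles[TILES_PER_SIDE][TILES_PER_SIDE];

/* cut the big image into card-sized textures */
void split_megatexture(void)
{
    for (int ty = 0; ty < TILES_PER_SIDE; ++ty)
        for (int tx = 0; tx < TILES_PER_SIDE; ++tx) {
            glGenTextures(1, &tiles[ty][tx]);
            glBindTexture(GL_TEXTURE_2D, tiles[ty][tx]);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, TILE_SIZE, TILE_SIZE,
                         0, GL_RGBA, GL_UNSIGNED_BYTE, read_mega_block(tx, ty));
        }
}

/* map a global (u,v) over the whole terrain to the tile it falls in,
   plus local coordinates inside that tile */
void remap_uv(float u, float v, int *tx, int *ty, float *lu, float *lv)
{
    *tx = (int)(u * TILES_PER_SIDE);
    *ty = (int)(v * TILES_PER_SIDE);
    if (*tx > TILES_PER_SIDE - 1) *tx = TILES_PER_SIDE - 1;  /* u == 1.0 edge case */
    if (*ty > TILES_PER_SIDE - 1) *ty = TILES_PER_SIDE - 1;
    *lu = u * TILES_PER_SIDE - *tx;
    *lv = v * TILES_PER_SIDE - *ty;
}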

Originally posted by skynet:
I’m REALLY waiting for virtual graphics card memory, too.
I am really looking forward to it too. But I am a bit worried, because it is a significant change in paradigm and I suspect it will be very difficult to support both non-virtual and virtual cards.

Of course, if your software is custom and your customer(s) don’t care about switching cards then it is great.

On the PC, game developers will have to suffer the duality. I suspect widespread usage of the technology would begin on a console first, and then PC gamers will have no choice but to upgrade to play the new games.

That part was after his talk about texture virtualization which means instead of having texture objects, you simply have one texture memory space and the texture coordinate looks into it.
Yeah, that’s never going to happen. Videocard memory may eventually be virtualized into some kind of cache-like architecture, but you’ll still have texture objects. IHVs aren’t going to suddenly just give you a pointer to some memory and let you call it a texture. The driver will still manage the memory itself.

The advantages of virtual textures are the possibility of larger textures (the theoretical maximum is 8388608, or 2^23; this is due to 32-bit floating-point precision issues) and some possibility of greater performance. This does not mean suddenly giving fragment programs carte blanche to go rampaging through video memory reading any old block of data, nor does it make texture units in fragment programs just go away. The latter might be possible, but it would be too easy to screw up in platform-dependent ways (can you actually pass a texture object name as an integer argument? Texture objects are 32-bit ints, while integers in glslang are only guaranteed to be 16 bits plus a sign bit).

Originally posted by Korval:
Yeah, that’s never going to happen. Videocard memory may eventually be virtualized into some kind of cache-like architecture, but you’ll still have texture objects.
Texture objects would in concept become a form of pointer, but you are right, they would still be texture objects.

I was not saying we need direct access to the memory. You would still call the glTex* functions to create a texture and all the other functions to manipulate it.

This does not mean suddenly giving fragment programs carte blanche to go rampaging through video memory reading any old block of data, nor does it make texture units in fragment programs just go away
Well, if you add an extra step that transforms texture coordinates to point to a spot in the virtual memory space, then texture units are not needed anymore.

GLuint texture;
glGenTextures(1, &texture);                    // create the texture object
glBindTexture(GL_TEXTURE_2D, texture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8,       // upload the image data
             width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

// set up a VBO for the texture coordinates as usual

// in a virtualized scheme, this is the step that would
// transform the coordinates to virtual addresses
glTexCoordPointer(2, GL_FLOAT, 0, 0);

After the transformation, texture units are not required anymore.

Repeat and Clamp would still apply if the coordinates go overboard. This requires some work from the GPU to figure out the bounds of the current texture. CPUs can already do that for processes and complain if you try to access something outside your address space.
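
A plain-C sketch of what that coordinate-to-address step might look like (the VirtualTexture struct and the row-major layout are assumptions for illustration; real hardware would more likely use a tiled layout):

/* translate a texture coordinate into an address in the virtual space,
   clamping or repeating so we never leave our own region of it */
typedef struct {
    unsigned base;            /* start of this texture in the virtual space */
    int      width, height;   /* size in texels */
    int      repeat;          /* 1 = repeat, 0 = clamp to edge */
} VirtualTexture;

unsigned texel_address(const VirtualTexture *t, float u, float v)
{
    int x = (int)(u * t->width);
    int y = (int)(v * t->height);

    if (t->repeat) {                        /* wrap back into the texture */
        x = ((x % t->width)  + t->width)  % t->width;
        y = ((y % t->height) + t->height) % t->height;
    } else {                                /* clamp to the texture's bounds */
        if (x < 0) x = 0; if (x >= t->width)  x = t->width  - 1;
        if (y < 0) y = 0; if (y >= t->height) y = t->height - 1;
    }
    return t->base + (unsigned)(y * t->width + x) * 4;  /* 4 bytes per RGBA texel */
}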

Something will always be hard to do for somebody. It is either going to be difficult for the hardware, or difficult for the software.

Virtualization makes it easier for the software. CPU makers figured it out, I am sure GPU makers can too.

Surely there’d be cache issues. Say you access a 64x64 subtexture that lies bang in the middle of a 32000x32000 texture: the texture cache manager doesn’t know the subtexture’s dimensions (or have any concept of a subtexture), so it would only cache texels on the currently referenced row… as soon as the v coord increments it would have to completely repopulate its cache.
This sounds like mad talk to me.
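
A quick back-of-the-envelope check of that worry, assuming a naive row-major RGBA layout (which is exactly what real hardware avoids by tiling/swizzling the texture in memory):

#include <stdio.h>

int main(void)
{
    const long width = 32000;               /* texels per row of the big texture */
    const long bpp   = 4;                   /* bytes per RGBA texel */
    const long row_stride = width * bpp;

    /* two vertically adjacent texels of a 64x64 window are this far apart: */
    printf("row stride: %ld bytes\n", row_stride);        /* 128000 bytes */

    /* so the 64x64 window is scattered across a span of: */
    printf("window span: %ld bytes\n", 64 * row_stride);  /* roughly 8 MB */
    return 0;
}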

3Dlabs has had virtual texture memory for years; it basically works like normal virtual memory, but for textures…
So the texture is divided into 32x32 tiles and those tiles are paged in/out based on usage patterns. (Oh coincidence, 32x32xRGBA is 4 KB, the page size a lot of normal systems use.) The tile indexing scheme may also use some clever space-filling curve to make access patterns better (like not having row-by-row indexing).
Using them basically works like normal texturing; it doesn’t even have to make the maximum texture size bigger. The advantages are:

  • If you use only a sub part of your texture only the bounding 32x32 tiles will be loaded.
  • If part of the texture is obscured by other geometry only the visible tiles will be loaded (think skyboxes…)
  • If some object with a 4kx4k texture is far away only the lowest mip level tiles will be loaded.

There is some presentation from 3D labs that explains it all in detail somewhere…

All of this basically helps to make the texture memory requirement closer to the actual framebuffer size, as any invisible data is not loaded.
I’m not sure of the hardware cost to implement this, but bullet 2 seems to be the most difficult one, as it basically needs hardware that is able to suspend fragments and do some other useful stuff until that data appears.
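
A small sketch of how such a tile index might be computed, assuming 32x32 RGBA tiles (4 KB pages) and a Z-order (Morton) curve as one example of the space-filling indexing mentioned above:

#include <stdint.h>

#define TILE 32                         /* 32x32 RGBA = 4 KB, one page */

/* interleave the low 16 bits of x and y (Z-order / Morton curve), so
   tiles that are close together in 2D stay close in the 1D page index */
static uint32_t morton(uint32_t x, uint32_t y)
{
    uint32_t z = 0;
    for (int i = 0; i < 16; ++i) {
        z |= (x & (1u << i)) << i;        /* bit i of x goes to bit 2i   */
        z |= (y & (1u << i)) << (i + 1);  /* bit i of y goes to bit 2i+1 */
    }
    return z;
}

/* which page does texel (s, t) of a big texture live in? */
uint32_t page_of(uint32_t s, uint32_t t)
{
    return morton(s / TILE, t / TILE);
}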

Pentagram, I suppose the presentation you refer to is:
http://www.graphicshardware.org/previous/www_1999/wkshp.html
search in page for “Virtual Textures - a true demand-paged texture memory management system in silicon”
http://www.graphicshardware.org/previous/www_1999/presentations/v-textures.pdf

Wow, supported since 1999 in Permedia 3 … One can ask why gaming cards don’t already do this.

I’d like to make two remarks:

  1. 32k x 32k RGBA = 4 GB.
    Not exactly usable with today’s installed memory sizes (usually 1 GB, next year probably 2 GB needed).
    So it would need heavy compression to squeeze it down to 512 MB for example. That would be acceptable.
    But more actual textures would have to be created (or maybe just updated?) each frame than in the
    case of a ‘classical’ approach, where the same texture can be used for many quads.
  2. 512 x 64 = 32k.
    If the actual texture size is 512 x 512 (used for a local tile), then just 64 x 64 such textures would fill all
    the 32k x 32k image. For small levels this may be acceptable, but for a large world like Morrowind (loved it)
    this would be unusable.
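
For what it’s worth, a quick check of those numbers, assuming DXT1 at 8:1 versus uncompressed RGBA8 (which is where a 512 MB figure would come from):

#include <stdio.h>

int main(void)
{
    const long long side = 32768;                 /* 32k texels per side */
    const long long raw  = side * side * 4;       /* RGBA8, 4 bytes per texel */

    printf("uncompressed: %lld bytes (%lld GiB)\n", raw, raw >> 30);       /* 4 GiB */
    printf("DXT1 (8:1):   %lld bytes (%lld MiB)\n", raw / 8, (raw / 8) >> 20); /* 512 MiB */
    return 0;
}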

Virtual texturing doesn’t just require graphics memory and CPU memory; you also need to read from disk, as main memory is not going to be large enough to hold the kinds of textures that people will want to use. 32k by 32k is actually rather piddling once you start thinking about whole-earth geospatial data.

SGI’s clip mapping supports paging from disk to main memory and then subloading down to the graphics hardware. The hardware support is IR-specific, and doesn’t yet exist on modern GPUs.

I haven’t personally tried implementing clip map emulation, but with shaders I would have thought you could come reasonably close to emulating it.

Another important attribute of any virtual texture support, besides the ability to read from disk (or over the network), is the ability to have non-uniform detail levels, where you have localised high-res inserts.

All this coupling of file formats, network, disks, CPU, main memory, GPU and GPU memory points to a higher-level API than just OpenGL; expecting OpenGL to provide all this is really asking a bit too much of it.

This doesn’t stop anyone from writing a general-purpose library that adds virtual texture support on top of OpenGL :)

Robert.
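
A minimal sketch of the subload step Robert describes, assuming the tiles have already been paged from disk into main memory (CLIP_SIZE, TILE and read_tile_from_disk are invented for illustration; a real clip map emulation would also handle toroidal addressing and multiple levels):

#include <GL/gl.h>

#define CLIP_SIZE 1024     /* resident window kept on the card */
#define TILE        64     /* update granularity */

/* invented helper: reads one TILE x TILE block of the huge on-disk image */
extern const unsigned char *read_tile_from_disk(long x, long y);

/* disk -> main memory -> subload into the resident texture */
void refresh_clip_window(GLuint clip_tex, long origin_x, long origin_y)
{
    glBindTexture(GL_TEXTURE_2D, clip_tex);
    for (long y = 0; y < CLIP_SIZE; y += TILE)
        for (long x = 0; x < CLIP_SIZE; x += TILE) {
            const unsigned char *pixels =
                read_tile_from_disk(origin_x + x, origin_y + y);
            glTexSubImage2D(GL_TEXTURE_2D, 0,
                            (GLint)x, (GLint)y, TILE, TILE,
                            GL_RGBA, GL_UNSIGNED_BYTE, pixels);
        }
}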

Nice paper!

When I read Carmack’s speech, I was not convinced by his point. I don’t think it is that important to have custom textures everywhere. I don’t think landscapes will look better if their texture is different everywhere, because I cannot see a difference anyway. Not when I am busy slaying monsters ;)

However, there certainly ARE applications where huge textures can be a great win, especially CAD applications for designers.

Anyway, I didn’t think it would be worth the trouble to put that into the hardware, but the fact that 3Dlabs already put it into silicon in 1999 and claims that it IMPROVES efficiency and speed makes me wonder why ATI and nVidia haven’t done anything about it yet.

But I honestly doubt that this situation will change in the next few gfx-card generations. We’ll see.

Jan.

I haven’t personally tried implementing clip map emulation, but with shaders I would have thought you could come reasonably close to emulating it.

Yep, just see chapter 2 of GPU Gems 2. :)

You can download the .fx files that are part of chapter 2 on nVidia’s Gems 2 site. The stuff on the CD in the book is on their site for download.

-SirKnight

Well, if you add an extra step that transforms texture coordinates to point to a spot in the virtual memory space, then texture units are not needed anymore.
Why would you want to get rid of exceedingly fast hardware that does precisely what we want it to 99.99% of the time and replace it with comparatively slow fragment program code? Unless fragment programs can start doing a full 2x2 bilinear blend in 1 cycle (not to mention clamping where needed, converting a floating-point texture coordinate into a memory address, and so on, all in 1 cycle), why would you want to?

Pointers are bad. They create bugs. And bugs in a GPU that you can’t really debug on anyway are never a good thing.

I’m not sure of the hardware cost to implement this, but bullet 2 seems to be the most difficult one, as it basically needs hardware that is able to suspend fragments and do some other useful stuff until that data appears.
The underlying problem with virtual texturing is that there’s almost no way to prevent stalls due to paging in a texture piece.

If you have one pixel-quad of a triangle, and it needs to page in a piece of the texture, there’s not much you can do instead of waiting for the data. You might try processing other quads in that same triangle, but likely those quads are just going to ask for more from that same texture. And since these quads are nearby (screen-space), you’re probably already getting the data for them from the first fetch, so running these other quads in the triangle isn’t too useful.

And you can’t run quads from other triangles as this violates the GL spec all over the place. Triangles must be rendered in-order for any number of reasons. Of course, the other triangles are almost certainly also going to be accessing this texture (swapping textures creates a stall long enough that you probably won’t be looking at a different shader/material set), so they could easily stall themselves.

Originally posted by Korval:
why would you want to?
Because it gets rid of the act of binding textures.

Filtering can still be done by the sampler.

Pointers are bad. They create bugs. And bugs in a GPU that you can’t really debug on anyway are never a good thing.
You are not using the pointer. To the application, it is just a single texture. To keep going with analogies, a texture would be a process. The GPU would protect access to the other textures (processes) by clamping and repeating.

Because it gets rid of the act of binding textures.
So, if I want to use a shader many times, each time with different textures, I have to either pass the texture object (which, as I’ve already stated, is a U32, while integers in glslang are only S16s) as a uniform to the shader (thus killing batching anyway and negating any usefulness of doing so), or I have to do some strange attribute work to tell it which texture to use, which will probably have the same problem in terms of creating stalls as binding the texture to begin with.

Also, if I recall correctly, nVidia fragment shaders actually compile uniforms into the shader itself, so changing a uniform is the equivalent of switching to a new shader (it must upload the program again).

Originally posted by Tzupy:
32k x 32k RGBA = 4 GB.

I think the computation was 32 pixels x 32 pixels x 4 bytes per pixel:

32 x 32 x 4 = 4096 bytes

That would allow for very large disk-based bitmaps to be indexed down to 32x32 pixel blocks. A page request from the disk cache and a few pre-fetches for approaching neighbors would make this very responsive. This is not a new concept since the stereo mapping industry has been using similar techniques for nearly 20 years (if not longer).

  • tranders