ATI crash using NPOT compressed textures

I’m trying to track down an occasional problem I’m having using non-power-of-two textures with texture compression on ATI cards. (Current card = Radeon HD 5850, driver = Catalyst 12.8.)

After looking at many examples of successful and unsuccessful texture loads I’ve picked one that I can use to reproduce the problem:


        GLuint TextureID;
        glGenTextures(1, &TextureID);
        glBindTexture(GL_TEXTURE_2D, TextureID);
        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
        glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);


        GLsizei width = 734;
        GLsizei height = 717;


        GLubyte* DummyTexture = (GLubyte*) calloc(width*height, 3);


        try {
            glTexImage2D(GL_TEXTURE_2D, 2, GL_COMPRESSED_RGB, width, height, 0, GL_BGR, GL_UNSIGNED_BYTE, DummyTexture);


            GLint format;
            glGetTexLevelParameteriv(GL_TEXTURE_2D, 2, GL_TEXTURE_INTERNAL_FORMAT, &format);


            if (format == 0){
                TRACE("No format at %d x %d\n", width, height);
            } else {
                TRACE("OK at %d x %d\n", width, height);
            }
        } catch (structured_exception& e) {
            const EXCEPTION_RECORD& rec = e.Record();
            TRACE("Exception attempting to load a texture sized %d x %d\n", width, height);
            if (rec.ExceptionCode == EXCEPTION_ACCESS_VIOLATION){
                const char* accessStr;


                if (rec.ExceptionInformation[0] == 0)
                    accessStr = "Read";
                else if (rec.ExceptionInformation[0] == 1)
                    accessStr = "Write";
                else if (rec.ExceptionInformation[0] == 8)
                    accessStr = "DEP";
                else
                    accessStr = "Unknown";


                size_t AccessLocation = rec.ExceptionInformation[1];


                TRACE("%s access violation to address %ld\n", accessStr, AccessLocation);


                size_t ImageStart = reinterpret_cast<size_t>(DummyTexture);
                size_t ImageEnd = ImageStart + width*height*3;
                TRACE("Image data is from %ld to %ld\n", ImageStart, ImageEnd);


                if (AccessLocation < ImageStart)
                    TRACE("Access is %ld bytes before image data\n", ImageStart - AccessLocation);
                else if (AccessLocation > ImageEnd)
                    TRACE("Access is %ld bytes after image data\n", AccessLocation - ImageEnd);
                else
                    TRACE("Access is within image data???\n");
            }
        }


        free(DummyTexture);
        
        glDeleteTextures(1, &TextureID);

The trace output is:


Exception attempting to load a texture sized 734 x 717
Read access violation to address 141566091
Image data is from 139985008 to 141563842
Access is 2249 bytes after image data

A few things need explaining:

The sample code loads mip level 2 because that is the level at which the crash occurred when loading a real texture. Using the same values with mip level 0 works fine. Loading mip levels 0 and 1 first with correspondingly larger values doesn’t make a difference, i.e. putting the above into an appropriate loop gives:


OK at 2936 x 2868
OK at 1468 x 1434
Exception attempting to load a texture sized 734 x 717
Read access violation to address 50143371
Image data is from 48562288 to 50141122
Access is 2249 bytes after image data

Using automatic mipmap generation triggers the same crash when loading the 2936 x 2868 base level texture. Generating the mipmaps manually lets me isolate the particular point at which it occurs.
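(To illustrate what I mean by “automatic mipmap generation”, here is a sketch assuming the legacy GL_GENERATE_MIPMAP path; BaseLevelPixels is a placeholder and this is not the exact code from our application:)

    // Sketch only: ask the driver to build the mip chain when the
    // 2936 x 2868 base level is uploaded. This crashes in the same way.
    glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB, 2936, 2868, 0,
                 GL_BGR, GL_UNSIGNED_BYTE, BaseLevelPixels);
    // (On GL 3.0+ the equivalent would be glGenerateMipmap(GL_TEXTURE_2D)
    // after uploading level 0.)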

Changing GL_UNPACK_ROW_LENGTH to the image width doesn’t make any difference. Changing the height from 717 to 716 or 718 avoids the access violation in this particular case.

It is possible that other combinations of height and width that don’t trigger the access violation are still accessing memory outside of the image area – it might be purely luck that in this particular case the location 2249 bytes after the image just happen to not belong to the process.

I tried sizing the memory so that the width and height were multiples of four and setting GL_UNPACK_ALIGNMENT to 4, but it didn’t help (it just moved the access violation to locations anywhere up to tens of megabytes before or after the image data!).
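Roughly what that experiment looked like (a reconstruction, not the exact code; the padded names are just for illustration):

    // Round the dimensions up to multiples of four, allocate the padded buffer,
    // and describe the row layout to GL before uploading the 734 x 717 level.
    GLsizei paddedWidth  = (width  + 3) & ~3;   // 736
    GLsizei paddedHeight = (height + 3) & ~3;   // 720
    GLubyte* Padded = (GLubyte*) calloc((size_t)paddedWidth * paddedHeight, 3);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 4);
    glPixelStorei(GL_UNPACK_ROW_LENGTH, paddedWidth);
    glTexImage2D(GL_TEXTURE_2D, 2, GL_COMPRESSED_RGB, width, height, 0,
                 GL_BGR, GL_UNSIGNED_BYTE, Padded);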

Finally, structured_exception is just a simple wrapper class for Win32 structured exceptions (i.e. used with _set_se_translator()).
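It’s nothing fancy; something along these lines (a minimal sketch of the usual _set_se_translator() pattern, not our exact class; needs &lt;windows.h&gt; and &lt;eh.h&gt; and compiling with /EHa):

    // Simplified sketch of the wrapper thrown by the SE translator.
    class structured_exception {
    public:
        explicit structured_exception(const EXCEPTION_RECORD& rec) : m_Record(rec) {}
        const EXCEPTION_RECORD& Record() const { return m_Record; }
    private:
        EXCEPTION_RECORD m_Record;
    };

    void __cdecl TranslateSEH(unsigned int /*code*/, EXCEPTION_POINTERS* info)
    {
        throw structured_exception(*info->ExceptionRecord);
    }

    // Installed once per thread:
    _set_se_translator(TranslateSEH);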

On a related note – for years now we’ve had a work-around in our code trying to find out what the maximum compressed texture size is for ATI GPUs. On the above card, if I do:

    // Get the theoretical maximum texture size
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &TexSize);

then TexSize = 16384. If I then do:


    // Find out the largest size that will actually fit
    bool done = false;
    GLint MaxTextureSize = 0;
    do {
        glTexImage2D(GL_PROXY_TEXTURE_2D, 0, GL_COMPRESSED_RGB, TexSize, TexSize, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);

        GLint format;
        glGetTexLevelParameteriv(GL_PROXY_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &format);

        if (format == 0){
            TexSize >>= 1;
        } else {
            done = true;
            MaxTextureSize = TexSize;
        }
    } while (!done);

TexSize will still be 16384.

But if I then try to use an actual texture larger than 4096, it crashes. So our code uses a loop similar to the first one to find out what the largest power of two is that doesn’t crash when trying to load it.
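In outline the work-around is something like this (a rough sketch, not our exact code; MaxCompressedTextureSize is just an illustrative name, and DummyTexture is assumed to be large enough for the first size tried):

    // Keep halving until a real (non-proxy) compressed upload succeeds
    // without an access violation.
    GLint TexSize;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &TexSize);
    GLint MaxCompressedTextureSize = 0;
    while (TexSize > 0){
        try {
            glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB, TexSize, TexSize,
                         0, GL_BGR, GL_UNSIGNED_BYTE, DummyTexture);
            GLint format = 0;
            glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &format);
            if (format != 0){
                MaxCompressedTextureSize = TexSize;   // this size works
                break;
            }
        } catch (structured_exception&) {
            // the driver fell over internally; try the next size down
        }
        TexSize >>= 1;
    }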

Am I doing something wrong?

Thanks,
Jason.

A small update:

The fact that the crash didn’t occur when I tried to load level 0 with that size inspired me to try using level 0 for generating the compressed data, which I then read out with a view to loading it later with glCompressedTexImage2D().

No cigar. No crash, but the texture isn’t compressed. Here’s the code for checking the result:


                GLint Compressed;
                glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED, &Compressed);


                if (Compressed == GL_TRUE){            
                    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &data.Format);


                    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED_IMAGE_SIZE, &data.Size);


                    data.Data = new GLubyte[data.Size];
                    glGetCompressedTexImage(GL_TEXTURE_2D, 0, data.Data);


                    TRACE("Compressed to internal format %d and size %d\n", data.Format, data.Size);
                } else {
                    TRACE("Not compressed\n");
                }

and here’s the output:


Loading 2936 x 2868
Compressed to internal format 33776 and size 4210224
Loading 1468 x 1434
Not compressed
Loading 734 x 717
Not compressed
Loading 367 x 358
Not compressed
Loading 183 x 179
Not compressed
Loading 91 x 89
Not compressed
Loading 45 x 44
Not compressed
Loading 22 x 22
Not compressed
Loading 11 x 11
Not compressed
Loading 5 x 5
Not compressed
Loading 2 x 2
Compressed to internal format 33776 and size 8
Loading 1 x 1
Compressed to internal format 33776 and size 8

Note that even the 1468 x 1434 level (which didn’t trigger a crash in the original version) wasn’t actually compressed successfully either.

Further investigation shows that any time the width and height are not both multiples of four (the exceptions being 2x2 and 1x1), it fails to compress the texture. It doesn’t actually crash provided the mip level is 0; it just doesn’t compress. This effectively means that ATI doesn’t support non-power-of-two compressed textures with mipmapping, since any non-power-of-two level 0 width or height will ultimately lead to a mipmap level whose width or height is not a multiple of four.
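A quick way to see this is to walk the mip chain for the base level above and flag every level whose dimensions aren’t both multiples of four (illustration only):

    // Walk the mip chain for a 2936 x 2868 base level.
    GLsizei w = 2936, h = 2868;
    for (int level = 0; ; ++level){
        TRACE("level %d: %d x %d%s\n", level, w, h,
              (w % 4 || h % 4) ? "  <-- not a multiple of 4" : "");
        if (w == 1 && h == 1)
            break;
        w = (w > 1) ? w / 2 : 1;
        h = (h > 1) ? h / 2 : 1;
    }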

This is rather surprising to me, so any comments on what I’ve done wrong would be appreciated.

Thanks,
Jason.

Am I doing something wrong?

Yes; you’re relying on ATI’s drivers to do your compression for you.

Stop doing that; the more you rely on the kinds of paths that are rarely used (and yes, compressing textures into generic formats rather than specific ones is rarely done), the more you open your code up to driver bugs. Upload pre-compressed data in specific formats whenever possible.

Thanks Alfonse.

Is this advice a reflection of the quality of ATI’s drivers, or is this actually the way it’s meant to be done? The reason I ask is because:

(a) All the examples I’ve seen use the OpenGL driver to do the compression,

(b) Supporting EXT_texture_compression_s3tc is, in theory, independent of supporting compression, and

(c) While S3TC_DXT1_EXT might be “good” now, I don’t want to rule out taking advantage of better formats that might be supported in future.

Next question: What’s the normal method for precompressing into GL_COMPRESSED_RGB_S3TC_DXT1_EXT (which is what I’m getting)? Note that this isn’t a game with a predefined set of textures that are going to be used over and over again; this program uses textures created by the end user, so the compression needs to be built into the program itself.

Incidentally, the bug with maximum texture size definitely seems to be related to the driver texture compression code. I changed my test code to:

                glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, TexSize, TexSize, 0, TexSize*TexSize/2, DummyTexture);

and now a TexSize of 16384 succeeds without problems! Thanks for the tip.
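(For reference, TexSize*TexSize/2 is just the DXT1 size for dimensions that are multiples of four; the general per-level size, per the s3tc extension, is 8 bytes for every 4x4 block:)

    // width/height: dimensions of the mip level being uploaded.
    GLsizei blocksWide = (width  + 3) / 4;
    GLsizei blocksHigh = (height + 3) / 4;
    GLsizei imageSize  = blocksWide * blocksHigh * 8;   // bytes for a DXT1 level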

Is this advice a reflection of the quality of ATI’s drivers, or is this actually the way it’s meant to be done?

Define “meant”? The specification says that it should be possible. But the specification says a lot of things that aren’t a good idea. It says that you can link two shader objects for the same shader stage together, but I wouldn’t trust that it would be well supported. It says that you can use arrays and structs to pass data from one shader stage to another, but expecting it to work is folly. And so forth.

In general, the farther you stray from the beaten path (and no, online examples don’t count), the more likely you are to run into driver bugs.

While S3TC_DXT1_EXT might be “good” now, I don’t want to rule out taking advantage of better formats that might be supported in future.

How can you be “taking advantage” of them if your compressor is terrible? Even when it works, the OpenGL internal compressor will likely be designed to be fast, not good. So yes, you’ll get compression; just crappy compression. There are already “better formats” available, and even in that extension, they specifically advise off-line compression.

Note that this isn’t a game with a predefined set of textures that are going to be used over and over again; this program uses textures created by the end user, so the compression needs to be built into the program itself.

That raises further questions, like why you’re doing the compression at all. If you’re fed an S3TC texture, then load it as one. If you’re fed an RGBA8 texture, load it as what it is.

“meant” as in the intended method of using it. Reading pages like http://www.opengl.org/sdk/docs/man/xhtml/glCompressedTexImage2D.xml creates the impression that it’s intended to be used this way with comments like

“glCompressedTexImage2D loads a previously defined, and retrieved, compressed two-dimensional texture image if target is GL_TEXTURE_2D (see glTexImage2D).”

and

“internalformat must be an extension-specified compressed-texture format. When a texture is loaded with glTexImage2D using a generic compressed texture format (e.g., GL_COMPRESSED_RGB), the GL selects from one of its extensions supporting compressed textures. In order to load the compressed texture image using glCompressedTexImage2D, query the compressed texture image’s size and format using glGetTexLevelParameter.”

In general, the farther you stray from the beaten path (and no, online examples don’t count), the more likely you are to run into driver bugs.

Which is precisely why asking on this forum can be so valuable. Given that every example I’ve ever seen took this approach, and pages like the above on opengl.org specifically mentioned it, I had no idea that it was “straying from the beaten path”.

How can you be “taking advantage” of them if your compressor is terrible?

I wasn’t aware it was “terrible”. Again, based on reading the documentation I simply set “glHint(GL_TEXTURE_COMPRESSION_HINT, GL_NICEST);” and I assumed I would get the highest quality compression the card supported.

Even when it works, the OpenGL internal compressor will likely be designed to be fast, not good. So yes, you’ll get compression; just crappy compression. There are already “better formats” available, and even in that extension, they specifically advise off-line compression.

Good to know.

That raises further questions, like why you’re doing the compression at all.

Why does anyone ever do compression? To reduce memory requirements, surely?

If you’re fed an S3TC texture, then load it as one. If you’re fed an RGBA8 texture, load it as what it is.

I’m “fed” very large amounts of image data. The graphics card is currently limiting the amount of data users can work with. I’m in the process of updating our software to manage resources by hand, since OpenGL has proven unable to do so; part of that process means explicitly loading and unloading textures to stay within the card’s limitations, because we simply can’t load them all and we usually don’t need them all for a particular frame. I am hoping that uploading compressed textures will be faster than uploading raw textures, and I know that using compressed textures will allow me to keep more texture data “live” at any one point in time.

In doing these modifications I’ve uncovered a range of other issues that have had me poring over the documentation to see if I’ve done something wrong, but one thing at a time. :)

Thanks again for your help.

Again, based on reading the documentation I simply set “glHint(GL_TEXTURE_COMPRESSION_HINT, GL_NICEST);” and I assumed I would get the highest quality compression the card supported.

Don’t forget the part that says, “The interpretation of hints is implementation-dependent. An implementation may ignore them entirely.” If you’re going to build your application around something, I would suggest it not be something that OpenGL implementations are freely allowed to ignore.

In general, you should always assume that implementations will do the bare minimum needed to get by. So if your application can do something instead of OpenGL, then it wouldn’t be a bad idea to do that. Expect less of your OpenGL driver.

I am hoping that uploading compressed textures will be faster than uploading raw textures

If you’re relying on the driver to compress them, then you’re not going to get better upload performance. Live compression is almost certainly done on the CPU (as you can see by your very CPU-based error). So you’re taking precious CPU time to compress the data, then finally doing the DMA to the GPU. Yes, the DMA will be smaller, but unless texture upload bandwidth was your bottleneck (possible, but not likely), you’d be better off using pixel buffer objects and asynchronous uploads.

Of course, that requires providing image data that is already compressed.
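Roughly what that looks like (a bare-bones sketch; buffer reuse, fencing and error checking are all omitted, and width/height/imageSize/compressedData are whatever your loader provides):

    // Copy the pre-compressed image into a pixel-unpack buffer, then source the
    // texture upload from that buffer so the transfer can happen asynchronously.
    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, imageSize, NULL, GL_STREAM_DRAW);
    void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    memcpy(dst, compressedData, imageSize);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    // With an unpack buffer bound, the last argument is an offset, not a pointer.
    glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
                           width, height, 0, imageSize, (const GLvoid*)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);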

But it doesn’t matter if it’s ignored in this case. An implementation that is capable of a newer, better compression algorithm and uses it when that hint is given is giving the user who bought it additional value with no additional effort on my part and no updates to the software required; but if my application is hard-coded to use the lowest common denominator then they’re out of luck.

In general, you should always assume that implementations will do the bare minimum needed to get by. So if your application can do something instead of OpenGL, then it wouldn’t be a bad idea to do that. Expect less of your OpenGL driver.

It seems to me that the concept is sound but it’s let down in practice by poor implementations. It’s hard to believe after all these years they’ve never tested it with NPOT textures or textures larger than 4K x 4K.

If you’re relying on the driver to compress them, then you’re not going to get better upload performance. Live compression is almost certainly done on the CPU (as you can see by your very CPU-based error). So you’re taking precious CPU time to compress the data, then finally doing the DMA to the GPU. Yes, the DMA will be smaller, but unless texture upload bandwidth was your bottleneck (possible, but not likely), you’d be better off using pixel buffer objects and asynchronous uploads.

Of course, that requires providing image data that is already compressed.

Actually, by “uploading compressed textures” that’s precisely what I meant. My original code (which crashed on NPOT textures and > 4K x 4K textures on ATI but worked fine on NVIDIA) used OpenGL to compress them and then read them back (exactly as described in that link) so they could be uploaded later on demand, and it does seem faster, as I’d hoped. Now I’m using an open-source library I found after you gave me the idea that I could do the compression myself; it’s a bit slower but handles NPOT just fine.

Until today I actually thought OpenGL would have used a fragment shader under the covers to create the compressed data rather than the CPU when using glTexImage2D()!

But it doesn’t matter if it’s ignored in this case. An implementation that is capable of a newer, better compression algorithm and uses it when that hint is given is giving the user who bought it additional value with no additional effort on my part and no updates to the software required; but if my application is hard-coded to use the lowest common denominator then they’re out of luck.

And that’s the point: you cannot control what the implementation does. You can only control what you do. Yes, an implementation may use a great compression algorithm. Or… it might not. You can neither detect this nor can you do anything about it if you could.

That’s why relying on it is a bad idea; it’s not reliable. If you want to use the GPU to compress an image, then the only way to ensure that is to write the code yourself. If you want to use “newer, better compression algorithms”, the only way to do that is for you to do it.

Consistency is better than preferring one platform over another.

It’s hard to believe after all these years they’ve never tested it with NPOT textures or textures larger than 4K x 4K.

Why? Generally, bits of example code are used by hobby programmers on-line. They are copied and pasted more or less as-is directly into applications. And hobbyists will rarely break 4K texture sizes, simply because to do so they would have to create a texture that big.

Serious OpenGL users will upload textures as they are, because that’s the fast path. These applications are provided textures in a format, and that’s how they use them. OpenGL ES doesn’t even allow format conversions at all (until ES 3.0).

So who exactly would be peeking into this dark corner of OpenGL?

Yes, but I really don’t care what the implementation does, as long as it doesn’t crash my application, although it would be nice if it at least stayed within the range of specified behaviours. There are far bigger and more important things that are outside of my control than how a compressed texture looks when the user magnifies it too far.

Yes, an implementation may use a great compression algorithm. Or… it might not. You can neither detect this nor can you do anything about it if you could.

We would do what we do now – have a checkbox in the settings dialog that activates or deactivates texture compression, and tell users who have problems with it activated not to use it. Those users will find they cannot work as efficiently with as much data as users with better implementations and will therefore have an incentive to change to a better implementation.

Consistency is better than preferring one platform over another.

In this case consistency is not important. I’d rather the software take advantage of whatever capabilities the end user’s hardware has, with the option for the user to disable the use of a capability if it proves unreliable on their implementation.

Why?

Because testing glTexImage2D() with textures larger than 12.5% of the maximum hardware-supported texture size and a compressed target seems like it should automatically be a part of their testing regime. The fact that they apparently haven’t tested it for many years (or, if they have, that they decided to spend less effort preventing users’ applications from crashing than I spent tracking the problem down and adding a work-around) doesn’t give me much confidence in the rest of the implementation.

Generally, bits of example code are used by hobby programmers on-line.

Note that I’m not just talking about so-called “hobby” code; I’m talking about what the OpenGL SDK page for glCompressedTexImage2D() says in relation to this approach, which I linked to above. Likewise, the page for glTexImage2D() says:

“If the internalFormat parameter is one of the generic compressed formats, GL_COMPRESSED_ALPHA, GL_COMPRESSED_INTENSITY, GL_COMPRESSED_LUMINANCE, GL_COMPRESSED_LUMINANCE_ALPHA, GL_COMPRESSED_RGB, or GL_COMPRESSED_RGBA, the GL will replace the internal format with the symbolic constant for a specific internal format and compress the texture before storage. If no corresponding internal format is available, or the GL can not compress that image for any reason, the internal format is instead replaced with a corresponding base internal format.”

I don’t have any problem at all with that behaviour. The absolute worst case should be either (a) the compressed image looks lousy because of a poor quality implementation, in which case the user can turn it off, or (b) the image isn’t compressed at all, in which case the user is in the same boat as without using texture compression. The fact that the actual absolute worst case is that your application can actually crash due to illegal memory accesses by the driver despite being given perfectly valid and legal parameters is something worth complaining about, IMHO.

Serious OpenGL users will upload textures as they are, because that’s the fast path. These applications are provided textures in a format, and that’s how they use them.

I think that’s a somewhat limited definition of “serious”. I’ve been writing OpenGL code for 15 years and lived through the early days when we had to test specific board revisions of 3Dlabs cards with specific driver versions just to figure out which combinations actually worked. Until recently I thought those days were long behind us. I deliberately ordered ATI cards the last time we refreshed our development machines, because all our code always worked fine on our NVIDIA cards while users with ATI cards were reporting issues we couldn’t explain. It’s only now that I’m trying to get to the bottom of some of those issues (rather than working around them as I did at the time, by specifically checking for ATI and disabling NPOT textures and textures larger than 4K x 4K when compression was enabled) that I’ve realised how stupid the problem actually is.

So who exactly would be peeking into this dark corner of OpenGL?

What bothers me is that the only usage scenario actually discussed in the SDK page for glCompressedTexImage2D(), namely loading “a previously defined, and retrieved, compressed two-dimensional texture image […] (see glTexImage2D)”, which is also covered on the glTexImage2D() page and used in countless examples, is considered “peeking into a dark corner of OpenGL”. The ARB_texture_compression extension specification does mention that loading an already-compressed texture should be significantly faster, which is not an issue and hardly a surprise, but it also makes the following argument in favour of having the driver do the compression, which is similar to the point I made above:

Generic compressed internal formats allow applications to use texture compression without needing to code to any particular compression algorithm. Generic compressed formats allow the use of texture compression across a wide range of platforms with differing compression algorithms and also allow future GL implementations to substitute improved compression methods transparently.

I’m very pleased that I was able to get to the bottom of this issue so quickly by posting on here, but I wish I didn’t have to in the first place.

There is another issue that I was mindful of but forgot to mention, and that’s the patent on S3TC mentioned on Wikipedia, which the article says also covers the compression algorithms. (S3 Texture Compression - Wikipedia)

I assume that NVIDIA and ATI have licenses for S3TC. I do not. Letting the OpenGL driver do the work seemed like a good way to take advantage of the patent license the end user had already paid for, without me needing to hire a patent attorney.

Because testing glTexImage2D() with textures larger than 12.5% of the maximum hardware-supported texture size and a compressed target seems like it should automatically be a part of their testing regime.

It’s easy to say that; it’s rather more difficult when you have a budget to work with. You can’t test every possible combination of states; it would take way too much time and effort to just develop such tests. So you spend your money on the things that people actually use. And in general, people are more likely to upload texture data that is already compressed than to rely on the driver’s compression system. So that’s what gets tested.

Note that I’m not just talking about so-called “hobby” code; I’m talking about what the OpenGL SDK page for glCompressedTexImage2D() says in relation to this approach, which I linked to above.

It doesn’t matter what the OpenGL specification says; what matters is what is likely to work. It’s the difference between what is legal and what is practical.

The fact that the actual absolute worst case is that your application can actually crash due to illegal memory accesses by the driver despite being given perfectly valid and legal parameters is something worth complaining about, IMHO.

I’m not saying it isn’t worth complaining about (though really, you should be complaining to AMD, not here). I’m saying that you shouldn’t be surprised that doing something unusual, even if it’s legal, leads to finding driver bugs.

OpenGL has a lot of traps, where the spec says that X should happen, but nobody ever really uses X so it never gets tested, so you can’t trust that X will happen correctly.

What bothers me is that the only usage scenario actually discussed in the SDK page for glCompressedTexImage2D(), namely loading “a previously defined, and retrieved, compressed two-dimensional texture image […] (see glTexImage2D)”

Well, that’s just bad documentation. I didn’t write it, but thanks for bringing it up so that I could fix it.