Crash on compressed texture partial upload

I know the subject line is strange; I didn’t really know how else to express it.

Basically, I had a 1024x768 screen background and decided to try putting it in a compressed texture object to save some graphics memory.

For the uncompressed object, I created the texture with:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, w, h, 0, GL_RGB, GL_UNSIGNED_BYTE, 0)

and then simply uploaded the data using glTexSubImage2D (with UNPACK_ALIGNMENT and ROW_LENGTH set). Works like a charm, as expected.

For the compressed texture, I tried changing only the texture creation, replacing the internal format with GL_COMPRESSED_RGB_ARB and leaving all the other code as-is. However, this crashed (SEGV) in the user-mode part of the driver on the glTexSubImage2D call (Windows, ATI Catalyst 5.8 IIRC).

I’ve read the spec, but I can’t for the life of me see that I’ve done anything wrong. For now I’ve worked around it with a memory hog (creating a POT-sized image, copying the non-POT data into it, uploading the whole POT image, and then deleting it), but I’m not the least bit happy about it.

So my question is: am I doing something that should crash (at least the user-mode part of) ICDs, or do I have a bug to file with ATI here?

Have you tried the glCompressedTexImage2D/glCompressedTexSubImage2D functions?

Edit: I mean precompress the background image and upload it using the compressed texture functions.
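A rough sketch of that approach, assuming ARB_texture_compression is present (on Windows the ARB entry points have to be fetched via wglGetProcAddress, error checking is omitted, and w, h and pixels stand in for your image):

/* One-time: let the driver compress the RGB data, then read the
 * compressed blob back so it can be stored on disk and reused. */
GLint fmt, size;
glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_ARB, w, h, 0,
             GL_RGB, GL_UNSIGNED_BYTE, pixels);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &fmt);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0,
                         GL_TEXTURE_COMPRESSED_IMAGE_SIZE_ARB, &size);
GLubyte* blob = (GLubyte*)malloc(size);
glGetCompressedTexImageARB(GL_TEXTURE_2D, 0, blob);

/* Later runs: upload the stored blob directly; no recompression and no
 * uncompressed RGB intermediate in the driver. */
glCompressedTexImage2DARB(GL_TEXTURE_2D, 0, fmt, w, h, 0, size, blob);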

Nope, I didn’t try that. All I did is in the OP.

I suspect what I’m doing is correct per the spec, but that the driver screws up in this case because it assumes a non-NULL initial pointer for a compressed texture. With closed-source drivers and no debug info it’s, however, close to impossible to see what really crashes (ATI, that was a boot in your direction).

Perhaps I should just create a small repro case so anyone could check. It’s not like it’s rocket science. :-)

OK, I’ve narrowed it down: it simply crashes in glTexSubImage2D, whether or not I passed a pointer when originally creating the texture object. I’ve also tested it on another vendor’s hardware, where it does not crash, which is why I’m now starting to assume this is an ATI driver issue. It would be interesting to hear results from people with earlier/later driver versions and/or different hardware, if it makes a difference. I used Cat 5.8.

Please note I ripped out all error checking, and even the check for compressed texture support, to keep the code size down a bit.

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif
#include <GL/gl.h>
#include <GL/glut.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef GL_COMPRESSED_RGB_ARB
#define GL_COMPRESSED_RGB_ARB 0x84ED
#endif


void upload_sub_rgb(const void* p)
{
	glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
	glPixelStorei(GL_UNPACK_ROW_LENGTH, 1024);
	glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 768, GL_RGB, GL_UNSIGNED_BYTE, p);
	glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
	glPixelStorei(GL_UNPACK_ALIGNMENT, 4);
}


// not compressed. Would be silly if this didn't work, eh. :-)
void works1()
{
	char* p = (char*)malloc(1024*768*3);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 1024, 1024, 0, GL_RGB, GL_UNSIGNED_BYTE, 0);
	upload_sub_rgb(p);
	free(p);
}

// just to display a/the (slow & ugly) workaround: copy the non-POT image
// into a POT-sized buffer, then upload the whole POT image at creation
void works2()
{
	char* p = (char*)malloc(1024*768*3);
	char* p2 = (char*)malloc(1024*1024*3);
	memcpy(p2, p, 1024*768*3);	/* non-POT data into the POT buffer */
	glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_ARB, 1024, 1024, 0, GL_RGB, GL_UNSIGNED_BYTE, p2);
	free(p2);
	free(p);
}

// test to see if it also crashes when given an initial pointer, even if a dummy
void just_testing()
{
	char* p = (char*)malloc(1024*768*3);
	char* p2 = (char*)malloc(1024*1024*3);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_ARB, 1024, 1024, 0, GL_RGB, GL_UNSIGNED_BYTE, p2);
	upload_sub_rgb(p);
	free(p2);
	free(p);
}

// This always crashes for me, in the glTexSubImage2D call
void crash()
{
	char* p = (char*)malloc(1024*768*3);
	glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_ARB, 1024, 1024, 0, GL_RGB, GL_UNSIGNED_BYTE, 0);
	upload_sub_rgb(p);
	free(p);
}


void test()
{
	GLuint tex_names[4];
	glGenTextures(4, tex_names);	// one name per test case
	printf("doing works1\n");
	glBindTexture(GL_TEXTURE_2D, tex_names[0]); works1();
	printf("did works1\n");
	printf("doing works2\n");
	glBindTexture(GL_TEXTURE_2D, tex_names[1]); works2();
	printf("did works2\n");
	printf("doing just_testing\n");
	glBindTexture(GL_TEXTURE_2D, tex_names[2]); just_testing();
	printf("did just_testing\n");
	printf("doing ATI crasher\n");
	glBindTexture(GL_TEXTURE_2D, tex_names[3]); crash();
	printf("did ATI crasher\n");
}


int main(int argc, char** argv)
{
	glutInit(&argc, argv);
	glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA | GLUT_DEPTH);
	glutInitWindowSize(320, 240);
	glutInitWindowPosition(100, 50);
	glutCreateWindow("Triangle Stripper Test");
	test();
	return 0;
}

You need to make sure your x, y origin and width and height are multiples of 4 when sub-loading part of a compressed texture. Here is how I do it.

  
static int xoff, yoff, w, h;

xoff = (xOffset / 4) * 4;	// round the origin down to a multiple of 4
yoff = (yOffset / 4) * 4;
w = (width / 4) * 4;		// round the size down to a multiple of 4...
h = (height / 4) * 4;

if(w < width) w += 4;		// ...and back up if it was truncated
if(h < height) h += 4;

glTexSubImage2D(GL_TEXTURE_2D, 0, xoff, yoff, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buffer);

buffer is a small rectangle of pixels I would like to place on the compressed texture. That’s it.

You need to make sure your x, y origin and width and height are multiples of 4 when sub-loading part of a compressed texture.
Nope. For DXTn compression(s) that’s true, but while that’s the most frequent implementation, it’s far from the only option for GL_COMPRESSED_RGB_ARB, which may use anything from non-compressed internal storage up to wavelets, or something not yet even invented. Anyway, have a look at the code again. That’s the exact code I used to verify the crash on Cat 5.8 with a 9250, and that it worked on another vendor’s hardware and driver. I think you’ll find 0, 768 and 1024 to be multiples of 4. ;-)
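For what it’s worth, you can ask the driver which format it actually chose. A minimal sketch, with the texture bound, using the ARB_texture_compression queries (on older headers the tokens may need #defines, like GL_COMPRESSED_RGB_ARB above):

GLint actual_fmt, is_compressed;
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &actual_fmt);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED_ARB, &is_compressed);
printf("internal format 0x%04X, compressed: %d\n", (unsigned)actual_fmt, (int)is_compressed);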

Anyway, having studied the spec even more now (see “Issues (5)”), it’s obvious the crash I’m experiencing isn’t intended behaviour. I’ll wait a few days to give e.g. Humus time to spot this and comment before filing a bug with ATI devrel.

[O/T]
Re. truncating or rounding an integer to a multiple of 4 (or any power of two): division is one of the most expensive ALU operations. If you do it frequently, you can save measurable CPU time by using a bitwise AND instead. Granted, any decent optimizer “fixes” this for you, but you could be in for nasty performance surprises in unoptimized builds, or when running the same code on a new architecture where no good optimizer is available yet.
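A quick sketch of what I mean, for a step of 4 (x here is just some hypothetical coordinate):

int down = x & ~3;        /* round down to a multiple of 4 */
int up   = (x + 3) & ~3;  /* round up to a multiple of 4   */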

Thanks for the idea, though.

What about updating your Catalyst driver? 5.8 seems rather old to me…