Possible NVidia Driver Bug ~ 319.49

09-03-2013, 03:46 PM
I'm running the latest Linux drivers (319.49) on a 680. I have a big performance problem with both glCompressedTextureSubImage3DEXT and glCompressedTexSubImage3D. I am trying to subload into a fairly large array texture (512x512x1000).

The following call goes out to lunch for about 15 ms. Subsequent calls to larger mip levels block for even longer.

glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, 0);

The problem was also seen on 580s. I think the 260 driver was ok. I am still trying to reproduce in a stand-alone app, but so far I haven't had any luck. Is this a known bug?


09-04-2013, 07:53 AM
Here's a small program that demonstrates the bug. I realize that populating a PBO and immediately using it is not ideal. However, the performance I'm seeing has got to be a bug. If I don't use a PBO, then the texture subload is lightning fast. With a PBO, it's taking 16 ms for 8 bytes of data. Larger mipmaps take over 100 ms. I tried the latest beta driver too.

#define GL_GLEXT_PROTOTYPES  // expose extension prototypes from glext.h on Linux
#include <GL/glut.h>
#include <algorithm>
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <GL/glext.h>
#include <sys/time.h>
using namespace std;

static GLuint bind_name;

void draw(void)
{
    static GLuint pbo_handle;
    static GLsizeiptr pbo_size = 8;
    GLsizeiptr subload_size = 8;
    unsigned char subload_buf[8] = {0};

    if ( pbo_handle == 0 )
        glGenBuffers( 1, &pbo_handle );

    // Bind PBO
    glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, pbo_handle );
    // Bind Texture
    glBindMultiTextureEXT( GL_TEXTURE0, GL_TEXTURE_2D_ARRAY, bind_name );

    pbo_size = std::max( pbo_size, subload_size );
    // The ends of the next two calls were cut off in the archived post.
    // GL_STREAM_DRAW is an assumed usage hint; subload_buf is the only candidate source.
    glBufferData( GL_PIXEL_UNPACK_BUFFER_ARB, pbo_size, 0, GL_STREAM_DRAW );
    glBufferSubData( GL_PIXEL_UNPACK_BUFFER_ARB, 0, subload_size, subload_buf );

    // Time the subload with wall-clock timestamps
    timeval tv;
    gettimeofday( &tv, NULL );
    double start = (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;

    glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
        999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, subload_buf );

    gettimeofday( &tv, NULL );
    double end = (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;
    std::cout << "Why is this taking so long?: " << (end - start)*1000.0 << " ms" << endl;

    // Unbind PBO
    glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, 0 );
}

// The name of this setup function was lost in the archived post; init() is a placeholder.
void init(void)
{
    cout << glGetString(GL_VERSION) << endl;
    glGenTextures( 1, &bind_name );
    glBindMultiTextureEXT( GL_TEXTURE0, GL_TEXTURE_2D_ARRAY, bind_name );

    // Compressed size in bytes of each mip level, smallest level first
    int size[] = { 8000, 8000, 8000, 32000, 128000, 512000, 2048000, 8192000, 32768000, 131072000 };
    for ( int i = 0; i < 10; ++i )
        glCompressedTextureImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, i,
            GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 512>>i, 512>>i, 1000, 0, size[9-i], 0 );
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitWindowSize(640, 480);
    glutCreateWindow("Terrible Driver Bug");
    // The calls between window creation and return were lost; these are the minimum needed to run draw().
    init();
    glutDisplayFunc(draw);
    glutMainLoop();
    return 0;
}
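
For anyone checking the numbers in the size[] table of the program above: they match what you get from DXT1's 8 bytes per 4x4 block, rounded up to whole blocks and multiplied by the 1000 layers. Here is a rough sketch of that arithmetic. The function name dxt1_level_bytes is made up for illustration and is not part of the posted program or of any GL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Bytes needed for one mip level of a DXT1-compressed 2D array texture.
// DXT1 stores each 4x4 texel block in 8 bytes; partial blocks round up.
// (Hypothetical helper, not part of the original test program.)
std::size_t dxt1_level_bytes(int base_w, int base_h, int layers, int level)
{
    assert(base_w > 0 && base_h > 0 && layers > 0);
    assert(level >= 0 && level < 31);

    // Each mip level halves width and height, clamped to 1 texel.
    const int w = std::max(base_w >> level, 1);
    const int h = std::max(base_h >> level, 1);

    // Number of 4x4 blocks needed to cover the level, rounding up.
    const std::size_t blocks_x = static_cast<std::size_t>((w + 3) / 4);
    const std::size_t blocks_y = static_cast<std::size_t>((h + 3) / 4);

    // Array layers are not mipmapped, so the layer count stays the same.
    return blocks_x * blocks_y * 8 * static_cast<std::size_t>(layers);
}
```

Level 9 of the 512x512x1000 texture is a single block per layer, so the whole level is 8000 bytes and one layer of it is the 8 bytes passed to the subload call.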

Alfonse Reinheart
09-04-2013, 08:10 AM
What is with the odd combination of DSA and non-DSA functions? Why do you bother binding the texture if you're just going to upload to it with a DSA function?

09-04-2013, 08:27 AM
I was playing with DSA and non-DSA to see if that had anything to do with the bug. You're welcome to clean it up and post another version.

09-05-2013, 09:25 AM
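Doh, my test program had a bug: with a PBO bound, the last argument has to be an offset into the PBO, not a client pointer. To see the driver problem, change: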

glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, subload_buf );

To this:

glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, 0 );
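
The reason the two calls differ: while a buffer is bound to GL_PIXEL_UNPACK_BUFFER, GL treats the pointer-typed data argument as a byte offset into that buffer, so 0 means "start of the PBO" and a stack address means some enormous offset. A minimal sketch of the usual offset-as-pointer idiom, assuming a flat address space; pbo_offset and offset_from_pointer are hypothetical helper names, not GL functions.

```cpp
#include <cstddef>
#include <cstdint>

// Encode a byte offset into the bound pixel unpack buffer as the pointer-typed
// "data" argument that the GL upload functions take. (Hypothetical helper.)
inline const void* pbo_offset(std::size_t byte_offset)
{
    return reinterpret_cast<const void*>(static_cast<std::uintptr_t>(byte_offset));
}

// Decode a "data" argument the way it is interpreted while a PBO is bound:
// as a plain byte offset. (Hypothetical helper, for illustration only.)
inline std::size_t offset_from_pointer(const void* data)
{
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(data));
}
```

With helpers like these, the corrected call would pass pbo_offset(0) as its last argument, which says the same thing as the literal 0 but makes the intent visible.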

09-30-2013, 10:24 AM
I'd be really happy if someone from Nvidia could at least confirm that this is a bug. The one page test program reproduces it...

09-30-2013, 01:33 PM
If it only happens when loading from a PBO, then it seems to me that it's most likely blocking while waiting for the PBO to finish loading, rather than blocking in the texture load itself. So the slowdown would be in the driver's PBO code, not necessarily the driver's texture upload code.

Have you tried different combinations of PBO usage hints, or an upload via a normal (uncompressed) glTexImage call? That may be useful info to help narrow down what's happening.
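
To compare those variants side by side, it may help to wrap each candidate call in the same timer instead of repeating the gettimeofday arithmetic. A small sketch, assuming C++11 is available; elapsed_ms is a made-up name. Note that it only measures how long the CPU is blocked inside the call, which is the symptom reported here, not how long the GPU takes.

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return how long the calling thread spent in it, in
// milliseconds. steady_clock is used so the result cannot jump if the system
// time is adjusted. (Hypothetical helper, not from the original program.)
template <typename F>
double elapsed_ms(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would look something like elapsed_ms([&] { /* the subload call under test */ }), once per usage hint and once for the uncompressed path.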

The whole thing does remind me of this old thread: http://www.opengl.org/discussion_boards/showthread.php/171394-slow-transfer-speed-on-fermi-cards

09-30-2013, 03:15 PM
The texture size affects the PBO load times. GL_STREAM_DRAW, GL_STATIC_DRAW, and GL_DYNAMIC_DRAW all perform very poorly. Unfortunately, using uncompressed textures isn't an option.

09-30-2013, 06:50 PM
It may not be an option as a solution, but trying it can help with diagnosis of the problem.

10-07-2013, 02:22 AM
I admit this should not happen, but what if the compressed texture data needs to be converted to a format the GPU can use, and that conversion must happen on the CPU? It would explain the stall. Is the upload without a PBO also slow, or is it fast?

Other ideas: create and fill the buffer object a frame or two in advance, do the upload (of a big texture), and draw a quad with it. Ideally have a few compressed images hanging around so that you change the image every frame (or, better, cycle the textures used for the upload). This might make the test harder to isolate, but the stall could be some kind of delayed action in the GL implementation that is fine (or even good) for the use cases of real programs.
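
The "fill it a frame or two before you use it" idea boils down to a round-robin schedule over a few buffer objects. Here is a sketch of just the bookkeeping, with no GL calls; UploadRing and its members are hypothetical names, and it assumes lag is smaller than count so the slot being filled is never the one being consumed.

```cpp
#include <cassert>
#include <cstddef>

// Round-robin schedule over `count` buffer slots: a slot is filled on one
// frame and used as the upload source `lag` frames later.
// (Hypothetical bookkeeping only; the GL buffer handles would live elsewhere.)
struct UploadRing {
    std::size_t count;  // number of buffer objects in the ring
    std::size_t lag;    // frames between filling a slot and using it

    // Slot to write new data into on this frame.
    std::size_t fill_slot(std::size_t frame) const { return frame % count; }

    // During the first `lag` frames nothing has been filled long enough ago.
    bool has_ready_slot(std::size_t frame) const { return frame >= lag; }

    // Slot that was filled `lag` frames ago and can be uploaded from now.
    std::size_t use_slot(std::size_t frame) const
    {
        assert(has_ready_slot(frame));
        return (frame - lag) % count;
    }
};
```

With three slots and a lag of two, frame 2 fills slot 2 while uploading from slot 0, which was filled on frame 0.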

Just an idea though.