Possible NVidia Driver Bug ~ 319.49



ViolentHamster
09-03-2013, 03:46 PM
I'm running the latest Linux drivers (319.49) on a 680. I have a big performance problem with glCompressedTextureSubImage3D or glCompressedTexSubImage3D. I am trying to subload into a fairly large array texture (512x512x1000).

The following call goes out to lunch for about 15 ms. Subsequent calls to larger mip levels block for even longer.

glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, 0);

The problem was also seen on 580s. I think the 260 driver was ok. I am still trying to reproduce in a stand-alone app, but so far I haven't had any luck. Is this a known bug?

Thanks...

ViolentHamster
09-04-2013, 07:53 AM
Here's a small program that demonstrates the bug. I realize that populating a PBO and immediately using it is not ideal. However, the performance I'm seeing has got to be a bug. If I don't use a PBO, then the texture subload is lightning fast. With a PBO, it's taking 16 ms for 8 bytes of data. Larger mipmaps take over 100 ms. I tried the latest beta driver too.



#define GL_GLEXT_PROTOTYPES 1
#include <GL/glut.h>
#include <algorithm>   // std::max
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <GL/glext.h>
#include <sys/time.h>
using namespace std;

static GLuint bind_name;

void draw(void)
{
    static GLuint pbo_handle;
    static GLsizeiptr pbo_size = 8;
    GLsizeiptr subload_size = 8;
    unsigned char subload_buf[8] = {0};
    if ( pbo_handle == 0 )
        glGenBuffers( 1, &pbo_handle );
    // Bind PBO
    glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, pbo_handle );
    // Bind Texture
    glBindMultiTextureEXT ( GL_TEXTURE0, GL_TEXTURE_2D_ARRAY, bind_name ) ;
    // Allocate the PBO storage, then copy the 8-byte payload into it
    pbo_size = std::max( pbo_size, subload_size );
    glBufferData( GL_PIXEL_UNPACK_BUFFER_ARB, pbo_size, 0,
                  GL_STREAM_DRAW );
    glBufferSubData( GL_PIXEL_UNPACK_BUFFER_ARB, 0, subload_size,
                     subload_buf);

    // Drain pending GL work so that only the subload call is timed
    glFinish();

    timeval tv ;
    gettimeofday ( & tv, NULL ) ;
    double start = (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0 ;
    glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
                                      999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, subload_buf);

    gettimeofday ( & tv, NULL ) ;
    double end = (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0 ;
    std::cout << "Why is this taking so long?: " << (end - start)*1000.0 << " ms" <<endl;
    // Unbind PBO
    glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, 0 );
    glutPostRedisplay();
}

void
display(void)
{
    draw();
    glutSwapBuffers();
}

void
init(void)
{
    cout << glGetString(GL_VERSION) <<endl;
    glGenTextures(1, &bind_name);
    glBindMultiTextureEXT ( GL_TEXTURE0, GL_TEXTURE_2D_ARRAY, bind_name ) ;
    // Compressed size in bytes of each mip level (all 1000 layers), smallest level first
    int size[] = { 8000, 8000, 8000, 32000, 128000, 512000, 2048000, 8192000, 32768000, 131072000 };
    // Allocate storage for mip levels 0..9 without supplying any data
    for ( int i = 0; i < 10; ++i )
        glCompressedTextureImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, i,
                                       GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 512>>i, 512>>i, 1000, 0, size[9-i], 0);
}

int
main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitWindowSize(640, 480);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutCreateWindow("Terrible Driver Bug");
    glutDisplayFunc(display);
    init();
    glutMainLoop();
    return 0;
}
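
[Editor's note] The hard-coded size[] table in init() appears to follow from the DXT1 layout: each 4x4 texel block occupies 8 bytes, and levels smaller than 4x4 still need one whole block per layer. The sketch below recomputes those sizes rather than listing them by hand. The function name dxt1_level_bytes is hypothetical and not part of the posted program or of OpenGL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Bytes needed for one mip level of a DXT1-compressed 2D array texture.
// Assumes DXT1's fixed layout: 8 bytes per 4x4 texel block, with partial
// blocks rounded up to a whole block.
std::size_t dxt1_level_bytes(int base_w, int base_h, int layers, int level) {
    assert(base_w > 0 && base_h > 0 && layers > 0 && level >= 0);
    const int w = std::max(1, base_w >> level);   // level width, at least 1 texel
    const int h = std::max(1, base_h >> level);   // level height, at least 1 texel
    const std::size_t blocks_x = static_cast<std::size_t>((w + 3) / 4);
    const std::size_t blocks_y = static_cast<std::size_t>((h + 3) / 4);
    return blocks_x * blocks_y * 8 * static_cast<std::size_t>(layers);
}
```

For the 512x512x1000 texture in this thread, level 0 works out to 131072000 bytes and levels 7 through 9 to 8000 bytes each, matching the table in the program. The 8-byte subload at level 9 is therefore exactly one block for one layer.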

Alfonse Reinheart
09-04-2013, 08:10 AM
What is with the odd combination of DSA and non-DSA functions? Why do you bother binding the texture if you're just going to upload to it with a DSA function?

ViolentHamster
09-04-2013, 08:27 AM
I was playing with DSA and non-DSA to see if that had anything to do with the bug. You're welcome to clean it up and post another version.

ViolentHamster
09-05-2013, 09:25 AM
Doh, my test program had a bug... To see the bug, change:


glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, subload_buf );

To this:

glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, 0 );

ViolentHamster
09-30-2013, 10:24 AM
I'd be really happy if someone from Nvidia could at least confirm that this is a bug. The one-page test program reproduces it...

mhagain
09-30-2013, 01:33 PM
If it only happens when loading from a PBO, then it seems to me that it's most likely blocking while waiting for the PBO to finish loading, rather than blocking in the texture load itself. So the slowdown would be in the driver's PBO code, not necessarily the driver's texture upload code.

Have you tried different combinations of PBO usage hints, or via a normal (uncompressed) glTexImage call? That may be useful info to help narrow down what's happening.

The whole thing does remind me of this old thread: http://www.opengl.org/discussion_boards/showthread.php/171394-slow-transfer-speed-on-fermi-cards
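
[Editor's note] One way to test the hypothesis above is to time each stage separately (the PBO allocation, the PBO fill, and the texture subload) instead of only the final call. The helper below is a minimal sketch of such a stopwatch using a monotonic clock; time_ms is a hypothetical name and nothing in it is specific to OpenGL.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// std::chrono::steady_clock is monotonic, so the result is not affected by
// system clock adjustments (unlike gettimeofday).
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // guaranteed by a monotonic clock
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In the test program, each GL call could be wrapped in its own lambda, with a glFinish() inside the lambda so the measurement covers the work the driver actually performs for that stage. If the PBO fill absorbs most of the time while the subload itself is fast, that would point at the buffer path rather than the texture path.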

ViolentHamster
09-30-2013, 03:15 PM
The texture size affects the PBO load times. STREAM, STATIC, and DYNAMIC DRAWs all perform very poorly. Unfortunately, using uncompressed textures isn't an option.

mhagain
09-30-2013, 06:50 PM
It may not be an option as a solution, but trying it can help with diagnosis of the problem.

kRogue
10-07-2013, 02:22 AM
I admit this should not happen, but what if the compressed texture data needs to be converted to a format the GPU can use, and that conversion MUST happen on the CPU? Again, it should not happen, but it would explain the stall. Is the upload without a PBO also slow, or is it fast?

Other ideas: create the buffer object a frame or two beforehand, do the upload (of a big texture), and draw a quad with it. Ideally, keep a few compressed images around and change the image every frame [or, better still, cycle through textures for the upload]. This might make the problem harder to isolate, but the stall might be some kind of delayed action by the GL implementation that is fine (or even good) for typical use cases in real programs.

Just an idea though.
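
[Editor's note] The "fill the buffer a frame or two before using it" idea amounts to cycling through a small ring of buffers. The sketch below shows only the index bookkeeping, with no GL calls; the class name UploadRing and its methods are hypothetical and do not come from the thread or from any library.

```cpp
#include <cassert>
#include <cstddef>

// Tracks which slot of a ring of buffers to fill on the current frame and
// which slot, filled `lag` frames earlier, is old enough to upload from.
class UploadRing {
public:
    UploadRing(std::size_t slots, std::size_t lag) : slots_(slots), lag_(lag) {
        // With 1 <= lag < slots, the slot being filled is never the slot being read.
        assert(lag >= 1 && slots > lag);
    }
    // Slot to write new data into on the current frame.
    std::size_t fill_slot() const { return frame_ % slots_; }
    // True once at least `lag` frames have passed, so some slot is old enough to read.
    bool has_ready_slot() const { return frame_ >= lag_; }
    // Slot that was filled `lag` frames ago.
    std::size_t ready_slot() const {
        assert(has_ready_slot());
        return (frame_ - lag_) % slots_;
    }
    // Advance to the next frame.
    void next_frame() { ++frame_; }

private:
    std::size_t slots_;
    std::size_t lag_;
    std::size_t frame_ = 0;
};
```

Applied to the test program, each slot would correspond to one PBO: glBufferSubData would target the PBO at fill_slot(), and the glCompressedTextureSubImage3DEXT call would read from the PBO at ready_slot(). Whether this avoids the stall on the driver in question is untested here.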