Thread: Possible NVidia Driver Bug ~ 319.49

  1. #1
    Intern Contributor
    Join Date
    May 2008
    Posts
    94

    Possible NVidia Driver Bug ~ 319.49

    I'm running the latest Linux drivers (319.49) on a 680 and have a big performance problem with both glCompressedTextureSubImage3DEXT and glCompressedTexSubImage3D. I am trying to subload into a fairly large array texture (512x512 with 1000 layers).

    The following call goes out to lunch for about 15 ms. Subsequent calls to larger mip levels block for even longer.

    glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
    999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, 0);

    The problem was also seen on 580s. I think the 260 driver was ok. I am still trying to reproduce in a stand-alone app, but so far I haven't had any luck. Is this a known bug?

    Thanks...

  2. #2
    Intern Contributor
    Join Date
    May 2008
    Posts
    94
    Here's a small program that demonstrates the bug. I realize that populating a PBO and immediately using it is not ideal. However, the performance I'm seeing has got to be a bug. If I don't use a PBO, then the texture subload is lightning fast. With a PBO, it's taking 16 ms for 8 bytes of data. Larger mipmaps take over 100 ms. I tried the latest beta driver too.

    Code cpp:
     
    #define GL_GLEXT_PROTOTYPES 1
    #include <GL/glut.h>
    #include <algorithm>   // std::max
    #include <iostream>
    #include <stdlib.h>
    #include <stdio.h>
    #include <GL/glext.h>
    #include <sys/time.h>
    using namespace std;
    static GLuint bind_name;
    void draw(void)
    {
      static GLuint     pbo_handle;
      static GLsizeiptr pbo_size = 8;
      GLsizeiptr        subload_size = 8;
      unsigned char     subload_buf[8] = {0};
      if ( pbo_handle == 0 )
        glGenBuffers( 1, &pbo_handle );  
      // Bind PBO
      glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, pbo_handle );
      // Bind Texture
      glBindMultiTextureEXT ( GL_TEXTURE0, GL_TEXTURE_2D_ARRAY, bind_name ) ;
      pbo_size = std::max( pbo_size, subload_size );
      glBufferData( GL_PIXEL_UNPACK_BUFFER_ARB, pbo_size, 0, 
                    GL_STREAM_DRAW );
      glBufferSubData( GL_PIXEL_UNPACK_BUFFER_ARB, 0, subload_size, 
                       subload_buf);
     
      // Drain all pending GL work so the timer below covers only the subload call
      glFinish();
     
      timeval tv ;
      gettimeofday ( & tv, NULL ) ;
      double start = (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0 ;
      glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
                999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, subload_buf);
     
      gettimeofday ( & tv, NULL ) ;
      double end = (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0 ;
      std::cout << "Why is this taking so long?: " << (end - start)*1000.0 << " ms" <<endl; 
      // Unbind PBO
      glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, 0 );
      glutPostRedisplay();
    }
    void
    display(void)
    {
      draw();
      glutSwapBuffers();
    }
    void
    init(void)
    {   
      cout << glGetString(GL_VERSION) <<endl;
      glGenTextures(1, &bind_name);
      glBindMultiTextureEXT ( GL_TEXTURE0, GL_TEXTURE_2D_ARRAY, bind_name ) ;
      // Bytes per mip level, smallest level first: DXT1 uses 8 bytes per 4x4 block, times 1000 layers
      int size[] = { 8000, 8000, 8000, 32000, 128000, 512000, 2048000, 8192000, 32768000, 131072000 };
      for ( int i = 0; i < 10; ++i )
        glCompressedTextureImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, i,
           GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 512>>i, 512>>i, 1000, 0, size[9-i], 0);
    }
    int
    main(int argc, char **argv)
    {
      glutInit(&argc, argv);
      glutInitWindowSize(640, 480);
      glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
      glutCreateWindow("Terrible Driver Bug");
      glutDisplayFunc(display);
      init();
      glutMainLoop();
      return 0;
    }
    Last edited by ViolentHamster; 09-04-2013 at 07:03 AM. Reason: Code tag...
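The hard-coded size[] table in init() can be cross-checked with a little arithmetic. A minimal sketch, assuming the usual DXT1 layout of 8 bytes per 4x4 texel block with partial blocks rounded up; `dxt1_level_bytes` is a hypothetical helper name, not part of OpenGL or of the test program above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Bytes needed for one mip level of a DXT1-compressed 2D array texture.
// base_w / base_h are the level-0 dimensions; layers is the array depth,
// which is not reduced from one mip level to the next.
std::size_t dxt1_level_bytes(int base_w, int base_h, int layers, int level)
{
    assert(base_w > 0 && base_h > 0 && layers > 0 && level >= 0);
    // Each mip level halves width and height, clamped to at least 1 texel.
    const int w = std::max(base_w >> level, 1);
    const int h = std::max(base_h >> level, 1);
    // DXT1 stores 4x4 blocks; a partial block still occupies a full block.
    const std::size_t blocks_x = static_cast<std::size_t>((w + 3) / 4);
    const std::size_t blocks_y = static_cast<std::size_t>((h + 3) / 4);
    return blocks_x * blocks_y * 8 * static_cast<std::size_t>(layers);
}
```

For the 512x512x1000 texture this gives 131072000 bytes at level 0 and 8000 bytes at levels 7 to 9, which agrees with the table indexed as size[9-i].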

  3. #3
    Senior Member OpenGL Guru
    Join Date
    May 2009
    Posts
    4,948
    What is with the odd combination of DSA and non-DSA functions? Why do you bother binding the texture if you're just going to upload to it with a DSA function?

  4. #4
    Intern Contributor
    Join Date
    May 2008
    Posts
    94
    I was playing with DSA and non-DSA to see if that had anything to do with the bug. You're welcome to clean it up and post another version.

  5. #5
    Intern Contributor
    Join Date
    May 2008
    Posts
    94
    Doh, my test program had a bug... To see the bug, change:

    Code cpp:
    glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
                999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, subload_buf );
    To this:
    Code cpp:
    glCompressedTextureSubImage3DEXT( bind_name, GL_TEXTURE_2D_ARRAY, 9, 0, 0,
                           999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, 0 );

  6. #6
    Intern Contributor
    Join Date
    May 2008
    Posts
    94
    I'd be really happy if someone from Nvidia could at least confirm that this is a bug. The one page test program reproduces it...

  7. #7
    Senior Member OpenGL Pro
    Join Date
    Jan 2007
    Posts
    1,137
    If it only happens when loading from a PBO, then it seems to me that it's most likely blocking while waiting for the PBO to finish loading, rather than blocking in the texture load itself. So the slowdown would be in the driver's PBO code, not necessarily the driver's texture upload code.

    Have you tried different combinations of PBO usage hints, or uploading via a normal (uncompressed) glTexImage call? That may be useful info to help narrow down what's happening.

    The whole thing does remind me of this old thread: http://www.opengl.org/discussion_boa...on-fermi-cards
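To compare the variants suggested above (different usage hints, PBO versus client memory, compressed versus uncompressed), it may help to wrap each call in the same timer. A minimal sketch, assuming C++11; `blocked_ms` is a hypothetical helper name, not part of OpenGL or GLUT:

```cpp
#include <chrono>

// Returns how long `fn` blocks the calling thread, in milliseconds.
// This is CPU-side wall time only; it says nothing about when the GPU
// actually finishes the work that `fn` submitted.
template <typename Fn>
double blocked_ms(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Intended use inside draw(), after the glFinish() that drains earlier work:
//   double ms = blocked_ms([&] {
//       glCompressedTextureSubImage3DEXT(bind_name, GL_TEXTURE_2D_ARRAY, 9,
//           0, 0, 999, 1, 1, 1, GL_COMPRESSED_RGB_S3TC_DXT1_EXT, 8, 0);
//   });
```

Using a monotonic clock avoids the jumps that gettimeofday can show if the system time is adjusted while the test runs.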

  8. #8
    Intern Contributor
    Join Date
    May 2008
    Posts
    94
    The texture size affects the PBO load times. GL_STREAM_DRAW, GL_STATIC_DRAW, and GL_DYNAMIC_DRAW all perform very poorly. Unfortunately, using uncompressed textures isn't an option.

  9. #9
    Senior Member OpenGL Pro
    Join Date
    Jan 2007
    Posts
    1,137
    It may not be an option as a solution, but trying it can help with diagnosis of the problem.

  10. #10
    Advanced Member Frequent Contributor
    Join Date
    Apr 2009
    Posts
    578

    What if...

    I admit this should not happen, but what if the compressed texture data needs to be converted to a format the GPU can use, and that conversion MUST happen on the CPU? Again, it should not happen, but it would explain the stall. Is the upload without a PBO also slow, or is it fast?

    Other ideas: create and fill the buffer object a frame or two beforehand, then do the upload (of a big texture) and draw a quad with it. Ideally have a few compressed images hanging around so that you change the image every frame [or, ideally, cycle textures for the upload]. This might make the test case harder to isolate, but the stall might be a delayed-action kind of thing the GL implementation does that is fine (or even good) for typical use cases in real programs.

    Just an idea though.
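The "fill the buffer a frame or two before" idea comes down to some slot bookkeeping. A minimal sketch of only that bookkeeping, with no GL calls in it; `PboRing` and its members are hypothetical names, and whether this actually avoids the stall on this driver is untested:

```cpp
#include <cassert>
#include <cstddef>

// Tracks which of `slots` buffers to fill this frame and which one,
// filled `delay` frames earlier, is old enough to use as an upload source.
class PboRing {
public:
    PboRing(std::size_t slots, std::size_t delay)
        : slots_(slots), delay_(delay), frame_(0)
    {
        // delay >= 1 and slots > delay guarantee that the slot being
        // filled is never the slot being read in the same frame.
        assert(delay >= 1 && slots > delay);
    }

    // Slot to write new data into during the current frame.
    std::size_t fill_slot() const { return frame_ % slots_; }

    // False for the first `delay` frames, while nothing is old enough yet.
    bool has_ready_slot() const { return frame_ >= delay_; }

    // Slot that was filled `delay` frames ago.
    std::size_t ready_slot() const
    {
        assert(has_ready_slot());
        return (frame_ - delay_) % slots_;
    }

    // Call once at the end of every frame.
    void next_frame() { ++frame_; }

private:
    std::size_t slots_;
    std::size_t delay_;
    std::size_t frame_;
};
```

In draw(), the PBO at fill_slot() would be bound and written with glBufferSubData, and, once has_ready_slot() is true, the PBO at ready_slot() would be bound before the glCompressedTextureSubImage3DEXT call with offset 0.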
