Thread: The speed of uploading textures seems slow.

  1. #1
    Junior Member Newbie
    Join Date
    Sep 2011
    Location
    China
    Posts
    29

    The speed of uploading textures seems slow.

    Hi,

    I upload textures using the PBO skill and get a data transfer speed of ~3.5 GB/s on a GTX 570 and ~2.5 GB/s on a GTX 670. The timings are measured with ARB_timer_query.

    These speeds seem slow, since I read in some posts that the right speed is ~5 GB/s. I wrote my code following this tutorial (http://www.songho.ca/opengl/gl_pbo.html). Have I missed something?
    Pinned memory is said to benefit data transfer, but how do I use pinned memory in OpenGL?

    Thanks in advance.

  2. #2
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    985
    You are unlikely to get peak data transfer speed with any method due to CPU and synchronization overhead. I think 3.5 GB/s is pretty reasonable (and probably way better than what most people achieve). Why do you think that it should be 5 GB/s?

    Also, pinned memory is available in OpenGL through the GL_AMD_pinned_memory extension, but works only on Radeons as NVIDIA doesn't support it yet.
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  3. #3
    Junior Member Newbie
    Join Date
    Sep 2011
    Location
    China
    Posts
    29
    Thanks for the reply.

    Actually, the timings above exclude the CPU time; they cover only the GPU time consumed by the upload. So I think it should be faster.

    In this thread, http://www.opengl.org/discussion_boa...l+copy+engines, I_hrabcak posted the results of some tests: "The OpenGL data upload (texture and buffers) works in full speed on GeForce family which means ~5GB/s on PCIe 2.0 and 2.5GB/s on PCIe 1.1." This number (5 GB/s) stuck with me because I once read it on another site as well, but I cannot find the original post.

    Besides, why is the GTX 670 slower than the GTX 570? Strange.

  4. #4
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    985
    Quote: Originally posted by robotech_er
    Actually, the timings above exclude the CPU time; they cover only the GPU time consumed by the upload. So I think it should be faster.
    I know that the timing excludes CPU time there, but that doesn't mean that CPU overhead or synchronization time is not affecting the performance, especially if you perform more than a single upload. Don't forget that OpenGL synchronizes resources implicitly, so you might run into some unintended race condition. Or the simplest case: the CPU is unable to send the commands to the GPU just in time, leaving small gaps between the uploads. Your application doesn't even have to be CPU-bound for such things to happen.

    I'm just saying, you have to make the perfect synthetic test to achieve peak performance.
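
    The effect of such gaps on the sustained rate can be estimated with simple arithmetic. A minimal sketch in C, assuming every upload has the same size and is followed by the same idle time; the function name and the numbers in the note below are hypothetical, not measurements from this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Effective rate in GB/s (1 GB = 1e9 bytes) when every upload of
 * `bytes_per_upload` bytes takes `xfer_ns` nanoseconds on the bus and is
 * followed by `gap_ns` nanoseconds of idle time before the next upload.
 * Bytes per nanosecond is numerically equal to GB/s. */
double effective_gb_per_s(uint64_t bytes_per_upload, uint64_t xfer_ns, uint64_t gap_ns)
{
    assert(xfer_ns + gap_ns > 0);
    return (double)bytes_per_upload / (double)(xfer_ns + gap_ns);
}
```

    With these made-up numbers, a 10 MB upload that takes 2 ms moves at 5 GB/s while it is running, but a 0.5 ms gap after each upload lowers the sustained rate to 4 GB/s.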

  5. #5
    Junior Member Newbie
    Join Date
    Sep 2011
    Location
    China
    Posts
    29
    Thank you, aqnuep. Looks like I have a lot to learn.

    Could you point me to some resources or links on how to build this perfect synthetic test, or in other words, how to get peak performance? Upload speed is the bottleneck of my application. A few keywords for this problem would help too; I don't even know what to search for.

  6. #6
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    985
    To be honest, I'm not the right person to ask. Maybe you should try to contact the users who had the discussion on the dual-copy engine topic (hopefully they'll come by and visit this topic too).

  7. #7
    Junior Member Newbie
    Join Date
    Sep 2011
    Location
    China
    Posts
    29
    Thanks, aqnuep, I appreciate your help. I'll keep trying.

  8. #8
    Senior Member OpenGL Pro Aleksandar
    Join Date
    Jul 2009
    Posts
    1,067
    Quote: Originally posted by robotech_er
    These speeds seem slow, since I read in some posts that the right speed is ~5 GB/s. I wrote my code following this tutorial (http://www.songho.ca/opengl/gl_pbo.html). Have I missed something?
    To tell you whether you missed something, you have to tell us what you know about the "PBO skill".

    First, why do you think using a PBO is faster than a "normal" glTexSubImage2D()? I would say it isn't, and that holds in many use cases.
    We need pseudo-code of your texture upload/download, or an answer to the previous question, to tell you whether you can get any benefit from using PBOs.

    Second, the maximal throughput of a 16-lane PCI-E v2.x bus is 8 GB/s in each direction. You cannot reach even half of that throughput if you don't overlap multiple texture transfers. The reason is simple: data have to be transferred to the PBO (part of the main memory controlled by OpenGL) first, and then transferred asynchronously to GPU memory.

    Third, the time you have measured is the execution time of some code block on the GPU, but that is not the actual transfer time. What have you measured, and how?

    Fourth, since data have to be uploaded to main memory first and then downloaded to the GPU, the speed of the main memory and the FSB is very important. You mentioned a GTX 570 and a GTX 670. Are those cards in identical machines? If not, the results might differ a lot. Maybe there is also some problem in the driver for the GTX 670.

    Fifth, "dual copy engine" does not work on GeForce cards.
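
    For reference, the 8 GB/s ceiling from the second point can be reproduced from the link parameters. A minimal sketch in C, assuming the standard PCIe 1.x/2.x figures (2.5 and 5 GT/s per lane, 8b/10b line encoding); the function name is made up for illustration:

```c
#include <assert.h>

/* Theoretical one-direction PCIe bandwidth in GB/s (1 GB = 1e9 bytes).
 * gt_per_s: raw line rate per lane in gigatransfers per second
 *           (2.5 for PCIe 1.x, 5.0 for PCIe 2.x).
 * lanes:    link width (16 for a typical graphics slot).
 * PCIe 1.x/2.x use 8b/10b encoding: 10 bits on the wire carry 8 data bits. */
double pcie_peak_gb_per_s(double gt_per_s, int lanes)
{
    assert(gt_per_s > 0.0 && lanes > 0);
    double data_gbits_per_lane = gt_per_s * 8.0 / 10.0; /* strip encoding overhead */
    return data_gbits_per_lane / 8.0 * lanes;           /* bits to bytes, times lanes */
}
```

    pcie_peak_gb_per_s(5.0, 16) gives 8.0 and pcie_peak_gb_per_s(2.5, 16) gives 4.0. Packet headers and flow control take a further share, so the ~5 GB/s and ~2.5 GB/s figures quoted earlier in the thread are measured rates that sit below these ceilings.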

  9. #9
    Junior Member Newbie
    Join Date
    Sep 2011
    Location
    China
    Posts
    29
    Thank you for the detailed reply, Aleksandar. I should have posted a more complete question.

    For the first question, I have tested the "normal" glTexImage2D() (not glTexSubImage2D(); do the two functions differ much?). Without using PBO, glTexImage2D() is performed on the CPU, and I got a speed of ~600 MB/s.

    For the second item, could you tell me how to "overlap multiple texture transfers" for better performance?

    For the third, here is the code snippet:
    Code :
     
     
    void func1()
    {
        //--------------codes for timing---------------------------------------
        GLuint query;
        glGenQueries(1, &query);
        glBeginQuery(GL_TIME_ELAPSED,query);
        //---------------------------------------------------------------------
     
     
        // Uploading: this is the code block being timed.
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo1);
        glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);   // the buffer must be unmapped before GL reads from it
     
        glBindTexture( GL_TEXTURE_2D, tex1 );
        // pbo1 is still bound, so the last argument (0) is a byte offset into the PBO.
        glTexSubImage2D( GL_TEXTURE_2D, 0, 0, 0, tex1_width, tex1_height, GL_RED_INTEGER, GL_UNSIGNED_INT, 0); // where the data transfer occurs
     
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
     
        //-----------codes for timing----------------------------------------------
        // Yes, measuring like this stalls the whole process. But I am only testing the time consumed by
        // uploading from host to video memory, not the performance of the whole app.
        // Does measuring like this break the GPU execution sequence and thus hurt the upload speed?
     
        glEndQuery(GL_TIME_ELAPSED);
        GLuint done = 0;
        while (done == 0)
        {
            glGetQueryObjectuiv(query, GL_QUERY_RESULT_AVAILABLE, &done);
        }
        GLuint elapsed_time;
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &elapsed_time);
        glDeleteQueries(1, &query);
        float time_ms = elapsed_time/1000000.0f;
        LogTime( time_ms ); //write time_ms to a log file.
        //-----------------------------------------------------------------------
     
        Render();
     
        // Re-map the buffer.
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo1);
        glBufferData(GL_PIXEL_UNPACK_BUFFER, buffersize, 0, GL_STREAM_DRAW);   // re-specify the storage before mapping it again
        pBufferData = (byte*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER,GL_WRITE_ONLY);
        // pBufferData is a global variable; it is refilled by another thread dedicated to I/O.
     
     
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    }


    Fourth, yes, the GTX 570 and the GTX 670 are on the same machine. And I tend to believe it is a driver issue.
    Last edited by robotech_er; 07-06-2012 at 01:27 AM.
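
    The logged time can be turned into a transfer rate as follows. A minimal sketch in C, assuming the elapsed time is the nanosecond value returned by the GL_TIME_ELAPSED query and the byte count is the size of the uploaded data; both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Convert an upload of `bytes` bytes that took `elapsed_ns` nanoseconds
 * into GB/s (1 GB = 1e9 bytes). Bytes per nanosecond equals GB/s. */
double upload_gb_per_s(uint64_t bytes, uint64_t elapsed_ns)
{
    assert(elapsed_ns > 0);
    return (double)bytes / (double)elapsed_ns;
}

/* Longest interval, in seconds, that a 32-bit nanosecond counter can hold.
 * A GLuint result read with glGetQueryObjectuiv() is 32 bits wide. */
double max_seconds_in_u32_ns(void)
{
    return (double)UINT32_MAX / 1e9;
}
```

    As an illustration with made-up numbers, 35,000,000 bytes uploaded in 10,000,000 ns works out to 3.5 GB/s. A 32-bit result cannot represent more than about 4.29 seconds, which is harmless for a single upload; for longer intervals glGetQueryObjectui64v() returns a 64-bit value.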

  10. #10
    Senior Member OpenGL Pro Aleksandar
    Join Date
    Jul 2009
    Posts
    1,067
    Quote: Originally posted by robotech_er
    I have tested the "normal" glTexImage2D() (not glTexSubImage2D(); do the two functions differ much?).
    Yes, they differ a lot in terms of performance. glTexSubImage2D() just updates a portion of an existing texture. glTexImage2D() respecifies the whole texture image, so it should be used for texture initialization only.
    Everything depends on the drivers and the way they optimize the work. I remember the excitement of a colleague of mine when he replaced glTexImage2D with glTexSubImage2D. The frame rate increased significantly on NVIDIA, but there was almost no boost on AMD. So it all comes down to the drivers. Anyway, don't use glTexImage2D for texture updates.

    Quote: Originally posted by robotech_er
    Without using PBO, glTexImage2D() is performed on the CPU, and I got a speed of ~600 MB/s.
    I have to disappoint you. Transferring data is done in two phases: copying the data to "driver's memory" (system main memory), and downloading the data from "driver's memory" to "graphics memory".
    With the standard approach (without a PBO), both phases are done synchronously. Your CPU (or, to be precise, one core) is busy until everything is finished. A PBO allows the second phase to be done asynchronously.
    But during the second phase the GPU is busy. You cannot do anything else on the GPU while the texture is being downloaded. That is what the NV dual copy engine solves, but it doesn't work on your cards.
    In short, there is no magic in using a PBO. It just frees the CPU to do other work while the texture is downloaded from "driver's" to "graphics" memory. If you can do something useful on the CPU in the meantime, the PBO saves some CPU time. The first phase stays on the CPU anyway. If you issue a texture download, wait, draw something, wait again for another download to complete, and so on, you'll probably get no benefit from the PBO. That's why texture downloads are usually issued with multiple PBOs: while you are filling the second, the first is (probably) being downloaded to the GPU. That's what I meant by overlapping.

    As for timer_query, GL_TIME_ELAPSED is not the preferable query type, since it doesn't allow overlapping. glGetQueryObjectuiv() is a blocking function that significantly reduces the performance of your code. I have no time now to elaborate on the topic. Please find a tutorial on the net.

    P.S. Remove the while loop since it does nothing, and call glEndQuery() after some trivial drawing that uses the uploaded texture. The time measured with timer_query does not represent the exact upload time anyway, since what you are measuring is not only GPU execution code. Adding drawing code is necessary to force the driver to actually upload the texture.
    Last edited by Aleksandar; 07-07-2012 at 05:32 AM.
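
    The overlapping described in this post can be illustrated with a simple timing model. This is a sketch under stated assumptions, not a measurement: each texture needs a fixed CPU fill time and a fixed transfer time, and with at least two PBOs the fill of one buffer can run while the previous buffer is being transferred. All names are made up for illustration.

```c
#include <assert.h>

/* Total time when each upload waits for the previous one to finish completely. */
double serial_time_ms(int n, double fill_ms, double xfer_ms)
{
    assert(n > 0);
    return n * (fill_ms + xfer_ms);
}

/* Total time for a two-stage pipeline (CPU fill, then transfer): after the
 * first texture, every further texture costs only the slower of the two stages. */
double overlapped_time_ms(int n, double fill_ms, double xfer_ms)
{
    assert(n > 0);
    double slower = fill_ms > xfer_ms ? fill_ms : xfer_ms;
    return fill_ms + xfer_ms + (n - 1) * slower;
}

/* Round-robin choice of PBO: on a given frame the texture is updated from one
 * buffer while the CPU fills the next one. */
int upload_index(int frame, int pbo_count)
{
    assert(frame >= 0 && pbo_count > 0);
    return frame % pbo_count;
}

int fill_index(int frame, int pbo_count)
{
    assert(frame >= 0 && pbo_count > 0);
    return (frame + 1) % pbo_count;
}
```

    With hypothetical stage times of 2 ms to fill and 3 ms to transfer, ten textures take 50 ms back to back and 32 ms when overlapped. In the model the gain comes only from hiding the faster stage behind the slower one; the transfer itself does not get faster.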
