Part of the Khronos Group
OpenGL.org


Thread: quick question about PBO's and glTexImage

  1. #1
    Junior Member Newbie
    Join Date
    Jun 2012
    Posts
    14

    quick question about PBO's and glTexImage

    I'm confused about what the parameters of glTexImage2D do when it is used with a PBO. I assume that the format and type parameters are ignored? (As I understand it, in traditional non-PBO usage the conversion is done on the CPU.)

    Code :
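    	// Bind the PBO as the source for pixel unpack (upload) operations.
    	glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
     
    	// Copy the pixel data into the buffer's mapped storage.
    	void* ioMem = glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);
    	memcpy(ioMem, data, size);
    	glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);
     
    	// With a PBO bound, the last argument is a byte offset into the buffer.
    	glTexImage2D ( GL_TEXTURE_RECTANGLE, 0, internalFrmt, tw, th, 0, format, type, 0 ); GL_ASSERT;
    	glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);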

  2. #2
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,064
    The parameters describe the format of the pixel data you are providing. They cannot be ignored; otherwise, OpenGL has absolutely no idea what the bytes you're providing mean.

    Translation can still happen when using PBOs. And yes, it will be done on the CPU. It's up to you to provide image data in a format that your implementation won't have to convert.
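
    To make this concrete: the format and type (together with GL_UNPACK_ALIGNMENT) determine how many bytes OpenGL reads per row, and therefore how large the buffer has to be. A minimal C sketch, assuming GL_UNPACK_ROW_LENGTH is left at 0; the helper names unpack_row_stride and unpack_image_size are made up for illustration and are not part of OpenGL:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by one row of pixel data, given the bytes per pixel
 * implied by format/type and the GL_UNPACK_ALIGNMENT value (1, 2, 4 or 8;
 * the default is 4). Assumes GL_UNPACK_ROW_LENGTH is 0, so rows are
 * tightly packed apart from alignment padding. */
size_t unpack_row_stride(size_t width, size_t bytes_per_pixel, size_t alignment)
{
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    size_t raw = width * bytes_per_pixel;
    /* Round up to the next multiple of the alignment. */
    return (raw + alignment - 1) / alignment * alignment;
}

/* A safe buffer size for a whole image. It includes padding after the
 * final row, so it may be slightly larger than strictly required. */
size_t unpack_image_size(size_t width, size_t height,
                         size_t bytes_per_pixel, size_t alignment)
{
    return unpack_row_stride(width, bytes_per_pixel, alignment) * height;
}
```

    For example, a 3-pixel-wide GL_RGB / GL_UNSIGNED_BYTE row is 9 bytes of pixels, which the default alignment of 4 pads to 12.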

  3. #3
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    989
    The format and type parameters are not ignored when using PBOs; they mean exactly the same thing as in non-PBO use. The only difference is that the last argument of glTexImage is an offset into the pixel unpack buffer instead of a pointer to client-side data.

    Thus format conversion does in fact happen (if needed) with PBOs as well.
    Furthermore, non-PBO usage does not require the conversion to be done on the CPU either. The driver can still choose to do it on the GPU. However, non-PBO usage might require an additional copy from application memory to driver memory, and it also blocks the application to some extent because it is carried out synchronously, while PBO transfers work asynchronously.
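
    The offset-versus-pointer point above can be sketched in C as follows. The parameter keeps its pointer type, so the offset has to be cast; pbo_offset is a hypothetical helper name, and the glTexImage2D call in the comment reuses the variable names from the first post:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With a buffer bound to GL_PIXEL_UNPACK_BUFFER, the final argument of
 * glTexImage2D is interpreted as a byte offset into that buffer, but its
 * C type is still `const void *`. This helper makes the cast explicit. */
const void *pbo_offset(size_t byte_offset, size_t buffer_size)
{
    assert(byte_offset < buffer_size); /* the offset must fall inside the buffer */
    return (const void *)(uintptr_t)byte_offset;
}

/* Intended use (needs a GL context, so shown as a comment only):
 *
 *   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
 *   glTexImage2D(GL_TEXTURE_RECTANGLE, 0, internalFrmt, tw, th, 0,
 *                format, type, pbo_offset(0, size));
 */
```

    Passing a literal 0, as in the original code, is the same thing as an offset of zero.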
    Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
    Technical Blog: http://www.rastergrid.com/blog/

  4. #4
    Junior Member Newbie
    Join Date
    Jun 2012
    Posts
    14
    Quote Originally Posted by Alfonse Reinheart View Post
    The parameters describe the format of the pixel data you are providing. They cannot be ignored; otherwise, OpenGL has absolutely no idea what the bytes you're providing mean.

    Translation can still happen when using PBOs. And yes, it will be done on the CPU. It's up to you to provide image data in a format that your implementation won't have to convert.
    Wow, so that implies that the data efficiently blasted to the GPU in the PBO will be copied back to RAM, converted, and sent to the texture the old way. Would it still be asynchronous? OK, I'll keep an eye on it.

  5. #5
    Junior Member Newbie
    Join Date
    Jun 2012
    Posts
    14
    Quote Originally Posted by aqnuep View Post
    Further, non-PBO usage does not require the conversion to be done on the CPU either. The driver can choose to still do it on the GPU.
    Of course, and I'm dying for info on how to trigger those new hardware-based 'Copy Engines' in the NVIDIA Fermi+ architecture. But in practice (except for the most common formats) the data transfer / conversion is astonishingly slow. It's not just done on the CPU, it's being done in a sub-optimal way. So I definitely don't want to trigger the PBO data being sent back to the CPU, processed the old way, and then sent back to the texture. Anyway, thanks for the heads up.

  6. #6
    Senior Member OpenGL Lord
    Join Date
    May 2009
    Posts
    6,064
    Wow, so that implies that the data efficiently blasted to the GPU in the PBO will be copied back to RAM, converted, and sent to the texture the old way.
    You assume that a buffer object will always be allocated using GPU memory.

    Would it still be asynchronous?
    All uploads are asynchronous. It's simply a question of how much.

    Using client memory, the driver will generally copy your data into an internal buffer, then upload that asynchronously. PBOs simply cut out the middle-man; you get to specify the "internal buffer" yourself.

    The main use for PBOs is downloading. That's not to say that they aren't useful for uploading data. But doing downloads is where you really gain something. Downloads without PBOs are never asynchronous.
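
    One common way to take advantage of asynchronous downloads is to alternate between two pixel pack buffers, so that the buffer being mapped was filled on an earlier frame. The C sketch below covers only the index bookkeeping; ReadbackSlots and readback_slots are hypothetical names, and the GL calls in the comment are an assumption about how they would typically be arranged:

```c
#include <assert.h>

enum { PBO_COUNT = 2 };

typedef struct {
    unsigned write_index; /* PBO that glReadPixels should target this frame */
    unsigned read_index;  /* PBO filled on the previous frame, to be mapped now */
} ReadbackSlots;

/* Pick which of the two PBOs to write and which to read for a frame. */
ReadbackSlots readback_slots(unsigned long frame)
{
    ReadbackSlots s;
    s.write_index = (unsigned)(frame % PBO_COUNT);
    s.read_index  = (unsigned)((frame + 1) % PBO_COUNT);
    assert(s.write_index != s.read_index);
    return s;
}

/* Intended use per frame (needs a GL context, so shown as a comment only):
 *
 *   ReadbackSlots s = readback_slots(frame);
 *   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[s.write_index]);
 *   glReadPixels(0, 0, w, h, format, type, 0);   // offset 0 into the PBO
 *   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[s.read_index]);
 *   void *pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
 *   // ... use pixels, then glUnmapBuffer(GL_PIXEL_PACK_BUFFER) ...
 *
 * On frame 0 the read slot has not been written yet and should be skipped.
 */
```

    The results arrive one frame late, which is the price paid for not stalling on the read.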

    Really, the biggest bang for your buck will be figuring out what the optimal pixel transfer format and data type are for your internal format of choice. If you pick the right format and data type, then there will be no need for any modification of the data beyond what the DMA engine can do (i.e., swizzling).

  7. #7
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    989
    Quote Originally Posted by vhammer View Post
    Of course, and I'm dying for info on how to trigger those new hardware-based 'Copy Engines' in the NVIDIA Fermi+ architecture. But in practice (except for the most common formats) the data transfer / conversion is astonishingly slow. It's not just done on the CPU, it's being done in a sub-optimal way. So I definitely don't want to trigger the PBO data being sent back to the CPU, processed the old way, and then sent back to the texture. Anyway, thanks for the heads up.
    Why do you think that drivers upload and convert textures using the CPU? While there may in fact be certain scenarios where a CPU path is necessary, most uploads and conversions can easily be performed on the GPU. Not to mention that a PBO does not necessarily reside in video RAM, but in memory that is accessible by the GPU (which can be either video RAM or some special type of system memory).
    Actually, a GPU path could be used by the driver even without PBOs. However, without PBOs glTexImage still works synchronously. That's why PBOs perform better: they do the whole thing asynchronously.
    And you don't need those "copy engines" to do so. Any GPU not older than 5-6 years definitely has some sort of support for GPU uploads/conversions.

  8. #8
    Junior Member Newbie
    Join Date
    Jun 2012
    Posts
    14
    Quote Originally Posted by aqnuep View Post
    Why do you think that drivers upload and convert textures using the CPU?
    Because they are so slow

  9. #9
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    989
    Quote Originally Posted by vhammer View Post
    Because they are so slow
    Well, forgive me for being sceptical, but most of the time when people complain about very basic functionality being slow (VBOs, PBOs, etc.), it is because they aren't using it properly.
    Yes, generally all forms of uploads, downloads and conversions are time-consuming, because the CPU or GPU has to crunch through that data.
    But, if used correctly, these operations can be made very fast.

    Do you use an AMD or NVIDIA GPU? And which model?

  10. #10
    Junior Member Newbie
    Join Date
    Jun 2012
    Posts
    14
    Quote Originally Posted by Alfonse Reinheart View Post
    Really, the biggest bang for your buck will be figuring out what the optimal pixel transfer format and data type are for your internal format of choice. If you pick the right format and data type, then there will be no need for any modification of the data beyond what the DMA engine can do (i.e., swizzling).
    Yes that's exactly what I'm trying to do. There is also the promise in the link in my first post to this thread that the transfer can be made completely asynchronous using threading. With my initial tests I'm seeing none of this though.

    doing this:
    Code :
    	glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
     
    	void* ioMem = glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);
    	memcpy(ioMem, data, size);
    	glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);
     
    	glTexImage2D ( GL_TEXTURE_RECTANGLE, 0, internalFrmt, tw, th, 0, format, type, 0 );
    	glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

    takes longer than doing this:
    Code :
     glTexImage2D ( GL_TEXTURE_RECTANGLE, 0, internalFrmt, tw, th, 0, format, type, data );

    I've been trying various combinations to avoid data munging. I would assume that the combinations below would not trigger any conversion:
    1. GL_RGBA + GL_UNSIGNED_BYTE -> GL_RGBA8
    2. GL_RGBA + GL_FLOAT -> GL_RGBA32F
    3. GL_RGBA + GL_HALF_FLOAT -> GL_RGBA16F
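
    A reasonable first sanity check for pairings like the three above is that the transfer type has the same per-component width as the internal format. A rough C sketch, using local stand-in enums rather than the real GL tokens so that it compiles without OpenGL headers; all names here are illustrative, and passing the check does not guarantee that the driver skips conversion (it may still reorder components, for instance):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for GL_UNSIGNED_BYTE, GL_HALF_FLOAT and GL_FLOAT. */
typedef enum { TYPE_UNSIGNED_BYTE, TYPE_HALF_FLOAT, TYPE_FLOAT } PixelType;
/* Stand-ins for GL_RGBA8, GL_RGBA16F and GL_RGBA32F. */
typedef enum { INTERNAL_RGBA8, INTERNAL_RGBA16F, INTERNAL_RGBA32F } InternalFormat;

/* Bytes per component of the data handed to glTexImage2D. */
size_t type_bytes(PixelType t)
{
    switch (t) {
    case TYPE_UNSIGNED_BYTE: return 1;
    case TYPE_HALF_FLOAT:    return 2;
    case TYPE_FLOAT:         return 4;
    }
    assert(0 && "unknown pixel type");
    return 0;
}

/* Bytes per component requested by the internal format. */
size_t internal_component_bytes(InternalFormat f)
{
    switch (f) {
    case INTERNAL_RGBA8:   return 1;
    case INTERNAL_RGBA16F: return 2;
    case INTERNAL_RGBA32F: return 4;
    }
    assert(0 && "unknown internal format");
    return 0;
}

/* Bytes per pixel of a 4-component (RGBA) upload. */
size_t rgba_pixel_bytes(PixelType t)
{
    return 4 * type_bytes(t);
}

/* True when an RGBA upload of type t has the same component width as f. */
bool rgba_layout_matches(PixelType t, InternalFormat f)
{
    return type_bytes(t) == internal_component_bytes(f);
}
```

    All three pairings in the list pass this check, at 4, 16 and 8 bytes per pixel respectively, so a mismatch in component width is not what is slowing them down.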

    Clearly there is something I'm missing / doing wrong....
