glTexSubImage2D with Buffer Object less efficient.



Narann
07-28-2014, 02:28 PM
Hi OpenGL community. I'm facing a problem using Buffer Objects and would like your opinion on it. My code looks like this:

Texture object constructor:

glBindTexture(GL_TEXTURE_2D, m_textureName);
glTexImage2D(GL_TEXTURE_2D, 0, m_format, m_width, m_height, 0, GL_BGRA, GL_UNSIGNED_BYTE, NULL);

Texture object update():

glBindTexture(GL_TEXTURE_2D, m_textureName);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_width, m_height, GL_BGRA, GL_UNSIGNED_BYTE, m_pTexture); // As you guess, m_pTexture is a pointer to the pixels

The texture pixels need some CPU-side modification (this is an N64 HLE emulator), so some (most) of them are updated almost once per frame, and I often get lag.

Trying to improve the situation as much as possible, I modified my code to use Buffer Objects, like this:

Texture object constructor:

glBindTexture(GL_TEXTURE_2D, m_textureName);
glTexImage2D(GL_TEXTURE_2D, 0, m_format, m_width, m_height, 0, GL_BGRA, GL_UNSIGNED_BYTE, NULL);

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_pixelBuffer);
glBufferData(GL_PIXEL_UNPACK_BUFFER, m_width * m_height * GetPixelSize(), NULL, GL_DYNAMIC_DRAW);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); // buffer names are GLuint, so unbind with 0 rather than NULL


Texture object update():

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, m_pixelBuffer);
glBufferSubData(GL_PIXEL_UNPACK_BUFFER, 0, m_width * m_height * GetPixelSize(), m_pTexture);

glBindTexture(GL_TEXTURE_2D, m_textureName);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_width, m_height, GL_BGRA, GL_UNSIGNED_BYTE, NULL);

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); // buffer names are GLuint, so unbind with 0 rather than NULL

It works, but performance is very bad, which surprised me (Intel 965GM, MESA 8.0.1, Linux Mint 13 Maya aka Ubuntu 12.04).

So I have several questions:
1) Am I doing something wrong?
2) Is there any benefit to using Buffer Objects here?
3) What would be "the right way" to deal with my case? (Maybe some other OpenGL feature is more appropriate.) I would like to stay on OpenGL 2.1 but use the most modern approach it allows.
4) Would I get better performance using glMapBuffer/glUnmapBuffer with a memcpy() between them?

Many thanks in advance, all! :)

Nikki_k
07-28-2014, 03:18 PM
Buffer or not, you still need to transfer all the data from the CPU to the GPU. That costs time.

I once had a similar problem with a larger set of textures being constantly changed and updated. Ultimately the only solution was to find a way to reduce such uploads. For example, if you create a specific modification that may be reused later, leave it alone on the GPU until you need it again. If you can somehow predetermine which modifications you need, create and upload them up front for later use.
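
One way to apply the "reduce uploads" advice is to skip glTexSubImage2D whenever the pixel data has not changed since the last upload. Below is a minimal sketch, assuming a hash of the buffer is an acceptable change detector (a hash collision would wrongly skip an upload, so treat this as illustrative only). `UploadState`, `fnv1a64`, and `needs_upload` are hypothetical names, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 64-bit hash over a byte buffer (here: the texture's pixel data). */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Hypothetical per-texture bookkeeping (an assumption, not from the thread). */
typedef struct {
    uint64_t last_hash;   /* hash of the pixels sent by the last upload */
    int      has_upload;  /* 0 until the first upload has happened */
} UploadState;

/* Returns 1 if the pixels differ from what was last uploaded (and records
 * the new hash), 0 if the upload can be skipped. */
static int needs_upload(UploadState *s, const unsigned char *pixels, size_t len)
{
    assert(s != NULL && pixels != NULL);
    uint64_t h = fnv1a64(pixels, len);
    if (s->has_upload && s->last_hash == h)
        return 0;
    s->last_hash = h;
    s->has_upload = 1;
    return 1;
}
```

In update(), the glTexSubImage2D call would then run only when needs_upload() returns 1 for the current contents of m_pTexture. Hashing still reads every byte on the CPU, so whether this pays off depends on how often the textures actually change.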

Narann
07-28-2014, 04:28 PM
Thanks Nikki_k, this is the kind of optimization I will certainly do.

But the mystery remains: why does using a Buffer Object slow down the whole texture transfer? It should at least perform the same, since the amount of data transferred is the same.

Nikki_k
07-28-2014, 11:43 PM
Yes, the amount of data is the same, but you are taking a detour getting it to where it needs to be. Of course that takes longer, because there's an additional copy of your data (most likely GPU-internal, depending on driver implementation) from the buffer to the texture. And the first copy from system to GPU memory needs to be synchronous, no matter whether you copy to a texture directly or to a buffer. Under normal circumstances, doing a second GPU-internal copy will always add overhead, because it cannot possibly start before your data is in GPU memory.

Here's a bit of explanation of the whole thing:

http://gamedev.stackexchange.com/questions/35486/map-and-fill-texture-using-pbo-opengl-3-3

mhagain
07-29-2014, 01:18 AM
With Intel graphics change this:
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_width, m_height, GL_BGRA, GL_UNSIGNED_BYTE
To this:
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_width, m_height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV
That should give you quite a large speedup, and you won't need the PBO.

Narann
07-29-2014, 07:01 AM
Thanks @Nikki_k! I think I'm starting to understand:

Using a Buffer Object with glVertexAttribPointer (https://www.khronos.org/opengles/sdk/docs/man/xhtml/glVertexAttribPointer.xml) doesn't add any overhead because it's just a pointer to data already stored on the GPU, but glTexSubImage2D (https://www.khronos.org/opengles/sdk/docs/man/xhtml/glTexSubImage2D.xml) has no such notion of a "pointer". It will always copy the whole data (and produce the "double copy" you are talking about).

I thought glTexSubImage2D behaved the same way as glVertexAttribPointer.

So, is there an equivalent to glVertexAttribPointer for textures? Something that uses the data in a GPU Buffer Object directly instead of copying it again? (I guess not, but I'm asking anyway.)

Thanks @mhagain, I will try that! Is that the famous preferred format/type (https://gamedev.stackexchange.com/questions/17587/how-detect-which-opengl-texture-formats-are-natively-supported)? Is there any way to find the best type in OpenGL 2.1? If not, is there a list somewhere? How did you find this?

Thanks in advance! :)

mhagain
07-29-2014, 01:24 PM
Quote from Narann:
Thanks @mhagain, I will try that! Is that the famous preferred format/type (https://gamedev.stackexchange.com/questions/17587/how-detect-which-opengl-texture-formats-are-natively-supported)?

It is, yes.

Quote from Narann:
Is there any way to find the best type in OpenGL 2.1? If not, is there a list somewhere? How did you find this?

No way that I know of in 2.1; I found this by writing a program to benchmark various combinations until I found which was the fastest. For raw TexSubImage upload speed on this particular generation of Intel graphics, GL_BGRA/GL_UNSIGNED_INT_8_8_8_8_REV came in about 25 times faster than GL_BGRA/GL_UNSIGNED_BYTE. The difference is smaller on modern hardware. What's neat is that it also holds up on NV (roughly equal) and AMD (about twice as fast).
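
A side note on why this swap needs no change to the pixel data itself: with format GL_BGRA, the type GL_UNSIGNED_INT_8_8_8_8_REV places B in bits 0-7, G in bits 8-15, R in bits 16-23 and A in bits 24-31 of each 32-bit pixel. On a little-endian CPU (and with GL_UNPACK_SWAP_BYTES left at its default), that is the same in-memory byte order as GL_BGRA with GL_UNSIGNED_BYTE, so presumably only the driver's upload path differs. A minimal sketch that checks the layout on the CPU; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack one pixel the way GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV reads it:
 * B in bits 0-7, G in bits 8-15, R in bits 16-23, A in bits 24-31. */
static uint32_t pack_bgra_8888_rev(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Returns 1 if the packed pixel's bytes in memory are B, G, R, A,
 * i.e. exactly what GL_BGRA + GL_UNSIGNED_BYTE would read. */
static int same_layout_as_bgra_bytes(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    uint32_t packed = pack_bgra_8888_rev(r, g, b, a);
    uint8_t bytes[4];
    memcpy(bytes, &packed, sizeof bytes);
    return bytes[0] == b && bytes[1] == g && bytes[2] == r && bytes[3] == a;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
static int is_little_endian(void)
{
    uint16_t x = 1;
    uint8_t lo;
    memcpy(&lo, &x, 1);
    return lo == 1;
}
```

On a little-endian machine same_layout_as_bgra_bytes() returns 1 for any pixel, which is why the same m_pTexture buffer can be passed with either type.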

Narann
07-29-2014, 01:41 PM
Thanks mhagain!


Quote from mhagain:
No way that I know of in 2.1; I found this by writing a program to benchmark various combinations until I found which was the fastest.

Wow! :eek: Is there any place I could find such a tool? Or a database that gathers this information per vendor, etc.?

Quote from mhagain:
For raw TexSubImage upload speed on this particular generation of Intel graphics, GL_BGRA/GL_UNSIGNED_INT_8_8_8_8_REV came in about 25 times faster than GL_BGRA/GL_UNSIGNED_BYTE. The difference is smaller on modern hardware. What's neat is that it also holds up on NV (roughly equal) and AMD (about twice as fast).

I can't wait to test this!

Thanks again!

tmason
07-29-2014, 03:18 PM
Quote from mhagain:
With Intel graphics change this:
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_width, m_height, GL_BGRA, GL_UNSIGNED_BYTE
To this:
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_width, m_height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV
That should give you quite a large speedup, and you won't need the PBO.

I am glad I sign in every so often just to read the boards...

Every once in a while, gems like this pop up.

Thanks again, mhagain, you are indeed an OpenGL guru...