Streaming several (YUV) videos using OpenGL



sampsa
01-23-2018, 07:28 AM
Hi,

I'm trying to stream video using OpenGL textures. Some features of my programming architecture are:



Direct GPU memory access using PBOs - works great
I am using a stack of pre-reserved PBOs that are being recycled - works great
Textures carry the LUMA and CHROMA planes, which are given to a shader program. The shader converts from YUV to RGB - works nicely


However, there is a bottleneck that is driving me nuts when I copy from a PBO to a texture.

I have traced this issue into the format/internal_format pair in

glTexImage2D(GL_TEXTURE_2D, 0, internal_format, w, h, 0, format, GL_UNSIGNED_BYTE, 0)

As we know, OpenGL converts everything to RGBA. The documentation states that:

https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glTexImage2D.xhtml : "GL_RED : each element is a single red component. The GL converts it to floating point and assembles it into an RGBA element by attaching 0 for green etc."

My problem is that I don't want RGB-anything, just a single channel! The "texture" here is just a blob of bytes that is handed to the shader program; I don't need any conversion.

The only format/internal_format pair that gives me decent results is GL_RGBA/GL_RGBA (but I can't use that).

My case would be GL_RED/GL_RED .. but that is painfully slow: with that pair, glTexSubImage2D is a hundred times slower than with GL_RGBA/GL_RGBA.

So I tried dropping PBOs altogether and switching to TBOs (texture buffer objects) .. with TBOs there is no conversion - they are more like plain byte buffers, right?

However, TBOs don't give me DMA to the GPU (this works with PBOs). This call:

payload = (GLubyte*)glMapBuffer(GL_TEXTURE_BUFFER, GL_WRITE_ONLY)

gives me a null pointer.

I am starting to run out of ideas - please help!

A small stand-alone test program can be found here .. it just benchmarks texture uploading (does not visualize anything):

https://github.com/elsampsa/opengl-texture-streaming
Regards,

Sampsa

P. S. A related stack overflow question is here: https://stackoverflow.com/questions/48382350/streaming-several-yuv-videos-using-opengl

GClements
01-23-2018, 02:50 PM
As we know, OpenGL converts everything to RGBA. The documentation states that:

https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glTexImage2D.xhtml : "GL_RED : each element is a single red component. The GL converts it to floating point and assembles it into an RGBA element by attaching 0 for green etc."

My problem is that I don't want RGB-anything, just a single channel! The "texture" here is just a blob of bytes that is handed to the shader program; I don't need any conversion.

If format and type match internalFormat, there shouldn't be any conversion.

The "conversion" to floating-point RGBA is only conceptual. It allows the internal and external formats to be discussed separately, rather than needing to discuss each possible combination of format and internalFormat.

Conceptually, the external format is converted to floating-point RGBA (converting signed or unsigned normalised values to floats, setting missing components to their default values), then that is converted to the internal format (converting floats to normalised values, discarding unused components). But I wouldn't expect it to actually convert to/from float if neither the internal nor external format is float, nor add components only to discard them.



The only format/internal_format pair that gives me decent results, is GL_RGBA/GL_RGBA (but I can't use that).

If you're using 3.2+ core profile, you can't use unsized internal formats; you need to use e.g. GL_R8 or GL_RGBA8.



My case would be GL_RED/GL_RED .. but that is painfully slow: with that pair, glTexSubImage2D is a hundred times slower than with GL_RGBA/GL_RGBA.

glTexSubImage2D doesn't have an internalFormat parameter; was that a typo for glTexImage2D?

Does it make a difference if you use GL_R8 rather than GL_RED? (I'm assuming that type is GL_UNSIGNED_BYTE.)



So I tried dropping PBOs altogether and switching to TBOs (texture buffer objects) .. with TBOs there is no conversion - they are more like plain byte buffers, right?

Buffer textures essentially just allow buffers to be accessed with texelFetch(); the data is treated as a 1D array of texels in any format available to textures (between one and four components consisting of signed or unsigned integers, signed or unsigned normalised values, or floats).



However, TBOs don't give me DMA to the GPU (this works with PBOs). This call:

payload = (GLubyte*)glMapBuffer(GL_TEXTURE_BUFFER, GL_WRITE_ONLY)

gives me a null pointer.

Is anything bound to GL_TEXTURE_BUFFER? Note that GL_TEXTURE_BUFFER is a convenience target, similar to GL_COPY_READ_BUFFER and GL_COPY_WRITE_BUFFER. Binding a buffer to that target doesn't have any effect beyond associating the buffer with that target.

A buffer texture is created using glTexBuffer() or glTextureBuffer(), which just takes an existing buffer and wraps a buffer texture around it. The data isn't copied; the buffer texture references the underlying buffer, so changes to the buffer affect subsequent reads from the texture.

Dark Photon
01-24-2018, 05:48 AM
I'm trying to stream video using OpenGL textures. Some features of my programming architecture are:


Direct GPU memory access using PBOs - works great ...
Textures carry the LUMA and CHROMA planes, which are given to a shader program. ...

However, there is a bottleneck that is driving me nuts when I copy from a PBO to a texture.

I have traced this issue into the format/internal_format pair in
glTexImage2D(GL_TEXTURE_2D, 0, internal_format, w, h, 0, format, GL_UNSIGNED_BYTE, 0)


If you're searching for texel transfer "fast paths" in the driver, I'd recommend using glGetInternalformativ() (https://www.khronos.org/opengl/wiki/GLAPI/glGetInternalformat) and simply asking your GL driver what they are. This API lets you query (among other things):


what internal formats it prefers (i.e. which are likely the ones natively supported by the GPU+GL driver),
which format/type you should use when feeding data into and out of that internal format (to avoid internal conversions) when using different transfer APIs.

Here's some code to copy/paste:


const GLenum target = GL_TEXTURE_2D;
GLint supported, preferred, format[3], type[3];

// intFormat is the internal format under test, e.g. GL_R8 or GL_RGBA8
glGetInternalformativ( target, intFormat, GL_INTERNALFORMAT_SUPPORTED, 1, &supported );
glGetInternalformativ( target, intFormat, GL_INTERNALFORMAT_PREFERRED, 1, &preferred );
glGetInternalformativ( target, intFormat, GL_TEXTURE_IMAGE_FORMAT,     1, &format[0] );
glGetInternalformativ( target, intFormat, GL_TEXTURE_IMAGE_TYPE,       1, &type[0] );
glGetInternalformativ( target, intFormat, GL_READ_PIXELS_FORMAT,       1, &format[1] );
glGetInternalformativ( target, intFormat, GL_READ_PIXELS_TYPE,         1, &type[1] );
glGetInternalformativ( target, intFormat, GL_GET_TEXTURE_IMAGE_FORMAT, 1, &format[2] );
glGetInternalformativ( target, intFormat, GL_GET_TEXTURE_IMAGE_TYPE,   1, &type[2] );


You don't really care about the glReadPixels and glGetTex*Image() behavior, so ignore the last 4. I'm just including those for completeness. The info you care about should be provided by the first 4 lines.

If you use this to buzz out various internal formats on an NVidia GL driver for instance, you'll find that it really doesn't like internal formats for 2- and 3-component uncompressed textures. So prefer 1- and 4-component internal texture formats. And of course when possible, you want the data you upload (format/type) to already match the internal format.
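Acting on that advice with 3-component source data usually means padding it to 4 components on the CPU before upload, so that format/type (GL_RGBA / GL_UNSIGNED_BYTE) exactly match a GL_RGBA8 internal format and the driver has no per-texel conversion to do. A sketch (`rgb_to_rgba` is a hypothetical helper, not from the thread):

```c
#include <stddef.h>

/* Pad tightly-packed 3-byte RGB texels to 4-byte RGBA with opaque
 * alpha, so the upload layout matches a 4-component internal format.
 * Trades one CPU pass and 33% more upload bandwidth for avoiding a
 * driver-side conversion on a disliked 3-component format. */
void rgb_to_rgba(const unsigned char *rgb, unsigned char *rgba,
                 size_t num_pixels) {
    for (size_t i = 0; i < num_pixels; ++i) {
        rgba[4*i + 0] = rgb[3*i + 0];
        rgba[4*i + 1] = rgb[3*i + 1];
        rgba[4*i + 2] = rgb[3*i + 2];
        rgba[4*i + 3] = 255;  /* opaque alpha */
    }
}
```

Whether the repack pays off depends on the driver; the glGetInternalformativ() queries above are the way to find out what it actually prefers.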


My case would be GL_RED/GL_RED .. but that is painfully slow: with that pair, glTexSubImage2D is a hundred times slower than with GL_RGBA/GL_RGBA.

As GClements mentioned, I'd try buzzing out internal format GL_R8 (with format/type = GL_RED / GL_UNSIGNED_BYTE). If your driver's just weird/old, check out the legacy internal format GL_LUMINANCE8 (with format/type = GL_LUMINANCE / GL_UNSIGNED_BYTE).


If format and type match internalFormat, there shouldn't be any conversion.
In my experience, that depends on the driver. Not all supported internal formats seem to be natively supported by the GPU+driver without extra expense in texel transfers to/from those formats.

sampsa
01-24-2018, 11:57 AM
Hi,

Thanks for the comments. Eventually I found out that there is not that much difference between GL_RED and, say, GL_BGR (the latter is around a third faster on my Intel-graphics laptop).

The PBO => texture copy seems to take a few milliseconds per 1080p frame. I guess that's the best we can do?
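For scale, a back-of-envelope check of what "a few milliseconds" implies (the 3 ms figure below is an assumption standing in for that phrase; `yuv420_upload_gbps` is a hypothetical helper):

```c
/* Effective upload bandwidth implied by a given per-frame copy time
 * for a YUV420 frame. YUV420 carries 1.5 bytes per pixel: a
 * full-resolution Y plane plus quarter-resolution U and V planes. */
double yuv420_upload_gbps(double width, double height, double seconds) {
    double frame_bytes = width * height * 1.5;
    return frame_bytes / seconds / 1e9;
}
```

At an assumed 3 ms, a 1920x1080 frame (~3.1 MB) works out to only about 1 GB/s of effective bandwidth, well below typical PCIe throughput, which suggests the copy time is dominated by conversion or synchronization overhead rather than the raw transfer itself.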

The bottleneck in my case seems to be elsewhere .. please do take a look at my follow-up question:

https://www.opengl.org/discussion_boards/showthread.php/200394-glxSwapBuffers-and-glxMakeCurrent-when-streaming-to-multiple-X-windowses

For texture uploading, I put some tests in a single file here:

https://github.com/elsampsa/opengl-texture-streaming