Direct GL_YUV, GL_YCbCr or YCoCg texture support ?



Yann LE PETITCORPS
11-16-2009, 04:44 PM
Why doesn't OpenGL have standardised texture support for standard digital TV video formats such as YUV or YCbCr in 4:2:2 or 4:1:1 ???

At first sight, we have to use a color matrix for this
=> why hasn't this been improved in all this time ???

All videos are made in some sort of YUV format, so we systematically have to convert from YUV to RGB before sending it to an OpenGL texture => a lot of time lost for nothing ...

All recent graphics cards seem to have support for a YUV format, so this can't be a technical problem ... and a few adds/muls aren't really hard to integrate into a graphics chipset or the API :)

It's really mysterious to me why this hasn't already been exposed in the OpenGL API (the format has existed for decades ...)


@+
Yannoo

Yann LE PETITCORPS
11-16-2009, 05:02 PM
It is the same thing with hypothetical GL_MPEG1, GL_MPEG2 and/or relatively new GL_MPEG4 support for video texturing too :)

This already works very well, in real time, with avicodec decoding MPEG1/MPEG2/MPEG4 streams that I can easily put into OpenGL textures ...

So ... => WHY ???


@+
Yannoo

Yann LE PETITCORPS
11-16-2009, 05:29 PM
I think the Mesa extension YCBCR_422_MESA is something like what I want, and Apple made a similar (but really slow, from what I have read) extension in the past ...

I'll test whether this is exposed on my various PCs, EEEPC and Macintoshes, which all have different graphics cards and drivers ...


@+
Yannoo

Dark Photon
11-16-2009, 06:40 PM
Why doesn't OpenGL have standardised texture support for standard digital TV video formats such as YUV or YCbCr in 4:2:2 or 4:1:1 ???

...

It's really mysterious to me why this hasn't already been exposed in the OpenGL API (the format has existed for decades ...)
OpenGL is a 2D/3D graphics rendering API, not a video processing API. Does that mean you can't do video processing using OpenGL? No. But it does mean there aren't custom APIs geared around it.

I suggest you check out APIs such as the X.Org Xv library, which already has built-in support for playback with these video formats, with GPU acceleration when available, and APIs such as NVidia's VDPAU, which provides GPU-assisted decode and playback of even very compute-intensive formats like h.264.

feelgood
11-16-2009, 07:54 PM
A number of vendors (e.g. Apple, ...) have had support for rendering YCbCr 4:2:2 formats through the OpenGL API. For example, on Mac OS X (going all the way back to 10.2.x) you have support for GL_APPLE_ycbcr_422, which allows one to read, store and optionally process texture data in (signed, unsigned) YCbCr 4:2:2. Considering that consumer HW has had direct (texture) support for 4:2:2 for quite some time, the performance cost of this particular extension is effectively nonexistent.

Snow Leopard gives you more control over the server-side conversion from YCbCr to RGB by way of GL_APPLE_rgb_422. The latter extension allows one to implement their own (client-side) conversion from YCbCr to RGB via the programmable pipeline or through another method.

If you need support for other YCbCr formats (e.g. 4:2:0, 4:4:4, ...) you are much better off using the existing (L, A, RGB) texture path and the programmable pipeline to accomplish this.

yooyo
11-17-2009, 02:23 AM
YUV422, 411, 420 and all their variants are easy to implement via shaders and render targets. For example, to perform YUV422 to RGB you need:
1. an RGBA texture containing the YUV422 samples
2. an FBO with an RGB target texture bound
3. to draw a screen-aligned textured quad with the appropriate fragment shader
In the shader you need to resolve the sampling (422 in this case), collect the YUV samples, convert to RGB (using a conversion matrix provided by the app) and write RGB pixels.

Then just use RGB version of texture.

In the above example I chose an RGBA texture to store the YUV422 samples. The YUV422 format looks like:


Y0 U Y1 V Y0 U Y1 V Y0 U Y1 V... and it perfectly fits in
R G B A R G B A R G B A

So, for a given YUV422 image at width x height, create a width/2 x height RGBA texture and just fill it with the raw YUV422 data. Also create a target RGB texture at width x height and bind it to the FBO.

In the shader (depending on the even/odd gl_FragCoord.x coordinate) choose Y0 U V or Y1 U V and perform the YUV to RGB conversion; a sketch of such a shader follows below.

Other formats and samplings are easy to do.
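A minimal sketch of the fragment shader described above, assuming the macropixels were uploaded into a width/2 x height RGBA texture (the sampler name yuyv_tex and the full-range BT.601-style coefficients are assumptions, not part of the post above):

uniform sampler2D yuyv_tex;   // width/2 x height RGBA texture holding (Y0, U, Y1, V)

void main(void)
{
    // One RGBA texel holds one 4:2:2 macropixel covering two output pixels.
    vec4 mp = texture2D(yuyv_tex, gl_TexCoord[0].xy);

    // Even destination pixels take Y0 (mp.r), odd ones take Y1 (mp.b);
    // the U (mp.g) and V (mp.a) samples are shared by the pair.
    float y = (mod(gl_FragCoord.x, 2.0) < 1.0) ? mp.r : mp.b;
    float u = mp.g - 0.5;
    float v = mp.a - 0.5;

    // Full-range BT.601-style conversion; the real matrix should be
    // provided by the application, as noted above.
    gl_FragColor = vec4(y + 1.402 * v,
                        y - 0.344 * u - 0.714 * v,
                        y + 1.772 * u,
                        1.0);
}

Rendering a screen-aligned quad into the FBO with this shader produces the RGB texture of step 2.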

Groovounet
11-17-2009, 02:26 AM
I first thought: great idea! And then I realized that you can store the YUV data in several textures and decode them within a shader.

Maybe a dedicated format could be more efficient on the GPU side, but I suspect the limitation will be in the image streaming.

mfort
11-17-2009, 02:38 AM
I am pretty much against the idea of having YUV support in OpenGL.
I could understand it in the 2.x era, but not today with all the shader functionality.

The problem with YCrCb 4:2:2 & co. is that there are so many combinations of memory format (CrYCbY, YCrYCb, ...).

There are several color matrices for converting to RGB (CCIR601, CCIR709, full range).

Then there are several ways to upscale from 4:2:2 to 4:4:4: simple upscaling, linear filtered upscaling, or more sophisticated upscaling.

Then you have 8, 10 or 12 bits per channel.

Hardwiring all this into the driver goes against the idea of having few fixed-function features and more programmable features.

I'd sooner see some utility library handle all this.

Eosie
11-17-2009, 09:36 AM
If hardware supports it directly, there is no reason for it not to be included in the API. It's in Apple's implementation and Mesa, so it probably should be standardized. It's worth noting that Xv uses 3D hardware these days anyway.

Groovounet
11-17-2009, 10:41 AM
The video decoder is a separate component from the GPU at some companies. I don't know about ATI or NVidia, but even if it's on the same chip, that doesn't mean the video decoder is really integrated with the rest of the chip. "It lives on its own".

Yann LE PETITCORPS
11-17-2009, 01:54 PM
The responses about YUV shaders remind me of a TV show that we call "Des chiffres et des lettres" :)

Let me explain ...

We have, for example, only the numbers 1, 7, 9, 5, 10, 8, 3 and 2 to use
And we have, for example, to reach the number 92

7-1 = 6
10 * 8 = 86
6 + 86 = 92 => yes, good, we have found the formula that gives the right final value :)

In OpenGL this seems to be about the same game, but it's funnier because we have to handle relatively complex pixel shaders combining formulas that are a lot harder, just to make the YUV->RGB conversion


But something like

glTexImage2D( target, level, GL_RGB, width, height, border, GL_YUYV, GL_BYTE, texels)

seems to me very much simpler to handle than a pixel shader, and it doesn't suffer from the many incompatibilities between different cards/vendors/OpenGL shader versions that exist today (when the card/driver supports pixel shaders at all, of course ...)


@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 01:58 PM
Note that I deliberately made an error in the "chiffres et les lettres" solution

That's because I have found a lot of bad YUV to RGB conversion formulas on the net ... :)

=> the simplest answer can also be the best one ...

@+
Yannoo

mfort
11-17-2009, 02:10 PM
glTexImage2D( target, level, GL_RGB, width, height, border, GL_YUYV, GL_BYTE, texels)


Maybe you did not understand my point. There are about 50 (or more?) combinations/formats/standards of YUV packing.
Should OpenGL implement all of them?

But ask yourself: is there anything blocking you from writing an application that displays an image in YUV format? Is it slow?
The answers are: no, nothing blocks you, and it is just as fast.

Don't worry about writing a simple shader. In the end you'll realize you want to add a few more things to it, like gamma correction, deinterlacing, ... Then the 5 lines of code for YUV->RGB are nothing.
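Those 5 lines amount to a single matrix multiply. A sketch, assuming BT.601 studio-range coefficients (the same constants that appear in the shaders later in this thread); the other matrices listed above would just change the constants:

vec3 yuv2rgb(vec3 yuv)  // yuv.x = Y, yuv.y = Cb, yuv.z = Cr, all in [0,1]
{
    const mat3 m = mat3( 1.1643,  1.1643,  1.1643,   // column 1: Y weights
                         0.0,    -0.3917,  2.0172,   // column 2: Cb weights
                         1.5958, -0.8129,  0.0 );    // column 3: Cr weights
    return m * vec3(yuv.x - 0.0625, yuv.y - 0.5, yuv.z - 0.5);
}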

Yann LE PETITCORPS
11-17-2009, 02:19 PM
For an idea of what is already standardised in OpenGL:


Table 1: OpenGL Internal Texture Formats. Each internal texture format has a corresponding base internal format and its desired component resolutions.
Sized Base R G B A L I
Internal Format Internal Format bits bits bits bits bits bits
ALPHA4 ALPHA 4
ALPHA8 ALPHA 8
ALPHA12 ALPHA 12
ALPHA16 ALPHA 16
LUMINANCE4 LUMINANCE 4
LUMINANCE8 LUMINANCE 8
LUMINANCE12 LUMINANCE 12
LUMINANCE16 LUMINANCE 16
LUMINANCE4_ALPHA4 LUMINANCE_ALPHA 4 4
LUMINANCE6_ALPHA2 LUMINANCE_ALPHA 2 6
LUMINANCE8_ALPHA8 LUMINANCE_ALPHA 8 8
LUMINANCE12_ALPHA4 LUMINANCE_ALPHA 4 12
LUMINANCE16_ALPHA16 LUMINANCE_ALPHA 16 16
INTENSITY4 INTENSITY 4
INTENSITY8 INTENSITY 8
INTENSITY12 INTENSITY 12
INTENSITY16 INTENSITY 16
R3_G3_B2 RGB 3 3 2
RGB4 RGB 4 4 4
RGB5 RGB 5 5 5
RGB8 RGB 8 8 8
RGB10 RGB 10 10 10
RGB12 RGB 12 12 12
RGB16 RGB 16 16 16
RGBA2 RGBA 2 2 2 2
RGBA4 RGBA 4 4 4 4
RGB5_A1 RGBA 5 5 5 1
RGBA8 RGBA 8 8 8 8
RGB10_A2 RGBA 10 10 10 2
RGBA12 RGBA 12 12 12 12
RGBA16 RGBA 16 16 16 16

So why not a little new GL_YUYV24 or something like it at the end of this list ????

Er ... have you ever actually written a single shader that can handle all the formats present here ??? :)

@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 02:26 PM
Note that the LUMINANCE internal format is already standardised :)

=> so we **ONLY** have the UV part to add to it (which can easily be computed by a very simple yuv2rgb table lookup ...)


@+
Yannoo

mfort
11-17-2009, 02:45 PM
Cheap arguments.

Everybody knows OpenGL has a lot of legacy stuff.
All the intensity and luminance formats are a dead end because they cannot be used as render targets. Most of the formats you listed are deprecated; some were replaced by GL_ARB_texture_rg.

Regarding your GL_YUYV24:
Look here: http://www.fourcc.org/yuv.php
There are dozens of YUV formats, and each of them can be interpreted in a dozen different ways (color matrix, upscale filtering, ...).

OpenGL is not a format conversion library. It is a tool for rendering stuff, and as long as it can render YUV video it does its job.

Please think about it. Write your shader. You will be happy to have full control of your pipeline.

Yann LE PETITCORPS
11-17-2009, 02:57 PM
mfort, please try to understand that I'm not talking about a multitude of internal formats, but only one ...
=> how many different RGB(A) formats does OpenGL handle, please?

On another note, I'll look at what GL_ARB_texture_rg is ...

@+
Yannoo

mfort
11-17-2009, 03:07 PM
I know you want only one ... right now. Tomorrow another one ... In 6 months you will ask for 50 combinations.
OpenGL should be generic enough, yet still related to 3D rendering.

OpenGL has several extensions regarding YUV formats:
GL_EXT_422_pixels
GL_APPLE_ycbcr_422
GL_SGIX_ycrcb
None of them gained the ARB "stamp" nor reached core. That indicates something.

(my last post in this thread)

Yann LE PETITCORPS
11-17-2009, 03:09 PM
And the shufps SSE instruction already exists, for example :)

@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 03:12 PM
Thanks, GL_EXT_422_pixels seems to be a good start ...


@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 03:15 PM
With the first C replaced by a U and the second by a V, this can be a solution :)

@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 03:22 PM
The idea is NOT to have to copy/transform all the data that various webcams, PC TV tuners, DVDs or other MPEG1/2/4 sources can output ... but to **DIRECTLY** use their YUV output in an OpenGL texture ...


@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 03:25 PM
From what I have read, video cards now use buses that are much faster than a "simple" PCI bus, and DXT texture compression is relatively common on recent graphics cards ... this is perhaps not for nothing ...


@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 03:42 PM
Can you think for a second that:

YUV (PCTV) -> RGB -> CPU -> GPU -> RGB -> YUV (TV screen for example)

can be faster than the much simpler

YUV (PCTV) -> GPU -> YUV (TV screen)

If yes, please can you explain your logic? :)

@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 04:03 PM
OpenGL is theoretically an "Open Graphics Library", so I don't see why it should be limited to making only 3D images ...

It can normally be used to handle "simple" 2D imaging with various effects too ...

Graphics isn't limited to the 3D domain, so why think that 3D may deprecate what the OpenGL library did for a long time before ???

For me, it's clearly a regression ... and certainly not an evolution ...

A long time ago, I found OpenGL really much simpler to use across various OSes / hardware than its DirectX adversary ...

At first sight, OpenGL has now begun to be harder to handle than DirectX :(

Perhaps (but perhaps not) some of the software companies that lately joined the OpenGL group have something to do with that ...



@+
Yannoo

ZbuffeR
11-17-2009, 05:00 PM
Can you think for a second that:
YUV (PCTV) -> RGB -> CPU -> GPU -> RGB -> YUV (TV screen for example)
can be faster than the much simpler
YUV (PCTV) -> GPU -> YUV (TV screen)
If yes, please can you explain your logic? :)

And who talked about CPU-side YUV->RGB conversion?
mfort rightfully said that fast hardware conversion can be done in a shader, which will both avoid the CPU conversion step and allow a lot of customization in the conversion.
So:
YUV (PCTV) -> CPU -> GPU with proper shader.

By the way:
GPU -> YUV (TV screen) and GPU -> RGB -> YUV are exactly the same.

Please stop thinking out loud on your keyboard :D

Yann LE PETITCORPS
11-17-2009, 05:40 PM
Thanks for your response, ZbuffeR,

And all my apologies for my "lourdeur" (heavy-handedness) :)

Effectively, GPU -> RGB -> YUV is "exactly" the same thing as GPU -> RGB (-> YUV).
For example, with an SVGA output, we don't have to make that last YUV conversion.

No, my problem isn't really the last RGB->YUV conversion (which is done in hardware), it's about the input into the GPU

=> why do we have to do this with a shader that we write ourselves, ending up with a big program, when the graphics card already has all the circuitry to do it "transparently", and we would only have to replace GL_RGB with GL_YUYV in 2 or 3 funcs of an existing program to upgrade it to directly handle TV/DVD/MPEGx sources without big modifications ???

For example, when I use glVertex3f(x,y,z), I don't have to understand exactly which transistor on which part of the GPU handles it ... the API normally handles that for me ... Requiring a shader for this is like having to write a vertex shader just to display a triangle ...

With shaders, I have the impression that all the good things OpenGL won in its first years, and for a long time after, are being deprecated very, very fast ... to be replaced by something that isn't really standardised and is much harder to handle?

The vast majority of OpenGL tutorials don't say one line about shaders ... the majority talk about standard texture-handling techniques that use only glTexImage2D, and nothing about shaders.

So my remark is very simple: why make something difficult (i.e. I have to do all the conversion myself in a shader) when it could be made very simple (with an API that handles it for me) ???



@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 05:45 PM
But OK, this can be done in an external API ... though without the hardware acceleration ...


@+
Yannoo

Alfonse Reinheart
11-17-2009, 05:47 PM
why do we have to do this with a shader that we write ourselves, ending up with a big program, when the graphics card already has all the circuitry to do it "transparently", and we would only have to replace GL_RGB with GL_YUYV in 2 or 3 funcs of an existing program to upgrade it to directly handle TV/DVD/MPEGx sources without big modifications ???

1: Because hardware does not necessarily have this. I'd bet that some driver makers are doing a lot of this stuff internally in optimized shaders.

2: Because it's a waste of their time to do something that we can do easily enough.


The vast majority of OpenGL tutorials don't say one line about shaders ... the majority talk about standard texture-handling techniques that use only glTexImage2D, and nothing about shaders.

That's because the majority of OpenGL's tutorials are very old. That's a failing of OpenGL's tutorials, not of graphics cards.

Yann LE PETITCORPS
11-17-2009, 05:58 PM
And wouldn't this also be a good way to easily divide the amount of data transferred over the AGP/PCI/VLB bus to/from the GPU ???

@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 06:02 PM
Personally, I don't know many people who know the YUV to RGB conversion formula ...

But I know a lot of people who know how to display an RGB image via OpenGL :)



@+
Yannoo

yooyo
11-17-2009, 06:05 PM
YUV formats usually contain macropixels. Depending on the horizontal and vertical sampling, one macropixel covers one or more RGB pixels. There are also packed and planar YUV formats. Packed formats have interleaved Y, U and V. Planar formats have all the Y values, then all U, then all V; or all Y, then left half U, right half V ... too many variations.

It would be really nice to have such a conversion, but because the same functionality is achievable on current hw and APIs, I doubt that your request will reach driver developers.

Oh ... I almost forgot ... any chance to get glEnable(GL_SHADOWS) and glDisable(GL_SHADOWS) in OpenGL? :)

Yann LE PETITCORPS
11-17-2009, 06:10 PM
Perhaps in the future, but only when OpenGL can handle triangles the way raytracing or radiosity handle them :)

Object instancing is something that can help a lot with this ...


@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 06:17 PM
Various (S)VGA modes are also packed or planar ... interleaved or not ... and this doesn't seem to have been a problem when I look at the evolution from the 4-color CGA of the first PCs to the HD modes we have now ... and I'm only speaking of the last two decades :)

@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 06:27 PM
yooyo, a macroblock is "only" something like an 8x8 block of pixels (cf. something we would call a tile); just as a video contains more than one image, an image contains more than one tile/macroblock ... where each block can be treated more or less independently ...

@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 06:37 PM
"Only" 4:1:1, 4:2:0, 4:2:2 and 4:4:4 format have to be handle ... and they can easily to be converted from one to another.

So the more hard task is to handle the packet/planar organisation, and I don't think that is too hard task.


@+
Yannoo

Dark Photon
11-17-2009, 06:41 PM
Oh.. I almost forgot.. any chance to get glEnable(GL_SHADOWS) and glDisable(GL_SHADOWS) in opengl? :)
Damn, that's brilliant! We need to get you on the ARB! :D And here all this time I've been doin' it the hard way... :p

Brolingstanz
11-17-2009, 06:43 PM
Ten quatloos APPLE_rgb_422 is adopted, said the first.
Twenty quatloos it is adopted and deprecated, said the second.
Fifty quatloos the majority moves to abstain, said the third.
One hundred quatloos the motion is set aside, said the first.
Two hundred quatloos the motion is referred to a committee, said the second.
Five hundred quatloos all motions are postponed indefinitely, said the third.

Yann LE PETITCORPS
11-17-2009, 06:45 PM
And the YUV formats that get handled could be deliberately limited to only the 2 or 3 most common cases ... a little support is far better than nothing ...


@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 07:01 PM
OpenGL's rasterisation rules are mandatory for handling things such as orthogonality across multiple passes ... or getting the same results on various implementations.

Now, pixel shaders can do something like what raytracing does (that's the idea: run a "complex func" on each pixel), so perhaps one day we can avoid multiple passes by using shaders ...

So perhaps one day OpenGL can relax the rasterisation rules and begin to handle more advanced techniques such as raytracing (and object instancing is a really good thing for this evolution) ... but OK, this is a dream :)


@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 07:14 PM
For information, I began with a PC 1512 with "extended CGA" 640x200 in 16 colors, 8 MHz, 512 KB of RAM, one 5 1/4 floppy drive and no hard disk.

Now I have computers with GBs of RAM and terabytes of disk, with processors at more than 2 GHz, handling more than a million pixels.

=> the multiplication is about a factor of x1000
==> in terms of time, it's as if hours in 1980 have become seconds in 2009 in computing power ... :)

Something like a millisecond isn't really a short time for computers any more; their internal working time is now more like a nanosecond or less ...


@+
Yannoo

Yann LE PETITCORPS
11-17-2009, 07:30 PM
I was never able to make a true 3D program run as fast as I wanted on the PC 1512 ...

For only a quarter of its price, I now have an EEEPC 701 that is much more compact, has a lot more RAM, disk and CPU power, and can easily do some basic 3D / video processing in real time ...


@+
Yannoo

Yann LE PETITCORPS
11-21-2009, 07:29 AM
I can now handle multiple video-textured quads and mix them in real time with OpenGL, with very little CPU usage, on my EEEPC :)

This can effectively be done without any modification to OpenGL, only with new functions that "use and upgrade" the various standard OpenGL gl*Tex* funcs to add a new "pseudo-mipmap autogeneration with GL_YUYV support" :)

When I see the attitude of many people in this thread who don't want to see anything upgraded in OpenGL, I prefer to handle this personally and keep a somewhat more detached view of things ...

=> when I really want something, I can generally do it
(it's only a question of time and a little work ...)

==> but when we really don't want something, of course we can't ...


@+
Yannoo

Yann LE PETITCORPS
12-29-2009, 03:58 PM
I have found on the net a nice fragment shader that does the YUV to RGB conversion (thanks to Peter Bengtsson).

I modified it a little so it can directly use the 4:2:2 YUV images produced by various v4l2 and/or ffmpeg sources, with the use of only one texture unit.

Now I can map video textures coming from a video file such as .avi or .mp4 onto various 3D objects (and so rotate/resize/compose multiple video streams onto each face of "a lot" of animated, spinning cubes) => my dream of a very long time is now a reality :)

This works (very) well on Mac OS and Linux, and I'm currently working on porting this feature to the eeepc world.

For the pleasure, here is the fragment shader:

uniform sampler2D tex;

void main(void)
{
    float nx, ny, r, g, b, y, u, v;
    float u1, u2, v1, v2;

    nx = gl_TexCoord[0].x;
    ny = gl_TexCoord[0].y;

    y  = texture2D(tex, vec2( nx,         ny*(4.0/6.0) )).r;
    u1 = texture2D(tex, vec2( nx/2.0,     (ny+4.0)/6.0 )).r;
    u2 = texture2D(tex, vec2( nx/2.0+0.5, (ny+4.0)/6.0 )).r;
    v1 = texture2D(tex, vec2( nx/2.0,     (ny+5.0)/6.0 )).r;
    v2 = texture2D(tex, vec2( nx/2.0+0.5, (ny+5.0)/6.0 )).r;

    y = 1.1643 * (y - 0.0625);
    u = (u1+u2)/2.0 - 0.5;
    v = (v1+v2)/2.0 - 0.5;

    r = y + 1.5958 * v;
    g = y - 0.39173 * u - 0.8129 * v;
    b = y + 2.017 * u;

    gl_FragColor = vec4(b, g, r, 1.0);
}

And the calls to create/bind/load/update the 4:2:2 YUV stream into a standard GL_LUMINANCE OpenGL texture:

glGenTextures(1, &texID); // generate the YUV texture handle
glBindTexture(GL_TEXTURE_2D, texID); // and use it
glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE); // the target must be GL_TEXTURE_ENV; note that GL_REPLACE is certainly not the best choice for video mixing ...
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); // linear filtering seems a good compromise between speed and quality
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR); // the same for magnification as for minification

glBindTexture(GL_TEXTURE_2D, texID); // update the YUV video texture
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, width, (height*3/2), 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, pictures_queue[pictures_read++]); // with a new frame/picture (generated by libavcodec and/or v4l for example)

The height*3/2 value is used because the U and V planes come immediately after the Y plane and each has dimensions of only width/2 by height/2, so the total of the YUV planes is only 1.5x the size of the width*height grey Y plane.
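For reference, a small sketch of where each plane starts inside such a buffer (standard planar YUV 4:2:0 layout; the plane pointer names are mine):

/* Planar YUV 4:2:0: Y plane, then U, then V, in one contiguous buffer. */
unsigned char *buf     = pictures_queue[pictures_read];
unsigned char *y_plane = buf;                                  /* width   * height   bytes */
unsigned char *u_plane = buf + width * height;                 /* width/2 * height/2 bytes */
unsigned char *v_plane = u_plane + (width / 2) * (height / 2); /* width/2 * height/2 bytes */
/* total = width * height * 3 / 2 bytes, hence the height*3/2 above */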

I am also working on using the u2 and v2 texels to handle things such as alpha, stencil and multiple audio channels inside this YUV texture in the very near future (to optimise/standardise the memory accesses used by the alpha, stencil and YUV planes/channels, and because I think it is necessary for handling "artistic interpolations" between multiple audio/video textures in the fragment shader without sacrificing a lot of texture/sound memory/handles), and I'm beginning to look at how to quickly/easily pack all this into a sort of compressed DXT texture.

=> I propose a new GL_LUMINANCE_CHROMINANCE_ALPHA_STENCIL_AUDIO_VIDEO_MIPMAP_42222_EXT token that can help to combine GL_LUMINANCE_ALPHA/GL_422_EXT with the ffmpeg/V4L2 and OpenAL/FluidSynth APIs :)





@+
Yannoo

yooyo
01-02-2010, 02:26 PM
A 4:2:2 image nicely fits into an RGBA or BGRA texture. Just create a w/2 x h RGBA texture and upload the raw YUV422 data as GL_BGRA. With this you get simpler fragment shader code:



uniform sampler2D yuyv_tex; // the w/2 x h RGBA texture holding the raw 4:2:2 data

vec4 mp = texture2D(yuyv_tex, gl_TexCoord[0].xy); // fetch one 422 macropixel, YUYV
// now ... mp.r-g-b-a is actually mp.y1-u-y2-v
// depending on the even/odd fragment x position, choose y1-u-v or y2-u-v

Yann LE PETITCORPS
01-05-2010, 01:42 PM
I'm very sorry, I made a mistake: it's the 4:2:0 format that this shader supports (cf. the U and V planes come just after the Y plane, and each is w/2 x h/2 the size of the Y plane).

But thanks for the tip, because this conversion effectively seems much simpler to me and can easily solve my problem on the eeepc platform, which doesn't seem to have shader support on Linux (because I only have to convert one YUYV "block of two texels" directly on the CPU to get two RGB(A) texels in the final texture); a sketch of that conversion follows below.

OK, the picture size in the 422 format is bigger than the "same" picture in the 420 format, but this seems very simple to do and spares me the need for shaders on the eeepc platform.
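A sketch of that CPU-side conversion, one YUYV macropixel in, two RGB pixels out (the function name is mine; the coefficients are the same BT.601-style ones used in the shader above):

/* Convert one packed YUYV macropixel into two RGB pixels. */
static unsigned char clamp8(float x)
{
    return x < 0.0f ? 0 : (x > 255.0f ? 255 : (unsigned char)x);
}

void yuyv_to_2rgb(const unsigned char yuyv[4], unsigned char rgb[6])
{
    float u = yuyv[1] - 128.0f, v = yuyv[3] - 128.0f;
    int i;
    for (i = 0; i < 2; i++) {                 /* i=0 uses Y0, i=1 uses Y1 */
        float y = 1.1643f * (yuyv[i * 2] - 16.0f);
        rgb[i*3 + 0] = clamp8(y + 1.5958f * v);
        rgb[i*3 + 1] = clamp8(y - 0.39173f * u - 0.8129f * v);
        rgb[i*3 + 2] = clamp8(y + 2.017f * u);
    }
}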

So this is really good news, but it doesn't solve my real problem, which is true hardware shader support on the eeepc platform :(


@+
Yannoo

yooyo
01-06-2010, 04:16 PM
I'm thinking of developing a library to handle all those YUV formats. The frontend will be simple: provide an input yuv image with the proper sampling information (422, 420, 411, etc.), define the color conversion matrix and run the conversion. The result is a nice rgb texture. I'm planning to support transfers using PBOs and 10-bit images.

Btw, why do you use an eeepc? This little machine has miserable Intel integrated graphics. Atm only netbooks with NVidia Ion can be used for serious 3D graphics.

Yann LE PETITCORPS
01-06-2010, 06:29 PM
Because a big PC, Mac or UN*X workstation cannot be used in a car, bus or metro; it isn't very portable, and in my country we don't have electrical outlets in the street :)

And I want this to be usable by everybody, so a very cheap eeepc is a good choice for me ... and I'm thinking of battery life too (I want this to run for more than one or two hours without external power, to work/play with it while travelling or in a desert country, for example).

And perhaps, too, the harder something is to do, the more I like it :)
For example, I'm thinking of adapting this to work on a very slow @café (a very cheap laptop at 199 euros in France) and on the PocketPC and iPhone platforms.

I'm also very interested in a small API (but fast and multi-platform; that's mainly why I particularly like the OpenGL API) that can do very fast conversion between various YUV formats and RGB(A) :)

And my problems with vertex/fragment shaders seem to be resolved on the eeepc platform, so I really think the eeepc can be much faster at 2D/3D animation with 3D hardware acceleration than a big computer without it ...

@+
Yannoo

Yann LE PETITCORPS
01-18-2010, 04:44 PM
The fragment shader approach works very nicely :)

I have built a video player around it that uses avcodec and OpenGL and can display AVI/MPEG/V4L video streams in real time on various 3D surfaces, with less than 10% of the CPU on my iMac.

This works fine with the YUV 4:2:0 planar frames that come from libavcodec or various V4L device streams, for example (cf. one width*height Y plane, followed by one (width/2)*(height/2) U plane and one (width/2)*(height/2) V plane).

I'm now looking for something to efficiently map/tunnel IPBBPBBPBB... GOPs (groups of pictures) into GPU texture units (cf. successive frames in an MPEG file become successive texture units on the GPU).

I currently read/decode each frame one by one, but I'm beginning to think about reading/decoding an entire GOP in one block (this is certainly much faster).

Something that lets the GPU load an entire GOP from a video stream into buffers it can directly reuse as input to GPU texture units, to permit a lot of wonderful inter-frame special effects such as fade-in and fade-out, and to support multiple video streams with the lowest CPU % possible (cf. the GPU does the most intensive computations)

I now have this:

- compressed MPEG/AVI or a raw video frame input from a webcam

- decompression of a frame in system memory

- load this frame into the texture unit with glTexImage

- make the YUV to RGB transformation at the final fragment shader level

And I want

- compressed MPEG/AVI/RAW input

- fast decompression of an entire GOP into system/video memory

- load this GOP (or a part of it) into texture units
(where pictures can be recompressed into DXT/S3TC/JPEG formats to limit the video memory used)

- the YUV to RGBA conversion still done at the fragment shader level
(but with one frame per texture unit, so I think this is certainly limited to I and P frames on common video cards)

I'm also looking at handling a more efficient GOP streaming path from libavcodec/v4l to OpenGL texture units, and at permitting very fast (but accurate ...) forward/backward playing and seeking in video streams with the help of the IPBBPBBPBB... information that we have in GOPs.


@+
Yannoo

yooyo
01-19-2010, 07:34 AM
Current APIs (OpenGL or D3D) still don't expose enough functionality to decode a frame on the GPU. GL and D3D are not designed for that; shaders would require a lot of extra features in order to decode a stream.
You can do this using CUDA, but that is limited to NVidia only. CUDA video is really fast: my NV 8800GT can decode and display 720p at 150-170 fps.

Also, you can get some GPU hw decoding functionality using DXVA (on Windows) or VDPAU (on Linux).
http://http.download.nvidia.com/XFree86/vdpau/doxygen/html/index.html
http://www.phoronix.com/scan.php?page=article&item=xorg_vdpau_vaapi&num=1

You can also use a hybrid solution: use PBOs. Create several PBO buffers (2-4), map them all and mark them as 'available'. The decoding thread picks one of the 'available' PBO buffers, decodes a frame directly into that buffer and marks it as 'ready'. The renderer thread loops looking for 'ready' buffers, unmaps them, uploads their content to a texture and renders. Later, the buffer can be mapped again and reused; a sketch of one such slot follows below.
If you provide enough PBO buffers you can stream and display multiple video streams at the same time.
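A minimal sketch of one PBO slot in that scheme (GL 2.1 / ARB_pixel_buffer_object; the names pbo, tex, frame_size and decode_frame_into are mine, and the context sharing needed to map in one thread and upload in another is left out):

GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, frame_size, NULL, GL_STREAM_DRAW);

/* Decoding thread: the slot is 'available', map it and fill it. */
void *ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
decode_frame_into(ptr);                 /* hypothetical decoder call */
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);  /* the slot is now 'ready'   */

/* Renderer thread: upload straight from the PBO (no CPU copy here);
   tex is an already-created GL_LUMINANCE texture as in the posts above. */
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height * 3 / 2,
                GL_LUMINANCE, GL_UNSIGNED_BYTE, (const GLvoid *)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); /* the slot is 'available' again */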

Now ... a month ago I mentioned that I could develop a conversion library. I have spent some time on it, designing the interface. So far it looks like:

// initialise Video4GL library
// initialise gl extensions and register builtin conversion classes
GLint v4glInit();

// delete all stuff
GLint v4glShutdown();

enum v4glOutputPixelformat
{
    V4GL_RGB,
    V4GL_RGBA,
};

// convertor stuff
GLuint v4glCreateConvertor(unsigned int in_fourcc, v4glOutputPixelformat out_format); // input fourcc, output pixelformat; returns a convertor handle
void v4glSetColorConversion(GLuint conv, GLfloat* matrix); // input: convertor and matrix; set the conversion matrix or use the default
void v4glSetColorConversionAlpha(GLuint conv, GLfloat alpha); // input: convertor and alpha channel ... only for RGBA output when the input doesn't have alpha; default alpha 1.0f
void v4glDeleteConvertor(GLuint conv);
void v4glProcessPendingConversions(GLuint conv);


// image stuff
GLuint v4glCreateImage(GLuint conv, GLuint width, GLuint height); // creates an image; returns an image handle
void v4glSetImageData(GLuint image, void* data); // fill the image data (from a decoded stream)
GLuint v4glGetOutputTexture(GLuint image); // returns the rgb texture id
void v4glDeleteImage(GLuint image); // delete the image


and a usage example


// initialise gl extensions and register builtin conversion classes
v4glInit();

// create convertor YUY2 to RGB
GLuint conv = v4glCreateConvertor(FOURCC(YUY2), V4GL_RGB);
// and two streams HD1080 and PAL
GLuint img1 = v4glCreateImage(conv, 1920, 1080);
GLuint img2 = v4glCreateImage(conv, 720, 576);

...
// fill YUY2 data. It will mark internal flag need_processing
v4glSetImageData(img1, pointer1);
v4glSetImageData(img2, pointer2);

// this call processes all pending conversions related to the specific convertor
v4glProcessPendingConversions(conv);

// now it is safe to use the RGB versions; these are OpenGL texture ids
GLuint rgbimg1 = v4glGetOutputTexture(img1);
GLuint rgbimg2 = v4glGetOutputTexture(img2);

// render using rgbimg1 and rgbimg2

...
// this call will delete and invalidate all images
v4glDeleteConvertor(conv);

// at exit shutdown library
v4glShutdown();


Each fourcc format requires a conversion class and specific shaders. Most fourcc formats can share shader code and sampling, so with careful planning it is possible to write a generic converter that handles most fourcc formats.

Yann LE PETITCORPS
01-19-2010, 12:38 PM
I have resolved a lot of the problems with multi-picture and GOP support :)
=> I now use a 3D texture whose slices are the pictures of the GOP
==> so I can now access all the pictures in a GOP, but with only one texture unit (cf. mixing between multiple video textures and something such as "temporal mip-mapping" in the fragment shader are possible)

I now have to see with the avcodec developers how to get an entire decompressed GOP into memory as fast as possible (for the moment, I decompress picture by picture and build a GOP after 8, 12 or 16 pictures are decompressed), and I'm also thinking of quickly adding S3TC or DXT compression to limit the size of the "decompressed" pictures in VRAM (and/or looking at how to share the data efficiently between video card and system memory, or how to work with very fast VLB/AGP/PCI memory transfers)

yooyo, your v4gl API seems good; can I have access to it?

(It could certainly be interesting to include this work in the v4l_convert* functions provided by the V4L library, for example)

Where can I find a good tutorial that explains precisely how to use pictures decompressed with VDPAU (I only find things that seem to use the VDPAU API directly for display, not how to get pictures out of the API)?

ZbuffeR
01-19-2010, 01:14 PM
one texture unit (cf. mixing between multiple video textures in the fragment shader is possible)
A shader CAN access multiple textures, 32 on my gtx275 card, even if max multitexture is only 8.

Yann LE PETITCORPS
01-19-2010, 01:31 PM
Yes, that's true.

But with GOPs of more than 16 pictures, I cannot have 2 GOPs loaded at once if I use one texture unit per picture.

And I prefer not to lose the layered power of multiple texture units ... so that I can handle a mix/fade-in/fade-out/overlay between 32 video streams, for example :)

Plus, having only one texture unit per video stream seems to me more natural and simple, as user-friendly as possible (and it would certainly let a hardware company implement/map this very easily).


@+
Yannoo

ZbuffeR
01-19-2010, 01:42 PM
I see now, and your post is clearer.

Yann LE PETITCORPS
01-19-2010, 02:07 PM
==> so I can now access all the pictures in a GOP, but with only one texture unit (cf. mixing between multiple video textures and something such as "temporal mip-mapping" in the fragment shader are possible)

Is this clearer?

I sometimes complete/correct my old posts to be "more precise".

This is never to delete something, only to correct it or attempt a better translation.

Yann LE PETITCORPS
01-19-2010, 02:34 PM
Each GOP is transformed into a 3D texture composed of slices of width*height consecutive pictures.

=> we can map the frame at time timestamp onto a 2D quad from (x0,y0) to (x1,y1) at depth z using:

glBegin(GL_QUADS);
glTexCoord3f(0,0,timestamp); glVertex3f(x0,y0,z);
glTexCoord3f(1,0,timestamp); glVertex3f(x1,y0,z);
glTexCoord3f(1,1,timestamp); glVertex3f(x1,y1,z);
glTexCoord3f(0,1,timestamp); glVertex3f(x0,y1,z);
glEnd();

where the s,t positions in the glTexCoord3f(s,t,timestamp) calls and the (x0,y0) and (x1,y1) positions can be adapted to handle the screen format (4/3, 16/9, etc ...) and an automatic zoom in/out,

and timestamp is used to index the slice in the 3D texture (which is an array of 2D pictures forming a video)

=> we only have to increment the timestamp variable (and/or z) in this short code to display consecutive frames (and/or slices) from the GOP in 3D

With this vertex shader:

void main()
{
    gl_FrontColor = gl_Color;
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}

and this YUV 4:2:0 fragment shader:

uniform sampler2D tex;

void main(void)
{
    float nx, ny, r, g, b, y, u, v;
    float u1, u2, v1, v2;

    nx = gl_TexCoord[0].x;
    ny = gl_TexCoord[0].y;

    y  = texture2D(tex, vec2( nx,         ny*(4.0/6.0) )).r;
    u1 = texture2D(tex, vec2( nx/2.0,     (ny+4.0)/6.0 )).r;
    u2 = texture2D(tex, vec2( nx/2.0+0.5, (ny+4.0)/6.0 )).r;
    v1 = texture2D(tex, vec2( nx/2.0,     (ny+5.0)/6.0 )).r;
    v2 = texture2D(tex, vec2( nx/2.0+0.5, (ny+5.0)/6.0 )).r;

    y = 1.1643 * (y - 0.0625);
    u = (u1+u2)/2.0 - 0.5;
    v = (v1+v2)/2.0 - 0.5;

    r = y + 1.5958 * v;
    g = y - 0.39173 * u - 0.8129 * v;
    b = y + 2.017 * u;

    gl_FragColor = vec4(b, g, r, 1.0);
}

I can map videos onto the surfaces of a lot of various animated 3D objects.

This version of the shaders doesn't use the 3D texture approach; it only maps one YUV420 picture into OpenGL's RGBA color domain and displays it (I'm working on the 3D texture version and don't have something that works well at this instant, but it's only a question of hours or days; a sketch of where it is heading follows below).
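For the 3D-texture version, the lookup itself is a small change: a sketch, assuming the GOP is uploaded as a GL_LUMINANCE 3D texture with the same plane layout in every slice (only the sampler and the slice coordinate differ from the 2D shader above):

uniform sampler3D gop_tex;  // one GOP; each slice stores one YUV 4:2:0 picture

void main(void)
{
    float nx = gl_TexCoord[0].x;
    float ny = gl_TexCoord[0].y;
    float ts = gl_TexCoord[0].z;  // the timestamp coordinate selects the slice

    float y  = texture3D(gop_tex, vec3( nx,         ny*(4.0/6.0), ts )).r;
    float u1 = texture3D(gop_tex, vec3( nx/2.0,     (ny+4.0)/6.0, ts )).r;
    float u2 = texture3D(gop_tex, vec3( nx/2.0+0.5, (ny+4.0)/6.0, ts )).r;
    float v1 = texture3D(gop_tex, vec3( nx/2.0,     (ny+5.0)/6.0, ts )).r;
    float v2 = texture3D(gop_tex, vec3( nx/2.0+0.5, (ny+5.0)/6.0, ts )).r;

    y = 1.1643 * (y - 0.0625);
    float u = (u1+u2)/2.0 - 0.5;
    float v = (v1+v2)/2.0 - 0.5;

    float r = y + 1.5958 * v;
    float g = y - 0.39173 * u - 0.8129 * v;
    float b = y + 2.017 * u;

    gl_FragColor = vec4(b, g, r, 1.0);  // same channel order as the 2D shader
}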

The OpenGL output is really very impressive, but the video input seems "relatively slow" to me, so I want something to boost it


On another side, GOPs can also be "simulated" with the use of glGenTextures ...



ZbuffeR, how do you make quotes?



@+
Yannoo

yooyo
01-19-2010, 04:08 PM
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=faq

and open "What UBBCode can I use in my posts?"

Regarding v4gl ... it's just a draft. I still need to work on it to be sure the interface is good enough.

Yann LE PETITCORPS
01-19-2010, 04:25 PM
Thanks yooyo :)

I have various problems with incompatibilities between V4L(2) API and hardware versions :(

But it's exactly the same thing with ffmpeg/avcodec ... :(

They aren't mature APIs, but they are really very nice ...
(just too complicated, I find, to use across different OSes/computers)

But they are here, and that's a really good thing for all the world ... :)


@+
Yannoo

Yann LE PETITCORPS
01-19-2010, 04:39 PM
To keep it simple, I'd also want something like:

GLuint glTexGenVideoExt(int *numframes, void *GOP);

that returns a texture index to numframes consecutive frames, decompressed from the IPBB packets into numframes consecutive texture handles,

and a glBindVideoExt(int video_texture) that binds the current frame of the video_texture array and increments an internal counter for the next glBindVideoExt call.

On another side, glGenTextures/glDeleteTextures/glBindTexture already seem to do a very good job with texture memory use and allocation

=> so this could be much simpler than I think :)

And on reflection, I think the work lies more on the side of the avcodec/ffmpeg/v4l audio/video and frame decompression techniques/optimisations than on the more generic OpenGL side, where texture accesses are really very, very fast.

But I really think the various MPEG1,2,4 hardware decompression engines in recent video cards could easily handle decompressing a GOP into successive OpenGL textures, no?
Yannoo

Yann LE PETITCORPS
01-19-2010, 06:22 PM
yooyo, what do you think about adding YUYV and YCbCr to this little API?

enum v4glOutputPixelformat
{
    V4GL_RGB,
    V4GL_RGBA,
    V4GL_YUYV,
    V4GL_YCbCr
};

+ 4:2:0 format support
(this is already done in the fragment shader, but it can be extended/discarded for specific shaders)

But what about time support for the video texturing?

@+
Yannoo

yooyo
01-19-2010, 06:54 PM
v4glCreateConvertor have two params.
First param is input fourcc and second param is what texture format you prefer for later usage. Fourcc description define pixelformat and sampling which is enough to build proper shader. API should be smart enough to choose proper rgb/rgba texture to handle 8bit or 10bit formats.

What you actually want.. to do direct conversion from one fourcc to another fourcc, or input fourcc to rgb then to output fourcc?

I have another thing to resolve... which OpenGL version to use? If someone wants to use my lib in pure 3.2 context then I cant use glBegin/glnd calls and I have to switch to vertex pointers and glDrawElements/Arrays calls.
Also GLSL shader syntax is changed between gl versions so I have to create shaders with respect of choosen version rules.
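For reference, a minimal sketch of the glBegin-free path for the screen-aligned quad (GL 3.2 core style; the attribute locations are my choice, and shader/program setup is omitted):

/* Fullscreen quad as a triangle fan; interleaved position + texcoord. */
static const GLfloat quad[] = {
 /*  x,     y,     s,    t  */
    -1.0f, -1.0f,  0.0f, 0.0f,
     1.0f, -1.0f,  1.0f, 0.0f,
     1.0f,  1.0f,  1.0f, 1.0f,
    -1.0f,  1.0f,  0.0f, 1.0f,
};
GLuint vao, vbo;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(quad), quad, GL_STATIC_DRAW);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(GLfloat), (void *)0);                      /* position */
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(GLfloat), (void *)(2 * sizeof(GLfloat))); /* texcoord */
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glDrawArrays(GL_TRIANGLE_FAN, 0, 4);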

Yann LE PETITCORPS
01-19-2010, 07:00 PM
I want to take a fourcc codec as input and output YCbCr 4:2:0, DXT, S3TC or MJPEG compressed frame textures.

Cf. use various avcodec/v4l/raw video data as input and produce OpenGL compressed 2D or 3D textures on the output.

This can be seen as a pipe that decompresses/recompresses a video stream into one OpenGL texture unit "on the fly".

So we can combine texture units in the GPU and handle a lot of special effects on the OpenGL window output from multiple video streams/files on the input.
Yannoo

yooyo
01-19-2010, 07:49 PM
Direct fourcc to fourcc (like YUY2 to NV12) conversion using OpenGL is pointless; better to use OpenCL for that.
Conversion from fourcc to RGB/RGBA and back from RGB/RGBA to fourcc has more uses. For example, you may want to output 3D graphics to an external video renderer (like a Matrox or Black Magic professional video card) using the YUY2 or YUYV pixel format.

At this moment, the API and GPU cannot handle MJPEG compression using shaders. OpenGL is a graphics rendering API ... it is not an image compression API. But MJPEG might be possible using CUDA or OpenCL. OpenGL and OpenCL/CUDA have interoperability features (to share textures and buffers between OpenGL and CUDA/OpenCL).

Yann LE PETITCORPS
01-21-2010, 03:46 PM
But I already have a really fast YUV 4:2:0 to RGBA 4:4:4:4 conversion with my fragment shader, and I find it already very short and trivial to use :)

So making the inverse, converting an RGB picture to a YUV picture, doesn't seem too hard a task to me ...

On another side, a library that can make the conversion also seems to me a very good thing for being as generic as possible (because all the successive optimisations in the library are directly shared by all the programs that use it).

I'll test tomorrow whether OpenCL can be handled on the EEEPC platform
(CUDA certainly won't work with an Intel video chipset, though I may be mistaken in thinking that).

On another side, I'm looking at something that could do, in real time and on the EEEPC platform, the compression/decompression of something like DXT1 but with the use of 8-bit intensity (on the Y plane) and chroma (on the Cb and Cr planes) instead of two 16-bit RGB colors in a packed format (and note that the planar YUV 4:2:0 format already halves the size of the packed RGB24 4:4:4 format)
=> the compression factor is about 800%, and the DXT1 limitation on the number of colors used in a block is very much reduced with a planar YCbCr format ... with 64 YUV "intensities" per block (the Y, Cb and Cr planes are planar, where the fragment shader's RGBA output is packed)
==> and presto, I can already see video Groups Of Pictures stored directly in a sort of "temporally and spatially compressed texture format" that is smaller than a single picture in the RGB24 format
==> I propose a new GL_COMPRESSED_YUV420_GOP_YLP texture format token for that :)

Many thanks for your help yooyo

PS : I don't like the 3.x versions because I think they create more incompatibilities in the end than anything else ...
(and I'm afraid to see that OpenGL has begun to become something like the DirectX/Direct3D interface ... where the major part of the code is there to handle the various implementations, not the algorithm ...)

PS2 : and I don't understand why glBegin/glColor,glTexCoord,glNormal,glVertex/glEnd can't trivially be emulated in the hardware driver via vertex arrays in the latest OpenGL versions
(because it's very easy to do ... via #define glVertex3f(x,y,z) vertexarray_tab[vertexarray_size++]={x,y,z} and the like, for example)
=> it's like saying we no longer have the right to walk because we can now drive or fly



@+
Yannoo

Gedolo
02-19-2010, 08:59 AM
Listen to your words: RENDERING. That implies making textures visible. Having support for color models is core stuff. This isn't about high-level API features; just texture format support, which is part of OpenGL as a rendering API.

Actually it would be nice if you could go even lower: being able to define a texture format and its conversions to and from RGB. Wasn't OpenGL about having the building blocks, thus enabling the maximum number of possibilities? I thought it was.

Yann LE PETITCORPS
03-14-2010, 03:50 PM
Yes Gedolo, building-block support at a lower level (cf. at the texture format level) is exactly what I'm thinking of.

Block building already seems to be used internally in the hardware (cf. JPEG/MPEG/DXT pictures are already stored/used in a block manner), so I think this type of texture format could "easily" be exposed in the OpenGL API if the block structure were limited to 4x4, 8x8 or 16x16 block sizes, for example.

On the other side, the same task can certainly be done in a fragment shader with texture support of 64, 128 or more bits per pixel ...
(for example, DXT compression/decompression only uses 64 or 128 bits to handle a 4x4 block of texels)

The YUV (or YCbCr) <-> RGB conversion isn't really very difficult to handle; I only find that it spends some precious lines in fragment shaders that are already limited in size :)
(about 50% of my fragment shader is there "only" to handle the YCbCr to RGB conversion => with a hypothetical GL_YCbCr420 texture format this shader could become as small as only 5 or 6 lines ... where the majority of the lines are only there to handle in/out parameters ;) )
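To illustrate, with such a hypothetical GL_YCbCr420 internal format (which does not exist; this is only a sketch of the wish), the driver would hand RGB to the shader and the whole fragment shader could shrink to:

uniform sampler2D video_tex;  // hypothetical: the driver converts YCbCr 4:2:0 to RGB itself

void main(void)
{
    gl_FragColor = texture2D(video_tex, gl_TexCoord[0].xy);
}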


@+
Yannoo