View Full Version : Texture Compression

09-08-2004, 12:55 PM
Anyone have any clue as to why my call to

glCompressedTexImage2DARB(target, 0, format,
pTex->width, pTex->height, 0, pTex->size, pTex->pixels);

might be failing? I know the border has to be set to 0, and it is... but I'm getting an error return of GL_INVALID_OPERATION.

I'm trying to compress the image using
GL_COMPRESSED_RGBA_S3TC_DXT5_EXT if that helps or means anything to anyone.

the target is GL_TEXTURE_2D.

09-08-2004, 01:26 PM
Here's what I do.
One thing to watch out for is mipmaps, because the smallest block is 4x4 pixels (not 1x1).

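/* DXT5 stores 16 bytes per 4x4 block, i.e. 1 byte per texel */
int size = (w * h);
if ( bt.compression_used )
    if ( size<16 ) size=16;   /* levels smaller than 4x4 still occupy one whole block */
fread( (pixels+total_size), sizeof( GLubyte ), size, file );
glCompressedTexImage2DARB( GL_TEXTURE_2D, i, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT, w, h, 0, size, (pixels+total_size) );
total_size += size;   /* offset of the next mip level in the buffer */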

09-08-2004, 02:27 PM
To clarify, you still have 2x2 and 1x1 mipmaps, but they are one block large, so it still contains 4x4 pixels, but only the 2x2 or 1x1 upper left pixels are used. Also make sure the size parameter is correct.
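
For what it's worth, the size rule described above can be written down as a small helper. This is only a sketch: the function names s3tc_level_size and s3tc_chain_size are made up for illustration, and block_bytes is assumed to be 8 for DXT1 and 16 for DXT3/DXT5.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one mip level of an S3TC image.
   block_bytes: 8 for DXT1, 16 for DXT3/DXT5 (assumed by the caller). */
size_t s3tc_level_size(int width, int height, size_t block_bytes)
{
    assert(width > 0 && height > 0);
    size_t blocks_wide = ((size_t)width + 3) / 4;   /* round up to whole 4x4 blocks */
    size_t blocks_high = ((size_t)height + 3) / 4;
    return blocks_wide * blocks_high * block_bytes;
}

/* Total bytes for a full mip chain, from the base level down to 1x1. */
size_t s3tc_chain_size(int width, int height, size_t block_bytes)
{
    size_t total = 0;
    for (;;) {
        total += s3tc_level_size(width, height, block_bytes);
        if (width == 1 && height == 1)
            break;
        if (width > 1)  width /= 2;
        if (height > 1) height /= 2;
    }
    return total;
}
```

The value returned for a level is what would be passed as the imageSize argument of glCompressedTexImage2DARB for that level; a 2x2 or 1x1 level still comes out as one full block.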

09-08-2004, 07:32 PM
Are you trying to upload pre-compressed data, and it fails? Then one of your parameters is not right.

Or are you trying to make the driver compress data that you have, which is un-compressed? If so, then you should use TexImage with the _COMPRESSED_ internal format and NULL data, then use TexSubImage() with RGB external format, and the driver will compress for you (although it'll be slow and poor image quality).

09-09-2004, 06:08 AM
The image data is uncompressed; so, I've got uncompressed image data in system memory -
pTex->pixels - and I'm trying to upload it to the board compressed. The call to glCompressed... should compress the image data on upload should it not?

sorry, I've not worked with compressed textures before...It may be very well possible I have no idea what I'm talking about.

EDIT: jwatte -
I tried what you suggested, here's the call

glTexImage2D(GL_TEXTURE_2D,0,GL_COMPRESSED_RGBA,512,512,0,
GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,GL_UNSIGNED_BYTE, NULL);

Before the call I bind the texture that I'm trying to compress...
anyway, this call returns GL_INVALID_ENUM.
???, any ideas?

09-09-2004, 07:21 AM
glCompressedTexImage2D is for uploading precompressed texture data. If you want the GL to compress it, use glTexImage2D with the internalFormat set to a supported compressed format. Make sure to use a format supported by the driver:

glGetIntegerv(GL_NUM_COMPRESSED_TEXTURE_FORMATS, &formatCount);
glGetIntegerv(GL_COMPRESSED_TEXTURE_FORMATS, formatArray);

I get really tired of seeing apps that assume if GL_ARB_texture_compression is supported that they can use GL_COMPRESSED_RGBA_S3TC_DXT1_EXT. :( It is completely valid for a driver to advertise GL_ARB_texture_compression (or GL_VERSION >= 1.3) but not support any compressed formats. The spec was written to specifically allow that.

If you don't want to bother determining which compressed formats are supported, you can use GL_COMPRESSED_RGBA (or one of the other generic compressed formats) and the driver will pick one for you. You can query the texture to find out what the actual format is. This is useful if you're going to readback the compressed texture with glGetCompressedTexImage.

09-09-2004, 07:28 AM
Okay, don't take this the wrong way, but this is basic OpenGL. The internalFormat (the 3rd parameter to glTexImage2D) is the format you want the texture to be on the card. The format / type (the 7th and 8th parameters) describe the format of the texture data you are passing in. You want the data to be GL_COMPRESSED_RGBA_S3TC_DXT5_EXT, but you have it as an array of unsigned bytes representing RGBA texels.

glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, ptr_to_uncompressed_texture);

09-09-2004, 07:53 AM
idr, thanks.
that helped.

01-20-2010, 07:40 AM

I'd like to use glCopyTexSubImage2D to copy data to a texture. I've a pool of >50 textures and wonder if it's possible to let the GPU compress the data and then write it to the texture? (to save VRAM)

Can I instantiate my textures like this:

and use something like that:
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, m_poolTexWidth, m_poolTexHeight);

to get a compressed texture? Will the GPU compress it, or is it passed to the driver/CPU.

Is there a better way to have a huge amount of textures, but save VRAM?

Alfonse Reinheart
01-20-2010, 10:38 AM
Really? Was there a need to unearth a thread 6 years old? You couldn't just create a new thread?

Will the GPU compress it, or is it passed to the driver/CPU.

Implementation dependent, but I'm guessing that it'll be done on the CPU.

Is there a better way to have a huge amount of textures, but save VRAM?

I'm not sure what you mean. CopyTexSubImage copies pixel data from the framebuffer to the given texture. Both the framebuffer and the texture are in VRAM, so I don't know how this is saving you anything.

01-21-2010, 06:33 PM
It's true that I had never used GL_COMPRESSED_RGBA_S3TC_DXT5_EXT before reading this thread ... :)

Like a lot of other things ... :(

Does something like this exist?


Something that can easily handle a 3D texture made of 2D IPBBPBB... compressed picture slices (where the Z dimension is time) and that allows compressed data to be shared directly between RAM and VRAM?

What is the "best but standardised internal format" for video display and for sharing between the GPU and the CPU ???


Alfonse Reinheart
01-21-2010, 07:30 PM
Does something like this exist?



Something that can easily handle a 3D texture made of 2D IPBBPBB... compressed picture slices (where the Z dimension is time) and that allows compressed data to be shared directly between RAM and VRAM?


S3TC, and all the variations thereof, are formats designed for a specific purpose: fast texture access. The decompression algorithm is both braindead-simple and very localized. You can easily decompress any 4x4 block of the texture, and doing so only requires exactly 64 or 128 bits. It requires only some table accesses and integer math to decompress the images. It is regular, fast, and easy to implement in hardware.

JPEG is not. It, like MPEG and such, is designed for a different purpose. You cannot easily decompress a section of a JPEG image; you pretty much have to do the whole thing. JPEG requires high-end math to decompress.

Of course, JPEG is a better image compression format in terms of overall quality. But hardware for doing texture accesses from JPEG compressed images would be very complex, expensive, and slow. The formats used for compressed textures are those that are designed to be implemented in hardware, not things designed for the convenience of the user.
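
To make the "table accesses and integer math" point concrete, here is a rough sketch of decoding a single 64-bit DXT1 block on the CPU. The names Rgba8, rgb565_to_rgba8 and dxt1_decode_block are invented for this example, and the rounding of the interpolated colors is an assumption; real hardware and drivers may round slightly differently.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } Rgba8;

/* Expand a packed 5:6:5 color to 8 bits per channel by bit replication. */
Rgba8 rgb565_to_rgba8(uint16_t c)
{
    unsigned r5 = (c >> 11) & 0x1Fu;
    unsigned g6 = (c >> 5) & 0x3Fu;
    unsigned b5 = c & 0x1Fu;
    Rgba8 out;
    out.r = (uint8_t)((r5 << 3) | (r5 >> 2));
    out.g = (uint8_t)((g6 << 2) | (g6 >> 4));
    out.b = (uint8_t)((b5 << 3) | (b5 >> 2));
    out.a = 255;
    return out;
}

/* Per-channel weighted mix (wa*a + wb*b) / (wa + wb), opaque result. */
static Rgba8 mix(Rgba8 a, Rgba8 b, unsigned wa, unsigned wb)
{
    unsigned d = wa + wb;
    Rgba8 out;
    out.r = (uint8_t)((wa * a.r + wb * b.r) / d);
    out.g = (uint8_t)((wa * a.g + wb * b.g) / d);
    out.b = (uint8_t)((wa * a.b + wb * b.b) / d);
    out.a = 255;
    return out;
}

/* Decode one 8-byte DXT1 block into 16 texels in row-major order. */
void dxt1_decode_block(const uint8_t block[8], Rgba8 out[16])
{
    assert(block != 0 && out != 0);
    /* Two little-endian 5:6:5 endpoint colors. */
    uint16_t c0 = (uint16_t)(block[0] | (block[1] << 8));
    uint16_t c1 = (uint16_t)(block[2] | (block[3] << 8));
    Rgba8 pal[4];
    pal[0] = rgb565_to_rgba8(c0);
    pal[1] = rgb565_to_rgba8(c1);
    if (c0 > c1) {
        /* Four-color mode: two interpolated colors at 1/3 and 2/3. */
        pal[2] = mix(pal[0], pal[1], 2, 1);
        pal[3] = mix(pal[0], pal[1], 1, 2);
    } else {
        /* Three-color mode: midpoint plus transparent black. */
        pal[2] = mix(pal[0], pal[1], 1, 1);
        pal[3].r = pal[3].g = pal[3].b = pal[3].a = 0;
    }
    /* One byte of 2-bit palette indices per row of four texels. */
    for (int i = 0; i < 16; ++i) {
        unsigned idx = (block[4 + i / 4] >> (2 * (i % 4))) & 3u;
        out[i] = pal[idx];
    }
}
```

Everything a texel needs lives in its own 8 bytes, which is what makes random access into the texture cheap.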

Ilian Dinev
01-21-2010, 07:31 PM
Yann, sometimes it's really insightful to read through glext.h (the latest version). I.e, in this case you should Ctrl+F for "compressed" there ;).
The glext.h contains all extensions, thus shows the complete functionality that _could_ be available at the moment.

01-22-2010, 05:45 AM
First, thanks for your replies!!! :)

Really? Was there a need to unearth a thread 6 years old? You couldn't just create a new thread?
My policy is to:
a) search the forum for existing threads
b) if there are no search results, open a new thread

Is it considered bad manners to add something to an older thread? As long as the topic fits, IMHO it's better to group things which belong together into a single thread... anyhow... :whistle:

I understand that both, the framebuffer and the texture consume VRAM. The problem is, that the textures consume too much!

Background: I'm working on a video effect plug-in which gets called 50 times per second, and gets a handle to a texture (which is already on the GPU). Video frames are stored in the texture. My effect should delay the video, thus I copy the current texture to a buffer (in VRAM) and read-out a frame of the past to be processed now.
The question is how to maximize the number of frames in the buffer without copying them to RAM (as copying to RAM performs very badly).
My thought was to compress those textures in the buffer. Not sure whether that makes sense or not.

Maybe there is no solution to this specific problem?
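
For reference, the delay buffer described in this post amounts to a ring of texture handles. Below is a minimal sketch of just the index bookkeeping, with no GL calls; FrameRing, ring_push_slot, ring_delayed_slot and the POOL_SIZE of 50 are all assumptions made for illustration, not anything taken from the plug-in.

```c
#include <assert.h>

#define POOL_SIZE 50   /* assumed pool size, roughly one second at 50 fps */

typedef struct {
    unsigned int tex[POOL_SIZE];  /* texture names, created elsewhere */
    int head;                     /* slot that receives the next frame */
    int count;                    /* number of valid frames stored so far */
} FrameRing;

/* Returns the slot the current frame should be copied into and advances
   the ring; once the ring is full the oldest frame is overwritten. */
int ring_push_slot(FrameRing *r)
{
    assert(r != 0);
    int slot = r->head;
    r->head = (r->head + 1) % POOL_SIZE;
    if (r->count < POOL_SIZE)
        r->count++;
    return slot;
}

/* Returns the slot holding the frame pushed `delay` frames ago
   (0 = most recent), or -1 if that frame is not available. */
int ring_delayed_slot(const FrameRing *r, int delay)
{
    assert(r != 0);
    if (delay < 0 || delay >= r->count)
        return -1;
    return (r->head - 1 - delay + POOL_SIZE) % POOL_SIZE;
}
```

Each call of the effect would copy the incoming frame into tex[ring_push_slot(&ring)] and then sample tex[ring_delayed_slot(&ring, delay)]. The number of frames that fit is then simply the pool size, which is where compression would help.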

01-22-2010, 09:03 AM
3D graphics is a topic that is so rapidly changing, that bringing up such an old thread is usually seen as bad practice, because people unfamiliar with the thread will start reading at the beginning, wondering why it is full of outdated information, before they realize that the thread is very old.

On other forums about other things that might be handled differently, but in this case it is better to create a new thread, if the threads that you found are older than, say, 6 months.

IF you find a topic that is very old, but still comes very close to your problem, insert a link to it in your post, so that people see that there has been a discussion about it before, but it is clear that everything it contains might be very outdated.


Alfonse Reinheart
01-22-2010, 11:11 AM
The question is how to maximize the number of frames in the buffer without copying them to RAM (as copying to RAM performs very badly)

No, the question is why are they textures to begin with? If your intent is to process these frames in some way on the CPU, then they should just be RGB data stored in main memory. It's a waste of time to upload the image data after decompression, only to download it, modify it and then re-upload it.

If you're trying to use a shader to process the image, then uploading it to a texture is the right way to go. Otherwise, don't do it. Just leave it in main memory until you are ready to draw it.

01-24-2010, 08:15 AM

I'm running some tests on something like DXT1, but adapted to the YCbCr color space instead of the RGB color space (i.e. with 16-bit yuv844 colors instead of rgb565)
=> the compression is still 8:1, but the quality seems better/more realistic
(I haven't tested yuv655 or other possible 16-bit YUV formats yet, but I expect to find a better 16-bit YUV format soon)

I think this is mainly because the eye notices the grey gradient much more than the color shift when we interpolate between C0 and C1 to get C2 and C3.

The Y parts of the C0 and C1 colors are really easy to find
=> they are the minimum and maximum values of the current 4x4 block in the Y plane of the YUV picture generated by libavcodec or v4l, for example
(whether the format is 4:2:2, 4:2:0 or 4:2:1 doesn't matter for the Y plane, because the Y plane is always the same; only the Cb and Cr planes are subsampled differently between these formats)
==> with MMX instructions, this is really very fast
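
A plain C sketch of that min/max search, without any MMX, might look like the following. The function name luma_block_minmax and the plane layout (8-bit samples, stride bytes per row) are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Find the minimum and maximum luma in the 4x4 block whose top-left
   texel is at (bx*4, by*4). `stride` is the number of bytes per row
   of the Y plane. The caller must make sure the block lies inside
   the plane. */
void luma_block_minmax(const uint8_t *y_plane, int stride, int bx, int by,
                       uint8_t *y_min, uint8_t *y_max)
{
    assert(y_plane != 0 && y_min != 0 && y_max != 0 && stride >= 4);
    const uint8_t *p = y_plane + (by * 4) * stride + bx * 4;
    uint8_t lo = 255, hi = 0;
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            uint8_t v = p[row * stride + col];
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
    }
    *y_min = lo;
    *y_max = hi;
}
```

A SIMD version would do the same comparisons on whole rows at once; the scalar loop is only meant to show what is being computed.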

My current problem is finding the best CbCr line from which to interpolate the UV parts of C0 and C1 into C2 and C3
=> it's like interpolating a rainbow :(
==> but I think it may be possible by interpolating Cb and Cr alternately instead of trying to interpolate a rainbow :)
(and this doesn't add any bytes to this new DXT video texture format ...)

Once I've solved this problem for I frames (i.e. frames that don't depend on the next or previous frames) that are "YUV DXTed", I plan to start handling the P and B frame cases to "really" compress the video texture :)
(my goal is something between 12:1 and 25:1 compression with this "DXT-like video texture format", to really reduce the memory needed on netbooks/PDAs, which are very limited in memory)

Alfonse, I want to handle **a lot** of textures/frames in the same shader ... otherwise it's too easy, because I already have that :)
(i.e. two 3D textures built from successive 2D frames of AV1 and AV2, where the r dimension of the 3D textures is an index into time, i.e. a timestamp)
=> it's about 50 frames that I want to handle in this shader, so that it can directly mix two videos for about one second before I have to reload the AV1 and AV2 "frame texture packs" (i.e. GOPs generated by libavcodec or v4l) into the GPU
==> this gives a latency equal to the number of pictures in the Group Of Pictures, but multiplies the possibilities for streaming / compression / decompression / special effects in the shader ...
(with a GOP of 4, I find that it isn't too noticeable, but with a GOP of 8 or more it starts to be really noticeable ... and GOPs in video files are generally much longer than 8 :( )
===> from the OpenGL point of view, a GOP of numerous MPEG/JPEG/AVI/V4L video frames isn't really the same thing to handle as one or two small, independent RGB frames (but only for now, I think ...)
====> but the fact that this can give superior quality with much less RAM/VRAM and CPU usage gives me lots of good reasons to continue my research on it :) :)

Personally, I find the RGB color space very bad for handling video pictures ... the YUV/YCbCr color space is much better suited to video textures
(and with the luminance/chrominance embedded in it, we can easily adjust the video stream exactly the way we make color/intensity adjustments with the potentiometers on a TV ...)

And I want a memory location that I can modify with the CPU but that the GPU can access directly
(to bypass the CPU->GPU memory transfer if possible ... I think AGP memory or something like it is probably best suited to this)
=> but grouping multiple consecutive RAM => VRAM memory transfers into a single one also seems to me a very efficient way to cache this


Dark Photon
01-24-2010, 12:43 PM
What is the "best but standardised internal format" for video display and for sharing between the GPU and the CPU ???
Best depends on your application. Is your data SDR or HDR? Are you requiring use of GL or are other GPU APIs OK? What format are you coming from? What are your performance constraints? Is quality, memory, or speed more important?

If you're requiring use of GL, best for space/bandwidth is of course the GPU-supported compressed texture formats such as DXT1 and DXT5 (ringing in at a mere 0.5 and 1.0 byte/texel, respectively). You can store std RGB color space in these, or store alternate color spaces in DXT5 such as YCoCg for better quality.

However, if you're coming from already compressed MPEG or MPEG-like video (especially something like h.264) and aren't insisting on GL, you'll likely get much better perf using a library like NVidia's VDPAU (http://www.mythtv.org/wiki/VDPAU) or XvMC (http://www.mythtv.org/wiki/XvMC) to feed video to the GPU. MythTV (http://www.mythtv.org) for instance uses these for GPU-assisted video playback, when enabled and available.
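
As a side note on the YCoCg suggestion above, the color transform itself is tiny. Here is a sketch of the plain (non-reversible) floating-point form; the type and function names are invented for the example, and the channel bias/scaling needed to actually pack Y, Co and Cg into a DXT5 texture is deliberately left out.

```c
#include <assert.h>

typedef struct { double r, g, b; } Rgb;      /* channels in 0..255 */
typedef struct { double y, co, cg; } YCoCg;  /* co and cg are signed */

/* Forward transform: Y is a weighted average, Co and Cg are color
   differences centered on zero. */
YCoCg rgb_to_ycocg(Rgb c)
{
    assert(c.r >= 0.0 && c.r <= 255.0);
    assert(c.g >= 0.0 && c.g <= 255.0);
    assert(c.b >= 0.0 && c.b <= 255.0);
    YCoCg out;
    out.y  =  0.25 * c.r + 0.5 * c.g + 0.25 * c.b;
    out.co =  0.5  * c.r             - 0.5  * c.b;
    out.cg = -0.25 * c.r + 0.5 * c.g - 0.25 * c.b;
    return out;
}

/* Inverse transform: only additions and subtractions, which is why it
   is cheap to do per fragment. */
Rgb ycocg_to_rgb(YCoCg c)
{
    Rgb out;
    out.r = c.y + c.co - c.cg;
    out.g = c.y + c.cg;
    out.b = c.y - c.co - c.cg;
    return out;
}
```

In a shader the inverse would be the three lines of ycocg_to_rgb applied to the sampled texel after undoing whatever bias was used when storing Co and Cg.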

01-24-2010, 01:11 PM
Thanks Dark Photon :)

I have now seen the specs for VdpVideoSurfaceGetBitsYCbCr, VdpVideoSurfacePutBitsYCbCr and VdpOutputSurfacePutBitsYCbCr, and this seems to be about what I want :)

Where can I find a complete but simple and functional sample/tutorial that uses this in C/C++?

Because the pseudo-code looks cool, but I don't really know how to compile it with gcc or g++ :)
=> I want to read a video file frame by frame (the file format can be .avi, .mpg, .mov or /dev/video, for example) and output each image in a "compressed but standardised and user-friendly internal format" into a queue in memory (so that I can easily and quickly decompress one picture, modify it and rewrite it in a compressed format, all "on the fly").

One thread fills a frame queue as it reads a video file, and multiple other threads can read this frame queue (and/or mix multiple input queues and output the mix into another picture queue, or display it directly on numerous and various video-texture-mapped 3D OpenGL shapes).

For now, this is only for SD resolutions (CIF, QCIF and 4CIF) on 1 to 32 bpp surfaces (B&W to RGBA8, with the YCbCr format in between), but I also plan to support 9CIF/16CIF or HDR versions such as 1920x1080 or more, in multiple views and in float or double formats too :)

I want to display/stream, "not too slowly", something like four or five audio/video streams/files (or a lot more, like dozens, if possible) on a small device such as an eeepc netbook, an iPhone or a PDA ...

At the moment I can only handle two or three small video streams on my eeepc, and only with a lot of difficulty (I have to deliberately drop some frames to get something that works) and with CPU/RAM consumption that is really too high
(and my PDA doesn't seem to like it when I test displaying multiple video streams on it; this probably works on the iPhone, but I haven't found the time to work on that implementation :( )

On the other hand, I can already handle more than a dozen small video streams in parallel on various CoreDuo platforms (such as recent PCs, iMac or Mac Mini) with V4L(2) and/or libavcodec, so I find it's not that bad :)
(on the iMac platform, for example, I can already fill the HD screen with a lot of SD avi/mpeg/raw streams and resize/zoom/scroll/rotate/mix/... each video stream display independently in real time ... but I don't have /dev/video support for the webcam on MacOS, because this seems to be a "Linux only" feature)

And I dream of this working "very well and quickly with HD content" on a very small computer farm (with two or three CoreDuo platforms, for example), from the client/server and network point of view too :)


01-25-2010, 05:31 AM
If your intent is to process these frames in some way on the CPU, then they should just be RGB data stored in main memory. It's a waste of time to upload the image data after decompression, only to download it, modify it and then re-upload it.

If you're trying to use a shader to process the image, then uploading it to a texture is the right way to go. Otherwise, don't do it. Just leave it in main memory until you are ready to draw it.

I'm not responsible for uploading the texture to the GPU. It's done by the host application, my plug-in just gets a handle to a TEXTURE_2D and that's it. And you are right, the plug-in is mainly a shader, which for example blends the last 50 frames (50 textures). And no, I do not want to download the textures to RAM to process them on the CPU, not at all! - But I'm looking for a clever way to increase the maximum number of frames (textures) stored at the graphic card. And while I was looking for a solution I found a post about texture compression, but I have no idea if I can draw the "current" texture (the one I got the handle to) to the framebuffer, and then copy it to a compressed texture on the GPU.

@Jan: I got the point, and agree. I'll change my policy ;)

Alfonse Reinheart
01-25-2010, 10:42 AM
And while I was looking for a solution I found a post about texture compression, but I have no idea if I can draw the "current" texture (the one I got the handle to) to the framebuffer, and then copy it to a compressed texture on the GPU.

You can do that with OpenGL. But you'd be wrong to believe that it would all be handled on the GPU.

Compressing a texture with S3TC (or most other compression formats) is non-trivial. As far as I know, shaders don't exist that can do it. So when you tell OpenGL to copy from the framebuffer to a compressed texture, it will likely download it to main memory, run the CPU compression routine, and re-upload it to the texture. Seeing as how this is probably not what you want, I would advise against it.

01-25-2010, 01:40 PM
I admit that texture compression is probably a task that shaders cannot handle right now (for the moment ...)

But I'm sure that's not the case for the decompression side :)
=> in the end, it's just a kind of color indexing ... with a very, very small colormap of only 4 colors :)

For the compression side, I don't think it would add a large number of new transistors to next-generation GPUs, because we only have to find the maximum and minimum red, green and blue components of 16 colors and make some comparisons ...
=> something like the "good old" MMX registers can do this very efficiently ...

So hearing in 2010 that DXT compression/decompression is a really difficult task to do in hardware seems to me like a big mistake ...

And the fact that the DXTn algorithms are really low-complexity compared to JPEG, MPEG or MJPEG (which are "relatively old" video formats that have been implemented in hardware for a very long time) makes me fairly confident about this :)

Please, don't waste time saying "no, it's not possible"; we prefer to hear "it's certainly possible, **BUT** it's really very hard to implement" :)
=> the shortest path is often the best ...
(but OK, sometimes it's the longest to cross ... until someone builds the bridge for the many other people who can't cross without it :) )

So, in fact, the problem is only to find a very fast (i.e. real-time or very close to it) compressor that can produce DXT output
=> I don't know why, but I think there are a lot of guys in the world who can help with this :)

If a shader can individually access 16 texels in the same texture, I think it can do the compression ... into uniforms that we can retrieve in RAM after running this shader on a block of 4x4 texels ...
(but on the other hand, a CPU implementation may well be faster, because it doesn't have to pass arguments in and out of the shader memory space)

I'm beginning to think that my dream/delirium about "YCbCr DXTed" textures can become a reality in the near future :)
(and it doesn't have the problem of interpolating between three different colors in the 4x4 block ...)

To put it simply, the idea is to make something like DXT, but only on 8-bit values (not rgb565), working independently on the Y, Cb and Cr planes
=> this can be "easily" decoded in a fragment shader, and I think the encoder isn't too hard to make ...
(but I already see a very big problem with this, because we lose the linear interpolation between blocks of 4x4 texels, and so end up with something like an "interpolated mosaic" :( )
(but on the other hand, this is already a known problem of DXT compression, and it doesn't seem too problematic :) )


mark ds
01-25-2010, 05:45 PM

01-26-2010, 01:32 AM
Thank you for all your answers! You helped me a lot!

01-26-2010, 12:53 PM
Thanks Mark DS,

Your link is really good, and I have found a lot of code/samples/new ideas in it

This confirms for me that the RGB colorspace isn't the best for compressing/decompressing pictures/videos :)

This seems very nice, but I don't want to lose the 4:2:0 compression along the way :(

But I don't think it's too hard to add to the YCoCg scheme, because it seems to be exactly the same thing as what I do to handle the Y, Cb and Cr planes in my shader, i.e. just some scale/offset on the texcoords to access the right plane.

And I see a YCoCg to RGB conversion formula on the linked page, so I don't think it's too hard to derive a formula for a direct YCbCr to YCoCg conversion
(so the 4:2:0 compression isn't necessarily lost ...)

Note that I don't like working with only a diagonal in a 3D colorspace; I prefer to be able to work with a "true but small/reduced" colorspace, not just a "gradient color line" :)

With one "diagonal/interpolation" per component, this can form a sort of "curved triangle" if we think of colors as 3D vectors (i.e. x,y,z / r,g,b / y,u,v / y,cb,cr and y,co,cg are all 3D vectors) that are not necessarily linear and/or perpendicular ...
(the DXT compression scheme can only handle a line in the 3D RGB colorspace)

In any case, if the RGB colorspace doesn't seem to be used much in the video domain, it's certainly not for nothing :)

But this still only handles intra pictures ... :(
=> it's now time to think about the inter-picture algorithm to really get a good compression ratio :)
(interlacing techniques, subpictures/mosaics and bi-directional pictures can help a lot with this)

==> I already have something that is starting to work and that uses the standard PAL/SECAM interlacing/frame scheme (i.e. 50 Hz to 25 fps) to handle two successive YCbCr video pictures for the price of one :)
(I now have my GOP, of only two pictures it's true, but it's already the beginning of the implementation of my "GOP dream/delirium")

===> this already gives 4:1 compression without any visual artifacts (and with temporal interpolation between the two frames if we want more/fewer fps) compared to plain successive RGB frames (and DXT1 compression is only 8:1, with a lot of visual artifacts and no inter-picture features ...)


02-02-2010, 02:39 PM

I'm starting to have something that works and has the same 8:1 compression rate as DXT1, but with a quality that seems really better to me on photographic, static and animated pictures.
(my implementation is still too slow to handle video in real time at 25/50 fps, but I expect to solve this fairly soon, because my code doesn't use MMX/SSE instructions yet).

This is something like a monochromatic version of DXT1, but it handles a packed YCbCr 4:2:2 format with 4x4 blocks made up of two YCbCr 6:5:5 colors (the minimum and maximum Y, Cb and Cr values, where the Y, Cb and Cr components are totally independent), coupled with a 4x4 1 bit/pixel array generated by a "Floyd-Steinberg like" error diffusion algorithm for the Y part and two 2x2 arrays of 2 bits each for the Cb and Cr parts.
(i.e. 8 bytes for 16 pixels)
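
For readers wondering what a "YCbCr 6:5:5 color" could look like in memory, here is one possible packing into 16 bits. The bit order (Y in the top 6 bits, then Cb, then Cr) and the function names are purely assumptions for illustration; the post does not specify the actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit Y, Cb, Cr into 16 bits as 6:5:5 by dropping low bits.
   Assumed layout: YYYYYYBB BBBRRRRR (B = Cb, R = Cr). */
uint16_t pack_ycbcr655(uint8_t y, uint8_t cb, uint8_t cr)
{
    return (uint16_t)(((unsigned)(y >> 2) << 10) |
                      ((unsigned)(cb >> 3) << 5) |
                       (unsigned)(cr >> 3));
}

/* Unpack back to 8 bits per component, replicating the high bits
   into the low bits so that full white stays at 255. */
void unpack_ycbcr655(uint16_t p, uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    assert(y != 0 && cb != 0 && cr != 0);
    unsigned y6  = (p >> 10) & 0x3Fu;
    unsigned cb5 = (p >> 5) & 0x1Fu;
    unsigned cr5 = p & 0x1Fu;
    *y  = (uint8_t)((y6 << 2) | (y6 >> 4));
    *cb = (uint8_t)((cb5 << 3) | (cb5 >> 2));
    *cr = (uint8_t)((cr5 << 3) | (cr5 >> 2));
}
```

Two such values (minimum and maximum) take 4 of the 8 bytes in the block; the remaining 4 bytes hold the index arrays described above.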

For example, I can handle without any problem:

one or more pixels that are only shades of blue
one or more pixels that are only shades of green
one or more pixels that are only shades of red
one or more black pixels
one or more white pixels
and a mix of all this "independently" for each pixel in the 4x4 block, of course :)

Whereas DXT1 compression can only handle 4 colors that lie on "a line between two colors" ...

So now I'm starting to play with my "GOP of 8 YCbCr 4:2:2 DXTed frames that is only the size of one RGB24 picture" :)
(and with PAL/SECAM interlacing, I think I can easily extend this to double the number of frames in my compressed GOP with very little visual difference)

But on the other hand, I have lost the high quality for zoom/resize in the display window and/or when I project the video texture onto various animated 3D shapes :(

But I'm thinking of going back to a multiplanar format, so that the texture unit hardware can do the bilinear interpolation at no cost, as before
(but OK, only for the reduced minimum/maximum YCbCr values, not for the arrays of bits)
=> how can I handle different interpolation schemes in the same texture (without using multiple texture units) ???


02-11-2010, 12:21 PM
I think I have found 3 new DXT compression algorithms :)

They work on a block of 16 pixels (4x4), like DXT1/2/3/4/5

But this is computed in the YCbCr color domain, not in the RGB domain ...

And the input/output is already in a precompressed, planar 4:2:0 format :)
=> this gives some mipmap levels for free with this DXTed version ...

My algorithm uses zigzag + dithering + error diffusion methods to convert the Y, Cb and Cr planes independently from 8 bits to 1 or 2 bits

minimum YCbCr components => 2 bytes (6:5:5 format)
maximum YCbCr components => 2 bytes (6:5:5 format)

YYYY 1 bit/pixel => 2 bytes

CbCb 1 bit/pixel => 4 bits

CrCr 1 bit/pixel => 4 bits

This gives only 7 bytes for each block of 16 pixels
(2 colors of 2 bytes + 2 Y bytes + 1 Cb/Cr byte)
=> the compression ratio compared to plain RGB is close to 7:1
==> so DXT7 seems to me a good name for a 7:1 compression algorithm :)
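
The byte count above can be checked with a few lines of C. This is only arithmetic on the layout as listed (two 16-bit endpoints, 16 Y samples, 4 bits each for Cb and Cr); the BlockLayout struct and function names are invented for the example, and the ratio is taken against 16 RGB24 pixels, i.e. 48 bytes.

```c
#include <assert.h>

typedef struct {
    int endpoint_bits;          /* bits per endpoint color (16 for 6:5:5) */
    int y_bits_per_texel;       /* index bits per luma sample */
    int chroma_bits_per_plane;  /* total index bits for Cb, same for Cr */
} BlockLayout;

/* Total bits for one 4x4 block: two endpoints, 16 luma indices,
   and one index array each for Cb and Cr. */
int block_bits(BlockLayout l)
{
    assert(l.endpoint_bits >= 0 && l.y_bits_per_texel >= 0 &&
           l.chroma_bits_per_plane >= 0);
    return 2 * l.endpoint_bits + 16 * l.y_bits_per_texel +
           2 * l.chroma_bits_per_plane;
}

/* Bits rounded up to whole bytes. */
int block_bytes(BlockLayout l)
{
    return (block_bits(l) + 7) / 8;
}

/* Compression ratio against 16 uncompressed RGB24 texels (48 bytes). */
double ratio_vs_rgb24(BlockLayout l)
{
    return 48.0 / (double)block_bytes(l);
}
```

With the values listed (16, 1, 4) this gives 56 bits, so 7 bytes per block and a ratio of 48/7, a little under 7:1.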

Another way is to use 2 bits per Y sample for a finer intensity scale => DXT8 (8 bytes for 16 pixels)
(exactly like DXT1, but better, because it is really the intensity that is interpolated, not a line between two colors)

We can also add one or two more bits/planes per pixel to handle alpha/transparent pixels => DXT9 (9 bytes for 16 pixels)

And/or use more than 1 bit per pixel for the Cb/Cr planes for better color quality.

If we combine all this, it makes 12 bytes for 16 pixels, for really great quality :)
But the compression ratio is very bad, with "only" 4:1 ... :(

And this is only for real-time intra-picture compression ...
=> inter-picture compression in GOPs is surely coming soon :)

I'm really happy, because my good old EEEPC 701 can now work with YCbCr 4:2:0 DXTed and mipmapped HD textures :)
(not in real time for now, but I think some MMX/SSE asm and/or vertex/fragment shader optimisations can easily solve this timing problem ...)