Nvidia GL_UNSIGNED_INT_10_10_10_2 endianness

Does anyone know if there is an extension on Nvidia to switch the endianness of GL_UNSIGNED_INT_10_10_10_2 pixel packing?

For those who've never worked on film scans before, the 10_10_10_2 pixel packing is identical to the data coming from Cineon and DPX image files. Unfortunately, GL_UNSIGNED_INT_10_10_10_2 pixel packing only works correctly on an SGI. (SGI GL little endian.)

Anyway, since 10_10_10_2's only real purpose is supporting fast DPX/Cineon drawing, it seems pretty useless without an endian switch (or, cough, just doing it correctly).

Thanks for any suggestions.

I guess what you are looking for is GL_UNSIGNED_INT_2_10_10_10_REV.

No, conceptually these things are not split like this. Big endian vs. little endian only matters for I/O at the byte level; otherwise it's a transparent in-memory representation. The C code to, for example, create an in-memory representation on the fly would look identical on any endian system.

The only thing that could go wrong is BYTE swizzling on big- vs. little-endian I/O.

So if you have a problem due to endianness and the data is written as a packed 32-bit type, then you need to swizzle the bytes, not the components. That's IF the problem is endian related.
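To make that concrete, here is a minimal sketch of dorbie's point (pack_10_10_10_2 and swap_bytes_32 are hypothetical helpers, not anything from this thread): building the packed value with shifts looks the same on any host, and only raw byte-level I/O ever needs a swap.

#include <stdint.h>

/* Build a 10_10_10_2 pixel in host order: R in the top 10 bits, A in the
   bottom 2, matching the GL_UNSIGNED_INT_10_10_10_2 component layout. */
static uint32_t pack_10_10_10_2(uint32_t r, uint32_t g, uint32_t b, uint32_t a)
{
    return ((r & 0x3ff) << 22) | ((g & 0x3ff) << 12) | ((b & 0x3ff) << 2) | (a & 0x3);
}

/* The only endian-sensitive step: reversing the four bytes of a word that
   was read verbatim from a file written with the other byte order. */
static uint32_t swap_bytes_32(uint32_t v)
{
    return ((v & 0x000000ff) << 24) | ((v & 0x0000ff00) << 8) |
           ((v & 0x00ff0000) >> 8)  | ((v & 0xff000000) >> 24);
}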

[This message has been edited by dorbie (edited 03-29-2003).]

> I guess what you are looking for is GL_UNSIGNED_INT_2_10_10_10_REV.

Yes, tried this. It does some interesting things, but fixing the endian problem isn't one of them.

> The only thing that could go wrong is BYTE swizzling on big- vs. little-endian I/O.

Yes, definitely; the problem is that internally the card is treating each pixel as an int32.

In any case, it would be most cool to be able to handle large volumes of uncompressed data in a nice platform-independent way.

I'm “Super Curious” what happens under OSX (it's little endian and has Nvidia support, doesn't it?). Has anyone ever tested OSX for this?

I think it would be great if both ATI and NVidia made an endian switch available for 10-bit, 12-bit, 16-bit and float formats.

Since they have drivers for both OSX and x86, they must have this figured out already in the driver. Or it's just slow on OSX because it's swizzling, I guess.

What about using GL_ABGR as a source format along with “2_10_10_10_REV” ?

AFAIK OpenGL is always based on a big endian model. So if there is a problem it’s probably in the image loader function.

OpenGL uses the native endian model for client data. (GLX wire protocol may have a defined endianness) I think what the poster is looking for can be done by setting the swap bytes unpacking pixel mode with glPixelStore. I might be wrong here, because I can’t remember exactly how it interacts with packed formats.

-Evan
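For reference, a minimal sketch of what ehart is suggesting (width, height and pixels are placeholders, and this assumes the client data really is a byte-swapped packed 32-bit type):

glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_TRUE);   /* swap the bytes of each word on unpack */
glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_INT_10_10_10_2, pixels);
glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_FALSE);  /* restore the default */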

> What about using GL_ABGR as a source format
> along with “2_10_10_10_REV” ?

Yes, tried this, and ABGR with 10_10_10_2 as well. Still wrong, although it takes the framerate down from 40 fps to 2 fps, so it must be doing something.

> AFAIK OpenGL is always based on a big endian
> model. So if there is a problem it’s probably
> in the image loader function.

Naw, OpenGL endianness is definitely platform dependent.

On Linux/x86 it's big endian; on SGI it's definitely little endian. On OSX it should be little endian, but it probably just falls back to software for everything but GL_RGB packing modes. (Just a guess.)

So I guess my new question is:

Is an Nvidia card all big endian internally, or can it be reprogrammed to swap byte orders? And if the answer is no, how are they dealing with this on OSX?

Originally posted by tbfx:
On Linux/x86 it's big endian; on SGI it's definitely little endian.

Naw… x86-based systems are little endian (they eat the egg with the small end first).

On most RISC systems, big endian is the default.

HS says
> Naw… x86-based systems are little endian (they eat the egg with the small end first).

Yes, you are right, sorry to confuse
the topic.
http://www.rsinc.com/services/output.cfm?tip_id=1804

ehart says
> OpenGL uses the native endian model for client data. (GLX wire protocol may have a defined endianness) I think what the poster is looking for can be done by setting
> the swap bytes unpacking pixel mode with glPixelStore. I might be wrong here, because I can’t remember exactly how it interacts with packed formats.

A great suggestion; tried this with both 10_10_10_2 and 2_10_10_10_REV. Got different results, but still not right. It is fast though, so there's hope for the future. Reading the man page on glPixelStorei, this sounds like the right function.

Thanks

OpenGL constants for texture formats seem to be defined with big-endian conventions. GL_RGBA, GL_UNSIGNED_BYTE means that the bytes come in R, G, B, A order, which means that, read as a longword on a little-endian machine, they actually read 0xaabbggrr.
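A quick way to see this for yourself (a throwaway test program, not from this thread):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint8_t rgba[4] = { 0x11, 0x22, 0x33, 0x44 };  /* R, G, B, A in memory order */
    uint32_t word;
    memcpy(&word, rgba, sizeof word);
    printf("0x%08x\n", (unsigned)word);  /* 0x44332211 on x86, 0x11223344 on a big-endian host */
    return 0;
}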

Similarly, I believe you may need to use glPixelStorei to set the UNPACK_SWAP_BYTES to get the behavior you want if your packed pixel format is packed into a little-endian pixel value (as opposed to big-endian pixel value). This impacts even simple formats, such as the 16-bit pixel (texel) formats.

Getting interesting.

glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_TRUE);
glDrawPixels(pwid, phei, GL_RGBA,
             GL_UNSIGNED_INT_10_10_10_2_EXT, pptr);

Doesn't work by itself. (Still wrong, but quite fast at 40 fps.)

I believe this might be a case where the hardware path is broken and the software path isn't. I.e., if you preface the above calls with

glMatrixMode(GL_COLOR);
glLoadMatrixf(mtx);
glMatrixMode(GL_MODELVIEW);

it looks great.

So basically the software path is treating the packing correctly, which is a great start.

At 1 fps, though, it might be faster to hand-reverse the bytes!

This is an Nvidia FX 1000 AGP8x btw.

Just ****ing shoot me, I’ve lost all hope. I give up!

Hopefully some kind soul from NVidia
will read this and log it as a bug.

Fixing it will definitely be helpful towards establishing themselves as a viable graphics card vendor for film production.

(Not to say I'm not more than happy to be proved wrong here with a brilliant workaround that no one anticipated could exist.)

If you can do it fast enough, and you need to correct for the bug in the Cineon file reader, try this for each packed pixel before you send it to OpenGL:

/* packed_in and packed_out should be unsigned 32-bit ints, so the shifts
   don't sign-extend. */
packed_out = (packed_in & 0x000000ff) << 24 |
             (packed_in & 0x0000ff00) <<  8 |
             (packed_in & 0x00ff0000) >>  8 |
             (packed_in & 0xff000000) >> 24;
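If anyone wants to try that, here is roughly what it would look like applied to the whole frame before the glDrawPixels call (swap_packed_buffer, buffer and npixels are placeholder names):

#include <stddef.h>
#include <stdint.h>

/* Byte-swap every packed pixel in place; the buffer must hold unsigned 32-bit words. */
void swap_packed_buffer(uint32_t *buffer, size_t npixels)
{
    size_t i;
    for (i = 0; i < npixels; ++i) {
        uint32_t p = buffer[i];
        buffer[i] = (p & 0x000000ff) << 24 |
                    (p & 0x0000ff00) <<  8 |
                    (p & 0x00ff0000) >>  8 |
                    (p & 0xff000000) >> 24;
    }
}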

I find the manual page on this informative:

GL_UNPACK_SWAP_BYTES
If true, byte ordering for multibyte color components, depth
components, color indices, or stencil indices is reversed. That is,
if a four-byte component is made up of bytes b0, b1, b2, b3, it is
taken from memory as b3, b2, b1, b0 if GL_UNPACK_SWAP_BYTES is true.
GL_UNPACK_SWAP_BYTES has no effect on the memory order of components
within a pixel, only on the order of bytes within components or
indices. For example, the three components of a GL_RGB format pixel
are always stored with red first, green second, and blue third,
regardless of the value of GL_UNPACK_SWAP_BYTES.

This explicitly states that only the bytes within a component are swapped and that the component ordering of packed types is not swizzled. For a packed type where 10-bit components cross byte boundaries, I would expect there would be no swizzle. I dunno what the spec says; perhaps it is clearer, or should I say different.

I don't think it is appropriate to take an issue with file-I/O-related endianness and imply it's the driver's problem. The Cineon file reader should handle its own endianness problems. When a loader gives you an in-memory representation of a mangled 10_10_10_2 that's actually something like a 6_2, 4_4, 2_6, 8, it is buggy unless you deliberately planned this. Just taking that and hoping the graphics implementation can fix it is nice, but not a given. It's definitely not a deficiency of Nvidia OpenGL vs. SGI OpenGL; the SGI system gets a free ride with a native-endian file that produces a correct in-memory representation requiring no swizzle.

GL_UNPACK_SWAP_BYTES should be consistent on all code paths, but it looks like the real bug is in the software path. It would seem there is a need to swizzle packed multi-component types, but it’s just not supported in OpenGL. You must therefore fix the Cineon file loader to correctly handle endianness, or write your Cineon files with x86 native endianness (if that’s even possible).
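As a sketch of what fixing the loader could look like, assuming the file's 32-bit words are stored big endian (DPX headers carry a magic number that indicates the byte order; that check is omitted here, and read_be32 is a hypothetical helper):

#include <stdint.h>

/* Assemble one packed pixel from four file bytes, assuming the file is big
   endian. Shifting makes the result correct on any host, so no further
   swapping is needed before handing the buffer to OpenGL. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}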

[This message has been edited by dorbie (edited 04-01-2003).]

Dorbie,

It appears that the poster is getting different results based on whether he forces the software path (by using an identity color matrix) or the hardware path. This implies to me that there’s a driver bug.

Further, I think I can defend my assertion that OpenGL is defined in big-endian order, because when the layout is 10-10-10-2, that means the 10 top bits are “red” and the 2 bottom bits are “alpha”. This is contrary to how most little-endian architectures count bits; the first 10 bits in little-endian parlance would be the 10 LEAST significant bits of the pixel value.

Because all this was starting to bother me, I decided to crawl through the GL spec. It says in section 3.6 that byte swapping occurs before the components are extracted from the packed format.

This tells me that there is indeed a bug in the implementation you are testing against. As always, please attempt to contact developer support at that company, as bugs, especially ones like this, can be very obscure.

-Evan

jwatte,

The spec explicitly states that the ordering of bytes within shorts, ints, etc. is up to the implementation (Section 3.6). What it does specify is at which bit each of the components starts. As a result, a packed 8888 format will be in RGBA order when read as bytes on one machine, but ABGR when read on another.

-Evan

Originally posted by ehart:
jwatte,

The spec explicitly states that the ordering of bytes within shorts, ints, etc. is up to the implementation (Section 3.6). What it does specify is at which bit each of the components starts. As a result, a packed 8888 format will be in RGBA order when read as bytes on one machine, but ABGR when read on another.

-Evan

I think what jwatte means is that the way the #defines are expressed suggests BIG endian.

The most basic example is GL_RGBA. I think most of us consider 0xaabbggrr a backwards storage of the colors. On a big-endian machine it would be 0xrrggbbaa, and if you look at the #define (GL_RGBA), you see the order matches.

PS: most people read from left to right, big endian style

PPS: now I'm confused. Why doesn't GL_RGBA work on little endian?

[This message has been edited by V-man (edited 04-01-2003).]

GL_RGBA, GL_UNSIGNED_BYTE is actually not a bigger-than-a-byte format. Thus, the bytes are stored in the order suggested, as GL_UNSIGNED_BYTEs in the order R, G, B, A. It just so happens that when you load this into a longword on x86, you get 0xaabbggrr. The native format on Windows seems to be 0xaarrggbb, hence the GL_BGRA extension (which has since been promoted to a required format).

What I was saying last was that the OpenGL spec does, indeed, specify which bits contain which components once you've unpacked a sequence of bytes into a larger native data type (through swapping, or not). What I'm complaining about (only mildly) is the fact that, the way the formats are defined, they're written using big-endian byte counting.

And, to further clarify this small point:

GL_UNSIGNED_SHORT_5_5_5_1 is defined to store the alpha bit in the lowest bit of the short. I'm not aware of any image file format that actually packs images like this.
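To spell that out, here is the 5_5_5_1 bit layout for GL_RGBA (a sketch with a hypothetical packing macro):

#include <stdint.h>

/* GL_UNSIGNED_SHORT_5_5_5_1 with GL_RGBA: R in bits 15-11, G in bits 10-6,
   B in bits 5-1, and the single alpha bit in bit 0. */
#define PACK_5551(r, g, b, a) \
    ((uint16_t)((((r) & 0x1f) << 11) | (((g) & 0x1f) << 6) | \
                (((b) & 0x1f) << 1) | ((a) & 0x1)))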

Note that there are now several ways of specifying a “default” loaded image (in B, G, R, A format in memory):

traditional: GL_BGRA, GL_UNSIGNED_BYTE

new1: GL_BGRA, UNSIGNED_INT_8_8_8_8, setting UNPACK_SWAP_BYTES to TRUE if you're on a little-endian host, else FALSE

new2: GL_BGRA, UNSIGNED_INT_8_8_8_8_REV, UNPACK_SWAP_BYTES to TRUE if you’re on a big-endian host, else FALSE
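A sketch of the "new2" variant with a runtime endian check (is_big_endian, w, h and pixels are placeholders, assuming the usual GL headers):

static int is_big_endian(void)
{
    uint16_t one = 1;
    return *(const unsigned char *)&one == 0;  /* the first byte is 0 on a big-endian host */
}

/* ... */
glPixelStorei(GL_UNPACK_SWAP_BYTES, is_big_endian() ? GL_TRUE : GL_FALSE);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
             GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, pixels);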

To tie back to the original question, if the file format is defined as follows:

  • each pixel is stored as a 32-bit word
  • the two highest bits are A, then come R, G, B, so the lowest 10 bits are “B”
  • the words are stored in little-endian format in memory

Then, on a little-endian machine, you’d specify this format as GL_BGRA, UNSIGNED_INT_2_10_10_10_REV; on a big-endian machine you’d additionally set UNPACK_SWAP_BYTES to TRUE. If the file format was defined using big-endian 32-bit words, then you’d set SWAP_BYTES to TRUE on little-endian machines, but you’d still use 2_10_10_10_REV (assuming the file format definition above).
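In code, that recipe would look something like this (w, h and pixels are placeholders, reusing the is_big_endian() check sketched above):

/* File words written little endian: swap only on a big-endian host.
   For a big-endian file, invert the GL_TRUE/GL_FALSE choice. */
glPixelStorei(GL_UNPACK_SWAP_BYTES, is_big_endian() ? GL_TRUE : GL_FALSE);
glDrawPixels(w, h, GL_BGRA, GL_UNSIGNED_INT_2_10_10_10_REV, pixels);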

Clear as mud? Are we all agreeing aggressively with each other at this point?