Nvidia GL_UNSIGNED_INT_10_10_10_2 endianness



tbfx
03-29-2003, 01:27 PM
Anyone know if there is an extension
on Nvidia to switch the endianness
of GL_UNSIGNED_INT_10_10_10_2 pixel packing?

For those who've never worked on film
scans before, the 10_10_10_2 pixel packing
is identical to data coming from Cineon
and DPX image files. Unfortunately GL_UNSIGNED_INT_10_10_10_2 pixel packing
only works correctly on an SGI.
(sgi GL little endian)

Anyway since 10_10_10_2's only real purpose
is for supporting fast DPX/Cineon drawing,
it seems pretty useless without having an endian switch (or cough just doing it correctly)

Thanks for any suggestions.

GPSnoopy
03-29-2003, 02:50 PM
I guess what you are looking for is GL_UNSIGNED_INT_2_10_10_10_REV.

dorbie
03-29-2003, 05:15 PM
No, conceptually these things are not split like this. Big endian vs little endian only matters for i/o at the _byte_ level; otherwise it's a transparent in-memory representation. The C code to, for example, create an in-memory representation on the fly would look identical on any endian system.

The only thing that could go wrong is *BYTE* swizzling on big vs little endian i/o.

So if you have a problem due to endianness and the data is written as a packed 32 bit type then you need to swizzle the bytes not the components. That's IF the problem is endian related.


tbfx
03-29-2003, 10:39 PM
> I guess what you are looking for is GL_UNSIGNED_INT_2_10_10_10_REV.

Yes, tried this. It does some interesting
things, but fixing the endian problem
isn't one of them.

> The only thing that could go wrong is *BYTE* swizzling on big vs little endian i/o.

Yes, definitely; the problem is that internally
the card is treating each pixel as an int32.

In any case, it would be most cool to be
able to handle large volumes of
uncompressed data in a nice platform
independent way.

Im "Super Curious" what happens under
OSX (its little endian and has Nvidia support doesnt it?) Anyone ever test OSX for this?

I think it would be great if both
ATI and NVidia made an Endian switch
available for 10bit, 12bit, 16bit and Float.

Since they have drivers for both OSX and x86
they must have this figured out already in
the driver. Or its just slow on OSX
because its swizzleing I guess.

GPSnoopy
03-30-2003, 03:13 AM
What about using GL_ABGR as a source format along with "2_10_10_10_REV" ?

AFAIK OpenGL is always based on a big endian model. So if there is a problem it's probably in the image loader function.

ehart
03-30-2003, 11:51 AM
OpenGL uses the native endian model for client data. (GLX wire protocol may have a defined endianness) I think what the poster is looking for can be done by setting the swap bytes unpacking pixel mode with glPixelStore. I might be wrong here, because I can't remember exactly how it interacts with packed formats.
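
For concreteness, a minimal sketch of what that suggestion looks like (width, height and pixels are placeholder names, and this is untested against the driver in question):

glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_TRUE);   /* swap bytes while unpacking */
glDrawPixels(width, height, GL_RGBA,
             GL_UNSIGNED_INT_10_10_10_2, pixels);
glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_FALSE);  /* restore the default */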

-Evan

tbfx
03-30-2003, 12:41 PM
> What about using GL_ABGR as a source format
> along with "2_10_10_10_REV" ?

Yes, tried this, and ABGR with 10_10_10_2 as well.
Still wrong. Although it takes the framerate
down from 40fps to 2fps, so it must be doing
something.

> AFAIK OpenGL is always based on a big endian
> model. So if there is a problem it's probably
> in the image loader function.

Naw, OpenGL endianness is definitely
platform dependent.

On linux/x86 it's big endian; on SGI
it's definitely little endian. On OSX
it should be little endian, but probably
just falls back to software for everything
but GL_RGB packing modes. (just a guess)

So I guess my new question is:

Is an nvidia card all big endian internally,
or can it be reprogrammed to
swap byte orders? And if the answer is
no, how are they dealing with this on
OSX?

HS
03-30-2003, 12:53 PM
Originally posted by tbfx:
On linux/x86 it's big endian; on SGI
it's definitely little endian.

Naw... x86 based systems are little endian (they are eating the egg with the small end first)

On most RISC systems "big endian" is the default.

tbfx
03-30-2003, 01:16 PM
HS says
> Naw... x86 based systems are little endian (they are eating the egg with the small end first)

Yes, you are right, sorry to confuse
the topic.
http://www.rsinc.com/services/output.cfm?tip_id=1804

ehart says
> OpenGL uses the native endian model for client data. (GLX wire protocol may have a defined endianness) I think what the poster is looking for can be done by setting
> the swap bytes unpacking pixel mode with glPixelStore. I might be wrong here, because I can't remember exactly how it interacts with packed formats.

A great suggestion; tried this with both
10_10_10_2 and 2_10_10_10_REV. Got different
results, but still not right. It is fast though, so there's hope for the future.
Reading the man page on glPixelStorei,
this sounds like the right function.


Thanks

jwatte
03-30-2003, 01:29 PM
OpenGL constants for texture formats seem defined with big-endian conventions. GL_RGBA, GL_UNSIGNED_BYTE means that the bytes come in R, G, B, A order, which means that, read as a longword on a little-endian machine, they actually read 0xaabbggrr.

Similarly, I believe you may need to use glPixelStorei to set the UNPACK_SWAP_BYTES to get the behavior you want if your packed pixel format is packed into a little-endian pixel value (as opposed to big-endian pixel value). This impacts even simple formats, such as the 16-bit pixel (texel) formats.
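
A tiny stand-alone illustration of that point (a hypothetical test program, not from anyone's loader):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* four GL_UNSIGNED_BYTE components written in R, G, B, A order */
    unsigned char rgba[4] = { 0x11, 0x22, 0x33, 0x44 };  /* R G B A */
    unsigned int word;
    memcpy(&word, rgba, sizeof word);
    printf("0x%08x\n", word);  /* 0x44332211 (i.e. 0xaabbggrr) on x86,
                                  0x11223344 on a big-endian host */
    return 0;
}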

tbfx
03-30-2003, 10:21 PM
Getting interesting.

glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_TRUE);
glDrawPixels(pwid, phei, GL_RGBA, GL_UNSIGNED_INT_10_10_10_2_EXT, pptr);

Doesn't work by itself (still wrong, but quite fast @40fps).

I believe this might be a case where the
hardware path is broken and the software path isn't.
I.e., if you preface the above calls with

glMatrixMode(GL_COLOR);
glLoadMatrixf(mtx);
glMatrixMode(GL_MODELVIEW);

it looks great.

So basically the software path is treating
the packing correctly, which is a great start.

At 1 fps, though, it might be faster to
hand-reverse the bytes!

This is an Nvidia FX 1000 AGP8x btw.

dorbie
03-31-2003, 02:33 AM
Just ****ing shoot me, I've lost all hope. I give up!

tbfx
03-31-2003, 11:23 AM
Hopefully some kind soul from NVidia
will read this and log it as a bug.

Fixing it will definitely be helpful
towards establishing themselves as a
viable graphics card vendor for film
production.

(not to say I'm not more than happy to
be proved wrong here with a brilliant
workaround that no one anticipated
could exist)

dorbie
04-01-2003, 04:59 AM
If you can do it fast enough, try this on each packed pixel before you send it to OpenGL, to correct for the bug in the Cineon file reader:




/* packed_in and packed_out are unsigned 32-bit ints; reverse the byte order */
unsigned int packed_out =
    (packed_in & 0x000000ff) << 24 |
    (packed_in & 0x0000ff00) <<  8 |
    (packed_in & 0x00ff0000) >>  8 |
    (packed_in & 0xff000000) >> 24;



I find the manual page on this informative:




GL_UNPACK_SWAP_BYTES
    If true, byte ordering for multibyte color components, depth
    components, color indices, or stencil indices is reversed. That is,
    if a four-byte component is made up of bytes b0, b1, b2, b3, it is
    taken from memory as b3, b2, b1, b0 if GL_UNPACK_SWAP_BYTES is true.
    GL_UNPACK_SWAP_BYTES has no effect on the memory order of components
    within a pixel, only on the order of bytes within components or
    indices. For example, the three components of a GL_RGB format pixel
    are always stored with red first, green second, and blue third,
    regardless of the value of GL_UNPACK_SWAP_BYTES.


This explicitly states that only bytes within a component should be swapped, and that the component ordering of packed types is not swizzled. For a packed type where 10-bit components cross byte boundaries I would expect there to be no swizzle. I dunno what the spec says; perhaps it is clearer, or should I say different.

I don't think it is appropriate to take an issue that is really about file i/o endianness and imply it's the driver's problem. The Cineon file reader should handle its own endianness problems. When a loader gives you an in-memory representation that is a mangled 10_10_10_2 (actually something like a 6_2, 4_4, 2_6, 8), it is buggy unless you deliberately planned this. Just getting this and hoping the graphics implementation can fix it is nice, but not a given. It's definitely not a deficiency of Nvidia OpenGL vs SGI OpenGL; the SGI system gets a free ride with a native-endian file that produces a correct in-memory representation requiring no swizzle.

GL_UNPACK_SWAP_BYTES should be consistent on all code paths, but it looks like the real bug is in the software path. It would seem there is a need to swizzle packed multi-component types, but it's just not supported in OpenGL. You must therefore fix the Cineon file loader to correctly handle endianness, or write your Cineon files with x86 native endianness (if that's even possible).



jwatte
04-01-2003, 07:03 AM
Dorbie,

It appears that the poster is getting different results based on whether he forces the software path (by using an identity color matrix) or the hardware path. This implies to me that there's a driver bug.

Further, I think I can defend my assertion that OpenGL is defined in big-endian order, because when the layout is 10-10-10-2, that means the 10 _top_ bits are "red" and the 2 _bottom_ bits are "alpha". This is contrary to how most little-endian architectures count bits; the first 10 bits in little-endian parlance would be the 10 LEAST significant bits of the pixel value.
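
To spell out that bit layout (a sketch, assuming GL_RGBA with GL_UNSIGNED_INT_10_10_10_2; the function name is made up):

/* red sits in the 10 most significant bits of the packed value,
   alpha in the 2 least significant bits */
static void unpack_10_10_10_2(unsigned int pixel,
                              unsigned int *r, unsigned int *g,
                              unsigned int *b, unsigned int *a)
{
    *r = (pixel >> 22) & 0x3ffu;  /* bits 22..31 */
    *g = (pixel >> 12) & 0x3ffu;  /* bits 12..21 */
    *b = (pixel >>  2) & 0x3ffu;  /* bits  2..11 */
    *a =  pixel        & 0x3u;    /* bits  0..1  */
}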

ehart
04-01-2003, 07:04 AM
Because all this was starting to bother me, I decided to crawl through the GL spec. It says in section 3.6 that byte swapping occurs before the components are extracted from the packed format.

This tells me that there is indeed a bug in the implementation you are testing against. As always, please attempt to contact developer support at that company, as bugs, especially ones like this, can be very obscure.

-Evan

ehart
04-01-2003, 07:12 AM
jwatte,

The spec explicitly states that the ordering of bytes within shorts, ints, etc. is up to the implementation (Section 3.6). What it does specify is at which bit each of the components starts. As a result, a packed format of 8888 will be in RGBA order when read in bytes on one machine, but ABGR when read on another.
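
A quick way to see this (hypothetical stand-alone test): the same packed 32-bit value, with R in the top byte, lands in memory as R,G,B,A on a big-endian host but A,B,G,R on a little-endian one.

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int packed = 0x11223344u;  /* R=0x11 G=0x22 B=0x33 A=0x44 */
    unsigned char bytes[4];
    memcpy(bytes, &packed, sizeof bytes);
    printf("%02x %02x %02x %02x\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);
    /* prints "44 33 22 11" on x86, "11 22 33 44" on a big-endian host */
    return 0;
}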

-Evan

V-man
04-01-2003, 10:56 AM
Originally posted by ehart:
jwatte,

The spec explicitly states that the ordering of bytes within shorts, ints, etc. is up to the implementation (Section 3.6). What it does specify is at which bit each of the components starts. As a result, a packed format of 8888 will be in RGBA order when read in bytes on one machine, but ABGR when read on another.

-Evan

I think what jwatte means is that the way the #defines are expressed suggests big endian.

The most basic example being GL_RGBA. I think most of us consider 0xaabbggrr a backwards storage of the colors. On a big-endian machine it would be 0xrrggbbaa,
and if you look at the #define (GL_RGBA), you see the order matches.

PS: most people read from left to right, big endian style

PPS: now I'm confused. Why doesn't GL_RGBA work on little endian?



jwatte
04-01-2003, 05:15 PM
GL_RGBA, UNSIGNED_BYTE is actually not a bigger-than-a-byte format. Thus, the bytes are stored in the order suggested, as GL_UNSIGNED_BYTEs in the order R, G, B, A. It just so happens that when you load this into a longword on x86, you get 0xaabbggrr. The native format on Windows seems to be 0xaarrggbb, hence the GL_BGRA extension (which has since been promoted to a required format).

What I was saying last was that OpenGL spec does, indeed, specify which bits contain which components, once you've unpacked a sequence of bytes into a larger native data type (through swapping, or not). What I'm complaining (only mildly) about is the fact that the way the formats are defined, they're written using big-endian byte counting.

jwatte
04-01-2003, 05:26 PM
And, to further clarify this small point:

GL_UNSIGNED_SHORT_5_5_5_1 is defined to store the alpha bit in the lowest bit of the value. I'm not aware of any image file format that actually packs images like this.

Note that there are now several ways of specifying a "default" loaded image (in B, G, R, A format in memory):

traditional: GL_BGRA, GL_UNSIGNED_BYTE

new1: GL_BGRA, UNSIGNED_INT_8_8_8_8, setting UNPACK_SWAP_BYTES to TRUE if you're on a little-endian host, else FALSE

new2: GL_BGRA, UNSIGNED_INT_8_8_8_8_REV, UNPACK_SWAP_BYTES to TRUE if you're on a big-endian host, else FALSE

To tie back to the original question, if the file format is defined as follows:

- each pixel is stored as a 32-bit word
- the two highest bits are A, then come R, G, B, so the lowest 10 bits are "B"
- the words are stored in little-endian format in memory

Then, on a little-endian machine, you'd specify this format as GL_BGRA, UNSIGNED_INT_2_10_10_10_REV; on a big-endian machine you'd additionally set UNPACK_SWAP_BYTES to TRUE. If the file format was defined using big-endian 32-bit words, then you'd set SWAP_BYTES to TRUE on little-endian machines, but you'd still use 2_10_10_10_REV (assuming the file format definition above).

Clear as mud? Are we all agreeing aggressively with each other at this point?
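
A minimal sketch of that last recipe, assuming the little-endian file layout defined above (width, height and words are placeholder names, the function names are made up, and this is untested on the hardware discussed here):

#include <GL/gl.h>

static int host_is_little_endian(void)
{
    unsigned int one = 1u;
    return *(unsigned char *)&one == 1;
}

void draw_le_packed_frame(GLsizei width, GLsizei height, const void *words)
{
    /* swap only if the host's byte order differs from the file's
       (little endian), then let GL unpack A2-R10-G10-B10 via BGRA + REV */
    glPixelStorei(GL_UNPACK_SWAP_BYTES,
                  host_is_little_endian() ? GL_FALSE : GL_TRUE);
    glDrawPixels(width, height, GL_BGRA,
                 GL_UNSIGNED_INT_2_10_10_10_REV, words);
    glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_FALSE);  /* restore default */
}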

dorbie
04-01-2003, 08:10 PM
Jwatte, I agree that he shouldn't get different results on two code paths; I said exactly this, so I hope you don't think I disagree on that. The key question is which code path is broken, as I said in my post. I was frankly surprised to see in the manual that unpacking is explicitly only at the component level, so I was cautious in my remarks and even hinted that the spec may differ.

Now it looks like the manual disagrees with the spec since Evan kindly looked for us. And the swizzle should happen with the packed type before the components are separated.

Am I reading you correctly Evan? Is it clear that this means the packed type is swizzled not just the components?

This means that there are two bugs, one on the hardware path and one in the manual page. Or at the very least the manual page should be a little clearer.

I dislike some of these discussions about explicit endianness because people get confused over stuff that should be a non-issue and start attributing reasons to the wrong things or making claims of explicit endianness for something or other. Most of the confusion over endian issues arises from this kind of loaded, misleading discussion, and I never seem to have trouble with it personally. I've even written anonymous endian-handling code that doesn't care what the file endianness is; it just knows whether it is opposite or equal to the current system and handles it. It doesn't even know what the native endianness is. The only time you'd ever care would be a reckless cast from int to byte, for example.

0x000000FF is always your low-order byte regardless of the endianness of your system. Only *((char *)foo + 3) is system dependent.
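
A sketch of that kind of anonymous endian handling (FILE_MAGIC is a hypothetical stand-in for whatever magic word the format defines at the start of its header; Cineon and DPX both have one):

#include <stddef.h>

#define FILE_MAGIC 0x12345678u  /* hypothetical expected magic word */

static unsigned int swap32(unsigned int v)
{
    return (v & 0x000000ffu) << 24 | (v & 0x0000ff00u) << 8 |
           (v & 0x00ff0000u) >>  8 | (v & 0xff000000u) >> 24;
}

/* words[0] holds the magic exactly as read from disk */
void fix_file_endianness(unsigned int *words, size_t count)
{
    size_t i;
    if (words[0] == FILE_MAGIC)
        return;                   /* file matches this system, no swizzle */
    for (i = 0; i < count; ++i)   /* opposite endian: swap every word */
        words[i] = swap32(words[i]);
}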

For me it seems clear that a 10_10_10_2 format should produce the two bits of alpha in the 2 LSB of the format. Just as 0x00000003 masks the low order bits. There is no endianness dependency to this, it is clear and unambiguous. It is NOT system dependent and it says nothing about endianness. The location of the LSB in a cast *byte* stream is system dependent, but good code should either not care or at least be absolutely clear on the reason for casting and the effects.

The Cineon issue was caused by a loader that created an erroneous in memory representation because it loaded a file written as packed binary on an opposite endian system without correction. Hopefully byte swizzling in OpenGL will fix that, we'll see. The software swizzle I posted will work around in the meantime.

On the other stuff, I disagree. But you defined your own file format there; the guy said Cineon was 10_10_10_2, not the reverse. I also strongly disagree with your comments along the lines of "on a little endian machine do this, and on a big endian machine do that". NO, it depends on whether the bytes are swizzled because of a problematic file read. If you natively create 2_10_10_10 with MSB alpha etc., it works on either system (using your example).

jwatte, sorry, you *seem* to have a fundamental misunderstanding with MSB & endianness. The MSB is ALWAYS in the correct place. The MSB on a big endian vs a little endian system is in a DIFFERENT PLACE, there's no need to swizzle if the MSB data is in the MSB location. They are DIFFERENT bytes. That only matters for i/o.

If you read a file as packed ints that was written as packed ints on another system, you need to swizzle depending on whether that OTHER SYSTEM matched the endianness of THIS SYSTEM. It has nothing to do with the native endianness, but with whether the unadulterated binary byte order written to the file on the other system and read on this system is the correct native order. If it is not, then the bytes wind up out of order for a native (in this case 10_10_10_2) representation, because they are in the byte order for an opposite-endian system. This is exactly what happened with the Cineon file transfer from SGI, and is what happens with all other endian-related binary file reads.

If the in memory representation is correct on either a big or little endian system you don't swizzle.


dorbie
04-02-2003, 05:46 AM
I decided to check the spec for myself, and it is ambiguous (or more correctly broken).

In section 3.6 the spec clearly talks about data swizzling of "elements" in table 3.7 and the accompanying text. Table 3.6 immediately preceding it refers to "Element meaning and order" in the second column, and the text refers to the number of elements in a group. The implied meaning of "element" is an individual component, when it should be a GL Data Type array element.

I think the ambiguity arises because you have a single "GL Data Type" representing multiple "Elements" in table 3.6 because these 'special interpretation' packed types weren't around when an "Element" = "GL Data Type" = "Component", all singular. The different behaviour in two code paths in the same driver is a real world example of the consequences of this ambiguity.

I think the behaviour should obviously be to swizzle at the packed "GL Data Type" - array elements level, not at the component level. It needs some clarification though, and of course the manual should be updated to reflect this.

To make this clear, life was simpler when parts of the spec were written and we had:




             /--> GL Data Type (element) ----> Component
            /
Pixel Data ---> GL Data Type (element) ----> Component
            \
             \--> GL Data Type (element) ----> Component


Now with packed formats we also have:




                                             /---> Component
                                            /
Pixel Data ---> GL Data Type (element) ----> Component
                                            \
                                             \---> Component


The heart of the problem is table 3.6 (I've decided :-). RGBA packed formats really have one 'Element' per pixel and four components, while unpacked 'traditional' formats have multiple elements, so that table can no longer be created in its present form for format names. The manual is particularly misleading because of similar assumptions, but makes really emphatic statements that are wrong (or will be when table 3.6 is fixed).

The spec needs to avoid associating element counts and separate components with particular format tokens.



jwatte
04-02-2003, 10:36 AM
> jwatte, sorry, you *seem* to have a
> fundamental misunderstanding with MSB &
> endianness.

I suppose I'll just go and re-implement those drivers and the linker/compiler I worked on for both x86 and PPC platforms, then. No, wait, they've been working fine for eight years, they didn't suddenly break because of an Internet post! ;-)

When you find something at all in what I said that's WRONG (as opposed to different from your own opinion on how best to do it), please let me know.

Meanwhile, let me justify why I prefer to do it the way I recommended (which I claim is correct):

I prefer to read the file into memory as-is, and then tell the driver to deal with the data as it arrives from the file. Memory mapping files wouldn't work at all if I did it your way (which means having the program touch the data before having OpenGL touch it). To me, it seems clearly superior to offload it all to the driver, because it's likely to either do the same job I'd be doing, OR do a better job, so it's either a wash, or a net win.
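
A sketch of that approach under the assumptions above (POSIX mmap, error handling trimmed; the function name, header offset and swap flag are placeholders supplied by the caller):

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <GL/gl.h>

void draw_mapped_file(const char *path, GLsizei width, GLsizei height,
                      size_t header_bytes, GLboolean swap)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    /* map the file read-only and hand the pixel pointer straight to GL */
    unsigned char *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    glPixelStorei(GL_UNPACK_SWAP_BYTES, swap);  /* let the driver reorder bytes */
    glDrawPixels(width, height, GL_RGBA,
                 GL_UNSIGNED_INT_10_10_10_2, base + header_bytes);
    glPixelStorei(GL_UNPACK_SWAP_BYTES, GL_FALSE);

    munmap(base, st.st_size);
    close(fd);
}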

Btw: when you say "swizzle" it's somewhat unclear, as the spec doesn't use that word. It uses the word "reverse" when it comes to component order, and the word "swap" when it comes to byte ordering of elements larger than a byte.

Now, that's not a correctness issue, just a preference issue. Your way is good too, except for these cases where it isn't.

I thought that clearly defining what the expectations for a file format might be, and then showing how it resolves through reading the spec, would be a good way of illustrating how it works. The original post did not actually specify the exact file conventions, so I couldn't use those for the illustrative example, or I would have.

dorbie
04-02-2003, 03:31 PM
OK, looking back at your post I've tried to boil it down to the most objectionable part and why we disagreed, and in doing this I've realized we agree with each other and you're not wrong. In the second line of the definition before your last paragraph you define your format as "- the two highest bits are A ... lowest bits are B". The reason I objected was that this alone does not define the endianness of the data, or whether UNPACK_SWAP should be true. The location of the "highest" bits differs between big- and little-endian systems, and between files written on big- and little-endian systems. If you'd said the *first* and *last* bits it would be defined (as a big-endian format); this is an important distinction because it is THE definition of endianness.

I thought you were making the assumption that big endian is somehow preferred in a file or memory representation.

Sorry, but I've just noticed that after saying all this in item 3 you state that the format is stored as little endian in memory. OK so you're not wrong. You do get it and we do both violently agree.

I find it slightly objectionable that anyone would define high and low bits in a format and then say it's little endian in memory, potentially changing the programmatic location of those bits on any system until after a swizzle, but you are correct in what you wrote.



dorbie
04-02-2003, 04:38 PM
Table 3.6 in the spec still needs to be reworked. It's broken for packed types. The manual page is also wrong in this for packed types.

tbfx
04-02-2003, 11:42 PM
Thanks for all the discussion...

Btw,

GL_UNSIGNED_CHAR +
GL_RGBA, GL_RGB and GL_ABGR_EXT all
work without any endian issue across
both x86 and Irix, i.e. the same raw
data file will appear identical across platforms.
The component is UNSIGNED_CHAR, so there's no endian issue there
(and it's definitely not an argument that OpenGL
is big endian).

With 10_10_10_2 packing each pixel is represented as a GL_UNSIGNED_INT and so is justifiably
tied to the endianness of an int.

So I agree the software driver on nvidia is
correct; the hardware driver just appears
to be ignoring GL_UNPACK_SWAP_BYTES, or is
unaware of how to properly handle 10_10_10_2
in this regard.

dorbie
04-02-2003, 11:51 PM
Did you try the workaround I posted? What kind of performance do you get with that?

tbfx
04-03-2003, 02:45 PM
Nope, I already had something similar, but I
imagine it might be useful if translated
into a register combiner or fragment shader.

Anyway, the point is:
"life is too short, DMA everything"

dorbie
04-03-2003, 04:32 PM
My point would be: where is the image from, and at what data rate? You can still DMA after the swizzle and get on with the next frame. I doubt this workaround would be the performance-limiting factor under the right circumstances.