Precision curiosity: 1/255 or 1/256?

marcus256 · April 17, 2003, 2:59am

I’m just wondering how GL hardware treats fragment values.

If we for instance have 8 bits per component, the OpenGL spec says that 0=0.0 and 255=1.0. This means that multiplications have to look like this: c = (a*b)/255, and NOT c = (a*b)/256, even though the latter is MUUUCH simpler to do in hardware. So my question is: how is it done in hardware?

Am I guaranteed that a*1.0 = a?
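
To make the concern concrete, here is a minimal Python sketch comparing the two scalings when one operand is 255 (i.e. 1.0). The function names `mul_div255` and `mul_shift8` are made up for illustration and do not describe any actual GL implementation:

```python
def mul_div255(a, b):
    """8-bit product scaled by 1/255, rounded to nearest."""
    return (a * b + 127) // 255


def mul_shift8(a, b):
    """Cheap variant: scale by 1/256, i.e. drop the low 8 bits."""
    return (a * b) >> 8


ONE = 255  # 8-bit representation of 1.0

# Inputs for which multiplying by 1.0 does NOT give the input back.
broken_div255 = [a for a in range(256) if mul_div255(a, ONE) != a]
broken_shift8 = [a for a in range(256) if mul_shift8(a, ONE) != a]

print(len(broken_div255), len(broken_shift8))  # prints "0 255"
```

With the shift, every non-zero value comes back one step darker, so a*1.0 = a only holds for a = 0.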

HS1 · April 17, 2003, 4:31am

I guess it’s (a*b)/256, because you can do that with a cheap shift or even hardwire it (then it would be free).

When it comes to floats or doubles, they are just approximations, so you can’t expect mathematically correct results regardless of the operation.

marcus256 · April 17, 2003, 4:57am

Originally posted by HS:
I guess it’s (a*b)/256, because you can do that with a cheap shift or even hardwire it (then it would be free).

Yes, I know (in hardware you simply “rewire” the bus, discarding the lower bits). That is exactly my concern.

I was thinking about situations where you would want to interpret the framebuffer values in a custom way (say, use the alpha channel as an exponent). Then it’s very important that 1*x = x. If you want to do it right, I imagine you need another 9-bit multiplication (per component) or something to do the 1/255 scaling, which sounds costly.

Doing it the simple way by scaling with 255/256 also means that some multipass operations will slightly darken the image…
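
A small Python sketch of that darkening, assuming each pass is nothing more than a multiply by 1.0 (255) using the 1/256 shortcut. `run_passes` is a hypothetical helper, not a model of a real pipeline:

```python
def mul_shift8(a, b):
    """8-bit multiply using the 1/256 shortcut (drop the low 8 bits)."""
    return (a * b) >> 8


def run_passes(value, passes, factor=255):
    """Multiply `value` by `factor` once per pass and return the result."""
    for _ in range(passes):
        value = mul_shift8(value, factor)
    return value


print(run_passes(200, 10))  # prints 190
```

Each pass costs exactly one step (1/255 of full scale) for any non-zero value, so the loss grows linearly with the number of passes.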

vincoof · April 17, 2003, 5:36am

Actually it’s more likely to be a=b*255.0f rather than a=b/255

I think that you are guaranteed that a*1.0f = a.
It is a necessary condition for invariance that I’ve read in many specifications (for instance, vertex programs have some conditions about it).

vincoof · April 17, 2003, 5:41am

From the spec:

For an RGBA color, each color component (which lies in [0, 1]) is converted (by rounding to nearest) to a fixed-point value with m bits. We assume that the fixed-point representation used represents each value k/(2^m-1), where k belongs to {0, 1, …, 2^m-1}, as k (e.g. 1.0 is represented in binary as a string of all ones).
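
As a rough illustration of that conversion rule, here is a Python sketch. The names `to_fixed` and `to_float` are mine, and the clamp to [0, 1] is an assumption about what happens before conversion:

```python
import math


def to_fixed(c, m=8):
    """Convert a component in [0, 1] to an m-bit value k, rounding to nearest."""
    max_k = (1 << m) - 1          # 2^m - 1, all ones in binary
    c = min(max(c, 0.0), 1.0)     # assumption: clamp before converting
    return math.floor(c * max_k + 0.5)


def to_float(k, m=8):
    """Interpret the m-bit value k as k / (2^m - 1)."""
    return k / ((1 << m) - 1)


print(bin(to_fixed(1.0)))  # prints 0b11111111
print(all(to_fixed(to_float(k)) == k for k in range(256)))  # prints True
```

The point is that the divisor is 2^m - 1 (255 for 8 bits), not 2^m, which is where the 1/255 in the multiply comes from.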

ehart · April 17, 2003, 6:53am

First, I have no comments on actual HW implementations, nor do I even know exactly how a multiply is implemented.

For an example of how to deal with a 1/255 term, I would suggest looking at one of Jim Blinn’s books. He has some interesting fixed point/repeating fraction tricks in one of them.

As far as the spec itself goes, it really leaves a lot of freedom on how to do the internal computations, that is why you see all these floating point shaders coming about. It is only specific on how to convert to/from fixed point and float. All the pixel ops are pretty much spec’d to occur logically in clamped floats.

-Evan

marcus256 · April 18, 2003, 9:50am

Ok, so 1.0*a = a should hold true in most cases (phew!). I think many algorithms would screw up otherwise…

Originally posted by ehart:
First, I have no comments on actual HW implementations, nor do I even know exactly how a multiply is implemented.

I don’t know how it’s done in gfx hw either. It’s always a tradeoff. A division is out of the question. A multiply by 1/255 (represented in some suitable finite form) might be manageable (since it’s a constant, you can usually do pretty decent HW optimizations). Actually, 65536/255 = 257.00392…, which is 100000001.00000001… ~= 100000001 in binary. Multiplying by 100000001 only requires one adder in hardware.

You can probably do it as a LUT too, but for a heavily pipelined and parallelized architecture such as a GL chip there would have to be a huge load of such LUTs, taking up too much silicon, I guess.
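
The "one adder" remark can be checked with a few lines of Python. This only illustrates the arithmetic (257 = 2^8 + 1), and `times_257` is a made-up name:

```python
def times_257(x):
    """Multiply by 257 using one shift and one add: 257 = 2**8 + 1."""
    return (x << 8) + x


print(bin(257))  # prints 0b100000001

# The shift-and-add form agrees with a real multiply for every 16-bit input.
print(all(times_257(x) == x * 257 for x in range(1 << 16)))  # prints True
```

In hardware the shift is just wiring, so the only real cost is the addition.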

imported_jwatte · April 18, 2003, 11:22am

I suggest reading that Blinn article. It really isn’t THAT much harder to hard-wire transistors to multiply/divide/fractionalize by 255.

HS1 · April 18, 2003, 12:06pm

Jwatte, I am interested in reading that article. I would appreciate it if you could point me to it (he wrote a couple of books and had his own column “Blinn’s corner” in the IEEE).

And yes, I looked at the IEEE specification again and x*1.0f = x. I am just cautious when it comes to floating point precision.

To tell the truth, I assumed that GPUs use the (a*b)/256 trick since that was what I used in my software renderer years ago…

imported_jwatte · April 18, 2003, 8:49pm

Sorry, I just remember reading the article somewhere many years back – I don’t have a handy URL to paste.

marcus256 · April 20, 2003, 10:04am

I couldn’t find any on-line Blinn docs using Google (didn’t try that hard though).

Anyway, I think the “trick” of multiplying by 257 is quite promising. Here’s the approximation:

We want:
C = (A*B)/255 (with rounding)

We do (in integer math):
C’ = A*B;
C = (C’ + (C’ >> 8) + 128) >> 8

The latter is correct for 99.96% of all combinations of A and B. The +128 is there for rounding purposes. The required hardware is not even a full 16-bit adder, and I’m sure you can tweak it down to a very limited number of transistors (compared to the 8x8->16 multiplier anyway).

The algo is simple enough to be used in software too, in my opinion.
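
For anyone who wants to try it, here is a Python sketch that runs the approximation against an exactly rounded division for all 256x256 inputs. The function names are mine. `mul255_bias_first` is an extra variant that adds the 128 before taking the >> 8 term; that reordering was not posted in this thread and is included only for comparison:

```python
def mul255_exact(a, b):
    """Reference: (a*b)/255 rounded to nearest, using a real division."""
    return (a * b + 127) // 255


def mul255_approx(a, b):
    """The approximation above: C = (C' + (C' >> 8) + 128) >> 8."""
    c = a * b
    return (c + (c >> 8) + 128) >> 8


def mul255_bias_first(a, b):
    """Comparison variant (not from the thread): add 128 first."""
    t = a * b + 128
    return (t + (t >> 8)) >> 8


pairs = [(a, b) for a in range(256) for b in range(256)]

# Input pairs where the approximation differs from the reference.
mismatches = [(a, b) for a, b in pairs
              if mul255_approx(a, b) != mul255_exact(a, b)]

# Largest absolute error of the approximation over all inputs.
worst = max(abs(mul255_approx(a, b) - mul255_exact(a, b)) for a, b in pairs)

# Largest intermediate sum, to see how wide the adder has to be.
peak = max(a * b + ((a * b) >> 8) + 128 for a, b in pairs)

print(len(mismatches), worst, peak)
print(all(mul255_approx(255, a) == a for a in range(256)))  # prints True
```

In this sketch the approximation is never off by more than one step, 1.0*a = a holds for every a, and the largest intermediate sum is 65407, so it fits in 16 bits. The bias-first variant matches the reference for every pair.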

It would be fun to know how it’s done in actual hardware.

HS1 · April 20, 2003, 10:25am

In addition: if the adder saturates at 2^16 (quite easy to implement in hardware, as well as with MMX, SSE…), it will work in all cases.

Very nice Marcus.