Complete the extended-arithmetic support in GLSL

I will only talk about addition; subtraction works exactly the same way.

Currently we only have uaddCarry, which produces a carry-out but does not consume a carry-in.
uaddCarry can only serve as the first stage of extended-precision arithmetic. For the next stages we need a primitive that consumes the carry from the previous stage.
Of course we can simulate it with a few extra operations, but that is slower. uaddCarry can also be simulated (with even fewer operations), yet we still have it as a built-in.
Please add such a function to GLSL, something like uint uaddExtended(uint x, uint y, uint carry_in, out uint carry_out).
Every GPU and CPU architecture has native instructions for such operations (with the notable exception of MIPS, which doesn't even have an analogue of uaddCarry).
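To make the simulation concrete, here is a minimal sketch in C, assuming uint32_t stands in for GLSL uint and using a small emulation of uaddCarry. It builds an add-with-carry-in from uaddCarry alone: two uaddCarry calls plus one extra operation to merge the carries. The names uadd_carry and uadd_extended_sim are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates GLSL uaddCarry: returns x + y modulo 2^32 and sets
   *carry to 1 if the addition wrapped, otherwise 0. */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t res = x + y;
    *carry = (res < x) ? 1u : 0u;
    return res;
}

/* Hypothetical add-with-carry-in simulated with uaddCarry only.
   If x + y wraps, the partial sum is at most 0xFFFFFFFE, so adding
   carry_in (0 or 1) cannot wrap again: at most one of c1, c2 is 1,
   and OR merges them correctly. */
static uint32_t uadd_extended_sim(uint32_t x, uint32_t y, uint32_t carry_in,
                                  uint32_t *carry_out)
{
    uint32_t c1, c2;
    assert(carry_in <= 1u);
    uint32_t res = uadd_carry(x, y, &c1);
    res = uadd_carry(res, carry_in, &c2);
    *carry_out = c1 | c2;
    return res;
}
```

If uaddCarry maps to one instruction, this is still three operations where a native add-with-carry would need one.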

I cannot understand why only uaddCarry was added and not the complete set of extended-arithmetic primitives. What are we supposed to do with uaddCarry alone?

It may be better for the carry_in/carry_out bits to be implicit/hidden rather than explicit function arguments, because that is closer to the actual hardware.
Otherwise the compiler may have a hard time moving the hidden hardware carry into a visible register while still keeping the code as fast as possible.

About the name: the ‘u’ in the function name is superfluous. The arguments are logically neither signed nor unsigned by themselves; they are parts of extended operands that span multiple registers/variables. And if you do want to view them as unsigned, the “u” is still superfluous, because that is already implied by the “Carry” part: iaddCarry or saddCarry would not make any sense.

Well, you can already do 64-bit additions with uaddCarry:

resultLo = uaddCarry(op1Lo, op2Lo, carry);
resultHi = op1Hi + op2Hi + carry;

Do you really think this is so expensive?
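For reference, the 64-bit addition above as a complete C sketch. It assumes uint32_t in place of GLSL uint; uadd_carry emulates uaddCarry and add64 is a hypothetical helper name. The 64-bit operands are passed as lo/hi word pairs.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates GLSL uaddCarry (sum modulo 2^32, carry is 0 or 1). */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t res = x + y;
    *carry = (res < x) ? 1u : 0u;
    return res;
}

/* 64-bit addition from 32-bit halves: the low words go through
   uaddCarry, the high words absorb the carry with plain additions.
   The carry out of the high word is discarded (wraps modulo 2^64). */
static void add64(uint32_t op1Lo, uint32_t op1Hi,
                  uint32_t op2Lo, uint32_t op2Hi,
                  uint32_t *resultLo, uint32_t *resultHi)
{
    uint32_t carry;
    *resultLo = uadd_carry(op1Lo, op2Lo, &carry);
    *resultHi = op1Hi + op2Hi + carry;
}
```

For example, adding 1 to the pair (lo = 0xFFFFFFFF, hi = 0) gives lo = 0, hi = 1.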

Yes, maybe there should be a function that takes a carry value as input. I suppose current hardware would most likely still perform two additions to implement it, but future hardware might be able to avoid that extra addition.

So, to sum up: I agree with you that a new function that adds with a carry bit previously obtained from uaddCarry would make sense, but I don’t understand your argument that doing this without such a function would be expensive. It’s just one more ALU operation.

Bear with me, please.
The full set is this (apart from uaddCarry, the names are just placeholders):

uint uaddCarry(uint x, uint y, out uint carry_out)
uint uaddExtendedCarry(uint x, uint y, uint carry_in, out uint carry_out)
uint uaddExtended(uint x, uint y, uint carry_in)

uaddCarry is the first stage, uaddExtendedCarry covers the middle stages (if any), and uaddExtended is the last stage.
Each stage processes 32 bits, so for 64-bit arithmetic we don’t need uaddExtendedCarry.

uaddCarry can be implemented, for example, with these 2 operations (I assume the selection (cond ? a : b) counts as a basic operation):

res = x + y;
carry_out = res < x ? 1u : 0u;
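A C transcription of these 2 operations, assuming uint32_t behaves like GLSL uint (both wrap modulo 2^32), next to a hypothetical reference version that uses a 64-bit intermediate so the two can be compared. The names uadd_carry and uadd_carry_ref are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* 2-operation first stage: the wrapped sum is smaller than x
   exactly when the addition produced a carry. */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry_out)
{
    uint32_t res = x + y;               /* op 1 */
    *carry_out = (res < x) ? 1u : 0u;   /* op 2 */
    return res;
}

/* Reference: compute the sum in 64 bits; bit 32 is the carry. */
static uint32_t uadd_carry_ref(uint32_t x, uint32_t y, uint32_t *carry_out)
{
    uint64_t wide = (uint64_t)x + y;
    *carry_out = (uint32_t)(wide >> 32);
    return (uint32_t)wide;
}
```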

uaddExtendedCarry can be implemented with 4 operations (try to find an algorithm with fewer than 4; I was unable to find one, though I can’t prove it’s impossible):

res = x + y;
carry_out = res < x ? 1u : 0u;
res = res + carry_in;
carry_out = res < carry_in ? 1u : carry_out;
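The same 4 operations as a C sketch, assuming uint32_t in place of GLSL uint and a carry_in of 0 or 1. The C name uadd_extended_carry is a hypothetical stand-in for the placeholder uaddExtendedCarry. The comments spell out why the final selection never loses a carry.

```c
#include <assert.h>
#include <stdint.h>

/* 4-operation middle stage: consumes carry_in, produces carry_out. */
static uint32_t uadd_extended_carry(uint32_t x, uint32_t y, uint32_t carry_in,
                                    uint32_t *carry_out)
{
    assert(carry_in <= 1u);
    uint32_t res = x + y;                /* op 1 */
    uint32_t c = (res < x) ? 1u : 0u;    /* op 2: carry from x + y */
    res = res + carry_in;                /* op 3 */
    /* op 4: adding carry_in wraps only if res was 0xFFFFFFFF, which a
       wrapped x + y can never produce, so at most one carry is set. */
    *carry_out = (res < carry_in) ? 1u : c;
    return res;
}
```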

uaddExtended can be implemented with 2 operations:

res = x + y;
res = res + carry_in;
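In C, again assuming uint32_t in place of GLSL uint and a hypothetical C name for the placeholder uaddExtended, the last stage is just the two additions; any carry out of the most significant word is discarded, as in the op1Hi + op2Hi + carry line of the 64-bit example.

```c
#include <assert.h>
#include <stdint.h>

/* 2-operation last stage: no carry_out is produced. */
static uint32_t uadd_extended(uint32_t x, uint32_t y, uint32_t carry_in)
{
    assert(carry_in <= 1u);
    uint32_t res = x + y;    /* op 1 */
    return res + carry_in;   /* op 2 */
}
```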

So if we only care about 64-bit arithmetic, we can ignore uaddExtendedCarry. But uaddExtended still takes 2 operations, the same as uaddCarry. So I ask: why do we have a special function for uaddCarry but not for uaddExtended?
I mean, if, as you say, it is no big deal to do uaddExtended by hand with 2 operations, why not do the same with uaddCarry and not bother adding a special built-in function at all? Do you follow my logic now?
So I say: if we have uaddCarry, we had better have the complete set as well.

If we care about the >64-bit cases, then we need uaddExtendedCarry, which is an even bigger win to have as a function that maps to a single hardware instruction (4 operations by hand versus one).
Arithmetic wider than 64 bits could be useful, e.g. for cryptography on the GPU.
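As an illustration of a wider-than-64-bit case, here is a minimal C sketch of a 128-bit addition over four 32-bit words, chaining the first stage, two middle stages and the last stage described earlier in the thread. It assumes uint32_t in place of GLSL uint and least-significant-word-first order; add128 and the three C stage names are hypothetical. With native add-with-carry support, each stage call could become a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* First stage (2 operations). */
static uint32_t uadd_carry(uint32_t x, uint32_t y, uint32_t *carry_out)
{
    uint32_t res = x + y;
    *carry_out = (res < x) ? 1u : 0u;
    return res;
}

/* Middle stage (4 operations); carry_in is 0 or 1. */
static uint32_t uadd_extended_carry(uint32_t x, uint32_t y, uint32_t carry_in,
                                    uint32_t *carry_out)
{
    uint32_t res = x + y;
    uint32_t c = (res < x) ? 1u : 0u;
    res = res + carry_in;
    *carry_out = (res < carry_in) ? 1u : c;
    return res;
}

/* Last stage (2 operations); final carry is discarded. */
static uint32_t uadd_extended(uint32_t x, uint32_t y, uint32_t carry_in)
{
    return x + y + carry_in;
}

/* r = a + b modulo 2^128; word 0 is the least significant. */
static void add128(const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
    uint32_t carry;
    r[0] = uadd_carry(a[0], b[0], &carry);
    r[1] = uadd_extended_carry(a[1], b[1], carry, &carry);
    r[2] = uadd_extended_carry(a[2], b[2], carry, &carry);
    r[3] = uadd_extended(a[3], b[3], carry);
}
```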

Okay, but do not use a uint for the carry bit.
Use a bool or something similar.
(Not an object, but the primitive type; I think it gives better performance.)
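A minimal C sketch of the middle stage with the carry passed as a bool instead of a uint, as suggested here, assuming uint32_t in place of GLSL uint. The name uadd_extended_carry_b is hypothetical, and whether a bool carry is actually faster depends on the compiler and hardware; the sketch only illustrates the interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Middle stage with bool carries. Adding carry_in wraps only when
   the partial sum was 0xFFFFFFFF, i.e. when the result becomes 0. */
static uint32_t uadd_extended_carry_b(uint32_t x, uint32_t y, bool carry_in,
                                      bool *carry_out)
{
    uint32_t res = x + y;
    bool c = res < x;              /* carry from x + y */
    res += carry_in ? 1u : 0u;
    *carry_out = c || (carry_in && res == 0u);
    return res;
}
```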