Correct me if I'm wrong. If I were to order the "conditional power" of various fragment shading instruction sets, they'd come out (from best to worst) as:

NV_fragment_program
DX9 PS2.0
ARB_fragment_program

I base this on the NVIDIA extension having extensive predication, implemented through write-masking based on a set of condition codes; PS 2.0 having a very simple version of this; and ARB not having any predication or conditional write-masking.

While you can emulate conditional writes using SGE and multiplication, it seems that a piece of hardware that can "do more" would need a very good optimizer to actually recognize what's going on, and even so, it would be hit or miss whether the instruction sequence could be collapsed into a single conditional write mask.
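
To make the SGE-and-multiply idea concrete, here is a rough scalar sketch in Python rather than shader assembly. The function names (`sge`, `select_arithmetic`, `select_predicated`) are my own and purely illustrative; they are not taken from any of the specs. The first selector builds a 0/1 mask with an SGE-style compare and blends both candidates with multiplies, and the second shows the single conditional write that an optimizer would have to recognize in that arithmetic.

```python
def sge(a, b):
    """Scalar stand-in for an SGE-style compare: 1.0 if a >= b, else 0.0."""
    return 1.0 if a >= b else 0.0


def select_arithmetic(x, threshold, if_ge, if_lt):
    """Pick a value without predication: compare, then blend with multiplies."""
    mask = sge(x, threshold)  # 1.0 or 0.0
    # Both candidates are always evaluated; the mask zeroes out the unwanted one.
    return mask * if_ge + (1.0 - mask) * if_lt


def select_predicated(x, threshold, if_ge, if_lt):
    """The same selection written as one conditional overwrite of the result."""
    result = if_lt
    if x >= threshold:  # plays the role of a condition-code write mask
        result = if_ge
    return result


if __name__ == "__main__":
    for x in (0.2, 0.5, 0.7):
        print(x, select_arithmetic(x, 0.5, 10.0, 20.0),
              select_predicated(x, 0.5, 10.0, 20.0))
```

For finite inputs the two forms agree, but the arithmetic version costs a compare plus several multiply/add operations per selection, and it can differ from a true masked write when the discarded candidate is infinite or NaN (since 0 times infinity is NaN).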

Is this a correct interpretation, or have I over- or under-estimated the power of any of these instruction sets?