Fragment program GPU cycles

I read in some ATI papers that not all fragment program instructions take the same number of GPU cycles.

Is there a rule of thumb for which instructions should be used carefully because they take more than one GPU cycle?

Specifically, the CMP instruction seems pretty powerful. How many cycles does it take?
And LRP?

SeskaPeel.
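For reference, the ARB_fragment_program specification defines both instructions component-wise: CMP selects b where a < 0 and c otherwise, and LRP computes a*b + (1 - a)*c. The C sketch below only emulates that arithmetic on the CPU and says nothing about cycle counts, which are hardware-specific. It also shows that LRP can be rewritten as a SUB followed by a MAD. Whether a particular GPU executes LRP natively or as such a pair is exactly the kind of detail the vendor performance documents cover. The names vec4, cmp1, cmp, lrp1, lrp1_sub_mad and check_lrp_forms are hypothetical helpers, not part of any API.

```c
#include <assert.h>

/* A four-component register, mirroring an ARB_fragment_program temp. */
typedef struct { float x, y, z, w; } vec4;

/* CMP, one component: pick b where a is negative, otherwise c. */
static float cmp1(float a, float b, float c) { return a < 0.0f ? b : c; }

/* CMP dst, a, b, c; applied component-wise. */
static vec4 cmp(vec4 a, vec4 b, vec4 c) {
    vec4 r = { cmp1(a.x, b.x, c.x), cmp1(a.y, b.y, c.y),
               cmp1(a.z, b.z, c.z), cmp1(a.w, b.w, c.w) };
    return r;
}

/* LRP, one component: a*b + (1 - a)*c. */
static float lrp1(float a, float b, float c) { return a * b + (1.0f - a) * c; }

/* The same blend written as two ALU instructions:
 *   SUB t, b, c;
 *   MAD r, a, t, c;   which gives a*(b - c) + c */
static float lrp1_sub_mad(float a, float b, float c) {
    float t = b - c;
    return a * t + c;
}

/* Asserts that both LRP forms agree within a small absolute tolerance
 * (they may differ in the last bits because of rounding; intended for
 * inputs of modest magnitude, such as colours in [0, 1]). */
static void check_lrp_forms(float a, float b, float c) {
    float d = lrp1(a, b, c) - lrp1_sub_mad(a, b, c);
    if (d < 0.0f) d = -d;
    assert(d < 1e-5f);
}
```

Note that a component of exactly 0.0 is not negative, so CMP returns c for it.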

ATI’s website has an OpenGL SDK that includes a document on performance. It covers what you need to know.

Well, actually I was talking about this paper.

I was wondering whether the same applies to NVIDIA cards, and whether it will stay the same on future cards.

SeskaPeel.

This will probably give you some information: http://www.3dcenter.org/artikel/cinefx/index3_e.php

“2d texture read, cycles : 1, Function can be performed twice per clock”
“2D and cubemap texture accesses are special cases as two of them can be executed per pass”

Will it be optimized this way even if the TEX instructions are not consecutive?

TEX stuff ;
MAD stuff ;
TEX stuff ;

Will this be optimized, or do I have to write

TEX stuff ;
TEX stuff ;
MAD stuff ;

for the two TEX instructions to execute within the same clock cycle?

SeskaPeel.
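Whether the pairing requires the two TEX instructions to be adjacent depends on whether the driver or hardware reschedules instructions. The C sketch below is a toy model, entirely an assumption and not a description of NV3x or R3xx hardware, that contrasts two hypotheses: strict in-order issue, where only adjacent 2D/cube fetches can share a clock, and a scheduler that is free to group independent fetches. Under the first hypothesis TEX/MAD/TEX costs 3 clocks versus 2 for TEX/TEX/MAD; under the second, both cost 2. The names op_class, clocks_in_order and clocks_reordered are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes for this toy model. */
typedef enum { OP_TEX2D, OP_ALU } op_class;

/* Counts clocks under a deliberately simple, ASSUMED model:
 * - instructions issue strictly in program order (no reordering);
 * - every ALU instruction takes one clock;
 * - a 2D/cube TEX takes one clock, but two TEX instructions that are
 *   adjacent in the program can share a single clock.
 * Dependencies, latency hiding and real scheduler behaviour are ignored. */
static unsigned clocks_in_order(const op_class *prog, size_t n) {
    assert(prog != NULL || n == 0);
    unsigned clocks = 0;
    size_t i = 0;
    while (i < n) {
        if (prog[i] == OP_TEX2D && i + 1 < n && prog[i + 1] == OP_TEX2D) {
            i += 2;  /* two adjacent fetches share one clock */
        } else {
            i += 1;  /* anything else issues alone */
        }
        clocks++;
    }
    return clocks;
}

/* Same costs, but ASSUMING the driver may move TEX instructions next to
 * each other (i.e. no TEX depends on an intervening ALU result). */
static unsigned clocks_reordered(const op_class *prog, size_t n) {
    assert(prog != NULL || n == 0);
    unsigned tex = 0, alu = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i] == OP_TEX2D) tex++; else alu++;
    }
    return alu + (tex + 1) / 2;  /* fetches paired two per clock */
}
```

The NV34 measurement reported in the reply, which found no difference between the two orderings, would be consistent with the second hypothesis on that card, but the model cannot say how other chips behave.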

>TEX stuff ;
>MAD stuff ;
>TEX stuff ;

It can depend on the VPU/memory speed, even within one product line (NV30/31/34/35, and R3xx as well).

I tested this situation on an NV34 only and didn’t find any difference when changing the order of the ALU and texture instructions. But that only holds for this card…