
Mat4x3 Uniform Registers?



xerzi
08-10-2014, 01:03 AM
I remember reading that a mat4x3 takes one more register than a mat3x4 because it is stored as 4 columns of vec3, but I can't find anything to confirm this anymore. Has this changed in the spec at all? Does the compiler automatically store it in 3 registers as well, or do I have to use mat3x4 instead?

Also:




// I believe these two do the same thing...

mat3x4 a;

result = transpose(a) * vec4(somevalue, 1); // better as maintains "order"
result = vec4(somevalue, 1) * a; // similar performance as above?

Dark Photon
08-10-2014, 03:50 PM
I remember reading that a mat4x3 takes one more register than a mat3x4 because it is stored as 4 columns of vec3, but I can't find anything to confirm this anymore. Has this changed in the spec at all? Does the compiler automatically store it in 3 registers as well, or do I have to use mat3x4 instead?

I just tried it by cross-compiling GLSL to NV assembly (gp5vp profile), and what you say appears to be the case: mat4x3 takes 4 uniform slots and mat3x4 takes 3. That makes intuitive sense. GLSL is column-major by default, so the slot count follows the number of column vectors.
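To make the column counting concrete, here is a minimal sketch of the two declarations. The uniform and attribute names are made up for illustration, and the slot counts in the comments are the ones reported above for the NV gp5vp profile; other compilers may pack differently.

```glsl
#version 330 core

// GLSL names matrices as matCxR: C columns, each a vector of R components.
// With the default column-major layout, each column occupies its own slot.
uniform mat4x3 xformCols4; // hypothetical name: 4 columns of vec3 (4 slots in the test above)
uniform mat3x4 xformCols3; // hypothetical name: 3 columns of vec4 (3 slots in the test above)

in vec3 position;          // hypothetical vertex attribute

void main()
{
    // Indexing a matrix yields one column.
    vec3 lastColumnA = xformCols4[3]; // fourth column, a vec3
    vec4 lastColumnB = xformCols3[2]; // third column, a vec4

    // Use both values so neither uniform is optimized away.
    gl_Position = vec4(position + lastColumnA, 1.0) + lastColumnB;
}
```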

In a test I just did, passing in a mat4x3, postmultiplying it by a vec4 directly, and outputting the vec3 from the shader takes 8 instructions (with 2 R-regs). However, if I pass in a mat3x4, transpose it, and postmultiply that by the vec4 to output a vec3, I get 19 instructions (with 3 R-regs), mostly extra moves. So the penalty is 11 instructions and 1 R-reg to save one uniform slot while keeping the v2 = A*v multiplication order.

This also talks about it:

* http://www.horde3d.org/forums/viewtopic.php?f=8&t=1537

You can try using the row_major layout qualifier (it applies to matrices declared inside a uniform block). Then ideally the slot count should follow the number of rows instead.
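A minimal sketch of the row_major route, assuming a std140 uniform block; the block, member, and attribute names are hypothetical. Under std140 rules a row-major mat4x3 is laid out as 3 rows of 4 components. Whether the compiler then uses only 3 registers is up to the driver, so check the generated assembly.

```glsl
#version 330 core

// row_major changes only how the matrix is stored in the buffer;
// the math in the shader is still written as A * v.
layout(std140, row_major) uniform Transforms // hypothetical block name
{
    mat4x3 model; // stored as 3 rows of vec4 under std140
};

in vec3 position; // hypothetical vertex attribute

void main()
{
    vec3 world = model * vec4(position, 1.0); // mat4x3 * vec4 yields vec3
    gl_Position = vec4(world, 1.0);
}
```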




// I believe these two do the same thing...

mat3x4 a;

result = transpose(a) * vec4(somevalue, 1); // better as maintains "order"
result = vec4(somevalue, 1) * a; // similar performance as above?


Yep. The latter costs 8 instructions (with 2 R-regs), so that's definitely one way to go. As I said, the former consumes considerably more assembly instructions here, so it probably isn't your best bet. But check this on your own setup and see.
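For reference, a minimal sketch of the vector-on-the-left form in a complete vertex shader. The names are hypothetical, and it assumes the mat3x4 holds the three rows of the affine transform as its three vec4 columns.

```glsl
#version 330 core

// Each vec4 column of 'a' holds one row of the 3x4 affine transform,
// so the uniform has 3 columns.
uniform mat3x4 a;

in vec3 somevalue; // hypothetical vertex attribute

void main()
{
    vec4 v = vec4(somevalue, 1.0);

    // vec4 * mat3x4 treats v as a row vector; component i of the result
    // is dot(v, a[i]), the same value transpose(a) * v would produce.
    vec3 result = v * a;

    // Equivalent explicit form:
    // vec3 result = vec3(dot(v, a[0]), dot(v, a[1]), dot(v, a[2]));

    gl_Position = vec4(result, 1.0);
}
```

On the host side, a mat3x4 uniform like this would typically be set with glUniformMatrix3x4fv, passing 12 floats.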