Thread: Mat4x3 Uniform Registers?

  1. #1
    Intern Newbie (joined Jun 2011, 36 posts)

    I remember reading that mat4x3 takes one more register than mat3x4 because it is stored as 4 columns of vec3. However, I can't find anything that confirms this anymore. Has this changed in the spec at all? Is a mat4x3 automatically stored in 3 registers as well, or do I have to use mat3x4 instead?

    Also:

    Code :
     
    // I believe these two do the same thing...
     
    mat3x4 a;
     
    result = transpose(a) * vec4(somevalue, 1); // better as maintains "order"
    result = vec4(somevalue, 1) * a; // similar performance as above?

  2. #2
    Dark Photon, Senior Member, OpenGL Guru (joined Oct 2004, location Druidia, 3,220 posts)
    Originally posted by xerzi:
    "I remember reading that mat4x3 takes one more register than mat3x4 because it is stored as 4 columns of vec3. However, I can't find anything that confirms this anymore. Has this changed in the spec at all? Is a mat4x3 automatically stored in 3 registers as well, or do I have to use mat3x4 instead?"

    Just tried it, cross-compiling GLSL to NV assembly (gp5vp profile), and it looks like what you say is the case. mat4x3 takes 4 uniform slots. mat3x4 takes 3. Which makes intuitive sense. GLSL is column major by default, so it's all about the number of column vectors.

    In a test I just did, passing in a mat4x3, postmultiplying it by a vec4 directly, and outputting the resulting vec3 from the shader takes 8 instructions (with 2 R-regs). However, if I pass in a mat3x4, transpose it, and postmultiply that by the vec4 to output a vec3, I get 19 instructions (with 3 R-regs), including lots of extra moves. So the penalty is 11 instructions and 1 R-reg to save one uniform slot while keeping the v2 = A*v multiplication order.
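
    For reference, the two variants compared above might look roughly like the vertex shader below. This is a minimal sketch, not the exact test shader: the names in_position, u_xform, v_result and the USE_TRANSPOSED_3X4 switch are made up for illustration, and the instruction counts quoted above are specific to that compiler and profile.

```glsl
#version 330 core
// Sketch only: all identifiers are hypothetical; only the matrix types
// and the two multiplication forms come from the discussion above.
#define USE_TRANSPOSED_3X4 0

layout(location = 0) in vec3 in_position;
out vec3 v_result;

#if USE_TRANSPOSED_3X4
// 3 columns of vec4: 3 uniform slots in the test above.
uniform mat3x4 u_xform;
#else
// 4 columns of vec3: 4 uniform slots in the test above.
uniform mat4x3 u_xform;
#endif

void main()
{
    vec4 v = vec4(in_position, 1.0);
#if USE_TRANSPOSED_3X4
    // transpose(mat3x4) yields a mat4x3, so the A * v order is kept,
    // at the cost of the extra moves seen in the generated assembly.
    v_result = transpose(u_xform) * v;
#else
    // mat4x3 * vec4 gives a vec3 directly.
    v_result = u_xform * v;
#endif
    gl_Position = vec4(v_result, 1.0);
}
```

    Note that for both paths to produce the same result, the mat3x4 uniform has to hold the transpose of the data you would upload for the mat4x3.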

    This also talks about it:

    * http://www.horde3d.org/forums/viewtopic.php?f=8&t=1537

    You can also try the row_major layout qualifier. Then, ideally, the slot count should depend on the number of rows rather than the number of columns.
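
    A minimal sketch of the row_major idea, assuming the matrix lives in a std140 uniform block (as far as I know, row_major only affects matrices inside uniform blocks, not plain uniforms in the default block). The block name XformBlock and the other identifiers are hypothetical. Under std140 rules a row_major mat4x3 is laid out as 3 rows of vec4 (48 bytes), versus 4 padded vec3 columns (64 bytes) for the column_major default. Whether the compiler actually uses fewer registers for it is up to the implementation, so check the generated assembly.

```glsl
#version 330 core
// Sketch only: identifiers are hypothetical.
layout(location = 0) in vec3 in_position;
out vec3 v_result;

// row_major changes only the memory layout of the matrix in the buffer.
// Shader code still sees a mat4x3 (4 columns, 3 rows) and uses it as usual.
layout(std140, row_major) uniform XformBlock
{
    mat4x3 u_xform;   // std140 row_major: 3 rows of vec4 = 48 bytes
};

void main()
{
    // Same A * v order as with a plain mat4x3 uniform.
    v_result = u_xform * vec4(in_position, 1.0);
    gl_Position = vec4(v_result, 1.0);
}
```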

    Code glsl:
    // I believe these two do the same thing...
     
    mat3x4 a;
     
    result = transpose(a) * vec4(somevalue, 1); // better as maintains "order"
    result = vec4(somevalue, 1) * a; // similar performance as above?
    Yep. The cost of the latter is 8 instructions (with 2 R-regs), so that's definitely one way to go. As I said, the former takes considerably more assembly instructions here, so it probably isn't your best bet. But check this on your own setup and see.
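
    To see why the vec4 * mat3x4 form can be cheap, here is a small sketch with hypothetical identifiers. Multiplying with the vector on the left treats it as a row vector, so each output component is just a dot product of the vec4 with one column of the mat3x4, and no transpose has to be materialized. That is presumably why it compiles to so few instructions, but I haven't verified this beyond the counts above.

```glsl
#version 330 core
// Sketch only: identifiers are hypothetical.
layout(location = 0) in vec3 in_position;
out vec3 v_result;

uniform mat3x4 u_xform;   // 3 columns of vec4

void main()
{
    vec4 v = vec4(in_position, 1.0);

    // v * m treats v as a row vector: component i is dot(v, m[i]),
    // where m[i] is column i of the mat3x4 (a vec4).
    v_result = v * u_xform;

    // Equivalent expansion, written out by hand:
    // v_result = vec3(dot(v, u_xform[0]),
    //                 dot(v, u_xform[1]),
    //                 dot(v, u_xform[2]));

    gl_Position = vec4(v_result, 1.0);
}
```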
    Last edited by Dark Photon; 08-10-2014 at 04:16 PM.
