Thread: GLSL pads int arrays to 16 bytes, taking 4 times local memory

  1. #1
    Junior Member Newbie
    Join Date
    Mar 2011
    Location
    Australia
    Posts
    20

    GLSL pads int arrays to 16 bytes, taking 4 times local memory

    I have an NVIDIA GTX 670 with Linux driver 319.49. In my shader I declare an array of integers...

    int myArray[512];

    When I use glGetProgramBinary and have a look at NVIDIA's handy plaintext output, I can see the lmem declaration as follows:

    TEMP lmem[512];

    Re-declaring the array as ivec4 myArray[128] and modifying my code to index the array with myArray[i/4][i%4] produces the following line in the binary output:

    TEMP lmem[128];

    Also, and this is the really cool part, my program doesn't crash on the first draw call using this shader.

    Has anyone else seen this annoying behaviour? Why would padding an int array to 16 bytes per element help? I can understand padding user-defined structs to align accesses for performance, but taking 4 times the memory seems like overkill.

    Finally, I can't imagine [i/4][i%4] gives great performance. Is there a nicer way to have a linear integer array I can access with a single index AND not run out of memory due to padding?
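
    For reference, here is a rough sketch of the two layouts and the footprint each implies. It assumes each TEMP entry in the NVIDIA listing is one 4-component register of 32-bit values (16 bytes), which is what the padding suggests but is not confirmed anywhere. The names myArrayScalar, myArrayPacked and readPacked are placeholders, and a real shader would use one layout or the other, not both.

    ```glsl
    #version 330

    // Scalar layout: one int per 16-byte TEMP slot (assumed from the lmem listing).
    //   512 slots * 16 bytes = 8192 bytes of local memory
    int myArrayScalar[512];

    // Packed layout: four ints per ivec4, one ivec4 per TEMP slot.
    //   128 slots * 16 bytes = 2048 bytes of local memory
    ivec4 myArrayPacked[128];

    // Logical element i lives in vector i / 4, component i % 4 (for i >= 0).
    int readPacked(int i)
    {
        return myArrayPacked[i / 4][i % 4];
    }
    ```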
    Last edited by sleap; 10-21-2013 at 07:15 PM.

  2. #2
    Junior Member Newbie
    Join Date
    Mar 2011
    Location
    Australia
    Posts
    20
    A short update... neither the divide nor the modulus op is cheap. It's quite a bit faster to use bitwise ops:

    myArray[i>>2][i&3]

    Still not optimal, and I'm guessing part of the slowdown is related to the actual memory operation itself. It all depends on the context: is it better to use less memory or to have faster access to it? Keep in mind that using less memory may ultimately be faster.
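
    Wrapping the index math in a pair of helpers keeps the call sites readable. This is only a sketch: loadInt and storeInt are made-up names, and the array is assumed to be a plain global in the shader.

    ```glsl
    #version 330

    ivec4 myArray[128];   // 512 logical ints, four per ivec4

    // i >> 2 matches i / 4 and i & 3 matches i % 4 only when i is non-negative.
    int loadInt(int i)
    {
        return myArray[i >> 2][i & 3];
    }

    void storeInt(int i, int value)
    {
        myArray[i >> 2][i & 3] = value;
    }
    ```

    A function call cannot be assigned to the way the raw subscript expression can, hence the separate store helper.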

    [EDIT]
    I should also mention that this whole ivec4 trick seems to work only half of the time. The other half, I'm pretty sure the indexing is mathematically correct, but I get the wrong results. Maybe a driver bug with indexing vectors? I hate jumping to driver-bug conclusions with GLSL, but they're not exactly uncommon.
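
    If dynamic component indexing really is what goes wrong (that is only a guess), one thing to try is selecting the component with constant swizzles instead. A sketch with hypothetical helper names; it is untested against the driver in question and may well be slower:

    ```glsl
    #version 330

    ivec4 myArray[128];

    int loadIntSwizzle(int i)
    {
        ivec4 v = myArray[i >> 2];
        int c = i & 3;
        // Pick the component with constant swizzles only, no v[c].
        return (c == 0) ? v.x : (c == 1) ? v.y : (c == 2) ? v.z : v.w;
    }

    void storeIntSwizzle(int i, int value)
    {
        int c = i & 3;
        ivec4 v = myArray[i >> 2];   // read-modify-write of the whole vector
        if (c == 0) v.x = value;
        else if (c == 1) v.y = value;
        else if (c == 2) v.z = value;
        else v.w = value;
        myArray[i >> 2] = v;
    }
    ```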
    Last edited by sleap; 10-23-2013 at 01:53 AM.

  3. #3
    Junior Member Newbie
    Join Date
    Mar 2011
    Location
    Australia
    Posts
    20
    Can anyone think of a nice way to tightly pack vec2 data?

    The trouble is indexing elements .xy and .zw. Ideally I'd like to be able to have something like this...

    Code :
    float arrayUnpacked[N];
    #define array(x) arrayUnpacked[x]
     
    vec4 arrayPacked[N/4];
    #define array(x) arrayPacked[(x)>>2][(x)&3]

    but for vec2, not float.

    Code :
    vec2 arrayUnpacked[N];
    #define array(x) arrayUnpacked[x]
     
    vec4 arrayPacked[N/2];
    #define array(x) //something ???
     
    struct Vec2In4 {vec2 data[2];}; //unfortunately vec2 gets aligned to 16 bytes, taking 32 bytes total
    Vec2In4 attempt[N/2];
    #define array(x) attempt[(x)>>1].data[(x)&1]

    As far as I know all the layout() modifiers only affect uniforms/blocks and there's no way to change structs in local memory.
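
    One possible shape for the vec2 case is a load/store pair of functions rather than a single macro, since .xy or .zw has to be chosen at run time. This is a sketch only: N = 256 is an arbitrary stand-in value, and loadVec2/storeVec2 are made-up names.

    ```glsl
    #version 330

    const int N = 256;          // assumed logical element count; must be even
    vec4 arrayPacked[N / 2];    // two vec2 values per vec4

    vec2 loadVec2(int i)
    {
        vec4 v = arrayPacked[i >> 1];
        // Even indices live in .xy, odd indices in .zw.
        return ((i & 1) == 0) ? v.xy : v.zw;
    }

    void storeVec2(int i, vec2 value)
    {
        if ((i & 1) == 0)
            arrayPacked[i >> 1].xy = value;
        else
            arrayPacked[i >> 1].zw = value;
    }
    ```

    The drawback compared with the float macro is that reads and writes need different spellings, because the result of a function or a ?: expression cannot be assigned to.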

  4. #4
    Senior Member OpenGL Pro
    Join Date
    Jan 2012
    Location
    Australia
    Posts
    1,104
    I may be wrong, but I thought all local variables were likely to be vec4 unless the compiler optimizes 4 variables into parts of a vec4. This is because of the nature of the hardware, which is optimized around vec4. GPUs are not yet really general-purpose computers, and so are often sub-optimal when not working on data such as colour (vec4) and vertices (typically vec4 for manipulating).

  5. #5
    Senior Member OpenGL Pro
    Join Date
    Jan 2007
    Posts
    1,136
    I don't know the GLSL syntax for this, but in HLSL it's achievable by using a cast:

    Code :
    float4 Stuff[128];
    static float StuffValues[512] = (float[512]) Stuff;

    I'd expect that you could do similar in GLSL.

    Another option is to just not worry about it. As you're already realising, reducing the memory usage comes at a cost of its own, so it may be an acceptable tradeoff to just take the extra memory usage in exchange for better runtime performance when doing the array indexing. Memory usage isn't everything.

    A third way would be to encode your array in a texture, using GL_R32I. Texture lookups are, of course, slower than ALU on modern hardware so that may or may not work well.
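
    On the shader side, the texture route might look roughly like this. It assumes the application has created a 1D texture with internal format GL_R32I, filled it with the 512 values and bound it to the sampler; arrayTex and fetchInt are made-up names.

    ```glsl
    #version 330

    // Assumed: GL_R32I 1D texture with 512 texels, set up by the application.
    uniform isampler1D arrayTex;

    int fetchInt(int i)
    {
        // texelFetch takes an integer texel coordinate and a mip level; no filtering.
        return texelFetch(arrayTex, i, 0).r;
    }
    ```

    Note that a sampler is read-only from the shader, so this only fits data that is filled in before the draw call, not an array the shader itself writes to.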
