GLSL pads int arrays to 16 bytes, taking 4 times local memory

I have an NVIDIA GTX 670, Linux driver 319.49. In my shader I declare an array of integers…

int myArray[512];

When I use glGetProgramBinary and have a look at NVIDIA’s handy plaintext output, I can see the lmem declaration as follows:

TEMP lmem[512];

Re-declaring the array as ivec4 myArray[128] and modifying my code to index the array with myArray[i/4][i%4] produces the following line in the binary output:

TEMP lmem[128];
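For reference, the repacked declaration and access pattern look like this (with i as the linear index):

ivec4 myArray[128];                 // same 512 ints, but now four per 16-byte element
int value = myArray[i / 4][i % 4];  // element i of the logical int[512]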

Also, and this is the really cool part, my program doesn’t crash on the first draw call using this shader.

Has anyone else seen this annoying behaviour? Why would padding an int array to 16 bytes per element help? I can understand padding user-defined structs to align accesses for performance, but taking 4 times the memory seems a little overkill.

Finally, I can’t imagine [i/4][i%4] gives great performance. Is there a nicer way to have a linear integer array I can access with a single index AND not run out of memory due to padding?

A short update… neither the divide nor the modulus op is cheap. It’s quite a bit faster to use bitwise ops:

myArray[i>>2][i&3]

Still not optimal, and I’m guessing part of the remaining slowdown is the actual memory operation itself. It all depends on the context: is it better to use less memory, or to have faster access to it? Keep in mind that using less memory may ultimately be faster anyway.
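Side by side, as macros (ARRAY_DIV and ARRAY_BITS are names I’ve made up here; the two agree for non-negative i):

// division/modulus version: noticeably slower on this driver
#define ARRAY_DIV(i)  myArray[(i) / 4][(i) % 4]

// bitwise version: same element for non-negative i, quite a bit faster
#define ARRAY_BITS(i) myArray[(i) >> 2][(i) & 3]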

[EDIT]
I should also mention that this whole ivec4 trick only seems to work half of the time. The other half, I’m fairly sure the indexing is mathematically correct, but I get the wrong results. Maybe a driver bug with indexing vectors? I hate jumping to driver-bug conclusions with GLSL, but they’re not exactly uncommon.

Can anyone think of a nice way to tightly pack vec2 data?

The trouble is indexing elements .xy and .zw. Ideally I’d like to be able to have something like this…


float arrayUnpacked[N];   // each element padded to a full 16-byte TEMP
#define array(x) arrayUnpacked[x]

vec4 arrayPacked[N/4];    // tightly packed: four floats per 16-byte element
#define array(x) arrayPacked[(x)>>2][(x)&3]

but for vec2, not float.


vec2 arrayUnpacked[N];
#define array(x) arrayUnpacked[x]

vec4 arrayPacked[N/2];
#define array(x) //something ???

struct Vec2In4 { vec2 data[2]; }; // unfortunately each vec2 gets aligned to 16 bytes, taking 32 bytes total
Vec2In4 attempt[N/2];
#define array(x) attempt[(x)>>1].data[(x)&1]

As far as I know all the layout() modifiers only affect uniforms/blocks and there’s no way to change structs in local memory.
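One direction that might work is a pair of functions rather than a macro, since a macro can’t select the .xy/.zw half as an lvalue. A rough, untested sketch (readArray/writeArray are placeholder names, N assumed to be a compile-time constant, and the write branch may cost more than the padding saves):

vec4 arrayPacked[N/2];

// read element x: even indices live in .xy, odd indices in .zw
vec2 readArray(int x) {
    vec4 v = arrayPacked[x >> 1];
    return ((x & 1) == 0) ? v.xy : v.zw;
}

// write element x without touching its neighbour in the same vec4
void writeArray(int x, vec2 v) {
    if ((x & 1) == 0) arrayPacked[x >> 1].xy = v;
    else              arrayPacked[x >> 1].zw = v;
}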

I may be wrong, but I thought all local variables were likely to be vec4-sized unless the compiler optimizes four scalar variables into the components of one vec4. This is because of the nature of the hardware, which is optimized around vec4. GPUs are not yet really general-purpose computers, and so are often sub-optimal when not working on data such as colour (vec4) and vertices (typically vec4 for manipulation).

I don’t know the GLSL syntax for this, but in HLSL it’s achievable by using a cast:

float4 Stuff[128];
static float StuffValues[512] = (float[512]) Stuff; // reinterpret the float4 array as 512 scalars

I’d expect that you could do similar in GLSL.

Another option is to just not worry about it. As you’re already realising, reducing the memory usage comes at a cost of its own, so it may be an acceptable tradeoff to accept the extra memory usage in exchange for better runtime performance when doing the array indexing. Memory usage isn’t everything.

A third way would be to encode your array in a texture, using a GL_R32I internal format. Texture lookups are, of course, slower than ALU operations on modern hardware, so that may or may not work out well.
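For what it’s worth, a minimal read-only sketch of that approach with a buffer texture (dataBuf is a hypothetical name; requires GL 3.1 or ARB_texture_buffer_object, which your GTX 670 has):

uniform isamplerBuffer dataBuf;   // buffer texture backed by GL_R32I data

int readArray(int i) {
    return texelFetch(dataBuf, i).r;  // unfiltered fetch of the i-th int
}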
