I have an NVIDIA GTX 670 with Linux driver 319.49. In my shader I declare an array of integers...
When I call glGetProgramBinary and look at NVIDIA's handy plaintext output, I can see the lmem declaration as follows:
Re-declaring the array as ivec4 myArray and modifying my code to index the array with myArray[i/4][i%4] produces the following line in the binary output:
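To make the index mapping concrete, here is a minimal CPU-side sketch in C of the same [i/4][i%4] scheme. It is only an emulation under assumptions: ivec4_t, packed_get and packed_set are hypothetical names standing in for the GLSL ivec4 array, and the shift/mask form is just an equivalent way of writing the divide and modulo for non-negative indices, not something I have measured on the GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side stand-in for a GLSL ivec4: four packed 32-bit ints. */
typedef struct { int v[4]; } ivec4_t;

/* Read logical element i from an array of count_vec4 vectors.
   For an unsigned index, i >> 2 equals i / 4 and i & 3 equals i % 4. */
static int packed_get(const ivec4_t *arr, size_t count_vec4, size_t i)
{
    assert(i < count_vec4 * 4);   /* stay inside the packed storage */
    return arr[i >> 2].v[i & 3];
}

/* Write logical element i using the same vector/component split. */
static void packed_set(ivec4_t *arr, size_t count_vec4, size_t i, int value)
{
    assert(i < count_vec4 * 4);
    arr[i >> 2].v[i & 3] = value;
}
```

With three vectors, logical index 5 lands in vector 1, component 1, and index 11 lands in vector 2, component 3.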
Also, and this is the really cool part, my program doesn't crash on the first draw call using this shader.
Has anyone else seen this annoying behaviour? Why would padding an int array to 16 bytes per element help? I can understand padding user-defined structs to align accesses for performance, but using four times the memory seems like overkill.
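To put numbers on the overhead, here is a small sketch in C of the arithmetic, assuming each int really occupies a full 16-byte slot when padded and that the packed form stores four ints per 16-byte vector. The helper names and the element counts used with them are hypothetical and are not taken from my shader.

```c
#include <stddef.h>

/* Bytes used when every int element is padded out to its own 16-byte slot. */
static size_t padded_bytes(size_t n_ints)
{
    return n_ints * 16;
}

/* Bytes used when ints are packed four per 16-byte vector,
   rounding up to a whole number of vectors. */
static size_t packed_bytes(size_t n_ints)
{
    return ((n_ints + 3) / 4) * 16;
}
```

For 1024 ints this comes to 16384 bytes padded versus 4096 bytes packed, which is the factor of four I am complaining about.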
Finally, I can't imagine [i/4][i%4] gives great performance. Is there a nicer way to have a linear integer array I can access with a single index AND not run out of memory due to padding?