I am trying to write a shader for the Radeon 9700 using glslang, but I keep running into the same problem: the shader falls back to software because it exceeds the number of available ALU instructions. What are some ways of breaking it up into smaller pieces? If I create a couple of modules, compile them separately, link them, and call their functions from main, does that allow a slightly bigger program?

I could write this as a couple of passes, but I have no place to store the intermediate values, and I would hate to go to all the trouble of initializing another float pbuffer to hold them. I just spent a lot of time getting a double-buffered pbuffer working so I could avoid the speed hit of a context switch, and I would really like to avoid using another buffer. I suppose a pbuffer with three or more buffers might work, but I have never seen any code that actually does that.
In any case, any pointers would be highly appreciated.