I am trying to write a shader for the Radeon 9700 using glslang, but I keep running into a problem: the shader falls back to software because it exceeds the number of available ALU instructions. What are some ways of breaking it up into smaller pieces? If you create a couple of modules, compile them separately, then link them and call their functions from main, does that let you have a slightly bigger program?

I would go ahead and write this as a couple of passes, but I have no place to store the intermediate values, and I would hate to go to the trouble of setting up another float pbuffer just to hold them. I just spent a lot of time getting a double-buffered pbuffer working so I could avoid the speed hit of a context switch, and I would really like to avoid adding another buffer. I suppose a pbuffer with three or more buffers might work, but I have never seen any code that actually does that.
In any case, any pointers would be highly appreciated.
Yeah, I looked at Ashli. The problem is that I am reading from and writing to float pbuffers (doing numerical rather than graphical calculations), and I didn’t see any good way to tell Ashli that. When I tried to plug my fragment program in anyway, it crashed, hard. So I am back to looking at about 7-8 passes, using a lot of auxiliary buffers to hold intermediate results. Yuck.