Vertex shaders -> varying fragment inputs

There’s a computation that I’ve been trying to figure out how to perform on graphics cards. The problem is that we need to avoid texture lookups because of their expense: our code needs to be really fast (it’s not graphics related).

The issue is that I have a reasonably large amount of data that changes too much to be passed as uniform inputs. I can’t send arbitrary varying parameters to a fragment shader: as I understand it, the only varying inputs a fragment shader receives are values interpolated from the vertex shader outputs.
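As a rough illustration of what "interpolated" means here, the sketch below mimics the linear screen-space interpolation a rasterizer applies to a varying inside a triangle. It is a simplification (it ignores perspective correction), and the function names are hypothetical. A fragment whose pixel center sits exactly on a vertex receives that vertex's value unchanged; anywhere else it receives a weighted blend of the three vertex values.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_varying(p, verts, values):
    """Linearly interpolate per-vertex values at pixel center p (no perspective correction)."""
    wa, wb, wc = barycentric(p, *verts)
    va, vb, vc = values
    return wa * va + wb * vb + wc * vc


tri = ((0.5, 0.5), (10.5, 0.5), (0.5, 10.5))
vals = (1.0, 5.0, 9.0)
print(interpolate_varying((0.5, 0.5), tri, vals))  # on a vertex: that vertex's value
print(interpolate_varying((5.5, 0.5), tri, vals))  # between vertices: a blend
```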

My idea is this: is it feasible to run both a vertex shader and a pixel (fragment) shader at the same time such that there is one vertex exactly aligned with each pixel? I.e., have a one-to-one correspondence between vertices and fragments. I know I can pass arbitrary arrays of data into vertex shaders (for example, via cgGLSetParameterPointer), and that the vertex shader can then write that data to outputs (such as texture coordinates) that are available to the fragment shader.
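One way to get the one-vertex-per-pixel layout is to draw GL_POINTS of size 1, one point per pixel, positioned at the pixel centers; all fragments of a point receive that vertex's outputs without blending. The sketch below computes the normalized device coordinates for those points. It assumes identity modelview and projection matrices, a viewport set with glViewport(0, 0, width, height), and OpenGL's bottom-left window origin; the helper names are hypothetical.

```python
def pixel_center_ndc(ix, iy, width, height):
    """NDC position that the viewport transform maps to the center of pixel (ix, iy)."""
    x = 2.0 * (ix + 0.5) / width - 1.0
    y = 2.0 * (iy + 0.5) / height - 1.0
    return x, y


def ndc_to_window(x, y, width, height):
    """Viewport transform for glViewport(0, 0, width, height)."""
    return (x + 1.0) * width / 2.0, (y + 1.0) * height / 2.0


def point_grid(width, height):
    """One (x, y) vertex per pixel in row-major order, to be uploaded as a vertex array."""
    return [pixel_center_ndc(ix, iy, width, height)
            for iy in range(height) for ix in range(width)]


pts = point_grid(8, 4)
print(len(pts), pts[0], ndc_to_window(*pts[0], 8, 4))
```

Keep in mind that the per-point attribute arrays still have to be read from memory by the vertex stage, so this moves the data traffic rather than eliminating it.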

My thinking is that this will allow me to submit varying data to fragment shaders. Will this work the way I’m thinking it will?

Would avoiding actual texture lookups in this fashion save me as much time (a lot) as I think it would?

It seems to me the answer is yes, but I don’t see sample code that does this in cases where (it seems to me) it would work well.

Thanks!

You can probably make that work. You should know that there’s a limit to the number of varyings that you can send to the fragment shader – on the order of 10 float4 values.
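To put that rough figure in numbers: about 10 float4 varyings is about 40 floats per fragment. The sketch below treats 10 as an assumed default (the real limit depends on the GPU and shader profile; under OpenGL 2.0 it can be queried with GL_MAX_VARYING_FLOATS) and estimates how many passes a per-fragment payload would need if all of it had to arrive through varyings. The function name is hypothetical.

```python
import math


def passes_needed(floats_per_fragment, float4_varyings=10):
    """Passes required if each pass can deliver float4_varyings * 4 floats to a fragment."""
    capacity = 4 * float4_varyings  # each varying slot carries one float4
    return math.ceil(floats_per_fragment / capacity)


# Hypothetical dot-product length of 2560: one row plus one column per output element.
print(passes_needed(2 * 2560))
```

For that hypothetical length, each output element needs 5120 floats, or 128 passes at 40 floats each.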

Texturing isn’t necessarily slow. It introduces latency, but the hardware pipeline is pretty deep, and will probably hide that from you. And, if you need more data than you can get through in attributes and varyings, well, you have the choice of using texturing, or not working at all :)

Don’t optimize prematurely.

Well, we’re trying to do something similar to the matrix multiplication written up here:
http://graphics.stanford.edu/papers/gpumatrixmult/

So at the very least, that is possible, and this may not count as premature optimization. The problem is the enormous number of memory accesses required in matrix multiplication. We are hoping to develop a way to fold other calculations into this matrix multiplication and thereby improve the ratio of flops to texture accesses.
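A back-of-the-envelope way to reason about the flop-to-fetch ratio: if each fragment computes a b x b block of C = AB (all matrices n x n) instead of a single element, it performs 2b²n flops while fetching b rows of A and b columns of B (2bn scalars), so the ratio grows to b. Blocking is a standard technique used here only for illustration, not necessarily the scheme in the linked paper. The hypothetical extra_flops_per_output parameter models fusing additional elementwise work into the same pass, which adds flops without adding fetches. The count is in scalars, not texture instructions.

```python
def flops_per_fetch(n, b, extra_flops_per_output=0):
    """Flops per scalar fetched when one fragment computes a b x b block of an n x n product."""
    outputs = b * b
    # n multiply-adds (2n flops) per output element, plus any fused elementwise work
    flops = outputs * (2 * n + extra_flops_per_output)
    # b rows of A and b columns of B, each of length n
    fetches = 2 * b * n
    return flops / fetches


print(flops_per_fetch(2560, 1))       # one output per fragment
print(flops_per_fetch(2560, 4))       # 4 x 4 block per fragment
print(flops_per_fetch(2560, 4, 10))   # same, with 10 fused flops per output
```

In this model, with n in the thousands, fused elementwise work barely moves the ratio; its main saving is avoiding a separate pass that writes C out and reads it back.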

I know there are as many as 256 registers available in some of the shader profiles, but the matrices we are multiplying are around an order of magnitude larger than 256.
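When a full row and column do not fit in the available registers, one common workaround is to split each dot product into chunks and accumulate partial sums across passes, with each pass reading the previous partial result (for example, from a render target) and adding one chunk. The sketch below only checks the arithmetic of that split on the CPU. The chunk size of 256 is an arbitrary illustrative value, the length 2560 is a hypothetical stand-in for "an order of magnitude larger than 256", and the function name is hypothetical.

```python
def multipass_dot(row, col, chunk=256):
    """Dot product computed as a sequence of passes, each touching at most `chunk` elements."""
    assert len(row) == len(col)
    partial = 0  # stands in for the running sum stored between passes
    passes = 0
    for start in range(0, len(row), chunk):
        end = start + chunk
        partial += sum(r * c for r, c in zip(row[start:end], col[start:end]))
        passes += 1
    return partial, passes


print(multipass_dot(list(range(2560)), [2] * 2560))  # (result, number of passes)
```

Each extra pass costs a write and a read of the partial results, so fewer, larger chunks are generally preferable within whatever limit applies.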