Un-possible! (aka, The Feasability of Automatic Multi-Pass)

I recently made the assertion that automatic multipass was an impossible problem to solve in a general way. This is

not the sound-barrier or 4-minute-mile type of impossible, but 2+2=5 kind of impossible.

People seem to be assuming that a compiler can do what John Carmack has done for Doom 3. That is, take a single

expression of a lighting system (say written in GL2 shading language) and compile it to fit into several different

types of rendering paths. This includes breaking it into multiple passes.

I define multiple passes as sending vertexes and fragments through the pipeline several times with non-final passes

using results from previous passes (temporaries). The temporary values can be stored in the RGBA of the

framebuffer, the stencil, and rendered-to texture maps.

I define the problem of automatically multipassing as taking a vertex and fragment program P, which could be run in

a single pass on hardware with enough resources and breaking it into several sub programs which each use less

resources and have to be run in series. Each program (P1, P2, … PN) would be a pass. Each produces temporaries

which are then used by the next program until PN is used. PN outputs the final fragments.

I have to modify this definition, because I cannot honestly hold it up because I could be accused of strawman

tactics. It should be obvious that not all programs can be broken up into several chunks which run in series to the

end and then stop. If a branch in P5 references P1, then P1 would have to be reloaded. Now the driver can not even

tell you how many passes a program will take (think halting problem, and if that doesn’t convince you, realize at

least that you would have to tell the driver what the data will be, because the number of passes becomes data

dependent!)

Branching gives yet more obvious problems if every other vertex can branch in a different way. It is imaginable

that every single vertex in a mesh would require a different chunk of P to be loaded! So, at any time all the

vertexes in your mesh together may require the sum total of all the resources your shader will ever call upon,

therefore it would impossible to choose any one chunk of P (which uses less resources) to load to do even a single

pass on all vertexes if the shader cannot call upon all resources at once. What good does breaking a program apart

do if you may have to load it all at once anyway?

Some have suggested multipass per primitive, but the paragraph above suggests you need multipass per vertex. Does

that even make sense?

Maybe I cannot really define automatic multi-pass. I was hoping I could at least give a fair definition with an

actual meaning beyond ‘shader automatic multi-passing is when shaders automatically multi-pass’

Lets say we use a Quantum GPU to solve the problems above…

Can we really figure out what P1, P2, … PN should be? At first brush one might think that you could just take the

assembly and put context saving commands at the beginning and end. Like multitasking a CPU. But the closer

resource usage is together the smaller the chunks will be, at some point these chunks would become so small that a

much better route is to just forget doing it on a GPU and do it on the host CPU. Basically, I am saying here that

you cannot look at the assembly code in order to figure out how to break up P.

One would have to analyze the high level code of the shader in order to determine how whole statements can be

rewritten and chunked. Suddenly P1, P2, … PN are not straightforward pieces of the original program, but pieces

of P which have been rewritten, rearranged, and mangled, so that they can be run seperately and spit out

intermediate results which can then be used by the next chunk. Each chunk must use resources in a way that fits

into the GPU. Whats so hard about that? Well, I believe that analyzing a program this way is equivalent to writing

a program which will tell you if a another program will halt and give you an answer (that is impossible, so chunking

a program this way is impossible too). Even if someone could show me how to do this, the other reason I gave above

for saying its impossible still stands. How do you take these chunks and actually apply them to more than one

vertex at a time?

I wish a driver writer would chime in and defend me on this. Someone, cass I think, mentioned something which

sounded similar in that it mentioned Turing but did not go into detail.

If anyone feels like mentioning the Stanford Shader Language or the paper showing the dependent texture reads and

floating point buffers are all that are needed to implement renderman in OpenGL, let me remind you that these are

fundamentally different approaches to the problem of shading. People want shaders, compiled to opcodes, to be

broken into as few passes as the compiler can figure out (preferably as few as possible). The Stanford Shader

Language compiles shaders into predefined passes which act as the shading langauges opcodes. There is a big

difference between ‘compile my shader into a series of predefined passes’ and 'break my shader up into arbitary

passes each doing as much as it can’

My tentative conclusion is that auto-multi-pass is only possible if you have predefined pass types which you can

treat as opcodes for the shading language itself. That is approaching the problem from the other direction.

I would be the first to admit that I could be hugely wrong or sorely mistaken or both. But please, I am looking for

‘counter-proof by example’ here, which means that I’ll only change my mind if you can tell me an actual method by

which my concerns can be addressed.

The first thing I expect is for people to say that I have defined the problem completely wrong. But I do not think

I have. I have gleaned this from what people have said they want or how they expect such a system to behave. For

instance, a recent post suggested that one should be able to query how many passes a shader will take, while may

people seem to believe that this is possible. But, this is obviously a mistaken notion because one simply cannot

predict how many passes a shader will take unless one considers the vertex data and current state, and even then

most people with computer science degrees should know that you cannot predict the outcome of a program without

running it (even if its in your own head).

Wow, this is long. Who am I to set everyone straight anyway…