ARB_fragment_program parameters: cost of updating?

Suppose I’m given a fragment program in a text file, and it binds some named parameters using some external meta-data (“shininess” and “frobbicity,” say).
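To make the setup concrete, a program like that might look roughly like this (a hypothetical listing; the parameter indices are whatever the external meta-data maps "shininess" and "frobbicity" to):

```
!!ARBfp1.0
# Hypothetical fragment program: the two named values arrive as
# program local parameters, set from the API per geometry.
PARAM shininess  = program.local[0];
PARAM frobbicity = program.local[1];
TEMP color;
TEX  color, fragment.texcoord[0], texture[0], 2D;
MUL  color, color, shininess.x;
MAD  result.color, color, frobbicity.x, fragment.color;
END
```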

Suppose I’m using different geometries with the same program, but different values for “shininess” and “frobbicity”.

Is it cheaper to instantiate multiple copies of this program, each with a different value baked in, or is it cheaper to instantiate one copy of the program and re-load these program local parameters per geometry?
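In API terms, the two options I'm weighing look roughly like this (a sketch only; it assumes a current GL context with ARB_fragment_program entry points already fetched, and `geom`, `drawGeometry`, and the parameter indices are made up for illustration):

```c
/* Approach 1: one program, update the locals per geometry. */
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
for (i = 0; i < numGeometries; i++) {
    /* indices 0 and 1 are whatever the meta-data mapped
       "shininess" and "frobbicity" to */
    glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 0,
                                 geom[i].shininess, 0, 0, 0);
    glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 1,
                                 geom[i].frobbicity, 0, 0, 0);
    drawGeometry(&geom[i]);
}

/* Approach 2: one pre-built program per value set, with the values
   baked in as constants; rebind per geometry. */
for (i = 0; i < numGeometries; i++) {
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, geom[i].bakedProg);
    drawGeometry(&geom[i]);
}
```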

I guess the question is what’s more limited: bandwidth/latency of updating these parameters, or available storage wherever these fragment program opcodes live? My guess would be that program storage is some ultra-optimized, near-zero-latency memory close to the fragment processors, and that it would be quite limited in size. But then, shuffling parameters into that space might be constrained, too?

I think changing the parameters is the fastest way to go about it.

Having a lot of programs and switching between them just doesn’t sound… very fast (nor practical).

Well, I could think of an implementation where the fragment program file is just a cache of various segments of main memory (or VRAM). If that were the case, then paging in a new program might be a fairly low start-up cost, and coding constants into the stream may be faster than reading them from (very limited) temporary register space.

I could also think of an implementation where the driver has to stop the rasterizer, then spoon-feed the program into some magic memory-mapped address, and then start up the pipe again; that would make parameters vastly superior.

That being said, the actual implementation for GeForce FX and for Radeon 9700 might be different again from these two concepts; hence, the question.

On either card I don’t see those two methods being a problem. Probably just as fast either way.

I could see the multiple-program approach being useful for doing materials, but I think using parameters would be the fastest (even though you probably wouldn’t notice a difference).

Originally posted by jwatte:
Well, I could think of an implementation where the fragment program file is just a cache of various segments of main memory (or VRAM). If that were the case, then paging in a new program might be a fairly low start-up cost, and coding constants into the stream may be faster than reading them from (very limited) temporary register space.

If there are enough temporary registers inside the GPU, it will probably load everything the program needs into them.

Reloading a new program means setting up the GPU all over again.

I could be wrong. What do I know?

V-man