ARB_fragment_program: indexed TEX?

Wouldn’t it be awesome if the next generation of hardware supported a TEX instruction where the actual texture object to sample was a register/parameter?
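To make the idea concrete, here's roughly what I'm picturing. The address register and the indexed `texture[A0.x]` operand below are pure invention; nothing like them exists in ARB_fragment_program today:

```c
/* Pure speculation: ARB_fragment_program has no address register and
 * no indexed TEX.  The idea is that A0.x would select, at run time,
 * which texture image unit the TEX instruction samples from. */
static const char *indexed_tex_fp =
    "!!ARBfp1.0\n"
    "TEMP color;\n"
    "# hypothetical: load a per-fragment material index into A0.x\n"
    "ARL  A0.x, fragment.texcoord[1].x;\n"
    "# hypothetical: texture[A0.x] picks the sampled texture at run time\n"
    "TEX  color, fragment.texcoord[0], texture[A0.x], 2D;\n"
    "MOV  result.color, color;\n"
    "END\n";
```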

I would be interested in what you want to use that for.

How about a programmable rasterizer?

I can think of a lot of different uses.

For a practical example, you could submit material maps, where the actual material was selectable per triangle in your mesh. Right now, you either have to expand out all possible combinations and kill the ones you don’t want (which is horribly inefficient) or re-submit a larger number of smaller vertex buffers, one for each material you want to apply.

For example, if you have a human that can swap out hair, clothing, retina, skin, etc, then either you submit each of those as a separate buffer, or you just submit one buffer, and let some per-triangle parameter sift through to the fragment shader to tell you which color map, bump map, etc you’re interested in.

Another example would be terrain: you could say that each triangle interpolates between three different terrain textures, but you can bind 16 textures in total. Thus, you pick 3 textures out of the total palette of 16 in your shader. It’s very analogous to ARL and how you implement matrix palette skinning in a vertex shader.
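For reference, this is the vertex-side pattern I'm alluding to: ARL plus relative addressing into a PARAM array is the usual ARB_vertex_program way to do matrix palette skinning. The attribute bindings and array layout here are just illustrative; what I'm asking for is the fragment-program analogue of this, with TEX instead of DP4:

```c
/* Real ARB_vertex_program: the per-vertex index in texcoord[1].x is
 * assumed to already hold the row offset (4 * bone index) into the
 * bone matrix array stored in program local parameters. */
static const char *palette_skin_vp =
    "!!ARBvp1.0\n"
    "ATTRIB  pos = vertex.position;\n"
    "ATTRIB  idx = vertex.texcoord[1];\n"              /* x = 4 * bone index */
    "PARAM   mvp[4]    = { state.matrix.mvp };\n"
    "PARAM   bones[96] = { program.local[0..95] };\n"  /* 24 bone matrices */
    "ADDRESS A0;\n"
    "TEMP    skinned;\n"
    "ARL     A0.x, idx.x;\n"
    "DP4     skinned.x, bones[A0.x + 0], pos;\n"
    "DP4     skinned.y, bones[A0.x + 1], pos;\n"
    "DP4     skinned.z, bones[A0.x + 2], pos;\n"
    "DP4     skinned.w, bones[A0.x + 3], pos;\n"
    "DP4     result.position.x, mvp[0], skinned;\n"
    "DP4     result.position.y, mvp[1], skinned;\n"
    "DP4     result.position.z, mvp[2], skinned;\n"
    "DP4     result.position.w, mvp[3], skinned;\n"
    "END\n";
```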

There’s all kinds of procedural shading and layering tricks you can pull with this kind of instruction; the examples I’m quoting are just pedestrian ones that I can see immediate need for. Reducing the number of vertex buffer submissions is a worthy goal, in my experience.

> For example, if you have a human that can swap out hair, clothing, retina, skin, etc, then either you submit each of those as a separate buffer, or you just submit one buffer, and let some per-triangle parameter sift through to the fragment shader to tell you which color map, bump map, etc you’re interested in.

It’d be just as easy to swap (in your data structures) the texture object with the proper one before rendering. After all, you’re going to have to render the object multiple times anyway.

One problem with this idea (in general) is efficiency. The driver has to make sure that all these textures are available simultaneously. If you only ever use, say, 8 of them, you may be slowing things down unnecessarily.

Korval,

No, I cannot “just swap out that texture”. That’s the whole point. To paint the entire mesh, I need to bind at least 6 textures. Either I can bind each texture, and draw 1/6th of the mesh, in a total of 6 submitted buffers. Or I can bind all 6 textures, submit a single buffer that contains all the data, and use texture unit indirection to choose which one I sample.
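Roughly, the two submission paths look like this. The names, counts, and buffers are placeholders, and path B assumes the hypothetical indexed TEX so that one fragment program can cover all six materials:

```c
#include <GL/gl.h>
#include <GL/glext.h>   /* glActiveTextureARB assumed available */

extern GLuint   material_tex[6];
extern GLsizei  submesh_count[6], whole_mesh_count;
extern const GLushort *submesh_indices[6], *whole_mesh_indices;

/* Path A: bind + draw per material -- six buffer submissions. */
static void draw_character_six_calls(void)
{
    for (int i = 0; i < 6; ++i) {
        glBindTexture(GL_TEXTURE_2D, material_tex[i]);
        glDrawElements(GL_TRIANGLES, submesh_count[i], GL_UNSIGNED_SHORT,
                       submesh_indices[i]);
    }
}

/* Path B: bind all six textures once and submit the whole mesh in one
 * call; a per-triangle material index rides along to the fragment
 * program, which would pick which of texture[0..5] to sample. */
static void draw_character_one_call(void)
{
    for (int i = 0; i < 6; ++i) {
        glActiveTextureARB(GL_TEXTURE0_ARB + i);
        glBindTexture(GL_TEXTURE_2D, material_tex[i]);
    }
    glDrawElements(GL_TRIANGLES, whole_mesh_count, GL_UNSIGNED_SHORT,
                   whole_mesh_indices);
}
```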

DirectX 9 requires the ability to bind 16 textures at the same time, and pass 8 texture coordinates from vertex shading to pixel shading. Sure, some enterprising hardware implementor will probably implement this with a single texture fetcher, looped 16 times, but that hardware hopefully (probably) will not find traction in the market. At a minimum, I would say you need 16 texture cache blocks to feed that single unit and keep it happy :slight_smile:

> No, I cannot “just swap out that texture”. That’s the whole point. To paint the entire mesh, I need to bind at least 6 textures. Either I can bind each texture, and draw 1/6th of the mesh, in a total of 6 submitted buffers. Or I can bind all 6 textures, submit a single buffer that contains all the data, and use texture unit indirection to choose which one I sample.

Why is one way slower than the other? You still have to do the same number of texture loads; that’s what slows down texture state changes.

In any case, here’s the problem. If I really have a character with 6 different textures over different parts of the model, they probably have different material/shader characteristics as well. At the very least, the lighting characteristics of skin vs. cloth demand different shaders.

Picking a texture based on some arbitrary computation is a useful concept. I just don’t think this particular use of it is a good idea. It is, basically, grounded in the assumption of very simple shaders.

> Sure, some enterprising hardware implementor will probably implement this with a single texture fetcher, looped 16 times, but that hardware hopefully (probably) will not find traction in the market.

That’s what the 9700 does. Granted, it’s also got 8 pixel pipes…

I believe the 9700 has sufficient texture cache memory to not suffer from thrashing.

Regarding cloth vs. skin: depending on shading complexity, that may or may not need different shaders. For example, you could have one shader that supports a gloss map with both specular exponent and specular amount per pixel.
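As a sketch only (not anyone's actual shader): one fragment program where the gloss map's red channel scales the specular amount and its alpha channel encodes the exponent. The tangent-space half-angle vector is assumed to arrive in texcoord[1], and normalization is omitted for brevity:

```c
/* Sketch: per-pixel specular amount (gloss.r) and exponent (gloss.a)
 * from one gloss map, so cloth and skin can share the same program. */
static const char *gloss_fp =
    "!!ARBfp1.0\n"
    "TEMP base, bump, gloss, ndh, spec;\n"
    "TEX  base,  fragment.texcoord[0], texture[0], 2D;\n"
    "TEX  bump,  fragment.texcoord[0], texture[1], 2D;\n"
    "TEX  gloss, fragment.texcoord[0], texture[2], 2D;\n"
    "MAD  bump, bump, 2.0, -1.0;\n"            /* expand [0,1] to [-1,1] */
    "DP3  ndh.x, bump, fragment.texcoord[1];\n"
    "MAX  ndh.x, ndh.x, 0.0;\n"                /* clamp before POW */
    "MUL  ndh.y, gloss.a, 128.0;\n"            /* per-pixel exponent */
    "POW  spec.x, ndh.x, ndh.y;\n"
    "MUL  spec.x, spec.x, gloss.r;\n"          /* per-pixel amount */
    "MAD  result.color, base, fragment.color, spec.x;\n"
    "END\n";
```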

Also, I’m under the impression that submitting 6 buffers in 6 calls carries more overhead than one buffer in one call, not so much because of the texture state change itself, but because of the larger number of state changes overall. And, in some driver models, also just pure call/marshaling overhead…

Anyway, I think it would be a useful addition. We already have similar functionality in vertex programs (which are not as performance constrained as fragment programs, though). Thus, I’m suggesting this as a next-gen feature (DX10 level hardware?) because it enables a whole new level of functionality. Just think of the noise functions you could implement :slight_smile:

Granted, you could do SOME of all this with a 3D texture, but 3D textures suffer from MIP mapping problems. Hence, my second feature request: being able to separately control the MIP and anisotropy behavior in the three axes :slight_smile: