Multitexturing substitute for massive blending

I’m trying to render a model that has 10+ layered textures. At the moment, my implementation uses blending exclusively to accomplish this. Blending this way is very powerful, since the user who creates the model can specify a different combination of blending parameters for each texture layer (by default it is additive, as (GL_ONE, GL_ONE), but it could be any of the legal blending modes). The resulting speed from all this blending is, of course, not so good, so I am trying to incorporate multitexturing into the procedure. I am finding, though, that the color mixing process is quite different. I can simulate additive blending with the texture operation GL_ADD, but many of the more complex blending combinations do not seem to have a multitexturing equivalent. Are there any good ways to deal with this?

I ran into this issue a while back. I came to the conclusion that it just isn’t possible to do this in a general way for N textures unless you have N texture units and something like NV_texture_env_combine4 available. And even then it isn’t fully general, as the texture units do not have access to the original destination pixels.
If there were some way to set one of the combiner sources to the framebuffer, then it would be easily possible. I suppose one could read the framebuffer in the desired region and use it as a texture source, though I bet the texture coordinates would in general be a pain to deal with in this case, and I bet it wouldn’t be all that fast or worth the added complexity.
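For what it’s worth, a rough sketch of that copy-the-framebuffer idea (glCopyTexSubImage2D is core since OpenGL 1.1; fbTex is assumed to be a pre-allocated texture at least w x h texels in size):

/* Copy the current framebuffer region into a texture so a texture
   unit can sample the destination pixels. */
glBindTexture(GL_TEXTURE_2D, fbTex);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0,   /* target, mip level         */
                    0, 0,               /* offset within the texture */
                    x, y, w, h);        /* framebuffer region        */
/* The geometry then needs texture coordinates that map its
   screen-space footprint into this texture, which is the painful
   part mentioned above. */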


So maybe I’ll have to restrict users to blending operations that work identically when multitextured. I’ve already pointed out the equivalence of (GL_ONE, GL_ONE) and GL_ADD, but what others are there? I’m most interested in multiplicative effects: GL_MODULATE on the texture-environment side, (GL_DST_COLOR, GL_ZERO) on the blending side.

Look into the GL_EXT_texture_env_combine extension (or GL_ARB_texture_env_combine; not sure if it’s an ARB extension yet). It lets you flexibly combine the alpha and color results of the different texture units.

GL_NV_texture_env_combine4 extends texture_env_combine; look into that one as well.
texture_env_combine should be available on almost any hardware that has more than one texture unit. With 2 texture units (TNT2, GeForce) you should be able to collapse your 10 passes into 5, with 3 units (Radeon) into 4, with 4 units (GeForce3) into 3.
Also, to increase performance over multiple rendering passes, look into using Compiled Vertex Arrays, Display Lists, or Vertex Array Range (the latter is AFAIK available on GeForce only) to avoid having to send your geometry from host to GPU and through T&L for every pass.
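As a minimal sketch of the compiled vertex array idea (assuming EXT_compiled_vertex_array is available and the glLockArraysEXT/glUnlockArraysEXT entry points have been fetched; setupPass() is a made-up per-pass state function):

glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, vertices);

glLockArraysEXT(0, vertexCount);  /* promise the arrays won't change */
for (pass = 0; pass < numPasses; pass++) {
    setupPass(pass);              /* bind texture, set blend function */
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, indices);
}
glUnlockArraysEXT();

Note that locked arrays must not change between passes, so per-pass texture coordinate arrays would force an unlock/relock.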

Nope, you cannot in general collapse 10 passes into 5 using the GL_NV_texture_env_combine4 extension if you only have 2 units. Why? Because, again, the combiners do not have access to the framebuffer (not surprising, since the combiners are deeper in the pipeline). Only a subset of the possible pure multipass combinations can be accurately reproduced with the combiners. Try writing out the equations for a non-trivial 10-pass algorithm that uses destination blending and you’ll see why it can’t be collapsed without making compromises. Also, if different depth, alpha, or stencil functions are needed for one or more of the passes but not all, that instantly breaks the multipass collapse option.

Thanks for the help so far! Things are becoming clearer for me, but I just need a little more help on this one.

Is it documented anywhere exactly which subset of blending operations has a multitexturing equivalent?

At the moment, it looks like I should just give up on multitexturing because it isn’t flexible enough. This means I’ll have to find a way to speed up blending. Right now my LOD engine just cranks out triangles and submits them one at a time for rendering. Since the model has the same shape for each pass, I think using a display list would help the speed greatly, as Dodger pointed out. The only problem is that, while the vertices are always the same on each pass, the texture coordinates are not. AFAIK, the texture coordinates must be submitted along with each vertex, so if the tex coords change, the list will have to be rebuilt.

Now, it was also suggested that vertex arrays could be used. If I submit texture coords as arrays, it’s actually a pointer to the data (right?), so changing the data in the arrays would adjust the information within the display list. Will this work, or does compiling things into a list store a second copy of the original values (ignoring the array pointer)?


If your texture coordinates change, then display lists are not suitable. When compiling a vertex array into a display list, only the data in the array at the time the list is compiled is used. Changes to the array will not be reflected in the display list unless you rebuild it, in which case you are better off not using a display list to begin with. However, compiled vertex arrays can help substantially with complex multipass algorithms, at least on non-T&L hardware.

The subset of available multitexture simplifications can be readily obtained by writing out the possible combinations available from the combiners.

Two common simplifications for two pass algorithms are:

Pass 1: blend one, zero
Pass 2: blend one, one

can be replaced with

blending disabled
Unit 1: GL_REPLACE (or GL_MODULATE if color modulation is needed)
Unit 2: GL_ADD
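In GL calls, that first collapse might look like this (a sketch assuming ARB_multitexture; tex0 and tex1 are the two layers’ texture objects):

glDisable(GL_BLEND);

glActiveTextureARB(GL_TEXTURE0_ARB);
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, tex0);  /* the pass 1 layer */
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

glActiveTextureARB(GL_TEXTURE1_ARB);
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, tex1);  /* the pass 2 layer */
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_ADD);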

Pass 1: blend one, zero
Pass 2: blend src_alpha, 1-src_alpha

can be replaced with

blending disabled
Unit 1: GL_REPLACE (or GL_MODULATE if color modulation is needed)
Unit 2: GL_COMBINE_EXT (set up to use the interpolating combiner)
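And a sketch of the interpolating combiner for that second case, assuming EXT_texture_env_combine (GL_INTERPOLATE_EXT computes Arg0*Arg2 + Arg1*(1-Arg2), which matches src_alpha, 1-src_alpha blending when Arg2 is this texture’s alpha):

glActiveTextureARB(GL_TEXTURE1_ARB);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE_EXT);
glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB_EXT, GL_INTERPOLATE_EXT);
glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE0_RGB_EXT, GL_TEXTURE);      /* Arg0: unit 2's texture    */
glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND0_RGB_EXT, GL_SRC_COLOR);
glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE1_RGB_EXT, GL_PREVIOUS_EXT); /* Arg1: result of unit 1    */
glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND1_RGB_EXT, GL_SRC_COLOR);
glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE2_RGB_EXT, GL_TEXTURE);      /* Arg2: that texture's alpha */
glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND2_RGB_EXT, GL_SRC_ALPHA);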

I would often write the algorithm out symbolically and use algebra to pick out any combinations that could be done using multitexture.
E.g.

Pass 1: blend one, one
Pass 2: blend one, one

p1 = s1 + d1
p2 = s2 + d2 = s2 + p1 = s2 + s1 + d1 = (s1 + s2) + d1 = s’ + d1

// make s’
Unit 1: GL_REPLACE
Unit 2: GL_ADD
// add it to d1
blend one, one
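Concretely (same hedges as the sketches above; drawModel() is a hypothetical draw call):

glActiveTextureARB(GL_TEXTURE0_ARB);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);  /* s1           */
glActiveTextureARB(GL_TEXTURE1_ARB);
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_ADD);      /* s1 + s2 = s' */
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);                                 /* s' + d1      */
drawModel();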


OK, that helps a lot! It looks like blending with compiled vertex arrays is the way to go, except in a few cases where multitexturing can be substituted for a blend.

I found an interesting document showing benchmarks for compiled vertex arrays: <http://herakles.zcu.cz/~jdobry/opengl/opengl_maximum_performance.html>. Their results suggest that the fastest frame rates are achieved with arrays of triangle strips. It would be nice to try this out, but I’ve been staying away from triangle strips because they introduce a weird sort of asymmetry to the look of things. The triangle ordering from my LOD engine attempts to fix that problem by alternating which opposing pair of vertices in a quad becomes an edge. Hence, a strip generated by my system would look like |/||/|| whereas an OpenGL triangle strip looks like |||||. Is there any way to accommodate this, or am I resigned to working with individual triangles?
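For concreteness, here is roughly what my alternation looks like as plain indexed triangles (just an illustrative sketch; buildRow and the row indexing scheme are made up):

/* Build indices for one row of quads, flipping the split diagonal on
   every other quad.  row0/row1 are the first vertex indices of the
   two vertex rows bounding the quads. */
void buildRow(unsigned int *idx, int quads, int row0, int row1)
{
    int q, n = 0;
    for (q = 0; q < quads; q++) {
        unsigned int a = row0 + q, b = row0 + q + 1;  /* top edge    */
        unsigned int c = row1 + q, d = row1 + q + 1;  /* bottom edge */
        if (q & 1) {  /* diagonal from b to c */
            idx[n++] = a; idx[n++] = c; idx[n++] = b;
            idx[n++] = b; idx[n++] = c; idx[n++] = d;
        } else {      /* diagonal from a to d */
            idx[n++] = a; idx[n++] = c; idx[n++] = d;
            idx[n++] = a; idx[n++] = d; idx[n++] = b;
        }
    }
}

The result can be drawn with glDrawElements(GL_TRIANGLES, ...), which gives up the strip’s vertex sharing but keeps the alternating look.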


I got the fastest performance with triangle strips not in an array: about 10 million triangles.

With the triangle strips in arrays I got about 6 million.