# Orthogonal Illumination Mapping for Generalized Per-Pixel Lighting

Copyright 2000 by Cass Everitt. Commercial publication in written, electronic, or other forms without expressed written permission is prohibited. Electronic redistribution for educational or private use is permitted.

###### Preface

In the first posting of this work, I described a technique for diffuse bump mapping using only core OpenGL functionality. The technique was limited to diffuse illumination by a directional light source, but after some helpful suggestions from Mark Kilgard, the technique was extended to support diffuse and specular illumination, multiple light sources, and local-viewer. After implementing these generalizations to the OIM technique, I provided an update with binary-only demo programs illustrating the enhancements. Since then, I have gotten my thesis finished and cleaned up the demo programs - which are available here.

What I have not gotten to (until now) is writing up a short-form description of how OIM has been generalized to support all these new features. Refer to my thesis or the demo source for more detailed and specific descriptions.

###### Big Picture

OIM is actually only a technique for performing per-pixel dot products through multiple rendering passes. When I first began developing this work, the dot product itself was the central focus. Optimizations of the technique through multitexture and various texture environment and blending extensions have been demonstrated to improve the quality and speed of the calculation - to make better implementations of the per-pixel dot product "black box".

The implied underlying need for the per-pixel dot product is in the implementation of per-pixel shading equations. In fact, most of what I have called OIM "extensions" or "generalizations" are really just the use of OIM in the implementation of a particularly important variety of shading equation - illumination equations. As demonstrated by RenderMan shaders, the use of complex, per-pixel math can be applied *far* beyond simple illumination.

John Carmack has made the point before (in a Slashdot interview) that we used to be able to rely on the model of SGI as an indicator of how PC graphics hardware should develop and what the "right" directions were, but now PC accelerators have outgrown that model as a reference. As far as per-pixel shading goes, however, the RenderMan Shading Language offers a good model for the kind of operations needed to perform high quality rendering.

The dot product is just one operation in the illumination equation, but it is arguably the trickiest one to implement in unextended OpenGL. The Big Picture is that per-pixel shading is the future of hardware-accelerated real-time photorealistic rendering. Don't think of what's presented here as "a technique" to be used. Think of it as an approach for evaluating complex expressions at every pixel. It is a very malleable approach - to be altered and tinkered with in the same way you would tinker with a RenderMan shader.

###### Multipass Dot Product

The challenge of per-pixel dot products is somewhat less daunting if we use the time-honored, imperialistic approach of divide-and-conquer. The dot product is defined as

<**a**,**b**> = *a*_{x}*b*_{x} + *a*_{y}*b*_{y} + *a*_{z}*b*_{z}

which is simply a sum of products. The catch is that the products are of signed scalars. OpenGL computes products per-fragment, but only on unsigned quantities. We side-step this problem by rewriting the product of signed scalars as the signed sum of products of unsigned scalars. [I have tried (unsuccessfully) to word the previous sentence more clearly.] If we consider *f* and *g* to be signed scalar functions, we can write their product as

*f g* = *f ^{+}g^{+} + f ^{-}g^{-} - f ^{+}g^{-} - f ^{-}g^{+}*

where

*f ^{+}* = clamp(*f*, 0, infinity), *g ^{+}* = clamp(*g*, 0, infinity)

and

*f ^{-}* = clamp(-*f*, 0, infinity), *g ^{-}* = clamp(-*g*, 0, infinity)
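The decomposition can be checked numerically. The following is a minimal sketch in Python (not part of the original demo code; the function names `half_components` and `signed_product` are made up for illustration). It splits each signed scalar into its two unsigned halves and recombines the four unsigned products with the signs given above.

```python
def clamp(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def half_components(x):
    """Split a signed scalar into unsigned halves (x_plus, x_minus).

    x_plus = clamp(x, 0, inf) and x_minus = clamp(-x, 0, inf),
    so that x == x_plus - x_minus.
    """
    inf = float("inf")
    return clamp(x, 0.0, inf), clamp(-x, 0.0, inf)


def signed_product(f, g):
    """Product of signed scalars built only from products of unsigned ones."""
    f_pos, f_neg = half_components(f)
    g_pos, g_neg = half_components(g)
    # Two additive terms (same-sign halves), two subtractive terms (mixed).
    return f_pos * g_pos + f_neg * g_neg - f_pos * g_neg - f_neg * g_pos


if __name__ == "__main__":
    for f, g in [(0.5, 0.25), (0.5, -0.25), (-0.5, -0.25)]:
        print(f, g, signed_product(f, g), f * g)
```

For any given pair of values at most one of the four terms is nonzero, which is why the formulation costs four passes only because the signs are not known in advance.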

The image below illustrates this decomposition graphically.

This formulation is expensive (at four passes), but it is numerically stable for clamped, unsigned arithmetic. If subtractive blending is available, this equation can be implemented directly. Without subtractive blending, an approximation must be used. I chose to use a blend function of (ZERO, ONE_MINUS_SRC_COLOR). This approximates the expression

*a* - *b*

as

*a*(1-*b*) = *a* - *ab*

This is not a terribly good approximation on an unbounded range, but since *a* and *b* are in [0,1], it's not so bad. The following image illustrates the error of this approximation as a bivariate plot defined as

err(*a*, *b*) = (*a* - *ab*) - clamp(*a* - *b*, 0, 1)

For some numerically sensitive uses of the dot product, this much error may be problematic, but for simple illumination shaders, it has proven to be satisfactory. This is because the error is greatest when *a* is approximately equal to *b*, and generally in the range [0.3, 0.7]. That is, some regions that should be black or nearly black appear somewhat lighter than they should. The images below contrast actual and approximated subtractive blending.
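To put a number on the error, here is a small Python sketch (an illustration only; `approx_sub`, `err`, and `max_error` are hypothetical helper names) that evaluates err(*a*, *b*) on a regular grid over [0,1] x [0,1] and reports the worst case.

```python
def clamp(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def approx_sub(a, b):
    """Blend func (ZERO, ONE_MINUS_SRC_COLOR): dst * (1 - src) = a - a*b."""
    return a * (1.0 - b)


def err(a, b):
    """Difference between approximated and true clamped subtraction."""
    return approx_sub(a, b) - clamp(a - b, 0.0, 1.0)


def max_error(steps=100):
    """Return (worst error, a, b) over a (steps+1) x (steps+1) grid."""
    grid = [i / steps for i in range(steps + 1)]
    return max((err(a, b), a, b) for a in grid for b in grid)


if __name__ == "__main__":
    worst, a, b = max_error()
    print("max error", worst, "at a =", a, "b =", b)
```

On this grid the worst case is 0.25 at *a* = *b* = 0.5, and the error falls to zero when either operand is 0 or 1, which is consistent with the mid-range behavior described above.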

Images (left to right): real subtractive blending, approximated subtractive blending.

Extending the product of signed scalars to dot products is straightforward. It is simply the sum of three scalar products. This is now a 12-pass technique. The only structural change to the passes is that all subtractive passes are performed at the end to avoid underflow clamping errors. Allen Akin cautioned me to make a point about overflow concerns as well. Performing all additive passes first *could* produce overflow when re-ordering the passes of arbitrary scalar products, but dot products are special - particularly when the vector operands have unit length or less. The dot product of normalized vectors is, by definition, bounded to the range [-1,1]. Further, when <**a**,**b**>==1, all the scalar products must be non-negative because *a*_{x}== *b*_{x}, *a*_{y}==*b*_{y}, and *a*_{z}==*b*_{z}.
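The pass ordering can be simulated on a single pixel. The sketch below is a Python model, not OpenGL code, and it assumes a framebuffer that clamps to [0,1] after every pass; `multipass_dot` and its `exact_subtract` flag are names invented for this illustration. It runs the six additive passes first and the six subtractive passes last, with either true subtractive blending or the (ZERO, ONE_MINUS_SRC_COLOR) approximation.

```python
def clamp01(x):
    """Model of framebuffer clamping to [0, 1]."""
    return max(0.0, min(x, 1.0))


def halves(v):
    """Per-component unsigned halves of a 3-vector: (v_plus, v_minus)."""
    return [max(c, 0.0) for c in v], [max(-c, 0.0) for c in v]


def multipass_dot(a, b, exact_subtract=True):
    """Simulate the 12-pass dot product for one pixel."""
    a_pos, a_neg = halves(a)
    b_pos, b_neg = halves(b)
    fb = 0.0  # framebuffer value, clamped after every pass

    # Six additive passes: products of same-sign halves.
    for i in range(3):
        for p in (a_pos[i] * b_pos[i], a_neg[i] * b_neg[i]):
            fb = clamp01(fb + p)

    # Six subtractive passes, deferred to the end to avoid underflow clamping.
    for i in range(3):
        for p in (a_pos[i] * b_neg[i], a_neg[i] * b_pos[i]):
            if exact_subtract:
                fb = clamp01(fb - p)      # true subtractive blending
            else:
                fb = fb * (1.0 - p)       # (ZERO, ONE_MINUS_SRC_COLOR)
    return fb


if __name__ == "__main__":
    a = (0.8, 0.6, 0.0)
    b = (0.6, -0.8, 0.0)  # perpendicular to a, true dot product is 0
    print(multipass_dot(a, b, exact_subtract=True))
    print(multipass_dot(a, b, exact_subtract=False))
```

Because the framebuffer clamps at zero, the result is clamp(<**a**,**b**>, 0, 1) rather than the signed dot product. For the perpendicular pair in the usage example, exact subtraction gives 0 while the approximation leaves roughly 0.25, the "lighter than it should be" artifact described earlier.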

I will refrain from showing a listing of the passes and their ordering; the first posting of this work, my thesis, and the demo code all do a better job of showing all relevant state configuration. Suffice it to say that texture modulation is used to perform the scalar products, and blending is used for addition and subtraction. There are some artifacts that result from per-vertex interpolation of the light or half-angle vector, but they can be managed somewhat with additional geometry.

###### Optimization

The multipass dot product is pretty expensive at 12 passes. Luckily, there are ways to reduce that number substantially. There are two classes of pass reduction for this technique. One class is had by working in the problem domain to eliminate the need for a particular half-component. When a given half-component of a normal or light vector is known to be zero over an entire object, the passes it is used in can be eliminated. This optimization is useful for directional light sources, tangent space normal mapping, and in height field rendering. The other class of optimization is through the use of multitexture and various combiner extensions to combine two or more passes into a single pass. Using these two pass-collapsing techniques together can have a big payoff. For example, using a directional light source to illuminate a height field, the 12-pass dot product can be performed in five passes. Further optimization using ARB_multitexture and NV_texture_env_combine4 reduces it to only *three* passes.
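The first class of reduction amounts to bookkeeping over half-components. The Python sketch below counts the passes that remain once some halves are known to be zero. It is one plausible accounting rather than the author's published pass list, and it rests on two assumptions: a height field's normals never point downward (so the normal's *z*^{-} half is zero everywhere), and a directional light's vector is constant over the object (so each of its components has only one nonzero half). The names `ALL_HALVES` and `required_passes` are invented for the example.

```python
from itertools import product

# Every half-component of a 3-vector: "x+", "x-", "y+", "y-", "z+", "z-".
ALL_HALVES = {axis + sign for axis in "xyz" for sign in "+-"}


def required_passes(normal_halves, light_halves):
    """List the (normal_half, light_half) products that can be nonzero.

    Each argument is the set of half-components that are NOT known to be
    zero over the whole object. A pass is needed only when both of its
    operands may be nonzero.
    """
    passes = []
    for axis in "xyz":
        for n_sign, l_sign in product("+-", repeat=2):
            n_half, l_half = axis + n_sign, axis + l_sign
            if n_half in normal_halves and l_half in light_halves:
                passes.append((n_half, l_half))
    return passes


if __name__ == "__main__":
    # General case: nothing is known to be zero.
    print(len(required_passes(ALL_HALVES, ALL_HALVES)))

    # Assumed height field (no z- normals) lit by an assumed directional
    # light whose constant direction has x > 0, y < 0, z > 0.
    height_field = ALL_HALVES - {"z-"}
    directional = {"x+", "y-", "z+"}
    print(len(required_passes(height_field, directional)))
```

Under those assumptions the count drops from 12 to 5, matching the five-pass figure quoted above. Collapsing passes further with multitexture is a separate step that this sketch does not model.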

###### Application

When I began this research, implementing the dot product operation was only part of the problem. The ultimate goal was per-pixel illumination. For basic diffuse illumination by a single, directional light source, the dot product is 99% of the problem. However, I was spurred onward by Mark Kilgard to implement a more complete illumination equation using this unextended dot product technique. He explained the handy trick of hiding scalar computations in destination alpha while protecting the RGB channels with a color mask. This allowed me to build up any number of dot products and associated scalar terms through multiple passes, and finally dump those contents into the color buffer with a blend func of (DST_ALPHA, ONE) in a final pass.
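The destination-alpha trick can be modeled on a single pixel. This is a rough Python sketch, not OpenGL code: the `Pixel` class and the `alpha_pass` and `final_pass` helpers are hypothetical, the scalar passes are reduced to simple clamped additions, and clearing alpha between lights is an assumption made here rather than something stated above. It shows scalar terms accumulating in alpha while the color mask leaves RGB untouched, followed by a final pass blended with (DST_ALPHA, ONE).

```python
def clamp01(x):
    """Model of framebuffer clamping to [0, 1]."""
    return max(0.0, min(x, 1.0))


class Pixel:
    """One framebuffer pixel with separate RGB and destination alpha."""

    def __init__(self):
        self.rgb = [0.0, 0.0, 0.0]
        self.alpha = 0.0


def alpha_pass(pixel, value):
    """Scalar pass: the color mask protects RGB, so only alpha accumulates."""
    pixel.alpha = clamp01(pixel.alpha + value)


def final_pass(pixel, src_rgb):
    """Blend func (DST_ALPHA, ONE): dst = src * dst_alpha + dst * 1."""
    pixel.rgb = [clamp01(s * pixel.alpha + d)
                 for s, d in zip(src_rgb, pixel.rgb)]


if __name__ == "__main__":
    p = Pixel()
    # First light: two scalar terms land in alpha, then one color pass.
    alpha_pass(p, 0.25)
    alpha_pass(p, 0.5)
    final_pass(p, (1.0, 0.5, 0.0))
    # Second light: alpha is cleared (assumed), RGB keeps accumulating.
    p.alpha = 0.0
    alpha_pass(p, 0.25)
    final_pass(p, (0.0, 0.0, 1.0))
    print(p.rgb)
```

The point of the final blend is that the RGB contents from earlier lights survive (the ONE destination factor) while the new contribution is scaled by whatever scalar was built up in alpha.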

The results are quite nice, although the number of rendering passes required to implement the full illumination equation in unextended OpenGL can easily reach 26 passes per light source.

###### Conclusion

There are obvious performance limitations in the 12-pass dot product. It requires a lot of texture state changing and is wasteful of texture memory. Blending must be enabled the whole time, and that taxes memory bandwidth. These issues are not of great concern, because the multipass dot product is rapidly becoming unnecessary. Most new hardware will do dot products in a single pass in the combiners. Four or five years ago it might have been worth debating the issue.

The limitations that bear more careful study - the ones that are relevant today - are those of shader implementation. When I started OIM development, I did not know the trick of using alpha for scalar accumulation. It enabled me to extend a shader that could calculate diffuse illumination from a single, directional light source into a shader that can calculate diffuse and specular illumination from multiple directional and local light sources. This trick effectively solved the problems of implementing the illumination equation, but it is not an adequate solution for general-purpose shader implementation. General-purpose shaders are likely to require more than one intermediate, intermediates that require more range or precision than the framebuffer supports, and multi-component element types.

John Carmack's finger message of 4/29 talks about a number of issues that he is facing now regarding shader implementation. Particular limitations he is concerned about are precision and range of the entire graphics pipe. For example, using render-to-texture you can build up arbitrarily complex "shade trees" (as he calls them), but each pass through the frame buffer clamps your range and degrades your precision relative to what is available in the combiners. In order to implement high quality shaders that are insensitive to the number of passes required to implement them, precision must be preserved over the pixel-feedback process. It is yet to be determined how this precision will be preserved, but it is clear that multiple passes will always be necessary if shaders are allowed to be arbitrarily complex.

It would be a massive understatement to say that things are moving rapidly in the computer graphics industry, and these issues will certainly be resolved. It is an exciting time to be a hardcore OpenGL programmer!

###### Egregious Omission

I have failed to point out that Andreas Brinck of the Chalmers University of Technology suggested the generalized 12-pass dot product on comp.graphics.api.opengl shortly after my first posting of this work which described only a 6-pass approach. I did not use the same per-polygon logic that he describes here, but he was the first to point out that my 6-pass approach was just a special case of the 12-pass one. My sincere apologies for not recognizing his contribution. This omission will be corrected in my thesis as well.