I just tried to speed up my app by using glTexEnvi (…, GL_REPLACE) instead of GL_MODULATE.
However, there was no speedup at all (on a GeForce 4200).
So can I assume that modulating the current color with the texture is free on any card? If yes, I would allow this “feature” in my engine, but if it depends on the hardware I don't think I will use it.
I think you can assume that most fragment operations “in the pipe” are more or less free (they are deeply pipelined and designed to work without stalling the pipeline). I even think things like multitexturing and register combiners are nearly free (if you do not count the overhead of reading textures from texture memory).
What usually costs time is memory access, such as:
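To make the point concrete, here is a rough software sketch of what the two texture environment modes compute per fragment for an RGBA texture: GL_REPLACE passes the texel through, while GL_MODULATE multiplies each channel of the incoming color by the texel. The type and function names (Rgba8, mul_channel, tex_env_replace, tex_env_modulate) are made up for illustration, and the 8-bit rounding is an assumption; real hardware may use a different precision.

```c
#include <stdint.h>

/* Hypothetical 8-bit-per-channel color, used only for this illustration. */
typedef struct { uint8_t r, g, b, a; } Rgba8;

/* Multiply two channels treated as fractions in [0, 1], rounding to nearest.
   The rounding rule is an assumption; hardware precision may differ. */
static uint8_t mul_channel(uint8_t x, uint8_t y) {
    return (uint8_t)((x * y + 127) / 255);
}

/* GL_REPLACE with an RGBA texture: the texel replaces the incoming color. */
static Rgba8 tex_env_replace(Rgba8 fragment, Rgba8 texel) {
    (void)fragment; /* the incoming color is ignored */
    return texel;
}

/* GL_MODULATE with an RGBA texture: channel-wise product of color and texel. */
static Rgba8 tex_env_modulate(Rgba8 fragment, Rgba8 texel) {
    Rgba8 out = {
        mul_channel(fragment.r, texel.r),
        mul_channel(fragment.g, texel.g),
        mul_channel(fragment.b, texel.b),
        mul_channel(fragment.a, texel.a)
    };
    return out;
}
```

The only difference between the two modes is one multiply per channel, with no extra memory traffic, which fits with a deeply pipelined fragment unit showing no measurable difference between them.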
Writing pixels to the framebuffer
Z buffer tests
Stencil buffer tests
Blending
Texturing (reading textures from texture memory)
Plus of course the usual:
State changes
Gets/Reads of various kinds that stall the pipeline
Expensive vertex lighting
Long vertex or fragment programs
Non-HW-accelerated operations (e.g. the accumulation buffer, on most cards)