It sounds to me as if Golem has a problem something like:
He’s writing something like an effects editor for online compositing (say, for a TV station).
He uses a particle system.
Particles may be spawned by some real-time input (moving the cursor around, and whatnot).
Thus, he can’t preview effects ahead of time, or just run canned, proven effects that some artist already tweaked until they worked.
If you don’t have any Z testing, and you target specific hardware whose characteristics you know (i.e., you specify “always a Radeon X800 XT” or whatever), then using an occlusion query is actually probably going to be a pretty decent measurement for a limited problem such as the one stated above. You might want to add the total size of all textures bound during the frame, too, as an additional input to the estimate.
Another measurement might be to just add up all visible particles, weighted by one over distance from camera; this is similar to calculating the area filled on the CPU, but a little lighter on CPU cycles.
If you cannot specify the hardware in use, then you have another problem: even if you get a good measurement of pixels rendered, you don’t know how much would be too much for the card in use at runtime. Thus, you’d have to start by doing some kind of profiling, so you know what to shoot for.
One additional caveat: reading back a query result means that you serialize the GPU up to the point of the query (the CPU stalls until the GPU gets there). Thus, you may get reduced CPU/GPU parallelism by doing it that way. That may be OK, depending on the application. Summing up all particles, weighted by one over distance to camera, causes no serialization, and if you’re GPU limited, the CPU cycles may be there for “free”, so it’s worth considering.