High precision sum?

I’m working on a project where I need to be able to add the contributions from thousands of faint particles into the framebuffer.

Obviously, this causes clamping nightmares in an 8-bit buffer, and there is no blend support on floating-point buffers (why oh why?).

Using a floating-point buffer hack like ping-ponging is not appropriate because of the large number of swaps it would take, and the horrible inefficiency of copying all pixels to sum a few. Same thing for an accumulation buffer.

Is there another way I’m not considering? Some way to pack 32-bit precision into the four components of the framebuffer and get an add operation out of it?

Thanks in advance.

That question sounds familiar http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008303.html

That other thread is asking how to sum all the pixels in the framebuffer, which is a different question.

I want to be able to draw many particles (textured quads) into the framebuffer and get an accurate, unclamped sum of their colors at each pixel. I can do this by just rendering with the blend function (GL_SRC_ALPHA, GL_ONE), but the problem is you get either horrible artifacts or rapid saturation or both in an 8-bit-per-component buffer.
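
For concreteness, the baseline I’m describing is nothing more than this (drawParticleQuad is a made-up helper that draws one textured quad):

```cpp
#include <GL/gl.h>

void drawParticleQuad(int i);  // hypothetical: draws one textured quad

// Additive alpha blending into the (8-bit) framebuffer. This is the
// setup that clamps/saturates so quickly.
void drawParticlesAdditive(int numParticles)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE);  // dst = src.rgb * src.a + dst, clamped to [0,1]
    for (int i = 0; i < numParticles; ++i)
        drawParticleQuad(i);
}
```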

[This message has been edited by Zeno (edited 07-10-2003).]

Sorry, yes I read your post too fast.

Originally posted by Zeno:
Using a floating point buffer hack like ping-ponging is not appropriate because of the large numbers of swaps it would take,

No, don’t swap, just change DrawBuffer. I have suggested this before: you can use two single-buffered pbuffers and ping-pong between them, or ping-pong between the two buffers of a double-buffered pbuffer. The latter is defined in the spec (render_to_texture) to cause undefined results, but I haven’t experienced any problems, on NVIDIA hardware at least. I know some people on this board consider the latter a disaster waiting to happen.
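
Roughly like this, as a sketch (bindBufferAsTexture and drawFullscreenAddPass are made-up helpers standing in for the render_texture bind and the fragment-program add):

```cpp
#include <GL/gl.h>

void bindBufferAsTexture(GLenum buf);  // hypothetical: WGL_ARB_render_texture bind
void drawFullscreenAddPass();          // hypothetical: fragment-program add pass

// Ping-pong inside one double-buffered pbuffer: no SwapBuffers, just flip
// which color buffer is the render target and which one is sampled.
void pingPong(int numPasses)
{
    GLenum bufs[2] = { GL_FRONT, GL_BACK };
    int cur = 0;
    for (int pass = 0; pass < numPasses; ++pass) {
        glDrawBuffer(bufs[cur]);             // render into one buffer...
        bindBufferAsTexture(bufs[1 - cur]);  // ...while sampling the other
        drawFullscreenAddPass();
        cur = 1 - cur;                       // flip roles, no copy
    }
}
```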

No, don’t swap, just change DrawBuffer. I have suggested this before: you can use two single-buffered pbuffers and ping-pong between them

Perhaps “swap” was an unfortunate choice of words. The problem is this: for every small particle that I render into one floating point buffer (say it covers 10x10 pixels), I have to do an add operation in a fragment program for every pixel in that buffer to get it added into my accumulator. All the pixels have to be sloshed back and forth every time I render a particle. That is so inefficient it makes the idea not even worth considering.

A simple software renderer may be your best option.

A simple software renderer may be your best option.

I was afraid someone might say that. Any idea what sort of performance I can expect to get software-rendering tens of thousands of single-textured quads?

Can’t you try data consolidation of some sort? Draw the faint objects to aggregate objects and accumulate the aggregate objects.

When you draw the initial objects, you draw them more solid (not as faint) to a texture, then you contribute the texture to the framebuffer, modulating it with alpha so precision is preserved. At least you won’t be using floats, and you may be able to afford the render-to-texture or copytex.

Yeah, dorbie’s idea is a good one. You can do it on a Radeon pretty cheaply by rendering the particles to the frame buffer with the alphas scaled by 2 or 4 (assuming this won’t cause extra clamping). Then after n particles, throw it into the accum buffer with the proper factor. You could also do it with a pbuffer etc., but the accum is a bit easier to set up.
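
Something along these lines (the scale and batch size are just illustrative values, not tuned; drawParticleQuad is a made-up helper):

```cpp
#include <GL/gl.h>

void drawParticleQuad(int i, float alphaScale);  // hypothetical helper

// Render particles with alpha scaled up, then fold each batch into the
// accumulation buffer with the inverse factor to recover the true sum.
void accumulateParticles(int numParticles)
{
    const float kScale = 4.0f;  // alpha boost per particle (assumed value)
    const int   kBatch = 32;    // particles per accum pass (assumed value)

    glClearAccum(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_ACCUM_BUFFER_BIT);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE);

    for (int i = 0; i < numParticles; ++i) {
        drawParticleQuad(i, kScale);
        if ((i + 1) % kBatch == 0) {
            glAccum(GL_ACCUM, 1.0f / kScale);  // accum += framebuffer / kScale
            glClear(GL_COLOR_BUFFER_BIT);      // next batch starts from black
        }
    }
    glAccum(GL_ACCUM, 1.0f / kScale);  // fold in any partial final batch
    glAccum(GL_RETURN, 1.0f);          // write the sum back to the color buffer
}
```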

-Evan

Thanks guys, I’ll give it a try.

Hi,

If you really want to avoid the ping-pong blending, I suggest that you use a single pbuffer and make that buffer twice as wide. Use glViewport to control which area you are rendering to, and adjust your texture coordinates to read from the previously rendered area.

I know that the OpenGL spec states that reading from the current render target is undefined, but this technique seems to do the trick for me.
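
A rough sketch, where W x H is the logical image size (so the pbuffer is 2*W wide, and drawQuadReadingFrom is a made-up helper that draws a full-region quad with the given s-texture-coordinate range):

```cpp
#include <GL/gl.h>

void drawQuadReadingFrom(float s0, float s1);  // hypothetical helper

// One half of the double-wide pbuffer holds the previous result, the other
// is the current render target; each pass swaps the roles by changing the
// viewport and the texture coordinates.
void wideBufferPass(int W, int H, int pass)
{
    int dst = pass & 1;                  // which half we render into
    glViewport(dst * W, 0, W, H);        // write into one half...
    float s0 = (1 - dst) * 0.5f;
    drawQuadReadingFrom(s0, s0 + 0.5f);  // ...reading from the other half
}
```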

Edit:
Just realized that this won’t do the trick in your case. Although it will let you get rid of the overhead of switching between buffers, you still have the overhead of copying the buffer.

– Niels

[This message has been edited by Niels Husted Kjaer (edited 07-11-2003).]

Originally posted by Zeno:
Using a floating point buffer hack like ping-ponging is not appropriate because of the large numbers of swaps it would take, and the horrible efficiency of copying all pixels to sum a few. Same thing for an accumulation buffer.

I’m working on a similar problem; I’m not sure why you need to copy all the pixels.

Can’t you work out the bounding box of the quad in pixel co-ordinates, and just copy back the sub-image?

At the moment I’m using the combination of a pbuffer and a texture which shadows it pixel-for-pixel.

The quick tests I’ve had time to perform suggest that for small-ish quads (say 10x10 pixels), I might get on the order of 150,000 particles per second on a GFFX5900U. Not stunning (I’m trying to render 3+ million particles), but maybe I can find a way to eke some more performance out of it, and I’m not really after real-time anyway.

Oh, and in case any hardware people are reading: put blending for FP buffers in future GPUs, please. Even if it’s just an add operation. Thanks.

Originally posted by nutball:
I’m working on a similar problem; I’m not sure why you need to copy all the pixels.

Can’t you work out the bounding box of the quad in pixel co-ordinates, and just copy back the sub-image?

After thinking about it, you’re right. So how does this sound:

  1. Fill float buffer 1 with background.
  2. Render particle into back buffer.
  3. Copytexsubimage the screen-space bounding rectangle from both the float buffer and the back buffer (you can do this, right?); rough sketch below.
  4. Add them in a fragment program and place the sum in the float buffer.
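
For step 3, something like this per particle (backTex is a texture shadowing the back buffer; the rect and names are illustrative):

```cpp
#include <GL/gl.h>

// Copy the particle's window-space bounding rect (x, y, w, h) out of the
// back buffer into the shadowing texture. The float-buffer rect is copied
// the same way with that pbuffer's context current, and step 4 then draws
// a w x h quad whose fragment program adds the two texture samples.
void copyBoundingRect(GLuint backTex, int x, int y, int w, int h)
{
    glBindTexture(GL_TEXTURE_2D, backTex);
    glReadBuffer(GL_BACK);
    // args: target, level, xoffset, yoffset, x, y, width, height
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, x, y, x, y, w, h);
}
```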

Oh, and in case any hardware people are reading: put blending for FP buffers in future GPUs, please. Even if it’s just an add operation. Thanks.

Careful, you’ll upset Korval by making requests like that. http://www.opengl.org/discussion_boards/ubb/Forum7/HTML/000395.html

[This message has been edited by Zeno (edited 07-11-2003).]

There are also fixed-point 32-bit color buffers. Does blending support those currently? If it does, that might be the solution.

You could also try simultaneously rendering to and texturing from the same surface.

See this example: http://cvs1.nvidia.com/DEMOS/OpenGL/src/fp_blend/

Beware! You’ll have to avoid read-after-write hazards yourself. There’s no interlock to protect you.
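
A minimal sketch of the idea, assuming the extension entry points have already been fetched with wglGetProcAddress and that hPbuffer/addProg are set up elsewhere (drawParticleQuads is a made-up helper):

```cpp
#include <windows.h>
#include <GL/gl.h>
#include <GL/glext.h>   // GL_FRAGMENT_PROGRAM_ARB, glBindProgramARB
#include <GL/wglext.h>  // HPBUFFERARB, WGL_FRONT_LEFT_ARB

void drawParticleQuads();  // hypothetical helper

// Bind the pbuffer's own color buffer as a texture while it remains the
// render target, and let a fragment program do the add. Undefined by the
// spec, and there is no interlock: writes to any one pixel must be kept
// far enough apart in the pipeline.
void sameSurfaceAdd(HPBUFFERARB hPbuffer, GLuint addProg)
{
    wglBindTexImageARB(hPbuffer, WGL_FRONT_LEFT_ARB);
    glEnable(GL_FRAGMENT_PROGRAM_ARB);
    glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, addProg);  // adds tex sample to particle color
    drawParticleQuads();
    glDisable(GL_FRAGMENT_PROGRAM_ARB);
    wglReleaseTexImageARB(hPbuffer, WGL_FRONT_LEFT_ARB);
}
```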

Cass

Originally posted by paladinzzz:
There are also fixed-point 32-bit color buffers. Does blending support those currently? If it does, that might be the solution.

If I understand what you’re saying correctly, the answer is no, float buffers don’t support blending. This is the source of my trouble.

You could also try simultaneously rendering to and texturing from the same surface.

For some reason I would never have guessed that is possible. If, in the fragment program, I am only reading from the same pixel I end up writing to, isn’t this functionally equivalent to blending support? Actually, it’s better, because I would not be limited to reading from the same fragment I write to…what am I missing here?

[This message has been edited by Zeno (edited 07-11-2003).]

“Actually, it’s better, because I would not be limited to reading from the same fragment I write to…what am I missing here?”

Read-after-write or write-after-write hazards. Because several fragments can be in the pipeline at a given time, if you have two fragments writing to the same pixel, bad things can happen. For example…

Fragment 1 reads framebuffer value
Fragment 2 reads framebuffer value
Fragment 1 adds to framebuffer value
Fragment 2 adds to old framebuffer value
Fragment 1 writes to framebuffer
Fragment 2 writes wrong value to framebuffer
The world ends in a sudden burst of evil energy…

You can try to avoid this by setting particles up so that they don’t write to exactly the same fragment twice in a very short time span.

j

I have already done something like that for global illumination during my PhD.
I render 100,000 to millions of particles on the screen.
To avoid the accuracy problem, I played with the equations so that I can represent the color (power) of the particles with integer values.
Then, to avoid overflow, I accumulate the buffer into higher precision every M-th particle, using the horribly slow glReadPixels :).
But in fact, it causes only a 15% slowdown, or something like that.
The main bottleneck of the method is that you render a LOT of overlapping particles, and graphics hardware does not seem to like that :-), and also each particle is represented by a quad, so you have a lot of geometric data to upload.
So I am not sure that FP blending will solve all the problems of this kind of method (and you need 4x more bandwidth!).
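
The read-back accumulation above looks roughly like this (buffer size and batch length are illustrative, and drawParticleQuad is a made-up helper):

```cpp
#include <GL/gl.h>
#include <vector>

void drawParticleQuad(int i);  // hypothetical helper

// Render batches at 8-bit precision, then fold the framebuffer into a
// higher-precision CPU-side sum every kBatch particles via glReadPixels.
void accumulateViaReadback(int numParticles, int W, int H)
{
    const int kBatch = 64;  // particles between read-backs (assumed value)
    std::vector<unsigned char> batch(size_t(W) * H * 4);
    std::vector<unsigned int>  sum(size_t(W) * H * 4, 0);

    for (int i = 0; i < numParticles; ++i) {
        drawParticleQuad(i);
        if ((i + 1) % kBatch == 0 || i + 1 == numParticles) {
            glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, &batch[0]);
            for (size_t p = 0; p < batch.size(); ++p)
                sum[p] += batch[p];        // overflow-safe running total
            glClear(GL_COLOR_BUFFER_BIT);  // restart the batch from black
        }
    }
}
```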

Originally posted by cass:
You could also try simultaneously rendering to and texturing from the same surface.

Cass

Cass - I just gave this a try…no success. I’m using Mark Harris’ pbuffer class. I have the floating-point buffer enabled as a render target, and then I bind it as a texture. Unfortunately, anything I try to read from it turns out black. I have confirmed that I can render to this float buffer and read from it as a texture, but I can’t seem to do both at the same time. Any insight into what I might be doing wrong?