PDA

View Full Version : High precision sum?



Zeno
07-10-2003, 11:35 AM
I'm working on a project where I need to be able to add the contributions from thousands of faint particles into the framebuffer.

Obviously, this causes clamping nightmares at 8 bits per component, and there is no blend support on floating-point buffers (why oh why?).

Using a floating point buffer hack like ping-ponging is not appropriate because of the large numbers of swaps it would take, and the horrible efficiency of copying all pixels to sum a few. Same thing for an accumulation buffer.

Is there another way I'm not considering? Some way to pack 32 bit precision into the four components of the framebuffer and get an add operation out of it?

Thanks in advance.

Adrian
07-10-2003, 11:42 AM
That question sounds familiar :) http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008303.html

Zeno
07-10-2003, 12:00 PM
That other thread is asking how to sum all the pixels in the framebuffer, which is a different question.

I want to be able to draw many particles (textured quads) into the framebuffer and get an accurate, unclamped sum of their colors at each pixel. I can do this by just rendering with the blend function (GL_SRC_ALPHA, GL_ONE), but the problem is you get either horrible artifacts or rapid saturation or both in an 8-bit-per-component buffer.


[This message has been edited by Zeno (edited 07-10-2003).]

Adrian
07-10-2003, 12:03 PM
Sorry, yes I read your post too fast.

roffe
07-10-2003, 12:04 PM
Originally posted by Zeno:
Using a floating point buffer hack like ping-ponging is not appropriate because of the large numbers of swaps it would take,

No, don't swap, just change the draw buffer. I have suggested this before: you can use two single-buffered pbuffers and ping-pong between them, or ping-pong within a double-buffered pbuffer. The latter is defined in the specs (render_to_texture) to cause undefined results, but I haven't experienced any problems, on NVIDIA hardware at least. I know some people on this board consider the latter a disaster waiting to happen.

Zeno
07-10-2003, 12:14 PM
Originally posted by roffe:
No, don't swap, just change the draw buffer. I have suggested this before: you can use two single-buffered pbuffers and ping-pong between them

Perhaps "swap" was an unfortunate choice of words. The problem is this: for every small particle that I render into one floating point buffer (say it covers 10x10 pixels), I have to do an add operation in a fragment program for every pixel in that buffer to get it added into my accumulator. All the pixels have to be sloshed back and forth every time I render a particle. That is so inefficient it makes the idea not even worth considering.

Coriolis
07-10-2003, 12:33 PM
A simple software renderer may be your best option.

Zeno
07-10-2003, 12:41 PM
A simple software renderer may be your best option.

I was afraid someone might say that :(. Any idea what sort of performance I can expect when software-rendering tens of thousands of single-textured quads?

dorbie
07-10-2003, 02:04 PM
Can't you try data consolidation of some sort? Draw the faint objects into aggregate objects and accumulate the aggregate objects.

When you draw the initial objects, draw them more solid (not as faint) to a texture, then contribute the texture to the framebuffer, modulating it with alpha so precision is preserved. At least you won't be using floats, and you may be able to afford the render-to-texture or copytex.

ehart
07-10-2003, 04:14 PM
Yeah, dorbie's idea is a good one. You can do it on a Radeon pretty cheaply by rendering the particles to the frame buffer with their alphas scaled by 2 or 4 (assuming this won't cause extra clamping). Then, after n particles, throw it into the accum buffer with the proper factor. You could also do it with a pbuffer etc., but the accum buffer is a bit easier to set up.

-Evan

Zeno
07-10-2003, 06:59 PM
Thanks guys, I'll give it a try.

Husted
07-10-2003, 11:47 PM
Hi,

If you really want to avoid the ping-pong blending I suggest that you use a single pbuffer and make that buffer twice as wide. Use glViewport to control which area you are rendering to, and adjust your texture coordinates to read from the previous rendered area.

I know that the OpenGL spec states that reading from the current render target is undefined, but this technique seems to do the trick for me.

Edit:
Just realized that this won't do the trick in your case. Although it allows you to get rid of the overhead of switching between buffers, you still have the overhead of copying the buffer.

-- Niels

[This message has been edited by Niels Husted Kjaer (edited 07-11-2003).]

nutball
07-11-2003, 02:04 AM
Originally posted by Zeno:
Using a floating point buffer hack like ping-ponging is not appropriate because of the large numbers of swaps it would take, and the horrible efficiency of copying all pixels to sum a few. Same thing for an accumulation buffer.


I'm working on a similar problem, and I'm not sure why you need to copy all the pixels.

Can't you work out the bounding box of the quad in pixel co-ordinates, and just copy back the sub-image?

At the moment I'm using the combination of a pbuffer and a texture which shadows it pixel-for-pixel.

The quick tests I've had time to perform suggest that for smallish quads (say 10x10 pixels), I might get on the order of 150,000 particles per second on a GFFX5900U. Not stunning (I'm trying to render 3+ million particles), but maybe I can find a way to eke some more performance out of it, and I'm not really after real-time anyway.

Oh, and in case any hardware people are reading: put blending for FP buffers in future GPUs, please. Even if it's just an add operation. Thanks.

Zeno
07-11-2003, 08:35 AM
Originally posted by nutball:
I'm working on a similar problem, I'm not sure why you need to copy all the pixels?

Can't you work out the bounding box of the quad in pixel co-ordinates, and just copy back the sub-image?


After thinking about it, you're right. So how does this sound:

1) Fill float buffer 1 with background.
2) Render particle into back-buffer.
3) CopyTexSubImage the screen-space bounding rectangle from both the float buffer and the back buffer (you can do this, right?).
4) add them in fragment program and place sum in float buffer.



Oh, and in case any hardware people are reading: put blending for FP buffers in future GPUs, please. Even if it's just an add operation. Thanks.

Careful, you'll upset Korval by making requests like that ;). http://www.opengl.org/discussion_boards/ubb/Forum7/HTML/000395.html



[This message has been edited by Zeno (edited 07-11-2003).]

paladinzzz
07-11-2003, 12:46 PM
There are also fixed-point 32-bit color buffers; does blending currently support those? If it does, that might be the solution.

cass
07-11-2003, 01:23 PM
You could also try simultaneously rendering to and texturing from the same surface.

See this example: http://cvs1.nvidia.com/DEMOS/OpenGL/src/fp_blend/

Beware! You'll have to avoid read-after-write hazards yourself. There's no interlock to protect you.

Cass

Zeno
07-11-2003, 03:01 PM
Originally posted by paladinzzz:
There are also fixed-point 32-bit color buffers; does blending currently support those? If it does, that might be the solution.

If I understand what you're saying correctly, the answer is no: float buffers don't support blending. This is the source of my trouble ;)


You could also try simultaneously rendering to and texturing from the same surface.

For some reason I would never have guessed that is possible. If, in the fragment program, I am only reading from the same pixel I end up writing to, isn't this functionally equivalent to blending support? Actually, it's better, because I would not be limited to reading from the same fragment I write to...what am I missing here?


[This message has been edited by Zeno (edited 07-11-2003).]

j
07-11-2003, 04:20 PM
"Actually, it's better, because I would not be limited to reading from the same fragment I write to...what am I missing here?"

Read-after-write or write-after-write hazards. Because several fragments can be in the pipeline at a given time, if you have two fragments writing to the same pixel, bad things can happen. For example...

Fragment 1 reads framebuffer value
Fragment 2 reads framebuffer value
Fragment 1 adds to framebuffer value
Fragment 2 adds to old framebuffer value
Fragment 1 writes to framebuffer
Fragment 2 writes wrong value to framebuffer
The world ends in a sudden burst of evil energy....

You can try to avoid this by setting particles up so that they don't write to exactly the same fragment twice in a very short time span.

j

tayo
07-11-2003, 10:31 PM
I have already done something like that for global illumination during my PhD.
I render 100,000 to millions of particles on the screen.
To avoid the accuracy problem, I played with the equations so I can represent the color (power) of the particles with integer values.
Then, to avoid overflow, I accumulate the buffer into higher precision every M-th particle, using the horribly slow glReadPixels :-).
But in fact, it causes only a 15% slowdown, or something like that.
The main bottleneck of the method is that you render a LOT of overlapping particles, and graphics hardware does not seem to like that :-); also, each particle is represented by a quad, so you have a lot of geometric data to upload.
So I am not sure that FP blending will solve all the problems of this kind of method (and you need 4x more bandwidth!).

Zeno
07-14-2003, 01:48 PM
Originally posted by cass:

You could also try simultaneously rendering to and texturing from the same surface.

Cass


Cass - I just gave this a try...no success. I'm using Mark Harris' pbuffer class. I have the floating point buffer enabled as a render target and then I bind it as a texture. Unfortunately, anything I try to read from it turns out black. I have confirmed that I can render to this float buffer and read from it as a texture, but I can't seem to do both at the same time. Any insight about what I might be doing wrong?

Xmas
07-14-2003, 11:11 PM
Maybe you could use an n-pass algorithm (where n is the maximum number of overlapping particles), in combination with stenciling so you always render one layer per pass, and an occlusion query extension to know which particles have been rendered completely.

1. clear the stencil buffer to zero, set the stencil function to EQUAL with reference value 0. Stencil operation should be INCR for stencil pass.
2. render all particles. Because of stenciling, you only get one layer.
3. increment the stencil reference value.
4. swap render target and texture
5. render all particles with occlusion query (maybe for particle groups)
6. mark all particles that had zero pixels rendered as done, and don't use them in the next pass.
7. go to 3. until all particles are done.