ATI_pixel_format_float very slow... normal?

When I use a standard format, an iteration of my program completes in a fraction of a second, but if I use ATI_pixel_format_float with a 64- or 128-bit color buffer, the same iteration takes about one minute to complete!
Is this difference normal? I know that a float pixel format is slower, but I'm surprised to see such a difference in performance!

Hello!
Those times mean that the rendering process is being done in software mode.
glGetString(GL_RENDERER) should tell you that.

glGetString(GL_RENDERER) returns “Radeon 9700 Pro x86/SSE2”, so I guess it isn’t done in software mode…

Originally posted by Acheum:
When I use a standard format, an iteration of my program completes in a fraction of a second, but if I use ATI_pixel_format_float with a 64- or 128-bit color buffer, the same iteration takes about one minute to complete!
Is this difference normal? I know that a float pixel format is slower, but I'm surprised to see such a difference in performance!

Are you using it on the main frame buffer or a pbuffer? It's only supposed to work on pbuffers, so it's probably falling back to a software rasterizer because the hardware can't support it.

Check the attributes of your pixel format, and make sure you don’t have too many aux buffers or MSAA enabled, etc.
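For reference, the kind of attribute list I mean looks roughly like this (just a sketch; it assumes windows.h/wglext.h, that wglChoosePixelFormatARB was fetched with wglGetProcAddress, and that hdc is your device context; adjust the bit depths for a 128-bit buffer):

/* Hypothetical request for an accelerated 64-bit float pbuffer format.
   Constants come from WGL_ARB_pixel_format, WGL_ARB_pbuffer,
   WGL_ARB_multisample and WGL_ATI_pixel_format_float. */
int attribs[] = {
    WGL_DRAW_TO_PBUFFER_ARB, GL_TRUE,
    WGL_SUPPORT_OPENGL_ARB,  GL_TRUE,
    WGL_ACCELERATION_ARB,    WGL_FULL_ACCELERATION_ARB,
    WGL_PIXEL_TYPE_ARB,      WGL_TYPE_RGBA_FLOAT_ATI,   /* float components */
    WGL_RED_BITS_ARB,   16,
    WGL_GREEN_BITS_ARB, 16,
    WGL_BLUE_BITS_ARB,  16,
    WGL_ALPHA_BITS_ARB, 16,
    WGL_AUX_BUFFERS_ARB,    0,   /* no aux buffers     */
    WGL_SAMPLE_BUFFERS_ARB, 0,   /* no multisampling   */
    0
};
int  format;
UINT numFormats;
if (!wglChoosePixelFormatARB(hdc, attribs, NULL, 1, &format, &numFormats) || numFormats == 0)
{
    /* no accelerated float format with these attributes */
}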

You shouldn’t be able to select a float format for a window.

The easiest way to get performance like you describe is to use either alpha blending or alpha test on a float format buffer, because that would cause the driver to fall back to SW.

-Evan

But I do render to a pbuffer!
I tried to render to the main frame buffer, but I couldn't get a valid pixel format, and I saw it isn't supported yet.
I do use alpha blending in my program, though. I didn't know it causes the driver to fall back to software, so the problem must be there…

Edit: yes, when I disable the blending, performance comes back to normal. But I need the blending! grrr…

[This message has been edited by Acheum (edited 05-14-2003).]

Yup, the “no blending in float mode” restriction is a big bummer. The solution I heard about is to render into a texture and read that texture in the next frame.

Has anybody tried that? Does it work? How fast is it?

Originally posted by dirk:

Has anybody tried that? Does it work? How fast is it?

Kind of. If you have a double buffered pbuffer you can render to back, bind back to texture, switch draw buffer to front and render again. No need to swap. I tried this implementing functionality like textureA = textureA + textureB. Worked great.
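Roughly, one pass could look like this (just a sketch; hPbuffer, texA and drawPass() are placeholders, and it assumes the pbuffer was created with the WGL_ARB_render_texture attributes so its color buffers can be bound as a texture):

/* Accumulation pass on a double buffered float pbuffer:
   read the back buffer as a texture, write the result to the front. */
glBindTexture(GL_TEXTURE_2D, texA);               /* texture object tied to the pbuffer     */
wglBindTexImageARB(hPbuffer, WGL_BACK_LEFT_ARB);  /* back buffer becomes the texture source */
glDrawBuffer(GL_FRONT);                           /* write the result into the front buffer */
drawPass();                                       /* fragment program reads texA here       */
wglReleaseTexImageARB(hPbuffer, WGL_BACK_LEFT_ARB);
/* next pass: bind the front buffer as the texture and draw into the back;
   no SwapBuffers needed. */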

But how can I do this operation textureA = textureA + textureB?
What do I render to the front buffer? (I guess it isn't the same as what is in the back, otherwise it wouldn't do any blending.) How do I use the new texture rendered in the back buffer?
It could work nicely, but I need more explanation of the method…

Originally posted by Acheum:
But how can I do this operation textureA = textureA + textureB?

This was only an example. With fragment programs and doing multiple passes you can basically implement any fp blending operation.

What do I render to the front buffer? (I guess it isn't the same as what is in the back, otherwise it wouldn't do any blending.) How do I use the new texture rendered in the back buffer?

I'll give you a better example. Let's say you wanted to perform a modulate blending operation between the incoming fragment and the framebuffer. You would allocate a double buffered pbuffer with fp support and clear both the front and the back buffer. Render your scene to the back buffer of the pbuffer. Depending on how many fragments your modulation operation will generate, you may need to copy the back buffer of the pbuffer to the front. If you know the operation covers the entire buffer there is no need to copy, since it will be overwritten anyway.

Now bind a texture (fp) to the back buffer, switch the draw buffer to the front buffer (glDrawBuffer), and use the fragment program that implements your blending operation on the bound texture and whatever input fragments you generate. The only non-trivial thing here is to get the texture coordinates for your newly generated fragments in the bound texture (framebuffer). In NV_fragment_program the WPOS register could help. Maybe ARB_fp has something similar.
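For the copy-to-front step, something along these lines could work (a sketch; pbufferWidth/pbufferHeight are placeholders, it assumes identity modelview/projection matrices so that (-1,-1) maps to window position (0,0), and glCopyPixels on a float buffer may or may not be accelerated):

/* Copy what was rendered into the back buffer over to the front buffer,
   so the blending pass starts from the current framebuffer contents. */
glReadBuffer(GL_BACK);
glDrawBuffer(GL_FRONT);
glRasterPos2f(-1.0f, -1.0f);   /* window position (0,0) with identity matrices */
glCopyPixels(0, 0, pbufferWidth, pbufferHeight, GL_COLOR);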

For practical usage of pbuffers/Render To Texture look up specs in the ogl-ext-registry and sample code across the web. I have some demos on my site but they aren’t that good for learning this stuff.

“In NV_fragment_program the WPOS register could help. Maybe ARB_fp has something similar.”

The WPOS equivalent in ARB_fragment_program is fragment.position.
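To make that concrete, a minimal modulate pass could look like this in ARB_fragment_program (a sketch only; width and height are placeholders, texture unit 0 is assumed to hold the buffer that was bound as a texture, strlen needs string.h, and error checking is omitted):

/* Hypothetical modulate blend: sample the copy of the framebuffer at the
   fragment's own window position and multiply it with the incoming color. */
static const char fpModulate[] =
    "!!ARBfp1.0\n"
    "PARAM invSize = program.local[0];\n"         /* (1/width, 1/height, 0, 0)           */
    "TEMP coord, dst;\n"
    "MUL coord, fragment.position, invSize;\n"    /* window pos -> [0,1] texcoords       */
    "TEX dst, coord, texture[0], 2D;\n"           /* read the bound framebuffer texture  */
    "MUL result.color, fragment.color, dst;\n"    /* modulate with the incoming fragment */
    "END\n";

GLuint progId;
glGenProgramsARB(1, &progId);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, progId);
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   (GLsizei)strlen(fpModulate), fpModulate);
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 0,
                             1.0f / width, 1.0f / height, 0.0f, 0.0f);
glEnable(GL_FRAGMENT_PROGRAM_ARB);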

While I am not against people experimenting with fp blending in this manner, I need to provide a strong caution: this is undefined as per the spec, and it is typically undefined on current HW.

According to the spec, all of the buffers of a renderable texture provide undefined results while any of them is being rendered to. On most modern HW, this only shows up when reading and writing the same buffer simultaneously.

The texel and pixel caches do not have any coherency, so if you attempt to read and write buffer A simultaneously, the results are undefined. As a practical matter, you will get results you did not intend. Worse yet, some of the things that impact this could even change from driver release to driver release, so a particular set of operations might show no artifacts on one release but would on another.

I think fp blending is cool, so I won't say don't play around with it. On the other hand, it is so filled with potholes that I would ask anyone doing so to please not release code that does it or actively tell others how to do it. It is a bug waiting to happen.

-Evan

Originally posted by ehart:
According to the spec, all of the buffers of a renderable texture provide undefined results while any of them is being rendered to.

The technique I described works just as well with 2 single buffered pbuffers, which does not violate any specs. You have to context switch between the passes though. Since the buffers have the same pixel format you can use the same rendering context.
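In code that could look roughly like this (a sketch; pbufA is the source pbuffer, hdcB is the DC of the target pbuffer from wglGetPbufferDCARB, hglrc is the shared rendering context, texA is a texture object, and drawBlendPass() stands in for your own blending pass):

/* Ping-pong between two single buffered float pbuffers that share one
   rendering context: read from A, write into B, then swap their roles. */
wglMakeCurrent(hdcB, hglrc);                     /* draw into pbuffer B                */
glBindTexture(GL_TEXTURE_2D, texA);
wglBindTexImageARB(pbufA, WGL_FRONT_LEFT_ARB);   /* A's color buffer as source texture */
drawBlendPass();                                 /* fragment program reads texA        */
wglReleaseTexImageARB(pbufA, WGL_FRONT_LEFT_ARB);
/* next pass: make pbuffer A current and bind B as the texture */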

Hm, ok. So it works, and it’s even legal.

The problem I have is that I would need to switch for every single polygon (think particle system). I need to accumulate a large number (millions) of very small contributions to create a final image. I won’t have millions of contributions per pixel, but every single one is very small.

I was hoping to be able to just render into the texture I’m reading from, but reading Evan’s post doesn’t encourage me to do that.

Is there any alternative approach? If not, the value of fp buffers seems to be significantly diminished, as AFAICS one main advantage is the ability to accumulate more without quantization.

If not, the value of fp buffers seems to be significantly diminished, as AFAICS one main advantage is the ability to accumulate more without quantization.

There are far more useful advantages to floating-point buffers than simply being able to add very small numbers together.

For example, HDR (saturation, and color values > 1.0f). Also, if you do shadow mapping with shaders rather than hardware extensions, floating-point luminance textures are very useful; even better than ARB_depth_texture in terms of precision.