The performance of the blend operation

I found the blend performance of my Radeon 9800 to be poor, and I heard that blending can slow down rendering. Is that true? How is blend performance on modern GPUs?

Originally posted by pango:
I found the blend performance of my Radeon 9800 to be poor, and I heard that blending can slow down rendering. Is that true?
Sure it is. Everything takes some time anyway. Blending is especially painful for GPUs because without it they simply do:
Compute pixel, put in framebuffer,
Compute pixel, put in framebuffer…

With blending:
Compute pixel, ask framebuffer what color is at this location, wait for data to get here, compute blending, put in framebuffer. It takes much longer.
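
To make the difference concrete, here is a minimal sketch in Python that only counts framebuffer accesses. It is a toy model, not a description of how any real GPU schedules memory traffic; the `Framebuffer` class and `draw` function are made up for illustration, and the blend equation is the one you would get from glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA).

```python
class Framebuffer:
    """Toy framebuffer that counts memory reads and writes (not a GPU model)."""

    def __init__(self, size):
        self.pixels = [0.0] * size
        self.reads = 0
        self.writes = 0

    def read(self, i):
        self.reads += 1
        return self.pixels[i]

    def write(self, i, value):
        self.writes += 1
        self.pixels[i] = value


def draw(fb, src_colors, src_alpha=None):
    """Write src_colors to fb; if src_alpha is given, blend with the old value."""
    for i, src in enumerate(src_colors):
        if src_alpha is None:
            # No blending: compute pixel, put in framebuffer.
            fb.write(i, src)
        else:
            # Blending: the destination color has to be fetched first.
            dst = fb.read(i)
            fb.write(i, src * src_alpha + dst * (1.0 - src_alpha))


opaque = Framebuffer(4)
draw(opaque, [1.0, 1.0, 1.0, 1.0])

blended = Framebuffer(4)
draw(blended, [1.0, 1.0, 1.0, 1.0], src_alpha=0.5)

print(opaque.reads, opaque.writes)    # no reads at all
print(blended.reads, blended.writes)  # one extra read per pixel
```

The arithmetic is trivial in both cases; what changes is that every blended pixel needs a read before its write.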

Originally posted by pango:
How is blend performance on modern GPUs?
I don’t know of anything which performs much differently from what you’re saying. I wonder what ‘modern’ stands for, however.

EDIT: next time consider posting this in beginner’s forum.

[This message has been edited by Obli (edited 02-24-2004).]

Well, a Radeon 9800 is the most modern GPU you can get at the moment, so…

However, blending with multiple passes can sometimes be faster than using a lot of texture units and a complex pixel shader.

Jan.

To Jan2000:
What’s “multiple pass”? Where can I get sample code? Blend performance is important to me, please help.

You can use a lot of textures in ONE pass, which is called “multitexturing”,
or you can use only one, or at most a few, textures, calculate an intermediate result, write it to the framebuffer, and then use another pass and blend that with the contents of the framebuffer to get the final result. This is called “multipass”.

It’s nothing special: you simply render the same geometry over and over again, but each time you do another step of your computation.
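
As a rough illustration of why the two approaches can give the same picture, here is a small Python sketch that modulates three texture layers either all at once or one per pass through the framebuffer. The layer names and values are made up, colors are single floats instead of RGBA, and the blend in the extra passes is the modulate blend you would get from glBlendFunc(GL_DST_COLOR, GL_ZERO). A real fixed-point framebuffer would also round every intermediate result, which this sketch ignores.

```python
def single_pass(base, lightmap, detail):
    """Multitexturing: combine all three textures per pixel in one pass."""
    return [b * l * d for b, l, d in zip(base, lightmap, detail)]


def multipass(base, lightmap, detail):
    """Multipass: one texture per pass, combined through the framebuffer."""
    framebuffer = list(base)          # pass 1: plain write, no blending
    for layer in (lightmap, detail):  # passes 2 and 3: same geometry again
        # Modulate blend: result = src * dst + dst * 0
        framebuffer = [src * dst for src, dst in zip(layer, framebuffer)]
    return framebuffer


# Hypothetical per-pixel texture samples for three pixels.
base = [1.0, 0.5, 0.25]
lightmap = [0.5, 0.5, 1.0]
detail = [0.5, 1.0, 0.5]

print(single_pass(base, lightmap, detail))
print(multipass(base, lightmap, detail))
```

Both functions produce the same colors here; the difference in practice is where the work happens (texture units and shader versus extra geometry passes and framebuffer traffic).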

On Radeons especially this is often faster, since they have very good early z-rejection. This means that when you do your second pass over the geometry, a lot of pixels aren’t even considered for processing, since the GPU already knows that they won’t be visible.
By contrast, a complex pass that does everything at once would calculate a lot of pixels that are not visible in the end, giving the GPU a lot of needless work to do.

Jan.

However, that being said, there are a lot of things that shaders can do that you can’t (efficiently) do with multipass. Even when you do get floating-point blending, you can’t really expect that passing that much data back and forth to memory is going to be fast.

If you have a complex shader and want to make sure that only the visible pixels are being rendered, then draw a “blank” pass over all of your geometry where you only write to the z-buffer. Then, draw a real pass where you actually write colors to the frame buffer.
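
A small Python sketch of the idea, counting how often the expensive shader would run with and without the depth-only pass. This is a toy model: fragments are hypothetical (pixel, depth) pairs, the first function uses a less-than depth test as with glDepthFunc(GL_LESS), and the second assumes the color pass uses an equality test with depth writes already done. It also assumes no two fragments on a pixel share exactly the same depth.

```python
def shaded_without_prepass(fragments, num_pixels):
    """Shade in submission order; hidden work happens when drawing back to front."""
    depth = [float("inf")] * num_pixels
    shaded = 0
    for pixel, z in fragments:
        if z < depth[pixel]:
            depth[pixel] = z
            shaded += 1  # expensive shader runs, may be overdrawn later
    return shaded


def shaded_with_prepass(fragments, num_pixels):
    """Lay down depth first, then shade only fragments that match it."""
    depth = [float("inf")] * num_pixels
    for pixel, z in fragments:   # pass 1: z-buffer only, no color
        depth[pixel] = min(depth[pixel], z)
    shaded = 0
    for pixel, z in fragments:   # pass 2: real shading, equality depth test
        if z == depth[pixel]:
            shaded += 1
    return shaded


# Two pixels, drawn back to front (worst case for overdraw).
fragments = [(0, 0.9), (0, 0.5), (0, 0.1), (1, 0.8), (1, 0.3)]

print(shaded_without_prepass(fragments, 2))
print(shaded_with_prepass(fragments, 2))
```

With the prepass the expensive shader runs once per visible pixel, at the price of submitting the geometry twice.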

In any case, I think this has gotten somewhat off-topic from the original post.

In what way did you find the blending performance of the 9800 poor? How much were you using blending? What blend operation was it?

The card probably doesn’t work on a single pixel at a time.

Typically, I’d assume that the card has a memory fetch/write unit which works a lot like a cache line in a CPU. If you want to touch ANY pixel within the “cache line” (DRAM page sized, perhaps?) then all of it is slurped into the controller. Then, a bunch of changes can be made in “near memory” to nearby pixels, and at the end, when no more changes are made, the memory controller writes out the result.

I’d go so far as to say that the hardware might have TWO copies of memory for each cache line / framebuffer block; one for what’s arriving from RAM, and one for what’s being written (with dirty bits and blend functions). That way, it doesn’t have to wait for memory reads at all, because that’ll be done by the time it’s time to evict this “cache line” back to DRAM.

This is all a guess, but my take-away is that blending is free on all modern hardware; it’s the fact that you’re touching pixels which matters. Measurements seem to agree. This also means that writing sparsely using alpha testing is still pretty expensive, because any pixel written within a RAM page pays the cost of writing the entire RAM page to memory.
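
To put numbers on that guess, here is a tiny Python sketch of the block model described above. The block size of 16 pixels and the function name are purely hypothetical; the only point is that, under this assumption, cost follows the number of blocks touched rather than the number of pixels written.

```python
TILE_PIXELS = 16  # hypothetical size of one framebuffer block, in pixels


def tiles_touched(written_pixels, tile_pixels=TILE_PIXELS):
    """Number of distinct blocks that must be fetched and written back."""
    return len({p // tile_pixels for p in written_pixels})


dense = range(0, 64)       # all 64 pixels written
sparse = range(0, 64, 16)  # only 4 pixels survive the alpha test

print(len(dense), tiles_touched(dense))
print(len(sparse), tiles_touched(sparse))
```

In this model the four sparse pixels move exactly as many blocks as the 64 dense ones.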

This is all a guess, but my take-away is that blending is free on all modern hardware; it’s the fact that you’re touching pixels which matters.

Well, yes, the actual color blending operation itself is free. But by turning on blending, you’re asking for a read-modify-write operation, which is never free.

Writing to memory should never provoke a read operation. So, unless alpha-test logic is somehow bound to blending logic, there is no reason for an alpha-test to be substantially slower than regular operations.