Hi All,
Generally, should we expect a considerable drop of FPS changing the number of samples from 2 to 4, 8 and 16 ?
Thanks,
Alberto
Yes, I believe you should expect a drop in FPS.
I'm no expert, but from what I understand, a multisampling rate of 2 means the GPU takes two coverage samples within each pixel and resolves (averages) them.
The same goes for multisampling rates of 4, 8, etc.
So the more samples you use, the more expensive the resolve and the more framebuffer bandwidth you consume, which inevitably leads to a drop in FPS.
In our software, we had to settle for a multisampling rate of 4, although we initially went for 16.
I hope that helps.
matchStickMan,
In our software, we had to settle for a multisampling rate of 4 although we initially went for 16.
I think we need to do the same; on some GPUs our program drops from 100 FPS to 1-2 FPS, and the quality improvement from 4 to 16 samples is barely noticeable.
What do you think?
Thanks.
Alberto
Cache coherency is nulled, ROPs get overwhelmed, imho.
Hi Ilian,
I don’t understand your post, please explain.
Thanks,
Alberto
Basically fillrate increases. And cache is very useful up to a certain threshold.
I.e if on a cpu-only app you work intensively over only 32kB of contiguous data (fits in L1 cache), bandwidth easily reaches 100GB/s, getting limited down only by the arithmetic ops you do. Make that 33kB, and you overstep the size of L1, arithmetic ops become less of a bottleneck. Then overstep the L2 by a good margin, make your memory-accesses more random, and your 1-cycle ops can each take 300+ cycles.
GPUs try to merge outputs from ROPs, as GDDR is with high-latency/high-granularity access. In GDDR3/4/5 datasheets I didn’t see in-stream masks for which bytes to be updated - there is only a whole-stream mask that is applied from start to finish of transfer (glColorMask indirectly creates this mask). So, when you update depths (and/or colors in CSAA) of subpixels, the previous 163232 depths have to be prefetched, masked, merged, uploaded to GDDR. (assuming 32x32 tile-size, which I deducted from some Insomniac Games reports).
Caches help tremendously up to a certain threshold. After that, the latency horrors become visible.
Metal Gear Solid 4 uses framebuffers only 1024 px wide for this reason, even though the RSX is claimed to have "enough cache to fit ANY DXT1 texture" (4096×4096 = 8 MB?). They simply found that was the threshold on that hardware, for their set of scenes.
… and the quality improvement from 4 to 16 samples is almost not noticeable.
Yeah, true. The increase in smoothness does not match the drop in FPS.
IMHO, a sampling rate of 4 is good enough.
Thanks guys, now everything is clear to me!
Alberto