Multisample AntiAliasing and number of samples



devdept
03-23-2009, 02:37 AM
Hi All,

In general, should we expect a considerable drop in FPS when changing the number of samples from 2 to 4, 8, and 16?

Thanks,

Alberto

matchStickMan
03-24-2009, 02:50 AM
Yes, I believe you should expect a drop in fps.

I'm no expert, but from what I understand, a multisampling rate of 2 means each pixel keeps two coverage/depth samples, which are then averaged (resolved) into the final smoothed pixel.
The same goes for multisampling rates of 4, 8, etc.

So the more samples you use, the more memory and bandwidth the framebuffer needs, which generally leads to a drop in fps.


In our software, we had to settle for a multisampling rate of 4 although we initially went for 16.


I hope that helps.
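
To put rough numbers on why the cost grows with the sample count, here is a back-of-the-envelope sketch of multisampled framebuffer storage. It assumes every sample keeps its own color and depth value at 4 bytes each and ignores any compression the hardware may apply; the function name and the example resolution are made up for illustration, not taken from any driver or API.

```python
def msaa_framebuffer_bytes(width, height, samples, color_bytes=4, depth_bytes=4):
    """Uncompressed size of a multisampled color + depth buffer.

    Assumption: each sample stores one color and one depth value.
    Real GPUs compress these buffers, so treat this as an upper bound.
    """
    if samples < 1:
        raise ValueError("samples must be >= 1")
    return width * height * samples * (color_bytes + depth_bytes)


if __name__ == "__main__":
    # Example resolution is arbitrary; substitute your own window size.
    for samples in (1, 2, 4, 8, 16):
        mib = msaa_framebuffer_bytes(1920, 1200, samples) / 2**20
        print(f"{samples:2d} samples: {mib:7.1f} MiB")
```

Storage scales linearly with the sample count under these assumptions, so going from 4 to 16 samples quadruples what the GPU has to keep and touch per frame.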

devdept
03-24-2009, 03:03 AM
matchStickMan,


"In our software, we had to settle for a multisampling rate of 4 although we initially went for 16."

I think we need to do the same. On some GPUs our program drops from 100 fps to 1-2 fps, and the quality improvement from 4 to 16 samples is barely noticeable.

What do you think?


Thanks.

Alberto

Ilian Dinev
03-24-2009, 03:23 AM
IMHO, cache coherency is lost and the ROPs get overwhelmed.

devdept
03-24-2009, 06:16 AM
Hi Ilian,

I don't understand your post, please explain.

Thanks,

Alberto

Ilian Dinev
03-24-2009, 02:46 PM
Basically, the fillrate demand increases, and the cache is only useful up to a certain threshold.
E.g., if a CPU-only app works intensively over only 32 kB of contiguous data (fits in L1 cache), bandwidth easily reaches 100 GB/s, limited only by the arithmetic ops you do. Make that 33 kB and you overstep the size of L1, so arithmetic ops become less of a bottleneck. Then overstep the L2 by a good margin, make your memory accesses more random, and your 1-cycle ops can each take 300+ cycles.
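
A minimal sketch of that cliff, using the textbook average-access-time formula (hit time plus miss rate times miss penalty). The 1-cycle hit and 300-cycle miss penalty echo the figures above; the hit rates are invented for illustration, not measured on any hardware.

```python
def average_access_cycles(hit_rate, hit_cycles=1, miss_penalty_cycles=300):
    """Average cycles per memory access for a single cache level.

    Assumption: a hit costs hit_cycles and a miss adds
    miss_penalty_cycles on top of that.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


if __name__ == "__main__":
    # Hypothetical hit rates, from "fits in cache" to "mostly random".
    for hit_rate in (1.0, 0.99, 0.9, 0.5):
        cycles = average_access_cycles(hit_rate)
        print(f"hit rate {hit_rate:.2f}: {cycles:6.1f} cycles per access")
```

Even a small miss rate dominates the average once the penalty is hundreds of cycles.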

GPUs try to merge outputs from ROPs, because GDDR access has high latency and coarse granularity. In the GDDR3/4/5 datasheets I didn't see in-stream masks for which bytes are to be updated; there is only a whole-stream mask that is applied from start to finish of the transfer (glColorMask indirectly creates this mask). So, when you update depths (and/or colors in CSAA) of subpixels, the previous 16*32*32 depths have to be prefetched, masked, merged, and uploaded to GDDR (assuming a 32x32 tile size, which I deduced from some Insomniac Games reports).
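
For scale, the 16*32*32 figure can be turned into bytes with a quick calculation. The 4 bytes per depth value is an assumption (e.g. a 24-bit depth plus 8-bit stencil format), and the 32x32 tile size is the guess stated above, not a documented hardware fact.

```python
def tile_depth_samples(tile_w=32, tile_h=32, samples=16):
    """Number of depth values in one tile at the given sample count."""
    return tile_w * tile_h * samples


def tile_depth_bytes(tile_w=32, tile_h=32, samples=16, bytes_per_depth=4):
    """Bytes to read, merge and write back for one tile.

    Assumption: 4 bytes per depth sample, no compression.
    """
    return tile_depth_samples(tile_w, tile_h, samples) * bytes_per_depth


if __name__ == "__main__":
    for samples in (2, 4, 8, 16):
        kib = tile_depth_bytes(samples=samples) / 1024
        print(f"{samples:2d} samples: {kib:5.1f} KiB per 32x32 tile")
```

Under these assumptions a 16-sample tile is 64 KiB of depth data, versus 16 KiB at 4 samples.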

Caches help tremendously up to a certain threshold. After that, the latency horrors get visible :)

Metal Gear Solid 4 uses framebuffers only 1024px wide for this reason, even though the RSX is claimed to have "enough cache to fit ANY dxt1 texture" (4096x4096 = 8MB?). They simply found that's the threshold on that hardware, for their set of scenes.
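
The "8MB?" figure can be checked directly: DXT1 stores each 4x4 texel block in 8 bytes, i.e. half a byte per texel. The sketch below covers only the base level (no mipmaps), and the function name is just for illustration.

```python
def dxt1_bytes(width, height):
    """Size of one DXT1-compressed mip level.

    DXT1 encodes every 4x4 texel block in 8 bytes; partial blocks
    at the edges are rounded up to a full block.
    """
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * 8


if __name__ == "__main__":
    size = dxt1_bytes(4096, 4096)
    print(f"4096x4096 DXT1: {size} bytes = {size / 2**20:.0f} MiB")
```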

matchStickMan
03-24-2009, 08:14 PM
"... and the quality improvement from 4 to 16 samples is almost not noticeable."

Yeah, true. The gain in smoothness doesn't justify the drop in fps.

IMHO, a sampling rate of 4 is good enough.

devdept
03-25-2009, 01:49 AM
Thanks guys, now everything is clear to me!

Alberto