First things first: how many lights? If a forward shading pipeline can easily handle your light count, you don't have lots of small triangles, and you don't have lots of overdraw with really complex shading, then just throw your scene at forward shading. Deferred techniques have some great advantages, but they require special handling not just for edge anti-aliasing (which you know about) but also for translucency.
Like Ilian, I've done MSAA Deferred Shading. I just want to highlight a few points mentioned above that might not be crystal clear and that help when implementing Deferred techniques with MSAA.
You can allocate MS (multisample) render targets and rasterize into them with MSAA without any problems. Definitely do it!
- For the subsequent passes, you can do MS -> MS rendering by running the fragment shader per sample. This works, but it can be expensive, since every sample is shaded separately.
- Another easy option is MS -> SS (SS = single sample) rendering, running the fragment shader per pixel. Have each fragment shader invocation read every sample of its pixel from the MS buffer (via texelFetch), perform its operation on each one, average the results over all samples, and then write out the downsampled per-pixel result. When you can get away with it, this saves write bandwidth and a lot of memory on the output buffer. This is easier with Deferred Shading than with Deferred Lighting.
- If you need to (and make sure you do first!), you may be able to make either of these more efficient by running a pass that classifies the MS buffer, marking which pixels are "edge" pixels (i.e. need per-sample shading) and which aren't. Then you process all samples for edge pixels and just the first sample for non-edge pixels. (Note: there's a trick to mark edge pixels during rasterization instead of in a separate pass, but it doesn't handle intersecting triangles. Note also: you might be able to do this classification in the previous step, without a separate pass at all, by looking at your G-buffer data.)
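To make the per-pixel MS -> SS pass and the edge classification concrete, here is a small CPU-side model in Python of what each per-pixel fragment shader invocation would do. It is a sketch under stated assumptions, not shader code: `GSample`, `shade`, `is_edge`, and `resolve_lighting` are hypothetical names, the G-buffer is reduced to an albedo, a precomputed N dot L and a depth, and "edge" here means any sample differing from sample 0 beyond a tolerance (one possible way of "looking at your G-buffer data"; other criteria work too). In GLSL, the inner sample loop would correspond to `texelFetch` on a `sampler2DMS` with an explicit sample index.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GSample:
    """One G-buffer sample (hypothetical, heavily simplified layout)."""
    albedo: float   # scalar albedo instead of an RGB color
    n_dot_l: float  # precomputed N dot L, standing in for normal + light vector
    depth: float


# ms[y][x] holds the list of samples for pixel (x, y).
MSBuffer = List[List[List[GSample]]]


def shade(s: GSample) -> float:
    """Hypothetical lighting function: a plain Lambert term."""
    return s.albedo * max(s.n_dot_l, 0.0)


def is_edge(samples: List[GSample], eps: float = 1e-3) -> bool:
    """Edge pixel = some sample's G-buffer data differs from sample 0."""
    s0 = samples[0]
    return any(
        abs(s.albedo - s0.albedo) > eps
        or abs(s.n_dot_l - s0.n_dot_l) > eps
        or abs(s.depth - s0.depth) > eps
        for s in samples[1:]
    )


def resolve_lighting(ms: MSBuffer, classify: bool = True) -> Tuple[List[List[float]], int]:
    """MS -> SS lighting pass: one invocation per pixel, one output value per pixel.

    Returns the single-sample image and how many times shade() was called.
    """
    image: List[List[float]] = []
    shade_calls = 0
    for row in ms:
        out_row: List[float] = []
        for samples in row:  # one "fragment shader invocation"
            if classify and not is_edge(samples):
                # Non-edge: samples agree, so shading sample 0 once is enough.
                value = shade(samples[0])
                shade_calls += 1
            else:
                # Edge (or no classification): shade every sample, then average.
                value = sum(shade(s) for s in samples) / len(samples)
                shade_calls += len(samples)
            out_row.append(value)
        image.append(out_row)
    return image, shade_calls


if __name__ == "__main__":
    flat = GSample(albedo=0.5, n_dot_l=1.0, depth=0.2)  # interior pixel
    fg = GSample(albedo=1.0, n_dot_l=1.0, depth=0.2)    # foreground triangle
    bg = GSample(albedo=0.5, n_dot_l=0.5, depth=0.9)    # background behind it
    ms = [[[flat] * 4, [fg, fg, bg, bg]]]               # 2x1 image, 4x MSAA
    print(resolve_lighting(ms))                  # ([[0.5, 0.625]], 5)
    print(resolve_lighting(ms, classify=False))  # ([[0.5, 0.625]], 8)
```

With classification enabled, the flat pixel costs one `shade()` call instead of four, while the edge pixel still gets all four samples shaded and averaged, so the output matches the unclassified pass in this example. For non-edge pixels whose samples differ only within the tolerance, shading sample 0 is an approximation of the true average.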
More advanced techniques described by some graphics gurus take this last step further by repacking the samples into adjacent GPU threads for better GPU efficiency. That isn't simple, though, and you'll very likely find you don't need to go anywhere near it (or possibly even the edge-pixel classification optimization).
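The repacking idea can be illustrated with another simplified CPU model. The post doesn't name the specific techniques, so this is not any particular published method: `build_work_lists`, `group_into_waves`, and `lane_utilization_per_pixel` are hypothetical names, and the "wave" model (every lane in a group runs as long as its slowest lane) is a deliberate simplification of how GPU warps/wavefronts diverge. On a real GPU, one common approach is to build such lists in a compute pass with an atomic counter or a prefix sum.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_work_lists(
    edge_mask: List[List[bool]], num_samples: int
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int, int]]]:
    """Hypothetical repacking step: turn an edge mask into two dense work lists.

    per_pixel:  (x, y) for each non-edge pixel, shaded once (sample 0)
    per_sample: (x, y, s) for every sample of every edge pixel
    Consecutive entries would map to consecutive GPU threads.
    """
    per_pixel: List[Tuple[int, int]] = []
    per_sample: List[Tuple[int, int, int]] = []
    for y, row in enumerate(edge_mask):
        for x, edge in enumerate(row):
            if edge:
                per_sample.extend((x, y, s) for s in range(num_samples))
            else:
                per_pixel.append((x, y))
    return per_pixel, per_sample


def group_into_waves(items: Sequence[T], wave_size: int) -> List[List[T]]:
    """Chunk a work list into fixed-size groups, like threads in a warp/wavefront."""
    return [list(items[i:i + wave_size]) for i in range(0, len(items), wave_size)]


def lane_utilization_per_pixel(
    edge_mask: List[List[bool]], num_samples: int, wave_size: int
) -> float:
    """Useful fraction of lane-iterations with one thread per pixel (no repacking),
    in a simplified SIMD model where each wave runs as long as its slowest lane."""
    work = [num_samples if edge else 1 for row in edge_mask for edge in row]
    useful = sum(work)
    total = sum(max(w) * len(w) for w in group_into_waves(work, wave_size))
    return useful / total
```

For a 2x2 mask `[[False, True], [False, False]]` at 4x MSAA with a wave size of 4, one thread per pixel keeps only 7 of 16 lane-iterations busy (0.4375), because three flat-pixel lanes wait while the edge-pixel lane shades four samples. After repacking, every thread in each list shades exactly one sample, so full waves have no idle lanes; the price is the extra bookkeeping pass, which is part of why this is rarely worth it.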




