should i turn off AA with fullscreen quads



zed
08-15-2007, 01:36 PM
the title says it all!!

I'm assuming there's practically no speed difference between the two, is this correct?

At the moment, I'm disabling it during the shadowmap creation stage, which makes a large difference (since on average I'm rendering far more geometry in this stage than in the normal stage)

are there any other times I should disable AA?

PaladinOfKaos
08-15-2007, 02:42 PM
If all of your rendering is to an offscreen buffer, I'd disable AA on your main window and use the fbo_multisample extension for when you need something to be antialiased. The latest nVidia drivers support that extension at least back to the GeForce 6. I'm not sure about AMD drivers, but I suspect they support it as well.

Humus
08-16-2007, 09:21 AM
Yes, call glDisable(GL_MULTISAMPLE) before any form of rendering that does not need its edges smoothed, like for instance fullscreen quads (it's better to do a fullscreen triangle though since that eliminates the diagonal edge) or things like skyboxes. For skyboxes this works even if they are drawn last (which they should be), because each sample will still be tested against all destination depth values, so you'll still get AA against the scene geometry.

Lindley
08-16-2007, 09:25 AM
Skyboxes drawn last? That's not the way I learned it.

I always figured skyboxes should be drawn first with DEPTH_TEST disabled, so that everything else drew in front of them.

Humus
08-16-2007, 09:35 AM
Nope. That was true in the Rage 128 era. ;) But ever since Radeon it's been better to draw the skybox last since you can cull large chunks of it with HyperZ. It's the same front-to-back logic as with everything else, and the skybox is most definitely in the back.

Somehow this seems very unintuitive to a lot of developers, and seven years later we're still struggling with educating developers to do this right. Loads of games shipping today still draw the skybox first, even though it's slower and it should be a trivial fix in many cases.

Brolingstanz
08-16-2007, 09:44 AM
of course you could benefit from early z if you render front to back, assuming you don't need to blend with the background.

But then again, skyboxes are a bit of a snooze IMHO...

Humus
08-16-2007, 10:00 AM
Originally posted by modus:
of course you could benefit from early z if you render front to back, assuming you don't need to blend with the background.

Well, if you need to blend with the background the correct order is like this:
1) Opaque objects
2) Skybox
3) Blended objects

But the skybox should never be drawn first, unless everything needs to be blended with it.
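The ordering above can be expressed as a sort key. This is only a sketch: the `Draw` record, the pass constants and the `depth` field are made-up names for illustration, and sorting blended objects back to front is an assumption (the usual requirement for correct blending), not something stated in the list.

```python
from dataclasses import dataclass

# Hypothetical pass ranks, following the order in the list above.
OPAQUE, SKYBOX, BLENDED = 0, 1, 2

@dataclass
class Draw:
    name: str
    kind: int            # OPAQUE, SKYBOX or BLENDED
    depth: float = 0.0   # view-space distance from the camera

def sort_draws(draws):
    """Return draws in submission order: opaque, skybox, then blended."""
    def key(d):
        if d.kind == OPAQUE:
            return (OPAQUE, d.depth)     # front to back, so early Z can reject
        if d.kind == BLENDED:
            return (BLENDED, -d.depth)   # back to front (assumed, for blending)
        return (SKYBOX, 0.0)             # after all opaque geometry
    return sorted(draws, key=key)

frame = [
    Draw("sky", SKYBOX),
    Draw("glass", BLENDED, 5.0),
    Draw("far wall", OPAQUE, 40.0),
    Draw("smoke", BLENDED, 12.0),
    Draw("player", OPAQUE, 2.0),
]
order = [d.name for d in sort_draws(frame)]
print(order)
```

With this key the skybox lands after every opaque draw, so it is only shaded where the depth test lets it through.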


Originally posted by modus:
But then again, skyboxes are a bit of a snooze IMHO...

Well, if you have a more advanced representation of the environment then most certainly it should be drawn last. The more advanced the more you can gain from this optimization. A plain cubemap may see a couple of percent speedup of the game with this, whereas an advanced multi-layered sky could see much larger gains from it.

Brolingstanz
08-16-2007, 10:09 AM
Well, it depends on what you mean by advanced.

If you're actually rendering a nontrivial atmosphere with volumetric clouds and such, then there is no one sort order... it's view dependent.

And what about light shafts, or other cool volumetric atmospheric effects, not to mention single/multiple scattering, aerial perspective, etc.

zed
08-16-2007, 01:17 PM
Yes, call glDisable(GL_MULTISAMPLE) before any form of rendering that does not need its edges smoothed, like for instance fullscreen quads (it's better to do a fullscreen triangle though since that eliminates the diagonal edge)

champion humus as usual.

yes I was leaving it on for the skybox (rushes out the room)

I'll change over to drawing a triangle as well instead of a quad (I've heard this before)
luckily I have a function that goes draw_fullscreen_quad(W,H) so that'll be quick

i assume the best method is
point A = corner0
point B = width*2
point C = height*2
or would I need to add 1 to point B (+ perhaps also C) in case it misses the point in the upper-right corner?
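One way to check the rounding worry without a GPU is to test every pixel centre against the triangle directly. The sketch below assumes window coordinates with the origin at corner0 and pixel centres at half-integer positions; `inside` and `covers_all_pixel_centers` are made-up helper names, not part of any API.

```python
def inside(px, py, w, h):
    """True if (px, py) is inside or on the triangle (0,0), (2w,0), (0,2h)."""
    # The hypotenuse is the line x/(2w) + y/(2h) = 1; multiplying through
    # by 2*w*h gives the division-free test below.
    return px >= 0 and py >= 0 and px * h + py * w <= 2 * w * h

def covers_all_pixel_centers(w, h):
    """Check the centre of every pixel of a w x h target."""
    return all(inside(x + 0.5, y + 0.5, w, h)
               for y in range(h) for x in range(w))

W, H = 320, 240
print(covers_all_pixel_centers(W, H))
print(inside(W, H, W, H))          # the upper-right corner itself
print(inside(W + 0.5, H, W, H))    # half a pixel past that corner
```

The upper-right corner (W, H) sits exactly on the hypotenuse, while the centre of the last pixel, (W - 0.5, H - 0.5), is strictly inside, so under these assumptions no extra +1 should be needed.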

Jan
08-16-2007, 02:03 PM
Erm, may i ask, what is the advantage of rendering a triangle over a quad for fullscreen-effects? I mean, it's only ONE triangle less.

Jan.

Humus
08-16-2007, 02:16 PM
Originally posted by modus:
Well, it depends on what you mean by advanced.

If you're actually rendering a nontrivial atmosphere with volumetric clouds and such, then there is no one sort order... it's view dependent.

And what about light shafts, or other cool volumetric atmospheric effects, not to mention single/multiple scattering, aerial perspective, etc.

Sure, you can always come up with exceptions, but I'm trying to give a simple generic answer here. :) Things like volumetric clouds and stuff kind of fall outside what I consider "the environment", I'm talking about the "at infinity environment" or whatever you want to call it. But I suppose many of those effects fall in the group that should be drawn last or late anyway because they are blended.

Humus
08-16-2007, 02:23 PM
Originally posted by zed:
i assume the best method is
point A = corner0
point B = width*2
point C = height*2

I'm not sure if there's a "best" method. As long as it covers the entire [-1, 1] range in clip space it's fine. Two layouts I've been using are:
(-1, -1), (3, -1), (-1, 3)
and
(0, 2), (3, -1), (-3, -1)
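Both layouts can be checked on paper or with a few lines of code. The sketch below uses plain 2D edge functions; the helper names are made up, and the second layout's vertices are listed in a different order than above so that both triangles wind counter-clockwise. Since a triangle is convex, containing the four corners of the [-1, 1] square means it contains the whole square.

```python
def edge(a, b, p):
    """Edge function: > 0 when p is to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def contains(tri, p):
    """True if p is inside or on the boundary of a counter-clockwise triangle."""
    a, b, c = tri
    return edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0

def area(tri):
    a, b, c = tri
    return abs(edge(a, b, c)) / 2

LAYOUT_A = [(-1, -1), (3, -1), (-1, 3)]
LAYOUT_B = [(-3, -1), (3, -1), (0, 2)]   # same vertices as above, reordered CCW
CORNERS = [(-1, -1), (1, -1), (1, 1), (-1, 1)]

for name, tri in (("A", LAYOUT_A), ("B", LAYOUT_B)):
    covered = all(contains(tri, c) for c in CORNERS)
    print(name, "covers screen:", covered, "area:", area(tri))
```

The visible square has area 4 in these units, so the first layout rasterizes a triangle twice the size of the screen and the second a slightly larger one.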

Humus
08-16-2007, 02:33 PM
Originally posted by Jan:
Erm, may i ask, what is the advantage of rendering a triangle over a quad for fullscreen-effects? I mean, it's only ONE triangle less.

Jan.

The diagonal edge in the middle introduces a slight inefficiency. The pixels on the edge will be shaded twice, once for each triangle. With multisampling enabled you'd also get these pixels decompressed.
Of course, it's not a huge gain. In an extreme case I was able to squeeze 4% gain out of it, but in more common cases you'd maybe see a fraction of a percent. But on the other hand, it's a trivial change so why not? The question should rather be, what's the advantage of a quad, to which the answer is that there's none. The only reason people use quads is because the screen is rectangular, so I suppose a quad is more intuitive.
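For a rough feel for how many pixels sit on that diagonal, here is a back-of-the-envelope count. It assumes the hardware shades in 2x2 pixel blocks and that every block the diagonal passes through is shaded once per triangle; both are simplifying assumptions, and the function names are made up.

```python
from math import gcd

def quads_on_diagonal(width, height, quad=2):
    """Count quad x quad pixel blocks crossed by the corner-to-corner diagonal."""
    cols, rows = width // quad, height // quad
    # A straight line between opposite corners of a cols x rows grid
    # passes through cols + rows - gcd(cols, rows) cells.
    return cols + rows - gcd(cols, rows)

def diagonal_overhead(width, height, quad=2):
    """Fraction of all pixel blocks that would be shaded for both triangles."""
    total = (width // quad) * (height // quad)
    return quads_on_diagonal(width, height, quad) / total

print(quads_on_diagonal(1280, 720))
print(f"{diagonal_overhead(1280, 720):.2%}")
```

At 1280x720 this comes out to well under one percent of the blocks, in line with the "fraction of a percent" figure for the common case; it says nothing about the multisample decompression cost.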

Lindley
08-16-2007, 03:25 PM
If the texture you're rendering to is larger than the quad, is there still a gain to using a triangle with the scissor test?

Humus
08-16-2007, 04:45 PM
I haven't tested that, but I would guess so.

Korval
08-16-2007, 05:13 PM
But on the other hand, it's a trivial change so why not?

Is it? I mean, how exactly do you draw a rectangular area with a single triangle? Wouldn't the triangle have to be very large, compared to the size of the screen?

sqrt[-1]
08-16-2007, 06:24 PM
I find this interesting as in some console hardware docs, they recommend doing the fullscreen passes in a grid of quads (6x8 tiles? - not sure)

Something about not flooding the fragment pipe or something.. (and most consoles use PC-like hardware)

Lindley
08-16-2007, 11:33 PM
More details on the tradeoffs of the various techniques for full-screen "quad" drawing would be greatly appreciated.

wizard
08-17-2007, 12:48 AM
Originally posted by Korval:

But on the other hand, it's a trivial change so why not?Is it? I mean, how exactly do you draw a rectangular area with a single triangle? Wouldn't the triangle have to be very large, compared to the size of the screen?

Yep :) But when not using AA, clipping should even lessen the "superiority" of the triangle way, right?

V-man
08-17-2007, 01:00 AM
If you draw a triangle to cover the entire screen, I think the hw will clip it and you will end up with 2 triangles still. Unless today's GPUs work differently from older hardware.

Korval
08-17-2007, 02:20 AM
If you draw a triangle to cover the entire screen, I think the hw will clip it and you will end up with 2 triangles still.

I would guess that it doesn't actually clip triangles to the scissor box. It merely culls fragments outside the box.

My main question is this: if I use a quad, I am guaranteed pixel-perfect alignment. How can I get that guarantee when I'm going to get floating-point rounding error on interpolation?

zed
08-17-2007, 03:05 AM
Wouldn't the triangle have to be very large, compared to the size of the screen?

does it matter? surely clipping a small part of the tri ain't much better than clipping a lot of it. (my mentioning of adding one offset was more for safety with rounding)


I'm not sure if there's a "best" method. As long as it covers the entire [-1, 1] range in clip space it's fine. Two layouts I've been using are:
(-1, -1), (3, -1), (-1, 3)
and
(0, 2), (3, -1), (-3, -1)

related to the above, method A clips less than method B (though ultimately does it matter?), method A seems more natural as well since you align a tri's edge with the screen's edge

sqrt[-1] mentioned something though about dividing the screen up into smaller quads ( + I think I've heard something similar as well ) personally it doesn't seem logical (but what would I know, ok the card processes everything in chunks, so perhaps there's something)

in my game i have (at least)

draw fullscreen quad depth pass
draw fullscreen quad horizontal bloom
draw fullscreen quad vertical bloom
draw fullscreen quad horizontal bloom
draw fullscreen quad vertical bloom

draw fullscreen quad horizontal DOF
draw fullscreen quad vertical DOF
draw fullscreen quad horizontal DOF
draw fullscreen quad vertical DOF

plus i think particle buffer + another depth pass

so that's quite a few fullscreen(*) quads I'm drawing, so improving this even by a single percent is worthwhile, since it's a hell of a lot of pixels (more at lesser resolutions, then again the trend is for higher resolutions so the importance is less, though the counterpunch is postprocessing something new )

(*)note fullscreen is actually 1:1 or 1:4 or 1:16 sized :) depending on the rendering buffers mapping

btw wizard 3 posts in 6 years :) great,
still trying to decipher that last one though.

(rant mode)
deleted
(/rant)

(edit) actually this would be a great topic for a pdf from nvidia or amd 'how to draw a fullscreen quad' which today is more pertinent than ever.

what's funny is, there's no consensus here + it's prolly the simplest thing that a person can do in graphics

wizard
08-17-2007, 03:35 AM
zed: Ain't it great. I've been working not posting ;) But I promise I'll be writing more in the future, lol.

Korval: I'm sure clipping is done in any case. Rasterizing areas outside the viewport and then discarding them would be a waste of time.

Humus
08-17-2007, 10:20 AM
Originally posted by sqrt[-1]:
I find this interesting as in some console hardware docs, they recommend doing the fullscreen passes in a grid of quads (6x8 tiles? - not sure)

Something about not flooding the fragment pipe or something.. (and most consoles use PC-like hardware)

I'm not a console guy, but I believe those tiles are screenspace points, so they are rasterized as squares instead of as two triangles. You could try implementing something similar on PC with pointsprites, but that would add some math to the shader for texture coordinate computation, so I'm not sure if that would be a gain.

^Fishman
08-17-2007, 10:35 AM
Predicated Tiling (http://msdn2.microsoft.com/en-us/library/bb464139.aspx).

Humus
08-17-2007, 10:38 AM
Originally posted by V-man:
If you draw a triangle to cover the entire screen, I think the hw will clip it and you will end up with 2 triangles still.

Not unless it goes outside the guardband. It's a bit old, but there's a fairly good overview of how it works here:
http://developer.nvidia.com/object/Guard_Band_Clipping.html

Korval
08-17-2007, 10:41 AM
I'm sure clipping is done in any case. Rasterizing areas outside the viewport and then discarding them would be a waste of time.

If clipping were happening, then there would not simply be one diagonal line as in the quad case; there would be many. Which would make this a totally meaningless idea from a performance standpoint.

Normally, clipping only happens if it is absolutely necessary. That is, if the polygon would break the plane of the camera.

Humus
08-17-2007, 10:47 AM
Originally posted by Korval:
My main question is this: if I use a quad, I am guaranteed pixel-perfect alignment. How can I get that guarantee when I'm going to get floating-point rounding error on interpolation?

I really don't think this would ever matter for anything. Not sure if you'd be "pixel-perfect aligned" with quads even. The triangle would be twice as large as the quad, so I assume at worst you lose one bit of precision.
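To see what the interpolation question amounts to, the sketch below interpolates texture coordinates across the oversized triangle at each pixel centre and compares them with what a [0, 1] quad would give. The UVs (0,0), (2,0), (0,2) are an assumption chosen so that uv = (position + 1) / 2, and the maths runs in Python doubles rather than GPU floats, so it only shows that the two agree up to rounding, not how large the rounding is on real hardware.

```python
def barycentric(tri, p):
    """Barycentric weights of point p with respect to triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (p[0] - cx) + (cx - bx) * (p[1] - cy)) / det
    wb = ((cy - ay) * (p[0] - cx) + (ax - cx) * (p[1] - cy)) / det
    return wa, wb, 1.0 - wa - wb

TRI = [(-1.0, -1.0), (3.0, -1.0), (-1.0, 3.0)]   # clip-space positions
UVS = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]       # assumed UVs, uv = (pos + 1) / 2

def triangle_uv(x, y, width, height):
    """Interpolated UV at the centre of pixel (x, y)."""
    p = (2.0 * (x + 0.5) / width - 1.0, 2.0 * (y + 0.5) / height - 1.0)
    w = barycentric(TRI, p)
    u = sum(wi * uv[0] for wi, uv in zip(w, UVS))
    v = sum(wi * uv[1] for wi, uv in zip(w, UVS))
    return u, v

def max_uv_error(width, height):
    """Largest difference from the UV a [0, 1] quad gives at the same pixel."""
    return max(
        max(abs(u - (x + 0.5) / width), abs(v - (y + 0.5) / height))
        for y in range(height) for x in range(width)
        for u, v in [triangle_uv(x, y, width, height)]
    )

print(max_uv_error(64, 48))
```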

Lindley
08-17-2007, 12:32 PM
Well, I tried using a scissored triangle in place of a quad. It did show a very slight speedup. Thanks for the tip!

ZbuffeR
08-17-2007, 03:06 PM
This "Predicated Tiling" reminds me of ... Tiled rendering on the PowerVR-based cards ... been a long time.

dorbie
08-19-2007, 12:30 AM
ZbuffeR, predicated tiling is a very old idea, you could go back to pixel planes and see it implemented.

Various contemporary architectures have similar styles of framebuffer management, but it has long been understood that it is not free.

http://www.cs.unc.edu/~pxfl/

ZbuffeR
08-19-2007, 01:22 PM
Thanks Dorbie, for the background info.

tarantula
08-20-2007, 02:32 AM
The full screen Triangle vs Quad performance seems to be a bit better known in the GPGPU community. GPUBench (http://graphics.stanford.edu/projects/gpubench/) has a test dedicated to this. You can see that using a full screen triangle is slightly faster. Check the third graph on this page for results on 7800GTX: http://graphics.stanford.edu/projects/gpubench/results/7800GTX-7772/

sqrt[-1]
08-20-2007, 08:29 PM
I looked in console docs about the grid for fullscreen passes and it states that it can be better due to the GPU's rasterization rules and minimizing texture cache misses. (8x1 grid seemed to be good for 1280x720)

Perhaps if someone is really keen they could write a test that cycles through a lot of different grid patterns for a given resolution to find the optimal one for different cards?

zed
08-21-2007, 12:32 PM
I'll do some tests tonight
so that's 8 quads of (1280/8) x 720
I'll try dividing it up vertically as well
also perhaps 4 triangles centered on the screen center is the way to go
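For anyone wanting to run the same experiment, a small helper like the one below generates the tile rectangles for an arbitrary grid. `tile_grid` is a made-up name, and the integer rounding of tile edges is just one reasonable choice, picked so that the tiles cover the target exactly without gaps or overlap.

```python
def tile_grid(width, height, cols, rows):
    """Split a width x height target into cols x rows rectangles (x, y, w, h)."""
    tiles = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            tiles.append((x0, y0, x1 - x0, y1 - y0))
    return tiles

tiles = tile_grid(1280, 720, 8, 1)   # the 8x1 layout mentioned for 1280x720
for t in tiles:
    print(t)
```

Swapping the last two arguments gives the 1x8 split, and looping over several `cols`/`rows` pairs gives the kind of sweep suggested earlier in the thread.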

zed
08-25-2007, 04:33 PM
there is something to splitting the screen up into smaller areas

using GPUBench
fpfilltest -r triangle -c1 -k 256 -n == ~1120m/pix
fpfilltest -r triangle -c1 -n == ~1090m/pix