GeForce wireframe performance & occlusion testing

I was wondering why wireframe performance on most GeForce 3/4 cards is much lower than for filled polygons, even when unlit and untextured. Isn’t it supposed to be faster, like in the old days? Is it because vendors no longer optimize their drivers for wireframe rendering as much as they do for newer features like shaders? Or am I missing something here?

Also, I noticed that my OpenGL-powered engine renders much faster (on my GF4 Ti4200) when objects are rendered from back to front (depth sorted) than when I skip sorting and render the objects in no particular Z-order. The framerate differs by as much as 25%.
I know my card supports HP’s occlusion test and NV’s occlusion query extensions, so my guess is that the driver applies some similar form of occlusion culling at all times (the card isn’t really fill-rate limited, by the way). I don’t know much about what goes on after you send stuff down to the GPU, so can anyone confirm this?

On line performance: I believe NVIDIA wanted a clearer distinction between the consumer cards and the professional (Quadro) ones.

It fits nicely among the “main key selling points”:
get better wireframe performance with your Quadro!

>> faster when objects are rendered back to front (depth sorted)

Are you sure it isn’t “front to back”? That way early-z optimisations would kick in. Or maybe you disabled the z-buffer for this case (painter’s algorithm).

Yes, NVIDIA deliberately designs faster wireframe into Quadro DCC-class systems (or at least used to). However, high-quality AA wireframe can be difficult, and many systems have slower wireframe than polygon performance. Consider also how the application draws its data. It all comes down to design priorities, but don’t underestimate the problems just because drawing a line seems simple.

Occlusion query extensions and the like require specific application-level code support, but they also have the potential to reduce T&L load. Features like coarse z and early z don’t require application-level support and automatically increase performance when things are drawn front to back, or when you just get lucky with the draw order, but they can only reduce the pixel fill load.
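
To illustrate what that application-level support looks like, here is a minimal sketch against GL_NV_occlusion_query; the Object type, drawBoundingBox() and drawObject() are hypothetical helpers, and the NV entry points are assumed to be loaded already:

    #include <GL/gl.h>
    #include <GL/glext.h>

    /* Hypothetical application-side declarations. */
    struct Object;
    void drawBoundingBox(const struct Object *obj);
    void drawObject(const struct Object *obj);

    void drawIfVisible(const struct Object *obj)
    {
        GLuint query, pixelCount;

        glGenOcclusionQueriesNV(1, &query);

        /* Test a cheap proxy (the bounding box) without touching
           the color or depth buffers. */
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);

        glBeginOcclusionQueryNV(query);
        drawBoundingBox(obj);
        glEndOcclusionQueryNV();

        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);

        /* Reading the result right away stalls the pipeline; real
           code should poll GL_PIXEL_COUNT_AVAILABLE_NV or batch
           many queries per frame. */
        glGetOcclusionQueryuivNV(query, GL_PIXEL_COUNT_NV, &pixelCount);
        if (pixelCount > 0)
            drawObject(obj);   /* at least one pixel would pass */

        glDeleteOcclusionQueriesNV(1, &query);
    }

Note that skipping drawObject() entirely is what saves the T&L load the poster mentions, whereas early/coarse z only saves fill.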

Originally posted by remdul:
I was wondering why wireframe performance on most GeForce 3/4 cards is much lower than for filled polygons, even when unlit and untextured. Isn’t it supposed to be faster, like in the old days? Is it because vendors no longer optimize their drivers for wireframe rendering as much as they do for newer features like shaders? Or am I missing something here?

Line drawing is a CAD feature. IHVs normally split their business between workstation (expensive) and desktop (cheap) lines of cards, and they don’t want workstation users buying the cheap desktop cards instead of the expensive workstation ones.

One way they do that is by “crippling” or removing workstation features on their desktop cards, and line drawing is one of them (wide lines, line antialiasing, and display list optimization are other examples).
Since those features are not normally used in desktop applications (read: games), taxing them doesn’t affect the desktop market.

You will see that the line-drawing performance of Quadro cards will more than likely exceed that of the GeForce series.

Line rendering is almost always vertex-transfer, vertex-transformation or rasterizer-setup limited (almost never fill-rate limited). On desktop cards, the driver may need to fall back to a slow path when rendering lines. For example:

  • Rendering antialiased lines as textured lines.
  • Rendering each line as two triangles, which will probably aggravate some or all of the typical line-drawing bottlenecks, depending on where the conversion to triangles is done (see the sketch below).
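
As a rough illustration of the second fallback, here is how a screen-space line segment might be expanded into two triangles; emitTriangle() is a hypothetical stand-in for whatever feeds the rasterizer:

    #include <math.h>

    /* Hypothetical sink for a 2D triangle (three x,y pairs). */
    void emitTriangle(float x0, float y0, float x1, float y1,
                      float x2, float y2);

    void lineToTriangles(float x0, float y0, float x1, float y1,
                         float width)
    {
        float dx = x1 - x0, dy = y1 - y0;
        float len = sqrtf(dx * dx + dy * dy);
        if (len == 0.0f)
            return;

        /* Unit perpendicular, scaled to half the line width. */
        float px = -dy / len * width * 0.5f;
        float py =  dx / len * width * 0.5f;

        /* Four corners of the quad, split into two triangles. Note
           the doubled vertex count and per-line setup cost. */
        emitTriangle(x0 + px, y0 + py, x0 - px, y0 - py, x1 + px, y1 + py);
        emitTriangle(x1 + px, y1 + py, x0 - px, y0 - py, x1 - px, y1 - py);
    }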


Also, I noticed that my OpenGL-powered engine renders much faster (on my GF4 Ti4200) when objects are rendered from back to front (depth sorted) than when I skip sorting and render the objects in no particular Z-order. The framerate differs by as much as 25%.

I guess you mean “front to back” rather than “back to front”.
The behaviour you are seeing is called “early reject”. Graphics cards perform a coarse test on groups of pixels (“tiles”, “chunks” or “stamps”, depending on the IHV’s naming conventions) so that they can discard a whole region at once if none of it will be visible (because of depth testing, alpha testing or stencilling). Those regions range from 2x2 to 8x8 pixels, for example.
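
Taking advantage of this from the application side is just a coarse sort of the opaque objects by view-space depth before submitting them. A minimal sketch, where the Object record and its viewZ field are hypothetical:

    #include <stdlib.h>

    /* Hypothetical per-object record: viewZ is the object's
       (e.g. bounding-sphere center) depth in camera space. */
    typedef struct {
        float viewZ;
        /* ... mesh handles, render state, etc. ... */
    } Object;

    static int nearestFirst(const void *a, const void *b)
    {
        float za = ((const Object *)a)->viewZ;
        float zb = ((const Object *)b)->viewZ;
        return (za > zb) - (za < zb);   /* ascending: nearest first */
    }

    /* Sort opaque objects front to back so early/coarse z can
       reject occluded tiles before they reach the fill stage. */
    void sortFrontToBack(Object *objects, size_t count)
    {
        qsort(objects, count, sizeof(Object), nearestFirst);
    }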

In some graphics cards this is further helped by a hierarchical z-buffer, which gives you even bigger wins because the regions you can discard are bigger.

That’s why “warming” the depth buffer, i.e. rendering your unlit scene into it first, is normally a good thing when you are heavily fill-rate limited.
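
A minimal sketch of such a depth “warming” pre-pass, assuming drawScene() is the application’s own (hypothetical) draw routine:

    /* Pass 1: lay down depth only; color writes off, cheapest state. */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawScene();               /* unlit, untextured geometry */

    /* Pass 2: full shading; GL_LEQUAL means only the visible
       surface at each pixel pays the expensive fill cost. */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);     /* depth is already correct */
    glDepthFunc(GL_LEQUAL);
    drawScene();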

Thanks. I already feared the wireframe performance had something to do with the vendors and the segmentation between their lines of cards.

But that aside, I realised just after posting that the two topics might actually have a connection:

When rendering wireframe polygons, occlusion testing, z-culling, etc. are almost useless. That might also explain the performance drop: the GPU simply has to process all polygons. Still, I’m sure wireframe could be a lot faster if the vendors didn’t cripple the desktop cards, as you gentlemen mentioned.

Actually, I’m using AA lines for my shadow mapping, so there’s definitely a use for them in games. The use of them doesn’t cause a noticeable performance hit in my implementation, though.
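
For reference, the usual OpenGL state for blended antialiased lines (smooth lines need blending to look right on most hardware):

    /* Standard antialiased-line setup. */
    glEnable(GL_LINE_SMOOTH);
    glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glLineWidth(1.5f);

Whether this hits a fast path or a driver fallback is exactly the consumer-vs-Quadro question discussed above.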

And by the way, yes, I am rendering stuff from front to back.