Would display lists help draw grass faster?

I’m rendering grass using a method like UT2004’s “decolayers”. First I divide the terrain into a grid and create a list of mesh instances. Then I find the grid cells near the camera and draw only those cells’ instances, using VBOs and setting the vertex arrays only once.
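For anyone following along, the "find the grid cells near the camera" step could look roughly like the sketch below. It is not the code from my project: `CellCoord`, `cellsNearCamera`, and all the parameters are made-up names, and it assumes a uniform grid in the XZ plane starting at the origin.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical type: integer coordinates of one terrain grid cell.
struct CellCoord { int x, z; };

// Returns every cell that overlaps a square of half-width `radius` centred on
// the camera, clamped to a grid of cellsX * cellsZ cells of size `cellSize`.
// Assumes the grid starts at world position (0, 0) in the XZ plane.
std::vector<CellCoord> cellsNearCamera(float camX, float camZ, float radius,
                                       float cellSize, int cellsX, int cellsZ)
{
    assert(cellSize > 0.0f && cellsX > 0 && cellsZ > 0);

    const int minX = std::max(0, static_cast<int>(std::floor((camX - radius) / cellSize)));
    const int maxX = std::min(cellsX - 1, static_cast<int>(std::floor((camX + radius) / cellSize)));
    const int minZ = std::max(0, static_cast<int>(std::floor((camZ - radius) / cellSize)));
    const int maxZ = std::min(cellsZ - 1, static_cast<int>(std::floor((camZ + radius) / cellSize)));

    std::vector<CellCoord> cells;
    // If the camera is far outside the grid, min > max and the loops do nothing.
    for (int z = minZ; z <= maxZ; ++z)
        for (int x = minX; x <= maxX; ++x)
            cells.push_back({x, z});
    return cells;
}
```

Only the instances stored in the returned cells get drawn; everything else is skipped without touching the GPU.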

It’s pretty fast, but because I am drawing thousands of instances, any little speed boost would help. Are display lists totally obsolete, or might they be a good idea for this?

If you do things right, VBOs will run at the edge of the GPU’s performance. The optimal case is when you draw all the grass with just one glDrawElements call.

Display lists are always worth a try, especially because it’s so easy to convert existing code to display lists :wink:

How much performance you gain, if any, depends on many factors. So you’ll just have to try it out in your concrete situation.

Display Lists are very fast on nVidia hardware, but pretty slow on ATI hardware.

So, if you want a speed-up in general, display lists won’t help you, since ATI doesn’t optimize them. I would advise you to try it using VBOs. That should give you nearly optimal performance on all hardware, since VBOs are the de facto standard for sending geometry to the hardware and are therefore optimized by all vendors.

Jan.

Thank you for the feedback.

I found that in this case, it was much, much faster to collapse the meshes into chunks. For each cell in the grid, I collapsed the meshes into one array. Before, I was rendering 2000 instances of a 15-poly mesh, so I imagine that must have been pretty wasteful compared to collapsing them.
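To illustrate what collapsing means here, a minimal sketch follows. It is not the actual project code: `Vec3`, `Mesh`, and `collapseInstances` are hypothetical names, and it assumes each instance only needs a translation (no rotation or scale) and that the mesh is an indexed triangle list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal types for the sketch.
struct Vec3 { float x, y, z; };

struct Mesh {
    std::vector<Vec3> positions;
    std::vector<std::uint32_t> indices;   // indexed triangle list
};

// Bakes one translated copy of `blade` per offset into a single mesh, so a
// whole grid cell can go into one vertex/index array and one draw call.
// Assumption: instances differ only by translation.
Mesh collapseInstances(const Mesh& blade, const std::vector<Vec3>& offsets)
{
    Mesh merged;
    merged.positions.reserve(blade.positions.size() * offsets.size());
    merged.indices.reserve(blade.indices.size() * offsets.size());

    for (const Vec3& o : offsets) {
        // Indices of this copy must point past the vertices already appended.
        const std::uint32_t base = static_cast<std::uint32_t>(merged.positions.size());
        for (const Vec3& p : blade.positions)
            merged.positions.push_back({p.x + o.x, p.y + o.y, p.z + o.z});
        for (std::uint32_t i : blade.indices) {
            assert(i < blade.positions.size());
            merged.indices.push_back(base + i);
        }
    }
    return merged;
}
```

The merged arrays are what would be uploaded to the cell's VBO once, then drawn with a single glDrawElements call per visible cell.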

I have to agree with Jan. I had one project that was still holding on to display lists instead of making the switch to grouped VBOs (with vertex arrays as the fallback)…

With DLs on NVIDIA it rocked. On ATI it didn’t - even on recent ATI hardware/drivers. Once I made the switch to VBOs, NVIDIA is measurably faster and ATI is way faster than it was with display lists.

what do you mean by grouped VBO’s?

what do you mean by grouped VBO’s?
I think he means that he does not create a separate VBO for every single object, but puts a group of objects into each VBO.

I imagine that must have been pretty wasteful compared to collapsing them
Yes. Each call to glDrawElements / glDrawArrays must perform a lot of work, so the fewer calls you have, the better. I’m using one draw call for every few hundred or few thousand polygons.

k_szczech is correct. I make a single VBO for each logical group of objects and draw them all with one glDrawElements call per group instead of one per object.

Here is a little more detail… While studying VBOs for the transition from display lists I made a C++ class that accepts immediate-mode-like commands (who hasn’t :slight_smile:) and it generates a VBO with indices for each primitive that I use (yes, I use quads too, not just tris).

It includes trifan and tristrip support so I can cache the output of the glu tessellate callbacks right into the VBOs. Then I make a single glDrawElements call per primitive type and per material/state.
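One common way to merge many fans and strips into a single indexed draw is to rewrite them as plain triangle-list indices. The sketch below shows that conversion; whether the class described above does it this way is not stated, so treat it as an assumption. `fanToTriangles` and `stripToTriangles` are hypothetical helper names, and `first` is the index of the primitive's first vertex in the shared vertex array.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Converts a triangle fan of `count` consecutive vertices starting at `first`
// into triangle-list indices. Fewer than 3 vertices yields no triangles.
std::vector<std::uint32_t> fanToTriangles(std::uint32_t first, std::uint32_t count)
{
    assert(count <= std::numeric_limits<std::uint32_t>::max() - first);
    std::vector<std::uint32_t> out;
    for (std::uint32_t i = 2; i < count; ++i) {
        out.push_back(first);          // every triangle shares the fan centre
        out.push_back(first + i - 1);
        out.push_back(first + i);
    }
    return out;
}

// Converts a triangle strip into triangle-list indices. Every second triangle
// of a strip has reversed winding, so its first two indices are swapped to
// keep all output triangles facing the same way.
std::vector<std::uint32_t> stripToTriangles(std::uint32_t first, std::uint32_t count)
{
    assert(count <= std::numeric_limits<std::uint32_t>::max() - first);
    std::vector<std::uint32_t> out;
    for (std::uint32_t i = 2; i < count; ++i) {
        if (i % 2 == 0) {
            out.push_back(first + i - 2);
            out.push_back(first + i - 1);
        } else {
            out.push_back(first + i - 1);
            out.push_back(first + i - 2);
        }
        out.push_back(first + i);
    }
    return out;
}
```

With everything expressed as triangles, the output of several tessellator callbacks can be appended to one index array and drawn with one GL_TRIANGLES call per material/state.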

I’ve got a similar class, but it takes transforms too, and pre-cooks the vertices. It’s not all that useful in practice.
I wish nvidia would release the source code for their display list compiler, as I have a hard time matching their performance with my own optimising code.

Mine has to cook the transformed vertices (and transformed normals) too since I’m basically merging the polys from objects in different parts of the spatial scene graph just for display optimization.

Of course, if you want good-looking alpha blended grass, you need to draw them from back to front, so collapsing the geometry into a single array is impossible.
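For reference, the back-to-front ordering itself is just a sort by distance from the camera, farthest first. A rough sketch, with hypothetical names (`Vec3`, `backToFrontOrder`) and assuming each grass instance is represented by a single centre point:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical minimal vector type for the sketch.
struct Vec3 { float x, y, z; };

// Returns indices into `centres`, ordered farthest-first from `camera`, which
// is the order needed for correct alpha blending. Squared distance is enough
// for ordering, so no square root is taken.
std::vector<std::size_t> backToFrontOrder(const std::vector<Vec3>& centres,
                                          const Vec3& camera)
{
    std::vector<float> dist2(centres.size());
    for (std::size_t i = 0; i < centres.size(); ++i) {
        const float dx = centres[i].x - camera.x;
        const float dy = centres[i].y - camera.y;
        const float dz = centres[i].z - camera.z;
        dist2[i] = dx * dx + dy * dy + dz * dz;
        // NaN keys would break the strict weak ordering std::sort relies on.
        assert(!std::isnan(dist2[i]));
    }

    std::vector<std::size_t> order(centres.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&dist2](std::size_t a, std::size_t b) { return dist2[a] > dist2[b]; });
    return order;
}
```

The order changes whenever the camera moves, which is why a pre-collapsed static array cannot simply be drawn as-is with blending.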

Oh well, I am pretty happy with the solution I have implemented.

You can get around that to some extent by drawing the grass unsorted with alpha test enabled, then drawing the grass again unsorted with alpha blending enabled. You don’t notice the artifacts on the edges as much.

I wish nvidia would release the source code for their display list compiler, as I have a hard time matching their performance with my own optimising code.
I believe display list optimization inside a driver cannot be beaten by a display list implemented outside the driver, since the driver’s display list can bypass some operations in the driver.
It’s more or less the same problem as optimizing a loop by moving some calculations outside its body; this time we cannot remove anything from the loop.

You’re probably right…it may not even be a driver operation in nvidia hardware, could be that the card maintains the display lists.

Bugger alpha test, use alpha to coverage instead.

  • btw there’s smoke coming out of the chimney, would you be seeing grass like that next door (well, ’cept maybe at the workers’ smoko breaks)?