Fast display of textured quads - alternatives?

Hi all,

There are many ways to render a textured quad in 2D in OpenGL. What is the best way to do it? Are any of the following much faster than the others, or much slower?

  1. Straight vertexes

gl.begin(GL.TRIANGLE_FAN) ;
gl.texCoord2f(0.0f, 0.0f) ;
gl.vertex2i(54, 67) ;
// ... remaining three texCoord/vertex pairs
gl.end() ;

Nothing is stored on the card. Coordinates are pumped straight out.

  2. Translated display list

gl.translatef(x, y, 0) ;
gl.callList(MY_QUAD) ;
gl.translatef(-x, -y, 0) ;

Coordinates are stored on the card, but you need two translations before and after rendering as the quad will have to be drawn at the origin.

  3. Recreated display list

if(moved) rebuildDisplayList() ;
gl.callList(MY_QUAD) ;

Coordinates stored on card at correct position, no translations required. However, display list must be rebuilt each time the quad moves.

  4. Vertex arrays

gl.drawArrays(GL.TRIANGLE_FAN, i, 4) ;

Data stored in vertex arrays and texture coordinate arrays.
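To make option 4 a little more concrete, here is a minimal sketch of how the quad data might be packed so that quad n starts at vertex index n * 4. The class name QuadArrays and its add method are hypothetical, not part of any OpenGL binding, and how the two buffers are handed to the driver depends on the binding in use, so that step is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper: packs quads into flat position and texcoord buffers.
final class QuadArrays {
    static final int VERTS_PER_QUAD = 4;

    final FloatBuffer vertices;   // x, y for each vertex
    final FloatBuffer texCoords;  // s, t for each vertex
    private int quadCount = 0;

    QuadArrays(int maxQuads) {
        vertices = newFloatBuffer(maxQuads * VERTS_PER_QUAD * 2);
        texCoords = newFloatBuffer(maxQuads * VERTS_PER_QUAD * 2);
    }

    // Direct, native-order buffers are what Java bindings typically expect.
    private static FloatBuffer newFloatBuffer(int floats) {
        return ByteBuffer.allocateDirect(floats * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Appends an axis-aligned quad and returns the index of its first
    // vertex, i.e. the "i" in gl.drawArrays(GL.TRIANGLE_FAN, i, 4).
    int add(float x, float y, float w, float h) {
        int first = quadCount * VERTS_PER_QUAD;
        // Fan order: bottom-left, bottom-right, top-right, top-left.
        vertices.put(new float[] { x, y, x + w, y, x + w, y + h, x, y + h });
        texCoords.put(new float[] { 0f, 0f, 1f, 0f, 1f, 1f, 0f, 1f });
        quadCount++;
        return first;
    }
}
```

The value returned by add is what would be kept per quad and passed as the start index at draw time.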

It’s hard to fault the readability of number 1. I can’t imagine 3 being a good option, unless the quad moved very infrequently! It’s hard not to fault the readability of number 4, but it looks like it should work faster.

Are any of these definitely better or worse than the rest, and have I missed some significant way of getting a textured quad onto the screen?

For a single quad (or even 10 quads), the difference will probably be insignificant.

Now if you have thousands of quads to display, you may want to look at the relevant extensions (ARB_vertex_buffer_object, ATI_vertex_array_object, NV_vertex_array_range, with a clear preference for the first, as it is multi-vendor). If you don’t have extension support, you’re stuck with vertex arrays. In some circumstances, immediate mode performs surprisingly well, in spite of the function-call overhead.

Originally posted by Charlie (still a guest):

  2. Translated display list

gl.translatef(x, y, 0) ;
gl.callList(MY_QUAD) ;
gl.translatef(-x, -y, 0) ;

Coordinates are stored on the card, but you need two translations before and after rendering as the quad will have to be drawn at the origin.

That’s not necessarily true. You could use the matrix stack by modifying that code as follows:

gl.pushMatrix() ;
gl.translatef(x, y, 0) ;
gl.callList(MY_QUAD) ;
gl.popMatrix() ;

In any case, I think that with just a single quad the performance difference is going to be fairly insignificant. That being said, I definitely wouldn’t rebuild the display list each time it moved as you do in Option 3.

With option 4, you could store multiple quads in the arrays and display the one you want simply by supplying the appropriate index. I’d probably go for that or option 1. Display lists for a single quad just don’t seem right to me, though there’s nothing technically wrong with using that approach.

Thanks for your comments. I’m glad you don’t consider there to be much of a difference with a small number of primitives - my quad-count is going to be reasonably low for a while, so I’ll stick with a brute-force solution until it obviously shows strain.

I’ll have to take a look at vertex arrays past that, then vertex buffer objects I presume. I’m still working from the 1.0 spec, so this is all rather new to me!

Deiussum, is the performance hit on a push/pop going to be less than a translate then?

I find with OpenGL that there’s lots of description about how to do different things, but no information as to why you’d want to do them and what effect it’ll have. Which makes sense if you consider the way the ARB works!

Display lists are a good example - I’ve seen no information about how many you should use, how much memory they’ll take up on-card etc. Technically, going by the spec, you could throw every vertex into a separate display list and get a theoretical speedup. I don’t think that would work in practice though!

Generally a push/pop is just going to be the equivalent of a pointer change. A glTranslate is going to be a matrix multiplication, so yes, a push/pop would almost certainly perform better.
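To illustrate the difference in work, here is a rough software sketch of a matrix stack. The class MatrixStackSketch is hypothetical and only models the behaviour; real drivers are free to implement the stack however they like. In this sketch a push copies the 16 floats of the current matrix and a pop only moves an index, while a translate performs multiply-adds on the current matrix.

```java
// Hypothetical model of an OpenGL-style matrix stack (column-major 4x4).
// No overflow or underflow checks, to keep the sketch short.
final class MatrixStackSketch {
    private final float[][] stack;
    private int top = 0;

    MatrixStackSketch(int depth) {
        stack = new float[depth][16];
        // Start with the identity matrix: diagonal entries 0, 5, 10, 15.
        for (int i = 0; i < 16; i += 5) {
            stack[0][i] = 1f;
        }
    }

    float[] current() {
        return stack[top];
    }

    // Push: duplicate the current matrix into the next slot.
    void pushMatrix() {
        System.arraycopy(stack[top], 0, stack[top + 1], 0, 16);
        top++;
    }

    // Pop: no arithmetic at all, just step back to the saved matrix.
    void popMatrix() {
        top--;
    }

    // Translate: current = current * T(x, y, z). Only the last column
    // changes, but it still costs twelve multiplies and twelve adds.
    void translatef(float x, float y, float z) {
        float[] m = stack[top];
        for (int i = 0; i < 4; i++) {
            m[12 + i] += m[i] * x + m[4 + i] * y + m[8 + i] * z;
        }
    }
}
```

With this model, push / translate / draw / pop leaves the matrix underneath untouched, whereas translate / draw / translate-back does the arithmetic twice.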