How best to update model matrices for many sprites (OpenGL >= 3.2)

fazzmatazz · August 7, 2015, 5:10am

I’m fairly new to OpenGL retained mode. I’ve been hunting around for best practices with regard to handling large numbers of sprites. If I have thousands of sprites, anywhere from none to all of which need updating each frame, how should I best pass that information to the shader? I’ve read plenty of different approaches without finding any consensus.

I have put all my sprite vertices into a single VBO. If I want to update their individual positions, rotations, etc. within a scene, I could:

1. Recalculate all the vertex positions on the CPU each frame and call glBufferData(…) to update the VBO. Pros: only one glDrawArrays(…) call. Cons: increased bandwidth and CPU work.
2. Recalculate the model matrix for each sprite. On render, call glUniformMatrix4fv(…) to update the model matrix uniform in the vertex shader, then call glDrawArrays(…) per sprite (six verts). Pros: lower bandwidth and CPU work, matrix multiplication moved to the GPU. Cons: many more draw calls and uniform state changes.
3. Store a model matrix for each sprite as 3 × vec4 attributes in stream/dynamic VBOs and pass them to the vertex shader. Update using glBufferData(…). Pros: only one glDrawArrays(…) call, matrix multiplication moved to the GPU. Cons: increased bandwidth.

Each approach seems to have at least one significant drawback. Have I missed a trick? Can anyone give me advice on which approach I should take?

Thanks in advance!

Alfonse_Reinheart · August 7, 2015, 6:03am

OpenGL doesn’t have a “retained mode”.

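You should use an appropriate buffer object streaming technique (such as re-specifying, or “orphaning”, the buffer before each upload, or mapping ranges of it with glMapBufferRange(…)) when uploading data like this.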
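One streaming pattern described on the OpenGL wiki's “Buffer Object Streaming” page is to allocate a buffer several frames large and write each frame's data into the next region, so the CPU never overwrites data the GPU may still be reading. The sketch below does only the offset bookkeeping in plain C; `StreamRing`, `stream_ring_next` and the three-region count are hypothetical choices, and the GL calls it would sit between (glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT, plus glFenceSync/glClientWaitSync, both core in 3.2) appear only as comments.

```c
#include <stddef.h>

#define STREAM_REGIONS 3  /* hypothetical: triple-buffer the per-frame data */

typedef struct {
    size_t region_size;  /* bytes reserved for one frame's data */
    int    current;      /* region to be written this frame */
} StreamRing;

/* Returns the byte offset of this frame's region in the big buffer,
   then advances to the next region for the following frame. */
size_t stream_ring_next(StreamRing *ring)
{
    size_t offset = (size_t)ring->current * ring->region_size;
    /* With a GL context, before writing here: wait on the fence created
       (glFenceSync) the last time this region was drawn from, using
       glClientWaitSync, then map just this range:
         glMapBufferRange(GL_ARRAY_BUFFER, offset, ring->region_size,
                          GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
       After issuing the draw that reads it, create a new fence. */
    ring->current = (ring->current + 1) % STREAM_REGIONS;
    return offset;
}
```

Keeping the region size a multiple of the vertex size lets the draw call start at `offset / sizeof(vertex)` through the `first` parameter of glDrawArrays(…).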

As for the whole “increased bandwidth” downside… you’re talking about “thousands of sprites”, yes? If each sprite has 4 vertices, each vertex is 32 bytes (which is rather large; you should be able to squeeze one into 16, or at least 24), and you have 5000 sprites, then you’re talking about 640,000 bytes per frame. That’s not even 1 MB. It’s nothing, really.
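The bandwidth estimate is a simple product, and spelling it out makes it easy to try other vertex sizes or frame rates. A small sketch; the helper names are hypothetical, and the frame rate is an input rather than a measured figure.

```c
#include <stddef.h>

/* Bytes uploaded each frame if every sprite's vertices are rewritten. */
size_t upload_bytes_per_frame(size_t sprite_count, size_t verts_per_sprite,
                              size_t bytes_per_vertex)
{
    return sprite_count * verts_per_sprite * bytes_per_vertex;
}

/* The same figure as a sustained rate at a given frame rate. */
size_t upload_bytes_per_second(size_t frame_bytes, size_t frames_per_second)
{
    return frame_bytes * frames_per_second;
}
```

For example, `upload_bytes_per_frame(5000, 4, 32)` gives the 640,000 bytes above; at an assumed 60 frames per second, `upload_bytes_per_second(640000, 60)` is 38,400,000 bytes (about 38 MB) per second, and squeezing vertices into 16 bytes halves both numbers.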

Having one draw call (with a UBO or uniform state change in between) for each of 5000 sprites is very bad, performance-wise. Basically, it’s a non-starter.

As for #3, storing the matrix as vertex attributes would make each vertex that much larger, since every vertex would need its own copy of its sprite’s matrix. So that’s also a non-starter.

Of the 3 options you’ve outlined, #1 is the only one worth using (again, assuming proper streaming is done).

What you haven’t considered is an alternative approach based not on vertices but on sprites: instancing. You draw all of your sprites in one draw call, but they all use the same 4 vertices. Each sprite is rendered as an instance, and your per-instance data are the important elements of the sprite: its location and dimensions in space, the location and dimensions of its region in the texture, and transform information (not as a full matrix, but perhaps a rotation and scale). And maybe a color.
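The per-instance data described above could be packed into a record like the following. This is a sketch under assumptions: the `SpriteInstance` name, field order and types are hypothetical, and the attribute setup needs a GL context, so it is shown only as a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instance record: one per sprite, read once per
   instance (attribute divisor 1) rather than once per vertex. */
typedef struct {
    float   pos[2];      /* sprite center in world space */
    float   size[2];     /* width and height in world units */
    float   uv_rect[4];  /* texture region: u0, v0, u1, v1 */
    float   rotation;    /* radians */
    float   scale;       /* uniform scale factor */
    uint8_t color[4];    /* RGBA tint, uploaded as normalized bytes */
} SpriteInstance;

/* Attribute setup sketch (attribute index 1 is an arbitrary choice):
     glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(SpriteInstance),
                           (const void *)offsetof(SpriteInstance, pos));
     glVertexAttribDivisor(1, 1);   advance once per instance
   and likewise for size, uv_rect, rotation, scale and color
   (color with type GL_UNSIGNED_BYTE and normalized = GL_TRUE). */
```

At 44 bytes per sprite, 5000 sprites come to 220,000 bytes per frame, compared with the 640,000 bytes estimated above for rewriting four 32-byte vertices per sprite.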

Each of the 4 vertex shader invocations for an instance would use gl_VertexID to tell which of the sprite’s 4 vertices it is working on. They would each use the per-instance data to compute that particular vertex’s output values.

It may not be better than #1 in performance; you’ll have to measure it to see.

fazzmatazz · August 7, 2015, 9:05am

Thank you, Alfonse. Your answer was a great help. I put together a quick test program, and for my particular implementation instancing performed slightly better and only required a handful of small code changes to my #1 implementation.

For reference, I built two VBOs: one containing a simple three-vertex triangle, the other a per-instance VBO containing a few thousand vec3s, each holding one triangle’s 2D translation and rotation. I used glVertexAttribDivisor(…) (core since OpenGL 3.3; on 3.2 it comes from ARB_instanced_arrays) to set how often to advance through the instanced VBO. In the vertex shader, I used the vec3 to build a transformation matrix, which I then applied to the vertices. I used glDrawArraysInstanced(…) to render all of my triangles.