Minimal Instanced Tiled Map With Focus On Performance (e.g. Dwarf Fortress)

I have the basis of a tiled map with multiple layers implemented, and I’m looking for advice on my approach, given my somewhat unique intended application.

I’m building a simulation game that only requires a fairly minimal graphical interface. It is a tiled map where all of the tiles have the same size (something similar to Dwarf Fortress). There are going to be 3 or 4 layers of tiles, each layer pulling its tiles from a large texture with an array of tiles. There won’t even be a need to animate sprites moving across the tile map. The characters are simply tiles of the same size as the map tiles, and they move discretely one tile at a time.

The primary focus is performance, because I am building fairly complex game logic. The only exception to the tile map will be a simple menu system.

Here is my basic approach so far:

I have one vertex buffer object (VBO) holding the four vertices of a single tile quad. These four vertices are then drawn once per tile using glDrawArraysInstanced to create the skeleton of the tile map.

Then I have a second VBO that holds the per-instance data for each tile. It has the world position for each tile (three floats) and the texture coordinates for each tile (four floats: the bottom-left and top-right corners in the atlas), interleaved like this:


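```c
{
  0.0f, 0.0f, -2.0f,                             /* world position x, y, z */
  0.0f / TILESET_WIDTH, 13.0f / TILESET_HEIGHT,  /* bottom-left texture coordinate */
  0.0f / TILESET_WIDTH, 14.0f / TILESET_HEIGHT,  /* top-right texture coordinate */
  ...
```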

This is sent to the shaders as vertex attributes like this:


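```c
/* 7 floats per instance: xyz position, then bottom-left and top-right UVs.
   Use sizeof(GLfloat): GL_FLOAT is an enum constant, so sizeof(GL_FLOAT)
   is the size of an int and only matches by coincidence. */
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 7 * sizeof(GLfloat), (void*)0);
glVertexAttribPointer(2, 4, GL_FLOAT, GL_FALSE, 7 * sizeof(GLfloat),
                      (void*)(3 * sizeof(GLfloat)));
/* Advance both attributes once per instance rather than once per vertex. */
glVertexAttribDivisor(1, 1);
glVertexAttribDivisor(2, 1);
```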
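One way to keep the stride and offsets in those calls correct is to describe the per-instance record as a C struct and derive them with sizeof and offsetof, rather than counting floats by hand. This is a minimal sketch assuming the 7-float layout described above and an atlas that is a grid of equally sized tiles; TileInstance, tile_instance_set and the TILE_* macros are hypothetical names, not taken from the repo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side mirror of one per-instance record (7 floats). */
typedef struct {
    float pos[3];  /* attribute 1: world position x, y, z */
    float uv[4];   /* attribute 2: u0, v0 (bottom-left), u1, v1 (top-right) */
} TileInstance;

_Static_assert(sizeof(TileInstance) == 7 * sizeof(float),
               "TileInstance must be tightly packed to match the VBO layout");

/* Stride and offsets for glVertexAttribPointer, derived from the struct so
 * they stay correct if fields are added or reordered. */
#define TILE_STRIDE     sizeof(TileInstance)
#define TILE_POS_OFFSET offsetof(TileInstance, pos)
#define TILE_UV_OFFSET  offsetof(TileInstance, uv)

/* Fills one record for the tile at atlas cell (col, row) in an atlas of
 * cols x rows equally sized tiles (parameter names are illustrative). */
static void tile_instance_set(TileInstance *t, float x, float y, float z,
                              int col, int row, int cols, int rows)
{
    assert(col >= 0 && col < cols && row >= 0 && row < rows);
    t->pos[0] = x;
    t->pos[1] = y;
    t->pos[2] = z;
    t->uv[0] = (float)col / (float)cols;        /* bottom-left u */
    t->uv[1] = (float)row / (float)rows;        /* bottom-left v */
    t->uv[2] = (float)(col + 1) / (float)cols;  /* top-right u */
    t->uv[3] = (float)(row + 1) / (float)rows;  /* top-right v */
}
```

The attribute setup would then read glVertexAttribPointer(2, 4, GL_FLOAT, GL_FALSE, TILE_STRIDE, (void*)TILE_UV_OFFSET), and likewise for attribute 1 with TILE_POS_OFFSET.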

This gets all the per-instance data for each tile into the shaders, but I’m a bit concerned because my current approach relies on some conditional logic in the shaders.

In the vertex shader, I use the gl_VertexID value to set up the texture coordinates for each vertex. I only pass in the bottom-left and top-right coordinates, and derive all four corners from them. But that logic is a chain of branches in the shader, and I’ve read some negative things about branching like this. I’ve looked into non-branching (functional) versions of this logic, but I’m not sure if that’s the right approach.


```glsl
if (gl_VertexID == 0) {
  tex_coord.x = aTexCoord[0];
  tex_coord.y = aTexCoord[1];
} else if (gl_VertexID == 1) {
  ...
```
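A branch-free alternative is to derive the corner from the low bits of gl_VertexID and blend between the two UV corners with mix(). This sketch assumes the four quad vertices are ordered as a triangle strip (0 = bottom-left, 1 = bottom-right, 2 = top-left, 3 = top-right); the order actually used in the repo may differ. Because GLSL can’t run outside a GL context, the code is a C mirror of the same arithmetic with the GLSL in a comment; corner_uv and Vec2 are illustrative names.

```c
#include <assert.h>

typedef struct { float x, y; } Vec2;

/* GLSL equivalent (aTexCoord as a vec4: xy = bottom-left, zw = top-right):
 *   vec2 corner = vec2(gl_VertexID & 1, gl_VertexID >> 1);
 *   tex_coord   = mix(aTexCoord.xy, aTexCoord.zw, corner);
 * Every vertex executes the same instructions, so nothing diverges. */
static Vec2 corner_uv(int vertex_id, Vec2 bl, Vec2 tr)
{
    assert(vertex_id >= 0 && vertex_id < 4);
    float cx = (float)(vertex_id & 1);   /* 0 for BL/TL, 1 for BR/TR */
    float cy = (float)(vertex_id >> 1);  /* 0 for BL/BR, 1 for TL/TR */
    Vec2 uv = { bl.x + (tr.x - bl.x) * cx, bl.y + (tr.y - bl.y) * cy };
    return uv;
}
```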

In the fragment shader, the correct texture is chosen with another conditional based on the z-value of the tile, because each layer takes from its own unique texture atlas.


```glsl
if (tileset_id == 0.0) {
  FragColor = texture(character_tileset, tex_coord);
} else if (tileset_id == -1.0) {
  FragColor = texture(object_tileset, tex_coord);
} else if (tileset_id == -2.0) {
  FragColor = texture(map_tileset, tex_coord);
}
```

I’ll link to the repo with the full codebase to help clarify any confusion: https://github.com/ecssiah/last-ditch

I haven’t started on the UI system at all yet, and I would also really appreciate any suggestions on how to do it in a simple, efficient way. I have ruled out the Dwarf Fortress approach of building it within the tile system; for aesthetic reasons, this is the one exception I’ll make. I want a very simple windowed menu system.

Thanks!!!

Anecdotal evidence is that such small instances aren’t efficient. If the implementation doesn’t process multiple instances in a single work group, you’ll only get a fraction of the GPU’s performance.

Branching on gl_VertexID like this will be inefficient, because gl_VertexID differs between concurrent invocations of the vertex shader, so they take different branches. The usual way to do this with instancing is to make the base texture coordinates a per-instance attribute and the offset a per-vertex attribute, and add the two in the vertex shader.
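The per-instance base plus per-vertex offset scheme can be sketched as follows. The assumptions: the shared quad’s per-vertex attribute holds each corner in tile units (0 or 1), the per-instance attribute holds only the tile’s bottom-left UV, and a hypothetical uniform uTileUVSize holds one tile’s size in atlas UV units. The C function mirrors the single line the vertex shader would need.

```c
#include <assert.h>

/* Per-vertex attribute (divisor 0) for the shared quad, in tile units,
 * triangle-strip order BL, BR, TL, TR. The same values can serve as the
 * position offset and the texture-coordinate offset. */
static const float QUAD_CORNERS[4][2] = {
    {0.0f, 0.0f}, {1.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f},
};

/* CPU mirror of the vertex-shader line
 *     tex_coord = aBaseUV + aCorner * uTileUVSize;
 * aBaseUV: per-instance (divisor 1) bottom-left UV of the tile.
 * uTileUVSize: hypothetical uniform, one tile's size in atlas UV units. */
static void vertex_uv(int vertex, const float base_uv[2],
                      const float tile_uv_size[2], float out[2])
{
    assert(vertex >= 0 && vertex < 4);
    out[0] = base_uv[0] + QUAD_CORNERS[vertex][0] * tile_uv_size[0];
    out[1] = base_uv[1] + QUAD_CORNERS[vertex][1] * tile_uv_size[1];
}
```

Since all tiles share one size, this also shrinks the per-instance texture data from four floats to two.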

> In the fragment shader, the correct texture is chosen with another conditional based on the z-value of the tile, because each layer takes from its own unique texture atlas.

This should use branches on hardware which has them, assuming that tileset_id is dynamically uniform. However, tileset_id should be an integer variable, and it would be better still to use either an array of sampler variables or an array texture (GL_TEXTURE_2D_ARRAY and sampler2DArray).
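Converting the layer to an integer once, on the CPU side, avoids comparing floats in the fragment shader. A minimal sketch, assuming the three z values used in the original shader (0.0, -1.0, -2.0) map to layers 0, 1 and 2; the enum and function names are illustrative. On the GL side an integer attribute has to be specified with glVertexAttribIPointer (glVertexAttribPointer always delivers floats to the shader), and an integer passed from the vertex shader to the fragment shader must be declared flat.

```c
#include <assert.h>

/* Illustrative layer indices matching the original z values. */
enum { LAYER_CHARACTERS = 0, LAYER_OBJECTS = 1, LAYER_MAP = 2, LAYER_COUNT = 3 };

/* Converts a tile's z value (0.0, -1.0, -2.0) to an integer layer index,
 * suitable for indexing a sampler array or an array-texture layer.
 * Returns -1 for z values that do not correspond to any layer. */
static int layer_from_z(float z)
{
    float r = -z;  /* 0, 1, 2 for the three layers */
    if (r < -0.5f || r >= (float)LAYER_COUNT - 0.5f)
        return -1;
    int layer = (int)(r + 0.5f);  /* round to nearest; r + 0.5 >= 0 here */
    assert(layer >= 0 && layer < LAYER_COUNT);
    return layer;
}
```

With an array texture, the fragment shader’s if-chain then becomes a single lookup along the lines of texture(tilesets, vec3(tex_coord, layer)), where tilesets is a sampler2DArray.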

Personally, I wouldn’t use instancing for this. Just create a screen-sized grid of disconnected quads (triangle pairs) and compute everything in the vertex shader based upon gl_VertexID, uniforms and textures. Or just construct the entire mesh client-side each frame. This won’t exactly stress modern graphics hardware (even low-end hardware) however you do it.
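For the attribute-free grid, the vertex shader can recover everything from gl_VertexID if each tile is drawn as six vertices (two triangles), for example with glDrawArrays(GL_TRIANGLES, 0, 6 * cols * rows). The C sketch mirrors that index arithmetic; grid_vertex, GridVertex and the uniform name uGridCols are hypothetical. Note that a core-profile context still requires a vertex array object to be bound for the draw, even when no attributes are enabled.

```c
#include <assert.h>

/* Corner offsets for the two triangles of a tile: (BL, BR, TR), (BL, TR, TL). */
static const int CORNER[6][2] = {
    {0, 0}, {1, 0}, {1, 1},
    {0, 0}, {1, 1}, {0, 1},
};

typedef struct { int tile, col, row, cx, cy; } GridVertex;

/* GLSL equivalent:
 *   int tile = gl_VertexID / 6, corner = gl_VertexID % 6;
 *   ivec2 cell = ivec2(tile % uGridCols, tile / uGridCols);
 *   vec2 pos = vec2(cell + CORNER[corner]);   // in tile units */
static GridVertex grid_vertex(int vertex_id, int grid_cols)
{
    assert(vertex_id >= 0 && grid_cols > 0);
    GridVertex g;
    g.tile = vertex_id / 6;       /* which tile (quad) this vertex belongs to */
    int corner = vertex_id % 6;   /* which of that quad's six vertices */
    g.col = g.tile % grid_cols;
    g.row = g.tile / grid_cols;
    g.cx = CORNER[corner][0];
    g.cy = CORNER[corner][1];
    return g;
}
```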

Thank you! This reply was very helpful. I just want to clarify a few things before I start making significant changes.

> Anecdotal evidence is that such small instances aren’t efficient. If the implementation doesn’t process multiple instances in a single work group, you’ll only get a fraction of the GPU’s performance.

Do you think it would be best, then, to avoid instanced rendering entirely and simply declare all of the vertices directly?

Also, I’ll read up on this for myself, but how do I recognize if multiple instances will be processed in a single work group?

> This should use branches on hardware which has them, assuming that tileset_id is dynamically uniform. However, tileset_id should be an integer variable, and it would be better still to use either an array of sampler variables or an array texture (GL_TEXTURE_2D_ARRAY and sampler2DArray).

What would be your preferred way to render these layers in a tile map? I don’t mind doing some significant refactoring to get this right. I have read about texture arrays, but they were always described as containing multiple mipmap levels, so I didn’t realize they could also be used as something like a multi-dimensional texture. Could you give a simple outline of how one would be used to pull tile textures from these three different tileset textures?

> Personally, I wouldn’t use instancing for this. Just create a screen-sized grid of disconnected quads (triangle pairs) and compute everything in the vertex shader based upon gl_VertexID, uniforms and textures. Or just construct the entire mesh client-side each frame. This won’t exactly stress modern graphics hardware (even low-end hardware) however you do it.

Is this in reference to the UI menu system, or was it a comment about an alternative way of doing the tile map? It was quoted after my comment about doing the UI system, but it seems like it’s in reference to an alternative implementation of the tile map. I might just be confused, though.

Thanks for your help. It’s already clarified a number of things.

Probably. You don’t even need any per-vertex attributes; you can just use gl_VertexID to read from textures/UBOs/SSBOs (i.e. “fake instancing”). There’s some penalty for such accesses compared to attributes, but a tile map isn’t exactly demanding, and it may simplify the CPU-side code.
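A sketch of the “fake instancing” lookup: the per-tile data is just an array of atlas indices (on the GPU this could be an integer texture such as R16UI read with texelFetch, or an SSBO), and each vertex finds its tile via gl_VertexID / 6. The C function mirrors the shader arithmetic; it assumes six vertices per tile in the order BL, BR, TR, BL, TR, TL and an atlas of atlas_cols x atlas_rows equally sized tiles, all of which are illustrative choices.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { float u, v; } UV;

/* CPU mirror of the vertex shader: tile_ids stands in for the texture or
 * SSBO holding one atlas index per map cell. */
static UV fake_instanced_uv(int vertex_id, const uint16_t *tile_ids,
                            int atlas_cols, int atlas_rows)
{
    static const int CORNER[6][2] = {
        {0, 0}, {1, 0}, {1, 1}, {0, 0}, {1, 1}, {0, 1},
    };
    assert(vertex_id >= 0 && atlas_cols > 0 && atlas_rows > 0);
    int tile   = vertex_id / 6;       /* map cell this vertex belongs to */
    int corner = vertex_id % 6;       /* vertex within that cell's quad */
    int id     = tile_ids[tile];      /* atlas index of the glyph to draw */
    int acol   = id % atlas_cols;
    int arow   = id / atlas_cols;
    assert(arow < atlas_rows);
    UV uv = {
        (float)(acol + CORNER[corner][0]) / (float)atlas_cols,
        (float)(arow + CORNER[corner][1]) / (float)atlas_rows,
    };
    return uv;
}
```

Changing what a cell shows then means updating one 16-bit value rather than rewriting vertex data.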

Either way (vertex arrays or textures/UBOs/SSBOs), copy data from the CPU to the GPU in a way that avoids synchronisation. Update buffers by writing to a new buffer, a new data store (glBufferData() rather than glBufferSubData()), or an unused region of an existing buffer; update textures by writing to a PBO and then updating the texture from the PBO. In-place modifications may stall the CPU until the GPU has finished reading the existing data.
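The “unused region of an existing buffer” option is often implemented as a ring of regions, one per frame in flight. A minimal sketch of the bookkeeping, assuming three frames in flight (FRAMES and RingBuffer are hypothetical names). A robust version would also insert a glFenceSync fence after each frame’s draws and wait on it with glClientWaitSync before reusing that frame’s region.

```c
#include <assert.h>
#include <stddef.h>

#define FRAMES 3  /* assumed number of frames in flight */

typedef struct {
    size_t region_size;  /* bytes of tile data written per frame */
    int next;            /* index of the region to write next */
} RingBuffer;

/* The buffer object is allocated once, e.g.
 *   glBufferData(GL_ARRAY_BUFFER, FRAMES * region_size, NULL, GL_STREAM_DRAW);
 * Each frame, write into the region at the returned offset (glBufferSubData
 * or glMapBufferRange) and point that frame's draw at the same offset,
 * e.g. via the last argument of glVertexAttribPointer. */
static size_t ring_next_offset(RingBuffer *rb)
{
    assert(rb->region_size > 0 && rb->next >= 0 && rb->next < FRAMES);
    size_t offset = (size_t)rb->next * rb->region_size;
    rb->next = (rb->next + 1) % FRAMES;  /* advance round-robin */
    return offset;
}
```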

To find out, increase the number of vertices per instance (split each quad into more triangles) and see if it runs any slower. If the frame rate is locked to the refresh rate, you’d need to use timer queries to measure this.
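For that experiment, the shared quad can be generated with a configurable subdivision. A sketch, assuming non-indexed GL_TRIANGLES and 2D positions; subdivided_quad is a hypothetical helper. Drawing the same map with n = 1, 2, 4 and timing each with a GL_TIME_ELAPSED query (glBeginQuery / glEndQuery, then glGetQueryObjectui64v) would show whether the vertex count per instance affects GPU time on a given driver.

```c
#include <assert.h>
#include <stddef.h>

/* Fills `out` with a unit quad split into n x n cells, two triangles per
 * cell, as interleaved (x, y) pairs. `out` must hold at least 12*n*n
 * floats. Returns the number of vertices written (6*n*n). */
static size_t subdivided_quad(int n, float *out)
{
    assert(n > 0 && out != NULL);
    static const int C[6][2] = { {0, 0}, {1, 0}, {1, 1}, {0, 0}, {1, 1}, {0, 1} };
    size_t k = 0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            for (int c = 0; c < 6; ++c) {
                out[k++] = (float)(i + C[c][0]) / (float)n;  /* x */
                out[k++] = (float)(j + C[c][1]) / (float)n;  /* y */
            }
    return k / 2;
}
```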

A 2D array texture is similar to a 3D texture except that there’s no filtering or wrapping in the third dimension, and the third texture coordinate is non-normalised (it’s just rounded to the nearest integer to select the layer). A 2D array texture is less flexible than an array of samplers (all layers have the same dimensions, format and sampling modes), but it only requires a single texture unit. Also, although it probably doesn’t matter for this application, the layer doesn’t need to be dynamically uniform (arrays of samplers can only be indexed with dynamically-uniform expressions).
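In outline, using a 2D array texture for the three tilesets means packing them into one layered image (they must share dimensions and format, so smaller atlases would need padding), uploading it with glTexImage3D, and selecting the layer with the third texture coordinate. A sketch of the CPU-side packing, assuming RGBA8 images already decoded into memory; pack_tilesets is a hypothetical helper, and the GL calls sit in the comment because they need a live context.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Packs `layers` RGBA8 images of identical size into one buffer in the
 * layer-major order glTexImage3D expects (all of layer 0, then layer 1...).
 * Caller frees the result. Upload sketch (GL 3.0+ context required):
 *   glBindTexture(GL_TEXTURE_2D_ARRAY, tex);
 *   glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8, w, h, layers, 0,
 *                GL_RGBA, GL_UNSIGNED_BYTE, packed);
 *   // Non-mipmapped filters; the default min filter needs mipmaps.
 *   glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
 *   glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
 * Fragment shader:
 *   uniform sampler2DArray tilesets;
 *   FragColor = texture(tilesets, vec3(tex_coord, float(layer)));  */
static unsigned char *pack_tilesets(const unsigned char *const *images,
                                    int layers, int w, int h)
{
    assert(images != NULL && layers > 0 && w > 0 && h > 0);
    size_t layer_bytes = (size_t)w * (size_t)h * 4;  /* 4 bytes per RGBA8 texel */
    unsigned char *packed = malloc(layer_bytes * (size_t)layers);
    if (packed == NULL)
        return NULL;
    for (int i = 0; i < layers; ++i)
        memcpy(packed + (size_t)i * layer_bytes, images[i], layer_bytes);
    return packed;
}
```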

I’m referring to the tile map.