Approaches for 2D tile rendering

Hi,
I’m wanting to render a 2D tile map made up of 8x8 squares on a screen 1280x1024 (so 160*128 => 20k tiles => 40k triangles). There are 256 different 8x8 textures. The actual map is larger than the visible screen and is dynamic i.e. texture for a tile can change. I’d like to target GeForce 6800 hardware and above with 30-60fps which I think is reasonable.

My first idea (not thought through) is a static VBO with a triangle strip mesh of tiles 1 row and column bigger than the screen. On the CPU I’d generate texture coords for each of the tiles, updating it as the screen pans. A single 2048x2048 texture can be used for all the tiles.

I’ve got a feeling I could probably do something like encode the map as tile index (0…255) data in a texture and generate the texture coords in the shader.
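
For reference, the index-to-texture-coordinate math is small whether it runs on the CPU or in a shader. A minimal Python sketch, assuming the 256 tiles are packed row-major into a square atlas (the 128x128 default, the function name tile_uv and its parameters are illustrative assumptions, not something fixed in this post):

```python
def tile_uv(index, tile=8, atlas_size=128):
    """Return (u0, v0, u1, v1) in normalized [0, 1] texture coordinates.

    Assumes a square atlas with tiles stored row-major, left to right.
    """
    if not 0 <= index <= 255:
        raise ValueError("tile index must be in 0..255")
    tiles_per_row = atlas_size // tile      # 16 for a 128x128 atlas
    col = index % tiles_per_row
    row = index // tiles_per_row
    step = tile / atlas_size                # size of one tile in UV space
    return (col * step, row * step, (col + 1) * step, (row + 1) * step)
```

The same arithmetic could be done per fragment in a shader once the tile index has been fetched from an index texture.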

I’m quite happy to try different ideas to see what’s best, but it’s the general ideas I’m looking for.

Thanks.

A triangle strip won’t work in your case because, if I understand it correctly, you want to switch the texture of a single tile independently of the others, so you definitely cannot share vertices between tiles.
If I were you, I would create a separate VBO for your vertex position data and a separate store for the texcoords, texture indices, or whatever else you want to be able to change in order to replace the texture or texture region used.
Actually, if you change your tiles often, it may be better not to use VBOs for the tile information but to keep it in client-side arrays instead. I think 40K triangles shouldn’t be an issue even if you use client-side data.

I agree with aqnuep, but I’d like to add a little bit.

I think it would be easier to leave your texture coordinates static and to translate your grid of triangles with a transform matrix as the screen pans. So, you should be able to have static vertices and static texture coords in your VBO(s).

The part that confuses me is that you wrote:

There are 256 different 8x8 textures. The actual map is larger than the visible screen and is dynamic i.e. texture for a tile can change… A single 2048x2048 texture can be used for all the tiles.
It’s not clear what you have or mean by texture or texture map. If you have just one large 2048x2048 texture map, then you only need one quad that fills your window, and map that one large texture map to that one large quad. If that is the case, I guess you’d be updating the entire 2048x2048 texture map every time any change is made to any 8x8 region within it, and sending that entire huge texture map to the GPU for each little change. That sounds like something that will kill performance if you modify the giant texture map frequently.

On the other hand, if you don’t have any large texture maps, but just 256 8x8 texture maps and just change which of those 256 small texture maps are mapped to any particular region of the window, then you do need a static grid of vertices (to define a small square for each small texture map) and a static set of texture coordinates for each of the vertices for each of the small squares. I haven’t used them, but I think you’d be best off using a 256x8x8 3D texture map in that case, and just associate a 1D texture map with one element per small square (160*128 of them in total). Each element in the 1D texture map would be the index (0 to 255) into the 3D texture map telling which 8x8 texture map is associated with each small square. To change the mapping of 8x8 texture maps to small squares, all you’d need to do is change the indices stored in the 1D texture map (which can just be one byte each since you have no more than 256 8x8 texture maps). That would require sending just 20480 bytes per mapping change rather than 2048*2048*3 (12MB) bytes per mapping change as would be required with a single large texture map. (Sending the big texture map requires 614 times as much data as just sending the indices to the changed texture mappings.)
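
The byte counts above can be checked with a few lines of Python. This is only the arithmetic from the paragraph, assuming one byte per index and three bytes (RGB) per texel; the variable names are illustrative:

```python
COLS, ROWS = 160, 128                    # visible tiles

index_bytes = COLS * ROWS * 1            # one byte per tile index
big_texture_bytes = 2048 * 2048 * 3      # RGB, one byte per channel

ratio = big_texture_bytes / index_bytes  # how much more data the big upload is

print(index_bytes, big_texture_bytes, ratio)
```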

Finally, to pan your image as you described, I’d use a transform matrix to translate the vertices in a static VBO and don’t touch the texture coords at all.

I don’t program with old OpenGL so I’m not sure how you’d do everything I’ve described regarding the texture coordinate that selects which of the 256 8x8 texture maps for each tile. That’s really easy to do with shaders, though.

I’ve given this some more thought. I think I have an easier and more efficient method. Define one big quad (four vertices) for the entire window. Put it in a static VBO, and put static texture coords in the VBO for each of the four vertices. Define one 2048x2048 texture map. Whenever you need to update some of the individual 8x8 pixel subregions within the 2048x2048 texture map, just send the pixels for just those regions within the texture map to the GPU. Offhand I’m not sure of the best GL function to use for modifying a subregion of a 2D texture map that’s already in GPU memory. Anyway, this is easier than messing with 3D texture maps, and it only involves sending 8x8x3 bytes per changed subregion of the big texture map rather than 20480 bytes for a changed 1D texture map of indices.

So, basically, your hardware only has to draw two triangles and map one texture map to each of them, and send 8x8x3 bytes per subregion when changed. Should be very efficient.

I think you misunderstood Stuart. He meant he uses the texture as an atlas that contains all of his tile textures. He wants to set completely independent tile textures from the atlas to his individual 8x8 quads on screen, not wanting to move just the texture background dynamically. Each tile is an entity with its own material.

Stuart needs to clarify things. The reason I didn’t interpret things as you did is because he said he has 256 8x8 textures. If he has a texture atlas consisting of those 256 8x8 textures, then that texture atlas would be something like 2048x8 pixels in size. Instead, he wrote about a single 2048x2048 texture that can be used for all tiles, and he used the word tiles to mean squares in his window. I understood that he wants to set 8x8 textures on tiles independently. My second proposed solution was, as I wrote, a simpler way of doing that. However, I just thought of a third solution, and I think this is the best yet:

  1. create a static VBO with the four vertices of a large quad, and assign each vertex static texture coordinates.

  2. have a large 2D texture map that entirely covers the large quad.

  3. have a 8x2048 2D texture atlas (consisting of a single column of 256 8x8 textures).

  4. whenever any 8x8 texture needs to change in the large 2D texture map, use a bit blit to copy that 8x8 texture from the texture atlas to the appropriate place in the large texture map.

  5. pan the large quad with a simple transform matrix. The large texture map consisting of many 8x8 textures will pan along with it.

This requires very little traffic across the bus, since the GPU has all the data except for the transform matrix to pan the quad, and the commands to bit blit 8x8 textures from the texture atlas to the large texture map whenever any change. Only two triangles need to be drawn (so only four vertices need to be transformed since a triangle strip can be used), and one texture map is applied to both of them, so this should be very efficient and fast.
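
As a rough illustration of step 4, here is a Python sketch that only computes the source and destination rectangles for one tile copy, assuming the 8x2048 single-column atlas from step 3. The function name blit_rects and the (x0, y0, x1, y1) rectangle layout are assumptions made for this example; the actual copy would be issued through whatever GPU-side blit is available:

```python
TILE = 8  # tile edge in pixels


def blit_rects(tile_index, col, row):
    """Return (src, dst) rectangles as (x0, y0, x1, y1) in pixels.

    src addresses a single-column atlas (8 pixels wide, tiles stacked in y);
    dst addresses the tile at (col, row) in the large display texture.
    """
    src = (0, tile_index * TILE, TILE, (tile_index + 1) * TILE)
    dst = (col * TILE, row * TILE, (col + 1) * TILE, (row + 1) * TILE)
    return src, dst
```

For example, placing tile 3 at column 2, row 5 gives a source of (0, 24, 8, 32) and a destination of (16, 40, 24, 48).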

Stuart said this is a 2D problem, so there is no need to have a separate quad for each tile. Let the large texture map define all the tiles implicitly. That is, the large texture map is tiled so the geometry needn’t be. Two triangles/four vertices is all that’s needed for the geometry. Stuart didn’t mention anything about material properties for his tiles. If he also has some additional, unstated requirements for his 2D grid of tiles, he might better implement those with multiple large texture maps, or by blending particular tile effects into the large 2D texture map where appropriate.

I like your last idea about using blit. Actually I agree that that is the best and most efficient solution as the data stays on the GPU.

Yes aqnuep is correct. I have a map (larger than the screen) of tile indices (0…255). Each index refers to an 8x8 texture which I was planning to store in a 2048x2048 texture atlas.

I need to take a visible screen of indices (160x128) and render it. Hence the 40k triangles. As the screen pans the set of indices changes, so it’s still the same 40k triangles, but I need new texture coords to change which tiles are displayed.

David, about your idea of a 1D texture with 160x128 elements: I would calculate an index into it in the vertex shader (using the X,Y of the vertex) and pass that to the fragment shader. The FS looks up the 1D texture and uses the tile index (0…255) to calculate the texture coords of the 8x8 tile in the texture atlas.

My mesh would need to be 1 row and 1 column wider than the screen to allow for panning (so the 1D texture would be 161x129 as well). When a new row and/or column of tiles became visible I would update the 1D texture with the new indices.
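
To make the indexing concrete, here is a CPU-side Python sketch of what would go into the 161x129 index texture and how a lookup into it works. It assumes the map is a list of rows of tile indices; visible_indices and lookup are made-up helper names, and the shader would do the equivalent of lookup on the GPU:

```python
VIEW_COLS, VIEW_ROWS = 161, 129   # one extra column and row for panning


def visible_indices(tile_map, first_col, first_row):
    """Flatten the visible window of tile_map row by row into one list."""
    window = []
    for r in range(first_row, first_row + VIEW_ROWS):
        window.extend(tile_map[r][first_col:first_col + VIEW_COLS])
    return window


def lookup(window, x, y):
    """Tile index at tile position (x, y) inside the flattened window."""
    return window[y * VIEW_COLS + x]
```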

Seem reasonable?

Seem reasonable?

More or less. But that’s probably not the easiest or the fastest solution, because the entire 1D texture containing the 161x129 indices has to be sent from the CPU to the GPU whenever any 8x8 texture mapping changes. aqnuep and I agree that the best solution is probably the third one I suggested, which involves bit blitting 8x8 pixel textures from the texture atlas to one large display texture map mapped to one large quad.

Incidentally, when you pan far enough that one column or row of 8x8 pixel textures will fall off the edge and a new column or row of 8x8 pixels will appear on the other edge of the window, rather than bit blitting numerous 8x8 textures from the texture atlas to the large display texture map, you could just bit blit all but the column or row that falls off the display texture map eight pixels over on the display texture map and bit blit individual 8x8 textures from the texture atlas to just the newly visible column or row of the large display texture map. That would dramatically reduce the number of bit blit commands necessary.
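
To put numbers on that saving, here is a small Python sketch counting blit commands, assuming a 161x129 tile display texture and counting the shift of the existing contents as a single blit (the function names are illustrative):

```python
VIEW_COLS, VIEW_ROWS = 161, 129


def blits_full_rebuild():
    """One blit per tile when the whole display texture is rebuilt."""
    return VIEW_COLS * VIEW_ROWS


def blits_scroll_one_column():
    """One shift blit plus one blit per tile in the new column."""
    return 1 + VIEW_ROWS


def blits_scroll_one_row():
    """One shift blit plus one blit per tile in the new row."""
    return 1 + VIEW_COLS
```

That is 20769 blits for a full rebuild against 130 or 162 for a one-tile scroll.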

Sorry, I had started a reply, went back and posted it, during which time both of you had posted replies which I missed until now. Oh, and sorry about the confusion over the 2048x2048 texture atlas. I meant 128x128.

Still a beginner, so I have some questions.

So a single quad with a single texture, 1 row and 1 column bigger than the screen i.e. 1288x1032?

  3. have a 8x2048 2D texture atlas (consisting of a single column of 256 8x8 textures).
    Do you know if there are any limits on the texture size ratio on older hardware? I guess making it 128x128 would still work.
  4. whenever any 8x8 texture needs to change in the large 2D texture map, use a bit blit to copy that 8x8 texture from the texture atlas to the appropriate place in the large texture map.
    Is that using a FBO bound to the display texture and RTT to blit from the texture atlas to it? Each tile update would require 1 command?
  5. pan the large quad with a simple transform matrix. The large texture map consisting of many 8x8 textures will pan along with it.
    Either move the camera position or move the single quad? I’d probably move the camera since that makes more sense in my head.

Ok, I think I understand that, but it’s how to keep the large texture up to date as the screen pans I don’t follow. In simplistic terms when a new row or column appears I could re-blit the entire large 2D texture, but that seems too slow to me since I’d have to send 161x129 blit commands.

Incidentally, when you pan far enough that one column or row of 8x8 pixel textures will fall off the edge and a new column or row of 8x8 pixels will appear on the other edge of the window, rather than bit blitting numerous 8x8 textures from the texture atlas to the large display texture map
Is that the 161x191 blits I mention above?
…you could just bit blit all but the column or row that falls off the display texture map eight pixels over on the display texture map
Does this mean blit the display texture over itself, but shifted 8 pixels to make room for the new row/column? Can the source and destination be same? Or would I need to ping-pong between two large display textures? i.e. copy the entire display texture to a copy, but shifted over?
…and bit blit individual 8x8 textures from the texture atlas to just the newly visible column or row of the large display texture map.
and then do 161/129 blit commands to update the new row/column?

Thanks for the help. I’d never have thought updating a large texture would be the way to go.

Yes, it would require 1 command per blit, but it is still better than transmitting texcoords or texture data over the bus.

  5. pan the large quad with a simple transform matrix. The large texture map consisting of many 8x8 textures will pan along with it.
    Either move the camera position or move the single quad? I’d probably move the camera since that makes more sense in my head.

No you don’t. Rather than moving the camera matrix, just move the texture matrix and use a repeating texture wrap mode. This way when a new row or column appears you have to re-build only one row or column which is just 161 or 129 blit commands as david_f_knight said.
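
With a repeating wrap mode the display texture behaves like a ring buffer, so the bookkeeping reduces to modular arithmetic. A minimal Python sketch, assuming a display texture 161 tiles (1288 pixels) wide and horizontal panning only; texture_column_for and u_offset are hypothetical helper names:

```python
VIEW_COLS = 161
TILE = 8
TEX_WIDTH = VIEW_COLS * TILE   # 1288 pixels


def texture_column_for(map_col):
    """Column of the display texture that holds a given map column."""
    return map_col % VIEW_COLS


def u_offset(pan_x_pixels):
    """Horizontal texture-coordinate offset for a pan given in pixels."""
    return (pan_x_pixels % TEX_WIDTH) / TEX_WIDTH
```

When map column 161 scrolls into view, it overwrites texture column 0, which has just scrolled out of view on the other side. Rows work the same way with 129 and the texture height.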

Ahhh, so I use the GL_TEXTURE matrix to move the texture and the quad remains unchanged. I had actually never heard of the texture matrix until just the other day. Ok, hopefully I’ve got it now. Certainly got enough to start.

Once again thanks.

[Note: I didn’t even notice that the thread went to a second page before I posted this whole lengthy thing below, in response to your last post on page one of this thread. I didn’t see the last post from either aqnuep or Stuart before posting mine. Use aqnuep’s advice regarding the texture matrix rather than what I wrote below.]

So a single quad with a single texture, 1 row and 1 column bigger than the screen i.e. 1288x1032?
Yes.

Do you know if there are any limits on the texture size ratio on older hardware? I guess making it 128x128 would still work.
I think that older hardware has a requirement that texture maps be a power of 2 texels long on a side, but I don’t know whether they also have to be square. I am sure a number of other people on this forum can answer that question. I am sure that a 128x128 texture atlas will work with any hardware, but it is a little more difficult to work with than an eight pixel wide column of little textures. Also, an eight pixel wide column of textures will access memory most efficiently (i.e., consecutively), which makes it the fastest possible organization for your texture atlas.

Is that using a FBO bound to the display texture and RTT to blit from the texture atlas to it? Each tile update would require 1 command?
I don’t know what RTT stands for. Yes, each tile update would require one blit command. I think an FBO would be the way to have both GPU read and GPU write access to the display texture. (There are others here that can answer that definitively.)

Either move the camera position or move the single quad? I’d probably move the camera since that makes more sense in my head.
You can kind of think of it however is easiest for you, but ultimately it will all need to end up the same. Anyway, the transformation matrix manipulates vertices, and vertices define geometry (i.e., the quad your display texture is mapped to). The viewpoint (i.e., camera) typically is fixed at the origin and looks down the -Z axis, and the rest of the virtual world is rotated and translated so that the viewpoint is where it needs to be. Ultimately, you need to map the visible portion of your quad to a -1 to +1 box (in X and in Y) and can define all Z coordinates the same (zero is good) and all W coordinates the same (1 is good) [these are called clip coordinates]. If so, you don’t need any projection transform at all. Before your quad is rasterized or the texture map applied, the fixed OpenGL pipeline applies the Viewport Transform to convert normalized device coordinates (same as clip coordinates if all W equal 1) to window coordinates (which you will want to set to 1280x1024 with glViewport()).
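
The two mappings described above are simple enough to write out. A Python sketch of the arithmetic only, assuming a 1280x1024 window and a viewport that starts at (0, 0); pixel_to_ndc and ndc_to_window are names invented for this example:

```python
WIDTH, HEIGHT = 1280, 1024


def pixel_to_ndc(px, py):
    """Map a pixel position to the -1..+1 box in X and Y."""
    return (2.0 * px / WIDTH - 1.0, 2.0 * py / HEIGHT - 1.0)


def ndc_to_window(nx, ny, vx=0, vy=0, vw=WIDTH, vh=HEIGHT):
    """Viewport transform for a viewport at (vx, vy) of size vw x vh."""
    return ((nx + 1.0) * vw / 2.0 + vx, (ny + 1.0) * vh / 2.0 + vy)
```

The centre of the window, pixel (640, 512), lands at (0, 0) in normalized device coordinates and maps back to (640, 512) in window coordinates.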

Ok, I think I understand that, but it’s how to keep the large texture up to date as the screen pans I don’t follow. In simplistic terms when a new row or column appears I could re-blit the entire large 2D texture, but that seems too slow to me since I’d have to send 161x129 blit commands.
Yes to everything you wrote.

Is that the 161x191 blits I mention above?
It is the 161x129 blits you mentioned above.

Does this mean blit the display texture over itself, but shifted 8 pixels to make room for the new row/column?
Yes.

Can the source and destination be same?
Yes.

Or would I need to ping-pong between two large display textures? i.e. copy the entire display texture to a copy, but shifted over?
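No, not necessarily. But the blit has to copy in the proper order (i.e., from left-to-right or right-to-left or top-to-bottom or bottom-to-top) so that it doesn’t overwrite source pixels that haven’t been copied yet. One caveat: with glBlitFramebuffer the result is undefined when the source and destination rectangles overlap within the same buffer, so with that call you would need the second texture after all.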
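
The ordering issue is the same one any overlapping in-place copy has. A CPU-side Python sketch on a single row of values, purely to show which direction to walk in; shift_row_in_place is an illustrative name, and this says nothing about what a particular GL call guarantees:

```python
def shift_row_in_place(row, shift):
    """Move the contents of row by `shift` positions within the same list.

    Positive shift moves content right, negative moves it left.
    Cells that are vacated keep their old values.
    """
    n = len(row)
    if shift > 0:
        # Moving right: start at the right end so each source cell
        # is read before anything is written over it.
        for dst in range(n - 1, shift - 1, -1):
            row[dst] = row[dst - shift]
    elif shift < 0:
        # Moving left: start at the left end for the same reason.
        for dst in range(0, n + shift):
            row[dst] = row[dst - shift]
    return row
```

Walking in the wrong direction would smear the first few values across the whole row instead of shifting it.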

and then do 161/129 blit commands to update the new row/column?
Exactly.

Thanks for the help. I’d never have thought updating a large texture would be the way to go.
You’re welcome. The proof is always in the pudding; nothing beats actually implementing it and testing it out. But this approach minimizes the amount of traffic over the CPU to GPU bus, which is a major bottleneck, and minimizes the number of transforms and triangles, which are other major bottlenecks. Also, blitting is very fast.

No, there are no such limits on the aspect ratio, so you can use texture sizes such as 1024x128. Mipmapping also works flawlessly with these. You can even use rectangle textures on older hardware if you cannot respect the power-of-2 rule.

Just to update this with some info in case a Google search finds it, here are the timing results for the various ways of updating the display texture from the atlas, i.e. a single large (screen sized) display texture which gets updated with tiles from the atlas texture. The display texture is then drawn to the screen using a single quad.

  1. Two FBOs. One with the display texture bound to it the other with the atlas texture. Then use glBlitFramebuffer to copy a tile from the atlas to the display texture.

  2. A PBO with the tiles stored one after another. Then unpack from the PBO to the display texture using glTexSubImage2D.

  3. A FBO bound to the display texture. A VBO with a quad for each tile (256 in my case). Each quad is the size of a tile (8x8) and has texture coords defining which part of the atlas contains the tile data. Then use Render To Texture (RTT) to draw the quad (with glDrawArrays) to the display texture.

  4. Use glTexSubImage2D to update from the atlas to the display texture.
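
For method 3, the VBO contents could be generated along these lines. This is a Python sketch of the vertex data only, assuming a 128x128 atlas with tiles stored row-major and four vertices per quad in triangle-strip order; build_tile_quads and first_vertex are hypothetical helpers, and each quad would still need to be translated to its destination tile when drawn:

```python
TILE = 8               # tile edge in pixels
ATLAS = 128            # assumed square atlas edge in pixels
PER_ROW = ATLAS // TILE


def build_tile_quads():
    """Return a flat list of (x, y, u, v) vertices, 4 per tile index."""
    verts = []
    for index in range(256):
        col, row = index % PER_ROW, index // PER_ROW
        u0, v0 = col * TILE / ATLAS, row * TILE / ATLAS
        u1, v1 = (col + 1) * TILE / ATLAS, (row + 1) * TILE / ATLAS
        verts += [(0, 0, u0, v0), (TILE, 0, u1, v0),
                  (0, TILE, u0, v1), (TILE, TILE, u1, v1)]
    return verts


def first_vertex(index):
    """Offset of a tile's quad, e.g. for the `first` argument of glDrawArrays."""
    return 4 * index
```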

The results for my ATI 5850 with a display texture of 1280x1024, an atlas of 256 8x8 tiles, and 256, 512 or 1024 random tiles updated on the display texture per frame:


   Tile Updates                 256     512     1024
--------------------------------------------------------
3) FBO + VBO of quads          1174     715      415
1) FBOs + glBlitFramebuffer     225     112       52
2) PBOs                         128      63       33
4) glTexSubImage2D              108      56       29

I don’t know enough about the pipeline to understand these results, but I guess it’s obvious the FBO+quads would be best since it’s really just 2x<Number of Updates> triangles being drawn.

Unfortunately, my old PC (with a GeForce 6800) just blew up so I can’t test on that.

Thanks again for the help.

Sorry, but can you edit your post so that the unit you measured is apparent? Is it FPS or milliseconds?
If this is FPS, try to put milliseconds instead, thanks :)

Sorry, the above table was FPS. Here are the values in milliseconds (1000/FPS, which I think is what you wanted)


   Tile Updates                256       512     1024
--------------------------------------------------------
3) FBO + VBO of quads        0.8518    1.3986   2.4096   ms
1) FBOs + glBlitFramebuffer  4.4444    8.9286  19.2308   ms
2) PBOs                      7.8125   15.8730  30.3030   ms
4) glTexSubImage2D           9.2593   17.8571  34.4828   ms
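
The conversion used for this table is just 1000/FPS, sketched in Python below. Note that a frame time derived this way covers everything done in the frame, not only the tile updates:

```python
def fps_to_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# A few entries from the FPS table, converted and rounded to 4 places.
print(round(fps_to_ms(1174), 4))
print(round(fps_to_ms(225), 4))
print(round(fps_to_ms(29), 4))
```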

What Zbuffer means is the CPU time in milliseconds, as measured by a high-resolution timer, rather than a value derived from your FPS.
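
A minimal Python sketch of that kind of measurement, using the standard library's high-resolution timer. The helper name time_ms and the repeat count are assumptions for illustration. GL calls can return before the GPU has finished the work, so a real measurement would probably also need something like glFinish() before the timer is read; that part is left out here:

```python
import time


def time_ms(fn, repeats=100):
    """Average wall-clock time of fn() in milliseconds over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / repeats
```

Usage would be along the lines of time_ms(update_tiles), where update_tiles stands for whatever function performs one frame's worth of tile updates.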

That’ll probably be a few days. Family life means I’ve maybe a few hours a week to do this stuff (but I could do with having a timer in my code)

Ok, here are the timings in milliseconds. The values are ±0.01 (roughly). BTW my card is a 5850 (Asus ATI Radeon HD 5850 1024MB GDDR5), think I had a typo of 4850 earlier.


   Tile Updates               256       512     1024
--------------------------------------------------------
3) FBO + VBO of quads         0.36     0.72     1.42
1) FBOs + glBlitFramebuffer   3.71     7.50    15.05
2) PBOs                       7.60    15.14    30.40
4) glTexSubImage2D            8.04    16.10    32.27
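
One way to read these numbers is as cost per updated tile. The Python sketch below only rearranges the values from the table above; the dictionary layout and helper names are illustrative:

```python
timings_ms = {
    "FBO + VBO of quads":        {256: 0.36, 512: 0.72, 1024: 1.42},
    "FBOs + glBlitFramebuffer":  {256: 3.71, 512: 7.50, 1024: 15.05},
    "PBOs":                      {256: 7.60, 512: 15.14, 1024: 30.40},
    "glTexSubImage2D":           {256: 8.04, 512: 16.10, 1024: 32.27},
}


def per_tile_us(method, updates):
    """Microseconds per updated tile for a given method and update count."""
    return timings_ms[method][updates] * 1000.0 / updates


def slowdown(method, updates, baseline="FBO + VBO of quads"):
    """How many times slower a method is than the baseline."""
    return timings_ms[method][updates] / timings_ms[baseline][updates]
```

On these figures the quad method costs roughly 1.4 microseconds per tile, and glTexSubImage2D comes out about 22 times slower at 256 updates.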

Now, here’s an interesting number. I bought an Acer 5741 to replace my old PC that blew up. It has Intel GMA 4500M (I think). Anyway it’s rubbish: no FBO, PBO or framebuffer blit. But look at the timings (same code and same size, since it was connected to my monitor, not the laptop screen):


   Tile Updates               256       512     1024
--------------------------------------------------------
4) glTexSubImage2D            5.60     7.70    10.42

That’s faster than the 5850, which seems ridiculous given the card almost cost as much as the laptop.