I take it binding a lot of small textures is bad...

First off I have a GF2 with det 43.00 drivers (also tried 41.09)…

My app uses a lot (10K) of very small textures (4x4 – 8x8). When loading the textures used far more memory than the textures take on disk, I tore my app apart looking for a memory leak. What I found was that glBindTexture uses about 4k of memory if the texture object is new. For instance:

for (GLuint i = 1; i <= 32000; i++)
    glBindTexture(GL_TEXTURE_2D, i);

will allocate around 131 MB of memory.

I guess it makes sense that the driver would need to allocate some memory when creating a new texture object, even if it is still undefined, but this much makes using small textures pointless. Has anyone else tried to do this?


Why do you need to make a lot of small textures, when you can make fewer, much larger textures and just use texture coordinates to pull the data out? Not only will that work better, it'll be faster even if your card had enough memory for the small-texture version.
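
Something like this, roughly (just a sketch; the SubTexture struct, atlas_texcoords and atlasSize are my own made-up names, not anything from your code):

/* Sketch: given a small lightmap placed at (x, y) with size w x h inside
   a larger atlas texture, compute the texcoords for a quad that samples
   just that region. */
typedef struct { int x, y, w, h; } SubTexture;

void atlas_texcoords(const SubTexture *s, int atlasSize, float uv[4][2])
{
    float u0 = s->x          / (float)atlasSize;
    float v0 = s->y          / (float)atlasSize;
    float u1 = (s->x + s->w) / (float)atlasSize;
    float v1 = (s->y + s->h) / (float)atlasSize;

    uv[0][0] = u0; uv[0][1] = v0;   /* lower-left  */
    uv[1][0] = u1; uv[1][1] = v0;   /* lower-right */
    uv[2][0] = u1; uv[2][1] = v1;   /* upper-right */
    uv[3][0] = u0; uv[3][1] = v1;   /* upper-left  */
}

Then you bind the atlas once and draw everything that uses it.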

I tried that once before, but texture filtering caused pixels from bordering texture areas to bleed into the current texture area. After that I tried adding a pixel border around each texture area and adjusting the texcoords, but then I got some funky results with mipmapping. How would you recommend using a few large textures without the bleed?

Thanks…

John.

May I ask what you are doing with these 10k of tiny textures?

Are you cutting up JPEGs or something?

Sounds like he’s doing lightmapping. If so, don’t forget that you don’t have to use mipmapping; a standard linear filter on your lightmaps will give satisfying results…

Y.

There is one thing the many-textures-in-a-single-image approach won’t solve: what if you want to repeat a sub-texture? You have to use more polys, or use sub-textures on only one axis so you get at least a 1D repeat…

Originally posted by john_at_kbs_is:
I tried that once before, but texture filtering caused pixels from bordering texture areas to bleed into the current texture area. After that I tried adding a pixel border around each texture area and adjusting the texcoords, but then I got some funky results with mipmapping. How would you recommend using a few large textures without the bleed?
John.

Don’t put one texture immediately next to another, obviously. Just duplicate the border texels of each texture, and all that’s left is adjusting the texture mapping.

You might want to write a small program that does this automatically.
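
Something along these lines, maybe (just a sketch, assuming tightly packed 32-bit RGBA pixels; blit_with_border and the pixel typedef are my own names, and you still need a packing strategy that leaves room for the borders):

/* Copy a w x h lightmap into the atlas at (x, y) and duplicate its edge
   texels one pixel outward, so bilinear filtering never reaches a
   neighbouring lightmap.  Caller must ensure x, y >= 1 and that the
   one-texel border still fits inside the atlas. */
typedef unsigned int pixel;     /* one RGBA8 texel */

void blit_with_border(pixel *atlas, int atlasWidth,
                      const pixel *src, int w, int h, int x, int y)
{
    int i, j;
    for (j = -1; j <= h; j++)
    {
        int sj = j < 0 ? 0 : (j >= h ? h - 1 : j);       /* clamp source row    */
        for (i = -1; i <= w; i++)
        {
            int si = i < 0 ? 0 : (i >= w ? w - 1 : i);   /* clamp source column */
            atlas[(y + j) * atlasWidth + (x + i)] = src[sj * w + si];
        }
    }
}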

Originally posted by Ysaneya:
[b]Sounds like he’s doing lightmapping. If so, don’t forget that you don’t have to use mipmapping; a standard linear filter on your lightmaps will give satisfying results…

Y.[/b]

I disagree; you still get flicker when far away from the polygon.

Cheers.

Ysaneya,

Yes, I am light mapping per light with no shadows; the shadows I’m going to do dynamically, so the maps are very small in most instances. I still like light maps compared with the newer per-pixel lighting methods: the light map only chews up one texture unit, and on a GF2 with slow 3D texture support that can be important. You’re right, I did get better results with the mipmapping off. However, this will cause a performance drop, right (something to do with the hardware loading parts of the texture for filtering??)? Actually the area I will be working on is so small it shouldn’t make a difference.
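
For reference, this is all I’m doing for the lightmaps now, no mip chain at all (lightmapId and pixels are just placeholder names, and the 8x8 size is one of my small maps):

glBindTexture(GL_TEXTURE_2D, lightmapId);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);   /* no mipmaps */
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 8, 8, 0, GL_RGB,
             GL_UNSIGNED_BYTE, pixels);                              /* base level only */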

V-man,

Sounds like a good idea.

Thanks guys, I’ll try it out…

John.

Originally posted by fritzlang:
I disagree; you still get flicker when far away from the polygon.

Realistically, a lightmap at 4x4 or 8x8 will hardly cause aliasing, not even at a distance. They don’t contain (in general) a whole lot of high-frequency content. And the distance at which a polygon (in most kinds of lightmapping apps) would ever need minification is probably already beyond the far clipping plane.

Originally posted by john_at_kbs_is:
You’re right, I did get better results with the mipmapping off. However, this will cause a performance drop, right (something to do with the hardware loading parts of the texture for filtering??)?

At those sizes you’ll not see any difference at all.

If you ask for contiguous memory (D3D) that won’t be paged out, for you to store your data in (e.g. storing your texture in AGP memory), you get your memory back on 4k-aligned boundaries.

My guess is that you are seeing this; your driver’s memory allocations must be 4k aligned.

Could very well be different on an SGI, GameCube, or software renderer, but I think you’re stuck with this wasted memory in the name of better performance.
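
Back-of-the-envelope, if the 4k theory is right (assuming your 10K textures at 8x8 RGBA8 and a 4 KB allocation granule, which is just my guess about the driver):

/* Rough numbers: actual pixel data vs. what 4 KB-granular allocations cost. */
#include <stdio.h>

int main(void)
{
    const long numTextures = 10000;
    const long dataBytes   = numTextures * 8 * 8 * 4;   /* 2,560,000  bytes */
    const long allocBytes  = numTextures * 4096;        /* 40,960,000 bytes */

    printf("pixel data : %.1f MB\n", dataBytes  / (1024.0 * 1024.0));
    printf("allocated  : %.1f MB\n", allocBytes / (1024.0 * 1024.0));
    return 0;
}

So at 8x8 something like 94% of each allocation would be padding.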

Humus,

That’s what I was hoping to hear, thanks for the reply.

Originally posted by titan:
[b]If you ask for contiguous memory (D3D) that won’t be paged out, for you to store your data in (e.g. storing your texture in AGP memory), you get your memory back on 4k-aligned boundaries.

My guess is that you are seeing this; your driver’s memory allocations must be 4k aligned.

Could very well be different on an SGI, GameCube, or software renderer, but I think you’re stuck with this wasted memory in the name of better performance.[/b]

At least I’m not crazy (or at least not completely); I can live with the problem. Some of the preliminary research I’ve done suggests that although I will waste an extra texture unit using per-pixel lighting, I may still be able to implement my lighting equation in the same number of passes as with the per-light light mapping. That will save a lot of texture binding.

john,
does the memory (by any chance …) come back after TexImage-ing it?

I can imagine the driver will have to make a good guess for a fresh texture object, so there’s your 4k. But it should be able to release that extra memory when the texture is fully specified. No idea if, how and when this happens, but technically it should be possible.

I’ll take another look tonight, but the memory usage was significantly higher than the total size of all of the textures, even after they were loaded.

MIP mapping is a speed increase when it lets the rasterizer only read a 64x64 image even though your max texture size is 512x512. However, a 4x4 or 8x8 texture is likely to fit in the cache of the rasterizer, so lack of MIP mapping probably won’t impact speed at all. In fact, if you use LINEAR instead of LINEAR_MIPMAP_LINEAR, you may get less memory traffic, as trilinear filtering requires two separate MIP level reads.

Just tested calling TexImage after the bind; same thing… oh, well…

Originally posted by jwatte:
MIP mapping is a speed increase when it lets the rasterizer only read a 64x64 image even though your max texture size is 512x512. However, a 4x4 or 8x8 texture is likely to fit in the cache of the rasterizer, so lack of MIP mapping probably won’t impact speed at all. In fact, if you use LINEAR instead of LINEAR_MIPMAP_LINEAR, you may get less memory traffic, as trilinear filtering requires two separate MIP level reads.

But what if the 8x8 area is embedded in a 512x512 texture with lots of other light maps? The whole thing would still get loaded. Obviously I would try to keep all of the textures that are likely to be rendered together grouped together, but the worst-case scenario could be crappy. The issue is minimizing the texture data pulled across the AGP bus, right? Not a cache-line type thing like I originally thought.

Originally posted by john_at_kbs_is:
What if the 8x8 area is embedded in a 512x512 texture with lots of other light maps? The whole thing would still get loaded. Obviously I would try to keep all of the textures that are likely to be rendered together grouped together, but the worst-case scenario could be crappy. The issue is minimizing the texture data pulled across the AGP bus, right? Not a cache-line type thing like I originally thought.

Actually some cards, like those made by 3DLabs over the last four or five years, break your texture into chunks that work like virtual memory. If you have a 2048x2048 32-bit RGBA texture and only one texel of it is visible, rather than pulling the whole thing across like the more primitive chips do (a 16 MB transfer for a single texel!), they pull over just a 4k chunk of it.

So basically don’t worry, some cards will do the right thing automatically.

And remember, when allocating video memory (such as textures) you allocate on 4k boundaries. If you only use 256 bytes of the 4096 you get, then you’re wasting a lot of memory, as you figured out. You might as well be using 32x32 32-bit textures instead of 8x8 32-bit.

Either that or combine them into a single giant texture, which would be best as it would eliminate the texture state changes. nVidia recommends this actually if you read their optimization doc.

Originally posted by titan:
And remember, when allocating video memory (such as textures) you allocate on 4k boundaries.

Not necessarily. That’s clearly ‘implementation dependent’ behaviour, and it would seem like a huge waste to me if it were indeed 4k boundaries.

The early Voodoos for example required allocations on 8 byte boundaries. I’m not as familiar with current architectures in that respect, but I don’t see any good reason yet to go above 512 bits (=64 bytes).

If you have your light map be a 512x512 and that contains ALL the light maps for a level, then that will just stay on the card, period. The rasterizer is still happy, because it’s likely to tile the texture in little sub-chunks (on the order of 8x8 in size). Thus, if you make sure to align each little light map snippet on an 8x8 boundary, you’re likely to get good throughput per primitive being rasterized, as each triangle will have locality of reference.
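
To keep the snippets on that grid, something like this would do (just a sketch; the 8-texel tile size is my guess at the rasterizer’s granularity, and align_to_tile is my own name):

/* Round a lightmap's placement inside the atlas up to the next 8-texel
   boundary so each snippet stays within whole rasterizer tiles. */
#define TILE 8

static int align_to_tile(int coord)
{
    return (coord + TILE - 1) & ~(TILE - 1);   /* round up; works because TILE is a power of two */
}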