PBO+mipmap DXT, reusing texture ID handler

zipponwindproof · December 7, 2018, 5:37am

Hello,
do I need to spawn PBO for each mipmap?
Or can I load a whole compressed DXT texture with mipmaps into one PBO? Normally I bind the PBO and call glCompressedTexImage2D with nullptr, but how do I offset into the PBO so that I can loop over all mipmaps and call glCompressedTexImage2D for the corresponding mip level?
What is GL_TEXTURE_BASE_LEVEL and GL_TEXTURE_MAX_LEVEL?

Side question:
Can I reuse textureID from glGenTextures, while keeping its handle but removing texture content? Or every time after I unload texture I have to glDeleteTextures and thus I can not reuse ID? Or what is the idea behind preallocating textures using glCompressedTexImage2D and then loading it with glCompressedTexSubImage2D? How do I gain performance here?

Sorry, I am in a bit of a hurry before the weekend, so maybe I asked too many chaotic questions. But I am a bit confused about how glCompressedTexImage2D+glCompressedTexSubImage2D gives me any performance boost if I load textures every time a model is in range and unload them when it is out of range, so I still call both functions right after each other instead of preallocating somehow.

edit: Or maybe I need to use one huge PBO and map multiple areas via glMapBufferRange?

Osbios · December 7, 2018, 1:04pm

If you bind a buffer as PBO, then the data pointer of glCompressedTexImage2D becomes an offset into that buffer. So you could loop over a single buffer with glCompressedTexImage2D for each mipmap with the appropriate size/offset.
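A minimal sketch of the bookkeeping behind that loop, assuming the PBO holds only the mip data, tightly packed with the largest level first (the usual DDS payload order). `MipSlice` and `buildMipLayout` are hypothetical names, and the GL call is shown only as a comment so the block stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One entry per mip level: where it starts in the PBO and how many bytes it has.
struct MipSlice {
    int level;
    int width;
    int height;
    std::size_t offset;  // byte offset from the start of the buffer
    std::size_t size;    // imageSize argument for glCompressedTexImage2D
};

// blockBytes: 8 for DXT1, 16 for DXT3/DXT5 (each block covers 4x4 texels).
// Assumption: levels are stored back to back, largest level first.
std::vector<MipSlice> buildMipLayout(int width, int height, int levels,
                                     std::size_t blockBytes) {
    assert(width > 0 && height > 0 && levels > 0);
    assert(blockBytes == 8 || blockBytes == 16);
    std::vector<MipSlice> slices;
    std::size_t offset = 0;
    for (int level = 0; level < levels; ++level) {
        // Levels smaller than 4x4 still occupy one full block.
        const std::size_t blocksX = static_cast<std::size_t>((width + 3) / 4);
        const std::size_t blocksY = static_cast<std::size_t>((height + 3) / 4);
        const std::size_t size = blocksX * blocksY * blockBytes;
        slices.push_back({level, width, height, offset, size});
        offset += size;
        width = std::max(1, width / 2);
        height = std::max(1, height / 2);
    }
    return slices;
}

// With the PBO bound to GL_PIXEL_UNPACK_BUFFER, each slice s would be submitted as:
//   glCompressedTexImage2D(GL_TEXTURE_2D, s.level, internalFormat,
//                          s.width, s.height, 0,
//                          static_cast<GLsizei>(s.size),
//                          reinterpret_cast<const void*>(s.offset));
```

The last argument is the point of the quoted answer: with a PBO bound it is read as a byte offset, not as a client memory pointer.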

Now to the ID stuff… Rusty OpenGL lets you define each texture mipmap separately, and also is all bitchy if you tried to use it with incorrect sample parameters. (e.g. often the default sample parameters)
You can also freely change each mipmap later on, again with lots of chances to make errors and create something OpenGL would bitch about.

Then came GL_ARB_texture_storage (And GL_ARB_texture_storage_multisample)
Here you only define the texture memory layout once and the mipmap count is a simple parameter. For driver simplifications (and performance) you can not change the layout afterwards. You have to delete the ID and create a new texture for that.

My personal recommendation is to always work with these limitations and in the background use GL_ARB_texture_storage/GL_ARB_texture_storage_multisample if available.

About mapping. Don’t use that unless you want to use persistently mapped buffers! (Those are part of GL_ARB_buffer_storage.)

The old style mapping/unmapping aged terribly and will give you very bad performance issues!

Dark_Photon · December 7, 2018, 5:21pm

Osbios gave you some good info, so I’ll just add to that.

No, you don’t need PBOs per MIPmap or per texture. Nothing so wasteful as that.

In fact you can have one big buffer object that you use as a transfer buffer between the CPU and the GPU for all data (e.g. any/all of vertex attribute blocks, index list blocks, texel data blocks, etc.), and just fill it from front to back in classic ring buffer fashion using Buffer Object Streaming techniques (if you haven’t already, you should definitely read this wiki page, twice, and ask questions about anything that doesn’t make sense). Then bind that buffer to whatever OpenGL bind target you need it on, and latch the data into GL by providing the offset in the buffer object (or the GPU address, if you’re using NVIDIA bindless buffers) to the relevant API call (e.g. glCompressedTexSubImage2D, glVertexAttribPointer, glDrawElements, etc.).

On that wiki page, pay closest attention to PERSISTENT/COHERENT mapped buffers, and to UNSYNCHRONIZED buffer maps. Those are your two best options for avoiding internal driver synchronization. Use one or the other; not both.
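The front-to-back fill can be sketched on the CPU side as a small offset allocator. This is only the bookkeeping, under the assumption that the caller handles GPU synchronization itself when the allocator wraps (for example a fence with persistent mapping, or orphaning the buffer with unsynchronized mapping). `RingAllocator` is a hypothetical name, not something from the wiki page.

```cpp
#include <cassert>
#include <cstddef>

// CPU-side bookkeeping for a streaming buffer that is filled front to back.
class RingAllocator {
public:
    explicit RingAllocator(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the byte offset of a block of `size` bytes aligned to `alignment`.
    // `wrapped` is set to true when the allocation restarted at offset 0; at
    // that point the caller must make sure the GPU is done reading the old data.
    std::size_t allocate(std::size_t size, std::size_t alignment, bool& wrapped) {
        assert(size > 0 && size <= capacity_);
        assert(alignment > 0);
        std::size_t start = (head_ + alignment - 1) / alignment * alignment;
        wrapped = false;
        if (start + size > capacity_) {
            start = 0;
            wrapped = true;
        }
        head_ = start + size;
        return start;
    }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;
};
```

The returned offset is what would be passed (cast to a pointer) to calls such as glCompressedTexSubImage2D while the buffer is bound to the matching target.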

What is GL_TEXTURE_BASE_LEVEL and GL_TEXTURE_MAX_LEVEL?

These can be useful for limiting what subrange of a texture’s MIPmaps that you allow OpenGL to read from. One use for this is when you’re dynamically uploading content to textures at render time. If you only have M out of N MIPmaps fully populated with data, you can use BASE/MAX to constrain OpenGL’s texture sampling so that it stays within the M MIPmap levels you’ve populated so far.
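As a rough illustration, assuming a streaming scheme that uploads the smallest mips first, the BASE/MAX pair could be derived from a per-level "resident" flag like this. `residentLevelRange` is a hypothetical helper; the two glTexParameteri calls are shown as comments only.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// resident[i] is true once mip level i has been fully uploaded.
// Returns {baseLevel, maxLevel} for the longest fully populated run that ends
// at the smallest mip, or {-1, -1} if the smallest mip is not resident yet.
std::pair<int, int> residentLevelRange(const std::vector<bool>& resident) {
    assert(!resident.empty());
    const int last = static_cast<int>(resident.size()) - 1;
    if (!resident[last]) {
        return {-1, -1};
    }
    int base = last;
    while (base > 0 && resident[base - 1]) {
        --base;
    }
    return {base, last};
}

// The result would then be applied to the bound texture:
//   glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, range.first);
//   glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, range.second);
```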

Can I reuse textureID from glGenTextures, while keeping its handle but removing texture content?
Or every time after I unload texture I have to glDeleteTextures and thus I can not reuse ID?

+1 for Osbios’ suggestion to just use glTexStorage*. Also, creating and deleting texture storage is expensive. Generally you want to create texture handles and allocate texture storage for them once up-front. Then re-use the pre-allocated storage, just changing the content.

Or what is the idea behind preallocating textures using glCompressedTexImage2D and then loading it with glCompressedTexSubImage2D? How do I gain performance here?

glTexImage2D = Allocate MIPmap storage + uploads texels (if ptr != NULL)
glTexSubImage2D = Upload texels.

Again, just skip glTexImage2D and use glTexStorage2D to allocate the texel storage instead. Then upload your texel data into that storage with the glTexSubImage2D calls.
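glTexStorage2D takes the number of mip levels as a parameter, so a small helper for a full chain down to 1x1 is handy. A minimal sketch; `fullMipLevelCount` is a hypothetical name, and the GL calls in the comments only show where the value would go.

```cpp
#include <algorithm>
#include <cassert>

// Number of levels in a full mip chain down to 1x1:
// floor(log2(max(width, height))) + 1.
int fullMipLevelCount(int width, int height) {
    assert(width > 0 && height > 0);
    int levels = 1;
    for (int size = std::max(width, height); size > 1; size /= 2) {
        ++levels;
    }
    return levels;
}

// Allocate once:
//   glTexStorage2D(GL_TEXTURE_2D, levels, internalFormat, width, height);
// Then upload per level (compressed data, e.g. DXT):
//   glCompressedTexSubImage2D(GL_TEXTURE_2D, level, 0, 0, levelWidth, levelHeight,
//                             internalFormat, imageSize, dataOrPboOffset);
```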

zipponwindproof · December 10, 2018, 12:00pm

If you bind a buffer as PBO, then the data pointer of glCompressedTexImage2D becomes an offset into that buffer. So you could loop over a single buffer with glCompressedTexImage2D for each mipmap with the appropriate size/offset.

Thanks, now it is working!

Here you only define the texture memory layout once and the mipmap count is a simple parameter. For driver simplifications (and performance) you can not change the layout afterwards. You have to delete the ID and create a new texture for that.

My personal recommendation is to always work with these limitations and in the background use GL_ARB_texture_storage/GL_ARB_texture_storage_multisample if available.

+1 for Osbios’ suggestion to just use glTexStorage*. Also, creating and deleting texture storage is expensive. Generally you want to create texture handles and allocate texture storage for them once up-front. Then re-use the pre-allocated storage, just changing the content.

Now I am loading and unloading textures, so should I preallocate, let’s say, 1GB of VRAM (depending on my game scenes?) using GL_ARB_texture_storage and then subload textures there? And when I unload one, I don’t use glDeleteTextures at all and still keep it in memory, but mark it as ‘unloaded’ and reuse that memory for another texture? What should I do when I exceed 1GB of ‘preloaded’ memory and all textures are occupied, so there is no free buffer? Should I approximate my scene somehow and force some distance check of entities to unload textures, so that I always have free slots for new models (textures) to be loaded?

About mapping. Don’t use that unless you want to use persistently mapped buffers! (Those are part of GL_ARB_buffer_storage.)

The old style mapping/unmapping aged terribly and will give you very bad performance issues!

On that wiki page, pay closest attention to PERSISTENT/COHERENT mapped buffers, and to UNSYNCHRONIZED buffer maps. Those are your two best options for avoiding internal driver synchronization. Use one or the other; not both.

You mean this I should not use?

glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);

I see persistent mapping is Core since version 4.4. I would like to go with 4.2 max.

These can be useful for limiting what subrange of a texture’s MIPmaps that you allow OpenGL to read from. One use for this is when you’re dynamically uploading content to textures at render time. If you only have M out of N MIPmaps fully populated with data, you can use BASE/MAX to constrain OpenGL’s texture sampling so that it stays within the M MIPmap levels you’ve populated so far.

Thanks for clarification. Now it makes sense.

Alfonse_Reinheart · December 10, 2018, 12:27pm

I would like to go with 4.2 max.

Um, why? I can understand wanting to stick with GL 4.1, as that gives you compatibility with MacOS’s now-deprecated support for OpenGL. But is there hardware that is stuck on GL 4.2, which won’t get any driver/implementation updates to higher versions?

zipponwindproof · December 10, 2018, 12:51pm

No real logical reason. I just started to develop on a laptop with an ancient HD4000 (4.2 max in core profile), so I have tessellation and basically everything I need here, except compute shaders, which I still don’t plan to use. Sure, my project won’t run on such an ancient integrated GPU in the end, but for some reason I like that it has been supported for many years.
What would you recommend? I plan to target Linux+Windows only, not macOS. Isn’t 4.4 supported by far fewer devices? (I just checked https://developer.nvidia.com/opengl-driver and it seems even the old GeForce 400 from 2010 supports OpenGL 4.6, hmm.) I plan to develop the game for 2-3 years, so things will change, but I don’t know. Maybe I am senselessly defending 4.2.

So should I target 4.4+?

Dark_Photon · December 11, 2018, 5:18am

Now I am loading textures and unloading, so I should preallocate lets say 1GB of vram (depending on my game scenes?) using GL_ARB_texture_storage and then subload texture there?

It’s up to you. Your game could carve out a fixed amount of GPU RAM for textures in some way, or it could have that amount be dynamic based on how much GPU memory is available on the GPU.

And when I unload it, I dont use glDeleteTextures at all and still keep it in memory, but mark as ‘unloaded’ and reuse that memory for other texture?

Exactly.

What should I do when I exceed 1GB of ‘preloaded’ memory and all textures are occupied, so no free buffer?
Should I approximate my scene somehow and force some distance-check of entities to unload textures to always have free slots for new models (textures) to be loaded?

That’s up to you. However, a tip (that may be obvious). Standardizing your textures’ internal formats and resolutions to a small set will help avoid situations like this where you’ve got free textures allocated, but they’re not the format you need.
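One way to picture "keep the texture, mark it unloaded, reuse it" together with standardized formats and resolutions is a free list per texture class. A minimal sketch under those assumptions; `TextureClass` and `TexturePool` are hypothetical names, texture IDs are plain integers here, and 0 is used as "nothing free" because glGenTextures never returns 0.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Describes one standardized texture class. internalFormat would hold a GL
// internal format enum value; it is stored as a plain integer here.
struct TextureClass {
    std::uint32_t internalFormat;
    int width;
    int height;
    bool operator<(const TextureClass& other) const {
        return std::tie(internalFormat, width, height) <
               std::tie(other.internalFormat, other.width, other.height);
    }
};

class TexturePool {
public:
    // "Unload": keep the GL object and its storage, just mark it reusable.
    // Also used to register freshly preallocated textures.
    void release(const TextureClass& cls, std::uint32_t textureId) {
        assert(textureId != 0);
        free_[cls].push_back(textureId);
    }

    // Returns a free texture of exactly this class, or 0 if none is left.
    // On 0 the caller has to evict something or allocate new storage.
    std::uint32_t acquire(const TextureClass& cls) {
        auto it = free_.find(cls);
        if (it == free_.end() || it->second.empty()) {
            return 0;
        }
        const std::uint32_t id = it->second.back();
        it->second.pop_back();
        return id;
    }

private:
    std::map<TextureClass, std::vector<std::uint32_t>> free_;
};
```

The fewer distinct classes there are, the less likely it is that acquire returns 0 while textures of some other class sit unused.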

You mean this I should not use?
glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);

Right.

Think about this from the driver’s perspective. The CPU (your app + the driver) has queued up lots of GL commands ahead of where the GPU is executing now. Suppose that the buffer object that you’re mapping on the CPU “now” is going to be read in the future by the GPU once some previously queued commands get to the head of the queue. When that happens, on the GPU side, the buffer object must have the contents that it had “before” the MapBuffer and the modification that you’re now doing on the CPU. So what’s the driver to do?

Some drivers just block in your call to MapBuffer in this case, waiting for already-queued references to that buffer object to clear the GPU before the CPU is allowed to change the buffer object. That means your GL thread running on the CPU can’t perform other work until it unblocks, meaning that your GL thread doesn’t have as much frame time to submit everything for a frame.

Proper use of PERSISTENT/COHERENT (or UNSYNCHRONIZED) buffer maps will avoid this internal driver synchronization, giving you more time to submit content to the GPU.

Related: See this chapter from OpenGL Insights: Asynchronous Buffer Transfers (OpenGL Insights). As a teaser for what’s inside, see the method timing table at the end. This book was published before PERSISTENT/COHERENT buffer maps were available, but all the other usual buffer update methods are included (what you’re doing is effectively the “Write” line, though with glMapBufferRange instead). Keep in mind, though, that this reflects GL driver performance 6 years ago. So while the numbers would look different for drivers today, the concepts and underlying issues are the same.

I see persistent mapping is Core since version 4.4. I would like to go with 4.2 max.

Functionality is often available as extensions before it hits the core OpenGL spec. If 4.4 is too high a bar for some reason, you could access this through ARB_buffer_storage instead.

Here are the driver reports that list support for that extension: GL_ARB_buffer_storage reports (gpuinfo.org)
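The "core version or extension" decision can be sketched as a small check over the extension list reported by the driver. This assumes the list was gathered beforehand (in a core profile, via GL_NUM_EXTENSIONS and glGetStringi); `hasExtension` and `canUsePersistentMapping` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// The list would be filled from the driver, e.g. in a core profile:
//   glGetIntegerv(GL_NUM_EXTENSIONS, &count);
//   glGetStringi(GL_EXTENSIONS, i) for i in [0, count)
bool hasExtension(const std::vector<std::string>& extensions,
                  const std::string& name) {
    return std::find(extensions.begin(), extensions.end(), name) !=
           extensions.end();
}

// Persistent mapping is usable with GL 4.4 or newer, or on older versions
// when the driver exposes GL_ARB_buffer_storage.
bool canUsePersistentMapping(int glMajor, int glMinor,
                             const std::vector<std::string>& extensions) {
    assert(glMajor > 0 && glMinor >= 0);
    const bool core44 = glMajor > 4 || (glMajor == 4 && glMinor >= 4);
    return core44 || hasExtension(extensions, "GL_ARB_buffer_storage");
}
```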

Osbios · December 11, 2018, 5:54am

UNSYNCHRONIZED also syncs the application thread to the driver thread, because it returns a pointer, and without flushing and waiting for the driver thread to digest all pending commands that pointer is unknown. E.g. one could have issued some commands that change the buffer allocation. So unsynchronized mapping has exactly the same aging issues as all other mappings and should not be used. The only options are persistent mapping via GL_ARB_buffer_storage (core since 4.4), pinned memory via GL_AMD_pinned_memory (but I’m not even sure there is anything that supports this without also supporting GL_ARB_buffer_storage), or just plain old glBufferSubData/glTexSubImage2D.

I would recommend starting with glBufferSubData/glTexSubImage2D only. It is very easy to use, the driver takes care of all object synchronization, and the main performance penalty is a duplicated client-side copy of the involved data. If you later decide that streaming has performance issues from system memory bandwidth, you can still implement an optional path for persistently mapped buffers that you fill directly with e.g. your texture data to save the duplicated copy.

If you stream and performance is an issue for you, then yes, reuse already allocated memory so that you don’t have to allocate too much on the fly. But that is something completely independent of GL_ARB_texture_storage.

Dark_Photon · December 12, 2018, 4:26am

No. If you can explicitly target a single-threaded driver, then UNSYNCHRONIZED is just fine, and in my experience performs better than older methods when used engine-wide.

That said, PERSISTENT/COHERENT (where available) should be preferred to UNSYNCHRONIZED because it works well with both single and multi-threaded driver configs.

Bottom line zipponwindproof: Performance ultimately depends on the driver, so do testing with your use case on the driver(s) you support. Try a few methods, and pick the one that performs best for you.

zipponwindproof · December 13, 2018, 10:23am

Thanks for the valuable information. I’ve got to read Buffer Object Streaming a few more times. I will go for 4.4+ with persistent mapping.
If this is fast enough, I will try to drop classic mipmapping and create my own mipmap system that loads only one mipmap level based on the distance to the model, thus reducing VRAM usage. Yes, this will use PBOs intensively, but it should be ‘ok’ and smooth, I guess. At least I will give it a try. I don’t know if anyone has ever used such a method or had a use case for it, but I plan to make hi-res textures (4096*4096) and yet allow far-distance rendering, so distant objects could be loaded with, for example, a 16x16 texture only, instead of the whole huge texture plus mipmaps.
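Purely as an illustration of such a distance-based choice, here is one hypothetical heuristic (not something proposed in this thread): keep full resolution up to some reference distance and drop one mip level for every doubling of distance beyond it. The function name and the doubling rule are assumptions.

```cpp
#include <cassert>

// Hypothetical heuristic: level 0 (full resolution) up to fullResDistance,
// then one level coarser for every doubling of the distance, clamped to the
// smallest level of the chain.
int mipLevelForDistance(float distance, float fullResDistance, int levelCount) {
    assert(fullResDistance > 0.0f && levelCount > 0);
    int level = 0;
    while (level < levelCount - 1 && distance >= fullResDistance * 2.0f) {
        distance *= 0.5f;
        ++level;
    }
    return level;
}
```

For a 4096x4096 texture (13 levels), level 8 is the 16x16 image mentioned above.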

zipponwindproof · December 19, 2018, 8:14am

Update: even though HD4000 supports max opengl 4.2 and gpuinfo says HD4000 supports GL_ARB_texture_storage only on windows, not on linux, it actually works here and I’ve confirmed it by dumping all supported extensions. Mesa 18.2.6. Great, so now I can dev while using persistent mapping even on 4.2

Alfonse_Reinheart · December 19, 2018, 9:07am

Persistent mapping is a feature of ARB_buffer_storage, not texture_storage.

That being said, according to the OpenGL hardware database, many IvyBridge implementations provide ARB_buffer_storage too.

zipponwindproof · December 19, 2018, 3:48pm

Oh true. I checked it few times and then wrote it incorrectly. Both are present, and yes ARB_buffer_storage is present on my HD4000, but at gpuinfo it is not present on linux, only windows version, that was my point.