Are bindless textures possible without NV_gpu_shader5 and without gl_DrawID?

tindlessbextures · May 17, 2018, 11:05am

Having read a bunch of threads on this, including but not limited to Stack Overflow and these boards, it looks like anyone without access to that extension will have to use gl_DrawID when grabbing the handle from an array.

However, I was told by someone who has spent years on graphics engines that it can be done without NV_gpu_shader5 and without gl_DrawID as long as you have OpenGL 4.5 or 4.6. I can’t find how this would be possible on AMD cards without using gl_DrawID, though. I only mention the ‘years on graphics engines’ because I figure that person has been around the block and would not be trying to feed me junk information or blatant lies… or so I hope…

Was I given misinformation? Or is there a way in 4.5 and 4.6 to access textures the way NV_gpu_shader5 lets you, without requiring that extension? (Was something new added that I am unaware of?)

EDIT: For clarification, I do not mean with some other dynamically uniform expression; I mean something like passing the handle in with the vertex data, or whatever else NV_gpu_shader5 allows that you can’t do without it.

Alfonse_Reinheart · May 17, 2018, 12:01pm

I do not mean with some other dynamically uniform expression; I mean something like passing the handle in with the vertex data, or whatever else NV_gpu_shader5 allows that you can’t do without it.

Well, you can’t.

ARB_bindless_texture does not exempt you from the requirement of dynamically uniform expressions. Every texture you use in a shader must resolve to a dynamically uniform value.

There is nothing in OpenGL 4.5 or 4.6 which removes this prohibition. And I can’t think of any interaction between those versions and ARB_bindless_texture that would help in this regard.

gl_DrawID works because it is dynamically uniform by fiat: ARB_shader_draw_parameters says that it is dynamically uniform, therefore it is. So you can use that to index into an array of texture handles or whatever. That does not allow you to pass a texture handle per-vertex; gl_DrawID only changes when a new draw command is processed.
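
As a rough CPU-side model of that lookup (a sketch only, not shader code): every vertex that belongs to draw command N resolves to entry N of the handle array, so the value cannot vary within a draw. The struct mirrors the command layout OpenGL defines for indirect array draws; `handleSeenByVertex` is a hypothetical helper invented for this illustration, and it assumes the draws cover non-overlapping vertex ranges.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same field order OpenGL defines for glMultiDrawArraysIndirect commands.
struct DrawArraysIndirectCommand {
    std::uint32_t count;         // number of vertices in this draw
    std::uint32_t instanceCount; // number of instances
    std::uint32_t first;         // first vertex of this draw
    std::uint32_t baseInstance;  // offset for instanced attributes
};

// Hypothetical helper: models a shader that indexes a handle array with the
// draw ID. Returns the handle of the draw whose vertex range contains
// `vertex`, or 0 if no draw (or no handle) covers it.
std::uint64_t handleSeenByVertex(const std::vector<DrawArraysIndirectCommand>& draws,
                                 const std::vector<std::uint64_t>& handles,
                                 std::uint32_t vertex) {
    for (std::size_t drawId = 0; drawId < draws.size() && drawId < handles.size(); ++drawId) {
        const DrawArraysIndirectCommand& d = draws[drawId];
        if (vertex >= d.first && vertex < d.first + d.count) {
            return handles[drawId]; // identical for every vertex of this draw
        }
    }
    return 0;
}
```

All vertices in the range of one command map to the same handle, which is the property the hardware needs.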

Perhaps you misunderstood what this expert meant when he said that you could use bindless textures in some way. Can you link to or quote what he said exactly?

For example, you can pass texture handles as vertex shader inputs, and pass them to fragment shaders. But the handles within an invocation group must be the same handle. For example, if your VS input is an instance array of handles, and you draw one instance per drawing command, and you use the base instance to select which handle to pass, then it will work. Each VS invocation gets the same handle, so each FS invocation gets the same handle. And because you’re only drawing a single instance at a time, each instance will live within its own invocation group.

So the VS input based on that instance will be dynamically uniform. And this works on any version of OpenGL 4.x.
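
A minimal sketch of how the command list for that base-instance approach could be built on the CPU, assuming one draw command per object and an instanced attribute (divisor 1) that holds one handle per object. `ObjectRange` and `buildSingleInstanceDraws` are hypothetical names used only for this illustration; the command struct follows the layout OpenGL defines for indirect array draws.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same field order OpenGL defines for glMultiDrawArraysIndirect commands.
struct DrawArraysIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t first;
    std::uint32_t baseInstance;
};

// Hypothetical description of where one object lives in the shared VBO.
struct ObjectRange {
    std::uint32_t firstVertex;
    std::uint32_t vertexCount;
};

// Builds one command per object, each drawing exactly one instance.
// baseInstance is set to the object's index, so an instanced attribute with
// divisor 1 reads element i of the handle array for object i.
std::vector<DrawArraysIndirectCommand>
buildSingleInstanceDraws(const std::vector<ObjectRange>& objects) {
    std::vector<DrawArraysIndirectCommand> commands;
    commands.reserve(objects.size());
    for (std::size_t i = 0; i < objects.size(); ++i) {
        commands.push_back({objects[i].vertexCount,
                            1u,
                            objects[i].firstVertex,
                            static_cast<std::uint32_t>(i)});
    }
    return commands;
}
```

The resulting vector would be uploaded to the indirect buffer and submitted with a single multi-draw call; since each command draws one instance, every invocation of that command reads the same handle.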

You’re asking your question backwards. You should explain how it is you’re trying to render; then we can give you advice on how to make that work.

tindlessbextures · May 17, 2018, 1:52pm

For what it’s worth, I have bindless textures working and everything is good, but it only works with the nvidia extension. This was my first journey into using bindless textures, so I made it work with NV_gpu_shader5 as it was the easiest path for my codebase (and seeing it work was also nice).

My problem:

I want to support GPUs outside of nvidia since not all of my users have nvidia cards, so I figure I’d need to do the gl_DrawID approach, as that should(?) work on [almost] all modern GPUs with up-to-date drivers. I don’t know whether (for example) AMD cards support NV_gpu_shader5; I don’t have any cards from them so I can’t find out at the moment, but I assume relying on modern cards to have an nvidia extension is a bad idea.

The reason I asked is because this person who has more experience in this area than myself (who I hope is not trying to feed me misinformation) told me that I don’t need to change any of my code around for doing gl_DrawID because it can be done “how I’m currently doing it” with OpenGL 4.5+ somehow and without NV_gpu_shader5. This person did not, however, explain how I was to get around the dynamically uniform limitation without the nvidia extension. Based on my research I thought it was not possible (in essence, the same conclusion as yours), but since I’m not as experienced I came here for the confirmation that you provided. The goal of my post is to find out whether I was told false information, so I can avoid spending more time digging around for something that does not exist.

What I’m trying to do:

In short, I have a single VBO with 1.2 million triangles and a bunch of texture indices. The data rarely changes, but sometimes a few textures may change over the life of the objects. I could probably even eat synchronization on updating the VBO (or SSBO) because maybe 10 of the triangles might change once every few minutes and I have 30ms to make the update (right now bindless texturing + 1.2 mil triangles gets me 1500+ FPS when indexing some SSBO for the handles).

Some side questions that arise:

1) I don’t know how costly it is to swap between two UBOs each frame (as in use UBO 0, then use UBO 1, then the draw call ends… never needing to use UBO 0 again), or whether it’s better to update some uniform value between draw calls and do an if/switch on it to branch onto a second UBO.
I was considering UBOs because my texture handles will likely all fit into one; at first I pessimistically assumed they wouldn’t, and so chose a single SSBO. I also discovered recently that I’d likely not need more than 64 KB (very likely never more than 100-110 KB), so the question now is whether switching to another UBO is more costly.

2) I was worried that possible incoherent access that may occur due to texture changing might be bad for UBOs, as I read they are best for sequential access. I can make the original batch of unchanging lines all sequential and index into some UBO with no trouble… but I don’t know how much performance that would get me. I’ve read that AMD cards like SSBOs whereas nvidia likes sequential UBO access.

3) I am not sure whether there’s any significant overhead from a glMultiDrawArraysIndirect() call instead of one single glDrawArrays() call, since I’d have to move to that to exploit gl_DrawID.

The answer to all of these is obviously “profile it”; however, it would be nice if someone could say, for example, “It’s very likely that doing option X will be ideal” so I have the highest chance of doing it the right way from the beginning.
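
On the sizing part of question 1, whether the handles fit in one UBO depends on the block size limit and on how the array is laid out. A back-of-the-envelope sketch, assuming each handle is stored as 8 bytes, that a std140 array of uvec2 pads every element to 16 bytes, and that packing two handles into each uvec4 avoids that padding. GL_MAX_UNIFORM_BLOCK_SIZE is only guaranteed to be at least 16384 bytes, so the actual limit should be queried at run time; the names below are hypothetical.

```cpp
#include <cstddef>

// Number of handles that fit in a uniform block of `blockBytes` when each
// array element occupies `strideBytes`.
constexpr std::size_t handlesThatFit(std::size_t blockBytes, std::size_t strideBytes) {
    return strideBytes == 0 ? 0 : blockBytes / strideBytes;
}

// Assumption: one handle per uvec2 element in a std140 array (16-byte stride).
constexpr std::size_t kStd140Uvec2Stride = 16;
// Assumption: two handles packed per uvec4 element (8 bytes per handle).
constexpr std::size_t kPackedHandleStride = 8;
// Minimum value every implementation must report for GL_MAX_UNIFORM_BLOCK_SIZE.
constexpr std::size_t kGuaranteedMinBlockSize = 16384;
```

Under these assumptions a 64 KB block holds 4096 padded or 8192 packed handles, while the guaranteed minimum block size holds 1024 or 2048.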

For what it’s worth, I have been debating whether I should just go to Vulkan at this point and spend a few weeks playing around and understanding the API, since it seems like the people I’m targeting with OpenGL 4.4+ and bindless textures and supporting extensions may almost all have Vulkan support (as my plan is to move over at some point in the future anyways).

Alfonse_Reinheart · May 17, 2018, 5:38pm

this person who has more experience in this area than myself (who I hope is not trying to feed me misinformation) told me that I don’t need to change any of my code around for doing gl_DrawID because it can be done “how I’m currently doing it” with OpenGL 4.5+ somehow and without NV_gpu_shader5.

Was this statement made based on evaluating your actual code? That is, did this person see “how I’m currently doing it”, or did you merely explain it to them?

Ultimately, it’s difficult to evaluate someone’s statement, when it’s based on a specific piece of code I don’t have.

In short, I have a single VBO with 1.2 million triangles and a bunch of texture indices. The data rarely changes, but sometimes a few textures may change over the life of the objects. I could probably even eat synchronization on updating the VBO (or SSBO) because maybe 10 of the triangles might change once every few minutes and I have 30ms to make the update (right now bindless texturing + 1.2 mil triangles gets me 1500+ FPS when indexing some SSBO for the handles).

This is not answering the important questions I have about how you’re rendering here. Are these “texture indices” part of your vertex data? Does each actual vertex have its own index, or is it some kind of instanced value? How do you provide these indices to the VS, and how are they transmitted to the FS?

Are you rendering multiple objects in a single draw command? Do vertices in a single draw command get different texture index values? If so, how does this get determined?

Provide details about how you’re doing this stuff. Show us your shaders, your VAO setup, some vertex data, your rendering commands, and so forth. Because I suspect that either 1) your indices really are dynamically uniform and you don’t realize it, or 2) your indices can be made dynamically uniform quite cheaply.
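
To illustrate option 2 under one assumed setup (a non-indexed triangle list in which each triangle carries a single texture index), consecutive triangles that share an index can be grouped into one draw command each, so the index becomes a per-draw value. This is only a sketch; `DrawPlan` and `splitByTexture` are hypothetical names, and the command struct follows the layout OpenGL defines for indirect array draws.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same field order OpenGL defines for glMultiDrawArraysIndirect commands.
struct DrawArraysIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t first;
    std::uint32_t baseInstance;
};

// Hypothetical result: one command per run, plus the texture index that a
// shader would look up by draw ID.
struct DrawPlan {
    std::vector<DrawArraysIndirectCommand> commands;
    std::vector<std::uint32_t> textureIndexPerDraw;
};

// Groups consecutive triangles that share a texture index into one draw each.
DrawPlan splitByTexture(const std::vector<std::uint32_t>& textureIndexPerTriangle) {
    DrawPlan plan;
    const std::size_t n = textureIndexPerTriangle.size();
    std::size_t runStart = 0;
    for (std::size_t i = 1; i <= n; ++i) {
        // Close the current run at the end of the data or when the index changes.
        if (i == n || textureIndexPerTriangle[i] != textureIndexPerTriangle[runStart]) {
            plan.commands.push_back({static_cast<std::uint32_t>((i - runStart) * 3), // vertices
                                     1u,
                                     static_cast<std::uint32_t>(runStart * 3),       // first vertex
                                     0u});
            plan.textureIndexPerDraw.push_back(textureIndexPerTriangle[runStart]);
            runStart = i;
        }
    }
    return plan;
}
```

Sorting the triangles by texture index before building the plan keeps the number of commands down to the number of distinct textures.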

For what it’s worth, I have been debating whether I should just go to Vulkan at this point and spend a few weeks playing around and understanding the API, since it seems like the people I’m targeting with OpenGL 4.4+ and bindless textures and supporting extensions may almost all have Vulkan support (as my plan is to move over at some point in the future anyways).

Do you think Vulkan doesn’t have this limitation too?

SPIR-V/Vulkan has the SampledImageArrayDynamicIndexing/shaderSampledImageArrayDynamicIndexing feature, which permits indexing an array of sampled images (AKA: textures) with dynamically uniform values. Without this feature, you can only access image arrays with constant indices, so dynamically uniform indexing is already an improvement.

There is no SPIR-V/Vulkan feature for using arbitrary indexes.

This limitation isn’t there because OpenGL is wrong. It’s there because that’s what (most of) the hardware requires.

I was worried that possible incoherent access that may occur due to texture changing might be bad for UBOs

Unless you’re doing image load/store/SSBO writing or misusing persistent/unsynchronized mapped buffers, there is no “incoherent access” of this nature.

I am not sure whether there’s any significant overhead from a glMultiDrawArraysIndirect() call instead of one single glDrawArrays() call

It’s one function call; its overhead will be negligible relative to your overall code. It’d be another matter if you were calling it once per object.

It is good to remember that OpenGL isn’t D3D. Draw calls are cheap in OpenGL; it’s state changes that are expensive. Two consecutive draw calls will be pretty much as fast as one combined draw call. Now, that doesn’t mean you should make a bunch of consecutive draw calls that you could have made all at once. But it does mean that if you have to break something up into multiple draws, and there are no state changes between them, you should not consider this a significant performance issue (in general).

Even with state changes, the cost depends on exactly what kind of state changes you’re doing. Texture changes are expensive, UBO changes slightly less so. VAO buffer bindings are fairly cheap, but VAO format changes are more expensive than textures on some hardware. FBO changes are the kiss-of-death for performance. And so forth.

Basically, if you’ve managed to reduce your entire scene down to a single glDrawArrays call, you’ve probably reduced it a bit too much.