Quality of Implementation: Debug shader variable locations

Now that OpenGL’s debug context concept is a bit better defined (thanks to language in KHR_debug and the like), let’s expand on it. NVIDIA’s drivers already spam stdout when you’re in a debug context, so let’s close off another source of errors:

Accidentally using the correct variable locations.

Unless you explicitly assign a variable’s location in OpenGL, you must query its location to know where it is. However, most implementations assign locations sequentially, starting from 0, which means code can be written that accidentally does the right thing.
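To illustrate the failure mode (the uniform name, function name, and loader are just examples, not anything from a real codebase):

```cpp
#include <glad/glad.h>   // any GL loader; a current context and a linked program are assumed

void setModelViewProj(GLuint program, const GLfloat* mvpMatrix)
{
    glUseProgram(program);

    // Correct: query the location the implementation actually assigned.
    GLint loc = glGetUniformLocation(program, "u_modelViewProj");
    glUniformMatrix4fv(loc, 1, GL_FALSE, mvpMatrix);

    // Accidentally correct: assumes the driver handed out location 0 first.
    // This "works" on implementations that assign locations sequentially
    // from 0, and silently does the wrong thing everywhere else.
    glUniformMatrix4fv(0, 1, GL_FALSE, mvpMatrix);
}
```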

I would suggest that, upon every shader compilation (in debug only), the implementation generate a random number and start the location assignments from there. This would help show where a program is accidentally forgetting an explicit assignment or a query.
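A rough sketch of what I’m asking the implementation to do; the types and the “linker pass” here are invented purely for illustration, not real driver code:

```cpp
#include <cstdlib>
#include <string>
#include <vector>

struct UniformInfo
{
    std::string name;
    int         location = -1;   // -1: no explicit layout(location = N)
};

// Debug-only assignment pass: start automatic locations at a random base
// instead of 0, so code that hard-codes 0, 1, 2, ... breaks immediately
// instead of accidentally working.
void assignDebugLocations(std::vector<UniformInfo>& uniforms)
{
    int next = std::rand() % 100000;          // random starting point
    for (UniformInfo& u : uniforms)
        if (u.location < 0)                   // keep explicit assignments as-is
            u.location = next++;
}
```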

A very good idea

Maybe the idea is good, but I cannot approve it. Uniform locations are “slots”, hence the numbering. What is the benefit of assigning a number such as 4387 instead of 0, just to trigger an error? Furthermore, if that number changes on every compilation, it wouldn’t help in catching errors, since the behavior would change from compilation to compilation. Plus, there would be an additional burden to “translate” the exposed numbers to real slots. Sorry, maybe I’ve overlooked something obvious, but I don’t understand this proposal. :frowning:

Uniform locations are “slots”, hence the numbering. […] Plus, there would be an additional burden to “translate” the exposed numbers to real slots.

I don’t know what you mean by “slot”. Uniform locations are arbitrary numbers. They have no meaning to the hardware. They’re not byte offsets into a buffer, or even 4D vector register numbers. If the latter were true, then a 4x4 matrix would have to take up 4 uniform locations, and it doesn’t.

That’s why uniform locations can be assigned arbitrarily by the user. If they were some kind of hardware byte offset into a buffer or something, the ARB wouldn’t give the user the ability to assign them arbitrary numbers.

So this “burden to ‘translate’ exposed numbers to real slots” already exists (if by “real slots” you mean “byte offsets”). I’m simply wanting the burden to be used for something useful.

Furthermore, if that number changes on every compilation, it wouldn’t help in catching errors, since the behavior would change from compilation to compilation.

Let’s say that I forget to get the location of a uniform. Or maybe I have some convention for my uniform locations thanks to ARB_explicit_uniform_location, but I screwed up and forgot to assign that location in one shader. Either way, in my C++ code, I have some code that assumes that location 0 represents some variable.
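To make that second case concrete, here is a hypothetical example (the shader code, names, and convention are invented for illustration):

```cpp
// Hypothetical project convention: the MVP matrix always lives at location 0.
const char* conventionalVertexShader = R"(
    #version 430
    layout(location = 0) uniform mat4 u_modelViewProj;
    in vec4 a_position;
    void main() { gl_Position = u_modelViewProj * a_position; }
)";

// The one shader where the qualifier was forgotten. The linker is free to
// put u_modelViewProj anywhere, yet on many drivers it still lands at 0,
// so hard-coded glUniformMatrix4fv(0, ...) calls keep appearing to work.
const char* forgetfulVertexShader = R"(
    #version 430
    uniform mat4 u_modelViewProj;            // no explicit location!
    in vec4 a_position;
    void main() { gl_Position = u_modelViewProj * a_position; }
)";
```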

Given an arbitrary assignment scheme for uniform locations (typically, based on the order defined in shaders), it is entirely possible that 0 may indeed represent that uniform. Thus, my code appears to work. And since the assignment scheme doesn’t “change from compilation to compilation”, I will never know about it.

At least, until I take my code to a different implementation that uses a different assignment scheme. Then it breaks. That may be weeks or months after I wrote the original code that’s now broken. Tracking down the variable I forgot to initialize or the shader I forgot to set up properly will be a lot harder.

If the implementation starts assigning uniform locations from a random number, it is highly unlikely that any particular compilation of a shader will just so happen to pick 0 as the start. Therefore, it is very likely that my code will break the first time I try to use it. And if it doesn’t, it will break the second time I try to use it.

Thus it catches errors sooner rather than later.

It’s a good idea, although I think it would be better if that behavior only occurred when a debug context is active, because I’m not sure about the possible outcomes of those changes, especially for already-existing applications. That way, if you are aware of it, you can use it, but developers who are unaware of it, and existing software, won’t be affected.

I think it’s an awesome idea, but instead of starting at a random location and going sequentially from there I’d suggest that each uniform location be completely random, if possible. And not just in debug contexts; make it standard behaviour in all contexts.

This is in the spirit of “fail early, and fail as noisily as possible” and it’s a great idea.

It’s a good idea, although I think it would be better if that behavior only occurred when a debug context is active.

cough : “in debug only”

I think it’s an awesome idea, but instead of starting at a random location and going sequentially from there I’d suggest that each uniform location be completely random, if possible.

Why? You’re not going to catch more errors that way; it just makes the coding of the feature more difficult. Especially since arrays (of basic types) have to be contiguous in their uniform locations.
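For example (shader declaration and names hypothetical), loading a whole array with a single call already relies on the elements occupying consecutive locations:

```cpp
#include <glad/glad.h>

// The hypothetical shader declares: uniform vec4 u_lights[4];
// Filling the whole array in one call depends on the elements sitting at
// consecutive locations starting at the location of u_lights[0].
void uploadLights(GLuint program, const GLfloat lightData[16])
{
    glUseProgram(program);
    GLint base = glGetUniformLocation(program, "u_lights[0]");
    glUniform4fv(base, 4, lightData);   // fills u_lights[0] .. u_lights[3]
}
```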

And not just in debug contexts

Um, why not? Whatever method of randomization is used will necessarily be slower than not using that method. The whole point of having debug vs. release contexts is for implementations to recognize that they can do things differently, for debugging’s sake, that they wouldn’t do in release builds.

Totally agree. This has bugged me for years.

Interestingly, I have never had such a problem. Maybe that’s because I’m using an object wrapper which fetches uniform locations in its initialization function, and all access is restricted to its interface functions.
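Something along these lines, as a minimal sketch with invented class and method names (a GL loader and a linked program object are assumed):

```cpp
#include <glad/glad.h>
#include <string>
#include <unordered_map>

// All uniform locations are queried once, up front, and every later access
// goes through the interface, so no location is ever hard-coded.
class GlProgram
{
public:
    explicit GlProgram(GLuint id) : m_id(id) {}

    void cacheUniform(const std::string& name)
    {
        m_locations[name] = glGetUniformLocation(m_id, name.c_str());
    }

    void setMat4(const std::string& name, const GLfloat* matrix) const
    {
        glUseProgram(m_id);
        glUniformMatrix4fv(m_locations.at(name), 1, GL_FALSE, matrix);
    }

private:
    GLuint m_id;
    std::unordered_map<std::string, GLint> m_locations;
};
```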

I ran into this numerous times, albeit during some rapid prototyping which resulted in code that should never make it anywhere else, and it worked only because I knew the particular implementation would handle it this way. It’s still wrong, though. We also had questions here asking whether uniform locations always start at 0, so other people actually wrote code and asked afterwards whether their code could possibly be incorrect, or at least unsafe.

This doesn’t make sense. Uniform locations are assigned at link time, so this is a link-time-only issue. There are no performance implications at run-time from it. The only reason I can see for not doing it in release contexts is to crutch up legacy programs that didn’t do the right thing, and that doesn’t seem much of a valid reason. Have you any stats for such programs? How many of them are actually out there in the real world?

This doesn’t make sense. Uniform locations are assigned at link time, so this is a link-time-only issue. There are no performance implications at run-time from it.

OK, for the moment, let’s ignore the possibility of explicit uniform locations and assume that the compiler has carte blanche to assign uniform locations as it sees fit.

The compiler has to transform a location from a number to a byte offset in order to actually do anything useful with it (remember: uniform indices are how you query information; uniform locations are only used to set the state). Therefore, the linked program must have a table that maps from location to a byte offset.

The fastest way to implement this table is with an array, where the locations are simply the indices into that array, and the values are the byte offsets. Now, it’s slightly more complex than that, since a location can represent multiple byte offsets (the same uniform in different shader stages), but that’s the general idea.

If you now randomly assign a location, such that the entire 31-bit space of positive numbers could be any active uniform location, you can’t use an array. You must now have a hash table. And while hash tables are quite fast, they’re not as fast as an actual array. At the very least, you must apply your hash function. Even if your “hash function” is nothing more than “subtract a global integer”, that will be slower than not subtracting a global integer.
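Roughly the contrast I mean, with purely illustrative data layouts rather than actual driver internals:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Sequential locations: a location is simply an index into a flat array.
struct SequentialLocationTable
{
    std::vector<std::uint32_t> byteOffsets;          // indexed directly by location
    std::uint32_t offsetOf(int location) const { return byteOffsets[location]; }
};

// Fully random locations: an arbitrary 31-bit number has to be mapped back
// to an offset, e.g. via a hash table, which costs at least a hash and a
// probe on every glUniform* call.
struct RandomLocationTable
{
    std::unordered_map<int, std::uint32_t> byteOffsets;
    std::uint32_t offsetOf(int location) const { return byteOffsets.at(location); }
};
```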

There is no way to implement fully random locations that is as fast as the fastest possible implementation of compiler-chosen, sequential locations.

Now, you could randomly select the location within the bounds of the array of possible locations. But that won’t break things properly. You don’t want the user to be setting the wrong uniform; you want them to get no uniform at all. Because that’s a proper OpenGL error which you will report and they will catch. Them merely setting the wrong uniform may cause weirdness, but it won’t be clear exactly where it’s coming from. With a hard GL Error, you can trace it to the exact line where the problem happened.
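This is what that hard error looks like from the application side (a sketch; it assumes a GL loader, a current context, and a program whose active uniforms are deliberately not at location 0):

```cpp
#include <glad/glad.h>
#include <cstdio>

// With locations pushed outside the program's actual range, a stale or
// hard-coded location no longer names an active uniform, so the driver
// raises GL_INVALID_OPERATION right at the offending call.
void demonstrateHardError(GLuint program, const GLfloat* mvp)
{
    glUseProgram(program);
    glUniformMatrix4fv(0, 1, GL_FALSE, mvp);   // 0 is (by design) not valid here
    if (glGetError() == GL_INVALID_OPERATION)
        std::fprintf(stderr, "caught the bad uniform location immediately\n");
}
```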

Will the performance difference be significant? Probably not. But I see no reason to take the chance.

I’m glad you are answering your own questions. :slight_smile:

When I said slots, I meant entries in an array (or table).
Remember, also, that a hash table has to be larger than the maximum number of entries that can be stored in it.
As we all know, GL_MAX_*_UNIFORM_COMPONENTS is pretty high. Even for modest GPUs, GL_MAX_VERTEX_UNIFORM_COMPONENTS is 4096. An open-addressing hash table requires at least 20% more space; hence, you need a 5K-entry hash table for the translation.
Oh, I’ve forgotten to ask: should each program have its own table?

So the implementation would be slower, more complicated, and have a greater memory footprint, and all that for what… Just to prevent someone from misusing the way uniform locations are automatically generated.

I still think this is not a valid proposal, but it’s only my humble opinion.

Just to prevent someone from misusing the way uniform locations are automatically generated.

Yes, that is what we’re talking about: using debug contexts to help find bugs in your code.

Welcome to the conversation; glad you could make it.

Do you think all that stdout spam that NVIDIA does in debug contexts is cheap? Do you think that ARB/KHR_debug is inexpensive to implement? There’s plenty of precedent for driver developers to do different, performance-hampering things in debug contexts that they wouldn’t do in non-debug ones.

The entire point of debug contexts is for the user to tell the implementation, “stop caring so much about performance and help me find bugs.” That’s why ARB_debug_output doesn’t exist without debug contexts. That’s why the spec allows KHR_debug to basically stop being useful in non-debug contexts. That’s why NVIDIA’s drivers stop spamming stdout with every message they generate for ARB/KHR_debug in non-debug contexts.

Debug contexts are for debugging. I kinda thought the name spoke for itself.
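For reference, this is the kind of debug-context machinery we’re talking about; a minimal KHR_debug / GL 4.3 callback setup (a GL loader is assumed, and it only does anything useful if a debug context was requested at context-creation time):

```cpp
#include <glad/glad.h>
#include <cstdio>

static void APIENTRY onGlDebugMessage(GLenum source, GLenum type, GLuint id,
                                      GLenum severity, GLsizei length,
                                      const GLchar* message, const void* userParam)
{
    std::fprintf(stderr, "GL debug: %s\n", message);
}

void enableDebugOutput()
{
    glEnable(GL_DEBUG_OUTPUT);
    glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS);   // report on the offending call
    glDebugMessageCallback(onGlDebugMessage, nullptr);
}
```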

Do you think that ARB/KHR_debug is inexpensive to implement? There’s plenty of precedent for driver developers to do different, performance-hampering things in debug contexts that they wouldn’t do in non-debug ones.

Your discussion led me to an interesting experiment. If you don’t mind, I’d like to hijack your thread for a short survey on debug-context performance.
I have tried several (versions of) drivers, and I couldn’t spot any difference in performance between debug and non-debug contexts.

I would ask other developers to share their results in testing debug-context performance. Thanks!
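In case it helps anyone repeat the experiment, here is a sketch of one possible setup using GLFW (not necessarily what I used; window and render-loop details are elided, and glfwInit() is assumed to have succeeded). The only difference between the two timed runs is the debug-context hint:

```cpp
#include <GLFW/glfw3.h>

GLFWwindow* createTestWindow(bool debugContext)
{
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    // The only variable under test:
    glfwWindowHint(GLFW_OPENGL_DEBUG_CONTEXT, debugContext ? GLFW_TRUE : GLFW_FALSE);
    return glfwCreateWindow(1280, 720, "debug-context timing", nullptr, nullptr);
}
```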

testing debug-context performance

Who cares? I mean, it’s nice to know but, like Alfonse said, debug-contexts are for finding bugs and are not intended for production code.

I do. In fact, I guess most of the additional debugging code is active in both debug and non-debug contexts, and it does not affect performance (at least not in a noticeable fashion). Driver instrumentation is enabled in all Vista/Win7 drivers. We will never know exactly what is going on under the hood, but little experiments like this could shed some light on it.

I didn’t want to offend, of course. Surely it’s worth knowing what you’re dealing with. Since enabling debugging capabilities has to be a run-time decision, I assume the cost boils down to conditionals whose overhead can largely be hidden by branch prediction.

I’d rather get some details on how drivers implement checks instead of profiling a whole series of drivers from different vendors.

In fact, I guess most of the additional debugging code is active in both debug and non-debug contexts, and it does not affect performance (at least not in a noticeable fashion).

So what you’re saying is that the ARB is stupid. All OpenGL implementations that don’t expose ARB_debug_output in non-debug contexts are just being annoying for no adequately explained reason. The effort the ARB went through to allow KHR_debug to be effectively shut off in non-debug contexts was a waste of time, and all OpenGL implementations that take advantage of this are just doing pointless things. Indeed, the very idea of having a separation between debug and non-debug contexts was stupid to begin with.

Because that’s what you’re saying. You’re saying that all of the effort spent, both in the spec and by implementations, to separate debug from non-debug leads to no actual gain. And therefore there is no point in spending all of that effort to begin with.

I have pretty much zero faith in the ARB to make good decisions, and even I don’t think that poorly of them.

BTW, when you did these tests, did you by chance actually create any errors? Would your code have had any debug output messages in a debug run? Did you emit errors several times in the rendering loop?

Because otherwise, you’re just measuring how quickly the implementation doesn’t call the expensive error reporting code.

Nope! I just said that blocking message output does not necessarily mean the information isn’t being collected. Of course, preventing message spamming is very welcome. So please don’t try to misinterpret what I said.

No!

No!

No, again!

The purpose of the test was to check whether a debug context is slower than a non-debug one, not how fast it outputs messages. If the debug context itself added extra work, it would slow execution down in any case.