
View Full Version : Quality of Implementation: Debug shader variable locations



Alfonse Reinheart
05-16-2013, 08:34 PM
Now that OpenGL's debug context concept is a bit better defined (thanks to language in KHR_debug and the like), let's expand on this. NVIDIA already starts spamming stdout when you're in debug, so let's close another source of errors:

Accidentally using the correct variable locations.

Unless you explicitly assign a variable location in OpenGL, you must query its location to know where it is. However, most implementations assign locations sequentially, starting from 0. Which means that code can be written that accidentally does the right thing.

I would suggest that, upon every shader compilation (in debug only), the implementation generate a random number and start the location assignments from there. This will help show where a program is accidentally forgetting an explicit assignment or a query.

tonyo_au
05-17-2013, 06:58 AM
A very good idea

Aleksandar
05-17-2013, 08:35 AM
Maybe the idea is good, but I cannot approve it. Uniform locations are "slots", hence the numeration. What is the benefit of assigning a number like 4387 instead of 0, just to trigger an error? Furthermore, if that number changes on every compilation, it wouldn't help in catching errors, since the behavior will change from compilation to compilation. Plus, there would be an additional burden to "translate" the exposed numbers into real slots. Sorry, maybe I've overlooked something obvious, but I don't understand this proposal. :(

Alfonse Reinheart
05-17-2013, 09:17 AM
Uniform locations are "slots", hence the numeration. [...] Plus, there would be an additional burden to "translate" the exposed numbers into real slots.

I don't know what you mean by "slot". Uniform locations are arbitrary numbers. They have no meaning to the hardware. They're not byte offsets into a buffer, or even 4D vector register numbers. If the latter were true, then a 4x4 matrix would have to take up 4 uniform locations, and they don't.

That's why uniform locations can be assigned arbitrarily by the user. If they were some kind of hardware byte offset into a buffer or something, the ARB wouldn't give the user the ability to assign them arbitrary numbers.

So this "burden to 'translate' exposed numbers to real slots" already exists (if by "real slots" you mean "byte offsets"). I simply want that burden to be put to some use.


Furthermore, if that number changes on every compilation, it wouldn't help in catching errors, since the behavior will change from compilation to compilation.

Let's say that I forget to get the location of a uniform. Or maybe I have some convention for my uniform locations thanks to ARB_explicit_uniform_location, but I screwed up and forgot to assign that location in one shader. Either way, in my C++ code, I have some code that assumes that location 0 represents some variable.

Given an arbitrary assignment scheme for uniform locations (typically, based on the order defined in shaders), it is entirely possible that 0 may indeed represent that uniform. Thus, my code appears to work. And since the assignment scheme doesn't "change from compilation to compilation", I will never know about it.

At least, until I take my code to a different implementation that uses a different assignment scheme. Then it breaks. That may be weeks or months after I wrote the original code that's now broken. Tracking down the variable I forgot to initialize, or the shader I forgot to set up properly, will be a lot harder.

If the implementation starts assigning uniform locations from a random number, it is highly unlikely that any particular compilation of a shader will just so happen to pick 0 as the start. Therefore, it is very likely that my code will break the first time I try to use it. And if it doesn't, it will break the second time I try to use it.

Thus it catches errors sooner rather than later.

Nowhere-01
05-17-2013, 10:02 AM
It's a good idea, although I think it would be better if that behavior occurred only when a debug context is active, because I'm not sure about the possible outcomes of those changes, especially for already existing applications. So if you are aware of it, you can use it, but developers who are unaware of it, and existing software, won't be affected.

mhagain
05-17-2013, 05:39 PM
I think it's an awesome idea, but instead of starting at a random location and going sequentially from there I'd suggest that each uniform location be completely random, if possible. And not just in debug contexts; make it standard behaviour in all contexts.

This is in the spirit of "fail early, and fail as noisily as possible" and it's a great idea.

Alfonse Reinheart
05-17-2013, 09:27 PM
It's a good idea, although I think it would be better if that behavior occurred only when a debug context is active.

*cough* : "in debug only"


I think it's an awesome idea, but instead of starting at a random location and going sequentially from there I'd suggest that each uniform location be completely random, if possible.

Why? You're not going to catch more errors that way; it just makes the coding of the feature more difficult. Especially since arrays (of basic types) have to be contiguous in their uniform locations.


And not just in debug contexts

Um, why not? Whatever method of randomization is used will necessarily be slower than not using that method. The whole point of having debug vs. release contexts is for implementations to recognize that they can do things for debugging's sake that they wouldn't do in release builds.

thokra
05-21-2013, 03:55 AM
Totally agree. This has bugged me for years.

Aleksandar
05-21-2013, 07:19 AM
Interestingly, I have never had such a problem. Maybe that's because I'm using an object wrapper which fetches uniform locations in the initialization function, and all access is restricted to interface functions.

thokra
05-21-2013, 07:36 AM
I ran into this numerous times, albeit during some rapid prototyping which resulted in code that should never have made it anywhere else - and it worked because I knew the particular implementation would handle it this way. It's still wrong, though. We have also had questions here asking whether uniform locations always start at 0 - so other people actually wrote such code and asked afterwards whether it could possibly be incorrect, or at least unsafe.

mhagain
05-21-2013, 10:32 AM
Whatever method of randomization is used will necessarily be slower than not using that method.

This doesn't make sense. Uniform locations are assigned at link time, so this is a link-time-only issue. There are no performance implications at run-time from it. The only reason I can see for not doing it in release contexts is to crutch up legacy programs that didn't do the right thing, and that doesn't seem much of a valid reason. Have you any stats for such programs? How many of them are actually out there in the real world?

Alfonse Reinheart
05-21-2013, 11:54 AM
This doesn't make sense. Uniform locations are assigned at link time, so this is a link-time-only issue. There are no performance implications at run-time from it.

OK, for the moment, let's ignore the possibility of explicit uniform locations and assume that the compiler has carte blanche to assign uniform locations as it sees fit.

The compiler has to transform a location from a number to a byte offset in order to actually do anything useful with it (remember: uniform indices are how you query information; uniform locations are only used to set the state). Therefore, the linked program must have a table that maps from location to a byte offset.

The fastest way to implement this table is with an array, where the locations are simply the indices into that array, and the values are the byte offsets. Now, it's slightly more complex than that, since a location can represent multiple byte offsets (the same uniform in different shader stages), but that's the general idea.

If you now randomly assign a location, such that the entire 31-bit space of positive numbers could be any active uniform location, you can't use an array. You must now have a hash table. And while hash tables are quite fast, they're not as fast as an actual array. At the very least, you must apply your hash function. Even if your "hash function" is nothing more than "subtract by a global integer", that will be slower than not subtracting by a global integer.

There is no way to implement random locations that isn't slower than the fastest possible implementation of purely arbitrary locations.

Now, you could randomly select the location within the bounds of the array of possible locations. But that won't break things properly. You don't want the user to be setting the wrong uniform; you want them to get no uniform at all. Because that's a proper OpenGL error which you will report and they will catch. Them merely setting the wrong uniform may cause weirdness, but it won't be clear exactly where it's coming from. With a hard GL Error, you can trace it to the exact line where the problem happened.

Will the performance difference be significant? Probably not. But I see no reason to take the chance.

Aleksandar
05-21-2013, 02:13 PM
I'm glad you are answering your own questions. :)

When I said slots, I meant entries in an array (or table).
Remember, also, that hash tables have to be larger than the maximum number of entries they can store.
As we all know, GL_MAX_*_UNIFORM_COMPONENTS is pretty high. Even for modest GPUs, GL_MAX_VERTEX_UNIFORM_COMPONENTS is 4096. An open-addressing hash table requires at least 20% more space; hence, you need a 5K-entry hash table for the translation.
Oh, I've forgotten to ask: should each program have its own table?

So, the implementation would be slower, more complicated, and have a greater memory footprint, and all that for what... Just to prevent someone from misusing the way uniform locations are automatically generated.

I still think this is not a valid proposal, but it's only my humble opinion.

Alfonse Reinheart
05-21-2013, 04:17 PM
Just to prevent someone from misusing the way uniform locations are automatically generated.

Yes, that is what we're talking about: using debug contexts to help find bugs in your code.

Welcome to the conversation; glad you could make it.

Do you think all that stdout spam that NVIDIA does in debug contexts is cheap? Do you think that ARB/KHR_debug is inexpensive to implement? So there's plenty of precedent for driver developers to implement different, performance-hampering stuff in debug contexts than in non-debug.

The entire point of debug contexts is for the user to tell the implementation, "stop caring so much about performance and help me find bugs." That's why ARB_debug_output doesn't exist outside of debug contexts. That's why the spec allows KHR_debug to basically stop being useful in non-debug contexts. That's why NVIDIA's drivers stop spamming stdout with every message they generate for ARB/KHR_debug in non-debug contexts.

Debug contexts are for debugging. I kinda thought the name spoke for itself.

Aleksandar
05-22-2013, 05:50 AM
Do you think that ARB/KHR_debug is inexpensive to implement? So there's plenty of precedent for driver developers to implement different, performance-hampering stuff in debug contexts than in non-debug.


Your discussion led me to an interesting experiment. If you don't mind, I'll hijack your thread for a short survey on debug-context performance.
I have tried several (versions of) drivers, and I couldn't spot any performance difference between debug and non-debug contexts.

I would ask other developers to share their results in testing debug-context performance. Thanks!

thokra
05-22-2013, 06:09 AM
testing debug-context performance

Who cares? I mean, it's nice to know but, like Alfonse said, debug-contexts are for finding bugs and are not intended for production code.

Aleksandar
05-22-2013, 12:57 PM
Who cares?
I do. In fact, I guess most of the additional debugging code is active in both debug and non-debug profiles, and it does not affect performance (at least in a noticeable fashion). Driver instrumentation is enabled in all Vista/Win7 drivers. We will never know exactly what is going on under the hood, but such little experiments could shed some light on it.

thokra
05-22-2013, 01:09 PM
I do.

I didn't want to offend, of course. Surely it's worth knowing what you're dealing with. Since enabling debugging capabilities has to be a run-time decision, I assume the cost boils down to conditionals which can be efficiently avoided by branch prediction.

I'd rather get some details on how drivers implement checks instead of profiling a whole series of drivers from different vendors.

Alfonse Reinheart
05-22-2013, 02:05 PM
In fact, I guess most of the additional debugging code is active in both debug and non-debug profiles, and it does not affect performance (at least in a noticeable fashion).

So what you're saying is that the ARB is stupid. All OpenGL implementations that don't expose ARB_debug_output in non-debug builds are just being annoying for no adequately explained reason. The effort the ARB went through to allow KHR_debug to be effectively shut off in non-debug builds was a waste of time, and all OpenGL implementations that take advantage of this are just doing pointless things. Indeed, the very idea to have a separation between debug and non-debug builds was stupid to begin with.

Because that's what you're saying. You're saying that all of the effort spent, both in the spec and by implementations, to separate debug from non-debug leads to no actual gain. And therefore there is no point in spending all of that effort to begin with.

I have pretty much zero faith in the ARB to make good decisions, and even I don't think that poorly of them.

BTW, when you did these tests, did you by chance actually create any errors? Would your code have had any debug output messages in a debug run? Did you emit errors several times in the rendering loop?

Because otherwise, you're just measuring how quickly the implementation doesn't call the expensive error reporting code.

Aleksandar
05-23-2013, 04:16 AM
So what you're saying is that the ARB is stupid...
Nope! I just said that blocking message output does not necessarily mean not collecting the information. Of course, preventing message spam is highly appreciated. So, please, don't try to misinterpret what I said.


BTW, when you did these tests, did you by chance actually create any errors?
No!


Would your code have had any debug output messages in a debug run?
No!


Did you emit errors several times in the rendering loop?
No, again!


Because otherwise, you're just measuring how quickly the implementation doesn't call the expensive error reporting code.
The purpose of the test was to check whether a debug context is slower than a non-debug one. Not how fast it outputs messages. If there is a problem in the execution, it would cause slow execution in any case.

thokra
05-23-2013, 05:37 AM
Not how fast it outputs messages. If there is a problem in the execution, it would cause slow execution in any case.

The thing is, outputting messages is costly as hell. :) And being able to have the GL report stuff on its own, without querying with glGetError(), is the crux and the real benefit of ARB/KHR debug. Error checks take place anyway because they have to be done - all the time - so it's no wonder that stuff doesn't seem to impact performance if everything is fine and nothing is actually reported. But what about undefined behavior? Performance hints? Those aren't pieces of information an implementation would gather and report if it disabled ARB/KHR debug in non-debug contexts - at least I wouldn't implement it this way. And this is exactly where the lack of knowledge about the degree of actually implemented checks and hints in a specific GL implementation comes into play - and I'm pretty sure AMD, Intel and NVIDIA differ substantially in this department. If the driver does only error reporting and no errors are currently present, why should it perform noticeably worse?

Also, you probably neglect the overhead introduced by using debug groups, labels and inserted messages, which will produce output and thus reduce performance - all of the stuff you might not see in your specific test case and which would probably have no bearing in a non-debug context.

What I'd like to see is for vendors to actually tell us what we can expect from their implementation when using ARB/KHR debug. But I guess we'll be on our own there ...

kRogue
05-24-2013, 11:10 AM
I like Alfonse's suggestion; the idea that uniform locations that are not explicitly assigned are slightly randomized is a very, very good idea. Also, along the same line, having vertex attributes that are not explicitly bound get randomized indices would be nice too.

I have not been bitten by the uniform case, but by the vertex attribute case I have... mostly because the vast majority of implementations assign the indices in declaration order, so the bug can stay hidden for a long, long time.

I suggest we move this topic, or repost it to suggestions for next version of GL.

Aleksandar
05-26-2013, 05:47 AM
I like Alfonse's suggestion; the idea that uniform locations that are not explicitly assigned are slightly randomized is a very, very good idea. Also, along the same line, having vertex attributes that are not explicitly bound get randomized indices would be nice too.
Well, explicit attribute/uniform locations are one more vote against the proposal. :)

If an application uses explicit locations, then no randomization is possible. Second, explicit locations were introduced exactly to help applications assume the location values in their shaders (exactly what you want to prevent).


I have not been bitten by the uniform case, but by the vertex attribute case I have... mostly because the vast majority of implementations assign the indices in declaration order, so the bug can stay hidden for a long, long time.
It depends. I remember (since I'm not currently using them) that NV assigned subroutine IDs in the opposite order of their declaration (probably because they put them on a stack). So: you know (from experience, not from the specification) how some implementation works, you (mis)use that, and then you complain that it creates hidden errors.


I suggest we move this topic, or repost it to suggestions for next version of GL.
I disagree. Not because this is worthless, but because it is superfluous and there are lots of more important aspects of the implementation that have to be fixed.

Alfonse Reinheart
05-26-2013, 10:44 AM
I suggest we move this topic, or repost it to suggestions for next version of GL.

No, we shouldn't. The reason I titled this post "Quality of Implementation" is that this is exactly what this suggestion is: a matter of the quality of an implementation. Good implementations would implement this; less good implementations would not.

How exactly would you specify this behavior anyway? The spec already says that the values you get for non-explicit uniform locations are implementation-defined. How do you decide that it's "random"? Can it be "random, but only if you create a new context", such that the implementation can pick a random number at the start and bias all uniform locations by that number? Does it have to be random for every glLinkProgram? How much "randomness" does it have to have?

And most important of all... how would you test randomness?

This isn't something that can be standardized. It's a quality of implementation issue; a good debug context will help catch errors like this.


If implementation uses explicit locations, then no randomization is possible.

Nonsense. Simply allocate the first 1024 locations for explicit locations, and the rest are randomized for non-explicit.


Second, explicit locations are introduced exactly to help applications assume the location values in the shaders (exactly what you want to prevent).

Sure, and when you go back in time, force explicit uniform locations into the standard as of ARB_shader_objects, and remove non-explicit locations, then it won't be necessary. But so long as code exists where explicit uniform locations are not used (and this can happen by accident, since it's not a compiler error to forget the `layout(location)` qualifier), this will be useful.


So: you know (from experience, not from the specification) how some implementation works, you (mis)use that, and then you complain that it creates hidden errors.

Yes, and since this was completely accidental on his part (due to forgetting the query), he was unable to find the bug until much later. Welcome to the point of the thread; glad you could make it.

We're not talking about someone who was deliberately using it, but who was accidentally using it.

kRogue
06-11-2013, 11:26 AM
How exactly would you specify this behavior anyway? The spec already says that the values you get for non-explicit uniform locations are implementation-defined. How do you decide that it's "random"? Can it be "random, but only if you create a new context", such that the implementation can pick a random number at the start and bias all uniform locations by that number? Does it have to be random for every glLinkProgram? How much "randomness" does it have to have?

And most important of all... how would you test randomness?


I'd make it a hint, so nothing actually really testable. :P Though suggesting that GL implementors add this to their debug contexts is fine too; I just think the idea will get lost here.

As for my situation with the bug: I had the GLSL program working on multiple different hardware platforms, including embedded, before the bug manifested that I had forgotten to bind the attribute location. Now I am paranoid: whenever I build a GLSL program, I emit a warning for any attribute that appears in the active attribute list without an explicitly bound location.