Not how fast it outputs messages. If something is actually going wrong during execution, it's going to be slow in any case.
The thing is, outputting messages is costly as hell. Being able to have the GL report stuff on its own, without querying glGetError(), is the crux and the real benefit of ARB/KHR debug. Error checks take place anyway because they have to be done, all the time, so it's no wonder that debug output doesn't seem to impact performance if everything is fine and nothing is actually reported. But what about undefined behavior? Performance hints? Those aren't pieces of information an implementation would bother to gather and report if it disables ARB/KHR debug output in non-debug contexts - at least that's not how I would implement it. And this is exactly where the lack of knowledge about which checks and hints a specific GL implementation actually provides comes into play - and I'm pretty sure AMD, Intel and NVIDIA differ substantially in this department. If the driver does only error reporting and no errors are currently present, why should it perform noticeably worse?
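Since nobody documents which message types a given driver emits, one way to find out is to count them per type in the debug callback. This is only a rough sketch in C: the typedefs and token defines are stand-ins so it compiles without GL headers (the token values are the ones from the KHR_debug spec), and DebugTally and tally_debug_message are names I made up for illustration.

```c
#include <stdio.h>

/* Stand-in typedefs (assumption) so the sketch builds without GL headers.
   Drop these and the defines below when you include your real GL loader. */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int GLsizei;
typedef char GLchar;

/* Token values as listed in the KHR_debug specification. */
#define GL_DEBUG_TYPE_ERROR               0x824C
#define GL_DEBUG_TYPE_DEPRECATED_BEHAVIOR 0x824D
#define GL_DEBUG_TYPE_UNDEFINED_BEHAVIOR  0x824E
#define GL_DEBUG_TYPE_PORTABILITY         0x824F
#define GL_DEBUG_TYPE_PERFORMANCE         0x8250

/* Hypothetical per-type counters, passed to the callback as user data. */
typedef struct {
    unsigned errors;
    unsigned deprecated;
    unsigned undefined_behavior;
    unsigned portability;
    unsigned performance;
    unsigned other; /* OTHER, markers, push/pop group notifications */
} DebugTally;

/* Same parameter list as a GL debug callback. A real one also needs the
   platform calling convention (APIENTRY) in its declaration. */
static void tally_debug_message(GLenum source, GLenum type, GLuint id,
                                GLenum severity, GLsizei length,
                                const GLchar *message, const void *user_param)
{
    DebugTally *tally = (DebugTally *)user_param;
    (void)source; (void)id; (void)severity; (void)length; (void)message;

    switch (type) {
    case GL_DEBUG_TYPE_ERROR:               tally->errors++;             break;
    case GL_DEBUG_TYPE_DEPRECATED_BEHAVIOR: tally->deprecated++;         break;
    case GL_DEBUG_TYPE_UNDEFINED_BEHAVIOR:  tally->undefined_behavior++; break;
    case GL_DEBUG_TYPE_PORTABILITY:         tally->portability++;        break;
    case GL_DEBUG_TYPE_PERFORMANCE:         tally->performance++;        break;
    default:                                tally->other++;              break;
    }
}

/* Print the counters, e.g. once at shutdown rather than per message. */
static void print_tally(const DebugTally *tally)
{
    printf("errors=%u deprecated=%u undefined=%u portability=%u "
           "performance=%u other=%u\n",
           tally->errors, tally->deprecated, tally->undefined_behavior,
           tally->portability, tally->performance, tally->other);
}

/* Registration in a real program, with a current debug context:
       static DebugTally tally;
       glEnable(GL_DEBUG_OUTPUT);
       glDebugMessageCallback(tally_debug_message, &tally);
*/
```

Counting instead of printing every message keeps the measurement itself cheap. If a driver never bumps anything but the error counter across a whole run, that at least hints at how little beyond plain error reporting it does.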

Also, you probably neglect the overhead introduced by debug groups, labels and inserted messages, which produce output and thus reduce performance - all stuff you might not see in your specific test case and which would probably have no bearing in a non-debug context.

What I'd like to see is for vendors to actually tell us what we can expect from their implementation when using ARB/KHR debug. But I guess we'll be on our own there ...