Are shader variables only per-vertex and uniform?

It would be nice to define a variable for more primitives, like a color for an entire object. The way I see it, the only options are to define the variable per vertex (a waste of space and time) or as a uniform, drawing different buffers separately (a waste of time).
Also, assuming there is no other way, which of the two is better? I would go for a variable per vertex, as building different buffers and drawing them separately seems like too much work.

It would be nice to define a variable for more primitives, like a color for an entire object.

That’s what a uniform is. An “object” being defined as “an invocation of a glDraw* command.” Uniforms retain the same value throughout a draw command.
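For example, a per-object color is just a uniform declared in the shader and set once before the draw call. A rough sketch (prog, objectColor and indexCount are placeholder names, nothing standard):

    /* GLSL fragment shader (excerpt):
         uniform vec3 objectColor;
         out vec4 fragColor;
         void main() { fragColor = vec4(objectColor, 1.0); }
    */
    glUseProgram(prog);
    GLint loc = glGetUniformLocation(prog, "objectColor");
    glUniform3f(loc, 1.0f, 0.0f, 0.0f);   /* this object is solid red */
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);
    /* every fragment produced by this draw call sees the same objectColor value */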

Hi Alfonse,

your answer is interesting. I was thinking that I should minimize the number of draw calls, and so put more objects together. Are you saying that’s the wrong approach? You seem to imply that calling glDraw functions more times, maybe many times, is common practice and possibly even recommended.

Thanks

The correct answer is to implement whatever the simplest solution for your specific needs is. What your specific needs are is up to you.

You have some object that, for whatever reason, has a solid color. The simplest way to render this is with a uniform providing the color.

The only reason not to do it this way is if you have some need that makes this unworkable. Generally, this would be some question of performance.

At which point, the first question that should be asked is this: does it actually matter? Do you have profiling data in hand that shows that the simple implementation is too slow for your application?

That last part is important. You shouldn’t just write a standalone profiling app to test whether one is faster than the other, because that may not be where the bottleneck is in your actual application.

In short: you should only bend over backwards for batching if you know it’s going to help. Otherwise, you’re just making a lot of work for yourself.

Well, I’m trying to write some generic code, as well as to learn some best practices. I would also say that it’s better to get it right the first time, without the need to profile or change the code.
Unfortunately I often see an answer like “try it and test it”, and that is not what I was expecting. I would have thought there were at least some general guidelines, but that is turning out to be the exception rather than the norm.

Oh well, thanks for the answer as usual

The simple approach is: For each object, set uniform, invoke glDraw command. This is guaranteed to work, and is basically how you render with OpenGL 1.
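In code, that loop might look something like this (just a sketch; the objects array and its fields are hypothetical):

    glUseProgram(prog);
    GLint colorLoc = glGetUniformLocation(prog, "objectColor");
    for (int i = 0; i < objectCount; ++i) {
        glBindVertexArray(objects[i].vao);            /* this object's geometry  */
        glUniform3fv(colorLoc, 1, objects[i].color);  /* per-object uniform      */
        glDrawElements(GL_TRIANGLES, objects[i].indexCount,
                       GL_UNSIGNED_INT, 0);           /* one draw call per object */
    }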

If your target OpenGL version supports instancing, then it is certainly worth learning how to do instanced rendering. Basically you store the per-instance transformations and parameters in either a uniform buffer object or a texture buffer object and invoke an instanced draw command. This way you can draw quite a large number of objects with a single call. In your shaders you can use gl_InstanceID to index into the arrays, and to render you would use the glDraw*Instanced() functions.
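A rough sketch of the uniform buffer variant (InstanceData, the array size of 128, binding point 0 and the instanceData variable are all assumptions; the exact layout is up to you, just keep the block under GL_MAX_UNIFORM_BLOCK_SIZE):

    /* GLSL vertex shader (excerpt):
         layout(std140) uniform InstanceData {
             mat4 model[128];
             vec4 color[128];
         };
         // ... use model[gl_InstanceID] and color[gl_InstanceID]
    */
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, sizeof(instanceData), instanceData, GL_DYNAMIC_DRAW);

    GLuint blockIndex = glGetUniformBlockIndex(prog, "InstanceData");
    glUniformBlockBinding(prog, blockIndex, 0);   /* bind the block to binding point 0 */
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);  /* attach the buffer to that point   */

    /* one call draws every instance; the shader indexes with gl_InstanceID */
    glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0, instanceCount);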

I use a uniform buffer abstraction which keeps a copy of the data in CPU memory. This is useful for a fallback code path (for GL versions without support for instancing): it can use the same input (with the instance data in arrays) and render it with a for loop, doing the transformation setup and a draw call per instance.
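Roughly, the fallback path walks the same per-instance arrays on the CPU (the field and variable names here are hypothetical):

    /* no instancing available: one uniform update + one draw call per instance */
    for (int i = 0; i < instanceCount; ++i) {
        glUniformMatrix4fv(modelLoc, 1, GL_FALSE, instanceData[i].model);
        glUniform4fv(colorLoc, 1, instanceData[i].color);
        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);
    }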

Uniform buffers can be interesting. I think I’ll look at them. Things like this remind me of why I post in the beginners forum :slight_smile:

In the meantime I found Nvidia’s GPU Programming Guide. One of the first tips is “use fewer batches”. That doesn’t really surprise me. But maybe with the instanced methods I can expect only a few draw calls.
It starts to make a bit of sense. Just a bit.

A word of caution. Instancing is not some magical coding trick to speed up any and all programs. All instancing does is save a few cycles setting up vertex transforms, so it’s only going to help you where you know for an absolute fact that you are vertex throughput limited.
I have spent many an hour coding up alternate rendering paths supporting Uniform Buffers, Array Instancing, Texture Buffers, etc., only to find that none really helped - not in all cases.
Generally speaking, if the implementation of one of these techniques is a burden, the payback is probably not worth it - unless you are certain the bottleneck is geometry throughput.

I must say I somewhat share the confusion about the need to batch geometry to such an extreme.

Now, mind you, I’ve been mostly doing demos oriented toward learning both the API and current 3D techniques (since I stopped touching 3D back around OpenGL 1.5 and only recently came back to it), and in all my (naive) profiling the bottleneck is always found at the fragment processor. At least according to GPUPerfStudio.

I do not see how fewer calls to the Draw functions can do anything for me here, when a simple plane composed of 400 triangles covering half the screen takes 10 times longer to render than a model with 70,000 triangles, simply because the latter covers less screen space (and thus translates to fewer fragments).

I do reckon that some extreme cases where one would render many batches of very few triangles would need further work, but I have trouble seeing how a “normal case” scene, even with millions of triangles, is going to be bound by vertex processing.

Perhaps it’s just because my latest topics have been bump mapping, shadow mapping, deferred rendering, ambient occlusion and other screen-space effects that my view is skewed.

I would also say that it’s better to get it right the first time, without the need to profile or change the code.

My point is that there is no “right”. There is no “correct”. There are no “best practices”. There is only “what your particular rendering system needs for performance.”

Premature optimizations are a waste of time. Unless you are experienced in your problem domain (in which case, you wouldn’t be asking us about it), you do not know enough to know where the bottlenecks will be. It is very easy for you to spend a lot of time on batching and so forth, only for it to be completely irrelevant to the performance of your program.

Rewriting is a fact of life. You write some code, learn how to do it better, then write it again. It happens, and it is a necessary part of being a programmer. The sooner you accept that, the better.

And the best part: you may not have to rewrite it at all.

Now, mind you, I’ve been mostly doing demos oriented toward learning both the API and current 3D techniques (since I stopped touching 3D back around OpenGL 1.5 and only recently came back to it), and in all my (naive) profiling the bottleneck is always found at the fragment processor. At least according to GPUPerfStudio.

Which is why people who aren’t experienced should profile before taking steps to do batching and so forth. Because the cases where batching is vital to performance are not necessarily cases that your application will encounter.

Okay, two questions:

  1. What do you mean by “vertex throughput limited”?
  2. Why would the implementation be a burden? You build a buffer and you call a method.

Okay, fair enough. Yet…

…here is what I mean: learn how to do it better.
If you say there is no way to learn how to do it better without testing (maybe because it is always so heavily domain-specific?), okay, I’ll take it. But for someone like me, without deep knowledge of OpenGL, I’m pretty sure there are more general things to learn from people like you. In fact, I just did: consider drawing different objects separately if they need different uniforms. And I should also look at uniform buffers.
I am an experienced developer, but for me rewriting the code happens mostly when there is a change in design. If I have to learn how a technology works I usually read the documentation, or maybe I get something from the internet. It seems to work a bit differently here. But I am going way off topic.

  1. What do you mean by “vertex throughput limited”?
  2. Why would the implementation be a burden? You build a buffer and you call a method.

If an app is vertex throughput limited, then the vertex shader is the bottleneck.
A typical task of the vertex shader is to transform the per-vertex attributes by the modelview matrix.
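For reference, this is the sort of per-vertex work being talked about (a minimal shader doing nothing more than the transform):

    const char *vsSource =
        "#version 330 core\n"
        "uniform mat4 modelViewProjection;\n"
        "layout(location = 0) in vec3 position;\n"
        "void main() { gl_Position = modelViewProjection * vec4(position, 1.0); }\n";

If work like this, multiplied by your vertex count, is what limits the frame time, the app is vertex throughput limited; if the time goes to shading fragments instead, batching and instancing won’t change much.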

Implementing instancing is not quite as simple as you may think. Not all hardware supports all instancing techniques; e.g. uniform buffers may not be implemented properly (driver bugs), or instance arrays may be missing.
I found that adding UBO support to my shader framework took far longer than expected because it required a redesign of various aspects. Sure, adding support to a simple standalone program is not so bad, but when you have a game engine …

If I have to learn how a technology works I usually read the documentation, or maybe I get something from the internet.

There’s a difference between “how it works” and “how to make it fast.” How it works is simple and semi-well-documented. How to make it fast is a domain-specific question that cannot be answered in the general case.

Once you start deciding that performance matters, and purely algorithmic optimizations aren’t enough, then you’re going to have to be flexible about how things get implemented.

It also makes it simpler to build larger batches, which will help you if you are CPU limited (bound by VBO binds, etc.).

Good point; I just wonder to what extent it applies here.
What I usually see is that FPS is super important. When I read a book, an article, or a post here, there is often something about performance. Not surprising, considering the kind of technology.
That is the reason I ask, as a first question, “which way is faster?”. I just feel that being fast is pretty much part of how it works.

And again, although the OpenGL way of doing things still seems a bit obscure to me, I do get some useful information here. Thanks to all the people who answered.