Multiple objects: single or multiple shaders?

sam_thedancer · April 25, 2013, 5:54am

This topic is actually a follow-on of an earlier posting of mine “Generating pipelines…” where I learned from thokra’s responses how to generate pipelines in 4.3. The reason to post a new thread is that other novices like me may be interested in this particular question.

So, here’s the scenario: Two objects (could be many more) need to be transformed/colored differently. Here are two ways I know to do this:

Single vertex and fragment shader:
Define a uniform, call it currentObj, in the VS and FS. Both VS and FS have code of the form
if (currentObj == obj1) do obj1 stuff;
if (currentObj == obj2) do obj2 stuff;

The draw routine in the main program looks like
glUniform(make currentObj = obj1);
draw obj1;
glUniform(make currentObj = obj2);
draw obj2;

Separate vertex and fragment shaders:
Make VS1 and FS1 for obj1 and attach them to pipeline1. Make VS2 and FS2 for obj2 and attach them to pipeline2.

The draw routine in the main program looks like
glBindProgramPipeline(pipeline1);
draw obj1;
glBindProgramPipeline(pipeline2);
draw obj2;

Btw, I believe the above methods would apply to other shader stages as well if present.

So, here are my questions:

Are the above basically the two ways to handle multiple objects or are there others?
What’s the best?

Thanks in advance,
Sam

thokra · April 25, 2013, 6:37am

It depends on what you mean by “transformed differently”. It could mean that you have the same program logic for every object, e.g. a chaining of multiple matrices where only the matrices themselves change. Or you could have entirely different transformation logic which in addition might be data dependent. The same goes for other properties like colors etc.

For the first case, where the logic is identical, you don’t need branching in any way in the shader code because, well, the logic is identical. Just change the uniform values or, better yet, stick them in a uniform buffer object and just have the GL point to a different data store location with glBindBufferRange(). This also works well with instancing, only in that case you’d draw a number of instances with the same set of uniforms before changing the bound buffer range. This is also applicable to other areas such as material definitions or light source properties and so on. I like uniform buffers a lot.
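As a rough sketch of the buffer-range idea: the offset passed to glBindBufferRange() for a uniform buffer has to be a multiple of the implementation's GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, so per-object blocks are usually padded to that alignment. The helper names below (align_up, object_block_offset) are made up for illustration, and the GL call itself is only shown in a comment because it needs a live context.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment` (must be > 0). */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

/* Byte offset of object `index`'s uniform block inside one shared buffer.
 * `block_size` is the size of one per-object block; `alignment` is the value
 * queried once with glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...). */
size_t object_block_offset(size_t index, size_t block_size, size_t alignment)
{
    return index * align_up(block_size, alignment);
}

/* In the draw loop (assumed names: ubo, block_size, alignment):
 *   glBindBufferRange(GL_UNIFORM_BUFFER, 0, ubo,
 *                     (GLintptr)object_block_offset(i, block_size, alignment),
 *                     (GLsizeiptr)block_size);
 *   draw object i;
 */
```

The shader stays the same for every object; only the bound range moves.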

If the logic is supposed to be different, there are actually three ways to change the code path:

- dynamic branching: what you suggested first, but less desirable IMO
- different shader programs: these work well, but you need to take care at the application level not to have too many program switches (i.e. you need to batch hard)
- shader subroutines: effectively switching portions of functionality at run-time depending on what logic is needed for the current object

I can’t speak to what works best in your particular case. For a small number of objects it’s irrelevant anyway. If you go to hundreds or thousands of objects, it’s a different story.

Technically, dynamic branching is the least favorable approach because if stuff goes wrong you lose performance due to branch prediction or the execution of additional branches you’re not even trying to reach. At least in the fragment shader. I’m not certain how much impact dynamic branching has on current HD7000 or GTX600 series GPUs in worst case scenarios.

With different programs (or separable programs for that matter) you get what you want, but you have to be careful not to introduce too many shader program switches, i.e. repeated calls to glUseProgram() or glUseProgramStages(). Each switch alters program state and has to make the executables current for their respective stages, which takes some time and introduces overhead in the application. If you can aggregate multiple objects into a group which uses the same program or pipeline, you can reduce this overhead. Batching is a good idea in general to reduce API overhead. Do it, and do it hard if possible.
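A minimal sketch of what batching by program could look like on the application side, assuming a hypothetical DrawItem record per object. It only sorts and counts; the real loop would bind the program or pipeline at the marked spot.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical draw record: which program (or pipeline) an object uses. */
typedef struct {
    unsigned program;  /* GL program or pipeline name */
    unsigned object;   /* application-side object id */
} DrawItem;

static int by_program(const void *a, const void *b)
{
    const DrawItem *x = a, *y = b;
    return (x->program > y->program) - (x->program < y->program);
}

/* Sort the items so that objects sharing a program end up adjacent. */
void batch_by_program(DrawItem *items, size_t count)
{
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof *items, by_program);
}

/* Count the binds a draw loop would issue if it only rebinds when the
 * program differs from the one used by the previous item. */
size_t count_program_switches(const DrawItem *items, size_t count)
{
    size_t switches = 0;
    for (size_t i = 0; i < count; ++i) {
        if (i == 0 || items[i].program != items[i - 1].program)
            ++switches;  /* real code: glUseProgram() or glBindProgramPipeline() here */
    }
    return switches;
}
```

With five objects alternating between two programs, the unsorted order needs five binds and the sorted order needs two.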

To me, switching functionality dynamically at runtime immediately makes me think of virtual functions or function pointers in C/C++. Shader subroutines are technically similar to function pointers. You define multiple subroutines which offer different logic and choose the appropriate one at runtime. Subroutines are basically nothing but functions to which a pointer with a specified name points at a given point in time. In the shader, this “pointer” is declared as a uniform which can be queried and altered by the application. The drawbacks are, at least in principle, that you have an additional indirection and of course a function call - which probably cannot be inlined by the compiler. Depending on the number of shader invocations I assume it can have a noticeable impact. However, I’m not aware of any actual performance comparisons with real-world code that prove or disprove the value of subroutines in a high-performance context. I’m pretty sure, however, there aren’t many real applications out there which actually use subroutines anyway - or most of the GL4 stuff.
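To make the function pointer analogy concrete, here is a plain C sketch (not GLSL). The names shade_flat, shade_dimmed and Stage are invented for illustration; the struct member plays the role that the subroutine uniform plays in a shader.

```c
#include <assert.h>

/* Two interchangeable "subroutines" with the same signature. */
float shade_flat(float base)   { return base; }
float shade_dimmed(float base) { return base * 0.5f; }

typedef float (*ShadeFn)(float);

typedef struct {
    ShadeFn shade;  /* stands in for the subroutine uniform */
} Stage;

/* The call goes through the pointer, so the compiler generally cannot
 * inline it: this is the extra indirection mentioned above. */
float run_stage(const Stage *stage, float base)
{
    assert(stage != 0 && stage->shade != 0);
    return stage->shade(base);
}
```

The application picks the function by assigning to `shade` before "drawing", much as it would select a subroutine index before a draw call.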

In the end, the only thing that’s gonna give you certainty, at least for the platform you’re developing on, is implementing multiple methods and simply profiling the result. Still, as I said, what works well on your platform is not guaranteed to run as fast on other platforms - but the opposite is true as well. That’s the price we pay for doing cross-platform graphics programming.

HTH.

sam_thedancer · April 25, 2013, 11:39am

As always, thokra, thank you for a clear and comprehensive reply. Shader subroutines are clearly what was missing from my list.

imported_tonyo_au · April 25, 2013, 4:49pm

I’m not aware of any actual performance comparisons with real-world code that prove or disprove the value of subroutines

My tests to date imply a slight performance hit versus conditional branching based on a uniform when I have a small number of subroutine variations.
The plus side is the shader code is much more readable.

thokra · April 26, 2013, 12:13am

tonyo: If you increase the number of variations, how does it play out then? Did you test? What GPU?

imported_tonyo_au · April 26, 2013, 1:33am

My test only had 4 subroutine variations and I was using an nVidia GTX 570. I have changed my code to use a uniform to switch for the 3 most common variants and then a subroutine for the wilder variations that I don’t use often. This keeps the code reasonably neat. I haven’t been back to test it more seriously because it is not my major bottleneck at the moment.

There are 2 problems I see with dynamic subroutines.
One is having to pass parameters even when the variation is not going to use any of them.
The other is that you cannot just change 1 subroutine pointer; you have to rebuild all of them. Both of these things must affect performance.
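One way to live with the second problem, sketched under assumptions: keep the full index array on the CPU, change single slots there, and resubmit the whole array only when something changed. SubroutineState and the fixed bound of 8 are invented for this sketch; glUniformSubroutinesuiv() is the real call that takes the complete array, shown only in a comment.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBROUTINE_UNIFORMS 8  /* assumed upper bound for this sketch */

typedef struct {
    unsigned indices[MAX_SUBROUTINE_UNIFORMS]; /* one subroutine index per uniform location */
    size_t count;  /* would come from GL_ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS */
    int dirty;     /* non-zero when the array must be resubmitted */
} SubroutineState;

/* Change one slot in the cached array; the upload is deferred. */
void set_subroutine(SubroutineState *s, size_t location, unsigned index)
{
    assert(s->count <= MAX_SUBROUTINE_UNIFORMS && location < s->count);
    if (s->indices[location] != index) {
        s->indices[location] = index;
        s->dirty = 1;
    }
}

/* Returns 1 if the whole array has to be uploaded before the next draw.
 * Real code would call, for the fragment stage:
 *   glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, (GLsizei)s->count, s->indices); */
int flush_subroutines(SubroutineState *s)
{
    int upload = s->dirty;
    s->dirty = 0;
    return upload;
}
```

Note that subroutine uniform state is also lost when the program is rebound, so real code would mark the cache dirty after every program switch as well.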

The logic I am using most in the fragment shader looks like this


fetch basic colour - single texture/vertex colour/uniform colour using uniform, all other variations like double sided textures, height based colours and specialised results lookups by dynamic subroutine
lighting - by dynamic subroutine
modify colour - wire frame, contouring and some other variations by dynamic subroutine
store - by dynamic subroutine - depends on g-buffer data I want to collect

This means I can build the dynamic subroutine list once for a large number of render objects but can easily flick texturing on/off for a particular object.

sam_thedancer · April 27, 2013, 10:31pm

Thokra,
I have a question for you. You say
“For the first case, where the logic is identical, you don’t need branching in any way in the shader code because, well, the logic is identical. Just change the uniform values…” I want to clarify this in a particular case I have.

Suppose I am drawing two different objects. Specifically, in the app I have:
…
bind buffer
glVertexAttribPointer(0, point to buffer location for coord values of obj1);
glVertexAttribPointer(1, point to buffer location for color values of obj1);
…
bind another buffer
glVertexAttribPointer(2, point to buffer location for coord values of obj2);
glVertexAttribPointer(3, point to buffer location for color values of obj2);

Correspondingly, in the VS I have
layout(location=0) in vec4 obj1Coords;
layout(location=1) in vec4 obj1Colors;
layout(location=2) in vec4 obj2Coords;
layout(location=3) in vec4 obj2Colors;

Now, in this case don’t I need a conditional in the VS (or subroutines) which does something like the following?

if (current obj == obj1)
{gl_Position = projectionMatrix * modelViewMatrix * obj1Coords;
…}
if (current obj == obj2)
{gl_Position = projectionMatrix * modelViewMatrix * obj2Coords;
…}

As the attributes change I can’t just get the job done with changing a uniform in the app and do need some form of branching in the shader, right?
Thanks again.

Alfonse_Reinheart · April 27, 2013, 11:38pm

Why would you ever do that? Why would you not simply put the vertex data in the same buffer object and render them all with the same draw call (even if it’s a multi-draw call)?

sam_thedancer · April 28, 2013, 4:15am

Thanks for the response, Alfonse.
I see your point but how about if obj1 and obj2 are, additionally, transformed differently, i.e., the modelview matrix is MV1 for obj1 and MV2 for obj2? In this case, I need to reset the modelview matrix uniform between drawing the two. Is it possible to do this within a multidraw call?

Alfonse_Reinheart · April 28, 2013, 4:27am

I see your point but how about if obj1 and obj2 are, additionally, transformed differently, i.e., the modelview matrix is MV1 for obj1 and MV2 for obj2? In this case, I need to reset the modelview matrix uniform between drawing the two. Is it possible to do this within a multidraw call?

And what if the number of draw calls isn’t your performance bottleneck? Unless and until you have hard profiling data that says otherwise, just render. Render in the most obvious manner possible. Because until you know where you’re slow, you’re not going to know how to make it fast. And until you know that you’re slow, you’re never going to tell if any of the things you do are an actual improvement in real-world scenarios.

sam_thedancer · April 28, 2013, 4:56am

I am just learning to code shaders so performance is not an issue for me at this time. So, it’s simply a coding question: if uniforms are different for a bunch of objects can they be packed into a multi-draw call (which would be hugely convenient obviously as no conditionals/subroutines would then be needed)?

Sorry if I am a bit slow picking up shaders but pre-shader OpenGL is all I’d learned till now.

thokra · April 28, 2013, 5:41am

if uniforms are different for a bunch of objects can they be packed into a multi-draw call[…]?

They can. You could add another generic attribute which is used as an index into a uniform array or uniform buffer (or any other indexable data store for that matter). Unfortunately, there is no gl_DrawCallID or something - actually I don’t know why. There is gl_InstanceID but as the name suggests, it will only take non-zero values if used with instanced draw calls.
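A sketch of the application side of that index attribute, assuming the objects are stored back to back in one vertex buffer. fill_object_indices is a made-up helper; the resulting array would be uploaded as an integer attribute (glVertexAttribIPointer()) and used in the vertex shader to index something like a hypothetical modelView[] uniform array.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `out` with one object index per vertex. vertex_counts[i] is the
 * number of vertices of object i. Returns the number of entries written,
 * or 0 if `capacity` is too small (in which case `out` may be partly filled). */
size_t fill_object_indices(const size_t *vertex_counts, size_t object_count,
                           unsigned *out, size_t capacity)
{
    size_t written = 0;
    assert(vertex_counts != 0 && out != 0);
    for (size_t obj = 0; obj < object_count; ++obj) {
        if (vertex_counts[obj] > capacity - written)
            return 0;  /* not enough room for this object's vertices */
        for (size_t v = 0; v < vertex_counts[obj]; ++v)
            out[written++] = (unsigned)obj;
    }
    return written;
}
```

For two objects with 3 and 2 vertices this produces 0, 0, 0, 1, 1, so every vertex knows which set of uniforms to read.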

I agree with Alfonse that unless you’re having performance issues due to draw call overhead, you can simply do two separate draw calls and set your uniform stuff accordingly.

Alfonse_Reinheart · April 28, 2013, 5:50am

There is gl_InstanceID but as the name suggests, it will only non-zero values if used with instanced draw calls.

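Not entirely. You can always use base instance rendering calls to give each draw its own value: the base instance offsets the fetch of instanced vertex attributes (gl_InstanceID itself still starts at zero).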

sam_thedancer · April 28, 2013, 5:55am

Thanks, Thokra.
From the point of view of easy-to-understand code, my main concern now, I guess separate buffers/uniforms for separate objects is the best. I can see, though, why packing as much as possible into the fewest buffers makes sense if you’re after performance.

thokra · April 29, 2013, 2:29am

True. Completely missed that. :dejection: