One-Pass Rendering Pipeline!

Hi all!

I've been orienting my rendering pipeline toward GLSL for a while now, after reading two-year-old posts in here… I can easily say that I'm behind! But I'd like to catch up on the rendering architecture!

My main concern is how lighting and shadows are handled!

Well, I'm one of those who think that laziness is the foundation of effectiveness… that said, multipassing is not an option so far… until further arguments are brought to the table, I think it's a workaround for bad design. The first thing I hear on this is "vertices are cheap, blah blah…" Maybe, but my brain won't allocate a memory block for this pointer and will eventually crash!

The first question mark would be:

I heard that professional engines render one pass per light and blend each pass in the framebuffer? Is that correct?

If yes, I still can't figure out why… but I should be pretty close to getting my answer. So far I calculate lighting in one pass… lights are accumulated in the shader… and I don't see why it should be done otherwise… 8 lights max, because I'm using the gl_ states in the shaders (temporarily, I hope)…
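For reference, the kind of single-pass accumulation I mean looks roughly like this (a stripped-down sketch using the legacy gl_LightSource/gl_FrontMaterial state; attenuation, specular and the directional-light case are left out):

varying vec3 normal;        // written by my vertex shader
varying vec3 ecPosition;    // eye-space position from my vertex shader

void main()
{
    vec3 N = normalize(normal);
    vec4 color = gl_FrontMaterial.emission;

    for (int i = 0; i < 8; ++i)   // 8 = the fixed-function limit
    {
        vec3 L = normalize(gl_LightSource[i].position.xyz - ecPosition);
        color += gl_FrontMaterial.diffuse * gl_LightSource[i].diffuse * max(dot(N, L), 0.0);
    }

    gl_FragColor = color;
}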

This leads to another question:

How hard would it be to raise the maximum number of lights to, say, 128 in OpenGL? Why a maximum of 8 lights? And please, I don't want to hear "you shouldn't need more than 8 lights at a time" or any hocus-pocus hack to avoid needing more… I want the truth!

Now, to answer the first thing that comes to your mind at the moment: I know I can use any number of lights using shaders… but lights have lots of parameters to deal with… even though you have to set the OpenGL light states anyway, sending them through shader parameters is more expensive (you may prove me wrong on this, but I doubt it)… and I choose not to for the moment. I'm aware that using the gl_ states contradicts my master plan of eventually using an infinite number of lights… but I've chosen to reject that idea so far for simplicity's sake, until further developments arise on the subject.

In my research so far into a one-pass rendering pipeline, it all comes down to the same bottleneck: OpenGL/the hardware fails to fulfill my needs (welcome to the club, you'll say). First, the maximum number of lights; second, the max texture matrix stack, which is 10 on my GF 7800, and the 8 texture units. Why clamp those values so low? What is the problem with them? I still want the cruel truth here! How can the OpenGL developers not be doing anything about this aberration… what's going on, who's in charge here? <-- Mad Golgoth!

Casting shadows… still with the idea of using one single pass in the rendering pipeline, I managed to make an ugly compromise. Again, because of the low number of texture matrices and texture units available in the OpenGL FFP (which I use in the shader), the first 4 matrices/texture units are used by the color, gloss, environment and bump maps, and the next 4 units are used by shadow maps, which leads to a maximum of 4 shadow maps per primitive. And to answer the fire glowing in your eyes: yes, I want to be able to use 4 shadow maps per light, and I wish I could use more… it may sound heretical to most of you, and I'm aware of that too.

Now what:

What would be the smart thing to do:

1- Do one pass per light. – easier shader-wise, but is it worth the multiple passes?
2- Do one pass for all the lights + one pass for texturing.
3- One pass does it all.

If One pass does it all:

1- Wait for OpenGL 3.0 and hope for a max-units upgrade – bye-bye backward compatibility…
2- Send light parameters through the shader directly and suffer a penalty cost in client-to-shader data transfer.
3- Don't bother and do as everyone else… forget about it.

Some say that we can store data and functions in a texture handle in a shader… I'm not sure what that is or how it is done… can anyone help me clear this up?

Hope this is not too heavy… thanks for reading!

Still digging!

I think your problem is that you are trying to tack GLSL onto the old fixed-function pipeline. This interface is only really useful to help port existing programs, not for writing new (advanced) ones.

If you use the proper GLSL interfaces you have:
16 - Texture binding points (image units)
>40 - Texture matrices (uniforms)
8 × vec4 - Texture coord interpolators
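For example, instead of relying on gl_TextureMatrix[] and gl_TexCoord[] you just declare what you need yourself (rough sketch, names made up):

uniform mat4 texMatrix[8];     // plain uniforms, limited by uniform space, not by the FFP matrix stack
uniform sampler2D colorMap;    // can be bound to any of the 16 image units
varying vec4 texCoord0;        // passed from the vertex shader

void main()
{
    vec2 uv = (texMatrix[0] * texCoord0).st;
    gl_FragColor = texture2D(colorMap, uv);
}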

I think the problem is that there is a distinct lack of understanding, on Golgoth's part, as to how shadowing is actually done. Not to mention a lack of understanding as to what it is that actually makes a "Texture Unit", etc.

Thanks sqrt… you're a straight shooter and I like it!

How about sending data to the shader:

If the texture has a gloss map… if the texture has an environment map, then if cube else if spherical… if the light type is ambient, else spot… and all that sort of thing: specular, diffuse, emission, spot cutoff and so on, times n lights and/or n textures… everything needs to be sent to one shader, not a separate shader for every case… it is a lot of data to carry. I said this before: I'm still in the dark about handling data through textures, but ideally I'll push for a single shader that can handle any possible case. Why would you want to do it otherwise? Plus, you obviously offer a per-primitive shader that replaces the default shader for that primitive… but mainly, design-wise, the engine should have a default shader that maxes out all the default render states like we used to do with the FFP. Why bother with a zillion shaders? Let's go straight to the point: bring the entire render state into one shader and stop accumulating data-traffic overhead. What do you guys think about that?

To be honest, I'm not up to date on recent developments, but in spite of all that, I would rather have the OpenGL developers give serious thought to making the GPU accessible through a more open… mmmm… HFP (Hybrid Functionality Pipeline) that can do what shaders are all about in the first place!

Thanks again for hearing me whine!

I think the problem is that there is a distinct lack of understanding, on Golgoth's part, as to how shadowing is actually done.

As for shadows, using 1 depth map per light like in OpenGL® Shading Language, Second Edition, I think I've got this nailed down… still digging, but I have some ideas about combining all the shadow maps in one texture unit… any hint would be welcome!
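The rough idea (completely untested, names made up) would be to pack the maps side by side in one big depth texture and remap each light's lookup into its own tile:

uniform sampler2DShadow shadowAtlas;   // all shadow maps packed into one texture
uniform vec4 tile[4];                  // per light: xy = tile scale, zw = tile offset

float shadowLookup(int light, vec4 shadowCoord)
{
    // remap the projected coordinate into this light's tile of the atlas
    vec4 c = shadowCoord;
    c.xy = c.xy * tile[light].xy + tile[light].zw * c.w;
    return shadow2DProj(shadowAtlas, c).r;
}

(In practice the index would come from an unrolled loop, since dynamic indexing of uniforms in a fragment shader is shaky on this hardware generation.)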

Not to mention a lack of understanding as to what it is that actually makes a “Texture Unit”

uint used as a memory address where texture data is stored!? What am I missing here?

On a GF 7800:
GL_MAX_TEXTURE_IMAGE_UNITS: 16 (shader)
GL_MAX_TEXTURE_UNITS: 4 (FFP)

correct?

etc.
ouch… -.- how off am I? please, strike me!

Originally posted by Golgoth:
ideally I'll push for a single shader that can handle any possible case; why would you want to do it otherwise?
You probably know IHVs keep telling you to batch as much as you can.
Well, maybe you’re stretching it a bit.
Although ubershaders do really help, I hardly believe having a single ubershader would make sense (at least for now). My reasoning is design-driven, but since it's my own consideration, you're encouraged to take it with a grain of salt (it wouldn't be the first time I'm wrong).

Most of the time, the "world" must be realistic. To do that, it must be coherent. You wouldn't really put per-pixel-lit polys near vertex-lit ones.
It happens that most of the polys have similar properties. There's then a restricted number of polys for special effects: to render them, some sort of state change is often needed anyway, so ubershading them wouldn't be a real win.

The bottom line is that, to a certain degree, your engine must manage render states correctly so you don't need to tell the shader "this surface has a gloss map" (information). Instead, you achieve the same thing with meta-information embedded in the shader: by using a shader which looks up gloss maps.
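To make that concrete (a trivial sketch, names invented): the "has gloss map" variant simply samples the map, with no flag uniform anywhere:

uniform sampler2D diffuseMap;
uniform sampler2D glossMap;

void main()
{
    vec4 base   = texture2D(diffuseMap, gl_TexCoord[0].st);
    float gloss = texture2D(glossMap,  gl_TexCoord[0].st).r;
    gl_FragColor = vec4(base.rgb, gloss);   // gloss packed into alpha, say
}

Surfaces without a gloss map get bound to a sibling shader that simply doesn't have those lines.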

Originally posted by Golgoth:
Why bother with a zillion shaders?… Let's go straight to the point: bring the entire render state into one shader and stop accumulating data-traffic overhead. What do you guys think about that?
I think being able to set BLEND (for example) through a shader would be great (3DLabs originally proposed this). The point is that a shader does contain meta-information such as 'this surface does not receive shadows' (so no shadow maps are looked up). I think replacing all the shaders with a parameter-driven single one will likely increase the overheads. I am missing your point here.

Originally posted by Golgoth:
…making gpu accessible through a more open …. mmmm… HFP (Hybrid Functionality Pipeline) that can do what shaders are all about in the first place!
I don't get you there. Current pipelines are already really "hybrid". I think in a truly programmable environment, graphics would be mapped to stream-processing problems.


What would be the smart thing to do:

1- Do one pass per light. – easier shader-wise, but is it worth the multiple passes?
2- Do one pass for all the lights + one pass for texturing.
3- One pass does it all.
It might be possible to do 2 lights per pass, or even more. It depends on how many instructions the GPU supports. The current generation supports looping and conditionals, so it's possible to do plenty of lights per pass. There was an NV demo that demonstrates this: the teapot with many point lights orbiting it.
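Roughly like this (a sketch; the uniforms are set by the application, and the fixed upper bound keeps older compilers happy):

uniform int numLights;
uniform vec3 lightPosEye[8];     // eye-space light positions
uniform vec3 lightColor[8];

varying vec3 normal;
varying vec3 ecPosition;

void main()
{
    vec3 N = normalize(normal);
    vec3 result = vec3(0.0);

    for (int i = 0; i < 8; ++i)
    {
        if (i >= numLights)
            break;
        vec3 L = normalize(lightPosEye[i] - ecPosition);
        result += lightColor[i] * max(dot(N, L), 0.0);
    }

    gl_FragColor = vec4(result, 1.0);
}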

One shader that does it all? You will lose some performance on a GF 6200 and above.
For older cards (GeForce FX 5800 and Radeon 9700) you have to keep your shaders lightweight. I know that the 9700 is very limited in terms of instruction count, and it is also limited in features.

Originally posted by Golgoth:
uint used as a memory address where texture data is stored!? What am I missing here?

Silicon for one thing.

I think replacing all the shaders with a parameter-driven single one will likely increase the overheads. I am missing your point here.

If you compare the ubershader with a regular one, yes, it will increase overhead… but we have to decide somewhere whether or not to process a given state. Let's take light types, for instance… AFAICS, the tendency now is to make a shader for each light type…

If we compare the overall rendering time of a single frame:

First: 2 shaders, one doing a point light and the other doing a spot light…
Second: 1 shader with a single if statement selecting point or spot light (sketched below)…

Who wins?
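To be concrete, the second option is roughly this (sketch, names made up):

uniform int lightType;           // 0 = point, 1 = spot
uniform vec3 lightPosEye;
uniform vec3 spotDirEye;
uniform float spotCosCutoff;

varying vec3 normal;
varying vec3 ecPosition;

void main()
{
    vec3 N = normalize(normal);
    vec3 L = normalize(lightPosEye - ecPosition);
    float intensity = max(dot(N, L), 0.0);

    if (lightType == 1)          // spot: kill the contribution outside the cone
    {
        if (dot(-L, normalize(spotDirEye)) < spotCosCutoff)
            intensity = 0.0;
    }

    gl_FragColor = vec4(vec3(intensity), 1.0);
}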

Not to mention the work involved in tracking files between artists, scripters and programmers, and the code duplication… one thing is for sure: when dealing with thousands of assets, not using the ubershader is definitely a workflow overhead…

I don't get you there. Current pipelines are already really "hybrid". I think in a truly programmable environment, graphics would be mapped to stream-processing problems.
I meant extending the GL extensions/functionality for accessing the GPU through gl calls, so that we can stay in the same development environment, in my case Visual Studio, instead of creating a new branch of languages/tools…

Like:

glTransformLogic(gl_Vertex * gl_ModelViewMatrix);
glFragColor(put color here);

how hard is that? I must be missing a huge piece of the puzzle because this is driving me insane.

The multipass approach is justified by many things.

Shadowing can be done in 2 ways: stencil and shadowmap. The stencil method requires multiple passes because there's only ever one stencil buffer. You need to do 2n+1 passes, where n is the number of lights (an initial ambient/depth pass, then a stencil-volume pass plus a lighting pass for each light).

Shadowmapping can, at a minimum, use n+1 passes. You still need one pass per light to generate the shadow maps in the first place.

Given that, you need at least one pass per light, either to generate a stencil buffer or to generate a shadow map.

Now, let’s forget shadowing. Let’s assume you’re using shadow maps, and focus on lighting. Basically, the problem is simple: how many shaders do you want?

Lighting equations come in a huge variety. But, basically, all of them have a few inputs: light direction, light distance, surface normal, diffuse/specular surface color. Possibly a few other things.

Getting those parameters is the tricky part. Indeed, it changes for each type of light. The light direction for directional lights is just a constant vector, whereas for point lights it needs to be an interpolant.

Surface normal, in a smooth case, is an interpolant. In a bump-mapped case, it's a modification of this interpolant (several interpolants, actually), based on a texture. Relief mapping goes even farther in computing the normal.

There are a variety of lighting equations. From basic Blinn/Phong, through to complicated BRDFs and various other things.

There are several ways to handle this complexity. One is to build a megashader, which can do everything. Parameters determine which features are on/off for every invocation of the shader. This doesn't work well, because such a shader is brutally inefficient on modern glslang hardware.

There's the dynamic multipass approach. That is, for each combination of light type, surface type, and lighting equation, generate a shader. Then, for every light that acts on a particular mesh, you do a pass. Hence the 1-pass-per-light method. It's probably more efficient than the megashader approach, despite the multiple passes.

Then, there’s what I would suggest. Figure out exactly how much stuff you want to interact with each object in a scene. Say, 1 shadowed (mapped) directional light and up to 2 directional lights with no shadows. Determine which lighting equation you will use. Then, build a shader for it. For each object, build the shader that you would want to use in that instance. There can be shader sharing, of course, where appropriate (say, a shader for every character).

The idea with the latter approach is to avoid multipassing on the "+1" step for shadow mapping. To me, the principal advantage of shadow mapping over stencil shadows is that shadow maps can render stuff with fewer passes. Doing an additional pass per light with shadow mapping makes no sense. And it avoids the "megashader" problems, because the shader is hand-crafted for each application.

Thanks Korval!

The stencil method requires multiple passes because there’s only ever one stencil buffer.
That is so true… here is one piece I needed to catch up on… I've never used stencil shadows. I must say that F.E.A.R. did a great job with them!

Shadowmapping can, at a minimum, use n+1 passes. You still need one pass per light to generate the shadow maps in the first place.
Good point here… I didn't consider the shadow map rendering as being a pass per se, for several reasons: there are no render states attached to it, it is only written to the depth buffer, the pass size is based on the map resolution, and it is rendered from the light's POV… so it's not part of the final result, but a lighting calculation. To be clear on this, multipass includes only passes from the eye's POV in my book. What I'm referring to as multipass lighting is this approach: calculate one light at a time in a shader, then draw the scene… go to the next light, blend into the framebuffer, draw the scene, and so on for all lights. That's what I'm not crazy about, as opposed to combining all the lights in the same shader and drawing the scene once.

There are several ways to handle this complexity. One is to build a megashader, which can do everything. Parameters determine which features are on/off for every invocation of the shader. This doesn't work well, because such a shader is brutally inefficient on modern glslang hardware.
This doesn’t work well? Why?

That is, for each combination of light type, surface type, and lighting equation, generate a shader.
You mean determine which features are on/off for every invocation client-side, then generate, compile, link and use the shader… even doing this each frame if needed?
How could this be more efficient than sending a true-or-false uniform to the ubershader?

Figure out exactly how much stuff you want to interact with each object in a scene. Say, 1 shadowed (mapped) directional light and up to 2 directional lights with no shadows.
I can't take this approach; this matter is in the artists' hands… the engine must allow the widest range of possibilities they can come up with, without modifying a line of code. I never saw texture artists in production writing shaders… they shouldn't have to… and that's pretty much the bottom line on this matter.

regards

This doesn’t work well? Why?
What input parameters would your megashader have? That is, what are the attributes, varyings, uniforms, textures?

You’re not done with a single “type” uniform that selects a particular equation. You also need every parameter of ALL possible equations, because you don’t know which ones you’ll need until shader execution.

Also, with one pass per light you have the additional advantage that you can cull objects that are outside the range of the light, so you do a lot less work if you have many lights with low range. Google for "deferred shading" to see how to take this to the extreme (think hundreds of visible lights).
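In a nutshell, a deferred light pass is just a fragment shader that reads the scene attributes back from textures and is drawn only over the pixels the light can touch (very rough sketch, names made up):

uniform sampler2D gbufPosition;   // eye-space position, filled in the geometry pass
uniform sampler2D gbufNormal;     // eye-space normal
uniform sampler2D gbufAlbedo;
uniform vec3 lightPosEye;
uniform vec3 lightColor;
uniform float lightRadius;

void main()
{
    vec2 uv = gl_TexCoord[0].st;
    vec3 P = texture2D(gbufPosition, uv).xyz;
    vec3 N = normalize(texture2D(gbufNormal, uv).xyz * 2.0 - 1.0);
    vec3 albedo = texture2D(gbufAlbedo, uv).rgb;

    vec3 toLight = lightPosEye - P;
    float dist = length(toLight);
    float atten = max(1.0 - dist / lightRadius, 0.0);

    gl_FragColor = vec4(albedo * lightColor * max(dot(N, toLight / dist), 0.0) * atten, 1.0);
}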

What input parameters would your megashader have?
More or less what all the shaders combined would have… whatever you need to process any given state…

You’re not done with a single “type” uniform that selects a particular equation. You also need every parameter of ALL possible equations, because you don’t know which ones you’ll need until shader execution.
I was expecting this one… you'll have to go through all of it client-side anyway. Client-side, if a state is enabled, you send what the shader needs for that state; if not, the ubershader has the variables handy but they are not used to compute the final result… they can just sit there and wait for further instructions. Off the top of my head, we could do some sort of variable pooling… more like generic variables…

e.g.:

uniform float var1;   // one generic slot, reused
uniform int state1, state2;

void main()
{
    float reflection_index = 0.0;
    float opacity = 1.0;
    if (state1 == 1)
        reflection_index = var1;
    else if (state2 == 1)
        opacity = var1;
    // ...
}

Plus, what about sending diffuse, specular, ambient and emission in a single 4x4 matrix…
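Something like this, just to show the unpacking (column order is arbitrary):

uniform mat4 materialColors;   // columns: diffuse, specular, ambient, emission

void main()
{
    vec4 diffuse  = materialColors[0];
    vec4 specular = materialColors[1];
    vec4 ambient  = materialColors[2];
    vec4 emission = materialColors[3];

    // the actual lighting math would go here; placeholder combine for now
    gl_FragColor = emission + ambient + diffuse;
}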

with one pass per light you have the additional advantage that you can cull objects that are outside the range of the light
Not sure how multipass benefits from this… you can cull objects that are outside the range of the light without rendering anything… but I'll take a closer look at deferred shading for sure.

Thanks again

This doesn’t work well? Why?
In addition to what Overmind said, only the most advanced graphics cards can handle the complex conditional branching necessary to do what you suggest. And using that conditional branching can cause substantial performance penalties.

How could this be more efficient than sending a true-or-false uniform to the ubershader?
Because you can precompile all the shaders that you will ever need. The number of combinations is pretty small:

If you have 2 light direction generators (directional and point), 2 surface normal generators (bump and smooth), 3 color generators (interpolated vertex, texture, and parallax&texture), and 3 lighting equations, then you have only 36 possible shaders. And the shaders themselves are pretty small.

I can't take this approach; this matter is in the artists' hands… the engine must allow the widest range of possibilities they can come up with, without modifying a line of code.
Then you’re going to have to take the multipass approach. If you aren’t allowed to restrict what the artists can do, you’re going to have to sacrifice performance. You can’t get without getting.

Thanks again for your answers!

And using that conditional branching can cause substantial performance penalties.
What kind of penalties?… If we are talking about a 50% hit to global speed, I'll forget about it… but a ~5% hit for processing the ubershader versus no conditional branching sounds reasonable for increasing our day-to-day development quality… the speed trade-off is not at all costs… at least I think so…

Because you can precompile all the shaders that you will ever need.
That's probably the part I'm scared of… I'm not sure if you mean hand-coding 36 shaders or auto-compiling them at engine initialization… In both cases, I can hardly imagine it… hand-coding is an absolute no-go… and as for the auto-compile idea… hmm… I'll have to meditate on this… maybe interesting… I still have no clue how to do it.

If you aren’t allowed to restrict what the artists can do, you’re going to have to sacrifice performance. You can’t get without getting.
If I can buy peace this way, it’s a done deal for me …

regards

What kind of penalties?… If we are talking about a 50% hit to global speed, I'll forget about it… but a ~5% hit for processing the ubershader versus no conditional branching sounds reasonable for increasing our day-to-day development quality… the speed trade-off is not at all costs… at least I think so…
It depends on what hardware you're talking about, and how you write your megashader. Don't forget: quite a bit of the glslang-capable hardware can't do conditional branching in the fragment shader, period.

If you want exact answers, you’ll need to benchmark it.

That's probably the part I'm scared of… I'm not sure if you mean hand-coding 36 shaders or auto-compiling them at engine initialization… In both cases, I can hardly imagine it… hand-coding is an absolute no-go… and as for the auto-compile idea… hmm… I'll have to meditate on this… maybe interesting… I still have no clue how to do it.
I do not understand what you’re trying to say here. Your artists aren’t writing shaders, by your own admission. So you, or someone much like you, are going to have to write these shaders. Whether it’s a megashader, or smaller ones.

It’s only 36 compiled shaders. At a rate of, say, 4 shaders a day (written, tested, debugged), it wouldn’t take you longer than 2 weeks to do it.

Not only that, the shader pieces are all swappable. It wouldn’t be too hard to come up with some shader conventions so that all you do is compile (now using glslang terminology) the individual shaders, and combine them at the program linking stage into the 36. That way, you only need to write 2 + 2 + 3 + 3 or 10 shaders.

You can’t get without getting.
I meant to say “You can’t get without giving,” btw.

Korval, you have been a great contributor to this thread, thanks again!

Don't forget: quite a bit of the glslang-capable hardware can't do conditional branching in the fragment shader, period.
It is not a problem for my needs yet, since I'm nowhere near ready for a release… plus, I'm targeting a non-public market at the moment… current development is done on a GF 7800, and I'm not planning on targeting anything lower.

It’s only 36 compiled shaders. At a rate of, say, 4 shaders a day (written, tested, debugged), it wouldn’t take you longer than 2 weeks to do it.

I agree with you to a certain extent here… it's not a big deal once your engine is ready for release, but most of the time we're in dev mode… especially with shaders; they're hardly ever final in my case… so maintaining all those shaders would be a real waste of time in dev mode… That said, you seem to have brought to light an interesting idea that I'm clearly still in the dark about…

Not only that, the shader pieces are all swappable.
What would this mean?

I know we can attach more than one shader to a GLSL program; does this have anything to do with that?
AFAIK, I've tried to attach multiple fragment shaders… it turns out that the last fragment shader overwrites the first one… which made me think that it could be used for multipass… am I correct? Or did you mean something else?

It wouldn’t be too hard to come up with some shader conventions so that all you do is compile (now using glslang terminology) the individual shaders, and combine them at the program linking stage into the 36.

OK, now it's getting even more interesting; I can almost see a sparkle. Can you elaborate on this, if it is not too much trouble? Is this topic covered anywhere? OpenGL® Shading Language, Second Edition barely goes over it.

Thanks again

regards

AFAIK, I've tried to attach multiple fragment shaders… it turns out that the last fragment shader overwrites the first one… which made me think that it could be used for multipass… am I correct?
It’s exactly like building a regular C program.

(note: now using glslang terminology).

You build a shader from one or more text files. This is analogous to having a .c/.cpp file that includes one or more .h files. The shader text files are compiled in order, and can include header-type information (forward declarations of functions).

A built shader, a glslang shader object, is like a .o/.obj file in C. It isn’t a program yet, and you can’t use it directly.

A full glslang program is what is created when you take one or more shader objects and link them together to form the program. Now, you know that a glslang program (one that fully overrides the old pipeline) consists of a vertex shader and a fragment shader that link together. You know that these two shaders need to agree on the names of the varyings passed between them.

What you may not know is that you can take two vertex shaders and one fragment shader and link them together into one program. When you do that, the two (or more) vertex shaders are combined exactly like .o/.obj files are combined into executables.

One of those vertex shaders can call a function in the other. As long as the function was declared when the vertex shader was built, it can call it. But it doesn’t need to know which compiled shader is going to implement it; as long as the function matches the declaration, everything is fine.

It’s easy to apply this to our case; do it C-style.

You have a main fragment shader text file. It implements the main function for fragment shaders, and it never changes for any fragment shader program you create.

In my earlier post:

If you have 2 light direction generators (directional and point), 2 surface normal generators (bump and smooth), 3 color generators (interpolated vertex, texture, and parallax&texture), and 3 lighting equations, then you have only 36 possible shaders. And the shaders themselves are pretty small.
I defined the 4 stages of a fragment light program:

light direction generation
surface normal generation
color generation
lighting equation

So, your main shader text file looks something like this (in pseudo-code):

vec3 GetLightDirection();
vec3 GetSurfaceNormal();
vec4 GetColor();
vec4 ComputeLighting(vec3 lightDirection, vec3 surfaceNormal, vec4 color);

void main()
{
  vec3 lightDirection = GetLightDirection();
  vec3 surfaceNormal = GetSurfaceNormal();
  vec4 color = GetColor();
  gl_FragColor = ComputeLighting(lightDirection, surfaceNormal, color);
}

There’s your main. That shader text (if it were actual glslang and not pseudo-code) would compile into a shader object. But it would not link by itself into a program, because it calls functions that aren’t defined in any shader object.

So, you write a shader object that implements GetLightDirection as directional light. It might look like:

uniform vec3 myLightDirection;
vec3 GetLightDirection()
{
  return myLightDirection;
}

So, whenever you use a program built from this shader object, you need to make sure that you set the myLightDirection uniform. Now, maybe the light direction is passed through an interpolant, for a point light. That might look like:

varying vec3 myLightDirection;
vec3 GetLightDirection()
{
  return myLightDirection;
}

Whenever you link this fragment shader object, you need to make sure to use a vertex shader object that provides a myLightDirection varying, of course.

Now, when you go to build your fragment program, you take your main shader object, and one of each kind of other shader object, for the parameters that the main shader object needs. You link them together, and you get a viable program. 36 of them. From 1 main shader object and 10 others.
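For instance (names purely illustrative), a bump-mapped GetSurfaceNormal piece and a textured GetColor piece might look like:

// GetSurfaceNormal, bump-mapped variant
uniform sampler2D normalMap;
varying vec3 tangent;
varying vec3 bitangent;
varying vec3 normal;

vec3 GetSurfaceNormal()
{
    // fetch the tangent-space normal and rotate it into eye space
    vec3 tsN = texture2D(normalMap, gl_TexCoord[0].st).xyz * 2.0 - 1.0;
    return normalize(mat3(normalize(tangent), normalize(bitangent), normalize(normal)) * tsN);
}

// GetColor, textured variant (conceptually a separate shader object)
uniform sampler2D diffuseMap;

vec4 GetColor()
{
    return texture2D(diffuseMap, gl_TexCoord[0].st);
}

Swap either of those for its "smooth" or "vertex color" sibling at link time, and the main shader never changes.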

The reason that the last shader in your examples kept overriding the previous ones was because, I assume, each one implemented its own main() function. So, much like a C linker, you multiply defined the same function. So, it gave you some warnings (maybe?) about multiple definitions, and then kept only the last one.

I must say, I was literally glued to the screen; it was really interesting reading you!

Let's see if I get this right; here is a pseudo wrap-up:

Main.frag

So, it gave you some warnings (maybe?) about multiple definitions
I just tested it again; my bad, I thought it did, but it does not overwrite. My bet is that at the time I tried it, my setup just deleted the current shader and replaced it with the new one internally. Just for the record, it does not compile; here is what GLIntercept returns:

(4) : error C2002: duplicate function definition of main (previous definition at :45)
(4) : error C2001: incompatable definition for main (previous definition at :45)

You could do what I've done:
write a small app that generates a shader string, which you then create a GLSL shader from.

string src = create_shader( dir_light | diffuse_texture | normal_texture /* etc. */ );

Well, I don't do this exactly; I fill a struct with the required data,
e.g. frag_shader.num_lights = 1;
and then let the app spit out the string based on what the struct contains.
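A related trick (not exactly what I do, but the same idea) is to keep one GLSL source and let the app prepend a block of #defines per variant; glShaderSource takes an array of strings, so the defines are just one extra string. Rough sketch:

// defaults, overridden by e.g. "#define NUM_LIGHTS 2\n#define USE_DIFFUSE_MAP\n" prepended by the app
#ifndef NUM_LIGHTS
#define NUM_LIGHTS 1
#endif

uniform vec3 lightDirEye[NUM_LIGHTS];   // directional lights, eye space
uniform vec3 lightColor[NUM_LIGHTS];
#ifdef USE_DIFFUSE_MAP
uniform sampler2D diffuseMap;
#endif
varying vec3 normal;

void main()
{
    vec3 N = normalize(normal);

    vec3 base = vec3(1.0);
#ifdef USE_DIFFUSE_MAP
    base = texture2D(diffuseMap, gl_TexCoord[0].st).rgb;
#endif

    vec3 result = vec3(0.0);
    for (int i = 0; i < NUM_LIGHTS; ++i)
        result += lightColor[i] * max(dot(N, -lightDirEye[i]), 0.0);

    gl_FragColor = vec4(base * result, 1.0);
}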