glBlendEquationSeparateRGBA

Currently, the alpha channel in color blending has a far more flexible role than the other channels, RGB. This is actually 2 parts:

The first part is a new function:


glBlendFuncSeparateRGBA(
    GLenum red_src_factor,   GLenum red_dst_factor,
    GLenum green_src_factor, GLenum green_dst_factor,
    GLenum blue_src_factor,  GLenum blue_dst_factor,
    GLenum alpha_src_factor, GLenum alpha_dst_factor)
             

which specifies the factors for blending separately for each color channel.

The second part is a set of new enumerations for using each of the channels separately:


GL_SRC_RED, 
GL_SRC1_RED,
GL_DST_RED,
GL_ONE_MINUS_SRC_RED,
GL_ONE_MINUS_SRC1_RED,
GL_ONE_MINUS_DST_RED,

and similar ones for GREEN and BLUE.

A logical convention for using the current non-alpha factors with the new function would be to just take the corresponding component: for example, passing GL_SRC_COLOR for red_src_factor or red_dst_factor would be the same as GL_SRC_RED, and so on.

Naturally this extends to glBlendFunci as well.
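To make the convention concrete, here is a hypothetical usage sketch; glBlendFuncSeparateRGBA and the GL_*_RED/GREEN/BLUE factors are the proposal above, not existing GL, and the particular factor choices are just for illustration:


/* Hypothetical: the entry point and the per-channel enums are the suggestion
 * above, not part of any current GL version or extension. */
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);
glBlendFuncSeparateRGBA(
    GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA,  /* red:   ordinary alpha blend           */
    GL_SRC_COLOR, GL_ONE_MINUS_SRC_COLOR,  /* green: same as GL_SRC_GREEN here      */
    GL_DST_RED,   GL_ONE_MINUS_DST_RED,    /* blue:  driven by the destination red  */
    GL_ONE,       GL_ZERO);                /* alpha: simple replace                 */
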

The first part of my post was rubbish, so I removed it.

But the thing I would like to see is quite similar to your recent suggestion on this forum about gl_FragDepth, and it is kinda related to this one: I'd like to have a hack to use separate values for alpha blending and for the actual output value of the alpha channel.

This is actually 2 parts:

Really, any hardware that could actually implement this would more than likely just implement a blend shader. It’s not like any existing hardware could do this, so it would have to be new hardware. And there’s no reason why they’d keep using this weak configurable blend stage when they could just put some shader logic in the ROP.

Also, you forgot about the constant color.

I'd like to have a hack to use separate values for alpha blending and for the actual output value of the alpha channel.

Um, we already have that. That’s far more general-purpose and useful than what you’re asking for.
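That presumably means dual-source blending (ARB_blend_func_extended, core since GL 3.3). A minimal sketch, assuming a fragment shader that declares a second color output at index 1:


/* GLSL side, for reference:
 *   layout(location = 0, index = 0) out vec4 color;        // value stored in the framebuffer
 *   layout(location = 0, index = 1) out vec4 blendWeight;  // value feeding the blend factors
 *
 * C side: blend with the second output's alpha, so the alpha that gets
 * written can differ from the alpha used to blend. */
glEnable(GL_BLEND);
glBlendFunc(GL_SRC1_ALPHA, GL_ONE_MINUS_SRC1_ALPHA);
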

Since blending is still in fixed function hardware this would require a new hardware revision to support, which makes it kinda inappropriate for an API feature suggestion. As Alfonse correctly pointed out, at that point you may as well just have programmable blending. Until then you could emulate it with FBOs (with a performance tradeoff, but which may be preferable to having to deal with the inevitable driver bugs and differences that would very likely arise from moving such a key part of the pipeline over to programmability).
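For reference, the FBO emulation usually amounts to ping-ponging between two color attachments so that the fragment shader can read the previous contents and do arbitrary blend math; a rough sketch (fbo[], colorTex[], numBlendedPasses and drawBlendedPass() are illustrative names, not anything from this thread):


/* "Programmable blending" by ping-ponging between two FBO color buffers:
 * each pass samples the previous pass's result and writes the combined
 * value to the other buffer, so any blend arithmetic is possible. */
int src = 0, dst = 1;
for (int pass = 0; pass < numBlendedPasses; ++pass) {
    glBindFramebuffer(GL_FRAMEBUFFER, fbo[dst]);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, colorTex[src]);  /* previous contents as input            */
    drawBlendedPass(pass);                        /* shader combines texel + new fragment  */
    src ^= 1; dst ^= 1;                           /* swap roles for the next pass          */
}
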

This is orders of magnitude simpler than a shader stage for the ROP. Indeed, with this suggestion, the ROP still does this, and nothing more:


C_s*S op C_d*D

where op is one of add, subtract, reverse subtract, min, or max; S is the output of the fragment shader; D is the current value in the framebuffer; and C_s and C_d are controlled by glBlendEquationSeparateRGBA. All this adds is that the coefficients C_s and C_d for one color channel may be taken from the red, green, or blue values of S and D of a different color channel. We already have that for alpha anyway. Whether or not the ROP's abilities are symmetric with respect to the color channels, I do not know; if someone does know, or has a lead on how to find out, please post it here.
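Just to pin the semantics down, here is an illustrative CPU-side model of the proposed blend (with op fixed to add); the integer selector values standing in for the enums are made up for the example:


/* Illustrative model of the proposal: for each channel i,
 *   result[i] = Cs[i] * S[i] + Cd[i] * D[i]
 * where Cs[i] and Cd[i] may be taken from ANY channel of S or D
 * (e.g. destination red used as the factor for the blue channel). */
typedef struct { float r, g, b, a; } Color;

/* 'sel' stands in for enums like GL_SRC_RED or GL_DST_GREEN. */
static float factor(int sel, Color S, Color D)
{
    switch (sel) {
    case 0: return S.r;  case 1: return S.g;  case 2: return S.b;  case 3: return S.a;
    case 4: return D.r;  case 5: return D.g;  case 6: return D.b;  case 7: return D.a;
    default: return 1.0f;  /* GL_ONE and friends */
    }
}

static Color blend_add(Color S, Color D, const int src_sel[4], const int dst_sel[4])
{
    Color out;
    out.r = factor(src_sel[0], S, D) * S.r + factor(dst_sel[0], S, D) * D.r;
    out.g = factor(src_sel[1], S, D) * S.g + factor(dst_sel[1], S, D) * D.g;
    out.b = factor(src_sel[2], S, D) * S.b + factor(dst_sel[2], S, D) * D.b;
    out.a = factor(src_sel[3], S, D) * S.a + factor(dst_sel[3], S, D) * D.a;
    return out;
}
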

My suggestion is somewhat needed anyways to handle one and two channel textures with blending.

On the subject of placing an entire shader stage at the ROP: I do not think that will ever happen for immediate-mode renderers, unless the shader had some very severe restrictions (no texture lookups, very limited uniform room, no branching, for example). This is because typical hardware likely has a queue of fragments for the ROP to act upon that is filled as the fragment colors are computed. Because the fragments might come to the ROP in a different order than primitive order, the queue is likely more like an array with slots getting filled, and once the first slot (and possibly enough after it as well) is filled, the ROP goes to town on a simple array. Adding an entire shader stage makes this suicide-ish, since parallelizing it to hide latency means multiple pixels need to run the same ROP shader, the number processed has to come in same-sized chunks, and so on. The first requirement is not so bad, but the rest, I think, is pipeline-bubble city and/or another set of pain.

Good point on forgetting about the const-color values too :smiley:

This is orders of magnitude simpler than a shader stage for the ROP.

As you point out, it depends on how complicated this shader stage is. OpenGL doesn’t require a unified shader architecture; it still allows each individual stage to dictate its own limits. So the new stage could be allowed to have no sampler or image lookups, no uniform blocks, very few uniform components at all, and a very limited set of math operations.

Remember: the first fragment shaders were only about 12 opcodes long. Given advances in hardware and such, it wouldn’t be unreasonable to stick a 10 opcode shader into ROPs.

I mean really, what else are they going to do with that extra silicon these days? Make double-precision math faster?

My suggestion is somewhat needed anyways to handle one and two channel textures with blending.

How? It already works just fine; it’s effectively implemented as a write mask of the other two channels. What you’re talking about is far more complex.

Also, you seem to have forgotten that blending is set per output buffer, which is why we have glBlendFunci and such. And glBlendEquation is for setting the overall equation, not the blending factors; that’s glBlendFunc.

I’d think that for a blend shader to work, without the whole thing going to junk, it would require:

[ul]
[li]No branching. I think this because I’d think that an implementation would “collect” a bunch of fragments together for parallelization. The collection may not be very screen-space oriented, so the idea that neighbors are likely on the same branch is not going to hold.[/li]
[li]Nothing that induces latency, thus no texture lookups, no UBOs, and very little uniform room[/li]
[li]Very few opcodes[/li]
[/ul]

But even with all that, I am not totally convinced that it is going to fly well… I suspect that the current ROP jazz is almost for free, whereas the shader stage thing… is going to be messy… and lord help the implementers implementing it, another shader compiler… shudders.

I suspect the bigger wins for now are essentially just more cores. I suspect that right now is a bit of a lull for adding features to GPUs… since the most exciting new things (to me) are vendor specific:

[ul]
[li]NVIDIA Bindless texture[/li]
[li]AMD HSA with unified memory access[/li]
[li]AMD’s sparse texture[/li]
[/ul]

The darker part of my personality is thinking that we might be entering the end of the PC getting graphics features first and fastest… the mobile world, much to my chagrin, seems to be where lots of folks’ concentration is… NVIDIA’s Tegra5 is planned to be Kepler-based for next year… I wonder if NVIDIA is putting its bets on mobile and less on the PC, and if AMD is putting its bets on console and mobile more than PC (though the console business is just a tiny fraction of the PC business, but PC sales are so down now, well…). I really dislike the mobile world of Android, and I dislike GLES2 and, to a lesser extent, GLES3. GLES2 has, in my opinion, been unnecessarily crippled in various features, and GLES3 seemed to take the hatchet to various features that are deadly useful (glBlendFooi, TBOs and gl_ClipDistance being at the top of my rage list).

The issue is like this: given a two-channel texture, there is no destination alpha channel; the texture format is RED_GREEN. So in order to get destination alpha back, you bite the bullet and make a four-channel format, although the green and blue channels are wasted. The easy way out of this would be to have RED_ALPHA and RED_GREEN_ALPHA texture formats… or my suggestion, which gives more power up front anyway.
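For what it’s worth, the “bite the bullet” path today looks roughly like this (target, width and height are illustrative, and the two useful channels are assumed to live in R and A of the RGBA target):


/* Current workaround: promote the two-channel data to RGBA so that a
 * destination alpha exists to blend against; G and B are dead weight. */
glBindTexture(GL_TEXTURE_2D, target);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
/* ...attach 'target' to an FBO and render into it... */

glEnable(GL_BLEND);
glColorMask(GL_TRUE, GL_FALSE, GL_FALSE, GL_TRUE);  /* only R and A carry data  */
glBlendFunc(GL_DST_ALPHA, GL_ONE_MINUS_DST_ALPHA);  /* needs destination alpha  */
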

Oops, I often get the names mixed up between the two, glBlendFunc and glBlendEquation.

I am too anal; I went ahead and made the fixes. Thanks for spotting it.

Any way to change the name of a thread?