Transparency CBuffer

True order-independent transparency is hard to get right.
At the moment we can use:

  • Depth peeling. Slow, and artifacts appear if not enough layers are used.

  • Alpha test with MSAA. A partial solution that can’t handle real transparency; it only fits well for vegetation, fences, etc., and requires antialiasing to be enabled.

  • Manual (CPU) triangle sorting, BSPs. Slow, with CPU-GPU sync problems.

  • Additive/subtractive transparency. Order independent, but only works for emissive particles, glass and black/white smoke effects. Can’t really handle “alpha blend” surfaces.

  • Backface cull tricks. Sort objects by distance, draw backfaces first, then front faces, and play with blend modes like DSTALPHA. Tons of artifacts, and slow because you need two separate draw calls per object.

  • Tiling. The PowerVR Kyro II processor handled order-independent transparency by dividing the screen into tiles.

See
http://www.anandtech.com/showdoc.aspx?i=1435&p=3
http://www.aceshardware.com/Spades/read.php?article_id=25000226
http://www.powervr.org.uk/newsitepowervr.htm

Multiple problems with that approach… a nightmare to implement, as Tim Sweeney said some time ago.

  • Geometry shader. Good for particles and for sorting triangles in hardware. Poor hardware support at the moment, and perhaps slow due to the sorting plus texture/shader constant changes (not sure, DX10 is not out yet!). Triangle sorting will also fail in some special cases (coplanar triangles, T-junctions, intersecting geometry).

What about using something like a CBuffer? The framebuffer would be divided into transparency layers. When a pixel is written to the framebuffer, it compares depths and keeps a sorted list for that pixel:

struct Pixel
{
    RGBA   color;
    float  depth;
    Pixel *next;
};

Pixel* cbuffer[HEIGHT][WIDTH]; //per-pixel lists sorted back to front (farthest first)

void InsertPixel ( Pixel *newPix, const int x, const int y )
{
   //Walk the list until we reach the first pixel nearer than the new one
   Pixel **link = &cbuffer[y][x];

   while ( *link != 0 && (*link)->depth > newPix->depth )
   {
      link = &(*link)->next;
   }

   //Splice the new pixel in, keeping the list sorted
   newPix->next = *link;
   *link = newPix;
}

void BlendPixels ()
{
   for ( int y=0; y<HEIGHT; ++y )
   {
      for ( int x=0; x<WIDTH; ++x )
      {
         RGB col = framebuffer[y][x]; //start from the opaque background
         for ( Pixel *pix = cbuffer[y][x]; pix != 0; pix = pix->next )
         {
            //back-to-front "over" blend... or any other blend function
            col = col*(1.0f-pix->color.a) + pix->color.rgb*pix->color.a;
         }
         framebuffer[y][x] = col;
      }
   }
}

That way we could sort the pixels for true, order-independent transparency. I think the main problem is that the GPU is not well prepared to manage pointers or to sort efficiently. On the other hand, this task could be partially parallelized (assigning one triangle to each GPU core or unit, halting only at insertion time). And well… after all it’s not that complicated… 30 lines of C code…

Another problem will be framebuffer occupation in VRAM (something like 5-10x).

The OpenGL interface could be something like:

... draw your opaque primitives...
glEnable(GL_CBUFFER);
...draw your transparent primitives, which will internally call the InsertPixel function...
glDisable(GL_CBUFFER);

SwapBuffers(); //flush the opaque primitives, internally call the BlendPixels function and wait for vsync

What do you think? How could we efficiently implement a per-pixel sorting algorithm for transparency here? Could the future blend shader be adapted to use this?

This sounds kinda like the F-Buffer that ATI tinkered with:
http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/
http://hci.stanford.edu/cstr/reports/2005-05.pdf
The trouble with that seemed to be the potential overrun of FIFO buffers and therefore memory, together with the difficulty of providing a clean interface for the user to avoid that situation. Somehow I just can’t see this kind of interface in OpenGL. It just doesn’t feel right.

I agree that a really simple, fast, order-independent (read: automatic) transparency solution would be quite a boon.

By the way, just how “slow” is the peeling approach these days?
http://research.microsoft.com/research/pubs/view.aspx?tr_id=1125
OK, it’s not exactly blistering, but it’s not too shabby for a NV40. The authors themselves conclude that an alternative hardware implementation is probably the way to go, given the trend in current architectures.
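For reference, the peeling idea itself is simple to state. Here is a CPU-side sketch with scalar colors and made-up names (the real thing runs as N render passes, each one keeping the nearest fragment behind the previous pass’s depth):

```cpp
#include <cfloat>
#include <vector>

struct Frag { float depth, color, alpha; };

// Peel one layer: the nearest fragment strictly behind prevDepth.
// Returns nullptr when no layers remain.
const Frag* peel(const std::vector<Frag>& frags, float prevDepth) {
    const Frag* best = nullptr;
    for (const Frag& f : frags)
        if (f.depth > prevDepth && (!best || f.depth < best->depth))
            best = &f;
    return best;
}

// Front-to-back "under" compositing over at most maxLayers peels.
float composite(const std::vector<Frag>& frags, float bg, int maxLayers) {
    float color = 0.0f, transmit = 1.0f, prev = -FLT_MAX;
    for (int i = 0; i < maxLayers; ++i) {
        const Frag* f = peel(frags, prev);
        if (!f) break;
        color    += transmit * f->alpha * f->color; // what this layer contributes
        transmit *= 1.0f - f->alpha;                // what still shows through
        prev = f->depth;
    }
    return color + transmit * bg; // background shows through the remainder
}
```

Note that when maxLayers is smaller than the depth complexity, the farthest surfaces are simply lost, which is exactly the “not enough layers” artifact mentioned in the first post.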

Here are some other ideas (it’ll cost you to read about them):
http://comjnl.oxfordjournals.org/cgi/content/abstract/49/2/201

P.S. Added links.

I am just musing, here, but would you need to use a linked list per pixel? What about having OpenGL maintain sorted primitives that are order dependent (n.b. not all blending IS order dependent) and then render them in the correct order when the user wants?

You still have the issue of buffer overflow, but now OpenGL would have to deal with less data, since it’d be per primitive rather than per fragment.

it’s just an idea

cheers
John

Originally posted by john:
I am just musing, here, but would you need to use a linked list per pixel? What about having OpenGL maintain sorted primitives that are order dependent (n.b. not all blending IS order dependent) and then render them in the correct order when the user wants?

You can easily get cases where there is no definite order of which triangle is in front. (It is easy enough to construct a test case, but I’m not so sure how often it would happen in practice.)

Anyway, with the new programmable shaders and feedback buffers we can probably sort the buffers ourselves.

Nice links, Leghorn, very interesting.

Originally posted by john:
I am just musing, here, but would you need to use a linked list per pixel? What about having OpenGL maintain sorted primitives that are order dependent (n.b. not all blending IS order dependent) and then render them in the correct order when the user wants?

Yep, typical games use per-object sorting with backface cull tricks or a separate BSP for the transparent geometry. However, when I play UT2003 or Everquest I can see artifacts in the grass and trees, and they are quite noticeable and annoying.
Others just use additive/subtractive transparency, alpha test or MSAA.

Per-triangle sorting can achieve decent results, but it is still not perfect. Some time ago I wrote a 3D viewer that sorted the triangles. I could not sort more than 1000 triangles on an Athlon64 3500 without a considerable FPS reduction 8( even though I used a really fast algorithm, batching and a render state proxy. Ok, with the new geometry shader this could be accelerated, though.
But I got tons of small artifacts with strange semi-coplanar triangles, T-junctions and intersecting triangles… I think per-triangle sorting is only good for billboards and closed meshes without strange triangles.

There is another problem… To sort well you can’t use only the triangle center… You need the 3 vertices + the center, which hurts performance even more. You also need a way to reduce render state/texture changes, because you are probably going to change texture and shader a lot… With DX9 the draw call overhead was a nightmare.
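As a rough illustration of the per-triangle approach described above, here is a minimal back-to-front sort keyed on the farthest vertex rather than just the centroid (hypothetical names; as noted, this still breaks on intersecting and coplanar triangles):

```cpp
#include <algorithm>
#include <vector>

// One triangle's view-space depths at its three vertices.
struct Tri {
    float z[3];
    // Sort key: the farthest vertex, not just the centroid, to reduce
    // (not eliminate) popping between large overlapping triangles.
    float maxDepth() const { return std::max(z[0], std::max(z[1], z[2])); }
};

// Back-to-front sort for painter's-algorithm style transparency.
void sortBackToFront(std::vector<Tri>& tris) {
    std::sort(tris.begin(), tris.end(),
              [](const Tri& a, const Tri& b) { return a.maxDepth() > b.maxDepth(); });
}
```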

So yep, I think we need something like a linked list per pixel.

With something like a blend shader and MRTs you could implement order independent transparency yourself up to a certain number of layers.
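A CPU sketch of that fixed-layer idea, in the spirit of the k-buffer: each pixel keeps only its K nearest fragments, maintained sorted by a small insertion sort (K, the names and the scalar color are all assumptions for illustration):

```cpp
const int K = 4; // number of layers the MRTs could hold

struct Layer { float depth; float color; };

struct KPixel {
    Layer layers[K];
    int count = 0;

    // Insert a fragment, keeping the K nearest ones sorted front to back.
    void insert(float depth, float color) {
        if (count == K && depth >= layers[K - 1].depth)
            return; // farther than everything we keep: reject
        int i = (count < K) ? count : K - 1; // full buffer drops the farthest
        while (i > 0 && layers[i - 1].depth > depth) {
            layers[i] = layers[i - 1]; // shift farther fragments back
            --i;
        }
        layers[i] = { depth, color };
        if (count < K) ++count;
    }
};
```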

Originally posted by santyhammer:
So yep, I think we need something like a linked list per pixel.
Yes, such a thing would be nice, if it were as simple as that. From what I read about the F-buffer, and from my own limited research: yes, using multiple layers of pixels (color and depth) one can achieve order-independent transparency at the cost of some memory (a lot of it, to be precise).
However, it only works for pure alpha transparency (unless you have a specific blend buffer for each layer, that is), and the hardware would have to be designed with this in mind.

Although there are problems with it, I do think something like this can be implemented in hardware in due time, at least before they start switching to raytracing (which will solve this problem once and for all).

Originally posted by Humus:
With something like a blend shader and MRTs you could implement order independent transparency yourself up to a certain number of layers.
That sounds interesting! But we need the blend shader 8(

However, I think the linked list approach will consume less memory, because the layered approach allocates all the pixels up front while the list allocates an entry only when it’s needed.

Also, I’m not sure a do-it-yourself approach would be a good idea. I think an automatic mechanism behind glEnable(GL_CBUFFER) could be easier and powerful enough… or perhaps not!

You can easily get cases where there is no definite order of which triangle is in front. (It is easy enough to construct a test case, but I’m not so sure how often it would happen in practice.)
Sorting based on primitive is, after all, the painter’s algorithm and so of course this problem comes up. It’s why z-buffers are used instead; but z-buffers only became practical when using that much memory on a frame-buffer wasn’t an issue.

However, implementing the equivalent of z-buffering for order-independent transparency is, arguably, not a trivial extension to z-buffers, and so my proposal was to fall back to the painter’s algorithm.

But the trick is that this is a problem only for objects that have order-dependent transparency. Not all transparency is order dependent (e.g. setting the destination coefficient to zero), and the entire scene is not order dependent, either. My argument was to partition the incoming geometry into two sets: the order-dependent set that’s stored for later, and the order-INdependent set that is rendered as normal. The order-dependent set could then be sorted and rendered (i.e. still using the z-buffer for transparent fragments) just before the buffer is swapped.

I argue that this would get you 95% of the functionality you want from this system without too much of a memory burden. The driver has to separate and sort triangles, and there is a potential problem with mutually overlapping triangles. But the mutual overlap isn’t a problem for occlusion, since you’d still use the z-buffer: it is ONLY a problem for computing the final colour because of ordering difficulties, and this is potentially true for only a small part of the scene.

I argue that it’s a reasonable compromise.

The linked-list-per-pixel approach was used in REYES in the mid-1980s - try Googling for REYES “A-Buffer”. That was software rather than hardware, of course.

Yep, a pointerless approach would probably be better for hardware in terms of price and/or performance.

The R-buffer (c. 2001) provides for a pointerless implementation of the A-buffer, and the authors claim certain advantages over the F-buffer:
http://portal.acm.org/citation.cfm?id=383529&dl=ACM&coll=GUIDE

P.S. There be a lot of buffers here!

Here’s an extension (c. 2003) to the R-buffer that opts for a geometric rather than shaded fragment representation:
http://www.hybrid.fi/main/research/delaystreams/delaystreams.pdf

Add another one: The k-buffer :

No x-buffer? That will sound like p0rnz!

Originally posted by santyhammer:
No x-buffer? That will sound like p0rnz!
The xxx-buffer, a buffer that only stores red, flesh tones and obnoxious music. :wink:

Anyway, I don’t think any of these buffers will make it into hardware; they are too specialized.
If anything like this makes it into hardware, it will be the most versatile yet simplest one of them all.

Possibly something that can be implemented using only the blend shader and a heap of MRTs.

Aka the k-buffer :wink:

Yep, MRTs and a blend stage of sorts (depth peeling). The link I posted up there somewhere demonstrates the idea on yesterday’s hardware using MRTs and a fragment shader (it’s in HLSL, though).

For transparency the k-buffer seems to require some object-space sorting, and it is still dependent on k and on depth complexity. But I like it.

Originally posted by Leghorn:
Yep, MRTs and a blend stage of sorts (depth peeling). The link I posted up there somewhere demonstrates the idea on yesterday’s hardware using MRTs and a fragment shader (it’s in HLSL, though).
I don’t think I would call it depth peeling; that would require you to render the same scene multiple times, and that’s bad because there are so many simpler methods that only require you to render it once. I think it’s simple to implement, but I do need a blend shader that can freely read and write the frame/depth/stencil buffers plus MRTs.

Originally posted by Leghorn:

For transparency the k-Buffer seems to require some object space sorting.

True, but it’s a minor task compared to sorting and splitting (if needed) every transparent polygon.

I like all of them, 3D graphics algorithms named like Rebel Alliance space fighters.