Order-independent real transparency can be hard.
At the moment we can use:
- Depth peeling. Slow; artifacts can appear if not enough layers are used.
- Alpha test with MSAA. Partial solution; can’t handle real transparency. Only fits well for vegetation, fences, etc. Requires antialiasing to be enabled.
- Manual (CPU) triangle sorting, BSPs. Slow; CPU-GPU sync problems.
- Additive/subtractive transparency. Order-independent, but only works for emissive particles, glass and black/white smoke effects. Can’t really handle “alpha blend” surfaces.
- Backface-cull tricks. Sort objects by distance, draw back faces first, then front faces, and play with blend modes like DSTALPHA. Tons of artifacts, and slow because you need two separate draw calls per object.
- Tiling. The PowerVR Kyro II processor achieved order-independent transparency by dividing the screen into tiles. See:
http://www.anandtech.com/showdoc.aspx?i=1435&p=3
http://www.aceshardware.com/Spades/read.php?article_id=25000226
http://www.powervr.org.uk/newsitepowervr.htm
Multiple problems with that… a nightmare to implement, as Tim Sweeney said some time ago.
- Geometry shader. Good for particles and for sorting triangles on the GPU. Poor hardware support at the moment. Perhaps slow due to the sorting plus texture/shader-constant changes (not sure, DX10 is not out yet!). Triangle sorting will also fail for some special cases (coplanar triangles, T-junctions, intersecting geometry).
What about using something like a CBuffer? The framebuffer would be divided into transparency layers. When a pixel is written to the framebuffer, it compares the depth and keeps a sorted list for that pixel:
struct Pixel
{
    RGBA   color;
    float  depth;
    Pixel *next, *prev;
};

Pixel *cbuffer[HEIGHT][WIDTH]; //per-pixel lists of pixels sorted back-to-front

void InsertPixel ( Pixel *newPix, const int x, const int y )
{
    //Walk the list until we reach a pixel nearer than the new one
    Pixel *pix = cbuffer[y][x], *prev = 0;
    while ( pix!=0 && newPix->depth<pix->depth )
    {
        prev = pix;
        pix  = pix->next;
    }
    //Link the new pixel between prev and pix
    newPix->prev = prev;
    newPix->next = pix;
    if ( pix!=0 )  pix->prev  = newPix;
    if ( prev!=0 ) prev->next = newPix;
    else           cbuffer[y][x] = newPix;
}
void BlendPixels ()
{
    int x, y;
    Pixel *pix;
    for ( y=0; y<framebuffer.height; ++y )
    {
        for ( x=0; x<framebuffer.width; ++x )
        {
            RGB col(0,0,0);
            pix = cbuffer[y][x];
            while ( pix!=0 )
            {
                //Blend back-to-front with "over" (well, or another blend function...)
                col = col*(1.0f-pix->color.a) + pix->color.rgb*pix->color.a;
                pix = pix->next;
            }
            framebuffer[y][x] = col;
        }
    }
}
In that way we could sort the pixels for true transparency with independent order. I think the main problem is that the GPU is not well prepared to manage pointers or to sort efficiently. On the other hand, this task could be partially parallelized (assigning one triangle to each GPU core or unit, halting only at insertion time). And well… after all it’s not that complicated… 30 lines of C code…
Another problem will be framebuffer occupation in VRAM (something like 5-10x).
The OpenGL interface could be something like:
... draw your opaque primitives...
glEnable(GL_CBUFFER);
...draw your transparent primitives, which will internally call the InsertPixel function...
glDisable(GL_CBUFFER);
SwapBuffers(); //flush opaque primitives, internally call the BlendPixels function and wait for vsync
What do you think? How could we implement a per-pixel sorting algorithm for transparency efficiently here? Could the future blend shader be adapted to use this?