Z-buffer and occlusion query improvements

I have some suggestions to improve the Z-buffer:

1) Store the second-nearest depth. Proposed by Woo/Wang/Molnar around 20 years ago and used in Pixar's RenderMan 11 (midpoint shadow filtering). Very useful for shadows, transparency, translucency and subsurface scattering. I bet it would be very easy to implement in HW and could be very nice.

See this:
http://www.renderman.org/RMR/Books/infbeyond.pdf.gz
(search for “midpoint” and “zero bias shadow”)

http://www.gamedev.net/community/forums/topic.asp?topic_id=266373

(it looks like Carmack wants to use this for Quake Wars)

Does anybody know how this can be done with current SM3.0 hardware? I tried everything, but floating-point epsilon errors prevent me from doing it well in the second fragment-shader pass…
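
For reference, once both depths are available the midpoint trick itself is trivial. A minimal sketch of the shadow test, assuming a shadow map that already stores the average of the two nearest depths seen from the light (the sampler and varying names here are just illustrative):

```glsl
// Midpoint shadow test sketch. Assumes the shadow map stores, per texel,
// 0.5 * (firstDepth + secondDepth) as seen from the light.
uniform sampler2D midpointShadowMap;
varying vec4 lightSpacePos;      // fragment position in light clip space

float shadowTerm()
{
    vec3 proj = lightSpacePos.xyz / lightSpacePos.w;   // to NDC
    proj = proj * 0.5 + 0.5;                           // to [0,1] texture/depth range
    float midpoint = texture2D(midpointShadowMap, proj.xy).r;
    // Comparing against the midpoint between the two nearest surfaces needs
    // no depth bias for closed geometry.
    return (proj.z <= midpoint) ? 1.0 : 0.0;
}
```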

2) Irregular Z-buffer. Instead of storing all the samples at equally spaced positions in the depth buffer, allow them to be stored at irregular positions.

See this:

http://www.cs.utexas.edu/ftp/pub/techreports/tr04-09.pdf
http://www.tacc.utexas.edu/~cburns/papers/izb-tog.pdf
http://www.tml.hut.fi/~timo/publications/aila2004egsr_paper.pdf
http://en.wikipedia.org/wiki/Irregular_Z-buffer

This could be used for shadows too.

3) 32-bit stencil. An 8-bit stencil can be insufficient. With 32 bits (which could give us a 64-bit D32S32 depth/stencil buffer) you could store pixel IDs for a G-buffer, or whatever else, controlling their values on Z-fail/Z-pass.

4) Integration of the Z-buffer with the future “blend shader”. If a blend shader is coming, please allow us to read/write the Z-buffer. Kill the “can’t read the Z-buffer because the FBO is in use” restriction (I know, I know… performance problems… but some things, like per-pixel sorting in one pass or the second-depth idea mentioned above, require read AND write access…).

5) Depth cubemaps. I know they are coming… but I want them with a FETCH NxN feature, basically like ATI’s Fetch4 but bigger. In one textureCube call I want to be able to get the 5x5 block of depths surrounding a specific texel (for example, to perform fast PCF). Of course I could do this with multiple textureCube calls, but the fetch version is better because it wouldn’t have to re-divide by the major-axis coordinate to pick the face, since that was already computed… Also let the shadow comparison be done in HW, NVIDIA-style.

6) Dynamic in-shader depth query. A strange concept, this… Imagine I want to see whether a pixel is in shadow or not. Basically I want to execute an occlusion query for the current pixel as seen from a specific light position. So, inside the fragment shader, it uses a 1x1-pixel FBO. Then it puts the camera at the light and projects the pixel into camera space. Then it creates a 1D-line Z-buffer with all the in-line-of-sight triangles (using modified guard-band clipping to know which triangles are hit by the ray)… Basically it does an IsPointVisibleFrom(3DWorldPoint, 3DWorldOtherPoint) using a 1-pixel Z-buffer. What I want is a GLSL instruction to launch occlusion queries from inside the pixel shader without having to generate a shadow map. I want it all in one pass/rendering, for speed, and to avoid writing the light visibility function into a texture that suffers aliasing.

We could do pseudo-raytracing with this, but it only gives us a boolean visibility function via occlusion queries. Speed is going to be a problem, but consider that we aren’t going to call this massively in the fragment shader (8/16 times max, and only inside the shadow dynamic branch).

I know, it sounds scary, but it could definitively kill shadow-map aliasing and produce amazing pseudo-raytraced shadows.

thx

Then it creates a 1D-line Z-buffer with all the in-line-of-sight triangles (using modified guard-band clipping to know which triangles are hit by the ray)
Whoa, false start.

Triangles aren’t triangles in the fragment shader. All that information has been lost. Or, equally likely, it has not yet been transmitted to the GPU.

In an attempt to improve the signal-to-noise ratio on this forum, and thus actually make it useful to people who do real hardware development, you should keep to yourself ideas that are fundamentally unimplementable, like those that require turning a scanline renderer into a scene-graph renderer in order to implement.

Just commenting on 3, nothing more…

OpenGL already supports 32-bit stencil buffers. It supports any number of bits in the stencil buffer, or any other buffer for that matter. Actually providing that number of bits is the job of the implementation, which is a different issue and not about OpenGL itself. So for larger stencil buffers, ask the hardware companies; OpenGL is already capable of handling it.

1) Store second nearest depth

Does anybody know how this can be done with current SM3.0 hardware?
First pass:
Render scene - render depth to texture - you get nearest z
Second pass:
Render scene - discard pixels whose Z is less than or equal to the one stored in the texture - you get the second-nearest Z
You can repeat second pass if you need more Z values.
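
A minimal sketch of that second pass in GLSL, assuming the first pass wrote its depth into a screen-sized texture called firstDepth (the epsilon uniform is just one common way to absorb the precision problems mentioned earlier in the thread):

```glsl
// Second pass fragment shader: peel away the nearest layer (sketch only).
uniform sampler2D firstDepth;   // depth of the nearest surface, from pass 1
uniform vec2 screenSize;        // viewport size in pixels
uniform float epsilon;          // small offset to absorb float precision errors

void main()
{
    float nearest = texture2D(firstDepth, gl_FragCoord.xy / screenSize).r;

    // Discard everything at or in front of the first layer; whatever survives
    // the regular depth test afterwards is the second-nearest surface.
    if (gl_FragCoord.z <= nearest + epsilon)
        discard;

    gl_FragColor = vec4(gl_FragCoord.z);   // or let the FBO's depth attachment keep it
}
```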

2) Irregular Z-buffer
When rendering an image you test pixels on the screen, which are equally spaced, so their Z-buffer entries must be equally spaced too.
So this would apply only to shadow maps, and there are ways of achieving it cheaply on current hardware: you only need to map vertex coordinates from regular space to your desired space. The better tessellation you have, the better the results will be.
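
For example, a minimal sketch of a shadow-map pass vertex shader, assuming well-tessellated geometry; the radial warp used here is only an illustration of “mapping to your desired space”, not any particular published scheme:

```glsl
// Shadow-map pass vertex shader with a simple non-linear warp (illustrative sketch only).
uniform float warpExponent;   // 1.0 = ordinary shadow map; values in (0,1) spend
                              // more shadow-map resolution near the frustum center

void main()
{
    vec4 clip = gl_ModelViewProjectionMatrix * gl_Vertex;   // light-space clip coordinates

    // Warp in NDC: remap the radial distance, then go back to clip space.
    vec2 ndc = clip.xy / clip.w;
    float r  = length(ndc);
    if (r > 0.0)
        ndc *= pow(r, warpExponent) / r;

    gl_Position = vec4(ndc * clip.w, clip.z, clip.w);
}
```

The same warp of course has to be applied to the lookup coordinates when the shadow map is sampled in the lighting pass.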

4) Integration of the Z-buffer with the future “blend shader”

some things like per-pixel sorting in one pass
Read/write access to the Z-buffer will not be enough to implement a per-pixel sort. The values you wrote to a pixel are not remembered - only the final blended value is stored. If at some point you want to render a pixel in the middle of those values, you would have to separate the values behind and in front of it (‘unblend’ ?!?!). Impossible.

or the second-depth idea mentioned above, require read AND write access
It does not require that. See my reply to 1).

5) Depth cubemaps

Of course I could do this with multiple textureCube calls, but the fetch version is better because it wouldn’t have to re-divide by the major-axis coordinate to pick the face, since that was already computed
What about edges and corners? I believe the implementation would have to re-divide there. If not, then it would have to implement some texture-coordinate transformation from one face to another.
Note that re-dividing is not much of a problem - fragment shaders are usually powerful enough for a few more operations. Otherwise we would always be limited by their performance, and it’s usually the memory bandwidth that limits us (fragment processing is done in parallel on multiple processors; there is only one memory).
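
For comparison, this is roughly what the manual version looks like today with a plain samplerCube storing the light-to-occluder distance in its red channel; the offsets along two axes perpendicular to the lookup direction are exactly the work a Fetch-NxN or hardware PCF would take over (a 3x3 kernel for brevity, names illustrative):

```glsl
// Manual 3x3 PCF around a cubemap direction (sketch only).
uniform samplerCube lightDepth;   // light-to-occluder distance stored in .r
uniform float filterRadius;       // kernel size, as an offset relative to the unit direction

float shadowPCF(vec3 fragToLight, float fragDist)
{
    // Two axes perpendicular to the lookup direction, used to offset the vector.
    vec3 dir = normalize(fragToLight);
    vec3 up  = abs(dir.y) < 0.99 ? vec3(0.0, 1.0, 0.0) : vec3(1.0, 0.0, 0.0);
    vec3 t   = normalize(cross(up, dir));
    vec3 b   = cross(dir, t);

    float lit = 0.0;
    for (int i = -1; i <= 1; i++)
        for (int j = -1; j <= 1; j++)
        {
            vec3 offDir = dir + (float(i) * t + float(j) * b) * filterRadius;
            float occluder = textureCube(lightDepth, offDir).r;
            lit += (fragDist <= occluder) ? 1.0 : 0.0;   // still needs a bias in practice
        }
    return lit / 9.0;
}
```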

6) Dynamic in-shader depth query
You propose advanced solutions but seem to be lacking fundamental knowledge.

Yeah, I’d love to see GPUs doing pure raytracing instead of rendering - we would just need to pass geometry, materials and shaders to the GPU and not bother with shadows, reflections, refractions and light scattering. I think, however, that we’ll just have to wait a bit longer for these to hit the stores :wink:

Originally posted by santyhammer:

We could do pseudo-raytracing with this

I’d rather do real ray tracing; it shouldn’t be that hard to do - just pair the texture memory with thousands of small fixed-function ray-intersection cores that return the intersections to the fragment shader when called.

It might not be pure ray tracing or that powerful in the beginning, but it’s a start and in theory it should be powerful enough to replace stencil shadows.

32-bit stencil - yes please, and also 64-bit depth, if I might be so bold.

Integration of the Z-buffer with the future “blend shader” - yeah, the blend shader should have read and write privileges to all data (color, depth, stencil and such) in all the framebuffers, FBOs, renderbuffers and MRTs to do its job correctly (that’s what it’s for), but only for the current pixel/fragment.

Originally posted by k_szczech:
First pass:
Render scene - render depth to texture - you get nearest z
Second pass:
Render scene - discard pixels whose Z is less than or equal to the one stored in the texture - you get the second-nearest Z
You can repeat second pass if you need more Z values.

I have floating-point precision problems with exactly that second pass (the less-than-or-equal depth comparison), as I mentioned above.

Originally posted by Bob:
OpenGL already supports 32-bit stencil buffers

Oh yep, true. Well, I was referring to getting real HW support for it, hehe.

Originally posted by Korval:
Triangles aren’t triangles in the fragment shader. All that information has been lost.

I agree with you. However, SM4.0 gives you a polygonID and a primitiveID. The pixel shader could use those to iterate internally over all the primitive’s triangles and test whether a ray hits the current pixel’s triangleID. The iteration could be done using different methods, like the ones I proposed (a hierarchically stored vertex buffer with BSPs, octrees, etc., an internal geometry shader that uses the triangleID/polygonID to test triangle-ray hits, a 1D one-pixel dynamic occlusion query from the light, etc.).

I think in the upcoming years we need to start touching realtime raytracing… so let’s propose a basic method and start to think about this. And remember… AGEIA already DOES this in HW (see the ClosestHit function in the SDK and the shadow terrain/raycast example), so the HW technology to do this is already on the market…

Notice too that the realtime shading languages just copied and adapted the RenderMan shading language… and RenderMan has the “gather” function to do this…
Let’s copy it WELL :stuck_out_tongue:

I don’t see why we can’t get a GLSL instruction for shadows… after all, we are getting physics on the GPU, so why not a basic physics/graphics function like IsPointVisibleFrom inside the fragment shader? Again, I understand all your scepticism if you look at this with your current-hardware eyes… but, please, remember the title of this forum.
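
Just to make the request concrete, this is roughly how I imagine using it inside the shadow branch; IsPointVisibleFrom does not exist anywhere, and its name, signature and behaviour here are completely made up by me:

```glsl
// Hypothetical usage sketch -- IsPointVisibleFrom() is NOT a real GLSL function,
// it only illustrates the kind of built-in I am asking for.
uniform vec3 lightPos;
varying vec3 worldPos;
varying vec3 worldNormal;

void main()
{
    vec3 color = vec3(1.0);   // stand-in for the usual material shading

    float nDotL = dot(normalize(worldNormal), normalize(lightPos - worldPos));
    if (nDotL > 0.0)
    {
        // One boolean visibility query instead of a shadow-map lookup:
        // no shadow texture, so no texture aliasing and no bias.
        bool lit = IsPointVisibleFrom(worldPos, lightPos);
        color *= lit ? nDotL : 0.0;
    }
    else
    {
        color = vec3(0.0);
    }

    gl_FragColor = vec4(color, 1.0);
}
```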

Originally posted by zeoverlord:
Yeah, I’d love to see GPUs doing pure raytracing instead of rendering - we would just need to pass geometry,

Oh well, I just proposed another method to do raytracing, following my previous glRaycast proposal, using the existing occlusion queries and a 1-pixel Z-buffer associated with the light (I only need to test one pixel, so it’s really a bit different from normal queries).

Again, I don’t wanna change the current raster-triangle paradigm, I just need some functions to do 3-16 raytraced shadow samples without texture aliasing. I think we aren’t prepared yet for pure raytracing solutions… so let’s test some basic raytracing instructions in the fragment shader before jumping into the big pool!

Originally posted by santyhammer:

Again, I don’t wanna change the current raster-triangle paradigm, I just need some functions to do 3-16 raytraced shadow samples without texture aliasing. I think we aren’t prepared yet for pure raytracing solutions… so let’s test some basic raytracing instructions in the fragment shader before jumping into the big pool!

I don’t think we need to remove or change the raster-triangle paradigm just yet, just add high-performance ray testing to it, among other things.
Once that is done and the technology has matured a bit, new APIs will take over, and when everything uses ray tracing, one could drop the raster-triangle paradigm.

And on top of this, ray testing can be used for GPGPU physics.

Hey! I was thinking… perhaps CUDA/CTM could interact with the fragment shaders to perform these ray tests! Unfortunately there is an NDA… and the public papers are too abstract.

If they cannot interact with fragment shaders… perhaps we could mix in CUDA/CTM raytraced shadows and blend them in the pixel shader? Interesting; I can’t wait to see the final versions finished.

However, SM4.0 gives you a polygonID and a primitiveID. The pixel shader could use those to iterate internally over all the primitive’s triangles and test whether a ray hits the current pixel’s triangleID.
False start. Again.

polygonID and primitiveID are just numbers. They do not imply that the fragment shader has access to some titanic array of primitives and polygons that it can iterate over.

And even if it did, they would only be for the current rendering object, not the scene as a whole. In order to do any kind of raytracing, rasterization cannot start until all rendering commands are completed and every triangle has been through vertex (and geometry) processing. And no modern graphics card is designed in any way, shape, or form to do that.

And even if they did, being able to iterate over all polygons in the world is far from sufficient for performant raytracing. That requires the use of spatial partitioning schemes to limit the number of ray-object intersection tests that go on. Which is an entirely new concept that GPUs don’t even come close to supporting.

Hey, I’m ready for 100 years from now, yesterday.

Let’s get this show on the road! :slight_smile:

Originally posted by Korval:
That requires the use of spatial partitioning schemes to limit the number of ray-object intersection tests that go on. Which is an entirely new concept that GPUs don’t even come close to supporting.
Yep, that is what PhysX does with its cooked meshes. It stores the meshes using a hierarchical kd-tree, as I suggested some time ago for VBOs.

Btw, the PhysX chip has barely 125M transistors and can do millions of ray-triangle tests per second (the shadows demo runs at 500 FPS with tons of boxes, a terrain, etc.).

http://www.xbitlabs.com/articles/video/display/ageia-physx_2.html

Photos for the people that haven’t seen what I’m talking about:

So, if GPUs don’t wanna do that, perfect… but it is absolutely achievable in hardware. You can download the SDK from the AGEIA web page if you don’t trust me (but you need the card to run it in HW mode; otherwise it uses SW).

Also, I said we aren’t prepared yet for true raytracing… All I want, for now, is some kind of “closestHit” in the pixel shader, like RenderMan does. Like AGEIA PhysX does. Like CUDA/CTM are going to do (well, this is speculative). Not so hard, I bet.

These are the shadows I’m working on, captured directly from the xNormal viewer in realtime:

(The DX10 ones are better, but I cannot post them because I’m under some NDAs; they are raytraced and the closestHit function would be nice.)

Btw, I can’t get the second-nearest depth to work here due to the floating-point errors I mentioned, so it currently uses a bias, because the cloth on the sword makes horrible artifacts with the back/front-face approach.

I need 3-10 closestHit calls inside the shadow branch to do this better. If you don’t like the raytracing, think about the other methods I proposed (the dynamic occlusion queries, for example), but give me an IsPointVisibleFromPoint() inside the fragment shader, because I need it for aliasing-and-bias-free penumbra soft shadows.

Once we have this running at decent speed, then, and only then, could we think about doing real Monte Carlo and real raytracing. Then GI with photon distribution… but that’s the far future. We need to start with something simple, like the thing I proposed some time ago.

Every physics engine (ODE, Novodex, Newton) has routines to ray-test a triangle, do bbox/sphere hits, bbox/mesh collisions, etc… without these routines their physics would be… interesting…

Originally posted by Korval:
polygonID and primitiveID are just numbers. They do not imply that the fragment shader has access to some titanic array of primitives and polygons that it can iterate over.

And even if it did, they would only be for the current rendering object, not the scene as a whole. In order to do any kind of raytracing, rasterization cannot start until all rendering commands are completed and every triangle has been through vertex (and geometry) processing. And no modern graphics card is designed in any way, shape, or form to do that.
That’s assuming you’re testing against the data you are currently rendering with, which is stupid.
The process would be similar to this:

1. Upload a bunch of VBOs to memory (or create them using geometry shaders).

2. Bind them for use as ray-test data; this would in effect convert polygons to 4 plane equations + a primitive ID and move that data to the ray-test processor (which could be a separate chip).

3. While rendering, a ray-test call can be issued by the fragment shader, which would then send the ray data (basically a vertex and a normal) to the ray-test chip.

4. The chip processes the ray on all its processors (and there might be tens of thousands of them, since they are extremely small) at the same time (each sub-processor may store and test up to 100 polys).

5. The result is then returned as a series of lengths and poly IDs and evaluated by the GPU, which finally returns the evaluated intersection fragment data (or just the distance at which it was intersected) to the fragment shader for further processing.

From a hardware point of view it’s not totally unlike reading from a texture in the texture memory, only you get a lot more data in return.
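
To make the shader-side interface of steps 3-5 concrete, something like this hypothetical GLSL declaration is what I have in mind; rayTest, RayHit and the idea of “currently bound ray-test data” are all invented here, and the return is simplified to just the closest hit:

```glsl
// Hypothetical interface sketch -- none of this exists in GLSL today.
struct RayHit
{
    float dist;     // distance along the ray to the closest hit, negative if nothing was hit
    int   primID;   // primitive ID of the intersected polygon
};

// Issued from the fragment shader; the driver would forward origin/direction/maxDist
// to the ray-test unit, which tests them against the currently bound ray-test data.
RayHit rayTest(vec3 origin, vec3 direction, float maxDist);

float shadowTerm(vec3 worldPos, vec3 lightPos)
{
    vec3 toLight = lightPos - worldPos;
    RayHit hit = rayTest(worldPos, normalize(toLight), length(toLight));
    return (hit.dist < 0.0) ? 1.0 : 0.0;   // lit if nothing blocks the segment to the light
}
```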

Originally posted by zeoverlord:

1. Upload a bunch of VBOs to memory (or create them using geometry shaders).

2. Bind them for use as ray-test data; this would in effect convert polygons to 4 plane equations + a primitive ID and move that data to the ray-test processor (which could be a separate chip).

Yes, store a normal VBO + some kind of hierarchical structure (octree, BSP, whatever) with its vertices + mesh AABB + primitiveID, exactly like PhysX does.

3. While rendering, a ray-test call can be issued by the fragment shader, which would then send the ray data (basically a vertex and a normal) to the ray-test chip.

Yep: ray origin, ray direction and maximum ray distance to optimize. Only triangles in front of the ray origin should be tested, to maximize performance. If you need double-sided rays, manually fire the same ray with the direction inverted.

4. The chip processes the ray on all its processors (and there might be tens of thousands of them, since they are extremely small) at the same time (each sub-processor may store and test up to 100 polys).

Yep, you can parallelize that. For extra performance it would be good to fire 4 or more rays at the same time (for area lights), so it can be parallelized even further.

Notice too that game scenes usually don’t contain more than 15k visible triangles, so this task is not very slow with a hierarchical spatial structure. If we can process 4 rays in one call, I could use only 2 calls for the shadows in the screenshot (yes, it only uses 8 samples) and, again, only in the shadow dynamic branch, so the test would only be done for a few pixels.

The process is simple. For all the meshes visible from the light (it could use occlusion queries internally to find them, btw, or I could do this in SW and pass them to the ray chip manually), iterate over each mesh AABB. If the ray collides with the bbox, then recurse hierarchically into its triangles until a triangle hit is found.
The IHV can decide the best spatial structure (kd-tree, octree, grids, etc.). For dynamic and morphing meshes they could perform a mesh-AABB test followed by brute-force ray-triangle tests, calling the geometry shader internally from the fragment shader (hmm, unified shaders could help here).
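
The per-ray math itself is standard; here is a minimal GLSL sketch of the two tests involved, a slab ray-AABB test and a Möller-Trumbore ray-triangle test, with the hierarchical traversal around them left to the IHV:

```glsl
// Ray-AABB slab test: true if the ray hits the box within [0, maxDist].
bool rayAABB(vec3 orig, vec3 invDir, vec3 boxMin, vec3 boxMax, float maxDist)
{
    vec3 t0 = (boxMin - orig) * invDir;
    vec3 t1 = (boxMax - orig) * invDir;
    vec3 tMin = min(t0, t1);
    vec3 tMax = max(t0, t1);
    float tNear = max(max(tMin.x, tMin.y), tMin.z);
    float tFar  = min(min(tMax.x, tMax.y), tMax.z);
    return tNear <= tFar && tFar >= 0.0 && tNear <= maxDist;
}

// Ray-triangle intersection (Moller-Trumbore): hit distance, or -1.0 on a miss.
float rayTriangle(vec3 orig, vec3 dir, vec3 v0, vec3 v1, vec3 v2)
{
    vec3 e1 = v1 - v0;
    vec3 e2 = v2 - v0;
    vec3 p  = cross(dir, e2);
    float det = dot(e1, p);
    if (abs(det) < 1e-6) return -1.0;              // ray parallel to the triangle
    float invDet = 1.0 / det;
    vec3 s = orig - v0;
    float u = dot(s, p) * invDet;
    if (u < 0.0 || u > 1.0) return -1.0;
    vec3 q = cross(s, e1);
    float v = dot(dir, q) * invDet;
    if (v < 0.0 || u + v > 1.0) return -1.0;
    float dist = dot(e2, q) * invDet;
    return (dist > 0.0) ? dist : -1.0;
}
```

The traversal would call rayAABB on each node’s bounds and rayTriangle only on the leaves that survive.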

5. The result is then returned as a series of lengths and poly IDs and evaluated by the GPU, which finally returns the evaluated intersection fragment data (or just the distance at which it was intersected) to the fragment shader for further processing.

If they only return a “yep, the point is visible from the light”, that will be more than enough for me. Returning only TRUE or FALSE, like the old occlusion queries, can save tons of bandwidth. If you want to return barycentric coordinates, primitiveID, triangleID, normal, etc… even better!

Good, zeo caught the idea well.
Korval is right though… this will need a small change in GPU design… but I think it is achievable (again, the AGEIA chip, with only 125M transistors, can do this and much more).

Now the good news… with the 2nd-depth Z-buffer and this basic closestHit() inside the fragment shader, our shadows and SSS will be IMPRESSIVE. Notice too that this kind of shadow does not require an extra pass like shadow maps, and has neither texture aliasing (there is no texture) nor biasing problems (use the polygonID to discard incorrect self-shadowing).