# The Industry's Foundation for High Performance Graphics

#### from games to virtual reality, mobile phones to supercomputers

1. Yep, the approach was so wrong I deleted the post :P At any rate, look at the newer post using the geometry shader. That is the way to go, but the shader will need some love to work fast.

2. Ahh, sighs, there were a bug or two in that geometry shader version. Here it is again, hopefully with the bugs gone.

Code :
```glsl
#version 150

layout(triangles) in;
/* worst case: a triangle clipped by 6 planes has at most 9 vertices,
   i.e. 7 fan triangles = 21 emitted vertices */
layout(triangle_strip, max_vertices = 21) out;

/* per-vertex clip divisors fed from the vertex shader */
in vec3 DivPosition[];

int count;
vec3 fan_positions[9];
vec3 fan_DivPositions[9];

float current_clip[9];

/* scratch space for the polygon produced by one clipping pass */
vec3 tmp_positions[9];
vec3 tmp_DivPositions[9];

/*
solve for t so that (1-t)*v0 + t*v1 = 0,
i.e. the mix() weight at which the clip expression crosses zero
*/
float
compute_clip_location(in float v0, in float v1)
{
  return v0/(v0-v1);
}

/*
there are 6 clipping planes:

  -DivPosition.? <= gl_Position.? <= DivPosition.?   for ?=x,y,z

we codify each of them as

  clip(sgn,?) = sgn*gl_Position.? + DivPosition.? >= 0,   sgn=-1,1  ?=x,y,z

store that value for each active vertex of the current fan
*/
#define compute_current_clip(sgn, F) \
  do { \
    for(int i=0; i<count; ++i) { \
      current_clip[i] = sgn*fan_positions[i].F + fan_DivPositions[i].F; \
    } \
  } while(false)

/*
clip the current fan against the plane whose per-vertex values sit in
current_clip[]: walk every edge (including the closing edge back to
vertex 0), keep the inside vertices and insert an intersection vertex
wherever the sign of current_clip flips (Sutherland-Hodgman)
*/
void
clip_against_current(void)
{
  int new_count = 0;

  for(int i0=0; i0<count; ++i0)
  {
    int i1 = (i0+1==count) ? 0 : i0+1;
    bool inside0 = (current_clip[i0] >= 0.0);
    bool inside1 = (current_clip[i1] >= 0.0);

    if(inside0)
    {
      tmp_positions[new_count] = fan_positions[i0];
      tmp_DivPositions[new_count] = fan_DivPositions[i0];
      /* if you have interpolates, copy them here too for later abuse */
      ++new_count;
    }

    if(inside0 != inside1)
    {
      float t = compute_clip_location(current_clip[i0], current_clip[i1]);
      tmp_positions[new_count] = mix(fan_positions[i0], fan_positions[i1], t);
      tmp_DivPositions[new_count] = mix(fan_DivPositions[i0], fan_DivPositions[i1], t);
      /*
        if you have interpolates, save them here too:
        "tmp_fooInterpolate[new_count] = mix(fan_fooInterpolate[i0], fan_fooInterpolate[i1], t);"
      */
      ++new_count;
    }
  }

  for(int i=0; i<new_count; ++i)
  {
    fan_positions[i] = tmp_positions[i];
    fan_DivPositions[i] = tmp_DivPositions[i];
  }
  count = new_count;
}

#define do_it(sgn, F) \
  do { \
    if(count>0) \
    { \
      compute_current_clip(sgn, F); \
      clip_against_current(); \
    } \
  } while(false)

void
main()
{
  count = 3;
  for(int i=0; i<3; ++i)
  {
    fan_positions[i] = gl_in[i].gl_Position.xyz;
    fan_DivPositions[i] = DivPosition[i];
  }

  /*
  clip the triangle to the six clipping equations
  */
  do_it(1.0, x);
  do_it(-1.0, x);
  do_it(1.0, y);
  do_it(-1.0, y);
  do_it(1.0, z);
  do_it(-1.0, z);

  /* now send out the clipped polygon as a triangle fan */
  for(int i=2; i<count; ++i)
  {
    /* the division here is component-wise: the custom perspective divide */
    gl_Position = vec4(fan_positions[0]/fan_DivPositions[0], 1.0);
    EmitVertex();
    gl_Position = vec4(fan_positions[i-1]/fan_DivPositions[i-1], 1.0);
    EmitVertex();
    gl_Position = vec4(fan_positions[i]/fan_DivPositions[i], 1.0);
    EmitVertex();
    EndPrimitive();
  }
}
```

For interpolates, one can handle them right where the vertices are copied and where the intersections are computed, and then replay them in the triangle-fan emit. This shader is far from optimal, but it will give you what you want.
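Before trusting the shader, the per-plane pass is easy to sanity-check on the CPU. Here is a minimal Python sketch of the same kind of clip (the function names and the test triangle are invented for illustration):

```python
# CPU-side sketch (Python, illustrative names only) of one pass of the
# polygon clip the geometry shader performs: clip a fan against the
# half-space sgn*pos[axis] + div[axis] >= 0, keeping inside vertices
# and inserting an intersection vertex wherever the clip value changes
# sign -- including on the closing edge back to vertex 0.

def clip_fan(verts, sgn, axis):
    """verts: list of (pos, div) pairs; pos and div are 3-tuples."""
    def clip_value(v):
        pos, div = v
        return sgn * pos[axis] + div[axis]

    def lerp(a, b, t):
        return tuple(x * (1.0 - t) + y * t for x, y in zip(a, b))

    out = []
    for i in range(len(verts)):
        v0, v1 = verts[i], verts[(i + 1) % len(verts)]
        c0, c1 = clip_value(v0), clip_value(v1)
        if c0 >= 0.0:
            out.append(v0)                    # inside vertex survives
        if (c0 >= 0.0) != (c1 >= 0.0):        # edge crosses the plane
            t = c0 / (c0 - c1)                # (1-t)*c0 + t*c1 == 0
            out.append((lerp(v0[0], v1[0], t), lerp(v0[1], v1[1], t)))
    return out

# triangle with one vertex poking out of the +x side (x <= div.x)
tri = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
       ((2.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
       ((0.0, 2.0, 0.0), (1.0, 1.0, 1.0))]
clipped = clip_fan(tri, -1.0, 0)  # clip against -x + div.x >= 0
print(len(clipped))               # clipping one vertex off yields 4
```

Note that clipping a single vertex off a polygon *grows* it by one vertex, which is exactly the case the in-place single-array version tends to get wrong.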

3. kRogue, you took it too hard, man. You know, everything is possible; there is always a way to get what you want. And you know, I have a hundred-times-easier solution: save the model-view-transformed and normalized Zc in the vertex shader, write the interpolated value as gl_FragDepth in the fragment shader, and voila - you get a linear depth buffer. No calculations at all. The only disadvantage is that the early z-test gets switched off. But compared to custom-made clipping in a geometry shader... OMG... Yes, it is possible to do what the clipper does using geometry shaders. But doing all that just to save a value of Zc from being divided by Wc - nonsense! Fixed functionality does what we do NOT want? Aha, let's go the Microsoft way - patch around it all instead of solving the actual problem the right way.
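For what it's worth, the precision tradeoff behind that gl_FragDepth trick can be sketched on the CPU. A back-of-the-envelope Python comparison (the near/far values and helper names are made up for illustration):

```python
# CPU-side sketch of the depth values involved (illustrative values):
# the fixed-function pipeline stores the hyperbolic z/w, which burns
# almost the whole [0,1] range on geometry close to the near plane,
# while the gl_FragDepth trick stores plain view-space depth scaled by
# the far distance.

def fixed_function_depth(z_view, near, far):
    """window-space depth from the standard GL projection (z_view < 0)."""
    z_clip = -(far + near) / (far - near) * z_view - 2.0 * far * near / (far - near)
    w_clip = -z_view
    return (z_clip / w_clip + 1.0) / 2.0

def linear_depth(z_view, far):
    """the trick from the post: view-space depth normalized to [0,1]."""
    return -z_view / far

near, far = 0.01, 10000.0
for d in (0.02, 1.0, 100.0, 9000.0):
    print(d, fixed_function_depth(-d, near, far), linear_depth(-d, far))
# hyperbolic depth is already ~0.5 at twice the near distance and
# >0.999 beyond ~100 units; the linear version spreads values evenly
```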

I don't understand why you people perceive it as if I am just struggling to find a solution for my personal problem, kinda asking for help? No-no-no, my sincere motivation is the evolution of OpenGL, as I love it so much and I want it to be perfect (at least better than DirectX). For the last 20 years it evolved so much, but the most basic thing - the camera - still cannot be emulated properly! That is ridiculous: we got control over every aspect of rendering, so many things are hardware-accelerated now, but the way perspective division was done 20 years ago - we still haven't the slightest change there. Huge terrains, thousands of instances of objects, sun flares, soft shadows, ambient occlusion, multisampling, bump-mapping, parallax, reflections, bla-bla-bla, modern super-realistic graphics - and we are still unable to draw the player's character?!?!?! Just as the player had no body in the first Half-Life, he still doesn't have one in Battlefield 4! People, wake up, something is wrong here! If we want to render realistic 3D, we need to start from the most basic thing - we need to be able to emulate the camera the right way. Can we do that?.. Nope, we still can not! I am desperate...

Well, every FPS game uses the same trick to solve the camera issue: the character's model is not rendered into the main camera (we see only fake hands). OK, we are used to it already. Now let me show a case where even this trick is not applicable.

Imagine a game featuring a futuristic post-apocalyptic world of AI machines exploring a large open-world environment, placing supporting buildings and fighting in groups against each other for resources. The game engine needs to be able to render large areas (hundreds of kilometers) - that is requirement #1. Next. The player is one of those machines, so it is a first-person shooter. The key of the gameplay is the ability to construct your own machine from functional parts, any of which may be destroyed in battle. Among the frame elements, armor parts, reactors, accumulators, engines, guns, gravi-pushers-pullers and other functional devices the user may use to compose its machine's body, there are also cameras. The user is not restricted in the topology of the machine's frame, as parts' collisions and devices' functionality are simulated, so the user can place any functional element any way it wants (whatever makes sense from the designer's point of view). Probably it will be desirable to squeeze the camera somewhere in between armor elements to protect it from catching enemy fire and getting destroyed, which would leave the player blind or force a switch to another camera located somewhere safer on the machine but without a good view angle. Therefore, requirement #2 for the game engine is that the near clip plane for the camera should be no larger than the size defined by the actual game model of the camera, which, obviously, cannot be a few meters wide - it must be as small as a real camera (a sensor with lenses). And that restricts the near clipping plane to a few centimeters at most.

Combine requirement #1 with #2 and you see the hardware restriction in the way. And there wouldn't be any problem here if we could have some control over the clipping and perspective division, but it simply works the same way it was working 20 years ago - now it just runs on advanced hardware, which can do so much more, but we are too conservative to change anything. We keep dragging along the same crippled algorithms, and whenever we want to do something in a better way we have to find a "workaround" - this is so wrong!

4. I think doing the clipping in the GS with just two planes for the custom z will run faster than doing it in the fragment shader. Also, enabling z-clamping will likely make the performance comparable to the usual jazz on the z stuff.

The issue of the perspective divide being the same for all of x, y and z actually has pretty strong mathematical reasoning behind it; I really do not want to write it up on a forum, it is a hassle.

Much of the pain of stuff near the eye being borked can be gotten around by a careful projection matrix and z-futzing. The easiest way is to use glDepthRange on models near the eye, again futzing with the projection matrix together with a floating-point depth buffer. Essentially, break the scene into regions parallel to the near plane, render region by region, and use glDepthRange to get the job done. In an ideal world, one could write the values for the arguments of DepthRange from the GS for better flexibility (this is likely a smaller change in hardware, but it depends).
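The region scheme above can be sketched in a few lines. A minimal Python illustration, assuming evenly split slabs and hypothetical draw calls:

```python
# Sketch of the glDepthRange partitioning described above: split the
# frustum into slabs parallel to the near plane and hand each slab its
# own slice of the [0,1] depth range, so every slab enjoys the full
# depth-buffer precision. The slab count and the GL calls in the
# comments are hypothetical.

def depth_range_for_slab(slab_index, slab_count):
    """(zNear, zFar) arguments to pass to glDepthRange for one slab."""
    width = 1.0 / slab_count
    return (slab_index * width, (slab_index + 1) * width)

# draw back to front, one glDepthRange + draw call per slab
for i in reversed(range(4)):
    lo, hi = depth_range_for_slab(i, 4)
    # glDepthRange(lo, hi); draw_slab(i)   <- hypothetical calls
    print(i, lo, hi)
```

The draw-call break between slabs is exactly the cost kRogue's "set DepthRange from the GS" idea would remove.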

5. Originally Posted by kRogue
The issue of the perspective divide being the same for all of x, y and z actually has pretty strong mathematical reasoning behind it; I really do not want to write it up on a forum, it is a hassle.
Don't worry, I realize why. But it doesn't matter at the final rasterization step for an invisible Z coordinate, especially after the clipping (and that is where the perspective division is performed).
Well, yeah, there will truly be no use for my extension as applied to the xy and interpolation components. Carrying 4 extra FP numbers per vertex just to serve Z is somewhat redundant; I am starting to realize that...
Originally Posted by kRogue
Much of the pain of stuff near the eye being borked can be gotten around by a careful projection matrix and z-futzing. The easiest way is to use glDepthRange on models near the eye, again futzing with the projection matrix together with a floating-point depth buffer. Essentially, break the scene into regions parallel to the near plane, render region by region, and use glDepthRange to get the job done. In an ideal world, one could write the values for the arguments of DepthRange from the GS for better flexibility (this is likely a smaller change in hardware, but it depends).
glClipControl will make it even easier (whenever the extension becomes available):
Originally Posted by Yandersen
The best workaround for high zFar/zNear values, I think, would be:
1) using the FP depth buffer,
2) glDepthFunc(GL_GREATER) - incoming fragment passes if depth is greater,
3) glClipControl(...,GL_ZERO_TO_ONE),
4) the following projection matrix:
Code :
```
    | f/aspect  0      0      0    |
    |                              |
    |    0      f      0      0    |
P = |                              |
    |    0      0      0    zNear  |
    |                              |
    |    0      0     -1      0    |
```
where f = cot(ViewAngleVertical/2),
aspect = Viewport.x / Viewport.y,
zNear = distance to the near clipping plane; this value can be very small (in the 1e-XX range).

This setup will cull all the geometry behind the near clipping plane, and the resulting depth will decrease as objects get drawn further away, asymptotically approaching 0 for infinitely distant objects.
Obviously, that will only work for FP buffers, heavily exploiting the exponent part of the numbers in their negative range (Xe-XXX), because 99% of the depth values will be less than 0.000...
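The claimed behavior of this matrix is easy to verify numerically. A small Python sketch (the fovy, aspect and zNear values are arbitrary):

```python
from math import radians, tan

# Numeric check of the quoted reversed-Z projection matrix: with the
# GL_ZERO_TO_ONE depth mode the window depth is z_clip/w_clip =
# zNear / -z_view, i.e. exactly 1.0 at the near plane and
# asymptotically 0 for infinitely distant points.

def make_projection(fovy_deg, aspect, z_near):
    f = 1.0 / tan(radians(fovy_deg) / 2.0)  # f = cot(fovy/2)
    return [[f / aspect, 0.0,  0.0, 0.0],
            [0.0,        f,    0.0, 0.0],
            [0.0,        0.0,  0.0, z_near],
            [0.0,        0.0, -1.0, 0.0]]

def window_depth(P, z_view):
    # only rows 3 and 4 matter for the depth of a point (0, 0, z_view, 1)
    z_clip = P[2][2] * z_view + P[2][3]
    w_clip = P[3][2] * z_view
    return z_clip / w_clip

P = make_projection(60.0, 16.0 / 9.0, 0.01)
print(window_depth(P, -0.01))   # 1.0 at the near plane
print(window_depth(P, -1.0e6))  # 1e-08: tiny, but fine for an FP buffer
```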

6. Originally Posted by kRogue
doing the clipping in GS with just two planes
Na-ah, all the clipping planes must be worked out, because it doesn't matter which one causes the clipper to insert new vertices while calculating the Z for them the wrong way.

7. I admit that clip control would be more feature-encompassing, but making hardware that runs fast enough means a lot of freaking sand. Current triangle clipper/setup engines are already a great deal (because of the nature of clipping), and a number of optimizations (namely guardband) are gone with what you are asking. My opinion is the following: since the functionality can be done via the GS, and since that functionality is really only needed in practice for z, it is likely a hard sell to add to the design of a GPU. Secondly, the end goal is to allow for dynamic ranging of the write to the depth buffer; that is the end goal. This can be done with the glDepthRange bits, but requires a draw-call break. So a much more modest ask: at the primitive level, specify the values for glDepthRange from a geometry shader. This will get what you want, is easier to read, and is much less for hardware to implement.

8. Originally Posted by Yandersen
But anyway, using FP for depth buffer is not the solution I would be completely happy with.
Why not? It lets you do what you want: place the near clip plane very close while being able to draw objects very far away. You get a constant relative error, which makes sense since you want coarser LOD meshes for your distant objects anyway. And you can do it now, on today's hardware, without compromising early-Z or depth buffer compression.

The view space linear depth buffer you want requires additional hardware since it's non-linear in screen space, i.e. it requires perspective correct interpolation. It also breaks delta encoding based depth buffer compression thus needing a more complex compression scheme and/or more bandwidth.
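The "constant relative error" property follows directly from the floating-point format: the spacing between adjacent float32 values scales with their magnitude. A quick stdlib-only Python check (illustrative):

```python
import struct

# The constant relative error of an FP depth buffer, checked on the
# CPU: the spacing (ulp) between adjacent float32 values grows in
# proportion to magnitude, so reversed-Z depth values near 0 keep
# roughly the same number of significant bits as values near 1.

def float32_ulp(x):
    """spacing of float32 values at |x| (distance to the next one up)."""
    lo_bits = struct.unpack('<I', struct.pack('<f', abs(x)))[0]
    lo = struct.unpack('<f', struct.pack('<I', lo_bits))[0]
    hi = struct.unpack('<f', struct.pack('<I', lo_bits + 1))[0]
    return hi - lo

for d in (1.0, 1e-2, 1e-4, 1e-6):
    print(d, float32_ulp(d) / d)  # relative spacing stays near 1e-7
```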

9. Originally Posted by kRogue
...to make hardware that runs fast enough, means alot of freaking sand... ...and a number of optimizations (namely gaurdband) are gone for what you are asking. ...a hard sale to add to the design of a GPU. ...much less for hardware to implement.
kRogue, let's not try to estimate how much sand it will cost unless you are the one who actually writes machine code for the GPU or designs those chips. Only actual NVIDIA, AMD or Intel engineers could give the right estimation. I am not one of them, so my judgement is based on knowledge of the Intel instruction set, which I sometimes use when writing SSE-based geometry functions in assembler. From this point of view I claim that any vector operation on xmm registers (division, multiplication, reciprocal-square-root calculations and others) affects 4 values independently, so calculating 1/w would cost as much "sand" as calculating {1/w0, 1/w1, 1/w2, 1/w3}. And the specs say that clipping is performed one plane at a time, so one component of that vector would be used for each plane. As for the last point, the interpolation, which is done after the perspective division: the last component (1/w3) replaces the first three, which are used up by that point - that is the only additional shuffling operation (which is not costly at all). Well, I assume some generic SSE-like hardware there, but you never know what monster actually lurks in your card, so only the actual developers can tell you whether they could implement the extension or not. And I believe that the cost of this extension would be derived from the man-hours of work spent rewriting the drivers, without any relation to the "sand", as the current generalized-vectorized HW can do all this anyway. Therefore it is a trade between "how much time we spend" and "how much we get". We here can only estimate the second part: larger scenes, a smaller near clipping plane, the same depth buffer required.
Why not?
Because the default window framebuffer has an integer format. I know, silly argument (who draws there directly nowadays, right?), but 24bpp depth can be paired with 8bpp stencil, while an FP depth buffer can only be paired with 8bpp stencil in a 64bpp structure. Aside from that, why use more memory - why not just utilize the smaller buffer in a better way? Wouldn't that be more rational?
You get a constant relative error, which makes sense since you want coarser LOD meshes for your distant objects anyway. And you can do it now, on today's hardware, without compromising early-Z or depth buffer compression.
Yes, there is no other choice currently. Two posts above I requoted my target solution with the FP buffer, but glClipControl is not available on my card yet. Still waiting for nVidia to update their drivers so my GT 520 can get it...
The view space linear depth buffer you want requires additional hardware since it's non-linear in screen space, i.e. it requires perspective correct interpolation. It also breaks delta encoding based depth buffer compression thus needing a more complex compression scheme and/or more bandwidth.
Did you hear about w-buffers? I was surprised Microsoft did such a smart trick in their DirectX. A perfect solution for linear depth buffers; it couldn't be done better, IMO. OpenGL is still chewing gum indifferently...

It would be awesome if instead of glClipControl() we got something more general, like a glRasterizerParameteri() to configure the origin, the depth mode and other parameters of the fixed functionality. Because glClipControl looks like a sort of Microsoft-style hot patch to me, falling out of the OpenGL style with those two single independent parameters it sets. It would be awesome to be able to select the 0_1 depth mode and w as the source of the fragment depth coordinate using glRasterizerParameteri. But, well, it is done the way it is done; the Guys do what they want the way they want it, and the commercial world doesn't care about some indie geeks down there...

10. Originally Posted by Yandersen
Well, I assume some generic sse-based hardware there, but you never know what monster is actually lurks in your card, so only the actual developers may tell you if they could implement the extension or not.
There is plenty of information on the architecture of recent GPUs out in the public, and if you look at it you will find that they generally use wide vector units where each work-item (vertex, fragment, etc.) uses a single lane. I.e. from the perspective of a fragment the execution units are effectively scalar. You don't get vector operations "for free".

Because the default window framebuffer has an integer format. I know, silly argument (who draws there directly nowadays, right?), but 24bpp depth can be paired with 8bpp stencil, while an FP depth buffer can only be paired with 8bpp stencil in a 64bpp structure. Aside from that, why use more memory - why not just utilize the smaller buffer in a better way? Wouldn't that be more rational?
You're ignoring the effect of depth buffer compression (and the likely possibility that stencil is stored separately in memory). While you have to allocate the full buffer for the worst case, the average amount of depth/stencil data stored is much less than 40bpp. And keeping depth linear in screen space (i.e. noperspective) likely allows better compression than view space linear depth such that using the smaller buffer in a "better" way might actually lead to higher bandwidth usage.

Did you hear about w-buffers? I was surprised Microsoft did such a smart trick in their DirectX. A perfect solution for linear depth buffers; it couldn't be done better, IMO. OpenGL is still chewing gum indifferently...
W-buffer support was deprecated in D3D10 for lack of hardware support, and it had always been optional before that.
