
View Full Version : Customized clipping volume



Yandersen
08-22-2014, 08:24 PM
According to the current specification (http://www.opengl.org/registry/doc/glspec45.core.pdf#section.13.5) the clipping volume is defined by:

-Wc <= Xc <= Wc
-Wc <= Yc <= Wc
Zmin <= Zc <= Wc

where Xc, Yc, Zc, Wc are the clip coordinates produced by the vertex shader as the components of gl_Position, and Zmin is either -Wc or 0 depending on the depth mode set by the glClipControl function. After clipping, {Xc,Yc,Zc} is divided component-wise by {Wc,Wc,Wc}, producing the normalized device coordinates.

Since the upper clipping bound (which is also the division vector) is assembled from three identical values {Wc,Wc,Wc}, all three coordinates {Xc,Yc,Zc} are divided by the same value. Technically, I see no reason to require the components of the division vector to be equal (not necessarily Wc=Wc=Wc). By letting the vertex shader output the division vector explicitly, additional functionality can be achieved (discussed below).

So the proposed extension extends the shading language with an additional _optional_ output for the vertex shader stage:


out vec3 gl_PositionDiv;


If the vertex shader writes to gl_PositionDiv, then the clipping volume is defined by:

-gl_PositionDiv.x <= Xc <= gl_PositionDiv.x
-gl_PositionDiv.y <= Yc <= gl_PositionDiv.y
Zmin <= Zc <= gl_PositionDiv.z

where Zmin is either -gl_PositionDiv.z or 0 depending on the depth mode set by the glClipControl function. If the vertex shader does not write to gl_PositionDiv, that vector is automatically assembled as:

gl_PositionDiv = gl_Position.www

which is essentially equivalent to the current fixed functionality.
After clipping, the normalized device coordinates are calculated by dividing gl_Position.xyz by gl_PositionDiv.
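For clarity, the intended behavior can be sketched in Python (illustrative only; `clip_and_divide` and its argument names are made up, with `div` playing the role of the proposed gl_PositionDiv and defaulting to gl_Position.www):

```python
def clip_and_divide(pos, div=None, depth_zero_to_one=False):
    """Sketch of the proposed clip test and per-component divide.

    pos: clip coordinates (x, y, z, w) as output in gl_Position.
    div: the proposed division vector (gl_PositionDiv); when the
         shader does not write it, it defaults to (w, w, w), which
         reproduces the existing fixed functionality.
    Returns normalized device coordinates, or None if clipped.
    """
    x, y, z, w = pos
    dx, dy, dz = div if div is not None else (w, w, w)
    z_min = 0.0 if depth_zero_to_one else -dz   # glClipControl depth mode
    if not (-dx <= x <= dx and -dy <= y <= dy and z_min <= z <= dz):
        return None                              # vertex outside the volume
    return (x / dx, y / dy, z / dz)              # per-component division

# Default path: identical to dividing everything by w
assert clip_and_divide((1.0, 2.0, 3.0, 4.0)) == (0.25, 0.5, 0.75)
# Custom division vector: the z range is decoupled from x and y
assert clip_and_divide((1.0, 2.0, 3.0, 4.0), div=(4.0, 4.0, 6.0)) == (0.25, 0.5, 0.5)
```

Real clipping of course operates on primitives, not single vertices; the sketch only shows the per-vertex arithmetic.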

The major problem this extension targets is poor z-buffer utilization due to the uniform division of the x, y and z components. Specifying separate division coefficients for xy and z opens new possibilities for controlling the distribution of depth values. In particular, the far clipping plane can be eliminated, allowing objects to be drawn at any distance, and the near clipping plane can be set much closer than in conventional setups. This makes it possible to render large scenes without introducing separate cameras or an overpushed near clipping plane; special or oversized depth buffer formats would not be required either.

An example of such a setup using the proposed extension is described in this post:
http://www.opengl.org/discussion_boards/showthread.php/184663-Linearize-the-Depth-buffer?p=1261151&viewfull=1#post1261151

NOTE: gl_PositionDiv is defined in that post as a vec4; I cannot edit the post as the time limit has expired. But it might be even better to make gl_PositionDiv a four-component vector and use the fourth component as the clipping bound for Wc, just the same way the first three are used (unless that has a considerable performance cost). This would further increase the functionality of the proposed extension.

Yandersen
08-28-2014, 02:31 PM
Can't sleep well having no answer for this topic - that's how badly I want this extension. :)
I even reposted it on the nVidia (https://devtalk.nvidia.com/default/topic/770881/opengl/the-ultimate-remedy-for-z-fighting-requires-extension-/) forums and placed a reference on the AMD (http://forums.amd.com/game/messageview.cfm?catid=453&threadid=177539&enterthread=y) forum - dead silence! Is there any way to contact anyone who can at least tell whether this extension is possible to implement at all? Anybody? Please!.. :dejection:

Agent D
08-28-2014, 02:48 PM
If you want this so badly, why don't you divide the components of gl_Position by your special division vector, set the w component to 1.0 and see if it really is what you want.

Also, if it can be implemented in the shader by yourself and nobody else on earth needs it, why would you create a GL extension for it?

Do you even have a use case for this?

Yandersen
08-28-2014, 04:04 PM
If you want this so badly, why don't you divide the components of gl_Position by your special division vector, set the w component to 1.0 and see if it really is what you want.Division must be done after the clipping. Doing it in the vertex shader is a potential source of overflow and division-by-zero for vertices that fall close to the clipping plane (the W plane, the "division plane").

Also, if it can be implemented in the shader by yourself and nobody else on earth needs it, why would you create a GL extension for it?

Do you even have a use case for this?Right now I save the value of the negated z-component, interpolate it and write it as gl_FragDepth, implementing a linear z-buffer which allows me to draw the scene without artifacts with zFar>200000 and zNear=0.001 (it can be even smaller) using 24 bits of depth buffer. With this setup I have a z-buffer able to distinguish surfaces 0.01 apart from each other over the whole 200000 distance.
This is essentially equivalent to dividing the negated z-component by 1.0 (instead of by Wc, which is still used for the x and y components) and using Zmin=0 instead of Zmin=-Wc (which may be set via glClipControl).
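The precision claim can be checked with a rough numeric model (Python; the formulas are the standard hyperbolic window depth and the linear alternative, the 24-bit buffer is simulated by rounding, and the 0.05 separation at 100000 units is chosen for illustration):

```python
def hyperbolic_depth(z, z_near, z_far):
    # Standard window depth in [0,1] after the divide by Wc (~1/z shape)
    return (1/z_near - 1/z) / (1/z_near - 1/z_far)

def linear_depth(z, z_near, z_far):
    # Linear depth, as written to gl_FragDepth in the workaround
    return (z - z_near) / (z_far - z_near)

def quantize24(d):
    # Simulate storage in a 24-bit integer depth buffer
    return round(d * (2**24 - 1))

z_near, z_far = 0.001, 200000.0
a, b = 100000.0, 100000.05          # two surfaces 0.05 apart, far away

# Standard 1/z depth collapses both into the same 24-bit bucket...
assert quantize24(hyperbolic_depth(a, z_near, z_far)) == \
       quantize24(hyperbolic_depth(b, z_near, z_far))
# ...while linear depth still tells them apart
assert quantize24(linear_depth(a, z_near, z_far)) != \
       quantize24(linear_depth(b, z_near, z_far))
```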

Making the components of the division vector separately specifiable, instead of uniformly set to the same value, would let us control the distribution of z-values independently from xy.

What will we get? We could draw incomparably larger scenes with an incomparably smaller constant zNear, without needing to introduce additional cameras or dynamically adjust zNear, and even a standard 24-bit depth buffer would be sufficient.

Practically, with this extension we would be able to attach a camera object as small as a human eye right onto the character's model, and it would capture the large open-world scene as well as the parts of the character's model the camera is mounted on. This would let game engines render and simulate the character's body uniformly with the other game objects, without "making it transparent" or introducing fake parts rendered individually, which causes them to not receive shadows or decals, not collide with other objects, etc.

Well, there are techniques that make this achievable even without the extension, yes, and there are games that prove it is all possible. But all those tricks have computational expenses, while the proposed extension does not add any computational overhead: there are 3 values (Xc,Yc,Zc) being divided anyway, and I see no technical difference in dividing them by a vector with unequal components instead of equal ones. So all it takes is to allow us to specify those components explicitly, instead of having them assembled by fixed functionality. Or is there something I do not take into account?

malexander
08-28-2014, 04:57 PM
The hardware units that deal with depth writes and testing are fixed-function units (as are color blending/writes); perhaps that's why only fixed 16b/24b and [0,1] FP32 are used. If the full FP32 range were available for the z-buffer, I would think we'd at least see an extension to glDepthRange() allowing values outside of [0,1]. (I also wouldn't mind seeing an increased z-buffer range for some bad cases artists create, btw.)

Yandersen
08-28-2014, 06:31 PM
I see you did not read the actual definition - just the topic's name, ay? ;)
The resulting NDC coordinates still fall in the range [-1...1]. Only the clipping bounds I suggest making unequal - instead of
{Wc,Wc,Wc} vs {-Wc,-Wc,Zmin} I suggest using custom, possibly unequal values for each of the coordinates:
{Wcx,Wcy,Wcz} vs {-Wcx,-Wcy,Zmin}, which are derived from the explicitly set division vector
{Wcx,Wcy,Wcz} instead of the single tripled value Wc taken from the fourth component of gl_Position.

Here is a clear explanation with examples (https://devtalk.nvidia.com/default/topic/770881/opengl/the-ultimate-remedy-for-z-fighting-requires-extension-/).

l_belev
08-29-2014, 03:31 AM
Generally they tend to reduce the fixed functions, not to extend them. I would rather have them remove perspective division altogether and let the shaders do it themselves if they need it.
Division is one of the more hardware-taxing operations; its cost is relatively high and it is good to avoid it when not necessary.
Your proposal implies 3 divisions instead of a single one per vertex. (Note that dividing x, y and z all by the same value w really means a single division (1/w) and 3 multiplications, which are far cheaper.)
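The cost argument in miniature (plain Python just to show the algebra; the uniform-divisor case needs one reciprocal, the proposed case needs three):

```python
x, y, z, w = 3.0, 5.0, 7.0, 2.0

# Uniform divisor: one (expensive) reciprocal, three (cheap) multiplies
inv_w = 1.0 / w
ndc_fast = (x * inv_w, y * inv_w, z * inv_w)
ndc_spec = (x / w, y / w, z / w)     # what the spec literally describes
assert ndc_fast == ndc_spec

# Unequal divisors: no shared reciprocal, three independent divisions
wx, wy, wz = 2.0, 4.0, 8.0
ndc_custom = (x / wx, y / wy, z / wz)
assert ndc_custom == (1.5, 1.25, 0.875)
```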

Yandersen
08-29-2014, 05:44 AM
Generally they tend to reduce the fixed functions, not to extend them. I would rather have them remove perspective division altogether and let the shaders do it themselves if they need it.Once again: perspective division must be done after the clipping. You cannot clip primitives in the vertex shader. Division cannot be done with zero or tiny numbers in the divisor, as the result is undefined. And clipping and dividing by the same number also ensures that the result lies in the range [-1...1]. My point is that the number may be different for different components, not necessarily the same for all.

Division is one of the more hardware-taxing operations; its cost is relatively high and it is good to avoid it when not necessary.
Your proposal implies 3 divisions instead of a single one per vertex. (Note that dividing x, y and z all by the same value w really means a single division (1/w) and 3 multiplications, which are far cheaper.)?! O.O Sorry, but as far as I know, GPU hardware is vectorized - processing 4 items at once. Calculating 1/w is the same thing as calculating {1/w,1/w,1/w,1/w}. The source register may be stuffed with 4 different items to produce {1/wx,1/wy,1/wz,1/ww}, and it will take the same number of cycles as if all the values were equal - it doesn't matter what the actual contents of the source xmm register were (assuming hardware with SSE support).

mbentrup
08-29-2014, 07:01 AM
That w (resp. 1/w) value is not only needed for clipping, but also for perspective-correct attribute interpolation. How would that work if you get three different w values?
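The point can be made concrete: for perspective-correct interpolation the hardware linearly interpolates attr/w and 1/w in screen space and divides per fragment, so a single well-defined w per vertex is baked into the pipeline. An illustrative Python sketch:

```python
def perspective_correct(attr0, w0, attr1, w1, t):
    """Perspective-correct interpolation of a vertex attribute.

    t is the interpolation parameter in screen space (after the
    divide); attr/w and 1/w are interpolated linearly there.
    """
    num = (1 - t) * attr0 / w0 + t * attr1 / w1
    den = (1 - t) * (1.0 / w0) + t * (1.0 / w1)
    return num / den

# At the on-screen midpoint between a w=1 and a w=3 endpoint, the
# attribute is pulled toward the closer (w=1) endpoint: 0.25, not 0.5.
v = perspective_correct(0.0, 1.0, 1.0, 3.0, 0.5)
assert abs(v - 0.25) < 1e-12
```

With three different w's per vertex it becomes ambiguous which one should enter this formula, which is exactly the objection raised here.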

kRogue
08-29-2014, 07:18 AM
At any rate, this extension idea seems terribly fishy anyway. As pointed out, normalize your w to one, then divide by those gl_PositionDiv factors. The only case where that is a problem is when one of the fields of gl_PositionDiv is negative or close to zero. But the world has not ended: you can do this all with a geometry shader and do the clipping yourself [a triangle clipped N times produces a triangle fan of no more than N+1 triangles, by the way]. Also, the entire point of dividing by the same value, w, is to do perspective. What it looks like you really want is just different depth values, which can be done by hand from the vertex shader anyway by normalizing yourself. The implicit requirement w>0 means in front of the eye (but not behind the near plane), which I assume you'd want anyway. Normalize the z's yourself is my advice.

Asking for a different divide value for each element of gl_Position.xyz would make the clipper (the fixed-function part of a GPU) take even more sand in order to keep the same performance in triangles per clock, because various performance shortcuts for the most common situation (namely, all w's positive and away from zero) would be gone. On that subject, most hardware (if not all) has a dedicated unit handling triangle setup and clipping all rolled into one. Additionally, most implementations have guard-band logic to avoid clipping and let scissoring do the job. To be precise: if a triangle has all of its w's positive (and away from zero) and its z's in the happy range [-1,1], then scissoring takes care of the clipping volume (essentially). If all the w's are positive and some of the z's are icky, then the triangle needs to be clipped against just the two z requirements, which results in at most 3 triangles. The really ugly case is when one or more of the w's is negative (but not all); that is the icky case, and the clipper more often than not then does the painful clipping computation against all the clipping planes. That part sucks, always sucks, and uses up a fair amount of sand. There has been hardware (like old Intel GPUs) that did not have a dedicated clipper; the clipping and divide work was done by the programmable EUs. It was not happy, so they added a dedicated clipper.

My advice: likely all you want is to normalize z your own way (which is just a VS job), but if you really want the whole enchilada, make a GS to implement what you are after.

Yandersen
08-29-2014, 07:30 AM
Ah, that is what I was missing! Thanks, mbentrup (http://www.opengl.org/discussion_boards/member.php/18541-mbentrup), for pointing that out. May I dare ask for more info about the interpolation process in detail? Maybe some links you could point me to, please?

Well, there are a few intuitive solutions that come to my mind.

The first solution is to make gl_PositionDiv a 4-component vector, where the first component is used for clipping and dividing gl_Position.x, the second for y, the third for z, and the fourth component is used for attribute interpolation.

The second solution is to use gl_Position.w for attribute interpolation as before (otherwise, what would be the use of gl_Position.w, right?). In other words, this is a restriction: the user can manipulate only the first three components of the division vector, while the fourth is taken from gl_Position.w implicitly.

I vote for the first solution, even though it makes gl_Position.w a redundant, unused component in case the vertex shader chooses to write custom values into gl_PositionDiv.

Yandersen
08-29-2014, 07:49 AM
Asking for a different divide value for each element of gl_Position.xyz would make the clipper (the fixed-function part of a GPU) take even more sand in order to keep the same performance in triangles per clock, because various performance shortcuts for the most common situation (namely, all w's positive and away from zero) would be gone. On that subject, most hardware (if not all) has a dedicated unit handling triangle setup and clipping all rolled into one. Additionally, most implementations have guard-band logic to avoid clipping and let scissoring do the job. To be precise: if a triangle has all of its w's positive (and away from zero) and its z's in the happy range [-1,1], then scissoring takes care of the clipping volume (essentially). If all the w's are positive and some of the z's are icky, then the triangle needs to be clipped against just the two z requirements, which results in at most 3 triangles. The really ugly case is when one or more of the w's is negative (but not all); that is the icky case, and the clipper more often than not then does the painful clipping computation against all the clipping planes. That part sucks, always sucks, and uses up a fair amount of sand. There has been hardware (like old Intel GPUs) that did not have a dedicated clipper; the clipping and divide work was done by the programmable EUs. It was not happy, so they added a dedicated clipper.I am not sure I understood... With the proposed extension the normalized device coordinates will be in the [-1...1] range anyway, even with different values in the components of gl_PositionDiv. And with glClipControl the clipping of z is already unbound from the clipping of xy, because the range of Zndc can be changed to [0...1]; that means those coordinates are already processed independently. Well, whenever you write to gl_FragDepth, the clipping for Z is not performed at all, is it right?
That is what made me think that independent clipping of all three components is not a problem, so I came up with this extension as an alternative to DirectX w-buffers.

My advice: make a GS to implement what you are after.Do you suggest performing the clipping manually in the GS, and then doing the perspective division there as well?

Yandersen
08-31-2014, 06:50 PM
Gentlemen, I've updated the topic related to the proposed extension to make it clear and well-defined (check the Extension part):

https://devtalk.nvidia.com/default/topic/770881/opengl/the-ultimate-remedy-for-z-fighting-requires-extension-/

If there is still something "fishy", please, point it out.

Agent D
09-01-2014, 12:03 AM
If it's all about Z-fighting, why do you insist on fiddling with the X and Y components as well? Why is it so hard to scale the Z value in the shader if that's all you actually want?

Yandersen
09-01-2014, 03:42 AM
Because scaling Z will not produce the desired result, Agent D.
The closer a point is to the W plane, the denser the distribution of XYZ values near it. Therefore separate W planes need to be used for Z and XY to control the distribution of Z values independently from XY. But making exceptions is not the OpenGL way: if Z is separated, then all other components should be separable too. The advantage is that we can omit the perspective matrix and let gl_PositionDiv alone handle the perspective transformation while gl_Position carries the result of the model-view transformation. In that case all 4 components of gl_PositionDiv will be used and have different values.

kRogue
09-01-2014, 06:14 AM
Um, I think some things are a touch unclear. Ahem. If all one is worried about is z, then there really is a simple way. Let's say one wants to divide gl_Position.z by ZDiv instead of gl_Position.w to get the normalized value. One way to do this -without- geometry shaders is to enable clipping for the first 2 clip distances and write for their values:



gl_ClipDistance[0] = ZDiv - gl_Position.z;
gl_ClipDistance[1] = ZDiv + gl_Position.z;


and then AFTER that write



gl_Position.z *= gl_Position.w/ZDiv;


and lastly, to be safe, enable depth clamping.

This will give you what you are after for z. For the whole enchilada, just use a geometry shader. There is no need for an extension.
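The recipe above can be sanity-checked per vertex (an illustrative Python sketch; in reality the clip distances are interpolated across the primitive rather than tested at vertices):

```python
def krogue_trick(x, y, z, w, z_div):
    """Emulate the clip-distance trick for a custom z divisor z_div."""
    clip0 = z_div - z            # gl_ClipDistance[0]
    clip1 = z_div + z            # gl_ClipDistance[1]
    if clip0 < 0 or clip1 < 0:
        return None              # outside -z_div <= z <= z_div
    z = z * (w / z_div)          # gl_Position.z *= gl_Position.w / ZDiv
    # The fixed-function divide by w now yields z/z_div for the z component
    return (x / w, y / w, z / w)

# z ends up divided by z_div while x and y are still divided by w
assert krogue_trick(1.0, 2.0, 1.0, 4.0, 2.0) == (0.25, 0.5, 0.5)
# A vertex with |z| > z_div fails the custom clip test
assert krogue_trick(0.0, 0.0, 5.0, 4.0, 2.0) is None
```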

Yandersen
09-01-2014, 02:43 PM
Heh, again I hear a suggestion to multiply z by w. And again I say that the result turns out to be incorrect for clipped primitives, as the vertices inserted by the clipper get wrong depth values (did it, tested it, no-no-no!). The original vertices multiplied by w get divided by w - OK. But the inserted vertices get their Zc interpolated while it is still multiplied by w, and when they get divided, the result apparently comes out incorrect - Z is higher than expected, so the primitives behind the clipped ones pop through.

Believe me, I spent a lot of time trying out many tricks (my imagination is not bad, really) - there is no easy solution. :)
The best workaround for high zFar/zNear ratios, I think, would be:
1) using the FP depth buffer,
2) glDepthFunc(GL_GREATER),
3) glClipControl(...,GL_ZERO_TO_ONE),
4) the following projection matrix:

    | f/aspect  0    0    0     |
P = |    0      f    0    0     |
    |    0      0    0    zNear |
    |    0      0   -1    0     |

where f = cot(ViewAngleVertical/2),
aspect = Viewport.x / Viewport.y,
zNear = distance to the near clipping plane; this value can be very small (in the 1e-XX range).

This setup will cull all geometry behind the near clipping plane, and the resulting depth will decrease as objects get drawn further away, asymptotically approaching 0 for infinitely distant objects.
Obviously this will only work for FP buffers, heavily exploiting the exponent part of the numbers near zero (Xe-XXX), because 99% of the depth values will be less than 0.000...
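The depth behavior follows directly from the matrix above; only its z and w rows matter. A small Python check (assuming glClipControl(..., GL_ZERO_TO_ONE), so window depth is simply z_clip/w_clip with no remapping):

```python
def reversed_z_depth(dist_eye, z_near):
    """Window depth for a point dist_eye in front of the camera."""
    z_clip = z_near        # third row of P: (0, 0, 0, zNear)
    w_clip = dist_eye      # fourth row of P: (0, 0, -1, 0), z_eye = -dist_eye
    return z_clip / w_clip

z_near = 0.001
# The near plane maps to depth 1.0 (hence glDepthFunc(GL_GREATER))...
assert reversed_z_depth(z_near, z_near) == 1.0
# ...and depth falls monotonically toward 0 as objects recede
assert 0.0 < reversed_z_depth(100000.0, z_near) < reversed_z_depth(1.0, z_near)
```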

Must admit, I did not try it yet, though (I have a GeForce GT520 and even with the latest driver I do not have support for glClipControl, as GLView reports). But anyway, using FP for the depth buffer is not a solution I would be completely happy with.

Yandersen
09-01-2014, 06:49 PM
There is no need for an extension.Seems like I am asking the world to do me a personal favor. OK, let me show you why I think this extension would benefit the game industry (not just me alone). For clarity: the final goal is to make it possible for cameras to have a near clipping plane set as close as ~1e-3 or so (a few millimeters, in other words, or even less) while drawing the scene as far as ~1e+5 or so (a hundred kilometers) without artifacts caused by degraded z-buffer resolution.

Here is a typical problem everyone is used to not mentioning:
An example from Far Cry 3: dawn time, the player looks down, and the pole on the right has a long shadow stretching across the road.
What is wrong on that picture?
No idea?
Legs. There is no body, no legs, no shadow cast by the player.
Why? Why is it so in EVERY FPS game? Why is the character's model never rendered the same way as every other model in the scene? Oh, because the main camera into which the whole scene is rendered has its near clipping plane set too far away to capture the close-up parts of the player's body! :whistle:

Now imagine we got the extension I fetish so much about. Now we can set up the camera with the near clipping plane located 1mm away and no far clipping plane - and the default 24bpp integer depth buffer will let us draw the whole open-world scene with no artifacts (as we would be able to control the distribution of z values independently of the location of the near clipping plane by setting the w plane for z at a different position). How tempting would it be to simply attach the camera to the player model's head and make no exceptions, drawing the player's model together with the objects in the scene? How much more realism would be achieved when players see objects actually colliding and interacting with their own body, and see their own shadow reflecting each of the player model's movements? Damn, with zNear=0.001 there is enough space between the character's eyes and the goggles to fit such a camera in between and render from that natural location! We could even put two cameras right in front of the player's eyes and render in stereo mode from those points, even through the goggles or a visor mounted on the player's head - whatever. Well, playing in stereo is not common yet, so for a single camera the base of the nose is a good point anyway. :)

kRogue
09-02-2014, 04:33 AM
I was thinking more on this, and the situation where the Div factor you want is negative for some verts and positive for others, together with some x's, y's or z's being negative, will interact in a bad way (namely, the normalized value will come out positive and not interpolate correctly; the example that breaks the vertex-shader clip-distance magick is two points where one has both gl_Position.z and gl_DivPosition.z positive and the other has both negative - in that case the normalized coords stay positive, but they should stretch across 0). But all is not lost; here is the extension implemented as a geometry shader. The code mirrors what a dedicated triangle clipper would do anyway; the shader is written old-school style, using GL_ARB_geometry_shader4.




in vec3 DivPosition[];



int count;
vec3 fan_positions[9];
vec3 fan_DivPositions[9];

float current_clip[9];

/*
return the location where the expression reaches 0.
*/
vec3
compute_clip_location(in vec3 p0, in float v0, in vec3 p1, in float v1)
{
/*
solve for t so that t*v0 + (1-t)*v1 = 0
*/
float t = v1/(v1-v0);

return t*p0 + (1-t)*p1;
}

/*
there are 6 clipping planes:

-DivPosition.? <= gl_Position.? <= DivPosition.? for ?=x,y,z
we codify it as

v(sgn) . ? = sgn*gl_Position.? + gl_DivPosition.?

sgn=-1, 1, ?=x,y,z

store that value in each active element of the current fan
*/
#define compute_current_clip(sgn, F) \
do { \
int i;\
for(i=0;i<count; ++i) {\
current_clip[i] = sgn*fan_positions[i].F + fan_DivPositions[i].F; \
}\
} while(0);


/*
there is either two indices where the sign of current_clip
switches or the sign does not flip at all.
Record those two indices, and return the sign of current_clip
in their range. If there are no two vertices record -1 twice.
*/
void
find_sides(out int changes_at[2])
{
float v;
int aa;

changes_at[0]=changes_at[1]=-1;

aa=0;
v=sign(current_clip[0]);
for(int i=1; i<count; ++i)
{
float w;
w=sign(current_clip[i]);
if(w!=v)
{
changes_at[aa] = i;
++aa;
v=w;
}
}
}

void
stitch(int changes_at[2])
{
if(changes_at[0]==-1)
{
if(current_clip[0]<0.0)
count=0;

return;
}

int new_count, i0, i1;

if(current_clip[0]>=0.0)
{
//"copy" unclipped vertices
new_count = changes_at[0];

//insert interpolate between changes_at[0] -1 to changes_at[0]
i0= changes_at[0] - 1;
i1= changes_at[0];

fan_positions[new_count]=compute_clip_location( fan_positions[i0], current_clip[i0],
fan_positions[i1], current_clip[i1]);
new_count++;

//and now insert the value of change between changes_at[1] -1 to changes_at[1]
i0= changes_at[1] - 1;
i1= changes_at[1];

fan_positions[new_count]=compute_clip_location( fan_positions[i0], current_clip[i0],
fan_positions[i1], current_clip[i1]);
new_count++;



for(int i=changes_at[1]; i<count; ++i, ++new_count)
{
fan_positions[new_count]=fan_positions[i];
}
}
else
{
//insert interpolate between changes_at[0] -1 to changes_at[0]
i0= changes_at[0] - 1;
i1= changes_at[0];

fan_positions[0]=compute_clip_location( fan_positions[i0], current_clip[i0],
fan_positions[i1], current_clip[i1]);

new_count=1;
for(int i=changes_at[0]; i<changes_at[1]; ++i, ++new_count)
{
fan_positions[new_count]=fan_positions[i];
}

//and now insert the value of change between changes_at[1] -1 to changes_at[1]
i0= changes_at[1] - 1;
i1= changes_at[1];

fan_positions[new_count]=compute_clip_location( fan_positions[i0], current_clip[i0],
fan_positions[i1], current_clip[i1]);
new_count++;


}

count=new_count;
}


#define do_it(sgn, F) \
do { \
int temp[2]; \
if(count>0) \
{ \
compute_current_clip(sgn, F); \
find_sides(temp); \
stitch(temp); \
} \
} while(0)

void
main()
{


count=3;
for(int i=0;i<3;++i)
{
fan_positions[i]=gl_PositionIn[i].xyz;
fan_DivPositions[i]=DivPosition[i];
}

/*
clip the triangle to the clipping equations
*/
do_it(1, x);
do_it(-1,x);
do_it(1, y);
do_it(-1,y);
do_it(1,z);
do_it(-1,z);

/* now send out the triangle fan */
gl_Position = vec4(fan_positions[0]/fan_DivPositions[0], 1.0);
EmitVertex();
gl_Position = vec4(fan_positions[1]/fan_DivPositions[1], 1.0);
EmitVertex();

for(int i=2; i<count; ++i)
{
gl_Position=vec4(fan_positions[i]/fan_DivPositions[i], 1.0);
EmitVertex();
EndPrimitive();
gl_Position = vec4(fan_positions[0]/fan_DivPositions[0], 1.0);
EmitVertex();
gl_Position = vec4(fan_positions[i]/fan_DivPositions[i], 1.0);
EmitVertex();
}
}



That terribly unoptimized mess will give you your extension. It does not contain early-leave optimizations. It writes the output position as the normalized device coordinate you want, with w=1.0. It needs to be augmented for interpolates (which means saving that t in compute_clip_location and using it to compute the interpolate there) and should be refactored to make it more optimal.

Have fun.

Yandersen
09-02-2014, 05:07 AM
kRogue, your suggestion is straightforward, and I DID try it, in an equivalent way: I set up the modelview-projection transformation matrices in such a way as to ensure that Z (and W) of any vertex drawn lies in the range [-1...1]. And I also had a clipping plane, just one, that cuts geometry behind the near clipping plane based only on Zc. Then I premultiplied gl_Position.z by gl_Position.w (I even tried abs(gl_Position.w)). And the artifacts surprised me very much. None of the behind-camera polygons were popping out, no. But those which were close to the camera appeared to receive slightly higher depth than they should have, because parts of them started to be occluded by the polygons right behind them (only when both surfaces were not far from each other, so the artifact had some degree of locality).
I racked my brain struggling to understand the mechanism causing this annoying artifact. And it turned out that the source is the clipping, where inserted vertices receive wrong depth values.
See, typically gl_Position.z and gl_Position.w both hold a product of z. Therefore, after the premultiplication, gl_Position.z holds a value of the form z^2+z*n. Now imagine one vertex after the modelview-projection transformation receives Zc=0.4, another one Zc=0.3. If the clipping plane crosses the edge at the half point, the inserted vertex should have the interpolated Zc=0.35. But if you premultiply both initial vertices, then you linearly interpolate squared values, not linear ones, so the vertex inserted in the middle does not receive the correct Zc. In contrast, for unclipped edges, the Zc of their vertices gets interpolated after the division, when the values are linear again, so the result for unclipped polygons is what we expect. Simply put, the source of the problem is the clipping, because the Zc coordinates of the inserted vertices are calculated from the not-yet-divided, premultiplied Zc values, while correct interpolation would require the premultiplied values to be divided already.
Therefore one needs to clip the polygons while their Zc is not yet premultiplied, and multiply by Wc only after that, ensuring that the fixed functionality has no reason to insert any new vertices between vertices whose Zc is premultiplied by Wc.
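The failure mode described above is easy to reproduce numerically (illustrative Python with made-up vertex values):

```python
def lerp(a, b, t):
    return a + (b - a) * t

# An edge with the desired depth z and clip w at each end
z0, w0 = 0.4, 2.0
z1, w1 = 0.3, 1.0

# Premultiply in the vertex shader so the fixed-function divide "cancels" w
zp0, zp1 = z0 * w0, z1 * w1

# The clipper inserts a vertex at t=0.5 by linearly interpolating the
# premultiplied clip coordinates, and only then does the divide happen:
t = 0.5
z_inserted = lerp(zp0, zp1, t) / lerp(w0, w1, t)

# The depth we actually wanted at that point on the edge:
z_expected = lerp(z0, z1, t)                 # 0.35

assert abs(z_inserted - z_expected) > 0.01   # ~0.367 vs 0.35: visibly wrong
```

At the original vertices the trick cancels exactly; only clipper-inserted vertices go wrong, matching the localized artifacts described above.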

kRogue
09-02-2014, 05:14 AM
Yep, the approach was so wrong I deleted the post :P At any rate, look at the newer post using the geometry shader. That is the way to go, but the shader will need some love to work fast.

kRogue
09-02-2014, 05:36 AM
Ahh, sighs, there was a bug or two in that geometry shader version; here it is again, hopefully with the bugs gone.



in vec3 DivPosition[];



int count;
vec3 fan_positions[9];
vec3 fan_DivPositions[9];

float current_clip[9];

/*
return the time when the expression is 0
*/
float
compute_clip_location(in float v0, in float v1)
{
/*
solve for t so that t*v0 + (1-t)*v1 = 0
*/
return v1/(v1-v0);
}

/*
there are 6 clipping planes:

-DivPosition.? <= gl_Position.? <= DivPosition.? for ?=x,y,z
we codify it as

v(sgn) . ? = sgn*gl_Position.? + gl_DivPosition.?

sgn=-1, 1, ?=x,y,z

store that value in each active element of the current fan
*/
#define compute_current_clip(sgn, F) \
do { \
int i;\
for(i=0;i<count; ++i) {\
current_clip[i] = sgn*fan_positions[i].F + fan_DivPositions[i].F; \
}\
} while(0);


/*
there is either two indices where the sign of current_clip
switches or the sign does not flip at all.
Record those two indices where it changes;
if there are no two vertices record -1 twice.
*/
void
find_sides(out int changes_at[2])
{
float v;
int aa;

changes_at[0]=changes_at[1]=-1;

aa=0;
v=sign(current_clip[0]);
for(int i=1; i<count; ++i)
{
float w;
w=sign(current_clip[i]);
if(w!=v)
{
changes_at[aa] = i;
++aa;
v=w;
}
}
}

void
add_vertex(inout int new_count, int i0, int i1)
{
float t;
t=compute_clip_location( current_clip[i0],current_clip[i1]);
fan_positions[new_count] = mix(fan_positions[i0], fan_positions[i1], t);
fan_DivPositions[new_count] = mix(fan_DivPositions[i0], fan_DivPositions[i1], t);

/*
if you have interpolates, save them here too for later abuse

"fan_fooInterpolate[new_count] = mix(fan_fooInterpolate[i0], fan_fooInterpolate[i1], t)"
*/

new_count++;
}

void
save_vertex(in int dest, in int src)
{
fan_positions[dest]=fan_positions[src];
fan_DivPositions[dest]=fan_DivPositions[src];

/*
if you have interpolates save them here too for later abuse
fan_foointerpolate[dest]=fan_foointerpolate[src];
*/
}

void
stitch(int changes_at[2])
{
if(changes_at[0]==-1)
{
if(current_clip[0]<0.0)
count=0;

return;
}

int new_count;

if(current_clip[0]>=0.0)
{
//"copy" unclipped vertices
new_count = changes_at[0];

//insert interpolate between changes_at[0] -1 to changes_at[0]
add_vertex(new_count, changes_at[0] - 1, changes_at[0]);

//insert interpolate between changes_at[1] -1 to changes_at[1]
add_vertex(new_count, changes_at[1] - 1, changes_at[1]);

for(int i=changes_at[1]; i<count; ++i, ++new_count)
{
save_vertex(new_count, i);
}
}
else
{
//insert interpolate between changes_at[0] -1 to changes_at[0]
new_count = 0;
add_vertex(new_count, changes_at[0] - 1, changes_at[0]);

for(int i=changes_at[0]; i<changes_at[1]; ++i, ++new_count)
{
save_vertex(new_count, i);
}

//and now insert the value of change between changes_at[1] -1 to changes_at[1]
add_vertex(new_count, changes_at[1] - 1, changes_at[1]);

}

count=new_count;
}


#define do_it(sgn, F) \
do { \
int temp[2]; \
if(count>0) \
{ \
compute_current_clip(sgn, F); \
find_sides(temp); \
stitch(temp); \
} \
} while(0)

void
main()
{


count=3;
for(int i=0;i<3;++i)
{
fan_positions[i]=gl_PositionIn[i].xyz;
fan_DivPositions[i]=DivPositions[i];
}

/*
clip the triangle to the clipping equations
*/
do_it(1, x);
do_it(-1,x);
do_it(1, y);
do_it(-1,y);
do_it(1,z);
do_it(-1,z);

/* now send out the triangle fan */
for(int i=2; i<count; ++i)
{
gl_Position = vec4(fan_positions[0]/fan_DivPositions[0], 1.0);
EmitVertex();
gl_Position = vec4(fan_positions[i-1]/fan_DivPositions[i-1], 1.0);
EmitVertex();
gl_Position = vec4(fan_positions[i]/fan_DivPositions[i], 1.0);
EmitVertex();
EndPrimitive();
}
}


For interpolates, one can put them directly in add_vertex/save_vertex and then replay them in the triangle fan emit. This shader is far from optimal, but it will give you what you want.
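As a side note, the `compute_clip_location` helper the shader relies on is never defined; it is presumably the standard edge-crossing formula t = d0/(d0 - d1) for signed clip distances of opposite sign. A minimal sketch outside GLSL (helper names are mine, not from the shader above):

```python
def compute_clip_location(d0, d1):
    """Parameter t in (0, 1) at which an edge crosses the clip plane.

    d0 and d1 are the signed clip distances of the edge's endpoints
    (sgn*position + div_position, as in current_clip[]); their signs
    must differ for the edge to cross the plane.
    """
    return d0 / (d0 - d1)

def mix(a, b, t):
    """GLSL-style linear interpolation, as used by add_vertex()."""
    return a * (1.0 - t) + b * t

# Edge going from d0 = +2 (inside) to d1 = -6 (outside): it crosses
# the plane a quarter of the way along, and interpolating the clip
# distances with that t lands exactly on the plane.
t = compute_clip_location(2.0, -6.0)   # 0.25
on_plane = mix(2.0, -6.0, t)           # 0.0
```

Interpolating every per-vertex attribute with the same t (as add_vertex does) keeps the new vertex consistent with the clipped edge.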

Yandersen
09-02-2014, 03:15 PM
kRogue, you took it too hard, man. :) You know, everything is possible; there is always a way to get what you want. And you know, I have a hundred times easier solution: save the model-view-transformed and normalized Zc in the vertex shader, write the interpolated value as gl_FragDepth in the fragment shader and voila - you get a linear depth buffer. No calculations at all. The only disadvantage is that the early z-test gets switched off. But compared to custom-made clipping in a geometry shader... :dejection: OMG... Yes, it is possible to do what the clipper does using geometry shaders. But doing all that just to save the value of Zc from being divided by Wc - nonsense! Fixed functionality does what we do NOT want? Aha, let's go the Microsoft way - patch around it instead of solving the actual problem the right way. :)
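To illustrate why the linear depth trick helps (my own example numbers, not from the post): with a 24-bit integer depth buffer, near = 0.01 and far = 10000, two surfaces 1 unit apart at distance 5000 quantize to the same value under the standard hyperbolic depth d = f/(f-n) * (1 - n/z), but stay distinct under linear depth L = (z-n)/(f-n):

```python
# Assumed setup (illustration only): near = 0.01, far = 10000,
# a 24-bit integer depth buffer, surfaces at eye distances 5000 and 5001.
NEAR, FAR = 0.01, 10000.0

def hyperbolic_depth(z):
    """Standard perspective window depth: d(near) = 0, d(far) = 1."""
    return (FAR / (FAR - NEAR)) * (1.0 - NEAR / z)

def linear_depth(z):
    """What the gl_FragDepth trick writes: linear in eye space."""
    return (z - NEAR) / (FAR - NEAR)

def quantize24(d):
    """What a 24-bit integer depth buffer actually stores."""
    return round(d * ((1 << 24) - 1))

# Hyperbolic depth collapses the two distant surfaces into one bucket;
# linear depth keeps them apart.
hyper_a, hyper_b = quantize24(hyperbolic_depth(5000.0)), quantize24(hyperbolic_depth(5001.0))
lin_a, lin_b = quantize24(linear_depth(5000.0)), quantize24(linear_depth(5001.0))
```

The flip side, as noted above, is that writing gl_FragDepth disables the early z-test.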

I don't understand why you people perceive this as if I am just struggling to find a solution for my personal problem, kinda asking for help? No-no-no, my sincere motivation is the evolution of OpenGL, as I love it so much and I want it to be perfect (at least better than DirectX). Over the last 20 years it evolved so much, but the most basic thing - the camera - still can not be emulated properly! That is ridiculous: we got control of every aspect of rendering, so many things are hardware-accelerated now, but the way perspective division was done 20 years ago - we still have not changed a thing there. Huge terrains, thousands of instances of objects, sun flares, soft shadows, ambient occlusion, multisampling, bump-mapping, parallax, reflections, bla-bla-bla, modern super-realistic graphics - and we are still unable to draw the player's character?!?!?! The player had no body in the first Half-Life - and still doesn't in Battlefield 4! People, wake up, something is wrong here! If we want to render realistic 3D, we need to start from the most basic thing - we need to be able to emulate the camera the right way. Can we do that?.. Nope, we still can not! I am desperate...

Well, every FPS game uses the same trick to solve the camera's issue: the character's model is not rendered into the main camera (we see only fake hands). OK, we are used to it already. Now let me show a case where even this trick is not applicable.

Imagine a game featuring a futuristic post-apocalyptic world of AI machines exploring a large open-world environment, placing supporting buildings and fighting in groups against each other for resources. The game engine needs to be able to render large areas (hundreds of kilometers) - that is requirement #1. Next. The player is one of those machines, so it is a first-person shooter game. The key of the gameplay is the ability to construct one's own machine from functional parts, any of which may be destroyed in battle. Along with the frame elements, armor parts, reactors, accumulators, engines, guns, gravi-pushers-pullers and other functional devices the user may use to compose the machine's body - there are also cameras. The user is not restricted in the topology of the machine's frame, as parts' collisions and devices' functionality are simulated, so the user can place any functional element any way they want (whatever makes sense from the designer's point of view). It will probably be desirable to squeeze the camera somewhere in between armor elements to protect it from catching enemy fire and getting destroyed, which would leave the player blind or force them to switch to another camera located somewhere safer on the machine but without a good view angle. Therefore, requirement #2 for the game engine is that the near clip plane for the camera should be no larger than the size defined by the actual game model of the camera, which obviously cannot be a few meters wide - it must be as small as a real camera (a sensor with lenses). And that places the restriction on the near clipping plane to be a few centimeters at most.

Combine requirement #1 with #2 and see the hardware restriction in the way. And there wouldn't be any problem here if we could have some control over the clipping and perspective division, but it simply works the same way it worked 20 years ago - now it just runs on advanced hardware, which can do so much more, but we are too conservative to change anything. We keep dragging along the same crippled algorithms, and whenever we want to do something in a better way we have to find a "workaround" - this is so wrong!

kRogue
09-03-2014, 02:45 AM
I think doing the clipping in a GS with just two planes for the custom z will run faster than doing it in the fragment shader. Also, enabling z-clamping will likely make the performance comparable to the usual jazz on z stuff.

The issue about the perspective divide being same for all (x,y and z) actually has a pretty strong mathematical reasoning behind it; I really do not want to write it up on a forum, it is a hassle.

Much of the pain of stuff near the eye being borked can be gotten around by a careful projection matrix and z-futzing. The easiest way is to use glDepthRange on models near the eye, again futzing with the projection matrix together with a floating point depth buffer. Essentially, break the scene into regions parallel to the near plane, render by region, and use glDepthRange to get the job done. In an ideal world, one could write the values for the arguments of DepthRange from the GS for better flexibility (this is likely a smaller change in hardware, but it depends).
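The per-region glDepthRange partitioning kRogue describes can be sketched as follows (a minimal sketch under my own assumptions: equal-width depth sub-ranges and a hypothetical helper name; the actual glDepthRange calls and per-region projection matrices are left out):

```python
def slab_depth_ranges(num_slabs):
    """Split window depth [0, 1] into one sub-range per scene region.

    Region 0 is nearest the eye. Rendering each region with its own
    glDepthRange(lo, hi) keeps inter-region ordering correct while
    giving every region the depth buffer's full resolution within
    its own sub-range.
    """
    step = 1.0 / num_slabs
    return [(i * step, (i + 1) * step) for i in range(num_slabs)]

# e.g. four regions, rendered near to far:
#   for (lo, hi), region in zip(slab_depth_ranges(4), regions):
#       glDepthRange(lo, hi); draw(region)
ranges = slab_depth_ranges(4)   # [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
```

In practice one would size the sub-ranges by each region's depth complexity rather than equally, but the tiling idea is the same.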

Yandersen
09-03-2014, 04:56 AM
The issue about the perspective divide being same for all (x,y and z) actually has a pretty strong mathematical reasoning behind it; I really do not want to write it up on a forum, it is a hassle.
Don't worry, I realize why. But it doesn't matter at the final rasterization step for an invisible Z coordinate, especially after the clipping (and that is where the perspective division is performed).
Well, yeah, there will be no use for my extension applicable to xy and interpolation components, truly. Carrying 4 extra FP numbers per vertex just to serve Z is somewhat redundant, I start to realize that...

Much of the pain of stuff near the eye being borked can be gotten around by a careful projection matrix and z-futzing. The easiest way is to use glDepthRange on models near the eye, again futzing with the projection matrix together with a floating point depth buffer. Essentially, break the scene into regions parallel to the near plane, render by region, and use glDepthRange to get the job done. In an ideal world, one could write the values for the arguments of DepthRange from the GS for better flexibility (this is likely a smaller change in hardware, but it depends).
The glClipControl will make it even easier (whenever the extension becomes available):


The best workaround for high zFar/zNear values, I think, would be:
1) using the FP depth buffer,
2) glDepthFunc(GL_GREATER) - incoming fragment passes if depth is greater,
3) glClipControl(...,GL_ZERO_TO_ONE),
4) the following projection matrix:

| f/aspect 0 0 0 |
| |
| 0 f 0 0 |
P = | |
| 0 0 0 zNear |
| |
| 0 0 -1 0 |

where f = cot(ViewAngleVertical/2),
aspect = Viewport.x / Viewport.y,
zNear: distance to the near clipping plane; this value could be very small (in 1e-XX range).

This setup will cull all the geometry behind the near clipping plane, and the resulting depth will decrease as objects get drawn further away, asymptotically approaching 0 for infinitely distant objects.
Obviously this will only work for FP buffers, heavily exploiting the negative range of the numbers' exponents (Xe-XX), because 99% of depth values will be less than 0.000...
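A quick numeric check of this setup (helper names are mine; `f32` just round-trips through IEEE-754 binary32 the way a 32-bit float depth write would): with the matrix above, z_clip = zNear and w_clip equals the eye-space distance, so the stored depth is zNear/z - exactly 1.0 at the near plane and falling toward 0 with distance, while float32's constant relative precision keeps even far-away surfaces distinct:

```python
import struct

def f32(x):
    """Round-trip a value through IEEE-754 binary32 (a float depth write)."""
    return struct.unpack('f', struct.pack('f', x))[0]

Z_NEAR = 0.01  # near plane a centimeter away, as in the setup above

def reversed_depth(z_eye):
    # z_clip = zNear, w_clip = z_eye  =>  window depth = zNear / z_eye:
    # 1.0 at the near plane, monotonically decreasing toward 0.
    return f32(Z_NEAR / z_eye)

d_near = reversed_depth(Z_NEAR)     # exactly 1.0 at the near plane
d_far_a = reversed_depth(5000.0)    # ~2e-6
d_far_b = reversed_depth(5001.0)    # slightly smaller, still distinct in float32
```

Note how surfaces 5000 units out, which collapsed together in a 24-bit fixed-point buffer, remain separate values here.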

Yandersen
09-03-2014, 05:12 AM
doing the clipping in GS with just two planes
Na-ah, all clipping planes must be handled, because any of them may cause the clipper to insert new vertices, and the Z for those would otherwise be calculated the wrong way. ;)

kRogue
09-03-2014, 05:13 AM
I admit that clip control would be more feature-encompassing, but making hardware that runs fast enough means a lot of freaking sand. Current triangle clipper/setup engines are already a great deal (because of the nature of clipping), and a number of optimizations (namely guardband) are gone for what you are asking. My opinion is the following: since the functionality can be done via GS, and that functionality is really only needed in practice for z, it is likely a hard sell to add to the design of a GPU. Secondly, the end goal is to allow for dynamic ranging of the write to the depth buffer; that is the end goal. This can be done by the glDepthRange bits, but requires a draw call break. So a much more modest wish: to specify the values for glDepthRange at the primitive level from a geometry shader. This will get what you want, is easier to read, and is much less for hardware to implement.

Firadeoclus
09-03-2014, 07:37 AM
But anyway, using FP for depth buffer is not the solution I would be completely happy with.
Why not? It lets you do what you want: place the near clip plane very close while being able to draw objects very far away. You get a constant relative error, which makes sense since you want coarser LOD meshes for your distant objects anyway. And you can do it now, on today's hardware, without compromising early-Z or depth buffer compression.

The view space linear depth buffer you want requires additional hardware since it's non-linear in screen space, i.e. it requires perspective correct interpolation. It also breaks delta encoding based depth buffer compression thus needing a more complex compression scheme and/or more bandwidth.

Yandersen
09-03-2014, 03:32 PM
...to make hardware that runs fast enough, means a lot of freaking sand... ...and a number of optimizations (namely guardband) are gone for what you are asking. ...a hard sell to add to the design of a GPU. ...much less for hardware to implement.
kRogue, let's not try to estimate how much sand it will cost unless you are the one who actually writes machine code for the GPU or designs those chips. Only actual nVidia, AMD or Intel engineers could give the right estimation. I am not one of them, so my judgement is based on knowledge of the Intel command set I sometimes use when writing SSE-based geometry functions in assembler. From this point of view I claim that any vector operation on xmm registers (division, multiplication, inverse-square-root calculations and others) affects 4 values independently, so calculating 1/w would cost as much "sand" as calculating {1/w0, 1/w1, 1/w2, 1/w3}. And the spec says that clipping is performed one plane at a time, so one component of that vector would be used for each plane. As for the last point, the interpolation, which is done after the perspective division: the last component (1/w3) replaces the first three, which are used up by that point - that is the only additional shuffling operation (and it is not costly at all). Well, I assume some generic SSE-like hardware here, but you never know what monster actually lurks in your card, so only the actual developers can tell you whether they could implement the extension or not. And I believe that the cost of this extension would be derived from human-hours of work spent on rewriting the drivers, without any relation to the "sand", as the current generalized vectorized HW can do all this anyway. Therefore it is a trade between "how much time we spend" and "how much we get". We here can only estimate the second part: larger scenes, a smaller near clipping plane, the same depth buffer required.

Why not?
Because the default window framebuffer has an integer format. I know, silly argument (who draws there directly nowadays, right?), but 24bpp depth can be paired with 8bpp stencil in a 32bpp structure, while an FP depth buffer can be paired with 8bpp stencil only in a 64bpp structure. Aside from that, why use more memory - why not just utilize the smaller buffer in a better way? Wouldn't it be more rational?

You get a constant relative error, which makes sense since you want coarser LOD meshes for your distant objects anyway. And you can do it now, on today's hardware, without compromising early-Z or depth buffer compression.
Yes, there is no other choice currently. Two posts above I quoted my target solution with an FP buffer, but glClipControl is not available on my card yet. Still waiting for nVidia to update their drivers so my GT 520 can get it... :(

The view space linear depth buffer you want requires additional hardware since it's non-linear in screen space, i.e. it requires perspective correct interpolation. It also breaks delta encoding based depth buffer compression thus needing a more complex compression scheme and/or more bandwidth.
Did you hear about w-buffers? I was surprised Microsoft did such a smart trick with their DirectX. A perfect solution for linear depth buffers, couldn't be done better, IMO. OpenGL is still chewing gum indifferently... :(

It would be awesome if instead of glClipControl() we got something more general like glRasterizerParameteri() to configure origin, depth mode and other parameters of fixed functionality. Because glClipControl looks like a sort of Microsoft-style hot patch to me, falling out of the OpenGL style with those two single independent parameters it sets. It would be awesome to be able to select depth mode 0_1 and w as the source of the fragment depth coordinate using glRasterizerParameteri. But, well, it is done the way it is done, the Guys do what they want the way they want, the commercial world doesn't care about some indie geeks down there...

Firadeoclus
09-04-2014, 06:02 AM
Well, I assume some generic SSE-like hardware here, but you never know what monster actually lurks in your card, so only the actual developers can tell you whether they could implement the extension or not.
There is plenty of information on the architecture of recent GPUs out in the public, and if you look at it you will find that they generally use wide vector units where each work-item (vertex, fragment, etc.) uses a single lane. I.e. from the perspective of a fragment the execution units are effectively scalar. You don't get vector operations "for free".


Because the default window framebuffer has integer format. I know, silly argument (who draws there directly nowadays, right?), but 24bpp depth could be used with 8bpp stencil, while FP depth buffer could be paired with 8bpp stencil in 64bpp structure only. Aside from that, why to use more memory - why not just utilize the smaller buffer in a better way? Wouldn't it be more rational?
You're ignoring the effect of depth buffer compression (and the likely possibility that stencil is stored separately in memory). While you have to allocate the full buffer for the worst case, the average amount of depth/stencil data stored is much less than 40bpp. And keeping depth linear in screen space (i.e. noperspective) likely allows better compression than view space linear depth such that using the smaller buffer in a "better" way might actually lead to higher bandwidth usage.


Did you heard about w-buffers? I was surprised Microsoft did some smart trick with their DirectX. Perfect solution for linear depth buffers, couldn't be done better, IMO. OpenGL still chewing a gum indifferently... :(
W-buffer support was deprecated in D3D10 for lack of hardware support, and it had always been optional before that.

kRogue
09-04-2014, 11:05 AM
kRogue, let's not try to estimate how much sand it will cost unless you are the one who actually writes machine codes for the GPU or the designer of those chips.

I am not estimating; I have consulted those that have been hardware engineers. Even now, clipping from user defined clip distances is implemented horrendously slowly on much hardware out there, and the thing still takes up lots of gates.

The brutal part is this: the main use case (z-fighting near the eye) you are after can actually be implemented by floating point depth buffer already together with an infinite far plane (which is just a different projection matrix).

Personally, I'd like to modify the beans for normalized z, where the value for the depth buffer is taken from the z after w-divide directly and the clip values for that z are user defined, or even possibly disabled. This makes perfect sense with a floating point depth buffer (which everyone implements) and has minimal hardware intrusions (in fact it is trivial to implement). That alone would give you what you are after, since floating point has built into it different degrees of accuracy.

Yandersen
09-04-2014, 02:11 PM
I am not estimating; I have consulted those that have been hardware engineers. Even now, clipping from user defined clip distances is implemented horrendously slowly on much hardware out there, and the thing still takes up lots of gates.
Well, sounds like removing the redundant far clipping plane may benefit rendering speed. :)

The brutal part is this: the main use case (z-fighting near the eye) you are after can actually be implemented by a floating point depth buffer already, together with an infinite far plane (which is just a different projection matrix).
Can you be more specific, please? I am interested. Isn't your solution the same as mine on the previous page:
The best workaround for high zFar/zNear values, I think, would be:
1) using the FP depth buffer,
2) glDepthFunc(GL_GREATER),
3) glClipControl(...,GL_ZERO_TO_ONE),
4) the following projection matrix:

| f/aspect 0 0 0 |
| |
| 0 f 0 0 |
P = | |
| 0 0 0 zNear |
| |
| 0 0 -1 0 |

where f = cot(ViewAngleVertical/2),
aspect = Viewport.x / Viewport.y,
zNear: distance to the near clipping plane; this value could be very small (in 1e-XX range).

This setup will cull all the geometry behind the near clipping plane, and the resulting depth will decrease as objects get drawn further away, asymptotically approaching 0 for infinitely distant objects.
Obviously this will only work for FP buffers, heavily exploiting the negative range of the numbers' exponents (Xe-XX), because 99% of depth values will be less than 0.000...

kRogue
09-04-2014, 03:11 PM
yep something like that.

The best case for this to be shiny is what I said for the extension: the ability to disable z-clipping without clamping the value. This would avoid the near plane issue, and I am not worried about w being close to zero since the clip bits -w <= x <= w and -w <= y <= w make sure w is not zero anyways. That would require that the depth buffer is FP32 (or some other floating point format) and some hairy icky things to handle really big numbers and likewise very small (unnormalized) numbers too.

Yandersen
09-05-2014, 08:50 AM
Actually, this will work only if z values are mapped from the [0...1] range to [0...1] depth values directly, because with the [-1...1] to [0...1] mapping the addition of 0.5 will kill the precision of Xe-XX values; therefore glClipControl(...,GL_ZERO_TO_ONE) is a must, otherwise the FP buffer will work like a 23-bit integer buffer.
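The precision loss from the 0.5 offset is easy to verify numerically (my own example value; `f32` round-trips through IEEE-754 binary32): one ulp at 0.5 is about 6e-8, so any depth contribution smaller than that simply vanishes in the [-1...1] to [0...1] remap, while the direct [0...1] mapping keeps full relative precision:

```python
import struct

def f32(x):
    """Round-trip through IEEE-754 binary32, as a float depth write would."""
    return struct.unpack('f', struct.pack('f', x))[0]

tiny = 1e-9  # a typical far-away depth value in the reversed-Z setup

# Direct [0...1] storage (GL_ZERO_TO_ONE): the value survives with
# its full ~24 bits of relative precision.
direct = f32(tiny)

# Via the d = z/2 + 0.5 remap of [-1...1] clip z: the 0.5 pins the
# exponent, and anything below one ulp at 0.5 is rounded away.
remapped = f32(tiny / 2.0 + 0.5)    # collapses to exactly 0.5
```

This is exactly why the remapped buffer degenerates to roughly 23-bit integer behavior around 0.5.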
The equations "-w <= x <= w and -w <= y <= w" do not ensure "w is not zero" at all. In fact, those 4 clipping planes constrain the viewing volume to two inverted, point-connected frustums. But do not worry, this equation cuts the back frustum:

0 <= zNear <= -z

So all geometry behind the near clipping plane is clipped. The left part of the equation (0<=zNear) is always true, therefore there is no far clipping plane. But even without glClipControl the equation above will produce the same result:

z <= zNear <= -z

because the second part is just a stricter limit than the first part of the equation. However, the result with an FP depth buffer will not be any better than with a 24bpp integer depth buffer. In any case, the resulting Zc will be 1.0 for points at the near clipping plane and lower for further points, producing Zc values in the range [1...0), not [-1...1]. That is another reason to use glClipControl with such a setup.

The glClipControl-enabling extension is not available to me yet, so I keep playing games waiting for Xmas to come... :)

kRogue
09-08-2014, 06:32 AM
um,

The clipping equations:

-w <= x <= w
-w <= y <= w
-w <= z <= w

do just as good a job of avoiding w being small as

-w <= x <= w
-w <= y <= w

in practice. Now the main pain is when x and y are close to zero and w is too; that means w is a tiny positive number and the fragment is close to the center of the viewport. This case is handled by hardware already [in particular, when depth clamping is enabled those are the effective clip planes].

Just to be clear: there is no "near clipping plane" in the hardware really, it is just clipped in clip space. The concept of near clipping plane comes from what happens to the condition -w <= z when inverted through a traditional projection matrix.

For the z-stuff, one does not need this clip control thing; one just needs the ability to allow the normalized z-value to be unbounded (instead of [-1,1] or [0,1]) and to disable the z clip equations in clipping. Together with an FP32 depth buffer, one is done completely for the use cases I think you are after. This change is tiny and relatively trivial to implement. In short, this is what is needed:


glDepthRange to accept values outside of [0,1]
ability to turn off one or both clip conditions -w<=z, z<=w.

That is all you need.