z-buffer accuracy

I’ve begun reading the OpenGL Programming Guide and came to a point where the book mentions that, with perspective projections, depth values closer to the near clipping plane have greater accuracy than those nearer the far clipping plane. The reason given is that the z values are scaled non-linearly during the perspective divide. I looked a little further into the subject online, and what I figured is that the z values are scaled this way because that’s the only matrix transformation that produces the desired change from the perspective canonical view volume to the parallel canonical view volume (I don’t know any linear algebra except how to multiply matrices, so I might be wrong). My question is, then: why do we HAVE to use a matrix transformation? Why can’t we just alter the x, y (w?) values and leave the z value unaltered?

The question is not very clear to me. The Z-values used in the depth test must be based on the distance from the camera point. Two images rendered from the same view with and without perspective correction are not identical.

The standard Z-buffer has most of its precision near the camera, as you write, but there is an alternative that I think is called a W-buffer. I have never seen it implemented in OpenGL, but its precision is evenly distributed.
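
To put rough numbers on how the standard Z-buffer spends its precision, here is a small sketch. It assumes the usual glFrustum-style perspective matrix, the default depth range of 0 to 1, and a 24-bit fixed-point depth buffer. The function names `window_depth` and `depth_step` are made up for this illustration and are not OpenGL calls.

```python
def window_depth(d, near, far):
    """Window-space depth in [0, 1] for a point at positive eye-space distance d.

    Assumes a standard perspective projection and the default depth range.
    """
    # NDC z after the perspective divide: a constant minus a term in 1/d.
    z_ndc = (far + near) / (far - near) - 2.0 * far * near / ((far - near) * d)
    # Map NDC [-1, 1] to window depth [0, 1].
    return 0.5 * z_ndc + 0.5


def depth_step(d, near, far, bits=24):
    """Approximate eye-space distance covered by one depth-buffer step at d.

    bits=24 is an assumption about the depth buffer format.
    """
    # Derivative of window_depth with respect to d.
    slope = far * near / ((far - near) * d * d)
    # One quantization step in depth, converted back to eye-space distance.
    return 1.0 / ((2 ** bits - 1) * slope)


if __name__ == "__main__":
    near, far = 0.1, 100.0
    for d in (0.1, 1.0, 10.0, 100.0):
        print(d, window_depth(d, near, far), depth_step(d, near, far))
```

Because the step size grows with the square of the distance, one depth step at the far plane covers (far/near)^2 times as much eye-space distance as one step at the near plane. Under the same assumptions, half of the whole depth range is already used up by roughly twice the near distance, which is why pushing the near plane out helps so much.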

The depth buffer interpolation is required to be perspective correct for accurate interpenetration and occlusion calculations. For a fast implementation, z must also be linearly interpolable in screen space. You will find that the values stored in the OpenGL depth buffer are actually linear in 2D screen space over any individual triangle, so what you have is a depth representation that allows fast linear interpolation while still being perspective correct for planar surfaces at any orientation. It’s pretty cool when you realize this for the first time.
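
If you want to convince yourself of this, the following sketch checks it numerically along one edge in a 2D slice (x and eye-space distance d). It assumes a simple projection with screen x = x / d, the standard perspective depth mapping, and the default depth range. The helper name `check_screen_space_linearity` and the endpoint values are invented for the example.

```python
def window_depth(d, near, far):
    """Window-space depth in [0, 1] for eye-space distance d (standard perspective)."""
    return far * (d - near) / (d * (far - near))


def check_screen_space_linearity(s, a=(-1.0, 2.0), b=(3.0, 50.0),
                                 near=1.0, far=100.0):
    """Compare interpolated and true values at screen-space fraction s along a-b.

    a and b are (x, d) endpoints in eye space; screen x is taken to be x / d.
    """
    (xa, da), (xb, db) = a, b
    # Project both endpoints and move a fraction s across in screen space.
    sxa, sxb = xa / da, xb / db
    sx = (1.0 - s) * sxa + s * sxb
    # Eye-space parameter t of the point on segment a-b that projects to sx,
    # obtained by solving (xa + t*(xb - xa)) / (da + t*(db - da)) = sx for t.
    t = (sx * da - xa) / ((xb - xa) - sx * (db - da))
    d_true = da + t * (db - da)
    depth_true = window_depth(d_true, near, far)
    # Plain linear interpolation in screen space of stored depth and of d.
    depth_lerp = ((1.0 - s) * window_depth(da, near, far)
                  + s * window_depth(db, near, far))
    d_lerp = (1.0 - s) * da + s * db
    return depth_true, depth_lerp, d_true, d_lerp


if __name__ == "__main__":
    print(check_screen_space_linearity(0.5))
```

The first two returned values agree to rounding error, while the last two differ substantially: linearly interpolating the stored depth across the screen lands on the true surface, but linearly interpolating the eye-space distance does not.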