Can't understand perspective projection matrix

Good evening everyone. I'm an OpenGL newbie and I have some questions about perspective projection. I've read this article about it and don't understand the following things. If I read it correctly, the OpenGL pipeline first multiplies the given vertices by the modelview matrix. After that we have "eye" coordinates. The next stage of the OpenGL pipeline is to multiply this newly formed eye vertex by the projection matrix. If we specified a perspective view, then the perspective matrix should be something like this:

The point is that I can't understand some things from the article linked above. First, if we imagine that our camera is something like a truncated pyramid and we can only see things that are inside its volume, then how can this matrix transform all geometry into other geometry that may or may not fit in a box with corners (1,1,1), (1,-1,1) and so on? Yes, you can say: imagine perspective projection as tracing a ray from a given point to the camera's eye and seeing where this ray intersects the near plane. Then if something is outside the "box", it will still be projected onto the near plane, and visually it will look like it does not lie inside the near-plane rectangle (or the viewport specified by the client).
If I understand correctly, when we think about perspective projection in terms of finding the intersection of a ray that starts at a given point and heads toward the camera's eye, we end up with a point projected onto the near plane, with new x and y values (not the same as the original point's) and z equal to the near plane distance. In this case we lose a very important value: z. To handle this, the perspective projection matrix lets us preserve that value by adding a fourth coordinate, w. After multiplying a given vertex by the perspective projection matrix we get a four-component vertex (x,y,z,w), which is called, correct me if I'm wrong, clip coordinates.
Am I right to understand that frustum culling starts right here, by comparing the first three coordinates to the fourth coordinate w? If so, how should I understand this comparison? The article linked above says: “If each clip coordinate is less than -wc, or greater than wc, then the vertex will be clipped (discarded)”. But if we have a box whose sides lie at 1 and -1 in all three dimensions, then we should compare all three coordinates against that three-dimensional coordinate system, right? And if we are comparing the clip coordinates x, y and z, then all of them should lie in the range -1 to +1, right? This is the place where I'm lost. And why do we then convert those coordinates into "device coordinates" by performing the "perspective division"?
If the article doesn't lie, then wc will be equal to -ze, and does this mean that we divide the first three components by the third coordinate z and get z = -1?
In any case, if you have answers, please reply. Thanks.

The songho article is about as clear as it gets. I suggest you do some calculations with x/y/z values using his logic and see what you get. Not all values will map into the range (-1,-1,-1) to (1,1,1). Some will fall outside it. Those are the ones that get clipped.
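A rough sketch of that kind of hand calculation, in plain Python. It assumes the matrix under discussion has the classic glFrustum layout, with a symmetric frustum, near = 1 and far = 10 chosen only for illustration. The helper names (frustum, to_clip, inside_frustum, to_ndc) are made up for this example and are not OpenGL calls. It shows two things: the test -wc <= c <= wc on clip coordinates gives the same answer as checking that c/wc lies in [-1, 1] (for wc > 0), and z after division is not simply -1, because zc is a scaled and shifted ze rather than ze itself.

```python
def frustum(l, r, b, t, n, f):
    """Row-major 4x4 perspective matrix, assumed to follow the glFrustum layout."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],  # this row produces wc = -ze
    ]


def to_clip(m, eye):
    """Multiply an eye-space point (x, y, z), with w = 1, by the matrix."""
    v = (eye[0], eye[1], eye[2], 1.0)
    return tuple(sum(m[i][j] * v[j] for j in range(4)) for i in range(4))


def inside_frustum(clip):
    """Clip test: every one of xc, yc, zc must lie in [-wc, wc]."""
    xc, yc, zc, wc = clip
    return all(-wc <= c <= wc for c in (xc, yc, zc))


def to_ndc(clip):
    """Perspective division: clip coordinates -> normalized device coordinates."""
    xc, yc, zc, wc = clip
    return (xc / wc, yc / wc, zc / wc)


# Illustrative frustum: left/right/bottom/top = -1/1/-1/1, near = 1, far = 10.
proj = frustum(-1, 1, -1, 1, 1, 10)

# Eye-space points; the camera looks down -z, so visible points have negative z.
for eye in [(0, 0, -2), (2, 0, -4), (5, 0, -4), (0, 0, -20)]:
    clip = to_clip(proj, eye)
    ndc = to_ndc(clip)
    print(eye, "clip:", clip, "ndc:", ndc, "inside:", inside_frustum(clip))
```

With these numbers, the point (5, 0, -4) has xc = 5 and wc = 4, so it fails the clip test, and its x after division is 1.25, which is outside [-1, 1]. The comparison against wc and the comparison against 1 describe the same box, only before and after the division.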

A night with the book proved its strength: now I get it :wink: Thank you all!