Clip space is not the NDC box. It is the 4D homogeneous space that clipping occurs in: your eye space after the projection transform has been applied to it, but before the perspective divide. If your projection is perspective, think of clip space as an orthographic warp of your eye-space perspective frustum (| | vs. \ /), where perspective foreshortening has been accounted for by carrying it in W. However, it is still 4D, so we don’t yet have any of the nasty singularities created by the perspective divide. That’s why clipping occurs here.
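To make the eye space to clip space step concrete, here is a minimal sketch in plain Python. It assumes a standard OpenGL-style perspective matrix (the same layout gluPerspective builds); the helper names `perspective` and `to_clip` are made up for illustration. The point to notice is that clip-space W ends up as minus the eye-space Z, so nothing has been divided yet.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective matrix, stored row-major (assumed layout)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # this row copies -z_eye into w_clip
    ]

def to_clip(m, eye):
    """Multiply a 4x4 matrix by a 4-component eye-space position."""
    return [sum(m[r][c] * eye[c] for c in range(4)) for r in range(4)]

proj = perspective(90.0, 1.0, 1.0, 10.0)

# A point 5 units in front of the eye (OpenGL looks down -Z).
in_front = to_clip(proj, [0.0, 0.0, -5.0, 1.0])
print(in_front[3])   # w_clip is 5.0, i.e. -z_eye

# A point on the eye plane (z_eye = 0) gets w_clip = 0: dividing here would blow up.
on_eye_plane = to_clip(proj, [1.0, 1.0, 0.0, 1.0])
print(on_eye_plane[3] == 0.0)   # True
```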
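NDC space is 3D (the -1…1 XYZ box you refer to). Clip space + perspective divide = NDC space, where the divide is X, Y and Z each divided by W. After we’ve clipped, it’s safe to divide, as nothing that survives has W=0. (For a perspective projection, clip-space W is -Z in eye space, so W=0 is the eye-space Z=0 plane.)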
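As a rough sketch of "clip, then divide", the Python below tests a clip-space position against the OpenGL clip volume (-W <= X, Y, Z <= W) and only then divides by W. The function names are hypothetical, and it simplifies one thing on purpose: real hardware clips whole primitives against the volume and generates new vertices, whereas this just rejects a single point that falls outside.

```python
def inside_clip_volume(clip):
    """True if a clip-space point satisfies -w <= x, y, z <= w with w > 0."""
    x, y, z, w = clip
    return w > 0.0 and all(-w <= v <= w for v in (x, y, z))

def clip_to_ndc(clip):
    """Perspective divide, performed only for points that survive the clip test.

    Returns None for rejected points (a simplification: a real pipeline
    would clip the primitive instead of discarding the vertex).
    """
    if not inside_clip_volume(clip):
        return None
    x, y, z, w = clip
    return (x / w, y / w, z / w)

print(clip_to_ndc((2.0, -1.0, 4.0, 4.0)))   # (0.5, -0.25, 1.0)
print(clip_to_ndc((1.0, 0.0, -2.0, 0.0)))   # None: w = 0, never reaches the divide
print(clip_to_ndc((0.0, 0.0, 3.0, -2.0)))   # None: behind the eye
```

Because the test requires W > 0, every point that reaches the divide lands inside the -1…1 box on all three axes.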