Otherwise each draw call will have to finish before the next can begin.
Each triangle’s rendering, from the top of the pipeline to the bottom, must act as though every triangle rendered before it had already completed. Otherwise, there’s no way the depth test, the stencil test, or even blending could be specified to work in any consistent, reasonable way.
However, note the key phrase: “act as though”. The rendering system only needs to make sure that the results come out as if this were the case. It can reorder work however it wants, so long as all of the testing, blending, etc. operations proceed as if everything were rendered in a specific order.
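To make the ordering requirement concrete, here is a toy Python model of why the order has to be pinned down at all. It assumes straight-alpha “over” blending (the math of glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)) and a GL_LESS-style depth test; the helper names are hypothetical and this models only the arithmetic, not any real GPU:

```python
from typing import Tuple

Color = Tuple[float, float, float, float]  # (r, g, b, a), straight alpha

def blend_over(src: Color, dst: Color) -> Color:
    """Source-over blend: result = src * src.a + dst * (1 - src.a), on all four channels."""
    s = src[3]
    return tuple(sc * s + dc * (1.0 - s) for sc, dc in zip(src, dst))

def draw_in_order(background: Color, fragments) -> Color:
    """Blend fragments onto a single pixel in exactly the order given."""
    pixel = background
    for frag in fragments:
        pixel = blend_over(frag, pixel)
    return pixel

def depth_test_less(fragments):
    """Keep a fragment only if its depth is strictly less than the stored depth (GL_LESS)."""
    depth, color = 1.0, None  # freshly cleared depth buffer
    for frag_depth, frag_color in fragments:
        if frag_depth < depth:
            depth, color = frag_depth, frag_color
    return color

black = (0.0, 0.0, 0.0, 1.0)
red = (1.0, 0.0, 0.0, 0.5)
blue = (0.0, 0.0, 1.0, 0.5)

print(draw_in_order(black, [red, blue]))  # (0.25, 0.0, 0.5, 0.625)
print(draw_in_order(black, [blue, red]))  # (0.5, 0.0, 0.25, 0.625)
print(depth_test_less([(0.5, "red"), (0.5, "blue")]))  # red: the tie goes to whoever came first
```

Same two triangles, same pixel, different submission order, different answer. That is why the specification has to fix an order, and why the hardware has to at least pretend to honor it.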
All of the pre-rasterizer stages (vertex shaders, tessellation, geometry shaders, etc.) can be parallelized as much as the hardware likes, so long as the triangles coming out the other end are in the expected order. So it can process groups of 16 vertices from the input attribute stream at once, as long as the resulting triangles are still emitted in submission order.
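A minimal sketch of that idea, assuming a Python thread pool stands in for the GPU’s parallel vertex units and a hypothetical vertex_shader function stands in for a real shader. Batches of 16 vertices may finish in any order, but a reorder buffer indexed by batch number restores submission order before triangles are assembled:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

BATCH_SIZE = 16  # batch size taken from the text; real hardware varies

def vertex_shader(v):
    """Hypothetical stand-in for a vertex shader: maps [0, 1] coordinates to [-1, 1]."""
    x, y = v
    return (2.0 * x - 1.0, 2.0 * y - 1.0)

def shade_batch(batch):
    return [vertex_shader(v) for v in batch]

def process_vertices(vertices, workers=4):
    batches = [vertices[i:i + BATCH_SIZE] for i in range(0, len(vertices), BATCH_SIZE)]
    results = [None] * len(batches)  # reorder buffer: one slot per batch
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(shade_batch, b): i for i, b in enumerate(batches)}
        for fut in as_completed(futures):         # batches may finish in any order...
            results[futures[fut]] = fut.result()  # ...but each lands in its original slot
    shaded = [v for batch in results for v in batch]
    # Primitive assembly (GL_TRIANGLES): each consecutive triple is one triangle;
    # leftover vertices that don't make a full triangle are dropped.
    return [tuple(shaded[i:i + 3]) for i in range(0, len(shaded) - 2, 3)]

verts = [(i / 100.0, (i * 7 % 100) / 100.0) for i in range(99)]
triangles = process_vertices(verts)
print(len(triangles))  # 33
```

The design point is that ordering is restored once, at the seam between the parallel stage and the order-sensitive one, rather than by serializing the shading itself.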
Fragment shaders can operate independently of the blending units (which is one reason why they can’t read back framebuffer pixels). Early depth testing has to proceed in order, but it can be massively parallel. Hi-Z can cull large blocks of a triangle’s fragments at once. And the ROP units (late depth test, stencil test, blending, etc.) operate in order, but they are fixed-function and work over very small datasets.
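A similar toy model of the back end of the pipeline: fragments are shaded in a shuffled order (standing in for parallel shader cores finishing whenever they finish), and a hypothetical ROP stage then commits each pixel’s fragments in primitive-submission order. Only blending is modeled here; late depth and stencil tests would sit in the same per-pixel, in-order loop. Whatever the shuffle, the framebuffer comes out identical:

```python
import random
from collections import defaultdict

def fragment_shader(frag):
    """Hypothetical stand-in shader: returns the fragment's color unchanged."""
    return frag["color"]

def rop_commit(pixel, src):
    """Source-over blend, the math of glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)."""
    a = src[3]
    return tuple(s * a + d * (1.0 - a) for s, d in zip(src, pixel))

def render(fragments, clear=(0.0, 0.0, 0.0, 1.0), seed=None):
    # 1. Shade fragments in an arbitrary order (simulating parallel shader cores).
    work = list(fragments)
    random.Random(seed).shuffle(work)
    shaded = [(f["prim"], f["pixel"], fragment_shader(f)) for f in work]

    # 2. ROP: group shaded results by pixel, then commit them in primitive order.
    per_pixel = defaultdict(list)
    for prim, pixel, color in shaded:
        per_pixel[pixel].append((prim, color))
    framebuffer = {}
    for pixel, frags in per_pixel.items():
        value = clear
        for _, color in sorted(frags, key=lambda pc: pc[0]):
            value = rop_commit(value, color)
        framebuffer[pixel] = value
    return framebuffer

colors = [(1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 0.5), (0.0, 1.0, 0.0, 0.5)]
frags = [{"prim": p, "pixel": (x, 0), "color": c}
         for p, c in enumerate(colors) for x in range(4)]
print(render(frags, seed=1)[(0, 0)])  # (0.125, 0.5, 0.25, 0.5625)
```

The expensive, programmable part (shading) is free to run out of order; only the cheap per-pixel commit step has to respect primitive order.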
That’s one of the reasons image load/store is so complicated: it doesn’t provide these kinds of ordering guarantees, so you have to do the synchronization yourself (with glMemoryBarrier and friends). While that’s fine for the cases where you’re reading and writing arbitrary locations, I wouldn’t want to have to do that kind of synchronization all the time when the driver can do it for me.
Maybe there are newer APIs
The OpenGL specification is not hidden, and the most recent version isn’t that difficult a read. You should familiarize yourself with it before wondering whether something exists.