My [STRIKE]engine[/STRIKE] program works in a stateless mode at the draw-call level. That is, I have all the information about each draw call: which render buffers it draws to, which vertex attributes it uses, what the rasterizer states are, etc. This makes it possible to reorder the draw calls automatically before they are sent to OpenGL. It seems pretty tough to implement, though, given the various GL states and how they depend on each other.
For example, the client asks to draw A->B (read from A, render into B), then B->C. Then another render phase issues X->Y, then Y->Z. An unoptimized version would likely stall twice: first before B->C while waiting for A->B to complete, and again before Y->Z while waiting for X->Y. The algorithm can safely reorder this to A->B, X->Y, B->C, Y->Z, so that each dependent call has independent work in front of it, which should reduce the stalls. Note that I’m not taking state switches into account here; I’m only concerned about possible GPU stalls.
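To make the idea concrete, here is a minimal sketch of the reordering, under the simplifying assumption that every draw call reads exactly one buffer and writes exactly one buffer. The names `DrawCall` and `reorderByDepth` are made up for illustration and are not part of any real API. Each call gets a "depth" equal to the length of the longest dependency chain leading to it, and calls are then stable-sorted by depth, so independent calls from different phases end up interleaved. This is only one possible heuristic, and it ignores state-switch cost entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical, simplified draw call: one buffer read, one buffer written.
struct DrawCall {
    std::string source;  // buffer the call reads from
    std::string target;  // buffer the call renders into
};

// Returns the calls in an order where each call comes after everything it
// conflicts with, while calls with no mutual conflicts are interleaved.
std::vector<DrawCall> reorderByDepth(const std::vector<DrawCall>& calls) {
    const std::size_t n = calls.size();
    std::vector<int> depth(n, 0);

    // depth[j] = 1 + max depth of any earlier call that j conflicts with.
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < j; ++i) {
            const bool readAfterWrite  = calls[i].target == calls[j].source;
            const bool writeAfterRead  = calls[i].source == calls[j].target;
            const bool writeAfterWrite = calls[i].target == calls[j].target;
            if (readAfterWrite || writeAfterRead || writeAfterWrite) {
                depth[j] = std::max(depth[j], depth[i] + 1);
            }
        }
    }

    // Stable sort by depth: conflicting calls always have strictly increasing
    // depth, so their relative order is preserved.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return depth[a] < depth[b]; });

    std::vector<DrawCall> result;
    result.reserve(n);
    for (std::size_t idx : order) {
        result.push_back(calls[idx]);
    }
    return result;
}
```

Feeding it the sequence A->B, B->C, X->Y, Y->Z yields A->B, X->Y, B->C, Y->Z, as in the example above. The dependency scan is quadratic in the number of calls, which I would expect to be acceptable for a few hundred calls per frame but have not measured.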
I’m [STRIKE]afraid[/STRIKE] suspect that the OpenGL driver might already be doing this. Moreover, it might do so on a per-tile basis (instead of per-call, as I would), which could be far more effective than my algorithm. In that case, all my reordering would be useless. I know the best thing is to test it myself. However, it is a really big chunk of work for me, so I’d first like to hear from people who might already know the answer.