Group draws to minimize changes in programs, states, program inputs, etc.

I am implementing my own GLSL-friendly 3D model format specification.
In the implementation (not the specification), all meshes and skins are broken into draws.
Every draw has:

  • glDrawElements/glDrawArrays parameters.
  • Material (GLSL program, GL states, glVertexAttribPointer setup, glVertexAttrib* and glUniform* values, buffers, etc.)

I am trying to sort these draws quickly, so that going from the N-th draw to the (N+1)-th draw is as cheap as I can make it.
So I sort draws by material, in this order of priority:

  • Same GLSL program from one draw to the next
  • Least amount of glVertexAttrib*/glUniform* data sent from one draw to the next
  • Fewest GL state changes from one draw to the next
  • Fewest buffer changes from one draw to the next
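
To make that order concrete, here is a minimal C++ sketch of how it could be folded into a single sort key. The `Draw` struct, its field names and the 16-bit widths are assumptions for illustration only. Each index stands for a deduplicated set of values, so the sort groups identical sets together; it does not truly minimize the size of the data that differs between neighbouring draws.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record; names and layout are illustrative only.
struct Draw {
    std::uint32_t program;      // index of the GLSL program
    std::uint32_t uniform_set;  // index of a deduplicated uniform/attribute value set
    std::uint32_t state_set;    // index of a deduplicated GL state block
    std::uint32_t buffer_set;   // index of a buffer binding set
    std::uint32_t mesh;         // selects the glDrawElements/glDrawArrays parameters
};

// Pack the four material indices into one integer, highest priority in the
// most significant bits, following the order of the list above.
inline std::uint64_t sort_key(const Draw& d) {
    // Assumption of this sketch: every index fits in 16 bits.
    assert(d.program < 0x10000u && d.uniform_set < 0x10000u &&
           d.state_set < 0x10000u && d.buffer_set < 0x10000u);
    return (std::uint64_t{d.program}     << 48) |
           (std::uint64_t{d.uniform_set} << 32) |
           (std::uint64_t{d.state_set}   << 16) |
            std::uint64_t{d.buffer_set};
}

// Stable sort keeps the original relative order of draws with equal keys.
inline void sort_draws(std::vector<Draw>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
        [](const Draw& a, const Draw& b) { return sort_key(a) < sort_key(b); });
}

// Count how often the program changes when draws are submitted in order.
inline std::size_t count_program_switches(const std::vector<Draw>& draws) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < draws.size(); ++i)
        if (draws[i].program != draws[i - 1].program) ++switches;
    return switches;
}
```

Comparing one integer per draw keeps the sort itself cheap, and reordering the priorities only means moving the shifts around.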

Of course this is platform-dependent and quite bulky, but can anyone tell me whether the whole thing makes sense at all?
If yes, is this ordering of the expensive changes correct?

As a general rule, the fewer OpenGL calls you make and the fewer GL states you change, the more efficient you can be.
But some calls are likely to cost more than others. For example, switching from one shader to another will take far more time than sending a uniform value to the GPU. There was a well-known table from NVIDIA about this, but unfortunately I can't find it again…
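
As a rough illustration of the "call OpenGL less" rule, here is a small C++ sketch of a cache that drops redundant binds. `StateCache` and `CountingBackend` are hypothetical names. The backend only counts calls so the example compiles without a GL context; in a real renderer its two methods would forward to glUseProgram and glBindVertexArray.

```cpp
#include <cstdint>

// Stand-in for the real GL calls (assumption: a real backend would call
// glUseProgram and glBindVertexArray here instead of counting).
struct CountingBackend {
    int use_program_calls = 0;
    int bind_vertex_array_calls = 0;
    void use_program(std::uint32_t) { ++use_program_calls; }
    void bind_vertex_array(std::uint32_t) { ++bind_vertex_array_calls; }
};

// Remembers the last value set and forwards a call only when it changes.
template <class Backend>
class StateCache {
public:
    explicit StateCache(Backend& backend) : backend_(backend) {}

    void use_program(std::uint32_t program) {
        if (has_program_ && program == program_) return;  // redundant, skip
        backend_.use_program(program);
        program_ = program;
        has_program_ = true;
    }

    void bind_vertex_array(std::uint32_t vao) {
        if (has_vao_ && vao == vao_) return;  // redundant, skip
        backend_.bind_vertex_array(vao);
        vao_ = vao;
        has_vao_ = true;
    }

private:
    Backend& backend_;
    std::uint32_t program_ = 0;
    std::uint32_t vao_ = 0;
    bool has_program_ = false;  // false until the first call goes through
    bool has_vao_ = false;
};
```

With draws already sorted by program, a cache like this turns a run of identical programs into a single switch. It only stays correct if every state change goes through it.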
But I found some links that might be useful to you (the second one is a bit old, however):

http://www.nvidia.fr/object/doc_performance.html

https://www.nvidia.com/docs/IO/8228/BatchBatchBatch.pdf

You should also keep in mind that you can follow every best practice in your code and still end up with something slow. A lot depends on how well you sort the geometry and remove the invisible parts before sending it to the GPU.

Beyond Porting by Cass Everitt covers the costs of state changes and resource bindings: [attached image]

Full slides here.

Thanks for the link!