Yet another batching thread

I am wondering what the performance costs of the following batching criteria are (feel free to add to the list if it is incomplete):

  • Buffer binding (glBindBufferARB)
  • Array pointer (glVertexPointer, …)
  • Program binding (glBindProgramARB)
  • Program constants (glProgramEnvParameter4fvARB, …)
  • Material states (glMaterialfv, …)
  • Texture (glBindTexture, Enable / disable GL_TEXTURE_2D)
  • Texture environment and parameters (glTexEnv, glTexParameter)
  • Matrix (glMatrixMode, glPushMatrix, glPopMatrix, glMultMatrix, glLoadMatrix, glLoadIdentity, …)
  • Face culling (Enable / disable GL_CULL_FACE)
  • Alpha blending mode
  • Z test and write states

Which criteria should my batching be based on to minimize expensive state changes? What is the actual cost of each of the above operations, which ones are worth optimizing, and which can be neglected?
(Stencil and lighting states have been deliberately omitted, because I render a separate pass for each shadow-casting light.)
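Once some cost ranking is settled, one common way to apply it is to give each draw a sort key with the most expensive state in the highest bits and sort the draws by that key. The C sketch below is only an illustration under stated assumptions: `DrawItem`, `BindCounts`, `sort_draw_items` and `count_binds` are hypothetical names, the integer IDs merely stand in for the names passed to glBindProgramARB, glBindTexture and glBindBufferARB, and the ranking program > texture > buffer is assumed, not measured. It makes no GL calls; it just counts the binds a renderer would issue if it skipped redundant ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical draw record. The IDs stand in for the names passed to
   glBindProgramARB, glBindTexture and glBindBufferARB; no GL calls are made. */
typedef struct {
    uint32_t program;
    uint32_t texture;
    uint32_t buffer;
} DrawItem;

/* Number of bind calls a renderer would issue if it skips redundant binds. */
typedef struct {
    int program;
    int texture;
    int buffer;
} BindCounts;

/* Pack the states into one 64-bit key. The field order (program, then
   texture, then buffer) is an ASSUMED cost ranking, not a measured one. */
static uint64_t sort_key(const DrawItem *d) {
    assert(d->program <= 0xFFFF && d->texture <= 0xFFFF && d->buffer <= 0xFFFF);
    return ((uint64_t)d->program << 32) |
           ((uint64_t)d->texture << 16) |
           (uint64_t)d->buffer;
}

static int compare_items(const void *a, const void *b) {
    uint64_t ka = sort_key((const DrawItem *)a);
    uint64_t kb = sort_key((const DrawItem *)b);
    return (ka > kb) - (ka < kb);
}

/* Reorder draws so that draws sharing the highest-ranked state are adjacent. */
void sort_draw_items(DrawItem *items, size_t n) {
    qsort(items, n, sizeof *items, compare_items);
}

/* Count binds, assuming a state is only rebound when it differs from the
   previous draw (the first draw binds everything). */
BindCounts count_binds(const DrawItem *items, size_t n) {
    BindCounts c = {0, 0, 0};
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || items[i].program != items[i - 1].program) c.program++;
        if (i == 0 || items[i].texture != items[i - 1].texture) c.texture++;
        if (i == 0 || items[i].buffer != items[i - 1].buffer) c.buffer++;
    }
    return c;
}
```

The key order decides which states get grouped, and lower-ranked states can end up changing more often. For the six draws {1,10,100}, {2,20,200}, {1,20,100}, {2,10,200}, {1,10,200}, {2,20,100} (program, texture, buffer), sorting cuts program binds from 6 to 2, leaves texture binds at 4, and raises buffer binds from 5 to 6. So the real question is still which of the listed state changes are actually expensive.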

Thanks,
SeskaPeel.