# Thread: What is best practice for batch drawing objects with different transformations?

1. Originally Posted by openlearner
As we were talking about pre-multiplying vertex transformations before submitting them to the shader, I'm wondering what you mean in this context: are you saying that even these matrix calculations are pre-processed?
In some 3D model formats, such as .MD2, the vertex and matrix data are pre-processed to minimize the size of the model data:
```c
// vertex
typedef struct
{
    unsigned char   v[3];                // compressed vertex (x, y, z) coordinates
    unsigned char   lightnormalindex;    // index into a precalculated normal table, used for lighting
} vertex_t;

// texture coordinates
typedef struct
{
    short    s;
    short    t;
} texCoord_t;

// triangle
typedef struct
{
    short   index_xyz[3];    // indices of the triangle's vertices
    short   index_st[3];     // indices of the vertices' texture coordinates
} triangle_t;

// frame
typedef struct
{
    float       scale[3];       // scale values
    float       translate[3];   // translation vector
    char        name[16];       // frame name
    vertex_t    verts[1];       // first vertex of this frame
} frame_t;

glBegin( GL_TRIANGLES );   // draw each triangle
for( int i = 0; i < header.num_tris; i++ )
{
    // draw triangle #i
    for( int j = 0; j < 3; j++ )
    {
        // k is the frame to draw
        // i is the current triangle of the frame
        // j is the current vertex of the triangle

        // s and t are stored in pixels: normalize them by the skin size
        glTexCoord2f( (float)TexCoord[ Meshes[i].index_st[j] ].s / header.skinwidth,
                      (float)TexCoord[ Meshes[i].index_st[j] ].t / header.skinheight );

        glNormal3fv( anorms[ Vertices[ Meshes[i].index_xyz[j] ].lightnormalindex ] );

        // decompress the vertex: byte * scale + translate
        glVertex3f(
            (Vertices[ Meshes[i].index_xyz[j] ].v[0] * frame[k].scale[0]) + frame[k].translate[0],
            (Vertices[ Meshes[i].index_xyz[j] ].v[1] * frame[k].scale[1]) + frame[k].translate[1],
            (Vertices[ Meshes[i].index_xyz[j] ].v[2] * frame[k].scale[2]) + frame[k].translate[2]
        );
    }
}
glEnd();
```

A full explanation of the .MD2 format can be found at http://tfc.duke.free.fr/old/models/md2.htm, for example:

```
You may have noticed that v[3] contains the vertex's (x, y, z) coordinates and, because
of the unsigned char type, these coordinates can only range from 0 to 255. In fact these 3D
coordinates are compressed (3 bytes instead of 12 if we used float or vec3_t). To decompress them,
we'll use other data specific to each frame. lightnormalindex is an index into a precalculated
normal table. Normal vectors will be used for the lighting.
```

=> Here we can clearly say that the input vertex and matrix data is pre-processed: the vertex coordinates are stored in 3 bytes rather than 3 floats, the normal is stored as an index into a precalculated table, and the matrix data is simplified to handle only scaling and translation.

Note, on the other hand, that in this 3D model format the vertex/normal/texel/matrix data is not pre-multiplied: it is pre-processed (to reduce the size of the stored data), but not pre-multiplied.

Originally Posted by tonyo_au
I don't think it was directly related to the batch size; I think it was more related to the number of buffers I had. I had 7000+ (not a good idea), and with such small batches I think the GPU was basically idle because it had very little work to do on each render call.

I run on an ATI 5870, an nVidia Quadro 5000 and a GTX 580; the frame rate is different on each, but the percentage change is similar.
Yes, in this case you are totally CPU limited, not GPU limited, as explained at http://www.google.fr/url?sa=t&rct=j&...,d.d2k&cad=rja
```
Yes, at < 130 tris/batch (avg) you are
- completely,
- utterly,
- totally,
- 100%
CPU limited!
• CPU is busy doing nothing, but submitting batches!
```

I think one good solution would be something like a "primitive transformation restart" that can be stored in the batch's indices: special index values would indicate that the current primitive holds "transformation vertices" rather than true vertex indices.

For a triangle batch, for example, the first index could be an index into a translation table, the second an index into a rotation table and the third an index into a scaling table (with quad batches, the fourth index could be used to handle homogeneous coordinates, for example).

=> We could certainly use negative indices to indicate that the incoming primitive is in fact a transformation primitive.

2. The number of buffers and the number of batches aren't the same thing
In my case I had one buffer for each render call
Did you try putting all that in one buffer and just rendering parts of it?
Since I wanted to render all the objects, I put all the vertices into several buffers of about 100,000 vertices each and used primitive restart. This got me back to an acceptable frame rate.

The 100,000 size was a compromise between render time and update time when objects are deleted. Doubling it made no practical difference to the overall frame render time, but noticeably slowed my deletes.

3. As we were talking about pre-multiplying vertex transformations before submitting them to the shader
One of the objects I render lots of are pipes. These are all cylinders and could therefore use the same geometry with a scale/rotate/translate matrix.
I have tried rendering these in 3 different ways:

1) instancing with a matrix
2) creating the geometry in the tessellator from a parameterised vertex that describes the radius/length/rotation/translation of the pipe
3) separate geometry for each pipe, with every vertex of each pipe stored at its world location

The third option is the fastest but takes the most space.

I am currently using the second option, which uses little space (only marginally more than instancing) and allows LOD to improve rendering speed. It is not as fast as the third option, even with a pipe of 64 sides, but it uses a lot less data.

With the new graphics cards tessellation is a lot faster, but the amount of memory on the card is also larger, so I am not sure my option is the best choice.

4. Originally Posted by tonyo_au
2) creating the geometry in the tessellator from a parameterised vertex that describes the radius/length/rotation/translation of the pipe

With the new graphics cards tessellation is a lot faster, but the amount of memory on the card is also larger, so I am not sure my option is the best choice.
By "tessellator", are you referring to building the geometry in the shader?

5. By "tessellator", are you referring to building the geometry in the shader?