The mesh has 2503 vertices and 4968 faces.
That’s 1000 objects with ~5000 faces per object, which gives a rough estimate of 5,000,000 triangles per frame. At 15 frames per second, that comes to about 75 Mtri/sec on a mid-grade GPU. I’m not sure you’re going to get much more than that out of your GTS 450.
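The estimate above is just objects × faces per object × frames per second. This trivial C helper makes the arithmetic explicit. It uses the 1000 objects and 4968 faces from the question and a target of 15 frames per second. The function name is made up for illustration.

```c
#include <stdint.h>

/* Triangles submitted per second = objects * faces per object * frames per second. */
int64_t triangles_per_second(int64_t objects, int64_t faces_per_object, int64_t fps)
{
    return objects * faces_per_object * fps;
}
```

`triangles_per_second(1000, 4968, 15)` gives 74,520,000, i.e. roughly 75 Mtri/sec.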
Instancing may help, but the purpose of instancing is to reduce per-draw-call setup overhead. Generally, it helps most when you’re rendering considerably more than 1000 objects.
You didn’t say whether you’re using GL 3.x, so I’ll assume you are.
Instancing can generally be done in one of two ways. The first is to put your per-instance data (in your case, orientation and size, packed as small as you possibly can) in a uniform buffer, a buffer texture, or some other form of storage. glDrawElementsInstanced draws the same sequence of triangles repeatedly, incrementing an instance counter each time it starts a new instance. That counter is available to your vertex shader as the built-in input gl_InstanceID, and you use it to select which entry of the per-instance storage to read. So it would be something like this:
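```glsl
#version 330 core

struct InstanceData {
    vec3 qtOrientation; // x, y, z of a unit quaternion; recover w with a square root
    float size;
};

const int MAX_NUM_INSTANCES = 1024;

// std140: each InstanceData element takes 16 bytes, so 1024 of them fill 16 KB,
// the minimum GL_MAX_UNIFORM_BLOCK_SIZE an implementation must support.
layout(std140) uniform InstanceArray {
    InstanceData instances[MAX_NUM_INSTANCES];
};

void main()
{
    vec3 qtOrientation = instances[gl_InstanceID].qtOrientation;
    float size = instances[gl_InstanceID].size;
    // Assumes w >= 0 was enforced when the data was packed.
    float w = sqrt(max(0.0, 1.0 - dot(qtOrientation, qtOrientation)));
    vec4 orientation = vec4(qtOrientation, w);
    // Do stuff with orientation and size.
}
```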
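On the application side, the buffer backing InstanceArray has to match the std140 layout exactly. Here is a minimal C sketch of a matching struct and a packing helper, assuming the shader block above. The names `pack_instance` and the quaternion order (x, y, z, w) are hypothetical. A quaternion and its negation represent the same rotation, so the helper flips the sign whenever w is negative. That lets the shader always take the positive square root.

```c
#include <stddef.h>

/* CPU-side mirror of the std140 InstanceData struct in the shader:
   vec3 at offset 0, float at offset 12, 16 bytes per array element. */
typedef struct {
    float qtOrientation[3]; /* x, y, z of a unit quaternion with w >= 0 */
    float size;
} InstanceData;

_Static_assert(sizeof(InstanceData) == 16, "must match the std140 array stride");
_Static_assert(offsetof(InstanceData, size) == 12, "size directly follows the vec3");

enum { MAX_NUM_INSTANCES = 1024 };

/* Store only x, y, z. If w < 0, negate the whole quaternion (same rotation)
   so the shader can recover w as +sqrt(1 - |xyz|^2). */
void pack_instance(InstanceData *out, const float q[4] /* x, y, z, w */, float size)
{
    float sign = (q[3] < 0.0f) ? -1.0f : 1.0f;
    out->qtOrientation[0] = sign * q[0];
    out->qtOrientation[1] = sign * q[1];
    out->qtOrientation[2] = sign * q[2];
    out->size = size;
}
```

An array of `MAX_NUM_INSTANCES` of these is 16384 bytes. You would upload it to a GL_UNIFORM_BUFFER (e.g. with glBufferSubData) and bind it to the block's binding point before the instanced draw.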
The other alternative uses the same draw call, but stores the per-instance data in vertex attributes rather than in uniforms or buffer textures. This uses glVertexAttribDivisor. The divisor counts instances, not vertices. With a divisor of 0 (the default), the attribute advances once per vertex as usual. With a divisor of 1, it advances only once per instance, so every vertex of a given instance sees the same value.
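The fetch rule can be modeled in a few lines of C. This is only an emulation of what the GL does per attribute, not GL code, and `attribute_element` is a hypothetical name. It also ignores base-vertex and base-instance offsets.

```c
/* Which element of a vertex attribute array gets fetched:
   divisor 0 -> per-vertex: element = vertex index;
   divisor N -> per-instance: element = instance / N (integer division). */
unsigned attribute_element(unsigned vertex_index, unsigned instance_id, unsigned divisor)
{
    if (divisor == 0)
        return vertex_index;
    return instance_id / divisor;
}
```

In the real API, that means calling `glVertexAttribDivisor(location, 1)` once for each attribute location that holds per-instance data. The mesh attributes keep the default divisor of 0.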
The benefit of the divisor approach is that per-instance values get the full range of vertex attribute formats, so they can be stored more compactly (quaternion components can be squeezed down to normalized signed shorts, sizes could be bytes, or whatever fits). Also, uniform buffers and buffer textures have size limits (GL_MAX_UNIFORM_BLOCK_SIZE and GL_MAX_TEXTURE_BUFFER_SIZE). The buffer feeding a vertex attribute has no comparable limit beyond available memory. There may be performance improvements as well, depending on the hardware.
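As a rough sketch of that compact storage, assume a hypothetical layout: the quaternion's x, y, z go into normalized signed shorts. The size is stored as a normalized byte, as a fraction of some scene-wide maximum that you would pick yourself. The encoding below matches the signed-normalized decode used from GL 4.2 on (value / 32767). Older versions used a slightly different formula, so expect tiny differences there.

```c
#include <stdint.h>

/* Hypothetical per-instance vertex attribute layout (8 bytes per instance):
   q    -> 3 x GL_SHORT, normalized = GL_TRUE (arrives in the shader as a vec3)
   size -> 1 x GL_UNSIGNED_BYTE, normalized = GL_TRUE (arrives as a float in [0, 1]) */
typedef struct {
    int16_t q[3];
    uint8_t size;
    uint8_t pad; /* keeps the stride at 8 bytes */
} PackedInstance;

/* Map [-1, 1] to [-32767, 32767], rounding to nearest. */
int16_t snorm16(float f)
{
    if (f > 1.0f) f = 1.0f;
    if (f < -1.0f) f = -1.0f;
    return (int16_t)(f >= 0.0f ? f * 32767.0f + 0.5f : f * 32767.0f - 0.5f);
}

/* Map [0, 1] to [0, 255], rounding to nearest. */
uint8_t unorm8(float f)
{
    if (f > 1.0f) f = 1.0f;
    if (f < 0.0f) f = 0.0f;
    return (uint8_t)(f * 255.0f + 0.5f);
}

/* size_fraction = size / MAX_SIZE, where MAX_SIZE is a hypothetical scene-wide maximum. */
PackedInstance pack_instance_attrib(const float qxyz[3], float size_fraction)
{
    PackedInstance p;
    p.q[0] = snorm16(qxyz[0]);
    p.q[1] = snorm16(qxyz[1]);
    p.q[2] = snorm16(qxyz[2]);
    p.size = unorm8(size_fraction);
    p.pad = 0;
    return p;
}
```

That is 8 bytes per instance instead of the 16 bytes the float std140 struct needs. On the GL side, the quaternion would be described with something like `glVertexAttribPointer(loc, 3, GL_SHORT, GL_TRUE, sizeof(PackedInstance), offset)`. The quantization means the recovered w is slightly approximate.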
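The downside of the divisor approach is that the per-instance data must fit within the available vertex attributes. GL_MAX_VERTEX_ATTRIBS is only guaranteed to be at least 16, and each attribute holds at most four components. Some of those attributes are already taken up by your actual mesh data.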
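A quick budget check, as a sketch, assuming the per-instance scalars are packed tightly into vec4-sized attributes. The helper names are hypothetical. At runtime, the limit would come from `glGetIntegerv(GL_MAX_VERTEX_ATTRIBS, &n)`.

```c
/* Number of 4-component attribute locations needed for `components` scalars packed tightly. */
unsigned vec4_locations(unsigned components)
{
    return (components + 3) / 4;
}

/* Nonzero if the per-instance data fits next to the mesh's own attribute locations. */
int instance_data_fits(unsigned max_vertex_attribs, unsigned mesh_locations,
                       unsigned instance_components)
{
    return mesh_locations + vec4_locations(instance_components) <= max_vertex_attribs;
}
```

For this question, orientation (3) plus size (1) is 4 components, which fits in a single location if both are floats. If they use different formats, as in the shorts-plus-byte layout, they need two separate attributes.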