I've been developing a loader/player for animated characters. It uses a shader I wrote that does simple Phong shading with a texture map and GPU skinning. Data is read from a custom file format, and each mesh defined in the file is put into two VBOs. The first VBO holds GL_FLOAT data and contains all vertex attributes (including skin weights); the other holds GL_UNSIGNED_SHORT data and contains the vertex indices.
Anyhow, things were going well until I was asked to use it to display our level background files, which are unskinned and quite large. My program, which had been running smoothly for files containing one or two meshes with hundreds of triangles, was now chugging under level files with hundreds of meshes, some of which had thousands of triangles.
My profiler shows that nearly all of the time is spent in calls to glDrawElements() with GL_TRIANGLES.
Does anyone have advice for getting things running quickly? Right now I'm using a weighty 22 floats per vertex (3 pos, 3 norm, 2 uv0, 2 uv1, 4 color, 4 bone weight, 4 bone index). One thing I could do is detect when bones are not being used and throw out the 8 bone floats, but that would only save about a third of the space (8 of 22 floats). Each mesh has its own vertex VBO, and the data is not interleaved. Each frame, for each mesh, I set all the attribute and uniform values on my shader and then call glDrawElements().
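For reference, here is a rough sketch of what interleaving the layout above would involve, in plain Java with no GL calls. The class and method names are hypothetical and not part of my loader; the per-attribute component counts are the ones listed above. Whether interleaving actually helps on this card is untested. The stride and offsets it computes (multiplied by 4 for bytes) are the kind of values one would pass to glVertexAttribPointer() for an interleaved buffer.

```java
import java.util.Arrays;

// Hypothetical helper: packs separate per-attribute arrays into one interleaved array.
public final class InterleaveSketch {

    // Floats per vertex: the sum of all attribute component counts.
    static int strideFloats(int[] components) {
        int stride = 0;
        for (int c : components) {
            stride += c;
        }
        return stride;
    }

    // Offset (in floats) of each attribute within one interleaved vertex.
    static int[] offsetsFloats(int[] components) {
        int[] offsets = new int[components.length];
        int running = 0;
        for (int a = 0; a < components.length; a++) {
            offsets[a] = running;
            running += components[a];
        }
        return offsets;
    }

    // streams[a] holds attribute a for all vertices, tightly packed.
    static float[] interleave(float[][] streams, int[] components, int vertexCount) {
        int stride = strideFloats(components);
        int[] offsets = offsetsFloats(components);
        float[] out = new float[stride * vertexCount];
        for (int a = 0; a < streams.length; a++) {
            for (int v = 0; v < vertexCount; v++) {
                System.arraycopy(streams[a], v * components[a],
                                 out, v * stride + offsets[a], components[a]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // pos, norm, uv0, uv1, color, bone weight, bone index
        int[] layout = {3, 3, 2, 2, 4, 4, 4};
        System.out.println(strideFloats(layout));                   // 22
        System.out.println(Arrays.toString(offsetsFloats(layout))); // [0, 3, 6, 8, 10, 14, 18]
    }
}
```

The resulting array could then be wrapped in a FloatBuffer and uploaded as a single VBO per mesh, but that part depends on the GL binding in use, so it is left out here.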
Right now the largest mesh in one of my test files has 4587 verts (403656 bytes), and 8145 indices (16290 bytes).
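The byte counts above follow from 4 bytes per GL_FLOAT and 2 bytes per GL_UNSIGNED_SHORT. A small sketch of the arithmetic, including what the same mesh would cost with the 8 bone floats dropped (the class and method names are hypothetical, used only for illustration):

```java
// Hypothetical helper: reproduces the buffer-size arithmetic for one mesh.
public final class MeshFootprint {
    static final int BYTES_PER_FLOAT = 4; // GL_FLOAT
    static final int BYTES_PER_SHORT = 2; // GL_UNSIGNED_SHORT

    static long vertexBytes(int vertexCount, int floatsPerVertex) {
        return (long) vertexCount * floatsPerVertex * BYTES_PER_FLOAT;
    }

    static long indexBytes(int indexCount) {
        return (long) indexCount * BYTES_PER_SHORT;
    }

    public static void main(String[] args) {
        System.out.println(vertexBytes(4587, 22)); // 403656 (current layout)
        System.out.println(vertexBytes(4587, 14)); // 256872 (bone weights and indices removed)
        System.out.println(indexBytes(8145));      // 16290
    }
}
```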
What are some things I could do to optimize this? Also, what would be a good OpenGL profiler to use (preferably something low or no cost)? The program is written in Java, and my card is an NVidia Quadro 500.