Large meshes drawing slowly

I’ve been developing a loader/player for animated characters. It uses a shader I wrote that does simple Phong shading with a texture map and GPU skinning. Data is read from a custom file format, and each mesh defined in the file is put into two VBOs. The first VBO is GL_FLOAT and contains all vertex data (including skin weights), and the other is GL_UNSIGNED_SHORT and contains vertex indices.

Anyhow, things were going well until I was asked to use it to display our level background files, which are unskinned and quite large. My program, which had been running smoothly for files containing one or two meshes with hundreds of triangles, was now chugging under level files with hundreds of meshes, some of which had thousands of triangles.

My profiler is showing that nearly all of the bottleneck comes from a call to glDrawElements() with GL_TRIANGLES.

Would anyone have any advice for getting things running quickly? Right now I’m using a weighty 22 floats per vertex (3 pos, 3 norm, 2 uv0, 2 uv1, 4 color, 4 bone weight, 4 bone index). One thing I could do is detect when bones are not being used and throw out 8 floats, but that will only save about 1/3 of the space. Each mesh has its own vertex VBO and the data is not interleaved. Each frame, for each mesh, I set all the attribute and uniform values on my shader and then call glDrawElements().
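To make the per-vertex layout concrete, here is a minimal sketch of the arithmetic in plain Java. It assumes the attributes were packed interleaved in the order listed above, which is not how the data is currently stored; the class and method names are made up for illustration. The stride and byte offsets it computes are the kind of values one would hand to glVertexAttribPointer for such a layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: describes the 22-float vertex from the post,
// optionally without the two skinning attributes (8 floats).
final class VertexLayout {
    static final int BYTES_PER_FLOAT = 4;

    // Attribute names and float counts as listed in the post.
    private static final String[] NAMES =
            {"pos", "norm", "uv0", "uv1", "color", "boneWeight", "boneIndex"};
    private static final int[] SIZES = {3, 3, 2, 2, 4, 4, 4};
    // Index of the first skinning attribute (boneWeight).
    private static final int SKIN_START = 5;

    private static int attributeCount(boolean skinned) {
        return skinned ? NAMES.length : SKIN_START;
    }

    // Number of floats stored for each vertex.
    static int floatsPerVertex(boolean skinned) {
        int total = 0;
        for (int i = 0; i < attributeCount(skinned); i++) {
            total += SIZES[i];
        }
        return total;
    }

    // Distance in bytes between consecutive vertices in an interleaved buffer.
    static int strideBytes(boolean skinned) {
        return floatsPerVertex(skinned) * BYTES_PER_FLOAT;
    }

    // Byte offset of each attribute inside one interleaved vertex.
    static Map<String, Integer> byteOffsets(boolean skinned) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int offset = 0;
        for (int i = 0; i < attributeCount(skinned); i++) {
            offsets.put(NAMES[i], offset);
            offset += SIZES[i] * BYTES_PER_FLOAT;
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println("skinned stride:   " + strideBytes(true) + " bytes");
        System.out.println("unskinned stride: " + strideBytes(false) + " bytes");
        System.out.println("skinned offsets:  " + byteOffsets(true));
    }
}
```

Dropping the two bone attributes takes the vertex from 22 floats to 14, which is 8/22 or roughly 36% smaller, in line with the "about 1/3" estimate.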

Right now the largest mesh in one of my test files has 4587 verts (403656 bytes), and 8145 indices (16290 bytes).
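As a sanity check on those figures, the buffer sizes can be recomputed from the counts. This is only a sketch of the arithmetic, with hypothetical names, assuming 4-byte floats and 2-byte GL_UNSIGNED_SHORT indices as described above.

```java
// Hypothetical helper: recomputes the buffer sizes quoted in the post.
final class MeshFootprint {
    // Size of the vertex buffer in bytes (GL_FLOAT is 4 bytes).
    static long vertexBytes(int vertexCount, int floatsPerVertex) {
        return (long) vertexCount * floatsPerVertex * Float.BYTES;
    }

    // Size of the index buffer in bytes (GL_UNSIGNED_SHORT is 2 bytes).
    static long indexBytes(int indexCount) {
        return (long) indexCount * Short.BYTES;
    }

    // Triangle count for a GL_TRIANGLES draw: three indices per triangle.
    static int triangleCount(int indexCount) {
        return indexCount / 3;
    }

    public static void main(String[] args) {
        System.out.println("vertex bytes (22 floats): " + vertexBytes(4587, 22));
        System.out.println("vertex bytes (14 floats): " + vertexBytes(4587, 14));
        System.out.println("index bytes:              " + indexBytes(8145));
        System.out.println("triangles:                " + triangleCount(8145));
    }
}
```

With 22 floats per vertex this reproduces the 403656 and 16290 byte figures; the same mesh without bone data would need 256872 bytes of vertex data.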

What would be some things I could do to optimize? Also, what would be a good OpenGL profiler to use? (Preferably something low/no cost.) The program is written in Java, and my card is an NVidia Quadro 500.

How are you profiling it? Most important of all, how do you define “slowly”? What is the performance you’re getting and what performance do you want?

My profiler is showing that nearly all of the bottleneck comes from a call to glDrawElements() with GL_TRIANGLES.

Are you making that call one time, or many times? Are you using index buffers or supplying indices on the CPU?

The program is written in Java

Then you gave up your right to complain about performance.

I’m profiling it by running Java’s JVisualVM tool. It measures the amount of time spent in method calls. Most of the time used by my program is spent inside the glDrawElements() method call.

By slowly, I mean 1 or 2 seconds per frame on my large test file. I upload and process my data when the program starts, which includes building my VBOs, textures and the shader. I then call my render loop continuously in a thread, in which I simply attach the shader program, load all my uniforms and attributes into the shader, and then call glDrawElements() to render my shapes.

Since the same scene can be rendered in Maya at around 15 fps, I’d prefer to have a similar draw rate.
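To put the two numbers side by side, here is a small sketch (hypothetical names, plain arithmetic only) that converts between frame time and frame rate and works out how much faster the renderer would need to be.

```java
// Hypothetical helper: relates seconds per frame to frames per second.
final class FrameBudget {
    // Time available for one frame, in milliseconds, at a given frame rate.
    static double millisPerFrame(double fps) {
        return 1000.0 / fps;
    }

    // Frame rate that corresponds to a measured frame time.
    static double framesPerSecond(double secondsPerFrame) {
        return 1.0 / secondsPerFrame;
    }

    // Factor by which the frame time must shrink to reach the target rate.
    static double speedupNeeded(double currentSecondsPerFrame, double targetFps) {
        return currentSecondsPerFrame * targetFps;
    }

    public static void main(String[] args) {
        System.out.printf("budget at 15 fps: %.1f ms%n", millisPerFrame(15));
        System.out.printf("speedup from 1 s/frame: %.0fx%n", speedupNeeded(1.0, 15));
        System.out.printf("speedup from 2 s/frame: %.0fx%n", speedupNeeded(2.0, 15));
    }
}
```

At 15 fps each frame has roughly 66.7 ms, so going from 1 or 2 seconds per frame to that target means a 15x to 30x reduction in frame time.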

I’m using glGenBuffers() to allocate a GL_UNSIGNED_SHORT buffer of indices. They’re being bound before the draw call too.

Java is not slow. Besides, all the performance issues are happening in the OpenGL API, after all the CPU processing is done.