I’ve recently been profiling my little application and noticed that glUniformMatrix4fv is taking stupidly long compared to the rest of the calls.
As you can see, it’s taking 16% of the total run time and thus, 16% of the frame time.
The problem is, I don’t know any other way to do what I’m doing. glUniformMatrix4fv is called (in this case) 23 times a frame to upload the transformation matrix of the current object to the vertex shader. I’ve tried using OpenGL’s matrix stuff, but unsurprisingly it’s slower still.
It’s drawing about 19300 polys and the frame time is around 8ms. This seems a little high to me since it was around 4ms when just drawing the main model (which is 19280 polys).
Should it be taking this long, or could this be a side effect of using Java/JOGL?
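To put a number on that, here is the back-of-envelope arithmetic using only the figures quoted above (8 ms frame, 16% share, 23 calls). It assumes the profiler's percentage maps directly onto frame time, which instrumentation overhead may well distort. The class and method names are made up for illustration.

```java
final class UniformCost {
    /** Average cost of one call in microseconds, given the frame time in ms. */
    static double perCallMicros(double frameMs, double share, int callsPerFrame) {
        return frameMs * share * 1000.0 / callsPerFrame;
    }

    public static void main(String[] args) {
        // Figures from the post: 8 ms frame, 16% of it, 23 calls per frame.
        System.out.println(perCallMicros(8.0, 0.16, 23) + " us per call");
    }
}
```

That works out to roughly 56 µs per call, if the profiler's share is taken at face value.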
It might be the Java binding, indeed. How does your Java code declare the float array you pass to it? I.e. floats or doubles, 2D or 1D, etc.
Can you drill down to specific Java code within this call (the little tick on the left)? Or is it the JNI wrapper directly?
And what about glDrawElements? How many calls are there, and how do you declare the arguments? It seems to be much faster…
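For context on the 2D/1D question: the `fv` uniform calls take a flat run of 32-bit floats, 16 per mat4, in column-major order when `transpose` is false. Below is a minimal sketch of packing a 2D array into that layout. It assumes the application keeps its matrix as a `float[4][4]` indexed `[row][column]`, which the thread does not confirm; `MatrixPacking` and `toColumnMajor` are hypothetical names.

```java
final class MatrixPacking {
    /** Packs a [row][column] 4x4 matrix into a column-major float[16]. */
    static float[] toColumnMajor(float[][] m) {
        float[] out = new float[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                out[col * 4 + row] = m[row][col];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] translate = {
            {1, 0, 0, 5},
            {0, 1, 0, 6},
            {0, 0, 1, 7},
            {0, 0, 0, 1},
        };
        float[] flat = toColumnMajor(translate);
        // The translation lands in elements 12..14 of the flat array.
        System.out.println(flat[12] + ", " + flat[13] + ", " + flat[14]);
        // With a live JOGL context this array would then go to something like
        // gl.glUniformMatrix4fv(location, 1, false, flat, 0);
    }
}
```

Doing the packing once per object, rather than building a fresh array inside the draw loop, also keeps allocation out of the per-frame path.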
JOGL wraps buffer pointers in Buffer objects, with a corresponding Buffer class for each type (ByteBuffer, IntBuffer, etc.).
The Buffer classes (part of the JDK's java.nio package) were designed for situations like JOGL’s, where memory is allocated in native memory, outside the JVM’s garbage-collected heap.
The JOGL equivalent of the C call would be:
void glUniformMatrix4fvARB(int location, int count, boolean transpose, FloatBuffer value)
That said, there is still the JNI transition from the Java FloatBuffer object to its native memory location. I guess the only way to check the performance would be to time a trivial example in both Java and C.
Actually, I’d be interested to see if there is a performance difference between the FloatBuffer and the float[] versions of that method. Could you run that test?
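For the FloatBuffer side of that comparison, the setup might look something like the sketch below, using only java.nio. It assumes a direct buffer in native byte order that is allocated once and refilled per object; `UniformBuffers`, `newMat4Buffer` and `fill` are illustrative names, and the GL call appears only as a comment because it needs a live context.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class UniformBuffers {
    /** Allocates a direct, native-order buffer big enough for one mat4. */
    static FloatBuffer newMat4Buffer() {
        return ByteBuffer.allocateDirect(16 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Copies the first 16 floats of matrix into buf and rewinds it for reading. */
    static FloatBuffer fill(FloatBuffer buf, float[] matrix) {
        buf.clear();
        buf.put(matrix, 0, 16);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        FloatBuffer buf = newMat4Buffer();      // allocate once, outside the frame loop
        float[] identity = new float[16];
        identity[0] = identity[5] = identity[10] = identity[15] = 1f;
        fill(buf, identity);
        System.out.println("direct=" + buf.isDirect() + " remaining=" + buf.remaining());
        // With a live JOGL context:
        // gl.glUniformMatrix4fvARB(location, 1, false, buf);
    }
}
```

A non-direct buffer (for example from `FloatBuffer.wrap`) or a plain float[] lives on the Java heap, so the binding has to pin or copy it on every call; whether that shows up in the timings is exactly what the test would reveal.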
I had already thought that it was the fact I was smuggling a float[] array over the JNI border, but I have tested using a FloatBuffer, and there’s no real performance difference to using a float[].
Interestingly, the glUniform4fv call is passing in a float[8] and that’s not taking nearly as long, and it’s called more often (If you scroll right in my post you can see the invocation count).
The glDrawElements call is just 3 ints and a long so it’s unsurprising it’s quicker.
I’ll have a look at getting more detail on what glUniformMatrix4fv is calling. I’ve just realised that JProfiler may be hiding the calls within that class because it’s in the javax.* package.
Well, I managed to speed it up by replacing the mat4 with a vec4[4] and using glUniform4fv instead of glUniformMatrix4fv. It now takes around 6% of the frame time to set the matrix data rather than 16%…although I’m still uploading the same amount of data…
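The post doesn't show the changed code, so the following is only a guess at what the swap might look like. It assumes the matrix is already a column-major float[16] and that the shader rebuilds a mat4 from the four vec4s; `modelCols`, `Vec4ArrayUpload` and `column` are made-up names.

```java
import java.util.Arrays;

final class Vec4ArrayUpload {
    // Hypothetical GLSL replacing a "uniform mat4 model;" declaration.
    // mat4(vec4, vec4, vec4, vec4) builds the matrix from four columns.
    static final String VERTEX_SHADER_SNIPPET =
          "uniform vec4 modelCols[4];\n"
        + "void main() {\n"
        + "    mat4 model = mat4(modelCols[0], modelCols[1], modelCols[2], modelCols[3]);\n"
        + "    gl_Position = model * gl_Vertex;\n"
        + "}\n";

    /** Returns the four floats that end up in modelCols[i]. */
    static float[] column(float[] flat, int i) {
        return Arrays.copyOfRange(flat, i * 4, i * 4 + 4);
    }

    public static void main(String[] args) {
        float[] flat = new float[16];
        for (int k = 0; k < 16; k++) {
            flat[k] = k;
        }
        System.out.println(Arrays.toString(column(flat, 3)));
        // The upload uses the same 16 floats either way; only the entry point differs:
        //   before: gl.glUniformMatrix4fv(location, 1, false, flat, 0);
        //   after:  gl.glUniform4fv(location, 4, flat, 0);   // count = 4 vec4s
    }
}
```

Since the bytes sent are identical, any difference in cost would have to come from the binding or the driver's handling of the two entry points rather than from the data itself.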
An NVIDIA 6800GT with driver 163.75, I believe (I’m at work at the moment, so it might be 163.71).
I will try rolling back the driver to the latest one supposedly supported on Windows 2000, which is 94.24, since I also have an issue with the 163 series not letting me set anti-aliasing to application-controlled.