I recently stopped using display lists because I was getting horrible performance compared to immediate mode. DLs had other annoying properties too, so I dropped them without really understanding why they were so slow.
Instead I turned to Vertex Buffer Objects. And guess what: they're also consistently slower than immediate mode (the framerate drops by a factor of 3).
I am using LWJGL (a Java binding for OpenGL), so this may not be an OpenGL problem. If anyone has seen a similar problem in C or C++, I'd at least know where to look.
A boiled-down version of my immediate-mode code looks like this:
private void renderMesh()
{
    for (int f = 0; f < m_aFaces.length; f++)
    {
        Face face = m_aFaces[f];
        GL11.glBegin(GL11.GL_TRIANGLES);
        // one flat normal per face, then the three corners
        GL11.glNormal3f(face.nx, face.ny, face.nz);
        GL11.glVertex3f(face.v0x, face.v0y, face.v0z);
        GL11.glVertex3f(face.v1x, face.v1y, face.v1z);
        GL11.glVertex3f(face.v2x, face.v2y, face.v2z);
        GL11.glEnd();
    }
}
My VBO code (again trimmed down) looks like this:
GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
GL11.glEnableClientState(GL11.GL_NORMAL_ARRAY);
// positions: 3 floats per vertex, tightly packed, starting at offset 0 of m_iVertVBO
ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, m_iVertVBO);
GL11.glVertexPointer(3, GL11.GL_FLOAT, 0, 0);
// normals: tightly packed floats, starting at offset 0 of m_iNormVBO
ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, m_iNormVBO);
GL11.glNormalPointer(GL11.GL_FLOAT, 0, 0);
for (int f = 0; f < m_aFaces.length; f++)
{
    Face face = m_aFaces[f];
    GL11.glBegin(GL11.GL_TRIANGLES);
    // each call pulls one vertex (position + normal) from the arrays by index
    GL11.glArrayElement(face.v0);
    GL11.glArrayElement(face.v1);
    GL11.glArrayElement(face.v2);
    GL11.glEnd();
}
GL11.glDisableClientState(GL11.GL_VERTEX_ARRAY);
GL11.glDisableClientState(GL11.GL_NORMAL_ARRAY);
The VBO is created like this:
FloatBuffer vertbuffer = BufferUtils.createFloatBuffer(3 * verts.length);
for (int j = 0, i = 0; i < verts.length; i++)
{
    vertbuffer.put(j++, verts[i].x);
    vertbuffer.put(j++, verts[i].y);
    vertbuffer.put(j++, verts[i].z);
}
IntBuffer temp = BufferUtils.createIntBuffer(1);
ARBVertexBufferObject.glGenBuffersARB(temp);
int iVBO = temp.get(0);
ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, iVBO);
// size in bytes: 3 floats per vertex, 4 bytes per float
ARBVertexBufferObject.glBufferDataARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, verts.length * 3 * 4, vertbuffer, ARBVertexBufferObject.GL_STATIC_READ_ARB);
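For reference, the packing step above can be sketched in plain Java without LWJGL. As far as I know, `BufferUtils.createFloatBuffer` returns a direct buffer in native byte order, so this sketch builds the same kind of buffer with `java.nio` directly. The `Vec3` class and the `packPositions`/`byteSize` methods are hypothetical stand-ins, not part of my actual code; the byte count is the size argument that goes to `glBufferDataARB`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class VertexPacking {
    // Hypothetical stand-in for the mesh's vertex type.
    static final class Vec3 {
        final float x, y, z;
        Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    // Packs positions as x,y,z triples into a direct, native-order FloatBuffer.
    static FloatBuffer packPositions(Vec3[] verts) {
        FloatBuffer buf = ByteBuffer.allocateDirect(verts.length * 3 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (Vec3 v : verts) {
            buf.put(v.x).put(v.y).put(v.z);
        }
        buf.flip(); // position 0, limit = number of floats written
        return buf;
    }

    // Size in bytes for the buffer upload: 4 bytes per float.
    static int byteSize(FloatBuffer buf) {
        return buf.remaining() * Float.BYTES;
    }

    public static void main(String[] args) {
        Vec3[] verts = {
            new Vec3(0f, 0f, 0f), new Vec3(1f, 0f, 0f), new Vec3(0f, 1f, 0f)
        };
        FloatBuffer buf = packPositions(verts);
        System.out.println(buf.remaining() + " floats, " + byteSize(buf) + " bytes");
    }
}
```

For the three-vertex triangle in `main` this prints `9 floats, 36 bytes`, which matches `verts.length*3*4`.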
I’m aware that glArrayElement is not the fastest way to use buffers, but I’d hate to expand meshes to 4, 5 or 6 times as many vertices as I really need (I need multiple texcoords per vertex).
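The cost of that expansion can be measured before committing to it. A common alternative (not what my code does) is to build an index buffer for `glDrawElements` with one buffer vertex per unique combination of attribute indices, so a vertex is duplicated only where adjacent faces disagree on its normal or texcoord. The sketch below assumes a hypothetical per-corner layout (`int[]` of position/normal/texcoord indices, three corners per triangle) and only does the bookkeeping; it makes no GL calls, and `IndexedMeshBuilder` is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexedMeshBuilder {
    // Unique attribute-index tuples (one per buffer vertex) plus the
    // triangle index list that refers to them.
    static final class Indexed {
        final List<int[]> uniqueCorners = new ArrayList<>();
        final int[] indices;
        Indexed(int n) { indices = new int[n]; }
    }

    // corners: one int[] per triangle corner, e.g. {positionIdx, normalIdx, texIdx},
    // with three consecutive entries per triangle (hypothetical layout).
    static Indexed build(List<int[]> corners) {
        Indexed out = new Indexed(corners.size());
        Map<String, Integer> seen = new HashMap<>();
        for (int i = 0; i < corners.size(); i++) {
            int[] c = corners.get(i);
            String key = Arrays.toString(c); // value-based key for the tuple
            Integer idx = seen.get(key);
            if (idx == null) {               // first time this combination appears
                idx = out.uniqueCorners.size();
                seen.put(key, idx);
                out.uniqueCorners.add(c);
            }
            out.indices[i] = idx;
        }
        return out;
    }

    public static void main(String[] args) {
        // A quad as two triangles sharing positions 0 and 2 and one normal.
        List<int[]> corners = Arrays.asList(
            new int[]{0, 0, 0}, new int[]{1, 0, 1}, new int[]{2, 0, 2},
            new int[]{0, 0, 0}, new int[]{2, 0, 2}, new int[]{3, 0, 3});
        Indexed mesh = build(corners);
        System.out.println(mesh.uniqueCorners.size() + " buffer vertices for "
            + corners.size() + " corners: " + Arrays.toString(mesh.indices));
    }
}
```

This prints `4 buffer vertices for 6 corners: [0, 1, 2, 0, 2, 3]`. If the two triangles had different normals, all six corners would be unique, so running it on a real mesh shows how many vertices the normal and texcoord seams actually add.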
Additionally, why would GL_STATIC_DRAW_ARB be another 3x slower than GL_STATIC_READ_ARB?
I am not reading any data back from GL, as you can see from my code. DRAW should be the correct hint, and there is no reason why READ should be faster, let alone 3x faster. But it is. (In summary: with DRAW the VBO path is 9x slower than immediate mode; with READ it is 3x slower.)
I'm on a 2.5 GHz Intel CPU with a GeForce4 Ti 4400 card and the newest drivers (installed two days ago).
Any ideas?