glBegin/End - 5ms!

OK, that doesn’t sound like much, but when you’re calling it 100,000 times a frame it is.

I know what you’re thinking: use vertex arrays. I can’t in this app because I need to change the projection/modelview matrices every few polys, and I also have to change the scissor test. So I think I’m stuck with immediate mode. The matrix loading and scissor code is not the bottleneck; it’s the glBegins and, to a lesser extent, the glEnable/glDisable of blending. (I have already coded it to make sure I do these as few times as possible.)

So my question is: why does it take so long? There seems to be no code checking whether any of the setup is relevant, since it takes the same time if you call it twice with no state changes in between. I suppose checking might slow it down further in a few cases. It all depends on what is taking the time: if it is a variety of small things then it’s impossible to optimise, but if it is doing a lot of processing because of one possible state change then maybe it can be. (I’m sure that if it were this obvious it would have been done already.)

Here are some benchmarks. The numbers are the milliseconds taken to perform a begin/end or enable/disable pair 1000 times. I overclocked/underclocked my FSB and GPU to determine where the bottleneck lay. It mostly seems to depend on the FSB/CPU.

-33% FSB +10% GPU

glBegin/glEnd
GL_QUADS WO FINISH 123
GL_QUADS W FINISH 5816

glEnable/glDisable
GL_BLEND WO FINISH 181
GL_BLEND W FINISH 5754

glEnable/glDisable
GL_TEXTURE_2D WO FINISH 94
GL_TEXTURE_2D W FINISH 248

-33% FSB -10% GPU

GL_QUADS WO FINISH 125
GL_QUADS W FINISH 5799

GL_BLEND WO FINISH 197
GL_BLEND W FINISH 5845

GL_TEXTURE_2D WO FINISH 94
GL_TEXTURE_2D W FINISH 236

+5% FSB +0% GPU

GL_QUADS WO FINISH 93
GL_QUADS W FINISH 4777

GL_BLEND WO FINISH 162
GL_BLEND W FINISH 4715

GL_TEXTURE_2D WO FINISH 68
GL_TEXTURE_2D W FINISH 183
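
For anyone wanting to reproduce numbers like these, here is a minimal sketch of a timing loop. It is not the code used for the figures above. The names `time_calls_ms`, `now_ms` and `stand_in_op` are made up for illustration, and it assumes the "W FINISH" rows mean a glFinish after every pair, which the post does not state. The GL calls appear only in a comment so that the sketch builds without OpenGL.

```c
#include <stddef.h>
#include <time.h>

typedef void (*Op)(void);

/* Wall-clock time in milliseconds, using C11 timespec_get. */
static double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Times `iterations` calls of op(). If sync is non-NULL it runs after
   every call; this is how the "W FINISH" rows are ASSUMED to have been
   measured (a glFinish after each begin/end or enable/disable pair). */
static double time_calls_ms(Op op, Op sync, int iterations)
{
    double start = now_ms();
    for (int i = 0; i < iterations; ++i) {
        op();
        if (sync != NULL)
            sync();
    }
    return now_ms() - start;
}

/* Stand-in so the sketch builds without OpenGL; it only counts calls. */
static int stand_in_calls = 0;
static void stand_in_op(void) { ++stand_in_calls; }

/* With <GL/gl.h> and a current context, the callbacks would instead be:
     static void begin_end(void)    { glBegin(GL_QUADS); glEnd(); }
     static void toggle_blend(void) { glEnable(GL_BLEND); glDisable(GL_BLEND); }
     static void finish(void)       { glFinish(); }
*/
```

Under those assumptions, `time_calls_ms(begin_end, NULL, 1000)` would correspond to a WO FINISH row and `time_calls_ms(begin_end, finish, 1000)` to a W FINISH row.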

So if you have any information about why it takes as long as it does, or know how I can get around it, I’d be interested to hear.

By the way, those benchmarks were run on an XP2000 with a GF4.

[This message has been edited by Adrian (edited 01-09-2003).]

If I were you, I’d be thinking about why you have to change the modelview matrix and the scissor rectangle every few polys, rather than trying to figure out why immediate mode rendering is slow. Care to elaborate on what it is you’re doing?

– Tom

Originally posted by Tom Nuydens:
If I were you, I’d be thinking about why you have to change the modelview matrix and the scissor rectangle every few polys, rather than trying to figure out why immediate mode rendering is slow.
– Tom

Yes, I’ve thought about it a lot and there is an alternative but it has its own set of problems that are arguably worse.

I’m using hemicubes for realtime radiosity, which involves setting up the camera for five different views for each patch on a wall. The scissor stuff is an optimisation to reduce the readpixel bottleneck. The alternative to hemicubes is a hemisphere projection using a vertex program, but this has its own set of problems.
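
To make the "five views per patch" concrete, here is a minimal sketch of how the view directions of a textbook hemicube can be derived. It is an illustration, not the actual code from this app: the `Vec3` and `View` types and the `hemicube_views` function are hypothetical, and the caller is assumed to supply a unit normal and a unit tangent lying in the patch plane.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 dir, up; } View;   /* look direction and up vector */

static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

static Vec3 negate(Vec3 v)
{
    Vec3 r = { -v.x, -v.y, -v.z };
    return r;
}

/* Helper for sanity checks on the resulting views. */
static float dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Fills views[0..4] for a patch with unit `normal` and unit `tangent`
   (assumed perpendicular to each other). views[0] is the front face;
   views[1..4] are the four side faces. */
static void hemicube_views(Vec3 normal, Vec3 tangent, View views[5])
{
    Vec3 bitangent = cross(normal, tangent);

    /* Front face: look straight out of the patch. */
    views[0].dir = normal;
    views[0].up  = bitangent;

    /* Side faces: look along the patch plane with the normal as "up",
       so only the upper half of each 90-degree view lies above the patch. */
    views[1].dir = tangent;            views[1].up = normal;
    views[2].dir = negate(tangent);    views[2].up = normal;
    views[3].dir = bitangent;          views[3].up = normal;
    views[4].dir = negate(bitangent);  views[4].up = normal;
}
```

Each view would then be fed to something like gluLookAt(eye, eye + dir, up) with a 90-degree field of view. Since only the upper half of each side view is above the patch, that is one place where a scissor rectangle can cut down the pixels to read back.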

I know there is unlikely to be a solution to my problem but I’m partly posting because I think it will be interesting to some people to see the speeds of those calls.

Does anyone know whether professional cards like the Quadro/Wildcat are faster at this kind of thing? I think it’s unlikely, since it appears to be a CPU rather than a GPU issue, but I’d be interested to know.

Hi

Perhaps you can store all your geometry in large vertex arrays, set up your matrices … and then call glDrawElements with your [small number of] indices. That at least saves the glBegin()/glEnd() calls. Once the ARB_VAO extension is ready (hopefully soon), you could use it to improve rendering speed.
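
A rough sketch of that idea, under a few assumptions: the geometry is indexed quads with 16-bit indices, and each group of polygons that shares a camera setup is described by a hypothetical `Batch` record. The `SubmitFn` callback and `counting_submit` are stand-ins so the sketch builds without OpenGL; the real GL calls are shown in the comment.

```c
#include <stddef.h>

/* One batch: the handful of polygons drawn under a single camera setup. */
typedef struct {
    const float *modelview;  /* 16 floats, column-major as glLoadMatrixf expects */
    size_t first;            /* offset of the batch's first index */
    size_t count;            /* number of indices (4 per quad) */
} Batch;

/* Hypothetical callback standing in for the per-batch GL calls. */
typedef void (*SubmitFn)(const float *modelview,
                         const unsigned short *indices, size_t count);

/* Submits every batch from one shared index array; returns the number
   of indices submitted in total. */
static size_t draw_batches(const Batch *batches, size_t n,
                           const unsigned short *indices, SubmitFn submit)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        submit(batches[i].modelview,
               indices + batches[i].first, batches[i].count);
        total += batches[i].count;
    }
    return total;
}

/* Stand-in submit function: only counts how many batches it was given. */
static size_t batches_seen = 0;
static void counting_submit(const float *modelview,
                            const unsigned short *indices, size_t count)
{
    (void)modelview; (void)indices; (void)count;
    ++batches_seen;
}

/* With <GL/gl.h> and a current context, submit would instead be:
     static void gl_submit(const float *m, const unsigned short *idx, size_t count)
     {
         glLoadMatrixf(m);   // assumes the matrix mode is GL_MODELVIEW
         glDrawElements(GL_QUADS, (GLsizei)count, GL_UNSIGNED_SHORT, idx);
     }
   after a one-time setup of the shared vertex array:
     glEnableClientState(GL_VERTEX_ARRAY);
     glVertexPointer(3, GL_FLOAT, 0, vertices);
*/
```

This still costs one matrix load and one draw call per batch, so it only replaces the glBegin()/glEnd() pair and the per-vertex calls; whether it is actually faster for batches of a few polygons would need measuring.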

Bye
ScottManDeath