GLUT code too CPU intensive, help me optimize!

It’s a simple program using GLUT that draws a 100x100 field of cubes. It also uses an 8x accumulation buffer pass (I just thought it looked “cool”). The problem is that it runs really slowly: 40 fps without glAccum, and 5 fps with the 8x accumulation.

Any ideas what I can do to increase the performance?

Get Code HERE!

Each push/pop matrix is stalling the pipeline.
Write your own cube routine, and apply the
translations in code, rather than churning the
OpenGL transformation state so heavily. :)

Nigel

What do you mean by “Write your own cube routine”? OpenGL is new to me, so it would be nice if you could provide a detailed explanation or a code sample. Thanks in advance!

Also, the push/pop matrix pair is commented out. It was used to rotate the sphere that used to be in the middle of the scene.

Here is the OpenGLUT implementation of glutSolidCube. It draws the six faces as quads, each with a different surface normal.

Hope you find it useful!

Nigel

 
void glutSolidCube( GLdouble width )
{
    double size = width * 0.5;

#   define V(a,b,c) glVertex3d( a size, b size, c size );
#   define N(a,b,c) glNormal3d( a, b, c );
    /* PWO: Again, I dared to convert the code to use macros... */
    glBegin( GL_QUADS );
        N( 1, 0, 0 ); V( +, -, + ); V( +, -, - ); V( +, +, - ); V( +, +, + );
        N( 0, 1, 0 ); V( +, +, + ); V( +, +, - ); V( -, +, - ); V( -, +, + );
        N( 0, 0, 1 ); V( +, +, + ); V( -, +, + ); V( -, -, + ); V( +, -, + );
        N( -1, 0, 0 ); V( -, -, + ); V( -, +, + ); V( -, +, - ); V( -, -, - );
        N( 0, -1, 0 ); V( -, -, + ); V( -, -, - ); V( +, -, - ); V( +, -, + );
        N( 0, 0, -1 ); V( -, -, - ); V( -, +, - ); V( +, +, - ); V( +, -, - );
    glEnd( );
#   undef V
#   undef N
}

You mean replace the glutSolidCube routine with this? OK, I’ll try that.

Yep – perhaps rename your copy to mySolidCube,
so the linker doesn’t get upset about two
functions with the same name.

You could also get a speedup by putting your
loop inside the glBegin/glEnd block, so
that all the quads are sent in one big block,
rather than in per-cube bundles.

After that, try a display list.

After that, try a vertex array.

You should be able to get a big speedup
without using display lists or
vertex arrays though…

Nigel

OK, thanks. Your “glutcube” gave me a 30% speed gain. I’ll try the other things, too.

There is another push/pop pair within the cubes() routine. That’s the one you should aim to remove by doing all the translations yourself inside a mySolidCube routine.

The problem is that OpenGL has to wait until the entire pipeline is clear before changing the transformation state. Your application is spending most of the time waiting for six quads to get from one end to the other.

void cubes(){
  for (int x=-50; x<50; x+=1){
    for (int y=-50; y<50; y+=1){
      glPushMatrix();
      glTranslatef(x*3,y*3,-2.0);
      glColor3f(abs(x)/10.0f, abs(y)/10.0f, cos(y));
      glutSolidCube(2);
      glPopMatrix();
    }
  }
}
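As a minimal sketch of "doing the translations yourself": the helper below computes the 24 corner positions of one cube already offset to its centre, so no glPushMatrix/glTranslatef/glPopMatrix is needed per cube. The name my_cube_vertices and the CUBE_SIGNS table are hypothetical (they are not part of GLUT); the corner order simply mirrors the glutSolidCube listing above. It writes into a plain array so that it runs without a GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Corner signs for the six faces, four corners each, in the same order
   as the glutSolidCube listing: +X, +Y, +Z, -X, -Y, -Z. */
static const signed char CUBE_SIGNS[24][3] = {
    {+1,-1,+1}, {+1,-1,-1}, {+1,+1,-1}, {+1,+1,+1},   /* +X face */
    {+1,+1,+1}, {+1,+1,-1}, {-1,+1,-1}, {-1,+1,+1},   /* +Y face */
    {+1,+1,+1}, {-1,+1,+1}, {-1,-1,+1}, {+1,-1,+1},   /* +Z face */
    {-1,-1,+1}, {-1,+1,+1}, {-1,+1,-1}, {-1,-1,-1},   /* -X face */
    {-1,-1,+1}, {-1,-1,-1}, {+1,-1,-1}, {+1,-1,+1},   /* -Y face */
    {-1,-1,-1}, {-1,+1,-1}, {+1,+1,-1}, {+1,-1,-1},   /* -Z face */
};

/* Hypothetical helper: fills out[] with 24 vertices (72 floats) of a cube
   of the given width centred at (cx, cy, cz). The translation is applied
   here, on the CPU, instead of through the OpenGL matrix stack. */
void my_cube_vertices(float cx, float cy, float cz, float width, float out[72])
{
    float half = width * 0.5f;
    assert(out != NULL);
    for (int i = 0; i < 24; ++i) {
        out[3 * i + 0] = cx + CUBE_SIGNS[i][0] * half;
        out[3 * i + 1] = cy + CUBE_SIGNS[i][1] * half;
        out[3 * i + 2] = cz + CUBE_SIGNS[i][2] * half;
    }
}
```

In cubes(), each cube would then call my_cube_vertices(x*3, y*3, -2.0f, 2, v) and pass every triple to glVertex3fv (with one glNormal3f per face), and the whole double loop could sit inside a single glBegin(GL_QUADS)/glEnd pair. Whether this is faster on a given driver is something to measure rather than assume.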

I might try to do the translations myself tomorrow… I need sleep.

Also, DL didn’t help much at high resolution, but at 320x240, my fps doubled, yay!

Originally posted by mpan3:
Also, DL didn’t help much at high resolution, but at 320x240, my fps doubled, yay!

That means you are fill-rate limited at high resolution: the frame time is dominated by the number of pixels drawn, so it will be hard to improve the framerate without using fewer glAccum passes.

With a display list, does that mean the cubes are only generated/transformed once and then stored in a buffer?

Updated Code here!

You could also get a speedup by putting your
loop inside the glBegin/glEnd block, so
that all the quads are sent in one big block,
rather than in per-cube bundles.

? Please elaborate on that…

A display list will contain the matrix push/pop
and transformation calls. It doesn’t pre-compute
the transformations.

A begin/end block with 1000 quads will be quicker
than 100 begin/end blocks of 10 quads, because there are no state-switching delays between the blocks.

But, that does mean a little more work on the application-side.

Is there any way to precompute a list of vertices and store it in a buffer of some sort?

…so that no matter how many push/pop matrix calls I use, they are only run once, during the application’s initialization.
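One way this could look, as a rough sketch rather than a definitive answer: build the already-translated vertex positions for the whole field the first time they are needed, and keep the buffer around for later frames. The names get_field_vertices, field_float_count and SIGNS are made up for illustration. The grid layout (x and y from -50 to 49, spacing 3, z = -2, cube width 2) is taken from the cubes() routine posted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { VERTS_PER_CUBE = 24, CUBES_PER_SIDE = 100 };

/* Same corner order as the glutSolidCube listing (four corners per face). */
static const signed char SIGNS[VERTS_PER_CUBE][3] = {
    {+1,-1,+1}, {+1,-1,-1}, {+1,+1,-1}, {+1,+1,+1},
    {+1,+1,+1}, {+1,+1,-1}, {-1,+1,-1}, {-1,+1,+1},
    {+1,+1,+1}, {-1,+1,+1}, {-1,-1,+1}, {+1,-1,+1},
    {-1,-1,+1}, {-1,+1,+1}, {-1,+1,-1}, {-1,-1,-1},
    {-1,-1,+1}, {-1,-1,-1}, {+1,-1,-1}, {+1,-1,+1},
    {-1,-1,-1}, {-1,+1,-1}, {+1,+1,-1}, {+1,-1,-1},
};

static float *field_vertices = NULL;   /* built on first request, then reused */

/* Number of floats in the buffer: cubes * vertices per cube * 3 coordinates. */
size_t field_float_count(void)
{
    return (size_t)CUBES_PER_SIDE * CUBES_PER_SIDE * VERTS_PER_CUBE * 3;
}

/* Returns the precomputed positions, or NULL if allocation failed.
   The loops run only on the first call; later calls return the cached buffer. */
const float *get_field_vertices(void)
{
    if (field_vertices != NULL)
        return field_vertices;

    float *buf = malloc(field_float_count() * sizeof *buf);
    if (buf == NULL)
        return NULL;

    float *p = buf;
    for (int x = -50; x < 50; ++x) {
        for (int y = -50; y < 50; ++y) {
            for (int i = 0; i < VERTS_PER_CUBE; ++i) {
                /* Cube width is 2, so the half-width is 1. */
                *p++ = x * 3.0f + SIGNS[i][0];
                *p++ = y * 3.0f + SIGNS[i][1];
                *p++ = -2.0f    + SIGNS[i][2];
            }
        }
    }
    field_vertices = buf;
    return field_vertices;
}
```

With OpenGL 1.1 vertex arrays, the returned pointer could be handed to glVertexPointer(3, GL_FLOAT, 0, ptr) and drawn with glDrawArrays(GL_QUADS, 0, 240000). The per-cube colours and per-face normals would need matching arrays (glColorPointer, glNormalPointer), which this sketch leaves out.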

Are you looking for a fast API? On Windows platforms, the Win32 API is faster than the other windowing layers.
OK, it’s boring to write 1000 lines of which only 250 are OpenGL (the rest go to window management).

Gollum, I don’t think the problem here is related to win32 or glut api.

Yes, I am already using GLUT, which saves me a lot of trouble. But that’s not where the bottlenecks are.

Zbuffer > Yes, I know it’s related to non-optimized code. I just said that Win32 is faster than GLUT, that’s all…