
Glut code too CPU intensive, help me optimize!



mpan3
10-29-2004, 07:20 PM
It's a simple program using GLUT that draws a 100x100 field of cubes; an 8x accumulation buffer (glAccum) is also used (I just thought it looked "cool"). The problem is that it's running really slowly: 40 fps without glAccum, and 5 fps with the 8x accumulation.

Any ideas what I can do to increase the performance?

Get Code HERE! (http://members.shaw.ca/mpan3/GL.txt)

nigels
10-29-2004, 07:31 PM
Each push/pop matrix is stalling the pipeline.
Write your own cube routine, and apply the
translations in code, rather than churning the
OpenGL transformation state so heavily. :)

Nigel
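A minimal sketch of what "applying the translations in code" can look like: the cube's centre is passed in and simply added to each corner on the CPU, so no glTranslatef or glPushMatrix/glPopMatrix is needed per cube. The function name cube_at and the flat float layout (24 vertices, x,y,z each) are hypothetical; the corner order follows OpenGLUT's glutSolidCube. Normals are left out because a translation does not change them, so the per-face normals can be reused as they are.

```c
#include <assert.h>
#include <stddef.h>

/* Corner signs for the 24 vertices (6 quads x 4 corners), in the same order
   as OpenGLUT's glutSolidCube: +X, +Y, +Z, -X, -Y, -Z faces. */
static const signed char CUBE_SIGNS[24][3] = {
    {+1,-1,+1},{+1,-1,-1},{+1,+1,-1},{+1,+1,+1},   /* +X */
    {+1,+1,+1},{+1,+1,-1},{-1,+1,-1},{-1,+1,+1},   /* +Y */
    {+1,+1,+1},{-1,+1,+1},{-1,-1,+1},{+1,-1,+1},   /* +Z */
    {-1,-1,+1},{-1,+1,+1},{-1,+1,-1},{-1,-1,-1},   /* -X */
    {-1,-1,+1},{-1,-1,-1},{+1,-1,-1},{+1,-1,+1},   /* -Y */
    {-1,-1,-1},{-1,+1,-1},{+1,+1,-1},{+1,-1,-1}    /* -Z */
};

/* Hypothetical helper: writes the 24 corners of an axis-aligned cube with
   half-width `half`, centred at (cx, cy, cz), into out[0..71] as x,y,z
   triples. The translation is a plain addition done here on the CPU.
   With a GL context current, the result could be drawn with
   glBegin(GL_QUADS), one glVertex3fv(&out[3*i]) per vertex, glEnd(). */
void cube_at(float cx, float cy, float cz, float half, float out[72])
{
    assert(out != NULL && half > 0.0f);
    for (int i = 0; i < 24; ++i) {
        out[3 * i + 0] = cx + CUBE_SIGNS[i][0] * half;
        out[3 * i + 1] = cy + CUBE_SIGNS[i][1] * half;
        out[3 * i + 2] = cz + CUBE_SIGNS[i][2] * half;
    }
}
```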

mpan3
10-29-2004, 07:34 PM
What do you mean by "Write your own cube routine"? OpenGL is new to me, so it would be nice if you could provide a detailed explanation or a code sample. Thanks in advance!

Also, the push/pop matrix code is commented out. It WAS used to rotate the sphere that WAS in the middle of the scene.

nigels
10-29-2004, 07:41 PM
Here is the OpenGLUT (http://www.openglut.org) implementation of glutSolidCube. It draws the six faces as quads, each with a different surface normal.

Hope you find it useful!

Nigel



#include <GL/gl.h>

void glutSolidCube( GLdouble width )
{
    double size = width * 0.5;

    /* V(+,-,+) expands to glVertex3d( + size, - size, + size ):
       the three signs pick one corner of the cube. */
#   define V(a,b,c) glVertex3d( a size, b size, c size );
#   define N(a,b,c) glNormal3d( a, b, c );
    /* PWO: Again, I dared to convert the code to use macros... */
    glBegin( GL_QUADS );
        N(  1, 0, 0 ); V( +, -, + ); V( +, -, - ); V( +, +, - ); V( +, +, + );
        N(  0, 1, 0 ); V( +, +, + ); V( +, +, - ); V( -, +, - ); V( -, +, + );
        N(  0, 0, 1 ); V( +, +, + ); V( -, +, + ); V( -, -, + ); V( +, -, + );
        N( -1, 0, 0 ); V( -, -, + ); V( -, +, + ); V( -, +, - ); V( -, -, - );
        N(  0,-1, 0 ); V( -, -, + ); V( -, -, - ); V( +, -, - ); V( +, -, + );
        N(  0, 0,-1 ); V( -, -, - ); V( -, +, - ); V( +, +, - ); V( +, -, - );
    glEnd( );
#   undef V
#   undef N
}

mpan3
10-29-2004, 07:46 PM
You mean replace the glutSolidCube routine with this? OK, I'll try that.

nigels
10-29-2004, 07:56 PM
Yep -- perhaps rename your copy to mySolidCube,
so the linker doesn't get upset about two
functions with the same name.

You could also get a speedup by putting your
loop inside the glBegin/glEnd block, so
that all the quads are sent in one big block,
rather than in per-cube bundles.

After that, try a display list.

After that, try a vertex array.

You should be able to get a big speedup
without using display lists or
vertex arrays though...

Nigel
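One way to get "all the quads sent in one big block": loop over the whole field and stream every cube's translated corners without a glBegin/glEnd pair (or any matrix call) per cube, so the caller can bracket the entire loop with a single glBegin(GL_QUADS)/glEnd(). This is a sketch under assumptions: the placement copies cubes() (centre x*3, y*3, -2 and cube width 2), vertices are handed to a callback so the loop stands on its own, and the names emit_field, vertex_sink and bounds_sink are hypothetical. Colours and normals are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Corner signs in OpenGLUT glutSolidCube order: +X, +Y, +Z, -X, -Y, -Z faces. */
static const signed char CUBE_SIGNS[24][3] = {
    {+1,-1,+1},{+1,-1,-1},{+1,+1,-1},{+1,+1,+1},
    {+1,+1,+1},{+1,+1,-1},{-1,+1,-1},{-1,+1,+1},
    {+1,+1,+1},{-1,+1,+1},{-1,-1,+1},{+1,-1,+1},
    {-1,-1,+1},{-1,+1,+1},{-1,+1,-1},{-1,-1,-1},
    {-1,-1,+1},{-1,-1,-1},{+1,-1,-1},{+1,-1,+1},
    {-1,-1,-1},{-1,+1,-1},{+1,+1,-1},{+1,-1,-1}
};

/* Receives one vertex at a time. In the real program it would wrap glVertex3f. */
typedef void (*vertex_sink)(float x, float y, float z, void *user);

/* Streams every quad of the 100x100 field in one pass and returns the
   number of vertices emitted. Intended use with a GL context current:
       static void gl_sink(float x, float y, float z, void *user)
       { (void)user; glVertex3f(x, y, z); }
       glBegin(GL_QUADS);
       emit_field(gl_sink, NULL);
       glEnd();
*/
size_t emit_field(vertex_sink sink, void *user)
{
    const float half = 1.0f;              /* glutSolidCube(2) -> half-width 1 */
    size_t n = 0;
    assert(sink != NULL);
    for (int x = -50; x < 50; ++x) {
        for (int y = -50; y < 50; ++y) {
            const float cx = x * 3.0f, cy = y * 3.0f, cz = -2.0f;
            for (int i = 0; i < 24; ++i) {
                sink(cx + CUBE_SIGNS[i][0] * half,
                     cy + CUBE_SIGNS[i][1] * half,
                     cz + CUBE_SIGNS[i][2] * half, user);
                ++n;
            }
        }
    }
    return n;
}

/* A GL-free sink that tracks the field's bounding box. */
typedef struct { float min[3], max[3]; } bounds;

void bounds_reset(bounds *b)
{
    for (int k = 0; k < 3; ++k) { b->min[k] = 1e30f; b->max[k] = -1e30f; }
}

void bounds_sink(float x, float y, float z, void *user)
{
    bounds *b = user;
    const float p[3] = { x, y, z };
    for (int k = 0; k < 3; ++k) {
        if (p[k] < b->min[k]) b->min[k] = p[k];
        if (p[k] > b->max[k]) b->max[k] = p[k];
    }
}
```

Passing a different sink (for example one that appends to an array) turns the same loop into the data source for a display list or vertex array, which are the next two steps Nigel suggests.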

mpan3
10-29-2004, 08:03 PM
K, thanks. Your "glutcube" gave me a 30% speed gain, I'll try other things, too.

nigels
10-29-2004, 08:17 PM
There is another push/pop pair within the cubes() routine. That's the one you should aim to remove by doing all the translations yourself inside a mySolidCube routine.

The problem is that OpenGL has to wait until the entire pipeline is clear before changing the transformation state. Your application is spending most of the time waiting for six quads to get from one end to the other.


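#include <math.h>    /* cos */
#include <stdlib.h>  /* abs */
#include <GL/glut.h>

void cubes(void)
{
    for (int x = -50; x < 50; x += 1) {
        for (int y = -50; y < 50; y += 1) {
            glPushMatrix();                     /* save the modelview matrix */
            glTranslatef(x * 3, y * 3, -2.0f);  /* move to this cube's cell */
            glColor3f(abs(x) / 10.0f, abs(y) / 10.0f, cos(y));
            glutSolidCube(2);
            glPopMatrix();                      /* restore it for the next cube */
        }
    }
}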

mpan3
10-29-2004, 10:57 PM
I might try to do the translations myself tomorrow... I need sleep.

Also, DL didn't help much at high resolution, but at 320x240, my fps doubled, yay!

ZbuffeR
10-30-2004, 07:04 AM
Originally posted by mpan3:
Also, DL didn't help much at high resolution, but at 320x240, my fps doubled, yay!

That means you are limited by the number of pixels drawn (fill rate), so it will be hard to improve the frame rate without cutting down the glAccum work.

mpan3
10-30-2004, 07:55 AM
With a display list, does that mean the cubes are only generated/transformed once and then stored in a buffer?

Updated Code here! (http://members.shaw.ca/mpan3/GL2.txt)

mpan3
10-30-2004, 08:24 AM
Originally posted by nigels:
You could also get a speedup by putting your
loop inside the glBegin/glEnd block, so
that all the quads are sent in one big block,
rather than in per-cube bundles.

Could you please elaborate on that?

nigels
10-30-2004, 08:54 AM
A display list will contain the matrix push/pop
and transformation calls. It doesn't pre-compute
the transformations.

A begin/end block with 1000 quads will be quicker
than 100 begin/end blocks of 10 quads, because there are no state-switching delays between the blocks.

But, that does mean a little more work on the application-side.

mpan3
11-02-2004, 12:49 PM
Is there any way to precompute a list of vertices and store it in a buffer of some sort?

...so that no matter how many push/pop matrix calls I use, they only run once, during the application's initialization?
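A minimal sketch of the precompute-once idea, assuming the same 100x100 grid, spacing 3, z = -2 and cube width 2 as cubes(): all translations are done a single time at start-up into a plain array, and each frame only hands that array to OpenGL through a vertex array (glEnableClientState, glVertexPointer and glDrawArrays are OpenGL 1.1 calls). The names init_field, field_vertices and g_field are hypothetical, and colours and normals are again left out.

```c
#include <assert.h>
#include <stddef.h>

enum { GRID = 100, VERTS_PER_CUBE = 24 };
#define FIELD_VERTS ((size_t)GRID * GRID * VERTS_PER_CUBE)

/* Corner signs in OpenGLUT glutSolidCube order: +X, +Y, +Z, -X, -Y, -Z faces. */
static const signed char CUBE_SIGNS[24][3] = {
    {+1,-1,+1},{+1,-1,-1},{+1,+1,-1},{+1,+1,+1},
    {+1,+1,+1},{+1,+1,-1},{-1,+1,-1},{-1,+1,+1},
    {+1,+1,+1},{-1,+1,+1},{-1,-1,+1},{+1,-1,+1},
    {-1,-1,+1},{-1,+1,+1},{-1,+1,-1},{-1,-1,-1},
    {-1,-1,+1},{-1,-1,-1},{+1,-1,-1},{+1,-1,+1},
    {-1,-1,-1},{-1,+1,-1},{+1,+1,-1},{+1,-1,-1}
};

/* Every vertex of the field as x,y,z triples (about 2.9 MB of floats). */
static float g_field[FIELD_VERTS * 3];
static int g_field_ready = 0;

/* Call once, e.g. from main() before glutMainLoop(); later calls do nothing.
   Returns the number of vertices in the field. */
size_t init_field(void)
{
    if (g_field_ready)
        return FIELD_VERTS;
    size_t k = 0;
    for (int x = -50; x < 50; ++x)
        for (int y = -50; y < 50; ++y)
            for (int i = 0; i < VERTS_PER_CUBE; ++i) {
                /* centre (x*3, y*3, -2) plus a corner offset of +/-1 */
                g_field[k++] = x * 3.0f + CUBE_SIGNS[i][0];
                g_field[k++] = y * 3.0f + CUBE_SIGNS[i][1];
                g_field[k++] = -2.0f + CUBE_SIGNS[i][2];
            }
    assert(k == FIELD_VERTS * 3);
    g_field_ready = 1;
    return FIELD_VERTS;
}

const float *field_vertices(void) { return g_field; }

/* Per frame, with a GL context current:
 *   glEnableClientState(GL_VERTEX_ARRAY);
 *   glVertexPointer(3, GL_FLOAT, 0, field_vertices());
 *   glDrawArrays(GL_QUADS, 0, (GLsizei)FIELD_VERTS);
 *   glDisableClientState(GL_VERTEX_ARRAY);
 */
```

The design choice here is that no matrix calls remain at draw time at all: the per-frame cost is one draw call over data that never changes.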

Silkut
11-03-2004, 07:06 AM
Are you looking for a fast API? On Windows platforms, the Win32 API is faster than other window managers.
OK, it's boring to write 1000 lines of which only 250 are OpenGL (the rest go to the window manager).

ZbuffeR
11-03-2004, 12:16 PM
Gollum, I don't think the problem here is related to win32 or glut api.

mpan3
11-04-2004, 06:56 AM
Yes, I am already using GLUT, which saves me a lot of trouble. But that's not where the bottlenecks are.

Silkut
11-04-2004, 08:19 AM
ZbuffeR > Yes, I know it's related to non-optimized code; I was just saying that Win32 is faster than GLUT, that's all.