Display Lists are running slower than immediate mode



knackered
05-17-2002, 04:18 AM
Sorry about this, I feel a little embarrassed at having to ask this question.
In my *very* simple test environment, I'm rendering 8000 cubes (made out of GL_QUADS).
I'm running this on a dual PIII 700MHz with a GeForce2 GTS with NVIDIA drivers about 1.5 months old.
I'm getting 36fps without using a display list, and 24fps using a display list. Both methods use immediate mode to send the vertices and normals.
Here's the code (the Draw_Cube() function just issues 24 glVertex+glNormal calls):-




//#define USEDISPLAYLIST
#ifdef USEDISPLAYLIST
    static unsigned int displist=0;

    if (!displist)
    {
        displist = glGenLists(1);

        glNewList(displist, GL_COMPILE);
        glBegin(GL_QUADS);
        for (float x=-(w*0.5f); x<(w*0.5f); x+=wstep)
            for (float y=-(h*0.5f); y<(h*0.5f); y+=hstep)
                for (float z=-(d*0.5f); z<(d*0.5f); z+=dstep)
                    context.Draw_Cube(x, y, z, sizex);
        glEnd();
        glEndList();
    }
    else
        glCallList(displist);
#else
    glBegin(GL_QUADS);
    for (float x=-(w*0.5f); x<(w*0.5f); x+=wstep)
        for (float y=-(h*0.5f); y<(h*0.5f); y+=hstep)
            for (float z=-(d*0.5f); z<(d*0.5f); z+=dstep)
                context.Draw_Cube(x, y, z, sizex);
    glEnd();
#endif


Can anyone suggest why the display list is slower than the straight immediate mode?

[This message has been edited by knackered (edited 05-17-2002).]

Robbo
05-17-2002, 04:21 AM
No idea. But I think your code looks really neat and tidy ;)

knackered
05-17-2002, 04:27 AM
Mmm, cheers Robbo - your contribution is welcome... :)
I think I'll give the new detonator drivers a go...

saian
05-17-2002, 04:37 AM
I think (just a guess, mind you) it's because a display list stores all the data in some internal structure of linked functionality.
So OpenGL has to do more steps compared to your other way (immediate mode).

If you compiled your display lists (which you can, since your data is static), I think you'd get more performance.

Hope this helps a bit.

saian
05-17-2002, 04:45 AM
Er... sorry,
I didn't read all your code.
You're not in immediate mode there, you're just calling the display list which was already compiled...
So, a thousand apologies.

As far as I can see, you have just one display list? Maybe it's too huge to be handled well.

Have you tried something like this:

for (float x=-(w*0.5f); x<(w*0.5f); x+=wstep)
    for (float y=-(h*0.5f); y<(h*0.5f); y+=hstep)
        for (float z=-(d*0.5f); z<(d*0.5f); z+=dstep)
        {
            displist = glGenLists(1);
            glNewList(displist, GL_COMPILE);
            glBegin(GL_QUADS);
            context.Draw_Cube(x, y, z, sizex);
            glEnd();
            glEndList();
        }

So you'll need displist to be an array holding all the display lists.

Sorry, I think I had misread the context.

knackered
05-17-2002, 04:51 AM
Thanks saian - I'll give that a try.
(I don't really want to use immediate mode, it's just a test program - it shocked me, that's all).
BTW, I've just installed the latest w2k detonator drivers, and the results in the same program are:-
WITH display lists:- 24fps (no change)
WITHOUT display lists:- 39fps !! (so that's gone up by 3fps)

knackered
05-17-2002, 05:07 AM
Mm, I get a 2fps increase on the previous display list frame rate if I change it to this:-



if (!displist)
{
    displist = glGenLists(cubecnt);

    unsigned int i=0;

    for (float x=-(w*0.5f); x<(w*0.5f); x+=wstep)
        for (float y=-(h*0.5f); y<(h*0.5f); y+=hstep)
            for (float z=-(d*0.5f); z<(d*0.5f); z+=dstep)
            {
                glNewList(displist+i, GL_COMPILE);
                glBegin(GL_QUADS);
                context.Draw_Cube(x, y, z, sizex);
                glEnd();
                glEndList();

                i++;
            }
}
else
{
    for (unsigned int i=0; i<cubecnt; i++)
        glCallList(displist+i);
}
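
Editor's aside on the loop above: it reserves cubecnt list names but counts cubes by stepping floats, and an accumulated float step can, depending on rounding, run one iteration more or fewer than expected. A minimal sketch of one way to keep the two counts in agreement is to drive the loops with integer indices and derive the positions from them. The names Vec3 and gridCentres and the parameters nx, ny, nz are hypothetical and do not appear in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical helper type

// Positions of an nx * ny * nz grid starting at (-w/2, -h/2, -d/2), computed
// from integer indices so the number of cubes is exact by construction.
std::vector<Vec3> gridCentres(int nx, int ny, int nz, float w, float h, float d) {
    assert(nx > 0 && ny > 0 && nz > 0);
    const float wstep = w / nx, hstep = h / ny, dstep = d / nz;
    std::vector<Vec3> centres;
    centres.reserve(static_cast<std::size_t>(nx) * ny * nz);
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j)
            for (int k = 0; k < nz; ++k)
                centres.push_back({-0.5f * w + i * wstep,
                                   -0.5f * h + j * hstep,
                                   -0.5f * d + k * dstep});
    return centres;
}
```

With this, the size of the returned vector can serve as cubecnt for both glGenLists and the glCallList loop, so the two cannot drift apart.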

kehziah
05-17-2002, 05:33 AM
you don't use something like glPolygonMode(GL_FRONT_AND_BACK, GL_LINE), do you?

For lines, display lists have pretty poor performance (at least on my R8500).

[This message has been edited by kehziah (edited 05-17-2002).]

Robbo
05-17-2002, 05:38 AM
Is the slowdown still proportionately the same if you increase/reduce the number of primitives?

I'm just thinking that you might have exceeded or undershot some sweet-spot somewhere ;)

kehziah
05-17-2002, 05:43 AM
From the Red Book:
"Very small lists may not perform well since there is some overhead when executing a list."
A simple cube is maybe too small.
I think you have reached two limits. Method #1: 8000 cubes in one list is too big and the driver doesn't cache it on the card. Method #2: 8000 list calls per frame causes too much overhead.
What about say 50 cubes per list?
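
Editor's aside: the middle ground suggested here can be sketched as a small batching helper that splits the cubes into consecutive groups, each of which would then be compiled into its own list. This is only an illustration under assumptions; batchRanges is a hypothetical name, and the best batch size would have to be measured on the actual driver.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split `total` cubes into consecutive batches of at most `perList` cubes.
// Each pair is (index of the first cube, number of cubes in the batch).
std::vector<std::pair<std::size_t, std::size_t>>
batchRanges(std::size_t total, std::size_t perList) {
    assert(perList > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t first = 0; first < total; first += perList) {
        const std::size_t left = total - first;
        ranges.emplace_back(first, left < perList ? left : perList);
    }
    return ranges;
}
```

For 8000 cubes at 50 per list this yields 160 batches, so a frame would issue 160 glCallList calls instead of either 1 or 8000.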

amerio
05-17-2002, 12:57 PM
By reading your code, I think the problem is because you are recreating the display list *every* frame.
Creating a display list is a heavy job, and will certainly cost more than simply drawing.
So try changing your test code so your DL is created only once, and then test by either calling the DL or using a pure Begin/End pair...

Quaternion
05-17-2002, 01:11 PM
But he doesn't do that...

if (!displist) create list
else call list

Shlomi.

zeckensack
05-17-2002, 01:19 PM
Show us the code of the Draw_Cube function.
Maybe you're doing some matrix transformation stuff that eats all your memory when compiled into a list.

knackered
05-17-2002, 03:15 PM
lol amerio :)

Ok, here's the draw_cube code - copy and pasted directly (no hidden wires):-




void kGLContext::Draw_Cube(float x, float y, float z, float size)
{
    glColor3f(1.0f, 1.0f, 1.0f);

    size*=0.5f;

    glNormal3f(0.0f, 0.0f, -1.0f);
    glVertex3f(x-size, y-size, z-size);
    glVertex3f(x+size, y-size, z-size);
    glVertex3f(x+size, y+size, z-size);
    glVertex3f(x-size, y+size, z-size);

    glNormal3f(0.0f, 0.0f, 1.0f);
    glVertex3f(x-size, y-size, z+size);
    glVertex3f(x+size, y-size, z+size);
    glVertex3f(x+size, y+size, z+size);
    glVertex3f(x-size, y+size, z+size);

    glNormal3f(0.0f, 1.0f, 0.0f);
    glVertex3f(x-size, y+size, z-size);
    glVertex3f(x+size, y+size, z-size);
    glVertex3f(x+size, y+size, z+size);
    glVertex3f(x-size, y+size, z+size);

    glNormal3f(-1.0f, 0.0f, 0.0f);
    glVertex3f(x-size, y-size, z-size);
    glVertex3f(x-size, y+size, z-size);
    glVertex3f(x-size, y+size, z+size);
    glVertex3f(x-size, y-size, z+size);

    glNormal3f(0.0f, -1.0f, 0.0f);
    glVertex3f(x+size, y-size, z-size);
    glVertex3f(x-size, y-size, z-size);
    glVertex3f(x-size, y-size, z+size);
    glVertex3f(x+size, y-size, z+size);

    glNormal3f(1.0f, 0.0f, 0.0f);
    glVertex3f(x+size, y+size, z-size);
    glVertex3f(x+size, y-size, z-size);
    glVertex3f(x+size, y-size, z+size);
    glVertex3f(x+size, y+size, z+size);
}


As you can see, there are fewer glNormal calls than I previously said...

kehziah: I tried it with fewer cubes, but the display lists are always slower - the lower the number of cubes, the smaller the difference - but dlists are always slower.
I'm now running it on my home machine, with a GeForce3 Ti500 in it, and it's the same problem - although all frame rates are slightly higher (obviously).

Grimba
05-17-2002, 04:27 PM
I am guessing you are seeing driver limitations made by nvidia because they want their consumer based cards slower than their Quadro cards at professional apps. And professional apps use things like dlists. They need to make immediate mode fast for games like mdk2 which use immediate mode.

Lev
05-17-2002, 04:40 PM
mdk2 uses immediate mode?? Where did they get the engine programmers? From the dumps? When I hear (read) such things I almost agree with someone here in the forum (don't remember who) who would like to see immediate mode banned from OpenGL.

-Lev

knackered
05-17-2002, 05:35 PM
I feel like I've entered the twilight zone today - I mean, I can remember display lists being around 6X faster than immediate mode...or did I dream that year?

mcraighead
05-17-2002, 06:19 PM
Yes, MDK2 uses immediate mode. (It also has a display list option, but it runs slower with display lists, for some strange reason.)

In this case, I'd suggest that the problem is your use of four vertices for each normal. This sort of nonuniform vertex usage is good for immediate mode and likely to be bad in other cases. It's very hard to optimize that sort of usage.


Now, if you think overuse of immediate mode in apps is a bad thing, you haven't seen the incredible stupidity of some GL apps.

- Matt

jwatte
05-18-2002, 10:18 AM
> I almost agree to someone here in the
> forum (don't remember who) who would like
> to see immediate mode banned

Might have been me. I think any API designed for performance should be block streaming based; ideally with application access to driver-allocated buffers.

UNIX write() is bad. nVIDIA VAR/ATI MOB is good.

Anyway, display lists ought to work well when you compile them once and then draw them "forever". It ought to be possible to optimize simple glBegin()/issue/glEnd() cases no matter what it is that you're issuing, fairly simply, as the driver ought to be capable of expanding all current state per vertex when Vertex3f() is called. Perhaps the display list optimization isn't that aggressive, though.

Regarding the initial code: the glCallList() should not be in an "else" as you still need to draw it after compiling it; this is just a cosmetic issue though. Also, if the lists are very large, they may be sub-optimal.

knackered
05-18-2002, 11:17 AM
What can I say - I'll try putting normal calls in between all vertex calls.

I don't use immediate mode, unless I'm doing a test program and want to knock some triangles up quick, and don't want to link in my container class lib to do vertex arrays. I usually compile display lists from gldrawelement calls, dereferencing the attribute arrays at dlist compile time.
I'm just puzzled by this anomaly - compiled immediate mode is slower than non-compiled immediate mode....

jwatte, bear in mind this is a test program, which was originally intended to test something other than the rendering part, so don't worry about me missing drawing objects while display lists are compiling :)

martin_marinov
05-18-2002, 12:32 PM
Hi

Maybe the problem is that DL compilation for this is very simple - just put the vertices in one array, normals in another, and so on. So the normals get multiplied by 4 in this case, effectively increasing the geometry sent by 60% (4 vertices + 1 normal vs. 4 vertices + 4 normals). And since the display list is too large, the driver decides that it cannot fit into video memory (VM), so you hit the bandwidth limitation when you glCallList().

I don't have the faintest idea what the driver actually does here, I'm only guessing :). So I can be completely wrong, of course :)
This however shows that immediate mode has its use cases, and maybe it's better that it exists :) - of course, developing a 3D game seems to me the wrong use case for it :)
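
Editor's aside: the 60% figure can be checked with a few lines of arithmetic. The sketch below counts only 3-component positions and normals, assumes 4-byte floats, and ignores the glColor call and whatever internal format a driver really uses, so it illustrates the guess above and is not a description of any actual driver. All function names are hypothetical.

```cpp
#include <cassert>

constexpr int kComponents = 3;    // x, y, z for both positions and normals
constexpr int kVertsPerQuad = 4;

// One normal per face, as issued in immediate mode: 4 positions + 1 normal.
constexpr int floatsPerFaceNormalQuad() {
    return kVertsPerQuad * kComponents + 1 * kComponents;
}

// One normal per vertex, as in an expanded array layout: 4 positions + 4 normals.
constexpr int floatsPerVertexNormalQuad() {
    return kVertsPerQuad * kComponents + kVertsPerQuad * kComponents;
}

// Growth of the expanded layout relative to the immediate-mode stream.
constexpr double expansionFactor() {
    return static_cast<double>(floatsPerVertexNormalQuad()) / floatsPerFaceNormalQuad();
}

// Total bytes for `cubes` cubes (6 quads each) in the expanded layout.
long long expandedBytes(int cubes) {
    assert(cubes >= 0);
    return 6LL * cubes * floatsPerVertexNormalQuad()
               * static_cast<long long>(sizeof(float));
}
```

That is 15 floats per quad against 24, a factor of 1.6. Under the same assumptions, expandedBytes(8000) comes to roughly 4.6 MB for the whole scene.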

Regards
Martin

mcraighead
05-18-2002, 01:26 PM
I'm in favor of throwing out immediate mode... I think immediate mode should be part of GLU.

- Matt

martin_marinov
05-18-2002, 01:52 PM
Originally posted by mcraighead:
I'm in favor of throwing out immediate mode... I think immediate mode should be part of GLU.

- Matt

At least life will be easier for OpenGL driver developers after that :) :)

Martin

Lev
05-18-2002, 02:38 PM
Immediate mode as part of GLU is a cool idea! It wouldn't even be that hard to implement via vertex arrays. Since immediate mode isn't aimed at performance anyway, a small GLU layer wouldn't change that much.
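
Editor's aside: a rough sketch of what such a layer might look like, written without any OpenGL calls so it stands alone. ImmediateBuffer and recordFace are hypothetical names, and this is not how GLU or any driver implements anything. It records Begin/Normal/Vertex-style calls into flat arrays and copies the current normal next to every vertex, which is also the per-vertex state expansion discussed earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: buffers immediate-mode style calls into flat arrays.
class ImmediateBuffer {
public:
    void begin() {
        assert(!open_);
        open_ = true;
        positions_.clear();
        normals_.clear();
    }
    void normal(float x, float y, float z) {
        current_[0] = x; current_[1] = y; current_[2] = z;
    }
    void vertex(float x, float y, float z) {
        assert(open_);
        positions_.insert(positions_.end(), {x, y, z});
        // The current normal is replicated for every vertex.
        normals_.insert(normals_.end(), {current_[0], current_[1], current_[2]});
    }
    // A real layer would hand both arrays to glVertexPointer/glNormalPointer
    // and call glDrawArrays here; this sketch only reports the vertex count.
    std::size_t end() {
        assert(open_);
        open_ = false;
        return positions_.size() / 3;
    }
    const std::vector<float>& positions() const { return positions_; }
    const std::vector<float>& normals() const { return normals_; }

private:
    bool open_ = false;
    float current_[3] = {0.0f, 0.0f, 1.0f};  // GL's initial current normal
    std::vector<float> positions_, normals_;
};

// Usage: one face of the thread's cube, one normal call then four vertices.
ImmediateBuffer recordFace() {
    ImmediateBuffer buf;
    buf.begin();
    buf.normal(0.0f, 0.0f, -1.0f);
    buf.vertex(-1.0f, -1.0f, -1.0f);
    buf.vertex( 1.0f, -1.0f, -1.0f);
    buf.vertex( 1.0f,  1.0f, -1.0f);
    buf.vertex(-1.0f,  1.0f, -1.0f);
    buf.end();
    return buf;
}
```

After recordFace() both arrays hold 12 floats: the single normal call has been expanded to four stored normals.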

-Lev

[This message has been edited by Lev (edited 05-18-2002).]

knackered
05-18-2002, 02:43 PM
Originally posted by mcraighead:
I'm in favor of throwing out immediate mode... I think immediate mode should be part of GLU.

- Matt

I agree it should not have been part of GL in the first place, but in GLU. But weren't vertex arrays absent from the original 1.0 version of OpenGL?
Does supporting immediate mode have *that* much of an impact on what you can do with the rest of GL these days? If so, then maybe it should be moved to glu - but programs would have to be recompiled....

mcraighead
05-18-2002, 06:17 PM
That is of course the problem. GL1.0 had only immediate mode and display lists. There was no good, fast way to do dynamic geometry.

- Matt

V-man
05-18-2002, 09:30 PM
There were vertex array extensions during 1.0 (GL_EXT_vertex_array).
Kind of strange that something as obvious as VA's wasn't ready in 1.0

Matt, is that really the explanation for the performance loss? What's the full story?

V-man

mcraighead
05-18-2002, 11:22 PM
I have no way to know without running the app, but I suspect it's as simple as the geometry expanding when extra normals are added.

- Matt

knackered
05-19-2002, 03:51 AM
Specifying a normal for every glVertex call, and removing the glColor call - these are the results (on GF3 Ti500 with 8000 cubes and a static viewpoint):-

WITH dlist: 34fps
WITHOUT dlist: 43fps

So no change there.

Matt, with all due respect, you have seen the entire app - except for the creation of the window and context - window is about 512x512, context is 32bit colour buffer, 24bit zbuffer, 0bit stencil buffer, double buffered.
wglMakeCurrent is issued once at initialisation.
There's a very low priority thread dealing with window messages.

I'm not too bothered by this, because as I said, I don't use immediate mode in anything important anyway - but I do hope that display lists work better with VA's, as I create these if VAR or VOB is not supported on the card.

mcraighead
05-20-2002, 01:53 PM
No, the only way I could know what was going on would be to *run* the app. [Which I probably don't have time to do at present...]

- Matt

Shag
05-20-2002, 04:39 PM
Knackered ... this may seem insulting (not meant to be :) ) ... but you're not doing your drawing based on Windows messages, are you?

You mentioned a low priority thread ...

[This message has been edited by Shag (edited 05-20-2002).]

knackered
05-20-2002, 11:24 PM
No Shag, I'm not. The message thread just deals with resize, mouse, quit and char messages. The drawing happens in the main thread (WinMain) in this test app.

bsenftner
05-21-2002, 07:28 AM
I'm interested in hearing what happens if you switch from using quads to triangles... Drivers seem to be so triangle centric, I would not be surprised if that is the problem...

I remember having something similar occur when I first got a GeForce256 card: my display lists were slower than immediate mode and I could not figure out why. I wasted some time and got all depressed and moved on to other areas of the app that were getting neglected. Then one morning I noticed that my frame rate was higher than normal... the sun spot or whatever it was must have ended, because from that day on my display lists have been suitably faster than immediate mode. I believe that nothing I did in the other portions of the app could have affected my display list render speed... so I'm interested in hearing how your investigations turn out.

Rml4o
05-27-2002, 09:27 AM
I experienced the same performance problem with my GeForce3 Ti200 card. In some cases, immediate mode would give better performance than display lists. From a few experiments I deduced that display lists are worse than immediate mode when they are small. Display lists must be used in a clever way, that is, you must create not-too-small ones, with as many shared vertices as possible, and large strips or fans or quads. Anyway, you should use VAR: it's the best possible means of specifying geometry. And if you improve the sorting of your indices, you will benefit from the vertex cache.
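
Editor's aside: the effect of index ordering on a vertex cache can be explored with a toy model. The sketch below counts misses against a plain FIFO cache. The FIFO policy and the cache size passed in are assumptions made for illustration, not the behaviour of any particular card, and countCacheMisses is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Count how many indices miss a FIFO cache holding `cacheSize` vertices.
std::size_t countCacheMisses(const std::vector<unsigned>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<unsigned> fifo;
    std::size_t misses = 0;
    for (unsigned idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end())
            continue;                 // hit: a FIFO does not reorder on reuse
        ++misses;                     // miss: the vertex must be processed again
        fifo.push_back(idx);
        if (fifo.size() > cacheSize)
            fifo.pop_front();         // evict the oldest entry
    }
    return misses;
}
```

Two triangles sharing an edge, indexed as 0,1,2 and 2,1,3, cost four misses in this model, where six unshared vertices would cost six. Comparing the miss count for different orderings of the same mesh gives a rough feel for how much a given sort helps.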

knackered
05-28-2002, 12:36 AM
I don't really care about immediate mode, I never use it - it was an experiment, for heaven's sake!
I'm sure this performance difference was introduced long ago, and I've only just discovered it because I never use it.
The upshot is, I tried drawing the cubes as quads, then as triangles, specifying per face normals, then specifying per vertex normals, creating a small cube-sized display list, then creating a large 8000-cube-sized display list, then creating a medium sized 200-cube sized display list. No matter what I did, display lists were always slower than the uncompiled immediate mode commands.
It doesn't worry me, because vertex arrays compiled into display lists are still faster than uncompiled vertex arrays - this was all I was concerned about.
I still don't understand why the performance is bad with compiled immediate mode, but I'm past caring - and the test program I wrote has moved on to other things....