Display list performance

11-04-2000, 06:07 AM
I need some help with display list (DL) performance.
Should I optimize the data I send in a display list in some way,
or leave it to the driver?

I mean, at the moment I'm doing this:
glBegin(GL_TRIANGLES); // no, triangle strips are not suitable for me ...
glVertex(a lot of times);
glEnd();

And I'm getting very poor performance when sending 10,000 vertices or more on a GeForce2 card with 64 MB!

To be more precise :
My app reads in Lightwave models, complete with color, texture, etc.
When viewing the model inside Lightwave, it runs almost perfectly
smooth (I'd guess around 30 fps, and surely well above 5 fps!).
When playing the exact same model in my own viewer, I
barely reach 5fps, sometimes even 2fps !
I don't understand why, as:
- LW does not use display lists (the mesh can be deformed)
- I DO USE display lists, with optimized surfaces
(i.e., I first set the material parameters, then send all triangles
with those parameters, etc.). I do not have a lot of
state changes! All triangles are sent at once!
I even tried with simple cubes (5000 * 6 quads,
simplest surface parameters...)
=> LW = smooth, Me = bleah!

- LW does not use strips (the mesh is not smooth, so strips are inadequate)
- Neither do I (same reason)

I use lighting with 1 light, single sided,
infinite viewer and light...

Should I do something else ?

PS: I use glut as the framework.
(performance hit ?)


11-04-2000, 06:41 AM
Perhaps you (or GLUT for you) are setting up
the render state in a way which forces
software rendering. Look at all the
parameters and environment to see if there is
something which can force software (some
weird bit depth or texture format, some
esoteric render state, etc).

11-05-2000, 10:41 PM
Some months ago, while playing with my GeForce, I noticed that creating a display list with GL_COMPILE_AND_EXECUTE is a LOT LOT LOT LOT slower than creating it with GL_COMPILE and then calling it with glCallList.

It had nothing to do with rendering states or whatever... I posted it in this very forum (it was the old version back then) and some people confirmed they had the same behaviour !

I do not know if it was a bug in the drivers. I do not know whether the behaviour is still the same....

But just in case you are using GL_COMPILE_AND_EXECUTE, try to switch to GL_COMPILE....

If this solves the problem, I guess we need to send a message to nVidia...

Best regards.


P.S.: for those who wonder, I really had a factor of 10 increase in performance when doing what is described above!

11-06-2000, 04:14 AM
Same for me on TNT2 ULTRA.
I was first using GL_COMPILE_AND_EXECUTE display lists and it was damn slow, slower than without display lists (I looked through my code for bugs, etc., and found nothing ...) and then, as a last resort, I tried GL_COMPILE followed by glCallList ... MAGIC! Suddenly very fast.

11-06-2000, 05:40 AM
We're well aware of the consequences of COMPILE_AND_EXECUTE. Consider, though; it is NOT equivalent to first compiling, then executing. In particular, if you do a Get inside the display list compilation, in one case you'll get the old value and in the other you'll get the new value.

COMPILE_AND_EXECUTE is another of those features that OpenGL should have never included in the first place, right up there with feedback, FRONT_AND_BACK rendering, and edge flags.

- Matt
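
To make the glGet point above concrete, here is a toy model in C. It is not OpenGL and does not call any OpenGL function: ToyContext, toy_set_color, toy_get_color, toy_call_list and query_during_build are hypothetical names invented for this sketch. It only assumes what the post states, plus the fact that queries are executed immediately rather than stored in a list.

```c
#include <assert.h>

/* Toy model only: none of these names are OpenGL API. */
enum Mode { MODE_COMPILE, MODE_COMPILE_AND_EXECUTE };

typedef struct {
    int current_color;   /* stands in for one piece of GL state */
    int recorded[16];    /* state-setting commands stored in the "list" */
    int recorded_count;
} ToyContext;

/* A state-setting command issued while a list is being built: it is always
   recorded, but applied right away only in COMPILE_AND_EXECUTE mode. */
void toy_set_color(ToyContext *ctx, enum Mode mode, int color)
{
    assert(ctx->recorded_count < 16);
    ctx->recorded[ctx->recorded_count++] = color;
    if (mode == MODE_COMPILE_AND_EXECUTE)
        ctx->current_color = color;
}

/* A query: never recorded, always answered from the current state. */
int toy_get_color(const ToyContext *ctx)
{
    return ctx->current_color;
}

/* Replays the recorded commands, like calling the list later on. */
void toy_call_list(ToyContext *ctx)
{
    for (int i = 0; i < ctx->recorded_count; i++)
        ctx->current_color = ctx->recorded[i];
}

/* Returns what a query made in the middle of list construction sees. */
int query_during_build(enum Mode mode, int old_color, int new_color)
{
    ToyContext ctx = { old_color, {0}, 0 };
    toy_set_color(&ctx, mode, new_color);
    return toy_get_color(&ctx);
}
```

In this model, query_during_build(MODE_COMPILE, 1, 2) sees the old value and query_during_build(MODE_COMPILE_AND_EXECUTE, 1, 2) sees the new one, which is the difference that keeps a driver from treating the two modes as interchangeable.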

11-06-2000, 05:48 AM
Matt, believe me I know that !

The thing is, after having done the compilation once, you'd expect glCallList to be as fast in either case...

This was not the case at all when I ran my tests. Using GL_COMPILE_AND_EXECUTE when creating the list resulted in a very slow display list at glCallList level....

If there is a good reason why there should be a difference, I would really like to know it !



11-06-2000, 05:49 AM
I forgot to mention that I did not use glGet or any fancy stuff in my display list... It was pure glColor/glBegin/glVertex/glEnd calls...



11-06-2000, 09:53 AM
OK, so to be more precise :

- The list is 'GL_COMPILE'
- I'm sure (!) I am not in software render
mode (imagine 10,000 multi-textured polys
with alpha blend in software? even 1 fps?)
But why soooooo slow?
- no excessive state changes
- no glGet at all!
- no glFinish/glFlush
- no extensions used
- ALL triangles are sent at once, with a single glBegin()/glEnd() in the DL.
- It is slow even with just one light
and no textures at all, as there are
more than 10,000 vertices (to give a rough idea)

I can mail/post part of the code if you
can help...
(oh please....)

11-06-2000, 12:08 PM
Let me get this straight...

If you compile the display list using COMPILE_AND_EXECUTE, _future_ executions of that display list are significantly slower than that display list compiled as COMPILE?

How much slower? Is it slower than if you just issued those exact same commands in immediate mode? At minimum, display lists shouldn't be any slower.

I had always thought it was just the compiling process that was slow. If it's more than that, maybe there's a problem.

- Matt

11-06-2000, 10:16 PM

Yes you got it: when using GL_COMPILE_AND_EXECUTE, the compile time is awfully long *PLUS* the _FUTURE_ executions of the list ARE SLOW ! As far as I remember they were almost slower than issuing the direct commands (but that is highly subjective as I did not try to time them...).

As I told you, I haven't tried again since I switched to GL_COMPILE + glCallList. I am going to try again today... Do you want me to send you an application if I manage to reproduce the problem?

Amerio, can you e-mail me your code (if it is not part of a commercial app !) ? I would like to understand what the problem is...

I'll post the results of my tests here...



11-07-2000, 12:06 AM
OK, just performed the tests again and I have the same behaviour !

It only happens on one of my *HUGE* models...

When using GL_COMPILE, I have 8-9 FPS.
When using GL_COMPILE_AND_EXECUTE, I have 4-5 FPS.

The thing is, I cannot e-mail the model file (for one thing it is 16 MB, and moreover it is part of a project we did...). I am trying to find another model that big that shows the same behaviour...

Matt, or anyone at nVidia, have you got an FTP site I could upload such a model + the application + the source to ???? Although this program is nothing special, I'll ask you not to disclose any part of it.



11-07-2000, 07:13 AM
I already know what the issue is, but fixing it could very possibly be more trouble than it's worth. I don't know how to fix it, certainly.

- Matt

11-07-2000, 07:33 AM
Matt, that's not really a problem as long as people are aware of it! I have kept telling people to use GL_COMPILE only since I discovered this! Maybe some commercial apps would benefit from knowing/using it...

Can you explain what the problem is or does it touch a confidential part of the drivers ?



11-07-2000, 08:00 AM
Nope, I can't talk about anything that relates to the internals of our drivers.

- Matt

11-07-2000, 09:24 AM
I'm not aware of how NVidia drivers are built but ...
Why not patch the driver so that when GL_COMPILE_AND_EXECUTE is used, the driver simply performs a GL_COMPILE and a glCallList? In a way, this is compile then execute; the behavior is the same and the problems are solved.

11-07-2000, 10:56 AM
I'm having a similar problem with my own code! I'm using display lists with GL_COMPILE on a GeForce 256 DDR, and using a couple of loops and glBegin/glEnd to read in some vertices for GL_TRIANGLES, and my app is crawling along... I am feeding in a 361x361 array which is generating about 250'000 polys, but I've always been told about the massive power of the GeForce based cards, so what gives? Any suggestions would be welcome. I'm not using GLUT either; I'm working through Windows.



11-07-2000, 11:45 AM
I hope you are not suggesting that you are building the display list every frame.

11-07-2000, 12:00 PM
Try breaking your model up into smaller "chunks" and multiple display lists. I've found that on the GF, if you put a model with lots of verts in a single dlist, it slows down a LOT. I ended up splitting the model into 4-5 dlists and now everything is dandy. Go figure! I try to stay around 2000 verts / dlist.

Good luck!
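
A minimal sketch of the splitting step, assuming the model is a flat GL_TRIANGLES vertex array. Chunk and split_triangles are hypothetical names for illustration, not OpenGL API, and the 2000 vertex cap is just the rule of thumb from the post above. The helper only computes the ranges and rounds the cap down to a whole number of triangles so that no triangle is cut in half.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of OpenGL: one range of a
   GL_TRIANGLES vertex array that would go into its own display list. */
typedef struct {
    size_t first;  /* index of the first vertex in the chunk */
    size_t count;  /* number of vertices in the chunk (multiple of 3) */
} Chunk;

/* Split total_verts vertices into chunks of at most max_verts each.
   Writes up to max_chunks entries into out and returns the number of
   chunks the mesh needs (which may exceed max_chunks). */
size_t split_triangles(size_t total_verts, size_t max_verts,
                       Chunk *out, size_t max_chunks)
{
    assert(total_verts % 3 == 0);  /* whole triangles only */
    assert(max_verts >= 3);

    /* Round the cap down to a whole number of triangles. */
    size_t per_chunk = max_verts - (max_verts % 3);
    size_t n = 0;

    for (size_t first = 0; first < total_verts; first += per_chunk) {
        size_t count = total_verts - first;
        if (count > per_chunk)
            count = per_chunk;
        if (n < max_chunks) {
            out[n].first = first;
            out[n].count = count;
        }
        n++;
    }
    return n;
}
```

Each chunk would then be compiled once between glNewList(id, GL_COMPILE) and glEndList(), and drawn every frame with one glCallList(id) per chunk.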

11-07-2000, 12:45 PM
Originally posted by paddy:
I'm not aware of how NVidia drivers are built but ...
Why not patch the driver so that when GL_COMPILE_AND_EXECUTE is used, the driver simply performs a GL_COMPILE and a glCallList? In a way, this is compile then execute; the behavior is the same and the problems are solved.

No, this is not sufficient... the behavior is different between the two in certain cases when commands are executed immediately rather than entered into the display list.

- Matt

01-18-2002, 01:21 PM
I have the same problem.
The program just does:

glNewList(listnum, GL_COMPILE);
do a lot of triangle display
(about 50000 vertices)

I profiled it using CProfiler from codeguru
and found that on some machines with a GeForce card the list compilation (glEndList) takes very long.
I tried it on several computers with MXs, a GeForce 2 GTS, and a GeForce 256.

On two computers with an MX (one a PIII 500 with W98, the other a PIII 450 with W2000) it only takes 5-6 seconds.
On an Athlon 1100 (KT7, W2000) with an MX it takes 26 sec.
On a PIII 500 with NT4 and a GeForce2 GTS, 90 sec.
On a PIII 733 with W98 and a GeForce 256, 114 sec.

If I turn off the card's acceleration
(in the display settings, choosing either of the
two lowest performance settings), it takes less than 0.01 sec. (probably no compilation?)
It is also < 0.01 sec. on a Voodoo 3 card.

I wonder if NVidia knows about this


01-18-2002, 07:01 PM
do what fresh suggests above + break the list up into smaller lots

01-18-2002, 07:31 PM
I am feeding in a 361x361 array which is generating about 250'000 polys, but I've always been told about the massive power of the GeForce based cards, so what gives? Any suggestions would be welcome. I'm not using GLUT either; I'm working through Windows.

Well, you're using a GeForce 256. Even using NV_vertex_array_range (the fastest way to send triangles to the card, faster than display lists) and even if they're untextured, you're probably not going to break 8 million polygons per second.

And, of course, as the others said, smaller display lists are better.
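
As a rough check of those numbers, the sketch below works out the triangle count and the best-case frame rate. It assumes the 361x361 array is a regular grid with two triangles per cell, and it treats the 8 million polygons per second quoted above as an upper bound. grid_triangles and max_fps are hypothetical helper names.

```c
#include <assert.h>

/* Triangles in a regular grid of rows x cols vertices: each of the
   (rows - 1) * (cols - 1) cells is split into two triangles. */
long grid_triangles(long rows, long cols)
{
    assert(rows >= 2 && cols >= 2);
    return (rows - 1) * (cols - 1) * 2;
}

/* Best-case frames per second if the card sustains tris_per_sec
   and every triangle is drawn once per frame. */
double max_fps(long tris_per_frame, double tris_per_sec)
{
    assert(tris_per_frame > 0);
    return tris_per_sec / (double)tris_per_frame;
}
```

grid_triangles(361, 361) gives 259,200 triangles, in line with the "about 250'000 polys" in the quoted post, and max_fps(259200, 8e6) comes out just under 31 frames per second. So even at the card's peak rate, this mesh alone caps the frame rate at roughly 30 fps.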

01-19-2002, 12:11 AM
Hey, this old thread just got revived from hell :-)

So, well, my problem got solved a long time ago in another thread. The problem wasn't in the DL compilation, nor in the way I used it (of course, the DL was compiled only once during the whole life of the app, and then reused...)
The problem was in glPolygonMode.
I used to do:
glPolygonMode(GL_FRONT, GL_FILL);
glPolygonMode(GL_BACK , GL_LINE);
This appears to be *slow* (yes, backface culling was enabled, so no lines at all were actually drawn).
As soon as I switched to glPolygonMode(GL_FRONT_AND_BACK, GL_FILL); (still with culling), I got a performance boost.
This answer was given in another thread.
You may have the same problem as I had.

01-21-2002, 01:09 PM
Thanks Amerio,
but that was not the problem; I did not use glPolygonMode. Even when I tried to use it
as you suggested, with these 3 commands
glEnable( GL_CULL_FACE );
glCullFace( GL_BACK );
, it did not seem to speed up my case.
Can you point out which thread you referred to?
Also, I wanted to post my comment on a newer thread (GeForce3 display lists compilation slower ? ), but after registering I did not check carefully and ended up posting it here. I found out later, but I did not want to make a double post. No answer on that thread either.

For zed,
yes, I tried to break the model up manually and
found that if I only use a quarter of that size, the list compilation time drops from
28 sec. to 1 second. So I know that if I chop it into several smaller lists I will get an acceptable result on this computer. It will still take some experiments to guarantee that it will be fast enough on other GeForce cards.

But I still believe that this is a problem with nvidia, because as it was mentioned in another thread it only affects nvidia cards.

Also it is hard to explain why on a slower computer with the same card it only takes less than 6 seconds (as compared to 28 seconds),
or how do we explain that on some other cards (GeF 2 GTS and the 256) it takes around 100 seconds.

Does anybody know if there is any discussion / article / knowledge base entry on what is causing this problem (related to NVidia based cards only, as far as I know)?