Quirks with display lists on GeForce 2?

OK, here’s the setup: a 500 MHz P3 with a GeForce 2 GTS board and the latest drivers (6.31?) installed.

The code renders 3300 strips, resulting in a total of 13k polygons drawn.

Pseudocode like this:
glNewList(listID, GL_COMPILE);
for (s32 nStrip = 0; nStrip < nStrips; nStrip++)
{
    // putting the glColor3f call here (outside glBegin/glEnd) = 200fps
    glBegin(GL_TRIANGLE_STRIP);
    glColor3f(r, g, b); // putting it here (inside glBegin/glEnd) = 600fps
    for (s32 v = 0; v < pStripLengths[nStrip]; v++)
    {
        glVertex3f(x, y, z); // x, y, z stand for this vertex's position
    }
    glEnd();
}
glEndList();

Now, look at that code: it generates a display list and it draws properly…but look where I put the glColor3f() call. With the call there, inside the glBegin/glEnd pair, it runs at nearly 600fps. If I move the glColor3f call outside the glBegin/glEnd pair, it slows down to 200fps!

OUCH!

Anybody else think that’s weird? Am I just on crack here and overlooking something obvious? Figured I’d pass the word along, since a 3x speed increase is a big deal to most people…and some people may be doing this without knowing it!

Jon

This is something you’ll probably see on other drivers too. If we see something change inside a Begin/End, we will flag it as something that we have to track per-vertex rather than leaving it constant.

Therefore, you should always move these kinds of “initialization” per-vertex state commands outside of Begin/End.

  • Matt

Wait a sec, I reread the message. It looks like your results are backwards from what I’d expect. Are you sure there’s not some confusion?

  • Matt

Okay, I have a theory on why this might be happening.

When the Color is inside Begin/End, we probably are coalescing all of the triangle strips together into one large mesh…

Is there a lot of vertex reuse in this mesh?
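One way to answer the reuse question offline is to merge the strips into a single indexed triangle list and compare the number of unique vertices with the number submitted. The following is only a minimal sketch of that idea, not what the driver actually does. `Vec3`, `find_or_add` and `merge_strips` are hypothetical names, vertices are matched on exact position only, and the whole mesh is assumed to share one color (per-vertex colors would prevent sharing positions that differ in color).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Returns the index of v in unique[0 .. *n_unique), appending it if absent.
   Linear search with exact float comparison: good enough for a sketch,
   a hash table would be the usual choice for a large mesh. */
static unsigned find_or_add(Vec3 *unique, size_t *n_unique, Vec3 v)
{
    size_t i;
    for (i = 0; i < *n_unique; i++)
        if (unique[i].x == v.x && unique[i].y == v.y && unique[i].z == v.z)
            return (unsigned)i;
    unique[*n_unique] = v;
    *n_unique += 1;
    return (unsigned)i;
}

/* Merges n_strips triangle strips into one indexed triangle list.
   verts holds all strip vertices back to back, strip_len[s] is the
   vertex count of strip s. unique must have room for every submitted
   vertex and indices for 3 entries per submitted vertex (an upper bound).
   Returns the triangle count; *n_unique receives the unique vertex count.
   Degenerate triangles are kept as they are. */
size_t merge_strips(const Vec3 *verts, const size_t *strip_len, size_t n_strips,
                    Vec3 *unique, size_t *n_unique, unsigned *indices)
{
    size_t base = 0, n_tris = 0, s, i;

    assert(verts && strip_len && unique && n_unique && indices);
    *n_unique = 0;
    for (s = 0; s < n_strips; s++) {
        for (i = 2; i < strip_len[s]; i++) {
            unsigned a = find_or_add(unique, n_unique, verts[base + i - 2]);
            unsigned b = find_or_add(unique, n_unique, verts[base + i - 1]);
            unsigned c = find_or_add(unique, n_unique, verts[base + i]);
            /* Every second triangle of a strip has flipped winding,
               so swap two corners to keep the orientation consistent. */
            if (i & 1) { unsigned t = a; a = b; b = t; }
            indices[3 * n_tris + 0] = a;
            indices[3 * n_tris + 1] = b;
            indices[3 * n_tris + 2] = c;
            n_tris++;
        }
        base += strip_len[s];
    }
    return n_tris;
}
```

Two adjacent 4-vertex strips that share an edge submit 8 vertices but merge down to 6 unique ones; the gap between those two numbers is the reuse being asked about.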

  • Matt

No confusion…it’s very weird that it’s so slow. The odd thing is, if I take out the glColor3f altogether it’s like 1200fps…why the heck does calling glColor3f 350 times in a 13k poly mesh hurt so much? OUCH.

I don’t know the actual verts per tri that are being used, but I’d guess it’s around 1.4. I’m using Stripe and the mesh seems to strip fairly well.
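The verts-per-tri figure does not have to be guessed, since a strip of n vertices (n >= 3) produces n - 2 triangles. A small sketch of the calculation follows; `verts_per_tri` is a hypothetical helper name and it only counts submitted vertices, not unique ones.

```c
#include <assert.h>
#include <stddef.h>

/* Submitted vertices per triangle for a set of triangle strips.
   strip_len[s] is the vertex count of strip s; a strip with fewer
   than 3 vertices contributes vertices but no triangles.
   Returns 0.0 if the strips contain no triangles at all. */
double verts_per_tri(const size_t *strip_len, size_t n_strips)
{
    size_t verts = 0, tris = 0, s;

    assert(strip_len != NULL || n_strips == 0);
    for (s = 0; s < n_strips; s++) {
        verts += strip_len[s];
        if (strip_len[s] >= 3)
            tris += strip_len[s] - 2;
    }
    return tris ? (double)verts / (double)tris : 0.0;
}
```

When every strip has at least 3 vertices this reduces to 1 + 2 * strips / triangles, so the ratio approaches 1.0 as strips get longer. Taking the first post's round figures of 3300 strips and about 13k triangles at face value, it comes out at roughly 1.5.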

I’m sure this has been asked before, but what’s the fastest way to render a 13k polygon static mesh? About a year ago I tried the vertex array range and weird extensions to alloc memory on the card on a GeForce 1. Do I still need to manually do that, or does creating a display list take care of that for me?

If I can get this rendering speed sorted out I’ll have a killer hardware-based PVS generator.

Jon

I could debug it, but that would involve a fair amount of work.

VAR gives you the most control, always. When you use display lists, we’ll try to make it fast, but display lists are too generic for us to know exactly how to optimize every single case. This is one such case – even in such a simple program, given the API stream, it would be hard for us to tell whether we should try to merge the tstrips together into a single indexed mesh, or whether the color per vertex would cause problems for that, and so on. It’s an optimization problem that can get very complicated very quickly.
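For a static mesh made of strips, the data layout VAR wants is one contiguous vertex array plus a (first, count) pair per strip. Here is a rough sketch of building those pairs, assuming the strip vertices are already packed back to back. `StripRange` and `build_strip_ranges` are hypothetical names, and the GL calls appear only in a comment because they need a live context.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int first;  /* index of the strip's first vertex in the packed array */
    int count;  /* number of vertices in the strip */
} StripRange;

/* Computes where each strip lives inside one packed vertex array.
   strip_len[s] is the vertex count of strip s. Returns the total
   number of vertices the packed array has to hold. */
size_t build_strip_ranges(const size_t *strip_len, size_t n_strips,
                          StripRange *ranges)
{
    size_t total = 0, s;

    assert(strip_len != NULL && ranges != NULL);
    for (s = 0; s < n_strips; s++) {
        ranges[s].first = (int)total;
        ranges[s].count = (int)strip_len[s];
        total += strip_len[s];
    }
    return total;
}

/* Drawing from the packed array (not compiled here, needs a GL context):
 *
 *     glEnableClientState(GL_VERTEX_ARRAY);
 *     glVertexPointer(3, GL_FLOAT, 0, packed);
 *     for (s = 0; s < n_strips; s++)
 *         glDrawArrays(GL_TRIANGLE_STRIP, ranges[s].first, ranges[s].count);
 *
 * With NV_vertex_array_range, packed would point into memory obtained
 * from wglAllocateMemoryNV and registered with glVertexArrayRangeNV,
 * with GL_VERTEX_ARRAY_RANGE_NV enabled as a client state.
 */
```

Because the layout is decided by the application rather than inferred from the command stream, the driver does not have to guess whether the strips should be merged.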

  • Matt

Cool…I guess I’ll recode my static mesh class using the VAR extension. I was disheartened to find that when I went from a TNT2 to a GeForce 2, my display-list-based static mesh class only went from 108 to 150fps.

Jon