Bad polys per second...

I am looking for something to compare my poly-per-second count to. The problem is that I’m getting only 2M textured, shaded polys a second on a GeForce2 Pro with 64 MB of RAM. The card is rated at 25M polys. Now, I’m not using display lists, but I am using tristrips. Is there a way to tell if the issue is bandwidth or poly count or what? I guess if there was a chart that said 25M card polys = 20M OpenGL polys and every vertex uses 100 bytes of bandwidth, I would be able to tell if I’m doing something wrong or how to optimize from here. Thanks….

John.

Do you use vertex arrays? With GF2 you’ll get maximum performance with VAR.

-Lev

No, just calls to glVertex, glColor, and glTexCoord.

Let’s pretend I was going to use arrays. I would have to use tris and not tristrips, right? Also, my levels are 1M+ polys; that’s nearly 200 MB of data to load. Is it faster to allow OpenGL to swap 100 or so large (2 MB) arrays, or to pump data across the AGP bus with calls to glVertex etc.? I already have the data in the game partitioned in a way that would allow me to do this with some work. But how long will it take to load the arrays? I experimented with arrays a while ago and I remember the load was a little slow. When OpenGL swaps from video to regular memory, will it swap out the least frequently used object (texture, array, etc.)? I’ll be honest, I don’t like not being in control of memory management. Let me know what you think. Thanks….

John.

You can still use tri-strips with vertex arrays.
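To put rough numbers on the strip question, here is a minimal sketch in C of the vertex counts involved. It assumes position-only vertices and the best case of one unbroken strip; the function names are made up for illustration and are not part of OpenGL. With vertex arrays, the strip form corresponds to passing GL_TRIANGLE_STRIP as the mode to glDrawArrays or glDrawElements.

```c
#include <assert.h>
#include <stddef.h>

/* Vertices needed to draw `tris` independent triangles (GL_TRIANGLES). */
size_t list_vertex_count(size_t tris)
{
    return tris * 3;
}

/* Vertices needed to draw `tris` triangles as ONE strip (GL_TRIANGLE_STRIP).
   The first triangle costs 3 vertices, every later one costs 1 more. */
size_t strip_vertex_count(size_t tris)
{
    return tris == 0 ? 0 : tris + 2;
}

/* Bytes of vertex data, assuming every vertex is `bytes_per_vertex` bytes
   (12 for a bare three-float position; an assumption, not a measurement). */
size_t vertex_bytes(size_t vertices, size_t bytes_per_vertex)
{
    assert(bytes_per_vertex > 0);
    return vertices * bytes_per_vertex;
}
```

For 100k triangles this gives 300,000 vertices (3.6 MB at 12 bytes each) as independent triangles, against 100,002 vertices (about 1.2 MB) as a single strip. Real meshes usually break into many shorter strips, so the actual saving would land somewhere in between.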

If you’re pumping so many polygons across using immediate mode commands, it’s not surprising you’re not hitting anywhere near top performance.

If you use standard vertex arrays, the driver has to copy vertex data over to AGP memory every frame. If you use VAR, you can manually control the copying of data over to AGP. Given that you say you have 200 meg of vertex data, your situation is absolutely perfect for using fences also.

You can use fences (the NV_fence extension, a companion to VAR) to know at which point it is safe to start uploading the next part of the vertex array to AGP memory while the GPU is still rendering…

You will have to switch to vertex arrays if you want better performance. Immediate mode commands for such huge volumes of data are slow.

Nutty

Nutty, do you mean NV_vertex_array_range when you say VAR? I was hoping to stay generic.

John.

Also, how do vertex arrays improve performance? It looks like I would still need to call glBegin and glEnd with glArrayElementEXT in between. If the array is still in system memory, how does this help? Thanks…

John.

With vertex arrays you avoid the per-vertex function call overhead: 10000 calls to glVertex, glNormal, … are replaced with 1 call to glDrawElements.

-Lev
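As a rough illustration of the call-overhead point, the sketch below counts API calls per frame for the two approaches. It assumes independent triangles and one call per attribute per vertex; the function names are hypothetical helpers, not OpenGL entry points.

```c
#include <assert.h>

/* API calls per frame in immediate mode, assuming independent triangles:
   one call per attribute per vertex (e.g. glColor, glTexCoord, glVertex
   gives attribs_per_vertex = 3), plus one glBegin and one glEnd. */
unsigned long immediate_mode_calls(unsigned long tris,
                                   unsigned long attribs_per_vertex)
{
    assert(attribs_per_vertex >= 1); /* at least glVertex itself */
    return tris * 3UL * attribs_per_vertex + 2UL;
}

/* Draw calls per frame when the same data sits in one vertex array:
   a single glDrawArrays or glDrawElements. The gl*Pointer setup calls are
   not counted here because they do not scale with the vertex count. */
unsigned long vertex_array_draw_calls(void)
{
    return 1UL;
}
```

With the glVertex, glColor, and glTexCoord calls mentioned earlier in the thread, 100k triangles come to 900,002 calls per frame in immediate mode, against a single draw call with a vertex array.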

> do you mean NV_vertex_array_range when you say VAR? I was hoping to stay generic.

The golden rule of graphics programming is: the more specialized you make it, the faster it runs.

If you want to stay generic, you won’t achieve the same performance.

Y.

OK, last night I tried a few things. I set up a test app that shut off all of the OpenGL states, set the color to white, and then built three rendering methods. One method did straight glVertex calls for 100k polys. One method executed 1000 display lists, each with glVertex calls for 100 polys (I did 1000 smaller lists because I had read a few posts on OpenGL.org about performance issues with GeForce and big lists). Finally, one method used a vertex array. I built the display lists and arrays on startup, so for the vertex arrays I had only one call in the render method (glDrawArrays). I also set up a simple loop that increments a frame counter; every second it picks up the value and resets the counter to 0.
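The frame counter described above could look something like this minimal sketch in C. The struct and function names are assumptions for illustration, and the caller is expected to pass in the current time in seconds from whatever clock the platform provides.

```c
#include <assert.h>

/* Frame counter state: frames seen in the current window, when the window
   started (in seconds), and the count from the last completed window. */
typedef struct {
    unsigned frames;
    double   window_start;
    unsigned last_fps;
} FpsCounter;

void fps_init(FpsCounter *c, double now)
{
    assert(c != 0);
    c->frames = 0;
    c->window_start = now;
    c->last_fps = 0;
}

/* Call once per rendered frame. Returns 1 when at least one second has
   elapsed and last_fps was refreshed, 0 otherwise. */
int fps_frame(FpsCounter *c, double now)
{
    assert(c != 0);
    c->frames++;
    if (now - c->window_start >= 1.0) {
        c->last_fps = c->frames;   /* frames counted in this window */
        c->frames = 0;             /* reset for the next window */
        c->window_start = now;
        return 1;
    }
    return 0;
}
```

One caveat with any counter of this kind: the driver may queue commands, so calling glFinish before reading the clock is one way to make sure the time covers the rendering itself and not just the submission.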

OK, the results. Test 1, straight calls to glVertex: 4 fps. Test 2, display lists: 4 fps. Test 3, vertex arrays: 4 fps. If this is a bandwidth issue, how is it occurring? 100k polys/frame = 300k verts/frame. Assuming only the data put into OpenGL is sent to the card, a glVertex3f costs sizeof(float) * 3 = 12 bytes, so that is 3.6 MB/frame, or 14.4 MB/second at 4 fps. AGP bandwidth is like 256 MB/second, and the card (and motherboard) are AGPx4. Let’s assume my card is running only plain AGP and use my poly results (1.2M verts/second): to fill the bandwidth, the per-vertex data would need to be about 213 bytes.
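The arithmetic in this post can be checked with a short sketch in C. The helper names are made up for illustration, and the 256 MB/second bus figure is the poster's estimate, not a measured value.

```c
#include <assert.h>

/* Bytes per second pushed to the card:
   vertices per frame * bytes per vertex * frames per second. */
double bytes_per_second(double verts_per_frame, double bytes_per_vertex,
                        double fps)
{
    return verts_per_frame * bytes_per_vertex * fps;
}

/* Per-vertex size that would be needed to saturate a bus of
   `bus_bytes_per_second` at the measured vertex rate. */
double bytes_per_vertex_to_fill(double bus_bytes_per_second,
                                double verts_per_frame, double fps)
{
    assert(verts_per_frame > 0.0 && fps > 0.0);
    return bus_bytes_per_second / (verts_per_frame * fps);
}
```

With 300,000 vertices per frame, 12 bytes per vertex, and 4 fps, the first function gives 14.4 MB/second, under 6% of the assumed 256 MB/second. The second gives a little over 213 bytes per vertex. Under these assumptions, raw bus bandwidth alone does not explain the 4 fps result.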

I’ll be the first to admit that I know nothing about how the hardware functions, but in order to take full advantage of OpenGL I realize I need to know more.

John.

Does anyone have any ideas on my last post? Does anyone have any demos that just push large numbers of polys out to the card? Maybe I’m not disabling some state that’s slowing everything down. A demo that shows a few different methods of pushing polys would be great; then I would have something to compare my results with.

Thanks…

John.

Try here.

Take a look at the NVIDIA VAR demo.

With your large number of vertices you prevent the use of any cache. Benchmark it with smaller models (40k triangles). And believe us: use VAR if you want more speed, you won’t reach the speed of VAR with anything else. (though DLs with 3.5k triangles seem to run almost as fast as VAR on my PC)

-Lev

Thanks, I’ll try that out tonight.

In anyone’s professional opinion, what is a really good poly-per-second count using only generic OpenGL calls with all states disabled (by the way, I have glPolygonMode set to GL_FILL on front and back) on, say, an 800 MHz P3 with a GeForce2 Pro AGPx4 (card and motherboard)?

Thanks…

John.

The VAR demo linked above reaches 4.7 Mtris/s on my GF2 MX 400 with a Duron 800 (it uses fences even without VAR). My older app was reaching 17 Mtris/s with VAR and 6 without on the same machine (with completely static geometry that fit in AGP memory).

BTW, why are you using GL_FILL on backface polys?

-Lev

[This message has been edited by Lev (edited 01-24-2002).]

My last card was a Voodoo3 2000. When you set the polygon mode to GL_NONE it defaulted to GL_LINE, and even though the back faces were culled the app would run really slowly. When I tried setting front and back to fill, the frame rate skyrocketed.

Lev, when you say you got 6M tris, are those tris or tri-strips? Also, when you say fit into AGP mem, do you mean video mem, like in a display list?

Thanks…

John.

I mean 6M tris. I wasn’t using tristrips. By AGP memory I mean AGP memory, not video memory. BTW, why do you assume DLs are stored in video memory? They’re not with NVIDIA drivers, AFAIK. DLs take up a lot of memory; they wouldn’t fit into video memory.

-Lev

I don’t understand the AGP memory / video memory distinction. I thought that video memory is on the video card; where is AGP memory located?

Actually if anyone has a link to a good AGP doc that explains all of this that would be great. Also an OpenGL doc explaining bandwidth considerations would be excellent too.

Thanks…

John.