Bad polys per second...



ToolChest
01-22-2002, 11:27 AM
I am looking for something to compare my poly per second count to. The problem is that I'm getting only 2M textured, shaded polys a second on a GeForce2 Pro with 64 MB of RAM. The card is rated at 25M polys. Now, I'm not using display lists, but I am using tristrips. Is there a way to tell if the issue is bandwidth or poly count or what? I guess if there was a chart that said 25M card polys = 20M OpenGL polys and every vertex uses 100 bytes of bandwidth, I would be able to tell if I'm doing something wrong or how to optimize from here. Thanks...

John.

Lev
01-22-2002, 11:41 AM
Do you use vertex arrays? On a GF2 you'll get maximum performance with VAR.

-Lev

ToolChest
01-22-2002, 11:46 AM
No, just calls to glVertex, glColor, and glTexCoord.

ToolChest
01-22-2002, 12:28 PM
Let's pretend I was going to use arrays: I would have to use tris and not tristrips, right? Also, my levels are 1M+ polys, which is nearly 200 MB of data to load. Is it faster to let OpenGL swap 100 or so large (2 MB) arrays, or to pump data across the AGP bus with calls to glVertex etc.? I already have the game data partitioned in a way that would allow me to do this with some work. But how long will it take to load the arrays? I experimented with arrays a while ago and I remember the load was a little slow. When OpenGL swaps from video to regular memory, will it swap out the least frequently used object (texture, array, etc.)? I'll be honest, I don't like not being in control of memory management. Let me know what you think. Thanks...

John.

Nutty
01-22-2002, 01:12 PM
You can still use tri-strips with vertex arrays.

If you're pumping so many polygons across using immediate mode commands, it's not surprising you're not hitting anywhere near top performance.

If you use standard vertex arrays, the driver has to copy vertex data over to AGP memory every frame. If you use VAR, you can manually control the copying of data over to AGP. Given that you say you have 200 meg of vertex data, your situation is absolutely perfect for using fences also.

You can use fences (NV_fence, a companion extension to VAR) to know when it is safe to start uploading the next part of the vertex array to AGP memory while the GPU is still rendering.

You will have to switch to vertex arrays if you want better performance. Immediate mode commands for such huge volumes of data are slow.

Nutty

ToolChest
01-22-2002, 01:28 PM
Nutty, do you mean NV_vertex_array_range when you say VAR? I was hoping to stay generic.

John.

ToolChest
01-22-2002, 02:02 PM
Also, how do vertex arrays improve performance? It looks like I would still need to call glBegin and glEnd with glArrayElementEXT in between. If the array is still in system memory, how does this help? Thanks...

John.

Lev
01-22-2002, 02:16 PM
With vertex arrays you avoid the per-vertex function call overhead: 10000 calls to glVertex, glNormal, ... are replaced with 1 call to glDrawElements.
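
To put rough numbers on the call overhead, here is a small sketch in plain C. It only counts API calls; the GL calls themselves are shown in a comment because they need a live context. The `Vertex` layout and both helper names are made up for illustration, and the count of 5 assumes a position plus one texcoord array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved layout: one position and one texcoord per vertex. */
typedef struct {
    float x, y, z;
    float u, v;
} Vertex;

/* Immediate mode: one GL call per attribute per vertex, plus glBegin/glEnd. */
size_t immediate_mode_calls(size_t vertex_count, size_t attribs_per_vertex)
{
    assert(attribs_per_vertex >= 1);
    return vertex_count * attribs_per_vertex + 2;
}

/* Vertex arrays: a fixed number of calls per draw, whatever the mesh size.
   Counted here: 2x glEnableClientState, glVertexPointer,
   glTexCoordPointer and one glDrawElements. */
size_t vertex_array_calls(void)
{
    return 5;
}

/* With the data packed as Vertex verts[] plus an index list, the draw
   would look roughly like this (tri-strips still work):

   glEnableClientState(GL_VERTEX_ARRAY);
   glEnableClientState(GL_TEXTURE_COORD_ARRAY);
   glVertexPointer(3, GL_FLOAT, sizeof(Vertex), &verts[0].x);
   glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), &verts[0].u);
   glDrawElements(GL_TRIANGLE_STRIP, index_count, GL_UNSIGNED_INT, indices);
*/
```

For 10000 vertices with two attributes each, that is 20002 calls in immediate mode against 5 with arrays. Whether that overhead is the actual bottleneck depends on the rest of the pipeline.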

-Lev

Ysaneya
01-23-2002, 12:07 AM
> do you mean NV_vertex_array_range when you say VAR? I was hoping to stay generic.

The golden rule of graphics programming is: the more specialized you make it, the faster it runs.

If you want to stay generic, you won't achieve the same performance.

Y.

ToolChest
01-23-2002, 09:45 AM
OK, last night I tried a few things. I set up a test app that shut off all of the OpenGL states, set the color to white, and then built 3 rendering methods. One method did straight glVertex calls for 100k polys. One method executed 1000 display lists, each with glVertex calls for 100 polys (I did 1000 smaller lists because I had read a few posts on OpenGL.org about performance issues with GeForce and big lists). Finally, one method used a vertex array. I built the display lists and arrays on startup, so for the vertex arrays I had only one call in the render method (glDrawArrays). I also set up a simple loop method that increments a frame counter, then every second picks up the value and resets the counter to 0.
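
A frame counter like the one described could be sketched as below. This is a guess at the idea, not the actual test app: `FpsCounter`, `fps_init` and `fps_tick` are hypothetical names, and the caller is assumed to supply the current time in seconds from whatever timer is available.

```c
#include <assert.h>

/* Hypothetical frames-per-second counter; all names are illustrative. */
typedef struct {
    double window_start;  /* time in seconds when the current window began */
    unsigned frames;      /* frames counted in the current window */
    unsigned last_fps;    /* frames counted in the last completed window */
} FpsCounter;

void fps_init(FpsCounter *c, double now)
{
    c->window_start = now;
    c->frames = 0;
    c->last_fps = 0;
}

/* Call once per rendered frame with the current time in seconds.
   Returns 1 when a new reading was latched into last_fps, else 0. */
int fps_tick(FpsCounter *c, double now)
{
    assert(now >= c->window_start);
    c->frames++;
    if (now - c->window_start >= 1.0) {
        c->last_fps = c->frames;   /* pick up the value... */
        c->frames = 0;             /* ...and reset the counter */
        c->window_start = now;
        return 1;
    }
    return 0;
}
```

At very low frame rates the window can run noticeably longer than one second, so the reading is frames per window rather than an exact rate.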

OK, the results. Test 1, straight calls to glVertex: 4 fps. Test 2, display lists: 4 fps. Test 3, vertex arrays: 4 fps. If this is a bandwidth issue, how is it occurring? 100k polys/frame = 300k verts/frame. Assuming only the data put into OpenGL is sent to the card (so glVertex3f = sizeof(float) * 3 = 12 bytes), that is 3.6 MB/frame = 14.4 MB/second at 4 fps. AGP bandwidth is something like 256 MB/second, and the card and motherboard are both AGPx4. Let's assume my card is running only plain AGP (1x) and use my poly results (400k polys, so 1.2M verts, a second): to fill the bandwidth, the per-vertex data would need to be 213 bytes.
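
The arithmetic in this post can be checked with a couple of helper functions. This is only a sketch: the function names are invented, and it assumes (as the post does) that nothing but the raw attribute data crosses the bus.

```c
#include <assert.h>

/* Bytes per second sent to the card, assuming only the raw attribute
   data crosses the bus (no driver or protocol overhead). */
double bytes_per_second(double polys_per_frame, double verts_per_poly,
                        double bytes_per_vert, double fps)
{
    assert(fps >= 0.0);
    return polys_per_frame * verts_per_poly * bytes_per_vert * fps;
}

/* Per-vertex size that would be needed to saturate a bus
   at the observed vertex rate. */
double bytes_per_vert_to_fill(double bus_bytes_per_sec, double verts_per_sec)
{
    assert(verts_per_sec > 0.0);
    return bus_bytes_per_sec / verts_per_sec;
}
```

`bytes_per_second(100000, 3, 12, 4)` gives 14,400,000, the 14.4 MB/second quoted above, and `bytes_per_vert_to_fill(256e6, 1.2e6)` comes out at roughly 213 bytes. A result that far below the bus limit suggests the bottleneck lies somewhere other than raw AGP bandwidth.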

I'll be the first to admit that I know nothing about how the hardware functions, but in order to take full advantage of OpenGL I realize I need to know more.

John.

ToolChest
01-24-2002, 06:25 AM
Does anyone have any ideas on my last post? Does anyone have any demos that just push large numbers of polys out to the card? Maybe I'm not disabling some state that's slowing everything down. A demo that shows a few different methods of pushing polys would be great; then I would have something to compare my results with.

Thanks...

John.

harsman
01-24-2002, 06:49 AM
Try here (http://developer.nvidia.com/view.asp?IO=vardemo).

Lev
01-24-2002, 06:55 AM
Take a look at the NVIDIA VAR demo.

With your large number of vertices you prevent the use of any cache. Benchmark it with smaller models (40k triangles). And believe us: use VAR if you want more speed, you won't reach the speed of VAR with anything else. (though DLs with 3.5k triangles seem to run almost as fast as VAR on my PC)

-Lev

ToolChest
01-24-2002, 07:25 AM
Thanks, I'll try that out tonight.

In anyone's professional opinion, what is a really good poly per second count using only generic OpenGL calls with all states disabled (by the way, I have glPolygonMode set to GL_FILL on front and back) on, say, an 800 MHz P3 with a GeForce2 Pro at AGPx4 (card and motherboard)?

Thanks...

John.

Lev
01-24-2002, 07:42 AM
The VAR demo linked above reaches 4.7 Mtris/s on my GF2 MX 400 with a Duron 800 (it uses fences even without VAR). My older app was reaching 17 Mtris/s with VAR and 6 without on the same machine (with completely static geometry which fitted in AGP memory).

BTW, why are you using GL_FILL on backface polys?

-Lev



[This message has been edited by Lev (edited 01-24-2002).]

ToolChest
01-24-2002, 08:18 AM
My last card was a Voodoo3 2000. When you set the polygon mode to GL_NONE it defaulted to GL_LINE, and even though the back faces were culled the app would run really slowly. When I tried setting front and back to fill, the frame rate skyrocketed.

ToolChest
01-24-2002, 08:56 AM
Lev, when you say you got 6M tris, are those tris or tri-strips? Also, when you say fit into AGP mem, do you mean video mem, like in a display list?

Thanks...

John.

Lev
01-24-2002, 09:09 AM
I mean 6M tris. I wasn't using tristrips. By AGP memory I mean AGP memory, not video memory. BTW, why do you assume DLs are stored in video memory? AFAIK they're not with NVIDIA drivers. DLs take up *much* memory; they wouldn't fit into video memory.

-Lev

ToolChest
01-24-2002, 09:36 AM
I don't understand the AGP memory / video memory distinction. I thought that video memory is on the video card; where is AGP mem located?

ToolChest
01-24-2002, 09:38 AM
Actually if anyone has a link to a good AGP doc that explains all of this that would be great. Also an OpenGL doc explaining bandwidth considerations would be excellent too.

Thanks...

John.

Lev
01-24-2002, 09:57 AM
Video memory is the memory residing on the video card. Writing to it and reading from it from the CPU is very slow. AGP memory is part of normal system memory, but it is uncached and can be accessed directly by the gfx card via direct memory access (aka DMA). Reading from AGP memory is slow for the CPU because it is uncached, but if you write sequentially it's not that slow.

This all applies if you have an AGP gfx card; I don't know how things work for an integrated graphics chip a la nForce.

With generic vertex arrays the CPU must copy the arrays from system memory to the gfx card. This is slow and depends on CPU speed. With the vertex array range extension the CPU is freed: the GPU pulls the data via DMA while the CPU can do other work. That's basically what VAR does.

Hope this helps,
-Lev

ToolChest
01-24-2002, 10:14 AM
Thanks, that does make a lot of sense. I also just found the Intel AGP spec online. I had no idea that all of my data (display lists, textures, etc.) could be kept in system memory. I always thought that was a last resort thing...

Anyway, I would be happy to be getting your generic OpenGL 6M polys a second; I'm getting like 400k. I'm going to check out that NVIDIA demo, update my drivers (poke and hope), and play around a little more with the test app I made.

I really appreciate the help from everyone who's posted. I'll let you know how it goes tomorrow.

Thanks guys...

John.

OldMan
01-24-2002, 02:30 PM
Just out of curiosity: now we know that VAR is the fastest way on NVIDIA cards, but what about ATI cards? What can I do to improve the speed specifically on those cards (Radeon, Radeon 7500, 8500)? I have both (a 7500 and a GF2 Pro) and I would like to know how I can get the maximum from each one. I read everywhere about optimizations for NVIDIA cards, but nothing for ATI ones.

harsman
01-25-2002, 01:00 AM
Try the ATI_vertex_array_object extension here (http://oss.sgi.com/projects/ogl-sample/registry/ATI/vertex_array_object.txt). I think there are more readable explanations of it on the ATI devrel site as well.

ToolChest
01-25-2002, 06:05 AM
Good morning,

Lev, I forgot to take the NVIDIA demo home last night, but I will look at it this weekend. Also, I found the problems.

First the problem I created:
When I tested the poly count I was too lazy to build a method that would build a square mesh. Instead, the method I whipped up gave all of the polys the same coords, and a rather large poly I might add. It didn't occur to me until I was driving home last night that OpenGL was overdrawing an 800x600 window something like 100 times a frame. This could be a performance hit! So I changed the method to create a square mesh, scaled the mesh to fit in the window, and wham! 17 frames a second with 100k polys. I feel confident that the issue is bandwidth now, because when I increased or decreased the number of polys the poly per second count stayed solid. So in bandwidth terms (correct me if I'm wrong) I was getting 5.1M verts a second (100k polys x 3 verts per poly x 17 frames), period. It was up to me to decide how many would go into each frame.
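
For anyone wanting to reproduce the fixed test, a square mesh that tiles the window (so no pixel is drawn more than once) might be generated like this. It is a sketch, not the original method: `build_grid` is a made-up name, the mesh is 2D, and it emits independent triangles rather than strips.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `out` with independent triangles (x,y pairs, 3 verts each) that
   tile a cells x cells grid over [0,width] x [0,height].
   `out` must hold cells * cells * 12 floats. Returns the triangle count. */
size_t build_grid(float *out, size_t cells, float width, float height)
{
    assert(out != NULL && cells > 0);
    const float dx = width / (float)cells;
    const float dy = height / (float)cells;
    size_t n = 0;
    for (size_t j = 0; j < cells; ++j) {
        for (size_t i = 0; i < cells; ++i) {
            const float x0 = (float)i * dx, y0 = (float)j * dy;
            const float x1 = x0 + dx, y1 = y0 + dy;
            /* Two counter-clockwise triangles per grid cell. */
            const float tri[12] = { x0, y0,  x1, y0,  x1, y1,
                                    x0, y0,  x1, y1,  x0, y1 };
            for (size_t k = 0; k < 12; ++k)
                out[n++] = tri[k];
        }
    }
    return n / 6;  /* 6 floats per triangle */
}
```

A 224 x 224 grid gives 100,352 triangles, close to the 100k used in the test. At 17 fps and 3 verts per poly that works out to the 5.1M verts a second mentioned above.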

Second the problem Iím trying to fix:
Why only 1.7M polys? Lev gets 6M. I did a little tinkering, and after a few frustrating hours I decided to disable AGPx4 in the BIOS to see if I'd get a performance hit. When I got into the BIOS I found AGPx4 disabled already. I turned it on and... nothing. The same frame rate. My card supports it and my motherboard supports it, hmm... I turned on fast writes (how this would help moving data to the card I don't know, but why not at this point): nothing. To make a long story short, I installed NVIDIA drivers yesterday, but I'm going to have to research this motherboard thing, because in theory 1.7 x 4 = 6.8, and that's the number I'm looking for.

I know it sounds like I'm wrapping up this post, but I do have one more question. I now understand the video memory, AGP memory, and system memory architecture. My question is about VARs: when the array is larger than the video card can store, does the card send the whole array to AGP memory, or does it swap out the pieces that won't fit? And: you can't have more than one VAR, can you?

Again, I would like to thank everyone for their help...

John.

Lev
01-25-2002, 08:12 AM
With VAR you manage your arrays and the memory yourself, so you decide if an array goes to video mem or AGP mem, and you are responsible for reusing memory if all of your arrays don't fit into memory.
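
One common way to handle the "reusing memory" part is to split the VAR region into chunks and cycle through them, waiting on a fence before overwriting a chunk the GPU may still be reading. The sketch below only simulates that bookkeeping in plain C: `ChunkRing`, `CHUNK_COUNT` and the function names are hypothetical, and the NV_fence calls appear only in comments since they need a live context.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_COUNT 4  /* arbitrary choice for illustration */

/* Hypothetical bookkeeping for a VAR region split into equal chunks. */
typedef struct {
    unsigned char *base;         /* start of the VAR memory region */
    size_t chunk_size;           /* bytes per chunk */
    int in_flight[CHUNK_COUNT];  /* 1 while the GPU may still read the chunk */
    size_t next;                 /* next chunk to hand out */
    unsigned waits;              /* how often we had to wait on a fence */
} ChunkRing;

void ring_init(ChunkRing *r, unsigned char *base, size_t chunk_size)
{
    assert(base != NULL && chunk_size > 0);
    r->base = base;
    r->chunk_size = chunk_size;
    for (size_t i = 0; i < CHUNK_COUNT; ++i)
        r->in_flight[i] = 0;
    r->next = 0;
    r->waits = 0;
}

/* Returns the next chunk to fill with vertex data. If the GPU may still
   be reading it, real code would block here with
   glFinishFenceNV(fences[i]) before writing. */
unsigned char *ring_acquire(ChunkRing *r, size_t *index_out)
{
    const size_t i = r->next;
    if (r->in_flight[i]) {
        r->waits++;           /* stands in for glFinishFenceNV(fences[i]) */
        r->in_flight[i] = 0;
    }
    r->next = (i + 1) % CHUNK_COUNT;
    *index_out = i;
    return r->base + i * r->chunk_size;
}

/* Call right after issuing the draw that reads chunk i. Real code would
   call glSetFenceNV(fences[i], GL_ALL_COMPLETED_NV) here. */
void ring_submit(ChunkRing *r, size_t i)
{
    assert(i < CHUNK_COUNT);
    r->in_flight[i] = 1;
}
```

With more chunks the CPU can run further ahead of the GPU before it has to wait, at the cost of a larger region.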

-Lev

ToolChest
01-28-2002, 05:58 AM
All right, this week is looking better already! I fixed the AGPx4 problem; it was related to my motherboard's VIA chipset and the GeForce2. I posted the fix in Coding Advanced under the title 'VIA chipset may not be running AGPx4...'. Thanks for the help.

John.