win2K and slow opengl?!?

Hi,

I just set up my PC for dual boot 98 and Win2K. I have been working on an OpenGL app in 98 for a while, and when I run the same app in Win2K the frame rate sinks a lot. Being dual boot, the hardware is the same; I set the video resolutions (and depth) the same for both OSes, installed the latest nvidia drivers on both (I'm using a GF2 Pro), and checked to make sure I was getting the same pixel format on both. I ran some tests and here are the results:

Win 98 (Full pipeline 115K poly): 14 f/s
Win 2K (Full pipeline 115K poly): 8 f/s

Then I tried removing my call to glDrawArrays to see if the code leading up to the rendering was slow. Here are the results:

Win 98 (No render 115K poly): 149 f/s
Win 2K (No render 115K poly): 166 f/s
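
(A quick sketch of the kind of toggle described above, just to make the test concrete; the g_skipRender flag is an assumption, not John's actual code.)

#include <GL/gl.h>

bool g_skipRender = false;   // set to true for the "no render" timing pass

void RenderObject(int firstVert, int vertCount)
{
    // ... exactly the same per-frame setup work as the normal path ...
    if (!g_skipRender)
        glDrawArrays(GL_TRIANGLES, firstVert, vertCount);
}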

That's strange… I'm using the same compiled exe and the same level files, so I wouldn't think this could be a caching issue (both would see it, right?).

Has anyone seen anything like this before? Any ideas?

Thanks…

John.

How are your AGP drivers on Win2K?

We usually find the same or better performance on Win2K than on Win9x. Of course, that could be due to overlap with other parts of our program.

Well, I've had the same problem here too. I personally consider Win2K the more stable but slower system… Win98 and its hybrids are exactly the other way round: relatively fast, but crash-prone.

It could be many things… mainly, as far as I know, memory management is a bit slower on Win2K… better, but slower. The drivers could certainly also be one of the reasons… but so far everybody who runs both systems has told me that everything is a bit slower on Win2K… I don't think OpenGL or your program is responsible for this.

BlackJack

BlackJack,

I think you're correct; I ran a few nvidia demos on both OSes and 98 was always faster. Oh well…

I hate to sound paranoid, but is this some kind of Microsoft conspiracy?

Thanks…

John.

Hehe, well, I don't think Microsoft deliberately made Win2K slower. But its kernel is simply so different that it can easily account for that 10% difference. After all, Win NT / Win2K was never intended for playing games, but for workstation and server use. I think you know what I'm trying to say.

You should port it to Linux and check your FPS there. It will surely take you a week to get everything set up, but your FPS should be quite a bit higher than on Win98.

The OS makes the difference; you really shouldn't worry about Win2K… people are used to it being a little slower on that OS, hehe. But if it's also far slower on WinXP… you should begin to worry, as that OS… unfortunately… will surely become more and more the standard.

BlackJack

john_at_kbs_is, both Win98 and Win2K look very slow. Are you doing anything fancy? Lots of spotlights, etc?

I mean

Win 98 (Full pipeline 115K poly): 14 f/s ~= 1.6 million polys per second.
Win 2K (Full pipeline 115K poly): 8 f/s ~= 900,000 polys per second.

I routinely get close to 20 million polys per second [single textured, 1 directional light, not quite so many polys per frame] with a Geforce2 GTS and a PIII 750. I feel like I'm banging on the same drum again, but are you sure that you're getting a hardware-accelerated pixel format? Please see my posts on http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/006961.html (near the bottom)
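
(For anyone following along, here is a minimal sketch of the kind of check Kevin is describing: ask Windows which pixel format is actually set on the DC and look at the generic flags. The hdc is assumed to come from your normal window setup.)

#include <windows.h>
#include <GL/gl.h>

bool IsHardwareAccelerated(HDC hdc)
{
    PIXELFORMATDESCRIPTOR pfd;
    int format = GetPixelFormat(hdc);                    // format currently set on this DC
    DescribePixelFormat(hdc, format, sizeof(pfd), &pfd);

    bool generic = (pfd.dwFlags & PFD_GENERIC_FORMAT) != 0;
    bool mcd     = (pfd.dwFlags & PFD_GENERIC_ACCELERATED) != 0;

    // Neither flag: full ICD (what you want on a GeForce).
    // Both flags: MCD driver. Generic only: Microsoft software renderer.
    return !generic || mcd;
}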

Both your problems may in fact be the same. Follow the given advice and see what you find out. I would hope that you would achieve rendering that is an order of magnitude greater than what you have been getting thus far (unless you are doing lots of fancy stuff).

Regards,
Kevin

Hi everyone,

I wanted to add a question of mine about a recurring problem I have which may be similar.

Whenever my system (Win2K + GeForce2 Pro) is too heavily loaded with 3D, any GL applications I launch crawl!

For example, if I have 3dsMAX running with a big scene in it and I launch my GL app, it crawls as if doing software rendering. If I close MAX and relaunch my app, everything is fine! I tried testing the PFD to check whether it is accelerated, and it keeps telling me it is…

Any ideas would be greatly welcome!
Thanks
Nicolas

kevinhoque,

Well, I didn't think I had a problem with 98. You say you're getting 20M polys? Are you running VARs, lists, or arrays? Are you using tri-strips, and on average how many tris are in one strip?

I hate comparing frame rates because it's never an apples-to-apples comparison.

I will check my pixel format, I made sure that they were the same, but I never checked the generic flag.

By the way, the results included 2 layers per poly, so the polys per second were more like 1.8M and 3.2M. Also, one of the layers is slower than the other: my vertex struct is 64 bytes, and in the slow layer I access data from the second 32 bytes, causing a second cache-line fetch per vertex, but this will happen on both OSes.

Can you post the info on your app and I’ll check my pixel format.

Thanks…

John.

sorry,

on top of the questions I just asked:
what data are you sending to GL (coords, texcoords, colors, normals)?

Thanks…

John.

John, here’s some info from an old opengl app that I just ran.

122368 (tri-striped) polys per frame compiled into a display list
22-23M polys per second
[I’ll let you work out the framerate]
1 256x256 texture only
1 directional light
1 normal per vertex
1 set of 2d texture coords per vertex
Don’t think that vertex colours are defined.
Not sure of the length of the strips. Sorry. Running on Win2k.

It’s basically a simple procedural landscape with some trees. Have got the latest nvidia drivers and vsync turned off (obviously!)

Maybe your 3.2M polys per second isn't that far off if you're not tri-stripping and have vsync turned on. I would expect (perhaps?) 1/3 of the performance if you're just rendering discrete triangles, and then you'll take another hit for vsync… I would definitely make sure that you're getting an accelerated pixel format, though. Two people this week have complained about poor OpenGL performance. I downloaded both their code and found that they were both rendering in software on one of my machines and hardware on the other. They had some uninitialised garbage values in the pixel format descriptor. Zeroing the structure before placing values into it fixed that.
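
(A sketch of the fix Kevin mentions, assuming a typical setup path; zero the descriptor first so stale stack garbage can't end up in fields you never set. The hdc and field values here are illustrative.)

#include <windows.h>
#include <string.h>

void SetupPixelFormat(HDC hdc)
{
    PIXELFORMATDESCRIPTOR pfd;
    memset(&pfd, 0, sizeof(pfd));                 // the fix: no uninitialised fields
    pfd.nSize      = sizeof(pfd);
    pfd.nVersion   = 1;
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cDepthBits = 24;

    int format = ChoosePixelFormat(hdc, &pfd);
    SetPixelFormat(hdc, format, &pfd);
}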

Have you tried using glGetString? It will tell you whether you're running in software or hardware.
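
(Something like this, called once a context is current; if GL_VENDOR / GL_RENDERER come back as "Microsoft" / "GDI Generic" you are on the software renderer rather than the nvidia ICD.)

#include <stdio.h>
#include <GL/gl.h>

void PrintRendererInfo(void)
{
    printf("Vendor:   %s\n", (const char*)glGetString(GL_VENDOR));
    printf("Renderer: %s\n", (const char*)glGetString(GL_RENDERER));
    printf("Version:  %s\n", (const char*)glGetString(GL_VERSION));
}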

Another thing. I have always found that opengl runs slightly faster on NT and 2K than 98. Direct3D is the exact opposite though - ever so slightly slower on 2K than 98. Just my $0.02.

Please keep us posted…
Kevin

Nicolas Lelong: It is so slow because YOU HAVE SOFTWARE RENDERING then. You can only run one application using the 3D card at a time; the second one automatically gets software rendering. There are some cards… for professionals… which can host 5 or more applications at once, but a consumer card… and I think that's what you are using… only one.

BlackJack

Vsync, hmmm… that I've never checked. I do use tri-strips, but I average only about 5 tris per strip (still cuts verts down by 50%). Also, if I run only the fast layer I get 33 frames (single layer); that's like 3.8M.

Oh, as for glGetString, I do that when the app fires up. It always returns nvidia info and extensions. If I'm running software, will it return Microsoft info?

I need to test the app some more when I get home tonight…

Thanks…

John.

@ BlackJack:

Come on, that can't be true. I just tried it because I was sure you must be wrong. I ran the same app twice and the framerate was just about half of the framerate I get when there's just one app running. And running gmax (in OpenGL mode) in the background doesn't change the framerate at all. Where did you get this information from?

ok,

I did some checking and here is what I got:

-VSync was off on both os’s.
-Pixel format is cool and is not returning the generic format.

OK, I've been using this cool little app called wcpuid.exe to check my system's AGP stats. It tells me both OSes are on AGP 2x. I have a VIA Apollo chipset (the 4x drivers never worked), but I tried the new drivers on Win2K and pow! AGP 4x! Now the app is running the same as in 98. So I installed the new drivers in 98 and pow! Wcpuid tells me it's now running 4x, but wait, the app isn't any faster. I think the goofy old drivers were running 4x but reporting 2x?

Now both OSes are running the same, even with the nvidia demos!

The question: is the 3.2M polys I'm getting now good? I'm using vertex arrays with DrawArrays (not indexed). For layer one I'm sending coords (3 x float) and tex coords (2 x float), that's 20 bytes total, and for layer 2 I'm sending the same plus color (3 x float). For both layers that's 52 bytes. The data is always sent from system memory and I'm only running PC133 memory.
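
(A rough reconstruction of the path John describes, just to make the data flow concrete; the Vertex layout and strip handling here are assumptions, not his actual code.)

#include <GL/gl.h>

struct Vertex                 // 64 bytes, as described in an earlier post
{
    float pos[3];             // 12 bytes
    float uv[2];              //  8 bytes
    float color[3];           // 12 bytes (only sent for layer 2)
    float pad[8];             // pad out to 64 bytes
};

void DrawLayer(const Vertex* verts, int vertCount, bool withColor)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), verts->pos);

    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), verts->uv);

    if (withColor)
    {
        glEnableClientState(GL_COLOR_ARRAY);
        glColorPointer(3, GL_FLOAT, sizeof(Vertex), verts->color);
    }

    // Non-indexed, pulled from system memory every frame - hence the bus traffic.
    glDrawArrays(GL_TRIANGLE_STRIP, 0, vertCount);

    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}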

Kevinhoque, I don't know much about compiled display lists; is it possible that some of the data is stored in video memory? Are you using indexed arrays?

I think at this point I’ve maxed out my memory bus…

Let me know what you guys think…

Thanks for the help…

John.

Blackjack,

I was aware that software rendering was hiding somewhere - but the problem does not occur every time I have 2 3D apps launched. It only occurs when one (or more) uses a quite considerable amount of geometry (textures?).

I agree that there must be some kind of limitation somewhere. In fact, I wanted to know which limit it is. Obviously, it is not (only) the number of applications.

Nicolas.

If I understand correctly, you are not using display lists and no VAR (Nvidia) or VAO (ATI) extension, just plain OpenGL vertex arrays, and you get 3.2 MTris/sec? Sounds perfectly normal to me. If you think about it, the driver has to copy and transfer all your data to the video card every frame; what do you expect? Use display lists or vendor-specific extensions to speed up your rendering, and you'll easily multiply your framerate by 3 or 4.

Y.
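
(A minimal sketch of the display-list route being suggested, so the driver can keep an optimised copy of static geometry and replay it each frame; DrawGeometry() stands in for your existing vertex-array code.)

#include <GL/gl.h>

void DrawGeometry();          // your existing glDrawArrays / immediate-mode calls

GLuint g_list = 0;

void BuildList()
{
    g_list = glGenLists(1);
    glNewList(g_list, GL_COMPILE);   // record once; driver may move data to video/AGP mem
    DrawGeometry();
    glEndList();
}

void DrawFrame()
{
    glCallList(g_list);              // cheap per frame for static geometry
}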

Yeah, the numbers I got last night were:

(polys * polys_to_vert_tristrip_ratio * 52bytes (my data) * frames) = min_data_across_mem_bus;

(115200 * 1.5 * 52 * 14) = 125.7984M bytes;

and I'm only running PC133; not bad.

I tried using VARs and they are very useful; however, the amount of data I have won't fit into video mem, and trying to copy vertex data to the buffer every so often is really slow. I've considered placing a subset of all the objects into video mem, but at any given point in time only maybe 2% of the data will get used in the current frame. Plus, using 16M of video mem will limit the amount of mem the card can use for caching textures; I'd hate to pull the same texture across the AGP bus twice in the same frame.

Stupid question: AGP mem is a chunk (or chunks) of my system mem, but the AGP bus bypasses the CPU, right? So storing data in AGP mem will still be limited to PC133 speed, right? Just making sure I understand what's going on. There is nothing more dangerous in this field than having the wrong idea of how your hardware works…

Thanks…

John.

John, most definitely my geometry is being stored either on the card (likely) or in agp mem.

Can't you use NV_vertex_array_range and NV_fence if you are using a great deal of memory? What about allocating your vertex arrays in AGP mem? Surely you'll have more of this than video RAM? Not sure if there are any constraints here. I have not used VAR myself (yet), although I have used the equivalents in D3D. As to how AGP mem works - I don't know. But here's a link that might help:
http://www.intel.com/technology/agp/tutorial/

According to some nvidia docs that are kicking about, AGP mem can very often be considered just as fast as video RAM. [Although this may only be the case on systems with fast DDR or Rambus RAM - something they don't mention, so more than likely it is.] But as Ysaneya says, if you're just using vanilla vertex arrays then 3.2M polys per second is probably about right…
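
(Very rough sketch of the VAR path being discussed, assuming the NV_vertex_array_range extension is exposed; the sizes, priorities and fallback behaviour here are illustrative only.)

#include <windows.h>
#include <GL/gl.h>

#define GL_VERTEX_ARRAY_RANGE_NV 0x851D

typedef void* (APIENTRY *PFNWGLALLOCATEMEMORYNVPROC)(GLsizei size, GLfloat readFreq,
                                                      GLfloat writeFreq, GLfloat priority);
typedef void  (APIENTRY *PFNGLVERTEXARRAYRANGENVPROC)(GLsizei length, const GLvoid* pointer);

void* SetupVAR(GLsizei bytes)
{
    PFNWGLALLOCATEMEMORYNVPROC  wglAllocateMemoryNV =
        (PFNWGLALLOCATEMEMORYNVPROC)wglGetProcAddress("wglAllocateMemoryNV");
    PFNGLVERTEXARRAYRANGENVPROC glVertexArrayRangeNV =
        (PFNGLVERTEXARRAYRANGENVPROC)wglGetProcAddress("glVertexArrayRangeNV");
    if (!wglAllocateMemoryNV || !glVertexArrayRangeNV)
        return 0;                                    // extension missing: use plain arrays

    // Priority around 0.5 tends to give AGP memory, near 1.0 video memory.
    void* mem = wglAllocateMemoryNV(bytes, 0.0f, 0.0f, 0.5f);
    if (!mem)
        return 0;

    glVertexArrayRangeNV(bytes, mem);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

    // Stream vertex data into 'mem', point the usual gl*Pointer calls at it, and use
    // NV_fence (glSetFenceNV / glFinishFenceNV) so you don't overwrite data the GPU
    // is still reading.
    return mem;
}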

Kevin

Sometimes drivers are very important. I had some hangs in my app, and changing the nvidia drivers made those hangs disappear…

Cool, well, I think I'm going to try to implement the VARs.

Thanks…

John.