Performance question



amerio
12-03-2000, 09:08 AM
I'm getting quite confused by my performance problem.
So, to sum up:
-Win2000 + Windows AGP fix
-VIA chipset + latest AGP fix
-128 MB RAM
-GeForce2/64 MB, Detonator 6.31 (latest)
-pure Win32 application
-using QueryPerformanceCounter for fps (sketch below)
-VSync disabled
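For reference, a minimal sketch of the usual QueryPerformanceCounter FPS pattern (illustrative only, not the app's exact code):

#include <windows.h>
#include <stdio.h>

static LARGE_INTEGER freq, prev;
static int frames = 0;

void fps_init(void)
{
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&prev);
}

void fps_tick(void)            /* call once per frame, after SwapBuffers */
{
    LARGE_INTEGER now;
    frames++;
    QueryPerformanceCounter(&now);
    if (now.QuadPart - prev.QuadPart >= freq.QuadPart) {  /* ~1 second */
        printf("%d fps\n", frames);
        frames = 0;
        prev = now;
    }
}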

Okay, on to the performance:
-640x480x32 + stencil buffer (also tried without)

Geometry = 8000 vertices, split across a dozen display lists; so let's say 10 DLs, each with 1000 vertices arranged in 10 triangle strips of 100 vertices.
My stripping code achieves an efficiency of 1.25/3
(i.e. for a 300-vertex mesh, it produces
a 125-vertex triangle strip: quite good! not fast to build, but efficient).
Each DL is GL_COMPILE.
No abusive state changes (none at all!).
No stack overflow or abuse (checked).
Each mesh is drawn using N small display lists
(around 500 vertices max), embedded in
another DL (hierarchical DLs), as sketched below.
Each inner DL is built with glDrawElements(GL_TRIANGLE_STRIP).
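In sketch form, the setup looks like this (a minimal sketch; names such as nStrips, stripLen and stripIndices are illustrative, not the app's real ones):

#include <GL/gl.h>

GLuint build_object_list(int nStrips, const GLsizei *stripLen,
                         const GLuint *const *stripIndices)
{
    GLuint inner = glGenLists(nStrips);   /* contiguous range of names */
    GLuint outer = glGenLists(1);
    int i;

    /* glVertexPointer/glNormalPointer are assumed already set; the
       array data is dereferenced and copied at compile time. */
    for (i = 0; i < nStrips; i++) {
        glNewList(inner + i, GL_COMPILE);
        glDrawElements(GL_TRIANGLE_STRIP, stripLen[i],
                       GL_UNSIGNED_INT, stripIndices[i]);
        glEndList();
    }

    glNewList(outer, GL_COMPILE);         /* outer DL calls the inner DLs */
    for (i = 0; i < nStrips; i++)
        glCallList(inner + i);
    glEndList();

    return outer;                         /* per frame: glCallList(outer) */
}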

1 light, no texturing at all.

The problem is: 25 fps! ONLY?????
In 640x480? (tried fullscreen and windowed)
Without DLs (direct calls to glDrawElements,
with the corresponding glVertexPointer each time)
-> 20 fps.

My question is:
-Is it normal to get such a loooow framerate on such a card?

What would help:
-a pointer to some source that achieves 60 fps
with 10000 vertices in 10 objects...

(I know some of you can do this :-> )

Michael Steinberg
12-03-2000, 10:49 AM
Are you sure your frame rate counter works correctly? Do you perhaps call glFinish() or glFlush()?
I get that framerate with a model of at least 10,000 polys, stripped, on a TNT, with lighting turned on. What light mode do you use? Perhaps spotlights?

zed
12-03-2000, 11:03 AM
Have a look at the Balls! program from the NVIDIA site.

amerio
12-05-2000, 10:58 AM
I tweaked my code in every way I could think of.
No glFinish or glFlush at all.
No lighting at all: glDisable(GL_LIGHTING) -> fps x2 (no more).
So lighting isn't the bottleneck.
(I use both a local light and an infinite light.)
No texturing at all (so that is not an issue).
The FPS counter is exact! (checked so many times!...)
Geometry setup may not be the issue: I tried without all the glRotates, glTranslates, etc.: negligible fps gain.
I looked at all the demo sources I could get a grasp on. None helped! (I mean, either they show simple geometry, i.e. spheres or cubes..., or the NVIDIA code uses wglAllocateMemoryNV: I CANNOT USE THIS, as I want portable code.)
I'm sure I'm not sending the geometry twice or making that kind of error (checked even with printf!).
Not fill-limited (a 64x64 window runs at the same speed as a 512x512).
Tried it on other cards: same fps, so this is not a hardware issue.

Maybe I could send part of my code to some of you who may wish to help?

Or, do you know of a freeware tool that could track down this kind of bottleneck?

Thanks for your help!

memo
12-07-2000, 12:15 AM
Hmmm. This is strange. But maybe this helps: are you using a perspective projection set up with gluPerspective? If yes, with what angle?

From my observations (using almost the same hardware as you), if you use either very large angles or strange values for the angle, performance can drop.

Be careful: if you want an angle of x degrees, you have to pass x/2 as the parameter to gluPerspective.

Michael Steinberg
12-07-2000, 04:29 AM
No, you don't have to pass a value of x/2 degrees. I've never noticed that I was seeing 180 degrees...

Michael Steinberg
12-07-2000, 04:30 AM
Oh, amerio, I could have a look at it.
I want NO texturing, NO stencil, NO LIGHTING;
that always makes it so complicated.

th-steinberg@t-online.de

memo
12-07-2000, 04:35 AM
> No, you don't have to pass a value of x/2 degrees. I've never noticed that I was seeing 180 degrees...

Don't get me wrong on that: if you want a view angle of 90 deg, pass 45. If you pass 90, you'll get 180 (the image will be distorted). Just try it.

amerio
12-07-2000, 09:15 AM
I do use gluPerspective(60.0, blahblah...)
Not an issue, I guess :)

I'll send part of my code to whoever can help!
(well, the full app is really too big to be posted :> )

Thanks for your efforts. Thanks!

Bob
12-07-2000, 10:54 AM
>>if you want a view angle of 90 deg, pass 45. If you pass 90, you'll get 180

I pass 90 degrees and get 90 degrees. Are you sure you're not using some obscure version, or thinking of something else?
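For what it's worth, the GLU documentation defines the first parameter as the full field-of-view angle, in degrees, in the y direction, so vertical vs. horizontal FOV may be the source of the disagreement: on a 4:3 window, a 90-degree horizontal FOV is only about 74 degrees vertically. A sketch (fovy_for is a made-up helper):

#include <math.h>

/* Per the GLU spec, gluPerspective(fovy, aspect, zNear, zFar) takes
   the FULL vertical field of view in degrees, not half of it. */

/* The fovy to pass for a desired horizontal FOV, from
   tan(fovx/2) = aspect * tan(fovy/2): */
double fovy_for(double fovx_deg, double aspect)
{
    const double d2r = 3.14159265358979 / 180.0;  /* degrees -> radians */
    return 2.0 * atan(tan(fovx_deg * 0.5 * d2r) / aspect) / d2r;
}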

Gorg
12-07-2000, 11:05 AM
memo is right. gluPerspective takes half the field of view.

martin_marinov
12-08-2000, 01:03 AM
amerio, what I think is that the bottleneck in your application is not the video card or OpenGL, but the computation of the scene's geometry. You point out that the framerate does not depend on, for example, the window dimensions. This sounds like pure processor overload...
You can check it: for example, if you are using MS VC++, you can turn profiling on (use the function-timing setting). Then you can see exactly which function drags your fps down.
You can also write a function:
void DrawElements(...) { glDrawElements(...); }
and replace all glDrawElements calls with this DrawElements. This will help you separate the computational code from the OpenGL calls (a fleshed-out sketch below). It will also slow performance down, but that's only for the profiling; later you can turn the function into a macro, or define it inline.
Write such a function for every OpenGL API function that you call and suspect of slowing your program down. After that, run the profiler and check where the fps goes :)
I'm almost sure it will be the pure computation code...
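A fleshed-out sketch of the wrapper idea (the Prof* names are made up); each wrapper shows up as its own line in the profiler output, separating driver time from the app's own computation:

#include <GL/gl.h>

void ProfDrawElements(GLenum mode, GLsizei count, GLenum type,
                      const GLvoid *indices)
{
    glDrawElements(mode, count, type, indices);
}

void ProfCallList(GLuint list)
{
    glCallList(list);
}

/* Once profiling is done, compile the indirection away, e.g.:
   #define ProfCallList(l) glCallList(l) */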

amerio
12-09-2000, 07:54 AM
You say the "computation" of the geometry might be the bottleneck, but my geometry is absolutely static! Nothing is being morphed or deformed. Each and every object
is in its own DL. Then I simply call all the
DLs... (remember, they are all made with GL_COMPILE and call some glDrawElements).
So tracking glDrawElements makes no sense to me, as it is called only once: when I build up the DLs...
After that, there are only some PushMatrix/PopMatrix calls (one pair per object), and that's all!
Maybe track glCallList?

It sounds as if T&L were disabled, but I believe it is always on with OpenGL...
What gives?

Oh, about my source code: I'm trying to isolate part of it to track down that bottleneck... (I'm not working on some kind
of demo: it's a "wannabe" professional app...)

Do you know of some demo with source that does NOT use wglAllocateMemoryNV?
Thanks...

mcraighead
12-09-2000, 08:56 AM
The Steve's Balls (or SphereMark) demo supports vertex array range but does not require it. In fact, you should easily be able to get tens of millions of triangles per second with that demo in display list mode.

- Matt

amerio
12-10-2000, 04:08 AM
Oh, I'm going crazy! :(
I just wrote a simple test app that draws a single strip with 20000 triangles in it.
No lighting, no textures, nothing special at all! Just the minimal app.

Guess what: 25 fps.

Using a DL, glDrawElements, or Begin/End doesn't change it an inch!

What on earth am I doing wrong?????
I dug into the SphereMark source, but did not see
any super-cheat-tip-godmode!

I can send src and exe (under 49 KB) to anybody.

M. Steinberg: I mailed it to you, as you offered your help earlier...

Sylvain
12-10-2000, 09:27 AM
Sorry Matt, but I am using OpenGL under NT 4.0,
and there is a BIG performance difference between using VAR and display lists!!! Could you please make a version of the Balls source code available that uses display lists?? I am really interested!
In fact, with VAR my gain is x2!! I did not use glFence; all my data is stored onboard forever and is never modified. So I wonder how the display list mechanism could be faster than that?? Moreover, why did NVIDIA create VAR if display lists can do the job?

mcraighead
12-10-2000, 12:51 PM
The code is on our website.
http://www.nvidia.com/Marketing/Developer/DevRel.nsf/pages/A5DF320EED5ABA388825684200084EEC

I'm not sure if that is the latest version, but it probably has dlist support...

VAR is always going to be fastest, because it's direct access to the HW. You should be able to do equally well with display lists in certain cases, though.

Again, you should be able to get >10 Mtris/sec with that app's display list mode easily.

It's not too difficult to get the theoretical-maximum numbers, either. I know I was able to get 30-something Mtris/sec with a GF2 Ultra -- just turn off lighting and shrink the window. Not sure if I got that with dlist mode or VAR mode.

- Matt

rafaeldelrey
12-10-2000, 06:35 PM
Amerio,

Are you using the latest optimized OpenGL drivers for your 3D card? I saw a similar problem on a dual-boot machine (W98SE / W2000).
When using W2000, the FPS of most applications was quite a bit lower than when running the same apps under W98.
Most 3D card drivers are not optimized for W2000. Maybe you don't even have hardware acceleration at all when running on W2000.

Good luck,

Rafael Del Rey

amerio
12-11-2000, 02:02 AM
I use the latest Detonator, 6.31.
The processor is a 1 GHz AMD. Tests on a PIII show the same speed, so it's not a processor issue.
Ran it under both W98 and W2000: same speed
(almost).
Ran it on other cards (ATI): slower than ever.
And without HW acceleration, I can't even get 1 fps.
All the NV demos run smoooooth! So yes, HW acceleration is there...
Am I doomed, or something?
Want to see the code and exe?

Theo
12-11-2000, 11:54 AM
This isn't going to solve your problem at all, but perhaps you could get a slight speedup by using glCallLists instead of one glCallList per DL?
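Something like this, assuming the list names are kept in an array (objLists and NUM_OBJECTS are made-up names):

#include <GL/gl.h>

#define NUM_OBJECTS 12                 /* say */
GLuint objLists[NUM_OBJECTS];          /* filled in at load time */

void draw_all(void)
{
    /* One driver call instead of NUM_OBJECTS glCallList calls; with
       the default glListBase(0), the entries are used as list names. */
    glCallLists(NUM_OBJECTS, GL_UNSIGNED_INT, objLists);
}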

Sansus
12-11-2000, 12:34 PM
Amerio, from your nick I guess you're Spanish. Well, if I'm not mistaken, I'll write to you in Spanish, since it comes easier to me than English.
Let's see... something similar to your problem is happening to me.
My program is the following:
3000 textured faces, with one light.
Well, it runs incredibly slowly on a machine you wouldn't believe:
a computer with two Pentium II Xeon processors, both at 450 MHz, a professional OpenGL-compatible 3Dlabs accelerator with 32 MB of video RAM, and half a gig of RAM. When I run it on that machine it goes sloooowly, 30 fps. Maybe the same thing is happening to you as to me: the code needs heavy optimization: assembler, MMX, KNI...
Well, if you get yours to go faster, please write to me, OK? I'll do the same...
OK, thaaanks...
Suso.

amerio
12-11-2000, 11:24 PM
Sorry, but I'm French, and 'amerio' is Italian ("parlo un poco italiano") ;)
But from what I understand of your post, you've got quite a similar problem. You can mail me directly if you want...

Once again, I'd appreciate it if anybody would have a look at the small test app I wrote (200 lines), which draws A SINGLE STRIP with 20000 vertices in a DL (GL_COMPILE) and hardly reaches 25 fps on a GeForce2/64 MB.
I MUST be doing something wrong. But what??
None of the sources I've seen helped. I'm not joking. And I'm getting upset by this problem. :(

Thanks...

martin_marinov
12-14-2000, 05:24 AM
amerio, if you wish, please send me your sample app. I would like to test it on my PC and profile it... I hope to report the results to you soon :)
e-mails:
medo@sirma.bg
martin_marinov@hotmail.com

nbasili
12-14-2000, 06:26 AM
Modern 3D hardware accelerates textured surfaces, so texturing is FASTER than solid or wireframe drawing!!!!!

Check the NVIDIA demos that come with the card. If you switch to wireframe or solid mode, the performance gets worse.

I would suggest trying texture-mapped geometry and telling us if it gets any better.

amerio
12-14-2000, 06:56 AM
Thanks, but this is not my issue.
I'm not doing detail polys or that kind of thing. I DO NEED the geometry, as it is the body of many objects...
Want to share source?
(getting depressed by now... :( )

Cab
12-14-2000, 09:15 AM
Amerio, if you want, send me your test app (if it is about 200 lines) and I can tell you if I find something strange.
I have a test app, for an engine I'm working on, where I'm drawing a model (39177 triangles, 50725 vertices, and 1 single texture) four times (that is 156708 triangles) at more than 43 FPS (that is more than 6.7 MTris/sec).
This is using a GL_TRIANGLES display list on a GF2 GTS under W2000. The test app runs windowed at 640x480x32, with a depth buffer, separate specular enabled, and one infinite light. It is not optimized for this case, as it is a speed test of a general engine of objects with positions, different properties, etc.
And it's drawing text using a font in a different texture...
Also, I'm getting similar results on a Radeon DDR card.
So you should expect good speed using display lists. The same program, not storing the model in a display list but using the NV_vertex_array_range extension, gets more than 6.0 MTris/sec. Without display lists or the NVIDIA extension it gets (approx.) 1 MTris/sec on both cards.

One problem I remember having when I started 3D programming is using a big texture (512x512) without creating mipmaps, because drawing even a small triangle with such a texture can touch every pixel of the texture. Try replacing your textures with 32x32 ones (for example); if that speeds things up a lot, you're probably not creating or using mipmaps (sketch below).
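A sketch of the fix (upload_texture is a made-up name; gluBuild2DMipmaps generates and uploads the whole mip chain):

#include <GL/glu.h>

void upload_texture(GLuint tex, int w, int h, const void *pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    /* builds and uploads every mip level down to 1x1 */
    gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, w, h,
                      GL_RGB, GL_UNSIGNED_BYTE, pixels);
    /* a mipmapped minification filter, so distant triangles sample the
       small levels instead of thrashing the full-size base level */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                    GL_LINEAR_MIPMAP_LINEAR);
}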

Sansus
12-14-2000, 01:48 PM
Amerio, send me your code (if you want) and I'll try to find the bug or anything strange. Maybe if we all work on it, we'll catch it!!

SilverMind
12-15-2000, 01:40 PM
To me, it sounds like a compilation problem.
Are you sure that you are compiling (in MSVC) a release build with optimizations turned on, and linking against the right libs?
Kinda silly, but it might be the problem.

amerio
12-16-2000, 10:35 AM
I've sent my small test app to some of you.
So far, none of you have spotted any kind of weakness in my code.
So it may be a compiler problem. But how?
I mean, my code simply calls a DL...
Anyhow, I've yet to find any source with no NV extensions that runs above 50 fps with 20000 vertices.
Anyone at NV reading this? Could it be that the GeForce2 can perform fast _only_ with NV extensions? :( (just wondering, not flaming)

mcraighead
12-17-2000, 02:28 PM
No, it is absolutely possible to get fast performance with display lists. I'm not sure what you're not doing right, but there must be something.

- Matt

Cab
12-18-2000, 08:24 AM
Amerio:

I did not find anything wrong in your program. You are hitting one of the worst cases for a graphics card: drawing a large number of small triangles (10240), filling a big portion of the screen with up to 128 overdraw layers (this is like filling the entire screen several times over with small triangles), with no vertex reuse, from back to front, with depth testing and no culling at all (all of the triangles face the camera). As the triangles have one side 40 times longer than the other, you can see that rendering the box horizontally is slower than rendering it vertically.
If you move a little away from the camera (changing the eye distance from 3.0 to 5.0), you will find that you get 100 to 300 FPS depending on the orientation.
I used VTune to see where the time goes, and it is in the driver. With GPT you can see it happens in wglSwapBuffers, and that is because it has to wait, doing nothing, until the display list finishes rendering. If you move the fps() call to before glutSwapBuffers(), you will notice a speed increase. You can do more work at that point, such as moving something, and you will notice no speed decrease (see the sketch after the GPT listing below).
Anyway, if you find something different, please tell me.

This is the GPT info from a 4 sec. session:

Total frames analyzed = 420.
Total Time
(secs)      Function
==========  ========
3.597312    GL11_wglSwapBuffers  (<= notice this is out of the 4.0 secs)
0.231673    GL11_glCallList
0.033702    GL11_glEnd
0.012075    GL11_glBegin
0.007096    GL11_glClear
0.005875    GL11_glVertex3f
0.005312    GL11_glRotatef
0.004318    GL11_wglGetPixelFormat
0.002468    GL11_glDisable
...

Number
of Calls  Function
========  ========
   10080  GL11_glVertex3f
    2520  GL11_glNormal3f
    1260  GL11_glRotatef
    1260  GL11_glDisable
     840  GL11_glPolygonMode
     840  GL11_glPushMatrix
     840  GL11_glPopMatrix
     840  GL11_glMultMatrixf
     437  GL11_wglGetCurrentDC
     437  GL11_wglGetCurrentContext
     420  GL11_glCallList
     420  GL11_glGetFloatv
     420  GL11_glLoadIdentity
     420  GL11_glClear
     420  GL11_glTranslatef
     420  GL11_glBindTexture
     420  GL11_glBegin
     420  GL11_glEnd
     420  GL11_glEnable
     420  GL11_wglGetPixelFormat
     420  GL11_wglSwapBuffers

That's my opinion; I would like it if any IHV could give theirs.
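In sketch form, the reordering described above: give the CPU work to do while the GPU drains the display list, instead of idling inside SwapBuffers (fps() and glutSwapBuffers() are the test app's calls; the scene list name is made up):

#include <GL/glut.h>

extern GLuint scene;           /* the display list (made-up name) */
extern void fps(void);         /* the test app's frame counter */

void frame_before(void)
{
    glCallList(scene);
    glutSwapBuffers();         /* CPU idles here until the GPU drains */
    fps();
}

void frame_after(void)
{
    glCallList(scene);
    fps();                     /* or animation, input handling, ... */
    glutSwapBuffers();         /* less (or no) waiting left to do */
}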

amerio
12-18-2000, 10:22 AM
I agree with you: the test app is one of the worst cases one can think of.
But my target app is not such a nightmare :).
Anyway, it shows almost the same framerate with the same number of polys (i.e. around 30 fps for around 10k polys drawn).
I looked at the VAR demo on the NV site. It runs damned fast! (yes, a lot of vertex reuse, but...)

About vertex reuse: I suspect few real-world apps will have a lot of vertex reuse. My app is a virtual-reality engine => lots of objects with around 1000 polys each. So vertex reuse is by definition limited...
My target app shows around 10-20 objects at a time, each with its own DL (static objects).

Too bad I shouldn't use NV extensions. But I wonder if they would even bring some speed (i.e. more than 2x?)

Aside: where can I get a free equivalent of VTune / GPT for Win32?

mcraighead
12-18-2000, 02:19 PM
I ran the app and very quickly came to the conclusion that it was a fill rate limitation.

Default window size, 45 fps.

Shrank the window to as small as possible, 380 fps.

P3-700, BX chipset, NV10 SDR, 5.22 drivers (development system, so all I care about is stability), vsync off.

- Matt

Michael Steinberg
12-19-2000, 08:31 AM
Though many games have 1000-poly models, those are actual bodies, so they're closed, and thus not all polys are visible at any given time. I got 26 fps for 20k polys, and I don't think that's very bad...

amerio
12-19-2000, 12:45 PM
To Matt:
Sorry, but how come my fps rate IS CONSTANT when I shrink the window down to 16x16 on my system? (30 fps or so, at 640x480 OR 16x16... no speedup.)
And it runs at 400 fps on yours when you shrink the window? :confused:
Hmm? A driver issue? (yours is 5.22, mine 6.31) A VIA chipset issue? (you're an Intel addict, I'm an AMD slave) :mad:

Such a fillrate problem is easy to track. I noted earlier in this thread that it wasn't that (on MY system...).

I have tried shrinking the window, resizing the polys, etc...

To Michael:
You got 25 fps with 20k polys; I get 25 fps with 10k polys, optimized strips, no textures, no lighting, even tested with perfect geodesic spheres (the ideal case for a stripped object). No gain in fps.
Even with a 16x16 window.
And I don't really believe in a software-only mode (that would be under 1 fps).

Too bad I can't send the full app.
But even the test app seems slow to me...

So okay, all of you tell me my code is not the problem. Might be the fillrate (?). But on my system and on my client's platform, the speed is the same (AMD 1 GHz, VIA, 128 MB, etc.).
And yes, I do need all those polys...
I'd like to get an answer on this. Am I asking too much of my GeForce2? Or is there any way I could improve the code / geometry (but what's better than strips???)

Thanks again! (I hope I won't wear you out with my problem... :))

Michael Steinberg
12-19-2000, 01:43 PM
I don't know, but maybe the fact that you've written it with GLUT is a bottleneck. I've never even had a look at it (GLUT), so I'm probably wrong.

mcraighead
12-19-2000, 03:20 PM
I could run on a different driver but the bottleneck is obvious: fill rate. Drivers would make no difference.

You should make sure vsync is off.

The platform should be irrelevant in this case. You could be using a PCI card on a Pentium 200 MMX for all it matters here, it's fill rate that is the bottleneck.

When I had vsync on, it only went from 45 to ~75 by shrinking the window. That's the only thing I can think of.
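For completeness: where the driver exposes the WGL_EXT_swap_control extension, vsync can also be forced off in code. A sketch (requires a current GL context; availability is driver-dependent):

#include <windows.h>

typedef BOOL (APIENTRY *PFNWGLSWAPINTERVALEXTPROC)(int interval);

void disable_vsync(void)
{
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)
            wglGetProcAddress("wglSwapIntervalEXT");
    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(0);   /* 0 = never wait for vertical retrace */
}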

- Matt

amerio
12-20-2000, 09:06 AM
I'm going to give up :(
I'm SURE (is that bold enough?) that VSYNC is off: if I decrease the number of primitives, I go up to 300 fps and more.
Moreover, I get only around 30 fps (50 when the wind blows in the right direction). Should I conclude that my monitor's refresh rate is 30-50 Hz? Naaaa.
So okay:
-my code is not the problem (you all tell me).
-Drivers are not the problem (I trust you).
-GLUT is not the problem (pure Win32 code shows the same speed; yes, I tested).
-Fillrate is the problem (okay okay, it won't run faster with so many "large" polys facing the camera).
But it surprises me.
I tried setting the polys to a very small size,
and the speed didn't increase that much.
So, fillrate... (okay, don't knock my head! I'm just dubious ;))
Just compare with the VAR tutorial on the NV site (the one that fills the whole screen with a biiiiiig waving flag, 100k polys, at 150 fps!)
Please just tell me: "You ask too much of your GeForce2" or "You're doing it the wrong way" or anything... (sniff)

(anyway, it's sad that it simply runs FASTER on your machine, which isn't even a better computer/card).

mcraighead
12-20-2000, 09:48 AM
Well, if what I see is that it does increase performance, and dramatically so, when the window shrinks, I really can't help you.

- Matt

amerio
12-21-2000, 02:26 PM
:) :) :)
I FOUND IT! (well, actually, all the credit goes to Carlos Abril. Oh, thank you!)

Here is an excerpt of his mail:
<<
I found that if you call
glPolygonMode(GL_FRONT, GL_FILL) and glPolygonMode(GL_BACK, xxx) with xxx
being any mode other than GL_FILL, your speed will drop significantly (even if
you have glCullFace(GL_BACK) and glEnable(GL_CULL_FACE)), so I suggest
setting it to glPolygonMode(GL_FRONT_AND_BACK, GL_FILL)
>>
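In code, the difference boils down to this (GL_LINE here stands for any back mode other than GL_FILL):

/* slow path my test app was on: different front and back modes */
glPolygonMode(GL_FRONT, GL_FILL);
glPolygonMode(GL_BACK, GL_LINE);   /* hurts even with GL_CULL_FACE on */

/* fast path: one fill mode for both faces */
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);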

And it works! And it solves my problem.
In my test app, I had actually made this mistake.
And now the speed is around 200 fps for 20k polys! (yeeeees!) :)
Tried it in my target app: fps x2.5! Yippee!
(now with textures, 3 lights, lightmaps!)

But it makes me think of this as a driver limitation rather than a GL limitation (just wondering), since culling is enabled. What do you think? Is it NV-specific, or multi-vendor?

(oh, I'm just so happy now! THANKS!)

mcraighead
12-21-2000, 06:09 PM
Well, I think I know what the problem was, but I can't go into much detail.

GeForce hardware supports polygon mode directly and completely in hardware. But there are other things that can slow you down that you may have been hitting.

I can't specify any further than that because to do so would require saying far too much about specific HW and driver details.

I strongly recommend that you do NOT use polygon mode. It's inefficient even for drawing a wireframe -- each line gets drawn twice. If you do use it, different front and back modes is a good way of asking for trouble. In fact, supporting a different mode for front and back was a feature OpenGL should probably have never included.

We do not, in general, try to optimize for stupid applications. So if you tell us to use a different front and back polygon mode, we won't look at whether CULL_FACE is enabled. You should just be using the same mode for both faces.

- Matt

12-21-2000, 06:56 PM
> I can't specify any further than that
> because to do so would require saying far
> too much about specific HW and driver
> details.

Aw, c'mon, I live for that kind of dirt!

I can guess, but the details become so
blurry when I do that.

mcraighead
12-21-2000, 07:05 PM
Nope, this kind of dirt requires large bribes. :) In cash, preferably. ;)

No, if you want dirt on our products, you'll have to ask about something obsolete, like the RIVA 128 (which is from before my time at NVIDIA, anyway). I will freely admit how many aspects of that product sucked. :)

[But it was the right thing for the time.]

- Matt