Low T&L performance on R300?

I wonder if anyone with a Radeon 9700 (Pro) has noticed that standard vertex-lit scenes with more than 100,000 polygons run at low FPS. I'm using VAO with indexed geometry, and in scenes with more than 100,000 polys (triangles) the FPS seems to drop rapidly below 20 (similar to the numbers on a Radeon 8500 LE). I've read some threads on this forum saying that the triangle pipeline of the R300 is completely programmable (no fixed-function dedicated circuits), and I wonder if that is the cause of the low FPS at high poly counts. It also seems evident from the new benchmarks comparing the Radeon with the GeForce FX (see the 8-light benchmark in 3DMark). I'm asking because I need to be sure that I'm using the card to its full capabilities and not doing something wrong. Can someone with an R300 confirm this?
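For a sanity check, the figures in the post translate into a very low effective triangle rate. A minimal sketch of the arithmetic (the numbers are the ones quoted above, not measurements of my own):

```python
def tris_per_second(triangles_per_frame, fps):
    """Effective triangle throughput for a scene drawn every frame."""
    return triangles_per_frame * fps

# 100,000 triangles at 20 FPS is only 2 M tris/sec --
# far below what an R300-class chip is marketed to sustain.
print(tris_per_second(100_000, 20))
```

That gap between the achieved rate and the advertised rate is what makes a driver or usage problem plausible.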

100,000 triangles is quite a lot. If you can, try breaking it up into smaller chunks.
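A minimal sketch of what "breaking it up" could mean for indexed geometry: split the index list into batches no larger than some cap, rounded down to a multiple of 3 so no triangle straddles two batches (the function name and the cap are illustrative, not from any ATI document):

```python
def split_index_buffer(indices, max_indices=65536):
    """Split a flat triangle index list into chunks of at most
    max_indices entries, each a whole number of triangles."""
    step = max_indices - (max_indices % 3)
    return [indices[i:i + step] for i in range(0, len(indices), step)]

# 100,000 triangles = 300,000 indices -> several sub-65k batches.
chunks = split_index_buffer(list(range(300_000)))
```

Each chunk could then go into its own vertex array object and be drawn separately.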

As a side note, I just created a model with 3 spheres of about 35,000 triangles each, and my program was able to maintain a frame rate of about 150 FPS (on my system: a Duron 1.2GHz and a 9700 Pro, without fragment programs or vertex programs enabled, and one VAO per mesh).

Unfortunately I can't test a single mesh of 100,000 triangles, because 3D Studio models can only have 65k indices per mesh.
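The 65k ceiling is consistent with 16-bit per-mesh counts: if a format stores its face or index count as an unsigned short, anything past 65,536 has to become a second mesh. A trivial check (the constant and function are illustrative, not from the 3D Studio file format documentation):

```python
MAX_16BIT = 2 ** 16  # 65,536: the cap if a count is stored as an unsigned short

def needs_splitting(triangle_count, max_faces=MAX_16BIT):
    """True if a mesh exceeds a 16-bit per-mesh face count."""
    return triangle_count > max_faces

# A 35,000-triangle sphere fits; a single 100,000-triangle mesh does not.
print(needs_splitting(35_000), needs_splitting(100_000))
```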

That's strange, since a GeForce4 Ti 4200 can do a lot better than an R300 in that situation! I know (from benchmarks) that the R300 can do a lot better than even a GeForce FX on vertex shaders. It seems ATI had to optimize the programmable vertex pipeline heavily to afford a fully flexible (no dedicated circuits) triangle engine. So I suspect they do have a completely programmable vertex pipeline, but that in stressful situations it cannot beat a dedicated circuit (my post was about the standard vertex-lit pipeline).

I don't know… My HDR rendering program seems to do pretty well.
The spheres model has a total of 118,800 triangles. http://www.area3d.net/nitrogl/test.jpg

You might want to take a look at this thread I started here:

http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008527.html

I never got a really good explanation as to why the performance was so bad without VAO, but using VAO seemed to help.

Let us know if you find anything out.

– Zeno

The performance drop could be due to exceeding a maximum buffer size in the driver or hardware, causing the geometry to be sent in an inefficient manner. This sounds like an anomaly you should report to devrel@ati.com, but I can't really say for sure, since I don't know what else is going on in the program, or whether reducing the vertex or polygon count causes a disproportionate speedup.

-Evan

Evan, I was wondering where I could find some documentation on the best way to use VAO for both the 8500 and the 9700 (especially the performance hit of the GL_DISCARD_ATI/GL_PRESERVE_ATI flags, and the best number of vertices to put in an array object). NVIDIA has plenty of documents describing how to program with VAR, but for ATI I couldn't find anything beyond the extension specification and some simple examples. Right now I have hundreds of objects in the scene, and each one has its own VAO, so the VAO sizes vary from around 100 vertices to thousands or tens of thousands. Is it better to use one big VAO for all objects? Or one for dynamic geometry and another for static? And how big is the vertex cache on the 8500/9700?
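One way to frame the "one big VAO vs. hundreds of small ones" question: pack the static meshes back to back into a single shared buffer and record a byte offset per mesh, so each draw binds the same object at a different offset. The layout arithmetic is simple; the stride and function names here are my own assumptions, not an ATI recommendation:

```python
VERTEX_STRIDE = 32  # bytes: e.g. 3 floats position, 3 floats normal, 2 floats UV

def pack_meshes(vertex_counts, stride=VERTEX_STRIDE):
    """Return (total buffer size in bytes, byte offset of each mesh)
    for laying several meshes back to back in one buffer object."""
    offsets, cursor = [], 0
    for count in vertex_counts:
        offsets.append(cursor)
        cursor += count * stride
    return cursor, offsets

# Three static meshes of very different sizes share one allocation.
total, offsets = pack_meshes([100, 2_500, 40_000])
```

Whether that actually beats per-object VAOs on the 8500/9700 is exactly the kind of thing the missing documentation would need to answer.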

Thanks

hopefully:

"The newly-ratified ARB_vertex_buffer_object extension will probably let me do the same thing for NV_vertex_array_range and ATI_vertex_array_object."
(John Carmack's .plan)

But I can’t find it yet, and the latest ARB talk isn’t available either.

Be patient.
Even if the spec were published in the OpenGL Extension Registry, you would still have to wait for the next driver release before you could even consider it implemented.

I can confirm that performance is low using ATI_vertex_array_object on an 8500 LE, versus the NV fixed pipeline + VAR.

We should now wait for the next ARB extension for vertex throughput, but I guess a fully programmable pipeline will always be slower than the fixed (and 100% hardwired) one. :slight_smile:

hth

Yeah, T&L performance on the R8500 is pretty low, even with VAO. I still haven't found a way to get above 11 M tris/sec with a 100% static scene and no CPU usage. This seems to be a pure driver issue; try the D3D Optimized Mesh sample, which goes over 40 M tris/sec.
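Those two rates bound what any scene can hope for, since the peak triangle rate divides directly into a per-frame budget. Using the figures from the post above (11 M tris/sec via VAO vs. 40 M via the D3D sample):

```python
def max_fps(tris_per_sec, tris_per_frame):
    """Upper bound on frame rate if triangle throughput is the only limit."""
    return tris_per_sec / tris_per_frame

# A 100,000-triangle scene caps at 110 FPS at 11 M tris/sec,
# but 400 FPS at 40 M tris/sec -- same hardware, different path.
print(max_fps(11_000_000, 100_000), max_fps(40_000_000, 100_000))
```

The same hardware hitting both numbers is what points at the GL driver path rather than the chip.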

Y.

Check out http://www.fl-tw.com/opengl/GeomBench/

Look at the various ways he renders and you'll get the 40 million tris/sec results you're looking for.