Is it me or the 9700? Only 1/3 gf4 speed!

I’m porting a program of mine to the Radeon 9700 pro, and I’m having a performance issue that I don’t understand. Here’s what’s going on:

I’m drawing several long single textured and lit triangle strips using vertex arrays with glDrawElements. No OpenGL extensions right now. No VAR/VAO. On a Geforce 4 Ti4600, I get approximately 140 fps when drawing this data. On the 9700, I only get about 45 fps. Obviously, the 9700 should be beating the ti4600.

Here is what I have checked so far:

  1. vsync off on both machines
  2. anti-aliasing off on both machines
  3. aniso filtering off on both machines
  4. Latest ATI catalyst drivers
  5. If I only clear the screen and don’t draw the geometry, the radeon pulls ahead with over 1600 fps, compared to the gf4ti4600 at about 1200 fps.
  6. AGP is enabled at 4x
  7. One potentially strange thing is that, in the display properties under “Adapter”, it says that my adapter string is “Radeon 9700/9500 SERIES secondary”. What does “Secondary” mean?

Humus, if you read this, I am getting about 60 fps on your phong demo at 1024x768, and about 53 on your colored mandelbrot demo. Since I assume these are highly graphics card limited…is that about what I should expect?

Thanks guys,
– Zeno

Maybe you are over saturating the triangle cache?

The secondary is the second VGA out.

Hmm, do you mean vertex cache? What would it mean for a vertex cache to be “over saturated”? Doesn’t it just deal with vertices as they come in?

My triangle strips might not be vertex cache friendly, but if that’s true it should be true on both cards.

Erm… Yeah, I guess.

My guess is that the cache buffer is overflowing and it has to refill it on a second try (probably causing a stall). Maybe the GF4 has a larger cache? I donno.

You could always try lowering the amount of data being passed to the card and see if that helps…

Without knowing the format of your vertices, or exactly what you are doing when you render, the first thing that comes to mind is that you are hitting some sort of SW limit. In general, DrawElements by itself can be quite painful from a performance point of view. I would suggest at least trying to lock the arrays, so the driver knows the size of the arrays.

-Evan

Thanks Evan. Here’s some more info:

I was drawing about 32*(32-1)*2 elements with each call to glDrawElements. The vertices come out of a larger array of size 256x256 with 3 components per element.

I fixed the performance problem on this static geometry by using VAO.

That is only half of my problem, however, as the other half of my geometry needs a more complex shader with more data per vertex.

I’m having a lot of difficulty getting VAO to cooperate on this. It seems that I have only two choices: using glVertexAttribPointerARB(), which requires 4 components per vertex (I only need one or two!), or using glArrayObjectATI(), which also limits my flexibility with components per vertex AND only seems to expose three possible arrays (GL_VERTEX_ARRAY, GL_TEXTURE_COORD_ARRAY, and GL_NORMAL_ARRAY).

Is there a nice general solution for binding VAO’s to vertex attributes? Something that works like glVertexAttribPointerARB() where I can just give it the number of the attribute and the number of components?

Thanks again,
Zeno

You are looking for vertex attrib array object. It is in our extensions PDF. Here is the link:

http://www.ati.com/developer/atiopengl.pdf

From your description of your data, I can understand exactly why the driver did poorly. If you weren’t locking it, the driver needs to send everything as if in immediate mode, or it needs to figure out the correct sub-range of data to copy to the HW. I guess that we were deciding a data range to copy, and you were only using a relatively small portion of it. This is what can make vanilla DrawElements a pretty bad performer.

-Evan
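To put a number on the sub-range problem Evan describes, here is a minimal sketch in C. It assumes a row-major 256-wide grid and a 32x32 patch, as in Zeno's post; the names `index_range` and `patch_span` are made up for illustration, and how the driver really picks its copy range is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Lowest and highest vertex index referenced by an index list. */
typedef struct {
    unsigned min_index;
    unsigned max_index;
} IndexRange;

/* Scan the whole index list once. Without a locked range or an explicit
   start/end pair, something has to do this work (or copy everything). */
static IndexRange index_range(const unsigned *indices, size_t count)
{
    assert(count > 0);
    IndexRange r = { indices[0], indices[0] };
    for (size_t i = 1; i < count; ++i) {
        if (indices[i] < r.min_index) r.min_index = indices[i];
        if (indices[i] > r.max_index) r.max_index = indices[i];
    }
    return r;
}

/* Number of vertices between the first and last vertex of an n x n patch
   (inclusive) when the patch lives in a row-major grid `pitch` wide. */
static unsigned patch_span(unsigned n, unsigned pitch)
{
    assert(n >= 1 && n <= pitch);
    return (n - 1) * pitch + n;
}
```

Under these assumptions, `patch_span(32, 256)` is 7968, while the patch only references 32*32 = 1024 vertices, so a contiguous copy of the range moves roughly eight times more data than is drawn. The min/max pair from `index_range` is the kind of start/end value that glDrawRangeElements takes.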

Actually, if your strips go “the wrong way” (crosswise) in your bigger data block, then you’ll get poor performance no matter what, unless you can upload ALL your data into a single AGP chunk.

I’d suggest making sure that all vertices that you draw together (as one strip submission), live together in memory (using more or less sequential vertex indices).
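One way to make the vertices of a strip "live together" is to copy each patch into its own packed array before drawing. A rough sketch in C, assuming a row-major source grid with 3 floats per vertex (as Zeno describes); `pack_patch` and `COMPONENTS` are hypothetical names, not anything from the thread:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { COMPONENTS = 3 };  /* floats per vertex, as in the post */

/* Copy the n x n patch whose top-left vertex is (row0, col0) out of a
   row-major grid that is `pitch` vertices wide. The result is packed row
   by row, so the patch's vertices get indices 0 .. n*n-1 in `dst`. */
static void pack_patch(const float *grid, unsigned pitch,
                       unsigned row0, unsigned col0,
                       unsigned n, float *dst)
{
    assert(col0 + n <= pitch);
    for (unsigned r = 0; r < n; ++r) {
        const float *src =
            grid + ((size_t)(row0 + r) * pitch + col0) * COMPONENTS;
        memcpy(dst + (size_t)r * n * COMPONENTS, src,
               (size_t)n * COMPONENTS * sizeof(float));
    }
}
```

After packing, the strip indices for the patch are computed with a pitch of n instead of 256, so every index a draw call touches falls in one small contiguous block.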

Thanks again Evan, that’s exactly what I was looking for. I love this fragment program stuff.

Jwatte -

I just want to clarify what you mean. For instance, if I had an array of vertices like this where the number represents its place in a 1-d array:

1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

Then are you saying that it would be fastest to draw the triangle strip as:

a) 1,5,2,6,3,4,8 or
b) 1,2,5,6,9,10,13,14 ?

Either way, it’s sort of in order. In mine, though, I am doing something that may be particularly bad…like this:

1,5,2,6,3,4,8,12,8,11,7,10,6,9,5

That is, I turn the corner at the end and go back the other way. I guess it might just be better to make multiple calls to glDrawElements?

– Zeno

1,5,2,6,3,4,8,12,8,11,7,10,6,9,5

I don’t think that actually works. You don’t get the 3, 7, 8 triangle.

In any case, don’t make multiple glDrawElements calls. Either strip normally and use degenerate strips to connect them, or use glMultiDrawElementsEXT.
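The degenerate-strip option can be sketched like this in C. It assumes a row-major grid with 0-based indices (so the 4x4 example above becomes 0..15) and strips that all run in the same direction; `build_stitched_strip` and `stitched_strip_count` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Index count for a rows x cols grid drawn as ONE triangle strip:
   2*cols indices per row pair, plus 2 stitching indices between
   consecutive row strips. */
static size_t stitched_strip_count(unsigned rows, unsigned cols)
{
    assert(rows >= 2 && cols >= 2);
    return (size_t)(rows - 1) * 2u * cols + (size_t)(rows - 2) * 2u;
}

/* Fill `out` (at least stitched_strip_count entries) and return the
   number of indices written. */
static size_t build_stitched_strip(unsigned rows, unsigned cols,
                                   unsigned *out)
{
    assert(rows >= 2 && cols >= 2);
    size_t n = 0;
    for (unsigned r = 0; r + 1 < rows; ++r) {
        if (r > 0) {
            out[n] = out[n - 1];   /* repeat last index of previous strip */
            ++n;
            out[n++] = r * cols;   /* repeat first index of this strip */
        }
        for (unsigned c = 0; c < cols; ++c) {
            out[n++] = r * cols + c;        /* vertex on the upper row */
            out[n++] = (r + 1) * cols + c;  /* vertex on the lower row */
        }
    }
    return n;
}
```

The repeated indices produce zero-area triangles, which draw nothing. Because each row strip has an even number of indices and two more are added per join, the winding order of the following strip is unchanged. For a 32x32 patch this comes to 2044 indices in one glDrawElements call, versus 32*(32-1)*2 = 1984 for the unstitched strips.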

Originally posted by Korval:
I don’t think that actually works. You don’t get the 3, 7, 8 triangle.

Just a typo. It should be 1,5,2,6,3,7,4…

Regardless of the reason I am amazed there is such a big performance difference.

EXT_multi_draw_arrays looks interesting (can’t believe I missed that), as does EXT_draw_range_elements. Has anyone done any benchmarking with these extensions? Do they make much difference?

btw Zeno, your flight sim engine looks excellent!