How fast is fast?

How fast should VBOs be?

Looking through the forum archives, I’ve found hints on speeding them up…but not what speeds are possible.

Currently, on a 2.6GHz P4 with a Radeon 9700Pro, I’m getting around 75MPoly/sec with static data, 1 light, and no textures. Vertices and normals are floats, interleaved in one VBO; colors are unsigned bytes, either interleaved with the vertices or in a separate VBO. I’m drawing with

glDrawElements(GL_TRIANGLES,size,GL_UNSIGNED_SHORT,0);

with indices in a GL_ELEMENT_ARRAY_BUFFER VBO.
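For reference, the interleaved layout described above (float position, float normal, unsigned byte color) could look roughly like the sketch below. The struct and helper names are made up for illustration, and it assumes 4-byte floats with no padding, which holds on common compilers but is not guaranteed by C. The GL calls are only shown in a comment, since they need a live context.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical interleaved vertex: 12 + 12 + 4 = 28 bytes
 * (assuming 4-byte floats and no compiler padding). */
typedef struct {
    float         position[3];
    float         normal[3];
    unsigned char color[4];
} Vertex;

/* Byte distance between consecutive vertices in the buffer. */
size_t vertex_stride(void)   { return sizeof(Vertex); }

/* Byte offsets of each attribute inside one vertex. */
size_t position_offset(void) { return offsetof(Vertex, position); }
size_t normal_offset(void)   { return offsetof(Vertex, normal); }
size_t color_offset(void)    { return offsetof(Vertex, color); }

/* With the VBO bound to GL_ARRAY_BUFFER, these values would be passed
 * along the lines of:
 *   glVertexPointer(3, GL_FLOAT, stride, (const GLvoid *)position_offset);
 *   glNormalPointer(GL_FLOAT, stride, (const GLvoid *)normal_offset);
 *   glColorPointer(4, GL_UNSIGNED_BYTE, stride, (const GLvoid *)color_offset);
 */
```

Under those assumptions the stride is 28 bytes, with the normal at byte 12 and the color at byte 24.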

With lighting off, it rises to about 105MPoly/sec.

(On an Athlon 1.4 with a GeForce 5200, I’m getting about the same speeds relative to the theoretical peaks: ~20M with lights and ~30M without.)

So is this as good as I’m likely to get, or should I keep looking for ways to speed it up?

You should always try to keep it fast if doing so is simple. However, 75M sounds impressive to me. I would say that as long as you get 40-60 fps, it’s OK.

Well, the ~20M I was getting with the first try sounded pretty fast to me… then I switched to 4-element vertices and got 40M+… then interleaved the vertices and normals instead to get to where it is now…

If a 3x speed boost was that easy, there is bound to be at least a bit more speed in there somewhere.

(and fps only helps if I have something specific to render… I’m still at the r&d stage, so need to figure out speeds so I can estimate poly budgets for stuff)
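For budgeting, the arithmetic is just measured throughput divided by the target frame rate. A rough sketch in C, where `tris_per_frame` is a made-up helper name and the inputs are benchmark numbers from this thread, not guarantees:

```c
/* Upper bound on triangles per frame, given a measured throughput
 * (triangles per second) and a target frame rate (frames per second).
 * Returns 0 for a non-positive frame rate instead of dividing by zero. */
double tris_per_frame(double tris_per_sec, double target_fps)
{
    if (target_fps <= 0.0) {
        return 0.0;
    }
    return tris_per_sec / target_fps;
}
```

With the 75M figure and a 60 fps target, that comes to 1.25M triangles per frame as a ceiling; state changes, fill rate, and swaps would all eat into it in a real scene.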

Originally posted by 3B:
How fast should VBOs be?

Fast enough so that it is not a bottleneck.

If a 3x speed boost was that easy, there is bound to be at least a bit more speed in there somewhere.

That’s usually not how optimization goes. Typically, you fix the large, easy-to-fix problems that give you a healthy speedup. After some point, however, you have to start fighting for every fraction of a millisecond. In some cases, you might spend an entire week’s worth of effort for a mere 0.5 milliseconds, and be glad to get that, whereas your first optimizations in that code gave you 2 milliseconds for 2 hours of work.

In your case, I tend to agree with everyone else. For the moment, vertex transfer is probably no longer your bottleneck, so you can stop worrying about it.

Note that it’s much easier to get zillions of vertices per second when you run at low frame rates and use a single render state. State changes, and frame swaps, cost a lot when measured in vertices/second terms, but are of course necessary to show something that’s actually interesting.

To figure out “how fast can it be,” do some math. If you use DYNAMIC_DRAW or STREAM_DRAW, then the buffer is likely in AGP memory. AGP 8x can do about 2 GB/s (assuming nothing else is in the way on the memory bus); divide that by the bytes of vertex and index data you send per triangle, and you get a theoretical maximum. Whether the vertex processing, triangle set-up, and other parts of the card can keep up is another question.
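That math can be sketched in a few lines of C. The function name and the `verts_per_tri` reuse factor are assumptions for illustration: 0.5 is roughly what a large regular mesh with perfect vertex reuse needs, 3.0 is the no-reuse worst case. The 2 GB/s figure is taken as 2e9 bytes/sec, and the bound only applies to data that actually crosses the bus each frame.

```c
/* Bus-limited upper bound on triangles per second.
 *   bus_bytes_per_sec : usable bus bandwidth in bytes/sec
 *   bytes_per_vertex  : size of one interleaved vertex
 *   verts_per_tri     : average vertices fetched per triangle (assumed)
 *   bytes_per_index   : 2 for GL_UNSIGNED_SHORT, 4 for GL_UNSIGNED_INT
 * Returns 0 if the per-triangle cost is not positive. */
double max_tris_per_sec(double bus_bytes_per_sec,
                        double bytes_per_vertex,
                        double verts_per_tri,
                        double bytes_per_index)
{
    /* Each triangle costs its share of vertex data plus three indices. */
    double bytes_per_tri = verts_per_tri * bytes_per_vertex
                         + 3.0 * bytes_per_index;
    if (bytes_per_tri <= 0.0) {
        return 0.0;
    }
    return bus_bytes_per_sec / bytes_per_tri;
}
```

For a 28-byte vertex with 16-bit indices, this gives about 100M tris/sec at 0.5 vertices per triangle and about 22M tris/sec at 3 vertices per triangle. Static data that lives in video memory would not be limited by this number at all.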

100 M Tris/sec sounds about right; I was reaching that with my terrain engine under ideal conditions (1 texture, no lights, no fog, no LOD). That was on an XP2000 with a GF5900u. That was with VAR, but it should be the same speed as VBO.

[This message has been edited by Adrian (edited 10-27-2003).]

I guess I misphrased my original question. I’m actually looking more at the entire geometry part of the pipeline than just the VBOs… i.e. am I formatting my data the way the hardware likes it, using the right calls to send it, etc.

At this point, I’m mainly working on narrow benchmark-type code, just to see what is possible on current hardware… I haven’t gotten to do much GL programming in a few years, and even then it was targeting non-T&L hardware, so I’ve got a bit of catching up to do.

Though given that Adrian is getting similar numbers, and nobody else saw any obvious problems with mine, I’ll assume for now that I’m not too far off and start working on learning ARB_vertex_program instead.