Is it possible and reasonable to combine VBO with other render modes?

Hi,

I read a while ago, in one of the Game Programming Gems books, that it only makes sense to use VBOs on high-poly models with more than 3000 tris…

Now I’ve played around a bit with VBOs and some other extensions as possible render modes… In the last few hours I’ve tried to combine both modes: high-poly models are rendered via VBO, low-poly models via one of the other extensions. Everything works fine until I combine the two…

When I use the ATI_vertex_array_object extension for a sphere (200 tris) and the VBO extension for a high-poly model (more than 72,000 tris), I only see the model from the last render pass…

Correct me if I’m wrong, but could it be that each render path clears the frame buffer, making it impossible to combine the modes?

When I use the same mode for both models, everything works fine…

Thanks for any hint,
Christian

VBO is meant to fully replace any other mode of rendering vertex data. Do not try to combine it with ATI_VAO or NV_VAR. You should use VBO’s for everything; in theory, it should be no slower than non-VBO rendering.
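
For reference, the VBO-only path looks roughly like this. This is just a minimal sketch using the ARB_vertex_buffer_object entry points; it assumes a current GL context, that the extension’s function pointers have already been obtained, and the buffer names and sizes are placeholders:

[code]
#include <GL/gl.h>
#include <GL/glext.h>   /* GL_ARRAY_BUFFER_ARB, GL_STATIC_DRAW_ARB, ... */

static GLuint vbo, ibo;  /* created once, at load time */

void create_buffers(const GLfloat *verts, GLsizei vert_bytes,
                    const GLushort *indices, GLsizei index_bytes)
{
    glGenBuffersARB(1, &vbo);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, vert_bytes, verts, GL_STATIC_DRAW_ARB);

    glGenBuffersARB(1, &ibo);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, ibo);
    glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, index_bytes, indices, GL_STATIC_DRAW_ARB);
}

void draw(GLsizei index_count)
{
    /* With the buffers bound, the "pointer" arguments are byte offsets. */
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, ibo);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);
    glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, (const GLvoid *)0);
    glDisableClientState(GL_VERTEX_ARRAY);

    /* Bind buffer 0 afterwards so any later client-side vertex arrays are
       treated as real pointers again; forgetting this is a common cause of
       broken rendering when mixing VBOs with other submission paths. */
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
}
[/code]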

Ok, thanks Korval for your reply.

My opinion differs ;)
VBOs should replace VAO, VAR and geometry in display lists, I agree.

But I think there are still valid applications for
a) vertex arrays in system memory
For situations with very complex vertex shaders and/or high pressure on card memory.

b) immediate mode
For one-shot, dynamically generated geometry.

Remember that all that VBOs give you is a higher bandwidth path between vertex storage and the graphics chip. Points to note:
1) Copying the data to a VBO is never free, and that may completely outweigh the advantages of the VBO mechanism, depending on circumstances. Especially if the data in the VBO is never reused.

2) Modifying a vertex array in system memory is much faster than modifying VBO contents. It’s simply closer to the CPU, with relaxed caching and alignment restrictions. If you “map” a VBO, you may in fact end up getting a system memory shadow of the “true” VBO, which is fast, but implies yet another copy (see the sketch after this list).

3) VBOs don’t make vertex processing itself faster. If you have a 50+ instruction vertex shader on a current card, the AGP is likely fast enough to handle whatever amount of vertices the chip can process.

4) VBOs consume card memory. You may be better off spending that on more textures, PBuffers, and whatever data structures the driver can put there.
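
To make points 1 and 2 concrete, here are the two usual ways of pushing fresh data into an existing VBO. A rough sketch only, assuming the buffer object `vbo` already exists and that `verts` and `bytes` are placeholders:

[code]
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Respecify part of the buffer: the driver performs the copy for you. */
void update_with_subdata(GLuint vbo, const GLfloat *verts, GLsizei bytes)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, bytes, verts);
}

/* Map the buffer and write into it yourself.  The returned pointer may be
   uncached AGP/video memory, or a system memory shadow that the driver
   copies to the real buffer later -- either way, the copy is not free. */
void update_with_map(GLuint vbo, const GLfloat *verts, GLsizei bytes)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    void *dst = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    if (dst) {
        memcpy(dst, verts, bytes);   /* write sequentially, never read back */
        glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
    }
}
[/code]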

VBOs don’t make vertex processing itself faster. If you have a 50+ instruction vertex shader on a current card, the AGP is likely fast enough to handle whatever amount of vertices the chip can process.

No. VBO’s, in theory, should use far less CPU time than regular vertex arrays. Regular vertex array data has to be copied to AGP or video memory, then a token has to be inserted into the command stream to tell the card where to render from. With VBO’s, all that needs to happen is the insertion of the token. You may not be getting around the GPU bottleneck, but you’re getting better async rendering and more CPU time.

VBOs consume card memory.
That’s for the driver to decide. A smart driver can page out unused VBO’s to main memory and page them in when they are used.

You may be better off spending that on more textures, PBuffers, and whatever data structures the driver can put there.
And how is the user going to know when the graphics card is running low on resources? The user isn’t told nearly enough to be able to make informed decisions about this sort of thing; as such, it is best left up to the graphics card.

Originally posted by Korval:
[quote]VBOs don’t make vertex processing itself faster. If you have a 50+ instruction vertex shader on a current card, the AGP is likely fast enough to handle whatever amount of vertices the chip can process.

No.[/quote]
Why not? You’re making a point about CPU usage, and that’s valid. But that doesn’t make my statement wrong.

VBO’s, in theory, should use far less CPU time than regular vertex arrays. Regular vertex array data has to be copied to AGP or video memory, then a token has to be inserted into the command stream to tell the card where to render from. With VBO’s, all that needs to happen is the insertion of the token. You may not be getting around the GPU bottleneck, but you’re getting better async rendering and more CPU time.
There are circumstances where CPU time just doesn’t matter much. When your application performance is limited by vertex processing, you’ve got one.

I wouldn’t be so sure about the exact transfer mechanism. There is certainly more than one way to do it. But I won’t go there. Because I explicitly limited this issue to highly expensive vertex processing, it simply doesn’t matter. It does matter, of course, if you remove that vertex processing bottleneck. I never claimed anything else.

(card memory)
That’s for the driver to decide. A smart driver can page out unused VBO’s to main memory and page them in when they are used.

(better spent on textures etc.)
And how is the user going to know when the graphics card is running low on resources? The user isn’t told nearly enough to be able to make informed decisions about this sort of thing; as such, it is best left up to the graphics card.
The user doesn’t need to know. But there should be no objection when I say that paging mechanisms get more efficient if you reduce the problem complexity. Less data to manage equals better paging, always.

You shouldn’t bite a chunk out of a limited resource for something that doesn’t need to be put there in the first place, regardless of how much of that resource you may have left. You shouldn’t allocate a VBO just because you can. You should do it because whatever you are going to do can benefit from the VBO mechanism.

I’m a big fan of VBOs, but it just isn’t true that they are universally and always the “best” solution. VBOs can be misused, too. Try this.

It’s an issue of using the right tool for the job. And there are times when a different tool does the job better, or just as well, but at a higher bang-for-the-buck ratio.

If vertices are left in AGP, and your CPU is memory limited, then the vertices will compete with the CPU for reading out of main memory. Thus, even if your shader is long, if there are more useful things you can do with the CPU, it makes sense to offload vertices, assuming you don’t start paging. If you start paging, well, nothing’s going to save you no matter what, except letting the user choose the “reduce level of detail for everything” option the next time around :)

When your application performance is limited by vertex processing, you’ve got one.
Sure. If all your application does is throw vertices at the card, then this is possible.

However, most programs that render 3D graphics tend to do more than that. CAD and other graphics creation apps need to be listening to the mouse and doing other tasks. Games, obviously, have tons of other stuff they can be doing.

So, unless you’re just making a graphics demo, you want all the CPU time you can get.

You shouldn’t allocate a VBO just because you can. You should do it because whatever you are going to do can benefit from the VBO mechanism.
However, as I pointed out, you’re always going to benefit from VBO’s. At the very least, you claim some CPU time that you can use to make your app more responsive/use that new physics system/better collision detection/etc.

VBOs can be misused, too. Try this.
Pathological use of the API is pathological use of the API, whether it is VBOs, textures, or anything else. If you’re doing something stupid, you can always expect bad performance.

I’ve done a fair amount of benchmarking on VBO recently.
The upshot is, if you’re drawing a small number of large batches, then VBO’s are the best option, because of their ability to be updated quickly. However, if you’re drawing a large number of small batches, then display lists are the best option, because the setup cost of a glCallList is far smaller than setting up VBO offsets and calling glDrawElements on lots of tristrips, for instance. Sometimes you don’t have much control over your source data and its layout, so it’s best to let the display list compiler sort out badly arranged data.
Display lists are highly optimised these days, so long as there aren’t any state changes compiled into them. I suppose this is because it gives the driver the opportunity (and the time, as nobody ever expects d/l compilation to be quick) to convert the data into a layout optimal for that hardware. Sort of like the ethos behind the GLSL driver compilation model.
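
To illustrate the per-batch overhead I’m talking about, here is a rough sketch of the two paths. It assumes the geometry is static, the VBO entry points are loaded, and the strip arrays, counts and offsets are placeholders:

[code]
#include <stddef.h>
#include <GL/gl.h>
#include <GL/glext.h>

#define BUFFER_OFFSET(bytes) ((const GLubyte *)NULL + (bytes))

/* Display list path: the per-strip work is baked in at compile time, so the
   whole object becomes a single glCallList per frame. */
GLuint compile_object(const GLfloat *verts, const GLushort *const *strips,
                      const GLsizei *strip_len, int num_strips)
{
    int i;
    GLuint list = glGenLists(1);

    /* Client state is executed immediately, not compiled; assumes no VBO is
       bound, so verts is a real system memory pointer. */
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, verts);

    glNewList(list, GL_COMPILE);
    for (i = 0; i < num_strips; ++i)   /* vertex data is dereferenced and baked in */
        glDrawElements(GL_TRIANGLE_STRIP, strip_len[i], GL_UNSIGNED_SHORT, strips[i]);
    glEndList();

    glDisableClientState(GL_VERTEX_ARRAY);
    return list;
}

void draw_object_dlist(GLuint list)
{
    glCallList(list);                  /* one call per frame */
}

/* VBO path: the pointer setup and one glDrawElements per strip are paid
   every frame, which adds up with lots of tiny tristrips. */
void draw_object_vbo(GLuint vbo, GLuint ibo, const GLsizei *strip_len,
                     const GLsizei *strip_offset_bytes, int num_strips)
{
    int i;
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, ibo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, BUFFER_OFFSET(0));

    for (i = 0; i < num_strips; ++i)
        glDrawElements(GL_TRIANGLE_STRIP, strip_len[i], GL_UNSIGNED_SHORT,
                       BUFFER_OFFSET(strip_offset_bytes[i]));

    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
}
[/code]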

Originally posted by Korval:
[quote]Sure. If all your application does is throw vertices at the card, then this is possible.

However, most programs that render 3D graphics tend to do more than that. CAD and other graphics creation apps need to be listening to the mouse and doing other tasks. Games, obviously, have tons of other stuff they can be doing.[/quote]
That’s why I specified my points the way I did. Including the ifs and reasons.

If I condense them further, can you agree to these basic points?
1) Copying the data to a VBO is never free.
2) Modifying a vertex array in system memory is much faster than modifying VBO contents.
3) VBOs don’t make vertex processing itself faster.
4) VBOs consume card memory.

These aren’t obscure, irrelevant issues. If I wanted to go there, I would have told the fairy tale of reduced local bandwidth consumption and consequently higher effective fillrate (especially with active blending) if you don’t put your vertex data in card memory. As a matter of fact, this is both correct and entirely irrelevant in practice.

OTOH the above points are not irrelevant in practice.

So, unless you’re just making a graphics demo, you want all the CPU time you can get.
Maybe I want to make a graphics demo, and maybe I don’t. You shouldn’t need to care. All I wanted to do was point out a few pitfalls that may or may not be a problem for an application. It just depends. It always does.

However, as I pointed out, you’re always going to benefit from VBO’s. At the very least, you claim some CPU time that you can use to make your app more responsive/use that new physics system/better collision detection/etc.
<=>
Pathological use of the API is pathological use of the API, whether it is VBOs, textures, or anything else. If you’re doing something stupid, you can always expect bad performance.
How does that mix? If I can misuse VBOs in a way that reduces performance, how can VBOs always benefit me? You’re contradicting yourself.

I agree that the usage model behind that link is just wrong. Which brings me back to the reason why I initially posted to this thread. Had the implementor of this “stupid” usage model known that “copying the data to a VBO is never free”, he might have understood earlier why it didn’t work out, and could have chosen to do things differently.

If Hellhound planned on doing the same thing, after reading pure unconditional praise about VBOs, that would not have been stupid, because he asked expecting good advice, and no one told him about the potential issues.

I have a pet project that doesn’t benefit from VBOs either. I know that for a fact, because I have a VBO path in there that I can just turn on with the flick of a preprocessor macro, and it has always been worse than plain system memory arrays, or even “mutated” immediate mode (mutated := glVertex3fv), on every driver I’ve tried, practically since the VBO extension was first officially released up until today. I’m entirely certain that I didn’t just make it all up.

I really don’t get what’s going on here, Korval. You’re not going to prove me wrong by telling me what requirements, in your opinion, a “proper” application should have.

If I condense them further, can you agree to these basic points?
1) Copying the data to a VBO is never free.
2) Modifying a vertex array in system memory is much faster than modifying VBO contents.
3) VBOs don’t make vertex processing itself faster.
4) VBOs consume card memory.
1: Compared to what? The copy that the driver is going to do when rendering with regular vertex arrays? Effectively, doing that VBO copy is no different from using regular vertex arrays, so my statement that VBO’s are at least as fast as regular vertex arrays still stands.

2: Once again, compared to what? The driver is still going to have to do a copy, which is where the CPU eating problem happens. And that copy happens each time that vertices are built. The only problem is if the user maps the VBO and starts writing randomly to a pointer that could be in uncached or AGP memory.

3: True, but neither does anything else.

4: That’s not necessarily true. The VBO implementation on TNTs doesn’t have to consume video or AGP memory. Since those cards use software T&L at all times, the vertex data has to be in main memory anyway. If the driver doesn’t want to use video memory, it won’t. Where VBOs (and textures, etc.) go is up to the driver.

OTOH the above points are not irrelevant in practice.
Are they? The stuff on copying data is pretty irrelevant. Most data isn’t being constantly generated. And the data that is can be uploaded at the same speed as regular vertex arrays (in theory).

How does that mix? If I can misuse VBOs in a way that reduces performance, how can VBOs always benefit me? You’re contradicting yourself.
Because I assume that the user isn’t an idiot. It’s like asking whether you should use texture objects, or just keep uploading the texture with glTexImage calls each time you need to “bind” a new texture. There’s a clearly right answer and a clearly wrong one.
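
In code, the analogy looks roughly like this (a sketch only; the pixel data and sizes are placeholders):

[code]
#include <GL/gl.h>

/* The clearly right answer: create a texture object once, then just bind it. */
GLuint create_texture(const GLubyte *pixels, GLsizei w, GLsizei h)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    return tex;
}

void use_texture(GLuint tex)
{
    glBindTexture(GL_TEXTURE_2D, tex);   /* cheap: no data transfer */
}

/* The clearly wrong answer: re-upload the whole image every time you want
   to "switch" textures, paying a full transfer each time. */
void use_texture_the_slow_way(const GLubyte *pixels, GLsizei w, GLsizei h)
{
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}
[/code]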

If Hellhound planned on doing the same thing, after reading pure unconditional praise about VBOs, that would not have been stupid, because he asked expecting good advice, and no one told him about the potential issues.
Once again, that goes under the, “The person I’m talking to isn’t an idiot” area. This is the “Advanced” forum; it should go without saying that shuffling memory around is slower than not doing so.

I’m entirely certain that I didn’t just make it all up.
I would say that either the drivers are doing something stupid, or that your code is violating some assumption about how VBO’s are to be used. There is nothing in the VBO spec that would ever cause proper use of the API to be slower than vertex arrays. As such, if something is causing this to be the case, then it is either the driver or the user who is at fault.

Actually, come to think of it, there’s a third reason: “bad” vertex formats. Drivers can have some pretty odd restrictions on vertex formats and alignment. Trying to use shorts and bytes can work on some cards and not on others (and by “work”, I mean at a decent speed). These are due to hardware limitations, which don’t change that often. Four-byte-aligned floats are the only format that is really guaranteed to work with good performance.
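
Something along these lines is the safe layout: interleaved, all floats, every attribute 4-byte aligned (the struct and attribute set are only an example):

[code]
#include <stddef.h>   /* offsetof */
#include <GL/gl.h>
#include <GL/glext.h>

/* All-float, 4-byte aligned attributes, 32-byte stride. */
typedef struct {
    GLfloat pos[3];
    GLfloat normal[3];
    GLfloat texcoord[2];
} Vertex;

/* Assumes a VBO holding an array of Vertex is bound to GL_ARRAY_BUFFER_ARB,
   so the "pointers" below are byte offsets into that buffer. */
void set_vertex_format(void)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glVertexPointer(3, GL_FLOAT, sizeof(Vertex),
                    (const GLvoid *)offsetof(Vertex, pos));
    glNormalPointer(GL_FLOAT, sizeof(Vertex),
                    (const GLvoid *)offsetof(Vertex, normal));
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex),
                      (const GLvoid *)offsetof(Vertex, texcoord));
}
[/code]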