White paper about Vertex Buffer Objects on the NVIDIA website

Just for your info. I have seen that there is a new paper about VBO @ developer.nvidia.com

What, no direct link? Argh!

Originally posted by Elixer:
[b]What, no direct link? Argh!

[/b]

LET’S GET HIM!

-SirKnight

Actually I just looked over this paper and it seems to be pretty good. I’d like to see papers like this come out much sooner though. Like about the time VBO was first introduced. The presentations at that time confused the crap outa me and I never quite “got it” at first. Having this paper at that time would have got me going on VBO a lot sooner and quicker.

-SirKnight

*shrugs* For the basics I found that the spec was extremely helpful, much, much more so than the specs for -any- other extension. Those examples at the bottom showed me everything I needed to know to immediately replace my implementation of ATI_vertex_array_object.

Whew… took me this long to find the article.

Interesting note about PBO, and the fact that this document is still labeled ‘Nvidia confidential’. Where has this document been hiding?

yeah, i love seeing the pbo information… it essentially explains them, too…

can’t wait

Why, dave? What are you planning to do with the pbo extension that gets you so excited?

playing with it… what else?

and i’m especially waiting for the vendor-independent port of a friend’s app. and he needs PBO to do this. and i can’t wait to see his work running on my hw…

Originally posted by Zak McKrakem:
Just for your info. I have seen that there is a new paper about VBO @ developer.nvidia.com

This is some cool info. I didn’t expect glVertexArray() to be the sinner, performance-wise.

BTW, VBO rocks. I’ve been using it since the first time it was exposed in a driver.

About the mentioned GL_PIXEL_PACK_BUFFER and GL_PIXEL_UNPACK_BUFFER as new targets for PBO: are both really necessary? I would think a single GL_PIXEL_BUFFER target with a usage pattern of *_DRAW or *_READ should do the same thing?

Hampel

Originally posted by cschueler:
This is some cool info. I didn’t expect glVertexArray() to be the sinner, performance-wise.

It’s not too surprising, but I’m glad they’re being direct about it. OTOH, it goes against the claim that VBO mitigates the memory management problem. Expensive binding still pushes us to batch many objects into common buffers instead of having many independent VBOs. So yeah, we can allocate a dozen VBOs instead of one big [partitioned] VAR chunk, but not thousands of VBOs like some have suggested.
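
To make the batching point concrete, something like this is what I mean (untested sketch, vertex counts and offsets made up): one bind, then each object addressed by its byte offset.

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>

/* Untested sketch: two objects packed into one shared VBO, each addressed by
   a byte offset, so the (expensive) bind happens only once.  Counts and
   offsets are made up. */
void draw_two_objects_from_one_vbo(GLuint shared_vbo)
{
    glBindBuffer(GL_ARRAY_BUFFER, shared_vbo);
    glEnableClientState(GL_VERTEX_ARRAY);

    /* Object A: 300 vertices starting at byte 0 of the buffer. */
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)0);
    glDrawArrays(GL_TRIANGLES, 0, 300);

    /* Object B: 450 vertices starting at byte 3600 (300 verts * 12 bytes). */
    glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)3600);
    glDrawArrays(GL_TRIANGLES, 0, 450);
}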

I’ll have to test this, but it also seems to me after reading this that if Map/Unmap potentially demotes a buffer back to system memory, I may be better off keeping my parallel system RAM and AGP/Video copies of vertex data (i.e., for efficient arbitrary reads and faster draws). I’d hoped that specifying COPY (READ and DRAW) would do this dual-buffer trick internally, but I don’t get the sense that’s the case. Or am I reading too much into this?

Avi

I’ll have to test this, but it also seems to me after reading this that if Map/Unmap potentially demotes a buffer back to system memory

Yes, I’m a little worried about this too, for the same reason - I keep an application copy of the vertex data in system memory as well, for arbitrary reading and so the app can make changes to a single copy of the vertex data while the various renderers copy from that common store.
I think I’d have preferred it if, when the map call is made, we could specify a system memory pointer for the buffer to ‘unmap’ (copy) from once the unmap call is made, instead of being given a portion of memory to copy into.
Wouldn’t that potentially eliminate a redundant copy?
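
Right now the update path looks roughly like this (untested sketch; app_copy and bytes are placeholder names for that common store), and the memcpy is exactly the extra copy I’d like to avoid:

#define GL_GLEXT_PROTOTYPES 1
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Untested sketch of the current pattern: the app edits its own system-memory
   copy, then maps the VBO and copies the whole thing across by hand. */
void push_app_copy(GLuint vbo, const void *app_copy, size_t bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    void *dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (dst) {
        memcpy(dst, app_copy, bytes);  /* the copy an "unmap from this pointer" API could fold away */
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }
}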

> but it also seems to me after reading
> this that if Map/Unmap potentially
> demotes a buffer back to system memory,
> I may be better off keeping my parallel
> system RAM and AGP/Video copies

For me, Map/Unmap together with STREAM_DRAW works as advertised. I can hit the transform limit with 16 bytes / vertex over AGP 4x, interleaving CPU calculation and drawing calls.

for( … )
{
    glBufferData( …, 0, STREAM_DRAW );
    pointer = glMapBuffer( … );
    CPU_Calculation( pointer );
    glUnmapBuffer();

    glDrawElements( … );
}

It’s the simplest scheme to use. Better even than the DirectX DISCARD/NOOVERWRITE.
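
For anyone who wants it spelled out, a fleshed-out version of that loop might look like the following; the 16-byte vertex layout, the buffer size and fill_vertices() are stand-ins for the real CPU work, so treat it as an untested sketch, not the whitepaper's code.

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>

#define VERT_COUNT 10000                 /* made-up batch size */
#define BUF_BYTES  (VERT_COUNT * 16)     /* 16 bytes per vertex, as above */

/* Stand-in for the real per-frame CPU calculation. */
static void fill_vertices(float *dst, int count)
{
    for (int i = 0; i < count; ++i) {
        dst[i * 4 + 0] = (float)i;  /* x */
        dst[i * 4 + 1] = 0.0f;      /* y */
        dst[i * 4 + 2] = 0.0f;      /* z */
        dst[i * 4 + 3] = 0.0f;      /* pad out to 16 bytes */
    }
}

/* Untested sketch of the orphan-then-map streaming pattern from the loop above. */
void draw_streamed(GLuint vbo, GLuint ibo, GLsizei index_count)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* Re-specify the store with a NULL data pointer so the driver can hand
       back fresh memory instead of stalling on the copy still being drawn. */
    glBufferData(GL_ARRAY_BUFFER, BUF_BYTES, NULL, GL_STREAM_DRAW);

    void *ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (ptr) {
        fill_vertices((float *)ptr, VERT_COUNT);   /* CPU calculation */
        glUnmapBuffer(GL_ARRAY_BUFFER);
    }

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 16, (const GLvoid *)0);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, (const GLvoid *)0);
}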

Originally posted by Hampel:
[b]About the mentioned GL_PIXEL_PACK_BUFFER and GL_PIXEL_UNPACK_BUFFER as new targets for PBO: are both really necessary? I would think a single GL_PIXEL_BUFFER target with a usage pattern of *_DRAW or *_READ should do the same thing?

Hampel[/b]

These are just separate binding points. They have nothing to do with the usage hint.

It’s true that you don’t “need” separate binding points, but they make the design cleaner, IMO.
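
A sketch of how the two targets would get used, assuming the final PBO tokens keep the names mentioned above (pack = readback into a buffer, unpack = upload out of a buffer). Treat the details as guesses, not the extension text:

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>

/* Guesswork sketch: the pack target makes glReadPixels write into a buffer
   object, the unpack target makes glTexSubImage2D read from one.  The final
   0 in each call is then an offset into the bound buffer, not a pointer. */
void pbo_pack_and_unpack(GLuint pack_buf, GLuint unpack_buf, GLsizei w, GLsizei h)
{
    /* Readback path: framebuffer -> buffer object. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pack_buf);
    glReadPixels(0, 0, w, h, GL_BGRA, GL_UNSIGNED_BYTE, (GLvoid *)0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

    /* Upload path: buffer object -> texture. */
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, unpack_buf);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_BGRA, GL_UNSIGNED_BYTE, (const GLvoid *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}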

Originally posted by knackered:
Why, dave? What are you planning to do with the pbo extension that gets you so excited?

Perhaps that it’s a DECENT replacement for the crappy render texture extension we have now? Gah, I HATE WGL_ARB_render_texture.

From the whitepaper: “Use glDrawRangeElements instead of glDrawElements”.

Hmm, I use glMultiDrawElementsEXT, so I could really do with a glMultiDrawRangeElementsEXT.
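
For reference, the switch the whitepaper is asking for is just the extra min/max index pair, something like this (untested sketch; the caller has to know the real index range):

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>

/* Untested sketch: same draw, but glDrawRangeElements also tells the driver
   the smallest and largest vertex index the index list touches. */
void draw_batch(GLuint min_index, GLuint max_index,
                GLsizei index_count, const GLushort *indices)
{
    /* was: glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, indices); */
    glDrawRangeElements(GL_TRIANGLES, min_index, max_index,
                        index_count, GL_UNSIGNED_SHORT, indices);
}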

I think I’d have preferred it if, when the map call is made, we could specify a system memory pointer for the buffer to ‘unmap’ (copy) from once the unmap call is made, instead of being given a portion of memory to copy into.

Isn’t that what glBufferSubData does? It takes a pointer and copies that into the VBO.
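
i.e. something along these lines (untested sketch; app_vertices and byte_count are just placeholder names for the application's own copy):

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>

/* Untested sketch: push an application-side system-memory copy into an
   existing VBO without ever mapping it.  The driver does the copy. */
void upload_from_app_copy(GLuint vbo, const void *app_vertices,
                          GLsizeiptr byte_count)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0, byte_count, app_vertices);
}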

My concern stems from my current use of VBOs. I do a lot of streaming from the hard disk. The behavior I would like to see is for the VBOs to remain in video memory at all times, as I frequently (maybe every 5-20 seconds) upload new information to them. When I do an upload, it will be to VBOs that are not actually being rendered from.

I don’t think STREAM_DRAW is appropriate for this circumstance. And DYNAMIC_DRAW, on nVidia hardware, seems to want to use AGP memory rather than video. The only thing that’s left seems to be STATIC_DRAW, but will this cause problems (as the driver expects the data to be uploaded only once)?
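
Concretely, the update I have in mind is just a full re-specification of a buffer that isn't currently being rendered from, roughly like this (untested sketch, names made up); whether STATIC_DRAW tolerates being restated every few seconds is exactly the question:

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>

/* Untested sketch: every 5-20 seconds, refill an idle VBO with data that was
   just streamed off disk.  disk_data and chunk_bytes are placeholders. */
void refresh_idle_vbo(GLuint idle_vbo, const void *disk_data,
                      GLsizeiptr chunk_bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, idle_vbo);

    /* Full re-specification: old contents are discarded and the usage hint
       is restated.  The open question is whether STATIC_DRAW keeps this in
       video memory even though it gets re-uploaded periodically. */
    glBufferData(GL_ARRAY_BUFFER, chunk_bytes, disk_data, GL_STATIC_DRAW);
}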

Perhaps that it’s a DECENT replacement for the crappy render texture extension we have now? Gah, I HATE WGL_ARB_render_texture.

How would PBO be any faster than copying to a texture? And, therefore, how would this be a decent alternative to ARB_render_texture, which (at least on ATi cards) is faster?

Originally posted by cschueler:
[b]For me, Map/Unmap together with STREAM_DRAW works as advertised. I can hit the transform limit with 16 bytes / vertex over AGP 4x, interleaving CPU calculation and drawing calls.

for( … )
{
    glBufferData( …, 0, STREAM_DRAW );
    pointer = glMapBuffer( … );
    CPU_Calculation( pointer );
    glUnmapBuffer();

    glDrawElements( … );
}
[/b]

Interesting. I’m guessing your access flags to glMapBuffer are WRITE_ONLY, which wouldn’t need to force a demotion. You might see what happens if you change that to READ_WRITE, even without actually reading the memory.

The question for me is whether the whole buffer gets permanently demoted, or whether there now exist two parallel copies, one readable and one in faster memory. Of course, that implies that Unmap does a copy back to the fast buffer when you’re done editing.

I guess the other thing to test is glBufferSubData: whether, if there are N small changes to a buffer, it pays to do N glBufferSubData calls or just one big call for the whole buffer, and where the cutoff N is. Anyone tried that?
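
Something like the two paths below is what I’d time (untested sketch; Region, region_count and the whole-buffer copy are all made up for the comparison):

#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <GL/glext.h>

/* Untested comparison sketch: update the same buffer either as N small
   regions or as one whole-buffer copy, and time the two paths externally. */
typedef struct {
    GLintptr    offset;
    GLsizeiptr  size;
    const void *data;
} Region;

void update_n_small(GLuint vbo, const Region *regions, int region_count)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    for (int i = 0; i < region_count; ++i)
        glBufferSubData(GL_ARRAY_BUFFER, regions[i].offset,
                        regions[i].size, regions[i].data);
}

void update_one_big(GLuint vbo, const void *whole_copy, GLsizeiptr total_bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0, total_bytes, whole_copy);
}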

Avi