VBO Tutorial ready for review!

Okay, I said I would do it, I’ve done it.

http://206.251.36.107/programming/vbointro.mhtml

This is a (fairly polished) draft using three separate demos, one with just simple geometry, one using lighting, and one with textures. Right now it’s on a mod_perl vps which has been having some “out of my hands” problems lately, so if anyone gets a time out and cannot access the site, let me know and I will have to move a static version of the pages to somewhere else.

If you are not familiar with VBOs and are interested, it is certainly complete enough for you to read and learn from. I would very much appreciate feedback on which parts of the tutorial are confusing or unclear.

There are a few places where I’ve left [???] in brackets. That’s a clue that your input would be appreciated, if you already know this stuff and can provide some clarification or details.

It will probably take about 1/2 hour to read.

One further issue is that this is still a “fixed pipeline” method, which is also being obsoleted. I think that is okay, since it will probably be easier for people to learn VBOs and then shaders rather than both at once. The relevance here is with respect to texturing since, as I point out, using the old functions you cannot have more than one texture per VBO. I’ve never used shaders (that’s my next task, having found a copy of the 2nd ed. “orange book” at the library). Anyway, if someone could confirm that using GLSL with VBOs circumvents this apparent problem, that would be great – even better if you can come up with a couple of sentences explaining this for the first paragraph of page 3.

Anyway, I think you’ll be impressed, I put some time into this.

I’m not familiar with this limitation. Even in the fixed function world, most cards will let you bind multiple textures to different texture units and provide multiple sets of texture coordinates for each vertex.

Also, I took a quick read of your tutorial. Here is some feedback.

  • In regards to your first example, glRotatef is also removed in 3.2.

every function call in C/C++ requires a stack be created, then freed at the end

To the best of my knowledge this isn’t true. A stack frame (not a stack) needs to be set up but, generally, memory allocation does not occur. It really just copies some values and moves a pointer by an offset. Function call overhead can still be an issue though.

Rather than re-describing the object every frame, we describe it once, store the data in a buffer (literally, a float array), and perform frame by frame transformations on the existing data to move and rotate the object. What’s more – that buffer is kept in video memory

You focus on using VBOs for static data. They are also useful for dynamic and streaming data. You should make your scope clear, otherwise people might think VBOs are only used for static data. Also, I wouldn’t claim that the buffer is kept in video memory. Hopefully it is, but to be safe, call it driver controlled memory.

In fact, we should really either break this into two buffer objects and do something similar here, or else use GL_QUADS

I believe quads and quad strips are removed in 3.2. Consider changing the example to use an indexed triangle list, since that is the most useful. You can also render a box with a single triangle strip and no index data but that might be too fancy for a simple example.

Regards,
Patrick
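
As a rough illustration of the indexed triangle list suggested above, here is a minimal sketch in plain C. The helper name and the vertex ordering are assumptions made for the example, not something taken from the tutorial: each quad is assumed to own four consecutive vertices listed in order around its perimeter.

```c
#include <stddef.h>

/* Hypothetical helper (not from the tutorial): expand quads into an
 * indexed triangle list. Quad q is assumed to own vertices 4q..4q+3,
 * listed in order around its perimeter. Writes 6 indices per quad into
 * `out` (which must hold 6 * quad_count entries) and returns how many
 * indices were written. */
size_t quads_to_triangle_indices(size_t quad_count, unsigned short *out)
{
    size_t n = 0;
    for (size_t q = 0; q < quad_count; ++q) {
        unsigned short base = (unsigned short)(4 * q);
        /* first triangle: corners 0, 1, 2 of the quad */
        out[n++] = base;
        out[n++] = (unsigned short)(base + 1);
        out[n++] = (unsigned short)(base + 2);
        /* second triangle: corners 0, 2, 3 of the quad */
        out[n++] = base;
        out[n++] = (unsigned short)(base + 2);
        out[n++] = (unsigned short)(base + 3);
    }
    return n;
}
```

For a cube built from 6 quads this yields 36 indices over 24 vertices. The index array would then go into a buffer bound to GL_ELEMENT_ARRAY_BUFFER and be drawn with something like glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_SHORT, 0).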

Let me say this: I am impressed !

I read the first one entirely and skipped through the other two, and it looked very good. The step-by-step approach is exactly what newbies need (it reminded me of nehe.gamedev.net, which i used to learn GL back then).

There are a few sentences, where it becomes apparent, that you are still learning this stuff yourself and don’t know all the details of GL or C, but as far as i could see there were no false statements and it won’t stop beginners from learning what the tutorial is about.

About the “one texture per VBO”-thing: It is correct that you can only bind one set of textures at a time, which means all geometry rendered in one draw-call (glDrawArrays etc.) will use this set (and in fact the same holds for ALL states, even shaders, so shaders cannot change this).
However, to use several different textures, you do not need to generate a VBO per texture that you want to use. Instead you need to make several drawcalls. With each drawcall you define where to start reading in the VBO and how many triangles to draw. That means you need to bind texture A, render the VBO’s range (x1 - y1), bind texture B, render the range (x2 - y2), and so on.

For better performance people sort / group all triangles in their VBOs so that all triangles that use the same texture are located at the same position in the VBO, so that you can render all of them with one drawcall. But for starters that’s not necessary.
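
The grouping described above can be sketched in plain C. The struct and function names here are hypothetical, and the sketch assumes one texture id per triangle with the triangles already sorted so that equal textures sit next to each other.

```c
#include <stddef.h>

/* One draw call's worth of triangles sharing a texture. */
typedef struct {
    unsigned texture;   /* texture id to bind before drawing */
    size_t first_tri;   /* index of the first triangle in the run */
    size_t tri_count;   /* number of consecutive triangles in the run */
} DrawBatch;

/* tri_texture[i] is the texture used by triangle i. Each run of
 * consecutive triangles with the same texture becomes one batch.
 * Fills `out` (capacity `max_batches`) and returns the number of
 * batches written. */
size_t build_batches(const unsigned *tri_texture, size_t tri_count,
                     DrawBatch *out, size_t max_batches)
{
    size_t n = 0;
    for (size_t i = 0; i < tri_count; ++i) {
        if (n > 0 && out[n - 1].texture == tri_texture[i]) {
            out[n - 1].tri_count++;      /* extend the current run */
        } else {
            if (n == max_batches)
                break;                   /* no room for another batch */
            out[n].texture = tri_texture[i];
            out[n].first_tri = i;
            out[n].tri_count = 1;
            n++;
        }
    }
    return n;
}
```

At draw time each batch would map to one texture bind and one draw call over a sub-range of the same VBO, for example glBindTexture(GL_TEXTURE_2D, b.texture) followed by glDrawArrays(GL_TRIANGLES, 3 * b.first_tri, 3 * b.tri_count).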

I suggest you learn shaders soon, they have a big influence on how the pipeline works. When you feel confident with them, people would certainly like to see more tutorials, such things are always very welcome.

Jan.

For simple stuff I still use immediate mode. When a lot of geometry is involved, I use Display Lists. Are VBOs faster than Display Lists?

Thanks.

Are VBOs faster than Display Lists?

They’re more flexible than DLs. And more likely to work on more platforms (ATI in particular). VBOs get exercised more in games than DLs.

Are VBOs faster than Display Lists?

In my experience, usually not (on NVidia). It depends on a lot of things, not yet documented in any one place, and some likely never publicly documented at all.

For instance, one thing stock VBOs can’t do that Display Lists can is avoid all the name-to-pointer lookups needed to latch the vertex attributes, element index lists, etc. for rendering. However, apply NVidia’s bindless graphics extensions on top of VBOs and you can. This provides a nice speedup in real-world apps (not 7X, but still decent), when there are lots of batches.

For another, Display Lists (which are built using GPU vendor code) can reorganize/reformat/repack batches to forms which are maximally efficient for your GPU. With VBOs? Well, you have to do that, so hope you’re as in-the-know as your vendor’s driver gurus.

In other words, no.

VBOs get exercised more in games than DLs.

Please, show us where you’re getting your stats. And which game are you currently working on?

For instance, one thing stock VBOs can’t do that Display Lists can is avoid all the name-to-pointer lookups needed to latch the vertex attributes, element index lists, etc. for rendering.

So can a VAO.

For another, Display Lists (which are built using GPU vendor code) can reorganize/reformat/repack batches to forms which are maximally efficient for your GPU. With VBOs? Well, you have to do that, so hope you’re as in-the-know as your vendor’s driver gurus.

And when ATI makes their display list implementation anything more than the absolute bare minimum necessary for them to call it an OpenGL implementation, let me know. Until then, my recommendation stands: use buffer objects.

What can happen is nice, but what does happen is what matters. And what does happen is that ATI does as little as they can possibly get away with. Relying on them to optimize more than one rendering path is asking for trouble.

Better to stick with the well-trodden path of buffer objects.

Please, show us where you’re getting your stats.

I’m fairly sure the Doom 3 engine doesn’t use display lists for geometry, instead preferring buffer objects. It’s only one engine, but there aren’t very many OpenGL engines out there. And even fewer with any real behind-the-scenes knowledge about them.

In terms of freeware engines, Ogre3d uses buffer objects rather than display lists. I’m pretty sure Torque doesn’t use DLs either. I didn’t bother checking the others.

And here is what does happen:

Frame draw times (in ms, random scene):

VBOs              : 18.5ms
VBOs+VAOs         : 17.2ms  (gain:  7%)
VBOs+bindless     : 15.8ms  (gain: 15%)
VBOs+VAOs+bindless: 14.0ms  (gain: 24%)

Display Lists     : 10.7ms  (gain: 42%)

Note that bindless above is only for vertex attribute and element index arrays. No shader bindless applied yet. Also this is on a latest gen relatively fast CPU/mem/NVidia GPU with large CPU caches. The gains should be larger on slower CPUs/mems and smaller caches.
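
The gain column appears to be the reduction in frame time relative to the plain VBO figure. That reading is an assumption, but it reproduces the posted percentages after rounding. A minimal sketch of the arithmetic:

```c
/* Relative gain of `t_ms` over `baseline_ms`, as a percentage of the
 * baseline frame time. Assumed formula; the post does not state it. */
double gain_percent(double baseline_ms, double t_ms)
{
    return 100.0 * (baseline_ms - t_ms) / baseline_ms;
}
```

For example, gain_percent(18.5, 10.7) comes out a little over 42, matching the Display Lists row.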

ATI does as little as they can possibly get away with.

Wow. Before that comment, I was beginning to think you worked for them. :wink: Fighting so hard for performance on ATI.

The fact is, while some apps must strike the best balance of cross-vendor performance between ATI, NVidia, Intel, iPhone, etc. (and thus must settle for lowest-common-denominator capability), others (like ours) just don’t care at all! They can just pick the best vendor/GPU at the time and go with it.

You need to stop assuming that all your readers care what performance looks like on ATI cards, and caveat your assertions. Not everyone is dumbing down their app to the single lowest common denominator, and some even support “multiple” fast paths ( :eek: )

And how better to encourage industry competition than by exposing what vendors are doing well (or poorly), so other vendors can improve, if they care.

You need to stop assuming that all your readers care what performance looks like on ATI cards, and caveat your assertions.

Wait. You’re saying that the number of readers who care about general OpenGL performance (the performance of a cross-platform graphics API across, you know, all platforms) is smaller than the number of readers who can force their users to use NVIDIA hardware and drivers. Are you kidding?

Do you have any statistics to back that up? Because it seems highly unlikely on the face of it.

This is an OpenGL forum, not an NVIDIA_GL forum. Here, we don’t assume that everyone can force the use of NVIDIA hardware on their users. Therefore, if you want to talk about NVIDIA-specific performance characteristics, it is you who needs to qualify your assertions. You need to remember that NVIDIA isn’t the God of Graphics. And you need to remember that OpenGL is first and foremost cross-platform.

And how better to encourage industry competition than by exposing what vendors are doing well (or poorly), so other vendors can improve, if they care.

Do you honestly think it’s a secret that ATI’s implementation is the bare minimum? NVIDIA has had the best GL implementation since they started writing them. Nothing has changed in this for over a decade.

ATI cares about OpenGL exactly and only as much as they have to. Which means they care about the paths that big OpenGL software uses. And big OpenGL software uses buffer objects, not display lists. Therefore, ATI cares about buffer object performance, not display lists.

As much as you may want ATI to optimize their display list implementation, wanting it won’t make it happen. Since ATI only cares about what the big developers do, it will only happen if big developers force ATI to do it. And they will only do it if they think ATI can give them a big performance gain.

And even then, the question arises: will it give them a performance gain? Or can ATI just tell them a few buffer packing tricks, and they will get strong performance just from that? Why make ATI write all of that (bug-prone) software when you can just make a few tiny adjustments to your mesh format and get good performance?

Well, because I’ve taken a responsibility to criticize your tutorial, I have to roll up my sleeves and start… :slight_smile:

Specifically, the obsoleted commands are glBegin, glEnd, and glVertex [others???].

There are so many deprecated functions that it is rather silly to enumerate all of them. Just mention some of them.

“Immediate mode” has always seemed a little crazy

Immediate mode was a great innovation in the first half of the 1990s. SGI did a great job by enabling it.

As we know, every function call in C/C++ requires a stack be created,

Not true! Read any book about microprocessors and you’ll find out how it works. The stack is used for transferring parameters, but the stack is not created on each call.

Besides immediate mode, there are a number of other techniques of which you may be aware.

Generally, there are only two methods: immediate and retained mode. Retained mode can come in many flavors. VBO is one of them.

At the time of this writing, there is little to no straight-forward material regarding VBO’s on-line;

That is also not true! Whatever you think I have to strongly disagree with this statement!

…and presumes, for example, knowledge of vertex arrays (another obsoleted method).

VBO is based on VA philosophy. Maybe you would have a better knowledge of VBO if you learned VA first. They share the same concept. Except that VA stores data on the client side and deals with real pointers, and VBO with offsets inside buffers.
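
To make the pointer-versus-offset distinction concrete, here is a small sketch in plain C. The interleaved Vertex layout and the buffer_offset helper are hypothetical, chosen only for illustration. The offsets in the comments assume 4-byte floats.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex layout, for illustration only. */
typedef struct {
    float position[3];      /* byte offset 0 */
    float normal[3];        /* byte offset 12 with 4-byte floats */
    unsigned char rgba[4];  /* byte offset 24 with 4-byte floats */
} Vertex;

/* With a buffer bound to GL_ARRAY_BUFFER, the "pointer" argument of
 * glVertexPointer and friends is read as a byte offset into that buffer.
 * This helper packs an offset into the pointer type those calls expect. */
const void *buffer_offset(size_t bytes)
{
    return (const void *)(uintptr_t)bytes;
}
```

With a VBO bound, the position array could be set up as glVertexPointer(3, GL_FLOAT, sizeof(Vertex), buffer_offset(offsetof(Vertex, position))). With a plain vertex array and no buffer bound, the same last argument would instead be a real client address such as the address of the first vertex's position field.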

First, a basic description of how Vertex Buffer Objects work. Rather than re-describing the object every frame, we describe it once, store the data in a buffer (literally, a float array), and perform frame by frame transformations on the existing data to move and rotate the object. What’s more – that buffer is kept in video memory, so it does not have to be repeatedly transferred to the card (unless you exhaust that memory, of course).

pjcozzi has already given some comments on this. Buffer does not have to be on the server’s side. Furthermore, even if you declare it as a static it depends on the driver if it will really be placed in the video memory. Those flags are only a tip to the driver and are not obligatory.

But it is worth noting that STRIP vertices are slightly less useful in general with VBOs – you are better off defining each polygon.

It depends! Be more specific. It is better to use GL_QUADS than GL_QUAD_STRIP in the case of a cube, but not in the general case.

The final command in inithings() is glBindBuffer (GL_ARRAY_BUFFER, 0);. This simply “turns off” GL_ARRAY_BUFFER so that it safely refers to nothing.

Not exactly. It turns back to VAs and pointers to client’s memory. Please, read any of the documents that you have spit on again.

All this transferring of data into video memory occurs before the first frame is drawn.

Only for the static buffers.

The next command is glEnableClientState (GL_VERTEX_ARRAY)…

This paragraph doesn’t describe anything. If you have learnt VAs before VBO it will be clearer even to you.

There are other possibilities besides GL_ARRAY_BUFFER, but this is the only one used in our tutorial. The basic point is that you must work on one assigned buffer at a time.

:slight_smile: What a possibilities!? :slight_smile:

Other comments:

  • Do not use floats for colors. It makes your object bigger than necessary. Each vertex in your implementation consumes 13B without real need. If you have 1M vertices, that is 13MB of valuable memory consumed. Furthermore, a bigger VBO requires more resources to be transferred to the server’s memory.
  • Third page I didn’t read, so comments on textures will come later…

Strongly agree with you, Dark!

Buffer does not have to be on the server’s side.

Small point. Buffers are stored server side. But that doesn’t mean “video memory”. It can just as easily be RAM allocated from your address space by the driver.

  • Do not use floats for colors. It makes your object bigger than necessary.

Unless of course, you actually need floating-point colors (HDR) :wink:

However, I agree that you shouldn’t encourage the use of floats everywhere. It would also be a good place to introduce the idea of conversion between the vertex buffer data and the data that the rest of the system expects for that component. If you use unsigned bytes for colors, you have to talk about normalization of the color values (so that 255 becomes 1.0 internally, etc).
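
The normalization mentioned above can be sketched in plain C. The function names are made up for the example. The size comparison assumes a four-component RGBA color and the usual 4-byte float, and the tutorial's own layout may differ.

```c
#include <stddef.h>

/* How a normalized unsigned byte is interpreted: 0 -> 0.0, 255 -> 1.0. */
float ubyte_to_unit_float(unsigned char c)
{
    return (float)c / 255.0f;
}

/* Inverse mapping for packing colors into bytes, with clamping and
 * round-to-nearest. Anything not greater than 0 (including NaN) maps to 0. */
unsigned char unit_float_to_ubyte(float f)
{
    if (!(f > 0.0f)) return 0;
    if (f >= 1.0f) return 255;
    return (unsigned char)(f * 255.0f + 0.5f);
}

/* Bytes saved per vertex by storing RGBA as 4 bytes instead of 4 floats. */
size_t rgba_bytes_saved_per_vertex(void)
{
    return 4 * sizeof(float) - 4 * sizeof(unsigned char);
}
```

With byte colors stored in the buffer, the fixed-function color array would be described with GL_UNSIGNED_BYTE as the type argument of glColorPointer, and the conversion back to the 0.0 to 1.0 range happens on the GL side.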

Or you could just link to this page, which contains a detailed description of all of this.

You’re off inventing things now. Reread.

As a Display List and NVidia user, this is very interesting. Thanks.

Well, thanks for all the feedback gang!

Yeah, I’ve started reading the orange book. Previously, I thought “shader” referred to something that worked with pixel texture, color, etc, and not vertex transformation, so I will have to make a note about this.

You’re totally correct. I’ll put something more appropriate there.

Actually demo #3 does make dynamic changes to the buffer.

Good point.

Wow. I can see the point of the ELEMENT_ARRAY_BUFFER and indexing now. I’ll write a short 4th demo duplicating the 2nd or 3rd one, using indices to map quad data to triangles.

I presume glTranslate also, since that is a fixed pipeline thing? Anyone know where I can check?

Point taken. Maybe I’ll add “to me, anyway”, or temper that somewhat.

Okay. That should be explained better.

Nah, I gotta stand by that one. I don’t think it is all necessarily incompetent, I just think much of it is aimed at more specific audiences, such as specifically C++ programmers, or specifically ES programmers…

…or specifically VA programmers, which was my point. I’m not denigrating the idea – altho considering those are now obsolete too, it seems silly to recommend studying that material just so you can understand the VBO material.

Hmmm. Okay. Since the STATIC vs. DYNAMIC vs. STREAM is supposed to be an “optimization hint”, I suppose this is implementation specific. I’ll make a note to that effect, or maybe someone can elaborate?

Oh? It makes it plain that you need to enable a “client state” for each of the pointer styles used with the VBO. (“Client state” perhaps needs more explanation.) Since this is the first demo, I presume the reader is unfamiliar with the commands, so I explain their purpose, which is not “nothing”. I do not presume that “if they had learnt something about VAs they will already understand”, and hence don’t need to read this anyway.

Hmmm. Interesting. The reason I use floats everywhere is because I have read previously that GL translates everything into floats anyway, so you save a step.

I had not even noticed you could use anything smaller than a float for color, actually. That somewhat complicates putting the array together, I guess. I’ll see what I can figure out and maybe work it into the fourth demo.

Anyway, I probably won’t have too much time until the weekend, but I’ll post again when I make changes.

Thanks again all!

I don’t think it is all necessarily incompetent, I just think much of it is aimed at more specific audiences, such as specifically C++ programmers, or specifically ES programmers…

Right. Because C programmers obviously can’t read C++ enough to understand what a function call looks like.

If you showed any programmer OpenGL in C#, Java, Lua, any programming language (outside of assembly), I would expect them to understand how it works and how to code it in C. If they can’t, then they’re not a very good programmer.

Hmmm. Okay. Since the STATIC vs. DYNAMIC vs. STREAM is supposed to be an “optimization hint”, I suppose this is implementation specific. I’ll make a note to that effect, or maybe someone can elaborate?

What he’s saying is that your comment is implying that you aren’t allowed to change the contents of a buffer after the first frame is drawn. He’s not talking specifically about the hints.

The reason I use floats everywhere is because I have read previously that GL translates everything into floats anyway, so you save a step.

The translation is free. So use whatever makes your data smallest. Though there are some limitations on what you can do, based on the pointer type. Table 2.5 of the OpenGL 3.2 compatibility specification lists them.

The quick reference card: http://www.khronos.org/files/opengl-quick-reference-card.pdf shows the commands removed from the 3.2 core profile, in blue.

glEnableClientState is one of the removed functions; it’s for vertex arrays, not VBOs.

As a Display List and NVidia user, this is very interesting. Thanks.

But not surprising, as a couple of people from NVidia have said display lists, not VBOs, are the fastest method of drawing static stuff. Then again, in that benchmark DLs do seem a bit too fast, i.e., is it legit?

glEnableClientState is one of the removed functions; it’s for vertex arrays, not VBOs.

glEnableClientState is used to activate arrays for the standard vertex attributes. GL 3.0 deprecated the use of standard vertex attributes, so this function (and the Disable) were likewise deprecated.

However, if you’re going to write code that uses the standard vertex attributes, then you’ve already decided that you’re going to use deprecated functionality. So at that point, anything goes.

Hmm. I’d really like to resolve this one. ALL of the other tutorials I saw do use EnableClientState, including this one, which was the only one recommended by anyone here:

http://www.songho.ca/opengl/gl_vbo.html#draw

As it turns out, VBO’s are not even in the index of the last edition of the redbook (7, dated “2010”), and AFAICT are not explicitly discussed at all – there is a general discussion of “Buffer Objects” which ends with “Using Buffer Objects with Vertex-Array Data”* which does use EnableClientState (102-103). This is not about VAO’s tho, since VAO’s are the next section after that. So what is this talking about?

Going with the simple code from demo1:


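void renderObjects () {
	/* one flat color for the whole cube */
	glColor4f(0.75f, 0.0f, 0.0f, 1.0f);
	/* enable the fixed-function position array */
	glEnableClientState(GL_VERTEX_ARRAY);
	glBindBuffer(GL_ARRAY_BUFFER, Cube.VBO);
	/* with a buffer bound, the last argument is a byte offset into it */
	glVertexPointer(3, GL_FLOAT, 0, NULL);
	glPushMatrix();
		glRotatef(Cube.R[0], 1.0f, 0.0f, 0.0f);
		glRotatef(Cube.R[1], 0.0f, 1.0f, 0.0f);
		/* draw 16 vertices from the bound buffer as one quad strip */
		glDrawArrays(GL_QUAD_STRIP, 0, 16);
	glPopMatrix();

	/* unbind the buffer and restore the client state */
	glBindBuffer(GL_ARRAY_BUFFER, 0);
	glDisableClientState(GL_VERTEX_ARRAY);
}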

Nb, the push/pop and rotate is unnecessary. How would you get this to render without EnableClientState? Even if I put the color data into the array (demo2) it will not render without that.

*It would seem to me a “vertex buffer object” would be a “buffer object” containing a “vertex array”; maybe I stand to be corrected. Since VertexPointer, NormalPointer, ColorPointer, and InterleavedArrays are also deprecated, I am beginning to believe that VBOs are actually obsoleted too, meaning it will be what – display lists only? I guess I have really been “led down the garden path” on this one, eh. I’d prefer not to do the same to anyone else, so if someone can confirm that, I will have to add a big caveat to my now pointless tutorial :mad: