VAO: Performance questions



Ed Daenar
07-27-2013, 03:49 PM
I'm having a lot of trouble finding decent documentation about VAOs beyond how to use them, which is clear enough. I'm mostly concerned about the impact of changing a VAO state as I'm not sure how the driver is going to react to this.

The context of the problem:

I'm using meshes with a large number of triangles which require some form of spatial subdivision. As a first design alternative, this is being implemented as a single VAO with the attached VBOs that contain the mesh information and an IBO with the whole index that would allow drawing the whole mesh, if necessary (rarely used in this case, but still).

The mesh is subdivided with an Octree-like structure and each node contains an IBO with the index data of the triangles stored in the node. There's obviously other data stored that's not used for rendering, including other topology data, which is irrelevant to the question.

When rendering, the sections that are to be drawn are culled, the IBOs gathered and finally the single VAO is bound and its IBO is reassigned before a glDrawElements call is issued. So it goes:
- Bind VAO
- Loop for each IBO
--- Bind IBO with glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,iboID)
--- Render with glDrawElements

The question:

What would be the expected impact of changing a VAO's bind state? I understand this is a very relative question, but I'm ok with a relative answer as well. Is the impact expected to be the same as changing any other bind, or is it going to cause something more expensive, like a client-side rebuild of all the VAO data and a subsequent upload to the GPU?

Two alternative implementations that come to mind are:
- Store each node's data in separate VAOs, so it would be interesting if I could get some insight into the cost of swapping a VAO compared to swapping a VAO's buffer.

- Create a huge single IBO that is generated as the mesh is split in pieces and store in each node only the starting index and element count then use glMultiDrawElements to send all the calls at once. Note that I have not used glMultiDraw* in any real situation yet, so I may perhaps be misunderstanding its usage.
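The second alternative can be sketched as follows. This is only a minimal illustration, assuming 32-bit indices and a hypothetical `NodeRange` record per octree node; the struct, its fields and `build_multidraw_args` are invented for this example and are not part of OpenGL. It builds the two parallel arrays that glMultiDrawElements takes (a count per draw and a byte offset into the bound index buffer per draw) from the nodes that survived culling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node record: where the node's indices live inside
 * the single shared index buffer. Names are illustrative only. */
typedef struct {
    uint32_t first_index;  /* position of the node's first index in the shared IBO */
    uint32_t index_count;  /* number of indices that belong to the node */
    int      visible;      /* culling result for the current frame */
} NodeRange;

/* Fill the parallel arrays that glMultiDrawElements expects:
 *   counts[i]  = number of indices for draw i
 *   offsets[i] = byte offset of draw i's first index, stored as a pointer
 * Returns the number of draws written (never more than max_draws). */
size_t build_multidraw_args(const NodeRange *nodes, size_t node_count,
                            int32_t *counts, const void **offsets,
                            size_t max_draws)
{
    size_t draws = 0;
    assert(nodes != NULL || node_count == 0);
    for (size_t i = 0; i < node_count && draws < max_draws; ++i) {
        if (!nodes[i].visible || nodes[i].index_count == 0)
            continue;  /* culled or empty nodes produce no draw */
        counts[draws]  = (int32_t)nodes[i].index_count;
        offsets[draws] =
            (const void *)((uintptr_t)nodes[i].first_index * sizeof(uint32_t));
        ++draws;
    }
    return draws;
}

/* With a GL context current, the VAO bound and the shared IBO attached,
 * the submission would then look roughly like this (assuming GLsizei is
 * a 32-bit int, as it is on common platforms):
 *
 *   glMultiDrawElements(GL_TRIANGLES, counts, GL_UNSIGNED_INT,
 *                       offsets, (GLsizei)draws);
 */
```

Whether one multi-draw call actually beats a loop of glDrawElements calls with rebinds depends on the driver, so this shows the data layout only and makes no performance claim.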


I'm leaning towards reimplementing everything with the last method as it would seem intuitively a better choice, but I would like to get some insight on the actual performance difference between these design patterns.

Yes, I understand that it will all depend on what else my application does, so I'm not asking for hard numbers. If you can share your experience with this I'd be glad to hear it!

Thank you!

hlewin
07-27-2013, 08:27 PM
Hmm - I used to think of VAOs more like a for-convenience thing than anything else, so I'd guess changing particular attributes should be no problem performance-wise.

Before bothering with implementing a more efficient way to emit the draw-commands I'd check twice if a vast number of gl-Calls is slowing down things. It reads like you'd be doing one rebind to the node index-buffer and then doing a draw per node. That is unlikely to hit performance if you don't have a huge number of nodes in my experience.

hlewin
07-27-2013, 08:49 PM
Reading the wiki here (http://www.opengl.org/wiki/Vertex_Specification#Vertex_Array_Object) makes me wonder though. I didn't think that GL_ELEMENT_ARRAY_BUFFER is part of a VAO's state to start with, and being unable to bind a buffer to that target without a VAO is something that would be completely new to me, although I've implemented and used those without problems...

Alfonse Reinheart
07-27-2013, 09:03 PM
The wiki, by and large, describes only the behavior of the core profile. If you're using the compatibility profile, you won't get that behavior. Or if you're using NVIDIA drivers from a year or two back.

As to the main thrust of the OP, the performance characteristics of VAOs are generally unknown. There have been, to my knowledge, no recent, detailed performance analyses of the use of VAOs against non-VAO rendering across multiple hardware vendors. Valve did some tests, saying that VAOs weren't helping them, but it's not clear what their specific rendering circumstances were. And of course, the (relatively) new ARB_vertex_attrib_binding extension could have effects on top of VAOs or without them.

At the end of the day, if performance really matters to you, you're going to have to profile it yourself.

hlewin
07-27-2013, 09:09 PM
This is something I'd find noteworthy. I'll have to spend a thought or two on how this will affect the behaviour of the things I've implemented so far when switching profiles.

hlewin
07-27-2013, 10:01 PM
Quick-reading the VAO-spec (http://www.opengl.org/registry/specs/ARB/vertex_array_object.txt) I cannot find anywhere where element arrays are mentioned.

Alfonse Reinheart
07-27-2013, 11:21 PM
I don't see where it mentions that it stores glVertexPointer's data either; that doesn't mean it's not there.

The specific text in question is:


The resulting vertex array object is a new state vector, comprising all the state values listed in tables 6.6 (except for the CLIENT_ACTIVE_TEXTURE selector state), 6.7, and 6.8 (except for the ARRAY_BUFFER_BINDING state).

If you check the 2.1 specification and track down table 6.8, sure enough, you'll see ELEMENT_ARRAY_BUFFER_BINDING there.

hlewin
07-28-2013, 12:00 AM
From the VAO-spec linked above:

Queries for VERTEX_ARRAY_POINTER, NORMAL_ARRAY_POINTER, COLOR_ARRAY_POINTER, SECONDARY_COLOR_ARRAY_POINTER, INDEX_ARRAY_POINTER, TEXTURE_COORD_ARRAY_POINTER, FOG_COORD_ARRAY_POINTER, or EDGE_FLAG_ARRAY_POINTER return the value stored in the currently bound vertex array object.
I'll have a look at the whole GL spec though. Semantically, I understood the wiki paragraph to mean that VAOs contain the array settings enumerated above as well as the generic vertex attributes, which would seem fitting, as these describe how vertices are specified, whereas element indices refer to the vertices defined as a whole.

hlewin
07-28-2013, 12:28 AM
This really seems contradictory, but I couldn't spot that the element-index-pointer (I don't know the GLenum off the top of my head) is part of the vao. It's likely that only the buffer-bindings for pointers contained in the vao actually get stored. The vao spec:

All state related to the definition of data used by the vertex processor is encapsulated in a vertex array object.
The sentence you cited is either confusing by not making clear that "state values" means "such state values" or it is strange that the vao should contain data that cannot be queried for. I didn't test this yet, but maybe I'll come back to this.

Alfonse Reinheart
07-28-2013, 01:03 AM
The sentence you cited is either confusing by not making clear that "state values" means "such state values"

I don't see how "such" would in any way affect the meaning of the sentence. Indeed, that would make the sentence less accurate. It's saying that all of the pieces of state in those tables, minus the specific pieces mentioned, are part of VAO state.

It's not saying "stuff like what's listed in the table." It's saying "these things right here are exactly and only what makes up the VAO's state vector."


or it is strange that the vao should contain data that cannot be queried for.

Table 6.8 is a state table. In that state table is both the enumerator name for the piece of state and the function used to query it.

The meaning is completely unambiguous.

The only reason they mentioned those others in the glGetPointerv query section is because they were already listed there in 2.1 as being queries for glGetPointerv. So they had to specifically update the language to say that they came from the object. Note that glGetIntegerv (the function you use to query buffer bindings) doesn't have an equivalent list because that list would be too huge.

hlewin
07-28-2013, 01:36 AM
"such" as in referring to the pieces of state that are related to the definition of data. I would not be sure that a buffer-binding would fall into that category for itself - only in conjunction with a pointer.

Alfonse Reinheart
07-28-2013, 01:50 AM
I would not be sure that a buffer-binding would fall into that category for itself

Even ignoring the fact that OpenGL doesn't consider any particular state to be different or special from any other particular state, the text did specifically except GL_ARRAY_BUFFER_BINDING from VAOs, and it specifically did not except GL_ELEMENT_ARRAY_BUFFER_BINDING from them. So if the ARB's intent was that they didn't want the element array buffer to be part of the VAO, they would have said so.

Give it up already. It's part of the VAO's state, and it's right there in black and white. And if you want more proof, GL 3.0 lays it out even more clearly, as the state tables 6.6-6.9 are named, "Vertex Array Object State", and GL_ARRAY_BUFFER_BINDING is explicitly moved to 6.10, named "Vertex Array Data (not in Vertex Array Objects)".

Any ambiguity is what you've read into it, not what's in the text. Looking at the revision history of ARB_VAO, you may have simply been remembering an earlier, more ambiguous version of the text, as it seems to have changed a bit since the original release.

hlewin
07-28-2013, 02:21 AM
All state related to the definition of data used by the vertex processor is encapsulated in a vertex array object.
You know what data used by the vertex processor is?

Ed Daenar
07-28-2013, 06:00 AM
The wiki, by and large, describes only the behavior of the core profile. If you're using the compatibility profile, you won't get that behavior. Or if you're using NVIDIA drivers from a year or two back.

As to the main thrust of the OP, the performance characteristics of VAOs are generally unknown. There have been, to my knowledge, no recent, detailed performance analyses of the use of VAOs against non-VAO rendering across multiple hardware vendors. Valve did some tests, saying that VAOs weren't helping them, but it's not clear what their specific rendering circumstances were. And of course, the (relatively) new ARB_vertex_attrib_binding extension could have effects on top of VAOs or without them.

At the end of the day, if performance really matters to you, you're going to have to profile it yourself.

So if there's truly no good insight about VAO performance in general, I suppose it's safe to assume that there's even less information on the effects of changing a VAO's state, performance-wise.

My intention was to probe my options to see if I was unaware of some known performance issue with this approach (changing VAO state in a critical section of the rendering loop), but I guess I'll have to do my own testing to see how it behaves with my own real-world problem. I'm expecting, however, that the implementation I'll be comparing against the current one (i.e., single VAO with single IBO, with IBO start / count values stored on nodes and rendered with glMultiDraw*) is going to perform better regardless of the effects of IBO switching, but I guess that will give me an idea of its impact, if I ever need to do this again.

Thanks for the feedback everyone.

Alfonse Reinheart
07-28-2013, 10:16 AM
You know what data used by the vertex processor is?

Why do you keep pointing to these irrelevant passages? I pointed to the one that matters: the one that lists every piece of state that the VAO contains. That's the one that matters because that's the one that actually lists every piece of state the VAO contains. It doesn't use wiggle language like "vertex processor" or other general terms. It explicitly lists that state which is encompassed by the object. The VAO contains the state used in 6.6, 6.7, and 6.8, minus the exceptions. Period.

Indeed, this is how every standard OpenGL object is defined in the spec. There's a "general description" of it, followed by a description of the glBind* command. And it is in the description of the glBind* command where you get the actual normative definition of the state vector: a link to the table(s) that specify exactly what state is in the object. This is consistently how objects are defined.

For example, sampler objects have this line:


Additionally, a sampler object may be created to encapsulate only the second category - the sampling state – of a texture object.

"the sampling state" is not a well-defined concept. Which is fine, because it is followed up on by this line:


When a sampler object is first used in one of these functions, the resulting sampler object is initialized with a new state vector, comprising all the state and with the same initial values listed in table 23.18.

This is the one that actually defines what state is used, because it actually lists the specific OpenGL state the object stores.

Also, there's no definition in the OpenGL 2.1 specification of "vertex processor". So that statement's normative value is pretty much nil. Indeed, that silly sentence has managed to survive through GL 4.4, and there's still no definition of "vertex processor". Indeed, it's the only place in either the core or compatibility spec that uses that term.

thokra
07-29-2013, 02:37 AM
the performance characteristics of VAOs are generally unknown.

And that's a darn shame. We really should come up with a performance test suite. If only it weren't for all that work ...


this is being implemented as a single VAO with the attached VBOs that contain the mesh information and an IBO with the whole index that would allow drawing the whole mesh

Since this is often misunderstood: the VBO isn't attached to the VAO, i.e. ARRAY_BUFFER_BINDING isn't VAO state! The only direct association of VAOs and buffer objects is through the ELEMENT_ARRAY_BUFFER_BINDING. The vertex arrays (i.e. the stuff you set up with glVertexAttribPointer), however, are associated with the array buffer binding that is current when glVertexAttribPointer is called. That's how you get the connection between VAOs and VBOs. The corresponding state variable is VERTEX_ATTRIB_ARRAY_BUFFER_BINDING. Although probably not that common, you can have multiple vertex arrays associated with different VBOs in a single VAO and switch between them by enabling and disabling the arrays.
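The distinction described above can be illustrated with a toy model in plain C. Nothing here is real OpenGL: `ToyVao`, `ToyContext` and the `toy_*` functions are hypothetical stand-ins that only mimic which binding is stored where. The element array binding lives in the VAO, the array buffer binding lives in the context, and each attribute captures the array buffer that was bound when its pointer was specified.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ATTRIBS 16

/* Toy model only: these are NOT OpenGL types. */
typedef struct {
    unsigned element_array_buffer;            /* models ELEMENT_ARRAY_BUFFER_BINDING (VAO state) */
    unsigned attrib_buffer[TOY_MAX_ATTRIBS];  /* models VERTEX_ATTRIB_ARRAY_BUFFER_BINDING */
} ToyVao;

typedef struct {
    unsigned array_buffer;  /* models ARRAY_BUFFER_BINDING (context state, not VAO state) */
    ToyVao  *bound_vao;     /* currently bound VAO */
} ToyContext;

/* Models glBindVertexArray: only switches which VAO is current. */
void toy_bind_vao(ToyContext *ctx, ToyVao *vao)
{
    ctx->bound_vao = vao;
}

/* Models glBindBuffer(GL_ARRAY_BUFFER, buf): touches the context only. */
void toy_bind_array_buffer(ToyContext *ctx, unsigned buf)
{
    ctx->array_buffer = buf;
}

/* Models glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buf): stored in the bound VAO. */
void toy_bind_element_buffer(ToyContext *ctx, unsigned buf)
{
    assert(ctx->bound_vao != NULL);  /* this toy always requires a bound VAO */
    ctx->bound_vao->element_array_buffer = buf;
}

/* Models glVertexAttribPointer: the attribute captures whatever array
 * buffer is bound at the time of the call. */
void toy_vertex_attrib_pointer(ToyContext *ctx, unsigned index)
{
    assert(ctx->bound_vao != NULL && index < TOY_MAX_ATTRIBS);
    ctx->bound_vao->attrib_buffer[index] = ctx->array_buffer;
}
```

In this model, switching VAOs changes the visible element array binding and the per-attribute buffers, but leaves the context's array buffer binding alone. Two attributes in the same VAO can also reference different buffers.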


I'm expecting [..]

With no substantial empirical data existing with regard to GL performance, you can't really expect anything. Your safe bet is: implement multiple algorithms that produce the same output and measure the performance. Then decide.
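A minimal sketch of such a measurement, assuming each candidate render path is wrapped in a function with the hypothetical `RenderFn` signature below. `average_seconds_per_frame` and `count_frames` are invented names for this example. It uses C11 `timespec_get` for wall-clock time, which only measures what the CPU sees. A real render path would need to end with glFinish() or use GL timer queries for the figures to include GPU work.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for one render path under test. */
typedef void (*RenderFn)(void *user);

/* Run the render path `frames` times and return the mean wall-clock
 * seconds per frame. */
double average_seconds_per_frame(RenderFn render, void *user, int frames)
{
    struct timespec t0, t1;
    assert(render != NULL && frames > 0);
    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < frames; ++i)
        render(user);  /* a real path should synchronise, e.g. with glFinish() */
    timespec_get(&t1, TIME_UTC);
    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return elapsed / frames;
}

/* Stand-in render path for trying the harness without a GL context:
 * it only counts how often it was called. */
void count_frames(void *user)
{
    ++*(int *)user;
}
```

Calling this once per strategy (per-node IBO rebinds, one VAO per node, a single multi-draw) on the same scene and camera path gives comparable averages. Results will still vary by driver and hardware.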