Custom Vertex Attributes

I am trying to implement batch rendering, where I pass a large number of triangle vertexes and corresponding object data (like texture control flags, extra variables, etc.). Right now, each sprite has just over 164 bytes of data. I know that OpenGL has at least 16 vertex attributes, with each element having the size of 4 floats. A lot of batch rendering tutorials use them, but they never try to pass that much data for each object.

I was thinking about separating all 164 bytes of data into chunks of 12 bytes, and loading them all into 16 vertex attributes. However, I’ve read about the attributes reserved by OpenGL for various purposes, which means I can’t use them.

Technically, of the 164 bytes, 20 bytes are used for vertex/uv coordinates, while the rest are per object (so they remain the same for the same object). I want to do batch rendering for 10 000 objects, and that requires 1.44 megabytes of non-vertex/uv coordinate data.
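As a quick sanity check on those numbers, the arithmetic can be written out as a small C++ sketch. The constant names are made up for illustration; the byte counts and the 16-attribute, 4-float figures are the ones quoted in this post.

```cpp
#include <cstddef>

// All names are illustrative; the figures come from the post above.
constexpr std::size_t kBytesPerSprite = 164;    // total data per sprite
constexpr std::size_t kVertexUvBytes  = 20;     // vertex/uv coordinates
constexpr std::size_t kPerObjectBytes = kBytesPerSprite - kVertexUvBytes; // 144
constexpr std::size_t kObjectCount    = 10000;

// Per-object data for the whole batch: 144 * 10 000 = 1 440 000 bytes (1.44 MB).
constexpr std::size_t kBatchPerObjectBytes = kPerObjectBytes * kObjectCount;

// Guaranteed attribute space per vertex: 16 attributes * 4 floats * 4 bytes.
constexpr std::size_t kMinAttributeBytes = 16 * 4 * 4; // 256

static_assert(kBatchPerObjectBytes == 1440000, "1.44 MB of per-object data");
static_assert(kBytesPerSprite <= kMinAttributeBytes,
              "164 bytes fit into 16 four-float attributes");
```

So the data would fit into the guaranteed attribute space, but the 144 per-object bytes would then be repeated for every vertex of a sprite.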

I would like to know what other options I have, that allow passing that much data for batch rendering (with decent performance that is better than enormous uniform arrays).

This is a common misunderstanding; most of the time the restriction does not apply and you are perfectly free to use them.

Where the restriction does apply is if you are attempting to use a generic attribute and its aliased fixed attribute in the same shader. So you cannot, for example, use glClientActiveTexture(GL_TEXTURE0)/glTexCoordPointer(…) and glVertexAttribPointer(8, …) together.

However, if you only use generic attributes you are completely free to use all of them and there is no problem. Note that this will always be the case if you’re using a core profile.

Likewise, if you never use glClientActiveTexture(GL_TEXTURE0)/glTexCoordPointer(…) you’re completely free to use glVertexAttribPointer(8, …) without any issues.

In which case, you might be better off adding a single integer attribute for “object ID” and using that to index into uniform arrays (or textures if you would exceed the limits on uniforms). Or you might not (dependent fetches have a cost); you’d need to benchmark it to be sure.

I was thinking about using Uniform Buffers, since they can support up to 64KB of data each, and I can have a few dozen of those.

Would Uniform Buffers be significantly slower than vertex attributes, especially when it comes to that much data?

Technically, of the 164 bytes, 20 bytes are used for vertex/uv coordinates, while the rest are per object (so they remain the same for the same object).

Then what you really want is to have some form of index for each vertex, which you use to fetch the per-object data from a buffer. Each vertex in the same object will have to have a copy of that index, but two-bytes-per-vertex is better than 20 bytes.
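A rough sketch of such a vertex layout follows. It assumes the 20 bytes are three position floats plus two uv floats, which the original post does not actually state, and the struct name, field names, and attribute location are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vertex layout: the split of the 20 bytes into 3 position
// floats and 2 uv floats is an assumption, not something the post states.
struct SpriteVertex {
    float         position[3]; // 12 bytes
    float         uv[2];       //  8 bytes
    std::uint16_t objectIndex; //  2 bytes, same value for every vertex of a sprite
    std::uint16_t padding;     //  2 bytes, keeps the stride a multiple of 4
};

static_assert(sizeof(SpriteVertex) == 24, "unexpected compiler padding");
static_assert(offsetof(SpriteVertex, objectIndex) == 20,
              "index follows position + uv");

// With the VAO and VBO bound, the index would be exposed as an integer
// attribute (note the I in glVertexAttribIPointer; location 2 is arbitrary):
//
//   glVertexAttribIPointer(2, 1, GL_UNSIGNED_SHORT, sizeof(SpriteVertex),
//                          (const void*)offsetof(SpriteVertex, objectIndex));
//   glEnableVertexAttribArray(2);
```

A 16-bit index can address 65 536 objects, which is enough for a 10 000 sprite batch.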

I was thinking about using Uniform Buffers, since they can support up to 64KB of data each

No, they cannot. Even in OpenGL 4.6, the minimum value for GL_MAX_UNIFORM_BLOCK_SIZE is 16KB. Granted, most implementations of 4.0+ offer at least 64KB. But 16KB is the only value you can be certain of.

What you want is best done with an SSBO, where the maximum size is measured in megabytes (the minimum required is 16MB, and most implementations offer limits that are “most of available GPU memory”). That way, you won’t have to worry about whether adding more sprites will blow past some limit.
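For illustration, a vertex shader that fetches per-object data from an SSBO through a per-vertex index might look roughly like this, stored here as a C++ string so it can be handed to glShaderSource. The SpriteData fields, the block name, the binding point, and the attribute locations are placeholders, not anything from the posts above.

```cpp
#include <string_view>

// GLSL 4.30 vertex shader kept as a raw string. SpriteData, its fields, the
// binding point, and the attribute locations are placeholders for illustration.
constexpr std::string_view kSpriteVertexShader = R"glsl(#version 430 core
layout(location = 0) in vec3 inPosition;
layout(location = 1) in vec2 inUV;
layout(location = 2) in uint inObjectIndex; // integer attribute, one value per sprite

struct SpriteData {
    vec4 tint;
    vec4 uvRect;
};

// readonly tells the implementation that this shader never writes the buffer.
layout(std430, binding = 0) readonly buffer Sprites {
    SpriteData sprites[];
};

out vec2 vUV;
out vec4 vTint;

void main() {
    SpriteData sprite = sprites[inObjectIndex];
    vUV   = sprite.uvRect.xy + inUV * sprite.uvRect.zw;
    vTint = sprite.tint;
    gl_Position = vec4(inPosition, 1.0);
}
)glsl";
```

The string literal is null-terminated, so kSpriteVertexShader.data() can be passed to glShaderSource directly.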

As Alfonse says, you aren’t guaranteed more than 16K per UBO. Buffer textures can have at least 64K texels (so up to 1MB for e.g. GL_RGBA32UI), but are constrained to the available texture formats. SSBOs can be much larger but require OpenGL 4.3.
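To put the guaranteed minimums quoted in this thread side by side, here is a back-of-the-envelope sketch for 144 bytes of per-object data. It assumes the data packs tightly into 144 bytes, which real std140/std430 layouts may not allow, and the constant names are invented for the example.

```cpp
#include <cstddef>

constexpr std::size_t ceilDiv(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

constexpr std::size_t kPerObjectBytes = 144;   // 164 total minus 20 for vertex/uv
constexpr std::size_t kObjectCount    = 10000;

// Guaranteed minimums as quoted in the thread.
constexpr std::size_t kMinUniformBlockBytes   = 16 * 1024;        // UBO block size
constexpr std::size_t kMinBufferTextureTexels = 64 * 1024;        // buffer texture size
constexpr std::size_t kMinSsboBytes           = 16 * 1024 * 1024; // SSBO block size
constexpr std::size_t kTexelBytes             = 16;               // GL_RGBA32UI

// UBO: 113 sprites per 16 KB block, so 89 blocks (or ranges) for 10 000 sprites.
constexpr std::size_t kSpritesPerUbo   = kMinUniformBlockBytes / kPerObjectBytes;
constexpr std::size_t kUboBlocksNeeded = ceilDiv(kObjectCount, kSpritesPerUbo);

// Buffer texture: 9 texels per sprite, so 7281 sprites in 64K texels.
constexpr std::size_t kTexelsPerSprite         = ceilDiv(kPerObjectBytes, kTexelBytes);
constexpr std::size_t kSpritesPerBufferTexture = kMinBufferTextureTexels / kTexelsPerSprite;

// SSBO: over 100 000 sprites in a single 16 MB block.
constexpr std::size_t kSpritesPerSsbo = kMinSsboBytes / kPerObjectBytes;

static_assert(kSpritesPerBufferTexture < kObjectCount, "one minimum-size buffer texture is not enough");
static_assert(kSpritesPerSsbo > kObjectCount, "one minimum-size SSBO is enough");
```

Actual limits on a given driver are usually higher and would need to be queried at run time.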

You’ll have to measure it. Array accesses and texture lookups have a cost, but so does memory consumption. Which of the two wins out depends upon the specifics of the program, the hardware, screen resolution, etc.

64k is supported by HD 4000 with latest drivers, so that’s why I picked that. I don’t really consider anything less than HD 4000.

What you want is best done with an SSBO, where the maximum size is measured in megabytes (the minimum required is 16MB, and most implementations offer limits that are “most of available GPU memory”). That way, you won’t have to worry about whether adding more sprites will blow past some limit.

How much slower (or faster) is an SSBO compared to Uniform Buffers when it comes to uploading data? Since I have to load roughly 150 bytes for each of the 10k+ objects, I can either do it for a single SSBO, or multiple 64k long Uniform Buffers (one for each per object variable).

Since a single SSBO can fit everything I need, I’m guessing it will be quicker to load since I don’t have to bind/unbind when I load different variables (as opposed to binding/unbinding multiple Uniform Buffers when I switch between data I’m trying to load).
However, on OpenGL’s side, is there any difference between an SSBO and a Uniform Buffer when it comes to performance, let’s say, when using an SSBO and a Uniform Buffer of the same size?

How much slower (or faster) is an SSBO compared to Uniform Buffers when it comes to uploading data?

Buffer objects are not typed. I swear, someday I’m going to put that in my forum signature.

The performance of transferring data to a buffer object has nothing to do with how it is used.

I can either do it for a single SSBO, or multiple 64k long Uniform Buffers (one for each per object variable).

I don’t see why using a buffer for uniform data would in any way necessitate multiple distinct objects. You can bind a range of a buffer for use as uniform data. You would then change which range of the buffer to use based on which portion of the sprite batch you’re rendering.
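A minimal sketch of how the ranges could be computed is below. Range offsets for uniform buffers have to be multiples of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, so each sub-batch is rounded up to that alignment. The helper names and the numbers in the usage comment (113 sprites of 144 bytes each) are assumptions for illustration.

```cpp
#include <cstddef>

// Round `value` up to the next multiple of `alignment` (alignment must be > 0).
constexpr std::size_t alignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

struct BufferRange {
    std::size_t offset; // byte offset into the buffer object
    std::size_t size;   // bytes visible to the shader for this sub-batch
};

// Byte range holding the per-object data of sub-batch number `subBatch`.
inline BufferRange rangeForSubBatch(std::size_t subBatch,
                                    std::size_t spritesPerSubBatch,
                                    std::size_t bytesPerSprite,
                                    std::size_t offsetAlignment) {
    const std::size_t size   = spritesPerSubBatch * bytesPerSprite;
    const std::size_t stride = alignUp(size, offsetAlignment);
    return BufferRange{subBatch * stride, size};
}

// Hypothetical use inside the draw loop (needs a GL context; `buffer` is the
// buffer object name and 0 is an arbitrary uniform block binding index):
//
//   GLint alignment = 0;
//   glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &alignment);
//   BufferRange r = rangeForSubBatch(i, 113, 144, alignment);
//   glBindBufferRange(GL_UNIFORM_BUFFER, 0, buffer, r.offset, r.size);
//   // then issue the draw call for sub-batch i
```

The data for each sub-batch has to be written at the same aligned offsets when the buffer is filled.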

Maybe. Uniforms are guaranteed to be read-only, SSBOs are read-write. The implementation can determine that a given shader never writes to an SSBO, but I wouldn’t assume that it will always take full advantage of that.

If you want to know for sure, measure it.

Buffer objects are not typed. I swear, someday I’m going to put that in my forum signature.

The performance of transferring data to a buffer object has nothing to do with how it is used.

Okay thanks, I will keep that in mind.

Kind of a silly question, but can I do something like:

glBindVertexArray(VAO);
glBindBuffer(GL_ARRAY_BUFFER,VBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER,SSBO);

...

LOOP FOR EACH OBJECT:
glBufferSubData(GL_ARRAY_BUFFER, sizeof(arrayBufferStuff)*objectIndex, sizeof(arrayBufferStuff), &arrayBufferStuff);
glBufferSubData(GL_SHADER_STORAGE_BUFFER, sizeof(ssBufferStuff)*objectIndex, sizeof(ssBufferStuff), &ssBufferStuff);

… or should I buffer the data as C++ arrays first, before sending them to the GPU in one go? …

glBindVertexArray(VAO);
glBindBuffer(GL_ARRAY_BUFFER,VBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(arrayBufferStuff)*graphics2DMaximumObjects, &arrayBuffer, GL_STATIC_DRAW);

glBindBuffer(GL_SHADER_STORAGE_BUFFER,SSBO);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(ssBufferStuff)*graphics2DMaximumObjects, &ssBuffer, GL_STATIC_DRAW);

I am especially sketched out by …

glBindBuffer(GL_ARRAY_BUFFER,VBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER,SSBO);

… since I don’t know whether calling glBindBuffer twice in a row will overwrite binding points of two different types.

[QUOTE=CaptainSnugglebottom;1289405]

LOOP FOR EACH OBJECT:
glBufferSubData(GL_ARRAY_BUFFER, sizeof(arrayBufferStuff)*objectIndex, sizeof(arrayBufferStuff), &arrayBufferStuff);
glBufferSubData(GL_SHADER_STORAGE_BUFFER, sizeof(ssBufferStuff)*objectIndex, sizeof(ssBufferStuff), &ssBufferStuff);

… or should I buffer the data as C++ arrays first, before sending them to the GPU in one go?[/QUOTE]

The first suggestion makes OBJECT_COUNT calls to glBufferSubData, while the latter needs just one upload call, so put the data into a C++ vector first and then upload it in one go. Also have a look at how you have to pad the shader storage buffer data (std430): https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf#page=159
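As a rough sketch of both points, the struct below mirrors a std430 block on the CPU side and is collected in a std::vector so that the whole batch can go up in a single call. SpriteData and its fields are invented for the example; the real per-object data will differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-object record laid out to match this std430 GLSL struct:
//   struct SpriteData { vec4 tint; vec4 uvRect; vec2 offset; uint flags; float depth; };
// Under std430 a vec4 is 16-byte aligned and a vec2 is 8-byte aligned. A vec3
// would also be 16-byte aligned, which is a common source of mismatches.
struct alignas(16) SpriteData {
    float         tint[4];   // offset  0
    float         uvRect[4]; // offset 16
    float         offset[2]; // offset 32
    std::uint32_t flags;     // offset 40
    float         depth;     // offset 44
};
static_assert(sizeof(SpriteData) == 48, "array stride must match the GLSL side");
static_assert(offsetof(SpriteData, flags) == 40, "member offsets must match the GLSL side");

// CPU-side staging area: fill it once per frame, then upload in one go.
struct SpriteStaging {
    std::vector<SpriteData> sprites;

    void add(const SpriteData& sprite) { sprites.push_back(sprite); }
    const void* data() const { return sprites.data(); }
    std::size_t sizeBytes() const { return sprites.size() * sizeof(SpriteData); }
};

// With the SSBO bound to GL_SHADER_STORAGE_BUFFER, a single upload would be:
//
//   glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, staging.sizeBytes(), staging.data());
```

The static_asserts catch layout drift at compile time if the C++ struct is changed without updating the shader.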

Have a look at the DSA version glNamedBufferSubData(…) too, or glMapNamedBuffer(…), or a persistent mapped buffer with glMapNamedBufferRange(…).