buffer_storage

OK, so GL_ARB_texture_storage gave us immutable texture specification. You bind the texture, call glTexStorage* on it, and the storage is immutable.

Buffer objects should have something similar: a glBufferStorage function that allocates X bytes for the buffer, but also marks its storage as immutable, so that you can never call glBufferData on it again.
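Something like this, as a rough sketch (the name and signature are purely hypothetical at this point):

    // Hypothetical: allocate `size` bytes of immutable storage for the
    // buffer bound to `target`. (Whether it takes initial data, flags,
    // etc. is up for debate.)
    void glBufferStorage(GLenum target, GLsizeiptr size, const void *data);

    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferStorage(GL_ARRAY_BUFFER, 65536, NULL);              // size fixed forever
    glBufferData(GL_ARRAY_BUFFER, 65536, NULL, GL_STATIC_DRAW); // GL_INVALID_OPERATION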

The main problem this avoids is someone using glBufferData every frame to invalidate a buffer, but getting the hint or size parameters wrong. That is an improper invalidation of the buffer: it effectively creates a new buffer object with theoretically new characteristics. So instead, you would invalidate the buffer with glMapBufferRange, or with a specific API for invalidating buffers that this extension would introduce: glBufferInvalidate.
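To make the contrast concrete (glBufferInvalidate being the hypothetical new entry point):

    // Error-prone idiom: orphaning via re-specification. If `size` or the
    // usage hint differ from the original allocation, this silently creates
    // a different buffer instead of invalidating the existing one.
    glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);

    // Existing alternative: invalidation through mapping.
    void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    // ... write the new data through ptr, then glUnmapBuffer ...

    // Proposed: a call that can only mean "invalidate", nothing else.
    glBufferInvalidate(GL_ARRAY_BUFFER);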

glBufferInvalidate might only be usable on immutable buffers. But that’s debatable.

However, this should also do some other things. Since it’s an either/or process (either you use glBufferStorage or glBufferData, just like with glTexStorage), we can take the opportunity to correct some mistakes with buffer objects. Namely: hints.

Hints with glBufferData are optional, in the sense that you can more or less ignore them. If you use GL_STATIC_DRAW, you can call glBufferData as many times as you want.

With immutable storage however, hints should be requirements. I don’t suggest keeping the previous hint enums around, as they’re confusing to users and are too fuzzy (where is the dividing line between STREAM and DYNAMIC?). Whatever the new “hint” system is, it should absolutely affect the behavior of the buffer object.

If there is an equivalent to “DRAW”, for example, then it should be enforced. DRAW means that you will write but not read. So it should be an INVALID_OPERATION to use glGetBufferSubData or glMapBufferRange for reading. Same goes for “READ” and “COPY”.
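Purely to illustrate the enforcement idea (the behavior parameter and enum are invented):

    // Assume glBufferStorage took a hypothetical GL_BEHAVIOR_DRAW:
    // CPU writes are allowed, CPU reads are not.
    glBufferStorage(GL_ARRAY_BUFFER, size, NULL, GL_BEHAVIOR_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, data);             // OK: a write
    glGetBufferSubData(GL_ARRAY_BUFFER, 0, size, out);           // GL_INVALID_OPERATION
    glMapBufferRange(GL_ARRAY_BUFFER, 0, size, GL_MAP_READ_BIT); // GL_INVALID_OPERATION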

I don’t really have a fully developed suggestion for a new hint system, but whatever it is should specifically have a semantic component.

After thinking about it more, the “hints” should be behaviors. The possible options are as follows:

1: No touching. You create the storage with glBufferStorage and that’s it. You may not call glBufferSubData, glMapBufferRange, or glGetBufferSubData. You can call glBufferInvalidate. OpenGL itself can still write into the buffer (say, via transform feedback) after an invalidate, but even that may not be a good idea.

2: Write exactly once. After calling glBufferStorage, you get exactly one glBufferSubData or glMapBufferRange call. After that, you get errors. Also, no invalidation. Obviously, glGetBufferSubData is out, as is read mapping.

3: Write only after invalidate. As write exactly once, except that you can keep doing it so long as you invalidate between each write. So you can call glBufferStorage, glBufferSubData, then glBufferInvalidate, then glBufferSubData, then glMapBufferRange with GL_MAP_INVALIDATE_BUFFER_BIT, and so on (see the sketch after this list). Every write goes to invalidated memory.

Note: I’m not particularly happy with this behavior, as it prevents you from mapping multiple times and writing to different locations. Perhaps allowances can be made for GL_MAP_INVALIDATE_RANGE_BIT.

4: Writeable. You can pretty much do whatever, whenever, so long as you are writing to the buffer. So no read mapping and no glGetBufferSubData calls.

5: Read only after invalidate. Basically, you compute some data on the GPU, read it back on the CPU, then invalidate the buffer, then do it all over again. Your read accesses to the buffer must be bracketed by invalidates, as for “Write only after invalidate”. The same note applies.

I don’t see much point in a “Read only once.”

6: Readable. You can read pretty much willy-nilly. Just no writing at all.

7: Wild-west. Basically what we have now. Whatever, whenever: read, write, ’rithmetic, whatever it is you want to do, you can do it.
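Here is the sketch promised above for behavior 3; everything that isn’t in GL today (the behavior enum, glBufferInvalidate) is hypothetical:

    glBufferStorage(GL_ARRAY_BUFFER, size, NULL, GL_BEHAVIOR_WRITE_AFTER_INVALIDATE);
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, frame0); // OK: storage is fresh
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, frame1); // GL_INVALID_OPERATION
    glBufferInvalidate(GL_ARRAY_BUFFER);               // orphan the old contents
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, frame1); // OK again
    // Mapping with GL_MAP_INVALIDATE_BUFFER_BIT would count as an
    // invalidate-plus-write in a single step.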

Sounds very interesting. I have only two problems with it:

  1. I would never restrict MapBufferRange calls to a single invocation; I would rather just limit the usable map flags, because, as you said, mapping several subranges would cause headaches that way. Also the amount of validations would be pretty high. I would rather introduce one more “undefined behavior” (yes, I know we hate those, but they are pretty useful for providing high performance in situations where there is no clear behavior across hardware vendors). So if one writes a single range multiple times through buffer mapping on, e.g., a “Write only after invalidate” buffer, then the results would be undefined. This way we can force developers not to use it that way, I believe.

  2. In case of the “No touching” behavior I think you mean something like STREAM_COPY? I don’t totally understand what you meant there with the transform feedback example, but if you cannot even write to the buffer through transform feedback and/or image stores then what is the buffer good for? I’m pretty sure I’ve misunderstood something here but if I did maybe others will as well, so please can you clarify it?

Also the amount of validations would be pretty high.

Perhaps, but they would only be done on read/write operations, which are already fairly heavyweight.

This way we can force developers not to use it that way, I believe.

The problem with marking it as undefined is that it will likely work. If the implementation does not take active steps to detect this behavior and stop it, then the code will continue to function.

In case of the “No touching” behavior I think you mean something like STREAM_COPY?

More just COPY in general. These behaviors only specify the pattern of CPU-based reads/writes to buffers. Reads and writes from OpenGL commands would always work.

The only difference between STATIC_COPY and STREAM_COPY under my system is whether you ever invalidate the buffer or not. And I don’t think it’s worth it to introduce a new behavior just for that; the OpenGL implementation can detect if you invalidate a buffer.

Okay, thanks for the clarification and, again, interesting proposal.

I understand what you are trying to propose, but I don’t understand what problem it solves?

Regards
elFarto

First, it adds immutable buffer objects in the same style in which ARB_texture_storage introduced immutable texture objects; and second, it gives more control over creating buffer objects for particular use cases.

Mutable textures have the very real problem that they can be mipmap/cubemap incomplete. Buffers don’t have that problem.

I understand what you are trying to propose, but I don’t understand what problem it solves?

One problem with buffer objects is the quantity of “hidden knowledge” about how to use them properly to achieve performance. Oh, it’s easy if all you’re doing is static data. But streamed data is very hard to get right, especially across platforms. Rob Barris had a couple of posts that were finally able to point us in a good direction surrounding buffer object streaming. But that’s not exactly part of the API, is it?

What inspired me to make this proposal was a question on StackOverflow from a new user about buffer objects. Every frame, he was generating new data and using glBufferData to upload it. The problem is that he would increase or decrease the size of the data uploaded. And therefore change the size of the buffer object. Which effectively means that the driver was not just invalidating memory, but reallocating it.

A good API should be easy to use correctly and hard to use incorrectly. Invalidation via glBufferData is hard to get right; it requires knowing that you should keep the size and usage hints the same. If you aren’t on this forum and aren’t an experienced OpenGL programmer, then you do not know this. Indeed, odds are that you have no idea what buffer invalidation even is.

The other inspiration was a couple of facts about usage hints on NVIDIA and AMD. Did you know that AMD ignores usage hints entirely? They don’t mean anything to them. And did you know that if you gave GL_DYNAMIC_DRAW to NVIDIA implementations, you killed upload performance? If one of the two major OpenGL implementations basically ignores the hint system, then there is a problem.

If you develop primarily on AMD hardware, then you may be convinced that a certain hint is giving you the correct performance. That’s going to be a rude awakening when you slide over to NVIDIA and find that your DYNAMIC hint basically killed your performance.

Now yes, performance is certainly going to be different cross-platform. Even with behavior-based usage. One platform might make “write anytime” perform as well as the more restrictive behaviors. But at least there are guidelines about what the proper ways to use the buffer are.

I think the GL_AMD_pinned_memory extension is a nice approach for uploading use-once data. (The way it uses a new buffer target to create pinned memory objects may be debatable.)

You just point the driver to the data in your memory space and promise that you don’t modify it until the driver has finished using it, which you can check via a sync object.

I think an updated buffer object should provide something similar, as it is IMHO much easier to use than GL_ARB_map_buffer_range.
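As a sketch of that pattern, going by AMD’s presentation (the spec isn’t published, so the target enum and the details are tentative):

    // Needs <stdlib.h> for aligned_alloc and <stdint.h> for UINT64_MAX.
    void *mem = aligned_alloc(4096, size);   // page-aligned client memory
    // ... fill mem with the use-once data ...

    glBindBuffer(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, buf);
    glBufferData(GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, size, mem, GL_STREAM_DRAW);

    // ... issue the commands that consume buf ...
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    // mem must not be modified or freed until the fence signals.
    glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, UINT64_MAX);
    glDeleteSync(fence);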

I think the GL_AMD_pinned_memory extension is a nice approach for uploading use-once data.

But that’s not what the extension does. It doesn’t upload anything. Its purpose is to allow client memory to be used more or less like a buffer object. Its real purpose is to get around the fact that glTexImage* functions have to be finished with client memory when they return.

I wouldn’t be averse to a glBufferClient function, which, like glBufferStorage, would create an immutable buffer object with certain properties. But to be honest, I’d much rather have a special set of glTexSubImage* calls or something else than overload buffer object functionality for this purpose.

Well, afaik AMD hasn’t published the extension spec yet, so it’s all a bit of speculation. But though their presentation showed the extension on PBOs, there was nothing to indicate that it wouldn’t work the same way for any other type of BO.

Ok, but I’m not sure how replacing one method that has a hint the driver (mostly) ignores, with another method that has a hint the driver could possibly ignore, improves things.

Also, do you have any more information on the difference between ‘invalidating memory’ and ‘reallocating it’? Reading through Rob’s suggestion, it seems invalidating the buffer means the driver will orphan the currently allocated memory and allocate another slab. I can’t see how changing the size between calls is going to impact that.

edit: Is glFlushMappedBufferRange meant to return a GLsync object…?

Regards
elFarto

Ok, but I’m not sure how replacing one method that has a hint the driver (mostly) ignores, with another method that has a hint the driver could possibly ignore, improves things.

The driver could ignore it, in the sense that it could choose not to do something more optimal in the restricted cases. But the primary difference is simple.

AMD’s drivers ignore the hints, but they do not ignore usage patterns. Essentially, they pick optimal memory layouts and allocations based on how you actually use the buffer, not on how you say you’re going to. If the API forces you to use the buffer a certain way, then it’s really no different from AMD’s perspective. There’s just a nice API there to guarantee that you don’t go back on your word. They don’t have to look at what that word is when picking out where your memory goes.

Also, there’s the fact that the API is self-documenting. You use the most restrictive form of buffer object that you can live with. You don’t have to guess at what magical access pattern driver developers will be optimizing for, since the access patterns are enforced by the API.

I can’t see how changing the size between calls is going to impact that.

It all depends on how it gets implemented. If a driver sees you doing lots of buffer invalidation, then it can basically allocate two pieces of memory the same size as the requested buffer and just swap back and forth between them. Or, if you invalidate even more frequently, three pieces of memory. Or whatever.
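As a toy illustration of that strategy (entirely driver-internal, all names invented):

    // Because the size can never change, invalidation can ping-pong between
    // same-sized slabs instead of reallocating.
    typedef struct {
        void *slab[2];  // two allocations of the immutable size
        int   current;  // which slab the buffer name currently points at
    } BufferBacking;

    void invalidate(BufferBacking *b) {
        // The old slab stays alive until pending GPU work completes;
        // new writes immediately go to the other slab.
        b->current ^= 1;
    }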

It’s all about giving implementations more information up-front about how you’re going to use the buffer, and then forcing the user to live within those restrictions.

My only thoughts on this are that we already have a fair number of buffer object hints (in creation and binding) that are completely implementation-defined in their effects; Dark Photon [I think] once called it the “buffer object Ouija board”. Adding more flags and enumerations could in theory help, but I suspect it might just make that Ouija board bigger.

The point is that they’re not hints. A hint is not something you have to abide by. That’s why they fail (besides being poorly specified): because you can set them to one thing and then do something else.

If you use the “write exactly once” behavior, you get exactly one chance to write to the buffer; all subsequent attempts will fail with a GL_INVALID_OPERATION error. The point of behaviors is that they would be enforced.

The only guesswork you might have to do is which behavior to use on which implementations. And even then, the behaviors are designed to be functional subsets of one another. So it would take a pretty poorly written implementation to make “write anytime” faster than “write only after invalidate”. They could be the same speed, but there’s no reason for the more restricted one to be slower.

And even in the unlikely event that each behavior had completely unknown performance characteristics on implementations relative to one another, the access patterns are well specified. And finite. Therefore, the only possible “Ouija board” issue would be if “write anytime” allowed access patterns not specified by behaviors that were somehow faster than one of the more restrictive ones. And that seems very unlikely; the list of behaviors is always open to debate if someone has an access pattern that would legitimately be faster.

as arekkusu said, textures have a big problem because of the mipmaps.

imagine this situation:
the app creates a new texture,
then it specifies all even mips with some internal format (e.g. rgba8),
then it specifies all odd mips with a different internal format (e.g. rgb5).

at this point the texture is incomplete but if the app does one of these 2 things: re-specify the odd mips with rgba8 or re-specify the even mips with rgb5, then the texture should become complete.

of course this would be quite idiotic behaviour by the app, but still it is completely valid and permitted by the api spec and so the implementation should handle it correctly.

thus the implementation should always save any image in any format you specify for any mip, even if there are no 2 mips with the same format.

imagine what nightmare is this for the implementation.

also these saved images should be placed in system memory - no point in uploading them to video memory, for at this point the texture is unusable anyway and we don’t know what the final format will be when/if it eventually becomes complete.
so each texture must also have a backing system memory copy, at least until it becomes complete, which can be avoided with the new immutable property.

this is the main reason for the new immutable texture thing.
the buffers don’t have mips and this bad problem does not exist for them.

the buffers don’t have mips and this bad problem does not exist for them.

I know. But buffers have other problems, which I addressed. I don’t see how what you’re talking about is relevant.

you say the main problem this would solve is to avoid using BufferData just for invalidation because the user may mess up the params.
i am very much against api designs that try to prevent the programmer from making mistakes, so for me this reason is invalid. but i won’t go into a debate about it.

another problem you mention is the usage hints. i agree here, mandatory (not hint-only) usage flags would be good to have.
but i don’t see what this has to do with immutability - why does a buffer have to be immutable in order to have mandatory usage flags?

i am very much against api designs that try to prevent the programmer from making mistakes, so for me this reason is invalid.

OK, let’s play this game.

Why does glTexStorage make the texture immutable? Preventing the programmer from making mistakes is pretty much the entire point of making glTexStorage textures immutable.

After all:
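Take something like this, to pick just one of many possible mistakes:

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    // Allocate only the base level. The minification filter defaults to
    // GL_NEAREST_MIPMAP_LINEAR, so this texture is mipmap incomplete.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 256, 256, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);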

That is a mistake. That creates a texture which cannot be used. It is an incomplete texture. But the driver still has to accept it; just because the texture is incomplete now doesn’t mean that it won’t be complete in the future. The driver is required to live with this half-state.

Aside from making the texture immutable, glTexStorage does nothing that you could not do before in your own code. You could create all of your textures by calling glTexImage* on them in a loop, using NULL as the data, then upload with glTexSubImage* as needed. You could set the GL_TEXTURE_BASE/MAX_LEVEL parameters correctly, and thus ensure that your textures are mipmap complete. You could use sized internal formats instead of unsized ones. And you could make them “immutable” by never touching any of this stuff again.
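In code, that manual regimen looks something like this sketch:

    // Hand-rolled near-equivalent of glTexStorage2D: allocate every mip
    // level up front with a sized internal format, pin the level range,
    // and never respecify any of it afterwards.
    GLsizei w = 256, h = 256;
    const GLint levels = 9;  // log2(256) + 1
    glBindTexture(GL_TEXTURE_2D, tex);
    for (GLint lvl = 0; lvl < levels; ++lvl) {
        glTexImage2D(GL_TEXTURE_2D, lvl, GL_RGBA8, w, h, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, levels - 1);
    // The actual pixel data goes in later via glTexSubImage2D.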

So why does this extension exist, if it does not allow you to do anything that you couldn’t before? What is it about glTexStorage that opens up optimization potential?

Consider this excerpt from the issues section of ARB_texture_storage:

Immutability is valued by the ARB. By the driver developers. Why?

Because an API that allows the programmer to make mistakes also is an API that requires that the implementation accept those mistakes. It has to deal with it. It has to handle erroneous badness with a smile.

“api designs that try to prevent the programmer from making mistakes” are also API designs that make it easier for implementers to optimize. Because the user cannot put the system in an invalid state, there is no question about what the valid states are. And therefore, the driver doesn’t have to worry about those cases where weirdness is happening. Everything is simple, regular, and expected. And once the driver developer knows how the user is going to go about their business, they can get to their business of making things fast.

I see no reason why that logic should not apply to buffer objects too. The ability to call glBufferData on an object multiple times with different sizes and hints, all without changing the object name, goes against the last part of the ARB_texture_storage quote. The part about the eventual migration to bind-time validation.

Making buffer objects immutable, but allowing for invalidation via a specialized function call, means that you can retain bind-time validation, since you know the buffer’s size even after it is invalidated. A glBindBufferRange that was in-range before invalidation remains in-range after. While the GPU pointers may have changed, the range didn’t.
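For instance (with the invalidation call still hypothetical):

    // The offset/length check can be done once, at bind time, because the
    // buffer's size is fixed for the lifetime of the object.
    glBindBufferRange(GL_TRANSFORM_FEEDBACK_BUFFER, 0, buf, offset, length);
    // Even after glBufferInvalidate, offset + length still lies within the
    // same immutable size, so the binding remains valid.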

Yes, it prevents mistakes, but it was allowing those mistakes to begin with that is a big reason for the “buffer object Ouija board”, as Dark Photon would say. Preventing mistakes is good for everyone.

but i don’t see what this has to do with immutability - why does a buffer have to be immutable in order to have mandatory usage flags?

Because it makes it impossible to change the usage flags later. If you can only call glBufferStorage one time, then the implementation knows that this object name will be forever associated with a block of memory of known size and will always have a particular usage pattern. The driver doesn’t have to be written to deal with changing this usage pattern in the future.

Again, the same argument for glTexStorage’s immutability.