glMapBuffer for textures

ratta · January 30, 2005, 9:46am

Hi,
could something like glMapBuffer be added
for textures (or render/frame buffers)?
glMapBuffer can be used for vertex and pixel
buffers; textures are not buffers, but are they
really different from pixel buffers?
Or would it be difficult to implement such a thing
in existing hardware?
(I know I can create a pixel buffer, map it, and
copy the data to the texture, but can it be done without the copy?)

Thanks

Korval · January 30, 2005, 10:03pm

Originally posted by ratta:
textures are not buffers, but are they
really different from pixel buffers?

Yes. They are.

When you use a buffer object (by use, I mean upload data in some way, either via mapping or with BufferData), you and the driver have a certain understanding. That understanding is that the data you uploaded will not be changed in any way, shape, or form (unless you explicitly ask it to do so, as in PBO).

However, you have no assurances with textures. Drivers can, and virtually all of them will, move bits of data around as they see fit. Depending on the format of the data you uploaded, it may not be laid out as you expect. At the very least, you should assume that some swizzling is going to happen, if not format conversion.
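To make the swizzling point concrete, here is a minimal sketch comparing the row-major index an application would expect from a mapped texture with a Morton (Z-order) index. Morton order is only an assumed example of a tiled layout; real hardware layouts are undocumented and vary between vendors, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Row-major texel index: what an application would expect if a
   texture could be mapped like a plain 2D array. */
static size_t linear_index(uint32_t x, uint32_t y, uint32_t width)
{
    return (size_t)y * width + x;
}

/* Spread the low 16 bits of v so they occupy the even bit positions. */
static uint32_t spread_bits16(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Morton (Z-order) texel index: x bits land in even positions,
   y bits in odd positions. Assumed layout, not any real driver's. */
static uint32_t morton_index(uint32_t x, uint32_t y)
{
    assert(x < 65536u && y < 65536u);
    return spread_bits16(x) | (spread_bits16(y) << 1);
}
```

In a 4x4 texture, texel (2, 2) sits at index 10 in row-major order but at index 12 in this Morton layout, so a pointer into the swizzled storage would be of little use to code that assumes rows.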

Buffer objects store blocks of binary data to use in place of client-specified memory. As such, the driver can’t change what is actually stored there. Textures are not the same thing. They are driver-side memory, and the driver is allowed to pick whatever format and data ordering is optimal and satisfies the user-provided hints.

To map textures in some meaningful, implementation-independent way is simply not reasonable. While D3D may let you do it, I’m pretty sure that it just does a copy internally. No matter what, you’re going to need some kind of copy operation.

It’s also the reason why render-to-vertex-array is probably going to be implemented by binding a buffer object as render target rather than being able to bind a texture as a vertex buffer. Buffer objects have client-specified behavior, while textures have driver-specified behavior.

zeckensack · February 1, 2005, 12:54am

What data format do you suppose textures are really in? Let’s say you have an RGB8 internal format. Is it BGR or RGB (or GRB/GBR/BRG/RBG)? With padding or without? If it’s padded, can you just clobber the padding when you write, or is it actually required to be some specific value? Or maybe you like this first draft of a channel-separated layout:

BBBB GGGG RRRR
BBBB GGGG RRRR
BBBB GGGG RRRR
BBBB GGGG RRRR

How many variations would you want to support, if you were given the opportunity to map textures?
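A small sketch of why the number of variations matters: the byte offset of one channel of one texel under three of the layouts mentioned above. The function names and the exact padded/planar arrangements are assumptions chosen for illustration, not layouts any particular driver is known to use.

```c
#include <assert.h>
#include <stddef.h>

enum channel { CH_R = 0, CH_G = 1, CH_B = 2 };

/* Tightly packed, interleaved RGB: R G B R G B ... (3 bytes per texel). */
static size_t offset_rgb_packed(size_t texel, enum channel c)
{
    assert((int)c >= 0 && (int)c <= 2);
    return texel * 3 + (size_t)c;
}

/* Interleaved BGR plus one padding byte: B G R X ... (4 bytes per texel). */
static size_t offset_bgrx_padded(size_t texel, enum channel c)
{
    assert((int)c >= 0 && (int)c <= 2);
    return texel * 4 + (size_t)(2 - (int)c);
}

/* Channel-separated rows as drawn in the post: for each row of
   row_width texels, all blues, then all greens, then all reds. */
static size_t offset_planar_rows(size_t texel, enum channel c, size_t row_width)
{
    assert((int)c >= 0 && (int)c <= 2);
    assert(row_width > 0);
    size_t row = texel / row_width;
    size_t col = texel % row_width;
    size_t plane = (size_t)(2 - (int)c);   /* B = 0, G = 1, R = 2 */
    return row * row_width * 3 + plane * row_width + col;
}
```

For the red channel of texel 5 in a texture four texels wide, these give byte offsets 15, 22 and 21 respectively. An application writing through a mapped pointer would need to know which of these (or some other scheme) is in effect.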

Memory “locks”, like every form of hardware-level exposure, are a terrible liability. Once you go there as an IHV, you need to maintain hardware compatibility (aka stifled innovation) or, and that’s much worse, emulate old behaviour in software. In this case that might include copies, figuring out what was touched and what wasn’t, restoring clobbered padding, and losing performance to the inevitable dozens of super-stupid apps that lock down 20 megs to touch a single byte, and likewise to the semi-stupid apps that brilliantly map/memcpy/unmap instead of just using the rational alternatives.

To hell with all maps. There’s nothing worse to include in any serious API IMO.
(I still can’t see why VBO actually allows this …)

Korval · February 2, 2005, 8:06am

Originally posted by zeckensack:
I still can’t see why VBO actually allows this …

Because it is a useful thing for buffer objects. Buffer objects store data exactly as you specify. As such, there’s no question of driver reformatting or whatever. Also, mapping is useful if you’re regenerating a buffer dynamically and you wish to avoid the copy inherent in glBufferSubData.

zeckensack · February 2, 2005, 11:17am

Originally posted by Korval:
I still can’t see why VBO actually allows this …
Because it is a useful thing for buffer objects. Buffer objects store data exactly as you specify. As such, there’s no question of driver reformatting or whatever. Also, mapping is useful if you’re regenerating a buffer dynamically and you wish to avoid the copy inherent in glBufferSubData.

That’s where I lose you.
Why would a driver that can map non-system memory be required to create a copy on BufferSubData? That’s contradictory. If it can map, and that turns out to perform reasonably, it can surely use that trick for its BufferSubData implementation.

This is no argument for exposing the ability to map to clients of the driver.

system · February 2, 2005, 3:31pm

Originally posted by zeckensack:
That’s where I lose you.
Why would a driver that can map non-system memory be required to create a copy on BufferSubData? That’s contradictory. If it can map, and that turns out to perform reasonably, it can surely use that trick for its BufferSubData implementation.

This is no argument for exposing the ability to map to clients of the driver.
I think you don’t understand what mapping means.
Let’s say you have an AGP system, and you have a VBO, and you want to stream data to your VBO on every frame redraw.
You call MapBuffer.
Driver allocates space in AGP.
You write to AGP.
You UnmapBuffer.
Driver can now choose to transfer to video memory.

With BufferSubData:
You allocate an array with new float[].
You write to this array.
You call BufferSubData.
Driver can transfer to AGP, then to video (I think this is optimal).
You call delete[].

Conclusion
In case 2, you wasted time with new and delete.
The driver had to copy from your array.

Korval · February 2, 2005, 6:23pm

Originally posted by zeckensack:
Why would a driver that can map non-system memory be required to create a copy on BufferSubData? That’s contradictory. If it can map, and that turns out to perform reasonably, it can surely use that trick for its BufferSubData implementation.

I don’t think you understand what BufferSubData has to do. Here’s the situation.

I have a VBO that’s 1MB in size. Now, a VBO is, basically, just a char*, right? So, let’s pretend that the driver implements VBOs purely in client memory, not AGP or video. So, the pointer is in your client space and all is good. The thing is, the driver owns this memory, even though it allocated it in your client memory space. But, of course, drivers do this all the time.

When you call glMapBuffer, all it returns is the char* that it allocated. You can freely treat this as an array of 1024*1024 bytes and do with it what you will. If I’m generating data, clearly the best thing to do is to generate it directly into the mapped buffer.

However, glBufferSubData is different. It takes a pointer to an already existing array of bytes. If I’m generating data, the best performance I can get is by generating my data into a 1MB block that I allocated myself, then calling glBufferSubData, which copies that data into the VBO memory.

Clearly the mapped case is faster, as the BufferSubData case has to copy data out of my array and into the VBO.
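The comparison can be sketched with a toy model that counts how many bytes the upload path has to memcpy. Everything here (ToyBuffer, toy_map, toy_sub_data, generate) is a hypothetical stand-in for a driver-owned buffer that lives in client memory; it is not the OpenGL API, and it assumes the map is a true, direct map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer object whose storage the driver owns. */
typedef struct {
    unsigned char *storage;   /* driver-owned memory */
    size_t size;
    size_t bytes_copied;      /* bytes moved by memcpy on the upload path */
} ToyBuffer;

static int toy_buffer_init(ToyBuffer *b, size_t size)
{
    b->storage = malloc(size);
    b->size = size;
    b->bytes_copied = 0;
    return b->storage != NULL;
}

static void toy_buffer_free(ToyBuffer *b)
{
    free(b->storage);
    b->storage = NULL;
}

/* Analogue of glMapBuffer with a true map: hand out the storage pointer. */
static void *toy_map(ToyBuffer *b)
{
    return b->storage;
}

/* Analogue of glBufferSubData: copy from the caller's array into storage. */
static void toy_sub_data(ToyBuffer *b, size_t offset, size_t n, const void *src)
{
    assert(offset <= b->size && n <= b->size - offset);
    memcpy(b->storage + offset, src, n);
    b->bytes_copied += n;
}

/* Placeholder data generator: writes n bytes sequentially, never reads. */
static void generate(unsigned char *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (unsigned char)(i & 0xFF);
}

/* Path 1: generate straight into the mapped pointer. */
static size_t upload_via_map(ToyBuffer *b)
{
    generate(toy_map(b), b->size);
    return b->bytes_copied;
}

/* Path 2: generate into a temporary array, then copy it across. */
static size_t upload_via_sub_data(ToyBuffer *b)
{
    unsigned char *tmp = malloc(b->size);
    if (tmp == NULL)
        return (size_t)-1;
    generate(tmp, b->size);
    toy_sub_data(b, 0, b->size, tmp);
    free(tmp);
    return b->bytes_copied;
}
```

Both paths leave identical contents in the buffer, but the map path reports zero copied bytes while the sub-data path copies the full buffer size, plus one extra allocation and free.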

Now, if the VBO were in AGP memory, nothing changes. Assuming that glMapBuffer works as one would expect (i.e., returning a pointer to the VBO data in AGP memory), I can use it like before. Now, I have to be careful to generate my data sequentially and to never read from this pointer. But that’s all.

Nothing changes for the BufferSubData case either. It still requires an extra copy.

If the VBO were in video memory, and the driver can map video memory directly, then nothing changes. In the map case, I’m still generating data directly into the destination. The BufferSubData case still needs an extra copy that the map case doesn’t.

Now, here’s the thing. Let’s say that the driver can’t map video memory directly. This is the only bad case, as the driver now must allocate a 1MB block of memory, download the VBO data from the card, and give it to you. However, many drivers (with the proper VBO hints) cache such data in main memory, to eliminate the allocation and download steps. At which point, mapping is no slower than BufferSubData, as both require copying.

Indeed, mapping is probably faster, since the driver memory is probably uncached and properly aligned for DMA purposes, whereas client-allocated memory is not. Which means that the driver may need to do a second copy of the buffer when calling BufferSubData.

So, yes, mapping is better when you’re generating data. But if you’re not actually generating the data, if you already have it in an array (loaded from disk, for example), you may as well use BufferSubData and let the driver do the optimized copy.

zeckensack · February 3, 2005, 5:47am

All clear.
What I didn’t understand was this:

Originally posted by Korval:
the copy inherent in glBufferSubData.
I thought it referred to driver-side implications.

edited:
There’s no big problem with replacing a whole buffer object. The gripe I have with maps is that they invite applications to do all sorts of silly things based on the assumption that a map is always a true and direct map, and doesn’t need to move data around or mangle it. E.g. having a map/unmap API suggests it would be entirely fine and efficient to
a) map a 1MB buffer object, replace the first 4 bytes and unmap it
b) do sparse overwrites, i.e. only replace the color elements given the following vertex structure:

struct Vertex
{
   float position[3];
   float normal[3];
   float texcoord0[2];
   unsigned int color;
};

If this is efficient usage, then I wonder why in every official example of map usage, directly preceding a write map there appears a BufferData(NULL) call.

And I wonder what was going on when I experimented with VBO mappings on NVIDIA hardware. Write bandwidth to the buffer clearly exceeded my AGP bandwidth, so either it was system memory or AGP memory. However, the buffer couldn’t have lived in AGP memory because the hardware could consume the buffer at more than my AGP bandwidth. Thus my conclusion was that the map was emulated, and the buffer was copied back and forth to satisfy map semantics. BufferData(NULL) eliminates the need to copy it down from graphics memory, so there’s my explanation for that, too.

An emulated map, unlike a theoretical perfect map, doesn’t have any of the performance benefits you’re pointing out.
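A rough model of that conclusion, assuming a driver that cannot map video memory and keeps no system-memory shadow copy between maps. The type and function names are hypothetical, and the byte counts only model the bookkeeping; no data is actually moved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a buffer behind an emulated map. */
typedef struct {
    size_t size;
    int contents_defined;      /* does the card hold data we must preserve? */
    size_t bytes_downloaded;   /* card -> system memory */
    size_t bytes_uploaded;     /* system memory -> card */
} EmulatedBuffer;

/* Analogue of BufferData(NULL): fresh storage with undefined contents. */
static void emu_orphan(EmulatedBuffer *b)
{
    b->contents_defined = 0;
}

/* Analogue of a write map: if the old contents are defined, the staging
   copy must first be filled from the card so untouched bytes survive. */
static void emu_map_write(EmulatedBuffer *b)
{
    if (b->contents_defined)
        b->bytes_downloaded += b->size;
}

/* Analogue of unmap: the driver cannot tell which bytes were written,
   so this model sends the whole staging copy back. */
static void emu_unmap(EmulatedBuffer *b)
{
    b->bytes_uploaded += b->size;
    b->contents_defined = 1;
}

/* Case (a) from the post: map, touch a few bytes, unmap.
   Returns the total bus traffic this model attributes to the edit. */
static size_t traffic_for_small_edit(EmulatedBuffer *b, int orphan_first)
{
    size_t before = b->bytes_downloaded + b->bytes_uploaded;
    if (orphan_first)
        emu_orphan(b);
    emu_map_write(b);
    /* the application would write its 4 bytes here */
    emu_unmap(b);
    return b->bytes_downloaded + b->bytes_uploaded - before;
}
```

Under these assumptions a 4-byte edit of a 1MB buffer costs 2MB of traffic without the orphaning call and 1MB with it. Orphaning discards the old contents, though, so it only helps when the whole buffer is being rewritten anyway.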

Korval · February 3, 2005, 7:26am

The spec can’t tell you what is “efficient usage”. It can only give you and the driver developers an API and the required behavior of it. Unfortunately, there’s no call to ask the driver if mapping is real or emulated, so there’s ultimately no way to know for sure.

To be honest, though, if you asked for stream draw, the driver should expect that you’re going to frequently call BufferSubData or MapBuffer, so it should (at the least) have a copy in system memory if it can’t do real mapping. The fact that nVidia’s implementation likes to play nonsensical games with BufferData(NULL) is just because they aren’t writing a good implementation.

Maps can be emulated, and if they are, then they are a fancy glBufferSubData call (which is still likely to be a bit faster, as this is still driver-allocated memory, which could be DMA’d faster or more easily than standard malloc’d memory). Ultimately, glMapBuffer shouldn’t be slower than BufferSubData, unless you gave the driver the wrong hint. At which point, you should accept the GIGO principle: Garbage in, Garbage out.