Map Texture Objects

We’ve currently got the capability to map buffer objects, which enables loading of data into buffers without needing to use any intermediate system memory arrays or otherwise do a memory copy. We don’t have that ability with textures - true, we can do something similar with PBOs but it involves more round-tripping.

Being able to map a texture object has the following uses:

[ul]
[li]The ability to implement “texture orphaning” for dynamic textures which must be fully replaced each frame, without needing a round-trip through a PBO.[/li][li]The ability to more directly get data into a texture which can provide advantages for both creation and updating use cases, and without needing any intermediate steps (this can facilitate any kind of texture streaming scheme).[/li][li]The ability to read data from a texture without the driver first having to do a copy back to system memory.[/li][/ul]

This suggestion is to overload the glMapBuffer, glMapBufferRange, glFlushMappedBufferRange and glUnmapBuffer calls to also accept a texture target as the target parameter. This target can be any that is valid for any of the glTexImage/glTexSubImage calls. Behaviour should otherwise be identical, and the texture must be unmapped before it can be used in a draw call.
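For illustration, usage under this suggestion might look something like the following. To be clear, this is entirely hypothetical - the texture-target behaviour shown here does not exist, and names like tex/pixels are placeholders (the sketch also assumes glBindTexture rather than glBindBuffer, which is one of the open issues below):

[code]
/* Hypothetical sketch of the proposed overload - not current GL. */
glBindTexture(GL_TEXTURE_2D, tex);

/* Map mip level 0 for writing; the target is a texture target instead of a
   buffer target, but the semantics otherwise follow glMapBufferRange. */
void *ptr = glMapBufferRange(GL_TEXTURE_2D, 0, width * height * 4,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);

memcpy(ptr, pixels, width * height * 4);  /* write the new texels directly */

/* The texture must be unmapped before it can be used in a draw call. */
glUnmapBuffer(GL_TEXTURE_2D);
[/code]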

Issues I can see include:

[ul]
[li]Is it only allowed to map mip level 0, or - if not - how do you specify mapping a mip level other than 0? Perhaps use glMapBufferRange with an appropriate offset?[/li][li]glBindBuffer or glBindTexture?[/li][li]What happens if glTexImage or glTexSubImage is called while a texture is mapped?[/li][/ul]

[QUOTE=mhagain]Being able to map a texture object has the following uses:[/QUOTE]

OK, so… how do you define what you get when you map it?

It’s easy to say that you just “map a texture”, but textures are opaque to allow drivers to hide their particular implementations. That’s why pixel transfers are complicated, while buffer uploads are simple byte copies. It hides details like swizzling, the specific bit-pattern of formats, etc.

So now you want to directly expose the vagaries of the range of OpenGL hardware. There are three ways to go about it:

  1. Force the driver to use a single, specific standard across vastly different hardware. Say goodbye to cross-platform portability, let alone future-proofing.

  2. Extend the query API to tell you how to interpret the data for a particular format, thus allowing different hardware to expose its particular eccentricities to you. Of course, since most existing sources of streamed data (FFmpeg buffer writing, DirectShow, etc.) will export to their own format, you have to use an intermediate buffer. They must write to some memory; you convert it to the hardware’s version in the mapped space. In short: no different from having those APIs write to a mapped PBO (assuming they can).

Also, I’m just guessing, but I’m fairly sure NVIDIA’s not going along with that. They seem really protective of their IP and implementation details.

  3. Split the difference and allow the user to tell OpenGL that a particular texture will need to adhere to a particular structure. That is, it should be mappable. Of course, now you have to make glTexStorageMappable functions for 3 separate types (1D, 2D, 3D). Either that, or you’re going to have to create a bunch of new image formats that force the texture to use a specific format. Or some kind of texture parameter or something.

Even assuming that there would be a particular structure that all hardware could support.

[QUOTE=mhagain]The ability to implement “texture orphaning” for dynamic textures which must be fully replaced each frame, without needing a round-trip through a PBO.[/QUOTE]

Doesn’t glInvalidateTexImage give us that already? You upload to it, use it, invalidate it, and write again. It seems much easier than mapping a texture just to invalidate it.
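A minimal sketch of that pattern, assuming GL 4.3/ARB_invalidate_subdata (tex, width, height, pixels and draw_with_texture are placeholders):

[code]
/* Per-frame dynamic texture update using invalidation instead of mapping. */
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, pixels);  /* upload */
draw_with_texture(tex);                              /* use */
glInvalidateTexImage(tex, 0);                        /* orphan the old contents */
/* next frame: upload again without stalling on the previous frame's use */
[/code]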

[QUOTE=mhagain]The ability to read data from a texture without the driver first having to do a copy back to system memory.[/QUOTE]

I’m not sure how useful that ability is. For most GPU situations, that memory is across a (relatively) slow bus. You’d be better off issuing a DMA and then doing something else until it’s over, which we effectively have with PBOs.
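That is, the standard PBO readback pattern, roughly (pbo is assumed to have been allocated with glBufferData beforehand; do_other_work is a placeholder):

[code]
/* Kick off an asynchronous readback into a PBO (a DMA the GPU can schedule). */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE,
              (void *)0);  /* offset 0 into the bound PBO */

do_other_work();  /* don't sit stalled on the bus */

/* By now the transfer has (hopefully) completed; map and read. */
void *data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
/* ... consume the pixels ... */
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
[/code]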

Also, there has never been a guarantee that mapping means you’re talking to GPU memory.

[QUOTE=mhagain]This suggestion is to overload the glMapBuffer, glMapBufferRange, glFlushMappedBufferRange and glUnmapBuffer calls to also accept a texture target as the target parameter.[/QUOTE]

Absolutely not!

Ignoring the question of whether this is good and useful functionality, that is not the way to implement it. OpenGL already has enough confusing, overloaded functions that do twenty different things; there’s no need to pointlessly add another four to that list.

glMapBufferRange means glMapBufferRange. We’re only recently getting to the point where we don’t have to call glVertexAttribPointer with an argument that we have to pretend is a pointer but that really fetches some of its data from somewhere else. There’s no need to screw up that important progress just to avoid adding a few new API functions.

The ARB is not running out of functions. There is not some hard limit that they’re coming up on, such that OpenGL can’t have more functions. There’s absolutely no reason to overload a perfectly good API when you can just have glMapTexSubImage et al.

If this is going to happen, then it gets its own API. Don’t screw up APIs that actually make sense just to shoehorn in new functionality. That road leads to stupidity like AMD_pinned_memory (good functionality, Godawful API).

Hmmm - as rants go I’d give it maybe a 2. It would have been a 6 or 7 but it blatantly contradicts other things you’ve ranted in favour of (or against, as appropriate) in the past.

Really? Because I don’t remember the time I argued that IHVs should put their texture format on display for everyone to see. Or when I said that OpenGL’s API needs to get worse and more confusing by senselessly overloading functions. Or when I said that mapping memory assured you of getting GPU access.

But whatever it takes to avoid addressing the substance of my argument, right? Because the best way to present your case is to dismiss any inconvenient facts.

Why do you always make things personal? I don’t seek you out; I barely take note of the fact that it’s mhagain presenting an idea. I’m only "rant"ing at your ideas because you post a lot of them and don’t put much thought into them. This idea is nothing more than, “wouldn’t it be wonderful if we could map textures?” There’s no consideration of the ramifications of such a decision. No explanation for how this could work cross-hardware. Even your suggested API shows how little actual thought you put into it. The only bit of substance to this is the basic idea: map texture memory.

If you’re going to seriously present an idea beyond the basic concept of “let us do this somehow”, then put some effort into it. Show that you’re better than just throwing ideas against a wall and hoping that one sticks.

Ok then.

First of all, the ability to Map (or Lock in older versions) a texture is something that has been in D3D for an eternity (in D3D11 both textures and buffers even use the very same API call). So far as the hardware vendors are concerned, this is a complete non-issue. There are no deep, dark, proprietary internal representations going on here; textures are just the same as buffers - a stream of bytes.

Now let’s get one thing real clear before continuing. This is not about adding functionality to GL that D3D also has. This is about adding functionality that may be generally useful, irrespective of whether D3D has it or not. D3D is not relevant beyond this point.

So point 1 is this: the argument that vendors may not want to put their internal texture formats on display is bogus.

Point 2 is this: even on hardware that may have its own funky internal representation, the whole point of OpenGL as an abstraction layer is to abstract that away from the programmer. This is something that already happens with e.g. a glTexSubImage call. Any hypothetical Map/Unmap of a texture can go through the very same driver code paths as are used for glTexSubImage to accomplish this. So even in such a case, this amounts to using a problem that has already been solved as an argument against.

Point 2 also exposes way number 4 to go about it: if the internal representation is already appropriate for exposure to the programmer, then just expose it as-is. Give the programmer a pointer and be done with it. This case could be satisfied e.g. where the internal representation matches the internalFormat param used when creating the texture. If the internal representation is not appropriate, then likewise give the programmer a pointer, but add a conversion step that happens in the driver - either at Map time (for reading) or Unmap time (for writing). As I said - this is something that already happens with glGetTexImage/glTexSubImage, the driver already contains code to do it, so arguments against it won’t fly.

Now onto specifics.

glInvalidateTexImage? No; that just accomplishes one part of the requirement, which is to orphan the texture. It does absolutely nothing about the second part, which is to avoid round-tripping through PBOs or program-allocated system memory in order to perform the update. Mapping a texture solves that; instead of the round-trip and extra memory copies you write directly (or as directly as the driver allows).

Overloading the buffer calls. Yes, it’s ugly; yes, it’s confusing; yes, a set of extra entry points would be better. And to head this one off at the pass - there is no need for separate entry points for 1D/2D/3D textures; follow the pattern established by glInvalidateTexSubImage instead - one entry point that works with all types.
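Something along these lines - a purely hypothetical signature, mirroring glInvalidateTexSubImage’s parameter list, with unused dimensions set to 0/1 for lower-dimensional textures:

[code]
/* Hypothetical entry points - one function covers 1D/2D/3D/array textures. */
void *glMapTexSubImage(GLuint texture, GLint level,
                       GLint xoffset, GLint yoffset, GLint zoffset,
                       GLsizei width, GLsizei height, GLsizei depth,
                       GLbitfield access);  /* GL_MAP_WRITE_BIT etc. */

void glUnmapTexImage(GLuint texture, GLint level);  /* also hypothetical */
[/code]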

Portability? You’re going to need to come up with some compelling reasons as to why it’s a problem for portability, rather than just waving the word around. No, endianness is not one; we already use floats, unsigned ints, ints, unsigned shorts and shorts in buffer objects; this is another problem that has already been solved; endianness as an argument against is also bogus.

Specific utility of this suggestion? I thought I’d made it clear but let’s restate it again. It’s explicitly not a case of “wouldn’t it be great if…”; it’s to avoid round-tripping through intermediate storage and/or intermediate objects when loading data into a texture. In other words, to serve the same utility as glMapBuffer/glMapBufferRange. No, PBOs don’t already provide this as there is still a requirement for the driver or GPU to copy data from the PBO to the texture. No, this wouldn’t invalidate the utility of PBOs as there are still cases where you may want an asynchronous transfer. The functionality suggested is to enable transfer of data in a similar manner to mapping a PBO, but without the additional intermediate step of transferring from a PBO to the texture.

And moving on:

I’d dismissed the argument against as frivolous and vexatious because it read as an argument against being made purely for the purpose of arguing against rather than for constructively discussing pros and cons. No, I don’t see substance in it to be addressed; much of it is bogus and can be shown to be so.

Do I post a lot of ideas? No - there were precisely two others in this section of the forum.

As a side note, it would make texture streaming a lot less painful, a lot less. Also, for those GL implementations that have a unified memory architecture, this would be great. Various platforms have various hacky ways to “write directly to a texture” but they are icky and horribly non-portable (OMAP3/4, I am talking about you).

My take is that at texture creation (somehow :surprise:) a texture’s representation is specified so that mapping makes sense. One way for that somehow to happen is to have, for each acceptable internal format value GL_FOO, the enumeration GL_FOO_MAPPABLE, where MAPPABLE means that when it is mapped the texture data is in a format specified in the spec (bytes/pixel, pixel format, line padding, etc). Even compressed textures would be ok. My only worry where this can go icky is where GL_RGB8 is stored as GL_RGBA8 (and analogously GL_RGB8UI is stored as GL_RGBA8UI). I am not crazy about querying the GL implementation about this packing and padding since it will make the code that uses the mappable texture jazz hard to test reliably beyond “just try it on several different boxes and hope for the best”.
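In other words, creation would look something like this, with GL_RGBA8_MAPPABLE being the hypothetical enumeration (glTexStorage2D itself is real):

[code]
/* Hypothetical: the _MAPPABLE variant would pin down bytes/pixel, component
   order and row padding in the spec, so the mapped layout is well-defined. */
glBindTexture(GL_TEXTURE_2D, tex);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8_MAPPABLE, width, height);
[/code]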

Texture streaming was the main use case I had in mind, yes. It would also apply for more general texture loading in console or other limited-memory scenarios where the extra overhead of PBOs (or system memory buffers) may be too onerous.

Agree, adding *_MAPPABLE internal formats seems the best way to go. The driver does need a way to distinguish between textures that would have this usage pattern and those that wouldn’t, and optimize accordingly. If *_MAPPABLE internal formats were added then the list of valid internal formats for this suggestion could be reduced (to exclude e.g. the wacky GL_RGB8-style formats) and a hypothetical spec could be more explicit about internal representations. That would also sidestep the need to query the implementation about padding/etc.

It would be important not to go down the D3D route of applying seemingly arbitrary restrictions, such as that you can only map with discard (“orphan” behaviour), or that you can’t map e.g. a texture array. In a fully general case it’s all just data managed by the driver, so there should not need to be any such distinction.

Alfonse is perfectly right. The internal swizzling/tiling used by the hardware is not something that you should forget about. This can change from vendor to vendor, and from GPU generation to GPU generation. Even if there would be any meaningful way to expose to the application this layout, the number of different layouts an application might have to handle would be impossible to tackle.

Also, even once the application knows the swizzling, the uploads to these swizzled structures would be non-trivial, and thus would not even reach the best-case scenario of a pure CPU memcpy. GPUs on the other hand have DMA engines or other ways to directly perform copies from linear to swizzled memory at full speed, without utilizing any CPU power, thus a memcpy to a PBO plus a hardware upload is almost guaranteed to be faster despite the intermediate copy.
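That is, the path being recommended here is the familiar streaming one, sketched roughly (pbo, pixels and size are placeholders; the orphaning glBufferData keeps the map from stalling):

[code]
/* Streaming upload through a PBO: the CPU does a plain memcpy, and the GPU's
   DMA/blit engine does the linear-to-tiled conversion during the transfer. */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW); /* orphan */
void *dst = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, size,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(dst, pixels, size);                 /* the best-case pure CPU memcpy */
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE,
                (void *)0);                /* hardware copy from the PBO */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
[/code]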

As Alfonse pointed out, orphaning can be achieved with ARB_invalidate_subdata.

Finally, a note on kRogue’s “it would make texture streaming a lot less painful, a lot less” comment:
Believe me, if you had to deal with the different tiling mechanisms of four GPU generations from three vendors individually, you would reconsider your statement.

The only potential approach is that if the user explicitly requests a mappable texture, then the driver could give them texture storage with a linear layout, so mapping would make sense (it wouldn’t be much different from a texture buffer, except for the addressing - actually pretty similar to the APPLE_client_storage extension, which is kind of like AMD_pinned_memory but for textures). However, you would then have to pay the cost at rendering time, as accessing data that has a linear layout but with the addressing coherency of e.g. a 2D texture would give you poor performance.

Exposing direct texture data mapping might make sense on a console or other fixed hardware as you only have a fixed number of swizzling modes that you have to handle and they don’t change. But not for a cross-platform API.

[QUOTE=aqnuep]Finally, a note on kRogue’s “it would make texture streaming a lot less painful, a lot less” comment:
Believe me, if you had to deal with the different tiling mechanisms of four GPU generations from three vendors individually, you would reconsider your statement.[/QUOTE]

I think that if mapping a texture was in the spec and if its format was specified (as mhagain suggested), the issues of dealing with different GPUs’ idiosyncrasies would drop.

For what it is worth, OMAP3’s texture_stream extension (which you have to dig through TI’s website to find) is a horror interface to map texture memory for texture streaming… it uses ioctl on a finite number of /dev/foo … finite across the whole system, not per process or per context, finite across the whole system… so it just plain sucks. The platform needs/wants it for essentially presenting the stream coming from the camera with GL… one can make an argument that getting the bytes directly from the camera and copying them is… icky, and that an extension should be used where there is some kind of texture stream to handle this, but that is not the case (there is a texture stream in EGL land, so on paper it is possible)…

The extension you posted: http://www.opengl.org/registry/specs/APPLE/client_storage.txt is what everyone wants for texture streaming, methinks, and is ideal for unified memory arch gizmos.

For a unified memory arch, especially where texture-from-pixmap is supported, I’d bet that such GPUs can use (at a potential performance loss) texture data directly that is non-swizzled, non-tiled. For texture streaming that performance loss is much smaller next to the bandwidth and CPU overhead of converting, etc… my thinking is that by saying the texture is mappable one is saying: dude, I am likely to be streaming (be it reading or writing), so the texture data is very dynamic and its data is not “made by GL”.

Though I must confess that the Apple extension you posted for streaming is all that one really wants at the end of the day, though I am concerned that the spec does not spell out when changes to that client data are reflected in GL… and it is much more than one wants; since client memory is pageable/virtual etc, the Apple extension looks like it can be a horror to implement and it is, to some extent, overkill.

[QUOTE=aqnuep;1243328]Alfonse is perfectly right. The internal swizzling/tiling used by the hardware is not something that you should forget about. This can change from vendor to vendor, and from GPU generation to GPU generation. Even if there would be any meaningful way to expose to the application this layout, the number of different layouts an application might have to handle would be impossible to tackle.

Also, even once the application knows the swizzling, the uploads to these swizzled structures would be non-trivial, and thus would not even reach the best-case scenario of a pure CPU memcpy. GPUs on the other hand have DMA engines or other ways to directly perform copies from linear to swizzled memory at full speed, without utilizing any CPU power, thus a memcpy to a PBO plus a hardware upload is almost guaranteed to be faster despite the intermediate copy.[/quote]

The point was made that even if such internal representations did exist, OpenGL already abstracts them away for glTex(Sub)Image calls; this is a solved problem.

The point was also made that D3D allows mapping of textures but yet doesn’t suffer from any of these hypothetical reasons-for-objection. Operating systems and drivers may differ, but the underlying hardware is still the same. It’s great to theorize about the way things might work internally in hardware, but such theories don’t really hold up in the face of a working example that refutes them.

The point was made that texture invalidation only satisfies one (small) part of this suggestion. It fails to meet the main part, which is to avoid the need for intermediate copies.

Tell me, how does D3D allow mapping of swizzled textures in any meaningful way on Windows? I’ve never heard of such a mechanism.

No, the underlying hardware is not the same. NVIDIA hardware works differently than AMD, and both work differently than Intel. Also, even a single vendor’s GPUs might change the mechanism from one generation to the other, as I already mentioned.

Show me that working example.

Ask Microsoft and the hardware vendors. The point remains that it does happen, and it happens without any of the objections being raised affecting it. I’m willing to come back to that point as often as is necessary.

That is not what I meant. Of course different vendors have different hardware, and of course different hardware generations from the same vendor may be different, but that’s something that current mechanisms also have to deal with - so it’s not relevant to this particular item.

It’s also the case that for a given PC with a given generation of (say) NVIDIA hardware, it doesn’t matter what the OS is - that NVIDIA hardware is the same. Likewise for a given PC with a given generation of AMD hardware or a given PC with a given generation of Intel hardware.

It’s also the case that the purpose of an API abstraction is that you as the developer do not have to worry about things like “NVIDIA hardware works differently than AMD, and both work differently than Intel. Also, even a single vendor’s GPUs might change the mechanism from one generation to the other”.

And it’s also the case that this is not a problem for D3D. No amount of objections can detract from the fact that here is an example where it is not a problem and where it works.

OK, here: IDirect3DTexture9::LockRect (d3d9helper.h) - Win32 apps | Microsoft Learn
And here: ID3D10Texture2D::Map (d3d10.h) - Win32 apps | Microsoft Learn
And here: ID3D11DeviceContext::Map (d3d11.h) - Win32 apps | Microsoft Learn

Again; this is something that is already being done, this is something that is already out there and working, there are no technical reasons whatsoever why it cannot be done for GL.

[QUOTE=mhagain;1243346]OK, here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb205913%28v=vs.85%29.aspx
And here: ID3D10Texture2D::Map (d3d10.h) - Win32 apps | Microsoft Learn
And here: ID3D11DeviceContext::Map (d3d11.h) - Win32 apps | Microsoft Learn[/QUOTE]

How do you know that this mapping mechanism doesn’t give you a pointer to memory which just has a linearized copy of the texture data? Neither in OpenGL nor in D3D is there a guarantee that when you map a memory area you will actually write directly to that area. The driver might just allocate a new piece of memory, copy the texture data there (unless you asked for DISCARD) and then, when finished, just re-upload it. Believe me, this will happen in most (if not all) cases for the following reasons:

  1. If the texture is tiled/swizzled (which is true for almost all textures, except for buffer textures or compressed textures that have kind-of “raw” data in them) then you have to do a copy in order to allow linear access.
  2. If the texture is in memory not visible to the CPU (which is true for all, except a small range of video memory) then you have to do a copy in order to allow access at all.

So if you think about it, the API might look different in the case of D3D, but it is actually the D3D equivalent of pixel buffer objects, except that it does two-way communication at once (which can even backfire on you, as you may perform unnecessary reads if you don’t use DISCARD and NO_OVERWRITE properly).

OK, let’s break this down again.

There are currently two primary ways to update a texture in OpenGL, both using glTexSubImage2D but with or without a PBO bound.

(1) Without a PBO you write into system memory and call glTexSubImage2D; the driver copies the data off and transfers it to the texture at some arbitrary future point in time (which may be immediately but is before the texture is next used in a draw call; if the texture is not currently being used for a pending draw call the copy off can potentially be skipped and the driver can transfer immediately).

(2) With a PBO you write into the PBO and call glTexSubImage2D; the driver transfers from the PBO to the texture at some arbitrary future point in time (which may be immediately but is before the texture is next used in a draw call).

Under both of these ways, and so far as OpenGL is concerned, any hypothetical top-secret proprietary vendor-specific internal representation does not exist. I am not suggesting that this would change if textures were to be mappable, and I do not know why such a focus was put on it seeming (or being made to seem) as if I were. The OpenGL API as it is exposed to the programmer has no business dealing with that kind of detail, and it should stay that way.

Let’s look at what making a texture mappable can offer to both ways.

For without a PBO the scenario should be easy and obvious. Instead of writing into your own system memory pointer you write into a pointer provided by the driver. This pointer may or may not be a direct pointer to the raw texture data, and - here’s the thing - it does not matter which it is. The driver manages that part of it for you. If it can give you a pointer directly to the texture data then that’s what you get. If it can’t then you get a pointer to the driver’s own internal backing storage. But either way, it’s internal driver behaviour and the mechanics of it are completely irrelevant to this suggestion. What is relevant to this suggestion is that instead of having to go “raw data -> your storage -> driver storage” you get to go “raw data -> driver storage”; i.e. you get the very reason glMapBuffer was provided for buffer objects: avoiding an extra memory copy.
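Side by side, using the hypothetical glMapTexSubImage sketched earlier (generate_pixels and scratch are placeholders):

[code]
/* Today: raw data -> your storage -> driver storage */
generate_pixels(scratch);                            /* your own system memory */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                GL_RGBA, GL_UNSIGNED_BYTE, scratch); /* driver copies it off */

/* Proposed (hypothetical): raw data -> driver storage */
void *p = glMapTexSubImage(tex, 0, 0, 0, 0, w, h, 1, GL_MAP_WRITE_BIT);
generate_pixels(p);                    /* write straight into driver storage */
glUnmapTexImage(tex, 0);
[/code]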

This is emphatically not a replacement for glTexSubImage2D from system memory data. It is expected that there would still be cases where glTexSubImage2D is the most appropriate code path to use, or where any potential performance advantage from avoiding the memory copy does not matter (texture loading - assuming use of glTexStorage - would be one such example). The intention is to provide an additional option that drivers may provide a more optimal path for, and that programs can take advantage of in cases where that additional performance is important and significant to them.

For with a PBO it’s less clear and I don’t believe that the suggestion has merit in this case. First of all you’re not updating a texture, you’re updating a PBO (the driver updates the texture from the PBO). Secondly there is a clear use case for PBOs which this suggestion doesn’t meet (and doesn’t pretend to meet) and that’s asynchronous pixel transfers.

By the way, “copy the texture data there (unless you asked DISCARD)” is untrue; this copy is also not needed if a texture were to be mapped with write-only access (which it is expected would be the normal case). In the worst case, all that the implementation needs is to allocate some scratch memory and give you a pointer to that; the implementation looks after everything else.

TexSubImage without PBO is actually raw data -> driver storage.
Also, when using PBOs you don’t have to first create system memory storage that you’ll copy to your PBO; you can use the PBO directly in the first place. I don’t know, however, why people don’t do this.

That’s not true. If I map a buffer range for WRITE_ONLY, but not DISCARD/INVALIDATE, then the user might write only a single byte to the range or, even worse, some disjoint sections inside the range; thus, when transferring back with your approach, you would copy back junk data from places that the user didn’t write. Thus WRITE_ONLY does require a readback, as the driver cannot know what part of the mapped range will actually be written to and what part is left untouched. That’s why DISCARD/INVALIDATE was invented. Otherwise there wouldn’t be any point in having them in the first place.

Well, would you look at that, a new extension has just shown up in the registry:

GL_INTEL_map_texture

It allows you to map a texture on the GPU; however, it looks like they skip over the complicated bit of handling tiled textures and force you to a linear layout, which is a shame.
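From my reading of the spec (so treat the details as approximate), usage goes something like this - you request a linear layout up front, then map a level directly:

[code]
/* INTEL_map_texture, as I read the registry spec. */
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MEMORY_LAYOUT_INTEL,
                GL_LAYOUT_LINEAR_INTEL);       /* set before defining storage */
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);

GLint stride;   /* row pitch in bytes, returned by the driver */
GLenum layout;  /* the layout the driver actually picked */
void *p = glMapTexture2DINTEL(tex, 0, GL_MAP_WRITE_BIT, &stride, &layout);
/* ... write rows, stepping by 'stride' ... */
glUnmapTexture2DINTEL(tex, 0);
[/code]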

Regards
elFarto

Indeed; apparently at least one GPU maker (though it is Intel) saw that, when you have a unified memory architecture, mapping a texture whose storage is linear is a good thing to have.

Sounds like your dream came true: http://www.opengl.org/registry/specs/INTEL/map_texture.txt

Mapping textures is a feature that everybody has had for a long time. That’s how Mesa accesses texture storage; Mesa drivers must internally expose the map/unmap interface for textures. All Mesa hardware drivers from DX7 to DX11.1-level hardware fully support and implement the interface. It’s also the fastest codepath for uploading/streaming textures.

The only problem with OpenGL is that it doesn’t expose component ordering, i.e. you don’t know if a texture is internally stored as RGBA, BGRA, ABGR, or ARGB, RG or GR, etc. Also you don’t really know the bpp either, because if you ask for GL_RGBA16, the implementation is allowed to give you a GL_RGBA8 texture. And if you ask for GL_LUMINANCE8_ALPHA8 or GL_RGB8, you can get GL_RGBA8 as well. The GL map/unmap interface just needs a way to query this info, so it’s not a big deal.
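For what it’s worth, ARB_internalformat_query2 already has most of this query machinery for the upload side; a map/unmap interface would just need equivalents that describe the mapped layout. A rough sketch of the existing queries:

[code]
/* Ask what the implementation really does with GL_RGBA16, and which
   format/type it prefers for pixel transfers (ARB_internalformat_query2). */
GLint preferred, fmt, type;
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA16,
                      GL_INTERNALFORMAT_PREFERRED, 1, &preferred);
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA16,
                      GL_TEXTURE_IMAGE_FORMAT, 1, &fmt);  /* e.g. GL_BGRA */
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA16,
                      GL_TEXTURE_IMAGE_TYPE, 1, &type);
[/code]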

I don’t need to map/unmap textures in OpenGL, because I don’t use OpenGL, I implement it. However if I used OpenGL, it’s something I would definitely want.

Why does everybody seem to ignore the problem of tiling/swizzling? Yes, a software implementation doesn’t have to care about it. Neither does an implementation that only allows mapping linear textures. But as mentioned before, sampling linear textures is way slower than sampling tiled textures, so what you save at upload time you lose many times over, each time you actually use the texture.