Map Texture Objects



mhagain
10-09-2012, 02:47 PM
We've currently got the capability to map buffer objects, which enables loading of data into buffers without needing to use any intermediate system memory arrays or otherwise do a memory copy. We don't have that ability with textures - true, we can do something similar with PBOs but it involves more round-tripping.

Being able to map a texture object has the following uses:



The ability to implement "texture orphaning" for dynamic textures which must be fully replaced each frame, without needing a round-trip through a PBO.
The ability to more directly get data into a texture which can provide advantages for both creation and updating use cases, and without needing any intermediate steps (this can facilitate any kind of texture streaming scheme).
The ability to read data from a texture without the driver first having to do a copy back to system memory.


This suggestion is to overload the glMapBuffer, glMapBufferRange, glFlushMappedBufferRange and glUnmapBuffer calls to also accept a texture target as the target parameter. This target can be any that is valid for any of the glTexImage/glTexSubImage calls. Behaviour should otherwise be identical, and the texture must be unmapped before it can be used in a draw call.
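
For illustration, a rough sketch of how the proposed overload might look in use (GL_TEXTURE_2D as a map target is hypothetical and does not exist today; everything else is the existing buffer-mapping API):

/* Hypothetical: map level 0 of the currently bound texture for a full replace.
   'tex', 'width', 'height' and 'pixels' are assumed to already exist. */
glBindTexture(GL_TEXTURE_2D, tex);
void *ptr = glMapBufferRange(GL_TEXTURE_2D, 0, (GLsizeiptr)width * height * 4,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(ptr, pixels, (size_t)width * height * 4);  /* or decode/stream straight into ptr */
glUnmapBuffer(GL_TEXTURE_2D);  /* must unmap before the texture is used in a draw call */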

Issues I can see include:



Is it only allowed to map mip level 0, or - if not - how to specify mapping a mip level other than 0? Suggest to perhaps use glMapBufferRange with an appropriate offset?
glBindBuffer or glBindTexture?
What happens if glTexImage or glTexSubImage is called while a texture is mapped?

Alfonse Reinheart
10-09-2012, 04:47 PM
Being able to map a texture object has the following uses:

OK, so... how do you define what you get when you map it?

It's easy to say that you just "map a texture", but textures are opaque to allow drivers to hide their particular implementations. That's why pixel transfers are complicated, while buffer uploads are simple byte copies. It hides details like swizzling, the specific bit-pattern of formats, etc.

So now you want to directly expose the vagaries of the range of OpenGL hardware. There are three ways to go about it:

1. Force the driver to use a single, specific standard across vastly different hardware. Say goodbye to cross-platform portability, let alone future-proofing.

2. Extend the query API to tell you how to interpret the data for a particular format, thus allowing different hardware to expose its particular eccentricities to you. Of course, since most existing sources of streamed data (FFMpeg buffer writing, DirectShow, etc) will export to their own format, you have to use an intermediate buffer. They must write to some memory, you convert it to the hardware's version in the mapped space. In short: no different from having those APIs write to a mapped PBO (assuming they can).

Also, I'm just guessing, but I'm fairly sure NVIDIA's not going along with that. They seem really protective of their IP and implementation details.

3. Split the difference and allow the user to tell OpenGL that a particular texture will need to adhere to a particular structure. That is, it should be mappable. Of course, now you have to make glTexStorageMappable functions for 3 separate types (1D, 2D, 3D). Either that, or you're going to have to create a bunch of new image formats that force the texture to use a specific format. Or some kind of texture parameter or something.

Even assuming that there would be a particular structure that all hardware could support.


The ability to implement "texture orphaning" for dynamic textures which must be fully replaced each frame, without needing a round-trip through a PBO.

Doesn't glInvalidateTexImage (http://www.opengl.org/wiki/GLAPI/glInvalidateTexSubImage) give us that already? You upload to it, use it, invalidate it, and write again. It seems much easier than mapping a texture just to invalidate it.
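
For reference, that per-frame cycle with the existing API looks roughly like this ('tex', the dimensions and 'newPixels' are assumed; glInvalidateTexImage is GL 4.3 / ARB_invalidate_subdata):

glInvalidateTexImage(tex, 0);          /* orphan: old contents may be thrown away */
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, newPixels);
/* ... draw with the texture, then repeat next frame */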


The ability to read data from a texture without the driver first having to do a copy back to system memory.

I'm not sure how useful that ability is. For most GPU situations, that memory is across a (relatively) slow bus. You'd be better off issuing a DMA and then doing something else until it's over, which we effectively have with PBOs.
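
A minimal sketch of that pattern with a pack PBO, for reference ('tex', 'pbo' and the dimensions are assumed):

/* Kick off the readback into a GL_PIXEL_PACK_BUFFER; with the buffer bound this
   returns quickly and the copy can proceed as a DMA in the background. */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, (GLsizeiptr)width * height * 4, NULL, GL_STREAM_READ);
glBindTexture(GL_TEXTURE_2D, tex);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);  /* offset 0 into the PBO */

/* ... do other work ... */

void *pixels = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                (GLsizeiptr)width * height * 4, GL_MAP_READ_BIT);
/* read 'pixels', then unmap */
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);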

Also, there has never been a guarantee that mapping means you're talking to GPU memory.


This suggestion is to overload the glMapBuffer, glMapBufferRange, glFlushMappedBufferRange and glUnmapBuffer calls to also accept a texture target as the target parameter.

Absolutely not!

Ignoring the question of whether this is good and useful functionality, that is not the way to implement it. OpenGL already has enough confusing, overloaded functions that do twenty different things without pointlessly adding another 4 to the list.

glMapBufferRange means glMapBufferRange. We're only recently getting to the point where we don't have to call glVertexAttribPointer with an argument that we have to pretend is a pointer, and it really fetches some of its data from somewhere else. There's no need to screw up that important progress just to avoid adding a few new API functions.

The ARB is not running out of functions. There is not some hard limit that they're coming up on, such that OpenGL can't have more functions. There's absolutely no reason to overload a perfectly good API when you can just have glMapTexSubImage et al.

If this is going to happen, then it gets its own API. Don't screw up APIs that actually make sense just to shoehorn in new functionality. That road leads to stupidity like AMD_pinned_memory (good functionality, Godawful API).

mhagain
10-09-2012, 07:06 PM
Hmmm - as rants go I'd give it maybe a 2. It would have been a 6 or 7 but it blatantly contradicts other things you've ranted in favour of (or against, as appropriate) in the past.

Alfonse Reinheart
10-09-2012, 11:36 PM
Really? Because I don't remember the time I argued that IHVs should put their texture format on display for everyone to see. Or when I said that OpenGL's API needs to get worse and more confusing by senselessly overloading functions. Or when I said that mapping memory assured you of getting GPU access.

But whatever it takes to avoid addressing the substance of my argument, right? Because the best way to present your case is to dismiss any inconvenient facts.

Why do you always make things personal? I don't seek you out; I barely take note of the fact that it's mhagain presenting an idea. I'm only "rant"ing at your ideas because you post a lot of them and don't put much thought into them. This idea is nothing more than, "wouldn't it be wonderful if we could map textures?" There's no consideration of the ramifications of such a decision. No explanation for how this could work cross-hardware. Even your suggested API shows how little actual thought you put into it. The only bit of substance to this is the basic idea: map texture memory.

If you're going to seriously present an idea beyond the basic concept of "let us do this somehow", then put some effort into it. Show that you're better than just throwing ideas against a wall and hoping that one sticks.

mhagain
10-10-2012, 06:00 AM
Ok then.

First of all, the ability to Map (or Lock in older versions) a texture is something that has been in D3D for an eternity (in D3D11 both textures and buffers even use the very same API call). So far as the hardware vendors are concerned, this is a complete non-issue. There are no deep, dark, proprietary internal representations going on here; textures are just the same as buffers - a stream of bytes.

Now let's get one thing real clear before continuing. This is not about adding functionality to GL that D3D also has. This is about adding functionality that may be generally useful, irrespective of whether D3D has it or not. D3D is not relevant beyond this point.

So point 1 is this: the argument that vendors may not want to put their internal texture formats on display is bogus.

Point 2 is this: even on hardware that may have its own funky internal representation, the whole point of OpenGL as an abstraction layer is to abstract that away from the programmer. This is something that already happens with e.g. a glTexSubImage call. Any hypothetical Map/Unmap of a texture can go through the very same driver code paths as are used for glTexSubImage to accomplish this. So even in such a case, this amounts to using an already-solved problem as an argument against.

Point 2 also exposes way number 4 to go about it: if the internal representation is already appropriate for exposure to the programmer, then just expose it as-is. Give the programmer a pointer and be done with it. This case could be satisfied e.g. where the internal representation matches the internalFormat param used when creating the texture. If the internal representation is not appropriate, then likewise give the programmer a pointer, but add a conversion step that happens in the driver - either at Map time (for reading) or Unmap time (for writing). As I said - this is something that already happens with glGetTexImage/glTexSubImage, the driver already contains code to do it, so arguments against it won't fly.

Now onto specifics.

glInvalidateTexImage? No; that just accomplishes one part of the requirement, which is to orphan the texture. It does absolutely nothing about the second part, which is to avoid round-tripping through PBOs or program-allocated system memory in order to perform the update. Mapping a texture solves that; instead of the round-trip and extra memory copies you write directly (or as directly as the driver allows).

Overloading the buffer calls. Yes, it's ugly, yes it's confusing, yes, a set of extra entry points would be better. And to head this one off at the pass - there is no need for separate entry points for 1D/2D/3D textures; follow the pattern established by glInvalidateTexSubImage instead - one entry point that works with all types.
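
For concreteness, a purely hypothetical entry point along those lines (names and parameters invented here only to illustrate the shape, modelled on glInvalidateTexSubImage plus the buffer-mapping access bits):

/* Hypothetical, not real GL: one entry point covering all texture types. */
void *glMapTexSubImage(GLuint texture, GLint level,
                       GLint xoffset, GLint yoffset, GLint zoffset,
                       GLsizei width, GLsizei height, GLsizei depth,
                       GLbitfield access);   /* GL_MAP_WRITE_BIT, GL_MAP_READ_BIT, ... */
void glUnmapTexImage(GLuint texture, GLint level);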

Portability? You're going to need to come up with some compelling reasons as to why it's a problem for portability, rather than just waving the word around. No, endianness is not one; we already use floats, unsigned ints, ints, unsigned shorts and shorts in buffer objects; this is another problem that has already been solved; endianness as an argument against is also bogus.

Specific utility of this suggestion? I thought I'd made it clear but let's restate it again. It's explicitly not a case of "wouldn't it be great if..."; it's to avoid round-tripping through intermediate storage and/or intermediate objects when loading data into a texture. In other words, to serve the same utility as glMapBuffer/glMapBufferRange. No, PBOs don't already provide this as there is still a requirement for the driver or GPU to copy data from the PBO to the texture. No, this wouldn't invalidate the utility of PBOs as there are still cases where you may want an asynchronous transfer. The functionality suggested is to enable transfer of data in a similar manner to mapping a PBO, but without the additional intermediate step of transferring from a PBO to the texture.

And moving on:

I'd dismissed the argument against as frivolous and vexatious because it read as an argument against being made purely for the purpose of arguing against rather than for constructively discussing pros and cons. No, I don't see substance in it to be addressed; much of it is bogus and can be shown to be so.

Do I post a lot of ideas? No - there were precisely two others in this section of the forum.

kRogue
10-10-2012, 10:49 AM
As a side note, it would make texture streaming a lot less painful, a lot less. Also, for those GL implementations that have a unified memory architecture, this would be great. Various platforms have various hacky ways to "write directly to a texture" but they are icky and horribly non-portable (OMAP3/4, I am talking about you).

My take is that at texture creation (somehow) a texture's representation is specified so that mapping makes sense. One way for that somehow to happen is to have, for each acceptable internal format value GL_FOO, an enumeration GL_FOO_MAPPABLE, where MAPPABLE means that when it is mapped the texture data is in a format specified in the spec (bytes per pixel, pixel format, line padding, etc.). Even compressed textures would be ok. The only place I see this getting icky is where GL_RGB8 is stored as GL_RGBA8 (and analogously GL_RGB8UI is stored as GL_RGBA8UI). I am not crazy about querying the GL implementation about this packing and padding since it will make the code that uses the mappable texture jazz hard to test reliably beyond "just try it on several different boxes and hope for the best".
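
A sketch of how that could look from the application side, assuming a hypothetical GL_RGBA8_MAPPABLE enum whose mapped layout is pinned down by the spec (say tightly packed RGBA8 with a row pitch of width * 4):

/* GL_RGBA8_MAPPABLE is hypothetical. With the layout fixed by the spec, a mapped
   pointer 'p' (from whatever the map call ends up being) can be addressed directly
   without querying packing/padding. */
glBindTexture(GL_TEXTURE_2D, tex);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8_MAPPABLE, width, height);
/* ... later, with 'p' mapped for writing: */
unsigned char *texel = p + ((size_t)y * width + x) * 4;
texel[0] = r; texel[1] = g; texel[2] = b; texel[3] = a;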

mhagain
10-10-2012, 12:02 PM
Texture streaming was the main use case I had in mind, yes. It would also apply for more general texture loading in console or other limited-memory scenarios where the extra overhead of PBOs (or system memory buffers) may be too onerous.

Agree, adding *_MAPPABLE internal formats seems the best way to go. The driver does need a way to distinguish between textures that would have this usage pattern and those that wouldn't, and optimize accordingly. If *_MAPPABLE internal formats were added then the list of valid internal formats for this suggestion could be reduced (to exclude e.g. the wacky GL_RGB8-style formats) and a hypothetical spec could be more explicit about internal representations. That would also sidestep the need to query the implementation about padding/etc.

It would be important to not go down the D3D route of applying seemingly arbitrary restrictions, such as that you can only map with discard ("orphan" behaviour), or that you can't map e.g. a texture array. In a fully general case it's all just data managed by the driver, so there should not need to be any such distinction.

aqnuep
10-10-2012, 12:20 PM
Alfonse is perfectly right. The internal swizzling/tiling used by the hardware is not something that you should forget about. This can change from vendor to vendor, and from GPU generation to GPU generation. Even if there were a meaningful way to expose this layout to the application, the number of different layouts an application might have to handle would be impossible to tackle.

Also, even once the application knows the swizzling, uploads to these swizzled structures would be non-trivial, so it would not even reach the best-case scenario of a pure CPU memcpy. GPUs, on the other hand, have DMA engines or other ways to directly perform copies from linear to swizzled memory at full speed, without using any CPU power, so a memcpy to a PBO plus a hardware upload is almost guaranteed to be faster despite the intermediate copy.

As Alfonse pointed out, orphaning can be achieved with ARB_invalidate_subdata.

Finally, a note on kRogue's "it would make texture streaming a lot less painful, a lot less" comment:
Believe me, if you had to deal with 3 vendors' 4 GPU generations' worth of different tiling mechanisms individually, you would reconsider your statement.

The only potential approach is that if the user explicitly requests a mappable texture, the driver could give him texture storage with a linear layout, so mapping would make sense (it wouldn't be much different from a texture buffer, except for the addressing, or actually pretty similar to the APPLE_client_storage extension (http://www.opengl.org/registry/specs/APPLE/client_storage.txt), which is kind of like AMD_pinned_memory but for textures). However, then you would have to pay the cost at rendering time, as accessing data that has a linear layout with the access coherency of e.g. a 2D texture would give you poor performance.
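
For reference, APPLE_client_storage usage looks roughly like this (real extension; the point is that the application's own pointer becomes the backing store and must stay valid and unmodified for as long as GL may read it):

/* The driver may keep using 'pixels' directly as the texture's storage instead
   of copying it ('tex', the dimensions and 'pixels' are assumed). */
glPixelStorei(GL_UNPACK_CLIENT_STORAGE_APPLE, GL_TRUE);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, pixels);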

Exposing direct texture data mapping might make sense on a console or other fixed hardware as you only have a fixed number of swizzling modes that you have to handle and they don't change. But not for a cross-platform API.

kRogue
10-10-2012, 12:43 PM
Finally, a note on kRogue's "it would make texture streaming a lot less painful, a lot less" comment:
Believe me, if you had to deal with 3 vendors' 4 GPU generations' worth of different tiling mechanisms individually, you would reconsider your statement.


I think that if mapping a texture were in the spec and its format was specified (as mhagain suggested), the issues of dealing with different GPUs' idiosyncrasies would drop.

For what it is worth, OMAP3's texture_stream extension (which you have to dig through TI's website to find) is a horror of an interface for mapping texture memory for texture streaming: it uses ioctl on a finite number of /dev/foo devices (finite across the whole system, not per process or per context), so it just plain sucks. The platform needs/wants it essentially for presenting the stream coming from the camera with GL. One can make an argument that getting the bytes directly from the camera and copying them is icky, and that an extension with some kind of texture stream should handle this, but that is not the case (there is a texture stream in EGL land, so on paper it is possible).

The extension you posted: http://www.opengl.org/registry/specs/APPLE/client_storage.txt is what everyone wants for texture streaming, methinks, and is ideal for unified memory architecture gizmos.

For a unified memory architecture, especially where texture-from-pixmap is supported, I'd bet that such GPUs can directly use (at a potential performance loss) texture data that is non-swizzled and non-tiled. For texture streaming that performance loss is small next to the bandwidth and CPU overhead of conversion, etc. My thinking is that by saying the texture is mappable, one is saying: dude, I am likely to be streaming (be it reading or writing), so the texture data is very dynamic and its contents are not "made by GL".

Though I must confess that the Apple extension you posted is all that one really wants for streaming at the end of the day. I am concerned, though, that the spec does not spell out when changes to that client data are reflected in GL. It is also more than one wants: since client memory is pageable/virtual etc., the Apple extension looks like it could be a horror to implement and is, to some extent, overkill.

mhagain
10-11-2012, 05:15 AM
Alfonse is perfectly right. The internal swizzling/tiling used by the hardware is not something that you should forget about. This can change from vendor to vendor, and from GPU generation to GPU generation. Even if there were a meaningful way to expose this layout to the application, the number of different layouts an application might have to handle would be impossible to tackle.

Also, even once the application knows the swizzling, uploads to these swizzled structures would be non-trivial, so it would not even reach the best-case scenario of a pure CPU memcpy. GPUs, on the other hand, have DMA engines or other ways to directly perform copies from linear to swizzled memory at full speed, without using any CPU power, so a memcpy to a PBO plus a hardware upload is almost guaranteed to be faster despite the intermediate copy.

The point was made that even if such internal representations did exist, OpenGL already abstracts them away for glTex(Sub)Image calls; this is a solved problem.

The point was also made that D3D allows mapping of textures but yet doesn't suffer from any of these hypothetical reasons-for-objection. Operating systems and drivers may differ, but the underlying hardware is still the same. It's great to theorize about the way things might work internally in hardware, but such theories don't really hold up in the face of a working example that refutes them.


As Alfonse pointed out, orphaning can be achieved with ARB_invalidate_subdata.

The point was made that texture invalidation only satisfies one (small) part of this suggestion. It fails to meet the main part, which is to avoid the need for intermediate copies.

aqnuep
10-11-2012, 07:53 AM
The point was also made that D3D allows mapping of textures but yet doesn't suffer from any of these hypothetical reasons-for-objection.
Tell me, how does D3D allow mapping of swizzled textures in any meaningful way on Windows? I've never heard of such a mechanism.


Operating systems and drivers may differ, but the underlying hardware is still the same.
No, the underlying hardware is not the same. NVIDIA hardware works differently than AMD, and both work differently than Intel. Also, even a single vendor's GPUs might change the mechanism from one generation to the other, as I already mentioned.


It's great to theorize about the way things might work internally in hardware, but such theories don't really hold up in the face of a working example that refutes them.
Show me that working example.

mhagain
10-11-2012, 09:00 AM
Tell me, how does D3D allow mapping of swizzled textures in any meaningful way on Windows? I've never heard of such a mechanism.

Ask Microsoft and the hardware vendors. The point remains that it does happen, and it happens without any of the objections being raised affecting it. I'm willing to come back to that point as often as is necessary.


No, the underlying hardware is not the same. NVIDIA hardware works differently than AMD, and both work differently than Intel. Also, even a single vendor's GPUs might change the mechanism from one generation to the other, as I already mentioned.

That is not what I meant. Of course different vendors have different hardware, and of course different hardware generations from the same vendor may be different, but that's something that current mechanisms also have to deal with - so it's not relevant to this particular item.

It's also the case that for a given PC with a given generation of (say) NVIDIA hardware, it doesn't matter what the OS is - that NVIDIA hardware is the same. Likewise for a given PC with a given generation of AMD hardware or a given PC with a given generation of Intel hardware.

It's also the case that the purpose of an API abstraction is so that you as the developer do not have to worry about things like "NVIDIA hardware works differently than AMD, and both work differently than Intel. Also, even a single vendor's GPUs might change the mechanism from one generation to the other".

And it's also the case that this is not a problem for D3D. No amount of objections can detract from the fact that here is an example where it is not a problem and where it works.


Show me that working example.

OK, here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb205913%28v=vs.85%29.aspx
And here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb173869%28v=vs.85%29.aspx
And here: http://msdn.microsoft.com/en-us/library/windows/desktop/ff476457%28v=vs.85%29.aspx

Again; this is something that is already being done, this is something that is already out there and working, there are no technical reasons whatsoever why it cannot be done for GL.

aqnuep
10-11-2012, 01:33 PM
OK, here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb205913%28v=vs.85%29.aspx
And here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb173869%28v=vs.85%29.aspx
And here: http://msdn.microsoft.com/en-us/library/windows/desktop/ff476457%28v=vs.85%29.aspx

How do you know that this mapping mechanism doesn't just give you a pointer to memory holding a linearized copy of the texture data? Neither in OpenGL nor in D3D is there a guarantee that when you map a memory area you will actually write directly to that area. The driver might just allocate a new piece of memory, copy the texture data there (unless you asked DISCARD) and then, when finished, re-upload it. Believe me, this will happen in most (if not all) cases because of the following reasons:

1. If the texture is tiled/swizzled (which is true for almost all textures, except for buffer textures or compressed textures that have kind-of "raw" data in them) then you have to do a copy in order to allow linear access.
2. If the texture is in memory not visible to the CPU (which is true for all, except a small range of video memory) then you have to do a copy in order to allow access at all.

So if you think about it, the API might look different in the case of D3D, but it is actually the D3D equivalent of pixel buffer objects, except that it does both-ways communication at once (which can even backfire at you as you may perform unnecessary reads if you don't use DISCARD and NO_OVERWRITE properly).

mhagain
10-13-2012, 10:23 AM
OK, let's break this down again.

There are currently two primary ways to update a texture in OpenGL, both using glTexSubImage2D but with or without a PBO bound.

(1) Without a PBO you write into system memory and call glTexSubImage2D; the driver copies the data off and transfers it to the texture at some arbitrary future point in time (which may be immediately but is before the texture is next used in a draw call; if the texture is not currently being used for a pending draw call the copy off can potentially be skipped and the driver can transfer immediately).

(2) With a PBO you write into the PBO and call glTexSubImage2D; the driver transfers from the PBO to the texture at some arbitrary future point in time (which may be immediately but is before the texture is next used in a draw call).
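
In code, the two paths look roughly like this ('tex', 'pbo', 'size' and 'cpuPixels' are assumed):

/* (1) No PBO: the driver copies out of client memory at the call. */
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, cpuPixels);

/* (2) PBO: write into the buffer, then the update sources from it. */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
void *dst = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, size,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(dst, cpuPixels, size);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);   /* offset 0 into the PBO */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);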

Under both of these ways, and so far as OpenGL is concerned, any hypothetical top-secret proprietary vendor-specific internal representation does not exist. I am not suggesting that this change if textures were to be mappable, and I do not know why such a focus was put on making it seem as if I were. The OpenGL API as it is exposed to the programmer has no business dealing with that kind of detail, and it should stay that way.

Let's look at what making a texture mappable can offer to both ways.

For without a PBO the scenario should be easy and obvious. Instead of writing into your own system memory pointer you write into a pointer provided by the driver. This pointer may or may not be a direct pointer to the raw texture data, and - here's the thing - it does not matter which it is. The driver manages that part of it for you. If it can give you a pointer directly to the texture data then that's what you get. If it can't then you get a pointer to the driver's own internal backing storage. But either way, it's internal driver behaviour and the mechanics of it are completely irrelevant to this suggestion. What is relevant to this suggestion is that instead of having to go "raw data -> your storage -> driver storage" you get to go "raw data -> driver storage"; i.e. you get to avail of the reason why glMapBuffer was provided for buffer objects; avoiding an extra memory copy.

This is emphatically not a replacement for glTexSubImage2D from system memory data. It is expected that there would still be cases where glTexSubImage2D is the most appropriate code path to use, or where any potential performance advantage from avoiding the memory copy does not matter (texture loading - assuming use of glTexStorage - would be one such example). The intention is to provide an additional option that drivers may provide a more optimal path for, and that programs can take advantage of in cases where that additional performance is important and significant to them.

For with a PBO it's less clear and I don't believe that the suggestion has merit in this case. First of all you're not updating a texture, you're updating a PBO (the driver updates the texture from the PBO). Secondly there is a clear use case for PBOs which this suggestion doesn't meet (and doesn't pretend to meet) and that's asynchronous pixel transfers.

By the way, "copy the texture data there (unless you asked DISCARD)" is untrue; this copy is also not needed if a texture were to be mapped with write-only access (which it is expected would be the normal case). In the worst case, all that the implementation needs is to allocate some scratch memory and give you a pointer to that; the implementation looks after everything else.

aqnuep
10-15-2012, 01:57 PM
What is relevant to this suggestion is that instead of having to go "raw data -> your storage -> driver storage" you get to go "raw data -> driver storage"; i.e. you get to avail of the reason why glMapBuffer was provided for buffer objects; avoiding an extra memory copy.

TexSubImage without a PBO is actually raw data -> driver storage.
Also, when using PBOs you don't have to first create a system memory buffer that you then copy to your PBO; you can write directly into the PBO in the first place. I don't know why people don't do this, however.


By the way, "copy the texture data there (unless you asked DISCARD)" is untrue; this copy is also not needed if a texture were to be mapped with write-only access (which it is expected would be the normal case). In the worst case, all that the implementation needs is to allocate some scratch memory and give you a pointer to that; the implementation looks after everything else.
That's not true. If I map a buffer range for WRITE_ONLY, but not DISCARD/INVALIDATE, then the user might write only a single byte to the range or, even worse, some disjoint sections inside the range; thus when transferring back with your approach you would copy back junk data from the places that the user didn't write. Thus WRITE_ONLY does require a readback, as the driver cannot know what part of the mapped range will actually be written to and what part is left untouched. That's why DISCARD/INVALIDATE was invented. Otherwise there wouldn't be any point in having them in the first place.

elFarto
10-16-2012, 02:46 AM
Well, would you look at that, a new extension has just shown up in the registry:

GL_INTEL_map_texture (http://www.opengl.org/registry/specs/INTEL/map_texture.txt)

It allows you to map a texture on the GPU; however, it looks like they skip over the complicated bit of handling tiled textures and force you to a linear layout, which is a shame.

Regards
elFarto

kRogue
10-16-2012, 04:05 PM
Indeed, apparently at least one GPU maker (though it is Intel) saw that, with a unified memory architecture, mapping a texture that is stored linearly is a good thing to have.

Groovounet
10-23-2012, 02:42 AM
Sounds like your dream came true: http://www.opengl.org/registry/specs/INTEL/map_texture.txt

Eosie
10-25-2012, 06:27 AM
Mapping textures is a feature that everybody has had for a long time. That's how Mesa accesses texture storage: Mesa drivers must internally expose a map/unmap interface for textures. All Mesa hardware drivers, from DX7- to DX11.1-level hardware, fully support and implement the interface. It's also the fastest codepath for uploading/streaming textures.

The only problem with OpenGL is that it doesn't expose component ordering, i.e. you don't know if a texture is internally stored as RGBA, BGRA, ABGR, or ARGB, RG or GR, etc. Also you don't really know the bpp either, because if you ask for GL_RGBA16, the implementation is allowed to give you a GL_RGBA8 texture. And if you ask for GL_LUMINANCE8_ALPHA8 or GL_RGB8, you can get GL_RGBA8 as well. The GL map/unmap interface just needs a way to query this info, so it's not a big deal.

I don't need to map/unmap textures in OpenGL, because I don't use OpenGL, I implement it. However if I used OpenGL, it's something I would definitely want.

aqnuep
10-25-2012, 08:12 AM
The only problem with OpenGL is that it doesn't expose component ordering...
Why does everybody seem to ignore the problem of tiling/swizzling? Yes, a software implementation doesn't have to care about it. Neither does an implementation that only allows mapping linear textures. But as mentioned before, sampling linear textures is way slower than sampling tiled textures, so what you save at upload time you lose multiple times over, each time you actually use the texture.

kRogue
10-25-2012, 12:26 PM
If you are streaming a texture, the cost of the format change to make the texture swizzled/tiled is going to be quite big next to the performance loss of the texture not being tiled/swizzled. However, if a texture is static (or its contents are generated by the GPU and the GPU has the hardware bits to tile/swizzle it during render) then one wants it tiled/swizzled.

So... really back to exactly that which was stated in the beginning: provide an additional set of internalFormat enums that say "I want the texture linearly stored so I can map and stream it".

aqnuep
10-25-2012, 02:29 PM
How often do you use a texture on only a single surface, even when you update the texture every frame? Almost never, except for the obvious case of video streaming. In that case, I agree it might make sense, but even then, it is questionable whether it would improve overall performance (I believe file loading and decoding will still be more expensive than doing an upload from a PBO).

But for all other cases, it is unlikely to even be as fast as PBOs + tiled textures, and as I understood it, most comments were talking about the general case, e.g. when you do typical texture streaming as required by a "loading-free" renderer that displays huge worlds.

Eosie
10-26-2012, 01:01 PM
Why does everybody seem to ignore the problem of tiling/swizzling? Yes, a software implementation doesn't have to care about it. Neither does an implementation that only allows mapping linear textures. But as mentioned before, sampling linear textures is way slower than sampling tiled textures, so what you save at upload time you lose multiple times over, each time you actually use the texture.

Tiling is not an issue. There are ways to make a tiled texture appear as linear to the CPU. After all, textures have to be mapped in glTexImage2D anyway (except for some ancient Intel GPUs, which have more options), so exposing the map/unmap interface for textures doesn't really add anything new. Swizzling as in ARB_texture_swizzle isn't an issue either, because it has nothing to do with how the texture is stored in memory. What you probably thought is that the internal format might have components in a different order, or it can be a completely different format than the one the user requested. Such information can be exposed by adding new glGet queries.

Eosie
10-26-2012, 01:26 PM
If you are streaming a texture, the cost of the format change to make the texture swizzled/tiled is going to be quite big next to the performance loss of the texture not being tiled/swizzled. However, if a texture is static (or its contents are generated by the GPU and the GPU has the hardware bits to tile/swizzle it during render) then one wants it tiled/swizzled.
I don't consider tiling and swizzling an issue at all. There would be no swizzling on the driver side anyway. The user would have to store the image in the native component ordering and in the actual format being used by the hardware.


So... really back to exactly that which was stated in the beginning: provide an additional set of internalFormat enums that say "I want the texture linearly stored so I can map and stream it".
What we need is usage flags for textures and one of them would be "I wanna upload, draw once, upload, draw once...". Drivers would decide how to implement that (some would use a linear texture, others may take a different approach).

mhagain
10-27-2012, 11:00 AM
Why does everybody seem to ignore the problem of tiling/swizzling? Yes, a software implementation doesn't have to care about it. Neither does an implementation that only allows mapping linear textures. But as mentioned before, sampling linear textures is way slower than sampling tiled textures, so what you save at upload time you lose multiple times over, each time you actually use the texture.

Because the problem is irrelevant.
It's already been solved.
By glTexImage and glTexSubImage.
In the early 1990s.

One may well ask - why is there a fixation on raising this as an objection? Any vendor-specific internal representation is no business of OpenGL's; OpenGL does not and should not specify anything in that regard. But yet it was jumped on in the very second post.

Think about this and it becomes really easy.

Map a texture for reading and what happens? The pipeline needs to stall, flush, and the driver can pull back the texture data and give you a pointer. What happens during that "driver pull back" stage is no business of the OpenGL specification and irrelevant to this suggestion. The driver can convert it from a tiled/swizzled format to linear or it can just suck it back from a linear internal representation if that is how the driver decided to store the texture. It can even give a direct pointer if that is what the driver decides is appropriate. It does not matter. It's completely irrelevant. It's internal driver behaviour.

Map a texture for writing and what happens? The driver just hands you a pointer. It does not matter what internal representation the texture used, you're not going near that, you're not going to read from the internal representation, this is a map-for-writing, you're just writing to a pointer and you can assume that for the purposes of your program it's linear. The pointer may be to the actual texture memory, it may be to a scratch memory region, it does not matter; that's for the driver to decide. Unmap and what happens? The driver takes that data you wrote to that pointer and - if it gave you a scratch memory pointer - writes it back. Using the exact same code path that has been used by glTexImage and glTexSubImage since time immemorial.

That last point is key. The driver gets to decide when to do the write back. It can decide "OK, the texture is not currently being used for drawing, I can safely write back now without needing to incur a pipeline stall". Or it can decide "not OK, the texture is currently being used for drawing, I'm going to keep this memory hanging around until it's no longer used and write back then". Or it can even decide "I gave the programmer a pointer to an internal linear representation so I don't even need to do a write back". But that's internal driver behaviour.

So that's why the "problem" is being ignored - because it's about as relevant a problem as fears of asphyxiation on fast-moving trains.

Alfonse Reinheart
10-27-2012, 11:27 AM
Using the exact same code path that has been used by glTexImage and glTexSubImage since time immemorial.

Um, you are aware that ignoring the problem of swizzling/etc makes mapping, in virtually all cases (since virtually all textures are swizzled), no better than just using a Pixel Buffer Object with the implementation-preferred pixel transfer parameters (which, thanks to internalformat_query2, we can now ask for), right?
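
For reference, that query side looks roughly like this (ARB_internalformat_query2 pnames; the PBO setup is assumed):

GLint fmt = 0, type = 0;
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_TEXTURE_IMAGE_FORMAT, 1, &fmt);
glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_TEXTURE_IMAGE_TYPE, 1, &type);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                (GLenum)fmt, (GLenum)type, (void *)0);   /* sourced from a bound unpack PBO */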

You're basically saying that you want a feature that might give you performance, but you can't rely on it in any real-world circumstances. Plus, without the ability to explicitly ask for unswizzled/etc textures, you can't do anything to improve your chances of actually mapping the texture (rather than just a lame PBO).

Yes, there are times when mapping a buffer object means that you don't actually get GPU memory. But you can do things to improve your chances, like double-buffering or using GL_INVALIDATE_RANGE_BIT or GL_UNSYNCHRONIZED_BIT or whatever. None of these make guarantees, but they do help; not using these techniques leads to degraded performance.
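
For instance, a minimal sketch of those techniques on a streaming unpack PBO (real API; two PBOs assumed for double-buffering):

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[frame & 1]);    /* alternate between two PBOs */
void *p = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, size,
                           GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
/* or GL_MAP_UNSYNCHRONIZED_BIT with your own fencing */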

What you're suggesting would, in virtually all reasonable scenarios, never give you an actual mapped pointer. And there is nothing you can do to affect that in any way whatsoever.

In short, if "mapping a texture" can't give you a reasonable shot at getting a pointer to honest-to-God GPU memory, what's the point? How is it any better than using a PBO?

It should be noted that the only actual OpenGL extension to provide this functionality (http://www.opengl.org/registry/specs/INTEL/map_texture.txt) does in fact have a parameter for asking for linear textures, which enforces a specific order. And it even forbids mapping at all if you don't use it.
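
From a quick read of that spec, usage looks roughly like this (names as they appear in the registry document; check the spec text for the exact rules, e.g. when the layout parameter must be set):

glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MEMORY_LAYOUT_INTEL,
                GL_LAYOUT_LINEAR_INTEL);                 /* ask for a mappable, linear layout */
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);

GLint stride = 0;
GLenum layout = 0;
void *p = glMapTexture2DINTEL(tex, 0, GL_MAP_WRITE_BIT, &stride, &layout);
/* write rows of width * 4 bytes, stepping 'stride' bytes per row */
glUnmapTexture2DINTEL(tex, 0);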

Or to put it another way: actual IHVs, people whose job it is to make hardware go fast (theoretically at least; they are Intel ;) ), considered the problem and decided that the swizzle issue was important to making mapping useful.

aqnuep
10-29-2012, 03:04 PM
Tiling is not an issue. There are ways to make a tiled texture appear as linear to the CPU.
Tell me about it, because I'm not aware of any. Tiling/swizzling means hardware-implementation-dependent reordering of texels for better texture cache coherency.


After all, textures have to be mapped in glTexImage2D anyway (except for some ancient Intel GPUs, which have more options), so exposing the map/unmap interface for textures doesn't really add anything new.
No, they don't. They don't even have to be visible to the CPU. Memcpy-ing texture data to CPU visible video memory is not a common practice for a good reason.


Swizzling as in ARB_texture_swizzle isn't an issue either, because it has nothing to do with how the texture is stored in memory.
Of course not, but that's not what I was talking about. Component swizzle has nothing to do with the tiling/texel swizzle that was discussed in this topic.

Also, as Alfonse stated, if mapping a tiled texture just gives you some arbitrary chunk of memory that the driver will eventually upload to tiled video memory, it is not going to help you at all compared to PBOs. In fact, I would go as far as to state that the fact that the same trick is allowed by the spec in the case of buffer mapping is already something that just hurts.