glGetTexSubImage



JoshKlint
08-08-2013, 12:38 PM
I have a need for a glGetTexSubImage command. I won't explain how it should work, because it's obvious.

We perform terrain editing on the GPU using shaders:
http://www.youtube.com/watch?v=t1OOxpO-bZA

Since the heightmap modification occurs on the GPU, we have to retrieve that data back into system memory for physics and raycasting. Modifying a small section of terrain requires retrieving the entire heightmap, when we really only need a subsection of it. This creates a noticeable and unnecessary delay.
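
For reference, the full readback we do today looks roughly like this (a minimal sketch; it assumes a single-channel GL_R32F heightmap texture, and the names are placeholders):

    /* Sketch: reading back the ENTIRE level-0 heightmap, even though only a
       small region was edited on the GPU. 'heightmapTex' and 'cpuHeights'
       are hypothetical; cpuHeights must hold width*height floats. */
    void read_whole_heightmap(GLuint heightmapTex, float *cpuHeights)
    {
        glBindTexture(GL_TEXTURE_2D, heightmapTex);
        /* No sub-rectangle variant exists, so the whole mipmap level comes back. */
        glGetTexImage(GL_TEXTURE_2D, 0, GL_RED, GL_FLOAT, cpuHeights);
    }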

malexander
08-08-2013, 01:02 PM
Did you try using an FBO to read the texture data? ie, Attach the texture to an FBO, bind it as the read-FBO, then call glReadPixels() on the area you want to retrieve. It's not a single API call, but it can be bundled up into a convenient function.
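
Roughly like this (a sketch only; error checking and FBO completeness checks omitted, and a GL_R32F texture is assumed):

    /* Sketch of the FBO + glReadPixels workaround: read a sub-rectangle of a
       2D texture into client memory. */
    void read_tex_sub_image(GLuint tex, int x, int y, int w, int h, float *dst)
    {
        GLuint fbo;
        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, tex, 0);
        glReadBuffer(GL_COLOR_ATTACHMENT0);
        /* Only the rectangle of interest is transferred. */
        glReadPixels(x, y, w, h, GL_RED, GL_FLOAT, dst);
        glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
        glDeleteFramebuffers(1, &fbo);
    }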

Alfonse Reinheart
08-08-2013, 06:25 PM
Did you try using an FBO to read the texture data? ie, Attach the texture to an FBO, bind it as the read-FBO, then call glReadPixels() on the area you want to retrieve.

The thing is, if they're doing shader work to compute this texture, then either:

A: It's already bound to an FBO, since they're rendering to it to compute its data.

B: They're using Image Load/Store. In which case, they could be writing to a buffer texture, thus giving them much better access to the data (ie: being able to map the buffer for reading).

This is probably why the ARB hasn't bothered to add such a function.
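
For what it's worth, option B might look roughly like this on the host side (a sketch only, under the assumption of a GL_R32F buffer texture written by the shader via imageStore; all names and sizes are invented):

    /* Sketch: back a buffer texture with a buffer object, let the shader write
       heights into it via image load/store, then map the buffer for reading. */
    size_t numTexels = 1024 * 1024;   /* hypothetical heightmap size */
    GLuint buf, bufTex;

    glGenBuffers(1, &buf);
    glBindBuffer(GL_TEXTURE_BUFFER, buf);
    glBufferData(GL_TEXTURE_BUFFER, numTexels * sizeof(float), NULL, GL_DYNAMIC_READ);

    glGenTextures(1, &bufTex);
    glBindTexture(GL_TEXTURE_BUFFER, bufTex);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_R32F, buf);

    /* Bind to image unit 0 for a shader declaring "layout(r32f) writeonly imageBuffer". */
    glBindImageTexture(0, bufTex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_R32F);

    /* ... run the shader that edits the heights ... */

    glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
    glBindBuffer(GL_TEXTURE_BUFFER, buf);
    float *heights = (float *)glMapBufferRange(GL_TEXTURE_BUFFER, 0,
                                               numTexels * sizeof(float),
                                               GL_MAP_READ_BIT);
    /* Mapping still waits for the GPU to finish, but only the buffer is read. */
    /* ... use the region of interest ... */
    glUnmapBuffer(GL_TEXTURE_BUFFER);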

hlewin
08-08-2013, 07:56 PM
Considering the point in time at which it would have made sense to add GetTexSubImage, I strongly doubt that.
I'd have appreciated seeing the "new extension" written against the 1.0 spec.

mhagain
08-08-2013, 08:05 PM
It would probably be faster to keep a copy of the heightmap data on the CPU and modify it there instead. Readbacks suck.
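
Something along these lines - mirror the edit on the CPU (a sketch; the brush falloff here is made up, and the real shader's brush would have to be replicated exactly):

    /* Sketch: keep a CPU-side shadow of the heightmap and apply the same
       brush there, so no readback is needed for physics/raycasting. */
    void apply_brush_cpu(float *heights, int mapSize,
                         int cx, int cy, int radius, float strength)
    {
        for (int y = cy - radius; y <= cy + radius; ++y) {
            for (int x = cx - radius; x <= cx + radius; ++x) {
                if (x < 0 || y < 0 || x >= mapSize || y >= mapSize)
                    continue;
                float dx = (float)(x - cx), dy = (float)(y - cy);
                float falloff = 1.0f - (dx * dx + dy * dy) / (float)(radius * radius);
                if (falloff > 0.0f)
                    heights[y * mapSize + x] += strength * falloff;  /* same edit as the GPU */
            }
        }
    }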

hlewin
08-08-2013, 08:14 PM
Maybe. It doesn't have to be. But that's not an argument for not including it in the first place.

ScottManDeath
08-08-2013, 09:45 PM
Could you use http://www.opengl.org/registry/specs/ARB/copy_image.txt to copy the region of interest into a separate texture and then GetTexImage that?
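
Something like this (a sketch; it assumes GL 4.3 / ARB_copy_image, a GL_R32F heightmap, and a pre-allocated GL_R32F scratch texture of exactly w x h):

    /* Sketch: copy the region of interest into a small scratch texture with
       glCopyImageSubData, then read the whole (small) scratch texture back. */
    void read_region_via_copy(GLuint heightmapTex, GLuint scratchTex,
                              int x, int y, int w, int h, float *dst)
    {
        glCopyImageSubData(heightmapTex, GL_TEXTURE_2D, 0, x, y, 0,
                           scratchTex,   GL_TEXTURE_2D, 0, 0, 0, 0,
                           w, h, 1);
        glBindTexture(GL_TEXTURE_2D, scratchTex);
        glGetTexImage(GL_TEXTURE_2D, 0, GL_RED, GL_FLOAT, dst);
    }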

Alfonse Reinheart
08-08-2013, 10:07 PM
Why would you do that? It'd probably be faster to just get the whole image and pick out the part you want.

mbentrup
08-09-2013, 02:50 AM
Don't forget that there's an extension for sparse textures, which may be impractical to transfer as a whole. The bind-to-FBO/ReadPixels workaround may also fail, because FBOs usually aren't capable of handling all texture formats, e.g. compressed textures.

Alfonse Reinheart
08-09-2013, 03:25 AM
Don't forget that there's an extension for sparse textures, which may be impractical to transfer as a whole.

Sure. But... if you wanted that data, you should have kept some of it, instead of deleting the memory after the upload. Asking for stuff you already sent OpenGL is a waste of time. Generally speaking, sparse textures are not used as render targets or the output of processes that compute information. Sure, they could be, but I can't really imagine why one would want to.


The bind-to-FBO/ReadPixels workaround may also fail, because FBOs usually aren't capable of handling all texture formats, e.g. compressed textures.

If it's a compressed texture, then that data could only have gotten there in one of two ways:

1: You uploaded it. Again, if you wanted it, you should have kept it instead of deleting the memory.

2: You wrote to an unsigned integer texture and use texture copying (https://www.opengl.org/wiki/Texture_Storage#Texture_copy) to copy the bits into a compressed texture.

The likelihood of doing the latter case and needing to read from it on the CPU? It seems rather unlikely.

Plus, getting subimages of compressed formats is... difficult. Even more so since glCompressedTexSubImage* themselves are only ever guaranteed to work if you upload the entire mipmap level. Otherwise, the implementation can throw a GL_INVALID_OPERATION error for arbitrary, unspecified reasons related to the format.

hlewin
08-09-2013, 03:55 AM
Sure it's a waste of time - but not a waste of memory, the way keeping every single texture in memory two or three times over would be. It's a bit about what one wants to have, and judging the means according to the specific needs. And this is something where the OpenGL API is a bit one-eyed, in that the pre-assumption is that processing speed is the only criterion that matters - which reflects the state of the hardware as it was until maybe a few years ago, namely that there are tons of memory available compared to the computing power of the graphics hardware and the transfer speed of the bus systems. In a not-too-time-critical environment I wouldn't care about wasting some time, compared to a super-efficient method, if it eased the implementation a lot. But requesting the whole tex image when needing only a few pixels sounds like bad taste to me. Assuming that some round-tripping of textures between system and GPU memory will most likely take place anyway makes me hopeful that the driver isn't all too likely to drop all texture data from system memory, so it would be accessible without too much hassle - if it weren't for the API, which doesn't allow getting a SubImage...

Alfonse Reinheart
08-09-2013, 05:16 AM
namely that there are tons of memory available compared to the computing power of the graphics hardware and the transfer speed of the bus systems.

I'm not sure why you brought that up, since it's an argument for you to keep a copy of the image data. You claim it's a "waste of memory" for you to keep a copy, but then claim there's "tons of memory available". It can't be both, so which is it?


Assuming that some round-tripping of textures between system and GPU memory will most likely take place anyway

Why should we assume that? In the best-case performance scenario, that's not supposed to happen. Textures live on the GPU and should only be evicted when there's no more room. And since modern OS's more or less guarantee a process's GPU memory, textures aren't ephemeral like they used to be. So drivers don't need to keep backups of them around.

I see no reason to assume that every texture has a backup copy in system memory on a modern OS. I'm not saying it's not true; I'm saying that we shouldn't assume that it is.

hlewin
08-09-2013, 05:54 AM
That depends heavily on whether I'm sitting at my desktop, my old laptop or my phone. The aforementioned assumption isn't one. My laptop uses system memory for GPU stuff anyway, so there won't be any difference there. About my desktop I'm pretty sure too, if I plug in a board that has 64 MB of memory and I use about 90 MB of textures. It would be pretty stupid to drop the textures from system memory if round-tripping is required. If it isn't, there's still something one can have faith in, namely the driver developers trying to find a clever solution. If a texture is known to be read back by API calls, it isn't all too complicated to flag it for some time and keep it. The spec - of course - will never include statements about such an assumption: it defines the semantics of the API, not the details of its implementation. The details of implementation only come in when it is necessary to rule out usage that isn't directly deducible as incorrect from the semantics themselves, but fails due to certain hardware restrictions etc. To avoid assumptions about such things one would have to read notes from the individual driver developers, not the general OpenGL sites.

kRogue
08-09-2013, 03:17 PM
This likely does not apply to the poster's original wants, but for Intel GPUs (no laughing here please), there is http://www.opengl.org/registry/specs/INTEL/map_texture.txt which allows one to map a texture (there are various limit-issues on what can be mapped though).

I think a glGetTexSubImage is not necessarily a bad idea; though one can get equivalent functionality via an FBO and glReadPixels, it seems silly to do it that way. As for it always being slow, one may want to dump it into a buffer object and go further from there... As a side note, something like that was done in the GL2 days with pixel buffer objects to essentially simulate transform feedback.

hlewin
08-09-2013, 06:11 PM
It would be nice to be able to bind the pixel data of textures directly to some buffer, separating it from its metadata.
That is - a glTexImage that does not copy the data but simply establishes a binding. This way, access to the texture data, as well as storage-type hinting etc., would follow the generic buffer API.

Alfonse Reinheart
08-09-2013, 07:44 PM
A lot of things would be nice to be able to do. That doesn't mean we can or should be able to do them. A good abstraction needs to actually be abstract, so as to allow implementations across a variety of hardware.

Just look at Intel's map extension. They basically wave their hands about what formats are "natively supported by the GPU hardware". Thus ensuring that you have absolutely no idea whether a particular sized format will be mappable. That's not an abstraction.

hlewin
08-09-2013, 08:26 PM
This is true in that not all texture formats are necessarily suitable for such a mapping. The spec as it is leaves a certain degree of freedom in internal representation for most formats, which would mean that only some formats would be available for such a mapping unless vendor-specific extensions were used.
I do not really understand the principal point you state at the beginning of your post. I guess I should be able to - as an example - render to a texture with a well-defined format and use the pixels as vertex attributes without the need to copy them. I do not know anything too specific about GPUs, of course. But seeing APIs like OpenCL being widely available, it simply cannot be that those things would not be possible to do. Of course I don't know what you should or shouldn't do - I'm not familiar with the specific needs of your business. As a developer using an API, one can wonder about the obvious absence of basic functionality, or one does not - one-size-fits-all is an illusion, in that a piece of software cannot dictate the users' needs. What it can dictate, of course, are the usage patterns resulting from its design. Again the question is whether the API should be designed in a way that ensures optimal performance at the cost of usability.

Alfonse Reinheart
08-09-2013, 08:59 PM
I guess I should be able to - as an example - render to a texture with a well-defined format and use the pixels as vertex attributes without the need to copy them.

I'll ignore the question of why you would even want to do that these days with transform feedback, image load/store, and SSBOs available. So instead I'll focus on how that would work.

OpenGL has no concept of a "well-defined format". It has formats of particular pixel sizes. But that says nothing about the important questions of swizzling, internal storage row alignment, and so forth. So you want to now expand image formats into being able to answer and control these questions? How would that work? And what about hardware that can't implement certain combinations of stuff?


But seeing APIs like OpenCL being widely available it simply cannot be that those things would not be possible to do.

I admit that I'm not exactly up on OpenCL, but I'm pretty sure that OpenCL images can't do what you're wanting either. The OpenCL concept of buffers is different from image buffers (http://stackoverflow.com/a/9908568/734069), just like the OpenGL concept of buffer objects is different from textures. You can't shove an image buffer in OpenCL when a buffer pointer is expected, and vice versa.

So I'm not seeing your point.


As a developer using an API one can wonder about the obvious absence of basic functionality or one does not

The ability to pretend that an image is a buffer object is not "basic functionality" by any reasonable definition of that term.


Again the question is whether the API should be designed in a way that ensures optimal performance at the cost of usability.

Usability is in the eye of the beholder. And not being able to use textures for sources of vertex data is hardly limiting in terms of usability. And yes, a well-designed performance API should be designed in a way that prevents you from doing things that lower performance needlessly.

Also, this conversation is very confusing. We've gone from a fairly simple, not-entirely-unreasonable request to be able to read parts of images back to nonsense like binding images as buffer objects and handing OpenGL random pointers that it's expected to use as buffers and images. These things have nothing to do with one another.

hlewin
08-09-2013, 09:37 PM
I'll ignore the question of why you would even want to do that these days with transform feedback, image load/store, and SSBOs available. So instead I'll focus on how that would work.
I would call this a wise decision.


It has formats of particular pixel sizes. But that says nothing about the important questions of swizzling, internal storage row alignment, and so forth.
You must have selectively forgotten about the sized internal formats. Granted - a cheap shot - these do not include restrictions on row alignment, which seems to make such things impossible for you to even imagine.


I admit that I'm not exactly up on OpenCL, but I'm pretty sure that OpenCL images can't do what you're wanting either.
What? Being a sequence of numbers that define colors? You must be kidding.


OpenGL concept of buffer objects is different from textures
If textures aren't a series of numbers in one's view - of course.


The ability to pretend that an image is a buffer object is not "basic functionality" by any reasonable definition of that term.
Granted - that was aimed at GetTexSubImage, which is all too obviously missing when one sees its counterpart.


Usability is in the eye of the beholder. And not being able to use textures for sources of vertex data is hardly limiting in terms of usability.
Maybe that's the case for you - which makes your statement a little too contradictory for my taste.


And yes, a well-designed performance API should be designed in a way that prevents you from doing things that lower performance needlessly.
In your notion of 'needlessly' you seem to cancel out the time it takes to write a code path around those definition holes. The notion of a 'performance API' makes your standpoint even clearer. Maybe this is right for you - which seems a little strange, as I had the feeling you were reading the GL spec at breakfast, so that a lack of knowledge about certain things could hardly be a trap for you performance-wise.

And yes - one person's nonsense is another's labor saving. It comes down to what you make of it.

Alfonse Reinheart
08-09-2013, 10:35 PM
You must have selectively forgotten about the sized internal formats.

I said "formats of particular pixel sizes". Sized internal formats only describe the sizes of pixels, not the arrangement of pixels in memory. And without being able to control that, you can't use them as buffer objects, since the arrangement of the data in the texture is not well specified by the API.


What? Being a sequence of numbers that define colors? You must be kidding.

Just because you believe that a texture is "a sequence of numbers that define colors" doesn't mean that an API agrees, OpenGL or OpenCL. You can think of them as that all you want. That will not change the objective reality of the situation (FYI: they are not), nor will it change the objective definitions of OpenGL and OpenCL's APIs.


Granted - that was aimed at GetTexSubImage, which is all too obviously missing when one sees its counterpart.

... huh? It's hard to have a discussion when you keep jumping back and forth between different points. Which idea for OpenGL functionality are we talking about: your desire to use textures as buffer objects, your desire to just hand them a pointer and expect textures to work with that as their storage, or your desire to read an arbitrary region from a texture? Because you've mentioned all of these in this thread.

hlewin
08-09-2013, 10:58 PM
Sized internal formats only describe the sizes of pixels
To be more precise, they describe the existence, size and number format of a pixel's color components. The order of their storage is given also. You do know that RGB means Red, Green, Blue, in that order, don't you? The definition holes that might still be there, such as the byte order used internally, could be closed by one or two declaratory sentences in a spec file.


It's hard to have a discussion when you keep jumping back and forth between different points.
That's the dialectic course of a discussion. In my opinion, the term "basic functionality" could have pointed in that direction.

Noticing that I'm falling into the bad habit of quoting you and answering one point after another, I'll simply refer you back to the beginning of the discussion and remind you of its dialectic nature. Don't you notice yourself that questions like the one brought up at the end of your last post are simply ridiculous? You take different things mentioned as missing and form an either-or question out of them, as if one logically contradicted the other. As far as I am concerned the point has been made clear. Not that this is likely to matter at all...

Alfonse Reinheart
08-10-2013, 12:11 AM
The order of their storage is given also.

No it isn't. The implementation is free to store the binary data in whatever component ordering it wants. If it wants to store the bytes with green first, that's a legitimate implementation. When you fetch it in the shader, the red will be the first component automatically (barring any texture swizzling, of course).


The definition holes that might still be there, such as the byte order used internally, could be closed by one or two declaratory sentences in a spec file.

And by putting those "one or two declaratory sentences", you're basically saying that if their hardware works a different way, they cannot implement OpenGL. That's a horrible idea; OpenGL should not enforce something like this when it doesn't have to.

More importantly, my main point is that describing the storage of an individual pixel isn't enough. There's more to texture storage than an individual pixel. Most textures are stored swizzled, where pixels are stored such that locality is maximized. For example, if you have GL_RGBA8, that's 4-bytes per pixel. Let's say that a cache line is 64-bytes in size. So a single cache line fetch will read 16 pixels.

If you stored the data linearly, each cache line would access 16 horizontal pixels. However, as we know, textures are almost never accessed horizontally. A bilinear fetch from a fragment shader needs a 2x2 block of pixels. To get that from a linearly stored texture, you'd need to fetch two cache lines. However, if every cache line stored a 4x4 block of pixels, rather than a 16x1 linear array, then you would only need one cache line for a bilinear fetch. Oh sure, some will need two or four, but if you're covering the whole face of a primitive, the number of times you'll need more than one is greatly diminished. Also, you'll sometimes need 4 cache line fetches in the 16x1 case too. Indeed, since you're typically fetching a whole pixel-quad of texture samples (since fragment shaders run in 2x2 groups), you really need to read a 4x4 block of pixels.

This is called "swizzling" (http://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/) of the texture's storage. Rather than storing texel data linearly, it's stored in these groups. Some swizzling is scan-like within the 4x4 block. Other swizzling will have sub-swizzles (each 2x2 block in the 4x4 is itself swizzled, and the 4 2x2 blocks in the 4x4 are swizzled). Different hardware has different standards, but virtually every piece of graphics hardware does swizzling.
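
To make the difference concrete, here is a toy address calculation for a linear layout versus a simple 4x4-tiled layout (illustrative only; real hardware swizzle patterns vary per vendor and are usually more elaborate):

    /* Toy example: byte offset of texel (x, y) in a GL_RGBA8 image of width W. */
    #define BPP 4  /* bytes per GL_RGBA8 texel */

    size_t linear_offset(int x, int y, int W)
    {
        return ((size_t)y * W + x) * BPP;
    }

    size_t tiled4x4_offset(int x, int y, int W)
    {
        int tilesPerRow = W / 4;             /* assumes W is a multiple of 4 */
        int tileX = x / 4, tileY = y / 4;    /* which 4x4 tile */
        int inX   = x % 4, inY   = y % 4;    /* position inside the tile */
        size_t tileIndex  = (size_t)tileY * tilesPerRow + tileX;
        size_t withinTile = (size_t)inY * 4 + inX;
        return (tileIndex * 16 + withinTile) * BPP;  /* 16 texels per tile = one 64-byte line */
    }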

A proper abstraction of textures, which OpenGL provides, allows different hardware variances on these issues. Different hardware can swizzle, or not, as it sees fit. And because the internal layout of pixels in the hardware is not exposed by the API, OpenGL is able to support any hardware via a simple black-box model. All the driver needs to do is swizzle the data the user provides from glTex(Sub)Image, and unswizzle it via glGetTexSubImage/glReadPixels.

That's why the Intel map texture extension requires an explicit flag, set before the texture's storage is created, to say that the texture won't be stored swizzled. And you can't map the texture unless you force it to be linear. So if you want to use textures as buffer objects, you too would need some way to tell the implementation not to swizzle the image.

If you were unaware of all this, perhaps you should spend some time learning how things currently work before suggesting how they ought to work.


Don't you notice yourself that questions like the one brought up at the end of your last post are simply ridiculous?

If I had reason to think the question was ridiculous, I wouldn't have asked it. You brought up each of those points, completely unbidden by anyone else mind you. So it's not clear what exactly you're talking about at any particular point.

Or more to the point, you went off-topic when you brought up "It would be nice to be able to bind the pixel-data of textures directly to some buffer". I was just following your digression.

mhagain
08-10-2013, 08:57 AM
All of this is still ignoring the synchronization and pipeline draining needed to do such a readback. Here's a test - every place in code where one would like to have a hypothetical glGetTexSubImage, instead put a glFinish call. Because that's what it will be the equivalent of. Is it still acceptable?

kRogue
08-10-2013, 02:16 PM
...unless the read is done into a buffer object (GL_PIXEL_PACK_BUFFER), in which case the flush is needed only when the buffer object is read.
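
For example (a sketch of the pack-buffer path; x, y, w, h are the region of interest and the texture is assumed to be attached to the currently bound read framebuffer):

    /* Sketch: asynchronous readback into a pixel pack buffer. glReadPixels
       returns without waiting; the stall only happens when the buffer is
       actually mapped/read. */
    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, w * h * sizeof(float), NULL, GL_STREAM_READ);

    glReadPixels(x, y, w, h, GL_RED, GL_FLOAT, (void *)0);  /* offset 0 into the PBO */

    /* ... do other work; later: ... */
    float *data = (float *)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                            w * h * sizeof(float), GL_MAP_READ_BIT);
    /* ... use data ... */
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);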

hlewin
08-10-2013, 02:24 PM
And by putting those "one or two declaratory sentences", you're basically saying that if their hardware works a different way, they cannot implement OpenGL. ...
Right. If their hardware is unable to read a few numbers out of memory in a given order, they cannot implement OpenGL (4.)5. What's the problem? The further discourse about how cached memory access works points the way. How difficult would it be to write texture-accessing methods for modern GPUs that did not try to exploit cache lines? (I didn't even bother to read the link you provided.) You pretend you're talking about hardware issues all the time? Aren't those things programmable? The sentence

Different hardware can swizzle, or not, as it sees fit.
is an example. If the hardware cannot randomly access its own memory then there is a real problem. Otherwise it's just one picture of how things ought to be done trying to exclude the other. As this thread and forum category are about the proposed target state, not the as-is state, I wouldn't worry about optimizations of current implementations of the API's as-is state not working exactly the same way. And then again, there is no problem in keeping those optimizations for the cases where they are applicable. I do not see a contradiction here. And I can, without a problem, write all this without knowing exactly how someone decided to optimize certain use cases - that is, the ones defined and/or implied by the API as-is. And that simply because I know: first there are the definitions, and then the implementations. Not the other way around.
The note that transparently buffered textures would not be as optimizable as opaque textures is something that belongs in the programming guide, not the specification. The same goes for a warning that GetTexSubImage might lead to a read-back from GPU memory and hence consume some time. I don't know the exact DMA timings these days, but I guess it cannot be more than a few hundred clock cycles before the data transfer is O(n), which means something on the microsecond scale of delay. Such a delay can't possibly be the reason to rule out functionality, merely out of concern that it would lead people to use such functionality and hence write applications that caused such delays.

About the off-topic-ness: you are aware of the course the discussion took, aren't you? But if it eases your mind I could open another thread specific to transparently buffered textures and wholly dedicate this one to GetTexSubImage, although I would not know what there is to discuss about it. We're not driver implementors who have to care about how this could be done as fast as possible. We're users of the API wondering about missing functionality...

mhagain
08-10-2013, 07:07 PM
...unless the read is done into a buffer object (GL_PIXEL_PACK_BUFFER), in which case the flush is needed only when the buffer object is read.

...which depends on when you read the buffer object. If you need it in the same frame - you're screwed - now you have to wait for all pending GL calls to complete, as well as the transfer from texture to buffer object. If you can wait until a few frames later it's OK, but I get the impression from the OP that he needs it in the same frame (otherwise he's going to be performing physics/etc on out-of-date data) so reading to a buffer object seems a strawman in this particular case.

hlewin
08-11-2013, 12:14 PM
Could you elaborate in concrete terms on what "you're screwed" means? When the results of a previous operation are needed, it is clear, on a trivial reading, that the operation has to be finished before going on. I guess "you're screwed" only means "it is impossible to exploit (in-)dependencies via multi-threading" in that case.

kRogue
08-11-2013, 03:19 PM
...which depends on when you read the buffer object. If you need it in the same frame - you're screwed - now you have to wait for all pending GL calls to complete, as well as the transfer from texture to buffer object.

I'd imagine reading the buffer object after swap buffers would be good enough... however, even that much waiting seems quite extreme. It is not like an immediate-based renderer waits for swap buffers before doing anything. The best thing to do, I would guess, would be to use a sync object and query the sync to see when the operation is done. Once it is, then do the buffer object read... if one needs the values at some point to continue, then one bites the bullet and causes the stall, but if there are other rendering bits going on and the values are not needed by the CPU immediately, then I strongly suspect that the sync jazz will prevent a lot of stalls even if the values are used/needed in the same frame.

All depends though on how much GL stuff is between the height map render and when it is needed by the CPU.
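
Something like this for the sync part (a sketch; 'pbo', 'w' and 'h' are the pack buffer and region from the earlier readback, and it assumes the commands have been flushed, e.g. by SwapBuffers):

    /* Sketch: fence right after the readback command, poll it, and only map
       the PBO once the GPU has actually finished, avoiding a full stall. */
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

    /* Each frame (or whenever convenient), poll without blocking: */
    GLenum status = glClientWaitSync(fence, 0, 0);  /* timeout of 0 = just poll */
    if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
        glDeleteSync(fence);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
        float *data = (float *)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                                w * h * sizeof(float),
                                                GL_MAP_READ_BIT);
        /* ... hand the region to the physics/raycasting code ... */
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }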

mhagain
08-11-2013, 08:57 PM
Could you elaborate in concrete terms on what "you're screwed" means? When the results of a previous operation are needed, it is clear, on a trivial reading, that the operation has to be finished before going on. I guess "you're screwed" only means "it is impossible to exploit (in-)dependencies via multi-threading" in that case.

Multithreading is irrelevant here. You are aware that the GPU and CPU are separate processors, aren't you? And that they run asynchronously? And that there can be a ~3 frame latency between the GL commands you submit and them making it all the way through the pipeline and onto the screen? And that if you do a readback - particularly a readback from something that needs to wait until a late stage in the pipeline - then you're not just waiting for one operation to complete; you're waiting for ~3 frames worth of operations to complete?

That's what "you're screwed" means.

hlewin
08-11-2013, 09:36 PM
What 3 frames are we talking about here, and of what? Looking at the data rates of the bus system does not suggest that those are merely in the upper-MHz range. Assuming that the GPU is able to do 50-60 frames per second, how could that possibly be true? Flushing execution causes a wait. OK so far - this is done every frame. And then? The GPU is idle... Do you mean it takes >50 ms for a command to arrive at and/or get recognized by the idle GPU? Are we rendering over a network?