Suggestion: GetTexSubImage2D() et al.

Suggestion:

void
GetTexSubImage2D(GLenum target, GLint level, GLint x, GLint y, GLsizei width, GLsizei height, GLvoid* texels);

and for consistency, its 1D and 3D counterparts.

Motivation:

I have a series of N textures I’d like to read data back from. The texture data is not available in client memory at the time I need to read it. I only need a relatively small part of each texture, so using GetTexImage() to read entire textures is very inefficient.

Binding each texture to a framebuffer object and using a ReadPixels() call seems unnecessarily complicated (and inefficient), given that a function to read an entire texture directly already exists.
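For illustration only: without a sub-image query, one workaround is to read the whole level with glGetTexImage and crop on the CPU. The sketch below covers just the CPU side, so it runs without a GL context. It assumes a tightly packed RGBA8 image; `crop_rgba8`, `full_readback_bytes` and `sub_image_bytes` are hypothetical helper names, not part of OpenGL.

```c
#include <stddef.h>
#include <string.h>

/* Copy the w x h rectangle at (x, y) out of a tightly packed RGBA8 image
   (4 bytes per texel). In a real program `full` would have been filled by
   glGetTexImage(GL_TEXTURE_2D, level, GL_RGBA, GL_UNSIGNED_BYTE, full).
   Returns 0 on success, -1 if the rectangle does not fit in the image. */
int crop_rgba8(const unsigned char *full, int full_w, int full_h,
               int x, int y, int w, int h, unsigned char *out)
{
    if (x < 0 || y < 0 || w <= 0 || h <= 0 ||
        x > full_w - w || y > full_h - h)
        return -1;

    for (int row = 0; row < h; ++row) {
        const unsigned char *src =
            full + ((size_t)(y + row) * (size_t)full_w + (size_t)x) * 4;
        memcpy(out + (size_t)row * (size_t)w * 4, src, (size_t)w * 4);
    }
    return 0;
}

/* Bytes moved by a full readback of one RGBA8 level. */
size_t full_readback_bytes(int full_w, int full_h)
{
    return (size_t)full_w * (size_t)full_h * 4;
}

/* Bytes actually wanted for a w x h RGBA8 sub-image. */
size_t sub_image_bytes(int w, int h)
{
    return (size_t)w * (size_t)h * 4;
}
```

For a 16x16 region of a 1024x1024 texture, this reads back 4,194,304 bytes to keep 1,024, which is the inefficiency described above.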

For completeness it may be a good idea to add compressed versions such as:

void
GetCompressedTexSubImage2D(GLenum target, GLint level, GLint x, GLint y, GLsizei width, GLsizei height, GLvoid* texels);

I don’t really like that.

As I see it, you have two use cases for such a function:

You have uploaded the texture yourself. Then you don’t need to get it back, you already have it. The GPU memory is not a place to backup your data, keep it in RAM, and you won’t need to read it back.

You have rendered into the texture. Then you already have bound it to an FBO, no harm in just making a ReadPixels (into a PBO, if you’re concerned about parallelism).

I think there is a pattern here :wink:

a) Things that you rendered are read by ReadPixels.
b) Textures are not a general purpose data store. Buffer objects are used for that.

(For the record: IMHO the GetTexImage function should be removed as well.)

You have rendered into the texture. Then you already have bound it to an FBO
What about 3D textures? You would have to bind each layer of such a texture in a loop, since you can’t bind an entire 3D texture to an FBO. Besides, binding a slice of a 3D texture to an FBO is not widely supported. :frowning:

Consider the following scheme (I actually have this in my game):

  1. render and copy to texture (no FBO, just glCopyTexSubImage)
  2. issue some more rendering commands (render something else)
  3. calculate some physics/AI
  4. read the texture

Let’s assume your app works on some old hardware that doesn’t support FBO (TNT2?).
During the time spent by the CPU on #3, the GPU has probably started to execute #2, and the driver can send you that texture without stalling the CPU. As soon as you have your data downloaded, the GPU will continue its work on #2 immediately. So I have no GPU/CPU stalls because I used glGetTexImage instead of glReadPixels. At least this is how it works with NVIDIA drivers. Not sure about ATI. Simple and portable to all OpenGL 1.1 GPUs :slight_smile:

For the record: IMHO the GetTexImage function should be removed as well.
You can achieve parallel execution with PBO, that’s true. But with such an approach I could say that we don’t need immediate mode since we can achieve the same with vertex arrays. So shall we remove glBegin()/glEnd() because there is a better way, or should we leave it because it’s simpler?
I believe that what we love in OpenGL is not just that it makes it possible to achieve anything we want, but that it also makes it easy and keeps your code clean. You want a texture image? You use glGetTexImage and not some FBO+PBO combination. Otherwise our source code will start to look like some driver source code. Of course if you want performance you’ll use FBO+PBO. Your choice.

As for the ‘compressed’ version, it won’t be that easy: the compression algorithm can work on blocks of pixels, so you will not be able to read just any fragment of the texture you want.
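To make the block restriction concrete, here is a minimal sketch that assumes S3TC (DXT1/DXT3/DXT5) compression, where texels are stored in 4x4 blocks of 8 or 16 bytes. The alignment rule mirrors the one the S3TC extension imposes on sub-image uploads; applying it to a hypothetical GetCompressedTexSubImage2D is an assumption, and `s3tc_region_bytes` is a made-up helper name.

```c
#include <stddef.h>

/* Size in bytes of a region of an S3TC-compressed level, or 0 if the
   region cannot be expressed as whole 4x4 blocks.
   block_bytes: 8 for DXT1, 16 for DXT3/DXT5. */
size_t s3tc_region_bytes(int x, int y, int w, int h,
                         int tex_w, int tex_h, size_t block_bytes)
{
    /* The region must lie inside the level. */
    if (x < 0 || y < 0 || w <= 0 || h <= 0 ||
        x > tex_w - w || y > tex_h - h)
        return 0;

    /* The origin must sit on a block boundary. */
    if (x % 4 != 0 || y % 4 != 0)
        return 0;

    /* The extent must be a multiple of 4, unless the region runs
       exactly to the edge of the level (partial edge blocks). */
    if (w % 4 != 0 && x + w != tex_w)
        return 0;
    if (h % 4 != 0 && y + h != tex_h)
        return 0;

    size_t blocks_x = ((size_t)w + 3) / 4;
    size_t blocks_y = ((size_t)h + 3) / 4;
    return blocks_x * blocks_y * block_bytes;
}
```

Under these assumptions an 8x8 region at (0, 0) of a DXT1 texture is four blocks, or 32 bytes, while a 4x4 region at (2, 0) straddles two blocks and cannot be returned without decompressing.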

It would be nice if we had had such functions from the very beginning, but now, since OpenGL 1.1 - 2.1 are already defined without them, an application would require something like OpenGL 2.2 to use these, and they would not add any new feature that cannot be achieved already. But most certainly what we have now should not be removed.

But with such an approach I could say that we don’t need immediate mode since we can achieve the same with vertex arrays. So shall we remove glBegin()/glEnd() because there is a better way, or should we leave it because it’s simpler?
Exactly.

And guess what, it’s really going to happen in GL 3.0. Of course it won’t be really removed because of backwards compatibility, but it will be officially deprecated.

What about 3D textures?
How do you render into a 3D texture without binding each slice separately?

And guess what, it’s really going to happen in GL 3.0. Of course it won’t be really removed because of backwards compatibility, but it will be officially deprecated.
AFAIK it will be layered, so it will be slower, but still available in the API and that’s what I meant. I use immediate mode for all my debug rendering and for user interface (I’ll change UI to VBOs someday but it’s not worth the effort just yet).

How do you render into a 3D texture without binding each slice separately?
You render to a 3D texture in a loop, yes, but if you use glCopyTexSubImage to render to texture (and as I said, binding a slice of a 3D texture is not widely supported) then you have only 3 options:

  1. call glReadPixels immediately after rendering and stall the CPU - bad idea
  2. call glGetTexImage later
  3. use PBO - NVIDIA only

That’s why I agreed that having glGetTexSubImage3D can be convenient.

Isn’t downloading a 3D texture already a slow process? Sorry, I have no idea, since I don’t need to render to 3D textures and download them.
Perhaps downloading each slice with glReadPixels is as fast as a single glGetTexImage call on the 3D texture. How many times do you need to repeat the process? Less than 512? Less than 256?

So how fast is a glGetTexImage for a 3D texture? The texture is probably stored in a special format on the GPU, like the block method of nVidia.

My argumentation is based on simplicity, not performance. If we don’t want simplicity and clean code, then let’s remove glTexImage2D, too. We can bind texture to FBO and use glDrawPixels, right? So we don’t need glTexImage. :stuck_out_tongue:
We also don’t need to be able to pass data to VBO - we could render to vertex buffer using glDrawPixels, right?

So simplicity is the question here - do we want it or not? I believe beginning OpenGL programmers would appreciate it. It’s also useful for demos - you can provide simple source code with your articles/tutorials.

Let’s just wait for the new API. There’s no point in extending the old one now with something that can be done already.

I’m pretty sure there won’t be a GetTexImage in the new API (although I don’t have any confirmed information on that). Also I’m pretty sure glTexImage2D as it exists now won’t be there, either. More likely one call to create the texture and one call to supply the data (perhaps with a glu call that unifies both).

If you want simplicity, you can always make it with a library (like the glu object creation calls in the various sample presentations of the new API).

Thanks for the feedback Overmind and k_szczech.

Overmind’s two use cases are valid for some scenarios, but things are often more complicated…

Overmind:
1)
You have uploaded the texture yourself. Then you don’t need to get it back, you already have it. The GPU memory is not a place to backup your data, keep it in RAM, and you won’t need to read it back.

True, unless there are many textures. Keeping an extra copy of them all in client memory simultaneously will bloat the application unacceptably. In my case, I’m tolerating the read overhead to avoid this.
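As a rough worked example of that bloat, the sketch below totals the client memory needed to shadow every texture. The texture count, the sizes, and the assumption of full mipmap chains are illustrative only, not figures from this thread; `mip_chain_bytes` and `shadow_copy_bytes` are hypothetical helpers.

```c
#include <stddef.h>

/* Bytes for an uncompressed texture of w x h texels plus all of its
   mipmap levels down to 1x1. Returns 0 for non-positive sizes. */
size_t mip_chain_bytes(int w, int h, size_t bytes_per_texel)
{
    if (w < 1 || h < 1)
        return 0;

    size_t total = 0;
    for (;;) {
        total += (size_t)w * (size_t)h * bytes_per_texel;
        if (w == 1 && h == 1)
            break;
        if (w > 1) w /= 2;   /* each level halves, clamped at 1 */
        if (h > 1) h /= 2;
    }
    return total;
}

/* Client memory needed to keep a shadow copy of every texture,
   assuming they all share one size and format. */
size_t shadow_copy_bytes(size_t texture_count, int w, int h,
                         size_t bytes_per_texel)
{
    return texture_count * mip_chain_bytes(w, h, bytes_per_texel);
}
```

With these made-up numbers, 256 mipmapped 1024x1024 RGBA8 textures come to 1,431,655,424 bytes, roughly 1.33 GiB of client memory just for the copies.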

Overmind:
2)
You have rendered into the texture. Then you already have bound it to an FBO, no harm in just making a ReadPixels (into a PBO, if you’re concerned about parallelism).

This use case is valid for code that continually renders to a texture and reads back immediately. What about code that renders to a collection of textures infrequently compared to the number of times it samples (reads during rendering) the textures? The pathological case is write once, sample (render) many times…

A texture attached to a framebuffer object must be detached before it can be sampled (rendered). So to read a subimage from each texture one needs to:

  1. bind a framebuffer object
  2. read the texels
  3. unbind the framebuffer object

That’s assuming one uses one framebuffer object per texture. For reading a subset of a large collection of textures it may be wiser to use a small pool of framebuffer objects and wear the additional costs of attaching and detaching textures too.

If one needs to read from more than a few textures at once, the overhead starts to add up. Having said that, I haven’t benchmarked the relative costs of the binds/unbinds and the read.

Note that even using pixel buffer objects won’t necessarily increase parallelism much since only one framebuffer object can be active at a time. In contrast, as long as the textures being read aren’t being rendered to, there is much better scope for parallel GetTexSubImage calls in progress while rendering continues.

Overmind:
I think there is a pattern here :wink:

a) Things that you rendered are read by ReadPixels.
b) Textures are not a general purpose data store. Buffer objects are used for that.
Or more completely: :wink:

  A. TexImage* and TexSubImage* for writing Textures
  B. GetTexImage (and maybe GetTexSubImage) for reading Textures
  C. many OpenGL calls for writing to DRAW_BUFFERS
  D. ReadPixels for reading READ_BUFFER

If the OpenGL state happens to be set correctly, C can also write to textures as a side effect.

Overmind:
For the record: IMHO the GetTexImage function should be removed as well.
I’m all for the “Lean and Mean” approach, but consistency is critical to a good API too.

Functions that set OpenGL state usually have a corresponding Get function. For TexImage*, GetTexImage* seems a better counterpart than a series of calls to configure, bind, read and unbind a framebuffer object.

Conversely, how would you feel about replacing TexImage* with a series of calls to configure, bind, write and unbind a framebuffer object? I’m not being altogether serious here, but it’s what consistency demands when applied ruthlessly to your suggestion.

Of course if the new versions of OpenGL take this buffer-based approach, but with a terse, low-overhead and consistent API, I’m all for it.

k_szczech
As for ‘compressed’ version it won’t be that easy, since compression algorithm can work on block of pixels and therefore you will not be able to read any fragment of texture you want.

I’d guessed that, thus my more tentative suggestion with the compressed version. I thought that I’d ask anyway, in case it was easy for the implementors to stream through existing texture access functionality.

My original post was more to draw attention to a minor inconsistency in the existing spec than to demand a solution to my particular problems. Hope this discussion helps those working on the new specs. :slight_smile:

True, unless there are many textures. Keeping an extra copy of them all in client memory simultaneously will bloat the application unacceptably. In my case, I’m tolerating the read overhead to avoid this.
I don’t see the advantage of GetTexSubImage2D over GetTexImage. It’s a good idea to do a performance test and show that what you want is warranted. All future GL additions are for exposing a new feature.
API consistency is not important, else they would have added glFrustumf, glOrthof and glClipPlanef already.

Overmind already made the point about the GL 3.0 direction.

If the function had been added back in 1.0, it would have been nice, but it is a little late now.

My original post was more to draw attention to a minor inconsistency in the existing spec
Yes, that’s how I believe everybody sees this, but adding something for consistency without actually discussing whether it’s useful at all isn’t the way to go :slight_smile:

At this point I can see that we are for/against one of two possible directions:

  1. We go for consistency/simplicity: consistent set of Tex(Sub)Image/GetTex(Sub)Image - so we keep smiling towards beginning OpenGL programmers
  2. Lean and mean: PBO based “glDownload”/“glUpload” and no other ways of exchanging array data between client and server. So these two functions would be used for textures, VBOs, framebuffer access, selection buffer, feedback buffer and even matrices - glMultMatrix would now multiply two matrices on top of the stack, but we would still have glTranslate/glRotate/glScale/glLoadIdentity since these minimize the amount of data exchanged by application and OpenGL. So learn to use PBO or die.

Thanks to everyone that placed their 3 cents in this discussion, especially to Overmind.