Proper way to copy from PBO to a 3D texture?

Hello everyone, I really need some serious help here.

I have a problem with PBOs and 3D textures: I can't seem to update any slice of a 3D texture other than the 0th one when using the PBO method.

In other words, glTexSubImage3D() completely ignores the Z offset when used with a PBO, but works fine otherwise. Here is the code I am having problems with; what am I doing wrong?

#define GLEW_STATIC
#define GLUT_STATIC_LIB

#pragma comment(lib, "advapi32.lib")
#pragma comment(lib, "glew32s.lib")
#pragma comment(lib, "glutstatic.lib")

#include <stdio.h>
#include <windows.h>
#include <GL/glew.h>
#include <GL/glut.h>

#define valloc(size) VirtualAlloc(NULL, (size), MEM_COMMIT, PAGE_READWRITE)
#define vfree(ptr)   VirtualFree(ptr, 0, MEM_RELEASE)

static DWORD CPUFrequency(void)
{
  DWORD freq;
  HKEY hKey;
  const char *key = "HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0";
  DWORD buflen = 4;
  RegOpenKeyExA(HKEY_LOCAL_MACHINE, key, 0, KEY_READ, &hKey);
  RegQueryValueExA(hKey, "~Mhz", NULL, NULL, (LPBYTE)&freq, &buflen);
  RegCloseKey(hKey);
  return freq;
}

static __declspec(naked) unsigned __int64 ReadTSC(void)
{
  __asm   {
     rdtsc
     ret
  }
}

int main(int argc, char *argv[])
{
  const int w = 512, h = 512, d = 256;
  int   frame_size = w * h * sizeof(float);   // bytes per slice: 4 bytes per RGBA8 texel
  int   data_size = frame_size * d;           // bytes for the whole volume

  glutInit(&argc, argv);
  glutCreateWindow("STREAMING TUTORIAL");
  glewInit();

  glMatrixMode(GL_PROJECTION);
  glLoadIdentity();
  glOrtho(0, w, 0, h, -1, 1);
  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();
  glViewport(0, 0, w, h);

  float   *data1 = (float*)valloc(data_size);

  GLuint   texture3D;

  glGenTextures(1, &texture3D);
  glBindTexture(GL_TEXTURE_3D, texture3D);
  glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_BORDER);
  glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_BORDER);
  glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_BORDER);
  glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
  glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
  glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA8, w, h, d, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0);

  GLuint   buffer;

  glGenBuffers(1, &buffer);

  glFinish();

  unsigned __int64   t0, t1;
  double         tt, freq = CPUFrequency();

  t0 = ReadTSC();
  for (int z = 0; z < d; z++) {
     glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, buffer);
     // Orphan the buffer so we never wait on storage still in use by the GPU
     glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, frame_size, NULL, GL_STREAM_DRAW);
     unsigned char *data_ptr = (unsigned char *)data1 + z * frame_size;
     float *mem = (float*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);
     if (mem == NULL) {
        DebugBreak();
     }
     memcpy(mem, data_ptr, frame_size);   // copy one slice into the PBO
     glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);
     // With the PBO bound, the last argument is a byte offset into the buffer,
     // not a client pointer; zoffset = z should select the destination slice.
     glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, z, w, h, 1, GL_RGBA, GL_UNSIGNED_BYTE, 0);
     glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
  }
  glFinish();
  t1 = ReadTSC();

  tt = (double)(t1 - t0) / freq;
  printf("2D = %.3f ms, %.2f MB/sec
", tt / 1000.0, (double)data_size / tt);

  glDeleteBuffers(1, &buffer);
  glDeleteTextures(1, &texture3D);

  vfree(data1);

  return 0;
}

I am aware that mapping the buffer in a loop might not be the best idea, but I simply do not know any better at the moment.
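The only alternative I can think of is one large PBO mapped a single time, with a single glTexSubImage3D() call for the whole volume. Here is an untested sketch of what I mean, reusing buffer, data1, w, h, d and data_size from the code above:

   // Untested sketch: one big PBO, one map, one glTexSubImage3D for the whole volume
   glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, buffer);
   glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, data_size, NULL, GL_STREAM_DRAW);
   float *mem = (float*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);
   if (mem != NULL) {
      memcpy(mem, data1, data_size);   // copy all slices at once
      glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);
      // depth = d, last argument is a byte offset (0) into the bound PBO
      glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, 0, w, h, d, GL_RGBA, GL_UNSIGNED_BYTE, 0);
   }
   glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);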

Can anyone help me with this or should I conclude that NVIDIA, GPGPU and OpenGL forums are totally useless when it comes to any serious problem?

Hate to say it, but probably, yeah. My own experience starting out was that forums were great for beginner questions, but generally fell silent when you came up against really bizarre behavior.

You have to find someone who’s willing to answer questions. There are few enough of those. Of that group, you need to find one who’s used the feature you’re interested in. Even fewer. Then you need to find one who’s encountered and solved your particular bug. That…is very rare in many cases.

(Still waiting for someone to explain to me why FBOs sometimes “forget” their attachments between bindings if there are enough of them floating around.)

Most such issues are probably driver bugs, and the only people who can fix them are NVIDIA/ATI. The best thing would be to email their support departments directly.

Why are you copying nothing in particular into the 3d texture? In a release build, that memory will contain random garbage.

Also, if you had actual data in the data1 pointer that you wanted to put into the 3d texture, why not just pass it to glTexSubImage3D directly? (its last parameter)

Also, I assume you are using "sizeof(float)" and "float *" because the size of a float happens to coincide with the number of bytes needed for an RGBA pixel (4). It would be better practice to use a pre-defined RGBA structure, or make your own:

struct RGBA_struct
{
   unsigned char r;
   unsigned char g;
   unsigned char b;
   unsigned char a;
};

Using float in that manner is confusing for other developers since it obfuscates your intent.
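For example (just a sketch, reusing the w, h and valloc from your code), the slice size then states its intent directly:

   // sketch: bytes per slice expressed in terms of the pixel struct, not float
   int frame_size = w * h * sizeof(struct RGBA_struct);
   struct RGBA_struct *slice = (struct RGBA_struct*)valloc(frame_size);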

Back to your code:
I have never used glTexSubImage3D before, but from its description I would revisit how you are using the "depth" parameter. After reading the spec, its purpose seems a little fuzzy to me, but maybe it needs to be some value relative to the current z rather than just 1. Like I said, I am not sure what it does, except that it factors into how glTexSubImage3D reads from the pointer you provide.

Igor, I thought this was a “serious problem”. Can’t you reply back to this thread with more info?

I’m not clear on how the problem manifests itself as this simple code example doesn’t do anything except time the loop.

Sorry for not replying earlier; I had given up hope of getting any answers here.

Originally posted by CRasterImage:
Why are you copying nothing in particular into the 3d texture? In a release build, that memory will contain random garbage.

Also, if you had actual data in the data1 pointer that you wanted to put into the 3d texture, why not just pass it to glTexSubImage3D directly? (its last parameter)
I have actual data; this is just a minimal example where the initialization of data1 isn't shown.

I could do that, but I would like to achieve greater speed.

Originally posted by CRasterImage:
I have never used glTexSubImage3D before, but from its description I would revisit how you are using the "depth" parameter. After reading the spec, its purpose seems a little fuzzy to me, but maybe it needs to be some value relative to the current z rather than just 1. Like I said, I am not sure what it does, except that it factors into how glTexSubImage3D reads from the pointer you provide.
If I understand it correctly, it can work in three modes. The first mode is when you provide all the data at once. The second is when you provide individual slices and index them by changing the Z offset. Those two I don't have a problem with.

The third mode is when you transfer from a PBO to the texture; then the last parameter is a byte offset into the buffer (which I am setting to 0) and the other parameters are hopefully the same, but for some reason the Z offset gets ignored and I always end up updating slice 0 of the 3D texture. I hope I won't have to repeat that again.
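To spell it out with code, a sketch for some slice index z, using the same names as in my example above:

   // Mode 2: no PBO bound, the last argument is a real client pointer,
   // zoffset = z selects the slice; this works for me
   glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
   glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, z, w, h, 1, GL_RGBA, GL_UNSIGNED_BYTE,
                   (unsigned char*)data1 + z * frame_size);

   // Mode 3: PBO bound, the last argument is a byte offset into the buffer;
   // here the Z offset appears to be ignored and slice 0 is overwritten instead
   glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, buffer);
   glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, z, w, h, 1, GL_RGBA, GL_UNSIGNED_BYTE, 0);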

Originally posted by pudman:
I’m not clear on how the problem manifests itself as this simple code example doesn’t do anything except time the loop.
You can add some simple code:

	float	*data2 = (float*)valloc(data_size); // put after data1
	for (int i = 0; i < (data_size >> 2); i++) {
		data1[i] = (float)rand() / RAND_MAX;
	}

And then add at the end:

	glGetTexImage(GL_TEXTURE_3D, 0, GL_RGBA, GL_UNSIGNED_BYTE, data2);   // read the whole texture back
	FILE *fp;
	fopen_s(&fp, "cmp.raw", "wb");
	fwrite(data2, 1, data_size, fp);
	fclose(fp);

That way you get the texture dumped; for me only the 0th slice has data, the rest is empty. That is exactly the problem I described.
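If you would rather verify it in code than inspect the raw dump, a quick check like this (just a sketch, reusing data1, data2, d and frame_size from above) is enough:

	// sketch: compare each read-back slice against the source data
	for (int z = 0; z < d; z++) {
		const unsigned char *src = (const unsigned char*)data1 + z * frame_size;
		const unsigned char *dst = (const unsigned char*)data2 + z * frame_size;
		if (memcmp(src, dst, frame_size) != 0) {
			printf("slice %d differs\n", z);
		}
	}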

Why would you give up so quickly? You only waited half a day.

Let me see if I understand what you are trying to do:

  • You start by drawing some unknown data to an off-screen texture. (a pbuffer, in your case)

  • Then you copy the pbuffer’s pixels into a memory buffer allocated in your application’s heap.

  • Then you want to upload that pixel data as an arbitrary slice of a 3d texture.

If I am correct in describing what you want, then here are my suggestions:

  • Don’t allocate enough memory for the whole 3d texture, just allocate enough memory for a single slice.

  • Reverse your memcpy() call. You are copying in the wrong direction. You want to copy the pixels from the pbuffer into your local buffer.

  • Supply that pointer as the last parameter to the glTexSubImage3D() call.

edit: I am sorry. It seems you waited 5 days. That is strange. I never saw your original post for some reason.

Actually, now that I read it more closely, it sounds like you are saying that the "data1" pointer already contains the data.

If so, then what is the pbuffer for?

Just give the data pointer to the glTexSubImage3D() call. (one slice at a time)
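Something along these lines, just a sketch with no PBO involved, reusing the names from your example:

   // sketch: plain client-memory upload, one slice at a time, no PBO bound
   glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
   for (int z = 0; z < d; z++) {
      const unsigned char *slice = (const unsigned char*)data1 + z * frame_size;
      glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, z, w, h, 1, GL_RGBA, GL_UNSIGNED_BYTE, slice);
   }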

I don’t think you get it.

He's just trying to use a PBO to accelerate data upload into a 3D texture, and he says the zoffset parameter of glTexSubImage3D is ignored when a PIXEL_UNPACK_BUFFER is bound.

Which I can well believe. 3D textures are used rarely enough that I could certainly see this use case getting missed in testing.

Like I said before, the best bet is to extract a sample program that demonstrates the flaw and email it to NVIDIA's or ATI's helpdesk (whichever vendor's drivers are at work). It's probably a driver bug, because the code looks fine.

Also: a pbuffer is not the same thing as a pixel buffer object.

Ah. Thanks Lindley. I had PBuffers stuck in my brain.

I just tested it on an ATI HD2900XT. The same thing happens: only the 0th slice gets updated and the Z offset is ignored. Even worse, instead of the 363.71 MB/sec I get on the 8800GTX, the ATI card manages only 9.40 MB/sec.

One thing I’ve always been curious about is the effect of binding a given texID as multiple types of texture. For example, could you bind a texture as TEXTURE_2D for data download, and then bind it as TEXTURE_3D? Assuming you knew the data order, of course, and assuming that OpenGL’s texture size bounds allowed you to map the entire 3D texture into 2D.

Lindley, I do not think that would be possible.
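As far as I know a texture name is locked to the target it was first bound to, so I would expect something like the following to fail (a sketch, untested):

   // sketch: a name first bound as TEXTURE_2D cannot later be bound as TEXTURE_3D
   GLuint tex;
   glGenTextures(1, &tex);
   glBindTexture(GL_TEXTURE_2D, tex);   // the target is now fixed to 2D
   glBindTexture(GL_TEXTURE_3D, tex);   // should raise GL_INVALID_OPERATION
   if (glGetError() == GL_INVALID_OPERATION) {
      printf("re-binding to a different target is rejected\n");
   }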

Anyway, I tested this yesterday on a 7600GT using the 163.75 drivers and it works as it should, which means that my code is correct and that there is an OpenGL driver bug in the G80-specific code path.

I have now submitted a problem report to NVIDIA. We’ll see what they have to say about it.

As for ATI, I am sooo glad I don't have to support their cards in our application. Not only does it not work, it is also ~38x slower for no apparent reason.

Just to let everyone know: the code now works with the 169.xx NVIDIA drivers. It was an OpenGL driver bug specific to the G80.

whoops, no need to reply to a solved question :wink: