Tricky operation slow on Windows

I am trying something tricky with OpenGL in a multiplatform game. I would like to:

a) Extract a smallish rectangle (192x192) from a large texture (2048x2048). The colour data received will be used to create “debris”.

b) Then immediately destroy parts of the same large texture (filling them with alpha=0). The modified texture will then be used to continue in the game.

All is well and fast under Linux on the various computers I have tried. Windows 10 works but gets choppy, with a delay of maybe 100 ms every time an explosion occurs. If I do only a) and not b), there is no choppiness; the same goes for b) without a).

This is how I read the pixels from the texture:


glBindFramebuffer(GL_READ_FRAMEBUFFER, my_fb);
glFramebufferTexture(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, world_tex, 0);
glReadPixels(x, y, 192, 192, GL_BGRA, GL_UNSIGNED_BYTE, my_buffer);
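For reference, with GL_BGRA and GL_UNSIGNED_BYTE, glReadPixels writes four bytes per pixel in B, G, R, A order, starting with the bottom row of the requested rectangle. A minimal sketch of turning the 192x192 block into debris might look like the following; the Debris struct, collect_debris, and the rule that every non-transparent texel becomes one particle are hypothetical, not the original game's code.

```c
#include <stddef.h>
#include <stdint.h>

#define READ_W 192
#define READ_H 192

/* Hypothetical debris particle: position in world-texture pixels plus colour. */
typedef struct {
    int x, y;
    uint8_t r, g, b, a;
} Debris;

/* Walks a tightly packed BGRA block returned by
   glReadPixels(x0, y0, READ_W, READ_H, GL_BGRA, GL_UNSIGNED_BYTE, ...).
   Row 0 of the buffer is texture row y0 (the bottom of the rectangle).
   192 * 4 = 768 bytes per row is a multiple of the default GL_PACK_ALIGNMENT (4),
   so rows carry no padding. Writes at most max_out particles; returns the count. */
size_t collect_debris(const uint8_t *bgra, int x0, int y0, Debris *out, size_t max_out) {
    size_t n = 0;
    for (int row = 0; row < READ_H; ++row) {
        for (int col = 0; col < READ_W; ++col) {
            const uint8_t *p = bgra + ((size_t)row * READ_W + col) * 4;
            if (p[3] == 0) continue;      /* already-empty texel: no debris */
            if (n == max_out) return n;   /* caller's array is full */
            /* p[0] = B, p[1] = G, p[2] = R, p[3] = A */
            out[n++] = (Debris){x0 + col, y0 + row, p[2], p[1], p[0], p[3]};
        }
    }
    return n;
}
```

Because the loop skips texels whose alpha is already 0, an explosion over terrain that has already been cleared produces little or no debris.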

I also tried glGetTextureSubImage and glCopyImageSubData and got the same choppiness under Windows (it is fine under Linux), but they are not supported on some legacy hardware.

My code to destroy parts of the terrain:


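glBindFramebuffer(GL_FRAMEBUFFER, my_fb);  /* bind for drawing too, not only as the read framebuffer */
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, world_tex, 0);
glBindBuffer(GL_ARRAY_BUFFER, points_vbo);  /* points_vbo: hypothetical VBO holding the texels to erase */
glBufferData(GL_ARRAY_BUFFER, buffer_size, buffer_data, GL_STREAM_DRAW);
glDrawArrays(GL_POINTS, 0, num_points);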
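When erasing with GL_POINTS, each point has to land on the centre of the texel it should clear. Assuming the viewport is set with glViewport(0, 0, 2048, 2048) while world_tex is attached, and that the vertex shader passes positions through unchanged (both are assumptions about the original code), a texel coordinate maps to normalized device coordinates as sketched below. The helper names are hypothetical. With blending disabled, a fragment shader that outputs alpha 0 simply overwrites the texel.

```c
#include <stddef.h>

#define WORLD_SIZE 2048  /* world_tex is 2048x2048 */

/* Converts an integer texel coordinate to the NDC position of that texel's centre,
   assuming glViewport(0, 0, WORLD_SIZE, WORLD_SIZE) and a pass-through vertex shader.
   Window coordinate = (ndc + 1) / 2 * WORLD_SIZE = p + 0.5, i.e. the texel centre. */
static float texel_to_ndc(int p) {
    return ((float)p + 0.5f) / WORLD_SIZE * 2.0f - 1.0f;
}

/* Fills xy with 2 floats per point, ready for glBufferData(GL_ARRAY_BUFFER, ...). */
void build_points(const int *px, const int *py, size_t count, float *xy) {
    for (size_t i = 0; i < count; ++i) {
        xy[2 * i]     = texel_to_ndc(px[i]);
        xy[2 * i + 1] = texel_to_ndc(py[i]);
    }
}
```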

I would like to know whether there is a better way to do this, or why Windows is slow.

Steve

I think that's not quite right: to read pixels from the currently bound framebuffer (target “GL_FRAMEBUFFER” or “GL_READ_FRAMEBUFFER”), you should specify the buffer to read from:


struct {GLubyte b, g, r, a;} my_buffer[192 * 192];  /* GL_BGRA + GL_UNSIGNED_BYTE: 4 bytes per pixel in B, G, R, A order */

glBindFramebuffer(GL_READ_FRAMEBUFFER, my_fb);
glReadBuffer(GL_COLOR_ATTACHMENT0);
glReadPixels(x, y, 192, 192, GL_BGRA, GL_UNSIGNED_BYTE, my_buffer);
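How many bytes glReadPixels writes also depends on GL_PACK_ALIGNMENT (default 4): for 1-byte component types such as GL_UNSIGNED_BYTE, each row is padded up to a multiple of it. For 192 BGRA pixels a row is 768 bytes, so there is no padding and the block needs 192 * 768 = 147456 bytes. The hypothetical helper below computes the row stride in general, which matters if the read width or format ever changes.

```c
#include <stddef.h>

/* Row stride in bytes that glReadPixels uses for a row of `width` pixels,
   for 1-byte component types (e.g. GL_UNSIGNED_BYTE), given bytes per pixel
   (4 for GL_BGRA) and the current GL_PACK_ALIGNMENT (1, 2, 4 or 8).
   Allocating stride * height bytes is always large enough. */
size_t pack_row_stride(size_t width, size_t bytes_per_pixel, size_t alignment) {
    size_t row = width * bytes_per_pixel;
    return (row + alignment - 1) / alignment * alignment;  /* round up to alignment */
}
```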

This should not have anything to do with Linux, Windows or whatever OS. The only important factors are the power of the hardware you use, the manufacturer and the drivers.

Do I read your code correctly: you are drawing a bunch of individual points on top of your scene? How many points is that? If it is the same area as you have just read with glReadPixels, why not draw a quad?

You should also be aware that any kind of readback from the GPU is going to be slow. At the very least it will stall the pipeline and force all pending commands and draw calls to complete before it can do the readback: in other words, you’re completely destroying CPU/GPU parallelism. From your description it sounds like you’re trying to brute-force an algorithm on the CPU for which there may be a much faster (and more elegant) GPU-only solution. I’d encourage you to investigate that approach rather than continuing as you are.

Cornix wrote:
“This should not have anything to do with Linux, Windows or whatever OS. The only important factors are the power of the hardware you use, the manufacturer and the drivers.

Do I read your code correctly: you are drawing a bunch of individual points on top of your scene? How many points is that? If it is the same area as you have just read with glReadPixels, why not draw a quad?”

Yes, it must be the drivers: I have tested high-powered GPUs on Windows, and they performed worse than much weaker GPUs on Linux.

Unfortunately I can’t draw a quad; the pixels to remove are highly targeted (some terrain can be destroyed and some cannot).
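Because only some terrain may be destroyed, the point list has to be filtered texel by texel. A minimal CPU-side sketch, assuming a hypothetical per-texel "destructible" mask kept alongside world_tex and a circular blast; the selected coordinates are what would then be turned into GL_POINTS vertices. The function and mask are illustrations, not the original game's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Collects texels inside a circular blast that are marked destructible.
   mask: hypothetical width*height array, 1 = destructible, 0 = indestructible,
   row-major with row 0 at the bottom (matching GL texture coordinates).
   Writes at most max_out coordinates into px/py and returns the count. */
size_t select_blast_texels(const uint8_t *mask, int width, int height,
                           int cx, int cy, int radius,
                           int *px, int *py, size_t max_out) {
    size_t n = 0;
    for (int y = cy - radius; y <= cy + radius; ++y) {
        if (y < 0 || y >= height) continue;                      /* clip to texture */
        for (int x = cx - radius; x <= cx + radius; ++x) {
            if (x < 0 || x >= width) continue;
            int dx = x - cx, dy = y - cy;
            if (dx * dx + dy * dy > radius * radius) continue;   /* outside the circle */
            if (!mask[(size_t)y * width + x]) continue;          /* indestructible terrain */
            if (n == max_out) return n;
            px[n] = x;
            py[n] = y;
            ++n;
        }
    }
    return n;
}
```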

I have read about Pixel Buffer Objects. Do you think I could use them in this situation: set up a read operation, destroy the terrain, then wait for the transfer and create the debris when it’s finished?
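A sketch of the bookkeeping such a PBO scheme might need, under stated assumptions. In real GL code each slot would own a PBO allocated with glBufferData(GL_PIXEL_PACK_BUFFER, 192 * 192 * 4, NULL, GL_STREAM_READ); glReadPixels is issued with that PBO bound to GL_PIXEL_PACK_BUFFER and a null offset, and the destruction draw call follows immediately. Because GL executes commands in order, the PBO receives the texels as they were before the destruction. A couple of frames later the buffer is mapped with glMapBufferRange(..., GL_MAP_READ_BIT), the debris is built, and glUnmapBuffer is called. If the GPU has not finished by then, the map call blocks; glFenceSync plus glClientWaitSync with a zero timeout is the non-blocking way to check. The slot count, the two-frame latency, and all names below are assumptions, and whether this removes the stall on a given Windows driver would need measuring.

```c
#include <stdbool.h>

#define NUM_SLOTS 3      /* assumption: ring of 3 PBOs */
#define READ_LATENCY 2   /* assumption: map a PBO two frames after issuing the read */

typedef struct {
    bool in_use;
    long issued_frame;
    int x, y;            /* explosion position, used later to place the debris */
} ReadSlot;

static ReadSlot slots[NUM_SLOTS];

/* Claims a free slot for a read issued this frame; returns its index, or -1 if all are busy.
   Real code would then bind that slot's PBO to GL_PIXEL_PACK_BUFFER, call glReadPixels
   with a (void *)0 offset, and issue the terrain-destroying draw call. */
int begin_readback(long frame, int x, int y) {
    for (int i = 0; i < NUM_SLOTS; ++i) {
        if (!slots[i].in_use) {
            slots[i] = (ReadSlot){true, frame, x, y};
            return i;
        }
    }
    return -1;
}

/* Returns a slot whose data should have arrived by now, or -1.
   The caller maps that PBO, builds debris, unmaps, then calls end_readback. */
int ready_readback(long frame) {
    for (int i = 0; i < NUM_SLOTS; ++i)
        if (slots[i].in_use && frame - slots[i].issued_frame >= READ_LATENCY)
            return i;
    return -1;
}

/* Releases a slot once its debris has been created. */
void end_readback(int i) { slots[i].in_use = false; }
```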

Steve