calculating overall luminance of a texture

12-08-2015, 11:02 AM
I need to calculate the average luminance of a texture so I can modify the gain and level of the system generating the image/texture.

The user can choose one of four exposure gates for auto-adjusting the gain/level: small, medium, large and extra large. Extra large is the entire 1920x1080 texture.

I tried doing the calculations in my application code, but it takes 18ms to process a full image.

Here is the code I was using:

/* Determine the exposure gate row/col offsets */
First_Row = Exposure_Gate_Y[Video] * SOURCE_HEIGHT / 2;
Last_Row  = SOURCE_HEIGHT - Exposure_Gate_Y[Video] * SOURCE_HEIGHT / 2;
First_Col = Exposure_Gate_X[Video] * SOURCE_WIDTH / 2;
Last_Col  = SOURCE_WIDTH - Exposure_Gate_X[Video] * SOURCE_WIDTH / 2;

/* clear the image luminance value */
Luminance = 0.0;

/* process all the rows of the sub-image */
for( row = First_Row; row < Last_Row; row++ )
{
    /* Get pointer to the first gated pixel of the CURRENT row
       (the row offset must use row, not First_Row) */
    Pixel_Ptr = Capture_State.Frame_Buffers[Video][buf[Video].index].pData +
                row * SOURCE_WIDTH * COLOR_ELEMENTS +
                First_Col * COLOR_ELEMENTS;

    /* process all the columns in the sub-image */
    for( col = First_Col; col < Last_Col; col++ )
    {
        /* sum the pixel luminance; the weights assume B, G, R byte order */
        Luminance += *(Pixel_Ptr + 0) * 0.0721 +
                     *(Pixel_Ptr + 1) * 0.7154 +
                     *(Pixel_Ptr + 2) * 0.2125;

        /* Advance to the next pixel */
        Pixel_Ptr += COLOR_ELEMENTS;
    }
}

/* convert the sum into an average over the gate */
Luminance /= (double)(Last_Row - First_Row) * (Last_Col - First_Col);
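For reference, here is a self-contained sketch of the same gate average as a plain C function. It is not the poster's code: `gate_average_luminance` and its parameters are hypothetical, and it assumes 8-bit pixels in B, G, R order (which is what the weights in the loop imply) with the gate fractions trimmed half from each side. It computes each row's start address from the current row and divides the sum by the pixel count, so it returns a mean rather than a total.

```c
#include <stddef.h>

/* Average luminance over a centred exposure gate (hypothetical helper).
   Assumes 8-bit pixels stored B, G, R with `channels` bytes per pixel;
   gate_x/gate_y are the fractions of width/height trimmed in total,
   half from each side. */
double gate_average_luminance(const unsigned char *pixels, int width,
                              int height, int channels,
                              double gate_x, double gate_y)
{
    int first_row = (int)(gate_y * height / 2);
    int last_row  = height - first_row;   /* symmetric trim */
    int first_col = (int)(gate_x * width / 2);
    int last_col  = width - first_col;
    double sum = 0.0;

    for (int row = first_row; row < last_row; row++) {
        /* first gated pixel of this row */
        const unsigned char *p =
            pixels + ((size_t)row * width + first_col) * channels;
        for (int col = first_col; col < last_col; col++) {
            sum += p[0] * 0.0721 + p[1] * 0.7154 + p[2] * 0.2125;
            p += channels;
        }
    }

    long count = (long)(last_row - first_row) * (last_col - first_col);
    return count > 0 ? sum / (double)count : 0.0;
}
```

On a 4x4 test image whose centre 2x2 is pure green (G = 200), a 0.5 gate covers only the centre and returns 200 x 0.7154, about 143.08, while a 0 gate averages the whole image to about 35.77.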

I was thinking I could put the one line of code shown below into the shader program, but I am unsure how to get the data back out to the application.

Luminance = dot(texture(tex, texpos).rgb, vec3(0.2125, 0.7154, 0.0721));

Because the fragment shader runs massively in parallel, if I just incremented the Luminance variable there would almost certainly be a race condition between shader invocations.

Memory isn't a problem, so I was thinking I could pass a 1920x1080x1 array and use the "texpos" to index into it. At least this way there will be no race condition.

What would be the best way to accomplish this? Any help would be appreciated.

mhagain
12-08-2015, 12:23 PM
Enable automatic mipmap generation and sample from the smallest (1x1) mip level.

12-08-2015, 02:23 PM
I wasn't generating any mipmaps because I only ever use the full texture(s).

Since I update the texture(s) every frame at 60 Hz (it is actually a video stream) and I am already using 50% of the GPU, I was concerned about overloading it. In the worst case, I update two full-size 1920x1080 textures every frame.
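To put the worst case in numbers, the following sketch computes the upload volume and the size of a full mip chain. It assumes 3 bytes per pixel (8-bit BGR, as suggested by the three channel weights in the CPU loop) and 60 frames per second; the macro and function names are made up for the example.

```c
/* Worst-case upload volume: two 1920x1080 textures per frame at 60 fps.
   ASSUMPTION: 3 bytes per pixel (8-bit BGR); change UPLOAD_BPP if the
   capture card delivers a 4-byte format. */
#define UPLOAD_WIDTH    1920LL
#define UPLOAD_HEIGHT   1080LL
#define UPLOAD_BPP      3LL
#define UPLOAD_TEXTURES 2LL
#define UPLOAD_FPS      60LL

long long upload_bytes_per_frame(void)
{
    return UPLOAD_WIDTH * UPLOAD_HEIGHT * UPLOAD_BPP * UPLOAD_TEXTURES;
}

long long upload_bytes_per_second(void)
{
    return upload_bytes_per_frame() * UPLOAD_FPS;
}

/* Total texels in a full mip chain: each level halves both dimensions
   (rounding down, never below 1) until the 1x1 level is reached. */
long long mip_chain_texels(long long w, long long h)
{
    long long total = 0;
    for (;;) {
        total += w * h;
        if (w == 1 && h == 1)
            break;
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
    }
    return total;
}
```

Under these assumptions that is 12,441,600 bytes per frame, roughly 746 MB/s of uploads, and a full 1920x1080 mip chain holds 2,764,655 texels, just under one third more than the base level alone. The extra levels are computed on the GPU rather than uploaded.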

Assuming the GPU can handle it, how do I sample from the texture in the application?

mhagain
12-08-2015, 02:48 PM
Oh, sorry, I didn't understand that you needed the resulting value in the application.

That's going to involve a readback from the GPU whichever way you do it, so that's going to be your biggest performance killer. You'll create a CPU/GPU sync point and destroy concurrency/parallelism.

12-08-2015, 03:32 PM
Because I am writing to the texture every frame, I already have sync points.

I create a sync after I update the texture with glTexSubImage2D so the draw can use the texture, and also after the draw uses the texture so the capture thread can update the texture without collision.

The system runs smooth.

I just don't know how to get a value out of the fragment shader.

Can you help with that?

GClements
12-08-2015, 04:01 PM
The way to average a texture using parallelism is the same approach that you'd use to generate mipmaps, i.e. compute an average over each block of texels, then average the averages: divide-and-conquer. You can do this manually, but it's debatable whether you can improve over the built-in mipmap generation. One thing which might favour a manual approach is that you don't need all of the intermediate levels, only the final level. Another is that you can merge the generation (or extraction) of the luminance value into the first stage, rather than creating a separate luminance texture (although you could still use the built-in mipmap generation for the remaining levels).
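The divide-and-conquer idea can be modelled on the CPU as below. This illustrates the algorithm only, not how any driver implements mipmap generation: `reduce_average` is a hypothetical helper that repeatedly combines 2x2 blocks. Because 1920x1080 produces odd sizes along the way (1080 -> 540 -> 270 -> 135), each cell carries a sum and a texel count rather than a plain average, so blocks at the ragged edge are weighted correctly and the final value is the exact mean.

```c
#include <stdlib.h>

/* One cell of the reduction: running sum and number of source texels. */
typedef struct { double sum; long count; } Cell;

/* Exact mean of a w x h single-channel image by repeated 2x2 reduction. */
double reduce_average(const float *lum, int w, int h)
{
    if (w <= 0 || h <= 0)
        return 0.0;
    Cell *cur = malloc((size_t)w * h * sizeof *cur);
    if (!cur)
        return 0.0;
    for (long i = 0; i < (long)w * h; i++) {
        cur[i].sum = lum[i];
        cur[i].count = 1;
    }
    while (w > 1 || h > 1) {
        int nw = (w + 1) / 2, nh = (h + 1) / 2;   /* round up: keep edge texels */
        Cell *next = malloc((size_t)nw * nh * sizeof *next);
        if (!next) {
            free(cur);
            return 0.0;
        }
        for (int y = 0; y < nh; y++)
            for (int x = 0; x < nw; x++) {
                Cell c = {0.0, 0};
                /* combine the (up to) 2x2 source cells under this one */
                for (int dy = 0; dy < 2; dy++)
                    for (int dx = 0; dx < 2; dx++) {
                        int sx = 2 * x + dx, sy = 2 * y + dy;
                        if (sx < w && sy < h) {
                            c.sum += cur[(size_t)sy * w + sx].sum;
                            c.count += cur[(size_t)sy * w + sx].count;
                        }
                    }
                next[(size_t)y * nw + x] = c;
            }
        free(cur);
        cur = next;
        w = nw;
        h = nh;
    }
    double avg = cur[0].sum / (double)cur[0].count;
    free(cur);
    return avg;
}
```

For example, the 3x3 image 1..9 reduces 3x3 -> 2x2 -> 1x1 and yields 5.0.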

As for the issue with synchronisation, that will be less of a problem if you can tolerate a few frames of latency. Allocate two or more sets of buffers so that you can start processing the next frame before the processing of the current frame has finished. It will also be less of a problem if you can remove the CPU from the process entirely, e.g. copying the result directly from a texture to a uniform buffer without going via the CPU.
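A minimal sketch of the "two or more sets of buffers" idea, assuming a hypothetical ring of `NUM_SLOTS` buffer sets (for example a texture plus a result buffer each): frame n writes into slot n % NUM_SLOTS and only reads back the result that was started NUM_SLOTS - 1 frames earlier, so the CPU does not wait on work it has just submitted. The price is exactly NUM_SLOTS - 1 frames of latency.

```c
/* Hypothetical number of buffer sets in flight. */
#define NUM_SLOTS 3

/* Slot that frame `frame` writes its new data into. */
int write_slot(long frame)
{
    return (int)(frame % NUM_SLOTS);
}

/* Slot whose (older) result is read back on frame `frame`,
   or -1 while the pipeline is still filling. */
int read_slot(long frame)
{
    long src = frame - (NUM_SLOTS - 1);
    return src < 0 ? -1 : (int)(src % NUM_SLOTS);
}
```

With three slots, frame 2 reads the result written on frame 0, frame 3 reads frame 1's, and the slot being read is never the one being written on the same frame.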

If you must have the result on the CPU, use a sync object to check when the result is available. Whereas most OpenGL commands are simply enqueued, anything which returns data to client memory has to block until the data is actually available, so you don't want to start the transfer before then.

12-08-2015, 05:02 PM
I am confused. What is the first stage?

How do I compute an average over each block of texels, then average the averages?

If I put the above line of code into the fragment shader, I will have a value for each texel.

I can't have latency; I will be writing a new texture/image every frame.

I need the data to get to the CPU, so it can notify a different system of the gain/level changes needed to balance the image it is generating dynamically.

GClements
12-08-2015, 09:08 PM
With mipmaps, each level is (typically) a quarter of the size of the one above; i.e. each texel corresponds to a 2x2 block of texels in the level above. If you were manually generating the mipmaps for e.g. a 256x256 texture using a compute shader, the process might be (roughly):

1. Bind the texture
2. Set the minification filter to GL_LINEAR_MIPMAP_NEAREST.

3. Bind mipmap level 1 (128x128) to an image unit for writing.
4. Call glDispatchCompute() with the x and y work group counts set to 128x128 divided by the work group size (i.e. 128x128 invocations in total).
5. Each invocation of the compute shader computes the average value of a 2x2 block of texels from mipmap level 0 (reading from their common corner with linear filtering will return their average) and writes the result to a pixel of the image.

6. Bind mipmap level 2 (64x64) to an image unit for writing.
7. Call glDispatchCompute() with the x and y work group counts set to 64x64 divided by the work group size (i.e. 64x64 invocations in total).
8. Each invocation of the compute shader computes the average value of a 2x2 block of texels from mipmap level 1 and writes the result to a pixel of the image.

Repeat until you've written the 1x1 level.
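The steps above can be sketched as a loop over mip levels. This only plans the passes: it prints each level's size and the group counts that would be passed to glDispatchCompute(num_groups_x, num_groups_y, num_groups_z). `LOCAL_SIZE` (the shader's local_size_x/local_size_y) is an assumed value, and counts are rounded up so that small levels still get one work group; the shader would then have to ignore out-of-range invocations. In a real implementation a glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT) between passes would also be needed so each pass's texture fetches see the previous pass's image stores.

```c
#include <stdio.h>

/* ASSUMED work group size (local_size_x = local_size_y = 16). */
#define LOCAL_SIZE 16

/* Work groups needed to cover n invocations, rounded up. */
static int groups(int n)
{
    return (n + LOCAL_SIZE - 1) / LOCAL_SIZE;
}

/* Prints one line per pass and returns the number of passes needed to
   reach the 1x1 level. Sizes halve with rounding down, never below 1,
   as OpenGL does for mip levels. */
int plan_dispatches(int width, int height)
{
    int level = 0;
    while (width > 1 || height > 1) {
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
        level++;
        printf("level %d: %dx%d -> glDispatchCompute(%d, %d, 1)\n",
               level, width, height, groups(width), groups(height));
    }
    return level;
}
```

plan_dispatches(256, 256) reports 8 passes (128x128 down to 1x1), the first one with 8x8 work groups; a 1920x1080 texture needs 10 passes.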

In other words, a divide-and-conquer algorithm, where you break the calculation into chunks and execute each chunk in parallel, then apply the same process to the results from the individual chunks.

In this case, the first stage is the generation of the 128x128 level from the 256x256 level. If you're doing this to compute an intensity value (rather than to generate mipmaps), but the source texture is RGB, the conversion from RGB to intensity can be performed in the shader. The output texture (which would be distinct from the original) would only have a single channel, so subsequent invocations (128x128->64x64 and after) would already be receiving intensity values rather than RGB triples.
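A CPU model of that first stage, with the RGB-to-intensity conversion folded into the 2x2 averaging, might look like this. It assumes tightly packed 8-bit RGB input with even width and height, and reuses the weights from the shader line earlier in the thread; `first_stage_rgb_to_lum` is a hypothetical name, and on the GPU each iteration of the outer two loops would be one compute shader invocation.

```c
#include <stddef.h>

/* First reduction stage: w x h RGB (8-bit, packed) -> (w/2) x (h/2)
   single-channel luminance. Each output texel averages a 2x2 block.
   ASSUMES w and h are even; `out` holds (w/2)*(h/2) floats. */
void first_stage_rgb_to_lum(const unsigned char *rgb, int w, int h,
                            float *out)
{
    for (int y = 0; y < h / 2; y++)
        for (int x = 0; x < w / 2; x++) {
            float acc = 0.0f;
            for (int dy = 0; dy < 2; dy++)
                for (int dx = 0; dx < 2; dx++) {
                    const unsigned char *p =
                        rgb + ((size_t)(2 * y + dy) * w + (2 * x + dx)) * 3;
                    /* R, G, B weights as in the shader line */
                    acc += 0.2125f * p[0] + 0.7154f * p[1] + 0.0721f * p[2];
                }
            out[(size_t)y * (w / 2) + x] = acc / 4.0f;
        }
}
```

Feeding it a 4x2 image whose left 2x2 block is pure red (200) and right block pure blue (100) gives two single-channel texels, about 42.5 and 7.21; later stages then only average single values.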

If you don't need the intermediate mipmap levels, then you can combine multiple stages. So rather than the first stage calculating an average over a 2x2 block of pixels, it might calculate an average over e.g. a 4x4 block, generating the 64x64 level directly from the 256x256 level, skipping the 128x128 level. But you want the total number of blocks to be high enough to utilise all of the GPU's cores.
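Combining stages amounts to averaging a larger block per invocation. This sketch (again a CPU model with a hypothetical name) averages k x k blocks in one pass, so k = 4 turns a 256x256 image straight into 64x64; it assumes width and height are multiples of k.

```c
#include <stddef.h>

/* Average k x k blocks of a w x h single-channel image in one pass.
   ASSUMES w and h are multiples of k; `out` holds (w/k)*(h/k) floats. */
void reduce_block(const float *in, int w, int h, int k, float *out)
{
    int ow = w / k, oh = h / k;
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++) {
            float acc = 0.0f;
            for (int dy = 0; dy < k; dy++)
                for (int dx = 0; dx < k; dx++)
                    acc += in[(size_t)(y * k + dy) * w + (x * k + dx)];
            out[(size_t)y * ow + x] = acc / (float)(k * k);
        }
}
```

For the 1920x1080 frames, k = 8 would give 240x135 = 32,400 blocks in the first pass. Larger k means fewer passes but fewer, longer-running invocations, which is the trade-off against keeping all of the GPU's cores busy.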

Whether the potential optimisations are enough to beat built-in mipmap generation is something that can only be determined by experimentation (although it's possible that someone has done this already).

mhagain
12-09-2015, 10:05 AM
Typically, uploading a texture to the GPU runs more smoothly than reading one back. When uploading, the driver can detect whether the texture is currently in use for drawing and, if so, copy the data to a temporary area (or even create its own temporary texture) rather than having to stall.

When reading back, the driver must stall: it has to wait for all outstanding calls to finish, and that may be up to three frames' worth of outstanding calls. Reading back can wipe out your performance in ways that uploading does not.

To read back from a specific mip level you can use glGetTexImage (https://www.opengl.org/sdk/docs/man/html/glGetTexImage.xhtml). So generate a full mipmap chain, figure out which mip level is the smallest, and read that level with glGetTexImage (if you really must read back).
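To find that level: OpenGL halves each dimension per level (rounding down, never below 1), so the 1x1 level of a W x H texture has index floor(log2(max(W, H))). The helper below is a hypothetical sketch; its result is what you would pass as the `level` argument of glGetTexImage(GL_TEXTURE_2D, level, format, type, pixels) after glGenerateMipmap(GL_TEXTURE_2D). Note that the OpenGL specification does not mandate a particular filter for mipmap generation (a box filter is recommended), so for a non-power-of-two size like 1920x1080 the 1x1 texel may only approximate the true mean.

```c
/* Index of the smallest (1x1) mip level of a width x height texture:
   keep halving the larger dimension (rounding down) until it reaches 1. */
int smallest_mip_level(int width, int height)
{
    int size = width > height ? width : height;
    int level = 0;
    while (size > 1) {
        size /= 2;
        level++;
    }
    return level;
}
```

For 1920x1080 this gives level 10; for 256x256, level 8.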

12-10-2015, 09:46 AM
GClements,

That is what I was missing, thank you.

I didn't really know or understand the compute shader. I had seen it in the pipeline, but had never needed to write one, so I never looked deeper.

I can tell you that doing it on the CPU takes 2 dual-core hyper-threaded processors almost a full frame (16.66 ms) to do the averaging. With two textures per frame, I had to dedicate 4 of my 6 processors just to averaging pixels.

I will give the graphics processor a try at mipmap generation and see what it costs me.

12-10-2015, 09:53 AM
mhagain,

I will try letting the GPU generate the mipmaps and read back the smallest (1x1) level. I don't think I can continue to support the processor load of doing it myself.

BTW, I am writing a real-time app that only draws 2-D lines and text on the screen, with a textured quad underneath. The texture is a real-time image from a multi-channel capture card.

I am operating with only a single frame delay. I write a texture and use it in the current draw thread. I bought a monster machine: 6 dual-core processors (12 hyper-threaded cores), 16 GB of memory and an NVIDIA Quadro K5200 graphics card with 5 GB of memory.