Fast RGBA8 to INTENSITY8

NiCo1 · April 23, 2008, 5:39am

Hi guys,

I’m having some problems finding the optimal (fastest) way to convert textures with an internal format of RGBA8 to grayscale/intensity textures with an internal format of INTENSITY8.

I need INTENSITY8 images because they feed a GPGPU algorithm that operates on 8bpp intensity images. The conversion to grayscale also has to happen in a preprocessing step, because I don’t want to spend cycles on a dot product to compute the grayscale value every time I access the RGBA image.

I know the grayscale values can be computed in a single pass by taking the dot product of the RGBA components with the weighting factors. The problem is that INTENSITY8 is not color-renderable, so I can’t write to it from such a pass. I could render the conversion from RGBA8 into another RGBA8 texture, replicating the intensity to all four components, and then use glCopyTexImage2D or PBOs to create the INTENSITY8 image. However, I doubt that would be much faster than computing the intensity values on the CPU.
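For reference, here is a CPU model of the RGBA8 -> RGBA8 conversion pass: a weighted dot product of the color channels, with the result written to all four channels. This is a minimal sketch, not shader code. The Rec. 601 weights (0.299, 0.587, 0.114) are an assumption, because the post only says "weighting factors". The names `intensity_of` and `replicate_pass` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Grayscale value of one RGBA8 pixel: dot product with assumed Rec. 601 weights.
   Alpha does not contribute. */
static uint8_t intensity_of(const uint8_t rgba[4])
{
    float y = 0.299f * rgba[0] + 0.587f * rgba[1] + 0.114f * rgba[2];
    if (y > 255.0f)
        y = 255.0f;               /* guard against float rounding just above 255 */
    return (uint8_t)(y + 0.5f);   /* round to nearest */
}

/* CPU model of the RGBA8 -> RGBA8 pass: write the intensity into all four channels. */
static void replicate_pass(const uint8_t *src, uint8_t *dst, size_t pixel_count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t y = intensity_of(&src[4 * i]);
        dst[4 * i + 0] = y;
        dst[4 * i + 1] = y;
        dst[4 * i + 2] = y;
        dst[4 * i + 3] = y;
    }
}
```

Because the weights sum to 1, pure grays map to themselves, so (100, 100, 100, a) becomes (100, 100, 100, 100).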

So my question is: given an RGBA8 texture, what is the fastest way to create the equivalent INTENSITY8 texture without leaving the GPU?

Any help is appreciated.

AlexN · April 23, 2008, 6:57am

Use PBOs: render your RGBA -> intensity pass into a quarter-width RGBA8 render target, where each output pixel holds the intensity values of 4 pixels (blue = pixel 0, green = pixel 1, red = pixel 2, alpha = pixel 3). Read the render target contents into a PBO, then use that PBO as the source for glTexImage2D on an INTENSITY8 texture.
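The channel order in this packing can be checked with a small CPU model of one row. In this C sketch (not GL code), the shader writes four consecutive intensities into the blue, green, red and alpha channels of one texel. The readback then lays out bytes as B, G, R, A per texel, which restores the original pixel order. That last step assumes the target is read back with format GL_BGRA and type GL_UNSIGNED_BYTE; the post does not name the readback format. `Texel`, `pack_row` and `read_back_bgra` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One texel of the quarter-width RGBA8 render target, as the shader sees it. */
typedef struct { uint8_t r, g, b, a; } Texel;

/* Shader-side assignment: blue = pixel 0, green = pixel 1, red = pixel 2, alpha = pixel 3. */
static void pack_row(const uint8_t *gray, size_t width, Texel *out)
{
    assert(width % 4 == 0); /* assumption: row width is a multiple of 4 */
    for (size_t t = 0; t < width / 4; ++t) {
        out[t].b = gray[4 * t + 0];
        out[t].g = gray[4 * t + 1];
        out[t].r = gray[4 * t + 2];
        out[t].a = gray[4 * t + 3];
    }
}

/* Model of a GL_BGRA / GL_UNSIGNED_BYTE readback: bytes land as B, G, R, A per texel. */
static void read_back_bgra(const Texel *texels, size_t count, uint8_t *bytes)
{
    for (size_t t = 0; t < count; ++t) {
        bytes[4 * t + 0] = texels[t].b;
        bytes[4 * t + 1] = texels[t].g;
        bytes[4 * t + 2] = texels[t].r;
        bytes[4 * t + 3] = texels[t].a;
    }
}
```

With that byte order, the PBO holds a plain one-byte-per-pixel image. This is presumably why it can serve directly as the source of a GL_LUMINANCE / GL_UNSIGNED_BYTE upload into the INTENSITY8 texture.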

niko · April 23, 2008, 7:11am

I agree with AlexN, but I would use FBOs.

/N

NiCo1 · April 23, 2008, 7:12am

Thanks for the reply, AlexN.

That also crossed my mind, but as you described, it requires three steps:

1 - One shader pass
2 - One transfer to a PBO
3 - One transfer from the PBO to the texture

I have no calculations to perform between the 2nd and 3rd steps, so I can’t exploit the asynchronous behavior: the 3rd step has to wait for the 2nd to complete. The algorithm is intended to run on mobile workstations, where device-to-device memory bandwidth is roughly 4 times lower than on desktop GPUs, so I really doubt this would be faster than doing the calculations on the CPU. It’s too bad I can’t render directly to INTENSITY8 textures and eliminate all these copies…
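The following C sketch puts rough numbers on the three steps. For a W×H image, it counts bytes read plus bytes written per step, assuming 4 bytes per RGBA8 pixel, 1 byte per intensity pixel, and that each copy reads its source once and writes its destination once. This is a hypothetical model. It ignores texture caches, framebuffer compression and driver behavior, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes read + written per step of the three-step path (rough, hypothetical model). */
typedef struct {
    uint64_t shader_pass;     /* read RGBA8 source + write packed quarter-width RGBA8 */
    uint64_t to_pbo;          /* read packed target + write PBO */
    uint64_t pbo_to_texture;  /* read PBO + write INTENSITY8 texture */
} Traffic;

static Traffic three_step_traffic(uint64_t width, uint64_t height)
{
    assert(width % 4 == 0);       /* quarter-width packing needs width divisible by 4 */
    uint64_t n = width * height;  /* pixel count == bytes of 8-bit intensity data */
    Traffic t;
    t.shader_pass    = 4 * n + n;
    t.to_pbo         = n + n;
    t.pbo_to_texture = n + n;
    return t;
}

static uint64_t total_bytes(Traffic t)
{
    return t.shader_pass + t.to_pbo + t.pbo_to_texture;
}

/* Hypothetical single pass straight into a 1-byte-per-pixel target, if one were renderable. */
static uint64_t direct_pass_bytes(uint64_t width, uint64_t height)
{
    return 4 * width * height + width * height;
}
```

Under this model, the three-step path costs 9 bytes per pixel (9437184 bytes at 1024×1024), versus 5 bytes per pixel for a direct pass. The two copies therefore nearly double the traffic, which matters most when memory bandwidth is the bottleneck.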

NiCo1 · April 23, 2008, 7:15am

@niko: Can you be a little more specific?

Zengar · April 23, 2008, 8:46am

I would just do the conversion in the actual algorithm; as you said, it is a single dot product, which is very cheap. Alternatively, render to a single channel of an RGBA8 texture instead of an intensity texture. You will waste some memory, but who cares…

NiCo1 · April 23, 2008, 9:19am

A dot product is very cheap indeed, but not if it has to be performed many times for a single pixel. The first step of the algorithm applies a Gaussian blur to the intensity image. So if I use 17 filter taps for the Gaussian, there are two options:

1 - Apply the dot product to all 17 taps and sum the resulting scalar values -> a lot of overhead compared to preprocessing the image to grayscale in a previous pass

2 - Sum the 17 RGBA values and perform the dot product on the 4-component result -> 4 times as many additions as with an intensity image
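Both options compute the same value, because the dot product is linear: Σ w_i (c_i · k) = (Σ w_i c_i) · k, where c_i are the tap colors, w_i the Gaussian weights and k the luma weights. The C sketch below is a CPU model, not shader code. It evaluates both forms, plus the preprocessed-intensity case, so the operation counts can be compared directly. The Rec. 601 weights and the names `blur_option1`, `blur_option2` and `blur_intensity` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 17  /* tap count from the post */

typedef struct { float r, g, b, a; } Rgba;

/* Assumed luma weights (Rec. 601); alpha gets weight 0. */
static const Rgba kLuma = { 0.299f, 0.587f, 0.114f, 0.0f };

static float dot4(Rgba c, Rgba k)
{
    return c.r * k.r + c.g * k.g + c.b * k.b + c.a * k.a;
}

/* Option 1: one dot product per tap, then a scalar weighted sum (TAPS dot products). */
static float blur_option1(const Rgba taps[TAPS], const float w[TAPS])
{
    assert(taps != NULL && w != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < TAPS; ++i)
        sum += w[i] * dot4(taps[i], kLuma);
    return sum;
}

/* Option 2: weighted sum of 4-component colors, then a single dot product. */
static float blur_option2(const Rgba taps[TAPS], const float w[TAPS])
{
    Rgba acc = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < TAPS; ++i) {
        acc.r += w[i] * taps[i].r;
        acc.g += w[i] * taps[i].g;
        acc.b += w[i] * taps[i].b;
        acc.a += w[i] * taps[i].a;
    }
    return dot4(acc, kLuma);
}

/* Preprocessed path: taps already hold intensities, one scalar multiply-add per tap. */
static float blur_intensity(const float taps[TAPS], const float w[TAPS])
{
    float sum = 0.0f;
    for (size_t i = 0; i < TAPS; ++i)
        sum += w[i] * taps[i];
    return sum;
}
```

All three return the same value up to float rounding. The choice between them is therefore purely about cost: 17 dot products for option 1, 17 four-component multiply-adds for option 2, and 17 scalar multiply-adds once the image has been preprocessed.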

But as you said, I too believe the fastest way is to use another RGBA8 image in an FBO and just replicate the grayscale value to all channels in a preprocessing pass.

Thanks for the help.