Performance of glTexSubImage3D

Greetings,
I use glTexSubImage3D to update single voxels of a 3D texture, that is:

glTexSubImage3D(GL_TEXTURE_3D, 0,
                x, y, z, //position
                1, 1, 1, //w, h, d
                GL_COLOR_INDEX,
                GL_UNSIGNED_BYTE,
                voxval   //unsigned char[1]
                );

I have seen that one call to glTexSubImage3D will typically take 0.002 ms when the currently bound texture has dimensions 2x2x2.

However, the time required for a single call to glTexSubImage3D seems to increase with the size of the currently bound texture. For instance, when the currently bound texture has dimensions 32x32x32, a single call to glTexSubImage3D takes 0.04 ms - a factor-of-20 increase.

To me it seems peculiar that the execution time of glTexSubImage3D depends on the size of the currently bound texture. The call should perform a data transfer, and its cost should depend on the amount of data transferred, not on the size of the target texture.

I have so far tested on two different Nvidia cards, both in Linux boxes, so I'm suspecting this might be a bug in the Nvidia driver.

Is there anyone here who knows of a reason why the performance of glTexSubImage3D should depend on the size of the target texture?

Any feedback will be greatly appreciated!

Kind regards,
Eivind

Originally posted by Eivind:
[b] Greetings,
I use glTexSubImage3D to update single voxels of a 3D texture, that is:

glTexSubImage3D(GL_TEXTURE_3D, 0,
                x, y, z, //position
                1, 1, 1, //w, h, d
                GL_COLOR_INDEX,
                GL_UNSIGNED_BYTE,
                voxval   //unsigned char[1]
                );

I have seen that one call to glTexSubImage3D will typically take 0.002 ms when the currently bound texture has dimensions 2x2x2.

However, the time required for a single call to glTexSubImage3D seems to increase with the size of the currently bound texture. For instance, when the currently bound texture has dimensions 32x32x32, a single call to glTexSubImage3D takes 0.04 ms - a factor-of-20 increase.
[/b]
The ratio between the sizes of your textures is 4096x, so a factor-of-20 increase is only weak evidence for a dependency between texture size and the time for a TexSubImage call. Maybe it's only possible to replace complete slices, or stacks of slices, efficiently.

You might want to do some benchmarks to find out how TexSubImage behaves. Bear in mind that there might be some overhead for every update of a texture (related or not to texture size) that dominates the total time of a TexSubImage call that only replaces one texel … If I may ask, what do you need this for?

Michael

Originally posted by mlb:
[b]
The ratio between the sizes of your textures is 4096x, so a factor-of-20 increase is only weak evidence for a dependency between texture size and the time for a TexSubImage call. Maybe it's only possible to replace complete slices, or stacks of slices, efficiently.

You might want to do some benchmarks to find out how TexSubImage behaves. Bear in mind that there might be some overhead for every update of a texture (related or not to texture size) that dominates the total time of a TexSubImage call that only replaces one texel … If I may ask, what do you need this for?

Michael
[/b]
Yes, the increase in time is small compared to the increase in texture size. But I was surprised that there was an increase at all: as far as I can see, the overhead should be the same for each call, and the amount of data to transfer is the same.

Unless, as you mention, a call to TexSubImage leads to a whole slice of the texture being updated. That is a very good point.

Do you think this could be a hardware limitation - that 3D textures can only be updated in whole slices? Or could it be that the driver vendor has chosen this implementation of glTexSubImage3D?

I am using the TexSubImage call in a volume rendering application. The data set to be rendered changes at sparse, scattered locations. Instead of re-uploading the entire texture whenever there is a small change in the dataset, I thought I could increase performance by updating the texture only where the dataset has actually changed.

Thank you for your reply!

Eivind

Hi again,

The answer could be, of course, that a complete row of a slice has been updated (that's a factor of 16 - approximately the same as the observed 20).

Did you finally manage to increase performance by updating subtextures?

Hi Eivind,

I suspect this could be a granularity-of-update issue, but it’s hard to tell for sure with only two data points. What does the curve look like as you go from 2x2x2 through 128x128x128? If it’s a granularity thing, you should see a plateau.

Thanks -
Cass

I have now tried the glTexSubImage3D call on ATI hardware instead of Nvidia. It seems the performance of glTexSubImage3D is considerably better on ATI than on Nvidia.

For a 2x2x2 texture, glTexSubImage3D returns on average ten times faster on ATI than on Nvidia. And on ATI the function does not take more time to update a larger texture (no difference at all between updating 2x2x2 and 128x128x128). So on ATI I got the performance increase that I was after in my volume rendering application.

I’m concluding that Nvidia has a bug in their driver, or that the issue is related to the layout of 3D textures on Nvidia hardware.

Thank you for your replies, glRulez and cass. Let me know if you would still like to see some timings for Nvidia. I’m not familiar with ‘granularity-of-update’ - could it be that Nvidia is affected by this issue while ATI is not?

Eivind

> I’m concluding that Nvidia has a bug in their driver, or that the issue is related to the layout of 3D textures on Nvidia hardware.

It is not a bug as long as it works :wink: even if it may be slow.

ATI supposedly uses a different memory layout than NVIDIA for 3D-textures. That would explain the different behaviour.

Could you post some timing results, please?