ATI slow when using ARB_half_float_vertex data



BrianDFS
12-23-2010, 11:49 AM
I just got this code up and running, so it's possible I might have something wrong.

However, at the moment I'm rendering a 780k+ triangle (1.5M verts) mesh. When using 32-bit floats for the position, normal, and uv data, I'm getting 110+ FPS.

When I switch over to using 16-bit (packed) floats, performance drops to around 10 FPS. The rendered results are correct though, which confirms the data is correctly packed.

Are there any examples on the net showing how other people are doing this? My Google searches don't find anything useful.

Test card is HD4850. Drivers are Catalyst 10.9 Win7 x64.

The same code results in a 5 FPS gain on similar NV hardware.
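
For readers who have not done the 16-bit packing step described above, here is a minimal sketch of one way to convert a 32-bit float to half-float bits. The function name `float_to_half` is hypothetical and this is not the original poster's code. It truncates the mantissa instead of rounding and flushes values too small for a normal half to zero, so treat it as an illustration rather than a production converter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit tricks below assume a 32-bit IEEE 754 float. */
static_assert(sizeof(float) == 4, "float must be 32 bits wide");

/* Hypothetical helper: returns the 16-bit half-float bit pattern for value.
   Simplifications: mantissa is truncated (no rounding), results that would
   be denormal halves become signed zero, overflow becomes infinity. */
uint16_t float_to_half(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits); /* reinterpret without aliasing issues */

    uint16_t sign     = (uint16_t)((bits >> 16) & 0x8000u);
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent == 0xFFu) {
        /* Infinity stays infinity; NaN keeps a non-zero mantissa bit. */
        return (uint16_t)(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));
    }

    /* Re-bias the exponent from 127 (float) to 15 (half). */
    int32_t half_exponent = (int32_t)exponent - 127 + 15;

    if (half_exponent >= 31)
        return (uint16_t)(sign | 0x7C00u); /* too large: infinity */
    if (half_exponent <= 0)
        return sign;                       /* too small: signed zero */

    /* Keep the top 10 of the 23 mantissa bits. */
    return (uint16_t)(sign | ((uint32_t)half_exponent << 10) | (mantissa >> 13));
}
```

The resulting `uint16_t` values are what would be stored in the vertex buffer and described to OpenGL with the `GL_HALF_FLOAT` type.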

skynet
12-24-2010, 05:57 AM
It might be an alignment problem. I once misaligned 32-bit float data to 2-byte boundaries and it resulted in horrible performance and corrupted rendering on ATI.

Try padding your vertex data from 3 to 4 half floats (either via the stride or by using 4 components), so each vertex is 4-byte aligned again.
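
As a rough sketch of the padding suggested above, the following expands an array of 3-component half-float vectors into 4-component ones. The type and function names are hypothetical. The pad value 0x3C00 (1.0 as a half) is an assumption chosen so that a position read with 4 components still ends up with w = 1, which is what OpenGL supplies by default when only 3 components are specified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Raw half-float bit patterns are stored as uint16_t. */
typedef struct { uint16_t x, y, z; }      HalfVec3; /* 6 bytes: only 2-byte aligned */
typedef struct { uint16_t x, y, z, pad; } HalfVec4; /* 8 bytes: multiple of 4 */

static_assert(sizeof(HalfVec3) == 6, "unexpected padding in HalfVec3");
static_assert(sizeof(HalfVec4) == 8, "unexpected padding in HalfVec4");

/* Hypothetical helper: copies count vectors from src to dst and fills the
   extra component with 1.0 (0x3C00 in half-float encoding). */
void pad_half3_to_half4(const HalfVec3 *src, HalfVec4 *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i].x   = src[i].x;
        dst[i].y   = src[i].y;
        dst[i].z   = src[i].z;
        dst[i].pad = 0x3C00u;
    }
}
```

For attributes such as normals, the shader can simply declare a vec3 input and the fourth component is ignored.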

BrianDFS
12-24-2010, 07:22 AM
skynet wrote:
> It might be an alignment problem. I once misaligned 32-bit float data to 2-byte boundaries and it resulted in horrible performance and corrupted rendering on ATI.
>
> Try padding your vertex data from 3 to 4 half floats (either via the stride or by using 4 components), so each vertex is 4-byte aligned again.

Yeah, that was my first thought as well. Unfortunately, I already tried padding the vertex structure with a 4th half-float. I'll double check it and test again just to make sure though.

Thanks for the suggestion.

BrianDFS
12-24-2010, 08:31 AM
Okay, so I went back and played around with the padding again and discovered the following:

1. Padding or not padding the vertex structure makes no difference as long as I pass in 3 for the "size" parameter to glVertexAttribPointer, even with the correct "stride".

2. If I pass in 4 for the "size" parameter to glVertexAttribPointer, things run at full speed. Of course, when passing in 4 for the size, I have no choice but to pad the vertex structure.

What confuses me about point 1 is why padding the structure to a 4-byte boundary, while still passing in 3 for size and 8 for stride, doesn't let things run at full speed.

In summary, it appears that on ATI cards, if you want to use HALF_FLOAT attributes, the components have to come in pairs (i.e. each attribute has to be a multiple of 4 bytes in size).

Kind of a bummer, but not a total loss I guess.
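
To make the finding above concrete, here is a sketch of an interleaved layout in which every half-float attribute has an even component count. The struct, the table, and the helper are all hypothetical. The thread only confirms the "size" 4 result for one attribute, so extending it to position, normal, and uv together is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex. Raw half-float bits are stored as
   uint16_t, and every attribute occupies a whole number of 4-byte words. */
typedef struct {
    uint16_t position[4]; /* x, y, z + one padding half (8 bytes) */
    uint16_t normal[4];   /* x, y, z + one padding half (8 bytes) */
    uint16_t uv[2];       /* u, v: already a pair       (4 bytes) */
} PackedVertex;

static_assert(sizeof(PackedVertex) == 20, "expected a 20-byte stride");

/* One row per attribute: component count ("size") and byte offset. */
typedef struct {
    int    size;
    size_t offset;
} HalfAttrib;

static const HalfAttrib packed_vertex_attribs[3] = {
    { 4, offsetof(PackedVertex, position) },
    { 4, offsetof(PackedVertex, normal)   },
    { 2, offsetof(PackedVertex, uv)       },
};

/* Returns 1 when every attribute has an even number of half-floats and
   starts on a 4-byte boundary, 0 otherwise. */
int attribs_use_half_pairs(const HalfAttrib *attribs, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (attribs[i].size % 2 != 0 || attribs[i].offset % 4 != 0)
            return 0;
    }
    return 1;
}

/* With a GL context bound, row i would be submitted roughly as:
     glVertexAttribPointer(i, a.size, GL_HALF_FLOAT, GL_FALSE,
                           sizeof(PackedVertex), (const void *)a.offset);  */
```

The cost is two extra bytes each for position and normal, which still leaves the vertex well under the 32 bytes the same attributes take as three, three, and two 32-bit floats.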

Alfonse Reinheart
12-24-2010, 09:00 AM
BrianDFS wrote:
> passing in 3 for size but passing in 4 for stride

Um, shouldn't that be 8 for the stride? Half-floats are 2 bytes each. So three of them padded to a 4-byte alignment would require a stride of 8.
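
The stride arithmetic above can be written out as a small helper. The name `padded_stride` is hypothetical; it simply rounds the raw attribute size up to the next multiple of the alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Worked example at compile time: 3 half-floats * 2 bytes = 6 bytes,
   rounded up to a multiple of 4 gives 8. */
static_assert((3 * 2 + 4 - 1) / 4 * 4 == 8, "three half-floats pad to 8 bytes");

/* Hypothetical helper: byte stride for `components` values of
   `component_bytes` each, rounded up to a multiple of `alignment`.
   `alignment` must be non-zero. */
size_t padded_stride(size_t components, size_t component_bytes, size_t alignment)
{
    size_t raw = components * component_bytes;
    return (raw + alignment - 1) / alignment * alignment;
}
```

Note that the stride passed to glVertexAttribPointer is measured in bytes, not in components, which is why 4 was the wrong value here.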

BrianDFS
12-24-2010, 10:07 AM
Alfonse Reinheart wrote:
> > passing in 3 for size but passing in 4 for stride
>
> Um, shouldn't that be 8 for the stride? Half-floats are 2 bytes each. So three of them padded to a 4-byte alignment would require a stride of 8.

Yes, that's correct -- a typo on my part.