I’m back for more. This time I’m confused by some results I’m getting while measuring performance on my shaders. The interesting part of my shader looked like this:
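```glsl
for (int i = 0; i < probeSurfelCount; i++) {
    // Flattened slot of the i-th surfel reference for probe index.y
    int probeIndex = probeSurfelCount * index.y + i;
    int surfIndex = surfelIndex[probeIndex];
    // Gather the surfel's color and scale it by the weight selected by index.x
    color += colorIn[surfIndex].rgb * weightGroup[probeIndex].weights[index.x];
}
```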
Here surfelIndex, colorIn and weightGroup (weights is just a float[6]) are all buffer objects I’m using to pass data in. In this configuration the run time was 1.8 ms. To test a theory, I removed the weight from the color += line and the time dropped to 1.4 ms. Replacing the weight with a constant float gave the same result as removing it, so I assumed that limiting the number of lookups into the large buffers I’m using should bring the time down.
So, after some changes to the rest of my code, I combined the surfelIndex and weightGroup buffers into a single buffer. The shader changed to reflect this and ended up as:
```glsl
for (int i = 0; i < probeSurfelCount; i++) {
    int probeIndex = probeSurfelCount * index.y + i;
    // Read the combined record (surfel index + weights) once
    SurfelRef temp = surfelRefs[probeIndex];
    int surfIndex = temp.index;
    color += colorIn[surfIndex].rgb * temp.weights[index.x];
}
```
SurfelRef is a struct defined as follows:
```glsl
struct SurfelRef {
    int index;
    float[6] weights;
};
```
However, this increased the run time to 3.8 ms. Removing the weight from the color += line again brought it back down to 1.4 ms. I expected “SurfelRef temp = surfelRefs[probeIndex]” to almost completely eliminate the lookup cost for the weights, so if anything the run time should have gone down with this change. As it stands, I’m fairly confused about why it increases the run time so much. If it were related to the buffer getting larger, I would expect the run time without the weight to go up as well, but that stays the same.
Does anyone have any ideas or explanations?
(As a side note, I’m using glQueryCounter(queryID[i], GL_TIMESTAMP); and glGetQueryObjecti64v(queryID[0], GL_QUERY_RESULT, &startTime); to get my performance times, so I assume they are correct.)
(Edit: Minor change to the first code section.)