Shader run time

I’m back for more. This time I’m confused by some results I’m getting while measuring performance on my shaders. The interesting part of my shader looked like this:


for(int i = 0; i < probeSurfelCount; i++){
	int probeIndex = probeSurfelCount*index.y + i;
	int surfIndex = surfelIndex[probeIndex];
	color += colorIn[surfIndex].rgb * weightGroup[probeIndex].weights[index.x];
}

Where surfelIndex, colorIn and weightGroup (weights is just a float[6]) are all buffer objects I’m using to pass data. In this configuration the run time was 1.8 ms. To test a theory, I removed the weight multiplication from the second-to-last line, and the time dropped to 1.4 ms. Replacing the weight with a constant float gave the same 1.4 ms, so I assumed the run time should drop if I could limit the number of lookups into the large buffers I’m using.

So, after some changes in the rest of my code, the buffers surfelIndex and weightGroup were combined into a single buffer, surfelRefs. The shader changed to reflect this and ended up as:


for(int i = 0; i < probeSurfelCount; i++){
	int probeIndex = probeSurfelCount*index.y + i;
	SurfelRef temp = surfelRefs[probeIndex];
	int surfIndex = temp.index;
	color += colorIn[surfIndex].rgb * temp.weights[index.x];
}

Where a SurfelRef is a struct that looks as follows:


struct SurfelRef {
	int index;
	float[6] weights;
};

However, this increased the run time to 3.8 ms. Removing the weight multiplication from the second-to-last line again brought it back down to 1.4 ms. I would expect “SurfelRef temp = surfelRefs[probeIndex]” to eliminate the lookup cost for the weights almost completely, since the index and the weights now sit in the same struct. As it stands, I am fairly confused as to why this change increases the run time so much; if anything, I would expect it to go down. If it were related to the buffer getting larger, I would expect the run time without the weight to increase as well, but that stays the same.
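On the buffer-size point: how many bytes one SurfelRef element occupies depends on the block's layout qualifier, which is not shown here. As a rough sketch only, the C helpers below (`align_up` and `surfel_ref_size` are made-up names for illustration) estimate the element size from the array stride of `weights`, assuming 4 bytes per float under std430 and 16 bytes per array element under std140.

```c
#include <stddef.h>

/* Round `offset` up to the next multiple of `align` (align must be > 0). */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) / align * align;
}

/* Estimated byte size of one element of
 *     struct SurfelRef { int index; float[6] weights; };
 * array_stride is an assumption about the buffer layout:
 *   4  -> std430 (floats packed tightly)
 *   16 -> std140 (each array element padded to vec4 alignment) */
static size_t surfel_ref_size(size_t array_stride) {
    size_t offset = 4;                       /* int index uses bytes 0..3 */
    offset = align_up(offset, array_stride); /* array starts on its own alignment */
    offset += 6 * array_stride;              /* six weights */
    return align_up(offset, array_stride);   /* pad struct to its largest member alignment */
}
```

Under these assumptions, `surfel_ref_size(4)` gives 28 bytes per element and `surfel_ref_size(16)` gives 112 bytes, so the layout in use changes how much memory each loop iteration touches.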

Anyone have any ideas or explanations?

(As a side note, I am using glQueryCounter(queryID[i], GL_TIMESTAMP); and glGetQueryObjecti64v(queryID[0], GL_QUERY_RESULT, &startTime); to get my performance times, so I would assume they are correct.)
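For completeness, the timing pattern can be sketched in C roughly as follows. Only the two GL calls quoted above are taken from the actual code; `queryID[1]`, `endTime` and the helper `elapsed_ms` are placeholder names for illustration, and the GL calls appear as comments because they need a live context.

```c
#include <stdint.h>

/* Assumed call order around the work being measured:
 *
 *   glQueryCounter(queryID[0], GL_TIMESTAMP);   // before the dispatch/draw
 *   ...issue the shader work...
 *   glQueryCounter(queryID[1], GL_TIMESTAMP);   // after it (hypothetical second query)
 *   glGetQueryObjecti64v(queryID[0], GL_QUERY_RESULT, &startTime);
 *   glGetQueryObjecti64v(queryID[1], GL_QUERY_RESULT, &endTime);
 */

/* GL_TIMESTAMP results are reported in nanoseconds; convert the
 * difference between two of them to milliseconds. */
static double elapsed_ms(int64_t start_ns, int64_t end_ns) {
    return (double)(end_ns - start_ns) / 1.0e6;
}
```

The run times quoted in this post would then correspond to `elapsed_ms(startTime, endTime)`.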

(Edit: Minor change to the first code section.)