1D texture speed

I was testing how fast a shader program runs by varying the dimensions of the textures I’m using to pass data to it. All the literature I have read so far says it should run faster on 2D textures than on 1D textures with the same number of elements, so I did one run with a 4096x1 2D texture to simulate a 1D texture, and one run with a 64x64 2D texture. However, the results show that 4096x1 ran faster than 64x64. Anybody know why this might be?

It depends on the hardware design and drivers. It also depends on the filtering mode, which texels you are sampling from, and whether mipmapping is used.

Assuming the entire texture is in cache, the only thing that matters is how the texture samplers are implemented.

OK, so would a linear organization of the data in cache produce this speedup?

Also, now that I think of it, in order to calculate texture coordinates in the shader source I use a nested loop:

" for(i=0; i<int(particle_tex_height); i++) {"
" for(j=0; j<int(particle_tex_width); j++) {"

" particle_index.x = (float(j)+0.5) / float(particle_tex_width);"
" particle_index.y = (float(i)+0.5) / float(particle_tex_height);"
    }

}

So would the fact that one of those dimensions is equal to 1 be sufficient to cause the time difference? Or is it just the different memory geometry?

Thanks

Loops are definitely slow in shaders.
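
One reason: a loop bound that is a compile-time constant can usually be unrolled away by the compiler, while a bound read from a uniform (like your particle_tex_width) forces real flow control, or software emulation on hardware that has none. A rough sketch of the difference, in a made-up fragment shader rather than your actual code:

uniform float particle_tex_width;

void main()
{
  float sum = 0.0;

  // Constant bound: most compilers unroll this completely,
  // leaving no flow control at run time.
  for (int j = 0; j < 64; j++)
    sum += float(j);

  // Uniform bound: cannot be unrolled at compile time, so the GPU
  // executes real (or emulated) branching instead.
  for (int j = 0; j < int(particle_tex_width); j++)
    sum += float(j);

  gl_FragColor = vec4(sum);
}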

hmmm, yeah, I’m thinking this must be it, or at least it makes sense

thanks for everyone’s contributions :)

It’s not only the for-loops per se; your code is just bad for performance.

  • Do not recalculate the value that depends on i inside the inner loop, where i does not change.
  • You could remove all the casts and the 0.5 offset if you loop over a float instead.

for(float y = 0.5; y < particle_tex_height; y += 1.0)
{
  particle_index.y = y / particle_tex_height;
  for(float x = 0.5; x < particle_tex_width; x += 1.0)
  {
    particle_index.x = x / particle_tex_width;
  }
}

Next you can eliminate the divides inside the loops by normalizing the loop parameters like this:

float xInc = 1.0 / particle_tex_width;
float yInc = 1.0 / particle_tex_height;
for(float y = 0.5 * yInc; y < 1.0; y += yInc)
{
  particle_index.y = y;
  for(float x = 0.5 * xInc; x < 1.0; x += xInc)
  {
    particle_index.x = x;
  }
}

What else is happening in that loop?

In a fragment shader you can get these coordinates for free from interpolating over a quad, and in a vertex shader the same could be sent as a vertex attribute of a point primitive.
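
For instance, a minimal fragment-shader sketch of that idea (the texture and varying names here are made up for illustration): draw one full-screen quad with texture coordinates running from (0,0) to (1,1), and the rasterizer hands every fragment its own coordinate with no loops or divides in the shader at all.

uniform sampler2D particle_tex;   // hypothetical particle-data texture
varying vec2 particle_index;      // interpolated from the quad's texcoords

void main()
{
  // One fragment per texel: the coordinate arrives pre-interpolated,
  // so the shader just samples and processes its own particle.
  vec4 particle = texture2D(particle_tex, particle_index);
  gl_FragColor = particle;       // per-particle work would go here
}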

“I was testing a shader program”

I thought you were using the same shader in both cases.
