1D texture speed



jgrossm1
11-05-2007, 01:53 PM
I was testing a shader program, and how fast it ran, by varying the dimensions of the textures I'm using to pass data to it. All the literature I have read so far says that it should run faster on 2D textures than 1D textures, as long as they have the same number of elements, so I did one run with a 4096x1 2D texture to simulate a 1D texture, and one run with 64x64 2D texture. However, the results show that 4096x1 ran faster than 64x64. Anybody know why this might be?

V-man
11-05-2007, 07:01 PM
It depends on the hardware design and the drivers. It also depends on the filtering mode, mipmapping, and which texels you are sampling.

Assuming the entire texture is in cache, the only thing that matters is how the texture samplers are implemented.

jgrossm1
11-05-2007, 08:20 PM
ok, so would a linear organization of the data in cache produce this speed up?

Also, now that I think of it, in order to calculate texture coordinates in the shader source I use an embedded loop:

" for(i=0; i<int(particle_tex_height); i++) {"
"   for(j=0; j<int(particle_tex_width); j++) {"
"     particle_index.x = (float(j)+0.5) / float(particle_tex_width);"
"     particle_index.y = (float(i)+0.5) / float(particle_tex_height);"
"   }"
" }"

so would the fact that one of those values is equal to 1 be sufficient to cause a time difference? Or is it just the different memory geometry?

Thanks

Lindley
11-05-2007, 08:37 PM
Loops are definitely slow in shaders.

jgrossm1
11-05-2007, 10:19 PM
hmmm, yeah, I'm thinking this must be it, or at least it makes sense

thanks for everyone's contributions :)

Relic
11-06-2007, 01:57 AM
jgrossm1 wrote:
Also, now that I think of it, in order to calculate texture coordinates in the shader source I use an embedded loop:

" for(i=0; i<int(particle_tex_height); i++) {"
"   for(j=0; j<int(particle_tex_width); j++) {"
"     particle_index.x = (float(j)+0.5) / float(particle_tex_width);"
"     particle_index.y = (float(i)+0.5) / float(particle_tex_height);"
"   }"
" }"


It's not only the for-loops per se; your code is just bad for performance.

- Do not recalculate the value depending on i inside the inner loop where i is not changing
- You could remove all casts and the 0.5 offset if you loop over a float.



for(y = 0.5; y < particle_tex_height; y += 1.0)
{
    particle_index.y = y / particle_tex_height;
    for(x = 0.5; x < particle_tex_width; x += 1.0)
    {
        particle_index.x = x / particle_tex_width;
    }
}


Next you can eliminate the divides inside the loops by normalizing the loop parameters like this:



float xInc = 1.0 / particle_tex_width;
float yInc = 1.0 / particle_tex_height;
for(y = 0.5 * yInc; y < 1.0; y += yInc)
{
    particle_index.y = y;
    for(x = 0.5 * xInc; x < 1.0; x += xInc)
    {
        particle_index.x = x;
    }
}
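
For anyone who wants to convince themselves that the normalized loop visits the same texel centers as the original int loop, here is a minimal CPU-side sketch in C (not GLSL). The helper name max_coord_diff is made up for this illustration. It walks one axis with the float increment and compares each value against the original (i + 0.5) / size formula. For power-of-two sizes such as 64 or 4096 every step is exactly representable in a float, so the two should agree exactly; for other sizes a small accumulated rounding error is expected.

```c
#include <math.h>

/* Hypothetical helper, for illustration only.
   Walks one texture axis of `size` texels with the normalized float loop
   and returns the largest absolute difference from the original
   (i + 0.5) / size coordinate. The number of iterations the float loop
   actually ran is written to *count_out. */
static float max_coord_diff(int size, int *count_out)
{
    float inc = 1.0f / (float)size;   /* one texel in normalized units */
    float max_diff = 0.0f;
    int i = 0;

    /* Start half a texel in and step one texel at a time. */
    for (float c = 0.5f * inc; c < 1.0f; c += inc) {
        float ref = ((float)i + 0.5f) / (float)size;  /* original formula */
        float d = fabsf(c - ref);
        if (d > max_diff) {
            max_diff = d;
        }
        i++;
    }

    *count_out = i;
    return max_diff;
}
```

Calling it with size 64 and 4096 covers the two layouts discussed in this thread. The iteration count is worth checking too, since a float loop counter could in principle run one step short or long if rounding pushed the last value across 1.0.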


What else is happening in that loop?

In a fragment shader you can get these coordinates for free from interpolating over a quad, and in a vertex shader the same could be sent as a vertex attribute of a point primitive.
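
To illustrate why the interpolated coordinate comes out the same as the loop formula, here is a small C sketch of the arithmetic only, not of any real GPU API. It assumes a quad that exactly covers a render target with as many pixels as the texture has texels, with the texture coordinate running from 0 at one edge to 1 at the opposite edge, and it assumes fragments are evaluated at pixel centers. The function name interpolated_coord is hypothetical.

```c
/* Hypothetical helper, for illustration only.
   Linearly interpolates an attribute that is a0 at one edge of the quad
   and a1 at the opposite edge, evaluated at the center of pixel `pixel`
   on an axis that is `size` pixels long. */
static float interpolated_coord(float a0, float a1, int pixel, int size)
{
    /* Position of the pixel center as a fraction of the quad's extent. */
    float t = ((float)pixel + 0.5f) / (float)size;
    return a0 + t * (a1 - a0);
}
```

With a0 = 0 and a1 = 1 this reduces to (pixel + 0.5) / size, which is the value the shader loop computes by hand for each texel.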

V-man
11-06-2007, 04:42 AM
jgrossm1 wrote: "I was testing a shader program"
I thought you were using the same shader in both cases.