Multiple convolution passes on a 3D texture without ping-pong

Hello,
I have been practicing different techniques for implementing 2D and 3D texture filtering with OpenGL.
As an example, I test my code with an n-pass Laplacian convolution filter.

First experiment, with 2D textures: the ping-pong pattern generally used for iterative texture processing does not seem to be necessary, as the texture cache appears to take care of the texture state synchronisation that is usually handled by a temporary texture.
In that case
for (n steps) tex1 = convolution(tex1);
replaces the ping-pong pattern
for (n steps / 2) {
    tex2 = convolution(tex1);
    tex1 = convolution(tex2);
}
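
For reference, here is a minimal CPU sketch of the ping-pong pattern, written in Python rather than OpenGL. The names `laplacian_pass` and `ping_pong`, the coefficient `k`, and the clamp-to-edge border handling are assumptions made for illustration, not the actual shader code. The point it shows is that each pass reads only from one buffer and writes only to the other, and that swapping the two references also handles an odd number of passes.

```python
def laplacian_pass(src, dst, k=0.2):
    """Write one Laplacian (diffusion-style) pass of src into dst.

    Only src is read and only dst is written, which is the guarantee
    the temporary texture provides in the ping-pong pattern.
    """
    h, w = len(src), len(src[0])
    for y in range(h):
        for x in range(w):
            c = src[y][x]
            # Clamp-to-edge addressing (an assumption for this sketch).
            up = src[max(y - 1, 0)][x]
            down = src[min(y + 1, h - 1)][x]
            left = src[y][max(x - 1, 0)]
            right = src[y][min(x + 1, w - 1)]
            dst[y][x] = c + k * (up + down + left + right - 4.0 * c)


def ping_pong(tex, passes, k=0.2):
    """Apply `passes` convolution passes using two alternating buffers."""
    read = [row[:] for row in tex]    # plays the role of tex1
    write = [row[:] for row in tex]   # plays the role of tex2
    for _ in range(passes):
        laplacian_pass(read, write, k)
        read, write = write, read     # swap roles instead of copying
    return read                       # holds the most recent result


if __name__ == "__main__":
    spike = [[0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0]]
    for row in ping_pong(spike, 3):
        print(row)
```

On the GPU the swap would correspond to exchanging which texture is sampled and which one is attached to the framebuffer, but that mapping is only sketched here.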

So I tried to generalize this GPU cache behavior to iterated 3D texture convolution, and implemented 3D texture rendering using instanced quad slices.

However, the texture state synchronisation between convolution iterations that I observed with 2D textures seems to happen per slice and not for the whole texture!
Result: when slice i is processed at iteration j, it is not guaranteed to sample the texels of slices i+1 and i-1 in their state at iteration j-1 (in 3D it often gets their state at iteration j, which is not what we need).
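
To make the problem concrete, here is a small CPU analogue in Python. It is only a sketch under strong assumptions: each slice is reduced to a single value, the filter acts along z only, and slices are updated strictly in order 0, 1, 2, ... (a real GPU gives no such ordering guarantee). The function names and the coefficient `k` are made up for illustration. As far as I know, the OpenGL specification leaves the result undefined when a texture is sampled while it is also being rendered to, so this sequential model is just one possible outcome.

```python
def z_pass_ping_pong(src, k=0.25):
    """One z-direction Laplacian pass that reads only iteration j-1 data."""
    n = len(src)
    return [
        src[i] + k * (src[max(i - 1, 0)] + src[min(i + 1, n - 1)] - 2.0 * src[i])
        for i in range(n)
    ]


def z_pass_in_place(vol, k=0.25):
    """Same pass, but each slice is written back into the buffer it reads."""
    n = len(vol)
    for i in range(n):
        below = vol[max(i - 1, 0)]      # already at iteration j when i > 0
        above = vol[min(i + 1, n - 1)]  # still at iteration j-1
        vol[i] = vol[i] + k * (below + above - 2.0 * vol[i])
    return vol


if __name__ == "__main__":
    start = [0.0, 0.0, 1.0, 0.0, 0.0]       # one bright slice in the middle
    print(z_pass_ping_pong(start))          # symmetric around the middle slice
    print(z_pass_in_place(list(start)))     # skewed toward higher slice indices
```

With the spike input, the ping-pong version gives `[0.0, 0.25, 0.5, 0.25, 0.0]`, while the in-place version gives `[0.0, 0.25, 0.5625, 0.140625, 0.03515625]`: slice i has picked up the iteration j value of slice i-1, which matches the mixing of states described above.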

I’m not a GPU whisperer so:

I would like to know if anyone has experienced this kind of texture caching behavior…

and if anyone has pointers on how the NVIDIA/ATI GPU texture cache policy works, I would be very grateful.

Thanks in advance