Convolving the 4 corners of a texel using hardware bilinear filtering

Hi

To get the convolved value for a texel, I am trying to sample at its 4 corners. I was able to do the convolution with a 3x3 matrix to get a Gaussian blur, but I want a more optimized way, so I thought of using bilinear filtering.
This is how I have done it. I would like to know whether the implementation is correct, because the output values differ from what I got with the 3x3 Gaussian blur. The difference in each color value is sometimes as large as 32, which made me doubt the implementation. Please review.

    " vec2 scale = vec2(offsetx,offsety);\n" +
    " vec4 p00 = texture2D(sTexture, vTextureCoord + 0.5*scale*vec2(-1,-1));\n" +
    " vec4 p02 = texture2D(sTexture, vTextureCoord + 0.5*scale*vec2(1,-1));\n" +
    " vec4 p20 = texture2D(sTexture, vTextureCoord + 0.5*scale*vec2(-1,1));\n" +
    " vec4 p22 = texture2D(sTexture, vTextureCoord + 0.5*scale*vec2(1,1));\n" +
    " vec4 pconv = 0.25*(p00 + p02 + p20 + p22);\n" +
Here offsetx = 1.0 / width and offsety = 1.0 / height, i.e. the size of one texel in texture coordinates.
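
In case it helps anyone checking the offsets: if width and height are ints on the Java side, writing 1/width there uses integer division and gives 0, which would collapse all four taps onto the same coordinate. A minimal sketch of computing the texel size as floats follows. The class name TexelSize and the method of are hypothetical, and I am assuming the offsets are computed in Java, which the post does not state.

```java
// Hypothetical helper, not part of the original code.
final class TexelSize {
    // Returns {1/width, 1/height} as floats.
    // The 1.0f literal forces floating-point division; 1 / width with
    // int operands would evaluate to 0 for any width greater than 1.
    static float[] of(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("width and height must be positive");
        }
        return new float[] { 1.0f / width, 1.0f / height };
    }

    public static void main(String[] args) {
        float[] t = of(640, 480);
        System.out.println("offsetx = " + t[0] + ", offsety = " + t[1]);
    }
}
```

The two values could then be passed to the shader, for example with GLES20.glUniform1f, using whatever uniform locations the program actually defines.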

vec2 texcord11 = vec2(vTextureCoord.x, vTextureCoord.y);
vec4 p11 = texture2D(sTexture, texcord11);   // the current texel

The texture filtering is set up as follows:

glTexParameterf(GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST);
glTexParameterf(GL_TEXTURE_EXTERNAL_OES, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
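
For comparing the two results offline, here is a rough CPU sketch of what the four taps should produce. It emulates a bilinear fetch at each corner of a texel (centre plus or minus half a texel, as in the shader above) and compares the mean with a 3x3 kernel. Several things here are assumptions: that the 3x3 Gaussian is the common (1 2 1; 2 4 2; 1 2 1)/16 kernel (the weights are not given in the post), that every tap really is filtered linearly, that edges are clamped, and that the image is single-channel. All class and method names are hypothetical.

```java
// Hypothetical CPU reference, not part of the original shader.
final class BilinearTapCheck {
    // Clamp-to-edge fetch from a single-channel image stored as img[row][column].
    static float texel(float[][] img, int x, int y) {
        int h = img.length, w = img[0].length;
        int cx = Math.max(0, Math.min(w - 1, x));
        int cy = Math.max(0, Math.min(h - 1, y));
        return img[cy][cx];
    }

    // Emulates a linear-filtered fetch at (u, v) in texel units,
    // where texel (i, j) has its centre at (i + 0.5, j + 0.5).
    static float bilinear(float[][] img, float u, float v) {
        float fx = u - 0.5f, fy = v - 0.5f;
        int x0 = (int) Math.floor(fx), y0 = (int) Math.floor(fy);
        float ax = fx - x0, ay = fy - y0;
        float top = (1 - ax) * texel(img, x0, y0) + ax * texel(img, x0 + 1, y0);
        float bot = (1 - ax) * texel(img, x0, y0 + 1) + ax * texel(img, x0 + 1, y0 + 1);
        return (1 - ay) * top + ay * bot;
    }

    // Mean of four bilinear taps at the corners of texel (x, y).
    static float fourTap(float[][] img, int x, int y) {
        float cx = x + 0.5f, cy = y + 0.5f;
        return 0.25f * (bilinear(img, cx - 0.5f, cy - 0.5f)
                      + bilinear(img, cx + 0.5f, cy - 0.5f)
                      + bilinear(img, cx - 0.5f, cy + 0.5f)
                      + bilinear(img, cx + 0.5f, cy + 0.5f));
    }

    // Assumed 3x3 kernel: (1 2 1; 2 4 2; 1 2 1) / 16.
    static float binomial3x3(float[][] img, int x, int y) {
        int[] w = { 1, 2, 1 };
        float sum = 0f;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                sum += w[dx + 1] * w[dy + 1] * texel(img, x + dx, y + dy);
            }
        }
        return sum / 16f;
    }

    public static void main(String[] args) {
        float[][] img = { { 10, 20, 30 }, { 40, 50, 60 }, { 70, 80, 90 } };
        System.out.println("four taps: " + fourTap(img, 1, 1));
        System.out.println("3x3 kernel: " + binomial3x3(img, 1, 1));
    }
}
```

Each corner tap weights its four surrounding texels by 1/4, so the mean of the four taps gives the centre 4/16, the edge neighbours 2/16 each, and the diagonal neighbours 1/16 each. If the 3x3 blur being compared against uses different weights, or if a tap ends up being fetched with nearest filtering, the two results would not be expected to match.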