Intel driver texture lookup limitation

Hi !

I’m currently working on shaders through Cg.
I’m on an old Intel i915 card that seems to only support ARB programs.
My fragment shader implements a filtering algorithm on a texture, which means it performs several texture lookups. My driver (open-source Mesa 7.10) does not like that: it tells me I did 5 indirect texture lookups out of a maximum of 4.
Indirect? I googled that, and it seems to mean using the result of one lookup as the coordinates for another lookup. That’s not what I am doing, is it? :confused:

I’m doing texRECT(texture, float2(coords.x+1, coords.y+1)) and so on. (actually, I use a 3x3 filter)

I tried passing the texture not as a uniform but as a TEXUNIT0 but same thing.

So, why is that considered indirect lookup ?

Thank you for your attention :slight_smile:

From mesa/src/gallium/driver/i915/i915_fpc_translate.c


static void
i915_fini_compile(struct i915_context *i915, struct i915_fp_compile *p)
{
   struct i915_fragment_shader *ifs = p->shader;
   unsigned long program_size = (unsigned long) (p->csr - p->program);
   unsigned long decl_size = (unsigned long) (p->decl - p->declarations);

   if (p->nr_tex_indirect > I915_MAX_TEX_INDIRECT)
      i915_program_error(p, "Exceeded max nr indirect texture lookups");

And i915_reg.h defines I915_MAX_TEX_INDIRECT as 4

** Ohh… I love opensource **

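Hardware has limits on the number of instructions, the number of texture accesses, and the number of indirect texture accesses (lookups whose coordinates are computed in the shader, for example by adding an offset). For ARB fragment programs you can query these limits with glGetProgramivARB:

glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB, GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB, &value)
glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB, GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB, &value)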

So, sorry, your hardware can’t do that. Try another technique (multiple passes or lower quality).

OK, so accessing with an offset is considered indirect. Too bad :frowning:

Thank you a lot :wink:

Use ‘multitexturing’ with the same texture but slightly offset texcoords, so that each texRECT uses its own texcoord directly, without any indirection. It may even end up faster.

Really the limitation is on the number of indirection phases, not total indirections. The shader compiler should re-order the instructions so that all of the temporary coordinates are computed in a batch, and then used in a batch, to minimize the number of phases.
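
To make the "phases" idea concrete, here is a minimal sketch in C of how instruction order changes the phase count. It is a simplified model and not the actual Mesa i915 compiler logic: the `struct op` layout, the `FROM_INPUT` marker and the three `build_*` helpers are all hypothetical names made up for illustration. The assumed rule is that a texture lookup starts a new phase when its coordinate is a temporary written earlier in the current phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model (not Mesa's data structures). */
enum op_kind { OP_ALU, OP_TEX };

#define MAX_TEMPS 32
#define FROM_INPUT (-1) /* source is an interpolated input, not a temporary */

struct op {
    enum op_kind kind;
    int src_temp; /* temporary read (for TEX: the coordinate), or FROM_INPUT */
    int dst_temp; /* temporary written */
};

/* Assumed rule: a TEX whose coordinate temporary was written earlier in the
 * current phase starts a new phase. A program always has at least one phase. */
int count_tex_phases(const struct op *ops, size_t n)
{
    bool written[MAX_TEMPS] = { false };
    int phases = 1;

    for (size_t i = 0; i < n; i++) {
        const struct op *o = &ops[i];
        assert(o->dst_temp >= 0 && o->dst_temp < MAX_TEMPS);
        assert(o->src_temp < MAX_TEMPS);

        if (o->kind == OP_TEX && o->src_temp >= 0 && written[o->src_temp]) {
            phases++;
            for (int t = 0; t < MAX_TEMPS; t++)
                written[t] = false; /* new phase: nothing written yet */
        }
        written[o->dst_temp] = true;
    }
    return phases;
}

/* Offset add immediately followed by its lookup, repeated per tap. */
size_t build_interleaved(struct op *ops, int taps)
{
    size_t n = 0;
    for (int i = 0; i < taps; i++) {
        ops[n++] = (struct op){ OP_ALU, FROM_INPUT, i }; /* t_i = coords + offset_i */
        ops[n++] = (struct op){ OP_TEX, i, taps + i };   /* r_i = tex(t_i) */
    }
    return n;
}

/* All offset adds first, then all lookups. */
size_t build_batched(struct op *ops, int taps)
{
    size_t n = 0;
    for (int i = 0; i < taps; i++)
        ops[n++] = (struct op){ OP_ALU, FROM_INPUT, i };
    for (int i = 0; i < taps; i++)
        ops[n++] = (struct op){ OP_TEX, i, taps + i };
    return n;
}

/* Every lookup reads its own interpolated texcoord; no adds in the shader. */
size_t build_varyings(struct op *ops, int taps)
{
    size_t n = 0;
    for (int i = 0; i < taps; i++)
        ops[n++] = (struct op){ OP_TEX, FROM_INPUT, i };
    return n;
}
```

Under this model a 9-tap filter needs one extra phase per tap when adds and lookups are interleaved, two phases when the adds are batched ahead of the lookups, and a single phase when every coordinate arrives as its own texcoord. Real drivers may count differently, so treat the numbers as illustrative only.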

Adding to what ZbuffeR is saying:

If the above is really what appears in your shader, create an additional varying for each texture coordinate:

VertexShader (GLSL, you’ll need to convert to the correct Cg code):


varying vec2 texcoord0, texcoord1, ...

void
main(void)
{
  texcoord0=whatever;
  texcoord1=texcoord0 + offset_constant1;
  texcoord2=texcoord0 + offset_constant2;
  texcoord3=texcoord0 + offset_constant3;
  texcoord4=texcoord0 + offset_constant4;
   .
   .
}

FragmentShader:


varying vec2 texcoord0, texcoord1, ...
uniform sampler2DRect rect_tex;
void
main(void)
{
   vec4 tex0, tex1, ...

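   // texelFetch expects integer coordinates; with interpolated vec2
   // varyings, texture2DRect is the matching lookup for a sampler2DRect.
   tex0=texture2DRect(rect_tex, texcoord0);
   tex1=texture2DRect(rect_tex, texcoord1);
   tex2=texture2DRect(rect_tex, texcoord2);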
   .
   .
}


You are almost guaranteed that this will run faster (letting the hardware do the interpolation is typically cheaper than doing an extra operation in the fragment shader).
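
Moving the offset into the vertex shader gives the same coordinates because adding a constant commutes with linear interpolation. The following C sketch checks that on the CPU for one edge between two vertices. The `vec2` struct and all function names are hypothetical, and the interpolation is a plain lerp that ignores perspective correction.

```c
#include <assert.h>
#include <stdbool.h>

struct vec2 { float x, y; };

/* Plain linear interpolation between two per-vertex values
 * (stand-in for what the rasterizer does with a varying). */
struct vec2 lerp2(struct vec2 a, struct vec2 b, float t)
{
    struct vec2 r = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y) };
    return r;
}

struct vec2 add2(struct vec2 a, struct vec2 b)
{
    struct vec2 r = { a.x + b.x, a.y + b.y };
    return r;
}

/* Fragment-shader style: interpolate the base texcoord, then add the offset. */
struct vec2 offset_in_fragment(struct vec2 v0, struct vec2 v1, float t,
                               struct vec2 off)
{
    return add2(lerp2(v0, v1, t), off);
}

/* Vertex-shader style: add the offset per vertex, then interpolate. */
struct vec2 offset_in_vertex(struct vec2 v0, struct vec2 v1, float t,
                             struct vec2 off)
{
    return lerp2(add2(v0, off), add2(v1, off), t);
}

/* Component-wise comparison with a tolerance, to absorb float rounding. */
bool nearly_equal(struct vec2 a, struct vec2 b, float eps)
{
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    if (dx < 0.0f) dx = -dx;
    if (dy < 0.0f) dy = -dy;
    return dx <= eps && dy <= eps;
}
```

For example, with vertices at (0, 0) and (640, 480), t = 0.25 and an offset of (1, 1), both functions land on (161, 121), so the fragment shader can sample at the interpolated varying without computing anything itself.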
