GLSL/NVIDIA/ATI

Sort of an odd problem.

On my Leopard Mac, with the graphics update, is there a way to query how many texture2D and shadow2DProj calls are possible? It appears my NVIDIA GeForce 7300 GT can handle as many as I need, yet the ATI Radeon 9800 goes kaput. There is no looping. Ironically, the 9800 is the better card. :stuck_out_tongue: If I lower the number of texture2D calls, the ATI card gets happy again and does what is expected. The number of calls to texture2D and shadow2DProj comes from the percentage-closer sampling for shadows, etc. I lowered the number of calls for ATI, but I get a little worried the next card down the street might blow up. The programs compile fine but blow up on ATI when there are too many calls.

Could be related to this issue. According to this site, the number of texture indirections supported by the 7300 GT is much larger than on the Radeon 9800.

The GLSL limits are the same as the ARB fragment program limits, +/- language and compiler differences. But GLSL doesn’t let you query the limits; it just gives you an info log, and requires software fallback when hardware limits are exceeded. So it is hard to know what will work, and what will “work” but be emulated.

Going back to ARB fp, compare MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB and MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB in this table.

(r9800 = 32 lookups, 4 indirections. gf7300 = 4096 lookups, 4096 indirections.)

Thanks for the tips.

I counted about 24 calls: half shadow2DProj and roughly half texture2D, plus another textureCube for normalization. When I want to blend something, I end up calling them many times.

It’s probably the indirection. I’d assume this means I can only use the results of 4 texture lookups together, with ATI being the limiting factor.

One has to wonder. Blending by sampling the same texture many times is a common problem. It might be nice if GLSL had one simple call that magically samples many times, maybe by supplying a kernel to it or something.
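
Something like what I have in mind, just a rough sketch (the sampler and uniform names and the 3x3 box kernel are only for illustration):

uniform sampler2D colorTex;
uniform vec2 texelSize;        // 1.0 / texture resolution
varying vec2 uv;

void main()
{
    // 3x3 box kernel: nine texture2D calls just to blend one texel with its
    // neighbours (the kind of thing a built-in "sample with a kernel" call could hide)
    vec4 sum = vec4(0.0);
    for (int y = -1; y <= 1; y++) {
        for (int x = -1; x <= 1; x++) {
            sum += texture2D(colorTex, uv + vec2(x, y) * texelSize);
        }
    }
    gl_FragColor = sum / 9.0;
}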

Using texture cubes for normalization is deprecated. normalize() is much faster.

Are you suggesting normalize in a fragment shader is not going to cause the program to switch into software mode because it’s not supported in hardware? My understanding is that using a textureCube is the safer thing to do; supposedly, newer cards would not have the problem. Calling normalize in a vertex shader is not a problem, though. I suppose this is knowledge from the Cg tutorial book, but I figure GLSL might have the same issues(?). I started with Cg, then switched to GLSL.

It depends on where you hit a hardware limitation: if you run out of texture samplers, it is better to use normalize() instead, whatever the cost.

The Cg tutorial was written for GeForce FX hardware. AFAIK, all better hardware (Radeon 9500+, GeForce 6+) tends to prefer normalization in code.
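
A rough sketch of the two approaches side by side (the varying and sampler names are made up):

varying vec3 lightVec;          // unnormalized light vector from the vertex shader
uniform samplerCube normCube;   // old-style normalization cube map, stores normalize(dir) * 0.5 + 0.5

void main()
{
    // GeForce FX-era advice: look the normalized vector up in a cube map,
    // which costs a sampler and a texture lookup
    vec3 nCube = textureCube(normCube, lightVec).xyz * 2.0 - 1.0;

    // Radeon 9500+ / GeForce 6+: plain math, no sampler and no extra lookup
    vec3 n = normalize(lightVec);

    // visualize the difference between the two results
    gl_FragColor = vec4(abs(n - nCube), 1.0);
}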

Hmm, looks like normalize is the way to go. Probably more accurate too … maybe the driver is using a lookup table? Not enough resources here to test all these cards, but they can all do vanilla OpenGL.

My initial problem is a texture indirection issue. I figure for ATI I can #ifdef ATI and make it go away. I check the vendor before I compile.
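
On the shader side it would look roughly like this; the tap counts and uniform names are just made up for illustration, and the app prepends the "#define ATI 1" line after checking glGetString(GL_VENDOR):

// The app prepends "#define ATI 1\n" to this source when glGetString(GL_VENDOR)
// reports an ATI card. The tap counts are illustrative only.
#ifdef ATI
    #define SHADOW_TAPS 4      // stay well under the r9800 limits
#else
    #define SHADOW_TAPS 16     // the 7300 GT doesn't seem to care
#endif

uniform sampler2DShadow shadowMap;
uniform vec2 taps[SHADOW_TAPS];    // tap offsets in texture space, filled in by the app
varying vec4 shadowCoord;

void main()
{
    float lit = 0.0;
    for (int i = 0; i < SHADOW_TAPS; i++) {   // constant bound, gets unrolled
        lit += shadow2DProj(shadowMap,
                   shadowCoord + vec4(taps[i] * shadowCoord.w, 0.0, 0.0)).r;
    }
    gl_FragColor = vec4(vec3(lit / float(SHADOW_TAPS)), 1.0);
}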

Thanks for the tips.

Hmm, digging deeper into this indirection issue.

So, to clarify: whenever you dynamically modify a texture coordinate and then fetch a texture value, you create an indirection. A texture indirection stalls the ATI pipeline, which is the main difference from the NVIDIA card. So they limit these indirections because they don’t want the thread count to explode(?).
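
If I understand it right, the difference is something like this (sampler and varying names made up):

uniform sampler2D baseTex;
uniform sampler2D distortTex;
varying vec2 uv;

void main()
{
    // Independent fetch: the coordinate comes straight from an interpolator,
    // so it can be issued in the first texture block.
    vec2 wobble = texture2D(distortTex, uv).rg * 0.1;

    // Dependent fetch: the coordinate was computed from a previous texture
    // result, so this read has to wait for a new block (one more indirection).
    vec4 color = texture2D(baseTex, uv + wobble);

    gl_FragColor = color;
}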

Now, it appears that calling shadow2DProj 9 times actually worked on my 9800, though calling texture2D seemed to have a problem. Maybe I’m nuts, but I figure getting at the depth is a separate mechanism?

I see the indirection problem creating issues if you need to sample something like an 8-bit height map, or when sampling a texture for shadow mapping.

Trying to think why 4 indirections would be OK. I suppose the solution is just to precalculate in GraphicConverter.

For an 8-bit map, to introduce noise into it, you can put the same map into r, g, and b, maybe Gaussian blur g and b, and then average them together in the fragment shader to get something smoother than 8 bits.
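
In the fragment shader that would be something like this sketch (sampler and varying names made up; the blurred copies are assumed to be baked into g and b offline):

uniform sampler2D heightMap;   // original 8-bit height in r, blurred copies in g and b
varying vec2 uv;

void main()
{
    vec3 h = texture2D(heightMap, uv).rgb;
    // Average the original and the two blurred copies to smooth
    // out the 8-bit banding a little.
    float height = (h.r + h.g + h.b) / 3.0;
    gl_FragColor = vec4(vec3(height), 1.0);
}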

From other stuff I’ve read, it seems a floating-point solution would be a problem on NVIDIA! Heh … there is symmetry in the incompatibility matrix, but at least you could sample more. :slight_smile:

Of course, if you created a texture dynamically with an FBO, then you are really in trouble on the ATI card!

As far as I know, the HW in question operates similarly to the following pseudocode.


// One pass over the fragment program: each "block" first issues all of its
// texture fetches, then runs its ALU instructions.
for ( i = 0; i < block_count; i++ ) {
    sample_textures_using_state_of_registers_from_start_of_this_block();
    do_alu_operations();
}

where block_count is limited to 4 (something like a more powerful ATI_fragment_shader). This means that if a texture instruction needs a value generated by the shader, that value must be generated by one of the blocks preceding it. On the other hand, the driver should be able to fit several independent samplings (e.g. from a shadow map using different offsets calculated in a previous block) into one block if a sufficient number of temporary registers is available.
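
In GLSL terms, a rough sketch of what I mean (names and the half-texel offsets are only illustrative); all four coordinates are computed up front, so none of the samplings depends on a previous texture result and the driver can pack them into one block:

uniform sampler2DShadow shadowMap;
uniform vec2 texel;            // 1.0 / shadow map size
varying vec4 shadowCoord;

void main()
{
    // ALU work first: compute all four projective coordinates...
    vec4 c0 = shadowCoord + vec4(vec2(-0.5, -0.5) * texel * shadowCoord.w, 0.0, 0.0);
    vec4 c1 = shadowCoord + vec4(vec2( 0.5, -0.5) * texel * shadowCoord.w, 0.0, 0.0);
    vec4 c2 = shadowCoord + vec4(vec2(-0.5,  0.5) * texel * shadowCoord.w, 0.0, 0.0);
    vec4 c3 = shadowCoord + vec4(vec2( 0.5,  0.5) * texel * shadowCoord.w, 0.0, 0.0);

    // ...then four independent samplings that can all go into the next texture block.
    float lit = shadow2DProj(shadowMap, c0).r + shadow2DProj(shadowMap, c1).r
              + shadow2DProj(shadowMap, c2).r + shadow2DProj(shadowMap, c3).r;

    gl_FragColor = vec4(vec3(lit * 0.25), 1.0);
}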

Hmm, nesting it into a subroutine seemed to help.

Thanks again.
