CG_PROGRAM_LOAD_ERROR -> The Program could not load.

Hi,

I have to use a quite complicated vertex shader.

I am compiling under the vp40 profile.
The shader always compiles, but if the number of instructions exceeds about 240, I get a CG_PROGRAM_LOAD_ERROR and cannot run my program.

What can I do?
Shouldn’t the instruction limit be 2^16?

Thanks a lot.
Chris

I tried to increase the number of instructions.
For instruction counts higher than 2048, I get a cgc compiler error saying that no more than 2048 instructions can be used with the vp40 profile. For fragment shaders in fp40, the limit is 4096.
How does this relate to the 2^16 instructions stated for SM3.0?

There are two different counts: the number of instructions that a shader can contain (2048 in your case) and the number of instructions that can be executed while processing a single vertex or fragment (those 2^16). Because SM3 shaders can contain loops, some of the instructions in those 2048 slots can be executed several times, until the limit on executed instructions is hit.
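A minimal sketch of the two counts, written in plain C rather than shader code. The ShaderModel struct and both function names are hypothetical and only illustrate the idea. The model assumes a single loop and ignores the loop-control instructions themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: straight-line code followed by one loop. */
typedef struct {
    unsigned straight_line_instructions; /* instructions outside the loop */
    unsigned loop_body_instructions;     /* instructions inside the loop body */
    unsigned loop_iterations;            /* how often the loop body runs */
} ShaderModel;

/* Static count: instruction slots the program occupies.
   This is the number compared against the 2048-slot limit. */
unsigned static_instruction_count(const ShaderModel *s) {
    assert(s != NULL);
    return s->straight_line_instructions + s->loop_body_instructions;
}

/* Dynamic count: instructions executed for one vertex.
   This is the number compared against the 2^16 limit. */
unsigned long executed_instruction_count(const ShaderModel *s) {
    assert(s != NULL);
    return (unsigned long)s->straight_line_instructions
         + (unsigned long)s->loop_body_instructions * s->loop_iterations;
}
```

In this model, a loop body of 50 instructions that runs 64 times occupies only 50 slots, but adds 3200 executed instructions.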

thank you for the fast reply.

i thought of something like this.

but my problem is not solved.
I can vary the compile-time number of instructions in my program, and sometimes the program can be loaded with more than 400 instructions.
So I don’t think the load error is caused by this program exceeding the instruction count of 2^16.

So what else could be causing it?

And what effect does the number of “R-regs”, shown to the right of the instruction count at the end of the Cg compiler output, have?

chris

Originally posted by nitschke:
but my problem is not solved.
I can vary the compile-time number of instructions in my program, and sometimes the program can be loaded with more than 400 instructions.

Maybe you are hitting a limit other than the instruction count (e.g. the number of interpolators or the number of environment constants). What is the difference between a shader that loads and one that fails to load?


And what effect does the number of “R-regs”, shown to the right of the instruction count at the end of the Cg compiler output, have?

The lower the number of R registers, the better nVidia hardware can hide various latencies in the shader, which is likely to result in better shader performance.

Because the vp40 profile is based on the ARB_vertex_program extension, you can try to load your program directly through that extension. You can then get an error string, which will likely give some indication of why the load failed.

The easiest way would be to find some example program for that extension on the internet and replace its shader with yours.

thanks!

I tested a bit further, and the R-regs are not the problem: their number increased, but the shader could still be loaded this time.

The difference is just some increase in the number of instructions: I increase the maximum value of an index variable in a loop. But I don’t think I am hitting the maximum number of runtime instructions.

The index needs to be in [0,7]. It works for [0,2], but if the index goes higher than 2, the program cannot be loaded.

I tested a bit further, and the R-regs are not the problem: their number increased, but the shader could still be loaded this time.

The number of R-regs by itself only influences the performance of the shader, not whether the shader can be loaded.


The index needs to be in [0,7]. It works for [0,2], but if the index goes higher than 2, the program cannot be loaded.

What is the loop doing? Is it accessing any arrays?

As I said in my previous post, the fastest way to find the problem will probably be the ARB_vertex_program extension.

i see,
but i have to use vertex texture fetches, so i am quite restricted to vp40.
but i can just try the code that causes the problem, without texture reading.
thanks for the hint.
i will try it!

I’m doing a nested (quadratic) loop:
for (int i = 0; i < 8; i++)
  for (int j = 0; j < 8; j++)

The loop body contains:
  • a length() call
  • an if statement that increments a count variable

After the loop, I test count in another if statement and change a boolean flag according to the result.

The vp40 profile is implemented as an extension to ARB_vertex_program. That extension is enabled by the “OPTION NV_vertex_program3;” statement inside the source code of the vertex program. The program itself is then used in exactly the same way as an ordinary ARB_vertex_program would be, so the test program does not need special support for that extension.

hey komat,

Can you tell me how to load the program directly with the ARB_vertex_program extension and get the error string?

thanks a lot!
chris

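// programnum is a GLuint; source_code is the compiled program text, code_length its length in bytes.
glEnable(GL_VERTEX_PROGRAM_ARB);
glGenProgramsARB(1, &programnum);
glBindProgramARB(GL_VERTEX_PROGRAM_ARB, programnum);
glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB, code_length, source_code);

// Implementation-dependent description of why the load failed.
const GLubyte *error_string = glGetString(GL_PROGRAM_ERROR_STRING_ARB);

// Byte offset into source_code where the error was detected (-1 if there was no error).
GLint error_position_offset;
glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &error_position_offset);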
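The error position is a byte offset into the program string, which is awkward to read in a long compiled shader. A small sketch of how it could be turned into a line and column number follows. The helper name locate_program_error is hypothetical and not part of OpenGL or Cg. It uses only plain C, so it needs no GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Converts a byte offset into a 1-based line and column.
   Returns 1 on success, 0 if the offset is negative (no error reported)
   or lies beyond the end of the NUL-terminated source. */
int locate_program_error(const char *source, long offset, int *line, int *column) {
    assert(source != NULL && line != NULL && column != NULL);
    if (offset < 0) {
        return 0;
    }
    int current_line = 1;
    int current_column = 1;
    for (long i = 0; i < offset; i++) {
        if (source[i] == '\0') {
            return 0; /* offset points past the end of the text */
        }
        if (source[i] == '\n') {
            current_line++;
            current_column = 1;
        } else {
            current_column++;
        }
    }
    *line = current_line;
    *column = current_column;
    return 1;
}
```

With the snippet above it would be called as locate_program_error(source_code, error_position_offset, &line, &column), and the resulting line can then be looked up in the cgc output.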

I have to use some branching in my shader code.

When I do, performance increases, because it seems the skipped code is really not executed.

BUT
if I use too many if/then statements, the program cannot be loaded by Cg.
If I remove some if/then statements so that the code is always executed, the framerate decreases, but I CAN load the shader.

So to me it seems like a problem that does NOT depend on the number of instructions.
Could there be some issue with branching?

Are there any tricks to avoid branching, or to simulate it with arithmetic?

chris

Originally posted by nitschke:
So to me it seems like a problem that does NOT depend on the number of instructions.
Could there be some issue with branching?

On GeForce 6 and later nVidia hardware there are some limits on the number of nested ifs, calls and loops in fragment programs; however, I do not know of similar limits in vertex programs. It is likely that there are some, but with much higher limits than in fragment programs, and the compiler will likely avoid them.

Have you tried to get the error string? Additionally, you may try to contact nVidia.


Are there any tricks to avoid branching, or to simulate it with arithmetic?

You can simulate the calculation effect, but not the optimization effect. You calculate both variants and then select the correct result using something like lerp( condition_false_calculation_result, condition_true_calculation_result, selector )
where selector is a float variable that contains 1.0 if the condition is true and 0.0 otherwise. If the selector calculation is simple, the compiler is likely to calculate it using arithmetic instead of a real if operation, or you can use that arithmetic directly.
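A rough sketch of that selection trick, written in plain C so it can be tried outside a shader. All function names here are hypothetical. cg_style_lerp follows the usual lerp definition a + t * (b - a), and count_above is only a guess at what a branch-free version of a counting loop might look like.

```c
#include <assert.h>
#include <stddef.h>

/* Linear interpolation: returns a when t == 0.0 and b when t == 1.0. */
float cg_style_lerp(float a, float b, float t) {
    return a + t * (b - a);
}

/* Selector: 1.0f if x > threshold, 0.0f otherwise. */
float step_greater(float x, float threshold) {
    return (float)(x > threshold);
}

/* Both results are already computed; the selector picks one of them. */
float select_branchless(float x, float threshold, float if_true, float if_false) {
    float selector = step_greater(x, threshold);
    return cg_style_lerp(if_false, if_true, selector);
}

/* Counting without an if in the loop body: add the selector itself. */
int count_above(const float *values, int n, float threshold) {
    assert(values != NULL && n >= 0);
    float count = 0.0f;
    for (int i = 0; i < n; i++) {
        count += step_greater(values[i], threshold);
    }
    return (int)count;
}
```

Both variants are always evaluated, so this removes the branch but not the work. Also note that if the unselected variant produces a NaN or infinity, the lerp will propagate it into the result.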