Hi, everyone,
I wonder how I can determine the number of pipelines from within a program, i.e. how many vertex pipelines and how many pixel pipelines the hardware has. That would tell us the degree of parallelism available at each stage.
My other question is about fragment program performance. I enable the fragment program as follows:
glEnable(GL_FRAGMENT_PROGRAM_NV);
glBindProgramNV(GL_FRAGMENT_PROGRAM_NV, ShaderID);
Performance drops dramatically even with only ONE instruction in the fragment program.
The primitive is drawn as follows:
glBegin( GL_QUADS );
glMultiTexCoord2fARB(GL_TEXTURE0_ARB, 0, 0);
glMultiTexCoord2fARB(GL_TEXTURE1_ARB, 0, 0);
glVertex2d( 0, 0);
glMultiTexCoord2fARB(GL_TEXTURE0_ARB, fTexCoord[0], 0);
glMultiTexCoord2fARB(GL_TEXTURE1_ARB, fTexCoord[0], 0);
glVertex2d( _iWidth, 0);
glMultiTexCoord2fARB(GL_TEXTURE0_ARB, fTexCoord[0], fTexCoord[1]);
glMultiTexCoord2fARB(GL_TEXTURE1_ARB, fTexCoord[0], fTexCoord[1]);
glVertex2d( _iWidth, _iHeight);
glMultiTexCoord2fARB(GL_TEXTURE0_ARB, 0, fTexCoord[1]);
glMultiTexCoord2fARB(GL_TEXTURE1_ARB, 0, fTexCoord[1]);
glVertex2d( 0, _iHeight);
glEnd();
const int _iWidth = 256;
const int _iHeight = 256;
With the fragment program disabled, drawing the quad costs less than 0.01 ms. With a fragment program containing just one output instruction, the cost rises to 90 ms. Adding a texture fetch instruction raises it to 150 ms.
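Since OpenGL may queue commands and execute them later, a timing taken without waiting for completion can reflect only the cost of submitting the quad. Below is a minimal sketch of how such a measurement could be made more robust. The names elapsed_ms and average_ms are hypothetical helpers, not part of OpenGL, and the sync callback is assumed to be a small wrapper that calls glFinish().

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: milliseconds between two timestamps. */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/* Hypothetical helper: average wall-clock milliseconds per draw() call.
 * sync is assumed to block until the GPU has finished all queued work
 * (for example a wrapper around glFinish()); it may be NULL.
 * Returns a negative value if the arguments are invalid. */
double average_ms(void (*draw)(void), void (*sync)(void), int frames)
{
    struct timespec t0, t1;
    int i;

    if (draw == NULL || frames <= 0)
        return -1.0;

    if (sync != NULL)
        sync();                       /* drain work queued before timing */
    timespec_get(&t0, TIME_UTC);      /* C11 wall-clock timestamp */

    for (i = 0; i < frames; ++i)
        draw();

    if (sync != NULL)
        sync();                       /* wait until the timed draws complete */
    timespec_get(&t1, TIME_UTC);

    return elapsed_ms(t0, t1) / frames;
}
```

Averaging over many frames (say 100) with the quad drawing code as draw() should give comparable numbers with and without the fragment program enabled.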
Are there any tricks to overcome this, or to fully exploit the potential of the hardware?
My machine has an nVidia Quadro4 980 XGL with the latest driver.
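One thing that may be worth ruling out on this card is that the fragment program path is not hardware accelerated and runs through a software fallback. That is only a possibility, not something confirmed here. A first step is to check what the driver advertises. The sketch below is a plain string check; has_extension is a hypothetical helper name, and the extension string is assumed to come from glGetString(GL_EXTENSIONS) with a current context.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole,
 * space-separated token in `extensions`, and 0 otherwise.
 * A plain strstr() is not enough, because one extension name can be
 * a prefix of another. */
int has_extension(const char *extensions, const char *name)
{
    const char *p;
    size_t len;

    if (extensions == NULL || name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0)
        return 0;

    p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the hit only if it is delimited on both sides. */
        int starts = (p == extensions) || (p[-1] == ' ');
        int ends   = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return 1;
        p += len;   /* skip this partial match and keep searching */
    }
    return 0;
}
```

It could be called as has_extension((const char *)glGetString(GL_EXTENSIONS), "GL_NV_fragment_program"). Note that an advertised extension does not by itself prove hardware acceleration, so this check can only rule the feature out, not confirm that it is fast.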
Thanks for any comments.
Happy New Year!
[This message has been edited by foollove (edited 12-31-2003).]