CPU instead of GPU

Bystrov_Sergey · January 18, 2007, 5:05am

Hello.

I’m trying to write a hardware-accelerated JPEG viewer. I once saw a code sample on nvidia.com (discrete cosine transform) that performs some operations on the GPU (IDCT, DCT). That code can easily be extended into a hardware-accelerated JPEG viewer, and I’ve done so successfully. But I’m not satisfied. The problem is that when I run an operation on the GPU (by means of Cg), it seems that the CPU is being loaded instead of the GPU: I can see in the profiler that executing any GPU program takes a lot of CPU time.

Are you familiar with this problem?

This is roughly how I execute a program:

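// Runs one fragment-program pass: reads from src, renders into dest.
void Pass(CGprogram prog, RenderTexture *src, RenderTexture *dest)
{
    dest->Activate();                     // make dest the render target
    cgGLBindProgram(prog);
    cgGLEnableProfile(fprog_profile);
    glActiveTextureARB(GL_TEXTURE0_ARB);
    src->Bind();                          // src becomes the input texture

    // width and height are not declared in this snippet; presumably the source texture size
    DrawQuad(dest->GetWidth(), dest->GetHeight(), width, height);
    src->Release();

    cgGLDisableProfile(fprog_profile);
    dest->Deactivate();
}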


void DrawQuad(int w, int h, int tw, int th)
{
    glBegin(GL_QUADS);
    glTexCoord2i(0, 0); glVertex2i(0, 0);
    glTexCoord2i(tw, 0); glVertex2i(w, 0);
    glTexCoord2i(tw, th); glVertex2i(w, h);
    glTexCoord2i(0, th); glVertex2i(0, h);
    glEnd();
}

I use a GeForce 6800.

Zengar · January 18, 2007, 5:52am

When you load and run a Cg program, it is executed on the GPU. How did you come to the conclusion that it is not? Is it slower than a pure CPU decoding approach? One tip: loading a shader is a performance hit, because it has to be compiled and set up. Instead, try changing it only when you need to. The same goes for texture binding.
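
The tip about changing the program only when needed could be sketched as a small cache that remembers the last bound program and skips redundant calls. This is only an illustration, not code from the thread: `ProgramBinder`, `ProgramId`, and the `bind` callback are hypothetical names, and in the code above the callback would presumably wrap `cgGLBindProgram`.

```cpp
#include <functional>
#include <utility>

// Hypothetical handle type standing in for CGprogram (assumption for this sketch).
using ProgramId = int;

// Remembers the last bound program and forwards only real changes.
class ProgramBinder {
public:
    explicit ProgramBinder(std::function<void(ProgramId)> bind)
        : bind_(std::move(bind)) {}

    // Binds `prog` unless it is already the current program.
    // Returns true if the underlying bind call was actually issued.
    bool Use(ProgramId prog) {
        if (hasCurrent_ && current_ == prog) return false;
        bind_(prog);
        current_ = prog;
        hasCurrent_ = true;
        return true;
    }

private:
    std::function<void(ProgramId)> bind_;  // stand-in for the real bind call
    ProgramId current_ = 0;
    bool hasCurrent_ = false;
};
```

Note that a cache like this only helps when consecutive passes share a program. If two different programs alternate on every pass, every call still reaches the real bind.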

Bystrov_Sergey · January 18, 2007, 6:22am

Originally posted by Zengar:
When you load and run a Cg program, it is executed on the GPU. How did you come to the conclusion that it is not? Is it slower than a pure CPU decoding approach?
No. If I comment out only the GPU operations, the FPS stays the same (some limit is in effect) and the picture is incorrect, but CPU usage is much lower. I can also see in the profiler that the Pass function takes a lot of CPU time.

dolf · January 18, 2007, 9:17am

Just an uneducated guess: what if you take either one or both of

cgGLBindProgram(prog);
cgGLEnableProfile(fprog_profile);

out of Pass and place them in the program initialization code? It seems to me that these are the performance eaters.

Bystrov_Sergey · January 18, 2007, 10:04am

Originally posted by dolf:
Just an uneducated guess: what if you take either one or both of

cgGLBindProgram(prog);
cgGLEnableProfile(fprog_profile);

out of Pass and place them in the program initialization code? It seems to me that these are the performance eaters.
OK. But what if I want to apply different operations to the textures in turn: operation1, then operation2, then operation1 again? It seems to me that I have to call cgGLBindProgram every time before an operation that differs from the previous one, don’t I? (So I have to call cgGLBindProgram very often.)

I guess you are right in the case of cgGLEnableProfile.

Thanks.

Bystrov_Sergey · January 18, 2007, 10:11am

Originally posted by dolf:
Just an uneducated guess: what if you take either one or both of

cgGLBindProgram(prog);
cgGLEnableProfile(fprog_profile);

out of Pass and place them in the program initialization code? It seems to me that these are the performance eaters.
Anyway, even if I remove cgGLBindProgram and cgGLEnableProfile from Pass, it does not make CPU usage lower :-(

Korval · January 18, 2007, 12:07pm

I don’t understand your problem.

You’re drawing quads in the most inefficient way possible. You’re constantly binding and unbinding shaders.

How exactly is it that you expect the CPU to not be doing work?

Bystrov_Sergey · January 18, 2007, 12:35pm

Originally posted by Korval:
I don’t understand your problem.

You’re drawing quads in the most inefficient way possible. You’re constantly binding and unbinding shaders.

How exactly is it that you expect the CPU to not be doing work?

As I already said, even if I remove cgGLBindProgram and cgGLEnableProfile from Pass, CPU usage does not decrease.
How is it possible to avoid calling BindProgram before executing a new Cg task (different from the previous one)?
Why do you think that the bind takes a lot of CPU time?

Zengar · January 18, 2007, 1:55pm

Sergey, since we don’t have your program’s code, we can’t really tell, but there are some basic things. First, applications that use graphics almost always utilize 100% of the CPU, because they use a continuous rendering loop. In contrast, a typical GUI application spends most of its time idle, waiting for events. So this is most probably the reason for your high CPU usage. Even if you don’t use a continuous loop, there is a lot of work inside the driver to manage your rendering. BindProgram usually forces the Cg program to be compiled, and that takes a hell of a lot of CPU time.
So here is the most basic advice: do not judge by your CPU load; judge by overall performance. If your program runs faster than software decoders, then everything is OK and your task runs on the GPU. If not, then you’ve made a mistake somewhere.
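
One way to judge by overall performance rather than CPU load is to time whole frames and compare the result with a software decoder. The helper below is a minimal sketch under that assumption; `AverageFrameMs` and the `frame` callback are hypothetical names, and the callback stands for whatever decodes and draws one image.

```cpp
#include <chrono>
#include <functional>

// Runs `frame` `count` times and returns the average wall-clock time
// per call in milliseconds (0.0 if count is not positive).
// For GPU work, `frame` should end with glFinish() so that queued
// commands are included in the measurement.
double AverageFrameMs(const std::function<void()>& frame, int count) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < count; ++i) {
        frame();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return count > 0 ? elapsed.count() / count : 0.0;
}
```

Running the same helper over the GPU path and over a CPU-only decoder with the same input gives two numbers that can be compared directly.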

Korval · January 18, 2007, 5:13pm

As I already said, even if I remove cgGLBindProgram and cgGLEnableProfile from Pass, CPU usage does not decrease.

How is it possible to avoid calling BindProgram before executing a new Cg task (different from the previous one)?

Why do you think that the bind takes a lot of CPU time?
You missed the most important statement: “You’re drawing quads in the most inefficient way possible.”

If every quad you use uses a different shader, then you’re going to run slow. Period. But if you’re just changing shaders when you don’t have to, you should be using a VBO and sorting your data by shader.
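
Sorting data by shader could look roughly like the sketch below, which groups draw items by program and counts how many program switches a draw order needs. `DrawItem`, `SortByProgram`, and `CountProgramSwitches` are hypothetical names used only for illustration, and the integer ids stand in for real program handles and geometry.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw record: which program to use and which quad to draw.
struct DrawItem {
    int program;  // stand-in for a shader/program handle
    int quad;     // stand-in for the geometry to draw
};

// Groups items by program; stable_sort keeps the original order inside each group.
void SortByProgram(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.program < b.program;
                     });
}

// Counts how many program binds drawing the list in order would require.
int CountProgramSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].program != items[i - 1].program) {
            ++switches;
        }
    }
    return switches;
}
```

For four items that alternate between two programs, the unsorted order needs four binds and the sorted order needs two. Reordering like this is only valid for draws that do not depend on each other's output.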

Bystrov_Sergey · January 19, 2007, 5:27am

Originally posted by Korval:
You missed the most important statement: “You’re drawing quads in the most inefficient way possible.”

If every quad you use uses a different shader, then you’re going to run slow. Period. But if you’re just changing shaders when you don’t have to, you should be using a VBO and sorting your data by shader.
I don’t understand. Why am I drawing quads in the most inefficient way possible?
I’m not changing shaders when I don’t have to.

The scheme is simple. For each frame I have to do the IDCT (first operation) and then the color conversion to RGB (second operation). Of course I have to call cgGLBindProgram twice per frame (don’t I?).

Also, I think cgGLBindProgram is not an expensive operation for the CPU. And I don’t think that cgGLBindProgram forces the Cg program to compile (I think that is connected with cgCreateProgram).

So why is my way the most inefficient? And how can I change it?

Please make allowances for me. I had never worked with OpenGL before this task.