Optimizing pixel count

Hi,
I want to do some GPGPU. With CUDA I know that you have to take into account the number of multiprocessors on a graphics card and the number of “threads” that are processed in parallel.
Do I have to consider something like that when choosing a proper pixel count if I use shaders via standard OpenGL?
Thx a lot

Have you read the programming guide? Then maybe look at some of the samples, like the GL post-processing or the N-Body simulation (personal fave).

CUDA newbie myself… waiting for a VS9 compatible version of nvcc to be released.

Hi & thx for your reply.
I know what to do when I use CUDA. But is there anything I have to watch out for when using shaders the “normal” way? Will it be worse if I render 1634 pixels instead of 1632 (16 = number of multiprocessors of my 8800, 32 = number of threads running pseudo-parallel on one multiprocessor), or something like that, or doesn’t this kind of CUDA scheduling apply here?

To get maximum performance you MUST know how your hardware and compiler work (and optimise).
But it’s hard to get this information.

There is a simple solution: try it out!

But remember, an optimisation you do on your platform may cause a big performance decrease on another platform (or may not work at all)!

You can only fully optimise for one platform at a time; keep this in mind!

(AMD’s 4xxx series loves long shaders (>200 instructions), but NVIDIA’s 2xx does not; it often performs better with fewer instructions per shader.
To optimise for both cards you have to maintain different shader sets: fewer but more complex shaders for AMD, and more but less complex shaders for NVIDIA.)

Thx for your advice. Could you give me an example? I know that I have a GeForce 8800 Ultra with 128 fragment shaders, each running at 1.5 GHz, and e.g. 10000 jobs (= pixels to render). How should I organize my texture? TEXTURE_RECT seems a better choice than TEXTURE_2D because of TEXTURE_2D’s size limit to powers of two.
Is TEXTURE_RECT slower than TEXTURE_2D? Is TEXTURE_1D really slower than TEXTURE_2D?
Thx :slight_smile:

because of TEXTURE_2D’s size limit to powers of two.

Not since OpenGL 2.0, and NVIDIA cards handle NPOT textures very well.

And since GL3, the ARB_texture_non_power_of_two extension was promoted to core.
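
For example, the 10000 jobs from your question fit directly into a 100x100 GL_TEXTURE_2D, no padding to 128x128 and no TEXTURE_RECT needed. A minimal sketch, assuming a GL 2.0+ context with float textures (ARB_texture_float) and GLEW; the 100x100 layout is just an illustration, error checking left out:

```c
/* Pack 10000 "jobs" into a 100x100 NPOT GL_TEXTURE_2D for GPGPU use.
 * Assumes GL 2.0+ and ARB_texture_float; job_data holds 100*100*4 floats. */
#include <GL/glew.h>

GLuint create_job_texture(const float *job_data)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    /* GPGPU textures are normally sampled with nearest filtering,
     * no mipmaps and clamped edges. */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

    /* 100x100 is not a power of two; since GL 2.0 that is legal for
     * GL_TEXTURE_2D, so GL_TEXTURE_RECTANGLE is not required. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, 100, 100, 0,
                 GL_RGBA, GL_FLOAT, job_data);
    return tex;
}
```

In the fragment shader, job number i then simply maps to texel (i % 100, i / 100).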

There is no general advice on what to use.
It depends on your code / API usage.

NPOT textures may be handled very well, but if you use, for example, a 500x400 texture, there may be a small performance gain from using a 512x512 texture instead, depending on your usage. On the other hand, if the “faster” 512x512 texture forces you to add a lot of “correction” code (because you really only need 500x400 texture accesses or whatever), that can cost more performance than the “slightly slower” 500x400 texture.

It really depends on your usage; no general advice is possible.

Again: check it out, and you will see…
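
One way to actually check it is to time each variant on the GPU with a timer query. A minimal sketch, assuming the EXT_timer_query extension is exposed (it usually is on GeForce 8 class hardware); draw_pass is just a stand-in for whichever render pass (500x400 or 512x512) you want to compare:

```c
/* Measure the GPU time of one render pass with EXT_timer_query,
 * so the 500x400 vs. 512x512 question can be answered with numbers. */
#include <GL/glew.h>
#include <stdio.h>

void time_draw_pass(void (*draw_pass)(void), const char *label)
{
    GLuint query;
    GLuint64EXT ns = 0;

    glGenQueries(1, &query);
    glBeginQuery(GL_TIME_ELAPSED_EXT, query);
    draw_pass();                          /* the pass being benchmarked */
    glEndQuery(GL_TIME_ELAPSED_EXT);

    /* Reading the result waits until the GPU has finished the pass;
     * that is fine for a benchmark. */
    glGetQueryObjectui64vEXT(query, GL_QUERY_RESULT, &ns);
    glDeleteQueries(1, &query);

    printf("%s: %.3f ms\n", label, ns / 1.0e6);
}
```

Run each variant for a few hundred frames and ignore the first ones, so one-off driver work (shader compilation, texture uploads) does not distort the comparison.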

No, it was promoted to core in GL2. See chapter I.3 in http://opengl.org/documentation/specs/version2.0/glspec20.pdf

Yes, my mistake! :slight_smile:
