Hello, I am just getting started with GPGPU and have a few (simple) questions.
I want to support OpenGL 2.0 cards (ATI 9500+, NVIDIA ?) and performance is crucial!
I have read a few tutorials (incl. Dominik’s) and think I need a ping-pong setup using multiple textures bound to one FBO (or is there a faster way these days?).
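To make the ping-pong idea concrete, here is a minimal CPU-side sketch. It makes no OpenGL calls: two Python lists stand in for the two textures attached to one FBO, and the names `run_pingpong` and `neighbour_average` are hypothetical, chosen only for this illustration.

```python
# CPU-side simulation of ping-pong rendering; no OpenGL calls are made.
# `run_pingpong` and `neighbour_average` are hypothetical names for this sketch.

def run_pingpong(initial, kernel, passes):
    """Apply `kernel` to the data `passes` times using two alternating buffers."""
    # Two buffers stand in for two textures attached to one FBO.
    buffers = [list(initial), [0.0] * len(initial)]
    read, write = 0, 1
    for _ in range(passes):
        src, dst = buffers[read], buffers[write]
        # One "render pass": every output element is computed from the
        # read buffer only, like a fragment shader sampling the source texture.
        for i in range(len(src)):
            dst[i] = kernel(src, i)
        # Swap roles: the buffer just written becomes the next input.
        read, write = write, read
    return buffers[read]


def neighbour_average(src, i):
    """Example kernel: average of an element and its edge-clamped neighbours."""
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, len(src) - 1)]
    return (left + src[i] + right) / 3.0


result = run_pingpong([0.0, 3.0, 0.0], neighbour_average, passes=2)
```

The point of the swap is that a pass never reads from the buffer it is writing to, which is the same constraint that applies when rendering into a texture on the GPU.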
Is there any (performance) penalty in using ARB_texture_rectangle instead of GL_TEXTURE_2D? I do not know a lot about ARB/EXT extensions, but as far as I can tell ARB_texture_rectangle should be supported by all 2.0 cards and should not be any slower than GL_TEXTURE_2D. From what I have read, non-normalized texture coordinates are just really handy when it comes to GPGPU.
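The practical difference between the two targets is how a texel is addressed. A small sketch of the arithmetic, assuming texel centres sit at half-integer positions; the helper names `rect_coord` and `normalized_coord` are made up for this example:

```python
def rect_coord(texel_index):
    """Texel-centre coordinate for a rectangle texture (unnormalized, range [0, size])."""
    return texel_index + 0.5


def normalized_coord(texel_index, size):
    """Texel-centre coordinate for GL_TEXTURE_2D (normalized, range [0, 1])."""
    return (texel_index + 0.5) / size


width = 4
rect = [rect_coord(i) for i in range(width)]               # independent of width
norm = [normalized_coord(i, width) for i in range(width)]  # depends on width
```

With rectangle textures the coordinate is just the array index plus one half, so a kernel can address neighbours with offsets of 1.0 without knowing the texture size.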
For my project I need bool (GL_RGBA2?), fixed-point (GL_RGBA8?) and float32 (GL_RGBA32F_ARB) textures as input. Will GL_RGBA2 be faster than GL_RGBA32F_ARB to upload to the GPU and to process on the GPU?
And is GL_RGBA32F_ARB still not supported by NVIDIA OpenGL 2.0 cards?
Sorry about all the questions, but I couldn’t find any definite answers online.
Maybe it is better to stick with CUDA. First, with OpenGL you can run into issues with transfer speed, texture formats, platform incompatibility and driver versions.
Second, what kind of computation would you like to perform? Only the latest hardware generation supports more complex shaders, while previous generations have some limits and caveats.
CUDA is not an option; I need to support a wider range of cards.
I know that most of what I want to do can be done within the small shader limits of, for instance, a 9600 (an OpenGL 2.0 card), although I do not know the minimum number of computations per vertex/fragment shader that the 2.0 spec requires.
My questions above were just about what the fastest way to do these computations on 2.0 would be.
Thanks. I thought GL_RGBA2 would be perfect for boolean input, but I also thought the GPU might cast it to float internally anyway, which could even mean a performance drop.
Isn’t Shader Model a Microsoft thing or am I getting things mixed up here?
Well, to be honest it probably does get cast, but I’d be surprised if it was to 32-bit float. Even if it is, you’ll save transfer time.
I have to add a caveat here though: occasionally you come across a format that, for some reason internal to the drivers, gives you a disproportionate performance hit. As I said, I haven’t tried GL_RGBA2, so it’s possible that this is one of those formats.
Sorry, of course SM is MS; I’m just so used to referring to generations by SM.
edit: btw FBOs are still the way to go. Note that all textures attached to an FBO must have the same format and size.
So even though you thought I meant SM3.0, will your answers be applicable to OpenGL 2.0?
Well, as I’ll be using ping-ponging, the transfer times won’t be that important, right?
What kind of precision will you get when using GL_RGBA8? 8-bit integer?
I am also a bit confused about clamping. When using FBO ping-ponging, will the values ever be clamped when, for instance, I use GL_RGBA32F_ARB?
Yes, my answers should be applicable to OpenGL 2.0 (I haven’t read the specs to confirm this though, but I’m pretty sure we’re talking about the same thing :)).
Well, CPU<->GPU transfer times obviously get less important the longer the data stays on the card, but you still have texture fetch costs, which will be considerably lower with a lower-precision format.
You won’t get clamping with RGBA32F; I’m not sure exactly what happens with RGBA8 though. I don’t think it will get clamped beyond what the precision of the format imposes.
RGBA8 will give you an 8-bit integer per component, is my understanding, but I haven’t checked that myself (I do everything in RGBA32F).
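As a rough back-of-the-envelope check on the bandwidth argument, the nominal storage per texture can be computed from the bits per component. This is only a sketch: a sized internal format is a request, and a driver may store the texture at a different resolution, so the numbers below are nominal, not measured. `BITS_PER_TEXEL` and `nominal_texture_bytes` are names invented for this example.

```python
# Nominal bits per texel (4 components each) for the formats discussed.
# Assumption: the driver honours the requested size, which it need not do.
BITS_PER_TEXEL = {
    "GL_RGBA2": 4 * 2,
    "GL_RGBA8": 4 * 8,
    "GL_RGBA32F_ARB": 4 * 32,
}


def nominal_texture_bytes(internal_format, width, height):
    """Nominal storage in bytes for a width x height texture, rounded up."""
    bits = BITS_PER_TEXEL[internal_format] * width * height
    return (bits + 7) // 8


sizes = {fmt: nominal_texture_bytes(fmt, 1024, 1024) for fmt in BITS_PER_TEXEL}
```

For a 1024x1024 texture this gives nominally 1 MiB, 4 MiB and 16 MiB respectively, so every full pass over a float32 texture moves about four times as much data as an RGBA8 one.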
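For reference, GL_RGBA8 is an unsigned normalized fixed-point format: each component is stored as an 8-bit integer representing a value in [0, 1], so values written to it are clamped to that range and quantized to 256 levels. A minimal CPU-side sketch of that conversion follows; the round-to-nearest step is an assumption about the rounding behaviour, and `store_unorm8` and `load_unorm8` are hypothetical helper names.

```python
def store_unorm8(value):
    """Clamp to [0, 1] and convert to an 8-bit unsigned normalized integer."""
    clamped = min(max(value, 0.0), 1.0)
    # Assumption: round to nearest; actual hardware rounding may differ.
    return int(clamped * 255.0 + 0.5)


def load_unorm8(stored):
    """Convert a stored 8-bit value back to a float in [0, 1]."""
    return stored / 255.0


samples = [-0.5, 0.0, 0.5, 1.0, 2.0]
stored = [store_unorm8(v) for v in samples]
restored = [load_unorm8(s) for s in stored]
```

In a ping-pong loop this loss is applied on every pass, which is why intermediate results that leave [0, 1] or need more than 8 bits are usually kept in a float format.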