I’m currently working on a project that does some GPGPU via OpenGL, and I appear to have found a path that causes a pretty massive slowdown when using MRT on my X1900XT (WinXP x64, Cat7.2).
The part sucking up all the resources is the main pass of the algorithm: it makes 4 reads from one texture, 1 read from another, and writes out 2 vec4s to two 32-bit floating-point RGBA textures.
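To give a feel for the access pattern, the pass 1 fragment shader is shaped roughly like the sketch below. This isn’t the real shader (those are omitted, as I note further down), and the neighbour-tap layout and the maths are placeholders; only the 4+1 reads and the two gl_FragData writes match what the pass actually does:
// Sketch only: reproduces pass 1's read/write pattern, not its actual maths
const char* pass1FragSketch =
    "uniform sampler2D drivingMap;   // 4 taps from this\n"
    "uniform sampler2D energySource; // 1 tap from this\n"
    "uniform float step;             // 1.0 / vertsperedge\n"
    "void main()\n"
    "{\n"
    "    vec2 uv = gl_TexCoord[0].st;\n"
    "    vec4 l = texture2D(drivingMap, uv + vec2(-step, 0.0));\n"
    "    vec4 r = texture2D(drivingMap, uv + vec2( step, 0.0));\n"
    "    vec4 u = texture2D(drivingMap, uv + vec2(0.0,  step));\n"
    "    vec4 d = texture2D(drivingMap, uv + vec2(0.0, -step));\n"
    "    vec4 e = texture2D(energySource, uv);\n"
    "    gl_FragData[0] = (l + r + u + d) * 0.25; // -> heightMap_ via attachment 0\n"
    "    gl_FragData[1] = e;                      // -> energySink_ via attachment 1\n"
    "}\n";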
When the render target and source textures are 40×40 in size, the max speed of the program is approx. 40fps.
I’m not doing a great deal of ALU ops, so I figured I was bandwidth-limited somewhere; as I can’t reduce the number of reads, I instead turned off one of the writes, and the fps shot up to around 800fps…
From there I refactored the program, pushed in 2 extra passes, and reduced all outputs to single render targets; the final fps came out at ~710fps.
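In other words, each of the refactored passes swaps the two-buffer glDrawBuffers() setup shown in the listing below for a single attachment, along these lines (a sketch, reusing the same wrapper calls as the full listing):
// Refactored pattern (sketch): one attachment and one draw buffer per pass
rendertarget_.attachRenderTarget(heightMap_, 0);
glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT); // single target instead of glDrawBuffers(2, ...)
DrawQuad(0.0f, 0.0f, 1.0f, 1.0f);
rendertarget_.detachRenderTarget(heightMap_, 0);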
Now, while I’m happy with the improvement, I’m left wondering why on earth I took such a massive performance loss in the first place. I’m pretty sure the same kind of thing isn’t seen in D3D, so is this another sign of poor ATI OGL drivers? Or maybe it was something I did wrong?
Relevant C++ code is included below. I’ve omitted the shaders simply because the new ones do the same as the old, just split over an extra couple of passes, which leads me to believe the problem is either in my setup or some other state management:
// All textures created for RTT are set up with this function
void SetupRenderTarget(const int size)
{
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, size, size, 0, GL_RGBA, GL_FLOAT, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); // integer enums, so glTexParameteri rather than glTexParameterf
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
}
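Each of the RTT textures is created in the obvious way before being attached (a sketch, details trimmed):
// Creating one of the float render targets (sketch)
GLuint heightMap_ = 0;
glGenTextures(1, &heightMap_);
glBindTexture(GL_TEXTURE_2D, heightMap_);
SetupRenderTarget(vertsperedge_); // RGBA32F, nearest filtering, clamped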
void TLMFullShader::GenerateTLMData()
{
    glPushAttrib(GL_COLOR_BUFFER_BIT | GL_VIEWPORT_BIT); // Save the clear colour and viewport

    // Setup the view for orthographic projection.
    camera_.setMatricies();

    // Switch to render target
    rendertarget_.activate();
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glViewport(0, 0, vertsperedge_, vertsperedge_);
    glClampColorARB(GL_CLAMP_VERTEX_COLOR_ARB, GL_FALSE);
    glClampColorARB(GL_CLAMP_READ_COLOR_ARB, GL_FALSE);
    glClampColorARB(GL_CLAMP_FRAGMENT_COLOR_ARB, GL_FALSE);

    // Setup for pass 1
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, energySource_); // Setup the energy source map
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, drivingMap_);   // Setup driving map texture

    // Setup the RT output
    GLenum pass1RT[2] = { GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT };
    rendertarget_.attachRenderTarget(heightMap_, 0);
    rendertarget_.attachRenderTarget(energySink_, 1);
    glDrawBuffers(2, pass1RT);
    rendertarget_.checkStatus();

    pass1_.use();
    pass1_.sendUniform("energySource", 1); // setup the sampler for the energy source map
    pass1_.sendUniform("drivingMap", 0);
    pass1_.sendUniform("step", 1.0f / float(vertsperedge_));

    // Draw quad here
    DrawQuad(0.0f, 0.0f, 1.0f, 1.0f);

    // now we need to copy the height map to a VBO for later rendering
    glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, heightBuffer_);
    glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
    glReadPixels(0, 0, vertsperedge_, vertsperedge_, GL_BGRA, GL_FLOAT, NULL); // copy to VBO
    glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, 0);

    rendertarget_.detachRenderTarget(heightMap_, 0);
    rendertarget_.detachRenderTarget(energySink_, 1);

    ... // rest of the passes from here on have no real effect on fps; the above causes the major speed hit
}
Blend is off, depth test is also off.
‘rendertarget_’ is just a thin wrapper over an FBO.
‘pass1_’ is just a thin-ish wrapper over a GLSL Program object.
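For completeness, expanded out, the FBO wrapper amounts to plain EXT_framebuffer_object calls, roughly like this (a sketch; the class and member names are made up to match the calls above):
// Roughly what rendertarget_ does under the hood (sketch)
void RenderTarget::activate()
{
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo_);
}
void RenderTarget::attachRenderTarget(GLuint tex, int index)
{
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT + index,
                              GL_TEXTURE_2D, tex, 0);
}
void RenderTarget::detachRenderTarget(GLuint /*tex*/, int index)
{
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT + index,
                              GL_TEXTURE_2D, 0, 0);
}
void RenderTarget::checkStatus()
{
    GLenum status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT);
    if (status != GL_FRAMEBUFFER_COMPLETE_EXT)
        abort(); // the real wrapper logs/asserts on the enum instead
}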
glReadPixels isn’t the problem, as the speed hit goes away when the DrawQuad() call is commented out with everything else still enabled.
So, dodgy drivers?
Hardware limit?
Bad setup?
Other ideas?