MRT : Massive performance hit on ATI hardware

I’m currently doing a project which does some GPGPU via OpenGL, and I appear to have found a path which causes a pretty massive slowdown when using MRT on my X1900XT (WinXP x64, Cat 7.2).

The part which is sucking up all the resources is the main pass of the algorithm; it makes 4 reads from one texture, 1 read from another, and writes out 2 vec4s to two 32-bit floating-point RGBA textures.

When the render target and source textures are 40×40 in size, the max speed of the program is approx. 40fps.

I’m not doing a great deal of ALU ops, so I figured I was bandwidth limited somewhere. As I can’t reduce the number of reads, I instead turned off one of the writes; the fps shot up to around 800fps…

From here I refactored the program, pushed in 2 extra passes, and reduced all outputs to single render targets; the final fps came out at ~710fps.
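To make the workaround concrete, a rough sketch of what each single-render-target pass now looks like is below (heightPass_ and sinkPass_ are purely illustrative stand-ins for the extra shader passes; the FBO wrapper and DrawQuad helper are the same ones shown in the code further down):

// Illustrative sketch only: each output now gets its own pass, so only one
// colour attachment is written per draw and glDrawBuffers always gets a count of 1.
GLenum singleRT[1] = { GL_COLOR_ATTACHMENT0_EXT };

// Pass 1a: write the height map only
rendertarget_.attachRenderTarget(heightMap_, 0);
glDrawBuffers(1, singleRT);
heightPass_.use();
DrawQuad(0.0f, 0.0f, 1.0f, 1.0f);
rendertarget_.detachRenderTarget(heightMap_, 0);

// Pass 1b: write the energy sink only, reading the same inputs
rendertarget_.attachRenderTarget(energySink_, 0);
glDrawBuffers(1, singleRT);
sinkPass_.use();
DrawQuad(0.0f, 0.0f, 1.0f, 1.0f);
rendertarget_.detachRenderTarget(energySink_, 0);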

Now, while I’m happy with the improvement, I’m left wondering why on earth I took such a massive performance loss in the first place. I’m pretty sure the same kind of thing isn’t seen in D3D, so is this another sign of poor ATI OpenGL drivers? Or maybe it was something I did wrong?

Relevant C++ code is included below. I’ve omitted the shaders simply because the new ones do the same as the old, just split over an extra couple of passes, which leads me to believe the problem is either in my setup or some other state management:

// All textures created for RTT are set up with this function
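// (it operates on whichever texture object is currently bound to GL_TEXTURE_2D)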
void SetupRenderTarget(const int size)
{
	glTexImage2D(GL_TEXTURE_2D,0, GL_RGBA32F_ARB,size, size, 0, GL_RGBA, GL_FLOAT,NULL);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
}

void TLMFullShader::GenerateTLMData()
{
	glPushAttrib(GL_COLOR_BUFFER_BIT | GL_VIEWPORT_BIT);		// Save the clear colour and viewport

	// Setup the view for orthographic projection.
	camera_.setMatricies();
	// Switch to render target
	rendertarget_.activate();

	glClearColor(0.0f,0.0f,0.0f,0.0f);
	glViewport(0,0,vertsperedge_,vertsperedge_);
	glClampColorARB(GL_CLAMP_VERTEX_COLOR_ARB, GL_FALSE);
	glClampColorARB(GL_CLAMP_READ_COLOR_ARB, GL_FALSE);
	glClampColorARB(GL_CLAMP_FRAGMENT_COLOR_ARB, GL_FALSE);

	// Setup for pass 1
	glActiveTexture(GL_TEXTURE1);	
	glBindTexture(GL_TEXTURE_2D, energySource_);	// Setup the energy source map
	glActiveTexture(GL_TEXTURE0);
	glBindTexture(GL_TEXTURE_2D, drivingMap_);		// Setup driving map texture
		
	// Setup the RT output
	GLenum pass1RT[2] = { GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT};
	rendertarget_.attachRenderTarget(heightMap_, 0);
	rendertarget_.attachRenderTarget(energySink_, 1);
	glDrawBuffers(2,pass1RT);
	rendertarget_.checkStatus();
	pass1_.use();
	pass1_.sendUniform("energySource",1);	// setup the sampler for the energy source map
	pass1_.sendUniform("drivingMap",0);
	pass1_.sendUniform("step",1.0f/float(vertsperedge_));
	// Draw quad here
	DrawQuad(0.0f, 0.0f, 1.0f, 1.0f);

	// now we need to copy the height map to a VBO for later rendering
	glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, heightBuffer_);
	glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
	glReadPixels(0,0,vertsperedge_,vertsperedge_,GL_BGRA, GL_FLOAT,NULL);	// copy to VBO
	glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB,0);
	rendertarget_.detachRenderTarget(heightMap_, 0);
	rendertarget_.detachRenderTarget(energySink_, 1);

... // rest of the passes from here on have no real effect on fps; the code above causes the major speed hit
}

Blend is off, depth test is also off.
‘rendertarget_’ is just a thin wrapper over an FBO.
‘pass1_’ is just a thin-ish wrapper over a GLSL program object.
glReadPixels isn’t the problem as the speed hit goes away when the DrawQuad() call is commented out with everything else enabled.

So, dodgy drivers?
Hardware limit?
Bad setup?
Other ideas?

Originally posted by bobvodka:
I’m currently doing a project which does some GPGPU via OpenGL, and I appear to have found a path which causes a pretty massive slowdown when using MRT on my X1900XT (WinXP x64, Cat 7.2).

Same here. The drivers recompile all shaders each time the number of render targets changes, so in your case the shaders are recompiled twice per frame. This is a known problem (at least since 10/2006), but devrel@ati.com told me that they won’t fix it.
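To illustrate (this is a hypothetical call sequence, not your actual code): with one MRT pass and then any single-target pass, the count passed to glDrawBuffers flips twice per frame, and each flip triggers the recompile.

GLenum both[2] = { GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT };
GLenum one[1]  = { GL_COLOR_ATTACHMENT0_EXT };

glDrawBuffers(2, both);   // render-target count changes: all shaders recompiled
// ... draw the MRT pass ...
glDrawBuffers(1, one);    // count changes again for the single-target passes: recompiled again
// ... draw the remaining passes ...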

Btw, can anyone here say whether this also applies to XP x86 or Vista x86/x64?

Malte

ouch… well, that certainly explains it, thanks for clearing that up.

Kinda sucks that they aren’t going to fix it either as it basically makes MRT unusable on ATI hardware.

This could be the straw which sends me back to NV…

WTF?

I haven’t used MRT so far. Does this mean that ATI drivers ALWAYS recompile all shaders when you use MRT and switch the buffers?

That would make it useless, indeed. And I was hoping to do deferred rendering sometime… :frowning:

Jan.

Well, to directly quote the reply I’ve just received from ATI/AMD’s devrel:

You are correct, the problem with using MRTs in OpenGL is a known issue and we have no plans to fix it in our current driver. Although there is a chance that this could be implemented in a different way in our Vista driver, we do not have any official word. Thus, I would suggest that you find another way to implement your project.

So, while it might be fixed in the Vista drivers (but honestly, I wouldn’t count on it), you can pretty much count it out for XP’s drivers.

Congrats to AMD, they just lost a graphics card customer and I’ll be happy to advise others against AMD/ATI branded cards in the future :slight_smile:

You could just wait for Longs Peak to see if that fixes it.

I could; however, I wanted to start using it now, not in 6 to 8 months’ time when drivers appear and are working.

Thus, I would suggest that you find another way to implement your project.

Or drop support for ATI cards, like we have.
Seriously, I can’t even begin to imagine nvidia taking that attitude with developers.

To be fair, how many GL applications use MRT anyway? The main GL applications (Doom3 engine, Maya, XSI, some CAD stuff) don’t. So really, there is little incentive to proceed with fixing it.

Especially with Longs Peak coming around the corner.

OK, accepted, but then why even support the MRT extension to GLSL? Because it’s practically unusable in its current state.

It’s not even like the non-power-of-two extension, where if you stick to a few guidelines it’s usable on current hardware…

OK, accepted, but then why even support the MRT extension to GLSL?
Is it in OpenGL 2.1? If so, there’s your answer.

If I remember correctly, ATI was the first to support MRT at all. Didn’t they even have an ATI-specific extension (ATI_draw_buffers, or something like that)?

That was long before OpenGL 2.1. Pretty pointless, IMO.

Jan.

Yes, ATI was the first to have MRT, and it was exposed as an ATI-specific extension.

ATI’s attitude in the above post from their devrel is the same inexcusable attitude I’ve gotten from them for years now. OpenGL and people who use it seem to be nothing more than an annoyance to ATI. Getting an existing feature to work well or correctly is hard enough, but try getting them to implement an important feature like EXT_packed_depth_stencil or EXT_framebuffer_multisample and you’ll learn a lesson in futility. (I did see that EXT_packed_depth_stencil is in the extensions string under Vista, but their implementation is completely broken and unusable.)

Nvidia, on the other hand, has continually shown amazing support for OpenGL. Just look at all those awesome G80 extensions! And they actually work the way they’re supposed to. Life would be much better if I could just drop ATI support like knackered did.

Here’s the funny thing.

ATi/AMD have two chairpersons on subgroups within the Khronos GL ARB. They have the chairs for the Ecosystem and Shading Language subgroups.

And yet they have what is probably the weakest implementation of said shading language.

I’ve always been under the assumption that ATi is busy preparing for Longs Peak. That once writing a GL implementation becomes more manageable, they’ll get better drivers.

At this point though, I don’t know. Will they even bother to support Longs Peak? Will they still support 2.1 after LP hits?

ATi’s a big question mark, and their ineptitude is really holding OpenGL back.

Let me clarify the answer from devrel above. There’s no plan to fix it in the current driver, that is, the legacy driver that currently ships on XP. However, the new driver that currently ships on Vista is a totally redesigned driver built from scratch, the famous “OpenGL rewrite” that’s been rumoured on the net for quite a while. I don’t know if the same problem exists in that driver; my gut feeling is that it probably doesn’t, but I haven’t tried it so I can’t say for sure right now. If it does, then it will certainly be fixed in that driver. This new driver currently only ships on Vista, but soon enough it’ll ship for all platforms and hardware and the legacy driver will be retired.

The legacy driver has been on the back burner for quite a while now while the majority of the driver team has been working on the new driver. Since this project wasn’t public until very recently, it could easily have been perceived from the outside that ATI stopped putting effort into OpenGL, while the truth was actually the opposite. Rewriting a driver from scratch is a major undertaking, and the project has been going on for a couple of years. During this time guys like me have had a hard time defending the fact that certain issues would not get immediate attention. While that certainly did not help me do my job, in the long run the new driver will be a better foundation to build our GL implementation on, won’t have some of the architectural problems of the legacy driver, and the situation should improve now that it’s out in the wild and developers start using it.

Basically, what I’m saying is that while I can easily understand that it could have looked that way, it’s certainly not the case that ATI doesn’t care about OpenGL; it’s just that rewriting the driver has been a massive task and unfortunately (like most software projects) it has taken longer than originally projected. The good news is that the new driver is now out there, and I encourage everyone to try it out on Vista if you get a chance.

That’s very good news.

I really hope that LP is out soon and that ATI’s driver will support it shortly after. OpenGL 2.1 is a mess, and for me it does not make sense to begin writing a new renderer with it.

Jan.

A very frank, honest and encouraging answer, humus. Thanks.

This new driver currently only ships on Vista, but soon enough it’ll ship for all platforms and hardware and the legacy driver will be retired.
Does that mean we’ll get usable Linux drivers within finite time? :stuck_out_tongue:

Basically, what I’m saying is that while I can easily understand that it could have looked that way, it’s certainly not the case that ATI doesn’t care about OpenGL; it’s just that rewriting the driver has been a massive task and unfortunately (like most software projects) it has taken longer than originally projected. The good news is that the new driver is now out there, and I encourage everyone to try it out on Vista if you get a chance.

Well, to be honest, someone somewhere should have said something to the developers; heck, even that reply from devrel could have been better than the vague ‘yeah, it might have been fixed, we don’t know’ which I got. It comes across as not caring, and in this game PR and talking to the devs is worth a lot; basically, a more definitive answer such as ‘there is a new version in the works, which will be released soon and for all platforms, and which will address this issue’ would have been enough.

Now, to be fair, I’ve just checked the state of things with the Vista driver, and while it leaves me with some screen corruption (which, given the newness of the drivers and the architecture, is fair enough… and it goes away when a repaint is forced by dragging a window around), MRT, FBO and GLSL appear to work at a decent framerate (although I think there might be a z-buffer issue in the 7.2 drivers, as I’m pretty sure I could see into a cube I shouldn’t have been able to see into; I’ll swap to XP x64 to confirm at some point), so I’ll refrain from yelling about this for now.

Still, this is a good example of why it’s important to talk to the developers; let’s face it, if I hadn’t kicked up a fuss like this about it we wouldn’t even know now, would we?

Originally posted by Overmind:
Does that mean we’ll get usable Linux drivers within finite time? :stuck_out_tongue:
The new driver is cross-platform, and Linux support has been one of the important goals from the start of the project, rather than an afterthought as in the legacy driver. I’m not going to give any guarantees, as I personally haven’t even tried the new driver on Linux yet, but since it’s built on the same code from the start I think it’s reasonable to expect roughly the same quality as the Windows driver. Of course the driver model is different between OSes and there are other OS-specific peculiarities, so there may still be Linux-specific issues in the future (as well as Windows-specific ones, of course).