
Slow shaders on RADEON X300/X550 Series (128 MB)



devdept
06-01-2009, 02:19 AM
Hi All,

We have texture-blurring code that runs very slowly, but only on this hardware:


Graphics card: RADEON X300/X550 Series (128 MB)
Graphics card: RADEON X300/X550 Series Secondary (128 MB)
Chiptype: RADEON X300/X550 Series (0x5B60)
3D accelerator ATI Radeon X300 (RV370)
Installed driver: ati2dvag (6.14.10.6575)

RAMDAC frequency: 400 MHz
Pixel pipelines 4
TMU per pipeline 1
Vertex shaders 2 (v2.0)
Pixel shaders 1 (v2.0)
DirectX support, hardware DirectX v9.0
Pixel Fillrate 1296 MPixel/s
Texel Fillrate 1296 MTexel/s

ATI GPU Registers:
ati-00F8 08000000
ati-0140 00000070
ati-0144 1A289111
ati-0148 D7FFD000
ati-0154 F0000000
ati-0158 31320032
ati-0178 00001017
ati-01C0 01FF0000
ati-4018 00010011
ati-CLKIND-0A 03301D04
ati-CLKIND-0B 00001A00
ati-CLKIND-0C 0400BC00
ati-CLKIND-0D 00807FFA
ati-CLKIND-0E 04002400
ati-CLKIND-0F 00000000
ati-CLKIND-12 00031212
ati-MCIND-6C 00000000




Chipset: Intel Grantsdale-G i915G

GPU code: RV370 (PCI Express x16 1002 / 5B60, Rev 00)
GPU speed: 324 MHz (original: 324 MHz)

CPU type Intel Pentium 4 520, 2800 MHz (14 x 200)

Supported: x86, MMX, SSE, SSE2, SSE3



OpenGL Extensions Viewer 3.0 says:

Renderer: ATI Radeon X300/X550/X1050 Series
Vendor: ATI Technologies Inc.
Memory: 128 MB
Version: 2.1.8543 Release
Shading language version: 1.20


To me, all this information says that the machine can fully support a shader program, yet its actual behavior makes us think we need to disable shader support on it.

Why? The shader code follows below.


Thanks,

Alberto


// size of kernel for this execution
const int KernelSize = %len%;

// array of offsets for accessing the base image
uniform float Offset[KernelSize];

// value for each location in the convolution kernel
uniform float KernelValue[KernelSize];

// image to be convolved
uniform sampler2D BaseImage;

void main()
{
    int i;
    vec4 sum = vec4(0.0);

    // accumulate the weighted samples along the horizontal axis
    for (i = 0; i < KernelSize; i++)
    {
        vec4 tmp = texture2D(BaseImage, gl_TexCoord[0].st + vec2(Offset[i], 0.0));
        sum += tmp * KernelValue[i];
    }

    gl_FragColor = sum;
}
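
The shader expects the host application to fill `Offset` and `KernelValue`, but the post does not show how that is done. As a rough sketch only, assuming a normalized Gaussian kernel and taps spaced one texel apart (the function name, `sigma`, and `texture_width` are hypothetical and not taken from the original code), the two arrays could be built like this:

```python
import math

def make_blur_uniforms(kernel_size, texture_width, sigma=3.0):
    """Return (offsets, weights) for one horizontal blur pass.

    Assumptions (not stated in the thread): Gaussian weights that sum
    to 1, and a spacing of exactly one texel between neighbouring taps.
    """
    half = (kernel_size - 1) / 2.0
    texel = 1.0 / texture_width  # width of one texel in [0, 1] texture coordinates
    offsets = [(i - half) * texel for i in range(kernel_size)]
    raw = [math.exp(-((i - half) ** 2) / (2.0 * sigma ** 2))
           for i in range(kernel_size)]
    total = sum(raw)
    weights = [w / total for w in raw]  # normalize so brightness is preserved
    return offsets, weights

# 19 taps, as mentioned later in the thread, on a hypothetical 512-texel-wide image
offsets, weights = make_blur_uniforms(19, 512)
```

With weights that sum to 1, the blurred image keeps the overall brightness of the source.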

ZbuffeR
06-01-2009, 03:13 AM
For what value of KernelSize is it slow?

To me, "supported" and "usable" are more or less orthogonal. A small micro-benchmark at runtime, during an "auto-detect settings" phase, allows a better decision on whether to use a feature.

The user should always be able to force the use of any supported feature, even if it does not reach a "usable" framerate, but default settings should really take the real performance into account.
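
A minimal sketch of that idea, not a definitive implementation: time a few frames during auto-detection, then combine "supported", "fast enough", and the user's explicit choice. The names `benchmark_ms`, `choose_default`, and the 33 ms budget are hypothetical, and `render_once` is assumed to block until the GPU has really finished (for example by calling `glFinish`), otherwise the timing would be meaningless.

```python
import time

def benchmark_ms(render_once, frames=30, clock=time.perf_counter):
    """Average wall-clock milliseconds per call of render_once()."""
    start = clock()
    for _ in range(frames):
        render_once()
    return (clock() - start) * 1000.0 / frames

def choose_default(supported, avg_ms, budget_ms=33.0, user_override=None):
    """Decide whether a feature should be enabled.

    user_override is True/False when the user forced a choice, None otherwise.
    A forced feature is still only enabled when the hardware supports it.
    """
    if not supported:
        return False
    if user_override is not None:
        return user_override
    return avg_ms <= budget_ms
```

The `clock` parameter exists only so the timing can be replaced in tests; in an application the default is enough.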

devdept
06-01-2009, 03:38 AM
Hi ZbuffeR,


static int kernelSize = 19;
It is so small...

devdept
06-01-2009, 06:22 AM
ZbuffeR,

I understand your point, but if we need to test everything for speed before using it, what are all the version numbers there for?

Thanks,

Alberto

bertgp
06-01-2009, 06:28 AM
Maybe on this platform, separable convolution would be a better fit for your algorithm. This would reduce the number of texture lookups at the expense of an intermediate texture write.

http://http.developer.nvidia.com/GPUGems/gpugems_ch21.html

However, you should find out what the bottleneck is before doing all this.
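
To put numbers on the lookup saving: a full two-dimensional N x N kernel needs N*N texture lookups per pixel, while two one-dimensional passes need N each. The helper names below are hypothetical and the sketch only counts lookups. Note that the shader posted above offsets along one axis only, so it may already be one pass of such a scheme.

```python
def lookups_full_2d(n):
    """Texture lookups per pixel for a single-pass n x n kernel."""
    return n * n

def lookups_separable(n):
    """Texture lookups per pixel for a horizontal pass plus a vertical pass."""
    return 2 * n

n = 19  # kernel size mentioned in the thread
full = lookups_full_2d(n)        # 361
separable = lookups_separable(n) # 38
```

Even in the separable case, each pass still performs 19 lookups per pixel, which is why measuring the actual bottleneck comes first.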

Stephen A
06-01-2009, 06:36 AM
The X300/X550 cards are very, very slow. A 19-tap kernel will basically destroy them.

Unfortunately, version numbers don't tell a whole lot about performance. You either have to measure at runtime, as suggested, or build a list of video cards beforehand.
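
The "list of video cards" option could look roughly like the following sketch. The marker list and function name are hypothetical; in practice the string would come from `glGetString(GL_RENDERER)`, and the list would have to be maintained by hand.

```python
# Hypothetical denylist; substrings of renderer names known to be too slow.
SLOW_RENDERER_MARKERS = ("Radeon X300", "Radeon X550")

def is_known_slow(renderer):
    """True when the renderer string matches an entry of the denylist."""
    name = renderer.lower()
    return any(marker.lower() in name for marker in SLOW_RENDERER_MARKERS)

# Renderer string reported by OpenGL Extensions Viewer earlier in the thread
blur_off_by_default = is_known_slow("ATI Radeon X300/X550/X1050 Series")
```

The comparison is case-insensitive because the same card shows up as both "RADEON X300" and "Radeon X300" in the reports above.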

devdept
06-02-2009, 05:34 AM
OK, so the only viable solution is to test speed at runtime; then we can check and disable blurring.

In general, what is the recovery approach if the time becomes acceptable again? (I know that in the shader case it is impossible to get better results.) You set a flag to false and never do the computation again, but what if the model changes and blurring becomes feasible?


Thanks again,

Alberto

ZbuffeR
06-02-2009, 08:13 AM
You mean, when the user upgrades their video card?
A big fat button labelled "re-detect graphics settings".
Or you can do that silently at each startup (it should be fast).
Or check the GL_VENDOR, GL_RENDERER and GL_VERSION strings; if any of them changes, redo the auto-detect.
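
The last option can be sketched as a saved "fingerprint" that is compared at startup. This is only an illustration: the JSON layout and the function names are assumptions, and the three strings are expected to come from `glGetString`.

```python
import json

def gl_fingerprint(vendor, renderer, version):
    """Bundle the three identifying strings into one comparable value."""
    return {"vendor": vendor, "renderer": renderer, "version": version}

def save_settings(vendor, renderer, version, blur_enabled):
    """Serialize the detected settings together with the fingerprint."""
    return json.dumps({
        "fingerprint": gl_fingerprint(vendor, renderer, version),
        "blur_enabled": blur_enabled,
    })

def needs_redetect(saved_json, vendor, renderer, version):
    """True when nothing usable was saved or any identifying string changed."""
    if not saved_json:
        return True
    try:
        saved = json.loads(saved_json)
    except ValueError:  # corrupt settings file: detect again
        return True
    if not isinstance(saved, dict):
        return True
    return saved.get("fingerprint") != gl_fingerprint(vendor, renderer, version)
```

A driver update changes GL_VERSION, so it also triggers a re-detect, which is usually what you want.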

The best solution will depend on your application.
I still have some trouble understanding what exactly you sell: is it a low-level graphics engine, a scene graph, ...?

devdept
06-02-2009, 09:16 AM
No ZbuffeR,

I mean in general. Suppose you have a very complex scene and you decide to turn off some features to keep navigation fast enough, and then the scene becomes simpler: what is the best approach to re-activate the complex/slow features? If you continuously test how fast the complex features are, you will end up with a low fps.

Do you remember programs that draw boxes instead of objects to allow smooth navigation on slow machines? Suppose the scene becomes simpler while navigating: how do you re-activate the accurate object representation?

Or maybe there is no way, and the user always needs to press a button to change the LOD and get different performance?
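
One possible approach to the question above, sketched under assumptions rather than taken from the thread: reuse the frame time that is measured anyway, and switch with hysteresis. Detail is dropped as soon as a frame is too slow, and restored only after the cheap representation has been comfortably fast for a number of consecutive frames. The class name, thresholds, and `patience` value are all hypothetical.

```python
class DetailController:
    """Toggle a detailed/simplified representation based on frame times."""

    def __init__(self, slow_ms=50.0, fast_ms=20.0, patience=30):
        self.slow_ms = slow_ms    # above this, drop to the simplified view
        self.fast_ms = fast_ms    # below this, the simplified view has headroom
        self.patience = patience  # fast frames required before retrying detail
        self.detailed = True
        self._fast_streak = 0

    def update(self, frame_ms):
        """Feed the time of the frame just drawn; return whether to draw detail next."""
        if self.detailed:
            if frame_ms > self.slow_ms:
                self.detailed = False
                self._fast_streak = 0
        else:
            if frame_ms < self.fast_ms:
                self._fast_streak += 1
            else:
                self._fast_streak = 0
            if self._fast_streak >= self.patience:
                self.detailed = True
                self._fast_streak = 0
        return self.detailed
```

Because no extra test rendering is done, the probing itself costs nothing; the price is an occasional slow frame when detail is retried too early, which the gap between the two thresholds is meant to limit.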

We develop a small software component for visualizing 3D models.


Thanks again ZbuffeR,

Alberto