
View Full Version : VBOs and ATI



Leadwerks
04-03-2008, 09:58 PM
When I disabled VBOs on my ATI Radeon HD 2400 Pro test card, I got about a 1000% increase in framerate, even when drawing a repeating mesh.

Note to self: never use VBOs on ATI hardware.

Also, glGenerateMipmapEXT() does not seem to work on ATI cards. However, their latest drivers seem to have fixed a lot of GLSL problems, so I am calling off the attack on AMD headquarters.

Nicolas Lelong
04-03-2008, 11:21 PM
Just out of curiosity, what are you using instead of VBOs? Display lists?
What was the vertex format you used to fill the VBOs ?

I think it's time for me to have another look at my ATI test cards!...

Leadwerks
04-03-2008, 11:27 PM
Vertex arrays.
The terrain patch I was testing with was just a vertex position buffer, with an interchangeable attribute buffer. Making the attribute buffer a vertex array instead of a VBO gained about 10%. Replacing the position buffer with an array resulted in a huge gain in speed.

The rendering routine was like this:
Set buffers/arrays
change attribute buffer/array
draw
change attribute buffer/array
draw
change attribute buffer/array
draw
change attribute buffer/array
draw
unset buffers/arrays

The fact that the position array was so much faster surprised me, because it is being rendered over and over, something that VBOs are supposed to be very good for.
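
A minimal sketch of the pattern I mean, assuming a compatibility-profile GL 2.0 context; names like positionVBO, patchAttribVBO, indexCount and indices are made up for illustration, not my actual code:

/* One shared position VBO; the per-patch attribute VBO is swapped between draws. */
glBindBuffer(GL_ARRAY_BUFFER, positionVBO);
glVertexPointer(3, GL_FLOAT, 0, 0);              /* position pointer captures positionVBO */
glEnableClientState(GL_VERTEX_ARRAY);
glEnableVertexAttribArray(1);                    /* generic attribute slot for the patch data */

for (int i = 0; i < patchCount; ++i) {
    glBindBuffer(GL_ARRAY_BUFFER, patchAttribVBO[i]);       /* swap the attribute buffer */
    glVertexAttribPointer(1, 2, GL_SHORT, GL_FALSE, 0, 0);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);  /* client-side index array */
}

glDisableVertexAttribArray(1);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);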

ProsperLOADED
04-04-2008, 12:21 AM
What is the format of your position array VBO? Maybe the format isn't natively supported by the card, and the driver has to copy it back to system memory, convert it and send it back to the GPU each time you use it; that would explain the bad performance.

Lord crc
04-04-2008, 12:34 AM
In which case the driver would be retarded. Then again, it's ATI drivers we're talking about here...

When you say "change attribute buffer", Leadwerks, do you mean you have a bunch of static attribute buffers and just change the active one? Or do you dynamically change the buffer?

Komat
04-04-2008, 12:40 AM
ATI cards require all elements to be 32-bit aligned and to have a size which is a multiple of 32 bits (e.g. properly aligned 4 ubytes are fine, 3 ubytes are not). Otherwise the driver must convert the data during drawing, which is very costly when VBOs are used.
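
For example, a hypothetical interleaved layout that satisfies this rule could look like:

/* Hypothetical interleaved vertex for illustration: every element is 4-byte
   aligned and its size is a multiple of 4 bytes. */
struct Vertex {
    float         position[3];   /* 12 bytes                                           */
    float         texcoord[2];   /*  8 bytes                                           */
    unsigned char color[4];      /*  4 bytes: RGBA; 3 ubytes here would break the rule */
};                               /* 24 bytes per vertex in total                       */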

skynet
04-04-2008, 01:19 AM
In which case the driver would be retarded. Then again, it's ATI drivers we're talking about here...

At the time the driver allocates the memory (glBufferData) it has no idea what kind of data will be stored in it. glDrawElements() is the first opportunity for the driver to check what kind of data streams it has to fetch. And in the case of non-hw-supported data formats, the only thing it can do is copy the data back to sysmem, do the conversion on the fly, and then copy it back (into some temporary buffer). Nvidia suffers from the same problems. Just make sure to use common, hw-supported data formats and everything will be fine with VBOs. Another tip: to make sure that the VBO is actually allocated in VRAM, use GL_STATIC_DRAW as the usage hint.
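
A minimal sketch of that, assuming a GL 1.5+ context; 'positions' and 'vertexCount' are made-up names:

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
/* GL_STATIC_DRAW tells the driver the data will not change, so it is free to
   keep the buffer in VRAM. */
glBufferData(GL_ARRAY_BUFFER, vertexCount * 3 * sizeof(GLfloat), positions, GL_STATIC_DRAW);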

Mikkel Gjoel
04-04-2008, 02:21 AM
Oh, wouldn't it be lovely to be able to check these sorts of things. Like a glGetWarning() for really obvious performance issues or something? :p

Relic
04-04-2008, 03:02 AM
It is. :cool: NVIDIA offers that with the PerfKit:
http://developer.nvidia.com/object/glexpert_home.html

havokentity
04-04-2008, 03:19 AM
NVIDIA cards don't slow down when there's no 32-bit alignment?

Relic
04-04-2008, 03:46 AM
They probably do as well.
I always use 4-byte-aligned data anyway, and most of the time everything is floats, which never hits this 3ub snag or other unsupported formats.
If you want colors as unsigned bytes, use 4ub.
The only other format that is reasonably OK is normals as signed shorts, but that has issues representing some values exactly.

Lord crc
04-04-2008, 04:01 AM
And in the case of non-hw-supported data formats, the only thing it can do is copy the data back to sysmem, do the conversion on the fly, and then copy it back (into some temporary buffer).

He said the buffers were static, in which case it is retarded of the driver to not keep the converted buffer. But that's just my humble opinion :)

Komat
04-04-2008, 04:36 AM
NVIDIA cards don't slow down when theres no 32 bit alignment?
When I was hit by this alignment issue a few years ago, Nvidia did not have that limitation. From the threads that appear from time to time on this forum, it seems that this has not changed.

Komat
04-04-2008, 04:40 AM
He said the buffers were static, in which case it is retarded of the driver to not keep the converted buffer. But that's just my humble opinion :)
Interpretation of the VBO depends on the array setup at the time the draw call is issued. This can change between calls even if the content of the buffer does not. You might even render from one buffer with several different setups, and only some of those setups might be incompatible while the others fit within hw limitations.

Unless one specific combination of setup and VBO is easily identifiable by the driver (as is the case with display lists), there is imho no effective way to handle such caching.

Leadwerks
04-04-2008, 08:02 AM
The position array is 12 bytes per vertex, so that is not an issue.

The only time I have run into the 32-bit alignment problem was when I was using RGB byte values for a color array. I simply switched to using RGBA values.

I DO use shorts for the attribute array, and there is an odd number of vertices in each patch, but using a vertex array instead of a VBO makes only a small difference with the attribute array. Switching these values to floats would mean a gigantic increase in memory usage.

Komat
04-04-2008, 08:32 AM
I DO use shorts for the attribute array, and there is an odd number of vertices in each patch, but using a vertex array instead of a VBO makes only a small difference with the attribute array.
Shorts are supported; however, there must be 2 or 4 of them in one element (vertex attribute).
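
So a 1- or 3-short attribute needs padding; declaring it as 2 (or 4) shorts keeps each element a multiple of 32 bits. A sketch, with a made-up attribute index and buffer name:

/* Two shorts per vertex = 4 bytes per element, which satisfies the 32-bit rule. */
glBindBuffer(GL_ARRAY_BUFFER, attributeVBO);
glVertexAttribPointer(1, 2, GL_SHORT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(1);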

Komat
04-04-2008, 08:53 AM
I do not know the current state of things; however, on some older hw, mixing arrays from system memory with arrays from video memory (VBOs) in a single call caused performance issues. The hw required all vertex data to be present in the same type of memory, so the driver had to copy the data as necessary.

Relic
04-04-2008, 09:05 AM
Affirmative. That holds true today. Don't mix attributes in standard vertex arrays with attributes in VBO arrays or you get a performance penalty.
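
In other words, keep every enabled array in a VBO for a given draw call. A sketch with made-up names:

/* Both streams come from VBOs; mixing one VBO with a raw client pointer would
   force the driver to copy data around. */
glBindBuffer(GL_ARRAY_BUFFER, positionVBO);
glVertexPointer(3, GL_FLOAT, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, colorVBO);
glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);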

Lord crc
04-04-2008, 09:19 AM
Unless one specific combination of setup and VBO is easily identifiable by the driver (as is the case with display lists), there is imho no effective way to handle such caching.

It could easily do some "thinking" when it has to convert a buffer, and figure out that perhaps it would be smart to keep the converted buffer as well as the original. That is, "hey I've had to convert the exact same buffer X times in the last N frames, perhaps I should keep the converted one as well".

Anyway, given the current state of things, I can't really blame them for not working their ass off to optimize the bazillion different combinations that OpenGL allows for.

Leadwerks
04-04-2008, 09:29 AM
Vertex attribute buffer + position buffer: 12 FPS
Vertex attribute array + position buffer: 15 FPS
Vertex attribute array + position array: 100 FPS

The attribute array changes for each render (a single draw call), but the position array remains the same for dozens of draw calls.

Komat
04-04-2008, 10:01 AM
What if you render using the VBO position buffer only?

Komat
04-04-2008, 10:27 AM
It could easily do some "thinking" when it has to convert a buffer, and figure out that perhaps it would be smart to keep the converted buffer as well as the original. That is, "hey I've had to convert the exact same buffer X times in the last N frames, perhaps I should keep the converted one as well".

So if one buffer is used by several different setups, you will have it in memory several times. Additionally, you need a heuristic to determine when to release those caches. That heuristic can still interact in unexpected ways with a different program that has the same issue but a different usage pattern. Such a program might then have spikes of low framerate in seemingly random situations.

Leaving it slow, so the developer will notice it and fix the program, is easier and more reliable.

Leadwerks
04-04-2008, 10:49 AM
Here's a test with meshes, which is a bit more conventional rendering:

1000 oildrum meshes, 256 polys each.

VBOs:
21 FPS

Vertex arrays:
19 FPS

So it may be that the way I was rendering terrain was a unique situation.

Lord crc
04-04-2008, 11:04 AM
What if you render using the VBO position buffer only?

And in case you haven't, what if you use the same usage hint for both the position and attribute buffers? Since someone mentioned it is a killer to have the buffers in different types of memory, perhaps that would help.

Lord crc
04-04-2008, 11:10 AM
Such a program might then have spikes of low framerate in seemingly random situations.

Imho that won't be so different from what it is now, where you get constant low framerates in seemingly random situations (hw/driver combinations). Given that you have a bazillion different ways of issuing geometry in OpenGL, I'd prefer it if the driver was a bit smart, since, after all, it's written by someone who knows the hardware.

Jan
04-04-2008, 11:33 AM
Why should the driver be "a little bit smart", only because the one who writes the OpenGL app is "a little less smart"?

It doesn't make sense to optimize for the case where the developer does stupid/wrong things.

Leadwerks: Do you have your indices in a VBO too (preferably as unsigned shorts)? Because with VBOs you _should_ get much better speed compared to conventional arrays, at least with position data only.
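
Something along these lines (just a sketch; 'indices' and 'indexCount' are made-up names):

/* Indices stored as unsigned shorts in an element-array VBO. */
GLuint ibo;
glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexCount * sizeof(GLushort), indices, GL_STATIC_DRAW);
/* Later, with the index buffer still bound: */
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);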

Little hint: Your first post starts with a rant about how bad ATI is, but you haven't really given THAT much information about how you set up the pipeline. In my experience ATI is just as good as nVidia with VBOs, so I would assume your use case is just inefficient / rare. But with the given information I can't really tell.

Jan.

Leadwerks
04-04-2008, 11:37 AM
My post does not start with a rant about how bad ATI is. I said that their latest drivers fixed a lot of problems they had before. I am very happy with ATI right now.

It seems that with conventional mesh rendering, VBOs offer a slight advantage on ATI hardware, and the way I am rendering terrain works better with vertex arrays. I am not complaining, I am just fine-tuning performance for ATI cards.

Komat
04-04-2008, 11:52 AM
I think I know what might be going on. When you use an unsupported format for one of the arrays, the driver switches to a fallback. That fallback is likely written to be compatible with the OpenGL specification while being simple to implement (even if it is slow). One likely implementation matching what you see is that the fallback simply creates a buffer big enough to contain the required number of properly aligned vertices in a proper format, and then one big loop fetches the data for each vertex from the original arrays and stores it into the new buffer. This would mean that as long as there is at least one array in an unsupported format, everything gets copied into that new buffer (and you pay for the readback of the positions from the VBO).

The logical conclusion would be that using a supported format might not only restore speed to the VBO path, it might also increase speed (or decrease CPU consumption) for the standard memory arrays. Of course I might be wrong in my assumption; however, you should really check what happens when you use an aligned number of shorts for the additional streams.

Komat
04-04-2008, 12:04 PM
Imho that won't be so different from what it is now, where you get constant low framerates in seemingly random situations (hw/driver combinations).
While in this case the situation might appear random at first, it is consistent. If you render the mesh with a specific setup, it will be slow (as documented by ATI), as opposed to it being slow only when you look at it for fewer than three frames after not seeing it for one minute.



Given that you have a bazillion different ways of issuing geometry in OpenGL, I'd prefer it if the driver was a bit smart, since, after all, it's written by someone who knows the hardware.
What I would prefer is a nice paper written by the same people describing what you should and should not do to get the best performance. That might be because I was bitten by the Nvidia driver reoptimizing GLSL shaders whenever a vec4 uniform changed.

Lord crc
04-04-2008, 12:14 PM
If you render the mesh with a specific setup, it will be slow (as documented by ATI), as opposed to it being slow only when you look at it for fewer than three frames after not seeing it for one minute.

In which case it shouldn't be slow because you only render 3 frames. And anyway, if it always does that, it's consistent too so ;)

Komat
04-04-2008, 12:21 PM
In which case it shouldn't be slow because you only render 3 frames. And anyway, if it always does that, it's consistent too so ;)
Having three frames rendered at something like 5fps is not nice. You are right, in that sense it is consistent; however, determining that it is caused by the driver hiding the use of an unsupported vertex format might not be easy.

Lord crc
04-04-2008, 12:28 PM
It doesn't make sense to optimize for the case where the developer does stupid/wrong things.

Do let me know how I am supposed to know what the stupid/wrong way is to render various things on ATI/NVIDIA/Intel (taking any driver-specific issues into account). Aside from some general hand-wavy docs, I haven't found anything that really helps me in that area.

For instance, where is it mentioned that if I want to upload a dynamic texture each frame, it's MUCH faster to do it "the old way" than with a PBO, on my 7800 GT / Windows XP / Forceware from around a year ago? IF the texture is 256x256, that is (and only then).
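
(By "the old way" I mean just updating the texture straight from client memory every frame, roughly like this sketch with made-up names:)

glBindTexture(GL_TEXTURE_2D, texture);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_BGRA, GL_UNSIGNED_BYTE, pixels);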

Lord crc
04-04-2008, 12:39 PM
Having three frames rendered at something like 5fps is not nice.

Well if the rest of the frames were rendered @ 10ms each (ie 100fps), the average would still be 43 fps. Sure it'd probably look slightly jerky, but it wouldn't be a constant 12fps (as in the OP's case).


however, determining that it is caused by the driver hiding the use of an unsupported vertex format might not be easy.

True. Then again, if it's that easy to do now, why does the OP still have a problem? ;)

Yeah I know he hasn't shared too many details, I'm just saying that currently it is SO easy to fall into a trap and get bogged down, especially on some platform you don't have direct access to.

I guess what I'm really trying to say is that I can't wait for OpenGL 3 to be released :)

Komat
04-04-2008, 01:03 PM
Well if the rest of the frames were rendered @ 10ms each (ie 100fps), the average would still be 43 fps. Sure it'd probably look slightly jerky, but it wouldn't be a constant 12fps (as in the OP's case).

It might be only jerky, or it might be a serious problem, depending on when it happens. For example, in my problem with the GLSL compiler, rendering stalled for more than one second when the setup of dynamic lights changed to a new configuration (many shaders with many uniforms; I was able to warm them up on ATI, however not on Nvidia). This might be bearable from time to time when you are just looking at the scene; however, the problem was that most changes of dynamic lighting happened during a gunfight, when such a stall is not acceptable.



True. Then again, if it's that easy to do now, why does the OP still have a problem? ;)

Maybe because he has not changed the format to a supported one yet.



Yeah I know he hasn't shared too many details, I'm just saying that currently it is SO easy to fall into a trap and get bogged down, especially on some platform you don't have direct access to.

I know. It happened to me more than once. Sometimes for very stupid reasons :o

Lord crc
04-04-2008, 01:33 PM
It might be only jerky, or it might be a serious problem, depending on when it happens. For example, in my problem with the GLSL compiler, rendering stalled for more than one second when the setup of dynamic lights changed to a new configuration (many shaders with many uniforms; I was able to warm them up on ATI, however not on Nvidia).

Yeah I can see how that can be annoying. Though I would say that in that case, the driver was retarded as well. :)

If it had been a bit smart, it would have first determined whether there was any point in optimizing it (i.e. the uniforms are static enough), and if so, compiled the optimized version in the background, replacing the old one when done.

But I get your point :)

Dark Photon
04-07-2008, 05:57 AM
It is. :cool: NVIDIA offers that with the PerfKit:
http://developer.nvidia.com/object/glexpert_home.html
Yeah, well still waiting on the Linux version that was promised back in Nov '07.

It's a shame because the NVPerfSDK for Linux released back in Sept. '06 was really nice.