glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY)
...
glUnmapBuffer(GL_ARRAY_BUFFER)
Incredibly slow when used to draw a string of characters (textured quads) with a very simple shader.
Any thoughts?
ATI latest drivers! OpenGL 2.1!
I've seen the same thing on an ATI 5450 with the latest drivers. Switching to glBufferSubData() fixed the speed; the only problem then was that glBufferSubData() was fast but failed to update the VBO accurately.
Try creating a new buffer every time instead of mapping the existing one.
glBufferSubData did the trick!
Thanks a lot for your help!
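For anyone landing here with the same problem, here is a rough sketch of the CPU-side half of the glBufferSubData approach: build all the quads for a string in a plain array first, so the whole string goes to the VBO in one call instead of being written through a mapped pointer. The TextVertex layout, the 16x16 fixed-width glyph atlas and the name build_text_quads are assumptions made for illustration, not anything from the posts above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vertex layout: 2D position plus texture coordinate. */
typedef struct { float x, y, u, v; } TextVertex;

enum { VERTS_PER_GLYPH = 6 };  /* two triangles per textured quad */

/* Writes two triangles per character into `out`, assuming a 16x16 ASCII
 * glyph atlas with fixed-width cells (an assumption for this sketch).
 * Stops early if `out` is full. Returns the number of vertices written. */
size_t build_text_quads(const char *text, float x, float y,
                        float glyph_w, float glyph_h,
                        TextVertex *out, size_t max_verts)
{
    const float cell = 1.0f / 16.0f;  /* size of one atlas cell in UV space */
    size_t n = 0;

    assert(text != NULL && out != NULL);

    for (const char *p = text; *p != '\0'; ++p) {
        if (n + VERTS_PER_GLYPH > max_verts)
            break;

        unsigned char c = (unsigned char)*p;
        float u0 = (float)(c % 16) * cell, u1 = u0 + cell;
        float v0 = (float)(c / 16) * cell, v1 = v0 + cell;
        float x0 = x, x1 = x + glyph_w;
        float y0 = y, y1 = y + glyph_h;

        TextVertex quad[VERTS_PER_GLYPH] = {
            { x0, y0, u0, v0 }, { x1, y0, u1, v0 }, { x1, y1, u1, v1 },
            { x0, y0, u0, v0 }, { x1, y1, u1, v1 }, { x0, y1, u0, v1 },
        };
        memcpy(out + n, quad, sizeof quad);

        n += VERTS_PER_GLYPH;
        x += glyph_w;  /* advance the pen by one fixed-width glyph */
    }
    return n;
}
```

With the VBO bound to GL_ARRAY_BUFFER, the upload would then be a single glBufferSubData(GL_ARRAY_BUFFER, 0, n * sizeof(TextVertex), verts). Calling glBufferData with a NULL data pointer first is one way to get the "new buffer every time" effect suggested above; whether either helps on a given driver is something to measure, not assume.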
The buffer object performance Ouija Board strikes again!
Vendors (or ARB): Would you please just tell us how you want us to do this on your hardware so we can make your GPUs look as sexy and competitive as possible?
Reading similar posts about VBO performance I conclude:
This proves that the idea of VBOs (imitating the Direct3D approach) was a horribly bad idea. And the argument that new hardware has changed the way it pulls vertex data is misleading...
I mean, D3D was built from birth on the idea of buffers (execution buffers), and I believe this developed into the relatively more elegant and developer-friendly vertex buffers.
OpenGL was built from the start on the concept of a state machine: switches, on/off, enable/disable, begin/end, which is already very developer-friendly, neat, and performs excellently (Quake, Serious Sam, Doom, etc.; CAD applications, etc.). No one complained.
Now we are heading toward mysterious, awkward approaches that are being abandoned by the other API, adopting all the garbage and replacing the features and functionality that made this API epic for a very long time.
Let the driver take care of this internally. With VBOs we are already copying the vertex data from system memory to the GPU. Forget about dynamic data? A function that draws a mesh or polygons? We MUST copy from CPU to GPU buffers.
Next time remove VBOs and go back to glBegin/End and traditional vertex arrays. They are much faster and more reliable.
Cheers!
Quote: Vendors (or ARB): Would you please just tell us how you want us to do this on your hardware so we can make your GPUs look as sexy and competitive as possible?
First, the ARB can't explain such a thing. The OpenGL specification does not define performance, nor can it.
Second, what exactly does "this" entail? Streaming? What kind of streaming? How frequently are you streaming? Are you talking about vertex formats?
Third, and most important of all, if you want to pressure IHVs to do this, you have to give them incentive. Every application that cops out with display lists or client-side vertex arrays is another reason for IHVs to not bother.
Make your applications rely on buffer object performance, and you'll find out more about it. The squeaky wheel gets the grease.
Quote: I mean, D3D was built from birth on the idea of buffers (execution buffers), and I believe this developed into the relatively more elegant and developer-friendly vertex buffers.
This belief is not congruent with reality.
Execution buffers were an old, old Direct3D 3.0 thing. And Direct3D 3.0 was the first iteration of D3D. The next was 5.0, which abandoned execution buffers in favor of a more OpenGL-like vertex array model.
It wasn't until 7.0 that vertex buffers appeared in D3D. And the primary purpose of that was, well, pretty obvious.
So I have no idea where you got this idea from. Vertex buffers and execution buffers have nothing to do with each other. Execution buffers are more reminiscent of display lists than buffer objects. And display lists are 1.0 functionality.
Quote: OpenGL was built from the start on the concept of a state machine: switches, on/off, enable/disable, begin/end, which is already very developer-friendly, neat, and performs excellently (Quake, Serious Sam, Doom, etc.; CAD applications, etc.). No one complained.
I'm not sure how listing a bunch of games that use OpenGL is an argument that it "performs excellent".
Quake was released in the days before hardware T&L. Without hardware T&L, buffer objects make no sense, because T&L had to be done on the CPU. Why upload vertex data to GPU memory, only to download it right back to the CPU to transform? Serious Sam was released before buffer objects existed. It might have made use of NV_vertex_array_range, which had all of the pitfalls of buffer objects (though only from a single vendor).
Oh, and Doom 3 (I assume you mean Doom 3, since Doom wasn't an OpenGL game)? It uses buffer objects. It also uses NV_vertex_array_range, where applicable. So again, I have no idea what you're talking about.
Quote: Let the driver take care of this internally. With VBOs we are already copying the vertex data from system memory to the GPU. Forget about dynamic data? A function that draws a mesh or polygons? We MUST copy from CPU to GPU buffers.
Do you honestly believe that using regular vertex arrays, or immediate mode, doesn't copy data from the CPU to GPU buffers?
By letting the driver "take care of this internally," you're effectively ensuring that, every frame, you are transferring megabytes of vertex attribute data from the CPU to the GPU across a PCIe bus. You're willing to cede all of that performance?
If I have a million+ vertex model, that uses 64-byte vertex data, I don't want to have to transfer 64MB of data across the PCIe bus every frame. And that goes double if I'm doing shadowing and have to render it twice.
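To put numbers on that, here is a small sketch of the arithmetic. The function names are made up for illustration, and the 60 frames per second used in the note below is an assumption, not a figure from the post.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of vertex data that cross the bus in one frame if nothing stays
 * resident in GPU memory and the model is submitted `passes` times. */
uint64_t vertex_bytes_per_frame(uint64_t vertex_count,
                                uint64_t bytes_per_vertex,
                                uint64_t passes)
{
    assert(passes > 0);
    return vertex_count * bytes_per_vertex * passes;
}

/* Sustained bus traffic in bytes per second at a given frame rate. */
uint64_t vertex_bytes_per_second(uint64_t bytes_per_frame,
                                 uint64_t frames_per_second)
{
    return bytes_per_frame * frames_per_second;
}
```

For 1,000,000 vertices at 64 bytes each, one pass is 64,000,000 bytes (64 MB) per frame and a second shadow pass doubles that to 128 MB. At an assumed 60 frames per second, the two-pass case works out to 7,680,000,000 bytes, roughly 7.7 GB, of vertex traffic every second.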
Quote: Next time remove VBOs and go back to glBegin/End and traditional vertex arrays. They are much faster and more reliable.
No, they aren't. The only thing that can be said to be consistently faster than buffer objects is display lists, and that is only on some hardware.
The simple fact is this: the closer to the hardware you get, the greater the chance that you'll fall off the fast path. However, the rewards are also greater if you happen to remain on it.
Client vertex arrays will give generally consistent levels of performance. But they will never give as good performance as proper use of buffer objects will. However, you can do the wrong thing with buffer objects and get poor performance. That's the nature of getting low level.
It's like instruction scheduling. If you could write to shader assembly directly, it's possible you could beat the compiler/linker's scheduling and improve performance. However, you might also fail miserably at this task, thus making your performance worse.
Replying to Dark Photon: Seconded. I've really come to believe that the VBO API is just not well conceived or well thought-through. It badly needs to be much clearer what happens, when it happens and under what conditions it happens all the way through. A more prescriptive specification that maybe needs to abandon some of the core philosophies of OpenGL (don't sweat the hardware details, let the driver handle it)? Maybe.
Quote: But they will never give as good performance as proper use of buffer objects will.
What's "proper" use of buffer objects?
IHV-Driver-Bug Dependent?
Any official guidelines?
Quote: What's "proper" use of buffer objects?
Time once was that using the wrong vertex format on the wrong hardware with client arrays would kill your performance. When compiled vertex arrays were en-vogue, there was precisely one vertex format that was accelerated: the one that Quake 3 was using. If you used anything else, your performance died on the spot.
My point is that this is not some new problem that buffer objects created. This has always been around. Finding the fast path has always been fraught with peril and potential performance disaster. Going low-level means taking performance into your own hands.
Longs Peak was going to introduce a vertex format object, which drivers could refuse to create if the vertex format was sub-optimal (similar to GL_FRAMEBUFFER_UNSUPPORTED). But that died.