Consolidated Vertex Array with ModelView Matrix

Hi, just want to check if this is possible.

In our application, there are many small animated, non-deformable objects (their meshes’ vertex/normal/UV data etc. never change), but each object has a different transformation (ModelView matrix) each frame.

Currently we group those objects’ meshes into a single VBO set if they share the same material/texture/effects. At render time we only need to set up the VBO/material/texture once for the whole group, then update the ModelView matrix for each mesh and draw the meshes one by one. This gives us a certain amount of performance increase.

But we still need to set up the matrix once per object, and this could be improved if I used a vertex program. There I could pack all of the objects’ matrices into an array buffer and build a vertex->matrix mapping table; inside the vertex program I could look up the matrix for each vertex via this mapping table and do the transformation accordingly.

The problem is that our application mostly uses the OpenGL fixed-function pipeline rather than the vertex/pixel program pipeline.

So in this case, if OpenGL could provide an extension to support this matrix array buffer and the vertex->matrix mapping array buffer, it would be a big gain for us.

What do you think? Thanks.

What do you think?

This is a waste of time.

This is something you could do easily and trivially with uniform buffers or buffer textures (depending on how many such objects there are). The only reason you can’t is because you’re relying on the fixed function pipeline.

You cannot reasonably expect to take full advantage of all of the capabilities of modern, shader-based GPUs from the fixed function pipeline. The ARB is not going to create extensions solely for the benefit of fixed-function users. Not unless there is nothing for anyone else to do.
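
To make that concrete, here is a minimal sketch of the uniform-buffer version, assuming GL 3.1 / ARB_uniform_buffer_object; the block name ObjectMatrices, the a_matrixIndex attribute and the 256-matrix cap are illustrative choices on my part, not anything the spec prescribes:

#include <GL/glew.h>   /* or any loader that provides GL 3.1 entry points */

/* Vertex shader: one uniform block holding an array of modelview matrices,
   selected per vertex by an index attribute. */
static const char *vs_src =
    "#version 140\n"
    "layout(std140) uniform ObjectMatrices { mat4 modelview[256]; };\n"
    "uniform mat4 projection;\n"
    "in vec3 a_position;\n"
    "in float a_matrixIndex;   /* which object's matrix this vertex uses */\n"
    "void main() {\n"
    "    gl_Position = projection * modelview[int(a_matrixIndex)]\n"
    "                             * vec4(a_position, 1.0);\n"
    "}\n";

/* CPU side: allocate the UBO once, refill it with this frame's matrices via
   glBufferSubData, and bind it to binding point 0 (the program's block must be
   linked to that point with glUniformBlockBinding). */
GLuint createMatrixUBO(void)
{
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, 256 * 16 * sizeof(GLfloat), NULL, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);   /* binding point 0 */
    return ubo;
}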

Take a look at: GL_ARB_matrix_palette
You will still need to break up your calls somewhat, as it only supports a finite number of matrices.

Maybe he is targeting GL 2.1 or GL 1.3? Maybe he wants to port his code to OpenGL ES1 eventually. Maybe it is part of a large legacy code base. As a side note, there is an OpenGL ES1 extension with similar functionality (GL_OES_matrix_palette).

As for “trivial” with uniform buffers, that is not at all true; there is a very finite amount of room in uniform buffers.

On a side note, if you do end up going whole hog with the programmable pipeline, I advise the following:

If the matrices are a composition of a common perspective matrix and transformation matrices that are angle preserving, then you can pack each transform into a translation, a rotation (represented as a quaternion) and a scaling factor, which packs nicely into 2 vec4’s. Then I’d advise first packing these into a uniform array (again, you will need to break up the calls), which will keep it GL2.x compatible. If you are going for GL3.x, then I’d pack the values into a buffer object and use a texture buffer object to read them. At that point you will need to benchmark whether doing 2 lookups + uglier arithmetic in the vertex shader is cheaper or more expensive than doing 4 lookups + a simple MAD in the vertex shader.
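
To make that concrete, here is a rough GL2.x-style sketch of the two-vec4 packing, assuming a unit quaternion, a uniform scale and a common projection matrix; the names u_transforms and a_index and the 64-object array size are illustrative only:

/* One object = 2 vec4s: [i] = rotation quaternion (x, y, z, w),
   [i+1] = translation.xyz + uniform scale in .w. */
static const char *vs_packed =
    "#version 120\n"
    "uniform mat4 projection;             /* common projection matrix */\n"
    "uniform vec4 u_transforms[2 * 64];   /* size against GL_MAX_VERTEX_UNIFORM_COMPONENTS */\n"
    "attribute vec3 a_position;\n"
    "attribute float a_index;             /* per-vertex object index */\n"
    "vec3 quatRotate(vec4 q, vec3 v) {\n"
    "    /* rotate v by unit quaternion q */\n"
    "    return v + 2.0 * cross(q.xyz, cross(q.xyz, v) + q.w * v);\n"
    "}\n"
    "void main() {\n"
    "    int  i   = int(a_index) * 2;\n"
    "    vec4 rot = u_transforms[i];\n"
    "    vec4 ts  = u_transforms[i + 1];   /* xyz = translation, w = scale */\n"
    "    vec3 p   = quatRotate(rot, a_position * ts.w) + ts.xyz;\n"
    "    gl_Position = projection * vec4(p, 1.0);\n"
    "}\n";

The GL3.x variant would instead fetch the same two vec4’s with texelFetch from a samplerBuffer bound to the buffer object, which is where the 2-lookups-versus-4-lookups benchmark mentioned above comes in.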

Maybe he is targeting GL 2.1 or GL 1.3?

Most if not all OpenGL 2.1 hardware is not being supported by either ATI or NVIDIA anymore (no new drivers). And Intel doesn’t seem to care enough about OpenGL to implement 2.1 properly, let alone some random extension.

So the extension will not be available on that hardware. And since 2.1 hardware is (for the most part) rather much slower than 3.0+ hardware, that’s where the optimization is needed most.

Maybe he wants to port his code to OpenGL ES1 eventually.

But ES 1.1 won’t have this functionality, which his engine will be relying on.

Maybe it is part of a large legacy code base.

Unfortunately for those of us who use shader-based OpenGL, OpenGL features are very clearly designed to ease the upgrading of legacy codebases. Moving to shaders can easily be done on a per-object basis as needed.

As for “trivial” with uniform buffers, that is not at all true; there is a very finite amount of room in uniform buffers.

I said trivial to implement.

Plus, if UBO access is significantly faster than buffer texture access, then you’d get plenty of speedup using the largest UBOs you can, but making several draw calls. You should easily be able to get 1000 modelview matrices into a UBO (2x that if you compress them as you suggest). That reduces the number of draw calls by 3 orders of magnitude.

Sure, it’s not the same as making 1 draw call. But it’s a lot better than one per object. And it may only be as large as 2-3 draw calls, depending on how many individual objects you have.
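
Roughly, the chunked approach could look like the sketch below; MATRICES_PER_CHUNK and the drawObjectRange() helper are hypothetical names, and the shader’s uniform block would be declared with MATRICES_PER_CHUNK matrices:

#define MATRICES_PER_CHUNK 1000   /* ~64 KB of mat4s; check GL_MAX_UNIFORM_BLOCK_SIZE */

void drawAllObjects(GLuint matrixBuffer, GLsizei objectCount)
{
    GLsizei base;
    for (base = 0; base < objectCount; base += MATRICES_PER_CHUNK) {
        GLsizei count = objectCount - base;
        if (count > MATRICES_PER_CHUNK)
            count = MATRICES_PER_CHUNK;

        /* Expose matrices [base, base + count) as indices 0..count-1 in the block.
           The byte offset must respect GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT. */
        glBindBufferRange(GL_UNIFORM_BUFFER, 0, matrixBuffer,
                          (GLintptr)base * 16 * sizeof(GLfloat),
                          (GLsizeiptr)count * 16 * sizeof(GLfloat));

        /* hypothetical helper: one draw call covering the meshes whose matrices
           live in this chunk */
        drawObjectRange(base, count);
    }
}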

If the matrices are a composition of a common perspective matrix and transformation matrices that are angle preserving, then you can pack each transform into a translation, a rotation (represented as a quaternion) and a scaling factor, which packs nicely into 2 vec4’s.

The projection matrix should be a separate matrix that is not part of the data. So it doesn’t matter what kind of projection matrix you use, only what your modelview is.

Seconded. This is the most sensible course of action. If nothing else, it’s highly doubtful that any more FFP extensions will ever be released.

Could the OP perhaps confirm the target hardware?

But ES 1.1 won’t have this functionality, which his engine will be relying on.

Some OpenGL ES 1.1 implementations do have this functionality, that is why the extension GL_OES_matrix_palette is around.

The projection matrix should be a separate matrix that is not part of the data. So it doesn’t matter what kind of projection matrix you use, only what your modelview is.

Sighs, I have a suspicion that you did not get what I was driving at. Oh well. Firstly: the point was that for each object i, its matrix is of the form P*M_i, where P is a common perspective matrix and M_i is a transformation produced from a composition of a rotation, scaling and translation. If that is the case, then one could store that M_i in 2 vec4’s and P as a common mat4x4 uniform. However, if different objects have different projection matrices, or if their transformation in 3-space is not given by a composition of a rotation, scaling and translation, or if the transformation matrix is truly wacky, then my suggestion does not apply.

Most if not all OpenGL 2.1 hardware is not being supported by either ATI or NVIDIA anymore (no new drivers). And Intel doesn’t seem to care enough about OpenGL to implement 2.1 properly, let alone some random extension.

Comment 1: that is downright wrong. The unified driver from NVIDIA covers the GeForce 6 through GeForce 5xx series, and the GeForce 6 is a GL2.1 part. Indeed, NVIDIA added the extensions ARB_debug_output, ARB_ES2_compatibility and ARB_separate_shader_objects for the GeForce 6 series when the GL4.1 spec came out.

Comment 2: If you need to support Intel hardware, then the best you can rely on is GL2.1, regardless of how poor it is. Moreover, if it is a large block of code already, “porting/updating” it to avoid the fixed function pipeline is a significant amount of work.
Lastly, the attitude that “fixed function code = not good code” is a terrible one. For older hardware, the fixed function pipeline is much more reliable; even on modern hardware we have all come across limitations (or downright bugs) on the programmable side (typically the GLSL compiler), whereas bugs encountered only in the fixed function pipeline are much rarer.

At any rate, there is already a GL extension out there with exactly what the poster asked for; whether or not his target platform has that extension, he will see.

Some OpenGL ES 1.1 implementations do have this functionality, that is why the extension GL_OES_matrix_palette is around.

It’s not the same thing. The most crucial difference is that the palette is not array state. The indices into the palette are, but not the matrices themselves. So you would have to modify the matrices with the regular matrix functions.
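
For illustration, loading the palette on an ES 1.x implementation that exposes OES_matrix_palette would look roughly like this (the loadPalette() helper is just an illustrative name; on some platforms the extension’s entry points must be fetched through eglGetProcAddress):

#include <GLES/gl.h>
#include <GLES/glext.h>

/* The matrix *indices* are array state, but each palette matrix still has to be
   loaded one at a time through the ordinary matrix calls. */
void loadPalette(const GLfloat *matrices, int count)   /* 16 floats per matrix */
{
    int i;
    glMatrixMode(GL_MATRIX_PALETTE_OES);
    for (i = 0; i < count; ++i) {
        glCurrentPaletteMatrixOES(i);        /* select palette slot i */
        glLoadMatrixf(matrices + 16 * i);    /* regular matrix call, once per matrix */
    }
    glMatrixMode(GL_MODELVIEW);
}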

Also, the OES_matrix_palette extension seems to have low minimum requirements. 9 matrices total. While implementations can increase this number, the equivalent ARB extension required a minimum of 32. That’s not exactly encouraging.

Comment 1: that is downright wrong. The unified driver from NVIDIA covers the GeForce 6 through GeForce 5xx series, and the GeForce 6 is a GL2.1 part. Indeed, NVIDIA added the extensions ARB_debug_output, ARB_ES2_compatibility and ARB_separate_shader_objects for the GeForce 6 series when the GL4.1 spec came out.

That only makes it 1/3rd wrong. The 1/3rd being about NVIDIA. ATI hasn’t updated their non-GL3.0 hardware drivers since February of last year. And what I said about Intel is still quite true.

Moreover, if it is a large block of code already, “porting/updating” it to avoid the fixed function pipeline is a significant amount of work.

Did you expect high performance to not have a programmer cost associated with it? If you want performance, you have to work for it. In the best cases, it means writing a mesh optimizer to rearrange your vertex data. In the worst cases, it means having to upgrade lots of old code.

For older hardware, the fixed function pipeline is much more reliable; even on modern hardware we have all come across limitations (or downright bugs) on the programmable side (typically the GLSL compiler), whereas bugs encountered only in the fixed function pipeline are much rarer.

Modern hardware doesn’t have limitations or bugs (not the ones you’re talking about); modern OpenGL implementations do. It is a software problem, primarily due to the fact that nobody cares about desktop OpenGL implementations. It has nothing to do with hardware.

The more people avoid GLSL for the fixed-function pipeline, the less reason the IHVs will have to care about getting their programmable pipeline issues worked out. Most of the bugs and issues are just oversights: things they didn’t test for. The best way to encourage them to fix their stuff is to exercise it as much as possible.

At any rate, there is already a GL extension out there with exactly what the poster asked for; whether or not his target platform has that extension, he will see.

If you’re talking about ARB_vertex_blend combined with ARB_matrix_palette, then I would point out that, according to the OpenGL extension viewer’s database (which has entries going back to the Ti 4xxx and Radeon 7xxx days), nothing supports ARB_matrix_palette. And ARB_vertex_blend without the matrix palette would massively bloat his per-vertex attribute data.

Also, the OES_matrix_palette extension seems to have low minimum requirements. 9 matrices total. While implementations can increase this number, the equivalent ARB extension required a minimum of 32. That’s not exactly encouraging.

Gee check this as well: GL_OES_extended_matrix_palette.

At any rate, the debate over matrix palette is pointless on the desktop: Apple does not have it, AMD does not have it and NVIDIA does not have it. The ones that do have it are ironically over in GLES-land.

Modern hardware doesn’t have limitations or bugs (not the ones you’re talking about); modern OpenGL implementations do. It is a software problem, primarily due to the fact that nobody cares about desktop OpenGL implementations. It has nothing to do with hardware.

ROFL. Modern hardware most definitely does have bugs; it is not just driver bugs that foul folks up. Though not a graphics situation, go do a search for bugs in ARM CPUs. At any rate, an end developer cares mostly about whether or not there is a bug in the implementation that harms them, not about the source of the bug. An end user will not care whose fault it is if they start up your product and it bombs. If the driver is at fault, it is a truly unpleasant experience to prove that and keep your customer.

But that misses something else: plenty of people do care about desktop OpenGL implementations. Why else would NVIDIA and ATI participate in the GL standards and funnel so much work into their GL implementations? Because people do care and people do use it. Bugs happen and get reported, but being on the bleeding edge means being exposed to more bugs. This is why, when targeting GL for widespread use, GL2.1 is often the sweet spot. But that does not sound so crazy: lots of development in D3D land is Direct3D 9 (not 10 and not 11), which corresponds to GL2.1.

Did you expect high performance to not have a programmer cost associated with it? If you want performance, you have to work for it. In the best cases, it means writing a mesh optimizer to rearrange your vertex data. In the worst cases, it means having to upgrade lots of old code.

This is degenerating into a flame war but I’m game today. Ever worked on a large project with tons of code? Ever have to work with other people? Ever have to run on older hardware? All of these are reasons to think carefully about using GL3 and higher. Compounding the issue, under certain circumstances the fixed function pipeline will perform better than the programmable pipeline. Gee, imagine that: a highly tuned piece of functionality in a GL implementation.

I am all for folks using GL3 and higher more often, but one needs to assess the situation. I am all for people pushing the implementations. I am all for using the programmable part. At least I have the ability to realize that there are situations where one cannot just up and change a large block of code and dramatically increase the hardware requirements.

I would argue that at this point in time one can count on GL2.1. At this point the original poster is likely laughing at the flame war, I know I would.

Apologies for the late reply; I lost track of the thread a few weeks back.

First off:

This is degenerating into a flame war but I’m game today.

This isn’t a flame war. A flame war is when people are insulting one another back and forth.

We are having an argument. We are presenting different sides, offering evidence and argument for our different positions. This is what a forum is for.

Moving on:

Gee check this as well: GL_OES_extended_matrix_palette.

Wow, I don’t think I’ve ever seen a more worthless extension. Not the matrix palette functionality; when dealing with the FFP, that has some merit. Just the fact that they made an extension for the sole purpose of increasing the minimum of a queryable value.

The ARB has made some extensions of questionable merit, but I don’t think even they’ve ever made one solely to increase a minimum implementation-dependent value before.

Can’t GLES users just ask what the max number of matrices is? Isn’t that easier than looking for an extension?

But that does not sound so crazy: lots of development in D3D land is Direct3D 9 (not 10 and not 11), which corresponds to GL2.1.

Yes, there is. But your analogy misses a crucial point: D3D9 is supported on all Windows OSes, while D3D10+ is only supported on Vista or better. The simple fact of the matter is that there is a lot of D3D 10 capable hardware out there that cannot be used with D3D 10 because the user is running XP.

This is a rather large issue. One that OpenGL doesn’t have. GL 3.x runs just fine under XP.

So when you’re asking yourself whether you want to restrict yourself to GL 3.x, you aren’t also saying that all XP users will be cut out of the loop like you would in D3D land.

Look at Valve’s hardware survey. A developer might be willing to overlook the bottom 20% who use DX9-only hardware. But it’d be much harder to overlook the bottom 36% who only have access to the DX9 API, even though almost half of them have DX10 hardware.

One fifth vs. one third. That’s a pretty substantial difference.

Ever worked on a large project with tons of code? Ever have to work with other people? Ever have to run on older hardware? All of these are reasons to think carefully about using GL3 and higher.

Yes to all of the above. But you said, “Moreover, if it is a large block of code already, ‘porting/updating’ it to avoid the fixed function pipeline is a significant amount of work.” That’s not citing a lot of reasons; that’s only citing one reason.

My point was that if you are doing performance optimizations, then you had better be prepared to do “a significant amount of work” regardless of what those particular optimizations are, whether it’s dropping the FFP for shaders in some places, optimizing your vertex formats, etc. Something being “a significant amount of work” does not mean you shouldn’t do it if you need the performance.

Performance takes effort to achieve.

The greater hardware requirements are a valid point, but it would be just as valid a point if the OP’s suggested extension came to be, because GL 2.1 hardware is simply not capable of what he’s asking. Thus it would only ever be implemented on 3.x-class hardware, and the optimization would still require 3.x hardware.