If a specific part of my vertex program does the same thing for every vertex, e.g. transforms a vertex with several matrices, it would be faster to multiply those matrices once on the CPU and track the result to the vp.
The reason for this is that on the CPU I perform the matrix multiplications (M = M_0 * M_1 * … * M_n) only once, but in a vertex program I have to write something like this:
vNew = M_n * v
vNew = M_(n-1) * vNew
…
vNew = M_0 * vNew
…
which is very slow, because I do this for each vertex.
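The folding described above can be sketched in host code. This is a minimal C sketch, not anything from the thread: the helper names (mat4_mul, mat4_combine) are my own, and it assumes column-major float[16] matrices, matching OpenGL's convention. The point is that the whole chain collapses into one composite matrix once per frame, so each vertex only ever sees a single transform.

```c
#include <stddef.h>

/* out = a * b for column-major 4x4 matrices: element (row, col) lives
 * at index col*4 + row, as in OpenGL. */
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = s;
        }
}

/* Fold M_0 * M_1 * ... * M_(n-1) into one composite matrix, once on
 * the CPU, instead of chaining n transforms per vertex on the GPU. */
static void mat4_combine(float out[16], const float (*m)[16], size_t n)
{
    float tmp[16];
    for (int i = 0; i < 16; ++i)
        out[i] = m[0][i];
    for (size_t i = 1; i < n; ++i) {
        mat4_mul(tmp, out, m[i]);
        for (int j = 0; j < 16; ++j)
            out[j] = tmp[j];
    }
}
```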
Well, the GPU is VERY fast at matrix multiplies (it's made for exactly that kind of thing), but if these matrices never change and only need to be multiplied together once in your program, then combining them on the CPU and doing only one matrix multiply on the GPU would be faster. Now, if you have to multiply these matrices together on the CPU every frame, as you say, then this would end up slower.
Yes, the matrices change from frame to frame, but within a frame they don’t change.
Currently I multiply them on the CPU and track them to my vp, because I think performing those multiplications for each vertex would slow down my app enormously. Right?
So I assume that performing those multiplications only once per frame on the CPU is much faster than performing them for each vertex on the GPU.
[This message has been edited by A027298 (edited 12-31-2002).]
I would suggest always premultiplying your matrices. There may be a point where this isn't going to be profitable (say, if your matrix changes every couple of vertices), but as a general rule you're going to come out ahead if you avoid doing any more per-vertex work than you have to.
An example of this occurs every day in the object-space-to-clip-space transformation.
It’s perfectly legal to do the object space to clip space transformation by multiplying by both the modelview and projection matrices, but you’ll get better performance by premultiplying the modelview and projection matrices (even if the modelview matrix changes often) and using this composite matrix to avoid one per-vertex matrix multiplication. This is exactly why there are “state.matrix.mvp” vertex program bindings.
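The claim above is easy to check numerically: (P * M) * v gives the same clip-space result as P * (M * v), so premultiplying the projection and modelview matrices trades a per-vertex matrix-vector product for a once-per-frame matrix-matrix product. This is a minimal C sketch under the same assumptions as before (column-major float[16] matrices, OpenGL's convention; the helper names are my own):

```c
/* out = a * b, column-major: element (row, col) at index col*4 + row. */
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = s;
        }
}

/* out = m * v: this is the work that happens once per vertex, so we
 * want exactly one of these, not one per matrix in the chain. */
static void mat4_xform(float out[4], const float m[16], const float v[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int k = 0; k < 4; ++k)
            out[row] += m[k * 4 + row] * v[k];
    }
}
```

Associativity of matrix multiplication is what makes the composite legal: the vertex never notices whether it was transformed by the chain or by the premultiplied result.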
In NV_vertex_program you use them to change the contents of the constant registers. For example, you could transform a light vector stored in c[0] by some matrix. They are invoked explicitly by your OpenGL program and, if I'm not mistaken, are always executed on the CPU.
Cheers.
I assume that by "light vector" you mean the direction of the light?! OK, it was just an example, but why would you want to change this with a matrix in a vertex state program?
You could use the state program to convert the light's world position into object space before rendering each object.
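Whether done in a state program or on the CPU, that conversion is a once-per-object job: invert the object's model transform and push the light's world position through it. This C sketch assumes the model matrix is a rigid transform (rotation R plus translation t, stored column-major as float[16]), so the inverse is just R^T and -R^T t; the helper names are hypothetical, not from the thread:

```c
/* Invert a rigid (rotation + translation) column-major 4x4 matrix:
 * inverse rotation is the transpose, inverse translation is -R^T t. */
static void rigid_inverse(float out[16], const float m[16])
{
    /* transpose the 3x3 rotation block */
    for (int c = 0; c < 3; ++c)
        for (int r = 0; r < 3; ++r)
            out[c * 4 + r] = m[r * 4 + c];
    /* new translation = -R^T * t */
    for (int r = 0; r < 3; ++r)
        out[12 + r] = -(out[0 * 4 + r] * m[12] +
                        out[1 * 4 + r] * m[13] +
                        out[2 * 4 + r] * m[14]);
    out[3] = out[7] = out[11] = 0.0f;
    out[15] = 1.0f;
}

/* out = m * v, column-major: carries the light's world-space position
 * into the object's local space, once per object rather than per vertex. */
static void mat4_xform(float out[4], const float m[16], const float v[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int k = 0; k < 4; ++k)
            out[row] += m[k * 4 + row] * v[k];
    }
}
```

The object-space light position can then be loaded into a constant register, and the per-vertex lighting math never has to touch the model matrix at all.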
Shame it's gone from ARB_vp; I used to like them. It meant I didn't even have to bother doing the matrix transforms myself when I could use state programs.
Yeah, I liked state programs too. I was disappointed when they said state programs would not be in ARB_vertex_program. The spec says they were not useful, which I find to be a crock, because I know I had a need and use for them. For example, transforming a light into object space doesn't need to be done for every vertex; it only needs to be done once per object, so state programs were useful there. I can see other uses too, like combining matrices that only need to be combined once per object, as the original poster of this thread wants.
-SirKnight
[This message has been edited by SirKnight (edited 01-01-2003).]
OK… using the ARB extension is better. But I see that a vertex program can only contain 128 instructions. That's not too many, because I want to do some matrix calculations. I'm not sure, but for one matrix-matrix multiplication I naively need 16 DP4 instructions, one to compute each component of the result matrix. I have to handle about 8 matrices, plus some additional calculations to build the components of some matrices. Hence 128 instructions are not enough… so, back to the CPU
[This message has been edited by A027298 (edited 01-02-2003).]