The difference between row major and column major is merely a matter of populating the array horizontally or vertically. Converting between them is just a transpose, so it should be straightforward.
The LookAt function just creates a 4 by 4 matrix where the last row or column is a (4 value) position and the first 3 are the mutually perpendicular vectors representing a “private” axis. Notice I don’t care whether we’re talking about columns or rows here as long as we pick one and stick with it throughout. The inner 3 by 3 is just the mutually perpendicular vectors with no position. But the 4 by 4 can work with position, orientation, and scale all simultaneously. Combine it with a rotation matrix and it will rotate the position, orientation, and scale (which can be a problem if you don’t do it right: imagine scaling the vertices away from the world origin instead of the object’s center).
So, LookAt takes 3 parameters: the camera position, the place to look towards, and “Up”. Up is just the direction “above” the camera.
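As a rough illustration of the row major versus column major point, here is a minimal Python sketch. The helper names (flatten_row_major, flatten_column_major, transpose) are made up for this example and do not come from any particular library.

```python
# A 4x4 matrix stored as nested lists, indexed as M[row][col].
M = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

def flatten_row_major(m):
    """Walk each row left to right, then move down to the next row."""
    return [m[r][c] for r in range(4) for c in range(4)]

def flatten_column_major(m):
    """Walk each column top to bottom, then move right to the next column."""
    return [m[r][c] for c in range(4) for r in range(4)]

def transpose(m):
    """Swap rows and columns."""
    return [[m[c][r] for c in range(4)] for r in range(4)]

row_major = flatten_row_major(M)
col_major = flatten_column_major(M)

# Storing the transpose in row-major order gives the same flat array
# as storing the original in column-major order.
converted = flatten_row_major(transpose(M))
```

So converting a flat array from one layout to the other is the same as transposing the matrix, which is why it is easy as long as you know which layout your math library expects.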
Before you really get started on this you need to know vector algebra. If you’re shaky on that, that’s where you need to start and I suggest my Vector video on my YouTube channel. It’s two hours long, but I think it’s well worth your time if you don’t have a really solid understanding of vectors already. It’s long, but I’m trying to shove an entire semester of Linear Algebra into about 3 hours of video including my Matrix video. The Matrix video actually briefly covers LookAt around 50 minutes in. Again, that’s probably well worth your time if you don’t already have a strong understanding of matrices and how they are used in game programming.
But anyway, the 3 by 3 portion of the 4 by 4 matrix holds 3 mutually perpendicular vectors that represent a “private” X, Y, and Z axis. They have to remain mutually perpendicular at all times just like any other 3D axis. This is much easier to do than it sounds, because they almost never lose their relationship to one another as long as you do the math as matrix algebra and merely multiply one matrix times another. A rotation matrix times an object’s matrix will rotate all 3 axes together simultaneously and thus preserve their relationship to one another without you having to do anything to maintain that relationship.
That “private” axis in the matrix describes a “difference” from the world axes. So, when it aligns with the world axes, it is a matrix that changes nothing, also known as an “identity matrix”. Rotate it and it describes an orientation change from the world axes. Translate it (with a 4 by 4 matrix) and the position value will describe an offset from the world origin.
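To make that concrete, here is a small Python sketch that rotates a 3 by 3 orientation many times and then checks that the three axes are still perpendicular and still length 1. It assumes the axes are stored as the columns of the matrix; the function names are just for this example.

```python
import math

def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def column(m, j):
    """Pull one axis (column j) out of the matrix."""
    return [m[i][j] for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Start aligned with the world axes (identity), then rotate repeatedly.
orientation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for _ in range(100):
    orientation = mat_mul(rotation_y(0.01), orientation)
    orientation = mat_mul(rotation_x(0.007), orientation)

x_axis = column(orientation, 0)
y_axis = column(orientation, 1)
z_axis = column(orientation, 2)

# Dot products near 0 mean the axes are still perpendicular.
perpendicular_error = max(abs(dot(x_axis, y_axis)),
                          abs(dot(y_axis, z_axis)),
                          abs(dot(z_axis, x_axis)))
```

The only drift you would expect here is floating point rounding, which is tiny over a few hundred multiplications.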
This allows you to place objects into your world.
When you import your model, it’s going to come from a file. That file is going to have all the vertices in the positions they had in your modeling program when you modeled it. Generally, that’s going to have the model centered on the world origin. So, all your models will be on the world origin on top of one another if this is all you did. The object’s world matrix allows you to have a matrix that holds the position and orientation for each model. Each model gets at least one (more matrices per model is a slightly more advanced topic - starting off just do one per object).
There are 3 changes you need to apply to the positions of the model’s vertices in order to make the miracle of 3D programming. The first (world/object matrix) places the vertices of the model into the scene. The second (view matrix) repositions them to simulate a camera. The third (projection matrix) projects them to a 2D surface so that they can be drawn on your 2D computer screen (the back buffer actually, which is just a 2D array of pixel colors that will be used to draw the screen).
You can make a 4 by 4 matrix for each of the object’s world matrix, the view matrix, and the projection matrix. Then you can combine those 3 into one matrix that contains all that information in a single matrix. Then you can apply it to every vertex in the model one at a time (or in reality the graphics card does massive parallelism where it processes hundreds or thousands of vertices simultaneously all with the same world-view-projection matrix). After the divide by w that the hardware does for you, that vertex will be positioned perfectly on the 2D screen so that you can draw triangles between the vertices and shade them in (rasterization). And your 3D world will magically appear on the 2D computer screen.
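Here is a minimal Python sketch of one model being placed in two spots by two different world matrices. It assumes a column-vector convention (the position lives in the last column) and uses translation only; the vertex data and names are invented for the example.

```python
def translation(x, y, z):
    """4x4 world matrix that only moves things (no rotation or scale)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(m, v):
    """Apply a 4x4 matrix to a 4-value vertex (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# One triangle as it came out of the modeling program, sitting on the origin.
# w = 1 because these are positions.
model_vertices = [
    [-1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]

identity = translation(0.0, 0.0, 0.0)
world_a = translation(10.0, 0.0, 0.0)
world_b = translation(-5.0, 0.0, 20.0)

# Same model data, three placements. The original list is never modified.
at_origin = [transform(identity, v) for v in model_vertices]
instance_a = [transform(world_a, v) for v in model_vertices]
instance_b = [transform(world_b, v) for v in model_vertices]
```

The identity matrix leaves the model where it was modeled, and each extra instance only costs one more matrix rather than another copy of the vertices.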
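A rough Python sketch of combining the three matrices into one and applying it to a vertex follows. The assumptions are mine, not from any specific engine: column vectors (so the combined matrix is projection times view times world), an OpenGL-style perspective matrix where the camera looks down negative Z, and made-up placement values.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Apply a 4x4 matrix to a 4-value vertex."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y, aspect, near, far):
    """OpenGL-style perspective projection (assumed convention)."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

world = translation(2.0, 0.0, -10.0)   # place the model in the scene
view = translation(0.0, -1.0, 0.0)     # camera at (0, 1, 0): move the scene the opposite way
projection = perspective(math.radians(60.0), 16.0 / 9.0, 0.1, 100.0)

# Combine once, then reuse for every vertex in the model.
wvp = mat_mul(projection, mat_mul(view, world))

vertex = [0.5, 0.5, 0.0, 1.0]
clip = transform(wvp, vertex)

# The divide by w turns clip space into normalized device coordinates.
ndc = [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]]

# Applying the three matrices one at a time gives the same answer.
step_by_step = transform(projection, transform(view, transform(world, vertex)))
```

With row vectors instead of column vectors the multiplication order flips to world times view times projection, which is one reason to pick a convention and stick with it.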
The matrix algebra is a highly efficient way of doing the whole drawing process. You’re never messing with the original vertices and risking losing their relationships between one another. In other words, you’re not changing the actual model and so you can always go back to that or reuse it for several instances of the same model all placed differently in the scene by different world matrices.
But more to the point: the LookAt matrix uses Vector Cross Products as a way to build 3 mutually perpendicular vectors. The camera position to the “spot to look at” forms a vector that points in the direction the camera should be facing in 3D space. Normalize that (change its length to 1) and you have your first vector. That “Up” vector is needed to do the first vector cross product and “must” point generally above the camera, although it’s not actually “trusted”. In fact, you will often see people cheat and just use Vector3.Up or something that gives a vector that points along the “up” axis. This is technically wrong, but this step here is why the cheat works and why it’s technically wrong:
The “Up” vector and the forward vector we just created are two vectors that we can assume live on a plane which will be our forward/up plane (I’m avoiding using X, Y, and Z here since they change from environment to environment, but if Y is up and Z is forward this would be the ZY plane). But we might be pitched up (rotated around the X axis) and that “Up” might not actually be pointing straight above the camera. So, we’ll fix it later. For right now, we just use the vector cross product of these two to give us a 3rd vector that points straight out of this plane. The length of a cross product depends on the lengths of the two inputs and the angle between them, so the safe move is to normalize the result; normalizing the inputs alone is not enough unless they are already perpendicular.
So, now we have our “Up” vector (that we don’t trust) and a trustworthy forward and right facing vector (could be left depending on the order you do the math in, but multiply a vector times negative one and it will reverse directions in 3D or 2D). Now we have two trusted vectors that live on the forward-right (ZX for example) plane. And we can use their vector cross product to get a vector that we know is mutually perpendicular and truly points up (above the camera) and then we can throw away our untrusted “Up” vector.
That gives you the 3 by 3 matrix. The 4th vector is just the camera position we were given as an input parameter. And the 4th (w) value of each is 0 if it is a vector/direction and 1 if it is a position. This is in order to make all the math we will be doing later work out correctly. Voilà! There’s your 4 by 4 matrix created with LookAt().
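On the normalization question, a quick Python check may help. The length of a cross product is the product of the two input lengths times the sine of the angle between them, so two unit vectors that are not perpendicular give a result shorter than 1. The vectors below are made up for the example, with Y as up and Z as forward.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def length(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = length(v)
    return tuple(x / n for x in v)

forward = (0.0, 0.0, 1.0)                # unit length
up_hint = normalize((0.0, 1.0, 1.0))     # unit length, but 45 degrees from forward

raw_right = cross(up_hint, forward)      # points along +X, but is too short
right = normalize(raw_right)             # fixed by normalizing the result

raw_length = length(raw_right)           # sin(45 degrees), roughly 0.707
```

So normalizing both inputs is not quite enough here. Normalizing the output of the first cross product is what guarantees a unit length right vector.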
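The steps above can be sketched in Python roughly like this. It builds the camera’s own matrix exactly as described, under assumptions I picked for the example: Y is up, Z is forward, and the axes and position are stored as columns. The name look_at_camera_matrix is hypothetical. Library functions such as GLM’s lookAt return the inverse of this (a view matrix) and use their own axis conventions, so the numbers will not match them directly.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)

def look_at_camera_matrix(eye, target, up_hint):
    """Build a 4x4 camera matrix (columns: right, up, forward, position).

    Does not handle up_hint being parallel to the view direction.
    """
    forward = normalize(sub(target, eye))       # first trusted vector
    right = normalize(cross(up_hint, forward))  # second trusted vector
    # Already unit length because forward and right are perpendicular unit vectors.
    true_up = cross(forward, right)             # replaces the untrusted up
    return [
        [right[0], true_up[0], forward[0], eye[0]],
        [right[1], true_up[1], forward[1], eye[1]],
        [right[2], true_up[2], forward[2], eye[2]],
        [0.0, 0.0, 0.0, 1.0],   # w = 0 for the three directions, 1 for the position
    ]

# Camera above and behind the origin, looking at the origin.
eye = (0.0, 5.0, -5.0)
camera = look_at_camera_matrix(eye, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))

right = [camera[i][0] for i in range(3)]
up = [camera[i][1] for i in range(3)]
forward = [camera[i][2] for i in range(3)]

# Looking straight down +Z from the origin gives the identity matrix.
straight_ahead = look_at_camera_matrix((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0))
```

Note that the “cheat” up of (0, 1, 0) went in, but the up that ends up in the matrix is tilted to match the camera’s pitch.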
Keep in mind that I never know exactly which of these vectors has to be normalized to make the math come out right because I generally use GLM or whatever math is built in to what I’m using. I would just normalize them all to start out with until you know which ones do not necessarily have to be normalized.
Also, the identity matrix will give you a “LookAt” matrix that points straight down the forward (Z?) axis with no pitch, yaw, or roll. So, you could just start with an identity matrix instead and re-position and re-orient it. But LookAt is convenient for various things.
The view matrix and the world/object’s matrix are the same thing except every object has its own matrix and you only have one view matrix (per camera). Also (and this is really important), the view matrix is opposite (inverted from) an object’s world matrix. It works exactly backwards from the object’s matrix. So, when working with a view matrix, you have to invert any rotation, translation, etc. that you do to it. This is also why library LookAt functions such as GLM’s hand back the inverse of a camera matrix built from those axes and that position, rather than the camera matrix itself.
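Here is a small Python sketch of that “opposite” relationship. It builds a camera matrix from a yaw rotation and a position, then inverts it the cheap way that works when there is no scale: transpose the 3 by 3 part and run the negated position through that transpose. Column vectors are assumed and the function names are invented for the example.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def camera_matrix(yaw, position):
    """Camera placed in the world like any other object: rotate, then move."""
    c, s = math.cos(yaw), math.sin(yaw)
    px, py, pz = position
    return [
        [c, 0.0, s, px],
        [0.0, 1.0, 0.0, py],
        [-s, 0.0, c, pz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def invert_rigid(m):
    """Invert a matrix that holds only rotation and translation (no scale)."""
    # Transposing the 3x3 part undoes the rotation.
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[0][3], m[1][3], m[2][3]]
    # The new translation is the old one, negated and rotated backwards.
    new_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [
        r_t[0] + [new_t[0]],
        r_t[1] + [new_t[1]],
        r_t[2] + [new_t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

camera = camera_matrix(math.radians(30.0), (3.0, 2.0, -7.0))
view = invert_rigid(camera)

# The view matrix exactly undoes the camera matrix.
round_trip = mat_mul(view, camera)

# A point sitting at the camera position lands on the origin in view space.
camera_in_view_space = transform(view, [3.0, 2.0, -7.0, 1.0])
```

In other words, moving the camera right by some amount is the same as moving the whole world left by that amount, which is what the view matrix actually does to the vertices.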
But anyway, that’s how the LookAt() function works. It merely takes 3 input parameters to build a 4 by 4 matrix that holds an orientation and position that match the input parameters by using vector cross product math.
I might also mention that I am a firm believer in never using scale, and in scaling equally on all three axes if you do. I believe if the scale is off, you need to go into Blender and fix it and re-import it rather than using mathematical scaling. But there may be times where it makes sense to use it. Still, given a choice, I pretty much never scale anything in code… ever. For one thing, when you’re working with a lot of models, it’s going to be near impossible to get everything to look right together if they are not modeled to scale in the first place. For playing around, it may not matter. But for serious endeavors you want your models modeled to scale so that they just “work” when you import them. I had a scene that I did recently with a lot of objects that were modeled separately and it looks “real” because everything was modeled to the same centimeter scale even though they were all made completely separate from one another.
As a learning exercise, you can create your own LookAt() function and compare its output to the LookAt() function built into GLM, or whatever you are using. If yours works just as well, you know you’ve done it right.