Copying the VAO probably wouldn't save much fiddling compared to setting everything up without one.
I don't know if my code could or should work another way. It's pretty much straightforward:
There is a vertex buffer which contains the mesh data (it is global and constant, not a per-model-instance thing). Function A binds the attribute pointers to it.
Function B uses shaders to do mesh animations. B renders from the currently bound pointers into textures, which get copied into buffers; the pointers for the animated attributes are then bound to those buffers.
Function C just takes whatever is bound as attributes and renders, changing only the surface settings (meaning program, textures and so on) to the ones used by the triangle range(s) emitted.
The function calling A, B and C checks whether the programs that C will select for rendering do mesh animations themselves, which may or may not cancel out the need to call B.
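To make the control flow a bit more concrete, here is a rough sketch in plain C with no GL calls at all. All names (`Program`, `Trace`, `func_a`, `func_b`, `func_c`, `needs_b`, `draw_model`) are made up for illustration, and the rule "B is needed as soon as one program does not animate the mesh itself" is only my assumption about how the check might look:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one program that C may select. */
typedef struct {
    bool animates_mesh_itself; /* program does the mesh animation in its own shader */
} Program;

/* Counters standing in for the real work, so the control flow can be checked. */
typedef struct {
    int calls_a, calls_b, calls_c;
} Trace;

static void func_a(Trace *t) { t->calls_a++; } /* bind pointers to the constant mesh buffer */
static void func_b(Trace *t) { t->calls_b++; } /* animate via shaders, rebind animated attributes */
static void func_c(Trace *t) { t->calls_c++; } /* render with whatever is currently bound */

/* Assumed rule: B is needed if at least one program relies on pre-animated attributes. */
static bool needs_b(const Program *programs, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        if (!programs[i].animates_mesh_itself)
            return true;
    return false;
}

/* The caller: A always, B only when needed, then C. */
static void draw_model(const Program *programs, size_t count, Trace *t)
{
    assert(programs != NULL && t != NULL);
    func_a(t);
    if (needs_b(programs, count))
        func_b(t);
    func_c(t);
}
```

If every program in the list animates the mesh itself, `draw_model` skips B entirely; a single program that does not is enough to trigger it.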
Having written that, I don't really see the need to change the code at all. It would just have been a lot more elegant from the programmer's perspective with a copyVao, as A needs to run a for(i=0;i<16;++i) loop every time it sets up its pointers. Not that this would matter performance-wise when compared to the work done when rendering just 1 triangle... ;)
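For what it's worth, the kind of "copy" I have in mind could be approximated on the CPU side by keeping the 16 attribute bindings in a plain table. This is only a sketch: `AttribBinding`, `AttribState`, `copy_attrib_state` and `rebind_attrib` are hypothetical names, nothing here touches GL, and it is not a real copyVao:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRIBS 16 /* matches the 16-slot loop mentioned above */

/* Hypothetical CPU-side record of one attribute pointer. */
typedef struct {
    int          enabled;
    unsigned int buffer;  /* buffer object name the pointer reads from */
    int          size;    /* components per vertex, 1..4 */
    int          stride;  /* bytes between consecutive vertices */
    size_t       offset;  /* byte offset into the buffer */
} AttribBinding;

typedef struct {
    AttribBinding attribs[MAX_ATTRIBS];
} AttribState;

/* Stand-in for the wished-for copyVao: duplicate the whole table in one go. */
static void copy_attrib_state(AttribState *dst, const AttribState *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* What B would do afterwards: redirect only the animated slots. */
static void rebind_attrib(AttribState *state, int index,
                          unsigned int buffer, size_t offset)
{
    assert(state != NULL && index >= 0 && index < MAX_ATTRIBS);
    state->attribs[index].buffer = buffer;
    state->attribs[index].offset = offset;
}
```

Applying such a table to GL would still mean walking all slots with `glBindBuffer`, `glVertexAttribPointer` and `glEnableVertexAttribArray`, so this only tidies up the bookkeeping; it does not remove the loop.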
Can you confirm - when you're talking about the "transformed" and "untransformed" versions of your data - are you transforming on the CPU? And if so, why?
I'm only transforming on the CPU if necessary, meaning if there is no shading-language support or the maximum number of draw buffers is too small. But that is a whole other story...
Otherwise transforms are done by shaders as sketched above.