What is the most efficient way to transfer planar YUVA images for rendering in OpenGL?

Currently I'm using 4 separate textures (Y, U, V, A), which I upload from 4 separate PBOs every frame. However, transferring a lot of data into a few textures seems to be much more efficient: e.g. transferring YUV422 to a single packed texture is ~50% faster than transferring the same data to 3 separate (Y, U, V) textures.

One idea I've had is to use 2 array textures, one for (Y, A) and one for (U, V), since all layers of an array texture must have the same size and each pair does. Would that be faster?
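A minimal sketch of the CPU side of the array-texture idea, assuming 8-bit planes, a GL_R8 array texture with 2 layers per pair, and that `dst` points into a mapped PBO. The helper name `stage_two_layers` is hypothetical. Because both planes of a pair have the same dimensions, they can be written back to back so that a single upload call covers both layers:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy two equally sized 8-bit planes back to back
 * into dst, which stands in for a mapped PBO (e.g. the pointer returned
 * by glMapBufferRange). With the PBO bound to GL_PIXEL_UNPACK_BUFFER and
 * GL_UNPACK_ALIGNMENT set to 1, a single call such as
 *   glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0, 0, 0, 0, w, h, 2,
 *                   GL_RED, GL_UNSIGNED_BYTE, (const void *)0);
 * would read layer 0 from the first plane and layer 1 from the second.
 * Returns the number of bytes written (2 * w * h). */
size_t stage_two_layers(uint8_t *dst, const uint8_t *layer0,
                        const uint8_t *layer1, size_t w, size_t h)
{
    size_t plane = w * h;               /* tightly packed rows, no padding */
    memcpy(dst, layer0, plane);         /* layer 0: Y (or U) */
    memcpy(dst + plane, layer1, plane); /* layer 1: A (or V) */
    return 2 * plane;
}
```

The same helper would be called once with (Y, A) at luma size and once with (U, V) at chroma size, which reduces the per-frame upload calls from 4 to 2.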

Another alternative I've considered is converting from planar to packed while copying the data into the PBO, though this adds some CPU overhead.
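A minimal sketch of the planar-to-packed conversion, assuming 8-bit samples and that (Y, A) and (U, V) are each packed into a two-channel GL_RG8 texture, since the chroma planes are smaller than the luma planes and cannot share one RGBA texel grid without resampling. The function name `interleave_planes` is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleave two equally sized 8-bit planes into
 * two-channel pairs: dst = p0[0], p1[0], p0[1], p1[1], ...
 * dst (e.g. a mapped PBO) must hold 2 * count bytes. The result could be
 * uploaded with format GL_RG / GL_UNSIGNED_BYTE into a GL_RG8 texture:
 * (Y, A) at luma size, (U, V) at chroma size. */
void interleave_planes(uint8_t *dst, const uint8_t *p0, const uint8_t *p1,
                       size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dst[2 * i]     = p0[i]; /* R channel: Y (or U) */
        dst[2 * i + 1] = p1[i]; /* G channel: A (or V) */
    }
}
```

Rows of a packed RG buffer are 2 * width bytes, so GL_UNPACK_ALIGNMENT may need to be set to 1 or 2 when that is not a multiple of 4. Mapped PBO memory may be write-combined, so writing the output strictly sequentially, as this loop does, and never reading it back is likely the cheaper pattern.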

Any suggestions or insights?

NOTE: dim(Y) == dim(A) && dim(U) == dim(V) && dim(Y) != dim(U).
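The relationship in the note can be made concrete with a small helper that computes per-plane dimensions and total bytes per frame. This assumes 8-bit samples and chroma subsampling expressed as right-shift factors (4:2:2 as (1, 0), 4:2:0 as (1, 1)); `PlaneDim` and `yuva_plane_dims` are illustrative names, not part of any API:

```c
#include <stddef.h>

/* Hypothetical type: dimensions of one plane in a planar YUVA frame. */
typedef struct {
    size_t width;
    size_t height;
} PlaneDim;

/* Fill dims[0..3] for the Y, U, V and A planes of a w x h frame.
 * shift_x/shift_y are assumed chroma subsampling shifts, e.g.
 * 4:2:2 -> (1, 0), 4:2:0 -> (1, 1). Odd luma sizes round chroma up.
 * Returns the total bytes per frame at 8 bits per sample. */
size_t yuva_plane_dims(size_t w, size_t h, unsigned shift_x, unsigned shift_y,
                       PlaneDim dims[4])
{
    size_t cw = (w + ((size_t)1 << shift_x) - 1) >> shift_x;
    size_t ch = (h + ((size_t)1 << shift_y) - 1) >> shift_y;

    dims[0] = (PlaneDim){w, h};   /* Y */
    dims[1] = (PlaneDim){cw, ch}; /* U: same size as V */
    dims[2] = (PlaneDim){cw, ch}; /* V */
    dims[3] = (PlaneDim){w, h};   /* A: same size as Y */

    size_t total = 0;
    for (int i = 0; i < 4; ++i)
        total += dims[i].width * dims[i].height;
    return total;
}
```

For a 1920x1080 4:2:2 frame this gives 6,220,800 bytes per frame across the four planes.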