Packing to RGB565 is cheap and easy. Conceptually, take the high 5 bits of red, high 6 bits of green, high 5 bits of blue, bit-shift them all together into a single unsigned short, and store. Then upload to OpenGL using a GL_RGB5 internal format (or GL_RGB565, where your GL version exposes it) and GL_RGB/GL_UNSIGNED_SHORT_5_6_5 external format/type. No alpha, but you get RGB in 2 bytes.
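Here's a minimal sketch of that pack in C. The function names are made up for illustration, and it assumes tightly packed 8-bit RGB input. The bit layout matches GL_UNSIGNED_SHORT_5_6_5, which puts red in the top 5 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack one 8-bit-per-channel RGB texel into 16 bits:
   red in bits 15..11, green in bits 10..5, blue in bits 4..0. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a tightly packed RGB8 image (3 bytes per texel) to RGB565.
   dst must have room for texel_count 16-bit values. */
static void convert_rgb8_to_rgb565(const uint8_t *src, uint16_t *dst,
                                   size_t texel_count)
{
    for (size_t i = 0; i < texel_count; ++i)
        dst[i] = pack_rgb565(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
}
```

The resulting buffer is what you'd hand to glTexImage2D with GL_RGB / GL_UNSIGNED_SHORT_5_6_5.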
There’s also GL_RGB5_A1 and GL_RGBA4, which are pretty much the same but steal bits from the color to give to an alpha channel (1 bit and 4 bits of alpha respectively). So that’s a simple encode too. And still 2 bytes/texel.
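The alpha-carrying variants pack the same way. A sketch, again with made-up function names, using the bit layouts of GL_UNSIGNED_SHORT_5_5_5_1 and GL_UNSIGNED_SHORT_4_4_4_4 (alpha in the lowest bits in both cases):

```c
#include <stdint.h>

/* RGB5_A1: 5 bits each of R, G, B plus a 1-bit alpha in bit 0.
   Alpha >= 128 becomes opaque, anything below becomes transparent. */
static uint16_t pack_rgb5_a1(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) |
                      ((b >> 3) << 1) | (a >> 7));
}

/* RGBA4: 4 bits per channel, R in the top nibble, A in the bottom one. */
static uint16_t pack_rgba4(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 4) << 12) | ((g >> 4) << 8) |
                      ((b >> 4) << 4) | (a >> 4));
}
```

Note how coarse these get: RGBA4 leaves only 16 levels per channel, so gradients will band visibly.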
Short of that, consider DXT1 or DXT5. Don’t re-invent the wheel; just pick up van Waveren and Castano’s work here and here for starters, and use their code. Based on their stats (see Results sections), you should be able to compress a 1024x1024 texture in ~0.5-1.0ms on the CPU for DXT1 or ~0.7-1.5ms for DXT5 (add 33% to that for mipmaps). If that’s not fast enough, you can allegedly get a pure compression speed-up on the GPU, but it’s probably not worth it, as you pay a hefty fine in PCIe upload time for the uncompressed source data (several ms per megatexel).
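To put numbers on the data sizes, and on where the extra third for mipmaps comes from, here's a small sketch. The function names are hypothetical; the only format facts it relies on are that DXT1 stores 8 bytes per 4x4 block and DXT5 stores 16.

```c
#include <stddef.h>

/* Bytes for one mip level of a block-compressed texture.
   block_bytes is 8 for DXT1 and 16 for DXT5. Partial blocks round up. */
static size_t dxt_level_bytes(size_t w, size_t h, size_t block_bytes)
{
    return ((w + 3) / 4) * ((h + 3) / 4) * block_bytes;
}

/* Bytes for the whole mip chain down to 1x1 (assumes w >= 1 and h >= 1). */
static size_t dxt_chain_bytes(size_t w, size_t h, size_t block_bytes)
{
    size_t total = 0;
    for (;;) {
        total += dxt_level_bytes(w, h, block_bytes);
        if (w == 1 && h == 1)
            break;
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
    return total;
}
```

For 1024x1024, DXT1 comes to 524288 bytes for the base level and 699064 bytes with the full chain, which is the ~33% overhead. Compare that with 2 MB for the base level alone in a 2-byte/texel format.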
Could converting to OpenGL’s type really take that much time/processing? I do need the alpha, at least for all but the bottom layer.
Well, RGB565 is basically the same form as your RGB8 base layer data: discrete texels. Just throw away a few bits of RGB precision, chuck the alpha you’re not using, and you’re there.
Ditto for RGB5_A1 and RGBA4, except don’t chuck the alpha, so those are simple encodes too. If your app miraculously didn’t need any more color/alpha precision than that, these might be sufficient for your top layers.
However, DXT-compressed textures are a different beast requiring more encoding work. Unlike the above formats, where each texel is encoded and stored separately, DXT encodes and stores 4x4 blocks of texels as a unit. For the RGB side, for instance, the compressor has to split your image into 4x4 blocks and, for each block, come up with the two best colors to represent the entire block by. Then what’s stored in your texture are those two colors (as RGB565), along with a 2-bit value for each texel in the block which describes where that texel lies (approximately) along a line between those two colors: 0%, 33%, 66%, or 100% of the way. Alpha in DXT5 is handled similarly, but with 3-bit interpolants.
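To make the per-block work concrete, here's a deliberately naive sketch of a DXT1 color-block encode. It is not van Waveren and Castano's method: it just takes the per-channel min and max of the block as the two endpoints and picks the nearest of the four palette entries for each texel. The struct and function names are made up for illustration. One detail of the real format: index 0 means color0, 1 means color1, 2 is the entry one third of the way from color0 to color1, and 3 is two thirds of the way, so the index order isn't the same as the position order.

```c
#include <stdint.h>

typedef struct {
    uint16_t color0;  /* endpoint 0, RGB565 */
    uint16_t color1;  /* endpoint 1, RGB565 */
    uint32_t indices; /* 2 bits per texel, texel 0 in the lowest bits */
} Dxt1Block;

static uint16_t to565(const uint8_t c[3])
{
    return (uint16_t)(((c[0] >> 3) << 11) | ((c[1] >> 2) << 5) | (c[2] >> 3));
}

/* Expand RGB565 back to 8 bits per channel by bit replication. */
static void from565(uint16_t v, uint8_t out[3])
{
    unsigned r = (v >> 11) & 31, g = (v >> 5) & 63, b = v & 31;
    out[0] = (uint8_t)((r << 3) | (r >> 2));
    out[1] = (uint8_t)((g << 2) | (g >> 4));
    out[2] = (uint8_t)((b << 3) | (b >> 2));
}

/* Naive encode of one block. rgb holds 16 texels, 3 bytes each, row-major. */
static Dxt1Block encode_dxt1_block(const uint8_t *rgb)
{
    uint8_t lo[3] = {255, 255, 255}, hi[3] = {0, 0, 0};
    for (int i = 0; i < 16; ++i)
        for (int c = 0; c < 3; ++c) {
            uint8_t v = rgb[3 * i + c];
            if (v < lo[c]) lo[c] = v;
            if (v > hi[c]) hi[c] = v;
        }

    Dxt1Block blk;
    blk.color0 = to565(hi); /* hi >= lo per channel, so color0 >= color1 */
    blk.color1 = to565(lo);
    blk.indices = 0;
    if (blk.color0 == blk.color1)
        return blk; /* flat block: every texel uses index 0 */

    /* color0 > color1 selects the four-color mode. */
    uint8_t pal[4][3];
    from565(blk.color0, pal[0]);
    from565(blk.color1, pal[1]);
    for (int c = 0; c < 3; ++c) {
        pal[2][c] = (uint8_t)((2 * pal[0][c] + pal[1][c]) / 3);
        pal[3][c] = (uint8_t)((pal[0][c] + 2 * pal[1][c]) / 3);
    }

    for (int i = 0; i < 16; ++i) {
        int best = 0, best_d = 3 * 255 * 255 + 1;
        for (int p = 0; p < 4; ++p) {
            int d = 0;
            for (int c = 0; c < 3; ++c) {
                int e = (int)rgb[3 * i + c] - (int)pal[p][c];
                d += e * e;
            }
            if (d < best_d) { best_d = d; best = p; }
        }
        blk.indices |= (uint32_t)best << (2 * i);
    }
    return blk;
}
```

A real compressor spends its time on choosing better endpoints than the bounding box, which is where the quality (and most of the cost) comes from.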
…by the way, I’m assuming in all this that none of your texture layers are monochrome. If any of them are, that simplifies things dramatically.