Recommended normal mapping format

I am going to implement parallax and relief mapping in my new engine, but I have some design-related questions.
Previously, I was loading normal maps with the depth component in alpha, using the RGBA8 texture format, from pregenerated TGAs. But I’ve come to the conclusion that loading times from disk are way too long.
Moreover, it’s easier to model depth maps directly, without converting them to normal maps.
So currently I’m thinking about loading only the depth map from disk and recalculating the normal map on the fly.
What kind of texture format should I stick with: signed byte (RGBA16?) or float textures? Which of them is less expensive performance-wise? What happens with filtering? Can anyone give me better recommendations (without 3Dc)?
Normal recalculation in the pixel shader using dx/dy is way too expensive, I guess…
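
To make the "depth map from disk, normals recalculated on the fly" idea concrete, here is a minimal CPU-side sketch in Python. It is not engine code: the function name normals_from_height, the scale parameter and the wrap-around addressing are all assumptions made for illustration, and the height field is assumed to be a plain list of rows of floats.

```python
import math

def normals_from_height(height, scale=1.0):
    """Return one unit normal (x, y, z) per texel of a 2D height field.

    `height` is a list of rows of floats; `scale` (hypothetical parameter)
    controls how steep the bumps appear.
    """
    rows, cols = len(height), len(height[0])
    normals = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # Central differences, wrapping at the edges (assumes a tiling texture).
            dx = (height[y][(x + 1) % cols] - height[y][(x - 1) % cols]) * 0.5
            dy = (height[(y + 1) % rows][x] - height[(y - 1) % rows][x]) * 0.5
            # The surface z = h(x, y) has the unnormalized normal (-dh/dx, -dh/dy, 1).
            nx, ny, nz = -scale * dx, -scale * dy, 1.0
            inv_len = 1.0 / math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx * inv_len, ny * inv_len, nz * inv_len))
        normals.append(row)
    return normals
```

Done once at load time, this costs four height reads per texel. Doing the same thing per pixel in a shader would mean extra texture fetches or derivative instructions on every fragment, which is the cost the question is worried about.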

Target platform is the GF 6x series and the like for now.

[edit]
According to documentation at http://developer.nvidia.com/object/bump_map_compression.html
it might be a good idea to use non-tangent-space bump mapping. Does anyone know where I can read more about making this gradient map?

I use a Photoshop plugin to generate normal maps from RGB images now.
JPG is out, as compression artifacts are noticeable; PNG might be OK.
For best results, normals read from the texture map must be renormalized anyway, as filtering causes artifacts too.
I want to be able to use the engine for demonstration purposes too. The idea: I draw a height field, and the application stage or the fragment program (fp) recalculates the normals.
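
The renormalization point can be shown with a small Python sketch. It assumes bilinear filtering behaves like a plain linear interpolation between two neighbouring unit normals; the helper names lerp, length and normalize are made up for this example.

```python
import math

def lerp(a, b, t):
    """Component-wise linear interpolation, roughly what a texture filter does."""
    return tuple(p + (q - p) * t for p, q in zip(a, b))

def length(v):
    return math.sqrt(sum(c * c for c in v))

def normalize(v):
    inv = 1.0 / length(v)
    return tuple(c * inv for c in v)

# Two unit normals tilted 45 degrees to either side of +Z.
a = normalize((1.0, 0.0, 1.0))
b = normalize((-1.0, 0.0, 1.0))

# Sampling halfway between the two texels gives a vector that is too short.
filtered = lerp(a, b, 0.5)
fixed = normalize(filtered)
```

Here length(filtered) comes out well below 1, so lighting computed from it would be darker than intended, while fixed is back to unit length. The same correction applies whether the normals were stored on disk or recalculated from a height field.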

Does anyone know where I can read more about making this gradient map?
I assume that by gradient map they just mean storing the height offsets each pixel has from its neighbouring ones. This is what I do with some of my bump maps: the R channel of the texture is the offset in the X direction, and the G channel is the offset in the Y direction.
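
A rough Python sketch of that reading of "gradient map" follows. It is only one possible interpretation: the 8-bit biased encoding, the forward differences, the wrap-around addressing and every function name here are assumptions, not something taken from the NVIDIA document.

```python
import math

def encode(d):
    """Map a height offset in [-1, 1] to an unsigned byte (0..255)."""
    d = max(-1.0, min(1.0, d))
    return int(round((d * 0.5 + 0.5) * 255.0))

def decode(b):
    """Inverse of encode; note that 8 bits cannot represent exactly 0.0."""
    return b / 255.0 * 2.0 - 1.0

def gradient_map(height):
    """Return (R, G) per texel: R = offset to the +X neighbour, G = offset to +Y."""
    rows, cols = len(height), len(height[0])
    out = []
    for y in range(rows):
        row = []
        for x in range(cols):
            dx = height[y][(x + 1) % cols] - height[y][x]  # forward difference in X
            dy = height[(y + 1) % rows][x] - height[y][x]  # forward difference in Y
            row.append((encode(dx), encode(dy)))
        out.append(row)
    return out

def normal_from_gradient(r, g, scale=1.0):
    """Rebuild a unit normal from the two stored offsets."""
    nx, ny, nz = -scale * decode(r), -scale * decode(g), 1.0
    inv = 1.0 / math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx * inv, ny * inv, nz * inv)
```

With this layout only two channels are needed, which leaves room for the depth value in a third channel, at the price of one normalization per lookup.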

I was loading normal maps with the depth component in alpha, using the RGBA8 texture format, from pregenerated TGAs. But I’ve come to the conclusion that loading times from disk are way too long.
Are you sure? From memory, the loading time of a texture, e.g. a 1024x1024x32 (4 MB) one, is very short. Are you perhaps swapping bytes per pixel or something?
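
The 4 MB figure is easy to check, and the same arithmetic gives a rough read time. This is a back-of-the-envelope Python sketch: texture_bytes and read_seconds are names invented here, and the disk throughput is a parameter you would have to measure yourself, not a number from this thread.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed texture level in bytes."""
    return width * height * bits_per_pixel // 8

def read_seconds(num_bytes, bytes_per_second):
    """Idealized sequential read time; ignores seeks, parsing and upload."""
    return num_bytes / bytes_per_second

size = texture_bytes(1024, 1024, 32)  # 4194304 bytes = 4 MiB
```

Plugging the measured throughput of the actual disk into read_seconds(size, ...) shows whether raw reading alone can explain the delay, or whether per-pixel work during loading is the more likely culprit.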

No, I’m working with BGR/BGRA formats. The data read times from the HDD are pretty big: base + normal map at high resolutions…