glTexSubImage2D can be fast, but you need to set up your texture carefully because some combinations of parameters are less than optimal.
People commonly choose GL_RGB for the texture format, but this is going to be one of the slowest choices possible. The driver very likely doesn't represent its texture data in RGB order at all (and it definitely doesn't support 24-bit textures), so it will need to do some conversion before it can upload the new data. That's the single most likely cause of your performance loss.
I've benchmarked this extensively, and on all the hardware I tested the fastest choices are:
In your initial glTexImage2D call:
format: doesn't matter
type: doesn't matter
For subsequent glTexSubImage2D calls:
If you only care about NVIDIA hardware you can get away with GL_UNSIGNED_BYTE in the last case, but if you need to run well on AMD or Intel too, you absolutely must use these parameters. These will allow the driver to stream in the texture directly and without needing to go through any intermediate conversion steps, which in one benchmark ran 30 times faster. Yes, you read that right: 30.
Of course you need your incoming data to be 32-bit 4-component too. You can write your own up-conversion routine if you wish, and that may be faster than the driver's (just remember to only allocate the memory you write to once instead of doing a separate allocate/free per-upload).
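As a rough sketch of such an up-conversion routine, the code below expands tightly packed 24-bit RGB into 32-bit 4-component data with the bytes laid out as B, G, R, A in memory, and reuses one staging buffer across uploads. The names `StagingBuffer`, `staging_reserve` and `rgb_to_bgra` are made up for illustration, and the B, G, R, A byte order is an assumption about what your upload format expects; adjust the swizzle if yours differs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: a staging buffer that is kept alive between
   uploads so memory is only (re)allocated when a larger frame arrives. */
typedef struct {
    uint8_t *data;
    size_t   capacity; /* bytes currently allocated */
} StagingBuffer;

/* Grow the buffer only if it is too small; otherwise reuse it as-is.
   Returns NULL if the allocation fails. */
static uint8_t *staging_reserve(StagingBuffer *sb, size_t bytes)
{
    if (bytes > sb->capacity) {
        uint8_t *p = realloc(sb->data, bytes);
        if (p == NULL)
            return NULL;
        sb->data = p;
        sb->capacity = bytes;
    }
    return sb->data;
}

/* Expand tightly packed RGB (3 bytes per pixel, no row padding) into
   4 bytes per pixel stored as B, G, R, A, with alpha forced to opaque.
   Assumption: this byte order matches the format/type pair used for the
   upload. Returns the staging pointer, or NULL on allocation failure. */
static uint8_t *rgb_to_bgra(StagingBuffer *sb, const uint8_t *rgb,
                            size_t width, size_t height)
{
    size_t pixels = width * height;
    uint8_t *dst = staging_reserve(sb, pixels * 4);
    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < pixels; ++i) {
        dst[4 * i + 0] = rgb[3 * i + 2]; /* B */
        dst[4 * i + 1] = rgb[3 * i + 1]; /* G */
        dst[4 * i + 2] = rgb[3 * i + 0]; /* R */
        dst[4 * i + 3] = 0xFF;           /* A */
    }
    return dst;
}
```

You would zero-initialize one `StagingBuffer` at startup, call `rgb_to_bgra` once per frame, pass the returned pointer as the data argument of glTexSubImage2D, and free `sb.data` once at shutdown. On a little-endian machine this memory layout should correspond to GL_BGRA with GL_UNSIGNED_INT_8_8_8_8_REV, but treat that pairing as an assumption and check it against the parameters you actually use.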
With GL 4.4 you could probably do something with a persistently mapped PBO, but I haven't benchmarked or even tested this, and I'd advise that you get the basics right first anyway.