Out of curiosity, are you binding most or all 3k textures each frame and only seeing the usual 150 of them? Or do you have some spatial testing before binds?
[note: In responding, I’m assuming you have less available video memory than the total texture bytes requested, or you wouldn’t be asking. Let me know if that’s not the case.]
If you want to benchmark how much those texture “misses” (where the texture is requested but not resident) cost, you can add a toggle to bind only one texture for all objects and see the difference. If it’s significant, then you have your answer as to “cost” and whether you should try to optimize here.
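A minimal sketch of that toggle, assuming all your binds already go through one place. The names here (`BenchmarkBind`, `resolve`, `TextureId`) are made up for illustration, and `TextureId` just stands in for `GLuint`:

```cpp
#include <cstdint>

// Hypothetical stand-in for GLuint so the sketch compiles without GL headers.
using TextureId = std::uint32_t;

// Debug toggle: when enabled, every bind resolves to one small texture,
// so no per-object texture can cause a "miss".
struct BenchmarkBind {
    bool forceSingleTexture = false;  // flip at runtime, e.g. from a debug key
    TextureId singleTexture = 0;      // any small texture you know is resident

    // Returns the ID that should actually be handed to glBindTexture.
    TextureId resolve(TextureId requested) const {
        return forceSingleTexture ? singleTexture : requested;
    }
};
```

Compare frame times with the flag off and on over the same camera path; the gap is roughly what the non-resident textures are costing you.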
As for relying on the driver, you don’t have much choice. “If the texture don’t fit, you must page it.”
OTOH, one fairly simple method I used for a project a while back (with at least that many textures) was to give every texture two software states. In the “unloaded” state, binding the texture hands GL the ID of a common low-res stand-in (perhaps a 1x1 or 2x2 texture of a similar color, but don’t create too many of these either). The second state is the normal “loaded” state, where GL gets the real texture’s ID. This way, your Bind wrapper code doesn’t need to know much except the currently used ID for that texture.
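Roughly, the two-state idea could look like the sketch below. This is not the original project’s code; `ManagedTexture`, `TexState` and `bindManaged` are hypothetical names, and the actual GL call is passed in as a callback so the sketch stands alone:

```cpp
#include <cstdint>

using TextureId = std::uint32_t;  // stands in for GLuint

enum class TexState { Unloaded, Loaded };

struct ManagedTexture {
    TextureId realId = 0;     // only meaningful while state == Loaded
    TextureId standInId = 0;  // shared low-res placeholder texture
    TexState state = TexState::Unloaded;

    // The only thing the bind path needs to know.
    TextureId currentId() const {
        return state == TexState::Loaded ? realId : standInId;
    }
};

// Bind wrapper. In a real renderer `bind` would be something like
// [](TextureId id) { glBindTexture(GL_TEXTURE_2D, id); }
template <typename BindFn>
void bindManaged(const ManagedTexture& tex, BindFn&& bind) {
    bind(tex.currentId());
}
```

The render code never checks the state itself; whoever flips `state` decides what ends up on screen.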
A texture manager would then use spatial tests to flip the best set of textures from unloaded to loaded states and back as a participant flew around.
At the time, bandwidth was very low relative to now, so we also had to carefully manage how many we flipped to “loaded” per frame to avoid stutters when the textures were actually fetched from system memory.
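To make the manager idea concrete, here is one way the per-frame pass could be sketched. The distance test, the separate load/unload radii (hysteresis so textures don’t flip back and forth at the boundary) and the byte budget are my assumptions for illustration, as are all the names; the real upload and free calls are only marked by comments:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct TextureEntry {
    Vec3 position;      // where the texture is used in the world (assumed known)
    std::size_t bytes;  // size of the upload
    bool loaded;        // current software state
};

inline float distSq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// One manager pass per frame. Returns how many textures were flipped to loaded.
// Assumes loadRadius <= unloadRadius and that any single texture fits the budget.
std::size_t updateResidency(std::vector<TextureEntry>& textures,
                            const Vec3& viewer,
                            float loadRadius, float unloadRadius,
                            std::size_t uploadBudgetBytes) {
    const float loadSq = loadRadius * loadRadius;
    const float unloadSq = unloadRadius * unloadRadius;

    std::vector<TextureEntry*> wanted;
    for (TextureEntry& t : textures) {
        const float d = distSq(t.position, viewer);
        if (t.loaded && d > unloadSq) {
            t.loaded = false;      // real code would release the GL texture here
        } else if (!t.loaded && d <= loadSq) {
            wanted.push_back(&t);  // candidate for loading
        }
    }

    // Nearest candidates first, so the budget goes to what is most visible.
    std::sort(wanted.begin(), wanted.end(),
              [&](const TextureEntry* a, const TextureEntry* b) {
                  return distSq(a->position, viewer) < distSq(b->position, viewer);
              });

    std::size_t spent = 0, flipped = 0;
    for (TextureEntry* t : wanted) {
        if (spent + t->bytes > uploadBudgetBytes) break;  // the rest wait for later frames
        t->loaded = true;          // real code would upload the texture here
        spent += t->bytes;
        ++flipped;
    }
    return flipped;
}
```

Tuning `uploadBudgetBytes` is what trades stutter against how long stand-ins stay visible.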
The downside is that if you don’t get textures into the loaded state in time, you see the stand-in texture, which may or may not be a problem for you. The upside is you can make the uploading very controllable and deterministic (compared to the blind approach).
A similar method would also work: force the MIP level to the lowest resolution for any texture you think you might not need at the moment, thereby capping the number of bytes transferred at any given time.
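To get a feel for how much the MIP cap saves, the arithmetic can be sketched as below for an uncompressed 2D texture. `mipChainBytes` is a hypothetical helper, not a GL call. In OpenGL the cap itself is usually expressed with `glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, n)`, though whether the driver then avoids paging in the larger levels is up to the driver, so treat the numbers as a best case:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Bytes occupied by mip levels [baseLevel, smallest] of an uncompressed
// 2D texture. Level 0 is the full-resolution image; each level halves
// both dimensions (rounding down, minimum 1) until 1x1 is reached.
std::size_t mipChainBytes(std::uint32_t width, std::uint32_t height,
                          std::size_t bytesPerTexel, std::uint32_t baseLevel) {
    std::size_t total = 0;
    std::uint32_t w = width, h = height;
    for (std::uint32_t level = 0;; ++level) {
        if (level >= baseLevel) {
            total += static_cast<std::size_t>(w) * h * bytesPerTexel;
        }
        if (w == 1 && h == 1) break;  // reached the smallest level
        w = std::max<std::uint32_t>(1, w / 2);
        h = std::max<std::uint32_t>(1, h / 2);
    }
    return total;
}
```

For a 4x4 RGBA8 texture the full chain is 64 + 16 + 4 = 84 bytes, while capping at level 2 leaves only the 4-byte 1x1 level; the same ratio on real texture sizes is where the savings come from.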
Other methods try to group textures, which helps with binds but not so much with memory (and mipping is harder). And then there are various “universal texture” methods that try to handle the paging, but those are also not trivial for a lot of unique textures that may or may not have spatial coherence.