I’m a long-time fan of color-ID picking, but I’d advise using it only if you can amortize the cost over many calls, or if you can narrow the render down drastically and optimize for that case. It’s not a general-purpose win.
You didn’t specify this one way or another, but some people implement color-ID by re-rendering the whole scene to a normal-sized framebuffer and then finding the X,Y projection of the ray on the resulting image plane (in the worst case, reading the whole image back first). That’s actually somewhat reasonable if you’re testing a very large number of rays at once.
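To make the full-frame variant concrete, here is a minimal CPU-side sketch. It assumes object IDs are packed into 24-bit RGB (0 meaning background), a simple pinhole camera looking down -Z with the focal length given in pixels, and a ray that starts at the eye, so any point along it projects to the same single pixel. The ID buffer is just a list of rows standing in for what you would read back from the GPU, and all function names are hypothetical.

```python
import math

def id_to_rgb(obj_id: int) -> tuple:
    """Pack an object ID into an 8-bit-per-channel RGB color (assumed 24-bit IDs)."""
    if not 0 <= obj_id < (1 << 24):
        raise ValueError("ID must fit in 24 bits")
    return ((obj_id >> 16) & 0xFF, (obj_id >> 8) & 0xFF, obj_id & 0xFF)

def rgb_to_id(rgb) -> int:
    """Unpack a color read back from the ID buffer into an object ID."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b

def project_to_pixel(point, focal, width, height):
    """Pinhole projection of a camera-space point (camera looks down -Z).
    Returns integer (px, py) or None if behind the camera or off-screen."""
    x, y, z = point
    if z >= 0:
        return None
    u = focal * x / -z + width / 2    # perspective divide, origin at image center
    v = height / 2 - focal * y / -z   # flip y so row 0 is the top of the image
    px, py = math.floor(u), math.floor(v)
    if 0 <= px < width and 0 <= py < height:
        return px, py
    return None

def pick_from_id_buffer(id_buffer, ray_dir, focal):
    """Project an eye ray onto the image plane and decode the ID under it."""
    height, width = len(id_buffer), len(id_buffer[0])
    pixel = project_to_pixel(ray_dir, focal, width, height)
    if pixel is None:
        return 0  # background / miss
    px, py = pixel
    return rgb_to_id(id_buffer[py][px])
```

In a real renderer the buffer would come from a readback such as glReadPixels; the projection and decode steps stay the same.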
For resolving a scattering of rays with the color-ID technique, you might still decide to render most of the viewing frustum, but here you can at least pre-mask the pixels you don’t care about to reduce the fill cost. If you had hardware histogram support, the readback with such masking would be on the order of a few bytes per ray.
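The masking idea can be sketched the same way. This is a hypothetical software stand-in for a stencil or scissor mask, assuming an RGBA8 ID buffer (4 bytes per pixel) and one pixel per ray. It compares full-frame readback size with masked readback size, and drops fragments outside the mask before the depth test so they never cost fill.

```python
BYTES_PER_PIXEL = 4  # assumption: RGBA8 ID buffer

def build_mask(ray_pixels):
    """Deduplicate the pixels hit by the rays; only these need filling and readback."""
    return set(ray_pixels)

def readback_bytes(width, height, ray_pixels=None):
    """Bytes read back for a full frame, or for only the masked ray pixels."""
    if ray_pixels is None:
        return width * height * BYTES_PER_PIXEL
    return len(build_mask(ray_pixels)) * BYTES_PER_PIXEL

def masked_fill(fragments, mask):
    """fragments: (x, y, depth, obj_id) tuples.
    Fragments outside the mask are skipped; inside it, the nearest ID wins.
    Returns ({(x, y): obj_id}, number_of_fragments_actually_filled)."""
    nearest = {}
    filled = 0
    for x, y, depth, obj_id in fragments:
        if (x, y) not in mask:
            continue  # masked out: never shaded, never read back
        filled += 1
        best = nearest.get((x, y))
        if best is None or depth < best[0]:
            nearest[(x, y)] = (depth, obj_id)
    return {p: oid for p, (_, oid) in nearest.items()}, filled
```

For example, a 1920x1080 RGBA8 readback is 8,294,400 bytes, while three rays landing on two distinct pixels need only 8 bytes.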
But for a single ray, you’d want to first cull the scene to a 1-pixel-wide frustum and render to a 1x1 framebuffer only to resolve the z-order (which I understand is essentially what GL picking does, only in software, since it’s just one pixel). So for a single ray, I think it can be shown that using the framebuffer for picking will be faster only if the depth complexity is impossibly large. For your project, you might want to measure just “how large” that is for a given CPU/GPU combination.
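For the single-ray case, the work a 1x1 framebuffer does can be written directly on the CPU. The sketch below is an illustration, not how any particular GL implementation works: it assumes each object carries a bounding sphere (standing in for the 1-pixel frustum cull) and a triangle list, and it swaps in the Möller–Trumbore ray/triangle test so that keeping the nearest hit plays the role of the depth test. The ray direction is assumed to be unit length.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_near_sphere(origin, d, center, radius):
    """Conservative cull: does the half-line pass within radius of center?"""
    oc = sub(center, origin)
    t = max(dot(oc, d), 0.0)  # parameter of closest approach along the ray
    off = (oc[0] - t * d[0], oc[1] - t * d[1], oc[2] - t * d[2])
    return dot(off, off) <= radius * radius

def ray_triangle_t(origin, d, v0, v1, v2, eps=1e-9):
    """Möller–Trumbore: distance t along the ray to the triangle, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def pick_single_ray(origin, direction, objects):
    """objects: iterable of (obj_id, sphere_center, sphere_radius, triangles).
    Returns (obj_id, t) of the nearest hit, or None if nothing is hit."""
    best = None
    for obj_id, center, radius, triangles in objects:
        if not ray_near_sphere(origin, direction, center, radius):
            continue  # culled: outside the ray's narrow "frustum"
        for v0, v1, v2 in triangles:
            t = ray_triangle_t(origin, direction, v0, v1, v2)
            if t is not None and (best is None or t < best[1]):
                best = (obj_id, t)  # nearer hit wins, like a depth test
    return best
```

Timing this loop against a 1x1 GPU render on your own hardware is one way to find the depth complexity where the crossover actually happens.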
On occasion, it’s possible to use the color-tag render as the first pass of a more complex shading technique, since the colors may get overridden anyway (although they are often used as interpolants, so tough luck there). Difficulties include the latency between the first and subsequent passes, and handling things like alpha surfaces, which you’d have to make either 100% opaque or 100% transparent for picking purposes (no blending allowed).
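The all-or-nothing alpha rule for picking amounts to an alpha test followed by a plain depth test. Here is a minimal per-pixel sketch, assuming fragments arrive as (depth, object ID, alpha) tuples and a cutoff of 0.5, which is an arbitrary choice you would tune per engine.

```python
import math

ALPHA_CUTOFF = 0.5  # assumption: picking threshold, tune per engine

def resolve_pick_pixel(fragments, cutoff=ALPHA_CUTOFF):
    """fragments: (depth, obj_id, alpha) tuples for one pixel, in any order.
    Alpha below the cutoff is treated as fully transparent (discarded);
    everything else is fully opaque. No blending, just a depth test."""
    best_depth = math.inf
    best_id = 0  # 0 = background
    for depth, obj_id, alpha in fragments:
        if alpha < cutoff:
            continue  # 100% transparent for picking purposes
        if depth < best_depth:  # 100% opaque: nearest surface wins
            best_depth, best_id = depth, obj_id
    return best_id
```

A mostly transparent pane in front of an object therefore never steals the pick, while a mostly opaque one always does, regardless of fragment order.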
There are other techniques, of course. Another method involves maintaining screen-space bounding boxes for objects and using them to accelerate picking. In one case, a system I worked on rendered only a small set of unique primitives, so their screen-space bounds could be computed algorithmically and very cheaply for final visibility/pick resolution in 2D projected (even head-centered spherical) space. Lots of options, depending on the design and constraints of the engine.
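A minimal sketch of screen-space bounding-box picking, assuming the object vertices have already been projected to (x, y, depth) screen coordinates. The box test only produces candidates ordered by their nearest depth; the final visibility resolution is left to whatever exact test the engine can afford. The names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ScreenBox:
    obj_id: int
    x0: float
    y0: float
    x1: float
    y1: float
    near: float  # smallest depth of any projected vertex

def screen_box(obj_id, screen_verts):
    """Axis-aligned screen-space bounds of an object's projected vertices."""
    xs = [v[0] for v in screen_verts]
    ys = [v[1] for v in screen_verts]
    zs = [v[2] for v in screen_verts]
    return ScreenBox(obj_id, min(xs), min(ys), max(xs), max(ys), min(zs))

def pick_candidates(boxes, px, py):
    """IDs of objects whose box contains the pick point, nearest first."""
    hits = [b for b in boxes if b.x0 <= px <= b.x1 and b.y0 <= py <= b.y1]
    hits.sort(key=lambda b: b.near)
    return [b.obj_id for b in hits]
```

Because the boxes only need updating when objects or the camera move, repeated picks cost a handful of comparisons per object before any exact test runs.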
Hope this helps with some of the options and tradeoffs.
Avi