I don’t have a simple code example, but I can give a different (possibly easier) approach. (The problem of the previous suggestion is that you can not use indexed vertices, since vertices are then shared between multiple primitives and are not unique per primitive.)
Overview: you will need to render each scene at least twice. The first time, make sure multisampling is disabled and disable smooth polygon, line, and point modes (you don’t want any antialiasing). Set your framebuffer clear values to something like (1.0, 1.0, 1.0, 1.0) with glClearColor() (so the packed pixel color of the background will be -1 when you call glReadPixel()). Render your scene the first time with a super simple shader program that fills the color frame buffer with primitive IDs. You won’t display it. Shader programs can access builtin variable gl_PrimitiveID, which contains an unique number, beginning with zero for the first primitive drawn and incremented by one for each subsequent primitive drawn. Unpack that value into a fragment color vector with builtin GLSL function unpackUnorm4x8() so you have the 32-bit value of gl_PrimitiveID stored in the RGBA bytes of a four-element vector (the geometry shader is a good place to do that). Store that packed value as your fragment color.
After rendering the scene with that shader program, read the pixel that you are interested in with glReadPixel(). What you get back is either -1 if the pixel read is part of the background, or else it is the value of the appropriate gl_PrimitiveID if the pixel is over any rendered primitive. As long as you know the order you defined the geometry primitives that you drew, then you can use gl_PrimitiveID to directly map to that primitive.
(If your scene consists of multiple types of primitives (i.e., points, lines, triangles, patches), then you need to render each type of primitive by itself, so that gl_PrimitiveID will be unique for each primitive (it restarts from zero with the first primitive drawn). Render all polygons first (so that the depth buffer will be fully populated) before rendering lines or points, so hidden lines and points will not be rendered. In any case, read your desired pixel and then clear the color buffer after each rendering of your scene, but do not clear the depth buffer between renderings.)
Do not swap the back buffer for the front buffer. Enable antialiasing/multisampling as desired. Clear the framebuffer with the values you want to see. Now you can render the scene the final time, but this time for displaying. (As an efficiency matter, you don’t need to clear the depth buffer before drawing the scene the final time, so that when you render the scene the final time, only fragments that will ultimately be displayed need be rendered. If you have expensive per pixel lighting/texturing calculations, this can actually make rendering your scene again faster than rendering it just once!)
After reading over all this, it sure sounds complicated. It really isn’t!