This is really a big question, but basically if your top requirement is “real-time” (60fps or better), you have 16.666ms to do everything. Culling, occlusion, lighting, shadowing, shading, antialiasing, …the works. That’s obviously going to preclude using some higher-quality rendering methods that don’t fit the time budget on today’s hardware, except in limited cases. OTOH, if your top requirement is quality rather than time, lots of these excluded options become practical.
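(For reference, that budget is just 1000 ms ÷ 60 frames ≈ 16.7 ms per frame; at 90 fps it drops to about 11.1 ms, and at 120 fps to about 8.3 ms.)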
Briefly, most conventional real-time methods (termed “standard rasterization”) can be oversimplified as:
for each object in the world:
smash it onto the screen pixels
what you’re left with is your image.
Thanks to GPUs, with a small amount of application assistance (such as culling), this can be made very, very fast. Part of the reason for this speed is that each point on each object is rendered largely independently of all other points on that object and of points on other objects in the scene.
…But you can just hear the quality limitations of this in the technique description. Each point rendered on each object doesn’t “know” about the other points on that object (much less the points on other objects in the scene) when it’s rendered. If you “need” it to (such as for shadow casting, refractions, reflections, volumetric absorption/scattering, etc.), you have to arrange for this to be precomputed and passed in some fashion so it can be applied. This is one area where the complexity mounts up in a realtime renderer.
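To make that concrete, here’s a minimal sketch of that loop in Python. Everything in it is a made-up placeholder (project(), shade(), the shadow_map lookup, the toy object data), not any real engine’s API, and it splats individual surface points rather than rasterizing triangles the way a GPU does, but the shape of the loop and the “locality” of the shading are the point. Notice that shade() only ever sees the one point it’s working on, plus whatever was precomputed and passed in:

WIDTH, HEIGHT = 320, 240
LIGHT_DIR = (0.0, 0.0, -1.0)   # hypothetical unit vector from each surface point toward the light

def project(p):
    # Toy perspective projection: world (x, y, z) -> (pixel x, pixel y, depth).
    x, y, z = p
    return int(WIDTH / 2 + 100 * x / z), int(HEIGHT / 2 + 100 * y / z), z

def shade(normal, albedo, in_shadow):
    # "Local" shading: only this point's own data plus a precomputed input
    # (the shadow lookup) are available -- no other scene geometry is consulted.
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, LIGHT_DIR)))
    return 0.0 if in_shadow else albedo * n_dot_l

def rasterize(objects, shadow_map):
    depth = {}   # (x, y) -> nearest depth seen so far
    frame = {}   # (x, y) -> shaded value
    for obj in objects:                          # "for each object in the world"
        for point, normal in obj["points"]:
            x, y, z = project(point)             # "smash it onto the screen pixels"
            if 0 <= x < WIDTH and 0 <= y < HEIGHT and z < depth.get((x, y), float("inf")):
                depth[(x, y)] = z                # depth test: nearer point wins
                in_shadow = shadow_map.get((x, y), False)   # precomputed and passed in
                frame[(x, y)] = shade(normal, obj["albedo"], in_shadow)
    return frame                                 # what you're left with is your image

# Usage: one "object" made of a couple of surface points, and an empty shadow map.
blob = {"albedo": 0.8, "points": [((0.0, 0.0, 5.0), (0.0, 0.0, -1.0)),
                                  ((0.5, 0.0, 5.0), (0.7, 0.0, -0.7))]}
image = rasterize([blob], shadow_map={})

A real pipeline rasterizes triangles and runs all of this on massively parallel GPU hardware, but the structure (iterate over geometry, shade each fragment locally from its own data plus precomputed inputs) is the same.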
However, take an off-line “high quality” technique such as ray-tracing, which is typically not regarded as realtime (without significant limitations). It can be roughly oversimplified as:
for each pixel on the screen:
bounce rays from it off the objects in the scene toward the light sources, and integrate their contributions
Here you can see that the technical approach has been flipped around. Instead of iterating over world objects, we’re iterating over screen pixels. Also, instead of each shading computation being “local” to that point on that surface, it’s “global”. That is, for each pixel we can spawn multiple rays and “hunt out” all the potential lighting contributions to this pixel “globally” in the scene of objects and light sources; that’s why it and related techniques are termed “global illumination” techniques. And because of this, whereas in the former each shading computation doesn’t need direct access to a database of scene objects, in the latter it does. So as you might imagine, whereas in the first technique we have a lot of coherent (e.g. sequential) access, which helps performance, with the latter we have a lot of random (incoherent) access looking up into this database of scene objects and light sources to trace pseudo-randomly bouncing rays, which is inherently more expensive and harder to “make fast”.
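Again as a rough sketch, in Python: a toy pinhole camera, one hypothetical point light, and a list of spheres standing in for the “scene database”. None of this is a real tracer’s API; the point is the flipped loop, and the fact that every camera ray and every shadow ray has to go query the whole scene:

import math

WIDTH, HEIGHT = 320, 240
LIGHT_POS = (0.0, 5.0, 3.0)   # hypothetical point light

def hit_sphere(origin, direction, sphere):
    # Ray/sphere intersection: returns the nearest positive hit distance, or None.
    cx, cy, cz, radius = sphere
    oc = (origin[0] - cx, origin[1] - cy, origin[2] - cz)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c            # direction is assumed normalized, so a == 1
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None

def trace(origin, direction, scene):
    # "Global" step: every ray has to search the whole scene database.
    t, sphere = min(((hit_sphere(origin, direction, s), s) for s in scene),
                    key=lambda hit: hit[0] if hit[0] is not None else float("inf"))
    if t is None:
        return 0.0                                # ray escaped: background
    hit_point = tuple(o + t * d for o, d in zip(origin, direction))
    to_light = tuple(l - h for l, h in zip(LIGHT_POS, hit_point))
    dist = math.sqrt(sum(v * v for v in to_light))
    to_light = tuple(v / dist for v in to_light)
    # Shadow ray: yet another global query against the same scene database.
    blocked = any(hit_sphere(hit_point, to_light, s) is not None for s in scene)
    normal = tuple((h - c) / sphere[3] for h, c in zip(hit_point, sphere[:3]))
    return 0.0 if blocked else max(0.0, sum(n * l for n, l in zip(normal, to_light)))

def render(scene):
    image = {}
    for py in range(HEIGHT):                      # "for each pixel on the screen"
        for px in range(WIDTH):
            # Build a camera ray through this pixel (toy pinhole camera at the origin).
            dx = (px - WIDTH / 2) / WIDTH
            dy = (py - HEIGHT / 2) / WIDTH
            norm = math.sqrt(dx * dx + dy * dy + 1.0)
            image[(px, py)] = trace((0.0, 0.0, 0.0),
                                    (dx / norm, dy / norm, 1.0 / norm), scene)
    return image

image = render(scene=[(0.0, 0.0, 5.0, 1.0)])      # one unit sphere in front of the camera

Acceleration structures (BVHs, kd-trees and the like) exist precisely to make those scene-database lookups cheaper, but the access pattern stays far less coherent than the rasterization loop above.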
So briefly, the realtime/offline rendering technique difference is largely the “local” vs. “global” illumination thing, with better antialiasing (sampling/filtering) to some extent as well.