Real-time 2D camera lens blur effect achievable?

Hello,
I render a lot of animation content and I would like to look into the possibility of adding lens blur via Z-depth render passes.
Photoshop and newer versions of After Effects have rendering-based effects that do this, but is it possible to do in real time?

Theory:
When rendering a scene I output a color pass - the crisp (non-blurred) visual animation.
I also output a Z-depth map - the exact same content as the render above, but instead of color information it's a greyscale depth map where black is closest and white is farthest.
The Z-depth map controls the amount of blur applied to the color pass, and the closest content is always blurred on top of the content behind it. This should run in real time, possibly interactively, on full-HD video content. Edge blurring doesn't matter.

The concept seems pretty basic, but since this goes beyond a simple depth-map-as-blur-intensity effect, it could be pretty processor demanding.

Is it feasible or too demanding?

It depends upon exactly what you’re trying to do. And the blur radius.

Naive convolution (i.e. for each pixel, calculating a weighted sum over every pixel within a neighbourhood) is too expensive to use with large neighbourhoods. There are optimisations (e.g. FFT-based convolution, taking advantage of separability), but these rely upon the radius being constant. They aren't much use if you need to vary the blur radius with each pixel.
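
To illustrate the separability point: with a fixed radius, a 2D Gaussian can be split into a horizontal pass followed by a vertical pass, sampling 2r+1 texels per pass instead of (2r+1)² per pixel. A minimal sketch of the horizontal pass, assuming the application precomputes and normalises the kernel weights (the weights uniform is a name chosen here, not anything standard):

// Horizontal pass of a separable fixed-radius Gaussian blur.
// A second, identical pass with vertical offsets completes the blur.
uniform sampler2D tex;
uniform float weights[9];   // hypothetical: centre weight plus 8 offsets,
                            // precomputed and normalised by the application

void main()
{
    vec2 size = vec2(textureSize(tex, 0));
    vec2 uv = gl_TexCoord[0].xy;
    vec3 color = texture(tex, uv).rgb * weights[0];
    for (int i = 1; i < 9; i++) {
        vec2 offset = vec2(float(i) / size.x, 0.0);
        color += texture(tex, uv + offset).rgb * weights[i];
        color += texture(tex, uv - offset).rgb * weights[i];
    }
    gl_FragColor = vec4(color, 1.0);
}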

AIUI, the usual approach for depth-of-field simulation is to approximate a Gaussian kernel of a given radius as a blend between two fixed-radius kernels, i.e. blending between images blurred using different fixed radii. As the blur increases, the need for image resolution decreases, allowing the different images to be stored in different mipmap layers of a texture, and allowing the standard linear-mipmap-linear texture filter to perform the blending (i.e. using the textureLod() function where the LoD parameter is based upon depth).
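
As a rough sketch of that approach (assuming the colour pass has been uploaded as a mipmapped texture whose higher levels are pre-blurred, and with focusDepth and maxLod as uniform names invented here, not standard ones):

uniform sampler2D colorTex;   // colour pass, mipmapped with pre-blurred levels
uniform sampler2D depthTex;   // greyscale Z-depth pass
uniform float focusDepth;     // assumed: depth value that should stay sharp
uniform float maxLod;         // assumed: mip level giving the strongest blur

void main()
{
    vec2 uv = gl_TexCoord[0].xy;
    float depth = texture(depthTex, uv).r;
    // The further a pixel is from the focal plane, the higher the mip level,
    // and the linear-mipmap-linear filter blends between the two nearest levels.
    float lod = clamp(abs(depth - focusDepth) * maxLod, 0.0, maxLod);
    gl_FragColor = vec4(textureLod(colorTex, uv, lod).rgb, 1.0);
}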

What this can’t deal with is the fact that, with a physical lens, opposite sides of the lens actually have slightly different views of the world. I don’t think that is feasible to simulate in real time for the general case (it can be used as a spot effect, where you have a clear distinction between foreground and background).

[QUOTE=GClements;1272170]It depends upon exactly what you're trying to do. And the blur radius. [...][/QUOTE]

Thank you for the in-depth response.
For simplicity's sake, let's say I need a maximum blur/out-of-focus radius of 8 pixels for 1080p video content. I don't need fancy perspective-shift calculations, since I don't believe they would be worth the added processing cost.
Remember, I won't be using this approach for real-time 3D content rendered on the GPU; I'll use it on pre-rendered 2D video, with all the fancy calculations a raytracing renderer does baked into the video material. I'm not well versed in real-time VFX processing, so I'm not sure whether the same approaches are used in both cases, but since Photoshop's DoF filter takes a very long time to calculate, I would guess they aren't.

I am a 3D artist and not really a very technical one - at least not when it comes to real-time effects. Is this advanced as far as OpenGL goes, or is it pretty basic? How difficult would it be to code a basic tech demo where an .mp4 video source is blurred based on the mouse location on the video canvas, with the blur radius/intensity defined by the Z-depth pass?

Without any optimisation, I’m getting just under 60 fps for a radius of 8 (diameter of 16) blurring a 1920x1080 texture. There are quite a lot of things which could be done to improve upon that.

Test code:


const float pi = 3.141592653589793;

uniform sampler2D tex;
uniform float radius;

void main()
{
    vec2 size = vec2(textureSize(tex, 0));
    int r = int(ceil(radius - 0.5));
    vec3 color = vec3(0, 0, 0);
    // Approximate normalisation factor for the raised-cosine kernel below.
    float k = 0.9342/(r*r);
    for (int y = -r; y <= r; y++) {
        for (int x = -r; x <= r; x++) {
            float d = length(vec2(x, y));
            if (d >= r)
                continue;
            // Raised-cosine falloff: full weight at the centre, zero at distance r.
            float weight = k * (cos(pi*d/r) + 1) / 2;
            color += textureLod(tex, gl_TexCoord[0].xy + vec2(x,y)/size, 0.0).rgb * weight;
        }
    }
    gl_FragColor = vec4(color, 1.0);
}
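
As a sketch only, here is one way the test shader above could be driven per pixel by the Z-depth pass described earlier, rather than by a single radius uniform. depthTex, focusDepth and maxRadius are names assumed here, not part of the test code, and this simple gather-style loop does not handle the requirement that the closest content be blurred on top of the content behind it - that would need extra layering or a scatter-style approach.

const float pi = 3.141592653589793;

uniform sampler2D tex;        // colour pass
uniform sampler2D depthTex;   // assumed: Z-depth pass (black = near, white = far)
uniform float focusDepth;     // assumed: depth that should stay in focus
uniform float maxRadius;      // assumed: e.g. 8.0 for the case discussed above

void main()
{
    vec2 size = vec2(textureSize(tex, 0));
    vec2 uv = gl_TexCoord[0].xy;
    float depth = texture(depthTex, uv).r;
    // Pixels further from the focal plane get a larger blur radius.
    float radius = abs(depth - focusDepth) * maxRadius;
    int r = int(ceil(radius - 0.5));
    if (r < 1) {
        // In-focus pixels are passed through unblurred.
        gl_FragColor = vec4(texture(tex, uv).rgb, 1.0);
        return;
    }
    vec3 color = vec3(0.0);
    float k = 0.9342 / float(r * r);
    for (int y = -r; y <= r; y++) {
        for (int x = -r; x <= r; x++) {
            float d = length(vec2(x, y));
            if (d >= float(r))
                continue;
            float weight = k * (cos(pi * d / float(r)) + 1.0) / 2.0;
            color += texture(tex, uv + vec2(x, y) / size).rgb * weight;
        }
    }
    gl_FragColor = vec4(color, 1.0);
}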

Thank you for the quick shot at this.

Is the blurring the most processing-demanding task in a theoretical DoF tech demo, or is the handling of the Z-depth map significant too?

What about 4K 30 fps material? I guess that would require a lot of optimizing?

This is an example of what I want from a tech demo:

Color pass:
[ATTACH=CONFIG]1166[/ATTACH]

Z-depth pass:
[ATTACH=CONFIG]1167[/ATTACH]

DoF (exaggerated):
[ATTACH=CONFIG]1168[/ATTACH]