Using glViewportArray for stereoscopic rendering on AMD cards

glnoob · October 13, 2018, 3:25pm

I’ve implemented single-pass stereoscopic rendering using Nvidia’s extension to double up the geometry and draw to a second viewport:
https://www.khronos.org/registry/OpenGL/extensions/NV/NV_stereo_view_rendering.txt
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glViewportArray.xhtml

Now I am writing a rendering path for AMD and Intel graphics, and I would like to optimize it as much as possible. I plan to set two viewports and then render all objects twice, once in each viewport. Without the Nvidia extension, what is the normal way that you are supposed to select a viewport for rendering?
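For context, the viewport setup itself is the same whichever selection method is used. Below is a minimal sketch, assuming a side-by-side layout in a single framebuffer; the helper name `stereoViewports` is hypothetical, and the GL call is left as a comment because it needs a current OpenGL 4.1+ context and a function loader.

```cpp
#include <array>

// Hypothetical helper (not from the thread): packs two side-by-side
// viewports as {x, y, width, height} quadruples, which is the layout
// glViewportArrayv reads.
std::array<float, 8> stereoViewports(int fbWidth, int fbHeight) {
    const float halfW = static_cast<float>(fbWidth) / 2.0f;
    const float h = static_cast<float>(fbHeight);
    return {{
        0.0f,  0.0f, halfW, h,   // viewport 0: left eye
        halfW, 0.0f, halfW, h    // viewport 1: right eye
    }};
}

// With a current context, the upload would look like:
//   auto vp = stereoViewports(1920, 1080);
//   glViewportArrayv(0, 2, vp.data());
```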

glnoob · October 13, 2018, 4:07pm

From the spec:

Multiple viewports are available and are numbered zero through the value of MAX_VIEWPORTS minus one. If a geometry shader is active and writes to gl_ViewportIndex, the viewport transformation uses the viewport corresponding to the value assigned to gl_ViewportIndex taken from an implementation-dependent primitive vertex. If the value of the viewport index is outside the range zero to the value of MAX_VIEWPORTS minus one, the results of the viewport transformation are undefined. If no geometry shader is active, or if the active geometry shader does not write to gl_ViewportIndex, the viewport numbered zero is used by the viewport transformation.

It appears that the only way you can specify a viewport to render to is by outputting a value in a geometry shader. Is that correct? As I understand it geometry shaders are very slow on discrete cards.

arekkusu · October 13, 2018, 5:18pm

Or with ARB_shader_viewport_layer_array or GL_NV_viewport_array2 or AMD_vertex_shader_viewport_index.
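As a rough illustration of what these extensions allow, here is a sketch of a vertex shader that writes gl_ViewportIndex directly, held in a C++ string ready for glShaderSource. It assumes ARB_shader_viewport_layer_array is supported; the uniform names `uViewProj` and `uEye` and the attribute `aPosition` are made up for the example. The NV and AMD extensions would need their own `#extension` line instead.

```cpp
#include <string>

// Assumed GLSL vertex shader source; all uniform/attribute names are hypothetical.
const std::string kStereoVS = R"glsl(
#version 430 core
#extension GL_ARB_shader_viewport_layer_array : require

layout(location = 0) in vec3 aPosition;

uniform mat4 uViewProj[2]; // one view-projection matrix per eye
uniform int  uEye;         // 0 = left, 1 = right, set before each draw

void main() {
    gl_Position = uViewProj[uEye] * vec4(aPosition, 1.0);
    // Writing gl_ViewportIndex outside a geometry shader is only
    // legal because of the extension enabled above.
    gl_ViewportIndex = uEye;
}
)glsl";
```

Used this way, every object is still drawn twice, with `uEye` changed between the two draws.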

Alfonse_Reinheart · October 13, 2018, 5:39pm

[QUOTE=glnoob]As I understand it geometry shaders are very slow on discrete cards.[/QUOTE]

It all depends on what you’re trying to do. Compared to the way you’re rendering now, it would probably be faster to use a Geometry Shader. Why? Fewer draw calls and state changes.

Your current solution is to render everything twice. My alternative is to render everything once, but have the GS emit two primitives. Now, that could cause performance problems, since the GS would have to transform each triangle twice in sequence (not to mention other sequential issues). You can side-step most of that by using GS instancing.

Your GS would have two instances, and each instance would emit a single primitive to a single viewport/layer. This allows us to recover some parallelism, and can better allow the implementation to use the available resources.
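A sketch of what such an instanced geometry shader could look like, again as a C++ string. It assumes the vertex shader passes world-space positions through in gl_Position and that `uViewProj` (a hypothetical name) holds one view-projection matrix per eye. GS instancing needs GLSL 4.00 or ARB_gpu_shader5, and gl_ViewportIndex needs 4.10.

```cpp
#include <string>

// Assumed GLSL geometry shader source; uViewProj is a hypothetical uniform.
const std::string kStereoGS = R"glsl(
#version 410 core

// Two invocations per input triangle: one per eye.
layout(triangles, invocations = 2) in;
layout(triangle_strip, max_vertices = 3) out;

uniform mat4 uViewProj[2];

void main() {
    // Each invocation emits exactly one triangle to its own viewport.
    for (int i = 0; i < 3; ++i) {
        gl_Position = uViewProj[gl_InvocationID] * gl_in[i].gl_Position;
        gl_ViewportIndex = gl_InvocationID;
        EmitVertex();
    }
    EndPrimitive();
}
)glsl";
```

Any other varyings would have to be copied through per vertex in the same loop.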

The problematic part of your ARB_shader_viewport_layer_array (or its equivalents) version is that it involves rendering every object twice, almost certainly with a state change between them (presumably, a uniform value describing which viewport/layer to use). You can avoid that in one of two ways.

-You can use instanced rendering, rather than render twice. You would use the gl_InstanceID to tell which viewport/layer for the VS to output to. Of course, this can only work if you’re not already using instancing for something else.
-You can use multi-draw rendering and gl_DrawID, which is available through OpenGL 4.6 or ARB_shader_draw_parameters. Here, you use the multi-draw rendering command to render the same object twice. gl_DrawID would be used to detect which viewport/layer to write to.
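To make the multi-draw option concrete, here is a sketch of the arguments for a two-draw glMultiDrawElements call that submits the same index range once per eye. The struct and function names are hypothetical, and the GL calls are shown as comments since they need a live context. Note that with ARB_shader_draw_parameters rather than GLSL 4.60, the built-in is spelled gl_DrawIDARB.

```cpp
#include <array>
#include <cstdint>

// Hypothetical container for the per-draw arrays glMultiDrawElements reads.
struct StereoMultiDraw {
    std::array<int, 2>         counts;   // index count of each sub-draw
    std::array<const void*, 2> offsets;  // byte offsets into the bound index buffer
};

// Builds two identical sub-draws: same index count, same starting offset.
StereoMultiDraw makeStereoMultiDraw(int indexCount, std::uintptr_t byteOffset) {
    const void* offset = reinterpret_cast<const void*>(byteOffset);
    StereoMultiDraw d;
    d.counts  = {{indexCount, indexCount}};
    d.offsets = {{offset, offset}};
    return d;
}

// Multi-draw path (vertex shader does: gl_ViewportIndex = gl_DrawID;):
//   auto d = makeStereoMultiDraw(indexCount, 0);
//   glMultiDrawElements(GL_TRIANGLES, d.counts.data(), GL_UNSIGNED_INT,
//                       d.offsets.data(), 2);
//
// Instanced path (vertex shader does: gl_ViewportIndex = gl_InstanceID;):
//   glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
//                           nullptr, 2);
```

Either way the vertex shader still needs one of the viewport-index extensions to write gl_ViewportIndex.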

Note that I haven’t implemented or profiled any of these, so feel free to take this advice with a grain of salt.

Dark_Photon · October 13, 2018, 10:08pm

[QUOTE=Alfonse Reinheart;1292749]My alternative is to render everything once, but have the GS emit two primitives. Now, that could cause performance problems, since the GS would have to transform each triangle twice in sequence (not to mention other sequential issues). You can side-step most of that by using GS instancing.

Your GS would have two instances, and each instance would emit a single primitive to a single viewport/layer. This allows us to recover some parallelism, and can better allow the implementation to use the available resources.[/QUOTE]

I’ve done that on NVidia (w/ and w/o geom shader instancing) and the results weren’t pretty. It was about the same speed as drawing the geometry twice from the CPU.

One option the OP might consider checking out on the GPUs they care about is OVR_multiview / OVR_multiview2. From what I gather, these are like NV_stereo_view_rendering in that you only submit the geometry once, but they let you modify more varying data per viewport than just gl_Position.x, potentially support more than 2 viewports, and don’t require you to use geometry shaders (which are just flat slow if you’re pumping more than a small amount of geometry through them per frame).

NVidia just added support for these 2 OVR extensions in their latest drivers. On Pascal, this reportedly instances the geometry internally and transparently (per viewport), but on Turing / RTX it supposedly runs a single, wide, unified geometry pass for all viewports.

UPDATE: Checking gpuinfo.org for OVR_multiview reports, it looks like NVidia may be the only one out there with this support on desktop GL GPUs.
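For anyone who does have OVR_multiview2 available, a vertex shader for it might look roughly like the sketch below (stored as a C++ string; `uViewProj` and `aPosition` are hypothetical names). One difference worth knowing: the views are written to layers of an array texture attached with glFramebufferTextureMultiviewOVR, not to separate viewports.

```cpp
#include <string>

// Assumed GLSL vertex shader source for OVR_multiview2; names are hypothetical.
const std::string kMultiviewVS = R"glsl(
#version 430 core
#extension GL_OVR_multiview2 : require

// Ask the implementation to run this shader once per view.
layout(num_views = 2) in;

layout(location = 0) in vec3 aPosition;

uniform mat4 uViewProj[2]; // one view-projection matrix per view

void main() {
    // gl_ViewID_OVR is 0 for the first view and 1 for the second.
    gl_Position = uViewProj[gl_ViewID_OVR] * vec4(aPosition, 1.0);
}
)glsl";

// Framebuffer setup, with a current context and a 2-layer array texture:
//   glFramebufferTextureMultiviewOVR(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
//                                    colorArrayTex, 0, 0, 2);
```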

vx7johmi · October 17, 2018, 1:38am

Did some work around “stereo instancing” a couple of years ago (before the “NV_stereo_view_rendering” extension):

Efficient Stereoscopic Rendering of Building Information Models (BIM)
http://jcgt.org/published/0005/03/01/paper-lowres.pdf

-Paper includes GLSL shader-code for several options, also how to do it without any extension for viewport selection in vertex shader
-Only used this on NVIDIA with “NV_viewport_array2”, but should be the same on AMD with “AMD_vertex_shader_viewport_index”
-Paper also includes code on how to combine this with conventional instancing (basically, using the modulus operator (%) to distinguish between left and right instances)
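The modulus idea can be sketched on the CPU side as follows. This is one possible layout, assuming the draw is issued with twice the real instance count and that even/odd instance IDs map to left/right; the paper's exact scheme may differ, and the names here are hypothetical.

```cpp
// Result of splitting a doubled instance ID into eye and object indices.
struct StereoInstance {
    int eye;     // 0 = left, 1 = right
    int object;  // index of the conventional instance
};

// Mirrors what the vertex shader would compute from gl_InstanceID:
//   int eye    = gl_InstanceID % 2;
//   int object = gl_InstanceID / 2;
constexpr StereoInstance splitStereoInstance(int instanceId) {
    return StereoInstance{instanceId % 2, instanceId / 2};
}

// Draw call sketch: N objects become 2 * N instances.
//   glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
//                           nullptr, 2 * objectCount);
```

With this layout, instance 5 is the right-eye copy of object 2.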

Please let me know if you need some more info!

/Mikael

glnoob · October 17, 2018, 6:57pm

Thank you all for the information. This thread is very helpful.