I posted this some time ago in a rather hidden place here; this is a bump.
Uses the strengths of ZCULL/HiZ, and is probably easy to add in future silicon:
// this instance-shader is called once per instance
// all of these uniforms below are user-specified, not expected by GL
// added tokens: gl_OcclusionBBMin and gl_OcclusionBBMax
uniform mat4 uniMVP; // matrix projection-view, or projection-view-world (in case of portals, clustering)
uniform vec4 uniFrustumPlanes[6];
uniform float uniBoundingSphereRadius;
bindable uniform vec3 buniInstancePosition[]; // element at index gl_InstanceID is used here
bindable uniform mat3 buniInstanceRotation[]; // element at index gl_InstanceID is used here
uniform vec3 uniBoundingVolumeVerts[3*12]; // a convex box in this case. Could be something more obscure. Could be dependent on gl_InstanceID.
void main(){
vec4 pos = vec4(buniInstancePosition[gl_InstanceID], 1.0); // untransformed instance position; uniMVP is applied once, via nodeTransform below
if(m_ClipSphereByFrustrumPlanes(pos)){
clip();return;
}
mat3 rot = buniInstanceRotation[gl_InstanceID];
mat4 nodeTransform = uniMVP * m_Make4x4FromPosAndRot(pos,rot);
vec4 minXYZW = vec4(1.e+5,1.e+5,1.e+5,1.e+5);
vec4 maxXYZW = vec4(-1.e+5,-1.e+5,-1.e+5,-1.e+5);
//------[ secondary rough occlusion test via a lowest-poly mesh ]--------[
// a box, consisting of 12 triangles is used here, and 12 can be the
// imposed maximum count of triangles to test occlusion with.
// Uses ZCULL and optionally EarlyZ
// (ZCULL being roughest, fastest z-culling test,
// EarlyZ being fast but less rough z-culling test)
for(int tri=0;tri<12;tri++){
for(int v=0;v<3;v++){
vec4 vpos = nodeTransform * vec4(uniBoundingVolumeVerts[tri*3+v], 1.0); // promote the vec3 vertex to a homogeneous point
gl_Position = vpos;
minXYZW = min(minXYZW,vpos);
maxXYZW = max(maxXYZW,vpos);
EmitVertex();
}
EndPrimitive();
}
//----------------------------------------------------------------------/
//----[ primary, roughest occlusion test via a screen-aligned quad ]---------[
// uses only ZCULL. If it doesn't pass ZCULL, the secondary test is skipped.
gl_OcclusionBBMin = minXYZW;
gl_OcclusionBBMax = maxXYZW;
//---------------------------------------------------------------------------/
}
bool m_ClipSphereByFrustrumPlanes(in vec4 pos){
// here use uniFrustumPlanes and uniBoundingSphereRadius to do preliminary frustum culling
}
mat4 m_Make4x4FromPosAndRot(in vec4 pos,in mat3 rot){
// some maths
}
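The two helper bodies are left as stubs above. As a rough CPU-side illustration of what they might compute, here is a minimal C++ sketch. It assumes each frustum plane is stored as (a, b, c, d) with a unit-length normal pointing into the frustum, and that matrices are column-major as in GLSL. The type aliases and function names are hypothetical and not part of the proposal.

```cpp
#include <array>
#include <cassert>

using Vec3  = std::array<float, 3>;
using Plane = std::array<float, 4>;   // (a, b, c, d): a*x + b*y + c*z + d >= 0 means "inside"
using Mat3  = std::array<float, 9>;   // column-major, like a GLSL mat3
using Mat4  = std::array<float, 16>;  // column-major, like a GLSL mat4

// Returns true when the sphere lies entirely outside at least one plane,
// i.e. when the instance can be clipped without further tests.
// Assumption: plane normals are unit length and point into the frustum.
bool clipSphereByFrustumPlanes(const Vec3& center, float radius,
                               const std::array<Plane, 6>& planes) {
    assert(radius >= 0.0f);
    for (const Plane& p : planes) {
        const float dist = p[0] * center[0] + p[1] * center[1] + p[2] * center[2] + p[3];
        if (dist < -radius) {
            return true;  // completely behind this plane
        }
    }
    return false;  // intersects or is inside the frustum (conservative)
}

// Builds a column-major 4x4 transform: rotation in the upper-left 3x3,
// translation in the last column.
Mat4 make4x4FromPosAndRot(const Vec3& pos, const Mat3& rot) {
    return Mat4{
        rot[0], rot[1], rot[2], 0.0f,
        rot[3], rot[4], rot[5], 0.0f,
        rot[6], rot[7], rot[8], 0.0f,
        pos[0], pos[1], pos[2], 1.0f};
}
```

The sphere test is conservative: a sphere near a frustum corner can be reported as not clipped even though it is outside, which only costs a wasted occlusion test.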
Notes:
The triangles from the secondary rough-occlusion test do not modify the z-buffer, color buffers or stencil buffer. They are not further transformed by the currently bound vertex shader, and do not use the currently bound fragment shader. The result of the shader is a single bool per instance (stored internally as a bit, byte or int). The shader is executed right before the mesh instance is drawn, or pre-emptively for several mesh instances at once. The latter variant makes better use of parallelism, but can give false positives (e.g. instance 4 occludes instance 7, but #7 is still regarded as visible, because the visibility of instances 0…10 was batch-computed before #4 was drawn).
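To make the primary screen-aligned test and the batching false positive concrete, here is a small CPU-side C++ emulation. It is only a sketch under assumptions: real ZCULL/HiZ hardware layouts are vendor-specific, and `HiZBuffer`, `ScreenBounds` and `isOccluded` are hypothetical names. Each tile stores the farthest depth already written in it, with larger values being farther away.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Coarse depth buffer: one value per tile, holding the FARTHEST depth
// already written inside that tile (larger value = farther away).
struct HiZBuffer {
    int tilesX;
    int tilesY;
    std::vector<float> maxDepth;  // tilesX * tilesY entries, row-major
};

// Screen-space bounding rectangle of an instance in tile coordinates
// (inclusive), plus the nearest depth of its bounding volume.
struct ScreenBounds {
    int minTileX, minTileY;
    int maxTileX, maxTileY;
    float nearestDepth;
};

// Returns true only if the rectangle is hidden in every tile it touches.
bool isOccluded(const HiZBuffer& hiz, const ScreenBounds& b) {
    assert(hiz.maxDepth.size() ==
           static_cast<std::size_t>(hiz.tilesX) * static_cast<std::size_t>(hiz.tilesY));
    const int x0 = std::max(b.minTileX, 0);
    const int y0 = std::max(b.minTileY, 0);
    const int x1 = std::min(b.maxTileX, hiz.tilesX - 1);
    const int y1 = std::min(b.maxTileY, hiz.tilesY - 1);
    if (x0 > x1 || y0 > y1) {
        return true;  // entirely off-screen: nothing to draw (a choice made for this sketch)
    }
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            // Nearer than the farthest stored depth: may be visible in this tile.
            if (b.nearestDepth <= hiz.maxDepth[static_cast<std::size_t>(y * hiz.tilesX + x)]) {
                return false;
            }
        }
    }
    return true;
}
```

If the test for instance 7 runs while the buffer still holds only far-plane depth (that is, before occluding instance 4 has been drawn), it returns false and #7 is treated as visible. Running it after #4 has written nearer depth into every covered tile returns true.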
Further improvement:
Add “int gl_IBO_FirstIndex=0”, “gl_IBO_Length” and “int gl_VBO_FirstIndex=0”, to specify which range of the VBO and IBO (index buffer) this mesh-instance should use.
This can be used to let the shader select a LOD version of a model, or use a different model altogether (but still with the same bound shaders, render-states and render-targets).
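A possible selection rule, sketched in C++ for illustration: pick the buffer range by distance from the camera. The `LodRange` struct, its fields and the distance thresholds are assumptions made for this example. The proposal itself only names the three output tokens.

```cpp
#include <cassert>
#include <vector>

// One LOD level: the ranges that would be written to gl_IBO_FirstIndex,
// gl_IBO_Length and gl_VBO_FirstIndex (hypothetical layout).
struct LodRange {
    float maxDistance;   // level is used while distance <= maxDistance
    int iboFirstIndex;
    int iboLength;
    int vboFirstIndex;
};

// Picks the first level whose maxDistance covers the given distance.
// Assumption: lods is sorted by ascending maxDistance; the last (coarsest)
// level is the fallback for anything farther away.
const LodRange& selectLod(const std::vector<LodRange>& lods, float distance) {
    assert(!lods.empty());
    for (const LodRange& lod : lods) {
        if (distance <= lod.maxDistance) {
            return lod;
        }
    }
    return lods.back();
}
```

Because all levels live in the same VBO and IBO, switching LOD this way needs no rebinding of buffers, shaders or render states.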
Further optional improvement:
Have the GPU write the results of the instance-shader to a byte-buffer-object created by the user. That buffer is initially reset to “true” for all instance IDs, and is required to be at least NumInstances bytes in size. If an object is occluded (as decided roughly by the instance-shader and its querying of ZCULL via those triangles), then gl_InstanceVisible[gl_InstanceID]=false. The user can then use glMapBufferRange to retrieve the occlusion info.
This can be used as feedback on which instances were drawn, and to do cpu-side computation regarding the result.
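On the CPU side, consuming that buffer could look roughly like the C++ sketch below. The function name is hypothetical. In a real program the `flags` pointer would come from `glMapBufferRange(target, 0, numInstances, GL_MAP_READ_BIT)` and be released with `glUnmapBuffer(target)`. Which binding target such a byte-buffer-object would use is not defined by this proposal, so the GL calls are left out of the code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scans the per-instance visibility bytes (non-zero = visible) and
// returns the IDs of the instances that were not culled.
// Assumption: one byte per instance, indexed by gl_InstanceID.
std::vector<std::size_t> collectVisibleInstances(const std::uint8_t* flags,
                                                 std::size_t numInstances) {
    assert(flags != nullptr || numInstances == 0);
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < numInstances; ++i) {
        if (flags[i] != 0) {
            visible.push_back(i);
        }
    }
    return visible;
}
```

Mapping the buffer right after the draw call stalls until the GPU has finished, so reading the results a frame later is usually the cheaper option.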