Thread: Grass shader

  1. #1
    Junior Member Newbie
    Join Date
    Jun 2018

    Grass shader

    I want to optimize the rendering of a grass field. Currently I am using instanced rendering to draw it, with a VBO of per-instance transforms rather than duplicating the geometry in a geometry shader. The idea is to cull grass by radius from the camera (this can be optimized with a BVH tree), do frustum culling on the CPU, and send an array of ints marking whether each instance is visible as another VBO. Can I access each instance of the mesh in the geometry shader and simply not emit it? Is this a good idea?
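
    As a rough illustration of the CPU frustum test mentioned above, here is a minimal sketch that checks a bounding sphere against six frustum planes. The `Vec3` and `Plane` types and the `sphereInFrustum` name are hypothetical, and the sketch assumes the planes have unit-length normals pointing into the frustum; how the planes are extracted from the camera matrices is not shown.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Hypothetical plane type: a point p is on the inner side when n.p + d >= 0.
// The normal (nx, ny, nz) is assumed to be unit length.
struct Plane { float nx, ny, nz, d; };

// Returns false only when the sphere lies entirely behind at least one plane.
// Spheres that straddle a plane are kept, so the test is conservative.
inline bool sphereInFrustum(const std::array<Plane, 6>& planes,
                            const Vec3& center, float radius) {
    assert(radius >= 0.0f);
    for (const Plane& pl : planes) {
        float signedDist = pl.nx * center.x + pl.ny * center.y
                         + pl.nz * center.z + pl.d;
        if (signedDist < -radius) return false;   // fully outside this plane
    }
    return true;
}
```

    Testing one sphere per clump of grass rather than one per blade keeps the CPU cost low.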

  2. #2
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Quote: Originally posted by someRand
    Can I access each instance of the mesh in the geometry shader and simply not emit it? Is this a good idea?
    If you pass your instance list down to the geometry shader, you can cull your instances and assign them to LOD bins in a pre-pass prior to your instanced geometry renders. This can be useful when you have a few things (100s-1000s) that are each expensive to render (especially at higher LODs) and you don't want to cull and LOD them on the CPU.

    However, generally speaking the geometry shader is slow. If you only have 1 LOD and the cost of transforming a single instance is small (vertex shader work), it makes little sense to try and protect against that work using a per-instance geometry shader.

    Have you tried culling batches of grass on the CPU and fading out the clumps before they are culled?
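
    The batch-and-fade suggestion could look roughly like the sketch below. It is only an illustration: the `Clump` record, the `fadeFactor` and `updateClumps` names, and the linear fade between `fadeStart` and `cullDist` are all assumptions, not anything from the posts above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record for one batch (clump) of grass instances.
struct Clump {
    float cx, cy, cz;   // clump centre in world space
    float fade;         // 1 = fully visible, 0 = fully faded out (culled)
};

// Linear fade: 1 up to fadeStart, dropping to 0 at cullDist.
inline float fadeFactor(float dist, float fadeStart, float cullDist) {
    assert(cullDist > fadeStart);
    float t = (cullDist - dist) / (cullDist - fadeStart);
    return std::max(0.0f, std::min(1.0f, t));
}

// Updates each clump's fade value and returns how many clumps still need drawing.
inline std::size_t updateClumps(std::vector<Clump>& clumps,
                                float camX, float camY, float camZ,
                                float fadeStart, float cullDist) {
    std::size_t visible = 0;
    for (Clump& c : clumps) {
        float dx = c.cx - camX, dy = c.cy - camY, dz = c.cz - camZ;
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
        c.fade = fadeFactor(dist, fadeStart, cullDist);
        if (c.fade > 0.0f) ++visible;
    }
    return visible;
}
```

    The fade value could then be used to scale the blades or their alpha, so clumps shrink away instead of popping when they cross the cull distance.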

  3. #3
    Junior Member Newbie
    Join Date
    Jun 2018
    There are no LODs. The grass model is already simple and the culling is done on the CPU; however, the gain is small (5 fps).
    LOD could be done by drawing a second, lower-density layer of grass over this one in separate draw calls (from 0 to 20 draw high-density grass, from 20 to 40 draw low-density grass).
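
    One way to sketch the two density bands is to sort instance offsets into two lists, one per draw call. Everything here is hypothetical: the `selectLod` and `binByDistance` names, the `lowStride` parameter that thins out the far band by keeping every n-th instance, and the assumption that offsets are already in world space.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

enum class GrassLod { High, Low, Culled };

// Band limits default to the 0-20 / 20-40 ranges mentioned in the post.
inline GrassLod selectLod(float dist, float highEnd = 20.0f, float lowEnd = 40.0f) {
    assert(highEnd < lowEnd);
    if (dist < highEnd) return GrassLod::High;
    if (dist < lowEnd)  return GrassLod::Low;
    return GrassLod::Culled;
}

struct LodBins {
    std::vector<Vec3> high;   // instances for the high-density draw call
    std::vector<Vec3> low;    // instances for the low-density draw call
};

// Near instances are all kept; far instances are thinned to every lowStride-th one.
inline LodBins binByDistance(const std::vector<Vec3>& offsets, const Vec3& cam,
                             std::size_t lowStride) {
    assert(lowStride > 0);
    LodBins bins;
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        const Vec3& p = offsets[i];
        float dx = p.x - cam.x, dy = p.y - cam.y, dz = p.z - cam.z;
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
        GrassLod lod = selectLod(dist);
        if (lod == GrassLod::High) {
            bins.high.push_back(p);
        } else if (lod == GrassLod::Low && i % lowStride == 0) {
            bins.low.push_back(p);
        }
    }
    return bins;
}
```

    Each list would be uploaded to its own instance buffer (or its own range of one buffer) and drawn with a separate instanced call.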
    Here is what I have so far.
    Vertex shader:
    Code :
    #version 330
    in vec3 position;
    in vec2 uv;
    in vec3 offset;      // per-instance offset (instanced rendering)
    in int visibility;   // per-instance visibility: 0 = not visible, 1 = visible
    // directional light
    struct DirectionalLight{
        vec4 color;
        vec4 specular;
        vec4 direction;
        mat4 matrixA;
        mat4 matrixB;
        mat4 matrixC;
        mat4 matrixD;
        float intensity;
        int useShadows;
    };
    layout (std140) uniform perCamera{
        DirectionalLight dirLight;
        vec4 cameraPos;
        mat4 cameraMat;
        mat4 cameraMatInverse;
        mat4 projectionMat;
        mat4 projectionMatInverse;
        mat4 projectionCameraMatInverse;
        vec4 ambientLight;
        float minimumAmbient;
        float zNear;
        float zFar;
        int fps;
    } pc;
    out vec2 uv0;
    flat out int vis0;
    uniform mat4 entityMat;     // entity transform matrix
    uniform float time;
    uniform vec3 obstacles[50];
    uniform float obstaclesRadius[50];
    uniform int numObstacles;
    // Pushes a vertex away from a spherical obstacle in the XZ plane.
    // The operand "worldPos.xyz" was lost from the original post and is assumed here.
    void calculateObstacle(inout vec4 worldPos, in float radius, in vec3 obs){
        float dist = distance(obs, worldPos.xyz);
        float circle = 1.0 - clamp(dist / radius, 0.0, 1.0);
        vec3 sphereDisp = worldPos.xyz - obs;
        sphereDisp *= circle;
        vec3 dir = normalize(worldPos.xyz - obs);   // unused
        worldPos.xz += sphereDisp.xz * 2.2;
    }
    void main(){
        // vertex transform
        vec4 worldPos = entityMat * vec4(position + offset, 1.0);
        // interactive grass: only the upper part of the blade moves
        if(position.y > 0.5){
            for(int i = 0; i < numObstacles; i++)
                calculateObstacle(worldPos, obstaclesRadius[i], obstacles[i]);
            // simple wind sway
            worldPos.x += sin(time * 1.2) * 0.08;
            worldPos.z += cos(time * 1.2) * 0.08;
        }
        gl_Position = pc.projectionMat * pc.cameraMat * worldPos;
        uv0 = uv;
        vis0 = visibility;
    }

    Fragment shader:
    Code :
    #version 330
    uniform sampler2D samplers[4];
    in vec2 uv0;
    flat in int vis0;
    layout(location = 0) out vec4 out_g_worldNormalSpecPower;
    layout(location = 1) out vec4 out_g_albedoSpecIntesity;
    layout(location = 2) out vec4 out_g_unusedShadeless;
    void main(){
        //if(vis0 == 0) discard;  //the optimization
        vec4 difColor = vec4(1,0,0,1);
        vec4 specColor = vec4(1,1,1,1);
        float specularIntensity = 0.3;
        // texture() replaces texture2D(), which is not available in the 330 core profile
        difColor = texture(samplers[0], uv0);
        // alpha test: drop nearly transparent texels
        if(difColor.a < 0.2) discard;
        // out to g buffer
        out_g_worldNormalSpecPower = vec4(0,1,0, 10.0);
        // The first argument was lost from the original post; difColor.rgb is assumed here.
        out_g_albedoSpecIntesity = vec4(difColor.rgb, specularIntensity);
        out_g_unusedShadeless = vec4(0,0,0,0);
    }
    Simple CPU culling for testing:
    Code :
    void GrassRenderer::updateVisibility(const glm::vec3& camPos){
    	auto trs = thisEntity()->transform()->getTransformMatrix();
    	for (unsigned i = 0; i < _offsets.size(); ++i) {
    		// transform the instance offset into world space
    		glm::vec3 point = glm::vec3(trs * glm::vec4(_offsets[i], 1.0f));
    		if (glm::distance(camPos, point) < 20.0f)
    			_visibility[i] = 1;    //will change this to float and add some interpolation so instances on edge scale and fade instead of popping.
    		else
    			_visibility[i] = 0;
    	}
    }

    The idea was to leave instances that are not visible out of rendering; I thought that a geometry shader could do this.
    This is what I wanted

    Here are some results from rendering 5000 instances on a GTX 850M.
    [Attachment: no opt.jpg]
    [Attachment: opt.jpg]

    Is there any better way to optimize this?
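
    One commonly used alternative to a per-instance visibility flag is to compact the visible offsets into a contiguous array on the CPU and draw only that many instances, so culled instances never reach the vertex shader. A minimal sketch follows, assuming the offsets are already in world space; the `compactVisible` name and the `Vec3` type are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Copies every offset closer than `radius` to the camera into `out`
// and returns how many were kept (the instance count for the draw call).
inline std::size_t compactVisible(const std::vector<Vec3>& offsets,
                                  const Vec3& cam, float radius,
                                  std::vector<Vec3>& out) {
    assert(radius >= 0.0f);
    out.clear();
    const float r2 = radius * radius;   // compare squared distances, no sqrt needed
    for (const Vec3& p : offsets) {
        float dx = p.x - cam.x, dy = p.y - cam.y, dz = p.z - cam.z;
        if (dx * dx + dy * dy + dz * dz < r2) out.push_back(p);
    }
    return out.size();
}
```

    The compacted array would then be uploaded to the instance VBO (for example with glBufferSubData) and the returned count passed as the instance count to glDrawElementsInstanced or glDrawArraysInstanced.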


  4. #4
    Junior Member Regular Contributor
    Join Date
    May 2013
    Each instance has overhead, so if you draw only a few triangles per instance, performance will be poor relative to the number of triangles drawn.

    I was trying to find some older benchmarks that I think the tera nova engine did on desktop PCs for this kind of thing, but was unable to find them. From memory, I think you need to get into the range of a few thousand triangles per instance, with AMD cards preferring a higher triangle count than Nvidia.
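
    One way to raise the triangle count per instance is to pre-merge several blades into a single clump mesh and instance the clump instead. The sketch below is only an illustration; the `Mesh` type and `mergeIntoClump` name are hypothetical, and only positions and indices are merged (UVs and other attributes would be copied the same way).

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical indexed mesh: positions plus a triangle index list.
struct Mesh {
    std::vector<Vec3> positions;
    std::vector<unsigned> indices;
};

// Builds one clump mesh holding a copy of `blade` at every local offset.
inline Mesh mergeIntoClump(const Mesh& blade, const std::vector<Vec3>& localOffsets) {
    assert(!blade.positions.empty());
    Mesh clump;
    clump.positions.reserve(blade.positions.size() * localOffsets.size());
    clump.indices.reserve(blade.indices.size() * localOffsets.size());
    for (const Vec3& o : localOffsets) {
        // Indices of this copy start after all previously appended vertices.
        unsigned base = static_cast<unsigned>(clump.positions.size());
        for (const Vec3& p : blade.positions)
            clump.positions.push_back(Vec3{p.x + o.x, p.y + o.y, p.z + o.z});
        for (unsigned i : blade.indices)
            clump.indices.push_back(base + i);
    }
    return clump;
}
```

    With, say, 100 blades per clump, 5000 blades become 50 instances, each carrying 100 times as many triangles.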

    Found it, it was the Outerra engine:
    Last edited by Osbios; 12-27-2018 at 11:33 AM.
