Low rendering performance

The problem is that I get a relatively low FPS while rendering a scene:

FPS: 20
Triangles: 274180
Batches (DIPs): 11
MTriangles/s: about 5.3

Geometry is a fullscreen terrain with bumpmapping and a keyframe-interpolated bumpmapped model.

GeForce 6600 256 MB
P-IV Northwood 2.2 GHz

Internal GPU timing says that GPU idle time is within 2-5%.

What could be the problem? (Or is it really a problem?)
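
For reference, the throughput figure above follows directly from the triangle count and the frame rate. The sketch below just redoes that arithmetic; the function names are made up for illustration. With exactly 20 FPS it gives about 5.48 MTriangles/s, so the reported "about 5.3" presumably reflects a frame rate slightly under 20.

```cpp
#include <cassert>
#include <cstdint>

// Triangles submitted per second, in millions, given the per-frame
// triangle count and the measured frame rate.
double mtrianglesPerSecond(std::uint64_t trianglesPerFrame, double fps) {
    assert(fps >= 0.0);
    return static_cast<double>(trianglesPerFrame) * fps / 1.0e6;
}

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

Calling `mtrianglesPerSecond(274180, 20.0)` and `frameTimeMs(20.0)` gives the throughput and the 50 ms frame budget for the numbers in this post.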

Do you store all your data in VBOs?
Does the FPS increase when you switch your screen resolution DOWN (fill-rate limited) or not?
There are many, many questions; you gave too little useful information to say anything about that.

VBOs with the GL_STATIC_DRAW_ARB flag are used for both models.

At 320x240 the frame rate rises to 25 FPS (vs. 20 FPS at 800x600). It seems I’m not fill-rate limited.
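
One way to read these two measurements is a two-point linear fit: assume frame time = fixed cost + per-pixel cost × pixel count. That model is an assumption (real GPUs overlap work, so it is only a rough estimate), and the names below are hypothetical, but it gives a feel for how much of the frame depends on resolution.

```cpp
#include <cassert>

// Assumed model: frameTimeMs = fixedMs + perPixelMs * pixels.
struct FrameCostModel {
    double fixedMs;     // resolution-independent cost per frame
    double perPixelMs;  // cost per rendered screen pixel
};

// Fit the model through two (pixel count, FPS) measurements.
FrameCostModel fitTwoPoint(double pixelsA, double fpsA,
                           double pixelsB, double fpsB) {
    assert(pixelsA != pixelsB && fpsA > 0.0 && fpsB > 0.0);
    const double msA = 1000.0 / fpsA;
    const double msB = 1000.0 / fpsB;
    const double perPixel = (msA - msB) / (pixelsA - pixelsB);
    return { msA - perPixel * pixelsA, perPixel };
}

// Fraction of the frame time that scales with resolution.
double pixelShareOfFrame(const FrameCostModel& m, double pixels) {
    const double pixelMs = m.perPixelMs * pixels;
    return pixelMs / (m.fixedMs + pixelMs);
}
```

Feeding in 800x600 at 20 FPS and 320x240 at 25 FPS puts the fixed cost at roughly 38 ms and the resolution-dependent share at 800x600 at a little under a quarter of the frame, which fits the impression that fill rate is a factor but not the main limit.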

I’m ready to answer those many questions, but I don’t know what to ask :(

At developer.nvidia.com there are some papers and slides about how to measure where your bottlenecks are. Once you have read those, you will have many more ideas about what questions to ask :)

Jan.

Geometry is a fullscreen terrain with bumpmapping and a keyframe-interpolated bumpmapped model.
Sounds like you’re probably fillrate/shader limited. Are you doing your bump mapping efficiently? What happens if you turn it off?

If I turn the camera away from the geometry so that nothing is visible, the FPS increases to 30 and GPU idle time is 20-25%.

Here is the code of my shaders:

/*VERTEX_PROGRAM*/

varying vec2 TexCoord0;
varying vec2 TexCoord1;
varying vec4 TexCoord2;

varying vec3 Normal;
varying vec3 Tangent; 
varying vec3 Binormal; 
varying vec3 LightDir;

varying vec3 VertexPosition;

uniform mat4 SHADOW_PROJECTION;

void main()
{
   gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;

   TexCoord0 = vec2(gl_MultiTexCoord0);
   TexCoord1 = vec2(gl_MultiTexCoord1);
   TexCoord2 = SHADOW_PROJECTION * gl_ModelViewMatrix * gl_Vertex;

   VertexPosition = vec3(gl_Vertex);

  LightDir = vec3(gl_ModelViewMatrix * vec4(0.08248, 0.08249, -0.816497, 0.0) );

  /* Compute fake tangent space */
  Normal = vec3( 0.0, 0.0, 1.0 );
  Binormal = normalize(cross(Normal + vec3(3.0,2.0,1.0), Normal));
  Tangent = normalize(cross(Normal, Binormal));
}


/*FRAGMENT_PROGRAM*/

uniform sampler2D       Texture0; /* diffuse */
uniform sampler2D       Texture1; /* lightmap */
uniform sampler2D       Texture2; /* shadow map */
uniform sampler2D       Texture3; /* detail blending */
uniform sampler2D       Texture4; /* detail1         */
uniform sampler2D       Texture5; /* detail2         */
uniform sampler2D       Texture6; /* detail3         */
uniform sampler2D       Texture7; /* detail4         */

varying vec2 TexCoord0;
varying vec2 TexCoord1;
varying vec4 TexCoord2;

varying vec3 Normal;
varying vec3 Tangent; 
varying vec3 Binormal; 
varying vec3 LightDir;

varying vec3 VertexPosition;

vec4 Ambient;
vec4 Diffuse;

uniform vec4 LIGHT_POS;

    void PointLight(vec3 BumpNormal)
    {
        /* Compute vector from surface to light position */
        vec3 VP = vec3(LIGHT_POS) - VertexPosition;

        /* Compute distance between surface and light position */
        float d = length(VP);

        /* Normalize the vector from surface to light position */
        VP = normalize(VP);

        /* Compute attenuation */
        float constAttenuation  = 0.01;
        float linearAttenuation = 0.001;
        float quadraticAttenuation = 0.002;
       
        float attenuation = 1.0 / (constAttenuation +
                                   linearAttenuation * d +
                                   quadraticAttenuation * d * d);

        float nDotVP = max(0.0, dot(BumpNormal, VP));

        vec4 ambient = vec4( 0.05, 0.01, 0.05, 0.0 );
        vec4 diffuse = vec4( 0.5,  0.1, 0.7, 0.0 );

        Ambient  += ambient * attenuation;
        Diffuse  += diffuse * nDotVP * attenuation;
    } 

void main()
{
   Ambient = vec4( 0.0, 0.0, 0.0, 0.0 );
   Diffuse = vec4( 0.0, 0.0, 0.0, 0.0 );

   vec2 Coord = vec2( TexCoord0.x, 1.0 - TexCoord0.y );

   vec4 Color       = texture2D( Texture0, Coord );
   vec4 Lightmap    = texture2D( Texture1, Coord );
   vec4 DetailBlend = texture2D( Texture3, Coord );

   vec3 BumpMap = - (vec3(texture2D(Texture5, Coord)) * 2.0 - 1.0);

   BumpMap = BumpMap.x * normalize(Tangent.xyz) + 
             BumpMap.y * normalize(Binormal.xyz) +
             BumpMap.z * normalize(Normal.xyz);

   vec3 lightDir = normalize(LightDir);

   vec3 normal = normalize( BumpMap );

   PointLight( normal );

   float diffuse = max( 0.0, 0.4 * dot(lightDir, normal) ) + 0.6;

   /* calculate details */   
   vec4 Detail1 = texture2D( Texture4, TexCoord1*0.25 );
   vec4 Detail2 = texture2D( Texture5, TexCoord1*0.2 ); 
   vec4 Detail3 = texture2D( Texture6, TexCoord1*0.3 );
   vec4 Detail4 = texture2D( Texture7, TexCoord1*0.05 );

   float Sum = dot( DetailBlend, vec4( 1.0, 1.0, 1.0, 1.0 ) );

   if ( Sum < 1.0 ) Sum = 1.0;

   vec4 Detail = ((DetailBlend.x * Detail1 + 
                  DetailBlend.y * Detail2 + 
                  DetailBlend.z * Detail3)*1.4 +
                  DetailBlend.w * Detail4*0.6) / Sum;

   Detail = Detail - vec4( 0.5, 0.5, 0.5, 0.0 );

   gl_FragColor = ( diffuse + Diffuse ) * 0.9 * Color * Lightmap + 0.25*Detail;
}

It looks like you’re not fillrate limited.
Your vertex shader is not complex.
So it’s either your application that eats up too much CPU or you’re just limited by your GPU’s vertex processing power.
You can get more FPS by reordering the vertices in your terrain grid so that they are optimized for the GPU’s vertex cache.
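
To see why vertex order matters, here is a small sketch that counts vertex-cache misses for an indexed triangle list. It models the cache as a plain FIFO, and the cache size is a parameter; both are assumptions for illustration, since the actual cache behaviour of the GeForce 6600 is not given in this thread. `gridIndices` builds a naive row-major index list for a terrain grid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Count misses for an index list against a FIFO cache of cacheSize entries.
std::size_t countCacheMisses(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<std::uint32_t> fifo;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end())
            continue;  // hit: vertex is still cached, no re-transform
        ++misses;
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) fifo.pop_front();
    }
    return misses;
}

// Average cache miss ratio: transformed vertices per triangle.
double acmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    assert(!indices.empty() && indices.size() % 3 == 0);
    return static_cast<double>(countCacheMisses(indices, cacheSize)) /
           static_cast<double>(indices.size() / 3);
}

// Row-major triangle list for a grid of vertsX * vertsY vertices.
std::vector<std::uint32_t> gridIndices(std::uint32_t vertsX,
                                       std::uint32_t vertsY) {
    assert(vertsX >= 2 && vertsY >= 2);
    std::vector<std::uint32_t> out;
    out.reserve(static_cast<std::size_t>(vertsX - 1) * (vertsY - 1) * 6);
    for (std::uint32_t y = 0; y + 1 < vertsY; ++y) {
        for (std::uint32_t x = 0; x + 1 < vertsX; ++x) {
            const std::uint32_t top = y * vertsX + x;
            const std::uint32_t bottom = top + vertsX;
            const std::uint32_t quad[6] = { top, bottom, top + 1,
                                            top + 1, bottom, bottom + 1 };
            out.insert(out.end(), quad, quad + 6);
        }
    }
    return out;
}
```

Under this model, a strip narrow enough for a whole row to stay cached reuses each row as the top edge of the next one, while a wide row-major grid has already evicted it and transforms most vertices twice. Comparing `acmr(gridIndices(8, 64), 16)` with `acmr(gridIndices(64, 8), 16)` shows the difference for the same number of triangles.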

I think, as k_szczech has mentioned, the main problem is improper CPU usage.
What are you doing in general? How do you clip your geometry? Do you have some heavy CPU processing? Your remark about turning away from the landscape points in that direction…

GPU idle time
How are you computing this value?

It looks like you’re not fillrate limited.
I wouldn’t go that far.

That shader is doing 8 texture accesses. I can’t imagine that this will be fast on a GeForce 6600.

Now, the fact that turning away from the geometry doesn’t increase the framerate a whole lot does suggest that there are other limits just behind the fillrate. However, it seems perfectly reasonable to suggest that this shader be properly simplified for lower-end cards.

How are you computing this value?
Via the NVPM library and the instrumented OpenGL driver.

That shader is doing 8 texture accesses.
But it seems that’s not really a problem. After changing this shader to output a constant color ( gl_FragColor=vec4(1,1,1,0); ) I still get 28-30 FPS, which is probably too low.

And I’m not doing any non-graphics computations.
Maybe the problem could be a misuse of the API somehow?

Maybe the problem could be a misuse of the API somehow?
Hard to say without seeing the full monty (not that I want to).

At this point, ya might want to go ahead and follow standard operating procedures.

seeing the full monty
Oops… That’s about 500 KB of rendering code…

Where can I find good recommendations on API usage?

I reckon you’re using a vertex buffer larger than the optimal limit. Try breaking your terrain mesh into smaller chunks.
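
A minimal sketch of what such chunking could look like for a regular terrain grid, assuming the terrain is described by its quad counts. The chunk size is a tuning parameter, not a known driver limit, and all names here are hypothetical. Neighbouring chunks share their border vertices, so each chunk holds (quadsX + 1) * (quadsY + 1) vertices.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One rectangular piece of the terrain, measured in quads.
struct TerrainChunk {
    int firstQuadX, firstQuadY;  // quad coordinates of the chunk corner
    int quadsX, quadsY;          // chunk extent in quads
    int vertexCount() const { return (quadsX + 1) * (quadsY + 1); }
};

// Split a quadsX * quadsY terrain into chunks of at most
// maxQuadsPerSide quads along each axis.
std::vector<TerrainChunk> splitTerrain(int quadsX, int quadsY,
                                       int maxQuadsPerSide) {
    assert(quadsX > 0 && quadsY > 0 && maxQuadsPerSide > 0);
    std::vector<TerrainChunk> chunks;
    for (int y = 0; y < quadsY; y += maxQuadsPerSide) {
        for (int x = 0; x < quadsX; x += maxQuadsPerSide) {
            chunks.push_back({ x, y,
                               std::min(maxQuadsPerSide, quadsX - x),
                               std::min(maxQuadsPerSide, quadsY - y) });
        }
    }
    return chunks;
}
```

For example, `splitTerrain(256, 256, 64)` yields 16 chunks of 65 x 65 vertices each. Keeping every chunk at or below 65536 vertices also allows 16-bit (GL_UNSIGNED_SHORT) indices, and separate chunks can be culled individually when they are off screen.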

There could be other reasons, like many redundant calls, e.g. calling glBindTexture with the same texture ID.
Or you’re binding the same VBO and calling glVertexPointer and the other functions redundantly.
I’ve seen this kind of behavior in the Irrlicht engine. I’m sure you are not using Irrlicht, I just felt like pointing that out.
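
A rough sketch of how redundant binds can be filtered on the application side. The class name and the callback are hypothetical; in a real renderer the callback would wrap something like glBindTexture(GL_TEXTURE_2D, id), and you would keep one such cache per texture unit.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Remembers the last texture bound and skips binds that would not
// change the state.
class TextureBindCache {
public:
    using BindFn = std::function<void(unsigned int)>;

    explicit TextureBindCache(BindFn bind) : bind_(std::move(bind)) {
        assert(static_cast<bool>(bind_));
    }

    // Returns true if a real bind call was issued, false if it was skipped.
    bool bind(unsigned int textureId) {
        if (hasBound_ && bound_ == textureId) return false;
        bind_(textureId);
        bound_ = textureId;
        hasBound_ = true;
        return true;
    }

    // Call after any code path that binds textures behind the cache's back.
    void invalidate() { hasBound_ = false; }

private:
    BindFn bind_;
    unsigned int bound_ = 0;
    bool hasBound_ = false;
};
```

Counting how often `bind` returns false over a frame is also a cheap way to find out whether redundant state changes are happening at all.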

What about vsync?
Are you able to run other programs and get 500FPS?
In your VBO, what is your vertex format? vertex? normals? texcoords? float or bytes?
etc. etc.

I reckon you’re using a vertex buffer larger than the optimal limit.
I guess I do :(
What is the recommended limit for a vertex buffer and for glDrawElements()?

Calling glBindTexture with the same texture ID
No. State changes are presorted and all redundant calls are removed. glVertexPointer() is called 11 times per frame. Vsync is disabled.

Simple OpenGL programs, like a small rotating textured triangle, run at more than 1000 FPS on this machine.

The VBO format is: position (3 floats) + normal (3 floats) + texcoords (2 floats)
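
Written out as a struct, that layout would look roughly like the sketch below. It assumes the attributes are interleaved in one buffer, which the post does not actually say; the struct and function names are made up. With 4-byte floats it comes to 32 bytes per vertex.

```cpp
#include <cassert>
#include <cstddef>

// Assumed interleaved layout: position, normal, texture coordinates.
struct TerrainVertex {
    float position[3];
    float normal[3];
    float texcoord[2];
};

static_assert(sizeof(TerrainVertex) == 32,
              "expected a tightly packed 32-byte vertex");

// Size in bytes of a vertex buffer holding vertexCount vertices.
std::size_t vertexBufferBytes(std::size_t vertexCount) {
    assert(vertexCount <= static_cast<std::size_t>(-1) / sizeof(TerrainVertex));
    return vertexCount * sizeof(TerrainVertex);
}
```

With this layout, `sizeof(TerrainVertex)` is the stride and `offsetof(TerrainVertex, normal)` and `offsetof(TerrainVertex, texcoord)` (12 and 24 bytes) are the offsets you would pass to glNormalPointer and glTexCoordPointer when the VBO is bound.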

Looking at developer.nvidia.com it took me 3 seconds to find this document:
http://download.nvidia.com/developer/presentations/GDC_2004/GDC2004_PracticalPerformanceAnalysis.pdf

You should read it and follow the “standard procedures”. Afterwards you will know where your bottlenecks are, and then you can start optimizing them. Without knowing your bottlenecks, it makes no sense to start optimizing something: even if you make it faster, you might not see any difference as long as some other bottleneck is more prominent.

Jan.

But it seems that’s not really a problem. After changing this shader to output a constant color ( gl_FragColor=vec4(1,1,1,0); ) I still get 28-30 FPS, which is probably too low.
Yes. As I pointed out, there are multiple problems here. You’re fillrate bound until you minimize your shader, after which point you’re probably CPU or vertex transfer bound.

I suspect that there is some form of pathological API usage. Unintended, of course, but it’s probably there.

I seem to remember there is a threshold on NVIDIA cards: if you store too many elements in a single buffer, it becomes quite slow to render.

Make sure you don’t use a “strange” vertex format. XYZ, normals and UVs should be floats. Color, 4 unsigned bytes. Indices, unsigned int. Anything else is dangerous.

Y.

Originally posted by Sergey K.:
glVertexPointer() is called 11 times per frame.

Well, that could be a reason… You should set up your VBO just once if it’s static, and you said it is.

Well that could be a reason…
No, that is unlikely.

Most applications will have to call it once per rendered object. Having 11 calls suggests having 11 objects, which is not a lot.

A Direct3D implementation running on a “1GHz” processor could handle 10,000 Draw*Primitive calls, each of which provoked a switch to the driver (and thus a CPU switch out of protected mode). Not being able to do more than 330 (11 * 30fps) glVertexPointer calls per second makes no sense.

I can render loads more than 330 polygons per second using immediate mode, each of which implicitly needs to do whatever glVertexPointer does.