Low rendering performance



Sergey K.
06-11-2007, 01:39 PM
The problem is that I have a relatively low FPS while rendering a scene:

FPS: 20
Triangles: 274180
Batches (DIPs): 11
MTriangles/s: about 5.3

Geometry is a fullscreen terrain with bumpmapping and a keyframe-interpolated bumpmapped model.

GeForce 6600 256Mb
P-IV Northwood 2.2 GHz

Internal GPU timing says that GPU idle time is within 2-5%

What could be the problem? (Or is it really a problem?)

Jackis
06-11-2007, 02:13 PM
Do you store all your data in VBO?
Does the FPS increase when you switch your screen resolution DOWN (which would mean you are fill-rate limited) or not?
There are many, many questions; you gave too little useful information to say anything about this.

Sergey K.
06-11-2007, 02:17 PM
VBO with GL_STATIC_DRAW_ARB flag is used for both models.

At 320x240 the FPS increases to 25 (vs. 20 FPS at 800x600). Seems I'm not fill-rate limited.


I'm ready to answer those many questions. But I don't know what to ask :(
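
One rough way to read those two measurements is to fit a straight line: frame time = fixed cost + per-pixel cost * pixel count. This is only a sketch. The linear model and the names `FrameCostModel` and `fitFrameCost` are assumptions made for illustration, and two samples cannot tell CPU, vertex and driver cost apart inside the fixed part.

```cpp
#include <cassert>

// Result of fitting frameTime = fixedMs + perPixel * pixels to two samples.
struct FrameCostModel {
    double fixedMs;     // part of the frame time that does not scale with resolution
    double perPixelNs;  // cost per pixel, in nanoseconds
};

// Fit the two-point linear model from two (width, height, fps) measurements.
// The two resolutions must cover a different number of pixels.
inline FrameCostModel fitFrameCost(int w1, int h1, double fps1,
                                   int w2, int h2, double fps2) {
    const double px1 = static_cast<double>(w1) * h1;
    const double px2 = static_cast<double>(w2) * h2;
    assert(px1 != px2 && fps1 > 0.0 && fps2 > 0.0);

    const double ms1 = 1000.0 / fps1;  // frame time of the first sample
    const double ms2 = 1000.0 / fps2;  // frame time of the second sample
    const double perPixelMs = (ms1 - ms2) / (px1 - px2);

    // 1 ms = 1e6 ns
    return FrameCostModel{ ms1 - perPixelMs * px1, perPixelMs * 1.0e6 };
}
```

With the numbers above (800x600 at 20 FPS, 320x240 at 25 FPS) this gives roughly 38 ms of fixed cost in a 50 ms frame and about 25 ns per pixel, which fits the observation that most of the frame time does not depend on resolution.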

Jan
06-11-2007, 03:12 PM
At developer.nvidia.com there are some papers/slides about how to measure where your bottlenecks are. Once you have read those, you will have many more ideas about what questions to ask :)

Jan.

Korval
06-11-2007, 05:45 PM
Originally posted by Sergey K.:
Geometry is a fullscreen terrain with bumpmapping and a keyframe-interpolated bumpmapped model.
Sounds like you're probably fillrate/shader limited. Are you doing your bump mapping efficiently? What happens if you turn it off?

Sergey K.
06-12-2007, 12:56 AM
If I turn the camera away from the geometry so that nothing is visible, the FPS increases to 30 and GPU idle time is 20-25%.

Here is the code of my shaders:


/*VERTEX_PROGRAM*/

varying vec2 TexCoord0;
varying vec2 TexCoord1;
varying vec4 TexCoord2;

varying vec3 Normal;
varying vec3 Tangent;
varying vec3 Binormal;
varying vec3 LightDir;

varying vec3 VertexPosition;

uniform mat4 SHADOW_PROJECTION;

void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;

    TexCoord0 = vec2(gl_MultiTexCoord0);
    TexCoord1 = vec2(gl_MultiTexCoord1);
    TexCoord2 = SHADOW_PROJECTION * gl_ModelViewMatrix * gl_Vertex;

    VertexPosition = vec3(gl_Vertex);

    LightDir = vec3(gl_ModelViewMatrix * vec4(0.08248, 0.08249, -0.816497, 0.0) );

    /* Compute fake tangent space */
    Normal = vec3( 0.0, 0.0, 1.0 );
    Binormal = normalize(cross(Normal + vec3(3.0,2.0,1.0), Normal));
    Tangent = normalize(cross(Normal, Binormal));
}


/*FRAGMENT_PROGRAM*/

uniform sampler2D Texture0; /* diffuse */
uniform sampler2D Texture1; /* lightmap */
uniform sampler2D Texture2; /* shadow map */
uniform sampler2D Texture3; /* detail blending */
uniform sampler2D Texture4; /* detail1 */
uniform sampler2D Texture5; /* detail2 */
uniform sampler2D Texture6; /* detail3 */
uniform sampler2D Texture7; /* detail4 */

varying vec2 TexCoord0;
varying vec2 TexCoord1;
varying vec4 TexCoord2;

varying vec3 Normal;
varying vec3 Tangent;
varying vec3 Binormal;
varying vec3 LightDir;

varying vec3 VertexPosition;

vec4 Ambient;
vec4 Diffuse;

uniform vec4 LIGHT_POS;

void PointLight(vec3 BumpNormal)
{
    /* Compute vector from surface to light position */
    vec3 VP = vec3(LIGHT_POS) - VertexPosition;

    /* Compute distance between surface and light position */
    float d = length(VP);

    /* Normalize the vector from surface to light position */
    VP = normalize(VP);

    /* Compute attenuation */
    float constAttenuation = 0.01;
    float linearAttenuation = 0.001;
    float quadraticAttenuation = 0.002;

    float attenuation = 1.0 / (constAttenuation +
                               linearAttenuation * d +
                               quadraticAttenuation * d * d);

    float nDotVP = max(0.0, dot(BumpNormal, VP));

    vec4 ambient = vec4( 0.05, 0.01, 0.05, 0.0 );
    vec4 diffuse = vec4( 0.5, 0.1, 0.7, 0.0 );

    Ambient += ambient * attenuation;
    Diffuse += diffuse * nDotVP * attenuation;
}

void main()
{
    Ambient = vec4( 0.0, 0.0, 0.0, 0.0 );
    Diffuse = vec4( 0.0, 0.0, 0.0, 0.0 );

    vec2 Coord = vec2( TexCoord0.x, 1.0 - TexCoord0.y );

    vec4 Color = texture2D( Texture0, Coord );
    vec4 Lightmap = texture2D( Texture1, Coord );
    vec4 DetailBlend = texture2D( Texture3, Coord );

    vec3 BumpMap = - (vec3(texture2D(Texture5, Coord)) * 2.0 - 1.0);

    BumpMap = BumpMap.x * normalize(Tangent.xyz) +
              BumpMap.y * normalize(Binormal.xyz) +
              BumpMap.z * normalize(Normal.xyz);

    vec3 lightDir = normalize(LightDir);

    vec3 normal = normalize( BumpMap );

    PointLight( normal );

    float diffuse = max( 0.0, 0.4 * dot(lightDir, normal) ) + 0.6;

    /* calculate details */
    vec4 Detail1 = texture2D( Texture4, TexCoord1*0.25 );
    vec4 Detail2 = texture2D( Texture5, TexCoord1*0.2 );
    vec4 Detail3 = texture2D( Texture6, TexCoord1*0.3 );
    vec4 Detail4 = texture2D( Texture7, TexCoord1*0.05 );

    float Sum = dot( DetailBlend, vec4( 1.0, 1.0, 1.0, 1.0 ) );

    if ( Sum < 1.0 ) Sum = 1.0;

    vec4 Detail = ((DetailBlend.x * Detail1 +
                    DetailBlend.y * Detail2 +
                    DetailBlend.z * Detail3)*1.4 +
                    DetailBlend.w * Detail4*0.6) / Sum;

    Detail = Detail - vec4( 0.5, 0.5, 0.5, 0.0 );

    gl_FragColor = ( diffuse + Diffuse ) * 0.9 * Color * Lightmap + 0.25*Detail;
}

k_szczech
06-12-2007, 03:35 AM
It looks like you're not fillrate limited.
Your vertex shader is not complex.
So it's either your application that eats up too much CPU or you're just limited by your GPU's vertex processing power.
You can get more FPS by reordering vertices in your terrain grid to make them optimized for GPU vertex cache.
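
To make the vertex cache point concrete, here is a small sketch that simulates a FIFO post-transform cache and compares two triangle orders for a regular grid. Everything in it is an assumption for illustration: the 16-entry FIFO is a made-up cache model rather than a documented figure for any GPU, `gridIndices` and `acmr` are hypothetical helpers, and it reorders the index list (triangle order) rather than the vertices themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Triangle-list indices for a grid of vertsX x vertsY vertices. Quads are
// visited in vertical bands that are stripWidth quads wide; a stripWidth equal
// to the full grid width gives plain row-by-row order.
std::vector<unsigned> gridIndices(unsigned vertsX, unsigned vertsY, unsigned stripWidth) {
    assert(vertsX >= 2 && vertsY >= 2 && stripWidth >= 1);
    std::vector<unsigned> idx;
    const unsigned quadsX = vertsX - 1, quadsY = vertsY - 1;
    for (unsigned x0 = 0; x0 < quadsX; x0 += stripWidth) {
        const unsigned x1 = std::min(x0 + stripWidth, quadsX);
        for (unsigned y = 0; y < quadsY; ++y) {
            for (unsigned x = x0; x < x1; ++x) {
                const unsigned a = y * vertsX + x;  // top-left
                const unsigned b = a + 1;           // top-right
                const unsigned c = a + vertsX;      // bottom-left
                const unsigned d = c + 1;           // bottom-right
                idx.insert(idx.end(), {a, c, b, b, c, d});
            }
        }
    }
    return idx;
}

// Average cache miss ratio: simulated vertex transforms per triangle,
// using a FIFO cache in which a hit does not refresh the entry.
double acmr(const std::vector<unsigned>& idx, std::size_t cacheSize) {
    assert(cacheSize > 0 && !idx.empty());
    std::deque<unsigned> fifo;
    std::size_t misses = 0;
    for (unsigned v : idx) {
        if (std::find(fifo.begin(), fifo.end(), v) == fifo.end()) {
            ++misses;
            fifo.push_back(v);
            if (fifo.size() > cacheSize) fifo.pop_front();
        }
    }
    return static_cast<double>(misses) / (static_cast<double>(idx.size()) / 3.0);
}
```

Under this model a 129x129 grid drawn row by row costs about one transform per triangle, while bands 4 quads wide come out near 0.63. Bands that are too wide for the cache can thrash and lose the benefit entirely, so the band width has to be tuned against the real hardware.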

Jackis
06-12-2007, 09:16 AM
I think, as k_szczech has mentioned, the main problem is improper CPU usage.
What are you doing in general? How do you clip your geometry? Do you have some heavy CPU processing? That is what your remark about turning away from the landscape points to...

Korval
06-12-2007, 10:36 AM
Originally posted by Sergey K.:
GPU idle time
How are you computing this value?

Originally posted by k_szczech:
It looks like you're not fillrate limited.
I wouldn't go that far.

That shader is doing 8 texture accesses. I can't imagine that this will be fast on a GeForce 6600.

Now, the fact that turning away from the geometry doesn't increase the framerate a whole lot does suggest that there are other limits just behind the fillrate. However, it seems perfectly reasonable to suggest that this shader be properly simplified for lower-end cards.

Sergey K.
06-12-2007, 11:07 AM
Originally posted by Korval:
How are you computing this value?
Via the NVPM library and the instrumented OpenGL driver.

Originally posted by Korval:
That shader is doing 8 texture accesses.
But it seems that's not really a problem. After changing this shader to output just a constant color ( gl_FragColor=vec4(1,1,1,0); ) I still get 28-30 FPS, which is probably too low.

And I'm not doing any non-graphics computations.
Maybe the problem could be a misuse of the API somehow?

Brolingstanz
06-12-2007, 11:19 AM
Originally posted by Sergey K.:
Maybe problem could be in a misusage of the API somehow?
Hard to say without seeing the full monty (not that I want to).

At this point, ya might want to go ahead and follow standard operating procedures.

Sergey K.
06-12-2007, 11:24 AM
Originally posted by Brolingstanz:
seeing the full monty
Oops... That's about 500 KB of rendering code...

Where is it possible to find good recommendations on API usage?

knackered
06-12-2007, 12:28 PM
I reckon you're using a vertex buffer larger than the optimal limit. Try breaking your terrain mesh into smaller chunks.

V-man
06-12-2007, 01:14 PM
There could be other reasons, like many redundant calls. Ex: calling glBindTexture with the same texture ID.
Or you're binding the same VBO and calling glVertexPointer and the other functions redundantly.
I've seen this kind of behavior in the Irrlicht engine. I'm sure you are not using Irrlicht, I just felt like pointing that out.

What about vsync?
Are you able to run other programs and get 500FPS?
In your VBO, what is your vertex format? vertex? normals? texcoords? float or bytes?
etc. etc.
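
The redundant-call point above can be sketched as a tiny filter placed in front of the real bind call. `BindCache` is a hypothetical helper, not part of OpenGL or any engine mentioned here, and the callback stands in for a call such as glBindTexture(GL_TEXTURE_2D, id) so that the sketch runs without a GL context.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Skips binds that would not change the currently bound object.
class BindCache {
public:
    // `bind` performs the real call, e.g. a wrapper around glBindTexture.
    explicit BindCache(std::function<void(unsigned)> bind)
        : bind_(std::move(bind)) {
        assert(bind_ && "bind callback must be callable");
    }

    // Returns true if the underlying call was actually issued.
    bool set(unsigned id) {
        if (valid_ && id == current_) return false;  // redundant, skip it
        bind_(id);
        current_ = id;
        valid_ = true;
        return true;
    }

    // Forget the cached value, e.g. after code outside the cache changed the binding.
    void invalidate() { valid_ = false; }

private:
    std::function<void(unsigned)> bind_;
    unsigned current_ = 0;
    bool valid_ = false;
};
```

Texture bindings are tracked per texture unit and per target, so a real renderer would need one such cache for each combination it uses.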

Sergey K.
06-12-2007, 01:27 PM
Originally posted by knackered:
I reckon you're using a vertex buffer larger than the optimal limit.
I guess I do :(
What is the recommended limit for a VB and for glDrawElements()?

Originally posted by V-man:
Calling glBindTexture with the same texture ID
No. State changes are presorted and all redundant calls are removed. glVertexPointer() is called 11 times per frame. Vsync is disabled.

Simple OpenGL programs, like a small rotating textured triangle, run at more than 1000 FPS on this machine.

VBO format is: vertex (3 floats) + normals (3 floats) + texcoords (2 floats)
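
For reference, that format comes to 32 bytes per vertex. A minimal sketch of it as one interleaved struct follows. The interleaving itself and the names `TerrainVertex`, `kStride`, `kNormalOffset`, `kTexcoordOffset` and `vertexBufferBytes` are assumptions for illustration; the same data could just as well live in separate tightly packed arrays.

```cpp
#include <cstddef>

// One vertex: position (3 floats) + normal (3 floats) + texcoord (2 floats).
struct TerrainVertex {
    float position[3];
    float normal[3];
    float texcoord[2];
};

// 8 floats of 4 bytes each, with no padding expected.
static_assert(sizeof(TerrainVertex) == 32, "expected a 32-byte vertex");

// Stride and byte offsets that would be passed to the gl*Pointer calls
// if the attributes were interleaved in a single buffer.
constexpr std::size_t kStride = sizeof(TerrainVertex);
constexpr std::size_t kNormalOffset = offsetof(TerrainVertex, normal);
constexpr std::size_t kTexcoordOffset = offsetof(TerrainVertex, texcoord);

// Size in bytes of a buffer holding vertexCount vertices.
constexpr std::size_t vertexBufferBytes(std::size_t vertexCount) {
    return vertexCount * sizeof(TerrainVertex);
}
```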

Jan
06-12-2007, 02:16 PM
Looking at developer.nvidia.com it took me 3 seconds to find this document:
http://download.nvidia.com/developer/presentations/GDC_2004/GDC2004_PracticalPerformanceAnalysis.pdf

You should read it and follow the "standard procedures". Afterwards you will know where your bottlenecks are and then you can start optimizing them. Without knowing your bottlenecks, it doesn't make sense starting to optimize something, because, even if you make it faster, you might not see a difference, at all, as long as some other bottleneck is more prominent.

Jan.

Korval
06-12-2007, 03:34 PM
Originally posted by Sergey K.:
But it seems that's not really a problem. With changing this shader to just constant color ( gl_FragColor=vec4(1,1,1,0); ) i still have 28-30 FPS - probably too few.
Yes. As I pointed out, there are multiple problems here. You're fillrate bound until you minimize your shader, after which point you're probably CPU or vertex transfer bound.

I suspect that there is some form of pathological API usage. Unintended, of course, but it's probably there.

Ysaneya
06-13-2007, 01:36 AM
I seem to remember there is a threshold on nvidia cards after which, if you store too many elements in a buffer, it becomes quite slow to render.

Make sure you don't use a "strange" vertex format. XYZ, normals and UVs should be floats. Color, 4 unsigned bytes. Indices, unsigned int. Anything else is dangerous.

Y.

HellKnight
06-13-2007, 03:06 PM
Originally posted by Sergey K.:
glVertexPointer() is called 11 times per frame.
Well, that could be a reason... You should set up your VBO just once if it's static, and you said it is.

Korval
06-13-2007, 04:36 PM
Originally posted by HellKnight:
Well that could be a reason...
No, that is unlikely.

Most applications will have to call it once per rendered object. Having 11 calls suggests having 11 objects, which is not a lot.

A Direct3D implementation running on a "1GHz" processor could handle 10,000 Draw*Primitive calls, each of which provoked a switch to the driver (and thus a CPU switch out of protected mode). Not being able to do more than 330 (11 * 30fps) glVertexPointer calls per-second makes no sense.

I can render loads more than 330 polygons per second using immediate mode, each of which implicitly needs to do whatever glVertexPointer does.

Sergey K.
06-13-2007, 11:03 PM
The part of the code where the vertices are submitted to GPU looks like:


int VertexOffset = FKeyframeNum * GetActiveVertexCount();
int NextFrameVertexOffset = FNextKeyframeNum * GetActiveVertexCount();

glVertexAttribPointerARB( 1, 3, GL_FLOAT, GL_FALSE, 0, FVertices + NextFrameVertexOffset );
glEnableVertexAttribArrayARB( 1 );

glEnableClientState(GL_VERTEX_ARRAY);

// normals
if (FNormals)
{
    glNormalPointer(GL_FLOAT, 0, FNormals + VertexOffset );
    glEnableClientState(GL_NORMAL_ARRAY);
}
else
{
    glDisableClientState(GL_NORMAL_ARRAY);
}
// textures
for (int Tex = FrameBuffer->GetRendererExtensions()->GetMaxTextureUnits()-1;
     Tex >= 0; --Tex )
{
    glActiveTextureARB(GL_TEXTURE0+Tex);
    glClientActiveTextureARB(GL_TEXTURE0+Tex);

    // only units with a texcoord array, i.e. Tex is a valid index into FTexCoords
    if ( Tex < static_cast<int>( FTexCoords.size() ) )
    {
        glTexCoordPointer( GetAllocationInfo().FTexChannels, GL_FLOAT, 0, FTexCoords[Tex] );
        glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    }
    else
    {
        glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    }
}

glVertexPointer(3, GL_FLOAT, 0, FVertices + VertexOffset );

GetIndices()->DrawElements( GetPrimitive(), GetActiveVertexCount() );

Maybe something is bad here?

Korval
06-13-2007, 11:53 PM
Originally posted by Sergey K.:
FrameBuffer->GetRendererExtensions()->GetMaxTextureUnits()
Is that actually making a glGet* call? If so, that's generally not a good idea. If you really need to ask this every time you render something (hint: you don't), then cache the value.

I see a lot of gl*Pointer calls, but I don't see any calls that actually bind the VBO(s) containing the vertex data. What does that look like?

zed
06-14-2007, 03:32 AM
man, no offense but thats some terrible shading shader
fix the ****, before u bitch
about performance

keep it real, nuf said

btw with
VBO format is: vertex (3 floats) + normals (3 floats) + texcoords (2 floats)
how is it possible to do bumpmaping with this (unless u use a fixed tangent of vec3(1,0,0) etc)



Originally posted by Sergey K.:
In 320x240 FPS increases up to 25 FPS (VS 20 FPS in 800x600). Seems i'm not fill-rate limited.
thus less pixels need to be drawn + fps increases + youre !not! fillrate limited (run it by me again), just joking :)
personally i think youre drawing too many triangles per terrain batch, from my testing 33x33 or 65x65 verts is about ideal wrt performance (perhaps 129x129 if u have ****e lighting, ie not spot or point lights)

edit-
(for the love of the lord, jimi hendrix, this sticking of asterixs instead of words is childish)

HellKnight
06-14-2007, 04:16 AM
Originally posted by Korval:

Most applications will have to call it once per rendered object. Having 11 calls suggests having 11 objects, which is not alot.
Maybe I'm terribly wrong, but as far as I can recall most VBO setup work is done in the glVertexPointer call. I don't see why you need to respecify the VBO format every frame.



I can render loads more than 330 polygons per second using immediate mode, each of which implicitly needs to do whatever glVertexPointer does.
I don't think rendering a polygon in immediate mode requires calling glVertexPointer at all. I guess there's an internal VBO for that case, the driver just needs to glBufferSubData into it every frame.

Sergey K.
06-14-2007, 08:19 AM
Originally posted by Korval:
Is that actually making a glGet* call?
It isn't. All these values are precached.

Originally posted by Korval:
but I don't see any calls that actually bind the VBO(s)
Here they go:


void clVBOVertexArray::FeedIntoGPU() const
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, FID);

    clVertexArrayFeeder::FeedIntoGPU();

    // clean-up
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
}

Code for the clVertexArrayFeeder::FeedIntoGPU() method was actually posted before and the call GetIndices()->DrawElements() looks like:


void clVBOElementsArray::DrawElements(Lenum Primitive, int ActiveVertexCount ) const
{
    UnLock();

    int VertexCount = ( this == VAManager->GetCommonIndices() ) ? ActiveVertexCount : FCount;

    glDrawElements( Primitive,
                    VertexCount,
                    FShortIndices ? GL_UNSIGNED_SHORT : GL_UNSIGNED_INT,
                    0 );

    FrameBuffer->UpdateStats( Primitive, VertexCount );
}

FShortIndices is always FALSE for now.

Yet another piece of code mentioned here:


void clVBOElementsArray::UnLock() const
{
    glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, FID );

    if (!FLocked) return;

    FATAL( glUnmapBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB)==GL_FALSE,
           "Unable to unmap GL VBO elements buffer");

    FIndices = NULL;

    FLocked = false;
}

FLocked is always FALSE in this case.

Korval
06-14-2007, 10:38 AM
Originally posted by HellKnight:
Maybe I'm terribly wrong but as far as I can recall most VBO setup work is done in the glVertexPointer call.
Because there's only one set of vertex pointer state?

To render something out of a VBO you need to do the following steps:

- Bind the VBO.
- Set the gl*Pointers with offsets for each of the attributes you need to use.
- Call glDraw*.

For each object. Unless you're drawing the same object over and over again, you need to call gl*Pointer for each render when you change the offsets.
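
The three steps above can be sketched as one per-object function. To keep it runnable without a GL context, the calls go through a stand-in type: `ObjectRange`, `drawObject` and `RecordingApi` are hypothetical names, and in a real renderer the four members would forward to glBindBufferARB, glVertexPointer, glNormalPointer and glDrawElements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Where one object's data lives inside a vertex buffer.
struct ObjectRange {
    unsigned vbo;              // buffer object name
    std::size_t vertexOffset;  // byte offset of the first position
    std::size_t normalOffset;  // byte offset of the first normal
    unsigned indexCount;       // number of indices to draw
};

// Issues the three steps for one object, in order.
template <typename Api>
void drawObject(Api& gl, const ObjectRange& obj) {
    assert(obj.indexCount > 0);
    gl.bindArrayBuffer(obj.vbo);         // 1. bind the VBO
    gl.vertexPointer(obj.vertexOffset);  // 2. set the pointers as offsets into it
    gl.normalPointer(obj.normalOffset);
    gl.drawElements(obj.indexCount);     // 3. draw
}

// Stand-in for the GL wrapper: it only records the call order.
struct RecordingApi {
    std::vector<std::string> log;
    void bindArrayBuffer(unsigned id)  { log.push_back("bind " + std::to_string(id)); }
    void vertexPointer(std::size_t o)  { log.push_back("vertex " + std::to_string(o)); }
    void normalPointer(std::size_t o)  { log.push_back("normal " + std::to_string(o)); }
    void drawElements(unsigned count)  { log.push_back("draw " + std::to_string(count)); }
};
```

Calling `drawObject` once per object gives one bind, one set of pointer calls and one draw each, which matches the 11 glVertexPointer calls per frame reported earlier for 11 batches.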


For Sergey:

I'd download glIntercept and get a log of the OpenGL calls you're making. It looks very much like something pathological is going on, but the code alone isn't enough to be able to tell what's going on.

V-man
06-14-2007, 07:16 PM
I make more calls to glBindBuffer and gl****Pointer than he does, and I have an inferior GPU. I can get from 100 to 200 FPS.
The less calls the better but 11 is nothing to worry about.

Release your exe and maybe others can test it for you.

tamlin
06-16-2007, 01:14 PM
Ysaneya wrote:
Indices, unsigned int. Anything else is dangerous.
Depending on hardware, I'd say anything larger than unsigned short is "dangerous" for performance. Probably not applicable for this case, but I wanted to add it for completeness.
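
As a sketch of how 16-bit indices combine with the smaller terrain chunks suggested earlier in the thread: a 65x65-vertex chunk has 4225 vertices, so every index fits in an unsigned short. `chunkIndices16` is a hypothetical helper, and the limit of 256 vertices per side only reflects the 65536 values a 16-bit index can address, not any hardware recommendation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Triangle-list indices for one square chunk of vertsPerSide x vertsPerSide
// vertices, stored as 16-bit values (for use with GL_UNSIGNED_SHORT).
std::vector<std::uint16_t> chunkIndices16(unsigned vertsPerSide) {
    // 256 * 256 = 65536 vertices is the most a 16-bit index can address.
    assert(vertsPerSide >= 2 && vertsPerSide <= 256);

    std::vector<std::uint16_t> idx;
    idx.reserve((vertsPerSide - 1) * (vertsPerSide - 1) * 6);
    for (unsigned y = 0; y + 1 < vertsPerSide; ++y) {
        for (unsigned x = 0; x + 1 < vertsPerSide; ++x) {
            const unsigned a = y * vertsPerSide + x;  // top-left
            const unsigned b = a + 1;                 // top-right
            const unsigned c = a + vertsPerSide;      // bottom-left
            const unsigned d = c + 1;                 // bottom-right
            for (unsigned v : {a, c, b, b, c, d})
                idx.push_back(static_cast<std::uint16_t>(v));
        }
    }
    return idx;
}
```

For a 65x65 chunk this yields 24576 indices, or 49152 bytes, half of what the same list takes with unsigned int.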

yooyo
06-18-2007, 04:18 PM
The shaders have way too many varyings. Try to reduce them somehow. Try testing with a simple shader.