3D Engine: Render optimization basics?

Hi,

I’m making a 3D game. I need to know what techniques to use for the different objects that get rendered. Emphasis on speed, of course.

I searched both this forum and the beginner forum and mainly found older threads, so I thought I’d start a new topic. It seems things (hardware) have changed since 2001. I also wondered whether I should post on gamedev.net instead, but I have a feeling I’d get a more thorough answer here, and the questions aren’t specifically game related.

  1. moving objects (player/monsters/etc)… I think a vertex array would be nice for these (?) I’m intending to use keyframe animation, the meshes are static. Although if I’m going to interpolate between some keyframes (QuakeII style (?)) they aren’t static anymore … should I use something else if the vertices aren’t static ?

  2. Static objects like the “world”, walls, floors etc. Is a vertex array good for this, or is a display list faster ?

  3. What kind of methods are there? Vertex arrays (compiled?), display lists (are there compiled versions of those too?), the normal glBegin(GL_TRIANGLES) …

  4. What are the most expensive render state changes that I should avoid? I am now sorting my triangles by texture id/name. Is there anything else I should know ? How about GL_DEPTH_TEST and GL_LIGHTING and things like that ? I’m not going to use shadows or anything more exotic than simple blending and some fog, this is only my second little game in OpenGL ever so I try to keep my feet on the ground

I do not want to use any hardware-dependent extensions like any NV or ATI exts. ARB and such “generic” extensions are fine, though.

Thanks for any help,

Andru

This should give you some idea of relative performance of state changes. The higher the number the slower they are. These numbers are from a GF5900U, but the numbers were similar on a GF4. The only thing that seems dependent on the graphics card was glViewport.

glEnable/Disable
GL_ALPHA_TEST 57
GL_BLEND 144
GL_COLOR_MATERIAL 218
GL_CULL_FACE 69
GL_CLIP_PLANE0 113
GL_DEPTH_TEST 194
GL_DITHER 78
GL_FOG 51
GL_LIGHTING 319
GL_LINE_STIPPLE 60
GL_LINE_SMOOTH 117
GL_LOGIC_OP 80
GL_NORMALIZE 71
GL_POINT_SMOOTH 46
GL_POLYGON_SMOOTH 66
GL_POLYGON_STIPPLE 68
GL_SCISSOR_TEST 830
GL_STENCIL_TEST 55
GL_TEXTURE_1D 92
GL_TEXTURE_2D 96

Other
glAlphaFunc 19
glBlendFunc 55
glColorMask 122
glCullFace 30
glDepthFunc 42
glDepthMask 41
glDepthRange 77
glDrawBuffer 69
glScissor 397
glViewport 412
glBindTexture 93

I would render all geometry using vertex arrays with VBO. Keep your state changes to a minimum and render as much geometry with as few calls as possible. You have already reduced your bind texture calls but you might want to consider packing your textures into fewer bigger textures to reduce your bind texture calls further.
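
To make the texture-packing idea concrete, here is a minimal sketch of how texture coordinates might be remapped once several small textures share one big one. It assumes the big texture is a square grid of equally sized tiles numbered row by row; the names `UV` and `atlasUV` are invented for this example and do not come from any library.

```cpp
#include <cassert>

// Hypothetical helper types: the atlas is assumed to be a square grid of
// equally sized tiles, numbered row by row starting at 0.
struct UV {
    float u;
    float v;
};

// Map a texture coordinate inside one tile (0..1) to the coordinate of the
// same point inside the whole atlas.
UV atlasUV(int tileIndex, int tilesPerRow, UV local) {
    assert(tilesPerRow > 0);
    assert(tileIndex >= 0 && tileIndex < tilesPerRow * tilesPerRow);
    const int column = tileIndex % tilesPerRow;
    const int row = tileIndex / tilesPerRow;
    const float tileSize = 1.0f / static_cast<float>(tilesPerRow);
    return UV{(static_cast<float>(column) + local.u) * tileSize,
              (static_cast<float>(row) + local.v) * tileSize};
}
```

With a 4x4 grid, the centre of tile 5 (`atlasUV(5, 4, UV{0.5f, 0.5f})`) lands at (0.375, 0.375) in the big texture. Note that coordinates outside 0..1 (repeating textures) will not tile correctly once a texture lives inside an atlas, so this probably suits non-repeating textures best.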

Where possible, render front to back, this will save you some fillrate.
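
One possible way to combine the two suggestions (few texture binds, front-to-back order) is to sort the opaque draw list by texture first and by depth within each texture. This is only a sketch: `DrawItem`, `sortForRendering` and `countTextureBinds` are hypothetical names, and the depth is assumed to be a camera distance computed elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-draw record; the field names are illustrative only.
struct DrawItem {
    std::uint32_t textureId;  // texture name that would go to glBindTexture
    float viewDepth;          // distance from the camera, smaller = closer
    int meshIndex;            // which mesh to draw
};

// Group items by texture, and put nearer items first inside each group.
void sortForRendering(std::vector<DrawItem>& items) {
    for (const DrawItem& item : items) {
        // NaN depths would break the ordering that std::sort relies on.
        assert(!std::isnan(item.viewDepth));
    }
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  if (a.textureId != b.textureId) {
                      return a.textureId < b.textureId;
                  }
                  return a.viewDepth < b.viewDepth;
              });
}

// Count how many texture binds drawing the list in this order would need.
int countTextureBinds(const std::vector<DrawItem>& items) {
    int binds = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].textureId != items[i - 1].textureId) {
            ++binds;
        }
    }
    return binds;
}
```

Sorting by texture first means the overall order is not strictly front to back, so this trades some fillrate savings for fewer binds; which key should come first likely depends on the scene.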

[This message has been edited by Adrian (edited 07-24-2003).]

Thanks for these state change stats. But can you explain how you benchmarked them? And what’s the unit?

I mean, how can you be sure you are benchmarking the state change itself and not just the order of the state changes?

Sorry for my English.

This should give you some idea of relative performance of state changes. The higher the number the slower they are. These numbers are from a GF5900U, but the numbers were similar on a GF4. The only thing that seems dependent on the graphics card was glViewport.

Do you have some program to do this? I’d like to see what my Radeon gives me.

I called each function 1000 times; the figures are the total time in us (microseconds).

You can download the program from here http://www.adrian.lark.btinternet.co.uk/GLBench.htm

The program labels the numbers as ms, but they are actually us.

The source code is here http://www.adrian.lark.btinternet.co.uk/GLBenchSource.zip

Maybe some of the benchmarks are slightly flawed but I think it’s better than nothing.
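
As a rough sketch of the measurement described above (each call issued 1000 times, total reported in microseconds), something like the following could be used. The names `totalMicroseconds` and `microsecondsPerCall` are made up for illustration and are not taken from GLBench; the callable being timed stands in for a GL state call.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Call `fn` `calls` times and return the total elapsed time in microseconds.
template <typename Fn>
double totalMicroseconds(Fn fn, int calls) {
    assert(calls > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}

// Convert a total for many calls into an average cost per call.
double microsecondsPerCall(double totalUs, int calls) {
    assert(calls > 0);
    return totalUs / calls;
}
```

A timer like this only sees the CPU-side cost of issuing the call. The driver may defer the real work until the next draw, or skip a call that sets a state to the value it already has, so a more careful test would probably alternate between two values and perhaps call glFinish before reading the clock.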

Thanks for that benchy info, Adrian. Very helpful.

Could someone please elaborate on the vertex array/VBO/etc. subject? What are my alternatives? There are so many abbreviations out there that I’m kind of confused: VA/VAR/VBO/CVA/… A small roundup of pros and cons for each would be helpful.

Some of those state changes seem alarmingly slow. I wonder how some can take so long when, given how they scale to CPU speeds, they seem to be largely a matter of CPU work. I never thought the driver would spend so long on some of those operations.

Andru,
VA = vertex array.
VAR = vertex array range. NV specific, superseded by VBO (yes, I’m getting similar perf now).
CVA = compiled vertex array. I’m not very up-to-date on this, I couldn’t make much of it a couple of years back. May be good for some multipass methods.
VBO = vertex buffer object. The new vendor-independent extension for fast memory, use it. Some issues on current drivers.

Originally posted by Madoc:
Some of those state changes seem alarmingly slow.

Ignore the ms displayed by the program; it should say us, which is millionths of a second. Most take about 100us for 1000 calls, i.e. about 0.1us per call.

Originally posted by Andru:
Hi,

  1. moving objects (player/monsters/etc)… I think a vertex array would be nice for these (?) I’m intending to use keyframe animation, the meshes are static. Although if I’m going to interpolate between some keyframes (QuakeII style (?)) they aren’t static anymore … should I use something else if the vertices aren’t static ?

If you render in multipass, then VBO is good even if you send lerped data.
If you render in one pass, then VBO might still be good because you send the vertex data in a very efficient way.
So use VBO if you have more than a few vertices.
If you don’t need the lerped data in main memory (you wrote that you don’t use stencil shadows), then you can lerp in a vertex program and use static data all the time.
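
For reference, the keyframe lerp itself is simple. Below is a CPU-side sketch of the per-vertex work; a vertex program would do the same arithmetic per vertex, with both keyframes kept as static vertex data and only the blend factor changing each frame. `Vec3`, `lerp` and `blendKeyframes` are names chosen for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x;
    float y;
    float z;
};

// Linear interpolation between two positions; t = 0 gives a, t = 1 gives b.
Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return Vec3{a.x + (b.x - a.x) * t,
                a.y + (b.y - a.y) * t,
                a.z + (b.z - a.z) * t};
}

// Blend two keyframes of the same mesh (same vertex count and order).
std::vector<Vec3> blendKeyframes(const std::vector<Vec3>& frameA,
                                 const std::vector<Vec3>& frameB,
                                 float t) {
    assert(frameA.size() == frameB.size());
    assert(t >= 0.0f && t <= 1.0f);
    std::vector<Vec3> blended;
    blended.reserve(frameA.size());
    for (std::size_t i = 0; i < frameA.size(); ++i) {
        blended.push_back(lerp(frameA[i], frameB[i], t));
    }
    return blended;
}
```

Done on the CPU like this, the blended vertices have to be uploaded again every frame; done in a vertex program, nothing but `t` needs to be sent.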

Csiki

Thanks guys, I’ll look into Vertex Buffer Objects. You were very helpful!

Andru

Originally posted by Andru:
Hi,

<snip>

  1. moving objects (player/monsters/etc)… I think a vertex array would be nice for these (?) I’m intending to use keyframe animation, the meshes are static. Although if I’m going to interpolate between some keyframes (QuakeII style (?)) they aren’t static anymore … should I use something else if the vertices aren’t static ?
    </snip>

Andru

you would be better off using skinned characters rather than lerping static meshes if speed is your main concern.

if you combine this with VBO and vertex programs, you will have to ‘clusterize’ the vertices based on bone indices. for example, if you have 48 bones in a model, you divide the bones into groups that will fit into your available vertex program constant registers, say 16 bones (64 constants) per group, then remap the bone indices in the model to accommodate the cluster groupings. then you can draw an entire tristrip that matches those 16 bones (per material, that is). you may occasionally end up with a bone in multiple clusters, but that overhead is still pretty low.
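
A very simple greedy version of the clustering step might look like the sketch below. It is an assumption-laden illustration, not tweakoz’s actual tool: each triangle corner is assumed to reference exactly one bone, triangles are taken in their existing order, and `Cluster`, `findSlot` and `clusterize` are invented names. `maxBones` would be 16 in the example above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using TriangleBones = std::array<int, 3>;  // global bone index per corner

struct Cluster {
    std::vector<int> palette;              // global bone index per local slot
    std::vector<TriangleBones> triangles;  // corners remapped to local slots
};

// Return the local slot of `bone` in `palette`, or -1 if it is not there.
int findSlot(const std::vector<int>& palette, int bone) {
    for (std::size_t i = 0; i < palette.size(); ++i) {
        if (palette[i] == bone) return static_cast<int>(i);
    }
    return -1;
}

// Greedily pack triangles into clusters that use at most `maxBones` bones.
std::vector<Cluster> clusterize(const std::vector<TriangleBones>& triangles,
                                std::size_t maxBones) {
    assert(maxBones >= 3);  // a single triangle may need three bones
    std::vector<Cluster> clusters;
    for (const TriangleBones& tri : triangles) {
        // Count the distinct bones this triangle would add to the palette.
        std::size_t missing = 0;
        if (!clusters.empty()) {
            for (std::size_t c = 0; c < 3; ++c) {
                bool repeated = false;
                for (std::size_t p = 0; p < c; ++p) {
                    if (tri[p] == tri[c]) repeated = true;
                }
                if (!repeated && findSlot(clusters.back().palette, tri[c]) < 0) {
                    ++missing;
                }
            }
        }
        // Start a new cluster when the current palette would overflow.
        if (clusters.empty() ||
            clusters.back().palette.size() + missing > maxBones) {
            clusters.push_back(Cluster{});
        }
        Cluster& cluster = clusters.back();
        TriangleBones local{};
        for (std::size_t c = 0; c < 3; ++c) {
            int slot = findSlot(cluster.palette, tri[c]);
            if (slot < 0) {
                cluster.palette.push_back(tri[c]);
                slot = static_cast<int>(cluster.palette.size()) - 1;
            }
            local[c] = slot;
        }
        cluster.triangles.push_back(local);
    }
    return clusters;
}
```

Each cluster’s `palette` lists which bone matrices to load into the constant registers before drawing that cluster’s triangles. A bone can show up in more than one palette, which is the ‘bone in multiple clusters’ case; a smarter grouping (for example, sorting triangles by bone first) would probably produce fewer clusters.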

another step past this is to do bone-weighting, so that every vertex can belong to more than one bone, with a weight per bone, say vertex 4 is a child of bone 3 (arm) with a weight of .5 and a child of bone 4 (elbow) with a weight of .5, all weights should add up to 1.0
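
The weighting described above can be sketched on the CPU as follows; a vertex program would perform the same blend per vertex using the bone matrices stored in its constant registers. The types and the row-major 3x4 matrix layout are assumptions made for this example, not something from a particular engine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x;
    float y;
    float z;
};

// Row-major 3x4 bone transform: 3x3 rotation/scale plus a translation column.
struct BoneMatrix {
    float m[3][4];
};

// One bone's share of a vertex.
struct Influence {
    int bone;      // index into the bone matrix array
    float weight;  // all weights of one vertex should add up to 1.0
};

Vec3 transformPoint(const BoneMatrix& b, const Vec3& p) {
    return Vec3{b.m[0][0] * p.x + b.m[0][1] * p.y + b.m[0][2] * p.z + b.m[0][3],
                b.m[1][0] * p.x + b.m[1][1] * p.y + b.m[1][2] * p.z + b.m[1][3],
                b.m[2][0] * p.x + b.m[2][1] * p.y + b.m[2][2] * p.z + b.m[2][3]};
}

// Transform the rest-pose position by each bone, then blend by weight.
Vec3 skinVertex(const Vec3& restPosition,
                const std::vector<Influence>& influences,
                const std::vector<BoneMatrix>& bones) {
    Vec3 result{0.0f, 0.0f, 0.0f};
    float weightSum = 0.0f;
    for (const Influence& inf : influences) {
        assert(inf.bone >= 0 &&
               static_cast<std::size_t>(inf.bone) < bones.size());
        const Vec3 moved = transformPoint(bones[inf.bone], restPosition);
        result.x += moved.x * inf.weight;
        result.y += moved.y * inf.weight;
        result.z += moved.z * inf.weight;
        weightSum += inf.weight;
    }
    assert(std::fabs(weightSum - 1.0f) < 1e-4f);
    return result;
}
```

For the example in the post, a vertex weighted 0.5 to bone 3 and 0.5 to bone 4 ends up halfway between where each bone alone would place it.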

hope this helps.

mtm

I forgot to add: once this clusterizing is done, you essentially have a ‘deformable’ static vertex buffer, so it can go into VRAM and stay there. just update the bone matrices to animate the character. this works especially well if you are instancing hundreds of the same character: just update the matrices and redraw the static vertex buffer.

mtm