
Thread: Performance Question

  1. #11
    Junior Member Regular Contributor
    Join Date
    Mar 2002
    Location
    NI, Germany
    Posts
    114

    Re: Performance Question

    Jens Scheddin: As you were mentioning, the 0-1-2-2-1-3 order takes two vertex transformations to draw a triangle but the method in that link states "only 1 vertex for every 2 triangles needs to be computed".
    Hehe, right. Some time has passed since I worked with strips. I gave them up because they weren't any better than plain triangles for me, IIRC.
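
The transform counts being discussed can be checked with a small simulation. The following is only a sketch: it assumes a FIFO post-transform cache, and both the replacement policy and the cache size are assumptions for illustration (real hardware differs). `countTransforms` and `quadRowIndices` are hypothetical helper names, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many vertices have to be transformed when `indices` is drawn
// through a FIFO post-transform cache with `cacheSize` entries.
// The FIFO policy and the cache size are assumptions, not hardware facts.
std::size_t countTransforms(const std::vector<std::uint32_t>& indices,
                            std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<std::uint32_t> cache;  // front = oldest entry
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) != cache.end())
            continue;  // hit: the transformed vertex is reused
        ++misses;      // miss: the vertex has to be transformed
        cache.push_back(idx);
        if (cache.size() > cacheSize) cache.pop_front();
    }
    return misses;
}

// Builds a triangle list for one row of `quads` quads using the
// 0-1-2, 2-1-3 pattern, with vertices numbered column by column.
std::vector<std::uint32_t> quadRowIndices(std::uint32_t quads) {
    std::vector<std::uint32_t> out;
    for (std::uint32_t q = 0; q < quads; ++q) {
        const std::uint32_t b = 2 * q;  // first vertex of this quad
        out.insert(out.end(), {b, b + 1, b + 2, b + 2, b + 1, b + 3});
    }
    return out;
}
```

Under these assumptions, a single row of eight quads (16 triangles, 48 indices) with a 16-entry cache costs 18 transforms: four for the first quad and two for each quad after it. A 2D grid can share vertices between rows as well, which is where lower per-triangle figures would come from.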

  2. #12
    Member Regular Contributor
    Join Date
    Aug 2003
    Location
    France
    Posts
    299

    Re: Performance Question

    To Jens :
    1/ HUD optimizations could be
    - Alpha test instead of alpha blend
    - In case of full screen HUD, split in separate quads where alpha is not 0.0 (or under alpha test threshold)

    2/ There are two vertex caches: a pre-transform one and a post-transform one. If you can optimize for the pre-transform cache, you optimize for the post-transform cache as well (makes sense, right?). Tri strips are only a post-transform optimization; remapping the buffer indices is a pre-transform optimization, and it made my test app's frame rate rise by a factor of 2.

    To JotDot :
    Again, I agree with Ysaneya, but the bottleneck problem is even more complex, as you might have multiple bottlenecks in a single frame. The first example that comes to mind is bad CPU / GPU parallelisation.
    And Tom is right, you will never optimize anything at 500+ fps. The same test app with 200K polys (or a million as Tom suggested) could do it, and you should be aware that fog, lighting, normalization options and other OGL states can drastically reduce performance in such a test. Be sure you never switched them on, or if your final app needs to use these, be sure to enable them as early as possible.

    SeskaPeel.

  3. #13
    Intern Newbie
    Join Date
    Jun 2003
    Posts
    32

    Re: Performance Question

    Ysaneya: I agree with you completely - plus I was oversimplifying it. I thought at the time I shouldn't be too technical. For example, the glClear I make for the z buffer is a "fixed overhead per frame" - I can't take a straight linear formula. As pointed out quite nicely there are many more issues than what I just mentioned.

    Tom: Yes, for a fillrate limited application I should definitely be focusing on reducing overdraw. I was just surprised that it wasn't clearly fillrate limited (in my view) - which I was aiming for. **This is the first time I am really trying hard to push the card.** In the past I have explored different avenues for reducing overdraw. Heck, years ago I even wrote a portal system with a software renderer back in the good old days when my 2 meg S3 Virge was a "leading contender". Now that was a major exercise! Fun though

    Jens: Thanks for your input! I was asking about strips since I was aware it might become a pain in the butt to use them. I didn't want to bother with that route unless people thought there were good performance benefits.

    SeskaPeel: Thanks! I never really thought about the HUD optimizations. Of course a write is better than a read-modify-write any day. Now that I think of it, my simple test does have yet another "flaw": I am blending the text and thus need a r-m-w, which raises the sync issues that were pointed out (thus affecting results). I am probably more fillrate limited than I initially suspected. About your second point: I never thought about the pre-transform in that fashion. Thanks, I will keep that in mind.

    I really appreciate everyone's input. It has given me lots to think about

  4. #14
    Junior Member Regular Contributor
    Join Date
    Mar 2002
    Location
    NI, Germany
    Posts
    114

    Re: Performance Question

    Originally posted by SeskaPeel:
    To Jens :
    1/ HUD optimizations could be
    - Alpha test instead of alpha blend
    - In case of full screen HUD, split in separate quads where alpha is not 0.0 (or under alpha test threshold)

    2/ There are two vertex caches: a pre-transform one and a post-transform one. If you can optimize for the pre-transform cache, you optimize for the post-transform cache as well (makes sense, right?). Tri strips are only a post-transform optimization; remapping the buffer indices is a pre-transform optimization, and it made my test app's frame rate rise by a factor of 2.

    Hmm, never thought about those HUD optimizations. I'll try it for myself. About rendering strips: well, I have to say that my geometry is probably not optimal for triangle strips (approx. 5 triangles per surface due to BSP based indoor data), so for terrain rendering this might be a different situation. I really like this board because there's always something to learn, like the fact that there are two vertex caches.

    (Besides, one interesting thing I found out today is that Far Cry has an OpenGL renderer, too. Just set r_driver to OpenGL...)

  5. #15
    Intern Contributor
    Join Date
    Feb 2002
    Posts
    73

    Re: Performance Question

    I'm not happy with the expression "I'm fill limited" or "transform limited". Most applications usually have several bottlenecks within the same frame, so you usually don't get "free stuff" by increasing workload for the presumably non-bottleneck stages.

    Take a normal view of a tessellated terrain, for example (not from above). The triangles near you will be fill-limited, and triangles away from you will be transform-limited (except if all triangles are smaller than the "ideal triangle").

    The reason is that the post-transform caches are still too small to provide proper load-balancing in most applications...

    Michael

  6. #16
    Intern Newbie
    Join Date
    Jun 2003
    Posts
    32

    Re: Performance Question

    The first thing I noticed about that article I mentioned above (pertaining to tri strips) - is that the row size specified is 16. I thought that was a bit small. (But I'm no expert in this field.)

    Part of what I am doing is terrain with vlod. I realized that I should try to reduce the number of batches sent, so I decided to start with patches of around 16x16 and experiment from there. I was surprised to discover that, if I decide to use strips, I would have to use degenerates inside a patch if I wanted anything larger, not simply to stitch patches together in one batch.
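
To illustrate the degenerates inside a patch, here is a minimal sketch of building one triangle strip for a whole grid patch. It assumes a row-major vertex layout of `w` by `h` vertices; `gridStripIndices` is a hypothetical helper name. Each grid row becomes one strip segment, and consecutive segments are joined by repeating two indices, which produces zero-area (degenerate) triangles.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds a single triangle strip covering a patch of w x h vertices
// stored row-major (vertex index = row * w + column).
std::vector<std::uint32_t> gridStripIndices(std::uint32_t w, std::uint32_t h) {
    assert(w >= 2 && h >= 2);
    std::vector<std::uint32_t> out;
    for (std::uint32_t r = 0; r + 1 < h; ++r) {
        if (r > 0) {
            // Join to the previous row: repeat its last index and the first
            // index of this row. The resulting triangles have zero area.
            out.push_back(out.back());
            out.push_back(r * w);
        }
        for (std::uint32_t c = 0; c < w; ++c) {
            out.push_back(r * w + c);        // vertex on the current row
            out.push_back((r + 1) * w + c);  // vertex on the next row
        }
    }
    return out;
}
```

Each row contributes an even number of indices and each join adds two more, so the winding order stays the same from row to row. A patch of 16x16 quads (17x17 vertices) comes out at 574 indices, 30 of which exist only to join rows.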

    Yes, the expressions "fill limited" and "transform limited" are a bit oversimplified. GPUs are becoming much more complex, which makes it tougher to put any single label on one problem.

    This sure is different than when I played around with software rendering. Back then, all I needed to do is examine my code, rewrite a subroutine or two, and possibly bring out the assembler

  7. #17
    Member Regular Contributor
    Join Date
    Aug 2003
    Location
    France
    Posts
    299

    Re: Performance Question

    Once and for all about tristrips: except in some specific cases, don't expect much from them. The real fight is about the pre-transform cache, and remapping buffer indices (should I explain what it is?) does the job nearly perfectly. What's more, it's a general-purpose method, and it works even better when you use a heavy vertex structure (position, normal, color4, tangent, 4 distinct texture channels - diffuse, detail, lightmap, normal map - morph targets, matrix skinning index, ...). Still waiting for a cache that supports multiple rendering passes.
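
One plausible reading of "remapping buffer indices" is to reorder the vertex buffer so that vertices are stored in the order the index buffer first touches them, which makes vertex fetches walk memory mostly sequentially. The sketch below is written under that assumption; the `Vertex` layout and the name `remapToFirstUse` are hypothetical, and this is not taken from any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };  // hypothetical, deliberately light layout

// Reorders `vertices` into first-use order and rewrites `indices` to match.
// Vertices that no index refers to are dropped.
void remapToFirstUse(std::vector<Vertex>& vertices,
                     std::vector<std::uint32_t>& indices) {
    const std::uint32_t kUnmapped = UINT32_MAX;
    std::vector<std::uint32_t> oldToNew(vertices.size(), kUnmapped);
    std::vector<Vertex> reordered;
    reordered.reserve(vertices.size());
    for (std::uint32_t& idx : indices) {
        assert(idx < vertices.size());
        if (oldToNew[idx] == kUnmapped) {
            // First time this vertex is referenced: append it to the new buffer.
            oldToNew[idx] = static_cast<std::uint32_t>(reordered.size());
            reordered.push_back(vertices[idx]);
        }
        idx = oldToNew[idx];  // point the index at the vertex's new slot
    }
    vertices.swap(reordered);
}
```

The triangle order is left untouched, so this is independent of any strip or post-transform ordering. The heavier the vertex structure, the more bytes each out-of-order fetch would waste, which fits the remark about heavy vertex formats.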

    SeskaPeel.

  8. #18
    Intern Newbie
    Join Date
    Jun 2003
    Posts
    32

    Re: Performance Question

    SeskaPeel: Yes, I am starting to think along the same lines regarding the tri strips.

    I tried last night to search for more info on the pre-transform cache but have not had much luck so far. If you have the time, any additional info/insight/links about it would be quite useful. Right now, I am unclear as to whether or not I am doing things as efficiently as possible with regard to the pre-transform cache - but I am still at the point where I could easily adapt my code if needed.

    So far I understand the pre-TnL cache is "between" the video memory and the T&L unit (makes sense). I don't have any references to the size of that cache. Is this where the GL_MAX_ELEMENTS_VERTICES / INDICES come into play? Since the cache is smaller than what you can throw at the video card, it does make sense that the vertices should be re-ordered so that we would minimize cache misses.
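
For reference, GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES are the recommended maximum amounts of vertex and index data for a single glDrawRangeElements call; they are not documented as cache sizes. A minimal sketch of splitting a triangle list into batches under an index limit follows. `DrawRange` and `splitTriangleList` are hypothetical names, `maxIndices` stands in for the value queried from the driver, and the vertex-count limit is not enforced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DrawRange {
    std::size_t first;       // offset of the batch in the index array
    std::size_t count;       // number of indices in the batch
    std::uint32_t minIndex;  // smallest vertex index referenced (start)
    std::uint32_t maxIndex;  // largest vertex index referenced (end)
};

// Splits a triangle list into batches of at most `maxIndices` indices,
// always cutting on a triangle boundary.
std::vector<DrawRange> splitTriangleList(const std::vector<std::uint32_t>& indices,
                                         std::size_t maxIndices) {
    assert(indices.size() % 3 == 0);
    assert(maxIndices >= 3);
    const std::size_t step = maxIndices - maxIndices % 3;  // whole triangles only
    std::vector<DrawRange> ranges;
    for (std::size_t first = 0; first < indices.size(); first += step) {
        const std::size_t count = std::min(step, indices.size() - first);
        const auto begin = indices.begin() + static_cast<std::ptrdiff_t>(first);
        const auto end = begin + static_cast<std::ptrdiff_t>(count);
        const auto mm = std::minmax_element(begin, end);
        ranges.push_back({first, count, *mm.first, *mm.second});
    }
    return ranges;
}
```

Each resulting range carries the values that glDrawRangeElements expects for its start, end and count arguments. If the vertex buffer has been reordered for locality, the span between minIndex and maxIndex of each batch tends to stay small.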

    Other than that, this is all I got so far. Any corrections / additions / hints would be quite useful.

    Thanks for your time

  9. #19
    Member Regular Contributor
    Join Date
    Aug 2003
    Location
    France
    Posts
    299

    Re: Performance Question

    The one and only resource :
    NVTriStrip.lib implementation.
    I wish you good luck,

    SeskaPeel.
