OpenGL.org

Thread: Vertex cache optimization

  1. #1
    Advanced Member Frequent Contributor _NK47's Avatar
    Join Date
    Mar 2008
    Posts
    574

    Vertex cache optimization

    I read some information about cache optimization before writing here, but I'd still like to know how everybody is dealing with it. It seems an indexed triangle list is the way to go on today's graphics cards, but I'm kind of lost on how exactly to optimize for the cache (pre/post). Putting the indices in the correct order is self-explanatory. Does anybody have experience to share on what's most important and how to achieve good results?

  2. #2
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    2,891

    Re: Vertex cache optimization

    Quote Originally Posted by _NK47
    cache optimization...would like to know how is everybody dealing with it...anybody has experience in this to share whats most important and how to achieve good results?
    We've been using Tom Forsyth's Linear-Speed Vertex Cache Optimisation. Very fast, yields good results, easy to write, and doesn't care about the topology of what you're trying to render.
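For readers who want a feel for the approach, here is a rough sketch of the vertex-scoring idea behind Forsyth's method. The constants are the ones recalled from his article (treat them as assumptions and check them against the original), and the names `vertex_score` and `reorder_triangles` are made up for this example. The greedy loop below rescans every remaining triangle on each step, so it is quadratic and is not the linear-speed version the article describes.

```python
# Tuning constants as recalled from Forsyth's article (assumed, not verified here).
CACHE_SIZE = 32            # size of the modelled LRU cache
CACHE_DECAY_POWER = 1.5
LAST_TRI_SCORE = 0.75
VALENCE_BOOST_SCALE = 2.0
VALENCE_BOOST_POWER = 0.5


def vertex_score(cache_pos, remaining_tris):
    """Score a vertex from its position in the modelled cache (-1 = not cached)
    and the number of not-yet-emitted triangles that still use it."""
    if remaining_tris == 0:
        return -1.0                      # no triangle needs this vertex any more
    score = 0.0
    if cache_pos >= 0:
        if cache_pos < 3:
            score = LAST_TRI_SCORE       # vertex of the most recently emitted triangle
        else:
            # Falls off from 1.0 towards 0.0 as the vertex ages in the cache.
            score = (1.0 - (cache_pos - 3) / (CACHE_SIZE - 3)) ** CACHE_DECAY_POWER
    # Vertices with few remaining triangles get a boost so they are finished off early.
    score += VALENCE_BOOST_SCALE * remaining_tris ** -VALENCE_BOOST_POWER
    return score


def reorder_triangles(triangles):
    """Greedy reorder: repeatedly emit the triangle with the highest summed score.
    Simplified (quadratic) illustration, not the linear-speed algorithm."""
    remaining = {}
    for tri in triangles:
        for v in tri:
            remaining[v] = remaining.get(v, 0) + 1

    cache = []                 # modelled LRU cache, most recently used first
    pending = list(triangles)
    ordered = []
    while pending:
        def tri_score(tri):
            return sum(
                vertex_score(cache.index(v) if v in cache else -1, remaining[v])
                for v in tri
            )

        best = max(pending, key=tri_score)
        pending.remove(best)
        ordered.append(best)
        for v in best:
            remaining[v] -= 1
            if v in cache:
                cache.remove(v)
            cache.insert(0, v)           # move to the front of the LRU list
        del cache[CACHE_SIZE:]           # evict whatever fell off the end
    return ordered
```

Usage would be something like `reorder_triangles([(0, 1, 2), (2, 1, 3)])`, with the result flattened back into an index buffer. Note that the scoring models an LRU cache even though real hardware caches are usually FIFO; that is a choice of the heuristic itself.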

    Don't even bother with NVTriStrip. It's slow, has some bugs, and doesn't yield very optimal results by comparison (at least that was the case when I tried it).

    If you're rendering regular grids, check out Ignacio Castaño's Optimal Grid Rendering. More on that here and Forsyth's comments on that here.

    And regardless, check out the ACMR (average cache miss ratio) stats on Ignacio's page. Also see his spiel on why ACMR might not be the absolute best metric to use (but is better than most), suggesting average transform to vertex ratio (ATVR) instead.

    While we're on the subject, this page is a humorous but informative must-read.

    Also, regarding your "how to exactly optimize for the cache (pre/post)" query, the above links will probably make it obvious. But the top-priority cache to optimize for is the post-vertex shader cache (aka post-T&L cache), because it lets you skip whole vertex shader runs! It's basically a FIFO cache of vertex shader outputs. The pre-vertex shader cache (aka pre-T&L cache) is nothing more than an LRU memory prefetch cache of vertex attribute data (vertex shader inputs). So whenever you see someone say "optimize for the vertex cache", they're typically talking about the post-vertex shader cache.
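To make the two metrics concrete, here is a minimal sketch that runs an index buffer through a simulated FIFO post-vertex shader cache and reports ACMR (misses per triangle) and ATVR (vertex shader runs per unique vertex). The function names and the default cache size of 16 are assumptions for illustration only; real cache sizes and replacement behaviour vary by GPU.

```python
from collections import deque


def count_cache_misses(indices, cache_size=16):
    """Count vertex shader runs for an index list under a simple FIFO cache."""
    cache = deque(maxlen=cache_size)   # appending to a full deque drops the oldest entry
    misses = 0
    for idx in indices:
        if idx not in cache:           # a hit does NOT reorder a FIFO cache
            misses += 1
            cache.append(idx)
    return misses


def acmr(indices, cache_size=16):
    """Average cache miss ratio: misses per triangle (3.0 is the worst case)."""
    return count_cache_misses(indices, cache_size) / (len(indices) // 3)


def atvr(indices, cache_size=16):
    """Average transform to vertex ratio: misses per unique vertex (1.0 is ideal)."""
    return count_cache_misses(indices, cache_size) / len(set(indices))


# Two triangles sharing an edge: 4 unique vertices, 4 misses.
quad = [0, 1, 2, 2, 1, 3]
print(acmr(quad), atvr(quad))          # 2.0 1.0
```

Running the same index buffer before and after reordering gives a quick way to compare optimizers without touching the GPU.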

  3. #3
    Advanced Member Frequent Contributor _NK47's Avatar
    Join Date
    Mar 2008
    Posts
    574

    Re: Vertex cache optimization

    Wow, that's quite some info, thanks Dark Photon! I had read many of the links before, but wanted to know more before putting it into practice. All the links help a lot. I guess I will take another look at Linear-Speed Vertex Cache Optimisation and start coding.

  4. #4
    Super Moderator Frequent Contributor Groovounet's Avatar
    Join Date
    Jul 2004
    Posts
    936

    Re: Vertex cache optimization

    One other point to explore is fragment optimization too.

    The idea is to sort the triangles so that the first triangles in the list are the ones most likely to be seen, so that the z-test will then discard more fragments ...

    I haven't explored this territory yet but it's definitely the next level of vertex cache optimisation. ^_^
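As a very rough illustration of the ordering idea, the sketch below sorts triangles front to back by the squared distance from a camera position to each triangle's centroid, so nearer triangles are drawn first and the z-test can reject more of what follows. This is a view-dependent simplification chosen for the example (the post may well have a view-independent ordering in mind), and `sort_front_to_back`, `positions` and `eye` are hypothetical names. A pure depth sort like this also ignores vertex cache locality, so in practice the two goals have to be balanced.

```python
def sort_front_to_back(positions, triangles, eye):
    """Return triangles ordered nearest-first by centroid distance to `eye`.

    positions: list of (x, y, z) tuples, one per vertex
    triangles: list of (i0, i1, i2) index tuples
    eye:       (x, y, z) camera position
    """
    def centroid_dist2(tri):
        total = 0.0
        for axis in range(3):
            centre = sum(positions[v][axis] for v in tri) / 3.0
            total += (centre - eye[axis]) ** 2
        return total

    return sorted(triangles, key=centroid_dist2)
```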

  5. #5
    Super Moderator OpenGL Lord
    Join Date
    Dec 2003
    Location
    Grenoble - France
    Posts
    5,655

    Re: Vertex cache optimization

    @Groovounet: Huh?
    Unless I missed something, this has nothing to do with the vertex cache! The z-test cannot help with vertices. It only optimizes away complex fragment shader operations, which is valuable, but different.

  6. #6
    Senior Member OpenGL Pro Ilian Dinev's Avatar
    Join Date
    Jan 2008
    Location
    Watford, UK
    Posts
    1,262

    Re: Vertex cache optimization

    Imho, using the strengths of cards with unified shaders and z-buffer facilities will yield much better results than optimizing strips and indices forever.
    Do an initial depth pass, possibly with helper quads that are simplified versions of the walls in a level (wall indices are z-sorted on the CPU), to quickly lay down rough (but nice for compression) Z values. This will limit fragment overdraw to 4x. Triangles can then be quickly culled before the GPU attempts to generate any fragments. Move calculations from the vertex shader to the fragment shader. GPUs have an inherent limit on the number of triangles they can set up per cycle (1), so you can take over all the units for fragments.
