OpenGL.org

Thread: slow local arrays in GLSL

  1. #1
    Junior Member Newbie
    Join Date
    Mar 2011
    Location
    Australia
    Posts
    21

    slow local arrays in GLSL

    Hi,
    I'm surprised at the cost of declaring arrays in GLSL programs.
    The array doesn't need to be initialized; all I do is write to a random element and read back the same one (so the compiler can't optimize the array out). Simply declaring a larger array causes a significant slowdown, which seems to increase linearly with the array size.
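
    For readers who want to try this, the test described above might look roughly like the fragment shader below. It is only a sketch under assumptions: ARRAY_SIZE, the index uniform and the output name are hypothetical and not taken from the original program, and a sufficiently clever compiler could still remove the array.

    ```glsl
    #version 330 core

    // Hypothetical array size: change it, recompile and re-time the draw call.
    #define ARRAY_SIZE 64

    // Hypothetical uniform supplied by the application at run time, so the
    // compiler cannot resolve the array index at compile time.
    uniform int index;

    out vec4 fragColor;

    void main()
    {
        // Local array, deliberately left uninitialized.
        float data[ARRAY_SIZE];

        // Keep the index in range; out-of-range indexing is undefined.
        int i = clamp(index, 0, ARRAY_SIZE - 1);

        // Write one element, then read the same element back.
        data[i] = gl_FragCoord.x;
        fragColor = vec4(data[i]);
    }
    ```

    Timing the same draw with several values of ARRAY_SIZE should show whether the cost grows with the declared size alone.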

    I would quite like to know why this happens. What is the GPU doing which takes longer? Are there any tricks to circumventing this (the array cannot be constant/uniform buffer)?

  2. #2
    Super Moderator OpenGL Lord
    Join Date
    Dec 2003
    Location
    Grenoble - France
    Posts
    5,580

    Re: slow local arrays in GLSL

    Each execution pipeline has to carry its own block of memory, so I am not surprised it costs more with bigger sizes.

    But I do not understand how working on a single cell of a big uninitialized array could be useful?

  3. #3
    Advanced Member Frequent Contributor
    Join Date
    Dec 2007
    Location
    Hungary
    Posts
    985

    Re: slow local arrays in GLSL

    The reason behind this is that a single processing core of a GPU can usually keep only as many threads resident as fit in its register memory. The threads don't actually run in parallel, but when a thread is scheduled out, e.g. in order to hide the latency of a texel fetch, another thread is executed. This way the cores are kept busy all the time.

    However, if you have large local memory usage, i.e. register memory usage, that means that fewer threads can be executed concurrently on a single core, and thus performance decreases.
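
    As a rough worked example of this relationship (the numbers are hypothetical and not taken from any particular GPU): if a core has R registers and each thread needs r of them, the number of threads that can be resident at once is bounded as follows.

    ```latex
    N_{\text{threads}} \le \left\lfloor \frac{R}{r} \right\rfloor,
    \qquad
    R = 32768:\quad
    r = 16 \;\Rightarrow\; N_{\text{threads}} \le 2048,
    \qquad
    r = 80 \;\Rightarrow\; N_{\text{threads}} \le 409
    ```

    Real hardware applies further limits (for example a fixed cap on resident threads per core), so this is only an upper bound, but it shows why a larger per-thread footprint leaves fewer threads available to hide latency.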

    As a general guideline, one should always use as little register memory as possible in a shader to ensure high performance.

  4. #4
    Junior Member Newbie
    Join Date
    Mar 2011
    Location
    Australia
    Posts
    21

    Re: slow local arrays in GLSL

    Thank you both for your replies.

    Working on a single cell was to stop the compiler optimizing the array out, so I could confirm the array size was my problem, and not the operations I was performing on it.

    Just to clarify, from what you say there are two issues. The first is that threads use a shared pool of register memory. When the total memory used overflows, threads get dropped. I assume this means cores/"stream processors" become inactive (I'm not sure of the terminology). I would assume the amount of memory is hardware specific, but it should be possible to work out the maximum amount of memory usable before the number of concurrent threads is reduced.

    The second issue is the thread memory block being copied back and forth during global memory operations, such as an imageLoad(). In the example above, this shouldn't happen. However, if I were to fill the local array from global memory these copy operations would delay the next set of threads from running.

    Is there a way I could manipulate the cache to store my array data, avoiding these issues? For example, if OpenGL knows each thread operates on its own block of global data, that data does not need to be coherent and can be stored/operated on in cache. Should this happen anyway, if I simply operate on the data directly with image load/store (I've tested with various image unit modifiers and had no luck so far)?
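
    To make the image load/store variant concrete, here is a minimal sketch of a fragment shader that keeps its per-thread data in a buffer image instead of a local array. Everything named here (scratch, viewportSize, index, SLOTS) is hypothetical, and it assumes that at most one fragment is shaded per pixel (e.g. a single full-screen pass), so no two invocations touch the same slots. Whether this is any faster than a local array depends on the hardware and driver.

    ```glsl
    #version 420 core

    // Hypothetical number of elements reserved for each fragment.
    #define SLOTS 64

    // Assumption: the application binds a buffer texture holding at least
    // viewportSize.x * viewportSize.y * SLOTS floats to this image unit.
    layout(r32f) uniform imageBuffer scratch;

    // Hypothetical uniforms set by the application.
    uniform ivec2 viewportSize;
    uniform int index;

    out vec4 fragColor;

    void main()
    {
        // Each fragment gets its own contiguous block of SLOTS elements.
        ivec2 pixel = ivec2(gl_FragCoord.xy);
        int base = (pixel.y * viewportSize.x + pixel.x) * SLOTS;

        // Keep the index inside this fragment's block.
        int i = clamp(index, 0, SLOTS - 1);

        // Write one element to global memory, then read the same element back.
        // This relies on an invocation seeing its own earlier store to the
        // same address; no other invocation is assumed to access this block.
        imageStore(scratch, base + i, vec4(gl_FragCoord.x));
        fragColor = vec4(imageLoad(scratch, base + i).r);
    }
    ```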

  5. #5
    Junior Member Newbie
    Join Date
    Sep 2011
    Location
    China
    Posts
    29
    I also ran into this problem several days ago and spent several days optimizing my algorithm to minimize register usage. By chance I found that NVIDIA had updated the GeForce driver (301.24, beta), so I downloaded and installed it, and suddenly my program got a 5x speedup on a GTX 570 over the original version. Try it.

    But it is still slow on other cards (FX 3800 and Quadro 4000), so my optimization work is still useful.
