
Thread: Xorg VRAM leak because of Qt/OpenGL Application

  1. #1
    Junior Member Newbie
    Join Date
    Jul 2018
    Posts
    4

    Xorg VRAM leak because of Qt/OpenGL Application

    Hello board,

    I am working on a complex Qt/OpenGL Application.
    Xorg starts leaking VRAM when I'm using the application and never releases the memory until I restart X, of course.

    Code :
    $ nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
    | N/A   46C    P8     4W /  N/A |     50MiB /  4040MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
     
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0     29628      G   /usr/lib/xorg-server/Xorg                     47MiB |
    +-----------------------------------------------------------------------------+
    $ ./myOpenGLQtBasedApp ... doing graphic stuff then exiting
    $ nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
    | N/A   46C    P8     4W /  N/A |    110MiB /  4040MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
     
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0     29628      G   /usr/lib/xorg-server/Xorg                    107MiB |
    +-----------------------------------------------------------------------------+


    The version of Xorg does not matter; I tested a few.
    The version of the driver does not matter, as long as it's NVIDIA; I tested 340, 384, and 390.
    The Linux distribution does not matter; I tested Ubuntu 16.04, 18.04, and Fedora.
    The desktop environment does not matter; I tested Unity, GNOME Shell, Xfce, LXDE + Compton, and Openbox + Compton.
    The compositor used does not matter, but the leak disappears without a compositor.
    I did not test Wayland.

    Do you know what could cause this behavior?
    Could this be due to OpenGL context sharing?
    If yes, where and how could it be happening in our application or in Qt? Could we force OpenGL not to share anything between processes?
    If not, what in our code could create this behavior?

  2. #2
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,546
    Quote Originally Posted by mwestphal
    I am working on a complex Qt/OpenGL Application.
    Xorg starts leaking VRAM when I'm using the application and never releases the memory until I restart X, of course.
    ...
    The compositor used does not matter, but the leak disappears without a compositor.
    That's not too surprising. I was just about to suggest that you disable the compositor. Without a compositor, X itself shouldn't consume much VRAM.

    Do you know what could cause this behavior ?
    Compositors use the GPU for composite rendering, so you should expect more GPU memory consumption.

    Does consumption grow every time you run your app, or is there an upper bound?
    What's the max Xorg consumption you've seen when running with the compositor (and without)?
    Have you ever run out of VRAM because of this?

    It could be a leak, but it could also just be unused GPU memory pooled by the compositor or X via the NVidia driver (e.g. scratch texture memory).

    For complete control of your GPU (performance, memory consumption, VSync, latency, etc.), just disable the compositor, and be glad that you have that option. On Windows, DWM (its compositor) has been the cause of needless GPU usage limitations since Vista and now can't be disabled.

    If you can't or don't want to just disable the compositor (and IFF this VRAM consumption has become a problem), I'd figure out how to determine 1) exactly what memory is being allocated on the GPU for what purpose and by whom, and 2) what configuration controls in the compositor and X allow you to reduce or at least bound that memory usage.
    Last edited by Dark Photon; 07-02-2018 at 09:38 AM.

  3. #3
    Junior Member Newbie
    Join Date
    Jul 2018
    Posts
    4
    Does consumption grow every time you run your app, or is there an upper bound?
    There is no upper bound; Xorg VRAM usage keeps growing as long as I keep running, using, and closing my application.

    What's the max Xorg consumption you've seen when running with the compositor (and without)?
    With the compositor, 96% of the VRAM.
    Without it, it stays around 30% when running graphics-intensive work in my application.

    Have you ever run out of VRAM because of this?
    Actually, once the VRAM reaches 96%, Xorg starts to leak into RAM; once the RAM is full, OpenGL starts failing completely.

    It could be a leak, but it could also just be unused GPU memory pooled by the compositor or X via the NVidia driver (e.g. scratch texture memory).
    That was our first guess, but it appears that the memory is never released, even when running out of it.

    For complete control of your GPU (performance, memory consumption, VSync, latency, etc.), just disable the compositor, and be glad that you have that option. On Windows, DWM (its compositor) has been the cause of needless GPU usage limitations since Vista and now can't be disabled.
    This is indeed a temporary solution.
    Funnily enough, on Windows we do not see this leak at all.
    Of course we can't ship the application this way and expect our users to disable their compositor.

    If you can't or don't want to just disable the compositor (and IFF this VRAM consumption has become a problem), I'd figure out how to determine
    1) exactly what memory is being allocated on the GPU for what purpose and by whom, and
    How can we do that? Apitrace was not able to find any leak.
    I would be happy to use any GPU profiling tools, but could not find any that could track memory allocation.

    2) what configuration controls in the compositor and X allow you to reduce or at least bound that memory usage.
    I tried that with the xfwm compositor, disabling everything except the compositor itself, and it still leaked.
    If you have a compositor to suggest that I could test with, that would be great.

  4. #4
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,546
    Quote Originally Posted by mwestphal
    > Does consumption grow every time you run your app, or is there an upper bound?

    There is no upper bound; Xorg VRAM usage keeps growing as long as I keep running, using, and closing my application.

    > What's the max Xorg consumption you've seen when running with the compositor (and without)?

    With the compositor, 96% of the VRAM.
    Without it, it stays around 30% when running graphics-intensive work in my application.
    Actually, once the VRAM reaches 96%, Xorg starts to leak into RAM; once the RAM is full, OpenGL starts failing completely.
    And you have 4GB of GPU memory (GTX1050 Ti, presumably)? That's some leak!

    > For complete control of your GPU (performance, memory consumption, VSync, latency, etc.), just disable the compositor,
    This is indeed a temporary solution.
    Of course we can't ship the application this way and expect our users to disable their compositor
    Makes sense. Not many apps can assume they have complete control of the system configuration.

    Funnily enough, on Windows we do not see this leak at all.
    That's an interesting data point.

    Question: You mentioned shared context and multiple processes. Does your application create multiple threads or processes? And just to make sure, you're only talking about residual Xorg VRAM consumption after all of your threads/processes have been killed and are not running, right?

    The fact that there aren't a lot of others reporting this problem does tend to suggest that your app's behavior might somehow be instigating this, or this is a compositor bug for a less commonly used window manager.

    I would be happy to use any gpu profiling tools, but could not find any that could track memory allocation.
    I haven't gone looking for any GUI GPU Profiling tools. There is "nvidia-settings". Click on the GPU 0 tab at the left and it'll display your total and allocated GPU memory. It doesn't show you how much GPU storage has been evicted from GPU memory back to CPU memory though (which is what happens when you run out of GPU memory).

    If you want to see that, you can write a very short, simple GL program using NVX_gpu_memory_info. With this, you can query and log to the console how much GPU memory is still available (and how much GPU storage has been evicted back from GPU memory to CPU memory), emitting new consumption and evicted numbers every time they change.
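    For example, a minimal monitor along those lines could look like the sketch below (untested here; it assumes freeglut just to get a current GL context and an NVIDIA driver exposing the extension, and the token values come from the NVX_gpu_memory_info spec):

    Code :
    // Minimal NVX_gpu_memory_info monitor: prints available/evicted GPU memory
    // whenever the numbers change. Build e.g. with: g++ vram_monitor.cpp -lglut -lGL
    #include <GL/glut.h>
    #include <cstdio>

    // Token values from the NVX_gpu_memory_info spec (in case glext.h is old).
    #ifndef GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX
    #define GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX   0x9048
    #define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
    #define GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX           0x904A
    #define GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX           0x904B
    #endif

    static void display() {}   // we never actually draw anything

    static void poll(int)
    {
        static GLint lastAvail = -1, lastEvicted = -1;
        GLint total = 0, avail = 0, evictions = 0, evicted = 0;
        glGetIntegerv(GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX,   &total);
        glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &avail);
        glGetIntegerv(GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX,           &evictions);
        glGetIntegerv(GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX,           &evicted);

        if (avail != lastAvail || evicted != lastEvicted)   // log only on change
        {
            std::printf("available: %d / %d KB, evictions: %d, evicted: %d KB\n",
                        avail, total, evictions, evicted);
            lastAvail   = avail;
            lastEvicted = evicted;
        }
        glutTimerFunc(500, poll, 0);   // poll again in 500 ms
    }

    int main(int argc, char** argv)
    {
        glutInit(&argc, argv);
        glutCreateWindow("vram-monitor");   // just to get a current GL context
        glutDisplayFunc(display);
        glutTimerFunc(0, poll, 0);
        glutMainLoop();
        return 0;
    }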

    Either way gives you a simple tool to play around with your application and the window manager (move/resize/push/pop windows, etc.) to see what actions seem to be triggering the leakage.

    I tried that with the xfwm compositor, disabling everything except the compositor itself, and it still leaked.
    If you have a compositor to suggest that I could test with, that would be great.
    I would suggest using KDE, as it gets a lot of testing and use. GNOME is another common one.

    Does this problem only happen with your app? If so, besides looking for X and window manager settings to configure their memory usage, another option to consider is to whittle down your app (disabling code) until the problem goes away. Then you'll have a pretty good idea as to what your app is doing to instigate this problem.
    Last edited by Dark Photon; 07-02-2018 at 12:50 PM.

  5. #5
    Junior Member Newbie
    Join Date
    Jul 2018
    Posts
    4
    Question: You mentioned shared context and multiple processes. Does your application create multiple threads or processes? And just to make sure, you're only talking about residual Xorg VRAM consumption after all of your threads/processes have been killed and are not running, right?
    Our application runs on a single thread. My question was more of a suggestion along the lines of: is my OpenGL application sharing something with Xorg which does not get released afterwards?

    The fact that there aren't a lot of others reporting this problem does tend to suggest that your app's behavior might somehow be instigating this, or this is a compositor bug for a less commonly used window manager.
    We can reproduce it with many desktop environments and many compositors, including KWin, so probably not.
    However, we do have a specific design on the Qt side, using QOpenGLWindow in a specific way that may not be used universally,
    so a bug (in Qt?) is not impossible.

    I haven't gone looking for any GUI GPU Profiling tools. There is "nvidia-settings". Click on the GPU 0 tab at the left and it'll display your total and allocated GPU memory. It doesn't show you how much GPU storage has been evicted from GPU memory back to CPU memory though (which is what happens when you run out of GPU memory).
    Thanks to you, I now know why it starts leaking into RAM after exhausting the VRAM; we still need to figure out the initial issue.

    If you want to see that, you can write a very short, simple GL program using NVX_gpu_memory_info. With this, you can query and log to the console how much GPU memory is still available (and how much GPU storage has been evicted back from GPU memory to CPU memory), emitting new consumption and evicted numbers every time they change.
    So I used one I found here: https://mail.kde.org/pipermail/kde-f...ry/021312.html
    It is great; it is way more precise than nvidia-smi.
    However, the evicted memory counter only moves once my VRAM is exhausted, so it is not super useful here.

    Also, I stumbled upon this: https://www.phoronix.com/scan.php?pa...query_resource
    And this: https://developer.nvidia.com/designw...r-opengl-usage
    And this: https://github.com/NVIDIA/nvidia-query-resource-opengl

    So this could help, but even for a simple example like glxgears it gives me:
    Code :
    Error: failed to query resource usage information for pid 30714
    Could you test this on your side?

    Either way gives you a simple tool to play around with your application and the window manager (move/resize/push/pop windows, etc.) to see what actions seem to be triggering the leakage.
    Indeed, it allowed me to identify that the leak appears only if I close one of my QOpenGLWindow instances.

    I would suggest using KDE, as it gets a lot of testing and use. GNOME is another common one.
    As said before, it appears in every single desktop environment with a compositor.

    Does this problem only happen with your app? If so, besides looking for X and window manager settings to configure their memory usage, another option to consider is to whittle down your app (disabling code) until the problem goes away. Then you'll have a pretty good idea as to what your app is doing to instigate this problem.
    Indeed, we have already done a pass on this and were unsuccessful; we will try again!

  6. #6
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,546
    Quote Originally Posted by mwestphal
    Also, I stumbled upon this: https://www.phoronix.com/scan.php?pa...query_resource
    And this: https://developer.nvidia.com/designw...r-opengl-usage
    And this: https://github.com/NVIDIA/nvidia-query-resource-opengl

    So this could help, but even for a simple example like glxgears it gives me:
    Code :
    Error: failed to query resource usage information for pid 30714
    Could you test this on your side?
    I tested this on Linux and got the same error when I pointed nvidia-query-resource-opengl at a process running OpenGL and force-LD_PRELOADed their shared library into the OpenGL process's image.

    On Windows, I got:

    Code :
    Resource query not supported for 'nv_asm_ex02.exe' (pid 10020)

    which is what I got on Linux before I LD_PRELOADed the shared lib. So it's possible I wasn't running it properly on Windows.

  7. #7
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,546
    I took a few minutes to dig deeper and see what wasn't working properly on Linux.

    The "nvidia-query-resource-opengl" process successfully executes the following, connecting to the client OpenGL process, sending/receiving NVQR_QUERY_CONNECT, and then sending NVQR_QUERY_MEMORY_INFO:

    Code :
       nvqr_connect()                    
         create_client()
         open_server_connection()
         connect_to_server()
            write_server_command()       # NVQR_QUERY_CONNECT ->
            open_client_connection()
            read_server_response()
       nvqr_request_meminfo()
         write_server_command()          # NVQR_QUERY_MEMORY_INFO ->

    However, it then fails to receive a valid response to the NVQR_QUERY_MEMORY_INFO in:

    Code :
       nvqr_request_meminfo()
         read_server_response()

    resulting in the "failed to query resource usage information" error above.

    On the OpenGL app side (server side), in process_client_commands() it properly responds to the NVQR_QUERY_CONNECT. But then when it receives the NVQR_QUERY_MEMORY_INFO request from the client, it calls:

    Code :
      do_query()
        glXMakeCurrent   ( ctx )
        glQueryResourceNV( GL_QUERY_RESOURCE_TYPE_VIDMEM_ALLOC_NV, -1,
                           4096, data )
        glXMakeCurrent   ( NULL )

    The glQueryResourceNV() returns 0, which is failure. So it ends up sending back an empty reply buffer.

    So what's ultimately causing the failure is glQueryResourceNV() failing on the OpenGL app side.

    Worth trying would be calling glQueryResourceNV() in a stand-alone OpenGL app w/o the client/server socket comms and w/o the 2nd GL context. I haven't done that yet, but plan to.
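    Roughly, such a stand-alone test could look like the sketch below (untested; freeglut is just assumed for context creation, and the token value and the function's signature come from the NV_query_resource extension / GL/glext.h):

    Code :
    // Stand-alone glQueryResourceNV test: one process, one context, no socket comms.
    // Build e.g. with: g++ query_resource_test.cpp -lglut -lGL
    #include <GL/glut.h>
    #include <GL/glx.h>     // for glXGetProcAddress
    #include <cstdio>

    // Token value from the NV_query_resource spec (in case glext.h is old).
    #ifndef GL_QUERY_RESOURCE_TYPE_VIDMEM_ALLOC_NV
    #define GL_QUERY_RESOURCE_TYPE_VIDMEM_ALLOC_NV 0x9540
    #endif

    typedef GLint (*QueryResourceNVFn)(GLenum queryType, GLint tagId,
                                       GLuint count, GLint* buffer);

    int main(int argc, char** argv)
    {
        glutInit(&argc, argv);
        glutCreateWindow("query-resource-test");   // gives us a current GL context

        QueryResourceNVFn queryResourceNV = (QueryResourceNVFn)
            glXGetProcAddress((const GLubyte*)"glQueryResourceNV");
        if (!queryResourceNV)
        {
            std::fprintf(stderr, "glQueryResourceNV not exported by the driver\n");
            return 1;
        }

        // Same call the tool makes in do_query(), but in the app's own context.
        GLint data[4096] = {0};
        GLint result = queryResourceNV(GL_QUERY_RESOURCE_TYPE_VIDMEM_ALLOC_NV,
                                       -1, 4096, data);
        std::printf("glQueryResourceNV returned %d\n", result);   // 0 was the failure case above
        return 0;
    }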

  8. #8
    Junior Member Newbie
    Join Date
    Jul 2018
    Posts
    4
    The issue was resolved thanks to an intense debugging session.

    This is a Qt issue, caused by our usage of QVTKOpenGLWindow and windowContainer.
    The leak was caused by NULL-parenting the parent of the windowContainer containing the QVTKOpenGLWindow just before deletion.

    This code was there before, when we used a QOpenGLWidget, and it caused no issue. In any case, NULL-parenting a widget before deletion is useless, so removing the line resolved the issue.
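    For illustration, the offending pattern boiled down to something like the sketch below (a reconstruction, with a plain QOpenGLWindow standing in for our QVTKOpenGLWindow; the names are illustrative, not our actual code):

    Code :
    // Reconstructed teardown pattern. Build as a normal Qt 5 GUI application.
    #include <QApplication>
    #include <QOpenGLWindow>
    #include <QVBoxLayout>
    #include <QWidget>

    int main(int argc, char** argv)
    {
        QApplication app(argc, argv);

        QWidget* parent = new QWidget;
        parent->setLayout(new QVBoxLayout);

        // The GL window is wrapped in a container widget so it can sit in a layout.
        QOpenGLWindow* glWindow  = new QOpenGLWindow;
        QWidget*       container = QWidget::createWindowContainer(glWindow, parent);
        parent->layout()->addWidget(container);
        parent->show();

        app.exec();

        // Teardown: the re-parenting line below is useless, and it is what
        // triggered the Xorg VRAM leak in our case. Deleting the parent is enough.
        parent->setParent(nullptr);
        delete parent;
        return 0;
    }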

    This leak shouldn't happen though, even in this situation, so I have opened a Qt issue to report it.

    If you manage to fix the NVIDIA tool, let me know!

    Edit: How do I tag this as solved?

  9. #9
    Senior Member OpenGL Guru Dark Photon's Avatar
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,546
    Glad to hear you found a solution.

    Quote Originally Posted by mwestphal
    How do I tag this as solved?
    No need. We don't close threads or update the thread subject here.
