
Thread: Seg fault in Nvidia libGL file

  1. #1
    Junior Member Newbie
    Join Date
    Dec 2017
    Posts
    6

    Seg fault in Nvidia libGL file

    Hi everyone. I am posting here because I can't find a solution to a problem I'm having. I am on a Mint 18.1 Xfce desktop with an Nvidia 7300GS video card, driver 304.135.
    I have checked all the config/system files on my system and can't find anything wrong with them. The problem is that two applications open and then crash. After each crash, my syslog file shows a segmentation fault in libGL.so.304.135. I have no other problems with the display, and other OpenGL apps work fine.
    I have switched to a different desktop (same OS, though) and reinstalled the driver. FYI, the libGL.so.304.135 file is from Nvidia.
    Card problem, or what am I missing? Thanks for reading.

    PS: I don't have this problem using the nouveau driver. This is a 64-bit system.

    lrwxrwxrwx 1 root root 10 Mar 17 2017 /usr/lib32/nvidia-304/libGL.so -> libGL.so.1
    lrwxrwxrwx 1 root root 16 Mar 17 2017 /usr/lib32/nvidia-304/libGL.so.1 -> libGL.so.304.135
    -rw-r--r-- 1 root root 833540 Mar 17 2017 /usr/lib32/nvidia-304/libGL.so.304.135
    lrwxrwxrwx 1 root root 14 Aug 10 15:55 /usr/lib/i386-linux-gnu/mesa/libGL.so.1 -> libGL.so.1.2.0
    -rw-r--r-- 1 root root 453128 Aug 10 15:55 /usr/lib/i386-linux-gnu/mesa/libGL.so.1.2.0
    lrwxrwxrwx 1 root root 10 Mar 17 2017 /usr/lib/nvidia-304/libGL.so -> libGL.so.1
    lrwxrwxrwx 1 root root 16 Mar 17 2017 /usr/lib/nvidia-304/libGL.so.1 -> libGL.so.304.135
    -rw-r--r-- 1 root root 1076560 Mar 17 2017 /usr/lib/nvidia-304/libGL.so.304.135
    lrwxrwxrwx 1 root root 14 Aug 10 15:51 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1 -> libGL.so.1.2.0
    -rw-r--r-- 1 root root 463424 Aug 10 15:52 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0
    Last edited by prsman; 12-22-2017 at 09:47 AM.

  2. #2
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,215
    We're going to need more info to be able to help you much.

    Looks like you've got two sets of GL libraries on your system (NVidia's and Mesa's). That's often a recipe for problems like you're seeing.

    NVidia:
    Code :
    lrwxrwxrwx 1 root root 10 Mar 17 2017      /usr/lib32/nvidia-304/libGL.so -> libGL.so.1
    lrwxrwxrwx 1 root root 16 Mar 17 2017      /usr/lib32/nvidia-304/libGL.so.1 -> libGL.so.304.135
    -rw-r--r-- 1 root root 833540 Mar 17 2017  /usr/lib32/nvidia-304/libGL.so.304.135
    lrwxrwxrwx 1 root root 10 Mar 17 2017      /usr/lib/nvidia-304/libGL.so -> libGL.so.1
    lrwxrwxrwx 1 root root 16 Mar 17 2017      /usr/lib/nvidia-304/libGL.so.1 -> libGL.so.304.135
    -rw-r--r-- 1 root root 1076560 Mar 17 2017 /usr/lib/nvidia-304/libGL.so.304.135

    Mesa:
    Code :
    lrwxrwxrwx 1 root root 14 Aug 10 15:55     /usr/lib/i386-linux-gnu/mesa/libGL.so.1 -> libGL.so.1.2.0
    -rw-r--r-- 1 root root 453128 Aug 10 15:55 /usr/lib/i386-linux-gnu/mesa/libGL.so.1.2.0
    lrwxrwxrwx 1 root root 14 Aug 10 15:51     /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1 -> libGL.so.1.2.0
    -rw-r--r-- 1 root root 463424 Aug 10 15:52 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0

    First, have you gone through the correct steps to disable nouveau so that your NVidia drivers get full-and-clear control of the GPU (websearch: Mint 18.1 switch from nouveau to nvidia drivers)? Have you installed the NVidia drivers properly? If you install them from their run script, it handles searching your system for other potentially conflicting libraries and removing them.

    Some things to check on your system:

    lsmod | egrep -i 'nouveau|nvidia'
    glxinfo | egrep 'OpenGL|glx'

    On an OpenGL program that works fine, try running:

    ldd PROGNAME | grep GL

    for instance:

    ldd `which glxgears` | grep GL

    Now for an OpenGL program which doesn't work fine, try running the same. Do you see a difference in which GL it's linking with?

    You can try running strace or valgrind on a program which doesn't work to see if you can get a line on what it's trying to do when it crashes.

  3. #3
    Junior Member Newbie
    Join Date
    Dec 2017
    Posts
    6

    Seg fault in Nvidia libGL file

    Thanks for the reply, Dark Photon. The Nvidia driver was installed with Mint's Driver Manager, not the .run file. I will do the checks you suggested and report back.
    I had a feeling it might be a conflict, but I don't know enough about Linux to "drill down" into a problem. All the checks I did show nouveau is not installed. Thanks to you I have a path to drill down on this problem. Will report back. Thanks

    (edit) Well, this is weird. The lsmod command for nouveau returns nothing; for nvidia it returns nvidia. The command glxinfo | egrep returns Nvidia glx, not Mesa. Using ldd Program | grep GL returns nothing. I guess the two apps don't use GL.

    I changed out the video card for a newer one, installed the driver with Mint's Driver Manager, and get the same seg fault in the newer Nvidia libGL file.

    If the ldd command returns nothing for the failing apps, which implies they don't use GL, and the card and driver are new, is Mesa conflicting?

    kernel: [ 4106.429222] python[3731]: segfault at 8 ip 00007f933155e82d sp 00007ffc5aecea70 error 4 in libGL.so.340.102[7f93314b0000+c7000]
    kernel: [ 4220.778952] mintinstall[3824]: segfault at 8 ip 00007fd5bd0bc82d sp 00007ffd93932620 error 4 in libGL.so.340.102[7fd5bd00e000+c7000]
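One way to read those two kernel lines: "ip" is the address of the faulting instruction, and the bracketed suffix is the library's mapping base plus its size, so ip minus base gives the offset inside libGL.so.340.102. A minimal sketch in shell, where lib_offset is a made-up helper name:

```shell
# Offset of the faulting instruction inside the library = ip - mapping base.
# $1 = ip, $2 = base address from the [base+size] suffix (hex, no 0x prefix).
lib_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

lib_offset 00007f933155e82d 7f93314b0000   # python crash      -> 0xae82d
lib_offset 00007fd5bd0bc82d 7fd5bd00e000   # mintinstall crash -> 0xae82d
```

Both crashes come out at the same offset, so python and mintinstall appear to fault at the same instruction in the library. "segfault at 8" is the address that was accessed, and on x86 "error 4" means a user-mode read of an unmapped page. Together those usually point to a NULL pointer being dereferenced at a small offset, though that alone doesn't say whose bug it is.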
    Last edited by prsman; 12-23-2017 at 01:30 PM. Reason: update

  4. #4
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,215
    Quote Originally Posted by prsman
    (edit) Well this is weird. The lsmod command for nouveau returns nothing, nvidia returns nvidia.
    That's good. It suggests your nouveau support was probably disabled properly when the NVidia driver was installed.

    The command glxinfo | egrep returns Nvidia glx, not mesa.
    That's good again!

    Using ldd Program | grep GL returns nothing. I guess the two apps dont use GL.
    That's pretty fishy, and suggests they may not be using OpenGL after all.

    Try it with glxinfo and/or glxgears to confirm that you do see dynamic dependencies on libGL.

    I changed out the video card for a newer one, installed the driver with Mints Driver Manager and get the same seg fault in the newer Nvidia libGl file.

    If the ldd command returns nothing for the failing apps which implies they dont use GL, new card and driver, is mesa conflicting?
    Weird. Well, the ldd result does suggest that those executables do not have a "direct link-time dependency" on OpenGL. However, that doesn't mean that they don't have an "indirect link-time dependency" on OpenGL (for instance, program -> somelibrary -> libGL.so). You might do an ldd on its dependencies. Or more generally, you might run "env LD_DEBUG=all your_program". That'll give you very verbose information about the decisions the dynamic linker is making about which dynamic libraries to pull in and from where. Grepping the output for libGL might be revealing.

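    Alternatively, it could be that your program loads libGL.so into memory at runtime after startup via dlopen() / dlsym(). Those are library calls rather than system calls, so strace won't show them by name, but it will show the underlying open of the library file. Running "strace your_program 2>&1 | grep libGL" might help confirm/refute whether they're doing that.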

    In any case, somehow they're pulling in libGL.

    Too soon to say whether this has anything to do with conflicting Mesa libs.

  5. #5
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,215
    Code :
    kernel: [ 4106.429222] python[3731]: segfault at 8 ip 00007f933155e82d  sp 00007ffc5aecea70 error 4 in libGL.so.340.102[7f93314b0000+c7000]
    I just noticed the python in your error. Do these crashes you're getting only occur when you're using Python scripts that make use of OpenGL?

    If not, I'd redirect your debugging to compiled executables that link with libGL and crash in the NVidia driver.

    If there aren't any, you may just have a problem with your Python OpenGL (PyOpenGL?) bindings finding the correct GL library.

  6. #6
    Junior Member Newbie
    Join Date
    Dec 2017
    Posts
    6
    Hi Dark Photon, thanks for sticking with this.

    Alternatively, it could be that your program loads libGL.so into memory at runtime after startup via dlopen() / dlsym(). Running "strace | egrep 'dlopen|dlsym'" might help confirm/refute whether they're doing that.
    I did not see anywhere in the strace where it calls for libGL. Running ldd /usr/bin/mintinstall returns: not a dynamic executable. I may be running this command wrong.
    you might run "env LD_DEBUG=all your_program"
    My Linux chops are not enough to know how to run this.
    Both apps that crash are Python scripts.
    From the strace, one where the app fails:

    --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=5086, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
    close(3) = 0
    wait4(5086, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 5086
    getuid() = 1000
    rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7f55f106a4b0}, {0x53d3a0, [], SA_RESTORER, 0x7f55f1410390}, 8) = 0
    rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7f55f106a4b0}, {SIG_DFL, [], 0}, 8) = 0
    rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
    clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7ffc6da1794c) = 5088
    wait4(5088, OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
    OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
    Segmentation fault
    [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 5088
    rt_sigaction(SIGINT, {0x53d3a0, [], SA_RESTORER, 0x7f55f106a4b0}, NULL, 8) = 0
    rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7f55f106a4b0}, NULL, 8) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=5088, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
    rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f55f1410390}, {0x53d3a0, [], SA_RESTORER, 0x7f55f106a4b0}, 8) = 0
    brk(0x1833000) = 0x1833000
    brk(0x1831000) = 0x1831000
    exit_group(0) = ?
    +++ exited with 0 +++
    In line 9 after the wait, is where I have to enter my password, it opens and crashes and I get the seg fault in syslog.

    --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=5450, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
    close(3) = 0
    wait4(5450, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 5450
    getuid() = 1000
    rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7f5c6ec584b0}, {0x53d3a0, [], SA_RESTORER, 0x7f5c6effe390}, 8) = 0
    rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7f5c6ec584b0}, {SIG_DFL, [], 0}, 8) = 0
    rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
    clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7ffde08f80fc) = 5452
    wait4(5452, OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
    OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
    Killed
    [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 5452
    rt_sigaction(SIGINT, {0x53d3a0, [], SA_RESTORER, 0x7f5c6ec584b0}, NULL, 8) = 0
    rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7f5c6ec584b0}, NULL, 8) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=5452, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
    rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f5c6effe390}, {0x53d3a0, [], SA_RESTORER, 0x7f5c6ec584b0}, 8) = 0
    brk(0x2561000) = 0x2561000
    brk(0x255f000) = 0x255f000
    exit_group(0) = ?
    +++ exited with 0 +++
    In line 9, same as above but after the password the app opens, stays open until I close it. No seg fault in syslog.
    Could it be a problem with the python OpenGL bindings? Thanks
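The "not a dynamic executable" reply from ldd is expected for a script: ldd only understands compiled (ELF) binaries, and a Python script is run by whatever interpreter its first "#!" line names. A rough sketch for finding that interpreter; script_interpreter is a hypothetical helper, and the commented usage assumes /usr/bin/mintinstall is such a script:

```shell
# Print the interpreter path named on a script's "#!" line.
# Prints nothing if the first line is not a "#!" line.
script_interpreter() {
    sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$1"
}

# Usage (assumed path):
#   ldd "$(script_interpreter /usr/bin/mintinstall)" | grep -i libGL
```

Even then, ldd on the interpreter may show no libGL, because Python extension modules are loaded at runtime rather than linked into the interpreter. For a "#!/usr/bin/env python" style line this prints /usr/bin/env, so the real interpreter would have to be looked up separately.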

  7. #7
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,215
    Quote Originally Posted by prsman
    Quote Originally Posted by Dark Photon
    you might run "env LD_DEBUG=all your_program"
    My linux chops are not enough to know how to run this.
    Don't psych yourself out. It's easy. For example, instead of:
    Code :
    glxinfo
    run:
    Code :
    env LD_DEBUG=all glxinfo

    However, I don't think you need to do this now. Your info says this problem is happening down in a child process, not in the linking of the main process you're launching.

    Both apps that crash are python scripts.
    Not only that, but (from your output), both are trying to run some Java stuff, and totally unsuccessfully it appears. This is getting weird.

    If apps like glxinfo and glxgears work fine, but only these Python scripts crash like this, then that suggests that there is a problem with the way these Python scripts are trying to use OpenGL, or the environment in which they're being run hasn't been set up properly.

    In line 9 after the wait, is where I have to enter my password, it opens and crashes and I get the seg fault in syslog.
    Password? What, is this trying to do a remote login to some machine, or to change user to root or something?

    Could it be a problem with the python OpenGL bindings? Thanks
    It's sounding likely that it's at least something specific to what these Python scripts are doing, possibly how the Python bindings are trying to load/use OpenGL (or Java). How's Java involved in this?

    The fact that they do segfault in libGL.so.340.102 (NVidia GL library) suggests they're at least in the NVidia GL library and not one of those other GL libs you have on your system. But who knows what they're doing wrong that makes them crash.

    Your straces suggest that it's segfaulting in a child process, which in the last case appears that it might be a Java process. Is that Java process what's trying to use OpenGL?

    You may be able to get more clues about what's going on by running with "strace -f" rather than just "strace", which will trace down into child processes as well.
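With -f the output gets large quickly, so it may help to write it to a file and filter it. The sketch below assumes the log was written with "strace -f -o FILE", in which case each line starts with the PID of the process that made the call. gl_opens and trace.log are made-up names, and YOUR_PROGRAM is a placeholder:

```shell
# From an "strace -f -o FILE" log, show which process successfully opened
# which libGL/libEGL file (failed lookups reported as ENOENT are dropped).
gl_opens() {
    grep -E '^[0-9]+ +(open|openat)\(.*libE?GL' "$1" | grep -v ENOENT
}

# Usage (placeholder names):
#   strace -f -o trace.log -e trace=open,openat,execve YOUR_PROGRAM
#   gl_opens trace.log
```

The PID at the start of each matching line can then be compared with the PID in the kernel's segfault message to see whether the crashing process is the one that loaded the library.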

    However, I really think you're going to need to find a way to simplify these test cases you've got. Let's get Java out of the picture, and possibly even Python too. Alternatively, you should probably seek help from the folks that wrote this Python/Java/GL app.

    So far based on the evidence (and how stable I know NVidia's GL drivers are), this is looking like you've got some app/process that's mis-using the NVidia GL drivers, and that you just happen to crash in there because the app code is buggy or is encountering a use case it wasn't coded for.
    Last edited by Dark Photon; 12-24-2017 at 02:20 PM.

  8. #8
    Junior Member Newbie
    Join Date
    Dec 2017
    Posts
    6
    Hi Dark Photon, thanks again. The app needs my password because it can make changes to the system (install programs). I don't know how Java is involved, but the other failing app gives a similar Java message. Will run strace -f and see what I find.

    So far based on the evidence (and how stable I know NVidia's GL drivers are), this is looking like you've got some app/process that's mis-using the NVidia GL drivers, and that you just happen to crash in there because the app code is buggy or is encountering a use case it wasn't coded for.
    Then we can agree this is not an OpenGL problem per se, but something involving OpenGL. If I don't find something from strace I will close this post. Thank you for your help.

  9. #9
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    4,215
    Another thing you can try: see if you can catch the crash in gdb. When it crashes, dump the stack to see what the program was doing when it crashed (stack trace):
    Code :
    gdb YOUR_PROGRAM
    set args ARGS_TO_YOUR_PROGRAM
    run
    <wait for crash>
    where
    quit
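For a script, gdb has to be pointed at the interpreter rather than at the script itself, and since the straces earlier in the thread show the crash happening in a child process, telling gdb to follow forks may matter too. A hedged sketch: gdb_bt is a made-up wrapper, and the interpreter and script path in the usage comment are placeholders, not paths confirmed in this thread:

```shell
# Run a command under gdb non-interactively and print a backtrace afterwards.
gdb_bt() {
    gdb -batch \
        -ex 'set follow-fork-mode child' \
        -ex 'run' \
        -ex 'bt' \
        --args "$@"
}

# Usage (placeholder paths):
#   gdb_bt python /path/to/Program.py
```

If this ends with "No stack.", the process gdb was following exited without crashing, which would fit a parent that exits normally while some other process takes the segfault.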

  10. #10
    Junior Member Newbie
    Join Date
    Dec 2017
    Posts
    6
    Hi Dark Photon, well, strace -f gave me a 12 MB file with 256K lines. I read it all and could not find anything about libGL. Then I installed a program called apport, which is a crash reporter.
    I am having trouble reading the core dump. Using the command less on the crash file I see this:

    7f9b134d3000-7f9b13509000 r-xp 00000000 08:01 1182746 /usr/lib/x86_64-linux-gnu/mesa-egl/libEGL.so.1.0.0
    7f9b13509000-7f9b13708000 ---p 00036000 08:01 1182746 /usr/lib/x86_64-linux-gnu/mesa-egl/libEGL.so.1.0.0
    7f9b13708000-7f9b1370a000 r--p 00035000 08:01 1182746 /usr/lib/x86_64-linux-gnu/mesa-egl/libEGL.so.1.0.0
    7f9b1370a000-7f9b1370c000 rw-p 00037000 08:01 1182746 /usr/lib/x86_64-linux-gnu/mesa-egl/libEGL.so.1.0.0
    7f9b1370c000-7f9b137d3000 r-xp 00000000 08:01 3542199 /usr/lib/nvidia-340/libGL.so.340.102
    7f9b137d3000-7f9b13802000 rwxp 000c7000 08:01 3542199 /usr/lib/nvidia-340/libGL.so.340.102
    7f9b13802000-7f9b1381e000 r-xp 000f6000 08:01 3542199 /usr/lib/nvidia-340/libGL.so.340.102
    7f9b1381e000-7f9b13a1d000 ---p 00112000 08:01 3542199 /usr/lib/nvidia-340/libGL.so.340.102
    7f9b13a1d000-7f9b13a42000 rw-p 00111000 08:01 3542199 /usr/lib/nvidia-340/libGL.so.340.102
    So the file is being called I think. I will run your gdb command. Hopefully that will tell us something. Thanks again.
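Those lines have the same layout as /proc/PID/maps, where the sixth field is the mapped file. A small sketch for pulling the distinct GL-related libraries out of such text; gl_libs_in_maps is a made-up name, and the pattern is only a guess at what counts as GL-related:

```shell
# Read /proc/PID/maps-style text on stdin and print each distinct mapped
# file whose path contains libGL or libEGL.
gl_libs_in_maps() {
    awk '$6 ~ /libE?GL/ { print $6 }' | sort -u
}

# Usage (PID is a placeholder for the running process ID):
#   gl_libs_in_maps < /proc/PID/maps
```

Run against the excerpt above, this would list both Mesa's libEGL.so.1.0.0 and NVidia's libGL.so.340.102, i.e. libraries from both stacks are mapped into the same process. Whether that mix is related to the crash is not established in this thread.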

    (EDIT) I ran the gdb command and when it crashed, I typed where and got: no stack? The apport program generated a crash file, and in it is the same as in the quote above for the other failing app. Interestingly, the crash reports are called _PROGRAM.py.0.crash. In each program directory is a file called Program.py.
    I will try the gdb program again in case I screwed up.
    Last edited by prsman; 12-26-2017 at 05:28 PM.
