Chapter 8. Common Problems
This section provides solutions to common problems associated with the NVIDIA
Linux x86_64 Driver.
Q. My X server fails to start, and my X log file contains the error:
(EE) NVIDIA(0): The NVIDIA kernel module does not appear to
(EE) NVIDIA(0): be receiving interrupts generated by the NVIDIA graphics
(EE) NVIDIA(0): device PCI:x:x:x. Please see the COMMON PROBLEMS
(EE) NVIDIA(0): section in the README for additional information.
A. This can be caused by a variety of problems, such as PCI IRQ routing
errors, I/O APIC problems, conflicts with other devices sharing the IRQ (or
their drivers), or MSI compatibility problems.
If possible, configure your system such that your graphics card does not
share its IRQ with other devices (try moving the graphics card to another
slot if applicable, unload/disable the driver(s) for the device(s) sharing
the card’s IRQ, or remove/disable the device(s)).
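To see whether the graphics card currently shares its IRQ, you can inspect /proc/interrupts. The following is a minimal sketch, not part of the driver: it assumes the NVIDIA kernel module registers its interrupt under the name "nvidia" and that device names on a line are comma-separated and contain no spaces. The function name and the sample text are illustrative only.

```python
# Sample /proc/interrupts content (illustrative, not taken from a real system).
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC   2-edge      timer
 16:       1234        567   IO-APIC  16-fasteoi   ehci_hcd:usb1, nvidia
 17:         10          0   IO-APIC  17-fasteoi   snd_hda_intel
NMI:          0          0   Non-maskable interrupts
"""


def devices_sharing_irq(interrupts_text, driver="nvidia"):
    """Map each IRQ used by `driver` to the other devices on the same line."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row has one column per CPU
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip non-numeric rows such as NMI or LOC
        tail = " ".join(rest.split()[ncpus:])  # drop the per-CPU counters
        chunks = [c.strip() for c in tail.split(",")]
        if not chunks[0]:
            continue
        # The first chunk still carries the controller and trigger fields;
        # assume its last word is the first device name.
        names = [chunks[0].split()[-1]] + chunks[1:]
        if driver in names:
            shared[int(label)] = [n for n in names if n != driver]
    return shared


print(devices_sharing_irq(SAMPLE))
```

On a live system, pass the contents of /proc/interrupts instead of SAMPLE. An empty list for an IRQ means the driver does not share that line with any other device.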
Depending on the nature of the problem, one of (or a combination of) these
kernel parameters might also help:
Parameter       Behavior
--------------  ---------------------------------------------------
pci=noacpi      don't use ACPI for PCI IRQ routing
pci=biosirq     use PCI BIOS calls to retrieve the IRQ routing table
noapic          don't use I/O APICs present in the system
acpi=off        disable ACPI
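When experimenting with these parameters, it helps to confirm which of them the running kernel was actually booted with. The sketch below checks a kernel command line string for the four parameters listed above; the function name is hypothetical, and the handling of comma-separated "pci=" options (for example "pci=noacpi,biosirq") is an assumption based on the kernel's documented "pci=option[,option...]" syntax.

```python
# The four workaround parameters from the table above.
WORKAROUND_PARAMS = ("pci=noacpi", "pci=biosirq", "noapic", "acpi=off")


def active_workarounds(cmdline):
    """Return the workaround parameters present in a kernel command line."""
    present = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "pci" and sep:
            # "pci=" may carry several comma-separated options.
            present.update(f"pci={opt}" for opt in value.split(","))
        else:
            present.add(token)
    return [p for p in WORKAROUND_PARAMS if p in present]


example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro pci=noacpi,biosirq noapic"
print(active_workarounds(example))
```

On a live system, the command line of the running kernel can be read from /proc/cmdline and passed to the function in place of the example string.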
The problem may also be caused by MSI compatibility problems. See “MSI
Interrupts” for details.
…
Q. My X server fails to start, and my X log file contains the error:
(EE) NVIDIA(0): The interrupt for NVIDIA graphics device PCI:x:x:x
(EE) NVIDIA(0): appears to be edge-triggered. Please see the COMMON
(EE) NVIDIA(0): PROBLEMS section in the README for additional information.
A. An edge-triggered interrupt means that the kernel has programmed the
interrupt as edge-triggered rather than level-triggered in the Advanced
Programmable Interrupt Controller (APIC). Edge-triggered interrupts are not
intended to be used for sharing an interrupt line between multiple devices;
level-triggered interrupts are the intended trigger for such usage. When
using edge-triggered interrupts, it is common for device drivers using that
interrupt line to stop receiving interrupts. This would appear to the end
user as those devices no longer working, and potentially as a full system
hang. These problems tend to be more common when multiple devices are
sharing that interrupt line.
This occurs when ACPI is not used to program interrupt routing in the APIC.
It may also occur when ACPI is disabled, or fails to initialize. In these
cases, the Linux kernel falls back to tables provided by the system BIOS.
In some cases the system BIOS assumes ACPI will be used for routing
interrupts and configures these tables to incorrectly label all interrupts
as edge-triggered. The current interrupt configuration can be found in
/proc/interrupts.
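As an illustration of reading that configuration, the sketch below scans /proc/interrupts text for I/O APIC lines that are marked edge-triggered and are used by a given driver. It is an approximation under stated assumptions: the driver is assumed to appear under the name "nvidia", and the trigger type is assumed to be shown either as a separate field such as "16-edge" after "IO-APIC" (newer kernels) or fused into the controller name as "IO-APIC-edge" (older kernels). The function name and sample text are illustrative.

```python
# Illustrative /proc/interrupts content with an edge-triggered APIC line.
SAMPLE_EDGE = """\
           CPU0       CPU1
 16:       1234        567   IO-APIC  16-edge      nvidia
 17:         10          0   IO-APIC  17-fasteoi   snd_hda_intel
"""


def edge_triggered_irqs(interrupts_text, driver="nvidia"):
    """Return IRQ numbers where `driver` sits on an edge-triggered IO-APIC line."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split()) if lines else 0
    flagged = []
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip non-numeric rows such as NMI or LOC
        fields = rest.split()[ncpus:]  # drop the per-CPU counters
        if driver not in [f.rstrip(",") for f in fields]:
            continue
        apic = [i for i, f in enumerate(fields) if f.startswith("IO-APIC")]
        if not apic:
            continue  # not routed through the I/O APIC (for example, MSI)
        i = apic[0]
        # Newer kernels print "IO-APIC" and "16-edge" as two fields;
        # older kernels print a single "IO-APIC-edge" field.
        width = 2 if fields[i] == "IO-APIC" else 1
        descriptor = " ".join(fields[i:i + width])
        if descriptor.endswith("edge"):
            flagged.append(int(label))
    return flagged


print(edge_triggered_irqs(SAMPLE_EDGE))
```

A non-empty result suggests the condition described above; an empty result means no I/O APIC line used by the driver was labeled edge-triggered in the text that was examined.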
Available workarounds include updating to a newer system BIOS, upgrading to
a more recent Linux kernel with ACPI enabled, or passing the ‘noapic’
option to the kernel to force interrupt routing through the traditional
Programmable Interrupt Controller (PIC). The Linux kernel also provides an
interrupt polling mechanism that you can use to attempt to work around this
problem; enable it by passing the ‘irqpoll’ option to the kernel.
Currently, the NVIDIA driver attempts to detect edge-triggered interrupts,
and X deliberately fails to start when they are found (to avoid stability
issues). This behavior can be overridden by setting the
“NVreg_RMEdgeIntrCheck” NVIDIA Linux kernel module parameter. This
parameter defaults to “1”, which enables the edge-triggered interrupt
detection. Set this parameter to “0” to disable the detection.
…
Driver fails to initialize when MSI interrupts are enabled
The Linux NVIDIA driver uses Message Signaled Interrupts (MSI) by default.
This provides compatibility and scalability benefits, mainly due to the
avoidance of IRQ sharing.
Some systems have problems supporting MSI, while working fine with virtual
wire interrupts. These problems manifest as an inability to start X with
the NVIDIA driver, or as CUDA initialization failures. The NVIDIA driver
then reports an error indicating that the NVIDIA kernel module does not
appear to be receiving interrupts generated by the GPU.
Problems have also been seen with suspend/resume while MSI is enabled. All
known problems have been fixed, but if you observe problems with
suspend/resume that you did not see with previous drivers, disabling MSI
may help you.
NVIDIA is working on a long-term solution to improve the driver's
out-of-the-box compatibility with system configurations that do not fully
support MSI.
MSI interrupts can be disabled via the NVIDIA kernel module parameter
"NVreg_EnableMSI=0". This can be set on the command line when loading the
module, or more appropriately via your distribution's kernel module
configuration files (such as those under /etc/modprobe.d/).
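As a sketch of what such a configuration entry might look like, the helper below renders a modprobe.d "options" line for a set of module parameters. It assumes the kernel module is named "nvidia"; the function name is hypothetical, and the file you place the line in under /etc/modprobe.d/ is your choice.

```python
def modprobe_options_line(params, module="nvidia"):
    """Render a modprobe.d 'options' line, e.g. 'options nvidia NAME=VALUE'."""
    pairs = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {pairs}"


# Disable MSI, as described above.
print(modprobe_options_line({"NVreg_EnableMSI": 0}))
```

The printed line can be added to a configuration file under /etc/modprobe.d/ (this requires root privileges); it takes effect the next time the module is loaded.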
…
Q. OpenGL applications leak significant amounts of memory on my system!
A. If your kernel uses the -rmap VM, the system may be leaking memory due
to a memory management optimization introduced in -rmap14a. The -rmap VM
has been adopted by several popular distributions, and the memory leak is
known to be present in some distribution kernels; it has been fixed in
-rmap15e.
If you suspect that your system is affected, upgrade your kernel or contact
your distribution’s vendor for assistance.