Bug 589007 - Fedora >11 -> Intel S775 + proprietary Nvidia driver = freezes
Summary: Fedora >11 -> Intel S775 + proprietary Nvidia driver = freezes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-05 06:12 UTC by puntarenas
Modified: 2010-09-22 15:19 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-09-22 15:19:14 UTC
Type: ---
Embargoed:


Attachments
Created by nvidia-bug-report.sh (38.12 KB, application/x-gzip)
2010-05-05 06:12 UTC, puntarenas
/var/log/messages (35.05 KB, application/x-gzip)
2010-05-05 06:14 UTC, puntarenas
nvidia-bug-report.log (38.12 KB, application/x-gzip)
2010-05-05 06:19 UTC, puntarenas
nvidia-bug-report.log (40.46 KB, application/x-gzip)
2010-05-09 21:00 UTC, puntarenas
/var/log/messages (59.47 KB, application/x-gzip)
2010-05-09 21:01 UTC, puntarenas
logs of freezing MSI P43T-C51 (BIOS V2.5) with Nvidia GTX280 (99.09 KB, application/x-gzip)
2010-06-19 10:47 UTC, puntarenas

Description puntarenas 2010-05-05 06:12:23 UTC
Created attachment 411478 [details]
Created by nvidia-bug-report.sh

Description of problem:

Fedora releases later than F11 freeze randomly when (and only when) Nvidia's proprietary driver is in use.

I am using an Nvidia GTX280 on a Gigabyte 965P-DS4 Rev2. According to other affected users, Intel's P965 chipset in general and/or running a PCIe 2.x card in a PCIe 1.x slot may be the root of the incompatibility.

There is a thread with lots of nvidia-bug-report.logs and other users affected by the problem:
http://www.nvnews.net/vbulletin/showthread.php?t=149056

I'm aware that I am reporting a bug connected with a proprietary driver and therefore a tainted kernel, but hopefully some of you kernel developers know what changed between F11 and later kernels. I'm not too confident Nvidia will fix their drivers soon, and F11 reaches its EOL within the next few months.

Version-Release number of selected component (if applicable):

Several versions of Fedora kernels (i686 and x86_64) with the most recent Nvidia drivers are affected.

How reproducible:

The freeze occurs after anywhere from a few seconds to several hours of waiting; most of the time the system freezes within 15 minutes.

Steps to Reproduce:

1. Install F12 or F13
2. Install Nvidia's proprietary driver (either using rpmfusion repository or following this guide: http://forums.fedoraforum.org/showthread.php?t=240860 )
3. Reboot and wait for the system to freeze (sometimes at GDM login, sometimes after several minutes or even hours in Gnome)
  
Actual results:

Random freezes

Expected results:

rock stable operation as F11 provides

Additional info:

From /var/log/messages:

Mar 18 09:59:56 client01 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Mar 18 09:59:56 client01 kernel: Pid: 0, comm: swapper Tainted: P           2.6.33-1.fc13.i686 #1
Mar 18 09:59:56 client01 kernel: Call Trace:
Mar 18 09:59:56 client01 kernel: [<c0489e39>] __report_bad_irq+0x33/0x74
Mar 18 09:59:56 client01 kernel: [<c0489f74>] note_interrupt+0xfa/0x152
Mar 18 09:59:56 client01 kernel: [<c048a556>] handle_fasteoi_irq+0x8f/0xb2
Mar 18 09:59:56 client01 kernel: [<c0404ef0>] handle_irq+0x40/0x4c
Mar 18 09:59:56 client01 kernel: [<c040475f>] do_IRQ+0x46/0x9f
Mar 18 09:59:56 client01 kernel: [<c0403975>] common_interrupt+0x35/0x3c
Mar 18 09:59:56 client01 kernel: [<c0409745>] ? mwait_idle+0x68/0x78
Mar 18 09:59:56 client01 kernel: [<c04025be>] cpu_idle+0x9b/0xb5
Mar 18 09:59:56 client01 kernel: [<c07adcaa>] start_secondary+0x204/0x242
Mar 18 09:59:56 client01 kernel: handlers:
Mar 18 09:59:56 client01 kernel: [<c06adaca>] (usb_hcd_irq+0x0/0x8d)
Mar 18 09:59:56 client01 kernel: [<fa16cb15>] (nv_kern_isr+0x0/0x59 [nvidia])
Mar 18 09:59:56 client01 kernel: Disabling IRQ #16
Mar 18 10:00:01 client01 kernel: NVRM: Xid (0001:00): 16, Head 00000001 Count 00000000
Mar 18 10:00:02 client01 kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000044a5
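
For anyone comparing their own logs against the excerpt above, here is a minimal Python sketch that pulls the IRQ number and the registered handlers out of a "nobody cared" report. The function name `parse_bad_irq_report` is made up for illustration, and the sketch assumes the report has the same shape as the lines quoted in this bug (an "irq N: nobody cared" line, later a "handlers:" line, then one handler per line).

```python
import re

# Lines copied from the /var/log/messages excerpt in this report (call trace shortened).
SAMPLE = """\
Mar 18 09:59:56 client01 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Mar 18 09:59:56 client01 kernel: Call Trace:
Mar 18 09:59:56 client01 kernel: [<c0489e39>] __report_bad_irq+0x33/0x74
Mar 18 09:59:56 client01 kernel: handlers:
Mar 18 09:59:56 client01 kernel: [<c06adaca>] (usb_hcd_irq+0x0/0x8d)
Mar 18 09:59:56 client01 kernel: [<fa16cb15>] (nv_kern_isr+0x0/0x59 [nvidia])
Mar 18 09:59:56 client01 kernel: Disabling IRQ #16
"""

HANDLER_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\)")


def parse_bad_irq_report(log_text):
    """Return (irq, [(handler_function, module_or_None), ...]) for the first
    'nobody cared' report in log_text, or None if there is no such report."""
    irq = None
    handlers = []
    in_handlers = False
    for line in log_text.splitlines():
        if irq is None:
            m = re.search(r"irq (\d+): nobody cared", line)
            if m:
                irq = int(m.group(1))
            continue
        if re.search(r"\bhandlers:\s*$", line):
            in_handlers = True
            continue
        if in_handlers:
            m = HANDLER_RE.search(line)
            if m is None:
                break  # first non-handler line ends the list
            handlers.append((m.group(1), m.group(2)))
    return None if irq is None else (irq, handlers)
```

Calling `parse_bad_irq_report(SAMPLE)` shows at a glance that the USB host controller handler and the nvidia handler are registered on the same interrupt line, which is the pattern repeated in the later reports.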


From /var/log/Xorg.0.log.old:

[   230.167] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[   230.167] 
Backtrace:
[   230.182] 0: /usr/bin/Xorg (xorg_backtrace+0x3c) [0x80e4e8c]
[   230.182] 1: /usr/bin/Xorg (mieqEnqueue+0x1b7) [0x80e4797]
[   230.182] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xd4) [0x80be044]
[   230.182] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x142000+0x2f62) [0x144f62]
[   230.182] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x142000+0x3209) [0x145209]
[   230.182] 5: /usr/bin/Xorg (0x8047000+0x697a0) [0x80b07a0]
[   230.182] 6: /usr/bin/Xorg (0x8047000+0x11f614) [0x8166614]
[   230.182] 7: (vdso) (__kernel_sigreturn+0x0) [0x7cc400]
[   230.182] 8: (vdso) (__kernel_vsyscall+0x2) [0x7cc416]
[   230.182] 9: /lib/libc.so.6 (__gettimeofday+0x16) [0x2e58c6]
[   230.182] 10: /usr/lib/xorg/modules/drivers/nvidia_drv.so (_nv001056X+0xcd) [0xfab93d]
[   230.456] (WW) Mar 18 09:59:57 NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x0000cf58)
[   237.456] (WW) Mar 18 10:00:04 NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x0000cf58)


I would love to use Nouveau, but without power management it would waste 20-30W with my GTX280 so I'm stuck with the proprietary driver for now.

Comment 1 puntarenas 2010-05-05 06:14:06 UTC
Created attachment 411479 [details]
/var/log/messages

Comment 2 puntarenas 2010-05-05 06:19:00 UTC
Created attachment 411481 [details]
nvidia-bug-report.log

Comment 3 leigh scott 2010-05-05 09:21:17 UTC
Try updating your ancient kernel

Mar 18 09:56:20 client01 kernel: imklog 4.4.2, log source = /proc/kmsg started.
Mar 18 09:56:20 client01 rsyslogd: [origin software="rsyslogd" swVersion="4.4.2" x-pid="1049" x-info="http://www.rsyslog.com"] (re)start
Mar 18 09:56:20 client01 kernel: Initializing cgroup subsys cpuset
Mar 18 09:56:20 client01 kernel: Initializing cgroup subsys cpu
Mar 18 09:56:20 client01 kernel: Linux version 2.6.33-1.fc13.i686 (mockbuild.fedoraproject.org) (gcc version 4.4.3 20100211 (Red Hat 4.4.3-6) (GCC) ) #1 SMP Wed Feb 24 20:11:36 UTC 2010



I believe this issue was fixed in kernel-2.6.33.2-57.fc13

Comment 4 puntarenas 2010-05-05 17:25:33 UTC
Okay, last time I tried was with prehistoric 2.6.33.1-24-fc13.i686, so I upgraded my test installation to 2.6.33.2-57.fc13.i686 and kmod-nvidia-195.36.24-1.fc13.3 from rpmfusion following your advice.

The system has been running for more than 8 hours now and things look good. I even played around with gnome-shell and did some screencasting. No matter what I do, my Fedora is rock stable again. Funnily enough, I had been trying every workaround suggestion and bit of voodoo magic I could find on the net for several months, and the problem was fixed just between my last try and this bug report.

Thank you very much, Leigh. Could you give me some last assistance with closing this bug report? I'm not familiar with this bug tracker or with which tags should be set.

Comment 5 leigh scott 2010-05-06 00:20:41 UTC
Here's a link that explains the tags

https://bugzilla.redhat.com/page.cgi?id=fields.html#status

Comment 6 puntarenas 2010-05-09 20:58:46 UTC
Thanks again, Leigh, but even with the link you provided I have no clue how to re-open this bug. Unfortunately the freezes hit me again.

As I still have no idea how to reproduce the freezes on purpose, I was probably just lucky when my system seemed to be stable. On the other hand, there were some kernel updates lately and maybe the bug was reintroduced with one of those.

I installed F13 RC2 i686 yesterday and the freezes are back with 2.6.33.3-85.fc13.i686 and kmod-nvidia-195.36.24-1.fc13.4 from rpmfusion (initrd rebuilt with dracut, nouveau blacklisted).

From /var/log/Xorg.0.log.old:

[   131.871] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[   131.871] 
Backtrace:
[   131.871] 0: /usr/bin/Xorg (xorg_backtrace+0x3c) [0x80e51dc]
[   131.871] 1: /usr/bin/Xorg (mieqEnqueue+0x1b7) [0x80e4ae7]
[   131.872] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xd2) [0x80be302]
[   131.872] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x280000+0x30a2) [0x2830a2]
[   131.872] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x280000+0x3349) [0x283349]
[   131.872] 5: /usr/bin/Xorg (0x8047000+0x69aa0) [0x80b0aa0]
[   131.872] 6: /usr/bin/Xorg (0x8047000+0x11f8f4) [0x81668f4]
[   131.872] 7: (vdso) (__kernel_sigreturn+0x0) [0xaea400]
[   131.872] 8: (vdso) (__kernel_vsyscall+0x2) [0xaea416]
[   131.872] 9: /lib/libc.so.6 (__gettimeofday+0x16) [0x4401e6]
[   131.872] 10: /usr/lib/xorg/modules/drivers/nvidia_drv.so (_nv001057X+0xcd) [0x397daed]
[   132.196] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0xdfff2fff, 0x0000fb08)
[   139.196] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0xdfff2fff, 0x0000fb08)
[   142.197] 


From /var/log/messages:

May  9 22:36:04 client01 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
May  9 22:36:04 client01 kernel: Pid: 0, comm: swapper Tainted: P           2.6.33.3-85.fc13.i686 #1
May  9 22:36:04 client01 kernel: Call Trace:
May  9 22:36:04 client01 kernel: [<c047a0da>] __report_bad_irq+0x2e/0x6f
May  9 22:36:04 client01 kernel: [<c047a210>] note_interrupt+0xf5/0x14d
May  9 22:36:04 client01 kernel: [<c047a7c3>] handle_fasteoi_irq+0x85/0xa4
May  9 22:36:04 client01 kernel: [<c0404cd3>] handle_irq+0x3b/0x48
May  9 22:36:04 client01 kernel: [<c0404558>] do_IRQ+0x41/0x9a
May  9 22:36:04 client01 kernel: [<c0403830>] common_interrupt+0x30/0x38
May  9 22:36:04 client01 kernel: [<c04091f3>] ? mwait_idle+0x5c/0x67
May  9 22:36:04 client01 kernel: [<c04024b8>] cpu_idle+0x91/0xad
May  9 22:36:04 client01 kernel: [<c075e2ea>] rest_init+0x62/0x64
May  9 22:36:04 client01 kernel: [<c09b78f1>] start_kernel+0x346/0x34b
May  9 22:36:04 client01 kernel: [<c09b7099>] i386_start_kernel+0x99/0xa0
May  9 22:36:04 client01 kernel: handlers:
May  9 22:36:04 client01 kernel: [<c0676877>] (usb_hcd_irq+0x0/0x6a)
May  9 22:36:04 client01 kernel: [<f8c3a49c>] (nv_kern_isr+0x0/0x54 [nvidia])
May  9 22:36:04 client01 kernel: Disabling IRQ #16
May  9 22:36:09 client01 kernel: NVRM: Xid (0001:00): 16, Head 00000001 Count 00000000
May  9 22:36:10 client01 kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0000121d
May  9 22:36:10 client01 kernel: NVRM: Xid (0001:00): 6, PE0003

Comment 7 puntarenas 2010-05-09 21:00:39 UTC
Created attachment 412676 [details]
nvidia-bug-report.log

Comment 8 puntarenas 2010-05-09 21:01:14 UTC
Created attachment 412677 [details]
/var/log/messages

Comment 9 puntarenas 2010-05-21 09:24:20 UTC
With 2.6.33.4-95.fc13.i686 and kmod-nvidia-2.6.33.4-95.fc13.i686 from RPM Fusion the bug hit me again. This time GDM restarted after approximately 30 seconds, so there was no need for "SysRq-REISUB".
 
From dmesg:

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 1327, comm: Xorg Tainted: P           2.6.33.4-95.fc13.i686 #1
Call Trace:
 [<c047a0fa>] __report_bad_irq+0x2e/0x6f
 [<c047a230>] note_interrupt+0xf5/0x14d
 [<c047a7e3>] handle_fasteoi_irq+0x85/0xa4
 [<c0404cd3>] handle_irq+0x3b/0x48
 [<c0404558>] do_IRQ+0x41/0x9a
 [<c0403830>] common_interrupt+0x30/0x38
 [<fb7d1921>] ? _nv007242rm+0xc/0x1d [nvidia]
 [<fb7972ee>] ? _nv014221rm+0x34/0x45 [nvidia]
 [<fb798daa>] ? _nv013862rm+0x4c/0x119 [nvidia]
 [<fb794018>] ? _nv013908rm+0x3a/0xe3 [nvidia]
 [<fb7941ba>] ? _nv014650rm+0xf9/0x185 [nvidia]
 [<fb79426d>] ? _nv014649rm+0x27/0x33 [nvidia]
 [<fb731b32>] ? _nv010009rm+0xf6/0x597 [nvidia]
 [<fb651c97>] ? _nv016283rm+0x21f/0x2db [nvidia]
 [<fb652711>] ? _nv015638rm+0x125/0x156 [nvidia]
 [<fb76df7b>] ? _nv004584rm+0x2b6/0x86d [nvidia]
 [<fb76cf40>] ? _nv004586rm+0x3c/0x47 [nvidia]
 [<fb76d297>] ? _nv004577rm+0xdb/0x45d [nvidia]
 [<fb76cf40>] ? _nv004586rm+0x3c/0x47 [nvidia]
 [<fb76d047>] ? _nv004581rm+0xfc/0x271 [nvidia]
 [<fb76cf40>] ? _nv004586rm+0x3c/0x47 [nvidia]
 [<fb7ed14b>] ? _nv004545rm+0xa7/0xdf [nvidia]
 [<fb7eec57>] ? rm_free_unused_clients+0x65/0xa6 [nvidia]
 [<fb8c733c>] ? nv_kern_ctl_close+0x7b/0xa7 [nvidia]
 [<fb8c7c24>] ? nv_kern_close+0x86/0x2f5 [nvidia]
 [<c04c95ef>] ? __fput+0x9f/0x181
 [<c04c963a>] ? __fput+0xea/0x181
 [<c04c96e4>] ? fput+0x13/0x15
 [<c04c6d5f>] ? filp_close+0x51/0x5b
 [<c04389c0>] ? put_files_struct+0x5f/0xb3
 [<c0438a48>] ? exit_files+0x34/0x38
 [<c043a17b>] ? do_exit+0x200/0x615
 [<c04433d8>] ? __sigqueue_free+0x2d/0x30
 [<c0443766>] ? __dequeue_signal+0xd6/0xfe
 [<c044574b>] ? dequeue_signal+0xb1/0x120
 [<c043a5fb>] ? do_group_exit+0x6b/0x94
 [<c0445b28>] ? get_signal_to_deliver+0x36e/0x389
 [<c077303f>] ? do_page_fault+0x0/0x2fa
 [<c04026d4>] ? do_signal+0x5a/0x6f4
 [<c0420552>] ? force_sig_info_fault+0x43/0x4a
 [<c0424106>] ? kmap_atomic_prot+0xb3/0xd2
 [<c0420420>] ? is_prefetch+0x21/0x110
 [<c0420754>] ? __bad_area_nosemaphore+0xe1/0xf4
 [<c077303f>] ? do_page_fault+0x0/0x2fa
 [<c0420774>] ? bad_area_nosemaphore+0xd/0x10
 [<c07731d3>] ? do_page_fault+0x194/0x2fa
 [<c07717ab>] ? do_device_not_available+0x0/0x50
 [<c077303f>] ? do_page_fault+0x0/0x2fa
 [<c0402d8d>] ? do_notify_resume+0x1f/0x79
 [<c0770e9c>] ? work_notifysig+0x13/0x1b
handlers:
[<c06773b3>] (usb_hcd_irq+0x0/0x6a)
[<fb8c749c>] (nv_kern_isr+0x0/0x54 [nvidia])
Disabling IRQ #16

Comment 10 puntarenas 2010-05-25 17:23:40 UTC
With 2.6.33.4-95.fc13.i686 and "Nvidia Linux Display Driver 256.25 Beta" the system also freezes randomly.
It seems there will be no fix or workaround from Nvidia's side anytime soon, so all my hope now lies with the kernel developers.

Comment 11 Maciek Borzecki 2010-05-25 18:52:51 UTC
I did some investigation earlier and found:
1. the issue first appeared with the last of the 2.6.31 kernels in F12 and is still present in F13
2. the problem seems to be Fedora-specific; Ubuntu, Debian unstable and Arch do not have it
3. comparing the kernel configurations of 2.6.32 Ubuntu (working) and 2.6.32 F13 (affected), the only options that drew my attention were CONFIG_X86_X2APIC and CONFIG_INTR_REMAP, which might come in handy on a system with a large number of interrupt lines (not a typical desktop system)
4. Gigabyte 965P-DS* mainboards are present in most affected systems (judging by what I could gather from Google), so maybe this is a hardware/firmware bug?

Still, due to lack of time, I have not built a custom kernel with the options listed above disabled. That might be a reasonable next step.
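
The config comparison described in point 3 can be sketched roughly as follows in Python. This is only an illustration: `read_config_options` and `diff_options` are hypothetical helper names, and the two sample config snippets are invented to show the mechanics, not the actual contents of the Ubuntu or Fedora config files.

```python
def read_config_options(config_text):
    """Map option name -> value. A '# CONFIG_X is not set' line is recorded as 'n'."""
    options = {}
    suffix = " is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_options(working_text, affected_text, names):
    """Return {name: (value_in_working, value_in_affected)} for options that differ.
    An option that does not appear in a file at all is reported as 'absent'."""
    working = read_config_options(working_text)
    affected = read_config_options(affected_text)
    result = {}
    for name in names:
        pair = (working.get(name, "absent"), affected.get(name, "absent"))
        if pair[0] != pair[1]:
            result[name] = pair
    return result


# Invented sample data, for illustration only.
WORKING_SAMPLE = "CONFIG_X86_LOCAL_APIC=y\n# CONFIG_X86_X2APIC is not set\n"
AFFECTED_SAMPLE = "CONFIG_X86_LOCAL_APIC=y\nCONFIG_X86_X2APIC=y\nCONFIG_INTR_REMAP=y\n"
```

Reporting a missing option as "absent" rather than "n" is deliberate: an option that does not exist for a given architecture's config file is a different finding from one that is explicitly switched off.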

Comment 12 puntarenas 2010-05-27 08:45:03 UTC
Thank you, Maciek! Following your hint and the guide on the Fedora wiki, I wanted to build a custom kernel myself, but I fear I am not nearly skilled enough to help track this bug down further:

http://fedoraproject.org/wiki/Docs/CustomKernel

Unfortunately I got stuck at "Configure Kernel Options", because I found neither CONFIG_X86_X2APIC nor CONFIG_INTR_REMAP in any of the config files. The files I searched for those options were:

~/rpmbuild/BUILD/kernel-2.6.33/linux-2.6.33.i686/config-*

~/rpmbuild/BUILD/kernel-2.6.33/linux-2.6.33.i686/configs/kernel-2.6.33.4-i686.config

config-2.6.32-21-386 from linux-image-2.6.32-21-386_2.6.32-21.32_i386.deb (Ubuntu)

Comment 13 leigh scott 2010-05-27 18:22:42 UTC
Have you tried adding 

intel_iommu=igfx_off

or

iommu=soft 


to the end of the kernel line in /boot/grub/grub.conf?
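
To make the suggestion concrete, here is a small Python sketch that appends a boot parameter to every kernel line of a GRUB legacy config. It works on text only and does not touch /boot/grub/grub.conf itself; `add_kernel_parameter` is a hypothetical helper and the sample stanza is invented, so the kernel path and root device in it are placeholders.

```python
def add_kernel_parameter(grub_conf_text, parameter):
    """Append `parameter` to each 'kernel' line that does not already contain it."""
    out = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("kernel ") and parameter not in stripped.split():
            line = line.rstrip() + " " + parameter
        out.append(line)
    return "\n".join(out) + "\n"


# Invented grub.conf stanza, for illustration only.
GRUB_SAMPLE = (
    "title Fedora\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz ro root=/dev/sda1 rhgb quiet\n"
    "\tinitrd /initramfs.img\n"
)
```

Running the function twice with the same parameter leaves the text unchanged after the first pass, so a parameter is not duplicated when several workarounds are tested one after another.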

Comment 14 puntarenas 2010-05-28 10:23:35 UTC
Sorry, I should have mentioned that. I tried intel_iommu=igfx_off after I found another bug that I thought might somehow be related to my problem:
https://bugzilla.redhat.com/show_bug.cgi?id=538163

Before that I tried a lot of possible workarounds, mostly taken from forum posts and without really knowing what they do. Here is what I tried:

Added to the kernel line in /boot/grub/grub.conf:

intel_iommu=igfx_off
iommu=soft
noapic
acpi=off
acpi=off notsc
pci=nommconf clocksource=hpet
notsc clocksource=hpet
notsc clocksource=acpi_pm

Added to the Device section in /etc/X11/xorg.conf:

"AccelMethod" "EXA"
"AccelMethod" "XAA"

I also set PowerMizer to high performance all the time as mentioned here:
http://www.nvnews.net/vbulletin/showthread.php?t=143434

Option  "Coolbits" "1"
Option  "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3"

None of this changed anything: after some time the system freezes, and sometimes the screen even turns black. Then it either needs to be rebooted using SysRq+REISUB or (very rarely) GDM comes up with a login screen again. I think GDM restarts mostly when the freeze occurs before GDM has completely started, but I am not sure and, as I said, I cannot reproduce the issue on purpose.

Also note that Xorg doesn't need to be running at all for the freezes to occur; I also hit them several times when trying to get an nvidia-bug-report.log from runlevel 3 (the nvidia kernel module was loaded, however).

Comment 15 leigh scott 2010-05-28 10:38:19 UTC
(In reply to comment #14)

> 
> Added to device seciton in /etc/X11/xorg.conf:
> 
> "AccelMethod" "EXA"
> "AccelMethod" "XAA"


The above options aren't useful for nvidia and should be deleted.


> I also set PowerMizer to high performance all the time as mentioned here:
> http://www.nvnews.net/vbulletin/showthread.php?t=143434
> 
> Option  "Coolbits" "1"
> Option  "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222;
> PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3"


Remove the PowerMizer line as it shouldn't be there; I believe it should go in a conf file in /etc/modprobe.d/ instead.

Comment 16 puntarenas 2010-05-28 11:29:54 UTC
This was just to sum up what I have tried so far, and most of those things were probably pretty stupid. However, I always switch back to a clean out-of-the-box install whenever I mess things up with a workaround (I have a dd image at hand for that).

Comment 17 a_merljak 2010-06-03 17:33:19 UTC
(In reply to comment #16)
> This was just to sum up what I tried so far and probably most of those thing
> were just pretty stupid. However I always switch back to a clean Out-of-the-Box
> install whenever I messed things up with any workaround (I have a dd image at
> hand for that).    

Same problem for me. I tried the Nvidia proprietary driver:

[   273.523] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00007c48, 0x00007e38)
[   274.852] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[   274.852] 
Backtrace:
[   274.874] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x49ecb8]
[   274.874] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49e664]
[   274.874] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x477e24]
[   274.874] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fbec02b9000+0x3dbf) [0x7fbec02bcdbf]
[   274.874] 4: /usr/bin/Xorg (0x400000+0x6aae7) [0x46aae7]
[   274.874] 5: /usr/bin/Xorg (0x400000+0x1180f3) [0x5180f3]
[   274.874] 6: /lib64/libc.so.6 (0x3fba000000+0x32a40) [0x3fba032a40]
[   274.874] 7: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7fbec0b21000+0x78140) [0x7fbec0b99140]
[   274.874] 8: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (_nv001056X+0x289) [0x7fbec0b99d79]
[   274.874] 9: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7fbec0b21000+0xd5206) [0x7fbec0bf6206]
[   274.874] 10: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7fbec0b21000+0x383c7e) [0x7fbec0ea4c7e]
[   274.874] 11: /usr/bin/Xorg (0x400000+0xce777) [0x4ce777]
[   274.874] 12: /usr/bin/Xorg (0x400000+0x2c32c) [0x42c32c]
[   274.874] 13: /usr/bin/Xorg (0x400000+0x219ca) [0x4219ca]
[   274.874] 14: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3fba01ec5d]
[   274.874] 15: /usr/bin/Xorg (0x400000+0x21579) [0x421579]
[   280.523] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00007c48, 0x00007e38)

and also nouveau

[  4315.203] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[  4315.204] 
Backtrace:
[  4315.251] 0: /usr/bin/X (xorg_backtrace+0x28) [0x49ecb8]
[  4315.251] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x49e664]
[  4315.251] 2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x477e24]
[  4315.251] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f40250d4000+0x3dbf) [0x7f40250d7dbf]
[  4315.251] 4: /usr/bin/X (0x400000+0x6aae7) [0x46aae7]
[  4315.251] 5: /usr/bin/X (0x400000+0x1180f3) [0x5180f3]
[  4315.251] 6: /lib64/libc.so.6 (0x3fba000000+0x32a40) [0x3fba032a40]
[  4315.251] 7: /lib64/libc.so.6 (ioctl+0x7) [0x3fba0d95c7]
[  4315.251] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x3fcd603388]
[  4315.251] 9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x3fcd60360b]
[  4315.251] 10: /usr/lib64/libdrm_nouveau.so.1 (0x7f4026a81000+0x2dfd) [0x7f4026a83dfd]
[  4315.251] 11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfe) [0x7f4026a83fee]
[  4315.251] 12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f4026c86000+0x6478) [0x7f4026c8c478]
[  4315.251] 13: /usr/lib64/xorg/modules/libexa.so (0x7f4025be5000+0x7d98) [0x7f4025becd98]
[  4315.251] 14: /usr/bin/X (0x400000+0xd4c7c) [0x4d4c7c]
[  4315.251] 15: /usr/bin/X (0x400000+0x29fb9) [0x429fb9]
[  4315.251] 16: /usr/bin/X (0x400000+0x2c32c) [0x42c32c]
[  4315.251] 17: /usr/bin/X (0x400000+0x219ca) [0x4219ca]
[  4315.251] 18: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3fba01ec5d]
[  4315.251] 19: /usr/bin/X (0x400000+0x21579) [0x421579]

I hope that someone solves the problem. For me, F12 and F13 are unusable.

P965 NEO mobo
Intel Core2 6400
GeForce 8800 GT

Comment 18 Nils 2010-06-04 12:08:49 UTC
Got the same with my GeForce 8600 GT.

I can reproduce by just starting a game like Urban Terror:

Jun  4 12:51:05 darkmatter kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Jun  4 12:51:05 darkmatter kernel: Pid: 0, comm: swapper Tainted: P           2.6.33.5-112.fc13.i686 #1
Jun  4 12:51:05 darkmatter kernel: Call Trace:
Jun  4 12:51:05 darkmatter kernel: [<c0479efa>] __report_bad_irq+0x2e/0x6f
Jun  4 12:51:05 darkmatter kernel: [<c047a030>] note_interrupt+0xf5/0x14d
Jun  4 12:51:05 darkmatter kernel: [<c047a5e3>] handle_fasteoi_irq+0x85/0xa4
Jun  4 12:51:05 darkmatter kernel: [<c0404cd3>] handle_irq+0x3b/0x48
Jun  4 12:51:05 darkmatter kernel: [<c0404558>] do_IRQ+0x41/0x9a
Jun  4 12:51:05 darkmatter kernel: [<c0403830>] common_interrupt+0x30/0x38
Jun  4 12:51:05 darkmatter kernel: [<c04091f3>] ? mwait_idle+0x5c/0x67
Jun  4 12:51:05 darkmatter kernel: [<c04024b8>] cpu_idle+0x91/0xad
Jun  4 12:51:05 darkmatter kernel: [<c076c2a9>] start_secondary+0x1f5/0x233
Jun  4 12:51:05 darkmatter kernel: handlers:
Jun  4 12:51:05 darkmatter kernel: [<c06771a3>] (usb_hcd_irq+0x0/0x6a)
Jun  4 12:51:05 darkmatter kernel: [<fbae149c>] (nv_kern_isr+0x0/0x54 [nvidia])
Jun  4 12:51:05 darkmatter kernel: Disabling IRQ #16
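
Since every report so far lists usb_hcd_irq and nv_kern_isr as handlers on IRQ 16, it may help to check which devices share that line on a given machine. The following Python sketch parses /proc/interrupts-style text; `devices_on_irq` is a hypothetical helper, the sample text is invented rather than taken from any affected system, and the parsing assumes a single interrupt-type column (such as "IO-APIC-fasteoi") after the per-CPU counters.

```python
def devices_on_irq(interrupts_text, irq):
    """Return the device names registered on `irq` in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        label, _, rest = line.partition(":")
        if label.strip() != str(irq):
            continue
        fields = rest.split()
        # Skip the per-CPU counters, then the interrupt type column.
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        names = " ".join(fields[i + 1:])
        return [n.strip() for n in names.split(",") if n.strip()]
    return []


# Invented sample, for illustration only.
INTERRUPTS_SAMPLE = """\
           CPU0       CPU1
  0:        126          0   IO-APIC-edge      timer
 16:       1204        987   IO-APIC-fasteoi   uhci_hcd:usb3, nvidia
 19:         55         12   IO-APIC-fasteoi   ata_piix
"""
```

On a real system the text would come from reading /proc/interrupts; a result with more than one entry for IRQ 16 would match the shared-handler pattern seen in the logs above.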

Comment 19 Richard Allen 2010-06-07 10:41:28 UTC
Same here. GeForce 8600 GTS in an HP dc7700 convertible minitower.



NVRM: Xid (0001:00): 6, PE0001
NVRM: Xid (0001:00): 6, PE0001
NVRM: Xid (0001:00): 8, Channel 0000007e
NVRM: os_pci_init_handle: invalid context!
NVRM: os_pci_init_handle: invalid context!
NVRM: Xid (0001:00): 8, Channel 0000007e
NVRM: os_pci_init_handle: invalid context!
NVRM: os_pci_init_handle: invalid context!
NVRM: Xid (0001:00): 8, Channel 0000007e


[ 10162.293] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 10162.294]
Backtrace:
[ 10162.304] 0: /usr/bin/Xorg (xorg_backtrace+0x3c) [0x80e51dc]
[ 10162.304] 1: /usr/bin/Xorg (mieqEnqueue+0x1b7) [0x80e4ae7]
[ 10162.304] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xd2) [0x80be302]
[ 10162.304] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x1e7000+0x30a2) [0x1ea0a2]
[ 10162.304] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x1e7000+0x3349) [0x1ea349]
[ 10162.305] 5: /usr/bin/Xorg (0x8047000+0x69aa0) [0x80b0aa0]
[ 10162.305] 6: /usr/bin/Xorg (0x8047000+0x11f8f4) [0x81668f4]
[ 10162.305] 7: (vdso) (__kernel_sigreturn+0x0) [0xd50400]
[ 10162.305] 8: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0xd51000+0x382e5c) [0x10d3e5c]

Comment 20 Richard Allen 2010-06-07 23:33:21 UTC
FYI, my DC7700 machine has:

Intel Q965 Express Chipset
Intel Core 2 Duo Dual Core Processor
2 full-height PCI, 1 full-height PCI Express x1, 1 full height PCI Express x16

Comment 21 Alexey Puzankov 2010-06-09 14:50:46 UTC
I hit this bug on one PC:

AsRock P43DE motherboard
Core 2 Quad 9400
Nvidia 9800GTX

/var/log/messages
Jun 10 04:41:31 aleo acpid: client connected from 1749[0:0]
Jun 10 04:41:31 aleo acpid: 1 client rule loaded
Jun 10 04:41:48 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000001 Count 00000000
Jun 10 04:41:53 aleo kernel: NVRM: Xid (0005:00): 8, Channel 0000007f
Jun 10 04:41:53 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:41:53 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:03 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:03 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:04 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000000 Count 00000003
Jun 10 04:42:04 aleo kernel: Clocksource tsc unstable (delta = 14058518999 ns)
Jun 10 04:42:04 aleo kernel: Switching to clocksource acpi_pm
Jun 10 04:42:11 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000001 Count 00000001
Jun 10 04:42:12 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000000 Count 00000004
Jun 10 04:42:17 aleo kernel: NVRM: Xid (0005:00): 8, Channel 0000007f
Jun 10 04:42:17 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:17 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:27 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:27 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:34 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000001 Count 00000002
Jun 10 04:42:35 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000000 Count 00000005
Jun 10 04:42:51 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:51 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:51 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:42:51 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:43:14 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:43:14 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:43:14 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:43:14 aleo kernel: NVRM: os_pci_init_handle: invalid context!
Jun 10 04:43:27 aleo kernel: NVRM: os_pci_init_handle: invalid context!
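
To compare reports like the one above, the NVRM Xid messages can be tallied by code. This is a minimal Python sketch; `count_xid_codes` is a hypothetical helper, and it makes no attempt to interpret what the individual Xid numbers mean, since the thread does not say.

```python
import re
from collections import Counter

XID_RE = re.compile(r"NVRM: Xid \(([0-9a-f:]+)\): (\d+),")


def count_xid_codes(log_text):
    """Count how often each Xid code appears in the given log text."""
    return Counter(int(m.group(2)) for m in XID_RE.finditer(log_text))


# Three lines copied from the log in this comment.
XID_SAMPLE = """\
Jun 10 04:41:48 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000001 Count 00000000
Jun 10 04:41:53 aleo kernel: NVRM: Xid (0005:00): 8, Channel 0000007f
Jun 10 04:42:04 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000000 Count 00000003
"""
```

Lines such as "os_pci_init_handle: invalid context!" carry no Xid number and are simply ignored by the pattern.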

/var/log/Xorg.0.log
[    33.397] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x00003f08)
[    40.397] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x00003f08)
[    56.134] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x000052fc)
[    63.134] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x000052fc)
[    80.526] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x000094bc)
[    87.526] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x000094bc)
[   115.549] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0xdfff2fff, 0x00009f48)
[   122.549] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0xdfff2fff, 0x00009f48)
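
The WAIT warnings above come in pairs that share the same last value, so the time between the two messages of a pair can be extracted mechanically. The Python sketch below does that; `wait_intervals` is a hypothetical helper, and pairing on the last field is an assumption based only on how the lines look in this thread, not on any documented meaning of the fields.

```python
import re

WAIT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] \(WW\).*?NVIDIA\(\d+\): WAIT \(\d+, \d+, \w+, \w+, (\w+)\)"
)


def wait_intervals(xorg_log_text):
    """Return [(last_field, seconds_between_first_and_second_message), ...]."""
    first_seen = {}
    intervals = []
    for line in xorg_log_text.splitlines():
        m = WAIT_RE.search(line)
        if not m:
            continue
        stamp, key = float(m.group(1)), m.group(2)
        if key in first_seen:
            intervals.append((key, round(stamp - first_seen.pop(key), 3)))
        else:
            first_seen[key] = stamp
    return intervals


# Four lines copied from the Xorg.0.log excerpt in this comment.
WAIT_SAMPLE = """\
[    33.397] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x00003f08)
[    40.397] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x00003f08)
[    56.134] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x000052fc)
[    63.134] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x000052fc)
"""
```

For the sample lines each pair is 7 seconds apart, which matches the spacing visible in the other Xorg logs posted in this bug.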

Comment 22 Frederick Kay 2010-06-13 13:12:11 UTC
Happens to me as well. Interestingly, the bug only appeared after I replaced my mainboard because the old one stopped working. Both boards were used with Fedora 13 and an NVIDIA GeForce 8800 GT. I haven't been able to capture a kernel log so far because of the crash: the system freezes after a few minutes, or within a few seconds whenever I start an OpenGL program such as glxgears or nvidia-settings.


Before I had a MSI P35 Neo2-FR/FIR (Intel P35 chipset)
I now have a Gigabyte GA-EP45T-UD3LR (Intel P45)

Maybe it's related to the chipset driver? (Just a guess).

Comment 23 Alexey Puzankov 2010-06-14 13:51:56 UTC
If this problem is associated with the chipset: mine is an Intel P43.

Comment 24 Fernando Fernandez Pedraza 2010-06-15 09:05:06 UTC
In my system the chipset is an Intel X48. It's a Sun Ultra 24.
I am attaching the output of lspci -v.

Comment 25 Fernando Fernandez Pedraza 2010-06-15 09:07:56 UTC
00:00.0 Host bridge: Intel Corporation 82X38/X48 Express DRAM Controller (rev 01)
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel driver in use: x38_edac
	Kernel modules: x38_edac

00:01.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Primary PCI Express Bridge (rev 01) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Capabilities: [88] Subsystem: Sun Microsystems Computer Corp. Device 5351
	Capabilities: [80] Power Management version 3
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [a0] Express Root Port (Slot+), MSI 00
	Kernel driver in use: pcieport

00:03.0 Communication controller: Intel Corporation 82X38/X48 Express MEI Controller (rev 01)
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, fast devsel, latency 0, IRQ 10
	Memory at f9fffc00 (64-bit, non-prefetchable) [size=16]
	Capabilities: [50] Power Management version 3
	Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+

00:06.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Secondary PCI Express Bridge (rev 01) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: fa000000-feafffff
	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
	Capabilities: [88] Subsystem: Sun Microsystems Computer Corp. Device 5351
	Capabilities: [80] Power Management version 3
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [a0] Express Root Port (Slot+), MSI 00
	Kernel driver in use: pcieport

00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02)
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, fast devsel, latency 0, IRQ 27
	Memory at f9fc0000 (32-bit, non-prefetchable) [size=128K]
	Memory at f9ffe000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at dc00 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: e1000e
	Kernel modules: e1000e

00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0, IRQ 10
	I/O ports at d880 [size=32]
	Capabilities: [50] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: uhci_hcd

00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0, IRQ 15
	I/O ports at d800 [size=32]
	Capabilities: [50] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: uhci_hcd

00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02) (prog-if 20 [EHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0, IRQ 14
	Memory at f9fff800 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: ehci_hcd

00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, fast devsel, latency 0, IRQ 3
	Memory at f9ff4000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Kernel driver in use: HDA Intel
	Kernel modules: snd-hda-intel

00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0, IRQ 7
	I/O ports at d480 [size=32]
	Capabilities: [50] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: uhci_hcd

00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0, IRQ 4
	I/O ports at d400 [size=32]
	Capabilities: [50] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: uhci_hcd

00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0, IRQ 14
	I/O ports at d080 [size=32]
	Capabilities: [50] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: uhci_hcd

00:1d.3 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0, IRQ 10
	I/O ports at d000 [size=32]
	Capabilities: [50] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: uhci_hcd

00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02) (prog-if 20 [EHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0, IRQ 7
	Memory at f9fff400 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: ehci_hcd

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92) (prog-if 01 [Subtractive decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=32
	Memory behind bridge: feb00000-febfffff
	Capabilities: [50] Subsystem: Sun Microsystems Computer Corp. Device 5351

00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel modules: iTCO_wdt

00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 26
	I/O ports at cc00 [size=8]
	I/O ports at c880 [size=4]
	I/O ports at c800 [size=8]
	I/O ports at c480 [size=4]
	I/O ports at c400 [size=32]
	Memory at f9ffd800 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
	Capabilities: [70] Power Management version 3
	Capabilities: [a8] SATA HBA v1.0
	Capabilities: [b0] Vendor Specific Information: Len=06 <?>
	Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: medium devsel, IRQ 14
	Memory at f9fff000 (64-bit, non-prefetchable) [size=256]
	I/O ports at 0400 [size=32]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c-i801

00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 02)
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: fast devsel, IRQ 14
	Memory at fed08000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [50] Power Management version 3

02:00.0 VGA compatible controller: nVidia Corporation Quadro FX 370 (rev a1) (prog-if 00 [VGA controller])
	Subsystem: nVidia Corporation Device 0491
	Flags: bus master, fast devsel, latency 0, IRQ 10
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at ec00 [size=128]
	[virtual] Expansion ROM at feae0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 2
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: nvidia
	Kernel modules: nvidia, nouveau, nvidiafb

03:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) (prog-if 10 [OHCI])
	Subsystem: Sun Microsystems Computer Corp. Device 5351
	Flags: bus master, medium devsel, latency 64, IRQ 4
	Memory at febff800 (32-bit, non-prefetchable) [size=2K]
	Memory at febf8000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: firewire_ohci
	Kernel modules: firewire-ohci

Comment 26 puntarenas 2010-06-19 10:40:58 UTC
I switched from the Gigabyte 965P-DS4 to an MSI P43T-C51 and the behavior is unchanged: as soon as the proprietary Nvidia driver comes into play, it is just a matter of time until the system freezes.

I am sorry for excluding other chipsets in the first place; I thought only P965 boards were affected. Note that I could not find the "irq 16: nobody cared" line in /var/log/messages when the freeze occurred on the MSI board. Please have a look at "logs-GTX280-MSI-P43T-C51.tar.gz" for reference.

Comment 27 puntarenas 2010-06-19 10:47:59 UTC
Created attachment 425327 [details]
logs of freezing MSI P43T-C51 (BIOS V2.5) with Nvidia GTX280

MSI P43T-C51 (BIOS V2.5) with Nvidia GTX280

kernel 2.6.33.5-112-fc13.i686
kmod-nvidia-195.36.24.2.fc13.1 (rpmfusion)


-> random freezes

nvidia-bug-report.log
/var/log/messages
/var/log/dmesg
/var/log/Xorg.0.log

Comment 28 Richard Allen 2010-06-30 11:40:33 UTC
I'm a bit afraid we won't get any help from either the Fedora or NVidia people on this issue, so I suggest we pool our resources and try to figure this out by ourselves.  After all, we do have the source of the kernel itself.

We know this is a Fedora-only issue, that it only affects F12 and F13, and that it apparently hits only Intel-based chipsets.  So I installed the F13 kernel srpm and took a look.

[root@rikkilap SOURCES]# ls -l | grep -i intel
-rw-r--r--. 1 root root      553 Jan 11 21:10 drm-intel-big-hammer.patch
-rw-r--r--. 1 root root     1830 Apr 19 21:34 drm-intel-gen5-dither.patch
-rw-r--r--. 1 root root     1015 Apr 19 21:31 drm-intel-make-lvds-work.patch
-rw-r--r--. 1 root root   475052 May  6 17:25 drm-intel-next.patch
-rw-r--r--. 1 root root     4188 Apr 29 18:38 drm-intel-sdvo-fix-2.patch
-rw-r--r--. 1 root root     3918 Apr 26 15:20 drm-intel-sdvo-fix.patch
-rw-r--r--. 1 root root     1228 Mar  3  2009 hda_intel-prealloc-4mb-dmabuffer.patch
-rw-r--r--. 1 root root     2923 May 26 16:03 linux-2.6-intel-iommu-igfx.patch
-rw-r--r--. 1 root root      909 Feb  6 22:41 neuter_intel_microcode_load.patch


[root@rikkilap SPECS]# grep -ic ^Patch kernel.spec
141


Since this is a Fedora-only issue, one of those 141 patches has to be to blame.
I'm going to start rebuilding kernels and try to isolate which of the patches it is.

Does anyone have any ideas on what could be a good place to start?
Perhaps these?

[root@rikkilap SOURCES]# ls -l *i915*
-rw-r--r--. 1 root root  1625 May 27 01:37 drm-i915-fix-non-ironlake-965-class-crashes.patch
-rw-r--r--. 1 root root 10430 May 27 01:37 drm-i915-use-pipe_control-instruction-on-ironlake-and-sandy-bridge.patch
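
The patch hunt proposed above could be organised as a bisection rather than a one-by-one search. The following is only a sketch under assumptions that are not confirmed in this thread: that a single patch is responsible, that a kernel built with none of the patches is stable, and that the patches can be applied as a prefix of the spec order. The names list_patches, first_bad_patch and the freezes callback are hypothetical; freezes stands for "build a kernel with these patches, run it, and report whether it froze".

```python
import re
from typing import Callable, List, Sequence

# Matches "PatchNNN: file.patch" lines, case-insensitively, like `grep -i ^Patch`.
PATCH_LINE = re.compile(r"^patch\d*:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def list_patches(spec_text: str) -> List[str]:
    """Return the patch file names declared in an RPM spec, in spec order."""
    return PATCH_LINE.findall(spec_text)


def first_bad_patch(patches: Sequence[str],
                    freezes: Callable[[Sequence[str]], bool]) -> str:
    """Bisect for the patch whose addition makes the kernel freeze.

    Assumption: patches[:0] (no patches) is good and the full list is bad.
    `freezes` is a hypothetical callback that builds and tests a kernel.
    """
    lo, hi = 0, len(patches)  # invariant: patches[:lo] good, patches[:hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if freezes(patches[:mid]):
            hi = mid
        else:
            lo = mid
    return patches[hi - 1]
```

With 141 patches this needs at most eight build-and-test rounds. Because the freezes are random, each "good" verdict would need a long enough soak time to be trusted, otherwise the bisection can converge on the wrong patch.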

Comment 29 Jan Vlug 2010-06-30 21:05:54 UTC
I'm pretty sure I'm suffering from this bug too.
- The problems started for me after the upgrade to Fedora 12.
- Fedora 11 was rock stable.
- Fedora 13 still has the issue.
- First, I thought it was nvidia driver related. Then I removed all nvidia stuff, and I'm using nouveau now, but the bug still is there.
- Sometimes, my system runs for hours without a crash.
- Sometimes, it hangs already during boot.
- The more hardware is connected, the sooner the crash occurs (e.g. external disk, OpenMoko phone, both via USB).
- I have the impression that I have less frequent freezes after using MSI interrupts for my audio (and nvidia, when I was still using this driver).
- See this bug, which actually is about the same issue: bug #588036, I refer there to some other bugs and reports that look related.
- I'm using an ASUS P5B-MX motherboard.
- Sometimes the system freezes without leaving any logging about errors.
- Sometimes there are errors logged.
- Let me know if logs during the crashes are useful, I can attach them to this bug. 

This is my hardware (lspci -v):
00:00.0 Host bridge: Intel Corporation 82946GZ/PL/GL Memory Controller Hub (rev 02)
	Subsystem: ASUSTeK Computer Inc. Device 823b
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=09 <?>

00:01.0 PCI bridge: Intel Corporation 82946GZ/PL/GL PCI Express Root Port (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: fa000000-feafffff
	Prefetchable memory behind bridge: 00000000e0000000-00000000efffffff
	Capabilities: [88] Subsystem: ASUSTeK Computer Inc. Device 823b
	Capabilities: [80] Power Management version 3
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [a0] Express Root Port (Slot+), MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [140] Root Complex Link
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 01)
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, fast devsel, latency 0, IRQ 27
	Memory at f9ffc000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [130] Root Complex Link
	Kernel driver in use: HDA Intel
	Kernel modules: snd-hda-intel

00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 01) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 00001000-00001fff
	Memory behind bridge: 80000000-801fffff
	Prefetchable memory behind bridge: 0000000080200000-00000000803fffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8179
	Capabilities: [a0] Power Management version 2
	Capabilities: [100] Virtual Channel
	Capabilities: [180] Root Complex Link
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 01) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: feb00000-febfffff
	Prefetchable memory behind bridge: 0000000080400000-00000000805fffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8179
	Capabilities: [a0] Power Management version 2
	Capabilities: [100] Virtual Channel
	Capabilities: [180] Root Complex Link
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1d.0 USB Controller: Intel Corporation N10/ICH7 Family USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, medium devsel, latency 0, IRQ 23
	I/O ports at d480 [size=32]
	Kernel driver in use: uhci_hcd

00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, medium devsel, latency 0, IRQ 19
	I/O ports at d800 [size=32]
	Kernel driver in use: uhci_hcd

00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, medium devsel, latency 0, IRQ 18
	I/O ports at d880 [size=32]
	Kernel driver in use: uhci_hcd

00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, medium devsel, latency 0, IRQ 16
	I/O ports at dc00 [size=32]
	Kernel driver in use: uhci_hcd

00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, medium devsel, latency 0, IRQ 23
	Memory at f9ffbc00 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Kernel driver in use: ehci_hcd

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=32
	Capabilities: [50] Subsystem: ASUSTeK Computer Inc. Device 8179

00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, medium devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel modules: leds-ss4200, iTCO_wdt, intel-rng

00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, medium devsel, latency 0, IRQ 18
	I/O ports at 01f0 [size=8]
	I/O ports at 03f4 [size=1]
	I/O ports at 0170 [size=8]
	I/O ports at 0374 [size=1]
	I/O ports at ffa0 [size=16]
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, pata_acpi

00:1f.2 IDE interface: Intel Corporation N10/ICH7 Family SATA IDE Controller (rev 01) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
	I/O ports at d400 [size=8]
	I/O ports at d080 [size=4]
	I/O ports at d000 [size=8]
	I/O ports at cc00 [size=4]
	I/O ports at c880 [size=16]
	Capabilities: [70] Power Management version 2
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, pata_acpi

00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01)
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: medium devsel, IRQ 19
	I/O ports at 0400 [size=32]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c-i801

01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GTS] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device 8241
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at ec00 [size=128]
	Expansion ROM at feae0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 2
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nouveau, nvidiafb

02:00.0 Ethernet controller: Atheros Communications L1 Gigabit Ethernet (rev b0)
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Flags: bus master, fast devsel, latency 0, IRQ 28
	Memory at febc0000 (64-bit, non-prefetchable) [size=256K]
	Expansion ROM at feba0000 [disabled] [size=128K]
	Capabilities: [40] Power Management version 2
	Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [58] Express Endpoint, MSI 00
	Capabilities: [6c] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: atl1
	Kernel modules: atl1
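
Since the MSI setting comes up as a possible factor, it may help to compare which devices actually have MSI enabled across the lspci dumps in this bug. The sketch below is an illustration only: msi_state is a hypothetical helper, and it assumes the `lspci -v` layout shown above (a device header starting in column 0, followed by indented "Capabilities: ... MSI: Enable+/-" lines, with no PCI domain prefix).

```python
import re
from typing import Dict

# Device header such as "01:00.0 VGA compatible controller: ..."
DEVICE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
# Capability line such as "Capabilities: [68] MSI: Enable- Count=1/1 ..."
MSI = re.compile(r"MSI: Enable([+-])")


def msi_state(lspci_text: str) -> Dict[str, bool]:
    """Map each PCI address that has an MSI capability to True (Enable+) or False (Enable-)."""
    state: Dict[str, bool] = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE.match(line)
        if header:
            current = header.group(1)
            continue
        cap = MSI.search(line)
        if cap and current is not None:
            state[current] = cap.group(1) == "+"
    return state


# Two devices copied from the dump above, trimmed to the relevant lines.
SAMPLE = (
    "01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GTS] (rev a1)\n"
    "\tCapabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+\n"
    "\tCapabilities: [78] Express Endpoint, MSI 00\n"
    "\n"
    "02:00.0 Ethernet controller: Atheros Communications L1 Gigabit Ethernet (rev b0)\n"
    "\tCapabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+\n"
)

if __name__ == "__main__":
    for address, enabled in sorted(msi_state(SAMPLE).items()):
        print(address, "MSI on" if enabled else "MSI off")
```

Devices without an MSI capability line (the UHCI controllers above, for example) simply do not appear in the result.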

Comment 30 puntarenas 2010-07-05 20:31:13 UTC
Thanks for all your efforts Richard. I tried 2.6.33.5-124.fc13.i686 together with kmod-nvidia 195.36.24 from rpmfusion lately and nothing changed. Then I installed Ubuntu 10.4 and used it for several days. Now as the freezes occur randomly I am not absolutely sure, but as far as I could see Ubuntu is not affected, so I would like to confirm this again..

Unfortunately all I can contribute is trying most recent Fedora kernels and reporting back, although I fear that is not very helpful at all.

Comment 31 a_merljak 2010-07-12 17:30:51 UTC
(In reply to comment #30)
> Thanks for all your efforts Richard. I tried 2.6.33.5-124.fc13.i686 together
> with kmod-nvidia 195.36.24 from rpmfusion lately and nothing changed. Then I
> installed Ubuntu 10.4 and used it for several days. Now as the freezes occur
> randomly I am not absolutely sure, but as far as I could see Ubuntu is not
> affected, so I would like to confirm this again..
> 
> Unfortunately all I can contribute is trying most recent Fedora kernels and
> reporting back, although I fear that is not very helpful at all.    

I'm using the latest kernel-2.6.34.1 from koji and the latest official nvidia 256.35 drivers and the system seems to be stable for the last two days

http://koji.fedoraproject.org/koji/buildinfo?buildID=182791

Comment 32 Richard Allen 2010-07-12 18:05:36 UTC
It's great to hear some good news.  It takes me just under 2 hours just to compile my PAE kernel.  NVidia 256.35?

rpmfusion only has 195.36.  Does NVidia ship something much newer than rpmfusion does?

Comment 33 Alexey Puzankov 2010-07-13 06:17:02 UTC
(In reply to comment #32)
> rpmfusion only has 195.36.  Does NVidia ship something much newer than
> rpmfusion does?    

I found it at http://atrpms.net/dist/f13/ , but have had no time to test it.

Comment 34 Richard Allen 2010-07-14 23:50:49 UTC
This morning I removed (rpm -e) all rpmfusion nvidia software from my machine and downloaded the 256.35 install kit directly from NVidia's website since rpmfusion still has no 256.35 rpm's.

[root@morticia ~]# uptime
 23:48:59 up  7:11,  8 users,  load average: 0.03, 0.11, 0.16

Not a single crash today :)    I know I might be a bit premature, but it's been ages since I made it past 10 minutes of uptime :)

Comment 35 Richard Allen 2010-07-16 16:38:18 UTC
Spoke too soon.   With the 256.35 drivers the situation is much better, but I did get a crash:

Jul 16 13:22:36 morticia kernel: NVRM: os_pci_init_handle: invalid context!
Jul 16 13:22:36 morticia kernel: NVRM: os_pci_init_handle: invalid context!
Jul 16 13:22:49 morticia kernel: NVRM: Xid (0001:00): 8, Channel 0000007f
Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context!
Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context!
Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context!
Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context!
Jul 16 13:22:50 morticia kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00a8ae8a
Jul 16 13:22:50 morticia kernel: NVRM: Xid (0001:00): 16, Head 00000001 Count 00a8705f
Jul 16 13:25:25 morticia kernel: INFO: task Xorg:2390 blocked for more than 120 seconds.
Jul 16 13:25:25 morticia kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 16 13:25:25 morticia kernel: Xorg          D 0000928d     0  2390   2387 0x00400084
Jul 16 13:25:25 morticia kernel: f4e83d64 00003086 322447cc 0000928d 00000000 c0a48394 c0a4cf40 c0a4cf40
Jul 16 13:25:25 morticia kernel: c0a4cf40 f35dc25c 00000000 f2627c3c f2f5f6c8 00000000 f25d7180 0000928d
Jul 16 13:25:25 morticia kernel: f35dbfc0 f2f5f6d4 00000000 00000001 f2f5f6d0 f34ccc64 7fffffff f35dbfc0
Jul 16 13:25:25 morticia kernel: Call Trace:
Jul 16 13:25:25 morticia kernel: [<c0781d05>] schedule_timeout+0x22/0xad
Jul 16 13:25:25 morticia kernel: [<c07612bb>] ? sk_wake_async+0x19/0x32
Jul 16 13:25:25 morticia kernel: [<c0781bdb>] wait_for_common+0xbe/0x108
Jul 16 13:25:25 morticia kernel: [<c0439877>] ? default_wake_function+0x0/0xd
Jul 16 13:25:25 morticia kernel: [<c0781c97>] wait_for_completion+0x12/0x14
Jul 16 13:25:25 morticia kernel: [<f9c3cc0b>] os_acquire_sema+0x33/0x59 [nvidia]
Jul 16 13:25:25 morticia kernel: [<f9794a74>] ? _nv000472rm+0xb/0x31 [nvidia]
Jul 16 13:25:25 morticia kernel: [<f9c1509b>] _nv021309rm+0xa/0x21 [nvidia]
Jul 16 13:25:25 morticia kernel: [<f97d51b0>] ? _nv003206rm+0x1e1/0x22f [nvidia]
Jul 16 13:25:25 morticia kernel: [<f97d521d>] ? _nv002032rm+0x1f/0x23 [nvidia]
Jul 16 13:25:25 morticia kernel: [<f97bb546>] ? _nv001711rm+0x2b/0x4e [nvidia]
Jul 16 13:25:25 morticia kernel: [<f9c21e92>] ? _nv002113rm+0x5c2/0x5fb [nvidia]
Jul 16 13:25:25 morticia kernel: [<f9c1e95f>] ? rm_ioctl+0x3e/0x6d [nvidia]
Jul 16 13:25:25 morticia kernel: [<f9c3afdf>] ? nv_kern_ioctl+0x2bf/0x314 [nvidia]
Jul 16 13:25:25 morticia kernel: [<f9c3b065>] ? nv_kern_unlocked_ioctl+0x16/0x1b [nvidia]
Jul 16 13:25:25 morticia kernel: [<f9c3b065>] ? nv_kern_unlocked_ioctl+0x16/0x1b [nvidia]
Jul 16 13:25:25 morticia kernel: [<c04dadf9>] ? vfs_ioctl+0x27/0x91
Jul 16 13:25:25 morticia kernel: [<f9c3b04f>] ? nv_kern_unlocked_ioctl+0x0/0x1b [nvidia]
Jul 16 13:25:25 morticia kernel: [<c04db39a>] ? do_vfs_ioctl+0x48e/0x4cc
Jul 16 13:25:25 morticia kernel: [<c0572635>] ? selinux_file_ioctl+0x3e/0x41
Jul 16 13:25:25 morticia kernel: [<c04db419>] ? sys_ioctl+0x41/0x61
Jul 16 13:25:25 morticia kernel: [<c040889f>] ? sysenter_do_call+0x12/0x28
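
To compare freezes across machines, the NVRM lines in a log like the one above can be pulled out mechanically. This is a rough sketch, not a tool used by anyone in this thread: xid_events and count_invalid_context are hypothetical helpers, and the regular expression assumes the classic syslog line format seen in this excerpt. It does not attempt to interpret the Xid codes.

```python
import re
from typing import List, Tuple

# e.g. "Jul 16 13:22:49 morticia kernel: NVRM: Xid (0001:00): 8, Channel 0000007f"
XID = re.compile(
    r"^(\w{3}\s+\d+ [\d:]{8}) \S+ kernel: NVRM: Xid \(([^)]*)\): (\d+), (.*)$"
)
INVALID_CONTEXT = "NVRM: os_pci_init_handle: invalid context!"


def xid_events(log_text: str) -> List[Tuple[str, int, str]]:
    """Return (timestamp, xid_code, detail) for every NVRM Xid line."""
    events = []
    for line in log_text.splitlines():
        m = XID.match(line)
        if m:
            events.append((m.group(1), int(m.group(3)), m.group(4)))
    return events


def count_invalid_context(log_text: str) -> int:
    """Count the 'os_pci_init_handle: invalid context!' messages."""
    return sum(INVALID_CONTEXT in line for line in log_text.splitlines())


# Four lines copied from the excerpt above.
SAMPLE_LOG = """\
Jul 16 13:22:36 morticia kernel: NVRM: os_pci_init_handle: invalid context!
Jul 16 13:22:49 morticia kernel: NVRM: Xid (0001:00): 8, Channel 0000007f
Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context!
Jul 16 13:22:50 morticia kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00a8ae8a
"""

if __name__ == "__main__":
    for when, code, detail in xid_events(SAMPLE_LOG):
        print(when, "Xid", code, detail)
    print("invalid context messages:", count_invalid_context(SAMPLE_LOG))
```

Running the same functions over /var/log/messages from several affected systems would show whether the same Xid codes precede every freeze.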

Comment 36 Laszlo Beres 2010-09-01 11:19:09 UTC
(In reply to comment #24)

> In my system the chipset is intel X48. It´s a Sun Ultra 24.
> I attach the output of lspci -v

Same here. 

[root@lberes log]# uname -rv
2.6.33.8-149.fc13.i686.PAE #1 SMP Tue Aug 17 22:39:27 UTC 2010
[root@lberes log]# 

[root@lberes log]# rpm -qa | grep -i kmod-nvidia
kmod-nvidia-2.6.33.8-149.fc13.i686.PAE-195.36.31-1.fc13.5.i686
[root@lberes log]#

Comment 37 Jan Vlug 2010-09-01 11:37:27 UTC
Enabling 'PEG force x1' in your BIOS might be a workaround.

I experienced random crashes both with nvidia and nouveau drivers. Currently, I'm using nouveau, and since I enabled this BIOS setting I did not have a single crash for nearly two months. Maybe it works for the nvidia driver as well.

See also bug #588036.

Comment 38 Per Wahlstrom 2010-09-21 17:26:59 UTC
Hi folks, 

This is by no means a Fedora-only issue. I run Ubuntu, and after a motherboard (+cpu) switch, this started happening to me too. Maybe I can provide some info, as I'd very much like this issue to be resolved.

I had no issues at all on the old motherboard:
- Old MB: Asus P5W DH Deluxe (Intel 975x chip-set) + Core 2 Duo E6600 CPU

On new motherboard I get the behavior described in this bug.
- New MB: Asus P6T SE (Intel X58 chip-set) + Core i7 930 CPU

Same Ubuntu installation in both cases.
- Kernel: 2.6.31-20
- nVidia driver: 185.18.36

Problem did not go away after upgrading nVidia driver to 256.53.

The log message "NVRM: os_pci_init_handle: invalid context!" is printed by the nVidia driver's kernel interface code in kernel/os-interface.c (in the unpacked nvidia driver directory).
This is an error condition, raised when the os_pci_init_handle() function is called in an interrupt context, from the looks of it.

That's as far as I've come.
And it seems like running glxgears is a good way to force the bug to manifest.

Cheers!

Comment 39 Per Wahlstrom 2010-09-21 19:01:46 UTC
Addition: enabling "Sync to VBlank" in nvidia-settings OpenGL Settings seems to have removed the problem. Running Starcraft II in a window (in Wine), glxgears, and playing a video simultaneously now for 30 minutes, plus playing QuakeLive, with nothing in the syslog and no problems.
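
For anyone who wants to try the same setting for a single test run rather than through nvidia-settings, the NVIDIA driver also reads the __GL_SYNC_TO_VBLANK environment variable. The sketch below is a guess at how such a test could be scripted, not something verified in this thread; vblank_env and run_glxgears are hypothetical helper names, and whether the variable behaves identically to the nvidia-settings checkbox on every driver version is an assumption.

```python
import os
import shutil
import subprocess
from typing import Dict, Mapping, Optional


def vblank_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Sync to VBlank requested."""
    env = dict(os.environ if base is None else base)
    env["__GL_SYNC_TO_VBLANK"] = "1"
    return env


def run_glxgears(seconds: int = 600) -> bool:
    """Run glxgears with Sync to VBlank for a while; False if it is not installed."""
    if shutil.which("glxgears") is None:
        return False
    try:
        subprocess.run(["glxgears"], env=vblank_env(), timeout=seconds)
    except subprocess.TimeoutExpired:
        pass  # expected: glxgears runs until it is stopped
    return True
```

Calling run_glxgears() while watching the syslog for NVRM messages gives a repeatable test with the setting applied only to that one process.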

Comment 40 Chuck Ebbert 2010-09-22 15:19:14 UTC
We don't provide or support the nvidia driver.

