Created attachment 411478 [details] Created by nvidia-bug-report.sh

Description of problem:
Fedora releases later than F11 randomly freeze when (and only when) Nvidia's proprietary driver is in use. I am using an Nvidia GTX280 on a Gigabyte 965P-DS4 Rev2. According to other affected users, Intel's P965 chipset in general and/or using a PCIe 2.x card in a PCIe 1.x slot seems to be a possible root of the incompatibility. There is a thread with lots of nvidia-bug-report.logs from other users affected by the problem: http://www.nvnews.net/vbulletin/showthread.php?t=149056

I'm aware that I am reporting a bug connected with a proprietary driver and therefore a tainted kernel, but hopefully some of you kernel developers know what changed between F11 and later kernels. I'm not too confident Nvidia will fix their drivers soon, and F11 reaches its EOL within the next few months.

Version-Release number of selected component (if applicable):
Several versions of Fedora kernels (i686 and x86_64) with the most recent Nvidia drivers are affected.

How reproducible:
The freeze occurs after anywhere from a few seconds to several hours; most of the time the system freezes within 15 minutes.

Steps to Reproduce:
1. Install F12 or F13
2. Install Nvidia's proprietary driver (either using the rpmfusion repository or following this guide: http://forums.fedoraforum.org/showthread.php?t=240860 )
3. 
Reboot and wait for the system to freeze (sometimes at GDM login, sometimes after several minutes or even hours in Gnome) Actual results: random Freezes Expected results: rock stable operation as F11 provides Additional info: From /var/log/messages: Mar 18 09:59:56 client01 kernel: irq 16: nobody cared (try booting with the "irqpoll" option) Mar 18 09:59:56 client01 kernel: Pid: 0, comm: swapper Tainted: P 2.6.33-1.fc13.i686 #1 Mar 18 09:59:56 client01 kernel: Call Trace: Mar 18 09:59:56 client01 kernel: [<c0489e39>] __report_bad_irq+0x33/0x74 Mar 18 09:59:56 client01 kernel: [<c0489f74>] note_interrupt+0xfa/0x152 Mar 18 09:59:56 client01 kernel: [<c048a556>] handle_fasteoi_irq+0x8f/0xb2 Mar 18 09:59:56 client01 kernel: [<c0404ef0>] handle_irq+0x40/0x4c Mar 18 09:59:56 client01 kernel: [<c040475f>] do_IRQ+0x46/0x9f Mar 18 09:59:56 client01 kernel: [<c0403975>] common_interrupt+0x35/0x3c Mar 18 09:59:56 client01 kernel: [<c0409745>] ? mwait_idle+0x68/0x78 Mar 18 09:59:56 client01 kernel: [<c04025be>] cpu_idle+0x9b/0xb5 Mar 18 09:59:56 client01 kernel: [<c07adcaa>] start_secondary+0x204/0x242 Mar 18 09:59:56 client01 kernel: handlers: Mar 18 09:59:56 client01 kernel: [<c06adaca>] (usb_hcd_irq+0x0/0x8d) Mar 18 09:59:56 client01 kernel: [<fa16cb15>] (nv_kern_isr+0x0/0x59 [nvidia]) Mar 18 09:59:56 client01 kernel: Disabling IRQ #16 Mar 18 10:00:01 client01 kernel: NVRM: Xid (0001:00): 16, Head 00000001 Count 00000000 Mar 18 10:00:02 client01 kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000044a5 From /var/log/Xorg.0.log.old: [ 230.167] [mi] EQ overflowing. The server is probably stuck in an infinite loop. 
[ 230.167] Backtrace: [ 230.182] 0: /usr/bin/Xorg (xorg_backtrace+0x3c) [0x80e4e8c] [ 230.182] 1: /usr/bin/Xorg (mieqEnqueue+0x1b7) [0x80e4797] [ 230.182] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xd4) [0x80be044] [ 230.182] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x142000+0x2f62) [0x144f62] [ 230.182] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x142000+0x3209) [0x145209] [ 230.182] 5: /usr/bin/Xorg (0x8047000+0x697a0) [0x80b07a0] [ 230.182] 6: /usr/bin/Xorg (0x8047000+0x11f614) [0x8166614] [ 230.182] 7: (vdso) (__kernel_sigreturn+0x0) [0x7cc400] [ 230.182] 8: (vdso) (__kernel_vsyscall+0x2) [0x7cc416] [ 230.182] 9: /lib/libc.so.6 (__gettimeofday+0x16) [0x2e58c6] [ 230.182] 10: /usr/lib/xorg/modules/drivers/nvidia_drv.so (_nv001056X+0xcd) [0xfab93d] [ 230.456] (WW) Mar 18 09:59:57 NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x0000cf58) [ 237.456] (WW) Mar 18 10:00:04 NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x0000cf58) I would love to use Nouveau, but without power management it would waste 20-30W with my GTX280 so I'm stuck with the proprietary driver for now.
Created attachment 411479 [details] /var/log/messages
Created attachment 411481 [details] nvidia-bug-report.log
Try updating your ancient kernel Mar 18 09:56:20 client01 kernel: imklog 4.4.2, log source = /proc/kmsg started. Mar 18 09:56:20 client01 rsyslogd: [origin software="rsyslogd" swVersion="4.4.2" x-pid="1049" x-info="http://www.rsyslog.com"] (re)start Mar 18 09:56:20 client01 kernel: Initializing cgroup subsys cpuset Mar 18 09:56:20 client01 kernel: Initializing cgroup subsys cpu Mar 18 09:56:20 client01 kernel: Linux version 2.6.33-1.fc13.i686 (mockbuild.fedoraproject.org) (gcc version 4.4.3 20100211 (Red Hat 4.4.3-6) (GCC) ) #1 SMP Wed Feb 24 20:11:36 UTC 2010 I believe this issue was fixed in kernel-2.6.33.2-57.fc13
Okay, the last time I tried was with the prehistoric 2.6.33.1-24-fc13.i686, so following your advice I upgraded my test installation to 2.6.33.2-57.fc13.i686 and kmod-nvidia-195.36.24-1.fc13.3 from rpmfusion. The system has now been running for more than 8 hours and things look good. I even played around with gnome-shell and did some screencasting. No matter what I do, my Fedora is rock stable again. Funnily enough, I had been trying every workaround suggestion and Voodoo magic I could find on the net for several months, and the problem was fixed just between my last try and this bug report. Thank you very much, Leigh. Would you give me some last assistance with closing this bug report? I'm not familiar with this bug tracker or with which tags need to be set.
Here's a link that explains the tags https://bugzilla.redhat.com/page.cgi?id=fields.html#status
Thanks again Leigh, but even with the link you provided me I have no clue how to re-open this bug. Unfortunately the freezes hit me again. As I still have no idea how to reproduce the freezes on purpose I probably just was lucky when my system seemed to be stable. On the other hand there were some kernel updates lately and maybe the bug was reintroduced with one of those. I installed F13 RC2 i686 yesterday and the freezes are back with 2.6.33.3-85.fc13.i686 and kmod-nvidia-195.36.24-1.fc13.4 from rpmfusion (dracut rebuild of initrd and blacklisted nouveau). From /var/log/Xorg.0.log.old: [ 131.871] [mi] EQ overflowing. The server is probably stuck in an infinite loop. [ 131.871] Backtrace: [ 131.871] 0: /usr/bin/Xorg (xorg_backtrace+0x3c) [0x80e51dc] [ 131.871] 1: /usr/bin/Xorg (mieqEnqueue+0x1b7) [0x80e4ae7] [ 131.872] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xd2) [0x80be302] [ 131.872] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x280000+0x30a2) [0x2830a2] [ 131.872] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x280000+0x3349) [0x283349] [ 131.872] 5: /usr/bin/Xorg (0x8047000+0x69aa0) [0x80b0aa0] [ 131.872] 6: /usr/bin/Xorg (0x8047000+0x11f8f4) [0x81668f4] [ 131.872] 7: (vdso) (__kernel_sigreturn+0x0) [0xaea400] [ 131.872] 8: (vdso) (__kernel_vsyscall+0x2) [0xaea416] [ 131.872] 9: /lib/libc.so.6 (__gettimeofday+0x16) [0x4401e6] [ 131.872] 10: /usr/lib/xorg/modules/drivers/nvidia_drv.so (_nv001057X+0xcd) [0x397daed] [ 132.196] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0xdfff2fff, 0x0000fb08) [ 139.196] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0xdfff2fff, 0x0000fb08) [ 142.197] From /var/log/messages: May 9 22:36:04 client01 kernel: irq 16: nobody cared (try booting with the "irqpoll" option) May 9 22:36:04 client01 kernel: Pid: 0, comm: swapper Tainted: P 2.6.33.3-85.fc13.i686 #1 May 9 22:36:04 client01 kernel: Call Trace: May 9 22:36:04 client01 kernel: [<c047a0da>] __report_bad_irq+0x2e/0x6f May 9 22:36:04 client01 kernel: [<c047a210>] note_interrupt+0xf5/0x14d May 9 
22:36:04 client01 kernel: [<c047a7c3>] handle_fasteoi_irq+0x85/0xa4 May 9 22:36:04 client01 kernel: [<c0404cd3>] handle_irq+0x3b/0x48 May 9 22:36:04 client01 kernel: [<c0404558>] do_IRQ+0x41/0x9a May 9 22:36:04 client01 kernel: [<c0403830>] common_interrupt+0x30/0x38 May 9 22:36:04 client01 kernel: [<c04091f3>] ? mwait_idle+0x5c/0x67 May 9 22:36:04 client01 kernel: [<c04024b8>] cpu_idle+0x91/0xad May 9 22:36:04 client01 kernel: [<c075e2ea>] rest_init+0x62/0x64 May 9 22:36:04 client01 kernel: [<c09b78f1>] start_kernel+0x346/0x34b May 9 22:36:04 client01 kernel: [<c09b7099>] i386_start_kernel+0x99/0xa0 May 9 22:36:04 client01 kernel: handlers: May 9 22:36:04 client01 kernel: [<c0676877>] (usb_hcd_irq+0x0/0x6a) May 9 22:36:04 client01 kernel: [<f8c3a49c>] (nv_kern_isr+0x0/0x54 [nvidia]) May 9 22:36:04 client01 kernel: Disabling IRQ #16 May 9 22:36:09 client01 kernel: NVRM: Xid (0001:00): 16, Head 00000001 Count 00000000 May 9 22:36:10 client01 kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0000121d May 9 22:36:10 client01 kernel: NVRM: Xid (0001:00): 6, PE0003
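For anyone collecting these reports from several machines, the "nobody cared" message and the handlers listed under it can be pulled out of /var/log/messages or dmesg output with a few lines of Python. This is only a rough sketch: the function name find_unhandled_irqs and the regular expressions are my own and are written against the log excerpts in this report, so other kernel versions may format these lines differently.

```python
import re

# Matches the kernel's spurious-interrupt report, e.g.
#   irq 16: nobody cared (try booting with the "irqpoll" option)
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")

# Matches a registered handler line, e.g.
#   [<f8c3a49c>] (nv_kern_isr+0x0/0x54 [nvidia])
# The trailing "[module]" part is optional.
HANDLER = re.compile(
    r"\[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\)"
)


def find_unhandled_irqs(lines):
    """Return a list of (irq_number, [handler names]), one entry per report.

    Hypothetical helper: it assumes the "handlers:" block directly follows
    the call trace, as it does in the excerpts above.
    """
    reports = []
    in_handlers = False
    for line in lines:
        m = NOBODY_CARED.search(line)
        if m:
            reports.append((int(m.group(1)), []))
            in_handlers = False
            continue
        if not reports:
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        if in_handlers:
            h = HANDLER.search(line)
            if h:
                name = h.group(1)
                if h.group(2):
                    name += " [" + h.group(2) + "]"
                reports[-1][1].append(name)
            else:
                # First non-handler line ends the block.
                in_handlers = False
    return reports
```

Feeding it the lines of a log file (for example `find_unhandled_irqs(open(path))`, with `path` pointing at a saved copy of the log) gives one entry per report, which makes it easier to see whether the same IRQ and the same pair of handlers show up on every affected system.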
Created attachment 412676 [details] nvidia-bug-report.log
Created attachment 412677 [details] /var/log/messages
2.6.33.4-95.fc13.i686 with kmod-nvidia-2.6.33.4-95.fc13.i686 from RPM-Fusion and the bug hit me again. This time GDM restarted after approximately 30 seconds so there was no need for "SysRq-REISUB". From dmesg: irq 16: nobody cared (try booting with the "irqpoll" option) Pid: 1327, comm: Xorg Tainted: P 2.6.33.4-95.fc13.i686 #1 Call Trace: [<c047a0fa>] __report_bad_irq+0x2e/0x6f [<c047a230>] note_interrupt+0xf5/0x14d [<c047a7e3>] handle_fasteoi_irq+0x85/0xa4 [<c0404cd3>] handle_irq+0x3b/0x48 [<c0404558>] do_IRQ+0x41/0x9a [<c0403830>] common_interrupt+0x30/0x38 [<fb7d1921>] ? _nv007242rm+0xc/0x1d [nvidia] [<fb7972ee>] ? _nv014221rm+0x34/0x45 [nvidia] [<fb798daa>] ? _nv013862rm+0x4c/0x119 [nvidia] [<fb794018>] ? _nv013908rm+0x3a/0xe3 [nvidia] [<fb7941ba>] ? _nv014650rm+0xf9/0x185 [nvidia] [<fb79426d>] ? _nv014649rm+0x27/0x33 [nvidia] [<fb731b32>] ? _nv010009rm+0xf6/0x597 [nvidia] [<fb651c97>] ? _nv016283rm+0x21f/0x2db [nvidia] [<fb652711>] ? _nv015638rm+0x125/0x156 [nvidia] [<fb76df7b>] ? _nv004584rm+0x2b6/0x86d [nvidia] [<fb76cf40>] ? _nv004586rm+0x3c/0x47 [nvidia] [<fb76d297>] ? _nv004577rm+0xdb/0x45d [nvidia] [<fb76cf40>] ? _nv004586rm+0x3c/0x47 [nvidia] [<fb76d047>] ? _nv004581rm+0xfc/0x271 [nvidia] [<fb76cf40>] ? _nv004586rm+0x3c/0x47 [nvidia] [<fb7ed14b>] ? _nv004545rm+0xa7/0xdf [nvidia] [<fb7eec57>] ? rm_free_unused_clients+0x65/0xa6 [nvidia] [<fb8c733c>] ? nv_kern_ctl_close+0x7b/0xa7 [nvidia] [<fb8c7c24>] ? nv_kern_close+0x86/0x2f5 [nvidia] [<c04c95ef>] ? __fput+0x9f/0x181 [<c04c963a>] ? __fput+0xea/0x181 [<c04c96e4>] ? fput+0x13/0x15 [<c04c6d5f>] ? filp_close+0x51/0x5b [<c04389c0>] ? put_files_struct+0x5f/0xb3 [<c0438a48>] ? exit_files+0x34/0x38 [<c043a17b>] ? do_exit+0x200/0x615 [<c04433d8>] ? __sigqueue_free+0x2d/0x30 [<c0443766>] ? __dequeue_signal+0xd6/0xfe [<c044574b>] ? dequeue_signal+0xb1/0x120 [<c043a5fb>] ? do_group_exit+0x6b/0x94 [<c0445b28>] ? get_signal_to_deliver+0x36e/0x389 [<c077303f>] ? do_page_fault+0x0/0x2fa [<c04026d4>] ? 
do_signal+0x5a/0x6f4 [<c0420552>] ? force_sig_info_fault+0x43/0x4a [<c0424106>] ? kmap_atomic_prot+0xb3/0xd2 [<c0420420>] ? is_prefetch+0x21/0x110 [<c0420754>] ? __bad_area_nosemaphore+0xe1/0xf4 [<c077303f>] ? do_page_fault+0x0/0x2fa [<c0420774>] ? bad_area_nosemaphore+0xd/0x10 [<c07731d3>] ? do_page_fault+0x194/0x2fa [<c07717ab>] ? do_device_not_available+0x0/0x50 [<c077303f>] ? do_page_fault+0x0/0x2fa [<c0402d8d>] ? do_notify_resume+0x1f/0x79 [<c0770e9c>] ? work_notifysig+0x13/0x1b handlers: [<c06773b3>] (usb_hcd_irq+0x0/0x6a) [<fb8c749c>] (nv_kern_isr+0x0/0x54 [nvidia]) Disabling IRQ #16
2.6.33.4-95.fc13.i686 with "Nvidia Linux Display Driver 256.25 Beta" and the system still freezes randomly. It seems there will be no fix or workaround from Nvidia's side anytime soon, so all my hope now rests with the kernel developers.
I did some investigation earlier and found:
1. The issue started appearing with the last of the 2.6.31 kernels in F12, and is present in F13.
2. The problem seems to be Fedora-specific; Ubuntu, Debian unstable and Arch do not have this problem.
3. Comparing the kernel configurations of 2.6.32 Ubuntu (working) and 2.6.32 F13 (affected), the only options that drew my attention were CONFIG_X86_X2APIC and CONFIG_INTR_REMAP, which might come in handy if you have a system with a large number of interrupt lines (not a typical desktop system).
4. Somehow Gigabyte 965P-DS* main boards are present in most affected systems (judging by information gathered by asking Google); maybe a hardware/firmware bug?
Still, due to lack of time, I have not built a custom kernel with the options listed above disabled. That might be a reasonable next step.
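The comparison in point 3 can be automated instead of eyeballing two config files. The following is a minimal sketch, assuming both files use the usual kernel .config syntax ("CONFIG_FOO=y" for set options and "# CONFIG_FOO is not set" for disabled ones). The function names parse_kernel_config and diff_configs are made up for this example, and the sample values in the usage lines are invented for illustration, not taken from the real Fedora or Ubuntu configs.

```python
import re

SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_LINE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kernel_config(text):
    """Parse .config text into {option: value}; disabled options map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_LINE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_LINE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for every option that differs.

    An option that does not appear in a file at all is reported as None,
    which is kept distinct from an explicit 'is not set' ('n').
    """
    diff = {}
    for name in sorted(set(a) | set(b)):
        va, vb = a.get(name), b.get(name)
        if va != vb:
            diff[name] = (va, vb)
    return diff


# Invented sample data, only to show the output shape.
fedora = parse_kernel_config("CONFIG_X86_X2APIC=y\nCONFIG_INTR_REMAP=y\nCONFIG_SMP=y\n")
ubuntu = parse_kernel_config("# CONFIG_X86_X2APIC is not set\nCONFIG_SMP=y\n")
for name, (left, right) in diff_configs(fedora, ubuntu).items():
    print(name, left, right)
```

On real configs the output will be long, so filtering the result for names containing APIC, IRQ or INTR is probably the quickest way to narrow it down to interrupt-related options.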
Thank you Maciek! Following your hint and the guide at the Fedora wiki, I wanted to build a custom kernel myself, but I fear I am not nearly skilled enough to help track this bug down further: http://fedoraproject.org/wiki/Docs/CustomKernel

Unfortunately I got stuck at "Configure Kernel Options", because I found neither CONFIG_X86_X2APIC nor CONFIG_INTR_REMAP in any of the config files. The files I searched for those options were:
~/rpmbuild/BUILD/kernel-2.6.33/linux-2.6.33.i686/config-*
~/rpmbuild/BUILD/kernel-2.6.33/linux-2.6.33.i686/configs/kernel-2.6.33.4-i686.config
config-2.6.32-21-386 from linux-image-2.6.32-21-386_2.6.32-21.32_i386.deb (Ubuntu)
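When searching config files for a particular option, it helps to distinguish "explicitly disabled" from "not mentioned at all", since a plain text search for the option name treats both cases differently. Below is a small sketch of such a lookup; option_state and report are hypothetical helper names, and the sketch assumes the standard .config syntax. An option that is absent from a file may simply not be offered for that architecture or kernel version, though that is only one possible explanation.

```python
import re


def option_state(config_text, name):
    """Return the option's value ('y', 'm', ...), 'not set', or None if absent."""
    set_re = re.compile(r"^" + re.escape(name) + r"=(.*)$")
    unset_line = "# " + name + " is not set"
    for raw in config_text.splitlines():
        line = raw.strip()
        m = set_re.match(line)
        if m:
            return m.group(1)
        if line == unset_line:
            return "not set"
    return None


def report(paths, names):
    """Print the state of each option in each config file.

    `paths` is whatever list of config files you want to check; nothing
    here knows about Fedora's or Ubuntu's file layout.
    """
    for path in paths:
        with open(path) as f:
            text = f.read()
        for name in names:
            print(path, name, option_state(text, name))
```

Calling `report(paths, ["CONFIG_X86_X2APIC", "CONFIG_INTR_REMAP"])` with the files listed above would show at a glance whether the options are disabled or missing entirely in each of them.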
Have you tried adding intel_iommu=igfx_off or iommu=soft to the end of kernel line in /boot/grub/grub.conf ?
Sorry, I should have mentioned that. I tried intel_iommu=igfx_off after I found another bug which I thought might somehow be related to my problem: https://bugzilla.redhat.com/show_bug.cgi?id=538163

Before that I tried a lot of possible workarounds, mostly taken from forum posts and without really knowing what they are all about. Here is what I tried:

Added to kernel line in /boot/grub/grub.conf: intel_iommu=igfx_off iommu=soft noapic acpi=off acpi=off notsc pci=nommconf clocksource=hpet notsc clocksource=hpet notsc clocksource=acpi_pm

Added to device section in /etc/X11/xorg.conf: "AccelMethod" "EXA" "AccelMethod" "XAA"

I also set PowerMizer to high performance all the time as mentioned here: http://www.nvnews.net/vbulletin/showthread.php?t=143434
Option "Coolbits" "1"
Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3"

None of this changed anything: after some time the system freezes, and sometimes the screen even turns black. Then it either needs to be rebooted using SysRq+REISUB or (very rarely) GDM comes up with a login screen again. I think GDM mostly restarts when the freeze occurs before GDM has completely started, but I am not sure and, as I said, I cannot reproduce the issue on purpose. Also note that Xorg doesn't need to be running at all for the freezes to occur; I also faced them several times when trying to get an nvidia-bug-report.log from runlevel 3 (the nvidia kernel module was loaded, however).
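When cycling through this many boot parameters, it is easy to lose track of which ones the running kernel was actually booted with. The booted command line is available in /proc/cmdline, and a short sketch like the following can confirm which workarounds are active. The names parse_cmdline and active_workarounds are invented for this example, the sample command line is made up, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as 'noapic' map to None. If a parameter is repeated,
    the last occurrence wins in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def active_workarounds(cmdline, candidates):
    """Return the candidates (e.g. 'iommu=soft', 'noapic') present in cmdline."""
    params = parse_cmdline(cmdline)
    active = []
    for candidate in candidates:
        name, sep, value = candidate.partition("=")
        if name in params and (not sep or params[name] == value):
            active.append(candidate)
    return active


def read_booted_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel."""
    with open(path) as f:
        return f.read().strip()


# Made-up command line, only to illustrate the call.
example = "ro root=/dev/sda1 rhgb quiet intel_iommu=igfx_off noapic"
print(active_workarounds(example, ["intel_iommu=igfx_off", "iommu=soft", "noapic"]))
```

On a live system, passing `read_booted_cmdline()` instead of the example string shows whether an edit to grub.conf really took effect for the current boot.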
(In reply to comment #14)
> Added to device seciton in /etc/X11/xorg.conf:
>
> "AccelMethod" "EXA"
> "AccelMethod" "XAA"

The above options aren't useful for nvidia and should be deleted.

> I also set PowerMizer to high performance all the time as mentioned here:
> http://www.nvnews.net/vbulletin/showthread.php?t=143434
>
> Option "Coolbits" "1"
> Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222;
> PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3"

Remove the PowerMizer line, as it shouldn't be there. I believe the PowerMizer settings should be added in a conf file in /etc/modprobe.d/
This was just to sum up what I have tried so far, and probably most of those things were just pretty stupid. However, I always switch back to a clean out-of-the-box install whenever I have messed things up with a workaround (I have a dd image at hand for that).
(In reply to comment #16) > This was just to sum up what I tried so far and probably most of those thing > were just pretty stupid. However I always switch back to a clean Out-of-the-Box > install whenever I messed things up with any workaround (I have a dd image at > hand for that). Same problem for me. Tried the nvidia propertary driver [ 273.523] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0x00007c48, 0x00007e38) [ 274.852] [mi] EQ overflowing. The server is probably stuck in an infinite loop. [ 274.852] Backtrace: [ 274.874] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x49ecb8] [ 274.874] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49e664] [ 274.874] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xc4) [0x477e24] [ 274.874] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fbec02b9000+0x3dbf) [0x7fbec02bcdbf] [ 274.874] 4: /usr/bin/Xorg (0x400000+0x6aae7) [0x46aae7] [ 274.874] 5: /usr/bin/Xorg (0x400000+0x1180f3) [0x5180f3] [ 274.874] 6: /lib64/libc.so.6 (0x3fba000000+0x32a40) [0x3fba032a40] [ 274.874] 7: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7fbec0b21000+0x78140) [0x7fbec0b99140] [ 274.874] 8: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (_nv001056X+0x289) [0x7fbec0b99d79] [ 274.874] 9: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7fbec0b21000+0xd5206) [0x7fbec0bf6206] [ 274.874] 10: /usr/lib64/xorg/modules/drivers/nvidia_drv.so (0x7fbec0b21000+0x383c7e) [0x7fbec0ea4c7e] [ 274.874] 11: /usr/bin/Xorg (0x400000+0xce777) [0x4ce777] [ 274.874] 12: /usr/bin/Xorg (0x400000+0x2c32c) [0x42c32c] [ 274.874] 13: /usr/bin/Xorg (0x400000+0x219ca) [0x4219ca] [ 274.874] 14: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3fba01ec5d] [ 274.874] 15: /usr/bin/Xorg (0x400000+0x21579) [0x421579] [ 280.523] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0x00007c48, 0x00007e38) and also nouveau [ 4315.203] [mi] EQ overflowing. The server is probably stuck in an infinite loop. 
[ 4315.204] Backtrace: [ 4315.251] 0: /usr/bin/X (xorg_backtrace+0x28) [0x49ecb8] [ 4315.251] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x49e664] [ 4315.251] 2: /usr/bin/X (xf86PostMotionEventP+0xc4) [0x477e24] [ 4315.251] 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f40250d4000+0x3dbf) [0x7f40250d7dbf] [ 4315.251] 4: /usr/bin/X (0x400000+0x6aae7) [0x46aae7] [ 4315.251] 5: /usr/bin/X (0x400000+0x1180f3) [0x5180f3] [ 4315.251] 6: /lib64/libc.so.6 (0x3fba000000+0x32a40) [0x3fba032a40] [ 4315.251] 7: /lib64/libc.so.6 (ioctl+0x7) [0x3fba0d95c7] [ 4315.251] 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x3fcd603388] [ 4315.251] 9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x3fcd60360b] [ 4315.251] 10: /usr/lib64/libdrm_nouveau.so.1 (0x7f4026a81000+0x2dfd) [0x7f4026a83dfd] [ 4315.251] 11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfe) [0x7f4026a83fee] [ 4315.251] 12: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f4026c86000+0x6478) [0x7f4026c8c478] [ 4315.251] 13: /usr/lib64/xorg/modules/libexa.so (0x7f4025be5000+0x7d98) [0x7f4025becd98] [ 4315.251] 14: /usr/bin/X (0x400000+0xd4c7c) [0x4d4c7c] [ 4315.251] 15: /usr/bin/X (0x400000+0x29fb9) [0x429fb9] [ 4315.251] 16: /usr/bin/X (0x400000+0x2c32c) [0x42c32c] [ 4315.251] 17: /usr/bin/X (0x400000+0x219ca) [0x4219ca] [ 4315.251] 18: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3fba01ec5d] [ 4315.251] 19: /usr/bin/X (0x400000+0x21579) [0x421579] hope that someone solves the problem. For me are F12 and F13 unusable. P965 NEO mobo Intel Core2 6400 GeForce 8800 GT
Got the same with my GeForce 8600 GT. I can reproduce by just starting a game like Urban Terror: Jun 4 12:51:05 darkmatter kernel: irq 16: nobody cared (try booting with the "irqpoll" option) Jun 4 12:51:05 darkmatter kernel: Pid: 0, comm: swapper Tainted: P 2.6.33.5-112.fc13.i686 #1 Jun 4 12:51:05 darkmatter kernel: Call Trace: Jun 4 12:51:05 darkmatter kernel: [<c0479efa>] __report_bad_irq+0x2e/0x6f Jun 4 12:51:05 darkmatter kernel: [<c047a030>] note_interrupt+0xf5/0x14d Jun 4 12:51:05 darkmatter kernel: [<c047a5e3>] handle_fasteoi_irq+0x85/0xa4 Jun 4 12:51:05 darkmatter kernel: [<c0404cd3>] handle_irq+0x3b/0x48 Jun 4 12:51:05 darkmatter kernel: [<c0404558>] do_IRQ+0x41/0x9a Jun 4 12:51:05 darkmatter kernel: [<c0403830>] common_interrupt+0x30/0x38 Jun 4 12:51:05 darkmatter kernel: [<c04091f3>] ? mwait_idle+0x5c/0x67 Jun 4 12:51:05 darkmatter kernel: [<c04024b8>] cpu_idle+0x91/0xad Jun 4 12:51:05 darkmatter kernel: [<c076c2a9>] start_secondary+0x1f5/0x233 Jun 4 12:51:05 darkmatter kernel: handlers: Jun 4 12:51:05 darkmatter kernel: [<c06771a3>] (usb_hcd_irq+0x0/0x6a) Jun 4 12:51:05 darkmatter kernel: [<fbae149c>] (nv_kern_isr+0x0/0x54 [nvidia]) Jun 4 12:51:05 darkmatter kernel: Disabling IRQ #16
Same here. GeForce 8600 GTS in a HP dc7700 convertable minitower. NVRM: Xid (0001:00): 6, PE0001 NVRM: Xid (0001:00): 6, PE0001 NVRM: Xid (0001:00): 8, Channel 0000007e NVRM: os_pci_init_handle: invalid context! NVRM: os_pci_init_handle: invalid context! NVRM: Xid (0001:00): 8, Channel 0000007e NVRM: os_pci_init_handle: invalid context! NVRM: os_pci_init_handle: invalid context! NVRM: Xid (0001:00): 8, Channel 0000007e [ 10162.293] [mi] EQ overflowing. The server is probably stuck in an infinite loop. [ 10162.294] Backtrace: [ 10162.304] 0: /usr/bin/Xorg (xorg_backtrace+0x3c) [0x80e51dc] [ 10162.304] 1: /usr/bin/Xorg (mieqEnqueue+0x1b7) [0x80e4ae7] [ 10162.304] 2: /usr/bin/Xorg (xf86PostMotionEventP+0xd2) [0x80be302] [ 10162.304] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x1e7000+0x30a2) [0x1ea0a2] [ 10162.304] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x1e7000+0x3349) [0x1ea349] [ 10162.305] 5: /usr/bin/Xorg (0x8047000+0x69aa0) [0x80b0aa0] [ 10162.305] 6: /usr/bin/Xorg (0x8047000+0x11f8f4) [0x81668f4] [ 10162.305] 7: (vdso) (__kernel_sigreturn+0x0) [0xd50400] [ 10162.305] 8: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0xd51000+0x382e5c) [0x10d3e5c]
FYI, my DC7700 machine has:
Intel Q965 Express Chipset
Intel Core 2 Duo Dual Core Processor
2 full-height PCI, 1 full-height PCI Express x1, 1 full-height PCI Express x16
I meet this bug on one of PC: AsRock P43DE motherboard Core2quade 9400 Nvidia 9800GTX /var/log/messages Jun 10 04:41:31 aleo acpid: client connected from 1749[0:0] Jun 10 04:41:31 aleo acpid: 1 client rule loaded Jun 10 04:41:48 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000001 Count 00000000 Jun 10 04:41:53 aleo kernel: NVRM: Xid (0005:00): 8, Channel 0000007f Jun 10 04:41:53 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:41:53 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:03 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:03 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:04 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000000 Count 00000003 Jun 10 04:42:04 aleo kernel: Clocksource tsc unstable (delta = 14058518999 ns) Jun 10 04:42:04 aleo kernel: Switching to clocksource acpi_pm Jun 10 04:42:11 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000001 Count 00000001 Jun 10 04:42:12 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000000 Count 00000004 Jun 10 04:42:17 aleo kernel: NVRM: Xid (0005:00): 8, Channel 0000007f Jun 10 04:42:17 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:17 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:27 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:27 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:34 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000001 Count 00000002 Jun 10 04:42:35 aleo kernel: NVRM: Xid (0005:00): 16, Head 00000000 Count 00000005 Jun 10 04:42:51 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:51 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:51 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:42:51 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:43:14 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:43:14 aleo kernel: NVRM: os_pci_init_handle: invalid context! 
Jun 10 04:43:14 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:43:14 aleo kernel: NVRM: os_pci_init_handle: invalid context! Jun 10 04:43:27 aleo kernel: NVRM: os_pci_init_handle: invalid context! /var/log/Xorg.0.log [ 33.397] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x00003f08) [ 40.397] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x00003f08) [ 56.134] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x000052fc) [ 63.134] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x000052fc) [ 80.526] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x000094bc) [ 87.526] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x000094bc) [ 115.549] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0xdfff2fff, 0x00009f48) [ 122.549] (WW) NVIDIA(0): WAIT (1, 7, 0x8000, 0xdfff2fff, 0x00009f48)
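Since several reports in this bug show different mixes of NVRM Xid messages, a quick tally per Xid number makes the logs easier to compare across machines. The sketch below is an assumption-laden helper: count_xids is my own name, and the regular expression is written against the "NVRM: Xid (0005:00): 16, ..." lines quoted in this report. It only counts the codes and makes no attempt to interpret them.

```python
import re
from collections import Counter

# Matches e.g. "NVRM: Xid (0005:00): 16, Head 00000001 Count 00000000"
# and captures the Xid number after the parenthesised device address.
XID = re.compile(r"NVRM: Xid \([0-9a-fA-F:.]+\): (\d+)")


def count_xids(lines):
    """Return a Counter mapping Xid number -> number of occurrences."""
    counts = Counter()
    for line in lines:
        m = XID.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts
```

Running `count_xids(open(path))` over a saved copy of /var/log/messages (with `path` supplied by you) gives one number per Xid code, which can be pasted into a comment instead of hundreds of repeated log lines.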
Happens to me as well. Interestingly, the bug only appeared after I replaced my mainboard because the old one stopped working. Both boards were used with Fedora 13 and an NVIDIA GeForce 8800 GT. I couldn't capture a kernel log so far because of the crash: the system freezes after a few minutes, or within a few seconds whenever I start an OpenGL program like glxgears or nvidia-settings. Before, I had an MSI P35 Neo2-FR/FIR (Intel P35 chipset); I now have a Gigabyte GA-EP45T-UD3LR (Intel P45). Maybe it's related to the chipset driver? (Just a guess.)
If this problem is associated with the chipset: my chipset is an Intel P43.
In my system the chipset is an Intel X48. It's a Sun Ultra 24. I attach the output of lspci -v:
00:00.0 Host bridge: Intel Corporation 82X38/X48 Express DRAM Controller (rev 01) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, fast devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel driver in use: x38_edac Kernel modules: x38_edac 00:01.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Primary PCI Express Bridge (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 Capabilities: [88] Subsystem: Sun Microsystems Computer Corp. Device 5351 Capabilities: [80] Power Management version 3 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [a0] Express Root Port (Slot+), MSI 00 Kernel driver in use: pcieport 00:03.0 Communication controller: Intel Corporation 82X38/X48 Express MEI Controller (rev 01) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, fast devsel, latency 0, IRQ 10 Memory at f9fffc00 (64-bit, non-prefetchable) [size=16] Capabilities: [50] Power Management version 3 Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+ 00:06.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Secondary PCI Express Bridge (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 I/O behind bridge: 0000e000-0000efff Memory behind bridge: fa000000-feafffff Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff Capabilities: [88] Subsystem: Sun Microsystems Computer Corp. Device 5351 Capabilities: [80] Power Management version 3 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [a0] Express Root Port (Slot+), MSI 00 Kernel driver in use: pcieport 00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02) Subsystem: Sun Microsystems Computer Corp. 
Device 5351 Flags: bus master, fast devsel, latency 0, IRQ 27 Memory at f9fc0000 (32-bit, non-prefetchable) [size=128K] Memory at f9ffe000 (32-bit, non-prefetchable) [size=4K] I/O ports at dc00 [size=32] Capabilities: [c8] Power Management version 2 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [e0] Vendor Specific Information: Len=06 <?> Kernel driver in use: e1000e Kernel modules: e1000e 00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02) (prog-if 00 [UHCI]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 0, IRQ 10 I/O ports at d880 [size=32] Capabilities: [50] Vendor Specific Information: Len=06 <?> Kernel driver in use: uhci_hcd 00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02) (prog-if 00 [UHCI]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 0, IRQ 15 I/O ports at d800 [size=32] Capabilities: [50] Vendor Specific Information: Len=06 <?> Kernel driver in use: uhci_hcd 00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02) (prog-if 20 [EHCI]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 0, IRQ 14 Memory at f9fff800 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] Vendor Specific Information: Len=06 <?> Kernel driver in use: ehci_hcd 00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02) Subsystem: Sun Microsystems Computer Corp. 
Device 5351 Flags: bus master, fast devsel, latency 0, IRQ 3 Memory at f9ff4000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00 Kernel driver in use: HDA Intel Kernel modules: snd-hda-intel 00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02) (prog-if 00 [UHCI]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 0, IRQ 7 I/O ports at d480 [size=32] Capabilities: [50] Vendor Specific Information: Len=06 <?> Kernel driver in use: uhci_hcd 00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02) (prog-if 00 [UHCI]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 0, IRQ 4 I/O ports at d400 [size=32] Capabilities: [50] Vendor Specific Information: Len=06 <?> Kernel driver in use: uhci_hcd 00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02) (prog-if 00 [UHCI]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 0, IRQ 14 I/O ports at d080 [size=32] Capabilities: [50] Vendor Specific Information: Len=06 <?> Kernel driver in use: uhci_hcd 00:1d.3 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02) (prog-if 00 [UHCI]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 0, IRQ 10 I/O ports at d000 [size=32] Capabilities: [50] Vendor Specific Information: Len=06 <?> Kernel driver in use: uhci_hcd 00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02) (prog-if 20 [EHCI]) Subsystem: Sun Microsystems Computer Corp. 
Device 5351 Flags: bus master, medium devsel, latency 0, IRQ 7 Memory at f9fff400 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] Vendor Specific Information: Len=06 <?> Kernel driver in use: ehci_hcd 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=03, subordinate=03, sec-latency=32 Memory behind bridge: feb00000-febfffff Capabilities: [50] Subsystem: Sun Microsystems Computer Corp. Device 5351 00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel modules: iTCO_wdt 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 26 I/O ports at cc00 [size=8] I/O ports at c880 [size=4] I/O ports at c800 [size=8] I/O ports at c480 [size=4] I/O ports at c400 [size=32] Memory at f9ffd800 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit- Capabilities: [70] Power Management version 3 Capabilities: [a8] SATA HBA v1.0 Capabilities: [b0] Vendor Specific Information: Len=06 <?> Kernel driver in use: ahci 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02) Subsystem: Sun Microsystems Computer Corp. 
Device 5351 Flags: medium devsel, IRQ 14 Memory at f9fff000 (64-bit, non-prefetchable) [size=256] I/O ports at 0400 [size=32] Kernel driver in use: i801_smbus Kernel modules: i2c-i801 00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 02) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: fast devsel, IRQ 14 Memory at fed08000 (64-bit, non-prefetchable) [size=4K] Capabilities: [50] Power Management version 3 02:00.0 VGA compatible controller: nVidia Corporation Quadro FX 370 (rev a1) (prog-if 00 [VGA controller]) Subsystem: nVidia Corporation Device 0491 Flags: bus master, fast devsel, latency 0, IRQ 10 Memory at fd000000 (32-bit, non-prefetchable) [size=16M] Memory at d0000000 (64-bit, prefetchable) [size=256M] Memory at fa000000 (64-bit, non-prefetchable) [size=32M] I/O ports at ec00 [size=128] [virtual] Expansion ROM at feae0000 [disabled] [size=128K] Capabilities: [60] Power Management version 2 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [78] Express Endpoint, MSI 00 Kernel driver in use: nvidia Kernel modules: nvidia, nouveau, nvidiafb 03:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) (prog-if 10 [OHCI]) Subsystem: Sun Microsystems Computer Corp. Device 5351 Flags: bus master, medium devsel, latency 64, IRQ 4 Memory at febff800 (32-bit, non-prefetchable) [size=2K] Memory at febf8000 (32-bit, non-prefetchable) [size=16K] Capabilities: [44] Power Management version 2 Kernel driver in use: firewire_ohci Kernel modules: firewire-ohci
I switched from the Gigabyte 965P-DS4 to an MSI P43T-C51 and the behavior is still the same: as soon as the proprietary Nvidia driver comes into play, it is just a matter of time until the system freezes. I am sorry for excluding other chipsets in the first place; I thought only P965 boards were affected. Note that I could not find the "irq 16: nobody cared" line in /var/log/messages when the freeze occurred on the MSI board. Please have a look at "logs-GTX280-MSI-P43T-C51.tar.gz" for reference.
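For anyone who wants to check their own logs for the same symptom, here is a minimal sketch that scans log lines for the "irq N: nobody cared" and "Disabling IRQ #N" messages quoted in the original report. The function name `find_irq_complaints` and the sample lines are only illustrative; the patterns are based solely on the messages shown in this bug and may miss other wordings.

```python
import re

# Patterns taken from the kernel messages quoted in the original report.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def find_irq_complaints(lines):
    """Return a dict mapping IRQ number -> list of matching log lines."""
    hits = {}
    for line in lines:
        for pattern in (NOBODY_CARED, DISABLING):
            match = pattern.search(line)
            if match:
                hits.setdefault(int(match.group(1)), []).append(line.rstrip("\n"))
    return hits


# Illustrative input: two lines copied from the report plus one unrelated line.
sample_lines = [
    'Mar 18 09:59:56 client01 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)',
    "Mar 18 09:59:56 client01 kernel: Disabling IRQ #16",
    "Mar 18 10:00:05 client01 kernel: some unrelated message",
]

for irq, found in sorted(find_irq_complaints(sample_lines).items()):
    print(f"IRQ {irq}: {len(found)} matching line(s)")
```

To run it against a real log, pass an open file object (for example `open("/var/log/messages", errors="replace")`) instead of `sample_lines`. An empty result would match what I saw on the MSI board.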
Created attachment 425327 [details] logs of freezing MSI P43T-C51 (BIOS V2.5) with Nvidia GTX280 MSI P43T-C51 (BIOS V2.5) with Nvidia GTX280 kernel 2.6.33.5-112-fc13.i686 kmod-nvidia-195.36.24.2.fc13.1 (rpmfusion) -> random freezes nvidia-bug-report.log /var/log/messages /var/log/dmesg /var/log/Xorg.0.log
I'm a bit afraid we won't get any help from either the Fedora or the NVidia people on this issue, so I suggest we pool our resources and try to figure this out ourselves. After all, we do have the source of the kernel itself. We know this is a Fedora-only issue, only in F12 and F13, and apparently it hits only Intel-based chipsets. So I installed the F13 kernel srpm and took a look.
[root@rikkilap SOURCES]# ls -l | grep -i intel
-rw-r--r--. 1 root root 553 Jan 11 21:10 drm-intel-big-hammer.patch
-rw-r--r--. 1 root root 1830 Apr 19 21:34 drm-intel-gen5-dither.patch
-rw-r--r--. 1 root root 1015 Apr 19 21:31 drm-intel-make-lvds-work.patch
-rw-r--r--. 1 root root 475052 May 6 17:25 drm-intel-next.patch
-rw-r--r--. 1 root root 4188 Apr 29 18:38 drm-intel-sdvo-fix-2.patch
-rw-r--r--. 1 root root 3918 Apr 26 15:20 drm-intel-sdvo-fix.patch
-rw-r--r--. 1 root root 1228 Mar 3 2009 hda_intel-prealloc-4mb-dmabuffer.patch
-rw-r--r--. 1 root root 2923 May 26 16:03 linux-2.6-intel-iommu-igfx.patch
-rw-r--r--. 1 root root 909 Feb 6 22:41 neuter_intel_microcode_load.patch
[root@rikkilap SPECS]# grep -ic ^Patch kernel.spec
141
Since this is a Fedora-only issue, one of those 141 patches has to be to blame. I'm going to start rebuilding kernels to isolate which patch it is. Does anyone have any ideas on what could be a good place to start? Perhaps these?
[root@rikkilap SOURCES]# ls -l *i915*
-rw-r--r--. 1 root root 1625 May 27 01:37 drm-i915-fix-non-ironlake-965-class-crashes.patch
-rw-r--r--. 1 root root 10430 May 27 01:37 drm-i915-use-pipe_control-instruction-on-ironlake-and-sandy-bridge.patch
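The patch hunt described above could be organized as a bisection rather than testing patches one at a time. The following is only a sketch under strong assumptions: that exactly one patch is responsible, that the patches can be applied independently of each other (often not true for a real kernel.spec), and that a hypothetical callback `freezes_with(subset)` reports whether a kernel built with just that subset of suspect patches freezes. The names `list_patches` and `bisect_patches` are mine, and the spec text in the demo is made up.

```python
import re

# Matches spec declarations such as "Patch1515: foo.patch" (case-insensitive).
# Unlike "grep -ic ^Patch", this requires the colon, so counts may differ.
PATCH_LINE = re.compile(r"^Patch(\d*):\s*(\S+)", re.IGNORECASE)


def list_patches(spec_text):
    """Return the patch file names declared in an RPM spec, in order."""
    patches = []
    for line in spec_text.splitlines():
        match = PATCH_LINE.match(line)
        if match:
            patches.append(match.group(2))
    return patches


def bisect_patches(patches, freezes_with):
    """Narrow a list of suspects down to one patch.

    freezes_with(subset) is a hypothetical callback: it should build and
    test a kernel with only `subset` of the suspects applied and return
    True if the freeze still occurs.
    """
    candidates = list(patches)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if freezes_with(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0] if candidates else None


# Made-up spec fragment for illustration only.
demo_spec = "Name: kernel\nPatch1: a.patch\nPatch2: b.patch\nPatch3: c.patch\n"
demo_patches = list_patches(demo_spec)
print(demo_patches)
print(bisect_patches(demo_patches, lambda subset: "b.patch" in subset))
```

With 141 patches and a single culprit this needs about 8 rebuilds instead of up to 141, which matters when one build takes hours.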
I'm pretty sure I'm suffering from this bug too.
- The problems started for me after the upgrade to Fedora 12.
- Fedora 11 was rock stable.
- Fedora 13 still has the issue.
- At first I thought it was nvidia driver related. Then I removed all nvidia stuff and I'm using nouveau now, but the bug is still there.
- Sometimes my system runs for hours without a crash.
- Sometimes it hangs already during boot.
- The more hardware is connected, the sooner the crash occurs (e.g. external disk, OpenMoko phone, both via USB).
- I have the impression that I have less frequent freezes after using MSI interrupts for my audio (and nvidia, when I was still using this driver).
- See bug #588036, which is actually about the same issue; there I refer to some other bugs and reports that look related.
- I'm using an ASUS P5B-MX motherboard.
- Sometimes the system freezes without leaving any logging about errors.
- Sometimes there are errors logged.
- Let me know if logs during the crashes are useful; I can attach them to this bug.
This is my hardware (lspci -v): 00:00.0 Host bridge: Intel Corporation 82946GZ/PL/GL Memory Controller Hub (rev 02) Subsystem: ASUSTeK Computer Inc. Device 823b Flags: bus master, fast devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=09 <?> 00:01.0 PCI bridge: Intel Corporation 82946GZ/PL/GL PCI Express Root Port (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000e000-0000efff Memory behind bridge: fa000000-feafffff Prefetchable memory behind bridge: 00000000e0000000-00000000efffffff Capabilities: [88] Subsystem: ASUSTeK Computer Inc.
Device 823b Capabilities: [80] Power Management version 3 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [a0] Express Root Port (Slot+), MSI 00 Capabilities: [100] Virtual Channel Capabilities: [140] Root Complex Link Kernel driver in use: pcieport Kernel modules: shpchp 00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 01) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: bus master, fast devsel, latency 0, IRQ 27 Memory at f9ffc000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Virtual Channel Capabilities: [130] Root Complex Link Kernel driver in use: HDA Intel Kernel modules: snd-hda-intel 00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 I/O behind bridge: 00001000-00001fff Memory behind bridge: 80000000-801fffff Prefetchable memory behind bridge: 0000000080200000-00000000803fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: ASUSTeK Computer Inc. 
Device 8179 Capabilities: [a0] Power Management version 2 Capabilities: [100] Virtual Channel Capabilities: [180] Root Complex Link Kernel driver in use: pcieport Kernel modules: shpchp 00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 01) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 I/O behind bridge: 00002000-00002fff Memory behind bridge: feb00000-febfffff Prefetchable memory behind bridge: 0000000080400000-00000000805fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8179 Capabilities: [a0] Power Management version 2 Capabilities: [100] Virtual Channel Capabilities: [180] Root Complex Link Kernel driver in use: pcieport Kernel modules: shpchp 00:1d.0 USB Controller: Intel Corporation N10/ICH7 Family USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI]) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: bus master, medium devsel, latency 0, IRQ 23 I/O ports at d480 [size=32] Kernel driver in use: uhci_hcd 00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI]) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: bus master, medium devsel, latency 0, IRQ 19 I/O ports at d800 [size=32] Kernel driver in use: uhci_hcd 00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI]) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: bus master, medium devsel, latency 0, IRQ 18 I/O ports at d880 [size=32] Kernel driver in use: uhci_hcd 00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI]) Subsystem: ASUSTeK Computer Inc. 
P5KPL-VM Motherboard Flags: bus master, medium devsel, latency 0, IRQ 16 I/O ports at dc00 [size=32] Kernel driver in use: uhci_hcd 00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI]) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: bus master, medium devsel, latency 0, IRQ 23 Memory at f9ffbc00 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Kernel driver in use: ehci_hcd 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=04, subordinate=04, sec-latency=32 Capabilities: [50] Subsystem: ASUSTeK Computer Inc. Device 8179 00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: bus master, medium devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel modules: leds-ss4200, iTCO_wdt, intel-rng 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP]) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: bus master, medium devsel, latency 0, IRQ 18 I/O ports at 01f0 [size=8] I/O ports at 03f4 [size=1] I/O ports at 0170 [size=8] I/O ports at 0374 [size=1] I/O ports at ffa0 [size=16] Kernel driver in use: ata_piix Kernel modules: ata_generic, pata_acpi 00:1f.2 IDE interface: Intel Corporation N10/ICH7 Family SATA IDE Controller (rev 01) (prog-if 8f [Master SecP SecO PriP PriO]) Subsystem: ASUSTeK Computer Inc. 
P5KPL-VM Motherboard Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19 I/O ports at d400 [size=8] I/O ports at d080 [size=4] I/O ports at d000 [size=8] I/O ports at cc00 [size=4] I/O ports at c880 [size=16] Capabilities: [70] Power Management version 2 Kernel driver in use: ata_piix Kernel modules: ata_generic, pata_acpi 00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: medium devsel, IRQ 19 I/O ports at 0400 [size=32] Kernel driver in use: i801_smbus Kernel modules: i2c-i801 01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GTS] (rev a1) (prog-if 00 [VGA controller]) Subsystem: ASUSTeK Computer Inc. Device 8241 Flags: bus master, fast devsel, latency 0, IRQ 16 Memory at fd000000 (32-bit, non-prefetchable) [size=16M] Memory at e0000000 (64-bit, prefetchable) [size=256M] Memory at fa000000 (64-bit, non-prefetchable) [size=32M] I/O ports at ec00 [size=128] Expansion ROM at feae0000 [disabled] [size=128K] Capabilities: [60] Power Management version 2 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [78] Express Endpoint, MSI 00 Capabilities: [100] Virtual Channel Capabilities: [128] Power Budgeting <?> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?> Kernel driver in use: nouveau Kernel modules: nouveau, nvidiafb 02:00.0 Ethernet controller: Atheros Communications L1 Gigabit Ethernet (rev b0) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Flags: bus master, fast devsel, latency 0, IRQ 28 Memory at febc0000 (64-bit, non-prefetchable) [size=256K] Expansion ROM at feba0000 [disabled] [size=128K] Capabilities: [40] Power Management version 2 Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [58] Express Endpoint, MSI 00 Capabilities: [6c] Vital Product Data Capabilities: [100] Advanced Error Reporting Kernel driver in use: atl1 Kernel modules: atl1
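In the listing above, the GeForce 8600 GTS and USB UHCI Controller #4 both sit on IRQ 16, and the GPU has MSI disabled. A minimal sketch for spotting such shared interrupt lines automatically follows. It assumes the normal multi-line `lspci -v` layout (device header at column 0, indented property lines, no PCI domain prefix); the function names `parse_lspci` and `shared_irqs` are made up for this example, and the sample text is a shortened copy of two entries from the listing.

```python
import re
from collections import defaultdict

# Device header such as "01:00.0 VGA compatible controller: ..."
DEVICE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (.+)$")
IRQ = re.compile(r"\bIRQ (\d+)")
MSI = re.compile(r"MSI: Enable([+-])")


def parse_lspci(text):
    """Return one dict per device with its slot, name, IRQ and MSI state."""
    devices = []
    current = None
    for line in text.splitlines():
        header = DEVICE.match(line)
        if header:
            current = {"slot": header.group(1), "name": header.group(2),
                       "irq": None, "msi": None}
            devices.append(current)
            continue
        if current is None:
            continue
        irq = IRQ.search(line)
        if irq and "Flags:" in line:
            current["irq"] = int(irq.group(1))
        msi = MSI.search(line)
        if msi:
            current["msi"] = msi.group(1) == "+"
    return devices


def shared_irqs(devices):
    """Return {irq: [slots]} for every IRQ used by more than one device."""
    by_irq = defaultdict(list)
    for dev in devices:
        if dev["irq"] is not None:
            by_irq[dev["irq"]].append(dev["slot"])
    return {irq: slots for irq, slots in by_irq.items() if len(slots) > 1}


sample = (
    "00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01)\n"
    "\tFlags: bus master, medium devsel, latency 0, IRQ 16\n"
    "\tKernel driver in use: uhci_hcd\n"
    "\n"
    "01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GTS] (rev a1)\n"
    "\tFlags: bus master, fast devsel, latency 0, IRQ 16\n"
    "\tCapabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+\n"
)

print(shared_irqs(parse_lspci(sample)))
```

A shared line is not proof of anything by itself, but it is the same pairing (USB host controller plus GPU) that appears in the handler list of the "irq 16: nobody cared" trace in the original report.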
Thanks for all your efforts, Richard. I recently tried 2.6.33.5-124.fc13.i686 together with kmod-nvidia 195.36.24 from rpmfusion and nothing changed. Then I installed Ubuntu 10.04 and used it for several days. Since the freezes occur randomly I cannot be absolutely sure, but as far as I could see Ubuntu is not affected, so I would like to confirm this again. Unfortunately all I can contribute is trying the most recent Fedora kernels and reporting back, although I fear that is not very helpful at all.
(In reply to comment #30)
> Thanks for all your efforts Richard. I tried 2.6.33.5-124.fc13.i686 together
> with kmod-nvidia 195.36.24 from rpmfusion lately and nothing changed. Then I
> installed Ubuntu 10.04 and used it for several days. Now as the freezes occur
> randomly I am not absolutely sure, but as far as I could see Ubuntu is not
> affected, so I would like to confirm this again..
>
> Unfortunately all I can contribute is trying most recent Fedora kernels and
> reporting back, although I fear that is not very helpful at all.
I'm using the latest kernel-2.6.34.1 from koji and the latest official nvidia 256.35 drivers, and the system has been stable for the last two days: http://koji.fedoraproject.org/koji/buildinfo?buildID=182791
It's great to hear some good news. It takes me just under 2 hours just to compile my PAE kernel. NVidia 256.35? rpmfusion only has 195.36. Does NVidia ship something much newer than rpmfusion does?
(In reply to comment #32)
> rpmfusion only has 195.36. Does NVidia ship something much newer than
> rpmfusion does?
I found it at http://atrpms.net/dist/f13/ but have had no time to test it.
This morning I removed (rpm -e) all rpmfusion nvidia software from my machine and downloaded the 256.35 install kit directly from NVidia's website, since rpmfusion still has no 256.35 RPMs.
[root@morticia ~]# uptime
23:48:59 up 7:11, 8 users, load average: 0.03, 0.11, 0.16
Not a single crash today :) I know I might be a bit premature, but it's been ages since I made it past 10 minutes of uptime :)
Spoke too soon. With the 256.35 drivers, the situation is much better. Did get a crash: Jul 16 13:22:36 morticia kernel: NVRM: os_pci_init_handle: invalid context! Jul 16 13:22:36 morticia kernel: NVRM: os_pci_init_handle: invalid context! Jul 16 13:22:49 morticia kernel: NVRM: Xid (0001:00): 8, Channel 0000007f Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context! Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context! Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context! Jul 16 13:22:49 morticia kernel: NVRM: os_pci_init_handle: invalid context! Jul 16 13:22:50 morticia kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00a8ae8a Jul 16 13:22:50 morticia kernel: NVRM: Xid (0001:00): 16, Head 00000001 Count 00a8705f Jul 16 13:25:25 morticia kernel: INFO: task Xorg:2390 blocked for more than 120 seconds. Jul 16 13:25:25 morticia kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 16 13:25:25 morticia kernel: Xorg D 0000928d 0 2390 2387 0x00400084 Jul 16 13:25:25 morticia kernel: f4e83d64 00003086 322447cc 0000928d 00000000 c0a48394 c0a4cf40 c0a4cf40 Jul 16 13:25:25 morticia kernel: c0a4cf40 f35dc25c 00000000 f2627c3c f2f5f6c8 00000000 f25d7180 0000928d Jul 16 13:25:25 morticia kernel: f35dbfc0 f2f5f6d4 00000000 00000001 f2f5f6d0 f34ccc64 7fffffff f35dbfc0 Jul 16 13:25:25 morticia kernel: Call Trace: Jul 16 13:25:25 morticia kernel: [<c0781d05>] schedule_timeout+0x22/0xad Jul 16 13:25:25 morticia kernel: [<c07612bb>] ? sk_wake_async+0x19/0x32 Jul 16 13:25:25 morticia kernel: [<c0781bdb>] wait_for_common+0xbe/0x108 Jul 16 13:25:25 morticia kernel: [<c0439877>] ? default_wake_function+0x0/0xd Jul 16 13:25:25 morticia kernel: [<c0781c97>] wait_for_completion+0x12/0x14 Jul 16 13:25:25 morticia kernel: [<f9c3cc0b>] os_acquire_sema+0x33/0x59 [nvidia] Jul 16 13:25:25 morticia kernel: [<f9794a74>] ? 
_nv000472rm+0xb/0x31 [nvidia] Jul 16 13:25:25 morticia kernel: [<f9c1509b>] _nv021309rm+0xa/0x21 [nvidia] Jul 16 13:25:25 morticia kernel: [<f97d51b0>] ? _nv003206rm+0x1e1/0x22f [nvidia] Jul 16 13:25:25 morticia kernel: [<f97d521d>] ? _nv002032rm+0x1f/0x23 [nvidia] Jul 16 13:25:25 morticia kernel: [<f97bb546>] ? _nv001711rm+0x2b/0x4e [nvidia] Jul 16 13:25:25 morticia kernel: [<f9c21e92>] ? _nv002113rm+0x5c2/0x5fb [nvidia] Jul 16 13:25:25 morticia kernel: [<f9c1e95f>] ? rm_ioctl+0x3e/0x6d [nvidia] Jul 16 13:25:25 morticia kernel: [<f9c3afdf>] ? nv_kern_ioctl+0x2bf/0x314 [nvidia] Jul 16 13:25:25 morticia kernel: [<f9c3b065>] ? nv_kern_unlocked_ioctl+0x16/0x1b [nvidia] Jul 16 13:25:25 morticia kernel: [<f9c3b065>] ? nv_kern_unlocked_ioctl+0x16/0x1b [nvidia] Jul 16 13:25:25 morticia kernel: [<c04dadf9>] ? vfs_ioctl+0x27/0x91 Jul 16 13:25:25 morticia kernel: [<f9c3b04f>] ? nv_kern_unlocked_ioctl+0x0/0x1b [nvidia] Jul 16 13:25:25 morticia kernel: [<c04db39a>] ? do_vfs_ioctl+0x48e/0x4cc Jul 16 13:25:25 morticia kernel: [<c0572635>] ? selinux_file_ioctl+0x3e/0x41 Jul 16 13:25:25 morticia kernel: [<c04db419>] ? sys_ioctl+0x41/0x61 Jul 16 13:25:25 morticia kernel: [<c040889f>] ? sysenter_do_call+0x12/0x28
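To compare crashes across machines it may help to reduce a log like the one above to a count of NVRM Xid codes plus any hung-task reports. This is a minimal sketch; the regular expressions are derived only from the lines quoted in this bug, the function name `summarize` is made up, and the script makes no attempt to interpret what a given Xid code means.

```python
import re
from collections import Counter

# Patterns based on the log lines quoted in this report.
XID = re.compile(r"NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+)")
HUNG = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def summarize(lines):
    """Return (Counter of Xid codes, list of (task name, pid) hung tasks)."""
    xids = Counter()
    hung = []
    for line in lines:
        xid = XID.search(line)
        if xid:
            xids[int(xid.group(2))] += 1
        blocked = HUNG.search(line)
        if blocked:
            hung.append((blocked.group(1), int(blocked.group(2))))
    return xids, hung


# Lines copied from the crash log above.
sample_lines = [
    "Jul 16 13:22:49 morticia kernel: NVRM: Xid (0001:00): 8, Channel 0000007f",
    "Jul 16 13:22:50 morticia kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00a8ae8a",
    "Jul 16 13:22:50 morticia kernel: NVRM: Xid (0001:00): 16, Head 00000001 Count 00a8705f",
    "Jul 16 13:25:25 morticia kernel: INFO: task Xorg:2390 blocked for more than 120 seconds.",
]

xid_counts, hung_tasks = summarize(sample_lines)
print(dict(xid_counts))
print(hung_tasks)
```

Xid 16 also shows up in the original report on the Gigabyte board, so a per-code count like this makes it easier to see whether different reporters are hitting the same failure.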
(In reply to comment #24)
> In my system the chipset is intel X48. It's a Sun Ultra 24.
> I attach the output of lspci -v
Same here.
[root@lberes log]# uname -rv
2.6.33.8-149.fc13.i686.PAE #1 SMP Tue Aug 17 22:39:27 UTC 2010
[root@lberes log]# rpm -qa | grep -i kmod-nvidia
kmod-nvidia-2.6.33.8-149.fc13.i686.PAE-195.36.31-1.fc13.5.i686
Enabling 'PEG force x1' in your BIOS might be a workaround. I experienced random crashes with both the nvidia and nouveau drivers. Currently I'm using nouveau, and since I enabled this BIOS setting I have not had a single crash for nearly two months. Maybe it works for the nvidia driver as well. See also bug #588036.
Hi folks,
This is by no means a Fedora-only issue. I run Ubuntu, and after a motherboard (+cpu) switch, this started happening to me too. Maybe I can provide some info, as I'd very much like this issue to be resolved.
I had no issues at all on the old motherboard:
- Old MB: Asus P5W DH Deluxe (Intel 975x chip-set) + Core 2 Duo E6600 CPU
On the new motherboard I get the behavior described in this bug.
- New MB: Asus P6T SE (Intel X58 chip-set) + Core i7 930 CPU
Same Ubuntu installation in both cases.
- Kernel: 2.6.31-20
- nVidia driver: 185.18.36
The problem did not go away after upgrading the nVidia driver to 256.53.
The log message "NVRM: os_pci_init_handle: invalid context!" is printed from the nVidia driver kernel interface code in kernel/os-interface.c (in the unpacked nvidia driver directory). This is an error condition that, from the looks of it, is raised when the os_pci_init_handle() function is called in an interrupt context. That's as far as I've come.
Running glxgears seems to be a good way to force the bug to manifest.
Cheers!
Addition: enabling "Sync to VBlank" in nvidia-settings OpenGL Settings seems to have removed the problem. Running Starcraft II in a window (in Wine), glxgears, and playing a video simultaneously now for 30 minutes, plus playing QuakeLive, with nothing in the syslog and no problems.
We don't provide or support the nvidia driver.