Description of problem: I do not understand this message, and neither does Google:
Message from syslogd@desktop at Aug 23 00:32:52 ...
 kernel:[173439.660918] NMI: PCI system error (SERR) for reason b1 on CPU 0.
Message from syslogd@desktop at Aug 23 00:32:52 ...
 kernel:[173439.660924] Dazed and confused, but trying to continue
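For readers puzzling over the "reason b1" part of the message: as far as I can tell, the kernel prints the byte it reads from the legacy NMI status port (I/O port 0x61) on PC-compatible hardware. The sketch below decodes that byte under that assumption. The bit labels are descriptive names chosen for this example, not kernel identifiers, and the layout should be checked against the chipset documentation for the machine in question.

```python
from typing import List

# Assumed bit layout of the legacy PC NMI status/control port (0x61).
# Labels are descriptive names for this sketch, not kernel symbols.
NMI_REASON_BITS = [
    (0x80, "SERR# asserted (PCI system error)"),
    (0x40, "IOCHK# asserted (I/O channel check)"),
    (0x20, "timer 2 output"),
    (0x10, "refresh toggle"),
    (0x08, "IOCHK# NMI clear/disable bit"),
    (0x04, "SERR# NMI clear/disable bit"),
    (0x02, "speaker data enabled"),
    (0x01, "timer 2 gate enabled"),
]


def decode_nmi_reason(reason_hex: str) -> List[str]:
    """Return the labels of all bits set in a hex reason byte such as 'b1'."""
    value = int(reason_hex, 16)
    return [label for bit, label in NMI_REASON_BITS if value & bit]


for label in decode_nmi_reason("b1"):
    print(label)
```

Under this reading, only the top bit of 0xb1 is an error indication (SERR# asserted, IOCHK# not asserted); the remaining set bits are unrelated timer and refresh status. The byte therefore says that some PCI device or bridge signalled a system error, but not which one.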
(In reply to comment #0) > Description of problem: > > i do not understand this message nor does google: > > Message from syslogd@desktop at Aug 23 00:32:52 ... > kernel:[173439.660918] NMI: PCI system error (SERR) for reason b1 on CPU 0. > > Message from syslogd@desktop at Aug 23 00:32:52 ... > kernel:[173439.660924] Dazed and confused, but trying to continue Is there anything else in /var/log/messages around these lines? Like a backtrace, etc?
Can you attach the output of 'lspci -vvv'? That might point to the device that triggered the NMI. Cheers, Don
Created attachment 520198 [details] lspci output
@Josh: this wasn't from a file, it was a wall message, and the same message appeared for each core on the CPU.
(In reply to comment #3) > Created attachment 520198 [details] > lspci output Hi Mohammed, Thanks for the output. I realized I forgot to ask for the 'lspci -t' output too (you don't need to reproduce the error to get this). This output shows how the devices and bridges are connected. There seem to be a couple of PCIe bridges generating some PCI errors; I'm just trying to figure out which device(s) they belong to. Cheers, Don
Output of lspci -t:
-[0000:00]-+-00.0
           +-02.0
           +-16.0
           +-19.0
           +-1a.0
           +-1b.0
           +-1c.0-[02]--
           +-1c.1-[03]----00.0
           +-1c.3-[05-0c]--
           +-1c.4-[0d]--+-00.0
           |            \-00.3
           +-1d.0
           +-1f.0
           +-1f.2
           \-1f.3
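To make the bridge-to-bus relationship in the `lspci -t` output easier to read, here is a small sketch that pulls out each bridge and the bus range behind it. The function names `bridge_bus_ranges` and `bridge_for_bus` are made up for this example, and the regular expression only assumes the `dev.fn-[bus]` / `dev.fn-[first-last]` notation visible in the tree above.

```python
import re

# Matches "1c.3-[05-0c]" or "1c.1-[03]": a device.function followed by the
# secondary bus (or bus range) in brackets. The root "-[0000:00]-" entry is
# not matched because no device.function precedes it.
BRIDGE_RE = re.compile(r"([0-9a-f]{2}\.[0-9a-f])-\[([0-9a-f]{2})(?:-([0-9a-f]{2}))?\]")


def bridge_bus_ranges(tree_text):
    """Map each bridge (device.function) to its (first_bus, last_bus) range."""
    ranges = {}
    for dev, first, last in BRIDGE_RE.findall(tree_text):
        ranges[dev] = (int(first, 16), int(last or first, 16))
    return ranges


def bridge_for_bus(ranges, bus):
    """Return the bridge whose bus range contains `bus`, or None."""
    for dev, (first, last) in ranges.items():
        if first <= bus <= last:
            return dev
    return None


# Tree text copied from the comment above (whitespace does not matter here).
TREE = ("-[0000:00]-+-00.0 +-02.0 +-16.0 +-19.0 +-1a.0 +-1b.0 +-1c.0-[02]-- "
        "+-1c.1-[03]----00.0 +-1c.3-[05-0c]-- +-1c.4-[0d]--+-00.0 | \\-00.3 "
        "+-1d.0 +-1f.0 +-1f.2 \\-1f.3")

ranges = bridge_bus_ranges(TREE)
print(ranges)
print(bridge_for_bus(ranges, 0x0D))
```

With a mapping like this, a device address such as `0d:00.3` in the `lspci -vvv` output can be traced back to the root port it hangs off (here `1c.4`).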
Hmm, I don't see anything obvious. There are a couple of Master Aborts and Correctable Errors from the Firewire and SDHCI controller, but I can't see those triggering an NMI. What were you doing leading up to that message? Can you attach the 'dmesg' output, so we can see whether any other messages led up to it (i.e. a network or storage issue)? A lot of the reports of this message that I see involve video or SCSI cards. It doesn't look like you have a SCSI card, so I wouldn't be surprised if it was video related. Cheers, Don
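The Master Abort and Correctable Error indications mentioned here show up in `lspci -vvv` as flags with a trailing `+` (for example `<MAbort+` on a Status line or `CorrErr+` on a DevSta line). A rough sketch of how one might scan a saved dump for them follows. The sample text is invented for illustration and is not taken from the attachments on this bug, and the flag list is an assumption about which bits are worth looking at.

```python
# Flags that indicate a latched error when followed by "+" in lspci -vvv.
# This selection is an assumption for the sketch, not an exhaustive list.
ERROR_FLAGS = ("<MAbort+", ">SERR+", "<PERR+", "CorrErr+", "UncorrErr+", "FatalErr+")


def devices_with_error_flags(lspci_text):
    """Map each device address to the error flags found in its section."""
    results = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device, e.g. "03:00.0 Network ..."
            current = line.split(" ", 1)[0]
        elif current is not None:
            hits = [flag for flag in ERROR_FLAGS if flag in line]
            if hits:
                results.setdefault(current, []).extend(hits)
    return results


# Invented sample in the general shape of lspci -vvv output.
SAMPLE_LSPCI = """\
00:1c.0 PCI bridge: Example root port
\tStatus: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
\t\tDevSta:\tCorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
03:00.0 Network controller: Example wireless adapter
\tStatus: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
"""

print(devices_with_error_flags(SAMPLE_LSPCI))
```

A set flag only shows that the condition was latched at some point since the status was last cleared, so, as noted above, it does not by itself prove that the device caused the NMI.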
Created attachment 522742 [details] lspci -vvv output from a Thinkpad T60p
Also seen on my Thinkpad T60p (lspci -vvv output above), apparently triggered when unlocking the screensaver (blank screen).
Message from syslogd@(none) at Sep 12 13:03:48 ...
 kernel:[12015.443028] NMI: PCI system error (SERR) for reason b1 on CPU 0.
Message from syslogd@(none) at Sep 12 13:03:48 ...
 kernel:[12015.443028] Dazed and confused, but trying to continue
[root@nexus6wifi ~]# cat /sys/module/pcie_aspm/parameters/policy
default performance [powersave]
[root@nexus6wifi ~]# lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]----00.0
           +-1b.0
           +-1c.0-[02]----00.0
           +-1c.1-[03]----00.0
           +-1c.2-[04-0b]--
           +-1c.3-[0c-13]--
           +-1d.0
           +-1d.1
           +-1d.2
           +-1d.3
           +-1d.7
           +-1e.0-[15-18]----00.0
           +-1f.0
           +-1f.1
           +-1f.2
           \-1f.3
Could it be related to this discussion on lkml? https://lkml.org/lkml/2011/3/20/102
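In the `policy` file shown above, the active ASPM policy is the one in square brackets (`powersave` on this T60p). A minimal sketch for extracting it, with `active_aspm_policy` being a name invented for this example:

```python
import re


def active_aspm_policy(policy_line):
    """Return the bracketed (active) entry from the ASPM policy line, or None."""
    match = re.search(r"\[(\w+)\]", policy_line)
    return match.group(1) if match else None


# Line copied from the comment above.
print(active_aspm_policy("default performance [powersave]"))
```

On a live system the input would come from reading `/sys/module/pcie_aspm/parameters/policy`, the path used in the comment above.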
The only thing I can see that might be causing problems is the wireless card, but that is just a Master Abort and may not be the real reason. The change discussed at the link you mentioned above should be included in 2.6.39. Not sure what version you have. Otherwise, setting "pcie_aspm=off" might have the same effect. Cheers, Don
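Since the question of whether the running kernel is at least 2.6.39 comes up here, the following sketch compares a kernel release string against that version. The release strings in the example are hypothetical and were not reported by anyone on this bug; on a real system the string would come from `uname -r` (or `platform.release()` in Python).

```python
import re


def kernel_at_least(release, minimum=(2, 6, 39)):
    """Compare the leading numeric part of a kernel release to `minimum`."""
    numbers = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if numbers is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing third component (e.g. "3.1") is treated as 0.
    version = tuple(int(n or 0) for n in numbers.groups())
    return version >= minimum


# Hypothetical release strings, for illustration only.
for release in ("2.6.38.8-35.fc15.x86_64", "2.6.40-4.fc15.x86_64", "3.1.0-7.fc16.x86_64"):
    print(release, kernel_at_least(release))
```

Distribution kernels often carry backported patches, so a version number below 2.6.39 does not rule out that the change is present; the check is only a first hint.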
Don, at the time of the wall message I did look in /var/log/messages and could not find anything. I did not look at the dmesg output, and the error has not recurred since then. I do not remember exactly what I was doing when it appeared.
I have this same error in Fedora 16 too...
[root@f16dell xfoss]#
Message from syslogd@f16dell at Oct 1 18:52:19 ...
 kernel:[12562.165361] NMI: PCI system error (SERR) for reason b1 on CPU 0.
Message from syslogd@f16dell at Oct 1 18:52:19 ...
 kernel:[12562.165374] Dazed and confused, but trying to continue
(In reply to comment #12) > i have this same error in fedora 16 too... > > [root@f16dell xfoss]# > Message from syslogd@f16dell at Oct 1 18:52:19 ... > kernel:[12562.165361] NMI: PCI system error (SERR) for reason b1 on CPU 0. > > Message from syslogd@f16dell at Oct 1 18:52:19 ... > kernel:[12562.165374] Dazed and confused, but trying to continue Can you attach the output of 'lspci -vvv' and also try adding 'pcie_aspm=off' to the kernel command line? Cheers, Don
Created attachment 526170 [details] i added pcie_aspm=off in the grub
(In reply to comment #14) > Created attachment 526170 [details] > i added pcie_aspm=off in the grub Can you run lspci -vvv as root so I can see the capability flags too? Also, did using pcie_aspm=off make the problem go away? Cheers, Don
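Before judging whether `pcie_aspm=off` helped, it is worth confirming that the option actually reached the running kernel, which can be seen in `/proc/cmdline`. The sketch below parses such a line; `cmdline_param` is a name invented for this example and the sample command line is hypothetical.

```python
def cmdline_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns None if the parameter is absent and "" if it appears without
    "=value". If it appears more than once, the last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


# Hypothetical command line; on a real system, read /proc/cmdline instead.
sample = "ro root=/dev/mapper/vg-root rhgb quiet pcie_aspm=off"
print(cmdline_param(sample, "pcie_aspm"))
```

If this returns `None` for the contents of `/proc/cmdline`, the boot loader entry that was edited is probably not the one being booted.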