Bug 732584
Summary: | dazed and confused kernel
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 15
Status: | CLOSED INSUFFICIENT_DATA
Severity: | unspecified
Priority: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Mohammed Arafa <bugzilla>
Assignee: | Kernel Maintainer List <kernel-maint>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | dzickus, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, piruthiviraj, tjwhaynes
Doc Type: | Bug Fix
Last Closed: | 2012-06-06 19:00:32 UTC

Description
Mohammed Arafa
2011-08-23 01:47:31 UTC
(In reply to comment #0)
> Description of problem:
>
> i do not understand this message nor does google:
>
> Message from syslogd@desktop at Aug 23 00:32:52 ...
> kernel:[173439.660918] NMI: PCI system error (SERR) for reason b1 on CPU 0.
>
> Message from syslogd@desktop at Aug 23 00:32:52 ...
> kernel:[173439.660924] Dazed and confused, but trying to continue

Is there anything else in /var/log/messages around these lines? Like a backtrace, etc?

Can you attach the output of 'lspci -vvv'? That might point to the device that triggered the NMI.

Cheers,
Don

Created attachment 520198 [details]
lspci output
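
For readers puzzling over "reason b1" in the message quoted above: the value is a hexadecimal status byte. The following is a minimal sketch of decoding it, assuming the conventional PC/AT layout of the NMI status byte at I/O port 0x61 (bit 7 for SERR#, bit 6 for I/O channel check). The function name `decode_nmi_line` and the regular expression are illustrative and not taken from the kernel or from this report.

```python
import re

# Bit masks for the legacy NMI status byte (I/O port 0x61).
# Assumption: conventional PC/AT layout, not verified against this hardware.
NMI_REASON_SERR = 0x80   # bit 7: PCI system error (SERR#)
NMI_REASON_IOCHK = 0x40  # bit 6: I/O channel check

# Matches the kernel line quoted in this report.
NMI_LINE = re.compile(
    r"NMI: PCI system error \(SERR\) for reason ([0-9a-fA-F]{2}) on CPU (\d+)"
)

# Kernel line copied from the original description.
SAMPLE_LINE = (
    "kernel:[173439.660918] NMI: PCI system error (SERR) "
    "for reason b1 on CPU 0."
)


def decode_nmi_line(line):
    """Return (reason, cpu, serr_set, iochk_set) for a matching line, else None."""
    match = NMI_LINE.search(line)
    if match is None:
        return None
    reason = int(match.group(1), 16)
    cpu = int(match.group(2))
    return (
        reason,
        cpu,
        bool(reason & NMI_REASON_SERR),
        bool(reason & NMI_REASON_IOCHK),
    )


if __name__ == "__main__":
    # 0xb1 has bit 7 set and bit 6 clear under the assumed layout.
    print(decode_nmi_line(SAMPLE_LINE))
```

Under that assumption, 0xb1 only says that some device asserted SERR#; it does not identify the device, which is why the lspci output is the next thing to look at.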

@Josh this wasn't a file, this was a wall message. and it had the same message for each core on the cpu

(In reply to comment #3)
> Created attachment 520198 [details]
> lspci output

Hi Mohammed,

Thanks for the output. I realized I forgot to ask for the 'lspci -t' output too (you don't need to reproduce the error to get this). This output shows how the devices and bridges are connected. There seem to be a couple of pcie bridges generating some PCI errors, just trying to figure out which device(s) they belong to.

Cheers,
Don

output of lspci -t

-[0000:00]-+-00.0
           +-02.0
           +-16.0
           +-19.0
           +-1a.0
           +-1b.0
           +-1c.0-[02]--
           +-1c.1-[03]----00.0
           +-1c.3-[05-0c]--
           +-1c.4-[0d]--+-00.0
           |            \-00.3
           +-1d.0
           +-1f.0
           +-1f.2
           \-1f.3

Hmm, I don't see anything obvious. A couple of Master Aborts and Correctable Errors from the Firewire and SDHCI controller, but I can't see those triggering an NMI.

What were/are you doing leading up to that message? Can you attach a 'dmesg' output, to see if there are other messages that lead up to it (i.e. a network or storage issue)? Though a lot of these messages I see are from video or scsi cards. It doesn't seem you have a scsi card, so I wouldn't be surprised if it was video related.

Cheers,
Don

Created attachment 522742 [details]
lspci -vvv output from a Thinkpad T60p

Also seen on my Thinkpad T60p (lspci -vvv output above), apparently triggered when unlocking the screensaver (blank screen).

Message from syslogd@(none) at Sep 12 13:03:48 ...
kernel:[12015.443028] NMI: PCI system error (SERR) for reason b1 on CPU 0.

Message from syslogd@(none) at Sep 12 13:03:48 ...
kernel:[12015.443028] Dazed and confused, but trying to continue

[root@nexus6wifi ~]# cat /sys/module/pcie_aspm/parameters/policy
default performance [powersave]

[root@nexus6wifi ~]# lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]----00.0
           +-1b.0
           +-1c.0-[02]----00.0
           +-1c.1-[03]----00.0
           +-1c.2-[04-0b]--
           +-1c.3-[0c-13]--
           +-1d.0
           +-1d.1
           +-1d.2
           +-1d.3
           +-1d.7
           +-1e.0-[15-18]----00.0
           +-1f.0
           +-1f.1
           +-1f.2
           \-1f.3

Could it be related to this discussion on lkml?

https://lkml.org/lkml/2011/3/20/102

The only thing I can see that might be giving problems is the wireless card, but that is just a Master Abort and may not be the real reason.

The change from the link you mentioned above should be included in 2.6.39. Not sure what version you have. Otherwise setting "pcie_aspm=off" might have the same effect.

Cheers,
Don

don, at the time of the wall message i did look into /var/log/messages and could not find anything. i did not look at the dmesg output. and since then, it did not reoccur. i do not remember exactly what i was doing to cause this error.

i have this same error in fedora 16 too...

[root@f16dell xfoss]#
Message from syslogd@f16dell at Oct 1 18:52:19 ...
kernel:[12562.165361] NMI: PCI system error (SERR) for reason b1 on CPU 0.

Message from syslogd@f16dell at Oct 1 18:52:19 ...
kernel:[12562.165374] Dazed and confused, but trying to continue

(In reply to comment #12)
> i have this same error in fedora 16 too...
>
> [root@f16dell xfoss]#
> Message from syslogd@f16dell at Oct 1 18:52:19 ...
> kernel:[12562.165361] NMI: PCI system error (SERR) for reason b1 on CPU 0.
>
> Message from syslogd@f16dell at Oct 1 18:52:19 ...
> kernel:[12562.165374] Dazed and confused, but trying to continue

Can you attach the output of 'lspci -vvv' and also try adding 'pcie_aspm=off' to the kernel command line?

Cheers,
Don

Created attachment 526170 [details]
i added pcie_aspm=off in the grub
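
After editing the grub entry it can help to confirm that the parameter actually reached the running kernel. The following is a rough sketch, assuming the standard /proc/cmdline file and the /sys/module/pcie_aspm/parameters/policy file shown earlier in this report. The helper names are illustrative, and the sysfs file may be absent on kernels built without ASPM support.

```python
import re


def active_aspm_policy(policy_text):
    """Return the bracketed (active) entry, e.g. 'powersave', or None."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


def has_kernel_param(cmdline, param):
    """True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def read_text(path):
    """Read a small text file, returning None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text("/proc/cmdline") or ""
    policy = read_text("/sys/module/pcie_aspm/parameters/policy")
    print("pcie_aspm=off on command line:",
          has_kernel_param(cmdline, "pcie_aspm=off"))
    print("active ASPM policy:",
          active_aspm_policy(policy) if policy else "unavailable")
```

For the Thinkpad output quoted earlier, `active_aspm_policy("default performance [powersave]")` picks out `powersave`, the entry in square brackets.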

(In reply to comment #14)
> Created attachment 526170 [details]
> i added pcie_aspm=off in the grub

Can you run lspci -vvv as root so I can see the capability flags too? Also did using pcie_aspm=off make the problem go away?

Cheers,
Don
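
The status flags discussed in this thread (Master Aborts, Correctable Errors, SERR) can be picked out of saved 'lspci -vvv' output with a small script. The following is a minimal sketch, assuming the usual lspci layout where each device starts on an unindented line and set flags end in "+". The flag names listed are the ones I believe lspci prints and may differ between pciutils versions. The sample text is hypothetical and is not taken from the attachments on this bug.

```python
# Flags that indicate a recorded error when followed by "+".
# Assumption: names as printed by lspci -vvv; newer pciutils versions
# may print NonFatalErr where older ones print UncorrErr.
ERROR_FLAGS = {
    "ParErr", ">TAbort", "<TAbort", "<MAbort", ">SERR", "<PERR",
    "CorrErr", "UncorrErr", "NonFatalErr", "FatalErr",
}

# Only these lines carry the flags of interest.
STATUS_PREFIXES = ("Status:", "Secondary status:", "DevSta:")

# Hypothetical sample, shaped like lspci -vvv output.
SAMPLE = """\
00:1c.0 PCI bridge: Example bridge (hypothetical)
\tStatus: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
\t\tDevSta:\tCorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
00:1f.2 SATA controller: Example controller (hypothetical)
\tStatus: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
"""


def find_error_flags(lspci_text):
    """Map each device address to the list of error flags that are set."""
    results = {}
    device = None
    for line in lspci_text.splitlines():
        # An unindented, non-empty line starts a new device section.
        if line and not line[0].isspace():
            device = line.split()[0]
            continue
        stripped = line.strip()
        if device is None or not stripped.startswith(STATUS_PREFIXES):
            continue
        for token in stripped.split():
            if token.endswith("+") and token[:-1] in ERROR_FLAGS:
                results.setdefault(device, []).append(token[:-1])
    return results


if __name__ == "__main__":
    for address, flags in find_error_flags(SAMPLE).items():
        print(address, " ".join(flags))
```

Run against real output, for example saved with 'lspci -vvv > lspci.txt' as root, this only narrows down which devices have recorded errors. As noted in the comments above, a Master Abort on its own may not be the cause of the NMI.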