Bug 732584 - dazed and confused kernel
Summary: dazed and confused kernel
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-23 01:47 UTC by Mohammed Arafa
Modified: 2012-06-06 19:00 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-06 19:00:32 UTC
Type: ---
Embargoed:


Attachments
lspci output (26.28 KB, text/plain)
2011-08-28 03:24 UTC, Mohammed Arafa
lspci -vvv output from a Thinkpad T60p (30.91 KB, text/plain)
2011-09-12 17:23 UTC, Toby Haynes
i added pcie_aspm=off in the grub (11.92 KB, text/plain)
2011-10-04 03:21 UTC, Piruthiviraj Natarajan

Description Mohammed Arafa 2011-08-23 01:47:31 UTC
Description of problem:

I do not understand this message, and a Google search did not help either:

Message from syslogd@desktop at Aug 23 00:32:52 ...
 kernel:[173439.660918] NMI: PCI system error (SERR) for reason b1 on CPU 0.

Message from syslogd@desktop at Aug 23 00:32:52 ...
 kernel:[173439.660924] Dazed and confused, but trying to continue
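
For readers wondering what "reason b1" means: on PC-compatible hardware the kernel reports the byte it read from the NMI status and control register (I/O port 0x61). The following is a minimal sketch of decoding that byte, assuming the conventional bit layout of that register; the table and the function name `decode_nmi_reason` are illustrative and not taken from this report.

```python
# Assumed conventional layout of the NMI status/control byte (port 0x61).
# Bits 7 and 6 report the NMI source; the rest are control/timer bits.
NMI_REASON_BITS = {
    0x80: "PCI SERR# NMI source status",
    0x40: "IOCHK# NMI source status",
    0x20: "timer counter 2 output",
    0x10: "refresh cycle toggle",
    0x08: "IOCHK# NMI disabled",
    0x04: "PCI SERR# NMI disabled",
    0x02: "speaker data enable",
    0x01: "timer counter 2 gate",
}


def decode_nmi_reason(reason_hex):
    """Return the names of the bits set in a hex reason string such as 'b1'."""
    value = int(reason_hex, 16)
    return [name for bit, name in NMI_REASON_BITS.items() if value & bit]


if __name__ == "__main__":
    for name in decode_nmi_reason("b1"):
        print(name)
```

Under that assumption, 0xb1 has bit 7 set and bit 6 clear, which matches the kernel describing the event as a PCI system error (SERR) rather than an I/O check error. The byte does not identify which device asserted SERR#.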

Comment 1 Josh Boyer 2011-08-23 10:50:52 UTC
(In reply to comment #0)
> Description of problem:
> 
> i do not understand this message nor does google:
> 
> Message from syslogd@desktop at Aug 23 00:32:52 ...
>  kernel:[173439.660918] NMI: PCI system error (SERR) for reason b1 on CPU 0.
> 
> Message from syslogd@desktop at Aug 23 00:32:52 ...
>  kernel:[173439.660924] Dazed and confused, but trying to continue

Is there anything else in /var/log/messages around these lines, such as a backtrace?

Comment 2 Don Zickus 2011-08-24 11:41:46 UTC
Can you attach the output of 'lspci -vvv'?  That might point to the device that triggered the NMI.

Cheers,
Don
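
As an illustration of how 'lspci -vvv' output can point at a device, here is a rough sketch that scans saved output for error bits printed as set (a trailing "+") in the Status, Secondary status, and DevSta lines. The flag list and the function name `find_error_flags` are assumptions made for this example, not a tool used in this report, and a set flag is only a hint, not proof of which device raised the NMI.

```python
import re

# Error flags that lspci -vvv prints with "+" when the bit is set.
ERROR_FLAGS = ("ParErr", ">TAbort", "<TAbort", "<MAbort", ">SERR", "<SERR",
               "<PERR", "CorrErr", "UncorrErr", "FatalErr")

# Only status registers are inspected; "Control:" bits are enables, not errors.
STATUS_LINE = re.compile(r"^\s+(Status|Secondary status|DevSta):\s*(.*)$")


def find_error_flags(lspci_text):
    """Return {device: [(register, flag), ...]} for error flags that are set."""
    findings = {}
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device, e.g. "00:1c.4 PCI bridge: ..."
            device = line.split(" ", 1)[0]
            continue
        match = STATUS_LINE.match(line)
        if device is None or match is None:
            continue
        register, flags = match.groups()
        for token in flags.split():
            if token.endswith("+") and token[:-1] in ERROR_FLAGS:
                findings.setdefault(device, []).append((register, token[:-1]))
    return findings
```

Run as root so the capability blocks are included, save the output to a file, and pass the file contents to `find_error_flags`.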

Comment 3 Mohammed Arafa 2011-08-28 03:24:29 UTC
Created attachment 520198 [details]
lspci output

Comment 4 Mohammed Arafa 2011-08-28 03:28:30 UTC
@Josh
This wasn't from a file; it was a wall message, and the same message appeared for each core on the CPU.

Comment 5 Don Zickus 2011-08-30 17:09:42 UTC
(In reply to comment #3)
> Created attachment 520198 [details]
> lspci output

Hi Mohammed,

Thanks for the output.  I realized I forgot to ask for the 'lspci -t' output too (you don't need to reproduce the error to get this).  This output shows how the devices and bridges are connected.  There seem to be a couple of PCIe bridges generating some PCI errors; I'm just trying to figure out which device(s) they belong to.

Cheers,
Don
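
To show how the 'lspci -t' tree relates bridges to devices, here is a small sketch that lists each bridge on the root bus together with the bus range behind it. It assumes the tree layout seen in the outputs posted in this report; the function name `root_bus_bridges` is hypothetical, and bridges nested below the first level are deliberately ignored.

```python
import re

# Matches a root-bus entry such as "+-1c.3-[05-0c]--": device.function,
# then the first (and optionally last) bus number behind the bridge.
BRIDGE = re.compile(
    r"^(?:\s*|-\[[0-9a-f:]+\]-)[+\\]-"
    r"([0-9a-f]{2}\.[0-7])-\[([0-9a-f]{2})(?:-([0-9a-f]{2}))?\]"
)


def root_bus_bridges(tree_text):
    """Map each root-bus bridge to the (first, last) bus number behind it."""
    bridges = {}
    for line in tree_text.splitlines():
        match = BRIDGE.match(line)
        if match:
            slot, first, last = match.groups()
            bridges[slot] = (first, last or first)
    return bridges
```

A device reported by 'lspci -vvv' as, say, 0d:00.0 sits on bus 0d, so the bridge whose range contains 0d is the one it hangs off.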

Comment 6 Mohammed Arafa 2011-09-03 07:03:01 UTC
output of lspci -t
-[0000:00]-+-00.0
           +-02.0
           +-16.0
           +-19.0
           +-1a.0
           +-1b.0
           +-1c.0-[02]--
           +-1c.1-[03]----00.0
           +-1c.3-[05-0c]--
           +-1c.4-[0d]--+-00.0
           |            \-00.3
           +-1d.0
           +-1f.0
           +-1f.2
           \-1f.3

Comment 7 Don Zickus 2011-09-06 12:17:01 UTC
Hmm, I don't see anything obvious.  There are a couple of Master Aborts and Correctable Errors from the FireWire and SDHCI controllers, but I can't see those triggering an NMI.

What were you doing leading up to that message?  Can you attach the 'dmesg' output, so we can see whether other messages lead up to it (e.g. a network or storage issue)?  Most reports of this message that I have seen come from video or SCSI cards.  You don't seem to have a SCSI card, so I wouldn't be surprised if it was video related.

Cheers,
Don

Comment 8 Toby Haynes 2011-09-12 17:23:29 UTC
Created attachment 522742 [details]
lspci -vvv output from a Thinkpad T60p

Comment 9 Toby Haynes 2011-09-12 17:26:06 UTC
Also seen on my Thinkpad T60p (lspci -vvv output above), apparently triggered when unlocking the screensaver (blank screen).

Message from syslogd@(none) at Sep 12 13:03:48 ...
 kernel:[12015.443028] NMI: PCI system error (SERR) for reason b1 on CPU 0.

Message from syslogd@(none) at Sep 12 13:03:48 ...
 kernel:[12015.443028] Dazed and confused, but trying to continue

[root@nexus6wifi ~]# cat /sys/module/pcie_aspm/parameters/policy
default performance [powersave] 
[root@nexus6wifi ~]# lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]----00.0
           +-1b.0
           +-1c.0-[02]----00.0
           +-1c.1-[03]----00.0
           +-1c.2-[04-0b]--
           +-1c.3-[0c-13]--
           +-1d.0
           +-1d.1
           +-1d.2
           +-1d.3
           +-1d.7
           +-1e.0-[15-18]----00.0
           +-1f.0
           +-1f.1
           +-1f.2
           \-1f.3

Could it be related to this discussion on lkml?

https://lkml.org/lkml/2011/3/20/102

Comment 10 Don Zickus 2011-09-12 18:42:02 UTC
The only thing I can see that might be giving problems is the wireless card, but that is just a Master Abort and may not be the real reason.

The patch from the link you mentioned above should be included in 2.6.39.  I'm not sure which kernel version you have.

Otherwise setting "pcie_aspm=off" might have the same effect.

Cheers,
Don
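
When testing the "pcie_aspm=off" suggestion, it helps to confirm that the option actually took effect. The sketch below assumes the policy file format shown in comment 9, where the active policy is the bracketed entry, and that the boot parameters are read from /proc/cmdline; both function names are hypothetical.

```python
import re


def active_aspm_policy(policy_text):
    """Return the bracketed (active) entry from the pcie_aspm policy file."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


def aspm_disabled_on_cmdline(cmdline):
    """Return True if pcie_aspm=off is one of the kernel boot parameters."""
    return "pcie_aspm=off" in cmdline.split()


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("pcie_aspm=off on command line:", aspm_disabled_on_cmdline(f.read()))
    try:
        with open("/sys/module/pcie_aspm/parameters/policy") as f:
            print("active ASPM policy:", active_aspm_policy(f.read()))
    except OSError:
        print("ASPM policy file not readable on this system")
```

For the policy line quoted in comment 9, `active_aspm_policy` returns "powersave".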

Comment 11 Mohammed Arafa 2011-09-13 11:36:54 UTC
Don,

At the time of the wall message I did look in /var/log/messages and could not find anything. I did not look at the dmesg output.

It has not recurred since then.

I do not remember exactly what I was doing when the error appeared.

Comment 12 Piruthiviraj Natarajan 2011-10-01 15:14:41 UTC
I have this same error in Fedora 16 too:

[root@f16dell xfoss]# 
Message from syslogd@f16dell at Oct  1 18:52:19 ...
 kernel:[12562.165361] NMI: PCI system error (SERR) for reason b1 on CPU 0.

Message from syslogd@f16dell at Oct  1 18:52:19 ...
 kernel:[12562.165374] Dazed and confused, but trying to continue

Comment 13 Don Zickus 2011-10-03 14:01:04 UTC
(In reply to comment #12)
> i have this same error in fedora 16 too...
> 
> [root@f16dell xfoss]# 
> Message from syslogd@f16dell at Oct  1 18:52:19 ...
>  kernel:[12562.165361] NMI: PCI system error (SERR) for reason b1 on CPU 0.
> 
> Message from syslogd@f16dell at Oct  1 18:52:19 ...
>  kernel:[12562.165374] Dazed and confused, but trying to continue

Can you attach the output of 'lspci -vvv' and also try adding 'pcie_aspm=off' to the kernel command line?

Cheers,
Don

Comment 14 Piruthiviraj Natarajan 2011-10-04 03:21:32 UTC
Created attachment 526170 [details]
i added pcie_aspm=off in the grub

Comment 15 Don Zickus 2011-10-04 15:42:05 UTC
(In reply to comment #14)
> Created attachment 526170 [details]
> i added pcie_aspm=off in the grub

Can you run 'lspci -vvv' as root so I can see the capability flags too?  Also, did using pcie_aspm=off make the problem go away?

Cheers,
Don

