| Summary: | Oops 0002 [#11] SMP, possible bug - 2.6.35.11-83.fc14.x86_64 | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Naoki <naoki> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 14 | CC: | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-08-29 17:35:14 UTC | Type: | --- |
|
Description
Naoki
2011-03-01 06:31:24 UTC
(In reply to comment #1)
> [100451.853379] Oops: 0002 [#11] SMP
^^^ This is the 11th oops the machine has encountered since it was booted. What did the other oops messages look like?

Hi Chuck, some more info below. But I should mention I've seen many, many, ECC errors coming up

[255413.887870] ECC/ChipKill ECC error.
[255413.909287] EDAC amd64 MC1: CE ERROR_ADDRESS= 0x29a98f4f0
[255413.942145] EDAC amd64 MC1: Failed to translate InputAddr to csrow for address 0x29a98f4f0
[255413.992149] EDAC MC1: CE - no information available: amd64_edac
[255419.036310] Northbridge Error, node 1, core: 0
[255419.064142] ECC/ChipKill ECC error.
[255419.085562] EDAC amd64 MC1: CE ERROR_ADDRESS= 0x292b8f4f0
[255419.118425] EDAC amd64 MC1: Failed to translate InputAddr to csrow for address 0x292b8f4f0
[255419.168433] EDAC MC1: CE - no information available: amd64_edac
[255420.206042] Northbridge Error, node 1, core: 1
[255420.233892] ECC/ChipKill ECC error.
[255420.255315] EDAC amd64 MC1: CE ERROR_ADDRESS= 0x2a118f4f8
[255420.288178] EDAC amd64 MC1: Failed to translate InputAddr to csrow for address 0x2a118f4f8
[255420.338184] EDAC MC1: CE - no information available: amd64_edac

I'd like to figure out which DIMM is the problem but am stumped at the "Failed to translate" which leaves me in the dark a bit.
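Since the driver could not map these addresses to a csrow, one way to get at least a coarse picture is to tally the correctable-error (CE) lines per memory controller straight from the kernel log. The following is a minimal sketch, not an EDAC tool: the function name `tally_ce` is hypothetical, the regular expression assumes the exact `EDAC amd64 MCn: CE ERROR_ADDRESS= 0x...` wording shown above, and the sample text is copied from those messages. It narrows the errors to a controller only and does not identify a DIMM.

```python
import re
from collections import Counter

# Sample lines copied from the EDAC messages in this comment.
SAMPLE = """\
[255419.036310] Northbridge Error, node 1, core: 0
[255419.064142] ECC/ChipKill ECC error.
[255419.085562] EDAC amd64 MC1: CE ERROR_ADDRESS= 0x292b8f4f0
[255420.206042] Northbridge Error, node 1, core: 1
[255420.233892] ECC/ChipKill ECC error.
[255420.255315] EDAC amd64 MC1: CE ERROR_ADDRESS= 0x2a118f4f8
"""

# Assumption: the message format matches the lines quoted in this report.
CE_RE = re.compile(r"EDAC amd64 (MC\d+): CE ERROR_ADDRESS= (0x[0-9a-fA-F]+)")


def tally_ce(log_text):
    """Return (count per controller, list of error addresses per controller)."""
    counts = Counter()
    addresses = {}
    for match in CE_RE.finditer(log_text):
        controller = match.group(1)
        address = int(match.group(2), 16)
        counts[controller] += 1
        addresses.setdefault(controller, []).append(address)
    return counts, addresses


if __name__ == "__main__":
    counts, addresses = tally_ce(SAMPLE)
    for controller in sorted(counts):
        hex_addrs = ", ".join(hex(a) for a in addresses[controller])
        print(f"{controller}: {counts[controller]} CE(s) at {hex_addrs}")
```

To run it against a real log, the output of `dmesg` could be passed to `tally_ce` in place of `SAMPLE`. In the messages quoted here every CE line names MC1, alongside "node 1" in the Northbridge lines.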
# grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
/sys/devices/system/edac/mc/mc0/csrow4/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow4/ch1_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow2/ch1_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow3/ch1_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow4/ch0_ce_count:0
/sys/devices/system/edac/mc/mc1/csrow4/ch1_ce_count:0

[root.prod ~]# zgrep -i Oops /var/log/messages*gz
/var/log/messages-20110306.gz:Mar 2 14:56:33 pdbsearch11 kernel: [84164.552150] Oops: 0002 [#1] SMP
/var/log/messages-20110306.gz:Mar 2 14:56:33 pdbsearch11 kernel: [84167.120384] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 14:57:38 pdbsearch11 kernel: [84230.093109] Oops: 0002 [#2] SMP
/var/log/messages-20110306.gz:Mar 2 14:57:39 pdbsearch11 kernel: [84232.580624] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 14:59:31 pdbsearch11 kernel: [84342.718259] Oops: 0002 [#3] SMP
/var/log/messages-20110306.gz:Mar 2 14:59:31 pdbsearch11 kernel: [84345.349656] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84375.416134] Oops: 0002 [#4] SMP
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84375.462131] Oops: 0002 [#5] SMP
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84377.032740] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84378.492327] Oops: 0002 [#6] SMP
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84375.416134] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:00:10 pdbsearch11 kernel: [84383.558089] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:01:03 pdbsearch11 kernel: [84435.091327] Oops: 0002 [#7] SMP
/var/log/messages-20110306.gz:Mar 2 15:01:03 pdbsearch11 kernel: [84437.345607] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:05:03 pdbsearch11 kernel: [84674.879419] Oops: 0002 [#8] SMP
/var/log/messages-20110306.gz:Mar 2 15:05:03 pdbsearch11 kernel: [84677.133804] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84678.274255] Oops: 0002 [#9] SMP
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84678.511385] Oops: 0002 [#10] SMP
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84678.533136] Oops: 0002 [#11] SMP
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84679.933911] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84679.934021] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84679.934372] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84679.936170] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:05:15 pdbsearch11 kernel: [84688.601699] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 15:05:16 pdbsearch11 kernel: [84689.805922] [<ffffffff8146b22e>] oops_end+0xbf/0xc7
/var/log/messages-20110306.gz:Mar 2 14:56:33 pdbsearch11 kernel: [84164.527752] BUG: unable to handle kernel paging request at ffff8801d002c000
/var/log/messages-20110306.gz:Mar 2 14:56:33 pdbsearch11 kernel: [84166.547148] BUG: scheduling while atomic: snmpd/1382/0x10000001
/var/log/messages-20110306.gz:Mar 2 14:56:33 pdbsearch11 kernel: [84166.861164] [<ffffffff8103ffbe>] __schedule_bug+0x5f/0x64
/var/log/messages-20110306.gz:Mar 2 14:57:38 pdbsearch11 kernel: [84230.092206] BUG: unable to handle kernel paging request at ffff8801d002d000
/var/log/messages-20110306.gz:Mar 2 14:57:38 pdbsearch11 kernel: [84231.969912] BUG: scheduling while atomic: glusterfsd/1445/0x10000001
/var/log/messages-20110306.gz:Mar 2 14:57:38 pdbsearch11 kernel: [84232.289088] [<ffffffff8103ffbe>] __schedule_bug+0x5f/0x64
/var/log/messages-20110306.gz:Mar 2 14:59:31 pdbsearch11 kernel: [84342.717329] BUG: unable to handle kernel paging request at ffff8801d002e000
/var/log/messages-20110306.gz:Mar 2 14:59:31 pdbsearch11 kernel: [84344.777429] BUG: scheduling while atomic: ntpd/1406/0x10000001
/var/log/messages-20110306.gz:Mar 2 14:59:31 pdbsearch11 kernel: [84345.090420] [<ffffffff8103ffbe>] __schedule_bug+0x5f/0x64
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84375.410627] BUG: unable to handle kernel paging request at ffff8801d002f000
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84375.462131] BUG: unable to handle kernel paging request at ffff8801d0041000
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84377.032686] BUG: scheduling while atomic: munin-update/1698/0x10000001
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84377.032706] [<ffffffff8103ffbe>] __schedule_bug+0x5f/0x64
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84378.492306] BUG: unable to handle kernel paging request at ffff8801d0042000
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84375.416134] BUG: scheduling while atomic: crond/1693/0x10000002
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84375.416134] [<ffffffff8103ffbe>] __schedule_bug+0x5f/0x64
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84382.982250] BUG: scheduling while atomic: crond/1693/0x10000002
/var/log/messages-20110306.gz:Mar 2 15:00:09 pdbsearch11 kernel: [84383.296261] [<ffffffff8103ffbe>] __schedule_bug+0x5f/0x64
/var/log/messages-20110306.gz:Mar 2 15:01:03 pdbsearch11 kernel: [84435.090381] BUG: unable to handle kernel paging request at ffff8801d0030000
/var/log/messages-20110306.gz:Mar 2 15:01:03 pdbsearch11 kernel: [84436.772360] BUG: scheduling while atomic: crond/1703/0x10000002
/var/log/messages-20110306.gz:Mar 2 15:01:03 pdbsearch11 kernel: [84437.086367] [<ffffffff8103ffbe>] __schedule_bug+0x5f/0x64
/var/log/messages-20110306.gz:Mar 2 15:05:03 pdbsearch11 kernel: [84674.878472] BUG: unable to handle kernel paging request at ffff8801d0031000
/var/log/messages-20110306.gz:Mar 2 15:05:03 pdbsearch11 kernel: [84676.560558] BUG: scheduling while atomic: crond/1704/0x10000002
/var/log/messages-20110306.gz:Mar 2 15:05:03 pdbsearch11 kernel: [84676.874558] [<ffffffff8103ffbe>] __schedule_bug+0x5f/0x64
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84678.274136] BUG: unable to handle kernel paging request at ffff8801d0032000
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84678.511385] BUG: unable to handle kernel paging request at ffff8801d0043000
/var/log/messages-20110306.gz:Mar 2 15:05:14 pdbsearch11 kernel: [84678.533136] BUG: unable to handle kernel paging request at ffff8801d0044000

This appears to be hardware related. There isn't much we can do about this.
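For anyone reading the log excerpt in this report, the number after `Oops:` on a page fault is the x86 page-fault error code, and the faulting addresses can be pulled out of the `BUG: unable to handle kernel paging request` lines. The following is a rough sketch of both steps. The helper names `decode_oops_code` and `summarize` are hypothetical, only the three low error-code bits are decoded, and the sample lines are taken from the excerpt with the log file name prefix dropped.

```python
import re

# Low bits of the x86 page-fault error code printed after "Oops:".
# Each entry: (bit mask, meaning when set, meaning when clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]

OOPS_RE = re.compile(r"Oops: ([0-9a-fA-F]{4}) \[#(\d+)\]")
FAULT_RE = re.compile(r"unable to handle kernel paging request at ([0-9a-f]{16})")

# Sample lines taken from the log excerpt in this report.
SAMPLE = """\
Mar 2 14:56:33 pdbsearch11 kernel: [84164.527752] BUG: unable to handle kernel paging request at ffff8801d002c000
Mar 2 14:56:33 pdbsearch11 kernel: [84164.552150] Oops: 0002 [#1] SMP
Mar 2 14:57:38 pdbsearch11 kernel: [84230.092206] BUG: unable to handle kernel paging request at ffff8801d002d000
Mar 2 14:57:38 pdbsearch11 kernel: [84230.093109] Oops: 0002 [#2] SMP
"""


def decode_oops_code(code_hex):
    """Describe the three low bits of a page-fault error code such as '0002'."""
    code = int(code_hex, 16)
    return [on if code & bit else off for bit, on, off in PF_BITS]


def summarize(log_text):
    """Return ([(oops number, code)], sorted unique fault addresses)."""
    oopses = [(int(num), code) for code, num in OOPS_RE.findall(log_text)]
    faults = sorted({int(addr, 16) for addr in FAULT_RE.findall(log_text)})
    return oopses, faults


if __name__ == "__main__":
    oopses, faults = summarize(SAMPLE)
    for number, code in oopses:
        print(f"oops #{number}: {code} = {', '.join(decode_oops_code(code))}")
    for addr in faults:
        print(f"fault at {addr:#018x}")
```

Read this way, `0002` is a kernel-mode write to a page that was not present, and the two sample fault addresses are exactly one 4 KiB page (0x1000) apart.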