Bug 187470
| Field | Value |
|---|---|
| Summary | kernel-smp-2.6.16-1.2069_FC4 crashes with invalid opcode |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | 5 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED INSUFFICIENT_DATA |
| Severity | high |
| Priority | medium |
| Reporter | Piotr Gackiewicz <gacek> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Brian Brock <bbrock> |
| CC | adrian, axel.thimm, bookreviewer, jonstanley, wtogami |
| Whiteboard | MassClosed |
| Doc Type | Bug Fix |
| Last Closed | 2008-01-20 04:38:54 UTC |

Description
Piotr Gackiewicz
2006-03-31 07:32:24 UTC
I have a mail server which isn't coping with the 2.6.16 kernel upgrade. First it printed warnings about SELinux filesystem labels being missing and quota files being missing, and hung rebuilding those. We rebooted with SELinux disabled and it ran ok for a bit over an hour before crashing with a kernel panic. So we've reverted to 2.6.15-1.1833 for the moment. I can't get much debug info as it's a production server.

I've had a second FC4 server barf on 2.6.16-1.2069smp - it starts printing errors immediately after the Nash message, along with a "resume in 119 seconds... resume in 118 seconds" countdown. I've never seen that before. The new kernel runs fine on my desktop - the only unusual thing I can think of about our servers is that they both have everything on i2o raid arrays.

I had the same error (invalid opcode: 0000 [1] SMP) today upgrading to kernel-smp-2.6.16-1.2096_FC5. The machine is a quad Xeon with 6GB RAM and also an i2o raid array. Last entry of /proc/cpuinfo:

processor : 7
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1400.222
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2800.17

Danny Yee, your bug is probably something else. Please take a digital photo of the screen and enter a new bugzilla.

Adrian, the key line from the first bug report was: "Kernel BUG at include/linux/list.h:168". Does your crash have that line? list.h line 168 is list_del(). Probably there is some kind of race condition.

I have the "Kernel BUG at include/linux/list.h:168" line in my oops. I should probably also mention that the error happens during bringup of eth0 (e1000). At least that is the last message of the initscripts before the crash.

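For readers unfamiliar with the list.h check mentioned above, here is a minimal user-space sketch of how a consistency check in a list deletion routine can catch the same entry being removed twice, which is one way a race between two code paths could show up. This is not the kernel's code: the kernel's list is written in C, and the exact test at list.h line 168 in this Fedora kernel is not quoted in this report, so the "neighbours must point back at the entry" check, the `Node` class, and the `ListCorruption` exception are all assumptions made for illustration.

```python
class ListCorruption(Exception):
    """Illustrative stand-in for the kernel BUG() that produces the oops."""


class Node:
    """A node of a circular doubly linked list (hypothetical model of a list entry)."""

    def __init__(self, name):
        self.name = name
        # An empty list head points at itself in both directions.
        self.prev = self
        self.next = self


def list_add(new, head):
    """Insert `new` immediately after `head`."""
    new.prev = head
    new.next = head.next
    head.next.prev = new
    head.next = new


def list_del(entry):
    """Unlink `entry`, refusing to proceed if the list looks inconsistent."""
    if entry.prev is None or entry.next is None:
        raise ListCorruption(f"{entry.name}: entry was already deleted")
    # Assumed form of the sanity check: both neighbours must still point back.
    if entry.prev.next is not entry or entry.next.prev is not entry:
        raise ListCorruption(f"{entry.name}: neighbours do not point back")
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    # Poison the links so that a second deletion is detectable.
    entry.prev = None
    entry.next = None


def double_delete_is_caught():
    """Delete the same entry twice, as two racing code paths might."""
    head, a, b = Node("head"), Node("a"), Node("b")
    list_add(a, head)
    list_add(b, head)
    list_del(a)  # first path removes the entry
    try:
        list_del(a)  # second path tries to remove it again
    except ListCorruption:
        return True
    return False


if __name__ == "__main__":
    print(double_delete_is_caught())
```

Without such a check the second deletion would silently corrupt the neighbouring entries; with it, the failure is reported at the point of the second deletion, which is why the oops names list.h rather than the driver that misused the list.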
The way to verify for sure whether it's the e1000 is to boot with the kernel parameter "single". In single user mode rename your e1000.ko file to e1000.ko.ORIG.

cd /lib/modules/kernel-smp-2.6.16-1.2096_FC5/kernel/drivers/net/e1000/
mv e1000.ko e1000.ko.ORIG

Start normal startup by typing `init 3`. If it still crashes you know that it's not an e1000 bug.

This is a production server and I am not very often at the location where the server is hosted. So I cannot do this test very soon, if at all.

Created attachment 128870
photo of console after crash on boot

I still can't get any 2.6.16 kernel to work. I've just tried it with 2.6.16-1.2107_FC4, 2.6.16-1.2107_FC4smp, 2.6.16-1.2108_FC4, and 2.6.16-1.2108_FC4smp.

Danny Yee, I still think your bug is not related to the list_del() race condition. Please create a new bugzilla entry. Post the dmesg from 2.6.15 and your lspci and what motherboard you are using and attach that photo again.

kernel-smp-2.6.17-1.2142_FC4 seems to run OK. I had a look at the ChangeLog, but did not spot any changes related to this bug. Moreover, I did upgrade the BIOS on my Tyan motherboard. Can someone comment on that? Was it a faulty Tyan BIOS or an SMP race as someone suspected?

It was a bug in the i2o code. It got fixed. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189570

Oddly enough Danny Yee did have the same bug as you did. As soon as I saw his dmesg I realized it was the i2o issue... https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=191357

[This comment added as part of a mass-update to all open FC4 kernel bugs]
FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.

A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.

(this is a mass-close to kernel bugs in NEEDINFO state)
As indicated previously there has been no update on the progress of this bug therefore I am closing it as INSUFFICIENT_DATA. Please re-open if the issue still occurs for you and I will try to assist in its resolution. Thank you for taking the time to report the initial bug. If you believe that this bug was closed in error, please feel free to reopen this bug.