Bug 169816
Summary: | SMP kernel randomly crashes when coming up on Pentium EM64T with hyperthreading enabled
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 4
Hardware: | i386
OS: | Linux
Status: | CLOSED ERRATA
Severity: | high
Priority: | medium
Reporter: | Karl Auerbach <karl>
Assignee: | Dave Jones <davej>
QA Contact: | Brian Brock <bbrock>
CC: | dan, pfrields, wtogami, wuwej
Doc Type: | Bug Fix
Last Closed: | 2006-02-21 02:29:01 UTC
Description
Karl Auerbach
2005-10-03 23:38:41 UTC
Can you try running memtest86 overnight and see if that picks up anything? Random crashes are often a sign of hardware problems. Also check that there's sufficient cooling and a strong enough power supply.

OK, I can try memtest. Which version of memtest would you like me to run (and might you have a bootable .iso for it [the machine does not have a floppy drive])?

However, given that the problem occurs reliably and during boot-up when hyperthreading is enabled, it seems that it is probably unrelated to memory flaws. Also, I was a bit overbroad when I said that the failures were random: there are several kinds of failures, but they seem to recur at the same several spots in the boot sequence.

If memory were flaky then the system ought to be crashing in non-hyperthreading mode as well. However, the box runs rock solid in non-hyperthreading mode.

As for cooling, the CPU is running at about 40C, stable. And the power supply is pretty mongo. This is a Supermicro server box, so it's got reasonably studly engineering margins.

I found a memtest ISO at http://www.memtest86.com/ It's running now with default settings; it's gone through one full pass without errors so far.

OK, memtest has run overnight (with hyperthreading enabled): 85 full passes. Zero errors. So I think we can rule out memory and processor. I believe that we've got a problem related to the kernel's handling of the Pentium 4 EM64T with hyperthreading.

I am able to localize the problem down to a small set of kernel configs. With the following built into the kernel rather than built as loadable modules, the system comes up fine. (I suspect that CONFIG_SCSI_QLA2XXX snuck in by accident.)

< CONFIG_SCSI=y
---
> CONFIG_SCSI=m
773c773
< CONFIG_SCSI_SATA=y
---
> CONFIG_SCSI_SATA=m
776c776
< CONFIG_SCSI_ATA_PIIX=y
---
> CONFIG_SCSI_ATA_PIIX=m
800c800
< CONFIG_SCSI_QLA2XXX=y
---
> CONFIG_SCSI_QLA2XXX=m

2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem. Thank you.

It's not a happy camper. On the Intel P4/64 box the message "i8042.c: Can't read CTR while initializing i8042" still pops out on some boots and not on others. That same system still crashes frequently on the way down when rebooting (assuming that it managed to come up successfully).

Keyboard input from the hardware keyboard, both USB and PS/2, seems lost on the Intel P4/64 box under both the SMP and uniprocessor versions of 1637. But on an AMD/64 dual-core box I get intermittent massive key bounce from 1637. (Keyboard operation returns to normal when I go back to the 1532 version kernel.)

I've had to go back to 1532 non-SMP on the P4/64 box and am putting up with the keyboard bounce of 1637 for the moment on the AMD/64 dual-core box.

Things get amazingly better on the 1632 kernel when I change the BIOS setting for the SATA to be AHCI rather than any of the other modes. With AHCI/SATA the system seems to run reliably, there are no 8042 complaints, and the keyboard works. (There is still an intermittent crash in the NFS unmount when halting the kernel, but it seems to occur in both the SMP and non-SMP kernels.)

I can confirm the bug, with exactly the same behaviour. When HT is enabled, the SMP kernel (2.6.14-1.1637_FC4) crashes.
Motherboard: MSI 945P NEO F
CPU: Intel Pentium 4 630 EM64T
SATA drive
I am not able to change the SATA mode in the BIOS (the option is disabled), so I cannot check comment #8.

I think that I see the same problem on my machine. When it crashes, it is usually during module loading (Detecting hardware ... sound, network, ...).
Motherboard: Intel 945PSN (sandusky)
CPU: Pentium 4 640
SATA HDD, IDE HDD on Promise Ultra66

Created attachment 122644
Boot log from a boot that survived a crash during module loading.
There are three variants of boot: no problem, a small crash after which booting continues, and a crash with a kernel panic.

With the current FC4 kernel 2.6.14-1.1656_FC4smp the situation is still the same.

This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.

I don't want to be too optimistic, but it looks like I can boot without problems with HT enabled in kernels 2.6.15-1.1830_FC4 and 2.6.15-1.1831_FC4.

I have not seen this bug for several FC4 kernel releases. The machine that originally had the problem is happily running 1831 with hyperthreading enabled without a hitch.

Great, thanks for the update.
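
For anyone who wants to repeat the built-in versus module comparison described earlier in this report, here is a minimal Python sketch. It assumes you have the text of two kernel .config files, one from a crashing build and one from a working build. The function names (`parse_config`, `module_to_builtin`) and the inline sample data are hypothetical and only for illustration; the CONFIG_SCSI* option names come from this report, and CONFIG_SMP is added to the sample just to show an unchanged option.

```python
import re

# Matches set options such as "CONFIG_SCSI=y" or "CONFIG_SCSI_SATA=m".
# Comment lines like "# CONFIG_FOO is not set" are deliberately ignored.
_OPTION = re.compile(r"^(CONFIG_\w+)=(.*)$")


def parse_config(text):
    """Return {option: value} for every set option in a .config text."""
    options = {}
    for line in text.splitlines():
        match = _OPTION.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def module_to_builtin(modular_text, builtin_text):
    """List options that are 'm' in the first config and 'y' in the second."""
    modular = parse_config(modular_text)
    builtin = parse_config(builtin_text)
    return sorted(
        name
        for name, value in builtin.items()
        if value == "y" and modular.get(name) == "m"
    )


# Illustrative sample data only, not the reporter's actual config files.
crashing = """CONFIG_SMP=y
CONFIG_SCSI=m
CONFIG_SCSI_SATA=m
CONFIG_SCSI_ATA_PIIX=m
CONFIG_SCSI_QLA2XXX=m
"""
working = crashing.replace("=m", "=y")

if __name__ == "__main__":
    for name in module_to_builtin(crashing, working):
        print(name)
```

In practice you would read the two texts from the real config files instead of the inline strings; the comparison itself stays the same.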