From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc3 Firefox/1.0.7

Description of problem:
Machine: Supermicro 5014C-MF (P8SCT motherboard - http://www.supermicro.com/products/motherboard/P4/E7221/P8SCT.cfm )
CPU: Intel Pentium 4 640 EM64T, 3.2 GHz, 800 MHz FSB, 2 MB L2 cache
Memory: 1 GB non-ECC
Drive: SATA

During boot the kernel crashes randomly - sometimes it even comes up and runs. Crashes range from a simple hang to a stack trace (I don't know how to capture it during boot-up). The stack traces appear to differ from crash to crash.

This only happens with the SMP kernel with hyperthreading enabled; the same kernel runs fine when hyperthreading is turned off in the BIOS. The non-SMP kernel runs fine no matter what the BIOS settings are.

Version-Release number of selected component (if applicable):
kernel 2.6.13-1 (1526_FC4SMP)

How reproducible:
Always

Steps to Reproduce:
1. Ensure that hyperthreading on the Pentium EM64T is enabled
2. Boot the SMP kernel
3. Wait for a crash or hang (happens 90%+ of the time)

Additional info:
I've gone through the litany of BIOS settings - disabling legacy USB, etc. - to see if anything else changes the situation. It seems entirely dependent on whether hyperthreading is enabled or not.
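For step 1 of the reproduction, on a boot that does come up, one rough way to confirm that the running kernel actually sees hyperthreading is to compare the `siblings` and `cpu cores` fields in /proc/cpuinfo. The sketch below is not from the report. It assumes both fields are present, which may not hold on older or uniprocessor kernels, and the function names are made up for illustration.

```python
def logical_vs_cores(cpuinfo_text):
    """Return (siblings, cpu_cores) from the first processor block, or None.

    Assumes /proc/cpuinfo-style "key : value" lines with blank lines
    separating processor blocks.
    """
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # end of the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    try:
        return int(fields["siblings"]), int(fields["cpu cores"])
    except (KeyError, ValueError):
        # Fields missing or malformed: we cannot tell either way.
        return None


def hyperthreading_active(cpuinfo_text):
    """True if more logical CPUs than cores are reported, None if unknown."""
    counts = logical_vs_cores(cpuinfo_text)
    if counts is None:
        return None
    siblings, cores = counts
    return siblings > cores
```

On the affected machine this could be called as `hyperthreading_active(open("/proc/cpuinfo").read())`. A result of `None` only means the fields were not found, not that hyperthreading is off.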
Can you try running memtest86 overnight and see if that picks up anything? Random crashes are often a sign of hardware problems. Also check that there's sufficient cooling and a strong enough power supply.
OK, I can try memtest - which version of memtest would you like me to run (and might you have a bootable .iso for it [the machine does not have a floppy drive])? However, given that the problem occurs reliably and during boot-up when hyperthreading is enabled, it seems that it is probably unrelated to memory flaws. Also, I was a bit overbroad when I said that the failures were random - there are several kinds of failures, but they seem to recur at the same several spots in the boot sequence. If memory were flaky then the system ought to be crashing in non-hyperthreading mode as well. However, the box runs rock solid in non-hyperthreading mode. As for cooling - the CPU is running at about 40C, stable. And the power supply is pretty mongo. This is a Supermicro server box so it's got reasonably studly engineering margins.
I found a memtest ISO at http://www.memtest86.com/ It's running now with default settings - it's gone through one full pass without errors so far.
OK, memtest has run overnight (with hyperthreading enabled) - 85 full passes. Zero errors. So I think we can rule out memory and processor. I believe that we've got a problem related to the kernel's handling of the Pentium 4 EM64T with hyperthreading.
I am able to narrow the problem down to a small set of kernel config options. With the following built into the kernel rather than built as loadable modules, the system comes up fine. (I suspect that CONFIG_SCSI_QLA2XXX snuck in by accident.)

< CONFIG_SCSI=y
---
> CONFIG_SCSI=m
773c773
< CONFIG_SCSI_SATA=y
---
> CONFIG_SCSI_SATA=m
776c776
< CONFIG_SCSI_ATA_PIIX=y
---
> CONFIG_SCSI_ATA_PIIX=m
800c800
< CONFIG_SCSI_QLA2XXX=y
---
> CONFIG_SCSI_QLA2XXX=m
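For anyone wanting to repeat the comparison in the comment above on their own configs, here is a minimal sketch that lists options built in (=y) in a working config but modular (=m) in a failing one. It is an illustration only, not the tool used to produce the diff. The function names are made up, and the sample data contains just the four options quoted above.

```python
import re

# Matches lines such as "CONFIG_SCSI=y" or "CONFIG_SCSI_SATA=m".
_OPTION = re.compile(r"^(CONFIG_\w+)=(.*)$")


def parse_config(text):
    """Return {option: value} for every CONFIG_*=value line in a .config."""
    options = {}
    for line in text.splitlines():
        match = _OPTION.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def builtin_vs_module(working_text, failing_text):
    """Options that are =y in the working config and =m in the failing one."""
    working = parse_config(working_text)
    failing = parse_config(failing_text)
    return sorted(
        name
        for name, value in working.items()
        if value == "y" and failing.get(name) == "m"
    )


if __name__ == "__main__":
    # Only the four options from the diff above; real configs are far longer.
    working = (
        "CONFIG_SCSI=y\nCONFIG_SCSI_SATA=y\n"
        "CONFIG_SCSI_ATA_PIIX=y\nCONFIG_SCSI_QLA2XXX=y\n"
    )
    failing = (
        "CONFIG_SCSI=m\nCONFIG_SCSI_SATA=m\n"
        "CONFIG_SCSI_ATA_PIIX=m\nCONFIG_SCSI_QLA2XXX=m\n"
    )
    for name in builtin_vs_module(working, failing):
        print(name)
```

Comments of the form `# CONFIG_FOO is not set` are skipped, so an option that is disabled in one config and enabled in the other will not show up here. A plain `diff` of the two files still catches those.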
2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem. Thank you.
It's not a happy camper. On the Intel P4/64 box the message "i8042.c: Can't read CTR while initializing i8042" still pops out on some boots and not on others. That same system still crashes frequently on the way down when rebooting (assuming that it managed to come up successfully). Keyboard input from the hardware keyboard, both USB and PS/2, seems to be lost on the Intel P4/64 box under both the SMP and uniprocessor versions of 1637. On an AMD/64 dual-core box, meanwhile, I get intermittent massive key bounce from 1637. (Keyboard operation returns to normal when I go back to the 1532 kernel.) I've had to go back to 1532 non-SMP on the P4/64 box and am putting up with the keyboard bounce of 1637 for the moment on the AMD/64 dual-core box.
Things get amazingly better on the 1632 kernel when I change the BIOS SATA setting to AHCI rather than any of the other modes. With AHCI/SATA the system seems to run reliably, there are no 8042 complaints, and the keyboard works. (There is still an intermittent crash in the NFS unmount when halting the kernel, but it seems to happen with both the SMP and non-SMP kernels.)
I can confirm the bug; the behaviour is exactly the same. When HT is enabled, the SMP kernel (2.6.14-1.1637_FC4) crashes.

Motherboard: MSI 945P NEO F
CPU: Intel Pentium 4 630 EM64T
Drive: SATA

I am not able to change the SATA mode in the BIOS (the option is disabled), so I cannot check comment #8.
I think that I see the same problem on my machine. When it crashes, it is usually during module loading (Detecting hardware ... sound, network, ...).

Motherboard: Intel 945PSN (sandusky)
CPU: Pentium 4 640
Drives: SATA HDD, IDE HDD on Promise Ultra66
Created attachment 122644
boot log with a survived crash during module loading

There are three variants of boot: no problem, a small crash after which booting continues, and a crash with a kernel panic.
With the current FC4 kernel 2.6.14-1.1656_FC4smp the situation is still the same.
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
I don't want to be too optimistic, but it looks like I can boot without problems with HT enabled on kernels 2.6.15-1.1830_FC4 and 2.6.15-1.1831_FC4.
I have not seen this bug for several FC4 kernel releases - the machine that originally had the problem is happily running 1831 with hyperthreading enabled without a hitch.
great, thanks for the update.