Red Hat Bugzilla – Bug 169816
SMP kernel randomly crashes when coming up on Pentium EM64T with hyperthreading enabled
Last modified: 2015-01-04 17:22:28 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc3 Firefox/1.0.7
Description of problem:
Machine: Supermicro 5014C-MF (P8SCT motherboard - http://www.supermicro.com/products/motherboard/P4/E7221/P8SCT.cfm )
CPU: Intel Pentium 4 640 EM64T, 3.2 GHz, 800 MHz FSB, 2 MB L2 cache
Memory: 1 GB non-ECC
During booting the kernel crashes randomly - sometimes it even comes up and runs.
Crashes range from simply hanging to stacktrace (I don't know how to capture it during boot-up). Stacktraces appear to be different from crash to crash.
This only happens with the SMP kernel with hyperthreading enabled; the same kernel runs fine when hyperthreading is turned off in the BIOS. The non-SMP kernel runs fine no matter what the BIOS settings are.
Version-Release number of selected component (if applicable):
kernel 2.6.13-1 (1526_FC4SMP)
Steps to Reproduce:
1. Ensure that hyperthreading on Pentium EM64T is enabled
2. Boot SMP kernel
3. Wait for crash or hang (happens 90%+ of time)
I've gone through the litany of BIOS settings (disabling legacy USB, etc.) to see if anything else changes the situation. It seems entirely dependent on whether hyperthreading is enabled or not.
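For anyone trying to reproduce this, it may help to confirm what the running kernel actually sees before blaming the BIOS setting. The following is a minimal sketch, not part of the original report: `ht_status` is a hypothetical helper that takes the text of /proc/cpuinfo, and it assumes the `siblings` and `cpu cores` fields are present (older kernels may not print `cpu cores`, in which case one core is assumed).

```python
def ht_status(cpuinfo_text):
    """Summarise hyperthreading-related fields from /proc/cpuinfo text."""
    logical = 0
    siblings = None
    cores = None
    ht_flag = False
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "siblings" and value.isdigit():
            siblings = int(value)
        elif key == "cpu cores" and value.isdigit():
            cores = int(value)
        elif key == "flags":
            # The "ht" flag only shows the CPU advertises the feature;
            # it does not by itself prove HT is switched on.
            ht_flag = "ht" in value.split()
    # Assumption: HT is active when a package reports more logical
    # siblings than physical cores. A missing "cpu cores" counts as 1.
    active = siblings is not None and siblings > (cores or 1)
    return {"logical_cpus": logical, "ht_flag": ht_flag, "ht_active": active}
```

On a live system the input would be the contents of /proc/cpuinfo, e.g. `ht_status(open("/proc/cpuinfo").read())`.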
Can you try running memtest86 overnight, and see if that picks up anything?
Random crashes are often a sign of hardware problems.
Also check that there's sufficient cooling, and a strong enough power supply.
OK, I can try memtest - Which version of memtest would you like me to run (and
might you have a bootable .iso for it [the machine does not have a floppy drive])?
However, given that the problem occurs reliably and during boot-up when
hyperthreading is enabled, it seems that it is probably unrelated to memory flaws.
Also I was a bit overbroad when I said that the failures were random - there are
several kinds of failures, but they seem to recur at the same several spots in
the boot sequence.
If memory were flakey then the system ought to be crashing in non-hyperthreading
mode as well. However the box runs rock solid in non-hyperthread mode.
As for cooling - the CPU is running at about 40C, stable. And the power supply
is pretty mongo. This is a Supermicro server box so it's got reasonably studly
I found a memtest iso at http://www.memtest86.com/
It's running now with default settings - it's gone through one full pass without
errors so far.
OK, memtest has run overnight (with hyperthreading enabled) - 85 full passes.
So I think we can rule out memory and processor.
I believe that we've got a problem related to the kernel's handling of the
Pentium 4 EM64T with hyperthreading.
I have been able to narrow the problem down to a small set of kernel config options:
With the following built into the kernel rather than built as loadable modules, the system
comes up fine. (I suspect that CONFIG_SCSI_QLA2XXX snuck in by accident.)
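In case it helps anyone compare a failing and a working build, here is a rough sketch for checking how a given set of options is configured in a kernel .config file. The helper name `config_modes` is made up for illustration, and the option list passed to it is whatever you want to inspect; nothing here reproduces the actual config from this report.

```python
def config_modes(config_text, options):
    """Map each option name to 'built-in', 'module', 'other value' or 'not set'."""
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" are comments in .config.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            values[name] = value
    labels = {"y": "built-in", "m": "module"}
    result = {}
    for opt in options:
        if opt not in values:
            result[opt] = "not set"
        else:
            # Numeric or string options (neither y nor m) land here.
            result[opt] = labels.get(values[opt], "other value")
    return result
```

Running it over the .config of each kernel and diffing the two results should show which drivers moved between `=m` and `=y`.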
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.
It's not a happy camper.
On the intel P4/64 box the message "i8042.c: Can't read CTR while initializing
i8042" still pops out on some boots and not on others.
That same system still crashes frequently on the way down when rebooting
(assuming that it managed to successfully come up.)
Keyboard input from the hardware keyboard, both USB and PS2, seems lost on the
Intel P4/64 box under both the SMP and uniprocessor versions of 1637. But on an
AMD/64 dual-core box I get intermittent massive key bounce from 1637 (Keyboard
operation returns to normal when I go back to the 1532 version kernel.)
I've had to resume 1532 non-smp on the P4/64 box and am putting up with the
keyboard bounce of 1637 for the moment on the AMD/64 dual-core box.
Things get amazingly better on the 1632 kernel when I change the BIOS setting
for the SATA to be AHCI rather than any of the other modes.
With AHCI/SATA the system seems to run reliably, there are no 8042 complaints,
the keyboard works.
(There is still an intermittent crash when halting the kernel in the NFS
unmount, but it seems to be in both the SMP and non-SMP kernels.)
I can confirm the bug, with exactly the same behaviour. When HT is enabled, the SMP
kernel (2.6.14-1.1637_FC4) crashes.
Motherboard MSI 945P NEO F
CPU Intel Pentium 4 630 EM64T
I am not able to change SATA mode in BIOS (option is disabled), so I cannot
check comment #8.
I think that I see the same problem on my machine. When it crashes, it is
usually during the module loading (Detecting hardware ... sound, network, ...)
motherboard Intel 945PSN (sandusky)
CPU Pentium 4 640
SATA HDD, IDE HDD on Promise Ultra66
Created attachment 122644 [details]
boot log with survived crash during module loading
There are three variants of boot: no problem, a small crash after which boot continues, and
a crash with kernel panic.
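To sort a pile of saved boot logs into those three variants, something like the sketch below might do. It is only a guess at useful markers: it assumes a panic shows up as the usual "Kernel panic" line and that a survived crash leaves an "Oops:" or "Unable to handle kernel" line in the log. `classify_boot` is a hypothetical name, and the markers have not been checked against the attached log.

```python
def classify_boot(log_text):
    """Classify a boot log as 'kernel panic', 'crash, boot continued' or 'clean'."""
    lowered = log_text.lower()
    # A panic is checked first because a panic log usually also contains an oops.
    if "kernel panic" in lowered:
        return "kernel panic"
    # Assumed markers for a crash that the kernel survived.
    if "oops:" in lowered or "unable to handle kernel" in lowered:
        return "crash, boot continued"
    return "clean"
```

A hang without any message would still be reported as "clean", so this only helps with the boots that get far enough to log something.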
With the current FC4 kernel 2.6.14-1.1656_FC4smp the situation is still the same.
This is a mass-update to all currently open kernel bugs.
A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
I don't want to be too optimistic, but it looks like I can boot without
problems with HT enabled in kernels 2.6.15-1.1830_FC4 and 2.6.15-1.1831_FC4.
I have not seen this bug for several FC4 kernel releases - the machine that
originally had the problem is happily running 1831 with hyperthreading enabled
without a hitch.
Great, thanks for the update.