Bug 124949
Summary: | System freezes during regular use when using SMP kernel
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 3
Hardware: | i386
OS: | Linux
Status: | CLOSED CANTFIX
Severity: | high
Priority: | medium
Reporter: | Cushing Whitney <cushing>
Assignee: | Dave Jones <davej>
CC: | alan, len.brown, pfrields, starback
Doc Type: | Bug Fix
Last Closed: | 2005-10-03 00:20:23 UTC
Description
Cushing Whitney
2004-06-01 15:56:29 UTC
Created attachment 100743: output of lspci -v
Created attachment 100744: dmesg output when running with UP kernel
Created attachment 100745: dmesg output when running with SMP kernel
Created attachment 100747: Annotated syslog output
This annotated output from syslog spans the system booting into the SMP kernel,
freezing a few minutes later, and then being rebooted into the UP kernel.

Installed kernel 2.6.6-1.427. The SMP kernel still hangs while playing music or during screensavers. The UP kernel still runs flawlessly.

My Athlon MP box throws a fit if it isn't booted with acpi=off, but doesn't get as far as yours without that. Might be worth a first try though.

Thanks, that helped a bit. acpi=off by itself had no effect, but combined with noapic, it increased stability somewhat. Crashes now seem to be occurring within 12-24 hours when using the SMP kernel, as opposed to within 1-2 hours before. Also, the upgrade to 2.6.6-1.435.2.3 had no effect.

I have a similar problem running the SMP version of kernel-smp-2.6.5-1.358 on my Pentium 4 with hyperthreading. However, I did not get a hard freeze. I first noticed it when my keyboard stopped working. It took me a while to realize that it was not my wireless keyboard flaking out, but instead some code in a deadlock or infinite loop. It looked to me as though one of the "processors" was stuck at 100% executing something. However, I could still do any normal operations, except type. I believe, too, that other processes scheduled to use the busy CPU were sleeping while waiting for some CPU time. The solution to the problem is to reboot, although a couple of times the keyboard started working again, which is why I initially thought it was my keyboard. I have since been using the non-SMP kernel, but it tends to freeze hard every once in a while, which is not that helpful.

Alex, can you file yours as a separate bug? The two don't initially sound like related bugs. In the new bug, if you can attach the output of lspci -v, that would also be useful. Finally, if the board is Intel E75xx based, you might want to try turning off USB legacy support in the BIOS and/or booting with acpi=off. I don't think this one is ACPI, however.

Just an additional data point: I tried using nmi_watchdog to force an oops in case the processor was locking up.
However, setting nmi_watchdog=2 in the kernel startup options didn't generate anything (oops or not) when the system froze.

Installed 2.6.8-1.521. Still seeing the same problem with the SMP kernel and no problem with the UP kernel...

My system freezes too.... :-( I'm running FC2 on a dual Xeon 2 GHz server based on the SE7500CW2 motherboard. Kernel 2.6.8-1.610smp with noapic acpi=ht. I got the messages below in the error log right after booting:

  kernel: SMP mptable: checksum error!
  kernel: BIOS bug, MP table errors detected!...
  kernel: ... disabling SMP support. (tell your hw vendor)

After 3 days running fine, it hung 10 minutes ago. I got no error messages in the log regarding this crash.

Re: SE7500CW2
> SMP mptable: checksum error!
Please verify that you're running the latest BIOS: http://support.intel.com/support/motherboards/server/se7500cw2/
If you still have a problem, you'll probably want to file a separate bug, as it is unlikely you've got the same problem as Cushing. Also, if you need either "noapic" or "acpi=ht" to make your machine run properly, that is also a bug.

Installed 2.6.9-1.6_FC2. Basically the same results, but with one new (interesting?) data point:

  2.6.9-1.6_FC2 - fine
  2.6.9-1.6_FC2 noapic acpi=off - fine
  2.6.9-1.6_FC2smp - freeze
  2.6.9-1.6_FC2smp noapic acpi=off - freeze

However, I just found out about the maxcpus and nosmp kernel boot params. Just to test, I tried 2.6.9-1.6_FC2smp with maxcpus=1. Even without the noapic and acpi=off directives, the result was a stable system with no freezes (albeit with only one processor running). Is this important, or does maxcpus=1 just end up recreating the equivalent of a UP kernel? I plan on testing with the nosmp directive this weekend.

Upgraded to Fedora Core 3. Still had the same problem with my default setup. However, upon further research, this is probably an issue with NFS. One detail that I hadn't mentioned before (since it didn't seem relevant) is that my MP3s are shared from my home server via NFS.
After noticing reports of SMP-unsafe behavior in NFS, I decided to try my system without any NFS mounts in the equation. The system is totally stable (2+ days so far) when playing MP3s off a local disk, versus lockups within 2-3 hours when they are retrieved over NFS. The NFS server is running RH7.3. Any ideas about how to get my mounts back without sacrificing stability? Is CIFS more stable under SMP than NFS?

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.

This bug has been automatically closed as part of a mass update. It had been in NEEDINFO state since July 2005. If this bug still exists in current errata kernels, please reopen this bug. There are a large number of inactive bugs in the database, and this is the only way to purge them. Thank you.
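
For anyone repeating the boot-option comparisons reported in this thread (noapic, acpi=off, acpi=ht, maxcpus=1, nosmp, nmi_watchdog=2), it can help to record which of those options the running kernel was actually booted with before noting "fine" or "freeze". The following is only a minimal sketch in Python and is not something the reporters used. The names WATCHED, parse_cmdline and read_running_cmdline are hypothetical, and it assumes a Linux system that exposes the boot command line in /proc/cmdline.

```python
# Hypothetical helper for logging boot options alongside test results.
# The option names are the ones discussed in this bug report; everything
# else (function names, the WATCHED tuple) is an illustrative assumption.
WATCHED = ("acpi", "noapic", "nosmp", "maxcpus", "nmi_watchdog")


def parse_cmdline(cmdline):
    """Return {option: value} for watched options found in a command line.

    Options given as name=value map to the value string; bare flags
    such as noapic map to True. Unwatched tokens are ignored.
    """
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name in WATCHED:
            found[name] = value if sep else True
    return found


def read_running_cmdline(path="/proc/cmdline"):
    """Parse the command line of the currently running kernel (Linux only)."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

Calling parse_cmdline("noapic acpi=off") gives a dictionary with noapic set to True and acpi set to "off", which can be stored next to the kernel version and the observed result for each test run.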