Bug 124949 - System freezes during regular use when using SMP kernel
Summary: System freezes during regular use when using SMP kernel
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
Assignee: Dave Jones
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-06-01 15:56 UTC by Cushing Whitney
Modified: 2015-01-04 22:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-03 00:20:23 UTC
Type: ---
Embargoed:


Attachments
output of lspci -v (4.17 KB, text/plain)
2004-06-01 15:57 UTC, Cushing Whitney
dmesg output when running with UP kernel (10.78 KB, text/plain)
2004-06-01 16:01 UTC, Cushing Whitney
dmesg output when running with SMP kernel (14.98 KB, text/plain)
2004-06-01 16:09 UTC, Cushing Whitney
Annotated syslog output (48.15 KB, text/plain)
2004-06-01 16:27 UTC, Cushing Whitney

Description Cushing Whitney 2004-06-01 15:56:29 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040518 Firefox/0.8

Description of problem:
When running under normal desktop load using the SMP-enabled kernel,
the system freezes on a regular basis. The system becomes completely
frozen (won't respond to input, pings, ctrl-alt-del, etc.) and requires
a power cycle to recover. The freezes seem to occur most often when
either of the following two things is happening:

1. Rhythmbox is playing MP3s
2. Xflame screensaver is running

The system will usually operate for some random period of time
(usually between 15 minutes and an hour) before freezing. 

System Specs:
Gigabyte GA-7DPXDW+ motherboard (AMD 760MPX chipset)
2x Athlon MP 2400+
2x 512 MB Registered/ECC dimms
80 GB Seagate ATA-100 drive on primary IDE channel
USB logitech keyboard
USB MS optical mouse

Troubleshooting information:

When the system freezes, there is no indication in the syslog of any
problem (no oops or other error message).
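
Because the syslog stays silent, one way to at least bracket the time of a freeze is a small heartbeat writer. The following is only a sketch, not something that was run for this report: the log path and the 5-second interval are assumptions, and the function names are made up for illustration. Each timestamp is flushed and fsync'ed so that the last line on disk after the power cycle is close to the moment the machine stopped.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one timestamp line to `path` and force it to disk.

    `now` is seconds since the epoch; None means the current time.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        # fsync so the line survives a hard freeze followed by a power cycle
        os.fsync(fh.fileno())
    return stamp


def run(path="/var/tmp/heartbeat.log", interval=5.0):
    """Write a heartbeat forever; path and interval are assumed values."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

Calling run() from a script started at boot and comparing the last heartbeat with the first syslog line of the next boot gives a rough window for the hang. The log should live on a local disk rather than a network mount.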

Attempted fixes:
I tried making sure there was a PS/2 mouse attached to the machine (as
suggested in
http://www.uwsg.iu.edu/hypermail/linux/kernel/0209.3/0382.html) but it
had no effect. Neither did booting with 'noapic'. However, booting the
UP kernel results in a machine that can play MP3s in rhythmbox for
hours (36 so far) without freezing.


Version-Release number of selected component (if applicable):
kernel-smp-2.6.5-1.358

How reproducible:
Always

Steps to Reproduce:
1. Boot SMP kernel
2. Start rhythmbox or set xflame as screensaver
3. Wait for hang
    

Actual Results:  System hangs

Expected Results:  System doesn't hang

Additional info:

Comment 1 Cushing Whitney 2004-06-01 15:57:44 UTC
Created attachment 100743
output of lspci -v

Comment 2 Cushing Whitney 2004-06-01 16:01:24 UTC
Created attachment 100744
dmesg output when running with UP kernel

Comment 3 Cushing Whitney 2004-06-01 16:09:43 UTC
Created attachment 100745
dmesg output when running with SMP kernel

Comment 4 Cushing Whitney 2004-06-01 16:27:53 UTC
Created attachment 100747
Annotated syslog output

This annotated output from syslog spans the system booting into the SMP kernel,
freezing a few minutes later, and then being rebooted into the UP kernel.

Comment 5 Cushing Whitney 2004-06-13 21:07:31 UTC
Installed kernel 2.6.6-1.427. SMP kernel still hangs while playing
music or during screensavers. UP kernel still runs flawlessly.


Comment 6 Alan Cox 2004-06-19 12:24:57 UTC
My Athlon MP box throws a fit if it isn't booted with acpi=off, but
doesn't get as far as yours without that.  Might be worth a first try
though


Comment 7 Cushing Whitney 2004-07-05 21:44:32 UTC
Thanks, that helped a bit. acpi=off by itself had no effect, but
combined with noapic, it increased stability somewhat. Crashes now
seem to be occurring within 12-24 hours when using the SMP kernel as
opposed to within 1-2 hours before. Also, upgrading to 2.6.6-1.435.2.3
had no effect.

Comment 8 Alex Ward 2004-07-14 15:03:34 UTC
I have a similar problem running the smp version of kernel-smp-2.6.5-1.358 on my 
Pentium 4 with hyperthreading.  However, I did not get a hard freeze.  I first noticed it 
when my keyboard stopped working.  It took me a while to realize that it was not my 
wireless keyboard flaking out, but instead some code in a deadlock or infinite loop.  It 
looked to me as though one of the "processors" was stuck at 100% executing something.  
However, I could still do any normal operations, except type.  I believe too that other 
processes scheduled to use the busy CPU were also sleeping waiting for some CPU time.  
The solution to the problem is to reboot. Although, a couple times, the keyboard would 
start working again which is why I initially thought it was my keyboard.  I have since been 
using the non-smp kernel, but it tends to freeze hard every once in a while, which is not 
that helpful. 

Comment 9 Alan Cox 2004-07-14 15:19:18 UTC
Alex, can you file yours as a separate bug - the two don't initially
sound like related bugs.
In the new bug if you can attach the output of lspci -v that would
also be useful. Finally if the board is Intel E75xx based you might
want to try turning off USB legacy support in the bios and/or booting
with acpi=off. I don't think this one is acpi however


Comment 10 Cushing Whitney 2004-07-29 16:40:53 UTC
Just an additional data point, I tried using nmi_watchdog to force an
oops in case the processor was locking up. However, setting
nmi_watchdog=2 in the kernel startup options didn't generate anything
(oops or not) when the system froze.
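
Before concluding that the watchdog saw nothing, it may be worth confirming that it is ticking at all. A common check is to read the NMI row of /proc/interrupts twice and see whether the per-CPU counts grow. The sketch below is an illustration under assumptions: the function names are invented here, and whether the counts advance at a steady rate depends on the kernel and on the nmi_watchdog mode in use.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts from the text of /proc/interrupts.

    Returns an empty list if no "NMI:" row is present.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for token in fields[1:]:
                # Newer kernels append a text description after the counts
                if not token.isdigit():
                    break
                counts.append(int(token))
            return counts
    return []


def watchdog_ticking(before, after):
    """True if every CPU's NMI count increased between two samples."""
    if not before or len(before) != len(after):
        return False
    return all(b > a for a, b in zip(before, after))


def read_nmi_counts(path="/proc/interrupts"):
    """Convenience wrapper that samples the live file."""
    with open(path) as fh:
        return nmi_counts(fh.read())
```

Taking one sample with read_nmi_counts(), sleeping for a few seconds, and taking a second sample shows whether NMIs are being delivered to both CPUs. If the counts stay at zero, the watchdog was probably never armed, which would explain the missing oops.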

Comment 11 Cushing Whitney 2004-08-30 14:33:57 UTC
Installed 2.6.8-1.521. Still seeing same problem with SMP kernel and
no problem with UP kernel...

Comment 12 Marcelo 2004-10-22 14:36:29 UTC
My system freezes too.... :-(

I'm running FC2 on a dual Xeon 2 GHz server based on an SE7500CW2
motherboard. Kernel 2.6.8-1.610smp with noapic acpi=ht.

I got the messages below in the error log right after booting:

kernel: SMP mptable: checksum error!
kernel: BIOS bug, MP table errors detected!...
kernel: ... disabling SMP support. (tell your hw vendor)

After 3 days running fine, it hung 10 minutes ago.
I got no error messages in the log regarding this crash.

Comment 13 Len Brown 2004-11-30 17:17:08 UTC
Re: SE7500CW2
> SMP mptable: checksum error!

Please verify that you're running the latest BIOS:
http://support.intel.com/support/motherboards/server/se7500cw2/

If you still have a problem, you'll probably want to file
a separate bug, as it is unlikely you've got the same problem
as Cushing.

Also, if you need either "noapic" or "acpi=ht" to make
your machine run properly, that is also a bug.


Comment 14 Cushing Whitney 2004-12-07 17:22:38 UTC
Installed 2.6.9-1.6_FC2. Basically the same results, but with one new
(interesting?) datapoint:

2.6.9-1.6_FC2 - fine
2.6.9-1.6_FC2 noapic acpi=off - fine
2.6.9-1.6_FC2smp - freeze
2.6.9-1.6_FC2smp noapic acpi=off - freeze

However, I just found out about the maxcpus and nosmp kernel boot
params. Just to test, I tried 2.6.9-1.6_FC2smp with maxcpus=1. Even
without the noapic and acpi=off directives, the result was a
stable system with no freezes (albeit with only one processor
running). Is this important, or does maxcpus=1 just end up recreating
the equivalent of a UP kernel? 

I plan on testing with the nosmp directive this weekend.
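
When comparing this many boot configurations, it helps to record which parameters the running kernel actually received, since a typo in the boot loader entry can silently change the test. A minimal sketch, assuming the parameters are read from /proc/cmdline; the helper names and the list of watched parameters are chosen here for illustration.

```python
# Parameters discussed in this bug; the selection is an assumption.
WATCHED = ("noapic", "acpi", "maxcpus", "nosmp", "nmi_watchdog")


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict.

    Bare flags map to True, key=value pairs map to the value string.
    Quoted values containing spaces are not handled by this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def describe(params, names=WATCHED):
    """List the watched parameters that are present, in WATCHED order."""
    found = []
    for name in names:
        if name in params:
            value = params[name]
            found.append(name if value is True else f"{name}={value}")
    return found


def current_boot_params(path="/proc/cmdline"):
    """Read and summarize the command line of the running kernel."""
    with open(path) as fh:
        return describe(parse_cmdline(fh.read()))
```

Logging the result of current_boot_params() next to the kernel version from `uname -r` at the start of each test run keeps the freeze/fine table tied to what was really booted.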


Comment 15 Cushing Whitney 2005-03-29 15:44:52 UTC
Upgraded to Fedora Core 3. Still had the same problem with my default setup. 

However, upon further research, this is probably an issue with NFS. One detail
that I hadn't mentioned before (since it didn't seem relevant) is that my MP3s
are shared from my home server via NFS. After noticing reports of SMP-unsafe
behavior in NFS, I decided to try my system without any mounts in the equation. 

The system is totally stable (2+ days so far) when playing MP3s off of a local
disk, versus lockups within 2-3 hours when being retrieved over NFS. The NFS
server is running RH7.3.

Any ideas about how to get back my mounts without sacrificing stability? Is CIFS
more stable under SMP than NFS?
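
To keep the local-disk and NFS test runs cleanly separated, it can be useful to check which filesystem type a given music directory really lives on. The sketch below parses text in the format of /proc/mounts and picks the longest matching mount point. The function names are invented for illustration, and mount points containing spaces (which /proc/mounts escapes as \040) are not handled.

```python
import os


def parse_mounts(mounts_text):
    """Return (mount_point, fstype) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fstype_for(path, entries):
    """Return the filesystem type of the mount that contains `path`.

    The longest matching mount point wins; on a tie, the later entry
    wins because later mounts shadow earlier ones.
    """
    path = os.path.normpath(path)
    best = None
    for mount_point, fstype in entries:
        mp = os.path.normpath(mount_point)
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        if inside and (best is None or len(mp) >= len(best[0])):
            best = (mp, fstype)
    return best[1] if best else None


def fstype_of_live_path(path, mounts_file="/proc/mounts"):
    """Look up `path` against the mounts of the running system."""
    with open(mounts_file) as fh:
        return fstype_for(path, parse_mounts(fh.read()))
```

If fstype_of_live_path() returns "nfs" for the directory the player is reading from, the run counts as an NFS test; any other value means the files came from somewhere else.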



Comment 16 Dave Jones 2005-07-15 19:39:30 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 17 Dave Jones 2005-10-03 00:20:23 UTC
This bug has been automatically closed as part of a mass update.
It had been in NEEDINFO state since July 2005.
If this bug still exists in current errata kernels, please reopen this bug.

There are a large number of inactive bugs in the database, and this is the only
way to purge them.

Thank you.

