Description of problem:
My server just toasted for the third or fourth time in a few days with a total lockup. The only error message I was able to get from the console (via the serial port) was:

BUG: spinlock lockup on CPU#2, httpd/26200, ffff8102fe8a5600 (Not tainted)
BUG: spinlock lockup on CPU#3, classifieds.cgi/27066, ffff8102fe8a5600 (Not tainted)

These lockups have all occurred during heavy disk activity such as tar'ing or cp'ing lots of files. Doing so seems to cause the load to spike considerably, and then the server hangs.

In one previous case I saw a kernel trace on the console which included some references to ext3 and jbd, but I've not been able to catch a copy of it. The most recent crash resulted only in the above error messages being displayed on the serial console.

Version-Release number of selected component (if applicable):
kernel.2.6.20-1.2312.fc5

I also had the same crashes, with ext3 kernel messages, for:
kernel.2.6.19-1.2288.2.4.fc5

I've had a similar crash for 2.6.20-1.2307.fc5, but was unable to retrieve any messages from the console for this one. That's what I'm running now.

How reproducible:
tar or cp lots of files, look at the console.

Steps to Reproduce:
1. Boot up
2. tar lots of files
3. watch it crash

Actual results:
The server locks up and needs to be rebooted.

Expected results:
The server doesn't fall over, and has the hundreds of days of rock-solid uptime that the zealots claim makes Linux so much better than Windows.

Additional info:
I realize that this sucks as far as error messages go, but I'll update if I can get a backtrace.

This is a 2x Dual-Core AMD Opteron(tm) Processor 2220 SE box from Penguin with an Adaptec RAID controller using the aacraid driver (1.1-5[2423]-mh3).

If anyone has any tips on boot cmdline params to use to make this thing more stable, I'd be appreciative.

dmesg for 2.6.20-1.2307.fc5 attached

Colin
Created attachment 152826
dmesg for 2.6.20-1.2307.fc5
Which Adaptec card is the server using? Does it have the latest BIOS/firmware update installed? (Version 8832 is over a year old.)
It's an Adaptec 2130S with BIOS v8832. I had looked into upgrading half a year ago when I first got this server as I'd been having problems with XFS crashes, but at the newer BIOS's came with big warnings about being unstable, and the most common response was that it was XFS. Now it is running ext3 and had been stable for a month.

It just crashed a few minutes ago with this stack trace, but looking at the trace, it appears to be a network card related issue this time. Our monitoring shows that the load was twice the usual, but still only averaging about 1.2 for an hour.

BUG: spinlock bad magic on CPU#0, swapper/0 (Not tainted)
 lock: ffff81019cee3470, .magic: ffffffff, .owner: /0, .owner_cpu: -1662110600

Call Trace:
 <IRQ> [<ffffffff802076e5>] _raw_spin_lock+0x1e/0xe9
 [<ffffffff80260adb>] _spin_lock_irqsave+0x9/0xe
 [<ffffffff8022d9cc>] __wake_up+0x22/0x4f
 [<ffffffff80251927>] sk_stream_write_space+0x5c/0x82
 [<ffffffff8021b86b>] tcp_rcv_established+0x851/0x8fe
 [<ffffffff8023a0d8>] tcp_v4_do_rcv+0x1b5/0x4cf
 [<ffffffff80227188>] tcp_v4_rcv+0x95d/0x9f1
 [<ffffffff80428a04>] ip_local_deliver_finish+0x0/0x1fd
 [<ffffffff802335a5>] ip_local_deliver+0x1b1/0x275
 [<ffffffff80234679>] ip_rcv+0x497/0x4de
 [<ffffffff802201cf>] netif_receive_skb+0x34f/0x3d9
 [<ffffffff880f707d>] :forcedeth:nv_napi_poll+0x438/0x54a
 [<ffffffff8020c4d3>] net_rx_action+0xa8/0x1ad
 [<ffffffff880f56d9>] :forcedeth:nv_nic_irq+0x1a7/0x23e
 [<ffffffff80211fa5>] __do_softirq+0x55/0xc3
 [<ffffffff8025b23c>] call_softirq+0x1c/0x28
 [<ffffffff802685b7>] do_softirq+0x2c/0x85
 [<ffffffff8026875c>] do_IRQ+0x14c/0x16d
 [<ffffffff80266fb1>] default_idle+0x0/0x3d
 [<ffffffff8025a631>] ret_from_intr+0x0/0xa
 <EOI> [<ffffffff80266fda>] default_idle+0x29/0x3d
 [<ffffffff80246a20>] cpu_idle+0x8c/0xaf
 [<ffffffff805e4792>] start_kernel+0x236/0x23b
 [<ffffffff805e415c>] _sinittext+0x15c/0x160
Incidentally, I'm not trying to be an idiot here, I just didn't get coffee yet. That should read "but at the time, the newer RAID card BIOS's came with dire warnings about being unstable".

I also realize that this stack trace shows nothing to do with a file system issue. But the server has crashed each time a heavy load was placed on it via tar/cp, and this has happened with a number of the recent RH kernel RPMs, hence my original post. I'm not sure whether this stack trace simply confuses the issue, but since I had it, I thought I would include it anyway.

Unfortunately this is currently a production server, so much as I would like to try breaking it, I can't at the moment as its replacement is still being shipped. Once it arrives I'll probably end up moving customers onto that box, and then I'll be able to test more readily.

Colin.
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change the bug to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out
http://fedoraproject.org/wiki/BugZappers
This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug, and we are sorry it could not be fixed.