From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/106.2 (KHTML, like Gecko) Safari/100.1

Description of problem:
After a few minutes of NFS client activity, the system deadlocks. The only response from the system is to ping (no VGA, mouse, CTRL-ALT-DEL, etc.). The system must be power cycled to reboot. Reproducible on the uniprocessor kernel also; it just takes longer.

Version-Release number of selected component (if applicable):
errata kernels 2135, 2138, 2140

How reproducible:
Always

Steps to Reproduce:
1. mount server:/somefs /mnt/nfs
2. tar -cvf /dev/null /mnt/nfs
3. wait a few minutes

Actual Results:
system deadlocks

Expected Results:
system should complete the copy, and continue to live.

Additional info:
Current FC1 load, with all current updates (as of 8 Jan 04) applied, including kernel-2.4.22-1.2140. Machines tested include a Dell PowerEdge Workstation 330 (uniprocessor Xeon) and an IBM IntelliStation Z Pro (uniprocessor Xeon w/ hyperthreading).

Suggestions for better methods of data collection welcome; I can't get an OOPS since the screen blanks... :/
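For anyone else trying to reproduce this, the steps above could be wrapped in a small script along the following lines. This is only a sketch: the `NFS_EXPORT`, `MOUNT_POINT` and `DRY_RUN` variables and the `run`/`repro` helpers are names made up for illustration, and the export and mount point simply repeat the placeholders from the report. It defaults to a dry run that prints the commands, since the real thing needs root and is expected to hang the machine.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report.
# NFS_EXPORT, MOUNT_POINT and DRY_RUN are illustrative names, not from the report.
NFS_EXPORT="${NFS_EXPORT:-server:/somefs}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/nfs}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1 and 2 of the report; step 3 ("wait a few minutes") is left to the tester.
repro() {
    run mount "$NFS_EXPORT" "$MOUNT_POINT"
    run tar -cvf /dev/null "$MOUNT_POINT"
}

repro
```

Running it with `DRY_RUN=0` as root performs the actual mount and tar; only do that on a machine you are prepared to power cycle.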
I have a dual 2.2 GHz Xeon (hyperthreading disabled) with FC1, kernel 2115. It worked for over 25 days with NFS and SMB. It finally hung last night after I disconnected the cable to the NFS server and ran df. A message was posted to the Fedora mailing list. I am upgrading to the 2140 kernel and will re-test (and will use Stress). Please CC me on resolution.
ncb.edu: from reading your report, it sounds like 2115 was OK and this is a regression in the errata kernels. Is that correct? Wade, any luck with the errata kernel? It's possible there are two bugs here. There are a number of other similar bugs with SMP deadlocks.
Not sure if 2115 was good or not. Will try that, and vanilla 2.4.22. Should I build 2.4.22 with gcc from gcc-3.3.2-1.i386.rpm or from gcc32-3.2.3-6.i386.rpm? Another data point: the hangs may occur during unmount, if that happens to tickle any neurons...
Use gcc32 for kernel builds, though you could just grab the binaries in this case. Sounds like this may be another instance of bug #109497.
I built a 2.4.24 kernel (but used the stock gcc 3.3). It has been up since yesterday, but not heavily loaded. Should I rebuild using gcc32? I assume this is simply editing 2.4.24/Makefile and changing HOSTCC = gcc32, correct? Is there anything I should turn off in the BIOS (e.g., USB, ACPI, hyperthreading)? If there are other SMP deadlocks in the current "stable" kernel, any idea what they are and what the schedule to resolve them is? Also, I assume I should enable nmi_watchdog=1 and SysRq? Also: I would recommend this bug be marked as a duplicate of bug 109497, as that bug thread discusses similar problems (I posted there as well). BTW, this reminds me of the SMP problems I had in the 2.2.13-15 timeframe....
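On the nmi_watchdog and SysRq question, a quick read-only check of whether either is already active might look like the sketch below. It assumes the usual `/proc/sys/kernel/sysrq` and `/proc/cmdline` files are present; the `SYSRQ_FILE` and `CMDLINE_FILE` variables and the two function names are made up for illustration. It changes nothing on the system.

```shell
#!/bin/sh
# Read-only check of the two debugging aids mentioned above.
# SYSRQ_FILE and CMDLINE_FILE are illustrative overrides, not standard variables.
SYSRQ_FILE="${SYSRQ_FILE:-/proc/sys/kernel/sysrq}"
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

# Report whether the magic SysRq key is enabled (any non-zero value).
sysrq_state() {
    if [ -r "$SYSRQ_FILE" ]; then
        value=$(cat "$SYSRQ_FILE")
        if [ "$value" = "0" ]; then
            echo "sysrq: disabled"
        else
            echo "sysrq: enabled ($value)"
        fi
    else
        echo "sysrq: unknown"
    fi
}

# Report whether nmi_watchdog= was passed on the kernel command line.
nmi_watchdog_state() {
    if [ -r "$CMDLINE_FILE" ] && grep -q 'nmi_watchdog=' "$CMDLINE_FILE"; then
        echo "nmi_watchdog: set on kernel command line"
    else
        echo "nmi_watchdog: not set on kernel command line"
    fi
}

sysrq_state
nmi_watchdog_state
```

If SysRq shows as disabled, writing 1 to `/proc/sys/kernel/sysrq` as root enables it until the next reboot; `nmi_watchdog=1` has to be added to the kernel line in the boot loader configuration.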
*** This bug has been marked as a duplicate of 109497 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.