Description of problem:
Running the test 'mtest01 -p80 -w' from the Linux Test Project repeatedly locks up my machine.

* Happens after 15 to 30 minutes
* Network is down, input isn't possible anymore
* No output, nothing in the logs

Hardware:
* Motherboard Tyan S2927
* 2x Opteron 2212
* 8GB RAM + 8GB swap

Version-Release number of selected component (if applicable):
LTP version: 20080131

Failing kernel package:
* kernel-2.6.23.15-137.fc8

Other kernels tested that don't show this behavior:
* kernel-2.6.25-0.40.rc1.git2.fc9
* kernel-2.6.24-0.102.rc5.git3.fc9 (built from CVS)
* kernel-2.6.24.2-4.fc8 (built from CVS)

Steps to Reproduce:
* Fetch LTP sources
* Run make; make install
* Do something along the lines of:

    while true; do
        /root/ltp-full-20080131/testcases/bin/mtest01 -p80 -w
        sleep 5
    done

Additional info:
I gave up tracking this down in detail. An update to a 2.6.24-based kernel, which seems to be planned, would fix this anyway. The box is available for further testing, though.
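Since the box hangs without leaving anything in the logs, one way to know how long a run survived is to write a timestamped line per iteration and flush it to disk. The following is only a sketch of the reproduction loop above with that logging added. The helper name repro_loop, its arguments, and the log path are assumptions for illustration and are not part of LTP.

```shell
# Hypothetical helper (not part of LTP): run a command repeatedly and append
# one timestamped line per iteration, so the last completed run is known
# after a hard lockup.
# Usage: repro_loop LOGFILE MAX_ITERATIONS PAUSE_SECONDS COMMAND [ARGS...]
# MAX_ITERATIONS=0 means loop forever.
repro_loop() (
    log=$1; max=$2; pause=$3
    shift 3
    i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') iteration=$i exit=$status" >> "$log"
        sync    # flush to disk; buffered lines would be lost in a lockup
        sleep "$pause"
    done
)
```

For the case in this report it could be called as: repro_loop /root/mtest01.log 0 5 /root/ltp-full-20080131/testcases/bin/mtest01 -p80 -w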
2.6.24.2-7 has been submitted to updates-testing.
Damn.

* 2.6.24.2-7 failed after 18 min
* Retesting with kernel-2.6.24.2-4.fc8 failed, too
* Currently running the test on kernel-2.6.25-0.40.rc1.git2.fc9 again
  - looks good so far (35 mins)
  - I'll keep it running at least another 9 hours

Maybe I forgot to rebuild LTP with the appropriate kernel-headers installed. Will do some more investigation as my time permits.
(In reply to comment #2)
> * Currently running the test on kernel-2.6.25-0.40.rc1.git2.fc9 again

It ran for more than 12 hours without failure.

> Maybe I forgot to rebuild LTP with the appropriate kernel-headers installed.

Verified this: It doesn't matter which one is installed. No clue why 2.6.24-4 did work once.

Over the weekend I ran a bisect between 2.6.24 and 2.6.25-rc1 (Linus' tree) with configs derived off of Fedora kernel ones. I think I've found the commit that fixed this problem, although it's quite surprising to me:

commit 66ac831e03879c3c7dae76f793e6094e407081d2
Author: Greg Kroah-Hartman <gregkh>
Date:   Fri Nov 2 13:20:40 2007 -0700

    kset: convert efivars to use kset_create for the vars sub-subsystem.

    Dynamically create the kset instead of declaring it statically.

This one ran for more than 9 hours without failure, whereas the commit before (89a07e34b16d9dcdf0a9ada3ca0c9a506b490c8f) failed during the first iteration, and the third one before (334c6307543a2b8af730a422f466d5f9442b606a) failed after 1 hour.

I have no clue why efivars.c does matter on a system without EFI. I'm going to put Matt Domsch on CC to get more info.

I saved the bisect log and a hand pasted scribbling of the proceedings in case you need it. All of the 14 kernel builds are still available on the test box, too.
(In reply to comment #3)
> I have no clue why efivars.c does matter on a system without EFI. I'm going to
> put Matt Domsch on CC to get more info.

Sorry for the noise Matt, I messed something up. It cannot and does not have anything to do with EFI.

The bug gets triggered with configs derived from Fedora 2.6.24, and goes away with configs derived from Fedora 2.6.25-rc1. Obviously, I switched between those in the middle of the bisect.

Back to try and fail.
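Since the difference came from the config and not from the commit, listing the options that differ between the two config files narrows down the candidates. Below is a minimal sketch of such a comparison. The helper name config_diff is an assumption, and it treats the usual "# CONFIG_X is not set" lines as CONFIG_X=n so that an option switched off in one file and on in the other shows up as a change.

```shell
# Hypothetical helper: print CONFIG_ options whose value differs between two
# kernel .config files. Lines starting with "<" come from the first file,
# lines starting with ">" from the second.
config_diff() (
    normalize() {
        # Keep only option lines; rewrite "is not set" comments as =n.
        sed -n \
            -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/p' \
            -e 's/^\(CONFIG_[A-Za-z0-9_]*=.*\)$/\1/p' "$1" | sort
    }
    a=$(mktemp); b=$(mktemp)
    normalize "$1" > "$a"
    normalize "$2" > "$b"
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
)
```

Usage would be along the lines of: config_diff config-2.6.24 config-2.6.25-rc1 (both file names are placeholders for the two derived configs).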
Now I'm absolutely sure about what... well, seems to hide the problem. With the following debug options turned on my box doesn't lock up anymore:

CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_STACKTRACE=y

Any suggestions how to proceed further?
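To confirm which of these options a given kernel was actually built with, its config file can be checked directly. This is a small sketch, assuming a hypothetical helper named config_status and assuming the config is available as a file (Fedora normally installs one as /boot/config-<version>).

```shell
# Hypothetical helper: report whether each named option is enabled (=y or =m)
# in a kernel .config file. Option names are given without the CONFIG_ prefix.
# Usage: config_status CONFIGFILE OPTION...
config_status() (
    file=$1; shift
    for opt in "$@"; do
        if grep -q "^CONFIG_${opt}=[ym]$" "$file"; then
            echo "$opt=on"
        else
            echo "$opt=off"
        fi
    done
)
```

For the running kernel that would be something like: config_status "/boot/config-$(uname -r)" DEBUG_SPINLOCK DEBUG_MUTEXES DEBUG_LOCK_ALLOC LOCKDEP STACKTRACE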
(In reply to comment #5)
> Now I'm absolutely sure about what... well, seems to hide the problem. With the
> following debug options turned on my box doesn't lock up anymore:
>
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_LOCKDEP=y
> CONFIG_STACKTRACE=y
>
> Any suggestions how to proceed further?

Try disabling them one at a time until it breaks...
(In reply to comment #6)
> Try disabling them one at a time until it breaks...

First I disabled CONFIG_DEBUG_LOCK_ALLOC, CONFIG_LOCKDEP, and CONFIG_STACKTRACE, as these depend on the other two. This configuration ran for 9+ hours.

Then I additionally disabled CONFIG_DEBUG_SPINLOCK and this one broke after 10 minutes.

So I think CONFIG_DEBUG_SPINLOCK it is.
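Switching options off one at a time can be scripted against the .config before each rebuild. The sketch below assumes a hypothetical helper named config_disable. It only edits the file, so "make oldconfig" should still be run afterwards to let Kconfig re-resolve dependent options such as the ones mentioned above.

```shell
# Hypothetical helper: turn one option off in a kernel .config, in place.
# The option name is given without the CONFIG_ prefix.
# Usage: config_disable CONFIGFILE OPTION
config_disable() (
    file=$1; opt=$2
    tmp=$(mktemp)
    # Replace "CONFIG_X=<anything>" with the conventional "is not set" comment.
    sed "s/^CONFIG_${opt}=.*$/# CONFIG_${opt} is not set/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rm -f "$tmp"
)
```

For the last step described above that would be: config_disable .config DEBUG_SPINLOCK, followed by make oldconfig and a rebuild.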
X86 spinlocks have been rewritten in 2.6.25-rc. Can you try that with DEBUG_SPINLOCK disabled? 2.6.24.3-12.fc8.i686 seems to be okay, it ran for over an hour without any problems.
(In reply to comment #8)
> X86 spinlocks have been rewritten in 2.6.25-rc. Can you try that with
> DEBUG_SPINLOCK disabled?

Linus' tree as of yesterday seems to work. It ran for more than 10 hours now.
(commit 076d84bbdb396360d16aaa108c55aa1e24ad47a3)

> 2.6.24.3-12.fc8.i686 seems to be okay, it ran for over an hour without any problems.

I'll try that one during the weekend.
(In reply to comment #9)
> > 2.6.24.3-12.fc8.i686 seems to be okay, it ran for over an hour without any
> problems.

No, 2.6.24.3-12.fc8.x86_64 locked up after 35 minutes here. Looks like another round of bisect...
But i686 seems to be okay.
Can you try the NMI watchdog to see if it catches the hang? (Try booting with either "nmi_watchdog=1" or "nmi_watchdog=2" option.)
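After rebooting with one of these options, it is worth checking that the kernel actually received it. The sketch below only parses a command line string. The helper name nmi_watchdog_mode is an assumption, and the string would normally come from /proc/cmdline.

```shell
# Hypothetical helper: extract the nmi_watchdog= value from a kernel command
# line string. Prints the value of the last occurrence, or "unset" if absent.
# Usage: nmi_watchdog_mode "CMDLINE"
nmi_watchdog_mode() (
    set -f              # no globbing while splitting the command line
    mode=unset
    for word in $1; do
        case $word in
            nmi_watchdog=*) mode=${word#nmi_watchdog=} ;;
        esac
    done
    echo "$mode"
)
```

On the test box this could be called as: nmi_watchdog_mode "$(cat /proc/cmdline)". If the watchdog is active, the NMI row in /proc/interrupts should also keep increasing.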
Created attachment 296710
Serial console output

Yes! nmi_watchdog caught it. See attachment.

Bisect does indeed indicate that the spinlock changes from Nick Piggin are what gets 2.6.25-rc to work (commit 314cdbefd1fd0a7acf3780e9628465b77ea6a836). At least I'm two builds away from it now.
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.