Description of problem:
Running the test 'mtest01 -p80 -w' from the Linux Test Project repeatedly in a
loop locks up my machine.
* Happens after 15 to 30 minutes
* Network is down, input isn't possible anymore
* No output, nothing in the logs
* Motherboard Tyan S2927
* 2x Opteron 2212
* 8GB RAM + 8GB swap
Version-Release number of selected component (if applicable):
LTP version: 20080131
Failing kernel package:
Other kernels tested that don't show this behavior:
* kernel-2.6.24-0.102.rc5.git3.fc9 (built from CVS)
* kernel-18.104.22.168-4.fc8 (built from CVS)
Steps to Reproduce:
* Fetch LTP sources
* Run make; make install
* Do something along the lines of:
while true; do
/root/ltp-full-20080131/testcases/bin/mtest01 -p80 -w
done
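For anyone trying to reproduce this: since the lockup leaves nothing in the logs, it may help to print a timestamp before every iteration so that the last line on a serial console shows roughly when the box died. A minimal sketch, assuming a POSIX shell; the function name run_repeatedly is made up for illustration and is not part of LTP:

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP): run a command repeatedly and log
# each iteration.
# Usage: run_repeatedly MAX_ITERATIONS COMMAND [ARGS...]
# MAX_ITERATIONS=0 means: loop until the command fails or the box locks up.
run_repeatedly() {
    max=$1
    shift
    i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # Print the timestamp first, so the last visible line shows
        # roughly when the machine stopped responding.
        echo "$(date '+%Y-%m-%d %H:%M:%S') iteration $i"
        "$@" || { echo "iteration $i exited with status $?"; return 1; }
    done
    return 0
}
```

It would be called as `run_repeatedly 0 /root/ltp-full-20080131/testcases/bin/mtest01 -p80 -w`, ideally with the output going to a serial console rather than to a local file that may never be synced.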
I gave up tracking this down in detail. An update to a 2.6.24-based kernel, which
seems to be planned, would fix this anyway. The box is available for further
22.214.171.124-7 has been submitted to updates-testing.
* 126.96.36.199-7 failed after 18 min
* Retesting with kernel-188.8.131.52-4.fc8 failed, too
* Currently running the test on kernel-2.6.25-0.40.rc1.git2.fc9 again
- looks good so far (35 mins)
- I'll keep it running at least another 9 hours
Maybe I forgot to rebuild LTP with the appropriate kernel-headers installed.
Will do some more investigation as my time permits.
(In reply to comment #2)
> * Currently running the test on kernel-2.6.25-0.40.rc1.git2.fc9 again
It ran for more than 12 hours without failure.
> Maybe I forgot to rebuild LTP with the appropriate kernel-headers installed.
Verified this: it doesn't matter which one is installed. No clue why 2.6.24-4
worked once.
Over the weekend I ran a bisect between 2.6.24 and 2.6.25-rc1 (Linus' tree) with
configs derived from the Fedora kernel ones. I think I've found the commit that
fixed this problem, although it's quite surprising to me:
Author: Greg Kroah-Hartman <firstname.lastname@example.org>
Date: Fri Nov 2 13:20:40 2007 -0700
kset: convert efivars to use kset_create for the vars sub-subsystem.
Dynamically create the kset instead of declaring it statically.
This one ran for more than 9 hours without failure, whereas the commit before
(89a07e34b16d9dcdf0a9ada3ca0c9a506b490c8f) failed during the first iteration,
and the third one before (334c6307543a2b8af730a422f466d5f9442b606a) failed after
I have no clue why efivars.c matters on a system without EFI. I'm going to
put Matt Domsch on CC to get more info.
I saved the bisect log and hand-typed notes of the proceedings in case
you need them. All of the 14 kernel builds are still available on the test box, too.
(In reply to comment #3)
> I have no clue why efivars.c does matter on a system without EFI. I'm going to
> put Matt Domsch on CC to get more info.
Sorry for the noise, Matt, I messed something up. It cannot and does not have
anything to do with EFI.
The bug gets triggered with configs derived from Fedora 2.6.24, and goes away
with configs derived from Fedora 2.6.25-rc1. Obviously, I switched between those
in the middle of the bisect.
Back to trial and error.
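Since the result depended on which config a build was derived from, listing the options that differ between the two configs narrows down the candidates. A rough sketch, assuming a POSIX shell with mktemp available; config_diff is a hypothetical helper name, and the file names in the usage line are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: list options that are set in only one of two
# kernel config files.
# Usage: config_diff OLD_CONFIG NEW_CONFIG
# Lines prefixed "<" are set only in OLD_CONFIG, ">" only in NEW_CONFIG.
config_diff() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # Keep only "CONFIG_FOO=value" lines. A "# CONFIG_FOO is not set"
    # line is treated the same as the option being absent.
    grep '^CONFIG_' "$1" | sort > "$old_sorted"
    grep '^CONFIG_' "$2" | sort > "$new_sorted"
    comm -23 "$old_sorted" "$new_sorted" | sed 's/^/< /'
    comm -13 "$old_sorted" "$new_sorted" | sed 's/^/> /'
    rm -f "$old_sorted" "$new_sorted"
}
```

For example, `config_diff config-2.6.24 config-2.6.25-rc1` prints one line per differing option. An option whose value changed shows up twice, once with each prefix.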
Now I'm absolutely sure about what... well, seems to hide the problem. With the
following debug options turned on my box doesn't lock up anymore:
Any suggestions how to proceed further?
(In reply to comment #5)
> Now I'm absolutely sure about what... well, seems to hide the problem. With the
> following debug options turned on my box doesn't lock up anymore:
> Any suggestions how to proceed further?
Try disabling them one at a time until it breaks...
(In reply to comment #6)
> Try disabling them one at a time until it breaks...
First I disabled CONFIG_DEBUG_LOCK_ALLOC, CONFIG_LOCKDEP, and CONFIG_STACKTRACE,
as these depend on the other two. This configuration ran for 9+ hours. Then I
additionally disabled CONFIG_DEBUG_SPINLOCK and this one broke after 10 minutes.
So I think CONFIG_DEBUG_SPINLOCK it is.
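The one-at-a-time approach can be scripted so that each test build differs from the previous one by exactly one option. A minimal sketch, assuming a POSIX shell; disable_option is a hypothetical helper, not something shipped with the kernel tree:

```shell
#!/bin/sh
# Hypothetical helper: turn off one option in a kernel .config file.
# Usage: disable_option CONFIG_FILE OPTION_NAME   (name without "CONFIG_")
# Rewrites "CONFIG_<NAME>=..." as "# CONFIG_<NAME> is not set".
disable_option() {
    tmp=$(mktemp) || return 1
    # The "=" right after the name keeps longer names that merely share
    # the same prefix from matching.
    sed "s/^CONFIG_$2=.*/# CONFIG_$2 is not set/" "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

After `disable_option .config DEBUG_SPINLOCK`, run `make oldconfig` and check the resulting .config: an option that another enabled option selects can be switched back on, which is why the dependent options have to be disabled first.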
X86 spinlocks have been rewritten in 2.6.25-rc. Can you try that with
DEBUG_SPINLOCK disabled?
184.108.40.206-12.fc8.i686 seems to be okay, it ran for over an hour without any problems.
(In reply to comment #8)
> X86 spinlocks have been rewritten in 2.6.25-rc. Can you try that with
> DEBUG_SPINLOCK disabled?
Linus' tree as of yesterday seems to work. It ran for more than 10 hours now.
> 220.127.116.11-12.fc8.i686 seems to be okay, it ran for over an hour without any
> problems.
I'll try that one during the weekend.
(In reply to comment #9)
> > 18.104.22.168-12.fc8.i686 seems to be okay, it ran for over an hour without any
> > problems.
No, 22.214.171.124-12.fc8.x86_64 locked up after 35 minutes here. Looks like another
round of bisect...
But i686 seems to be okay.
Can you try the NMI watchdog to see if it catches the hang? (Try booting with
either "nmi_watchdog=1" or "nmi_watchdog=2" option.)
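Before starting a long test run, it is worth confirming that the option actually made it onto the kernel command line. A small sketch, assuming a POSIX shell; nmi_watchdog_mode is a hypothetical helper name, and the optional file argument exists only so it can be tried on a saved command line:

```shell
#!/bin/sh
# Hypothetical helper: report the nmi_watchdog= boot option.
# Usage: nmi_watchdog_mode [CMDLINE_FILE]
# Prints the option's value, or "unset" if it is absent.
# CMDLINE_FILE defaults to /proc/cmdline.
nmi_watchdog_mode() {
    file=${1:-/proc/cmdline}
    # Split the command line into one word per line, keep the value of
    # the last nmi_watchdog= occurrence.
    mode=$(tr ' ' '\n' < "$file" | sed -n 's/^nmi_watchdog=//p' | tail -n 1)
    echo "${mode:-unset}"
}
```

Once the option is confirmed, the NMI line in /proc/interrupts (`grep NMI /proc/interrupts`) should show counts that keep increasing while the watchdog is active.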
Created attachment 296710 [details]
Serial console output
Yes! nmi_watchdog caught it. See attachment.
Bisect does indeed indicate that the spinlock changes from Nick Piggin are
what make 2.6.25-rc work (commit 314cdbefd1fd0a7acf3780e9628465b77ea6a836). At
least I'm two builds away from it now.
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '8'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 8's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 8 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.