Bug 433192 - Consecutively running LTP mtest01 results in lock up
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: http://prdownloads.sourceforge.net/lt...
Reported: 2008-02-17 06:22 EST by Frank Arnold
Modified: 2009-01-09 02:39 EST
Last Closed: 2009-01-09 02:39:45 EST

Attachments:
Serial console output (11.60 KB, text/plain), 2008-03-04 03:38 EST, Frank Arnold

Description Frank Arnold 2008-02-17 06:22:19 EST
Description of problem:
Running the test 'mtest01 -p80 -w' from the Linux Test Project repeatedly locks up
my machine.
* Happens after 15 to 30 minutes
* Network is down, input isn't possible anymore
* No output, nothing in the logs

Hardware:
* Motherboard Tyan S2927
* 2x Opteron 2212
* 8GB RAM + 8GB swap

Version-Release number of selected component (if applicable):
LTP version: 20080131

Failing kernel package:
* kernel-2.6.23.15-137.fc8

Other kernels tested that don't show this behavior:
* kernel-2.6.25-0.40.rc1.git2.fc9
* kernel-2.6.24-0.102.rc5.git3.fc9 (built from CVS)
* kernel-2.6.24.2-4.fc8 (built from CVS)

Steps to Reproduce:
* Fetch LTP sources
* Run make; make install
* Run something along the lines of:
  while true; do
      /root/ltp-full-20080131/testcases/bin/mtest01 -p80 -w
      sleep 5
  done
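
Editor's sketch (not part of the original report): because the lock-up leaves nothing in the logs, it can help to record when each iteration started, so the time to failure can be read off after a reboot. The helper name repro_loop, the SLEEP_SECS variable, and the log format are assumptions made for illustration only.

```shell
# repro_loop MAX_ITER LOGFILE COMMAND [ARGS...]
#   Runs COMMAND repeatedly. Before each run, one line with the iteration
#   number and a timestamp is appended to LOGFILE and flushed to disk.
#   MAX_ITER=0 means "loop until the machine hangs or the user interrupts".
#   SLEEP_SECS (hypothetical knob, default 5) is the pause between runs.
repro_loop() {
    max_iter=$1
    logfile=$2
    shift 2
    i=0
    while [ "$max_iter" -eq 0 ] || [ "$i" -lt "$max_iter" ]; do
        i=$((i + 1))
        printf 'iteration %s started %s\n' "$i" "$(date '+%Y-%m-%d %H:%M:%S')" >> "$logfile"
        sync                      # push the log line to disk before the risky run
        "$@"                      # the test itself; its exit status is ignored here
        sleep "${SLEEP_SECS:-5}"
    done
}
```

With the path from the steps above and a hypothetical log location, this could be called as: repro_loop 0 /root/mtest01-runs.log /root/ltp-full-20080131/testcases/bin/mtest01 -p80 -w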

Additional info:
I gave up tracking this down in detail. An update to a 2.6.24-based kernel, which
seems to be planned, would fix this anyway. The box is available for further
testing, though.
Comment 1 Chuck Ebbert 2008-02-20 14:04:09 EST
2.6.24.2-7 has been submitted to updates-testing.
Comment 2 Frank Arnold 2008-02-21 04:12:51 EST
Damn.

* 2.6.24.2-7 failed after 18 min
* Retesting with kernel-2.6.24.2-4.fc8 failed, too
* Currently running the test on kernel-2.6.25-0.40.rc1.git2.fc9 again
  - looks good so far (35 mins)
  - I'll keep it running at least another 9 hours

Maybe I forgot to rebuild LTP with the appropriate kernel-headers installed.
Will do some more investigation as my time permits.
Comment 3 Frank Arnold 2008-02-25 08:12:50 EST
(In reply to comment #2)
> * Currently running the test on kernel-2.6.25-0.40.rc1.git2.fc9 again
It ran for more than 12 hours without failure.

> Maybe I forgot to rebuild LTP with the appropriate kernel-headers installed.
Verified this: It doesn't matter which one is installed. No clue why 2.6.24.2-4
worked once.

Over the weekend I ran a bisect between 2.6.24 and 2.6.25-rc1 (Linus' tree) with
configs derived from the Fedora kernel configs. I think I've found the commit that
fixed this problem, although it's quite surprising to me:

    commit 66ac831e03879c3c7dae76f793e6094e407081d2
    Author: Greg Kroah-Hartman <gregkh@suse.de>
    Date:   Fri Nov 2 13:20:40 2007 -0700
    kset: convert efivars to use kset_create for the vars sub-subsystem.
    Dynamically create the kset instead of declaring it statically.

This one ran for more than 9 hours without failure, whereas the commit before
(89a07e34b16d9dcdf0a9ada3ca0c9a506b490c8f) failed during the first iteration,
and the third one before (334c6307543a2b8af730a422f466d5f9442b606a) failed after
1 hour.

I have no clue why efivars.c does matter on a system without EFI. I'm going to
put Matt Domsch on CC to get more info.

I saved the bisect log and my hand-written notes on the proceedings in case
you need them. All 14 kernel builds are still available on the test box, too.
Comment 4 Frank Arnold 2008-02-26 03:34:57 EST
(In reply to comment #3)
> I have no clue why efivars.c does matter on a system without EFI. I'm going to
> put Matt Domsch on CC to get more info.

Sorry for the noise, Matt, I messed something up. It cannot and does not have
anything to do with EFI.
The bug gets triggered with configs derived from Fedora 2.6.24, and goes away
with configs derived from Fedora 2.6.25-rc1. Evidently, I switched between those
in the middle of the bisect.
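
Editor's sketch (not part of the original comment): since the behaviour follows the base config rather than the commit, a natural next step is to list the options that are set differently in the two configs. The helper below is a hypothetical illustration; config_diff and the file names in the usage line are assumptions. Kernel trees also ship scripts/diffconfig for this kind of comparison.

```shell
# config_diff CONFIG_A CONFIG_B
#   Prints "- LINE" for CONFIG_ settings found only in CONFIG_A and
#   "+ LINE" for settings found only in CONFIG_B. Lines of the form
#   "# CONFIG_FOO is not set" are ignored, so an option that is enabled in
#   only one of the files shows up on that file's side alone.
config_diff() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 1; }
    grep '^CONFIG_' "$1" | LC_ALL=C sort > "$a_sorted"
    grep '^CONFIG_' "$2" | LC_ALL=C sort > "$b_sorted"
    LC_ALL=C comm -23 "$a_sorted" "$b_sorted" | sed 's/^/- /'
    LC_ALL=C comm -13 "$a_sorted" "$b_sorted" | sed 's/^/+ /'
    rm -f "$a_sorted" "$b_sorted"
}
```

Example call with placeholder file names: config_diff config-2.6.24-derived config-2.6.25-rc1-derived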

Back to trial and error.
Comment 5 Frank Arnold 2008-02-28 15:40:26 EST
Now I'm absolutely sure about what... well, seems to hide the problem. With the
following debug options turned on my box doesn't lock up anymore:

CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_STACKTRACE=y

Any suggestions how to proceed further?
Comment 6 Chuck Ebbert 2008-02-28 18:57:32 EST
(In reply to comment #5)
> Now I'm absolutely sure about what... well, seems to hide the problem. With the
> following debug options turned on my box doesn't lock up anymore:
> 
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_LOCKDEP=y
> CONFIG_STACKTRACE=y
> 
> Any suggestions how to proceed further?

Try disabling them one at a time until it breaks...
Comment 7 Frank Arnold 2008-02-29 12:55:22 EST
(In reply to comment #6)
> Try disabling them one at a time until it breaks...

First I disabled CONFIG_DEBUG_LOCK_ALLOC, CONFIG_LOCKDEP, and CONFIG_STACKTRACE,
as these depend on the other two. This configuration ran for 9+ hours. Then I
additionally disabled CONFIG_DEBUG_SPINLOCK and this one broke after 10 minutes.

So I think CONFIG_DEBUG_SPINLOCK it is.
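
Editor's sketch (not part of the original comment): the one-at-a-time elimination described here can be scripted by rewriting individual options in a .config before each rebuild. disable_option is a hypothetical helper written for illustration; it only edits the file, and a subsequent "make oldconfig" is still needed so that Kconfig can resolve dependencies (which may re-enable an option that something else selects).

```shell
# disable_option CONFIG_FILE OPTION
#   Rewrites "CONFIG_<OPTION>=<value>" as "# CONFIG_<OPTION> is not set"
#   in CONFIG_FILE, leaving every other line untouched.
#   OPTION is given without the CONFIG_ prefix, e.g. DEBUG_SPINLOCK.
disable_option() {
    cfg=$1
    opt=$2
    tmp=$(mktemp) || return 1
    if sed "s/^CONFIG_${opt}=.*/# CONFIG_${opt} is not set/" "$cfg" > "$tmp"; then
        cat "$tmp" > "$cfg"       # overwrite in place, keeping the file's mode
    fi
    rm -f "$tmp"
}
```

Following the order used in this comment, one round would be: disable_option .config DEBUG_LOCK_ALLOC; disable_option .config LOCKDEP; disable_option .config STACKTRACE; then make oldconfig, rebuild, and rerun the test before touching DEBUG_SPINLOCK.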
Comment 8 Chuck Ebbert 2008-02-29 16:19:04 EST
X86 spinlocks have been rewritten in 2.6.25-rc. Can you try that with
DEBUG_SPINLOCK disabled?

2.6.24.3-12.fc8.i686 seems to be okay, it ran for over an hour without any problems.
Comment 9 Frank Arnold 2008-03-01 01:46:17 EST
(In reply to comment #8)
> X86 spinlocks have been rewritten in 2.6.25-rc. Can you try that with
> DEBUG_SPINLOCK disabled?

Linus' tree as of yesterday seems to work. It has been running for more than 10 hours now.
(commit 076d84bbdb396360d16aaa108c55aa1e24ad47a3)

> 2.6.24.3-12.fc8.i686 seems to be okay, it ran for over an hour without any
> problems.

I'll try that one during the weekend.
Comment 10 Frank Arnold 2008-03-01 02:52:09 EST
(In reply to comment #9)
> > 2.6.24.3-12.fc8.i686 seems to be okay, it ran for over an hour without any
> > problems.

No, 2.6.24.3-12.fc8.x86_64 locked up after 35 minutes here. Looks like another
round of bisect...
Comment 11 Chuck Ebbert 2008-03-03 12:14:46 EST
But i686 seems to be okay.
Comment 12 Chuck Ebbert 2008-03-03 17:42:58 EST
Can you try the NMI watchdog to see if it catches the hang? (Try booting with
either "nmi_watchdog=1" or "nmi_watchdog=2" option.)
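
Editor's sketch (not part of the original comment): before starting a long run it is worth confirming that the boot option actually took effect. The helper below is a hypothetical illustration that extracts the nmi_watchdog= value from a kernel command line; the function name is an assumption, and it reads /proc/cmdline when no argument is given.

```shell
# nmi_watchdog_mode [CMDLINE]
#   Prints the value of the last nmi_watchdog= parameter in CMDLINE
#   (default: the contents of /proc/cmdline), or "unset" if it is absent.
nmi_watchdog_mode() {
    cmdline=${1-$(cat /proc/cmdline)}
    mode=$(printf '%s\n' "$cmdline" | tr ' ' '\n' \
        | sed -n 's/^nmi_watchdog=//p' | tail -n 1)
    echo "${mode:-unset}"
}
```

On x86, running "grep NMI /proc/interrupts" twice a few seconds apart should show the counters increasing if the watchdog is active.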
Comment 13 Frank Arnold 2008-03-04 03:38:28 EST
Created attachment 296710 [details]
Serial console output

Yes! nmi_watchdog caught it. See attachment.
Bisect does indeed indicate that the spinlock changes from Nick Piggin are
getting 2.6.25-rc to work (commit 314cdbefd1fd0a7acf3780e9628465b77ea6a836). At
least I'm two builds away from it now.
Comment 14 Bug Zapper 2008-11-26 04:50:33 EST
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 15 Bug Zapper 2009-01-09 02:39:45 EST
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
