Bug 165697

Summary: hard hang after using Ingo's "lockupcli.c"
Product: [Fedora] Fedora
Reporter: Doug Maxey <dwm>
Component: kernel
Assignee: Ingo Molnar <mingo>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4
CC: davej, wtogami
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Last Closed: 2006-05-05 20:23:25 UTC
Attachments:
Description: serial console log of oops prior to hang
Flags: none

Description Doug Maxey 2005-08-11 12:54:23 UTC
Description of problem: 
Have been having occasional system lockups since the 2.6.10-1.770_FC3 kernel.
The problem is consistent in that it always eventually occurs, and it has been
seen in all kernels since. However, the time it takes to show up varies from
hours to several days.

After 2 lockups on Monday (of course) while moving between gnome desktops,
finally compiled Ingo's lockupcli and ran it to see if I could confirm that the
nmi_watchdog=1 parameter actually worked on the box.
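
A less drastic way to check whether the NMI watchdog is active is to watch the
NMI counter in /proc/interrupts: if the total keeps rising, NMIs are being
delivered. The following is a minimal sketch in C, assuming the usual layout of
an "NMI:" row followed by one counter per CPU. The function names are
hypothetical and none of this is taken from lockupcli.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one "NMI:" line of /proc/interrupts.
 * Returns -1 if the line is not an NMI line or carries no counters. */
long nmi_count_from_line(const char *line)
{
    long total = 0;
    int seen = 0;

    assert(line != NULL);
    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, "NMI:", 4) != 0)
        return -1;
    line += 4;

    for (;;) {
        char *end;

        while (*line == ' ' || *line == '\t')
            line++;
        /* Stop at the trailing description text or the end of the line. */
        if (!isdigit((unsigned char)*line))
            break;
        total += strtol(line, &end, 10);
        line = end;
        seen = 1;
    }
    return seen ? total : -1;
}

/* Scan a file line by line and return the NMI total, or -1 if the file
 * cannot be opened or has no NMI line. Assumes each line fits in the
 * buffer, which may not hold on machines with very many CPUs. */
long nmi_total_from_file(const char *path)
{
    char buf[8192];
    long total = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(buf, sizeof(buf), f) != NULL) {
        total = nmi_count_from_line(buf);
        if (total >= 0)
            break;
    }
    fclose(f);
    return total;
}
```

Calling nmi_total_from_file("/proc/interrupts") twice, a few seconds apart, and
comparing the results gives a rough indication: a total that stays at zero or
does not change suggests the watchdog is not ticking on this box.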

For about 30 seconds after starting, the system did slow, but continued to take
keyboard and mouse input.  After 30 seconds or so, it finally hung solid.  In
the serial window, just prior to the lockup, the oops message did print.  As
soon as that popped out, the box was hung. 

Version-Release number of selected component (if applicable):
all since 2.6.10-1.770_FC3

How reproducible:
run Ingo's lockupcli.

Steps to Reproduce:
1. compile Ingo's lockupcli
2. run it
  
Actual results:
hard system hang after a few tens of seconds

Expected results:
No hard system hangs

Additional info:  logs to follow.  This is happening on 2 similar ICH5-based
systems.

Comment 1 Doug Maxey 2005-08-11 12:54:23 UTC
Created attachment 117643 [details]
serial console log of oops prior to hang

Comment 2 Doug Maxey 2005-08-11 13:21:39 UTC
I had previously reported this in bug 154190, but that never went anywhere. 
This seems to be a way to reproduce it, which I could not do previously.

Comment 3 Ingo Molnar 2005-09-15 22:09:51 UTC
well, running lock-up-the-box did lock up the box, but the NMI watchdog got a
traceback of it. So it works as advertised.

do your other hard lockups (_NOT_ the self-induced ones) produce any NMI
watchdog output on the serial console?

Comment 4 Doug Maxey 2005-09-16 17:29:08 UTC
(In reply to comment #3)
> well, running lock-up-the-box did lock up the box, but the NMI watchdog got a
> traceback of it. So it works as advertised.

OK.  I thought it was going to dump me into the debugger or something.  

So the behavior is to dump some info, then hang, correct?

> 
> do your other hard lockups (_NOT_ the self-induced ones) produce any NMI
> watchdog output on the serial console?

Yes and no. :)  Did save one on the laptop that got re-installed.  Will attempt
another capture.

One other odd point with no hard data to back it up: the fails seem to come in
clusters:  I can run 2 weeks or so, and then suddenly,  for a day or two, things
jump the track continuously.  I may get 4 or 5 lockups back to back to back. 
Then it goes away for a coupla weeks.  This happened on Monday of this week
(2005-09-12) again.  Hm.  I see that this was about 30 days ago when this was
opened....

Yes, we can leave this in needinfo until I get a chance to bang on a system with
serial console attached and logging.  Just that these systems are my
workstations, and the boss is rather demanding. :)


Comment 5 Doug Maxey 2005-09-17 22:51:33 UTC
Good news and bad news.

The good news is that the hang is 100% reproducible within a few seconds.  Fire
up gmplayer on your fav video, and start moving the window around with the
mouse.  It will hang in 10-15 seconds.

Bad news is that there is no backtrace.

Comment 6 Doug Maxey 2005-09-17 22:58:26 UTC
One other comment: the hang almost always involves the graphical subsystem.  It
has happened when switching between desktops, and when raising and lowering
something like firefox when trying to peek at a partially obscured window.  

On other similar systems without graphics, have never observed it.

For the hang described in comment 5, I let the system sit for ~30 minutes
before giving up.


Comment 7 Dave Jones 2006-01-16 22:19:00 UTC
This is a mass-update to all currently open Fedora Core 3 kernel bugs.

Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.

As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug.  Please upgrade to this newer release, and
test if this bug is still present there.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

Thank you.


Comment 8 Dave Jones 2006-02-03 06:39:20 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 9 John Thacker 2006-05-05 20:23:25 UTC
Closing per previous comment.