Bug 445422 - Feature: allow panic on softlockup warnings
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Prarit Bhargava
QA Contact: Martin Jenner
Keywords: FutureFeature, Triaged
Depends On:
Blocks: RHEL5u3_relnotes
Reported: 2008-05-06 15:53 EDT by Issue Tracker
Modified: 2010-10-22 20:45 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
The soft lockup detector can now be configured to trigger a kernel panic instead of a warning message. This makes it possible for users to generate and analyze a crash dump during a soft lockup for forensic purposes. To configure the soft lockup detector to generate a panic, set the kernel parameter soft_lockup to 1. This parameter is set to 0 by default.
Last Closed: 2009-01-20 15:18:31 EST

Attachments
RHEL5 fix for this issue (3.23 KB, patch)
2008-05-13 13:42 EDT, Prarit Bhargava
Description Issue Tracker 2008-05-06 15:53:44 EDT
Escalated to Bugzilla from IssueTracker
Comment 1 Issue Tracker 2008-05-06 15:53:46 EDT
We have been hit with a large number of softlockup problems that have proven hard to track down. We would like the option of having the node panic when it hits a softlockup, so that we can analyze the crashdump. It appears that Ingo Molnar has submitted a patch upstream with the desired behavior. We would like to have it backported to RHEL5.

http://people.redhat.com/mingo/softlockup-patches/softlockup-allow-panic-on-lockup.patch
This event sent from IssueTracker by jwest  [SEG - Feature Request]
 issue 178763
Comment 6 Prarit Bhargava 2008-05-13 13:42:22 EDT
Created attachment 305272
RHEL5 fix for this issue
Comment 8 Don Zickus 2008-05-13 14:09:58 EDT
Is there a bugzilla tracking the softlockup messages?

My impression is this patch is just a temporary workaround to a real bug.  
The better patch would be to add the info we need to analyze these softlockup
messages in the future.  I see them all the time in my tests and it's hard to
tell if they are meaningful or just false positives (where a piece of code is
known to take forever and therefore needs to kick the timer).

We added some code in 5.2 to allow more info to be displayed, if that isn't
helpful enough perhaps we should add more?
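
The false-positive concern above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the class name, the 10-second threshold, and the panic_on_lockup flag are all hypothetical stand-ins chosen for illustration. It shows a long-running task that is flagged when it never "kicks the timer" and stays quiet when it does.

```python
from dataclasses import dataclass


@dataclass
class SoftLockupWatchdog:
    """Toy model of a per-CPU soft lockup detector (illustrative only)."""
    threshold_s: float = 10.0      # assumed threshold, chosen for the example
    panic_on_lockup: bool = False  # models "panic instead of warn"
    last_touch: float = 0.0

    def touch(self, now: float) -> None:
        """Record that the CPU made progress ("kick the timer")."""
        self.last_touch = now

    def check(self, now: float) -> str:
        """Return 'ok', 'warn', or 'panic' for the given timestamp."""
        if now - self.last_touch <= self.threshold_s:
            return "ok"
        return "panic" if self.panic_on_lockup else "warn"


def run_long_task(wd, start, duration_s, step_s, kick):
    """Simulate a task that holds the CPU for duration_s seconds.

    The watchdog is checked every step_s seconds; when kick is True the
    task touches the watchdog at each step, as known-slow code should.
    """
    results = []
    now = start
    wd.touch(now)
    while now < start + duration_s:
        now += step_s
        if kick:
            wd.touch(now)
        results.append(wd.check(now))
    return results
```

In this model, a known-slow path that forgets to touch the watchdog is indistinguishable from a real lockup, so with panic_on_lockup enabled a false positive would take the node down rather than just log a message.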
Comment 10 Prarit Bhargava 2008-05-13 15:10:24 EDT
(In reply to comment #8)
> Is there a bugzilla tracking the softlockup messages?
> 

Ben?


> My impression is this patch is just a temporary workaround to a real bug.  


The patch in question would allow a user to get a crashdump for later analysis
and free up the system so that it could continue to do whatever it was doing.
If this system were an HA system from Stratus, NEC, etc., getting the system back
up and running in a normal mode of operation would be critical.

> The better patch would be to add the info we need to analyze these softlockup
> messages in the future.  I see them all the time in my tests and it's hard to
> tell if they are meaningful or just false positives (where a piece of code is
> known to take forever and therefore needs to kick the timer).

But you're right -- this shouldn't be considered a _solution_ to the softlockup
problem and there should be a BZ associated with the softlockup warning messages
that LLNL is seeing.

> 
> We added some code in 5.2 to allow more info to be displayed, if that isn't
> helpful enough perhaps we should add more?
> 

I'm not sure what more we should add -- I suppose that adding stack dumps of the
other processors *might* be helpful.  But that still has the problem that the
other processors have continued on after the softlockup...

P.
Comment 11 Ben Woodard 2008-05-14 19:00:22 EDT
> Is there a bugzilla tracking the softlockup messages?
> 
> Ben?
We have several issues open regarding various softlockup problems that we are
working on.

We are not quite ready to move to 5.2 yet, but we are working on backporting the
patches for the softlockup messages to our 5.1 kernel in the meantime. (Don't
worry, we will get there soon, but it takes about 1.5 months after RH does an
official release for us to begin rolling out a release.)
Comment 17 Don Zickus 2008-07-23 14:55:30 EDT
in kernel-2.6.18-99.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 18 Ryan Lerch 2008-08-11 23:41:23 EDT
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes.
Comment 24 Don Domingo 2008-11-18 16:48:24 EST
This bug is now documented in the RHEL5.3 release notes. You can view a mock build of this document at the following link:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.3/html-single/Release_Notes/
Comment 25 Don Domingo 2008-11-18 16:48:24 EST
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The soft lockup detector can now be configured to trigger a kernel panic instead of a warning message. This makes it possible for users to generate and analyze a crash dump during a soft lockup for forensic purposes.

To configure the soft lockup detector to generate a panic, set the kernel parameter soft_lockup to 1. This parameter is set to 0 by default.
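
As a rough sketch of how the setting might be inspected and persisted, the Python helpers below read a knob under /proc/sys/kernel and render a matching sysctl.conf line. The helper names are hypothetical. The knob name is an assumption too: upstream kernels call it softlockup_panic, while the note above calls the parameter soft_lockup, so the name is passed as an argument and should be verified on the target kernel.

```python
from pathlib import Path

# Assumption: the upstream name of the knob. The release note above refers
# to "soft_lockup"; check which name the installed kernel actually exposes.
DEFAULT_KNOB = "softlockup_panic"


def knob_path(name=DEFAULT_KNOB, proc_root="/proc/sys/kernel"):
    """Build the path of the sysctl file for the given knob name."""
    return Path(proc_root) / name


def read_knob(name=DEFAULT_KNOB, proc_root="/proc/sys/kernel"):
    """Return the current value as an int, or None if the knob is absent."""
    try:
        return int(knob_path(name, proc_root).read_text().strip())
    except FileNotFoundError:
        return None


def sysctl_conf_line(enabled, name=DEFAULT_KNOB):
    """Render a line suitable for /etc/sysctl.conf to persist the setting."""
    return f"kernel.{name} = {1 if enabled else 0}"
```

For example, read_knob() returning None suggests the running kernel does not expose the knob under the assumed name, and sysctl_conf_line(True) gives the line to add for a panic on soft lockup.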
Comment 28 errata-xmlrpc 2009-01-20 15:18:31 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
