
Bug 445422

Summary: Feature: allow panic on softlockup warnings
Product: Red Hat Enterprise Linux 5
Reporter: Issue Tracker <tao>
Component: kernel
Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: medium
Version: 5.1
CC: ddomingo, dzickus, james.brown, lwang, mingo, pzijlstr, tao, woodward
Target Milestone: rc
Keywords: FutureFeature, Triaged
Hardware: All
OS: Linux
Doc Type: Enhancement
Doc Text:
The soft lockup detector can now be configured to trigger a kernel panic instead of a warning message. This makes it possible for users to generate and analyze a crash dump during a soft lockup for forensic purposes. To configure the soft lockup detector to generate a panic, set the kernel parameter soft_lockup to 1. This parameter is set to 0 by default.
Last Closed: 2009-01-20 20:18:31 UTC
Bug Blocks: 454962
Attachments:
RHEL5 fix for this issue (flags: none)

Description Issue Tracker 2008-05-06 19:53:44 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2008-05-06 19:53:46 UTC
We have been hit with a large number of soft lockups that are proving hard to track down. We would like the option of having the node panic when it hits a soft lockup, so that we can analyze the resulting crash dump. It appears that Ingo Molnar has submitted an upstream patch with the desired behavior, and we would like to have it backported to RHEL5.

http://people.redhat.com/mingo/softlockup-patches/softlockup-allow-panic-on-lockup.patch
This event sent from IssueTracker by jwest [SEG - Feature Request]
issue 178763

Comment 6 Prarit Bhargava 2008-05-13 17:42:22 UTC
Created attachment 305272
RHEL5 fix for this issue

Comment 8 Don Zickus 2008-05-13 18:09:58 UTC
Is there a bugzilla tracking the softlockup messages?

My impression is this patch is just a temporary workaround to a real bug.  
The better patch would be to add the info we need to analyze these softlockup
messages in the future.  I see them all the time in my tests and it's hard to
tell if they are meaningful or just false positives (where a piece of code is
known to take forever and therefore needs to kick the timer).

We added some code in 5.2 to allow more info to be displayed, if that isn't
helpful enough perhaps we should add more?
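To make the "kick the timer" point concrete, here is a toy Python model of how a soft lockup detector decides what to report. It is a sketch under stated assumptions, not the RHEL5 code: it models a single CPU with integer-second timestamps and assumes a fixed 10-second threshold, and the names `SoftLockupModel`, `panic_on_lockup` and `run_busy_loop` are hypothetical. In the kernel, code that is known to run for a long time kicks the timer by calling `touch_softlockup_watchdog()`; in the model that corresponds to `touch()`.

```python
from dataclasses import dataclass


@dataclass
class SoftLockupModel:
    """Toy single-CPU model of a soft lockup detector (not kernel code).

    Assumed behaviour: a watchdog thread refreshes `last_touch` whenever it
    gets scheduled, and the periodic timer tick reports a lockup once that
    timestamp is more than `threshold` seconds old.
    """
    threshold: int = 10             # seconds without a touch (assumed value)
    panic_on_lockup: bool = False   # hypothetical name for the new knob; off by default
    last_touch: int = 0

    def touch(self, now: int) -> None:
        """Refresh the timestamp: the watchdog thread ran, or code kicked the timer."""
        self.last_touch = now

    def tick(self, now: int) -> str:
        """Timer-tick check: 'ok', 'warn' (default) or 'panic' (knob enabled)."""
        if now - self.last_touch <= self.threshold:
            return "ok"
        return "panic" if self.panic_on_lockup else "warn"


def run_busy_loop(wd: SoftLockupModel, seconds: int, kick_every: int = 0) -> str:
    """Simulate a CPU-bound loop that never lets the watchdog thread run.

    If kick_every > 0, the loop touches the watchdog itself every
    `kick_every` seconds, the way known-slow code avoids false positives.
    Returns the first non-'ok' tick result, or 'ok' if none occurred.
    """
    wd.touch(0)
    for now in range(1, seconds + 1):
        if kick_every and now % kick_every == 0:
            wd.touch(now)
        status = wd.tick(now)
        if status != "ok":
            return status
    return "ok"


if __name__ == "__main__":
    print(run_busy_loop(SoftLockupModel(), 30))                      # warn
    print(run_busy_loop(SoftLockupModel(), 30, kick_every=5))        # ok
    print(run_busy_loop(SoftLockupModel(panic_on_lockup=True), 30))  # panic
```

Under these assumptions, a 30-second loop that never kicks the timer is reported at second 11, as a warning by default or as a panic when the knob is set, while the same loop kicking every 5 seconds is never reported. That second case is the false positive described above: the code is slow by design, not stuck.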


Comment 10 Prarit Bhargava 2008-05-13 19:10:24 UTC
(In reply to comment #8)
> Is there a bugzilla tracking the softlockup messages?
> 

Ben?


> My impression is this patch is just a temporary workaround to a real bug.  


The patch in question would allow a user to get a crashdump for later analysis
and free up the system so that it could continue to do whatever it was doing. 
If this system were an HA system from Stratus, NEC, etc., getting the system back
up and running in a normal mode of operation is critical.

> The better patch would be to add the info we need to analyze these softlockup
> messages in the future.  I see them all the time in my tests and it's hard to
> tell if they are meaningful or just false positives (where a piece of code is
> known to take forever and therefore needs to kick the timer).

But you're right -- this shouldn't be considered a _solution_ to the softlockup
problem and there should be a BZ associated with the softlockup warning messages
that LLNL is seeing.

> 
> We added some code in 5.2 to allow more info to be displayed, if that isn't
> helpful enough perhaps we should add more?
> 

I'm not sure what more we should add -- I suppose that adding stack dumps of the
other processors *might* be helpful.  But that still has the problem that the
other processors have continued on after the softlockup...

P.

Comment 11 Ben Woodard 2008-05-14 23:00:22 UTC
> Is there a bugzilla tracking the softlockup messages?
> 
> Ben?
We have several issues open regarding various softlockup problems that we are
working on.

We are not quite ready to move to 5.2 yet, but in the meantime we are working on
backporting the patches for the softlockup messages to our 5.1 kernel. (Don't
worry, we will get there soon, but it takes about 1.5 months after RH does an
official release for us to begin rolling it out.)

Comment 17 Don Zickus 2008-07-23 18:55:30 UTC
in kernel-2.6.18-99.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 18 Ryan Lerch 2008-08-12 03:41:23 UTC
Tracking this bug for the Red Hat Enterprise Linux 5.3 Release Notes.

Comment 24 Don Domingo 2008-11-18 21:48:24 UTC
This bug is now documented in the RHEL 5.3 release notes. You can view a mock build of this document at the following link:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.3/html-single/Release_Notes/

Comment 25 Don Domingo 2008-11-18 21:48:24 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The soft lockup detector can now be configured to trigger a kernel panic instead of a warning message. This makes it possible for users to generate and analyze a crash dump during a soft lockup for forensic purposes.

To configure the soft lockup detector to generate a panic, set the kernel parameter soft_lockup to 1. This parameter is set to 0 by default.
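A minimal Python sketch for checking and changing the setting at run time is shown here. It assumes the option is exposed through procfs as `/proc/sys/kernel/softlockup_panic`, which is the name used upstream (the `kernel.softlockup_panic` sysctl and the `softlockup_panic=` boot parameter); the release note above calls it `soft_lockup`, so confirm the name on the installed kernel before relying on this. The helper names `read_softlockup_panic` and `set_softlockup_panic` are hypothetical.

```python
from pathlib import Path

# Assumption: upstream procfs name; verify it exists on the running RHEL5 kernel.
DEFAULT_KNOB = Path("/proc/sys/kernel/softlockup_panic")


def read_softlockup_panic(knob: Path = DEFAULT_KNOB) -> bool:
    """Return True if the detector is set to panic (1), False if it only warns (0)."""
    value = knob.read_text().strip()
    if value not in ("0", "1"):
        raise ValueError(f"unexpected value {value!r} in {knob}")
    return value == "1"


def set_softlockup_panic(enabled: bool, knob: Path = DEFAULT_KNOB) -> None:
    """Write 1 (panic) or 0 (warn only, the default). Needs root on a real system."""
    if not knob.exists():
        raise FileNotFoundError(f"{knob} not found; this kernel may not support the option")
    knob.write_text("1\n" if enabled else "0\n")
```

A run-time change like this does not survive a reboot; under the same naming assumption, the persistent equivalent would be a `kernel.softlockup_panic = 1` line in `/etc/sysctl.conf`. Note also that a panic only leaves a crash dump to analyze if kdump is configured on the node.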

Comment 28 errata-xmlrpc 2009-01-20 20:18:31 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html