
Bug 552675

Summary: ipmi_watchdog deadlock
Product: Red Hat Enterprise Linux 5
Reporter: dann frazier <dannf>
Component: kernel
Assignee: Tony Camuso <tcamuso>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.4
CC: dwa, emcnabb, jarod, peterm, sandy.garza, syeghiay, tao
Target Milestone: rc
Target Release: 5.5
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-03-30 07:46:39 UTC
Bug Blocks: 540569
Attachments:
- subset of upstream commit, sufficient to fix the problem for us (flags: none)
- reproducer (flags: none)

Description dann frazier 2010-01-05 20:34:01 UTC
Description of problem:
The system deadlocks in the ipmi_watchdog ioctl path.

Version-Release number of selected component (if applicable):
2.6.18-128.el5

How reproducible:
100%

Steps to Reproduce:
1. modprobe ipmi_watchdog nowayout=0 start_now=0 action=reset
2. Run the attached program, which should hang.
  
Actual results:
Program hangs with message:
  If I hang here, my ipmi_watchdog is buggy...

sysrq shows the process's call stack:

Call Trace:
 [<ffffffff80063d20>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80063d6a>] .text.lock.mutex+0xf/0x14
 [<ffffffff884812f1>] :ipmi_watchdog:ipmi_set_timeout+0x14/0x74
 [<ffffffff8006321c>] wait_for_completion+0x8f/0xa2
 [<ffffffff884813a5>] :ipmi_watchdog:ipmi_heartbeat+0x54/0x13b
 [<ffffffff80063bb6>] mutex_lock+0xd/0x1d
 [<ffffffff8848133a>] :ipmi_watchdog:ipmi_set_timeout+0x5d/0x74
 [<ffffffff8003a1fc>] tty_ldisc_deref+0x68/0x7b
 [<ffffffff884816a6>] :ipmi_watchdog:ipmi_ioctl+0x14a/0x178
 [<ffffffff80041b62>] do_ioctl+0x55/0x6b
 [<ffffffff8002fd1e>] vfs_ioctl+0x248/0x261
 [<ffffffff8004c0a3>] sys_ioctl+0x59/0x78
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
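
Reading the trace bottom-up, ipmi_set_timeout appears to call ipmi_heartbeat, which re-enters ipmi_set_timeout and then blocks in the mutex slow path, presumably on a lock the task already holds. The following is a minimal userspace sketch of that pattern, not the driver code: set_timeout(), heartbeat(), set_timeout_lock and run_set_timeout() are hypothetical stand-ins, and a POSIX error-checking mutex is used so that the re-entry is reported as EDEADLK instead of hanging the way a kernel mutex would.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the lock taken in ipmi_set_timeout. */
static pthread_mutex_t set_timeout_lock;

static int heartbeat(void);

/* Takes the lock and, if asked, sends a heartbeat while still holding it. */
static int set_timeout(int do_heartbeat)
{
    int rv = pthread_mutex_lock(&set_timeout_lock);

    if (rv != 0)
        return rv; /* EDEADLK: this thread already owns the lock */
    if (do_heartbeat)
        rv = heartbeat();
    pthread_mutex_unlock(&set_timeout_lock);
    return rv;
}

/* Re-enters set_timeout(), like the second ipmi_set_timeout frame. */
static int heartbeat(void)
{
    return set_timeout(0);
}

/*
 * Initializes an error-checking mutex and runs one set_timeout() call.
 * Returns 0 when no re-entry happens, or EDEADLK when the nested call
 * tries to take the lock its caller still holds.
 */
int run_set_timeout(int do_heartbeat)
{
    pthread_mutexattr_t attr;
    int rv;

    rv = pthread_mutexattr_init(&attr);
    assert(rv == 0);
    rv = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rv == 0);
    rv = pthread_mutex_init(&set_timeout_lock, &attr);
    assert(rv == 0);
    pthread_mutexattr_destroy(&attr);

    rv = set_timeout(do_heartbeat);

    pthread_mutex_destroy(&set_timeout_lock);
    return rv;
}
```

With these assumptions, run_set_timeout(1) reports the nested lock attempt as EDEADLK, while run_set_timeout(0) completes normally. With a non-recursive, non-checking lock the nested attempt would simply block forever, which matches the hang described above.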

Expected results:
Program does not hang.

Additional info:
- Our test box was an HP BL460c G6, which has an iLO2 BMC, but the problem has also been reproduced on non-HP IPMI-compliant hardware.

This issue does not exist with current upstream kernels. It appears to have been fixed by:

commit 612b5a8d3a57d07698ceec0e307a84f38b241fe2
Author: Corey Minyard <minyard>
Date:   Thu Oct 18 03:07:10 2007 -0700

    IPMI: new NMI handling
    
    Convert over to the new NMI handling for getting IPMI watchdog timeouts via an
    NMI.  This add config options to know if there is the ability to receive NMIs
    and if it has an NMI post processing call.  Then it modifies the IPMI watchdog
    to take advantage of this so that it can know if an NMI comes in.
    
    It also adds testing that the IPMI NMI watchdog works.
    
    Signed-off-by: Corey Minyard <minyard>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>

However, only a small portion of that patch is necessary to solve the problem in RHEL 5.

Comment 1 dann frazier 2010-01-05 20:35:45 UTC
Created attachment 381833 [details]
subset of upstream commit, sufficient to fix the problem for us

Comment 3 David Aquilina 2010-01-07 21:10:11 UTC
Dann, 

It doesn't look like the reproducer program has been attached. Could you please attach it?

Thanks!

Comment 4 dann frazier 2010-01-07 22:30:58 UTC
Created attachment 382345 [details]
reproducer

oops - here it is

Comment 6 RHEL Program Management 2010-02-01 14:42:56 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Sandy Garza 2010-02-17 14:45:58 UTC
Tony Camuso verified this is in RHEL 5.5.

Comment 14 Sandy Garza 2010-03-15 16:11:50 UTC
Successfully verified by Tony Camuso with RHEL 5.5, Snapshot 3.

Comment 16 errata-xmlrpc 2010-03-30 07:46:39 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html