Bug 514678

Summary: ipmi_watchdog driver and ipmi init script broken on systems with certain types of watchdog hardware
Product: Red Hat Enterprise Linux 5
Reporter: IBM Bug Proxy <bugproxy>
Component: OpenIPMI
Assignee: Jan Safranek <jsafrane>
Status: CLOSED ERRATA
QA Contact: BaseOS QE <qe-baseos-auto>
Severity: high
Priority: high
Version: 5.4
CC: curtis, jjarvis, mcermak, rvokal
Target Milestone: rc
Target Release: 5.5
Hardware: All
OS: All
Doc Type: Bug Fix
Last Closed: 2009-12-02 07:20:32 EST

Description IBM Bug Proxy 2009-07-30 02:00:24 EDT
=Comment: #0=================================================
Roger Mach <bigmach@us.ibm.com> - 
---Problem Description---
ipmi_watchdog driver and ipmi init script broken on systems with certain types of watchdog hardware
 
Contact Information = Roger Mach <bigmach@us.ibm.com> and Carol Hebert <hebertc@us.ibm.com> 
 
---Additional Hardware Info---
On-chip watchdog hardware supported by i6300esb driver or an Intel TCO Watchdog Timer device
supported by the iTCO_wdt driver

 
---uname output---
2.6.18-152.el5xen
 
Machine Type = HS20 blade, x3550 M2 

---Debugger---
A debugger is not configured
 
---Steps to Reproduce---
 Configure the ipmi_watchdog driver to load by setting IPMI_WATCHDOG=yes in /etc/sysconfig/ipmi, and then
start the ipmi service with "service ipmi start".  Observe that the ipmi drivers and the ipmi_watchdog driver
appear to load; however, these messages can be seen in dmesg:

IPMI Watchdog: Unable to register misc device
IPMI Watchdog: driver initialized

Subsequent testing proves that although the ipmi_watchdog driver appears to be
loaded, "ipmitool mc watchdog get" shows that the config settings in the
/etc/sysconfig/ipmi file have not been set:

# grep WATCHDOG /etc/sysconfig/ipmi
## Description: Enable IPMI_WATCHDOG if you want the IPMI watchdog
# Enable IPMI_WATCHDOG if you want the IPMI watchdog
IPMI_WATCHDOG=yes
IPMI_WATCHDOG_OPTIONS="timeout=60 action=power_cycle start_now=1"

# ipmitool mc watchdog get
Watchdog Timer Use:     BIOS FRB2 (0x01)
Watchdog Timer Is:      Stopped
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval:   0 seconds
Timer Expiration Flags: 0x02
Initial Countdown:      0 sec
Present Countdown:      0 sec
[root@elm3a27 PAE]# lsmod |grep ipmi
ipmi_watchdog          21129  0
ipmi_devintf           13129  0
ipmi_si                42829  0
ipmi_msghandler        39153  3 ipmi_watchdog,ipmi_devintf,ipmi_si

The problem is that the i6300esb driver registers the same major/minor device
numbers (char-major-10-130, see the alias below) that ipmi_watchdog needs, and
it has already claimed /dev/watchdog at boot time:
# lsmod |grep i6300
i6300esb               10841  0

# modinfo i6300esb
filename:       /lib/modules/2.6.18-152.el5xen/kernel/drivers/char/watchdog/i6300esb.ko
alias:          char-major-10-130
license:        GPL
description:    Watchdog driver for Intel 6300ESB chipsets
author:         Ross Biro and David Härdeman
srcversion:     2A37792AAD84EC032278ECA
alias:          pci:v00008086d000025ABsv*sd*bc*sc*i*
depends:
vermagic:       2.6.18-152.el5xen SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.1
parm:           heartbeat:Watchdog heartbeat in seconds. (1<heartbeat<2046, default=30) (int)
parm:           nowayout:Watchdog cannot be stopped once started (default=CONFIG_WATCHDOG_NOWAYOUT) (int)
module_sig:     883f3504a27af664a752b13a67179611257f509b6e87ea7e62e95c15ef1c26a12e3c75ff23aca409e33dcbcb4b2b5a9bbc6880e99fd6082d91b292c
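
One way to see whether the watchdog misc minor is already taken before loading ipmi_watchdog is to look at /proc/misc, which lists registered misc minors and their names. The following is a minimal sketch, not part of the original report: the function name `watchdog_minor_name` and its optional file argument are hypothetical, and the output only shows that minor 130 is registered, not which driver registered it.

```shell
# Hypothetical helper: print the name registered under misc minor 130.
# The optional argument (default /proc/misc) is an assumption of this
# sketch; it exists only so the function can be tried on a sample file.
watchdog_minor_name() {
    misc_file="${1:-/proc/misc}"
    # Each line of /proc/misc is "<minor> <name>"; select minor 130.
    awk '$1 == 130 { print $2 }' "$misc_file"
}
```

If `watchdog_minor_name` prints anything while ipmi_watchdog is not yet loaded, another driver (such as i6300esb here) presumably holds the watchdog device.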


Note that the ipmi_watchdog driver appears to work correctly after the
following sequence: unload the ipmi drivers (this requires "service ipmi
stop-all" or an equivalent in order to unload the ipmi_watchdog driver), unload
the i6300esb driver (the /dev/watchdog device node is removed at that point),
and then reload the ipmi drivers ("service ipmi start").
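
The sequence above can be sketched as a small shell helper. This is an illustration only: the function names and the DRY_RUN switch are hypothetical, and using "modprobe -r" to unload i6300esb is an assumption, since the report does not say which command was used.

```shell
# Run a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
    ${DRY_RUN:+echo} "$@"
}

# Hypothetical helper: hand /dev/watchdog over from i6300esb to ipmi_watchdog.
# Each step runs only if the previous one succeeded.
switch_to_ipmi_watchdog() {
    run service ipmi stop-all &&   # also unloads ipmi_watchdog
    run modprobe -r i6300esb &&    # assumption: one way to unload the driver
    run service ipmi start
}
```

Setting DRY_RUN=1 before calling `switch_to_ipmi_watchdog` prints the three commands instead of running them, which makes it easy to review the sequence first.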

So, it seems there are two problems with ipmi on systems that have a second
on-board watchdog chip:

1)  the init.d/ipmi script does not properly return a failure when the
ipmi_watchdog driver is improperly loaded.

2) the ipmi driver does not return a failure to the startup script, for
reporting to the user, when the ipmi_watchdog driver cannot be fully loaded.
Additionally, the ipmi_watchdog driver remains "partially" loaded, so a user
might think it is operational when it is not.

This can also be reproduced on platforms with an Intel TCO Watchdog Timer device (supported by the
iTCO_wdt driver, which supports the ICH10 TCO device):
00:1f.0 ISA bridge: Intel Corporation 82801JIB (ICH10) LPC Interface Controller
 
---System Management Component Data--- 
System management type: BMC supported by OpenIPMI driver 
 
Note that this problem was originally reported in a comment to LTC bugzilla 50564, mirrored to Red
Hat bugzilla 475536.  That bugzilla has been updated to point at this one.
=Comment: #2=================================================
Roger Mach <bigmach@us.ibm.com> - 
A release note or Tech Tip is needed to help customers recognize when they
will need to move any on-board (very spartan) watchdog driver out of the way to
allow use of the ipmi_watchdog driver and how to accomplish this move/switch.
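
As one possible way to keep the on-board driver out of the way across reboots, it could be blacklisted in the modprobe configuration. The sketch below is an assumption rather than a documented procedure from this report: the function name `blacklist_module` is hypothetical, and the configuration file to use (some file under /etc/modprobe.d/) is left to the caller. Blacklisting only prevents alias-based automatic loading; it does not unload a driver that is already loaded.

```shell
# Hypothetical helper: append "blacklist <module>" to a modprobe
# configuration file unless that exact line is already present.
# Both arguments are assumptions of this sketch.
blacklist_module() {
    module="$1"
    conf_file="$2"
    grep -qx "blacklist $module" "$conf_file" 2>/dev/null ||
        echo "blacklist $module" >> "$conf_file"
}
```

For example, `blacklist_module i6300esb <file under /etc/modprobe.d/>` would add the entry once and leave the file unchanged on later runs.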
Comment 1 IBM Bug Proxy 2009-08-18 03:20:48 EDT
------- Comment From kamaleshb@in.ibm.com 2009-08-18 03:13 EDT-------
Hello Redhat,

Any updates on this bug?
Thanks.
Comment 3 Jan Safranek 2009-08-25 04:27:51 EDT
I have finally found HW where I can reproduce this.

(In reply to comment #0)
> 1)  the init.d/ipmi script does not properly return a failure when the
> ipmi_watchdog driver is improperly loaded.

The question is *how* init.d/ipmi can tell whether the ipmi_watchdog driver is improperly loaded. Modprobe does not print anything and returns exit code 0. I don't think that watching dmesg for "Unable to register misc device" is the way to go. Do you have a better idea?

However, it might be possible to check whether /dev/watchdog is already present in the system and print something like:

Starting ipmi drivers: [  OK  ]
Starting ipmi_watchdog driver:
/dev/watchdog is already present, ipmi_watchdog might not be initialized correctly   [WARNING]

Does it sound acceptable to you?
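
The proposed check could be sketched as a small shell function. This is only an illustration, not the actual ipmi init script: the function name `warn_if_watchdog_present` and its optional path argument are hypothetical, and the message text is taken from the example above.

```shell
# Hypothetical helper: warn when the watchdog device node already exists
# before ipmi_watchdog is loaded. The optional argument (default
# /dev/watchdog) exists only so the check can be tried on another path.
warn_if_watchdog_present() {
    node="${1:-/dev/watchdog}"
    if [ -e "$node" ]; then
        echo "$node is already present, ipmi_watchdog might not be initialized correctly   [WARNING]"
        return 1
    fi
    return 0
}
```

The function returns 1 when the node exists, so a caller can decide whether to only warn or to treat the condition as a failure.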
Comment 4 IBM Bug Proxy 2009-08-31 12:30:30 EDT
------- Comment From bigmach@us.ibm.com 2009-08-31 12:24 EDT-------
(In reply to comment #9)
> I have finally found HW where I can reproduce this.
>
> (In reply to comment #0)
> > 1)  the init.d/ipmi script does not properly return a failure when the
> > ipmi_watchdog driver is improperly loaded.
>
> The question is *how* init.d/ipmi can tell whether the ipmi_watchdog driver is
> improperly loaded. Modprobe does not print anything and returns exit code 0. I
> don't think that watching dmesg for "Unable to register misc device" is the way
> to go. Do you have a better idea?
>
> However, it might be possible to check whether /dev/watchdog is already
> present in the system and print something like:
>
> Starting ipmi drivers: [  OK  ]
> Starting ipmi_watchdog driver:
> /dev/watchdog is already present, ipmi_watchdog might not be initialized
> correctly   [WARNING]
>
> Does it sound acceptable to you?

Yes, I believe checking for /dev/watchdog is a good approach.  However, I think the init script should fail (and unload the ipmi modules) instead of simply issuing a warning; a failure would force the user to resolve the conflict.  Otherwise we are relying on the user noticing an error message during boot, which could easily be overlooked.
Comment 5 IBM Bug Proxy 2009-09-09 01:30:38 EDT
------- Comment From kamaleshb@in.ibm.com 2009-09-09 01:29 EDT-------
Hello Redhat,

Does the above approach sound OK?
Thanks.
Comment 6 Jan Safranek 2009-09-14 12:14:52 EDT
The ipmi service has sophisticated error handling, and it appears to use its exit code to indicate what went wrong during service startup. E.g., when /dev/watchdog cannot be created for whatever reason, the IPMI module stays loaded and the exit code is 8 (and an appropriate [FAILED] is displayed on the console). I'd follow this approach.

I changed the [WARNING] to [FAILED] and sent a patch upstream, let's follow the discussion there:

http://sourceforge.net/mailarchive/forum.php?thread_name=20090914160902.29060.78511.stgit%40honza-ntb&forum_name=openipmi-developer
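
For illustration, a failing variant of the check might look like the sketch below. It is not the upstream patch: the function name, the WATCHDOG_NODE and MODPROBE overrides, the ordering (checking before modprobe), and the reuse of status 8 for this condition are all assumptions; only the /dev/watchdog check, the [FAILED] marker, and the IPMI_WATCHDOG_OPTIONS setting come from this report.

```shell
# Hypothetical sketch: report a failure instead of a warning when the
# watchdog device node already exists, otherwise load ipmi_watchdog.
# WATCHDOG_NODE and MODPROBE are overridable only for illustration.
start_ipmi_watchdog() {
    node="${WATCHDOG_NODE:-/dev/watchdog}"
    if [ -e "$node" ]; then
        echo "Starting ipmi_watchdog driver: $node is already present   [FAILED]"
        return 8   # assumption: same status the script uses for /dev/watchdog problems
    fi
    # Options are intentionally unquoted so "timeout=60 action=..." splits
    # into separate module parameters.
    ${MODPROBE:-modprobe} ipmi_watchdog ${IPMI_WATCHDOG_OPTIONS:-}
}
```

Returning a distinct non-zero status lets the caller show [FAILED] and propagate the error, rather than relying on the user noticing a message during boot.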
Comment 7 IBM Bug Proxy 2009-10-07 05:00:39 EDT
------- Comment From kamaleshb@in.ibm.com 2009-10-07 04:56 EDT-------
Hello Redhat,

The patch has been committed upstream (http://openipmi.cvs.sourceforge.net/viewvc/openipmi/OpenIPMI/ipmi.init?view=diff&r1=1.10&r2=1.11). Which release will include this patch?
Thanks.
Comment 8 Jan Safranek 2009-10-07 06:03:21 EDT
(In reply to comment #7)
> Patch has been committed upstream
> (http://openipmi.cvs.sourceforge.net/viewvc/openipmi/OpenIPMI/ipmi.init?view=diff&r1=1.10&r2=1.11),
> which release would be having this patch ?

Well... the next one, I guess? :)

I already have a request to 'include latest version of OpenIPMI', see bug #514816, so if there is a new OpenIPMI release and everything goes well, I'll include it in RHEL 5.5.

If not, I am going to include this patch anyway; it's really simple and harmless.
Comment 10 RHEL Product and Program Management 2009-11-06 13:55:42 EST
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 15 IBM Bug Proxy 2009-11-11 18:00:22 EST
------- Comment From kumarr@linux.ibm.com 2009-11-11 17:58 EDT-------
(In reply to comment #17)
> This request was evaluated by Red Hat Product Management for
> inclusion, but this component is not scheduled to be updated in
> the current Red Hat Enterprise Linux release. If you would like
> this request to be reviewed for the next minor release, ask your
> support representative to set the next rhel-x.y flag to "?".

Hi RedHat,

This bug, along with other OpenIPMI fixes, is important to us. Could you please confirm that this and RH Bugzilla 514816 will be included in RHEL 5.5?

Thanks!
Comment 16 Jan Safranek 2009-11-12 06:10:39 EST
(In reply to comment #15)
> This bug along with other OpenIPMI fixes are important to us. Could you please
> confirm that this and RH Bugzilla 514816 will be included in RHEL 5.5?

This bug should be fixed in 5.5. In general, all bugs which are in ON_QA state will be there.

Bug #514816 is about updating OpenIPMI-2.0.16 to a newer release. But there has not been any new release so far; 2.0.16 is still the latest one, and that bug was closed.
Comment 19 errata-xmlrpc 2009-12-02 07:20:32 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1629.html