Bug 1128454 - bmc-watchdog kills IBM x3650
Summary: bmc-watchdog kills IBM x3650
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: freeipmi
Version: 6.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ales Ledvinka
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-08-10 14:22 UTC by Tore H. Larsen
Modified: 2016-09-20 04:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-11 12:37:15 UTC


Description Tore H. Larsen 2014-08-10 14:22:09 UTC
Description of problem:

The bmc-watchdog service is enabled by default when RHEL 6.5 (no updates) is installed on an IBM x3650, and it then resets the machine.

Version-Release number of selected component (if applicable):

	Manufacturer: IBM
	Product Name: IBM System x3650 -[7979KRG]-

[root@arc-orca1 log]# which bmc-watchdog
/usr/sbin/bmc-watchdog
[root@arc-orca1 log]# rpm -qf /usr/sbin/bmc-watchdog
freeipmi-bmc-watchdog-1.2.1-3.el6.x86_64
[root@arc-orca1 log]# 


How reproducible:

        Seen on 2 of 4 machines so far.

Steps to Reproduce:
1. Install RHEL 6.5 on IBM x3650 mt 7979
2. chkconfig bmc-watchdog on   (default)
3. wait.

Actual results:

      The machine is reset by the watchdog even though load was not high (false positive).

Expected results:

      The machine should not be reset.

Additional info:

Comment 1 Tore H. Larsen 2014-08-10 14:22:58 UTC
Logs:

[root@arc-orca1 log]# egrep -e "bmc-watchdog|iklog" messages
Aug 10 09:39:50 arc-orca1 /usr/sbin/bmc-watchdog[4172]: fiid_obj_get: 'present_countdown_value': data not available
Aug 10 12:35:13 arc-orca1 /usr/sbin/bmc-watchdog[4052]: fiid_obj_get: 'timer_state': data not available
[root@arc-orca1 log]# egrep -e "bmc-watchdog|imklog" messages
Aug 10 09:39:50 arc-orca1 /usr/sbin/bmc-watchdog[4172]: fiid_obj_get: 'present_countdown_value': data not available
Aug 10 09:57:06 arc-orca1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Aug 10 11:33:30 arc-orca1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Aug 10 12:35:13 arc-orca1 /usr/sbin/bmc-watchdog[4052]: fiid_obj_get: 'timer_state': data not available
Aug 10 12:52:35 arc-orca1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
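
The grep above interleaves the watchdog errors with rsyslog's "imklog ... started" lines, which appear once per boot in this log. A minimal sketch that counts both in a syslog file follows. The function names are hypothetical, and treating each imklog start line as one boot is an assumption drawn only from this excerpt.

```shell
#!/bin/sh
# Count bmc-watchdog "data not available" errors in a syslog file.
# Defaults to /var/log/messages; pass another path to inspect a copy.
count_watchdog_errors() {
    grep -c "bmc-watchdog.*fiid_obj_get: .*data not available" "${1:-/var/log/messages}"
}

# Count rsyslog kernel-log start lines, used here as a rough proxy
# for the number of boots recorded in the file (assumption).
count_boots() {
    grep -c "kernel: imklog .* started" "${1:-/var/log/messages}"
}
```

Run against the second excerpt above, this would report 2 watchdog errors and 3 boots.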

Comment 3 Ales Ledvinka 2014-08-11 08:12:50 UTC
Could you please provide the following:
times at which the IPMI kernel modules were probed/loaded or unloaded since boot
number of bmc-watchdog startups and their times
the driver-type setting in freeipmi.conf, if present.
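
Part of the requested information can be collected with a short script. This is a minimal sketch: the function names are hypothetical, and the default config path /etc/freeipmi/freeipmi.conf is an assumption (the request names only "freeipmi.conf"). Module load times are not covered; they would have to be read from dmesg or /var/log/messages.

```shell
#!/bin/sh
# Report which of the three IPMI modules named in this report are loaded.
# Reads /proc/modules by default; another file in the same format can be given.
ipmi_modules_status() {
    modules_file="${1:-/proc/modules}"
    for mod in ipmi_msghandler ipmi_devintf ipmi_si; do
        if grep -q "^${mod} " "$modules_file" 2>/dev/null; then
            echo "$mod loaded"
        else
            echo "$mod missing"
        fi
    done
}

# Print uncommented driver-type lines from freeipmi.conf.
# The default path is an assumption; pass the real location if it differs.
show_driver_type() {
    conf="${1:-/etc/freeipmi/freeipmi.conf}"
    if [ ! -r "$conf" ]; then
        echo "no readable freeipmi.conf at $conf"
        return 0
    fi
    grep -E '^[[:space:]]*driver-type' "$conf" || echo "driver-type not set"
}
```

Calling `ipmi_modules_status` and `show_driver_type` without arguments inspects the running system.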

Comment 4 Tore H. Larsen 2014-08-11 08:27:42 UTC
I noticed after submitting the bug that ipmi wasn't loaded. It was turned off in the customer kickstart cfg.

After turning bmc-watchdog off I have not seen any events.

[root@arc-orca4 ~]# chkconfig --list | egrep -e "bmc-|ipmi"
bmc-watchdog   	0:off	1:off	2:off	3:off	4:off	5:off	6:off
ipmi           	0:off	1:off	2:on	3:on	4:on	5:on	6:off
ipmidetectd    	0:off	1:off	2:off	3:on	4:off	5:on	6:off
ipmievd        	0:off	1:off	2:off	3:off	4:off	5:off	6:off


Maybe bmc-watchdog should require ipmi to start too?

[root@arc-orca4 ~]# head /etc/init.d/bmc-watchdog 
#!/bin/sh
#
# chkconfig: - 99 01
# description: bmc-watchdog startup script
#
### BEGIN INIT INFO
# Provides: bmc-watchdog
# Required-Start: $network $remote_fs $syslog
# Required-Stop:  $network $remote_fs $syslog
# Default-Start:  3 5
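
The header above lists no IPMI facility in Required-Start. A minimal sketch for checking an init script for such a dependency follows. The function name is hypothetical, and the example header line in the comment is an illustration of the suggestion, not something taken from a shipped freeipmi package.

```shell
#!/bin/sh
# Hypothetical header line that would express the suggested dependency:
#   # Required-Start: $network $remote_fs $syslog ipmi

# Succeed if the LSB header of an init script lists the given facility
# in its Required-Start line.
lsb_requires() {
    script="$1"
    facility="$2"
    sed -n 's/^# Required-Start:[[:space:]]*//p' "$script" \
        | tr -s ' \t' '\n\n' \
        | grep -qxF -- "$facility"
}
```

For example, `lsb_requires /etc/init.d/bmc-watchdog ipmi && echo yes || echo no` would print "no" for the script shown above.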

Comment 5 Ales Ledvinka 2014-08-11 08:38:34 UTC
Could you confirm:


A) that the message:
fiid_obj_get: 'timer_state': data not available

appears only when bmc-watchdog is started before all the related kernel IPMI modules are loaded?


B) And that the watchdog reboot happens only once, on the first run of bmc-watchdog in the window between the initial installation of freeipmi-bmc-watchdog and the first reboot?

C) Also, could you confirm that the /etc/modprobe.d/freeipmi-modalias.conf file prevents subsequent bmc-watchdog reboots, with all of the relevant and present IPMI kernel modules loaded (ipmi_si ipmi_devintf ipmi_msghandler)?

D) And that the line:
fiid_obj_get: 'timer_state': data not available

no longer appears when bmc-watchdog uses the kernel IPMI interface?



If this is exactly the case (please report any differing behavior), the workaround to avoid the "data not available" message and the first reboot should be:

yum install freeipmi-bmc-watchdog
modprobe -a ipmi_si ipmi_devintf ipmi_msghandler
service bmc-watchdog start
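
The ordering in this workaround can be scripted with a guard that refuses to start the watchdog when no IPMI device node exists. This is a minimal sketch, not part of the freeipmi package: the function names are hypothetical, and the default device paths are assumed to be the usual OpenIPMI nodes.

```shell
#!/bin/sh
# Load the IPMI modules one at a time. Extra words after the first module
# name in a plain modprobe call are treated as module parameters.
load_ipmi_modules() {
    for mod in ipmi_msghandler ipmi_devintf ipmi_si; do
        modprobe "$mod" || return 1
    done
}

# Succeed if any candidate path is a character device.
# With no arguments, check the usual OpenIPMI device nodes (assumption).
ipmi_device_present() {
    [ "$#" -gt 0 ] || set -- /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0
    for dev in "$@"; do
        [ -c "$dev" ] && return 0
    done
    return 1
}

# Start bmc-watchdog only when an IPMI device node is present.
start_watchdog_safely() {
    if ipmi_device_present "$@"; then
        service bmc-watchdog start
    else
        echo "no IPMI device node found; not starting bmc-watchdog" >&2
        return 1
    fi
}
```

As root, `load_ipmi_modules && start_watchdog_safely` would follow the order given in the workaround.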

Comment 6 Tore H. Larsen 2014-08-11 08:51:33 UTC
A) Yes, I do not see the fiid_obj_get error if I start ipmi (and load the relevant modules) in advance.

[root@arc-orca4 log]# chkconfig bmc-watchdog on
[root@arc-orca4 log]# service bmc-watchdog start
Starting bmc-watchdog:                                     [  OK  ]

[root@arc-orca4 log]# dmesg | tail -5
lo: Disabled Privacy Extensions
fuse init (API version 7.13)
 rport-1:0-2: blocked FC remote port time out: removing rport
 rport-2:0-2: blocked FC remote port time out: removing rport
ipmi device interface

[root@arc-orca4 log]# tail -2 /var/log/messages

Aug 10 12:54:38 arc-orca4 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2014-08-10-12:54:36-4207'
Aug 10 14:25:34 arc-orca4 kernel: ipmi device interface

B) No, it happened whenever "chkconfig ipmi off ; chkconfig bmc-watchdog on" was in effect, not only after the initial install and initial reboot.

C) No such file here.

[root@arc-orca4 log]# ll /etc/modprobe.d/*ipmi*
ls: cannot access /etc/modprobe.d/*ipmi*: No such file or directory
[root@arc-orca4 log]# 

Workaround 1:  disable bmc-watchdog  

Possible Workaround 2:  enable ipmi (and load relevant modules) and start bmc-watchdog.

Conclusion:  false positive and bmc-watchdog should require ipmi first.

Comment 7 Ales Ledvinka 2014-08-11 12:34:04 UTC
C) Correct. I checked that 1.2.1-3 does not have it; 1.2.1-6 has the module aliases file.

B) Expected, because of C).
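
Whether a given system has the module aliases file can be checked directly. This is a minimal sketch with a hypothetical function name; the file name is the one given in comment 5.

```shell
#!/bin/sh
# Succeed if the freeipmi module aliases file exists.
# The directory defaults to /etc/modprobe.d and can be overridden.
has_freeipmi_modalias() {
    dir="${1:-/etc/modprobe.d}"
    [ -f "$dir/freeipmi-modalias.conf" ]
}
```

For example, `has_freeipmi_modalias || echo "modalias file missing"` reports a system in the state described in comment 6.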

