Bug 153958

Summary: OpenIPMI kernel drivers in 2.6.9 RHEL4 crash after being loaded overnight
Product: Red Hat Enterprise Linux 4
Reporter: Shawn Starr <sstarr>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED UPSTREAM
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.0
CC: jbaron, minyard, pfrields, riel, wwlinuxengineering
Hardware: ia32e   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-06-03 22:01:49 UTC

Description Shawn Starr 2005-04-06 02:35:58 UTC
Hey Dave, has anyone reported this problem in RHEL4? I'm about to test the 
OpenIPMI drivers in a 2.6.12-rc1/rc2 kernel to see if this is 
reproducible. I am also testing this in EM64T mode to see if it occurs there 
as well.

Description of problem:
The IPMI kernel drivers in RHEL4 crash and cannot be unloaded if left loaded overnight.

Version-Release number of selected component (if applicable): final 2.6.9 
kernel in RHEL4: 2.6.9-5.0.3.ELsmp in x86 mode.

How reproducible:
Load all the OpenIPMI kernel modules on a Dell PE1850, run some ipmitool 
commands, and leave the driver loaded overnight; rmmod then hangs and cannot 
unload the modules.

Steps to Reproduce:
1. modprobe all the OpenIPMI kernel modules.
2. Install the ipmitool RPM from ipmitool.sf.net.
3. Run some ipmitool commands with -I open to display sensor and other information.
4. Leave the driver loaded overnight, then try to unload it with rmmod.
  
Actual results:
IPMI runs as expected at first, but the kernel drivers become unstable after 
being loaded for long periods.

Expected results:
IPMI runs as expected, and the kernel modules remain stable after being loaded 
for long periods.

Additional info:
This was done on a Dell PE1850 in x86 mode, not EM64T mode.

dmesg snippet:
ipmi message handler version v33
IPMI System Interface driver version v33, KCS version v33, SMIC version v33, BT version v33
ipmi_si: Found SMBIOS-specified state machine at I/O address 0xca8
 IPMI kcs interface initialized
ipmi device interface version v33
Copyright (C) 2004 MontaVista Software - IPMI Powerdown via sys_reboot version v33.
IPMI poweroff: Found a chassis style poweroff function
IPMI Watchdog: driver version v33

I have confirmed from Dell Engineering the firmware of BMC is fine.

Comment 1 Shawn Starr 2005-06-01 14:23:17 UTC
lspci info:

00:00.0 Host bridge: Intel Corp. E7520 Memory Controller Hub (rev 09)
00:02.0 PCI bridge: Intel Corp. E7525/E7520/E7320 PCI Express Port A (rev 09)
00:04.0 PCI bridge: Intel Corp. E7525/E7520 PCI Express Port B (rev 09)
00:05.0 PCI bridge: Intel Corp. E7520 PCI Express Port B1 (rev 09)
00:06.0 PCI bridge: Intel Corp. E7520 PCI Express Port C (rev 09)
00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02)
00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
01:00.0 PCI bridge: Intel Corp. 6700PXH PCI Express-to-PCI Bridge A (rev 09)
01:00.2 PCI bridge: Intel Corp. 6700PXH PCI Express-to-PCI Bridge B (rev 09)
02:0b.0 Network controller: MYRICOM Inc. Myrinet 2000 Scalable Cluster Interconnect (rev 04)
03:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
03:0c.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)
06:00.0 PCI bridge: Intel Corp. 6700PXH PCI Express-to-PCI Bridge A (rev 09)
06:00.2 PCI bridge: Intel Corp. 6700PXH PCI Express-to-PCI Bridge B (rev 09)
07:07.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller (rev 05)
08:08.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller (rev 05)
0a:05.0 Class ff00: Dell Remote Access Card 4 Daughter Card
0a:05.1 Class ff00: Dell Remote Access Card 4 Daughter Card Virtual UART
0a:05.2 Class ff00: Dell Remote Access Card 4 Daughter Card SMIC interface
0a:06.0 IDE interface: Silicon Image, Inc. (formerly CMD Technology Inc) PCI0680 Ultra ATA-133 Host Controller (rev 02)
0a:0d.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]

The kernel will hang if the ipmi_si kernel driver is loaded.

Comment 2 Matt Domsch 2005-06-01 14:35:13 UTC
The hang on insmod of ipmi_si happens when a Dell DRAC4 card is present in the 
system.  Dell Engineering is investigating this.  It appears to be coming from 
the wait_event() call in ipmi_si.c:ipmi_register_smi().

                        /* Wait for the channel info to be read. */
                        up_read(&interfaces_sem);
                        wait_event((*intf)->waitq,
                                   ((*intf)->curr_channel>=IPMI_MAX_CHANNELS));
                        down_read(&interfaces_sem);

which never returns, because its wake-up condition is never satisfied.

The 2.4.x kernel OpenIPMI driver v35 and the 2.6.x kernel driver v33 fail 
similarly. I haven't yet been able to try 2.6.12-rc4-mm2 with the patches 
posted to LKML, but I suspect similar behavior would occur.

Comment 3 Shawn Starr 2005-06-01 14:40:19 UTC
I should clarify this in the bug. The hang occurs in two ways:

1) Sometimes, after rebooting the machine, loading the ipmi_si driver with 
insmod simply hangs.

2) Sometimes the driver loads successfully and works for a while, with the 
DRAC4 card visible, but eventually rmmod hangs and openipmi hangs trying to 
communicate with the BMC.

The driver does not always hang when loaded with insmod.

Comment 4 Shawn Starr 2005-06-01 14:49:24 UTC
s/openipmi/ipmitool userland tools.

Comment 5 Matt Domsch 2005-06-02 17:30:35 UTC
In my lab, tests on a PE2800 with the RHEL3 U5 kernel and the RHEL4 U1 beta 
kernel, with and without a DRAC4/i (small add-in daughtercard), insmod without 
problems.

I'll test with PE1850 next.

Comment 6 Matt Domsch 2005-06-03 21:59:35 UTC
I believe this to be a bug in the BMC firmware, which has been corrected in an 
internal build, and will be released to users in August.  Individual customers 
needing the fixed firmware before general release must call Dell tech support, 
and ask the technician to "escalate to Engineering" to obtain the BMC firmware 
which addresses the stuck attention bit problem.  Customers will be required 
to sign a Dell beta-code NDA and must have support from Dell.

Comment 7 Matt Domsch 2005-06-03 22:01:49 UTC
As this is not a Red Hat kernel bug, I am going to close this issue.