Bug 969554 - ipmi_si hangs 1/4 of Power Edge 6100 machines.
ipmi_si hangs 1/4 of Power Edge 6100 machines.
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel (Show other bugs)
6.4
alpha Linux
high Severity high
: rc
: ---
Assigned To: Tony Camuso
Red Hat Kernel QE team
: Regression
: 969555 (view as bug list)
Depends On:
Blocks: 832743
  Show dependency treegraph
 
Reported: 2013-05-31 15:09 EDT by Ben Woodard
Modified: 2015-11-18 10:21 EST (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-10 14:29:51 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Ben Woodard 2013-05-31 15:09:10 EDT
Description of problem:
the statically included ipmi module prevents about 1/4 of our poweredge 6100 nodes from booting without the ipmi_si.trydmi=0 kernel parameter. 


Version-Release number of selected component (if applicable):
358.6.1 or whatever it currently is in 6.4.z


How reproducible:
1/4 of the time on our production cluster Sierra.

2013-05-30 14:00:37 start_kcs_transaction - 18 01
2013-05-30 14:00:37 KCS: State = 1, 4
2013-05-30 14:00:37 KCS: State = 2, e
2013-05-30 14:00:37 KCS: State = 2, 8c
2013-05-30 14:00:37 KCS: State = 3, 86
2013-05-30 14:00:37 KCS: State = 3, 84
2013-05-30 14:00:37 KCS: State = 4, 8e
2013-05-30 14:00:37 KCS: State = 4, 8c
2013-05-30 14:00:37 KCS: State = 5, 86
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45

<foraker> 2013-05-30 11:31:17 ipmi_si: Trying ACPI-specified kcs state machine at i/o add
<foraker> ress 0xca2, slave address 0x0, irq 0
<foraker> 2013-05-30 11:31:18 Refined TSC clocksource calibration: 2800.099 MHz.
<foraker> 2013-05-30 11:31:18 Switching to clocksource tsc
<foraker> 2013-05-30 11:35:17 INFO: task swapper:1 blocked for more than 120 seconds.
<foraker> 2013-05-30 11:35:17 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<foraker> 2013-05-30 11:35:17 swapper       D 0000000000000000     0     1      0 0x00000
<foraker> 000
<foraker> 2013-05-30 11:35:17  ffff88063b6d9c30 0000000000000046 0000000000000000 ffff880
<foraker> 639281800
<foraker> 2013-05-30 11:35:17  ffff880338c58000 ffff88063b6d9cb0 ffff88063b6d9c30 ffffffff812d535e
<foraker> 2013-05-30 11:35:17  ffff88033b7fdab8 ffff88063b6d9fd8 000000000000fb88 ffff88033b7fdab8
<foraker> 2013-05-30 11:35:17 Call Trace:
<foraker> 2013-05-30 11:35:17  [<ffffffff812d535e>] ? i_ipmi_request+0x28e/0x780
<foraker> 2013-05-30 11:35:17  [<ffffffff812d5965>] get_guid+0x115/0x140
<foraker> 2013-05-30 11:35:17  [<ffffffff81082364>] ? mod_timer+0x144/0x220
<foraker> 2013-05-30 11:35:17  [<ffffffff81096fd0>] ? autoremove_wake_function+0x0/0x40
<foraker> 2013-05-30 11:35:17  [<ffffffff812d86c3>] ipmi_register_smi+0x413/0xd80
<foraker> 2013-05-30 11:35:17  [<ffffffff81081410>] ? process_timeout+0x0/0x10
<foraker> 2013-05-30 11:35:17  [<ffffffff812dbe6b>] ? clear_obf+0x1b/0x20
<foraker> 2013-05-30 11:35:17  [<ffffffff812dc309>] ? kcs_event+0x3c9/0x630
<foraker> 2013-05-30 11:35:17  [<ffffffff812dab2a>] try_smi_init+0x52a/0x880
<foraker> 2013-05-30 11:35:17  [<ffffffff811f9e06>] ? sysfs_create_file+0x26/0x30
<foraker> 2013-05-30 11:35:17  [<ffffffff814fbb70>] init_ipmi_si+0x77a/0x965
<foraker> 2013-05-30 11:35:17  [<ffffffff814fb3f6>] ? init_ipmi_si+0x0/0x965
<foraker> 2013-05-30 11:35:17  [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0
<foraker> 2013-05-30 11:35:17  [<ffffffff81c278d0>] kernel_init+0x2a2/0x2fe
<foraker> 2013-05-30 11:35:17  [<ffffffff8100c0ca>] child_rip+0xa/0x20
<foraker> 2013-05-30 11:35:17  [<ffffffff81c2762e>] ? kernel_init+0x0/0x2fe
<foraker> 2013-05-30 11:35:17  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Comment 1 Ben Woodard 2013-05-31 15:10:21 EDT
Subject: 	Public BZs
Date: 	Wed, 20 Mar 2013 08:59:52 -0700
From: 	Mark A.Grondona <mgrondona@llnl.gov>
To: 	woodard@redhat.com
CC: 	foraker1@llnl.gov


Ben:

We would like all our BZs to be marked public by default.
Feel free to paste this into any of our cases that are escalated to BZs.

Thanks,
mark
Comment 2 Ben Woodard 2013-05-31 15:17:02 EDT
*** Bug 969555 has been marked as a duplicate of this bug. ***
Comment 4 Ben Woodard 2013-05-31 16:31:45 EDT
More of the log:
2013-05-30 14:00:37 start_kcs_transaction - 18 01
2013-05-30 14:00:37 KCS: State = 1, 4
2013-05-30 14:00:37 KCS: State = 2, e
2013-05-30 14:00:37 KCS: State = 2, 8c
2013-05-30 14:00:37 KCS: State = 3, 86
2013-05-30 14:00:37 KCS: State = 3, 84
2013-05-30 14:00:37 KCS: State = 4, 8e
2013-05-30 14:00:37 KCS: State = 4, 8c
2013-05-30 14:00:37 KCS: State = 5, 86
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 5
2013-05-30 14:00:37 start_kcs_transaction - 18 2f
2013-05-30 14:00:37 KCS: State = 1, 4
2013-05-30 14:00:37 KCS: State = 2, e
2013-05-30 14:00:37 KCS: State = 2, 8c
2013-05-30 14:00:37 KCS: State = 3, 86
2013-05-30 14:00:37 KCS: State = 3, 84
2013-05-30 14:00:37 KCS: State = 4, 8e
2013-05-30 14:00:37 KCS: State = 4, 8c
2013-05-30 14:00:37 KCS: State = 5, 86
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 5
2013-05-30 14:00:37 start_kcs_transaction - 18 30 08
2013-05-30 14:00:37 KCS: State = 1, 4
2013-05-30 14:00:37 KCS: State = 2, e
2013-05-30 14:00:37 KCS: State = 2, 8c
2013-05-30 14:00:37 KCS: State = 3, 86
2013-05-30 14:00:37 KCS: State = 3, 84
2013-05-30 14:00:37 KCS: State = 3, 86
2013-05-30 14:00:37 KCS: State = 3, 84
2013-05-30 14:00:37 KCS: State = 4, 8e
2013-05-30 14:00:37 KCS: State = 4, 8c
2013-05-30 14:00:37 KCS: State = 5, 86
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 5
2013-05-30 14:00:37 KCS: State = 0, 4
2013-05-30 14:00:37 start_kcs_transaction - 18 31
2013-05-30 14:00:37 KCS: State = 1, 4
2013-05-30 14:00:37 KCS: State = 2, e
2013-05-30 14:00:37 KCS: State = 2, 8c
2013-05-30 14:00:37 KCS: State = 3, 86
2013-05-30 14:00:37 KCS: State = 3, 84
2013-05-30 14:00:37 KCS: State = 4, 8e
2013-05-30 14:00:37 KCS: State = 4, 8c
2013-05-30 14:00:37 KCS: State = 5, 86
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 5
2013-05-30 14:00:37 start_kcs_transaction - 18 35
2013-05-30 14:00:37 KCS: State = 1, 4
2013-05-30 14:00:37 KCS: State = 2, e
2013-05-30 14:00:37 KCS: State = 2, 8c
2013-05-30 14:00:37 KCS: State = 3, 86
2013-05-30 14:00:37 KCS: State = 3, 84
2013-05-30 14:00:37 KCS: State = 4, 8e
2013-05-30 14:00:37 KCS: State = 4, 8c
2013-05-30 14:00:37 KCS: State = 5, 86
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 45
2013-05-30 14:00:37 KCS: State = 5, 46
2013-05-30 14:00:37 KCS: State = 5, 5
2013-05-30 14:00:37 KCS: State = 0, 4

The "18 31" (get msg flags) and "18 35" (read event msg buffer) spin
basically indefinitely.  I believe it spins until a soft lockup occurs.

We have yet to figure out why 1/4 of the nodes are afflicted but not 100% of the nodes. It defies obvious class sorting. There is no apparent difference between the SEL on the machines that were affected and the ones that were not. They both have about 100 errors. Any guesses or clues that you can think of? We can try to verify these hypothesis

Also 6.2 is affected but with 6.2 we didn't notice the problem because we didn't load the impi_si module. Most of our use was using FreeIPMI over the network rather than within using it within the node. One of the FreeIPMI developers is on LLNL's staff and he said that he didn't think that he used those two IPMI commands. At one point, I thought that found a case that might have been the same on RHEL5 but I can't seem to find it now. So this problem might go way back and has just come to the surface now that we statically build IPMI into the kernel rather than leaving it as a rarely loaded kernel module.
Comment 5 Jim Foraker 2013-06-03 13:05:15 EDT
     Innnteresting.  This is what I get on one of the affected nodes:

# sierra12 /root > ipmi-raw 0x00 0x06 0x31
rcvd: 31 00 02

     0x31 is "get message flags", and the response of 0x02 means the event
message buffer is full. So rightfully, the driver next attempts to read a
message from the event message buffer:

# sierra12 /root > ipmi-raw 0x00 0x06 0x35
rcvd: 35 80

     The 0x80 response is "data not available (queue / buffer empty)"
according to the spec.  And so we get stuck in an endless loop trying to
clear a buffer that's simultaneously full and empty.  So to delve a bit
further, I did a "Get BMC global enables":

# sierra12 /root > ipmi-raw 0x00 0x06 0x2f
rcvd: 2F 00 0D

     The 0x0d tells us that the event message buffer is indeed enabled.
So let's try clearing the event message flag and reading the result:


# sierra12 /root > ipmi-raw 0x00 0x06 0x30 0x02
rcvd: 30 00
# sierra12 /root > ipmi-raw 0x00 0x06 0x31
rcvd: 31 00 02

     Nope, the bit is still stuck.  OK, well, let's try something drastic.
Let's tell the BMC to do a cold reset.


# sierra12 /root > ipmi-raw 0x00 0x06 0x02
rcvd: 02 00

     This caused IPMI commands to start timing out for a while (30-60
seconds maybe, I wasn't paying enough attention).  But when things started
responding again,

# sierra12 /root > ipmi-raw 0x00 0x06 0x01
rcvd: 01 00 25 01 01 26 02 BF 59 50 00 34 00 6E 6D 00 00
# sierra12 /root > ipmi-raw 0x00 0x06 0x31
rcvd: 31 00 00

     Voila, our stuck flag bit is cleared!

     I tried a warm reset as well, which didn't seem to be enough to clear out the issue.  The spec seems to warn against doing a cold reset on a running system, so if there are any suggestions for less invasive things to try to clear the stuck bit, I'll gladly test them.
Comment 6 Tony Camuso 2013-06-03 13:15:19 EDT
Good sleuthing, Jim.

Unfortunately, I don't have a workaround yet. I'm still trying to understand where the problem exists in the source code so I can devise a patch. Once I have that level of understanding, I may be able to suggest a workaround, if indeed one is feasible.
Comment 7 Tony Camuso 2013-06-18 12:49:17 EDT
Jim, Ben,

Two questions.

1. On what platform (vendor/model) did you see this problem?

2. How did you generate the logs?
Comment 9 Tony Camuso 2013-06-18 14:55:22 EDT
So, the logs are the output of ...

# ipmi-sel
Comment 10 Jim Foraker 2013-06-19 17:46:02 EDT
Tony,

1) The hardware are Dell PowerEdge C6100's.

2) The log was created by passing 'ipmi_si.kcs_debug=7' on the kernel command line and capturing console output.
Comment 11 Tony Camuso 2013-06-28 06:30:48 EDT
I've had a C6100 for a couple days, and put it into a boot loop with most recent RHEL6 kernel (393). The system has booted over 300 times with no errors or warnings. 

Jim, 

Please provide a core dump from one of the systems that hang. Let the watchdog time out and crash the system.
Comment 12 Ben Woodard 2013-07-01 15:58:43 EDT
Tony, 

Jim is on vacation this week. I'll pass this onto the sysadmin for one of the systems.

What is your bios version?
Comment 13 Tony Camuso 2013-07-02 06:20:48 EDT
Ben,

# dmidecode | grep -e Date -e Vendor -e Version -e Product | head -n 4
	Vendor: Dell Inc.
	Version: 1.51.00
	Release Date: 10/06/2010
	Product Name: C6100
Comment 14 Tony Camuso 2013-09-24 13:15:09 EDT
I have been unable to duplicate the problem on hardware that I have available.
Comment 17 John Paul Adrian Glaubitz 2015-11-17 05:34:12 EST
Hi Tony!

(In reply to Tony Camuso from comment #14)
> I have been unable to duplicate the problem on hardware that I have
> available.

This is most likely related to the fact that it apparently affects only some of the C6100 models.

We have different revisions of the C6100 and the problem does not reproduce on all of them.

The models where ipmi_si crashes have Mac addresses beginning with "00:8c:fa" while the ones where ipmi_si works begin with "00:26:6c". Apparently Dell made some major changes to the BMC controller.

I'll try to dig up the differences between the two C6100 model variants now.

Adrian
Comment 18 Jim Foraker 2015-11-17 18:27:30 EST
I have definitely seen this bug hit C6100s with 00:26:6c MAC addresses.  97% of our most affected cluster has 00:26:6c MACs, and we had an incident where ~25% of that cluster failed to boot due to it.

That said, we have worked around the issue and our hardware is disappearing soon, so LLNL is no longer concerned with this bug.
Comment 19 John Paul Adrian Glaubitz 2015-11-18 05:23:29 EST
(In reply to Jim Foraker from comment #18)
> I have definitely seen this bug hit C6100s with 00:26:6c MAC addresses.  97%
> of our most affected cluster has 00:26:6c MACs, and we had an incident where
> ~25% of that cluster failed to boot due to it.

Yeah, you're right. Our initial observation was wrong.

However, it turned out that your above hint which suggested sending raw IPMI commands did the trick!

Quoting:

> Nope, the bit is still stuck.  OK, well, let's try something drastic.
> Let's tell the BMC to do a cold reset.
>
>
> # sierra12 /root > ipmi-raw 0x00 0x06 0x02
> rcvd: 02 00

Without the IPMI modules being loaded (right after booting the machine), I sent the above IPMI raw byte commands and after a short while, IPMI functionality was restored to normal.

@Tony: It might be a good idea to modify the ipmi_si driver code such that it the kernel sends the above reset byte sequence in order to make the device responsive again, wouldn't it? Maybe you could add an additional parameter to the kernel module to achieve that or have the module send the sequence automatically after the device times out.

> That said, we have worked around the issue and our hardware is disappearing
> soon, so LLNL is no longer concerned with this bug.

Sadly, we don't have sufficient funds here at the university to replace all that hardware soon. In fact, we often buy used hardware to upgrade what we have in the server room. And this is one of Germany's largest universities we talk about ...

Adrian
Comment 20 Tony Camuso 2015-11-18 10:21:09 EST
(In reply to John Paul Adrian Glaubitz from comment #19)

> @Tony: It might be a good idea to modify the ipmi_si driver code such that
> it the kernel sends the above reset byte sequence in order to make the
> device responsive again, wouldn't it? Maybe you could add an additional
> parameter to the kernel module to achieve that or have the module send the
> sequence automatically after the device times out.

Adrian, I will have a test kernel for you shortly. It contains a number of upstream fixes and enhancements. I don't see exactly what you're suggesting, but the BMC and messaging code has some changes in it that might fix this problem, albeit not intentionally or specifically.

Note You need to log in before you can comment on or make changes to this bug.