Red Hat Bugzilla – Bug 187548
IPMI startup race condition
Last modified: 2007-11-30 17:07:09 EST
Description of problem:
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Additional patch necessary to resolve a startup race condition.
It was possible (though extraordinarly unlikely) that a message could
come in before the upper layer was ready to handle it. This patch splits
startup processing of an IPMI interface into two parts, one to get
ready and one to actually start the processes to receive messages from
Corey Minyard, ipmi driver maintainer for kernel.org, provided the fix.
I backported that from the 2.6.16-mm2 kernel onto RHEL3, RHEL4, and
I sent Corey the patches I made, and he acknowledged they look correct.
Raising to Sev 1, as this oops the kipmi0 kernel thread, and crashes rmmod,
if the race is hit.
Severity set to: Urgent
This event sent from IssueTracker by sbenjamin
An updated patch was sent to Dell for review, and test kernels were made
available for regression testing.
ack. Thanks for the cleanup. Now I've got to see how I missed that in my tests...
the diff was in ipmi_kcs_intf.c, which I don't build in my DKMS package. It's
not needed, as that module is obsoleted by the combination of ipmi_si_drv and
ipmi_devintf modules which are being built. But it doesn't hurt anything to be
built, it's just highly unlikely to be used. :-)
Hi, Matt. I believe the patch that Peter has posted for RHEL3 U8 has
a bug that would make an "insmod ipmi_kcs_drv.o" crash. The problem
is that the new 1st arg to ipmi_register_smi() from init_one_kcs() in
drivers/char/ipmi/ipmi_kcs_intf.c refers to a "ipmi_smi_handlers" struct
that doesn't have an initializer for the new "start_processing" field.
Do you have a fix for this? Is it better to add an empty handler func
in ipmi_kcs_intf.c or is it better to make ipmi_register_smi() check for
a non-NULL "start_processing" pointer?
We need to have this resolved tomorrow (Tuesday) if you want this in U8.
Thanks in advance. -ernie
Created attachment 128188 [details]
incremental patch to fix ipmi_kcs_intf.c for missing "start_processing" func
Here's an incremental patch for addressing the missing initialization
for the "start_processing" function pointer in the "ipmi_smi_handlers"
struct referenced in the call from init_one_kcs() to ipmi_register_smi().
Please test and confirm whether this is okay by end-of-Tuesday (today).
Thanks in advance. -ernie
Ernie, this is what I wrote to Peter in email yesterday.
Peter, the ipmi_kcs_intf.c file/module is really obsolete, it's been superceeded
by ipmi_devintf (which isn't KCS-specific). Hence I stopped worrying about
ipmi_kcs_intf.c (or even packaging it in my dkms driver packages). That was
easy enough for me to ignore when we first started fixing up the driver a year
ago to work on Dell systems (most of which are KCS). All our testing for the
last year has been using the ipmi_si_drv and ipmi_devintf modules.
I can see from a RHEL3 sustaining perspective though that you wouldn't want to
break anyone who might be using ipmi_kcs_intf by not building it. But, I
haven't done any work on it, (and with the new baby girl born this week, can't
any time soon).
Perhaps to solve the "not breaking existing customers" problem, we could employ
the kernels obsoletes method at kernel install time, and obsolete ipmi_kcs_intf
module, replacing it with ipmi_si_drv and ipmi_devintf instead? :-) Then we
could avoid building ipmi_kcs_intf all together.
In addition, if you wanted to test existing KCS systems, that includes all of
our 8G servers plus others. See http://linux.dell.com/ipmi.shtml for details.
Red Hat should have plenty of these systems in the test grid.
Thanks for the follow-up, Matt. We've decided to simply provide an
empty handler function for ipmi_kcs_intf.c to avoid the crash.
OK, sounds sane to me. That'll leave the race condition there, though it's
extremely unlikely to trigger (being timer-driven only now; ipmi_kcs_intf.c
doesn't have a kernel thread to speed up command processing, which is where we
saw the one failure that lead to this whole patch). And I expect no one will
use the ipmi_kcs_intf module anyhow, so no worries there. Just note for tech
support, if someone does hit that race with this module, have them switch to
ipmi_devintf and ipmi_si_drv. :-)
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-41.EL).
Patch has been applied in RHEL3-U8(kernel 2.4.21-43).
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.