Description of problem: Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Additional patch necessary to resolve a startup race condition. It was possible (though extraordinarly unlikely) that a message could come in before the upper layer was ready to handle it. This patch splits the startup processing of an IPMI interface into two parts, one to get ready and one to actually start the processes to receive messages from the interface. Corey Minyard, ipmi driver maintainer for kernel.org, provided the fix. I backported that from the 2.6.16-mm2 kernel onto RHEL3, RHEL4, and SLES9. I sent Corey the patches I made, and he acknowledged they look correct. Raising to Sev 1, as this oops the kipmi0 kernel thread, and crashes rmmod, if the race is hit. Severity set to: Urgent This event sent from IssueTracker by sbenjamin issue 83761
An updated patch was sent to Dell for review, and test kernels were made available for regression testing.
ack. Thanks for the cleanup. Now I've got to see how I missed that in my tests...
the diff was in ipmi_kcs_intf.c, which I don't build in my DKMS package. It's not needed, as that module is obsoleted by the combination of ipmi_si_drv and ipmi_devintf modules which are being built. But it doesn't hurt anything to be built, it's just highly unlikely to be used. :-) Whew!
Hi, Matt. I believe the patch that Peter has posted for RHEL3 U8 has a bug that would make an "insmod ipmi_kcs_drv.o" crash. The problem is that the new 1st arg to ipmi_register_smi() from init_one_kcs() in drivers/char/ipmi/ipmi_kcs_intf.c refers to a "ipmi_smi_handlers" struct that doesn't have an initializer for the new "start_processing" field. Do you have a fix for this? Is it better to add an empty handler func in ipmi_kcs_intf.c or is it better to make ipmi_register_smi() check for a non-NULL "start_processing" pointer? We need to have this resolved tomorrow (Tuesday) if you want this in U8. Thanks in advance. -ernie
Created attachment 128188 [details] incremental patch to fix ipmi_kcs_intf.c for missing "start_processing" func Here's an incremental patch for addressing the missing initialization for the "start_processing" function pointer in the "ipmi_smi_handlers" struct referenced in the call from init_one_kcs() to ipmi_register_smi(). Please test and confirm whether this is okay by end-of-Tuesday (today). Thanks in advance. -ernie
Ernie, this is what I wrote to Peter in email yesterday. Peter, the ipmi_kcs_intf.c file/module is really obsolete, it's been superceeded by ipmi_devintf (which isn't KCS-specific). Hence I stopped worrying about ipmi_kcs_intf.c (or even packaging it in my dkms driver packages). That was easy enough for me to ignore when we first started fixing up the driver a year ago to work on Dell systems (most of which are KCS). All our testing for the last year has been using the ipmi_si_drv and ipmi_devintf modules. I can see from a RHEL3 sustaining perspective though that you wouldn't want to break anyone who might be using ipmi_kcs_intf by not building it. But, I haven't done any work on it, (and with the new baby girl born this week, can't any time soon). Perhaps to solve the "not breaking existing customers" problem, we could employ the kernels obsoletes method at kernel install time, and obsolete ipmi_kcs_intf module, replacing it with ipmi_si_drv and ipmi_devintf instead? :-) Then we could avoid building ipmi_kcs_intf all together. -Matt
In addition, if you wanted to test existing KCS systems, that includes all of our 8G servers plus others. See http://linux.dell.com/ipmi.shtml for details. Red Hat should have plenty of these systems in the test grid.
Thanks for the follow-up, Matt. We've decided to simply provide an empty handler function for ipmi_kcs_intf.c to avoid the crash.
OK, sounds sane to me. That'll leave the race condition there, though it's extremely unlikely to trigger (being timer-driven only now; ipmi_kcs_intf.c doesn't have a kernel thread to speed up command processing, which is where we saw the one failure that lead to this whole patch). And I expect no one will use the ipmi_kcs_intf module anyhow, so no worries there. Just note for tech support, if someone does hit that race with this module, have them switch to ipmi_devintf and ipmi_si_drv. :-)
A fix for this problem has just been committed to the RHEL3 U8 patch pool this evening (in kernel version 2.4.21-41.EL).
Patch has been applied in RHEL3-U8(kernel 2.4.21-43).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0437.html