Bug 187548 - IPMI startup race condition
Summary: IPMI startup race condition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Peter Martuccelli
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix 186960
 
Reported: 2006-03-31 20:48 UTC by Peter Martuccelli
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0437
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-07-20 14:01:38 UTC
Target Upstream Version:
Embargoed:


Attachments
incremental patch to fix ipmi_kcs_intf.c for missing "start_processing" func (538 bytes, patch)
2006-04-25 09:09 UTC, Ernie Petrides


Links
System: Red Hat Product Errata
ID: RHSA-2006:0437
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8
Last Updated: 2006-07-20 13:11:00 UTC

Description Peter Martuccelli 2006-03-31 20:48:35 UTC

Comment 1 Peter Martuccelli 2006-03-31 20:50:27 UTC
Additional patch necessary to resolve a startup race condition.
It was possible (though extraordinarily unlikely) that a message could
come in before the upper layer was ready to handle it.  This patch splits
the startup processing of an IPMI interface into two parts, one to get
ready and one to actually start the processes to receive messages from
the interface.
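
The two-part startup described above can be illustrated with a small user-space sketch. It is not the actual patch: the structures and the names iface_prepare, upper_register, iface_start_processing and iface_message_arrived are hypothetical stand-ins, and an early message is merely counted as dropped here so the effect can be observed, whereas the comment above describes the real driver as not being ready to handle it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real kernel structures. */
struct upper_layer {
    bool ready;      /* set once the upper layer can accept messages */
    int  delivered;  /* messages handed to the upper layer */
};

struct lower_iface {
    struct upper_layer *upper;  /* NULL until registration completes */
    bool receiving;             /* true once message delivery is enabled */
    int  dropped;               /* messages that arrived too early */
};

/* Part 1: get ready. Only state is initialised; nothing is delivered yet. */
static void iface_prepare(struct lower_iface *lo)
{
    lo->upper = NULL;
    lo->receiving = false;
    lo->dropped = 0;
}

/* The upper layer finishes its own setup and attaches to the interface. */
static void upper_register(struct upper_layer *up, struct lower_iface *lo)
{
    up->delivered = 0;
    up->ready = true;
    lo->upper = up;
}

/* Part 2: start receiving. Must only run after the upper layer is ready. */
static void iface_start_processing(struct lower_iface *lo)
{
    assert(lo->upper != NULL && lo->upper->ready);
    lo->receiving = true;
}

/* Simulates a message arriving from the hardware side. */
static bool iface_message_arrived(struct lower_iface *lo)
{
    if (!lo->receiving || lo->upper == NULL || !lo->upper->ready) {
        lo->dropped++;  /* arrived before part 2 ran */
        return false;
    }
    lo->upper->delivered++;
    return true;
}
```

In this sketch the ordering is the whole point: because delivery is only enabled in the second part, a message cannot reach an upper layer that has not finished registering.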

Corey Minyard, ipmi driver maintainer for kernel.org, provided the fix.
I backported that from the 2.6.16-mm2 kernel onto RHEL3, RHEL4, and
SLES9.
I sent Corey the patches I made, and he acknowledged they look correct.

Raising to Sev 1, as this oopses the kipmi0 kernel thread and crashes rmmod
if the race is hit.

Severity set to: Urgent

This event sent from IssueTracker by sbenjamin 
 issue 83761
              



Comment 3 Peter Martuccelli 2006-04-12 13:45:37 UTC
An updated patch was sent to Dell for review, and test kernels were made
available for regression testing.

Comment 4 Matt Domsch 2006-04-21 19:10:19 UTC
ack.  Thanks for the cleanup.  Now I've got to see how I missed that in my tests...

Comment 5 Matt Domsch 2006-04-21 19:15:45 UTC
the diff was in ipmi_kcs_intf.c, which I don't build in my DKMS package.  It's
not needed, as that module is obsoleted by the combination of ipmi_si_drv and
ipmi_devintf modules which are being built.  But it doesn't hurt anything to be
built, it's just highly unlikely to be used. :-)

Whew!

Comment 6 Ernie Petrides 2006-04-25 05:04:42 UTC
Hi, Matt.  I believe the patch that Peter has posted for RHEL3 U8 has
a bug that would make an "insmod ipmi_kcs_drv.o" crash.  The problem
is that the new 1st arg to ipmi_register_smi() from init_one_kcs() in
drivers/char/ipmi/ipmi_kcs_intf.c refers to a "ipmi_smi_handlers" struct
that doesn't have an initializer for the new "start_processing" field.

Do you have a fix for this?  Is it better to add an empty handler func
in ipmi_kcs_intf.c or is it better to make ipmi_register_smi() check for
a non-NULL "start_processing" pointer?
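
As a rough illustration of the second option (a non-NULL check in the registration path), here is a user-space sketch. The struct smi_handlers, register_smi and the driver tables below are hypothetical, cut-down stand-ins; the real "ipmi_smi_handlers" struct and ipmi_register_smi() have more fields and a different signature. It relies only on the C rule that fields omitted from an initializer of a static struct are zero, so an un-updated driver ends up with a NULL callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down handler table (assumed layout for illustration). */
struct smi_handlers {
    void (*sender)(void *send_info);
    int  (*start_processing)(void *send_info);  /* the newly added field */
};

static int started;  /* counts how often a start callback actually ran */

static void demo_sender(void *send_info) { (void)send_info; }

static int demo_start(void *send_info)
{
    (void)send_info;
    started++;
    return 0;
}

/* A driver that was never updated: start_processing is implicitly NULL. */
static const struct smi_handlers old_driver = {
    .sender = demo_sender,
};

/* A driver that was updated for the new field. */
static const struct smi_handlers new_driver = {
    .sender = demo_sender,
    .start_processing = demo_start,
};

/* Registration that tolerates a missing callback instead of calling NULL. */
static int register_smi(const struct smi_handlers *h, void *send_info)
{
    assert(h != NULL);
    if (h->start_processing == NULL)
        return 0;  /* nothing extra to start for this driver */
    return h->start_processing(send_info);
}
```

With the check in place, registering old_driver returns without invoking anything; without it, the call through the NULL pointer is the kind of crash described above.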

We need to have this resolved tomorrow (Tuesday) if you want this in U8.

Thanks in advance.  -ernie


Comment 7 Ernie Petrides 2006-04-25 09:09:12 UTC
Created attachment 128188
incremental patch to fix ipmi_kcs_intf.c for missing "start_processing" func

Here's an incremental patch for addressing the missing initialization
for the "start_processing" function pointer in the "ipmi_smi_handlers"
struct referenced in the call from init_one_kcs() to ipmi_register_smi().

Please test and confirm whether this is okay by end-of-Tuesday (today).

Thanks in advance.  -ernie

Comment 8 Matt Domsch 2006-04-25 12:39:35 UTC
Ernie, this is what I wrote to Peter in email yesterday.

Peter, the ipmi_kcs_intf.c file/module is really obsolete, it's been superseded
by ipmi_devintf (which isn't KCS-specific).  Hence I stopped worrying about
ipmi_kcs_intf.c (or even packaging it in my dkms driver packages).  That was
easy enough for me to ignore when we first started fixing up the driver a year
ago to work on Dell systems (most of which are KCS).  All our testing for the
last year has been using the ipmi_si_drv and ipmi_devintf modules.

I can see from a RHEL3 sustaining perspective though that you wouldn't want to
break anyone who might be using ipmi_kcs_intf by not building it.  But, I
haven't done any work on it, (and with the new baby girl born this week, can't
any time soon).

Perhaps to solve the "not breaking existing customers" problem, we could employ
the kernel's obsoletes method at kernel install time, and obsolete the ipmi_kcs_intf
module, replacing it with ipmi_si_drv and ipmi_devintf instead? :-)  Then we
could avoid building ipmi_kcs_intf altogether.

-Matt


Comment 9 Matt Domsch 2006-04-25 12:48:08 UTC
In addition, if you wanted to test existing KCS systems, that includes all of
our 8G servers plus others.  See http://linux.dell.com/ipmi.shtml for details. 
Red Hat should have plenty of these systems in the test grid.

Comment 10 Ernie Petrides 2006-04-25 22:02:49 UTC
Thanks for the follow-up, Matt.  We've decided to simply provide an
empty handler function for ipmi_kcs_intf.c to avoid the crash.
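
A minimal sketch of what an empty handler looks like, assuming a simplified handler table. The names kcs_start_processing, kcs_handlers and register_smi_unchecked are hypothetical and the callback signature is an assumption; this is not the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down handler table (assumed layout for illustration). */
struct smi_handlers {
    int (*start_processing)(void *send_info);
};

/* Empty handler: there is nothing to start, so just report success. */
static int kcs_start_processing(void *send_info)
{
    (void)send_info;
    return 0;
}

/* The driver's table now initialises the new field explicitly. */
static const struct smi_handlers kcs_handlers = {
    .start_processing = kcs_start_processing,
};

/* A caller that invokes the callback unconditionally, with no NULL check. */
static int register_smi_unchecked(const struct smi_handlers *h, void *send_info)
{
    assert(h != NULL && h->start_processing != NULL);
    return h->start_processing(send_info);
}
```

The trade-off in this approach is that the caller stays unchanged, while every driver table has to supply the field.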

Comment 11 Matt Domsch 2006-04-26 03:22:30 UTC
OK, sounds sane to me.  That'll leave the race condition there, though it's
extremely unlikely to trigger (being timer-driven only now; ipmi_kcs_intf.c
doesn't have a kernel thread to speed up command processing, which is where we
saw the one failure that led to this whole patch).  And I expect no one will
use the ipmi_kcs_intf module anyhow, so no worries there.  Just note for tech
support, if someone does hit that race with this module, have them switch to
ipmi_devintf and ipmi_si_drv. :-)

Comment 12 Ernie Petrides 2006-04-26 04:15:56 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-41.EL).


Comment 14 Raghavendra Biligiri 2006-06-26 07:10:36 UTC
Patch has been applied in RHEL3-U8 (kernel 2.4.21-43).

Comment 16 Red Hat Bugzilla 2006-07-20 14:01:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html


