Bug 450219 - bonding driver can leave rtnl_lock unbalanced
Summary: bonding driver can leave rtnl_lock unbalanced
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Fabio Olive Leite
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 445016
Depends On:
Blocks: 451677 451939
 
Reported: 2008-06-05 21:57 UTC by Fabio Olive Leite
Modified: 2018-10-20 02:47 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 19:36:51 UTC
Target Upstream Version:


Attachments
Patch to correct the rtnl lock usage. (1.39 KB, patch)
2008-06-05 21:57 UTC, Fabio Olive Leite
no flags


Links
System: Red Hat Product Errata
ID: RHSA-2009:0225
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update
Last Updated: 2009-01-20 16:06:24 UTC

Description Fabio Olive Leite 2008-06-05 21:57:41 UTC
Description of problem:

Some error conditions handled by the bonding sysfs interface on 5.2 can leave 
the rtnl_lock unbalanced, either by locking and returning without unlocking, 
or by unlocking when it didn't lock.
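
To make the two failure modes concrete, here is a minimal user-space model. All names (mock_rtnl_lock, store_leaks_lock, store_extra_unlock) are hypothetical, and the RTNL lock is modeled as a plain counter (1 = free, 0 = held) purely for illustration; this is not the bonding driver's actual code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL lock: 1 = free, 0 = held.
 * Any value outside {0, 1} means lock/unlock calls were unbalanced. */
static int rtnl_count = 1;

static void mock_rtnl_lock(void)   { rtnl_count--; }
static void mock_rtnl_unlock(void) { rtnl_count++; }

/* Failure mode 1: lock taken, error path returns without unlocking. */
static int store_leaks_lock(bool fail)
{
	mock_rtnl_lock();
	if (fail)
		return -1;              /* BUG: returns with the lock still held */
	/* ... normal work under the lock ... */
	mock_rtnl_unlock();
	return 0;
}

/* Failure mode 2: error path unlocks a lock it never took. */
static int store_extra_unlock(bool fail)
{
	int ret = 0;

	if (fail) {
		ret = -1;
		goto out;               /* BUG: skips the lock below ... */
	}
	mock_rtnl_lock();
	/* ... normal work under the lock ... */
out:
	mock_rtnl_unlock();         /* ... but still runs the unlock */
	return ret;
}
```

Starting from rtnl_count == 1, store_leaks_lock(true) leaves the counter at 0 (the lock stays held forever), while store_extra_unlock(true) leaves it at 2 (one unlock too many). Both leave it at 1 when fail is false, which is why the bugs only show up on error paths.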

If one sees kernel messages about bonding followed by failing RTNL assertions, 
they are very likely caused by the issues described here.

kernel: bonding: bond1: Unable to update slaves because interface is down.
...
kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
... repeats many times and at several points in the code.
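
A rough way to see why one bad error path can produce a flood of assertions elsewhere: an RTNL assertion essentially asks "could I take the lock right now?", and if the answer is yes, the caller evidently was not holding it. The sketch below models that check on a hypothetical counter (1 = free, 0 = held); mock_rtnl_trylock, rtnl_held and well_behaved_caller are illustrative names, and treating the lock as a counting semaphore is an assumption of the model, not a description of the kernel's implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counting-semaphore model of the RTNL lock:
 * 1 = free, 0 = held, 2 = corrupted by an extra unlock. */
static int rtnl_count = 1;

static void mock_rtnl_lock(void)   { rtnl_count--; }
static void mock_rtnl_unlock(void) { rtnl_count++; }

/* Succeeds (and takes the lock) only if the count is positive. */
static bool mock_rtnl_trylock(void)
{
	if (rtnl_count > 0) {
		rtnl_count--;
		return true;
	}
	return false;
}

/* Models an RTNL assertion: if we could grab the lock, the caller
 * was not holding it, so undo the grab and report the failure. */
static bool rtnl_held(const char *file, int line)
{
	if (mock_rtnl_trylock()) {
		mock_rtnl_unlock();
		printf("RTNL: assertion failed at %s (%d)\n", file, line);
		return false;
	}
	return true;
}

/* A correctly written caller: lock, assert, unlock. */
static bool well_behaved_caller(void)
{
	bool ok;

	mock_rtnl_lock();
	ok = rtnl_held(__FILE__, __LINE__);
	mock_rtnl_unlock();
	return ok;
}
```

With a healthy lock, well_behaved_caller() returns true. After a single mock_rtnl_unlock() without a matching lock the counter sits at 2, so every later well_behaved_caller() prints the assertion message and returns false, even though that caller did nothing wrong.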

That "unable to update slaves" message is precisely the point where 
bonding_store_slaves() does an unbalanced unlock.
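
One common way to keep such a path balanced is to do the "interface is down" check before taking the lock and return directly, so that the shared out: label is only ever reached with the lock held. The sketch below is hypothetical: mock_store_slaves, its bond_up and add_fails flags, and the counter-based lock are illustrative stand-ins, not the contents of the attached patch or of bonding_store_slaves() itself.

```c
#include <stdbool.h>

/* Hypothetical counter model of the RTNL lock: 1 = free, 0 = held. */
static int rtnl_count = 1;

static void mock_rtnl_lock(void)   { rtnl_count--; }
static void mock_rtnl_unlock(void) { rtnl_count++; }

/* Balanced version: the early error return happens before the lock
 * is taken, and every path that reaches "out" holds the lock. */
static int mock_store_slaves(bool bond_up, bool add_fails)
{
	int ret = 0;

	if (!bond_up)
		return -1;          /* "interface is down": nothing to unlock */

	mock_rtnl_lock();
	if (add_fails) {
		ret = -1;
		goto out;           /* lock held, so "out" may release it */
	}
	/* ... add or remove the slave under the lock ... */
out:
	mock_rtnl_unlock();
	return ret;
}
```

Whichever of the three paths is taken, the counter returns to 1, which is the invariant the error paths in the original code broke.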

Version-Release number of selected component (if applicable):

First noticed on 2.6.18-92.el5, but the code changes happened over a few 
earlier commits starting at -62.el5 (bug 268001).

How reproducible:

Judging from the code, I would guess 100%. :)

Steps to Reproduce:

This hit a customer; I have not been able to try it in the lab yet. I expect 
that adding a NIC to a bond that is down would trigger it easily.
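
The suspected trigger goes through the bonding sysfs interface, where writing "+<ifname>" to /sys/class/net/<bond>/bonding/slaves asks the driver to enslave an interface. The C sketch below only performs that write; bond_add_slave is a hypothetical helper, the bond and NIC names are placeholders, the bond would have to be brought down first, and running it requires root on a test machine. It is not a verified reproducer.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write "+<slave>" to the bond's sysfs "slaves" file, which asks the
 * bonding driver to enslave <slave>. Returns 0 on success, -1 if the
 * file cannot be opened or the driver rejects the write. */
static int bond_add_slave(const char *bond, const char *slave)
{
	char path[256];
	char cmd[64];
	size_t len;
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/sys/class/net/%s/bonding/slaves", bond);
	snprintf(cmd, sizeof(cmd), "+%s", slave);
	len = strlen(cmd);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, cmd, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}
```

On an affected kernel with bond1 down, calling bond_add_slave("bond1", "eth1") would, per the reasoning above, hit the "Unable to update slaves because interface is down" path, after which RTNL assertion failures would be expected in the kernel log.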
  
Actual results:

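After those errors the RTNL lock can no longer be trusted: a lock left held 
blocks every later RTNL user, and an extra unlock lets code that expects the 
lock to be held run unprotected, so I expect networking would get quite racy.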

Expected results:

RTNL lock acquired and released in balanced pairs on every code path, 
including the error paths.

Additional info:

Patch attached; it is already in a test package at a customer site, and I will 
post results as soon as the tests are done. Looking at the code for 
bonding_store_bonds() and bonding_store_slaves(), those problems are easy to spot.

Comment 1 Fabio Olive Leite 2008-06-05 21:57:42 UTC
Created attachment 308494
Patch to correct the rtnl lock usage.

Comment 8 Fabio Olive Leite 2008-06-10 17:00:21 UTC
Patch is posted and ACKed.

Comment 9 RHEL Program Management 2008-06-10 17:26:42 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 David Mair 2008-06-17 21:37:33 UTC
Looks like this has fairly high impact on the customers that have reported the
issue and there's potential for more customers to hit this problem.  Proposing
for EUS as a result.

Comment 16 Don Zickus 2008-07-09 21:11:54 UTC
in kernel-2.6.18-95.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 21 Andy Gospodarek 2008-10-07 13:20:46 UTC
*** Bug 445016 has been marked as a duplicate of this bug. ***

Comment 26 errata-xmlrpc 2009-01-20 19:36:51 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

