Bug 450219 - bonding driver can leave rtnl_lock unbalanced
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Fabio Olive Leite
QA Contact: Martin Jenner
Keywords: ZStream
Duplicates: 445016
Depends On:
Blocks: 451677 451939
Reported: 2008-06-05 17:57 EDT by Fabio Olive Leite
Modified: 2010-10-22 21:43 EDT (History)
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2009-01-20 14:36:51 EST


Attachments
Patch to correct the rtnl lock usage. (1.39 KB, patch)
2008-06-05 17:57 EDT, Fabio Olive Leite

Description Fabio Olive Leite 2008-06-05 17:57:41 EDT
Description of problem:

Some error conditions handled by the bonding sysfs interface on 5.2 can leave 
the rtnl_lock unbalanced, either by locking and returning without unlocking, 
or by unlocking when it didn't lock.

If kernel messages about bonding are followed by failing RTNL assertions, the 
cause is very likely one of the issues described here.

kernel: bonding: bond1: Unable to update slaves because interface is down.
...
kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
... repeats many times and at several points in the code.

That "unable to update slaves" message is precisely the point where 
bonding_store_slaves() does an unbalanced unlock.
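
The two failure modes described above can be illustrated with a minimal 
userspace sketch. This is not the driver code and not the attached patch: the 
mock_rtnl_*() helpers, the rtnl_depth counter and the three handler functions 
are hypothetical stand-ins, written only on the assumption that the error 
paths share a common exit label.

```c
#include <assert.h>   /* for callers that want to assert on rtnl_depth */
#include <stdio.h>

/* Userspace stand-in for the kernel's RTNL mutex: a depth counter only.
 * 0 means balanced; any other value means a lock/unlock mismatch. */
static int rtnl_depth;

static void mock_rtnl_lock(void)   { rtnl_depth++; }
static void mock_rtnl_unlock(void) { rtnl_depth--; }

/* Hypothetical failure mode 1: unlocking without having locked.
 * The "interface is down" check runs before the lock is taken, but its
 * error path jumps to an exit label that unlocks unconditionally. */
static int store_slaves_buggy(int bond_is_up)
{
    int ret = 0;

    if (!bond_is_up) {
        fprintf(stderr, "Unable to update slaves because interface is down.\n");
        ret = -1;
        goto out;               /* BUG: the lock is not held here */
    }

    mock_rtnl_lock();
    /* ... enslave or release the interface here ... */
out:
    mock_rtnl_unlock();
    return ret;
}

/* Hypothetical failure mode 2: locking and returning without unlocking. */
static int store_bonds_buggy(int request_is_valid)
{
    mock_rtnl_lock();
    if (!request_is_valid)
        return -1;              /* BUG: returns with the lock still held */
    /* ... create or destroy the bond here ... */
    mock_rtnl_unlock();
    return 0;
}

/* Balanced variant: failures detected before the lock return directly,
 * and every path past mock_rtnl_lock() leaves through the one unlock. */
static int store_slaves_balanced(int bond_is_up)
{
    int ret = 0;

    if (!bond_is_up) {
        fprintf(stderr, "Unable to update slaves because interface is down.\n");
        return -1;              /* nothing to undo: lock never taken */
    }

    mock_rtnl_lock();
    /* ... enslave or release the interface here ... */
    mock_rtnl_unlock();
    return ret;
}
```

In this sketch, calling store_slaves_buggy(0) leaves rtnl_depth at -1 and 
store_bonds_buggy(0) leaves it at 1, while store_slaves_balanced() leaves it 
at 0 on both the error and the success path.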

Version-Release number of selected component (if applicable):

First noticed on 2.6.18-92.el5, but the code changes happened over a few 
earlier commits starting at -62.el5 (bug 268001).

How reproducible:

Looking at the code I guess 100%. :)

Steps to Reproduce:

This hit a customer; I have not been able to try it in the lab yet. I expect 
that adding a NIC to a bond that is down would trigger this easily.
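
A possible way to drive that untested reproduction idea from C is sketched 
below. It relies on the bonding sysfs convention of writing "+<ifname>" to a 
bond's "slaves" file; request_enslave() is a hypothetical helper, and the 
interface name eth1 in the usage note is an assumption (bond1 is taken from 
the log above). Whether this actually triggers the bug has not been verified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "+<ifname>" to a bonding "slaves" sysfs file, which asks the
 * bonding driver to enslave that interface.
 * Returns 0 if the write was accepted, -1 on any error. */
static int request_enslave(const char *slaves_path, const char *ifname)
{
    assert(slaves_path != NULL && ifname != NULL);

    /* Interface names must be non-empty and fit in IFNAMSIZ (16) bytes,
     * including the terminating NUL. */
    size_t len = strlen(ifname);
    if (len == 0 || len >= 16)
        return -1;

    FILE *f = fopen(slaves_path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "+%s\n", ifname) > 0;

    /* stdio buffers the write, so a rejection by the kernel may only
     * show up when the stream is flushed on close. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

On a test machine with the bond administratively down, the call would be 
request_enslave("/sys/class/net/bond1/bonding/slaves", "eth1"), run as root. 
It changes the live network configuration, so it should not be run on a 
production system.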
  
Actual results:

RTNL lock cannot be trusted after those errors and I gather networking would 
get quite racy.

Expected results:

RTNL lock nicely managed.

Additional info:

Patch attached; it is already in a test package at a customer site. I will 
post results as soon as the tests are done. The problems are easy to spot in 
the code for bonding_store_bonds() and bonding_store_slaves().
Comment 1 Fabio Olive Leite 2008-06-05 17:57:42 EDT
Created attachment 308494
Patch to correct the rtnl lock usage.
Comment 8 Fabio Olive Leite 2008-06-10 13:00:21 EDT
Patch is posted and ACKed.
Comment 9 RHEL Product and Program Management 2008-06-10 13:26:42 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 David Mair 2008-06-17 17:37:33 EDT
Looks like this has fairly high impact on the customers that have reported the
issue and there's potential for more customers to hit this problem.  Proposing
for EUS as a result.
Comment 16 Don Zickus 2008-07-09 17:11:54 EDT
in kernel-2.6.18-95.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 21 Andy Gospodarek 2008-10-07 09:20:46 EDT
*** Bug 445016 has been marked as a duplicate of this bug. ***
Comment 26 errata-xmlrpc 2009-01-20 14:36:51 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
