Description of problem:
Some error conditions handled by the bonding sysfs interface in RHEL 5.2 can leave the rtnl_lock unbalanced. A handler either takes the lock and returns without releasing it, or releases it when it never took it. If you see kernel messages from bonding followed by failing RTNL assertions, the cause is very likely one of the issues described here:

kernel: bonding: bond1: Unable to update slaves because interface is down.
...
kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
...

The assertion repeats many times and fires at several points in the code. The "Unable to update slaves" message is emitted at exactly the point where bonding_store_slaves() performs an unbalanced unlock.

Version-Release number of selected component (if applicable):
First noticed on 2.6.18-92.el5, but the code changes were made over several earlier commits, starting at -62.el5 (bug 268001).

How reproducible:
Judging from the code, I would guess 100%. :)

Steps to Reproduce:
This hit a customer; I have not been able to try it in the lab yet. I expect that adding a NIC to a bond that is down would trigger it easily.

Actual results:
The RTNL lock cannot be trusted after these errors, and I gather networking would become quite racy.

Expected results:
The RTNL lock is correctly balanced.

Additional info:
Patch attached; it is already in a test package at a customer site. I will post results as soon as the tests are done. Looking at the code for bonding_store_bonds() and bonding_store_slaves(), these problems are easy to spot.
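To illustrate why a single unbalanced unlock can produce a long stream of assertion failures, here is a hypothetical userspace model, not kernel code. It assumes semaphore-like counting behavior (unlock just increments a count and does not check ownership) and an assertion that, like the idea behind the RTNL check, reports a failure if it can grab the lock itself. All names (rtnl_count, model_lock, store_slaves_buggy, and so on) are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model (assumption): a binary semaphore standing in for
 * the RTNL lock. 1 = free, 0 = held. Unlock does not check who, if
 * anyone, holds the lock, so an extra unlock pushes the count to 2. */
static int rtnl_count = 1;

static bool model_trylock(void)
{
    if (rtnl_count > 0) {
        rtnl_count--;
        return true;
    }
    return false;
}

static void model_lock(void)
{
    /* Single-threaded model: a real lock would sleep here instead. */
    bool ok = model_trylock();
    assert(ok && "would block forever in this single-threaded model");
    (void)ok;
}

static void model_unlock(void)
{
    rtnl_count++;
}

/* If the lock can still be grabbed, the caller was not really
 * protected by it: report the assertion failure. */
static bool model_assert_held(void)
{
    if (model_trylock()) {
        model_unlock();
        printf("RTNL: assertion failed\n");
        return false;
    }
    return true;
}

/* Buggy shape: the error path releases a lock it never took. */
static int store_slaves_buggy(bool iface_up)
{
    if (!iface_up) {
        printf("bond1: Unable to update slaves because interface is down.\n");
        model_unlock(); /* BUG: unbalanced unlock, count becomes 2 */
        return -1;
    }
    model_lock();
    /* ... update the slave list ... */
    model_unlock();
    return 0;
}
```

In this model, once store_slaves_buggy(false) has run, rtnl_count is 2. Any later caller that takes the lock leaves the count at 1, so model_assert_held() keeps reporting a failure for every subsequent protected section, not just once.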
Created attachment 308494
Patch to correct the rtnl lock usage.
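The attached patch is not reproduced here. As a hypothetical sketch of one common way to keep a lock balanced in a store handler, the handler below takes the lock once, routes every error path through a single exit label, and releases the lock exactly once. A depth counter with assertions stands in for the real RTNL lock, so both failure modes from the description (return while still holding, unlock without holding) would trip an assert. The handler name, its parameters, and the error codes are illustrative assumptions, not taken from the bonding driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the RTNL lock: a depth counter that asserts
 * on both kinds of imbalance described in this bug. */
static int rtnl_depth;

static void model_rtnl_lock(void)
{
    assert(rtnl_depth == 0 && "lock is not recursive");
    rtnl_depth++;
}

static void model_rtnl_unlock(void)
{
    assert(rtnl_depth == 1 && "unlock without matching lock");
    rtnl_depth--;
}

/* Hypothetical handler shaped like a sysfs store routine: one lock,
 * one exit label, one unlock. Error codes are illustrative. */
static int store_slaves_fixed(bool iface_up, bool slave_found)
{
    int ret = 0;

    model_rtnl_lock();
    if (!iface_up) {
        fprintf(stderr, "bond1: Unable to update slaves because interface is down.\n");
        ret = -EPERM;
        goto out;
    }
    if (!slave_found) {
        ret = -ENODEV;
        goto out;
    }
    /* ... enslave or release the interface here ... */
out:
    model_rtnl_unlock(); /* reached on every path, exactly once */
    return ret;
}
```

Funnelling all returns through the out: label means a new error check cannot skip the unlock, and nothing on an error path can unlock a second time.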
Patch is posted and ACKed.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This looks to have a fairly high impact on the customers who have reported the issue, and more customers could hit this problem. Proposing for EUS as a result.
Included in kernel-2.6.18-95.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
*** Bug 445016 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html