Created attachment 317421 [details] Patch to fix reported problem Description of problem: This patch fixes three problems related to bridging and bonding: 1) The bonding slave devices of a bridged bond interface must be downed before the device is set down in the bridge section. 2) As far as I can tell, the brctl show command no longer prints "can't get port info" (what grep matches on) if no interfaces are members of the bridge. It should be changed to use something more accurate (such as sysfs). 3) The bond interface should be removed so the system is returned to the state it started at after a network stop. The attached patch fixes these problems. Ii also removes the "exit 0" in the bridge-member handling section so that it can continue through to the bond interface removal. Version-Release number of selected component (if applicable): initscripts-8.45.19.EL-1 How reproducible: Every time, given the proper network configurations. Steps to Reproduce: 1. Configure a bond interface with two ethernet slaves, and a bridge interface with the bond interface as a member. 2. service network start 3. service network stop Actual results: The interfaces aren't shutdown and cleaned up properly. Expected results: The networking configuration should be returned to the state it started in. Additional info: Also see bug #463014 which reports a problem with startup ordering and bridged bond interfaces. Note that the removal of the bond interface currently causes a kernel workqueue thread leak, see bug #463244. However, that is a reason to fix the kernel bug, not to not clean up the bond interface, imo.
Hm. Moving things around in that manner has a likelyhood of screwing up the MAC address checking code. Moreover, I don't see the need to remove bridge devices or bonding devices; after all, it's not as if we remove 'real' network devices by removing the modules.bbbbb
I have looked at the MAC address checking code and the patch and decided that the best approach is probably to follow in the same spirit as the patch discussed in bug #463014 and just move the bridge member handling section toward the end of the ifdown script. I am attaching a patch that does just that and nothing more. As for removing bridging or bonding devices, I will concede that it likely almost never matters if we remove them or not. While trying to debug these bonding/bridging ordering problems I frequently found myself manually trying to get back to a "fresh boot" clean networking state to test properly. Fixing the scripts so that a stop returned you to where you started made this so much simpler. It is true you don't remove the modules for the 'real' devices, but it is also true that if you boot without bringing up networking the 'real' devices still show up in an 'ip link'. The bonding and bridging devices don't show up until the networking scripts create them. Also, as far as the bridging device is concerned, my interpretation from reading the script was that one time someone thought it was a good idea to remove an empty bridge device. That is what I understand this section of code to be doing: if LC_ALL=C /usr/sbin/brctl show | LC_ALL=C grep -q "^${BRIDGE} .*can't get port info"; then /usr/sbin/brctl delbr ${BRIDGE} fi I am not certain, but I believe pre-sysfs that would show the "can't get port info" error when the bridge had no members. The modern sysfs based code only shows that in the event there were problems reading from sysfs. In the event you agree with me about the intent of that section I have attached a new patch which changes the logic to use sysfs to determine if the bridge has any member ports and if not removes the bridge. The patch expects that the one which fixes the ordering problem has already been applied.
Created attachment 317595 [details] Move the bridge member handling later in the ifdown sequence
Created attachment 317596 [details] Fixes the bridge handling to remove the bridge when empty
Created attachment 317598 [details] Fixes the bridge handling to remove the bridge when empty I attached a reverse diff before, oops. This one is correct. It expects to be applied after the patch in attachment 317595 [details], but if you decide not to apply it then not a big deal.
(In reply to comment #1) > Hm. Moving things around in that manner has a likelyhood of screwing up the MAC > address checking code. > > Moreover, I don't see the need to remove bridge devices or bonding devices; > after all, it's not as if we remove 'real' network devices by removing the > modules.bbbbb I just encountered a real-world situation for removing and cleaning up at least the bond devices. The scripts use sysfs to set the BONDING_OPTS parameters, allowing for per bond device settings. When the bond interface is started the device is created and those parameters are set on that device. Some parameters, like mode, always make sense to specify. However, many of the parameters (such as updelay), might be something that gets added or removed between network/interface restarts. If you start with updelay=10000 on bond0 and then ifup bond0, all is fine. However, if you then remove the updelay line entirely (presuming it will fall back to default) and restart either networking or bond0 the updelay is still set to 10000! Since the bond device wasn't removed, all old settings are still hanging around. The cleanest/best solution is for ifdown to remove the bond device. Then, when ifup recreates it, you get a consistent default start. I got bit by this while testing some alternative bonding configurations in our environment.
Please test the erratum candidate: http://people.redhat.com/harald/downloads/initscripts/initscripts-8.45.26.1.el5/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1344.html