Bug 463325 - [PATCH] Cleanup problems with ifdown-eth network script
[PATCH] Cleanup problems with ifdown-eth network script
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: initscripts (Show other bugs)
5.2
All Linux
medium Severity medium
: rc
: ---
Assigned To: initscripts Maintenance Team
BaseOS QE
:
Depends On:
Blocks: 500308
  Show dependency treegraph
 
Reported: 2008-09-22 17:58 EDT by Sean E. Millichamp
Modified: 2009-09-02 07:13 EDT (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 500308 (view as bug list)
Environment:
Last Closed: 2009-09-02 07:13:16 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Patch to fix reported problem (1.35 KB, patch)
2008-09-22 17:58 EDT, Sean E. Millichamp
no flags Details | Diff
Move the bridge member handling later in the ifdown sequence (1.38 KB, patch)
2008-09-24 10:46 EDT, Sean E. Millichamp
no flags Details | Diff
Fixes the bridge handling to remove the bridge when empty (586 bytes, patch)
2008-09-24 10:47 EDT, Sean E. Millichamp
no flags Details | Diff
Fixes the bridge handling to remove the bridge when empty (586 bytes, patch)
2008-09-24 10:50 EDT, Sean E. Millichamp
no flags Details | Diff

  None (edit)
Description Sean E. Millichamp 2008-09-22 17:58:19 EDT
Created attachment 317421 [details]
Patch to fix reported problem

Description of problem:

This patch fixes three problems related to bridging and bonding:

1) The bonding slave devices of a bridged bond interface must be downed before the device is set down in the bridge section.

2) As far as I can tell, the brctl show command no longer prints "can't get port info" (what grep matches on) if no interfaces are members of the bridge.  It should be changed to use something more accurate (such as sysfs).

3) The bond interface should be removed so the system is returned to the state it started at after a network stop.

The attached patch fixes these problems.  Ii also removes the "exit 0" in the bridge-member handling section so that it can continue through to the bond interface removal.

Version-Release number of selected component (if applicable):
initscripts-8.45.19.EL-1

How reproducible:
Every time, given the proper network configurations.

Steps to Reproduce:
1. Configure a bond interface with two ethernet slaves, and a bridge interface with the bond interface as a member.
2. service network start
3. service network stop
  
Actual results:

The interfaces aren't shutdown and cleaned up properly.

Expected results:

The networking configuration should be returned to the state it started in.

Additional info:

Also see bug #463014 which reports a problem with startup ordering and bridged bond interfaces.

Note that the removal of the bond interface currently causes a kernel workqueue thread leak, see bug #463244.  However, that is a reason to fix the kernel bug, not to not clean up the bond interface, imo.
Comment 1 Bill Nottingham 2008-09-23 15:52:49 EDT
Hm. Moving things around in that manner has a likelyhood of screwing up the MAC address checking code.

Moreover, I don't see the need to remove bridge devices or bonding devices; after all, it's not as if we remove 'real' network devices by removing the modules.bbbbb
Comment 2 Sean E. Millichamp 2008-09-24 10:44:48 EDT
I have looked at the MAC address checking code and the patch and decided that the best approach is probably to follow in the same spirit as the patch discussed in bug #463014 and just move the bridge member handling section toward the end of the ifdown script.  I am attaching a patch that does just that and nothing more.

As for removing bridging or bonding devices, I will concede that it likely almost never matters if we remove them or not.  While trying to debug these bonding/bridging ordering problems I frequently found myself manually trying to get back to a "fresh boot" clean networking state to test properly.  Fixing the scripts so that a stop returned you to where you started made this so much simpler.  It is true you don't remove the modules for the 'real' devices, but it is also true that if you boot without bringing up networking the 'real' devices still show up in an 'ip link'.  The bonding and bridging devices don't show up until the networking scripts create them.

Also, as far as the bridging device is concerned, my interpretation from reading the script was that one time someone thought it was a good idea to remove an empty bridge device.  That is what I understand this section of code to be doing:

if LC_ALL=C /usr/sbin/brctl show | LC_ALL=C grep -q "^${BRIDGE}   .*can't get port info"; then
    /usr/sbin/brctl delbr ${BRIDGE}
fi

I am not certain, but I believe pre-sysfs that would show the "can't get port info" error when the bridge had no members.  The modern sysfs based code only shows that in the event there were problems reading from sysfs.  In the event you agree with me about the intent of that section I have attached a new patch which changes the logic to use sysfs to determine if the bridge has any member ports and if not removes the bridge.  The patch expects that the one which fixes the ordering problem has already been applied.
Comment 3 Sean E. Millichamp 2008-09-24 10:46:47 EDT
Created attachment 317595 [details]
Move the bridge member handling later in the ifdown sequence
Comment 4 Sean E. Millichamp 2008-09-24 10:47:37 EDT
Created attachment 317596 [details]
Fixes the bridge handling to remove the bridge when empty
Comment 5 Sean E. Millichamp 2008-09-24 10:50:03 EDT
Created attachment 317598 [details]
Fixes the bridge handling to remove the bridge when empty

I attached a reverse diff before, oops.  This one is correct.  It expects to be applied after the patch in attachment 317595 [details], but if you decide not to apply it then not a big deal.
Comment 7 Sean E. Millichamp 2009-04-21 10:36:37 EDT
(In reply to comment #1)
> Hm. Moving things around in that manner has a likelyhood of screwing up the MAC
> address checking code.
> 
> Moreover, I don't see the need to remove bridge devices or bonding devices;
> after all, it's not as if we remove 'real' network devices by removing the
> modules.bbbbb  

I just encountered a real-world situation for removing and cleaning up at least the bond devices.

The scripts use sysfs to set the BONDING_OPTS parameters, allowing for per bond device settings.  When the bond interface is started the device is created and those parameters are set on that device.  Some parameters, like mode, always make sense to specify.  However, many of the parameters (such as updelay), might be something that gets added or removed between network/interface restarts.

If you start with updelay=10000 on bond0 and then ifup bond0, all is fine.  However, if you then remove the updelay line entirely (presuming it will fall back to default) and restart either networking or bond0 the updelay is still set to 10000!  Since the bond device wasn't removed, all old settings are still hanging around.

The cleanest/best solution is for ifdown to remove the bond device.  Then, when ifup recreates it, you get a consistent default start.

I got bit by this while testing some alternative bonding configurations in our environment.
Comment 8 Harald Hoyer 2009-05-05 08:51:43 EDT
Please test the erratum candidate:
http://people.redhat.com/harald/downloads/initscripts/initscripts-8.45.26.1.el5/
Comment 12 errata-xmlrpc 2009-09-02 07:13:16 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1344.html

Note You need to log in before you can comment on or make changes to this bug.