Red Hat Bugzilla – Bug 619863
[ifcfg-rh] ignore configs with BRIDGE= and VLAN= until we support them
Last modified: 2012-06-26 23:03:04 EDT
(This may be more appropriately assigned to initscripts; if so, feel free to reassign. However, if initscripts hardcodes an avoidance of NetworkManager when an interface is a vlan/bridge, then there will be problems when the day arrives that NM *does* handle those types of interfaces...)
If NetworkManager is enabled, /sbin/ifup (and, presumably, /sbin/ifdown) will attempt to use NetworkManager to bring up vlans and bridges, even though NetworkManager doesn't support those types of interfaces.
Either ifup or NetworkManager should be made intelligent enough to allow normal ifup/ifdown processing on interface types that NetworkManager doesn't support.
To see the effect on vlans, create a vlan ifcfg file, eg: ifcfg-eth0.43
then enable NetworkManager (if it isn't already enabled) and run
# ifup eth0.43
Notice that eth0 gets the IP address that should be given to eth0.43, and eth0.43 isn't created at all.
If NetworkManager is disabled, ifup will bring up the vlan interface itself, and will do it correctly.
(for vlans, ifdown does the right thing whether NM is enabled or not)
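For anyone reproducing the vlan case, a minimal config of the kind described above might look roughly like the sketch below. The helper name, the key values (VLAN ID 43 on eth0, DHCP, ONBOOT=no) and the scratch directory are illustrative assumptions, not the reporter's actual file; the sketch writes to a temporary directory rather than /etc/sysconfig/network-scripts.

```shell
# Hypothetical helper: write a minimal VLAN ifcfg file into the given directory.
# The key/value pairs are illustrative assumptions, not the reporter's config.
write_vlan_ifcfg() (
    dir=$1      # target directory (use a scratch dir when experimenting)
    parent=$2   # parent device, e.g. eth0
    vid=$3      # VLAN ID, e.g. 43
    cat > "$dir/ifcfg-$parent.$vid" <<EOF
DEVICE=$parent.$vid
VLAN=yes
ONBOOT=no
BOOTPROTO=dhcp
EOF
)

scratch=$(mktemp -d)
write_vlan_ifcfg "$scratch" eth0 43
cat "$scratch/ifcfg-eth0.43"
```

Copying the resulting file into the network-scripts directory and running "ifup eth0.43" with NetworkManager enabled should then show the behaviour described above.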
With a bridge interface, create these two files:
root@rhel6 /etc/sysconfig/network-scripts>cat ifcfg-eth0
root@rhel6 /etc/sysconfig/network-scripts>cat ifcfg-br0
(this is roughly equivalent to what happens when you define a bridge attached to eth0 using netcf)
then run "ifup br0". There will be a "very long" delay while it attempts to acquire an IP address, and then it will finally fail. If you do "brctl show br0" during this time, you'll see that the bridge device was created, but eth0 was not attached.
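Besides brctl, the attachment can be checked from sysfs, where each bridge lists its ports under a brif directory. The following is a small sketch; the function name and the SYSFS_NET override are assumptions added here so the check can be tried against a fake directory tree.

```shell
# Report whether a port is attached to a bridge by looking in sysfs.
# SYSFS_NET is a hypothetical override so the check can be run on a fake tree.
is_bridge_port() (
    bridge=$1
    port=$2
    root=${SYSFS_NET:-/sys/class/net}
    [ -e "$root/$bridge/brif/$port" ]
)

if is_bridge_port br0 eth0; then
    echo "eth0 is attached to br0"
else
    echo "eth0 is NOT attached to br0"
fi
```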
We're having a lot of problems with people forgetting to disable NetworkManager (or not knowing that NetworkManager and netcf/libvirt don't work well together); without going the full path of supporting bridge configuration in NM, solving intermediate problems like this would go a long way towards a friendlier experience (and fewer bug reports).
I just noticed that NetworkManager is capable of quickly returning an error to
ifup if it believes it isn't managing an interface:
root@rhel6 /etc/sysconfig/network-scripts>ifup eth0
Error: Connection activation failed: Device not managed by NetworkManager
Interface eth0 bring-up failed!
error: failed to execute external program
error: Running 'ifup eth0' failed with exit code 4
(it's a long story how I got to that state, unimportant at the moment).
It seems it should be possible for ifup to attempt the call to NetworkManager,
which would return "not managed by me" if it was an interface type not
supported; ifup would then do something more intelligent with this failure, ie
do whatever it normally does when NM isn't running.
This way the config files wouldn't need any of the broken workaround
"NM_CONTROLLED=no" settings, and ifup wouldn't have to make any assumptions
about what types of interfaces NetworkManager supports (both of which will cause
problems when upgrading to a future version of NM that *does* support these interface types).
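The fallback proposed here could be sketched along the following lines. This is not the actual initscripts code: nm_activate and legacy_ifup are hypothetical placeholders for the NetworkManager call and the classic ifup path, and the only string taken from this report is the "not managed by NetworkManager" error text.

```shell
# Sketch of the proposed fallback. NM_ACTIVATE and LEGACY_IFUP are hypothetical
# hooks standing in for the real NetworkManager call and the classic ifup path.
ifup_with_fallback() (
    iface=$1
    # Capture NetworkManager's output; the exit status decides the branch.
    if out=$("${NM_ACTIVATE:-nm_activate}" "$iface" 2>&1); then
        echo "$iface: brought up by NetworkManager"
        exit 0
    fi
    case $out in
        *"not managed by NetworkManager"*)
            # NM declined the device: do what ifup does when NM is not running.
            "${LEGACY_IFUP:-legacy_ifup}" "$iface"
            ;;
        *)
            # Any other failure is a real error; pass it through.
            echo "$out" >&2
            exit 1
            ;;
    esac
)

# Stubs for illustration only; they simulate NM rejecting the device.
nm_activate() {
    echo "Error: Connection activation failed: Device not managed by NetworkManager"
    return 4
}
legacy_ifup() { echo "$1: brought up by legacy ifup"; }

ifup_with_fallback br0
```

Matching on the error text is fragile; a dedicated exit code for "not managed" would be a cleaner contract, if one were available.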
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.
** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
dbus-send --system --print-reply --dest=com.redhat.ifcfgrh1 /com/redhat/ifcfgrh1 com.redhat.ifcfgrh1.GetIfcfgDetails string:/etc/sysconfig/network-scripts/ifcfg-eth1.43
That's the problem. Back to NM for now ... should it be returning a UUID for a connection it can't handle right?
With respect to your bridging config, if you have ONBOOT=no in ifcfg-eth0, you need to ifup that first before you bring up br0, otherwise it won't be attached.
(We attach bonding slaves automatically to bonding devices, but not bridge ports. I suppose this is inconsistent.)
Created attachment 435634 [details]
the output of nmcli con list
As we discussed on IRC, the vlan interface shows up in NM's list of editable connections (and available connections).
Here is the output of nmcli con list when it's in that state.
Created attachment 435635 [details]
nmcli con list uuid f702aea3-35ed-4db9-937f-5c33f8abd8d6
About Comment 5:
I see now that we've previously worked okay with NM disabled because the standard practice is to put an existing interface onto a bridge without bringing it down. If I start from a bridge ifconfiged down, though, it fails even with NM disabled.
However, if NM is enabled, when I ifconfig up the device, NM tries to use old config, obtaining a DHCP address for the interface, for example. In certain circumstances (which I'm still pinning down), NM actually *overwrites* the minimal ifcfg-eth0 file netcf creates with one of its own, which contains all kinds of extra stuff (including uuid, IP config, etc). (I was actually planning on filing a separate bug for that).
Created attachment 435659 [details]
log of "ncftool ifup br0"
More information about the bridge config.
It turns out that if you do the ifup via ncftool (or virsh iface-start, same thing), it will call ifup for the attached interfaces first. So, "ncftool ifup br0" will do the equivalent of:
The problem is that, even though ifcfg-eth0 contains:
ifup eth0 will acquire an IP address. After that happens, "ifup br0" is called, and it doesn't attach eth0 to the bridge, so it fails to get an IP address.
This same practice works properly when NM is disabled.
I'll make NM ignore configs that have BRIDGE= or VLAN= in them.
QE reproducer instructions:
1) Create an ifcfg file named "ifcfg-bridge" that contains:
2) run this command from a terminal:
dbus-send --system --print-reply --dest=com.redhat.ifcfgrh1
and if the problem is fixed, we expect the dbus-send command to return an error. If the problem is not fixed, it'll return something like:
3) Create another ifcfg file named "ifcfg-vlan" that contains:
4) run this command from a terminal:
dbus-send --system --print-reply --dest=com.redhat.ifcfgrh1
and we expect same results for this command as in step #2
Basically, if the ifcfg file has BRIDGE= or VLAN= in it, NM should ignore that config.
I think you'll also need to identify those with "TYPE=Bridge" as well, won't you?
"VLAN=" catches the vlan interfaces, and "BRIDGE=" gets the physical interfaces that are connected to the bridge, but the bridge itself doesn't have "BRIDGE=" in its config.
OK, Dan, can you please get fix to Laine for testing and add this to Snapshot
We actually already ignore TYPE=Bridge connections. Just not BRIDGE= and VLAN=
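The rule being discussed (skip any config that sets BRIDGE=, VLAN=, or TYPE=Bridge) can be approximated from the shell to see which files would be affected. This is only a sketch, not NetworkManager's parser: the function name is hypothetical, it tests for the mere presence of the keys, and it does not handle quoting or lowercase variants.

```shell
# Hypothetical check mirroring the rule discussed above: a config is skipped
# if it sets BRIDGE=, VLAN=, or TYPE=Bridge. This is not NetworkManager's code,
# and it matches on key presence only (so VLAN=no would also match).
nm_should_ignore() (
    file=$1
    grep -Eq '^[[:space:]]*(BRIDGE=|VLAN=|TYPE=Bridge)' "$file"
)

# Classify every ifcfg file in the directory (IFCFG_DIR is an assumed override).
for f in "${IFCFG_DIR:-/etc/sysconfig/network-scripts}"/ifcfg-*; do
    [ -f "$f" ] || continue
    if nm_should_ignore "$f"; then
        echo "ignore: $f"
    else
        echo "manage: $f"
    fi
done
```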
(In reply to comment #12)
> OK, Dan, can you please get fix to Laine for testing and add this to Snapshot
Can I get the PM acks at the next meeting so I can build packages?
For various date & timezone related reasons I can't do a scratch build right now (yay for my timezone being wrong when generating the tarball), so here's the SRPM if you don't mind doing a scratch build yourselves:
otherwise I may be able to do one later tonight.
I built from the srpm, restarted NM, then created both a vlan and two bridges using netcf's ncftool, then ifup'ed them. Both came up with no problems, and ifconfig and brctl showed everything was in order (eth0 wasn't mistakenly given the IP of the bridge or vlan, the bridge and vlan both had their IPs, and the physical interfaces were connected to the bridges).
There is one problem I notice (I'll attach a log in a moment): If I remove the bridge and return ifcfg-eth0 back to its original state, ifup no longer works on it because NM still believes that it's a part of a bridge; is it caching the ifcfg contents?
I don't know if that part is easily fixable, but I don't think the fix you've made here should wait - this will eliminate probably 80% of the bug reports from people trying to use netcf and NetworkManager together - *removing* a bridge is a much less common operation.
Created attachment 436416 [details]
-x output of /sbin/ifup when trying to ifup eth0 after removing br0
This log shows what happens if I have an eth0 that is connected to br0, and subsequently bring down the bridge, and modify the eth0 config to be standalone (I've cat'ed ifcfg-eth0 inline to show what it contains).
After changing ifcfg-eth0 to remove the BRIDGE=br0 line (and add back in dhcp config) NM still believes that the interface is part of a bridge, and reports back that it isn't managed by NM. ifup also doesn't want to handle it, I guess because it believes that NM should?
If I restart NM, everything goes back to normal.
Can you attach either or both of:
1) all your ifcfg files *after* you've removed BRIDGE=br0 when you've made the change mentioned in comment 18
2) syslog from before up until after you removed BRIDGE=br0 as described in comment 18
It may be that you have some other ifcfg file that has HWADDR and NM_CONTROLLED=no; if *any* ifcfg has HWADDR=<ethX MAC address> and NM_CONTROLLED=no then NM will ignore that interface and any connections/ifcfgs that have the same HWADDR in them, because you've told NM to ignore it.
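A quick way to look for such a stray file is to list every HWADDR that appears in a config with NM_CONTROLLED=no. The sketch below assumes unquoted KEY=value lines and a hypothetical function name; the directory argument defaults to the usual network-scripts location.

```shell
# List MAC addresses that some ifcfg file marks as NM_CONTROLLED=no,
# together with the file responsible. Parsing is deliberately simplistic:
# it assumes unquoted KEY=value lines.
unmanaged_hwaddrs() (
    dir=${1:-/etc/sysconfig/network-scripts}
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        grep -q '^NM_CONTROLLED=no' "$f" || continue
        # Print "<mac> <file>" for each HWADDR line in this file.
        awk -F= -v file="$f" '$1 == "HWADDR" { print $2, file }' "$f"
    done
)

unmanaged_hwaddrs
```

If the MAC address of eth0 shows up in the output, the file listed next to it is the one telling NM to ignore the device.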
Brew scratch build for the package I refer to in comment 16:
*** Bug 621254 has been marked as a duplicate of this bug. ***
Further fixes for this issue have been made upstream and should hit the next snapshot.
QE: *additional* to the testcases from Comment 10, we now expect NetworkManager to unmanage interfaces listed in ifcfg files that have BRIDGE= + HWADDR= or VLAN= + HWADDR=. So with this ifcfg file from comment 10:
we would expect NM to report something like the following in /var/log/messages:
Aug 9 13:08:45 dcbw NetworkManager: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ...
Aug 9 13:08:45 dcbw NetworkManager: ifcfg-rh: read connection 'System eth0'
Aug 9 13:08:45 dcbw NetworkManager: ifcfg-rh: Ignoring connection 'System eth0' and its device due to NM_CONTROLLED/BRIDGE/VLAN.
Aug 9 13:08:45 dcbw NetworkManager: <info> (eth0): now unmanaged
Aug 9 13:08:45 dcbw NetworkManager: <info> (eth0): device state change: 2 -> 1 (reason 3)
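For QE, the check for the "Ignoring connection" line above could be scripted roughly as follows. The function name is hypothetical, and the match is on the literal message text quoted in this comment, so it would need adjusting if the wording changes.

```shell
# Check a syslog file for the "Ignoring connection" message quoted above.
# Returns success if the named connection was reported as ignored.
nm_ignored_connection() (
    log=$1
    name=$2
    grep -F "ifcfg-rh: Ignoring connection '$name'" "$log" >/dev/null 2>&1
)

if nm_ignored_connection /var/log/messages "System eth0"; then
    echo "System eth0 was ignored by NetworkManager"
else
    echo "no ignore message found for System eth0"
fi
```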
I've been testing with NetworkManager-0.8.1-4.4, provided by Dan, and have found that this problem appears to be solved, i.e. bridges (and their associated physical interfaces) and vlan interfaces are ignored by NetworkManager, so /sbin/ifup is allowed to do its work on those interfaces as if NetworkManager didn't exist. I think the changes Dan has made in this version should be posted to RHEL, and we can close this bug (pending QE testing, of course).
There are a few issues that I've noticed, but those will be filed separately:
1) if an ethernet device is added to or removed from a bridge (by modifying the ifcfg file), and then ifup is called immediately to bring up the device, the inotify message that tells NM the file has been modified seems to always reach NM *after* the ifup request comes in on dbus. The result is that NM believes the device is still managed, but then fails to bring it up.
If a delay of 1 second is put between modifying the file (with ncftool define, for example) and calling ifup, the inotify message is processed by NM before it receives the ifup, so it has up-to-date information on whether or not it should manage the interface.
This creates a problem for virt-manager when the "activate now" checkbox is selected during bridge creation; since there is no delay between defining the bridge and ifup'ing the device, activation fails. The workaround is to uncheck "activate now", create the bridge, and then click on the "start" button separately.
2) In at least one case, after a couple of quick iterations of removing an interface from a bridge, then re-adding it to the bridge, there can be a very long pause while bringing up the interface. I've found that this is caused by the DHCP server attempting to respond to the discover request with a *unicast* packet. I believe this is an issue with the DHCP server (running on a small wireless router), probably not with our dhclient. This same problem occurs if NM is disabled, so it definitely is not an NM problem.
3) When an interface becomes "unmanaged" due to it being placed on a bridge, any nameserver and route data that may be associated with the device are removed from /etc/resolv.conf and the route table respectively. Solving this problem is way beyond the scope of this bug - the solution will probably be to make NM able to manage bridges.
4) As in (3), if the only managed interface is placed on a bridge, this leaves NM with no interfaces to manage, so it sends out notification that there are no working interfaces on the system. This will cause things like openvpn to fail (and probably others). Again, fixing this problem is beyond the scope of this bug.
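The 1-second workaround from item 1 can be wrapped in a small script until the ordering issue is resolved. This is a sketch under stated assumptions: the wrapper name and the NCFTOOL/IFUP overrides are hypothetical, and it assumes "ncftool define" takes the XML file as its argument.

```shell
# Hypothetical wrapper for the workaround in item 1: define the config, wait
# for NetworkManager to process the inotify event, then bring the device up.
# NCFTOOL and IFUP are overridable so the sequence can be dry-run with echo.
define_then_ifup() (
    xml=$1
    iface=$2
    delay=${3:-1}   # seconds to wait; 1 was enough in the testing described above
    "${NCFTOOL:-ncftool}" define "$xml" || exit 1
    sleep "$delay"
    "${IFUP:-/sbin/ifup}" "$iface"
)

# Dry run: echo the arguments instead of touching the system.
NCFTOOL=echo
IFUP=echo
define_then_ifup br0.xml br0 1
```

A fixed sleep only papers over the race; it does not guarantee that NM has seen the file change before the ifup request arrives.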
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.