Bug 619863
| Summary: | [ifcfg-rh] ignore configs with BRIDGE= and VLAN= until we support them | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Laine Stump <laine> |
| Component: | NetworkManager | Assignee: | Dan Williams <dcbw> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | desktop-bugs <desktop-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 6.0 | CC: | ajia, borgan, dcbw, ddumas, jkoten, notting, riel, syeghiay |
| Target Milestone: | rc | Keywords: | RHELNAK |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | NetworkManager-0.8.1-5.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2010-11-10 19:32:11 UTC | Type: | --- |
Description
Laine Stump
2010-07-30 18:12:15 UTC
I just noticed that NetworkManager is capable of quickly returning an error to ifup if it believes it isn't managing an interface:
root@rhel6 /etc/sysconfig/network-scripts>ifup eth0
Error: Connection activation failed: Device not managed by NetworkManager
Interface eth0 bring-up failed!
error: failed to execute external program
error: Running 'ifup eth0' failed with exit code 4
(it's a long story how I got to that state, unimportant at the moment).
It seems it should be possible for ifup to attempt the call to NetworkManager, which would return "not managed by me" if it was an interface type not supported; ifup would then do something more intelligent with this failure, ie do whatever it normally does when NM isn't running. This way the config files wouldn't need any of the broken workaround "NM_CONTROLLED=no" settings, and ifup wouldn't have to make any assumptions about what types of interfaces NetworkManager supports (which will both cause problems when upgrading to a future version of NM that *does* support these interface types).

This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release.
** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

/etc/sysconfig/network-scripts/ifcfg-eth1.43 =
DEVICE=eth1.43
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.43.149
NETMASK=255.255.255.0
dbus-send --system --print-reply --dest=com.redhat.ifcfgrh1 /com/redhat/ifcfgrh1 com.redhat.ifcfgrh1.GetIfcfgDetails string:/etc/sysconfig/network-scripts/ifcfg-eth1.43
yields:
UUID=f702aea3-35ed-4db9-937f-5c33f8abd8d6
That's the problem. Back to NM for now ... should it be returning a UUID for a connection it can't handle right?
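The fallback proposed in the description (ask NetworkManager first, and take the legacy path only when NM answers "not managed") could be sketched roughly as below. This is only an illustration, not how /sbin/ifup is implemented: `bring_up`, `nm_activate` and `legacy_ifup` are hypothetical names, and the marker string is simply the error text quoted in the description.

```python
from typing import Callable, Tuple

# Assumption: we recognise the "not managed" case by the error text quoted
# in the description; a real implementation would likely use an error code.
NOT_MANAGED_MARKER = "Device not managed by NetworkManager"


def bring_up(
    iface: str,
    nm_activate: Callable[[str], Tuple[bool, str]],
    legacy_ifup: Callable[[str], bool],
) -> str:
    """Try NM first; fall back to the legacy path only on "not managed".

    nm_activate and legacy_ifup are hypothetical callables standing in for
    the NM activation request and the non-NM ifup code path.
    """
    ok, message = nm_activate(iface)
    if ok:
        return "networkmanager"
    if NOT_MANAGED_MARKER in message:
        # NM declined the device, so do what ifup does when NM isn't running.
        return "legacy" if legacy_ifup(iface) else "failed"
    # Any other NM error is a real failure and is not retried.
    return "failed"
```

With this shape, ifup would not need to know which interface types NM supports; it would only react to NM's answer.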
With respect to your bridging config, if you have ONBOOT=no in ifcfg-eth0, you need to ifup that first before you bring up br0, otherwise it won't be attached. (We attach bonding slaves automatically to bonding devices, but not bridge ports. I suppose this is inconsistent.)

Created attachment 435634
the output of nmcli con list

As we discussed in IRC, the vlan interface shows up in NM's list of editable connections (and available connections).
Here is the output of nmcli con list when it's in that state.

Created attachment 435635
nmcli con list uuid f702aea3-35ed-4db9-937f-5c33f8abd8d6
About Comment 5: I see now that we've previously worked okay with NM disabled because the standard practice is to put an existing interface onto a bridge without bringing it down. If I start from a bridge ifconfiged down, though, it fails even with NM disabled. However, if NM is enabled, when I ifconfig up the device, NM tries to use old config, obtaining a DHCP address for the interface, for example. In certain circumstances (which I'm still pinning down), NM actually *overwrites* the minimal ifcfg-eth0 file netcf creates with one of its own, which contains all kinds of extra stuff (including uuid, IP config, etc). (I was actually planning on filing a separate bug for that).

Created attachment 435659
log of "ncftool ifup br0"
More information about the bridge config.
It turns out that if you do the ifup via ncftool (or virsh iface-start, same thing), it will call ifup for the attached interfaces first. So, "ncftool ifup br0" will do the equivalent of:
ifup eth0
ifup br0
The problem is that, even though ifcfg-eth0 contains:
DEVICE=eth0
HWADDR=00:22:15:59:62:97
ONBOOT=yes
BRIDGE=br0
ifup eth0 will acquire an IP address. After that happens, "ifup br0" is called, and it doesn't attach eth0 to the bridge, so it fails to get an IP address.
This same practice works properly when NM is disabled.
I'll make NM ignore configs that have BRIDGE= or VLAN= in them.

QE reproducer instructions:
---------------------------------
1) Create an ifcfg file named "ifcfg-bridge" that contains:
DEVICE=eth0
HWADDR=00:22:15:59:62:97
ONBOOT=yes
BRIDGE=br0
2) run this command from a terminal:
dbus-send --system --print-reply --dest=com.redhat.ifcfgrh1 /com/redhat/ifcfgrh1 com.redhat.ifcfgrh1.GetIfcfgDetails string:/etc/sysconfig/network-scripts/ifcfg-bridge
and if the problem is fixed, we expect the dbus-send command to return an error. If the problem is not fixed, it'll return something like:
UUID f702aea3-35ed-4db9-937f-5c33f8abd8d6
3) Create another ifcfg file named "ifcfg-vlan" that contains:
DEVICE=eth0
HWADDR=00:22:15:59:62:97
ONBOOT=yes
VLAN=yes
4) run this command from a terminal:
dbus-send --system --print-reply --dest=com.redhat.ifcfgrh1 /com/redhat/ifcfgrh1 com.redhat.ifcfgrh1.GetIfcfgDetails string:/etc/sysconfig/network-scripts/ifcfg-vlan
and we expect the same results for this command as in step #2.
Basically, if the ifcfg file has BRIDGE= or VLAN= in it, NM should ignore that config.

I think you'll also need to identify those with "TYPE=Bridge" as well, won't you? "VLAN=" catches the vlan interfaces, and "BRIDGE=" gets the physical interfaces that are connected to the bridge, but the bridge itself doesn't have "BRIDGE=" in its config.

OK, Dan, can you please get fix to Laine for testing and add this to Snapshot 11?

We actually already ignore TYPE=Bridge connections. Just not BRIDGE= and VLAN=

(In reply to comment #12)
> OK, Dan, can you please get fix to Laine for testing and add this to Snapshot
> 11?
Can I get the PM acks at the next meeting so I can build packages?
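The ignore rule discussed above (skip configs carrying BRIDGE= or VLAN=, in addition to the TYPE=Bridge configs that are already skipped) can be illustrated with a small sketch. This is not the ifcfg-rh plugin's code: `parse_ifcfg` and `should_ignore` are hypothetical helpers, the parser only handles plain KEY=value lines, and treating only VLAN=yes as a VLAN marker is an assumption.

```python
def parse_ifcfg(text: str) -> dict:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().upper()] = value.strip().strip('"').strip("'")
    return settings


def should_ignore(settings: dict) -> bool:
    """Return True when the config matches the ignore rule from this bug."""
    if settings.get("BRIDGE"):
        # Physical interface attached to a bridge (BRIDGE=br0).
        return True
    if settings.get("VLAN", "").lower() == "yes":
        # VLAN interface; assumption: only VLAN=yes counts.
        return True
    if settings.get("TYPE", "").lower() == "bridge":
        # The bridge device itself, which has no BRIDGE= line.
        return True
    return False
```

For the two reproducer files, `should_ignore(parse_ifcfg(...))` is true for both ifcfg-bridge and ifcfg-vlan, while a plain DHCP ethernet config is left alone.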
Upstream fix:
http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=NM_0_8&id=a1c3408d054e8f0c14d90a69ba3e4bd86ade93f3

For various date & timezone related reasons I can't do a scratch build right now (yay for my timezone being wrong when generating the tarball), so here's the SRPM if you don't mind doing a scratch build yourselves:
http://people.redhat.com/dcbw/NetworkManager-0.8.1-4.1.el6.src.rpm
otherwise I may be able to do one later tonight.

Yay!! I built from the srpm, restarted NM, then created both a vlan and two bridges using netcf's ncftool, then ifup'ed them. Both came up with no problems, and ifconfig and brctl showed everything was in order (eth0 wasn't mistakenly given the IP of the bridge or vlan, the bridge and vlan both had their IPs, and the physical interfaces were connected to the bridges).
There is one problem I notice (I'll attach a log in a moment): If I remove the bridge and return ifcfg-eth0 back to its original state, ifup no longer works on it because NM still believes that it's a part of a bridge; is it caching the ifcfg contents?
I don't know if that part is easily fixable, but I don't think the fix you've made here should wait - this will eliminate probably 80% of the bug reports from people trying to use netcf and NetworkManager together - *removing* a bridge is a much less common operation.

Created attachment 436416
-x output of /sbin/ifup when trying to ifup eth0 after removing br0
This log shows what happens if I have an eth0 that is connected to br0, and subsequently bring down the bridge, and modify the eth0 config to be standalone (I've cat'ed ifcfg-eth0 inline to show what it contains).
After changing ifcfg-eth0 to remove the BRIDGE=br0 line (and add back in dhcp config) NM still believes that the interface is part of a bridge, and reports back that it isn't managed by NM. ifup also doesn't want to handle it, I guess because it believes that NM should?
If I restart NM, everything goes back to normal.
Can you attach either or both of:
1) all your ifcfg files *after* you've removed BRIDGE=br0 when you've made the change mentioned in comment 18
2) syslog from before up until after you removed BRIDGE=br0 as described in comment 18
It may be that you have some other ifcfg file that has HWADDR and NM_CONTROLLED=no; if *any* ifcfg has HWADDR=<ethX MAC address> and NM_CONTROLLED=no then NM will ignore that interface and any connections/ifcfgs that have the same HWADDR in them, because you've told NM to ignore it.

Brew scratch build for the package I refer to in comment 16:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=2650675

*** Bug 621254 has been marked as a duplicate of this bug. ***

Further fixes for this issue have been made upstream and should hit the next snapshot.

QE: *additional* to the testcases from Comment 10, we now expect NetworkManager to unmanage interfaces listed in ifcfg files that have BRIDGE= + HWADDR= or VLAN= + HWADDR=. So with this ifcfg file from comment 10:
DEVICE=eth0
HWADDR=00:22:15:59:62:97
ONBOOT=yes
BRIDGE=br0
we would expect NM to report something like the following in /var/log/messages:
Aug 9 13:08:45 dcbw NetworkManager[29767]: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ...
Aug 9 13:08:45 dcbw NetworkManager[29767]: ifcfg-rh: read connection 'System eth0'
Aug 9 13:08:45 dcbw NetworkManager[29767]: ifcfg-rh: Ignoring connection 'System eth0' and its device due to NM_CONTROLLED/BRIDGE/VLAN.
Aug 9 13:08:45 dcbw NetworkManager[29767]: <info> (eth0): now unmanaged
Aug 9 13:08:45 dcbw NetworkManager[29767]: <info> (eth0): device state change: 2 -> 1 (reason 3)

I've been testing with NetworkManager-0.8.1-4.4, provided by Dan, and have found that this problem appears to be solved, i.e. bridges (and their associated physical interfaces) and vlan interfaces are ignored by NetworkManager, so /sbin/ifup is allowed to do its work on those interfaces as if NetworkManager didn't exist.
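The per-device behaviour Dan describes for QE (an ifcfg with HWADDR= plus NM_CONTROLLED=no, BRIDGE= or VLAN= makes NM unmanage that device, and with it every ifcfg sharing the same HWADDR) might be modelled as follows. This is a sketch under assumptions, not NetworkManager's code: `unmanaged_hwaddrs` and `is_ignored` are hypothetical names, each config is a plain dict of ifcfg settings, and only VLAN=yes is treated as a VLAN marker.

```python
def unmanaged_hwaddrs(configs) -> set:
    """Collect lower-cased MACs of devices that NM should stop managing."""
    macs = set()
    for settings in configs:
        hwaddr = settings.get("HWADDR", "").strip().lower()
        if not hwaddr:
            # Without HWADDR= there is no device to tie the rule to.
            continue
        nm_off = settings.get("NM_CONTROLLED", "yes").lower() == "no"
        on_bridge = bool(settings.get("BRIDGE"))
        is_vlan = settings.get("VLAN", "").lower() == "yes"  # assumption
        if nm_off or on_bridge or is_vlan:
            macs.add(hwaddr)
    return macs


def is_ignored(settings: dict, unmanaged: set) -> bool:
    """A config is ignored when its HWADDR names an unmanaged device."""
    return settings.get("HWADDR", "").strip().lower() in unmanaged
```

Under this model a stale second ifcfg with the same HWADDR and NM_CONTROLLED=no would keep eth0 unmanaged even after BRIDGE=br0 is removed from ifcfg-eth0, which is the situation Dan asks about.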
I think the changes Dan has made in this version should be posted to RHEL, and we can close this bug (pending QE testing, of course). There are a few issues that I've noticed, but those will be filed separately:

1) if an ethernet device is added to or removed from a bridge (by modifying the ifcfg file), and then ifup is called immediately to bring up the device, the inotify message that tells NM the file has been modified seems to always reach NM *after* the ifup request comes in on dbus. The result is that NM believes the device is still managed, but then fails to bring it up. If a delay of 1 second is put between modifying the file (with ncftool define, for example) and calling ifup, the inotify message is processed by NM before it receives the ifup, so it has up-to-date information on whether or not it should manage the interface. This creates a problem for virt-manager when the "activate now" checkbox is selected during bridge creation; since there is no delay between defining the bridge and ifup'ing the device, activation fails. The workaround is to uncheck "activate now", create the bridge, and then click on the "start" button separately.

2) In at least one case, after a couple of quick iterations of removing an interface from a bridge, then re-adding it to the bridge, there can be a very long pause while bringing up the interface. I've found that this is caused by the DHCP server attempting to respond to the discover request with a *unicast* packet. I believe this is an issue with the DHCP server (running on a small wireless router), probably not with our dhclient. This same problem occurs if NM is disabled, so it definitely is not an NM problem.

3) When an interface becomes "unmanaged" due to it being placed on a bridge, any nameserver and route data that may be associated with the device are removed from /etc/resolv.conf and the route table respectively. Solving this problem is way beyond the scope of this bug - the solution will probably be to make NM able to manage bridges.

4) As in (3) if the only managed interface is placed on a bridge, this leaves NM with no interfaces to manage, so it sends out notification that there are no working interfaces on the system. This will cause things like openvpn to fail (and probably others). Again, fixing this problem is beyond the scope of this bug.

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.