Bug 1444109
Summary: [cockpit] - Creating an active-backup bond whose primary slave has the active connection leads to the secondary slave being activated and enslaved first

Product: Red Hat Enterprise Linux 7
Component: cockpit
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: high
Reporter: Michael Burman <mburman>
Assignee: Marius Vollmer <mvollmer>
QA Contact: qe-baseos-daemons
CC: aloughla, atragler, bgalvani, edwardh, fgiudici, lrintel, mvollmer, rkhan, sukulkar, thaller
Target Milestone: rc
Target Release: ---
Doc Type: If docs needed, set a value
Story Points: ---
Type: Bug
Regression: ---
Last Closed: 2021-01-15 07:34:24 UTC
Bug Depends On: 1472965
Attachments: NM log in debug (attachment 1273019)
Description (Michael Burman, 2017-04-20 15:11:26 UTC)
NM log (not in debug):

Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.6457] device (enp6s0): state change: prepare -> config (reason 'none') [40 50 0]
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.6496] device (enp4s0): state change: disconnected -> prepare (reason 'none') [30 40 0]
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.6511] device (enp4s0): state change: prepare -> config (reason 'none') [40 50 0]
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.7426] device (bond1): state change: config -> ip-config (reason 'none') [50 70 0]
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.7431] device (bond1): IPv4 config waiting until carrier is on
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.7432] device (bond1): IPv6 config waiting until carrier is on
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.7666] device (enp6s0): state change: config -> ip-config (reason 'none') [50 70 0]
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.9734] device (bond1): enslaved bond slave enp6s0
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.9735] device (enp6s0): Activation: connection 'enp6s0' enslaved, continuing activation
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.9737] device (bond1): IPv4 config waiting until carrier is on
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.9737] device (bond1): IPv6 config waiting until carrier is on
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.9745] device (enp6s0): state change: ip-config -> secondaries (reason 'none') [70 90 0]
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.9750] device (enp6s0): state change: secondaries -> activated (reason 'none') [90 100 0]
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.9811] device (enp6s0): Activation: successful, device activated.
Apr 20 16:13:11 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693991.9825] device (enp4s0): state change: config -> ip-config (reason 'none') [50 70 0]
Apr 20 16:13:12 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693992.1786] device (bond1): enslaved bond slave enp4s0
Apr 20 16:13:12 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693992.1787] device (enp4s0): Activation: connection 'enp4s0' enslaved, continuing activation
Apr 20 16:13:12 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693992.1787] device (bond1): IPv4 config waiting until carrier is on
Apr 20 16:13:12 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693992.1788] device (bond1): IPv6 config waiting until carrier is on
Apr 20 16:13:12 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693992.1796] device (enp4s0): state change: ip-config -> secondaries (reason 'none') [70 90 0]
Apr 20 16:13:12 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693992.1802] device (enp4s0): state change: secondaries -> activated (reason 'none') [90 100 0]
Apr 20 16:13:12 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693992.1845] device (enp4s0): Activation: successful, device activated.
Apr 20 16:13:15 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693995.1962] device (enp6s0): link connected
Apr 20 16:13:15 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693995.2022] device (bond1): link connected
Apr 20 16:13:15 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693995.2027] dhcp4 (bond1): activation: beginning transaction (timeout in 45 seconds)
Apr 20 16:13:15 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693995.2066] dhcp4 (bond1): dhclient started with pid 3238
Apr 20 16:13:15 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3238]: DHCPDISCOVER on bond1 to 255.255.255.255 port 67 interval 6 (xid=0x9d9f109)
Apr 20 16:13:15 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492693995.3887] device (enp4s0): link connected
Apr 20 16:13:21 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3238]: DHCPDISCOVER on bond1 to 255.255.255.255 port 67 interval 11 (xid=0x9d9f109)
Apr 20 16:13:32 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3238]: DHCPDISCOVER on bond1 to 255.255.255.255 port 67 interval 16 (xid=0x9d9f109)
Apr 20 16:13:48 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3238]: DHCPDISCOVER on bond1 to 255.255.255.255 port 67 interval 16 (xid=0x9d9f109)
Apr 20 16:13:49 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3238]: DHCPREQUEST on bond1 to 255.255.255.255 port 67 (xid=0x9d9f109)
Apr 20 16:13:49 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3238]: DHCPOFFER from 10.35.128.254
Apr 20 16:13:49 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3238]: DHCPACK from 10.35.128.254 (xid=0x9d9f109)
Apr 20 16:13:49 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492694029.0959] dhcp4 (bond1): address 10.35.128.227
Apr 20 16:13:49 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492694029.0963] dhcp4 (bond1): plen 24 (255.255.255.0)
Apr 20 16:13:49 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492694029.0963] dhcp4 (bond1): gateway 10.35.128.254
Apr 20 16:13:49 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[976]: <info> [1492694029.0963] dhcp4 (bond1): server identifier 10.35.28.1
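A quick way to pull the enslavement order out of a log like the one above is to filter NetworkManager's journal for its "enslaved bond slave" messages; a minimal sketch (the pattern matches the log lines shown above):

    # List the order in which bond1 enslaved its slaves
    journalctl -u NetworkManager | grep 'enslaved bond slave'

In the log above this yields enp6s0 before enp4s0, i.e. the secondary slave was enslaved before the configured primary.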
Michael Burman:

Created attachment 1273019 [details]
NM log in debug
Versions:
cockpit-dashboard-135-4.el7.x86_64
cockpit-ovirt-dashboard-0.10.7-0.0.17.el7ev.noarch
cockpit-storaged-135-4.el7.noarch
cockpit-bridge-135-4.el7.x86_64
cockpit-ws-135-4.el7.x86_64
cockpit-system-135-4.el7.noarch
NetworkManager-libnm-1.4.0-19.el7_3.x86_64
NetworkManager-team-1.4.0-19.el7_3.x86_64
NetworkManager-config-server-1.4.0-19.el7_3.x86_64
NetworkManager-1.4.0-19.el7_3.x86_64
NetworkManager-tui-1.4.0-19.el7_3.x86_64

Marius Vollmer (comment #5):

The idea was that Cockpit does not activate any slave, but leaves it to NM to determine the order. NM itself is changing its behavior in this area, too, and I don't really know which version of it is where and behaves how.

Could you try this with just "nmcli"?

I'll assign this to NetworkManager. Please assign back if you think I need to provide more evidence that Cockpit is "doing the right thing".

Michael Burman (in reply to Marius Vollmer from comment #5):

> Could you try this with just "nmcli"?

When using nmcli, you must specify that the primary slave comes up and is enslaved first. If you don't, the second slave can come up before the primary and we end up with the situation described in this bug: the bond gets the wrong IP and MAC. I don't think NetworkManager knows the correct order in which the slaves should come up unless it is told explicitly via the nmcli commands. For example, this is how I do it with nmcli:

[root@orchid-vds2 ~]# nmcli connection show
NAME           UUID                                  TYPE            DEVICE
System enp4s0  c81d9f81-beea-4b64-9568-631dc4a8e44e  802-3-ethernet  enp4s0
virbr0         43b12d22-67be-420c-ac88-b4d7c4765caf  bridge          virbr0
enp6s0         73127947-780e-408e-b3b9-a0955bee2b5d  802-3-ethernet  --
ens1f0         fc6850dc-9b81-4371-b71e-6af577dacc63  802-3-ethernet  --
ens1f1         bb874038-8edb-4827-9e0f-af12d0d14b51  802-3-ethernet  --

[root@orchid-vds2 ~]# nmcli connection add type bond con-name bond1 ifname bond1 mode active-backup primary enp4s0; \
> nmcli connection modify id bond1 ipv4.method auto ipv6.method ignore; \
> nmcli con mod uuid c81d9f81-beea-4b64-9568-631dc4a8e44e ipv4.method disabled ipv6.method ignore; \
> nmcli connection modify uuid c81d9f81-beea-4b64-9568-631dc4a8e44e connection.slave-type bond connection.master bond1 connection.autoconnect yes; \
> nmcli connection modify id enp6s0 connection.slave-type bond connection.master bond1 connection.autoconnect yes; \
> nmcli con down uuid c81d9f81-beea-4b64-9568-631dc4a8e44e; \
> nmcli con up uuid c81d9f81-beea-4b64-9568-631dc4a8e44e; \
> nmcli con down id enp6s0; \
> nmcli con up id enp6s0; \
> nmcli con up id bond1
Connection 'bond1' (4b9d349e-4aa0-4ff4-a5e9-992024491030) successfully added.
Connection 'System enp4s0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/0)
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)
Connection 'enp6s0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4)
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/6)
Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/7)

If I don't tell NM that the 'System enp4s0' connection (the primary slave) should come up first, I might get the wrong MAC and IP for the bond, because 'enp6s0' might come up before 'enp4s0'. This is exactly what happens when creating an active-backup bond via cockpit. The order in which the slaves come up and are enslaved to the bond matters.
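For reference, the kernel's bonding state shows directly which slave won the activation race and therefore which MAC the bond took; a minimal check, assuming the bond and slave names from the example above (field values illustrate the failure case):

    cat /proc/net/bonding/bond1
        Bonding Mode: fault-tolerance (active-backup)
        Primary Slave: enp4s0 (primary_reselect always)
        Currently Active Slave: enp6s0      <- the secondary, not the configured primary
    ip link show bond1                      <- shows which MAC the bond inherited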
Edward:

NM 1.8 introduced a configuration parameter to control the order in which the slaves come up:

slaves-order=name

But it needs to be set explicitly, as VDSM does:
https://gerrit.ovirt.org/#/c/78362/3/static/etc/NetworkManager/conf.d/vdsm.conf

What I find strange here is that the existing behaviour should not have changed (the order was based on the interface index, but I guess you reused the same machines, so why would that change?). This may have also hit us on another BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1463218
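For reference, the drop-in deployed by the gerrit change linked above is a one-line NetworkManager configuration; a minimal sketch (per NetworkManager.conf(5), the slaves-order key belongs in the [main] section and is honored by NM >= 1.8):

    # /etc/NetworkManager/conf.d/vdsm.conf
    [main]
    # Enslave devices in name order instead of the default ifindex order
    slaves-order=name

NetworkManager reads this at startup, so it must be restarted (systemctl restart NetworkManager) before the new order takes effect.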
Francesco Giudici (comment #8):

Let me add a little bit of context here. Some changes were applied to cockpit and NM in order to have coherent behavior when creating the bond and when restarting the machine. As you know, the bond's MAC is taken from the first enslaved interface. What happened in the past was that on bond creation cockpit determined the MAC by the order in which it activated the slaves, but on reboot it was NetworkManager that picked the order of activation.

This was addressed with two actions:
1) cockpit was changed to let NM pick the order of the enslaved interfaces: to do this, it activates the master interface last.
2) NM was patched to allow device activation based on name (the option must be selected in the config file, otherwise the activation order is based on the ifindex; it should have landed as default config for RHEV).

This allows a coherent activation (enslave) order, and therefore MAC address, during creation and at boot.

There is no special enslave order for active-backup mode (the only mode in which you can specify the primary device). If a specific MAC is required, it can now be specified on the master by changing the 802-3-ethernet.cloned-mac-address property. With nmcli that would be:

nmcli connection modify BOND_CONN_NAME 802-3-ethernet.cloned-mac-address DESIRED_MAC

The issue suggested by Edward is not related to this one but seems instead to point to a NetworkManager bug.

I would just close this bug. Anyway, the only other option you may want is to set the MAC of the primary interface in cockpit, leveraging the cloned-mac-address property. For this reason I am reassigning to cockpit, but I think it could be closed.

Michael Burman (in reply to Francesco Giudici from comment #8):

> I would just close this bug. Anyway, the only other option you may want
> is to set the MAC of the primary interface in cockpit, leveraging the
> cloned-mac-address property. For this reason I am reassigning to cockpit,
> but I think it could be closed.

I really don't think this bug should be closed, as it is still a bug. I believe it now depends on BZ 1472965. This bug can be closed only after it has been tested and verified with the latest cockpit version and the latest RHEL 7.4 containing the fix for the NM bond bug above.

Michael Burman:

I'm affected by this bug 100% of the time when creating a mode=1 bond, no matter which cockpit version I'm using (tested latest, 141). Every time I try to create a mode=1 bond and choose a primary slave, I end up with the wrong IP (that of the second slave) and lose the connection. Since cockpit 141 still has no option to set the bond's MAC to that of the primary interface via the cloned-mac-address property, I am always affected by this bug, and it's not possible to work this way.

What is the status of this report? Is the cockpit team going to fix it and add an option to set the MAC of the primary interface?

Edward:

VDSM places a configuration file under NetworkManager/conf.d to set the correct slave order, but this takes effect only after VDSM is installed and the host is rebooted (or NM is restarted):
https://github.com/oVirt/vdsm/blob/master/static/etc/NetworkManager/conf.d/vdsm.conf

The problem I think we face here is that the default NM slave order is still active at the stage when cockpit is used. After VDSM is installed and the host rebooted, the order is as defined by vdsm.conf (slaves-order=name).

The only workaround I can think of is to install VDSM, reboot the host (or restart the NM service), and then create the bond using cockpit. Another option is to create the bond using RHV instead of cockpit. A proposed solution is for cockpit to deploy the required NM configuration (and restart the NM service) on request.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.