Bug 1294728
Summary: | the team connection becomes incorrect after I restart NetworkManager | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | liyunwang <1121083764> | ||||||||
Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | medium | ||||||||||
Version: | 7.2 | CC: | 1121083764, bgalvani, dcbw, lrintel, rkhan, thaller, vbenes | ||||||||
Target Milestone: | rc | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | NetworkManager-1.4.0-0.2.git20160621.072358da.el7 | Doc Type: | Bug Fix | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2016-11-03 19:05:47 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 1203710, 1301628, 1313485 | ||||||||||
Attachments: |
|
NetworkManager takes over existing connections on restart. If it can't match an existing profile with device data, it creates a new profile to back up the live data. Could you, please, paste output of $ nmcli con show team0-slave1 $ nmcli con show eno16780032 and also attach NetworkManager logs showing the restart (/var/log/messages) [root@node199 ~]# nmcli con show team0-slave1 connection.id: team0-slave1 connection.uuid: 5308a214-b0ae-46b5-b0dd-7d5bfa5e30c4 connection.interface-name: eno16780032 connection.type: 802-3-ethernet connection.autoconnect: yes connection.autoconnect-priority: 0 connection.timestamp: 1451437456 connection.read-only: no connection.permissions: connection.zone: -- connection.master: team0 connection.slave-type: team connection.autoconnect-slaves: -1 (default) connection.secondaries: connection.gateway-ping-timeout: 0 connection.metered: 未知 802-3-ethernet.port: -- 802-3-ethernet.speed: 0 802-3-ethernet.duplex: -- 802-3-ethernet.auto-negotiate: yes 802-3-ethernet.mac-address: 00:0C:29:BB:F9:48 802-3-ethernet.cloned-mac-address: -- 802-3-ethernet.mac-address-blacklist: 802-3-ethernet.mtu: 自动 802-3-ethernet.s390-subchannels: 802-3-ethernet.s390-nettype: -- 802-3-ethernet.s390-options: 802-3-ethernet.wake-on-lan: 1 (default) 802-3-ethernet.wake-on-lan-password: -- team-port.config: -- [root@node199 ~]# [root@node199 ~]# nmcli con show eno16780032 connection.id: eno16780032 connection.uuid: f1729d83-4850-4be8-8bd2-f3a8db98ab4e connection.interface-name: eno16780032 connection.type: 802-3-ethernet connection.autoconnect: no connection.autoconnect-priority: 0 connection.timestamp: 1451529045 connection.read-only: no connection.permissions: connection.zone: -- connection.master: team0 connection.slave-type: team connection.autoconnect-slaves: -1 (default) connection.secondaries: connection.gateway-ping-timeout: 0 connection.metered: 未知 802-3-ethernet.port: -- 802-3-ethernet.speed: 0 802-3-ethernet.duplex: -- 802-3-ethernet.auto-negotiate: yes 802-3-ethernet.mac-address: 00:0C:29:BB:F9:48 802-3-ethernet.cloned-mac-address: -- 802-3-ethernet.mac-address-blacklist: 802-3-ethernet.mtu: 自动 802-3-ethernet.s390-subchannels: 802-3-ethernet.s390-nettype: -- 802-3-ethernet.s390-options: 802-3-ethernet.wake-on-lan: 1 (default) 802-3-ethernet.wake-on-lan-password: -- team-port.config: { "link_watch": { "name": "ethtool" } } GENERAL.名称: eno16780032 GENERAL.UUID: f1729d83-4850-4be8-8bd2-f3a8db98ab4e GENERAL.设备: eno16780032 GENERAL.状态: 已激活 GENERAL.默认: 否 GENERAL.DEFAULT6: 否 GENERAL.VPN: 否 GENERAL.ZONE: -- GENERAL.DBUS-PATH: /org/freedesktop/NetworkManager/ActiveConnection/1 GENERAL.CON-PATH: /org/freedesktop/NetworkManager/Settings/4 GENERAL.SPEC 对象: / GENERAL.MASTER-PATH: /org/freedesktop/NetworkManager/Devices/1 [root@node199 ~]# Created attachment 1110686 [details]
log
[root@node199 ~]# nmcli con show team0-slave1 connection.id: team0-slave1 connection.uuid: 5308a214-b0ae-46b5-b0dd-7d5bfa5e30c4 connection.interface-name: eno16780032 connection.type: 802-3-ethernet connection.autoconnect: yes connection.autoconnect-priority: 0 connection.timestamp: 1451437456 connection.read-only: no connection.permissions: connection.zone: -- connection.master: team0 connection.slave-type: team connection.autoconnect-slaves: -1 (default) connection.secondaries: connection.gateway-ping-timeout: 0 connection.metered: unknown 802-3-ethernet.port: -- 802-3-ethernet.speed: 0 802-3-ethernet.duplex: -- 802-3-ethernet.auto-negotiate: yes 802-3-ethernet.mac-address: 00:0C:29:BB:F9:48 802-3-ethernet.cloned-mac-address: -- 802-3-ethernet.mac-address-blacklist: 802-3-ethernet.mtu: auto 802-3-ethernet.s390-subchannels: 802-3-ethernet.s390-nettype: -- 802-3-ethernet.s390-options: 802-3-ethernet.wake-on-lan: 1 (default) 802-3-ethernet.wake-on-lan-password: -- team-port.config: -- [root@node199 ~]# [root@node199 ~]# nmcli con show eno16780032 connection.id: eno16780032 connection.uuid: f1729d83-4850-4be8-8bd2-f3a8db98ab4e connection.interface-name: eno16780032 connection.type: 802-3-ethernet connection.autoconnect: no connection.autoconnect-priority: 0 connection.timestamp: 1451529645 connection.read-only: no connection.permissions: connection.zone: -- connection.master: team0 connection.slave-type: team connection.autoconnect-slaves: -1 (default) connection.secondaries: connection.gateway-ping-timeout: 0 connection.metered: unknown 802-3-ethernet.port: -- 802-3-ethernet.speed: 0 802-3-ethernet.duplex: -- 802-3-ethernet.auto-negotiate: yes 802-3-ethernet.mac-address: 00:0C:29:BB:F9:48 802-3-ethernet.cloned-mac-address: -- 802-3-ethernet.mac-address-blacklist: 802-3-ethernet.mtu: auto 802-3-ethernet.s390-subchannels: 802-3-ethernet.s390-nettype: -- 802-3-ethernet.s390-options: 802-3-ethernet.wake-on-lan: 1 (default) 802-3-ethernet.wake-on-lan-password: -- team-port.config: { "link_watch": { "name": "ethtool" } } GENERAL.NAME: eno16780032 GENERAL.UUID: f1729d83-4850-4be8-8bd2-f3a8db98ab4e GENERAL.DEVICES: eno16780032 GENERAL.STATE: activated GENERAL.DEFAULT: no GENERAL.DEFAULT6: no GENERAL.VPN: no GENERAL.ZONE: -- GENERAL.DBUS-PATH: /org/freedesktop/NetworkManager/ActiveConnection/1 GENERAL.CON-PATH: /org/freedesktop/NetworkManager/Settings/4 GENERAL.SPEC-OBJECT: / GENERAL.MASTER-PATH: /org/freedesktop/NetworkManager/Devices/1 [root@node199 ~]# Currently there are 2 problems when NM is restarted: - the team.config of the generated connection is not populated properly because nm-device-team.c:update_connection() is called synchronously during startup, while the handle to the teamdctl instance is set later when g_bus_watch_name() signals the presence of the service on the bus. This can be fixed easily by trying to grab a temporary teamdctl handle in update_connection() if the NMDeviceTeam instance doesn't have one - even if we fix the previous point, there is no guarantee that the team configurations written to the daemon and retrieved later compare to the same string (usually they don't). For example a configuration written as '{ "link_watch": { "name": "ethtool"}}' is read later as: '{ "device": "team0", "link_watch": { "name": "ethtool" }, "runner": { "name": "roundrobin" } }' and hence the comparison between generated and on-disk connection will fail. Any ideas on how to approach this second point? Of course a possible solution is to ignore the team.conf property during comparison by removing the NM_SETTING_PARAM_INFERRABLE flag, but then connection assuming will not work correctly when there are multiple connection with different configurations. (In reply to Beniamino Galvani from comment #6) > Any ideas on how to approach this second point? Of course a possible > solution is to ignore the team.conf property during comparison by > removing the NM_SETTING_PARAM_INFERRABLE flag, but then connection > assuming will not work correctly when there are multiple connection > with different configurations. I think it should be possible to normalize the JSON strong so that we can do a string compare. Downside is, we probably need to link the team-plugin to a json library. Pushed RFC branch bg/team-conf-check-rh1294728. >> build: rename --enable-teamdctl configure option to --enable-team is it possible (with acceptable effort) to keep the old option as fallback, so that old build-scripts continue to work? Preferably, also our ./configure options stay backward compatible. >> build: require Jansson JSON library to build team device plugin can you also update contrib/rpm's SPEC file with the new build-requirement? And just squash "contrib/fedora: update REQUIRED_PACKAGES file with jansson-devel" into this commit? I think as teamd already has a jansson requirement, it is very reasonable that our team plugin requires that library too. >> device/team: always try to connect to teamd in update_connection() watch_active is never set to FALSE. I think it should. >> device/team: compare team config to check connection compatibility seams team_config_equal() leaks strings. gs_free ? >> libnm-core: remove INFERRABLE flag from team.config property The property should not be used to compare compatible connections. could you elaborate in the commit message, telling that now the team-config should not longer be comparer in nm_utils_match_connection() as INFERABLE property, but it is checked via check_connection_compatible(). (In reply to Thomas Haller from comment #9) > >> build: rename --enable-teamdctl configure option to --enable-team > > is it possible (with acceptable effort) to keep the old option as fallback, > so that old build-scripts continue to work? Preferably, also our ./configure > options stay backward compatible. The new name sounds more appropriate, but if we want to keep backwards compatibility for configure options we should keep the old one, it's not so bad. I'd prefer to avoid adding duplicate options since this adds unneeded complexity. Also, fixed according to other suggestions and repushed. (In reply to Beniamino Galvani from comment #10) > (In reply to Thomas Haller from comment #9) > > >> build: rename --enable-teamdctl configure option to --enable-team > > > > is it possible (with acceptable effort) to keep the old option as fallback, > > so that old build-scripts continue to work? Preferably, also our ./configure > > options stay backward compatible. > > The new name sounds more appropriate, but if we want to keep backwards > compatibility for configure options we should keep the old one, it's not > so bad. I'd prefer to avoid adding duplicate options since this adds unneeded > complexity. OK. No strong opinion. All options are ugly in their own way. > Also, fixed according to other suggestions and repushed. could you move commit "contrib/fedora: add dependency on jansson" as first. Otherwise contrib/rpm is broken for the first patches. + if (!json1 || !json2) { + ret = g_strcmp0 (conf1, conf2); "!" + ret = !strcmp (dump1, dump2); should be "dump1 && dump2 && !strcmp (dump1, dump2);" json1 = json_loads (conf1, 0, &jerror); + if (json1) json2 = json_loads (conf2, 0, &jerror); hm, maybe let's do the renaming of the configure option -- but with supporting the old name too? I like the new name better. I think the overhead to also support a deprecated option is acceptable. E.g. see deprecated --enable-more-asserts: http://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/configure.ac?id=0f4ddf78824e176543bd89f2b65cbedc86d57f27#n883 ... acceptable IMO (In reply to Thomas Haller from comment #12) > hm, maybe let's do the renaming of the configure option -- but with > supporting the old name too? I like the new name better. Ok, fixed and repushed. lgtm I think the FIXME comment in nm-device-team.c was actually wrong... check_connection_compatible() is supposed to check if connections are compatible with a device's immutable properties (like device type, permanent MAC address, etc) not if it's compatible with the current configuration. Basically, if the connection could ever be started on the device now or in the future, compatible == TRUE. I think having the comparison there would cause any team config that doesn't exactly match the current config to not be compatible with the device, which would preclude using two different team configs with the device. To fix problem #2 from comment 6, when comparing the NMSettingTeam configs we probably need to normalize the two 'config' properties we're comparing in libnm. We want to validate the JSON there too, so it would be a bonus to do it there too. (In reply to Dan Williams from comment #15) > > To fix problem #2 from comment 6, when comparing the NMSettingTeam configs > we probably need to normalize the two 'config' properties we're comparing in > libnm. We want to validate the JSON there too, so it would be a bonus to do > it there too. which (in the current implementation) would require libnm to link against the json library... (In reply to Thomas Haller from comment #16) > (In reply to Dan Williams from comment #15) > > > > To fix problem #2 from comment 6, when comparing the NMSettingTeam configs > > we probably need to normalize the two 'config' properties we're comparing in > > libnm. We want to validate the JSON there too, so it would be a bonus to do > > it there too. > > which (in the current implementation) would require libnm to link against > the json library... Unfortunately yes. I don't like to add a new dependency to libnm but at the same time I don't see other solutions. Maybe it would be possible to make libnm dlopen at runtime the needed symbols from jansson? Anyway, I updated the branch moving the JSON check and comparison logic to libnm and it seems to be working well (solving the problems Dan mentioned). The remaining issue is if it's acceptable to have libnm depend on jansson... In _nm_utils_check_valid_json(), g_set_error() does nothing error == NULL so you don't need the extra check for "if (error)". The rest LGTM. The only way around the jansson dep issue that I can realistically see is to allow the team config validation to be disabled or minimally validated and select between the two with a configure-time switch. That said, jansson is extremely small (52k stripped). This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions (In reply to Dan Williams from comment #18) > The only way around the jansson dep issue that I can realistically see is to > allow the team config validation to be disabled or minimally validated and > select between the two with a configure-time switch. That said, jansson is > extremely small (52k stripped). Added a configure switch to enable the dependency and repushed. *** Bug 1310636 has been marked as a duplicate of this bug. *** _nm_utils_check_valid_json() should behave identical for WITH/WITHOUT_JANSSON when passing NULL or "". For example, if you call WITH_JANSSON: _nm_utils_check_valid_json(NULL, NULL) yields FALSE. WITHOUT_JANSSON: _nm_utils_check_valid_json(NULL, NULL) yields TRUE. I suspect that WITH_JANSSON: _nm_utils_check_valid_json("", NULL) yields also FALSE (didn't check that). They should agree. Also, verify() does: if (priv->config) { if (!_nm_utils_check_valid_json (priv->config, error)) { thus, "" is not valid. For invalid arguments, _nm_utils_team_config_equal() should return FALSE too and not coerce it to "{}". Or alternatively, if we decide that "" equals "{}", then _nm_utils_check_valid_json() should return TRUE for it. tl;dr: it should be defined what NULL and "" means (invalid, or "{}", or something else), and then all functions should agree on that. I still dislike to increase our dependency to libjansson. But it is arguably useful and no easy alternative. So, ACK to that... Having --disable-json-validation, doesn't really help there as every distribution will build the library with full support. (In reply to Thomas Haller from comment #22) > tl;dr: it should be defined what NULL and "" means (invalid, or "{}", or > something else), and then all functions should agree on that. _nm_utils_check_valid_json() checks if a JSON is valid and so it should return FALSE for NULL, empty or malformed strings. I agree that the function should behave the same on NULL and "" whether Jansson is enabled or not. But _nm_utils_team_config_equal() is a bit different, since it is supposed to have a knowledge of the team configuration semantics. So in my opinion it makes sense to consider NULL equal to a "{}" empty configuration. Otherwise the caller would have to check this. Pushed again bg/team-conf-check-rh1294728. > I still dislike to increase our dependency to libjansson. But it is arguably > useful and no easy alternative. So, ACK to that... Having > --disable-json-validation, doesn't really help there as every distribution > will build the library with full support. Probably there are cases (embedded systems?) where people will never build team support and it would be useful to have an option to avoid increasing dependencies without a real need. But I don't have a strong opinion on this... (In reply to Beniamino Galvani from comment #23) > (In reply to Thomas Haller from comment #22) > > > I still dislike to increase our dependency to libjansson. But it is arguably > > useful and no easy alternative. So, ACK to that... Having > > --disable-json-validation, doesn't really help there as every distribution > > will build the library with full support. > > Probably there are cases (embedded systems?) where people will never > build team support and it would be useful to have an option to avoid > increasing dependencies without a real need. But I don't have a strong > opinion on this... Certainly there should be a configure option to build without. I just say, a distro most likely will enable it, so all users pay the price even if they don't use team. But I don't know how to avoid that. Pushed one fixup. Other then that, lgtm. (In reply to Thomas Haller from comment #24) > Certainly there should be a configure option to build without. I just say, a > distro most likely will enable it, so all users pay the price even if they > don't use team. But I don't know how to avoid that. This is probably the only solution... > Pushed one fixup. Squashed, thanks. > Other then that, lgtm. Merged to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=a5f794ac0fcba47004ba23027dda108db1419284 Created attachment 1162212 [details]
[PATCH] libnm-core: fix comparison of team configuration
Additional fix, please review.
LGTM lgtm (In reply to Beniamino Galvani from comment #26) > Created attachment 1162212 [details] > [PATCH] libnm-core: fix comparison of team configuration Applied to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=0dc999d80ec65dfaf230b8e909d543acef4af677 as I said in https://bugzilla.redhat.com/show_bug.cgi?id=1310435#c11 there is still a failure as team-slave has no default config. It can survive one restart correctly (if upped in certain order) but the second one is no-go. [root@qe-dell-ovs5-vm-5 ~]# nmcli connection up id team Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/3) [root@qe-dell-ovs5-vm-5 ~]# nmcli connection up id team-eth1 Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4) [root@qe-dell-ovs5-vm-5 ~]# systemctl restart NetworkManager [root@qe-dell-ovs5-vm-5 ~]# nmcli device DEVICE TYPE STATE CONNECTION eth0 ethernet connected testeth0 eth1 ethernet connected team-eth1 nm-team team connected team [root@qe-dell-ovs5-vm-5 ~]# systemctl restart NetworkManager [root@qe-dell-ovs5-vm-5 ~]# nmcli device DEVICE TYPE STATE CONNECTION eth0 ethernet connected testeth0 eth1 ethernet connected eth1 nm-team team connected nm-team and: [root@qe-dell-ovs5-vm-5 ~]# nmcli connection up id team-eth1 Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/3) [root@qe-dell-ovs5-vm-5 ~]# nmcli device DEVICE TYPE STATE CONNECTION eth0 ethernet connected testeth0 eth1 ethernet connected team-eth1 nm-team team connected team [root@qe-dell-ovs5-vm-5 ~]# systemctl restart NetworkManager [root@qe-dell-ovs5-vm-5 ~]# nmcli device DEVICE TYPE STATE CONNECTION eth0 ethernet connected testeth0 eth1 ethernet connected eth1 nm-team team connected nm-team but: [root@qe-dell-ovs5-vm-5 ~]# nmcli connection up id team Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/3) [root@qe-dell-ovs5-vm-5 ~]# nmcli connection up id team-eth1 Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4) [root@qe-dell-ovs5-vm-5 ~]# systemctl restart NetworkManager [root@qe-dell-ovs5-vm-5 ~]# nmcli device DEVICE TYPE STATE CONNECTION eth0 ethernet connected testeth0 eth1 ethernet connected team-eth1 nm-team team connected team and so on. So this only works when you start team, then slave and then service restart. This is incorrect. (In reply to Vladimir Benes from comment #31) > as I said in https://bugzilla.redhat.com/show_bug.cgi?id=1310435#c11 there > is still a failure as team-slave has no default config. It can survive one > restart correctly (if upped in certain order) but the second one is no-go. > and so on. So this only works when you start team, then slave and then > service restart. This is incorrect. Can you provide the team configuration you used and/or logs? Thanks. [root@qe-dell-ovs5-vm-50 NetworkManager]# nmcli connection add type team [root@qe-dell-ovs5-vm-50 NetworkManager]# nmcli connection add type ethernet ifname eth1 master nm-team con-name slave [root@qe-dell-ovs5-vm-50 NetworkManager]# nmcli device DEVICE TYPE STATE CONNECTION eth0 ethernet connected testeth0 eth1 ethernet connected slave nm-team team connected team [root@qe-dell-ovs5-vm-50 NetworkManager]# systemctl restart NetworkManager [root@qe-dell-ovs5-vm-50 NetworkManager]# nmcli device DEVICE TYPE STATE CONNECTION eth0 ethernet connected testeth0 eth1 ethernet connected eth1 nm-team team connected nm-team that's it. There seems to be some issue with repeated up of team but that's another story. Added missing --enable-json-validation build option to the spec file. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2581.html |
Created attachment 1110434 [details] duplicate team connection information Description of problem: I create a team connection successfully, like this: [root@node199 ~]# nmcli connection show 名称 UUID 类型 设备 team0-slave2 0a5e7430-9c86-42dc-b5a2-d1eec6fb437d 802-3-ethernet eno33559296 team0-slave1 5308a214-b0ae-46b5-b0dd-7d5bfa5e30c4 802-3-ethernet eno16780032 virbr0 84aeacff-fc3e-4951-bf15-5a0e6cf9e144 bridge virbr0 Team0 3128112a-578b-4c05-b30c-de6698629920 team team0 And then I restart NetworkManager. The problem is that my team connection becomes incorrect,like this: [root@node199 ~]# nmcli connection show 名称 UUID 类型 设备 team0-slave1 5308a214-b0ae-46b5-b0dd-7d5bfa5e30c4 802-3-ethernet -- Team0 3128112a-578b-4c05-b30c-de6698629920 team -- team0-slave2 0a5e7430-9c86-42dc-b5a2-d1eec6fb437d 802-3-ethernet -- eno33559296 feb582c4-53e2-4cd7-a806-99523cf7afc6 802-3-ethernet eno33559296 virbr0 4623333c-adba-499b-9095-fcac3f1f2d65 bridge virbr0 eno16780032 f1729d83-4850-4be8-8bd2-f3a8db98ab4e 802-3-ethernet eno16780032 team0 dec681e4-df51-472c-897e-c7ba13dda3fd team team0 Version-Release number of selected component (if applicable): rhel-server-7.2-x86_64 How reproducible: always Steps to Reproduce: 1.create a team connection using NetworkManager or nmcli; 2.restart NetworkManager; 3.nmcli connection show; Actual results: The team connection becomes incorrect. Expected results: The team connection is correct. Additional info: There is also someting wrong in the NetworkManager,please refer to the attachment.