Bug 1371126
Summary: | layer 2-only device is taken down when NetworkManager stops | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Thomas Haller <thaller> | ||||
Component: | NetworkManager | Assignee: | Thomas Haller <thaller> | ||||
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 7.4 | CC: | aloughla, atragler, bgalvani, bmcclain, danken, dcbw, fgiudici, lrintel, mburman, rkhan, snagar, thaller, vbenes, ycui, ylavi | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | 7.4 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | NetworkManager-1.4.0-13.el7 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1390667 (view as bug list) | Environment: | |||||
Last Closed: | 2017-08-01 09:17:07 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1452062 | ||||||
Bug Blocks: | 1390667 | ||||||
Attachments: |
|
Description
Thomas Haller
2016-08-29 12:08:51 UTC
please review: th/device-ip-state-for-disabled-rh1371126 Wouldn't you need to update nm-modem.c too for the same reason in nm_modem_stage3_ip[4|6]_config_start()? It overrides act_stage3_ip6_config_start() and would need the same fixes. I think there are two cases here: 1) when an IP method has been requested (with or without may-fail). Here, we should set IP_FAIL and may-fail will figure out whether the entire activation should fail. And we could rename RETURN_STOP to RETURN_FAIL to make it clearer that this is a hard failure when success was expected. 2) when the IP method has been explicitly disabled (with disable/ignore). Here I think your patch is correct and this should not set IP_FAIL. But we need to update nm-modem.c too. NMDeviceModem overrides the NMDevice::act_stage3_ip4_config_start() with its own logic. That gets to nm_modem_stage3_ip[4|6]_config_start() where the same logic needs to be applied for ignore/disabled. Now, the other use of STOP in this function doesn't indicate failure so we should replace that with SUCCESS too. What happens here is that NMModem decides what IP methods to ask the modem to use based on modem capabilities, IP method, and may-fail. If the modem doesn't support IPv6 and ipv6.may-fail=true then NM will request only IPv4 and we'll get priv->ip6_method = NM_MODEM_IP_METHOD_UNKNOWN and should treat that as ipv6.method=ignore and thus return SUCCESS. nm_device_activate_stage3_ip6_start() still gets called even though the modem doesn't support IPv6, because the modem capabilities don't get pushed up to NMDevice. So in summary, I think with a little rework we can clarify how IP failure and disable works nicely: - RETURN_FAIL will set IP_FAIL and will *always* indicate that the IP method was requested and failed, including if the device doesn't support the method. This would require switching nm-modem.c::nm_modem_stage3_ip[4|6]_config_start() to return RETURN_FAIL in the 'default' switch case, which is supposed to indicate that the modem doesn't support the IP method. - If the IP method was disabled/ignored that results in RETURN_SUCCESS and IP_DONE because nothing failed, like your patch does for nm-device.c, but we also need to do it in nm-modem.c where the code checks disabled/ignored and currently returns STOP. Does that all sound reasonable? (In reply to Dan Williams from comment #3) > Now, the other use of STOP in this function doesn't indicate failure so we > should replace that with SUCCESS too. What happens here is that NMModem > decides what IP methods to ask the modem to use based on modem capabilities, > IP method, and may-fail. If the modem doesn't support IPv6 and > ipv6.may-fail=true then NM will request only IPv4 and we'll get > priv->ip6_method = NM_MODEM_IP_METHOD_UNKNOWN and should treat that as > ipv6.method=ignore and thus return SUCCESS. > nm_device_activate_stage3_ip6_start() still gets called even though the > modem doesn't support IPv6, because the modem capabilities don't get pushed > up to NMDevice. Grr, I changed my mind after writing this so please ignore this section. The "In summary" and everything after reflects my opinion. (In reply to Dan Williams from comment #4) > Grr, I changed my mind after writing this so please ignore this section. > The "In summary" and everything after reflects my opinion. I think I addressed you comment. Reworked and repushed to th/device-ip-state-for-disabled-rh1371126 th/device-ip-state-for-disabled-rh1371126 LGTM merged to upstream master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=eb7e2e4cd684dbd1fcb9339553a19451fa9432ab Created attachment 1203646 [details] test script for creating devices and stopping NetworkManager strange... testing on NetworkManager 1.4.0-9.el7, I now cannot reproduce the original issue -- although the fix from comment 7 is not in rhel-7.3 Note also, the patch from comment 7 introduced a regression, see https://bugzilla.gnome.org/show_bug.cgi?id=771579 (In reply to Thomas Haller from comment #8) > Created attachment 1203646 [details] > test script for creating devices and stopping NetworkManager > > strange... testing on NetworkManager 1.4.0-9.el7, I now cannot reproduce the > original issue -- although the fix from comment 7 is not in rhel-7.3 > > Note also, the patch from comment 7 introduced a regression, see > https://bugzilla.gnome.org/show_bug.cgi?id=771579 I was unable to reproduce it, because - unmanaged_on_quit() returned FALSE, - because nm_platform_link_can_assume() returned FALSE, - because it didn't find an IPv6 address in my test. If you set ipv6.method=ignore, it is quite possible that you get an IPv6 address assigned. In which case, the original issue is reproducible. See also bug 1378418 @NetworkManager-team: please review https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=th%2Fno-unmanage-on-quit-rh1371126 + if (!NM_IN_STRSET (nm_utils_get_ip_config_method (connection, NM_TYPE_SETTING_IP4_CONFIG), + NM_SETTING_IP6_CONFIG_METHOD_SHARED)) Shouldn't that be IP4_CONFIG_METHOD_SHARED? Not that it makes a difference operationally, but should probably be changed here. I'm a bit worried about other types of devices that aren't wifi and require external daemons though, like ADSL, Bluetooth, WWAN, etc. We can't pick up and assume that configuration on restart. How are those now handled? (In reply to Dan Williams from comment #14) > + if (!NM_IN_STRSET (nm_utils_get_ip_config_method (connection, > NM_TYPE_SETTING_IP4_CONFIG), > + NM_SETTING_IP6_CONFIG_METHOD_SHARED)) > > Shouldn't that be IP4_CONFIG_METHOD_SHARED? Not that it makes a difference > operationally, but should probably be changed here. > > I'm a bit worried about other types of devices that aren't wifi and require > external daemons though, like ADSL, Bluetooth, WWAN, etc. We can't pick up > and assume that configuration on restart. How are those now handled? you are right on both accounts. How about v2? (repushed). new scratch build with version 2: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11908088 v2 LGTM fix merged upstream: master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d298b7c96d939b6c5cb7eddd7824e4f0cd695294 nm-1-4: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=b14fe3ce0802c708b25982899d787bbbf15eebff scratch-build v3 https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11985134 this is the patch that was eventually merged upstream. This is still not working well for bond. reverting to assigned *** Bug 1455126 has been marked as a duplicate of this bug. *** This bug failed QA due to CI failure from bug 1455126. However, the test bond_leave_L2_only_up_when_going_down is no longer failing. Instead, failing is the test before it (vlan_over_no_L3_bond_restart_persistence, bug 1452062. Setting Depends on rh 1452062. Bringing this back to ON_QA as the test is not failing anymore. (Side effect of some other fix?) (In reply to Thomas Haller from comment #25) > This bug failed QA due to CI failure from bug 1455126. > > However, the test bond_leave_L2_only_up_when_going_down is no longer failing. > > Instead, failing is the test before it > (vlan_over_no_L3_bond_restart_persistence, bug 1452062. > > Setting Depends on rh 1452062. bug 1452062 seems to be fixed now too. This issue is likely fixed for good. If not, please open a new bug instead of reopening, because this issue is rather unspecific. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2299 |