Bug 1270814
| Summary: | Setting up team with invalid json config leads to inconsistent state | | | |
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Vitezslav Humpa <vhumpa> | |
| Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> | |
| Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> | |
| Severity: | high | Docs Contact: | | |
| Priority: | high | | | |
| Version: | 7.2 | CC: | bgalvani, dcbw, lkundrak, lrintel, rkhan, thaller, vbenes | |
| Target Milestone: | rc | | | |
| Target Release: | --- | | | |
| Hardware: | Unspecified | | | |
| OS: | Unspecified | | | |
| Whiteboard: | | | | |
| Fixed In Version: | | Doc Type: | Bug Fix | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | | | | |
| : | 1367752 | Environment: | | |
| Last Closed: | 2016-11-03 19:17:19 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | | |
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | | |
| Cloudforms Team: | --- | Target Upstream Version: | | |
| Embargoed: | | | | |
| Bug Depends On: | | | | |
| Bug Blocks: | 1301628, 1313485, 1367752 | | | |
| Attachments: | | | | |
Created attachment 1084664 [details]
[PATCH] device: terminate the activation chain when entering the failed state
The attached patch fixes the inconsistent state of slave devices after the deletion of connections.
Created attachment 1084920 [details]
[PATCH v2] device: terminate the activation chain when entering the failed state
Reworded code comment.
LGTM
LGTM
Merged to master:
http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=c8e2339091c4623d4aab790ddf8feedd95a7cd24
and nm-1-0:
http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?h=nm-1-0&id=8b5e5a3dae0156c052cda977874ad410e182fc4b
It's still not fixed properly, team master is failed immediately but slaves are not.
reproducer:
nmcli con add type team con-name team0 ifname nm-team
nmcli connection add type team-slave ifname eth1 con-name team0.0 master nm-team
nmcli connection add type team-slave ifname eth2 con-name team0.1 master nm-team
nmcli con modify team0 team.config "{\"blah\":1,\"blah\":2,\"blah\":3}"
nmcli connection up id team0
[root@wlan-r2s5 ~]# nmcli device
DEVICE  TYPE      STATE                 CONNECTION
eth0    ethernet  connected             testeth0
eth1    ethernet  connecting (prepare)  team0.0
eth2    ethernet  connecting (prepare)  team0.1
^^^ this is infinitely connecting
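A check for devices left in a connecting state, like the one shown in the listing above, can be scripted. The following is a minimal sketch and not part of any patch in this bug. It assumes the colon-separated terse output of `nmcli -t -f DEVICE,TYPE,STATE,CONNECTION device`, where `:` and `\` inside values are backslash-escaped. The helper names `split_terse` and `connecting_devices` and the `SAMPLE` text are hypothetical, with the sample modelled on the listing above rather than captured from a real system.

```python
def split_terse(line):
    """Split one terse nmcli line on ':' while honouring backslash escapes."""
    fields, current, escaped = [], [], False
    for ch in line:
        if escaped:
            # Character after a backslash is taken literally.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def connecting_devices(terse_output):
    """Return (device, state, connection) for devices still in a 'connecting' state."""
    result = []
    for line in terse_output.splitlines():
        if not line.strip():
            continue
        fields = split_terse(line)
        if len(fields) != 4:
            # Skip anything that does not match the assumed four-field layout.
            continue
        device, _type, state, connection = fields
        if state.startswith("connecting"):
            result.append((device, state, connection))
    return result


# Hypothetical sample, modelled on the device listing in this comment.
SAMPLE = """\
eth0:ethernet:connected:testeth0
eth1:ethernet:connecting (prepare):team0.0
eth2:ethernet:connecting (prepare):team0.1
"""

if __name__ == "__main__":
    for device, state, connection in connecting_devices(SAMPLE):
        print(f"{device} is still {state} for {connection}")
```

On a live system the same function could be fed the captured output of the nmcli command instead of `SAMPLE`. Polling it a few times with a delay would distinguish a slow activation from one that never leaves the connecting state.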
Pushed branch bg/slave-activation-fail-rh1270814
>> manager: don't auto-activate masters with zero autoconnect retries
When a slave activates, it brings up the master too. In this case, this is not an "autoactivation" of the master; it's more like an explicit activation. The autoactivate-retry-count shouldn't matter.
Otherwise, if you have a slave and a master whose autoactivation-count is zero, activating the slave will fail.
Can you not instead avoid "..during master activation activate_slave_connections() resets the retry count of slaves"?
Rest lgtm
Created attachment 1174567 [details]
[PATCH] policy: reset slaves' retry counter only for explicit activations
(In reply to Thomas Haller from comment #12)
> >> manager: don't auto-activate masters with zero autoconnect retries
> Can you not instead avoid "..during master activation
> activate_slave_connections() resets the retry count of slaves"?
How about avoiding to reset the counter for auto-connections and do it only for explicit activations as in the attached patch?
Or alternatively, we should track when the master activation-request was created as a consequence of a slave activation, by adding a new property to NMActivationRequest and use it to decide whether to skip the counter reset.
(In reply to Beniamino Galvani from comment #13)
> Created attachment 1174567 [details]
> [PATCH] policy: reset slaves' retry counter only for explicit activations
>
> (In reply to Thomas Haller from comment #12)
> > >> manager: don't auto-activate masters with zero autoconnect retries
>
> > Can you not instead avoid "..during master activation
> > activate_slave_connections() resets the retry count of slaves"?
>
> How about avoiding to reset the counter for auto-connections and do it only
> for explicit activations as in the attached patch?
>
> Or alternatively, we should track when the master activation-request was
> created as a consequence of a slave activation, by adding a new property to
> NMActivationRequest and use it to decide whether to skip the counter reset.
the patch lgtm. Does it work? :)
(In reply to Thomas Haller from comment #14)
> the patch lgtm. Does it work? :)
It seems so :)
LGTM
Merged to master:
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=22fc078a39ac8ebc8e8413a934cadd9145e234c2
(In reply to Vladimir Benes from comment #9)
> It's still not fixed properly, team master is failed immediately but slaves
> are not.
> reproducer:
> nmcli con add type team con-name team0 ifname nm-team
> nmcli connection add type team-slave ifname eth1 con-name team0.0 master nm-team
> nmcli connection add type team-slave ifname eth2 con-name team0.1 master nm-team
> nmcli con modify team0 team.config "{\"blah\":1,\"blah\":2,\"blah\":3}"
> nmcli connection up id team0
>
> [root@wlan-r2s5 ~]# nmcli device
> DEVICE TYPE STATE CONNECTION
> eth0 ethernet connected testeth0
> eth1 ethernet connecting (prepare) team0.0
> eth2 ethernet connecting (prepare) team0.1
>
> ^^^ this is infinitely connecting
this now works as
[root@qe-dell-ovs5-vm-56 NetworkManager]# nmcli connection up id team0
Error: Connection activation failed: Active connection could not be attached to the device
but:
[root@qe-dell-ovs5-vm-56 NetworkManager]# nmcli connection up id team0.0
hangs
do we need another bug?
there is a bug 1367752 filed but this issue has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2016-2581.html
Created attachment 1082001 [details]
Debug log of the reproducer
Description of problem:
Setting up a team with ethernet slaves leads to an inconsistent state when the team is configured with an invalid JSON configuration. That is, even after all connections have been deleted, 'nmcli device' still shows the slave devices with their profiles as active.
Reproducer:
Set up a team with 2 ethernet slaves
$ nmcli con add type team con-name team0 ifname nm-team
$ nmcli connection add type team-slave ifname eth1 con-name team0.0 master nm-team
$ nmcli connection add type team-slave ifname eth2 con-name team0.1 master nm-team
Modify the team with an invalid JSON config
$ nmcli con modify team0 team.config "{blah blah blah}"
Attempt to reactivate the connection
$ nmcli con up team0
Error: Timeout 90 sec expired.
Now, we have:
[root@ibm-p8-02-lp8 ~]# nmcli c
NAME       UUID                                  TYPE            DEVICE
testeth10  39c89451-a220-4867-86ed-69cb2ad7fd45  802-3-ethernet  --
testeth5   60bf19a7-abe9-4f2c-9a00-885711a996d4  802-3-ethernet  --
testeth8   c055d1cd-9d89-49bb-b8d1-6b4e75a02566  802-3-ethernet  --
testeth2   b8bcf85b-7221-4df9-a24d-9356d79ba32e  802-3-ethernet  --
testeth3   d6ce347d-6c1b-4d00-bff9-f454ac22e0ff  802-3-ethernet  --
testeth9   20c225a4-1afd-40e4-8900-f6627e0afaeb  802-3-ethernet  --
testeth7   36ed2c17-c049-4915-aad0-5ac15d44a102  802-3-ethernet  --
testeth6   1663ecb6-b641-4078-874f-ef2ed8b89f47  802-3-ethernet  --
testeth0   84a77dd8-aeb1-4ad1-8fe8-b95a6beab180  802-3-ethernet  eth0
testeth1   9e24456f-68dc-4080-aa3f-ee788218992b  802-3-ethernet  --
team0.1    04e25d44-d643-4443-bd8f-1d272c6055cd  802-3-ethernet  --
team0.0    d5cc24dc-f517-4a6e-8689-a769ea5433c9  802-3-ethernet  --
team0      94b51acb-3a2f-4a58-829f-f6ab3c8d4fa4  team            --
testeth4   fca7ef19-7396-42b9-8840-5aea66cfceb6  802-3-ethernet  --
[root@ibm-p8-02-lp8 ~]# nmcli d
DEVICE  TYPE      STATE                                  CONNECTION
eth0    ethernet  connected                              testeth0
eth1    ethernet  connecting (getting IP configuration)  team0.0
eth2    ethernet  connecting (getting IP configuration)  team0.1
eth10   ethernet  disconnected                           --
eth3    ethernet  disconnected                           --
eth4    ethernet  disconnected                           --
eth5    ethernet  disconnected                           --
eth6    ethernet  disconnected                           --
eth7    ethernet  disconnected                           --
eth8    ethernet  disconnected                           --
eth9    ethernet  disconnected                           --
lo      loopback  unmanaged
If we delete all team profiles, 'nmcli device' still claims the profiles are connecting.
[root@ibm-p8-02-lp8 nmcli]# nmcli con del team0 team0.0 team0.1
Connection 'team0' (94b51acb-3a2f-4a58-829f-f6ab3c8d4fa4) successfully deleted.
Connection 'team0.0' (d5cc24dc-f517-4a6e-8689-a769ea5433c9) successfully deleted.
Connection 'team0.1' (04e25d44-d643-4443-bd8f-1d272c6055cd) successfully deleted.
[root@ibm-p8-02-lp8 nmcli]# nmcli c
NAME       UUID                                  TYPE            DEVICE
testeth10  39c89451-a220-4867-86ed-69cb2ad7fd45  802-3-ethernet  --
testeth5   60bf19a7-abe9-4f2c-9a00-885711a996d4  802-3-ethernet  --
testeth8   c055d1cd-9d89-49bb-b8d1-6b4e75a02566  802-3-ethernet  --
testeth2   b8bcf85b-7221-4df9-a24d-9356d79ba32e  802-3-ethernet  --
testeth3   d6ce347d-6c1b-4d00-bff9-f454ac22e0ff  802-3-ethernet  --
testeth9   20c225a4-1afd-40e4-8900-f6627e0afaeb  802-3-ethernet  --
testeth7   36ed2c17-c049-4915-aad0-5ac15d44a102  802-3-ethernet  --
testeth6   1663ecb6-b641-4078-874f-ef2ed8b89f47  802-3-ethernet  --
testeth0   84a77dd8-aeb1-4ad1-8fe8-b95a6beab180  802-3-ethernet  eth0
testeth1   9e24456f-68dc-4080-aa3f-ee788218992b  802-3-ethernet  --
testeth4   fca7ef19-7396-42b9-8840-5aea66cfceb6  802-3-ethernet  --
[root@ibm-p8-02-lp8 nmcli]# nmcli device
DEVICE  TYPE      STATE                                  CONNECTION
eth0    ethernet  connected                              testeth0
eth1    ethernet  connecting (getting IP configuration)  team0.0
eth2    ethernet  connecting (getting IP configuration)  team0.1
eth10   ethernet  disconnected                           --
eth3    ethernet  disconnected                           --
eth4    ethernet  disconnected                           --
eth5    ethernet  disconnected                           --
eth6    ethernet  disconnected                           --
eth7    ethernet  disconnected                           --
eth8    ethernet  disconnected                           --
eth9    ethernet  disconnected                           --
lo      loopback  unmanaged
Version-Release number of selected component (if applicable):
From RHEL-7.2 Snap4
Additional info:
Debug log is attached. This issue suggests possible incorrect handling of failing connections.
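The two team.config values used in this report differ in kind, which a quick check makes visible. The sketch below is only an illustration using Python's standard `json` module. The helper name `is_json_object` and the two constant names are hypothetical, and the check says nothing about what teamd or NetworkManager actually accept. The string from the description is not JSON at all, whereas the string from the reproducer in the follow-up comment does parse as JSON (this parser accepts duplicate keys and keeps the last value), even though the report treats it as an invalid team configuration.

```python
import json


def is_json_object(text):
    """Return True if text parses as a JSON object, False otherwise."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


# Value from the description: bare words, not valid JSON syntax.
FROM_DESCRIPTION = "{blah blah blah}"

# Value from the follow-up reproducer, after the shell removes the
# backslash escapes: valid JSON syntax with a repeated key.
FROM_FOLLOW_UP = '{"blah":1,"blah":2,"blah":3}'

if __name__ == "__main__":
    for label, config in [
        ("description", FROM_DESCRIPTION),
        ("follow-up", FROM_FOLLOW_UP),
    ]:
        print(label, is_json_object(config))
```

A syntax check like this would catch only the first case before activation is attempted. The second case gets past a JSON parser, so rejecting it would need knowledge of the team configuration keys themselves.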