Bug 2076131
| Summary: | Lose unmanaged port when rollback a linux bridge |
|---|---|
| Product: | Red Hat Enterprise Linux 8 |
| Reporter: | Gris Ge <fge> |
| Component: | NetworkManager |
| Assignee: | Lubomir Rintel <lrintel> |
| Status: | CLOSED ERRATA |
| QA Contact: | Vladimir Benes <vbenes> |
| Severity: | unspecified |
| Priority: | urgent |
| Version: | 8.6 |
| CC: | acabral, bgalvani, ferferna, lrintel, rkhan, sukulkar, thaller, till, vbenes |
| Target Milestone: | rc |
| Keywords: | Triaged |
| Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 |
| OS: | Linux |
| Fixed In Version: | NetworkManager-1.39.90-1.el8 |
| Doc Type: | No Doc Update |
| : | 2076132 |
| Last Closed: | 2022-11-08 10:10:31 UTC |
| Type: | Bug |
| Bug Depends On: | 2035519 |
| Bug Blocks: | 2076132 |

Description
Gris Ge
2022-04-18 03:42:12 UTC
Created attachment 1873178
Reproducer script
Created attachment 1873483
logfile1

Ran the script from comment 1 against NM from upstream main (a1ff31db3b473ccc35f754265395c6dee0e3926c).

<info> [1650359641.5058] audit: op="connection-activate" uuid="f187936a-ccac-4cc3-a1e4-78fd1bbdba71" name="brtest0" pid=2590 uid=0 result="success"
...
<debug> [1650359641.5263] platform: (vethtest1) link: releasing 17 from master 'brtest0' (18)
...
<info> [1650359644.1261] audit: op="checkpoint-rollback" arg="/org/freedesktop/NetworkManager/Checkpoint/2" pid=2604 uid=0 result="success"

We see that the external port gets detached while reactivating the device. During rollback, it doesn't happen.

Via email, this issue was discussed in relation to bug 2035519. The error picture there was different, see the log at https://bugzilla.redhat.com/show_bug.cgi?id=2035519#c10 . There, the port was attached during rollback.

It seems expected that `nmcli connection up` (which does a full (re)activation) detaches unknown ports. It also seems expected that rollback does not restore ports that it knows nothing about.

Note that the fix for bug 2035519 does not make `CheckpointCreate()` remember external ports and restore them. Instead, it makes `CheckpointRollback()` preserve currently attached, external ports.

If `CheckpointCreate()` remembered external ports (and later restored them), it would still not fix certain use cases:

1) In the reproducer, if you didn't do the rollback (because the configuration succeeded), then the external ports are already lost during `nmcli connection up`.

2) In the reproducer, if any external ports are attached after the first CheckpointCreate(), they would not be restored during rollback, because they were not present when creating the checkpoint. If the original problem is that detaching external ports can break running containers, then this would still break containers that were started after CheckpointCreate.

Maybe `nmcli connection up` of a bridge profile should leave external ports attached. That is not what was discussed in bug 2035519 and is unrelated to rollback. It also seems rather ugly to do that conceptually (that `nmcli connection up` does not bring the device into a fully known state -- including dropping unknown ports).

This bug report discusses very little about the motivation or use case of the problem. So it's unclear what problem we are trying to fix.

With respect to the reproducer script, NM does what it's implemented to do.

> There, the port was attached during rollback.
s/attached/detached/
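
The two rollback designs compared above can be sketched as a toy model. This is only an illustration of the reasoning in the comments, not NetworkManager code: the `Bridge` class, the function names and the `vethtest2` port are hypothetical, and the model assumes the reproducer does roughly "create checkpoint, reactivate the bridge, roll back" with `vethtest0` as the profile port and `vethtest1` as the external port.

```python
from dataclasses import dataclass, field


@dataclass
class Bridge:
    """Toy bridge: tracks port names only, no real networking."""
    profile_ports: frozenset          # ports that have an NM profile
    attached: set = field(default_factory=set)

    def external_ports(self):
        return self.attached - self.profile_ports

    def activate(self):
        # Full (re)activation, as the thread describes `nmcli connection up`:
        # ports unknown to the profile are detached.
        self.attached = set(self.profile_ports)


def checkpoint_create(bridge):
    # Hypothetical alternative design: remember everything attached right now.
    return frozenset(bridge.attached)


def rollback_preserve_current(bridge):
    # The bug 2035519 fix as described in the thread: keep external ports
    # that are attached at rollback time; nothing is restored from the past.
    bridge.attached = set(bridge.profile_ports) | bridge.external_ports()


def rollback_restore_snapshot(bridge, snapshot):
    # Hypothetical alternative design: re-attach what the checkpoint remembered.
    bridge.attached = set(bridge.profile_ports) | set(snapshot)


def scenario(restore_snapshot, late_port=None, do_rollback=True):
    """Checkpoint, optionally attach a late external port, reactivate, roll back."""
    br = Bridge(frozenset({"vethtest0"}), {"vethtest0", "vethtest1"})
    snapshot = checkpoint_create(br)
    if late_port:
        br.attached.add(late_port)    # external port added after the checkpoint
    br.activate()                     # external ports are dropped here
    if do_rollback:
        if restore_snapshot:
            rollback_restore_snapshot(br, snapshot)
        else:
            rollback_preserve_current(br)
    return br.attached
```

In this model, `scenario(False)` ends with only `vethtest0` attached, because the external port is already gone by the time rollback runs. `scenario(True, do_rollback=False)` and `scenario(True, late_port="vethtest2")` correspond to cases 1) and 2): a snapshot-based rollback still loses the port when no rollback happens, and never brings back a port attached after the checkpoint.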
(In reply to Thomas Haller from comment #3)
> this bug report discusses very little about the motivation or use-case of
> the problem. So it's unclear what problem we are trying to fix.
> With respect to the reproducer script, NM does what it's implemented to do.

Linking to https://bugzilla.redhat.com/show_bug.cgi?id=2035519#c5 is not sufficient. For one, the scenario shown there (in the form of the log and what is discussed) is different from what the reproducer script here does.

It says:

> The desired behaviour for our use case is to put the bridge to the state at the time the Checkpoint/85 was captured.

But what about the 2 problems (comment 3)?

The question is still: why are you re-activating the bridge? Isn't that already a very disruptive operation that must not be done while there are containers attached?

Maybe ActivateConnection() should have a mode to preserve attached, external ports...

The detailed use case and the original reporter are in bug 2035519. Please check with the engineers in that bug.
[root@gsm-r5s8-01 NetworkManager-ci]# rpm -q NetworkManager
NetworkManager-1.39.6-1.el8.x86_64
[root@gsm-r5s8-01 NetworkManager-ci]# sh bug.sh
[root@gsm-r5s8-01 NetworkManager-ci]# nmcli device
DEVICE        TYPE      STATE         CONNECTION
eth0          ethernet  connected     testeth0
brtest0       bridge    connected     brtest0
vethtest0     ethernet  connected     vethtest0
eth1          ethernet  disconnected  --
eth10         ethernet  disconnected  --
eth2          ethernet  disconnected  --
eth3          ethernet  disconnected  --
eth4          ethernet  disconnected  --
eth5          ethernet  disconnected  --
eth6          ethernet  disconnected  --
eth7          ethernet  disconnected  --
eth8          ethernet  disconnected  --
eth9          ethernet  disconnected  --
vethtest0.ep  ethernet  unmanaged     --
vethtest1     ethernet  unmanaged     --
vethtest1.ep  ethernet  unmanaged     --
lo            loopback  unmanaged     --

and

[root@gsm-r5s8-01 NetworkManager-ci]# rpm -q NetworkManager
NetworkManager-1.36.0-3.el8.x86_64
[root@gsm-r5s8-01 NetworkManager-ci]# sh bug.sh
[root@gsm-r5s8-01 NetworkManager-ci]# nmcli device
DEVICE        TYPE      STATE         CONNECTION
eth0          ethernet  connected     testeth0
brtest0       bridge    connected     brtest0
vethtest0     ethernet  connected     vethtest0
eth1          ethernet  disconnected  --
eth10         ethernet  disconnected  --
eth2          ethernet  disconnected  --
eth3          ethernet  disconnected  --
eth4          ethernet  disconnected  --
eth5          ethernet  disconnected  --
eth6          ethernet  disconnected  --
eth7          ethernet  disconnected  --
eth8          ethernet  disconnected  --
eth9          ethernet  disconnected  --
vethtest0.ep  ethernet  unmanaged     --
vethtest1     ethernet  unmanaged     --
vethtest1.ep  ethernet  unmanaged     --
lo            loopback  unmanaged     --

No idea where the difference is; I tend to move this back to ASSIGNED.

Please drop all occurrences of "2>/dev/null" from bug.sh, so that we can see why nmstate didn't succeed.

Here's a more minimal testcase that exercises what has been changed:

$ nmcli c add type bridge con-name xbr0 ifname xbr0
$ nmcli c modify xbr0 mtu 666
$ nmcli d reapply xbr0

It should fail with the old build and succeed with the new one.
Verified: tested https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/merge_requests/1108 . The new version of the test executed correctly with 1.39.10-1.

The reproducer in comment #0 failed on NetworkManager-1.39.10-30745.copr.8e8fed433f.el9.x86_64. Please investigate!

Back to ASSIGNED based on comment 14.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7680

*** Bug 2076132 has been marked as a duplicate of this bug. ***