Bug 2076131

Summary: Lose unmanaged port when rollback a linux bridge
Product: Red Hat Enterprise Linux 8
Reporter: Gris Ge <fge>
Component: NetworkManager
Assignee: Lubomir Rintel <lrintel>
Status: CLOSED ERRATA
QA Contact: Vladimir Benes <vbenes>
Severity: unspecified
Docs Contact:
Priority: urgent
Version: 8.6
CC: acabral, bgalvani, ferferna, lrintel, rkhan, sukulkar, thaller, till, vbenes
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: NetworkManager-1.39.90-1.el8
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Clones: 2076132
Environment:
Last Closed: 2022-11-08 10:10:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2035519    
Bug Blocks: 2076132    
Attachments:
Description        Flags
Reproducer script  none
logfile1           none

Description Gris Ge 2022-04-18 03:42:12 UTC
Description of problem:

A Linux bridge with an unmanaged veth port attached loses this unmanaged port on checkpoint rollback.


Version-Release number of selected component (if applicable):
NetworkManager-1.36.0-3.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. sudo ./bug.sh

Actual results:

vethtest1 is detached from the Linux bridge.

Expected results:

vethtest1 is still attached to the Linux bridge.
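
One way to check the actual and expected results directly is to ask the kernel which master the port currently has. The following is a minimal sketch, assuming iproute2 is installed; the helper names `port_master` and `is_attached` are hypothetical, and the interface names are the ones used in this report.

```shell
#!/bin/sh
# Hypothetical helper: read one line of `ip -o link show dev <port>` output
# from stdin and print the name of the link's master, if it has one.
port_master() {
    # In the one-line output the master appears as "... master <name> ...".
    awk '{ for (i = 1; i < NF; i++) if ($i == "master") { print $(i + 1); exit } }'
}

# Hypothetical helper: succeeds only if <port> is attached to <bridge>.
is_attached() {
    port=$1
    bridge=$2
    [ "$(ip -o link show dev "$port" | port_master)" = "$bridge" ]
}
```

For example, `is_attached vethtest1 brtest0 && echo attached || echo detached` could be run before and after the rollback.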

Additional info:

Comment 1 Gris Ge 2022-04-18 03:42:44 UTC
Created attachment 1873178
Reproducer script

Comment 2 Thomas Haller 2022-04-19 09:35:07 UTC
Created attachment 1873483
logfile1

Ran the script from comment 1 against NM from upstream main (a1ff31db3b473ccc35f754265395c6dee0e3926c).

Comment 3 Thomas Haller 2022-04-19 09:52:44 UTC
<info>  [1650359641.5058] audit: op="connection-activate" uuid="f187936a-ccac-4cc3-a1e4-78fd1bbdba71" name="brtest0" pid=2590 uid=0 result="success"
...
<debug> [1650359641.5263] platform: (vethtest1) link: releasing 17 from master 'brtest0' (18)
...
<info>  [1650359644.1261] audit: op="checkpoint-rollback" arg="/org/freedesktop/NetworkManager/Checkpoint/2" pid=2604 uid=0 result="success"


The log shows that the external port gets detached while the device is being reactivated. It does not happen during the rollback.


Via email, this issue was discussed in relation to bug 2035519. The failure pattern there was different, see the log at https://bugzilla.redhat.com/show_bug.cgi?id=2035519#c10 . There, the port was attached during rollback.


It seems expected that `nmcli connection up` (which does a full (re)activation) detaches unknown ports.
It also seems expected that rollback does not restore ports that it knows nothing about. Note that the fix for bug 2035519 does not make `CheckpointCreate()` remember external ports and restore them. Instead, it makes `CheckpointRollback()` preserve the external ports that are currently attached.
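
For reference, the two D-Bus methods named above can be driven from a shell with `busctl` (part of systemd). This is only a rough sketch and not the reproducer from comment 1: the function names are hypothetical, the 30-second rollback timeout is an arbitrary choice, and the calls need sufficient privileges. An empty device list is used, which means the checkpoint covers all devices.

```shell
#!/bin/sh
NM_DEST=org.freedesktop.NetworkManager
NM_PATH=/org/freedesktop/NetworkManager

# CheckpointCreate(ao devices, u rollback_timeout, u flags) -> o checkpoint
# "aouu 0 <timeout> 0": empty device array, timeout in seconds, no flags.
# Prints the object path of the new checkpoint.
checkpoint_create() {
    busctl call "$NM_DEST" "$NM_PATH" "$NM_DEST" CheckpointCreate aouu 0 "${1:-30}" 0 |
        awk '{ gsub(/"/, "", $2); print $2 }'   # busctl prints: o "/org/.../Checkpoint/N"
}

# CheckpointRollback(o checkpoint) -> a{su} per-device result
checkpoint_rollback() {
    busctl call "$NM_DEST" "$NM_PATH" "$NM_DEST" CheckpointRollback o "$1"
}
```

A sequence like `cp=$(checkpoint_create 30); nmcli connection up brtest0; checkpoint_rollback "$cp"` would then mirror the order of operations seen in the log above.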

If `CheckpointCreate()` remembered external ports (and later restored them), it would still not fix certain use cases:

1) In the reproducer, if you didn't do the rollback (because the configuration succeeded), the external ports would already be lost during `nmcli connection up`.

2) In the reproducer, if any external ports are attached after the first `CheckpointCreate()`, they would not be restored during rollback, because they were not present when the checkpoint was created. If the original problem is that detaching external ports can break running containers, then this would still break containers that were started after `CheckpointCreate()`.



Maybe `nmcli connection up` of a bridge profile should leave external ports attached. That is not what was discussed in bug 2035519 and is unrelated to rollback. It also seems conceptually rather ugly that `nmcli connection up` would not bring the device into a fully known state (which includes dropping unknown ports).



This bug report says very little about the motivation or use case behind the problem, so it's unclear what problem we are trying to fix.
With respect to the reproducer script, NM does what it is implemented to do.

Comment 4 Thomas Haller 2022-04-19 10:12:07 UTC
> There, the port was attached during rollback.

s/attached/detached/

Comment 5 Thomas Haller 2022-04-20 10:01:42 UTC
(In reply to Thomas Haller from comment #3)
> this bug report discusses very little about the motivation or use-case of
> the problem. So it's unclear what problem we are trying to fix.
> With respect to the reproducer script, NM does what it's implemented to do.

Linking to https://bugzilla.redhat.com/show_bug.cgi?id=2035519#c5 is not sufficient.
For one, the scenario shown there (in the form of the log and what is discussed) is different from what the reproducer script here does.

It says:

> The desired behaviour for our use case is to put the bridge to the state at the time the Checkpoint/85 was captured.

But what about the two problems from comment 3?


The question remains: why are you re-activating the bridge? Isn't that already a very disruptive operation that must not be done while there are containers attached?
Maybe ActivateConnection() should have a mode to preserve attached, external ports...

Comment 6 Gris Ge 2022-05-17 07:43:48 UTC
The detailed use case and the original reporter are in bug 2035519.

Please check with engineers in that bug.

Comment 10 Vladimir Benes 2022-06-14 09:31:37 UTC
[root@gsm-r5s8-01 NetworkManager-ci]# rpm -q NetworkManager
NetworkManager-1.39.6-1.el8.x86_64
[root@gsm-r5s8-01 NetworkManager-ci]# sh bug.sh 
[root@gsm-r5s8-01 NetworkManager-ci]# nmcli  device 
DEVICE        TYPE      STATE         CONNECTION 
eth0          ethernet  connected     testeth0   
brtest0       bridge    connected     brtest0    
vethtest0     ethernet  connected     vethtest0  
eth1          ethernet  disconnected  --         
eth10         ethernet  disconnected  --         
eth2          ethernet  disconnected  --         
eth3          ethernet  disconnected  --         
eth4          ethernet  disconnected  --         
eth5          ethernet  disconnected  --         
eth6          ethernet  disconnected  --         
eth7          ethernet  disconnected  --         
eth8          ethernet  disconnected  --         
eth9          ethernet  disconnected  --         
vethtest0.ep  ethernet  unmanaged     --         
vethtest1     ethernet  unmanaged     --         
vethtest1.ep  ethernet  unmanaged     --         
lo            loopback  unmanaged     --  

and 
[root@gsm-r5s8-01 NetworkManager-ci]# rpm -q NetworkManager
NetworkManager-1.36.0-3.el8.x86_64
[root@gsm-r5s8-01 NetworkManager-ci]# sh bug.sh 
[root@gsm-r5s8-01 NetworkManager-ci]# nmcli  device 
DEVICE        TYPE      STATE         CONNECTION 
eth0          ethernet  connected     testeth0   
brtest0       bridge    connected     brtest0    
vethtest0     ethernet  connected     vethtest0  
eth1          ethernet  disconnected  --         
eth10         ethernet  disconnected  --         
eth2          ethernet  disconnected  --         
eth3          ethernet  disconnected  --         
eth4          ethernet  disconnected  --         
eth5          ethernet  disconnected  --         
eth6          ethernet  disconnected  --         
eth7          ethernet  disconnected  --         
eth8          ethernet  disconnected  --         
eth9          ethernet  disconnected  --         
vethtest0.ep  ethernet  unmanaged     --         
vethtest1     ethernet  unmanaged     --         
vethtest1.ep  ethernet  unmanaged     --         
lo            loopback  unmanaged     -- 


Both builds give the same `nmcli device` output, so I have no idea where the difference is. I tend to move this back to ASSIGNED.
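
The DEVICE/TYPE/STATE/CONNECTION columns above do not say which bridge a port belongs to, so the two listings cannot show a difference either way. A rough sketch of a more direct check is to read the bridge's port list from sysfs (`/sys/class/net/<bridge>/brif/`). The function name `bridge_ports` and the `SYSFS_ROOT` override are hypothetical and only there for illustration.

```shell
#!/bin/sh
# List the ports currently attached to a bridge, one name per line.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) so the
# function can be exercised against a scratch directory.
bridge_ports() {
    brif="${SYSFS_ROOT:-/sys}/class/net/$1/brif"
    [ -d "$brif" ] || return 1      # no such bridge, or not a bridge
    ls -1 "$brif"
}
```

After running bug.sh, `bridge_ports brtest0` should list vethtest1 if the port survived, and omit it if it was detached.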

Comment 11 Lubomir Rintel 2022-06-16 11:29:47 UTC
Please drop all occurrences of "2>/dev/null" from bug.sh,
so that we can see why nmstate didn't succeed.

Here's a more minimal test case that exercises what has been changed:

  $ nmcli c add type bridge con-name xbr0 ifname xbr0
  $ nmcli c modify xbr0 mtu 666
  $ nmcli d reapply xbr0

Should fail with the old build and succeed with the new one.
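
The three commands can be wrapped so that the result of `reapply` is reported through the exit status and the test profile is removed afterwards. This is a sketch only: the function name `reapply_mtu_test` and the cleanup step are additions that are not part of the original test case.

```shell
#!/bin/sh
# Hypothetical wrapper around the minimal test case above.
# Returns the exit status of `nmcli d reapply`, or 2 if the setup failed.
reapply_mtu_test() {
    name=${1:-xbr0}
    nmcli c add type bridge con-name "$name" ifname "$name" || return 2
    nmcli c modify "$name" mtu 666 || { nmcli c delete "$name"; return 2; }
    nmcli d reapply "$name"
    rc=$?
    nmcli c delete "$name"      # clean up regardless of the result
    return "$rc"
}
```

Calling `reapply_mtu_test xbr0; echo "exit status: $?"` would then be expected to print a non-zero status on the old build and 0 on the new one.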

Comment 13 Vladimir Benes 2022-07-18 11:27:26 UTC
The new version of the test executed correctly with 1.39.10-1.

Comment 14 Gris Ge 2022-07-26 02:39:59 UTC
The reproducer in comment #0 failed on NetworkManager-1.39.10-30745.copr.8e8fed433f.el9.x86_64

Please investigate!

Comment 15 Thomas Haller 2022-07-28 12:23:42 UTC
Back to ASSIGNED based on comment 14.

Comment 25 errata-xmlrpc 2022-11-08 10:10:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7680

Comment 26 sfaye 2022-12-05 11:31:15 UTC
*** Bug 2076132 has been marked as a duplicate of this bug. ***