Bug 2149012

Summary: NM brings down interfaces attached to an ovs bridge after "nmcli networking off/on"
Product: Red Hat Enterprise Linux 9
Reporter: Beniamino Galvani <bgalvani>
Component: NetworkManager
Assignee: Fernando F. Mancera <ferferna>
Status: CLOSED ERRATA
QA Contact: Vladimir Benes <vbenes>
Severity: unspecified
Docs Contact:
Priority: high
Version: 9.2
CC: bgalvani, blitton, lrintel, palonsor, pdiak, rkhan, rravaiol, sfaye, sukulkar, till, tkondvil, vbenes
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: NetworkManager-1.43.10-1.el9
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-11-07 08:37:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Beniamino Galvani 2022-11-28 14:47:06 UTC
When a virtual interface is created outside of NetworkManager and attached to an ovs bridge, after disabling and re-enabling networking multiple times via "nmcli networking off; nmcli networking on", NetworkManager brings the interface down. This can be reproduced with the following commands:

  ip link add vxlan1 type vxlan remote 172.25.12.1 id 120 dstport 0
  ip link set vxlan1 up
  ovs-vsctl add-br br1
  ovs-vsctl add-port br1 vxlan1

  ovs-vsctl show
  ip link show vxlan1

  nmcli networking off
  nmcli networking on

  sleep 1

  nmcli networking off
  nmcli networking on

  ovs-vsctl show
  ip link show vxlan1

At the end, vxlan1 is down:

  272: vxlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue master ovs-system state DOWN mode DEFAULT group default qlen 1000

The expected result is that the interface is not touched by NM since it was created externally.
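
To check the result from a script rather than by eye, one option is to inspect the flags field of the "ip link show" output. The following is only a minimal sketch and not part of the original report: the function name link_is_up is hypothetical, and it looks solely at the administrative UP flag inside the <...> field.

```shell
# Hypothetical helper: succeed (exit status 0) if the flags field of an
# `ip link show DEV` line contains the administrative UP flag.
link_is_up() {
    # $1: output of `ip link show DEV` (only the first <...> field is used)
    liu_flags=${1#*<}            # drop everything up to and including the first '<'
    liu_flags=${liu_flags%%>*}   # drop everything from the first '>' onward
    case ",$liu_flags," in
        *,UP,*) return 0 ;;      # exact flag match, so LOWER_UP alone does not count
        *)      return 1 ;;
    esac
}
```

It could be called as: link_is_up "$(ip link show vxlan1)" || echo "vxlan1 is down"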

Affected versions:
NetworkManager 1.30, NetworkManager 1.40, current git main

Comment 1 Beniamino Galvani 2022-11-28 14:55:29 UTC
Initially, the vxlan is in disconnected state and is considered 'external'.

  [1669644967.4654] device (vxlan1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')

The problem is that after toggling networking, the 'external' state is lost and the device becomes fully managed.

  [1669644995.0894] device (vxlan1): state change: disconnected -> unmanaged (reason 'sleeping', sys-iface-state: 'external')
  [1669645001.8996] device (vxlan1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
  [1669645001.9245] device (vxlan1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')

At this point a "networking off" will bring the interface down.
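
The loss of the 'external' state can be spotted mechanically in the logs. The sketch below assumes the "state change" line format shown in the log excerpts above; the function name iface_state_changes is hypothetical. It reads log lines on stdin and prints a line whenever a device's sys-iface-state differs from the value in its previous state-change line.

```shell
# Hypothetical helper: report changes of sys-iface-state per device.
# Input: NetworkManager log lines on stdin, in the format quoted above.
iface_state_changes() {
    # Reduce each "state change" line to "DEVICE SYS_IFACE_STATE" ...
    sed -n "s/.*device (\([^)]*\)): state change: .*sys-iface-state: '\([^']*\)').*/\1 \2/p" |
    # ... then print a line whenever the state differs from the previous one
    # seen for the same device.
    awk '{
        if (($1 in last) && last[$1] != $2)
            print $1 ": " last[$1] " -> " $2
        last[$1] = $2
    }'
}
```

Fed the four log lines quoted in this comment, it should print "vxlan1: external -> managed".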

Comment 13 Vladimir Benes 2023-04-20 13:00:06 UTC
After a certain number of repetitions, I still see LOWER_UP missing; see the attachment.
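
For reference, a repetition loop of this kind could look like the sketch below. It is an assumption about how the toggling might be repeated, not the actual NMCI test: the function name toggle_until_flag_missing and its arguments are hypothetical, and the 1-second pause is taken from the reproducer in the description.

```shell
# Hypothetical helper: toggle networking up to COUNT times and stop at the
# first iteration after which FLAG is missing from DEV's link flags.
# Usage: toggle_until_flag_missing DEV FLAG COUNT
toggle_until_flag_missing() {
    t_dev=$1
    t_flag=$2
    t_count=$3
    t_i=1
    while [ "$t_i" -le "$t_count" ]; do
        nmcli networking off
        nmcli networking on
        sleep 1
        # Extract the comma-separated flags between '<' and '>'.
        t_flags=$(ip link show "$t_dev")
        t_flags=${t_flags#*<}
        t_flags=${t_flags%%>*}
        case ",$t_flags," in
            *",$t_flag,"*) ;;    # flag still present, keep going
            *)
                echo "$t_dev: $t_flag missing after $t_i iteration(s)"
                return 1
                ;;
        esac
        t_i=$((t_i + 1))
    done
    echo "$t_dev: $t_flag still set after $t_count iteration(s)"
}
```

For example, "toggle_until_flag_missing vxlan1 LOWER_UP 50" would stop at the first iteration where LOWER_UP disappears.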

Comment 20 Beniamino Galvani 2023-05-29 08:37:44 UTC
> adding may_fail tag to the ovs_vxlan_networking_off_on test

I couldn't reproduce the new failure with the NMCI test, but according to the logs it seems to be caused by a race condition in NM that makes the external device fully managed by NM.

By stopping and resuming NM at the right time, the issue is 100% reproducible:

  # Temporarily stop NetworkManager to trigger the race condition, which
  # happens when NM detects the interface already attached to the OVS
  # bridge and already announced by udev.
  killall -STOP NetworkManager
  ip link add vxlan1 type vxlan remote 172.25.12.1 id 120 dstport 0
  ip link set vxlan1 up
  ovs-vsctl add-br br1
  ovs-vsctl add-port br1 vxlan1
  sleep .4
  killall -CONT NetworkManager

  ovs-vsctl show
  ip link show vxlan1

  nmcli networking off
  nmcli networking on

  sleep 1

  nmcli networking off
  nmcli networking on

  ovs-vsctl show
  ip link show vxlan1
  # vxlan1 is DOWN now

Comment 24 Vladimir Benes 2023-07-03 14:09:20 UTC
working well, moving to verified

Comment 26 errata-xmlrpc 2023-11-07 08:37:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6585