Bug 2149012
Summary: | NM brings down interfaces attached to a ovs bridge after "nmcli networking off/on" | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Beniamino Galvani <bgalvani> |
Component: | NetworkManager | Assignee: | Fernando F. Mancera <ferferna> |
Status: | CLOSED ERRATA | QA Contact: | Vladimir Benes <vbenes> |
Severity: | unspecified | Docs Contact: | |
Priority: | high | | |
Version: | 9.2 | CC: | bgalvani, blitton, lrintel, palonsor, pdiak, rkhan, rravaiol, sfaye, sukulkar, till, tkondvil, vbenes |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | NetworkManager-1.43.10-1.el9 | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2023-11-07 08:37:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Beniamino Galvani
2022-11-28 14:47:06 UTC
Initially, the vxlan is in disconnected state and is considered 'external'.

[1669644967.4654] device (vxlan1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')

The problem is that after toggling networking, the 'external' state is lost and the device becomes fully managed.

[1669644995.0894] device (vxlan1): state change: disconnected -> unmanaged (reason 'sleeping', sys-iface-state: 'external')
[1669645001.8996] device (vxlan1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
[1669645001.9245] device (vxlan1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')

At this point a "networking off" will bring the interface down.

After a certain number of repetitions, I still see LOWER_UP missing; see attachment.

> adding may_fail tag to the ovs_vxlan_networking_off_on test
I couldn't reproduce the new failure with the NMCI test, but according to the logs it seems to be caused by a race condition in NM that makes the external device fully managed by NM.
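One way to spot this in the logs without reading them line by line is to track the sys-iface-state value per device and report when it flips from 'external' to 'managed'. The following is only a sketch: `find_lost_external` is a hypothetical helper name, and it assumes the log lines keep the "device (NAME): state change: ... sys-iface-state: 'STATE'" shape quoted in the description.

```shell
# Hypothetical helper (not part of NetworkManager): reads NM log lines on
# stdin and prints each device whose sys-iface-state goes from 'external'
# to 'managed'. Assumes the log line format quoted in this bug.
find_lost_external() {
    # Split fields on single quotes so quoted values become their own fields.
    awk -F"'" '
        /state change:/ {
            state = ""
            # The quoted value right after "sys-iface-state: " is the state.
            for (i = 1; i < NF; i++)
                if ($i ~ /sys-iface-state: $/) state = $(i + 1)
            if (state == "") next

            # The device name sits between "device (" and ")".
            dev = $1
            sub(/.*device \(/, "", dev)
            sub(/\).*/, "", dev)

            if (last[dev] == "external" && state == "managed")
                print dev
            last[dev] = state
        }
    '
}
```

It could be fed from the journal, for example `journalctl -u NetworkManager | find_lost_external`; any device name it prints has lost its 'external' state at some point in the log.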
By stopping and resuming NM at the right time the issue is 100% reproducible:
# Temporarily stop NetworkManager to trigger the race condition, which
# happens when NM detects the interface already attached to the OVS
# bridge and already announced by udev.
killall -STOP NetworkManager
ip link add vxlan1 type vxlan remote 172.25.12.1 id 120 dstport 0
ip link set vxlan1 up
ovs-vsctl add-br br1
ovs-vsctl add-port br1 vxlan1
sleep .4
killall -CONT NetworkManager
ovs-vsctl show
ip link show vxlan1
nmcli networking off
nmcli networking on
sleep 1
nmcli networking off
nmcli networking on
ovs-vsctl show
ip link show vxlan1
# vxlan1 is DOWN now
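The final state of vxlan1 can also be checked by script rather than by reading the `ip link show` output by eye. A minimal sketch, assuming the usual `N: name: <FLAG,FLAG,...> ...` first line printed by `ip link show`; `link_has_flag` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: succeeds if the interface flags in `ip link show`
# output (read from stdin) contain the flag named in $1, e.g. UP or LOWER_UP.
link_has_flag() {
    awk -v flag="$1" '
        NR == 1 {
            # Keep only the comma-separated list between "<" and ">".
            flags = $0
            sub(/^[^<]*</, "", flags)
            sub(/>.*/, "", flags)
            n = split(flags, parts, ",")
            # Compare whole flags so that UP does not match LOWER_UP.
            for (i = 1; i <= n; i++)
                if (parts[i] == flag) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

At the end of the reproducer it could be used as `ip link show vxlan1 | link_has_flag UP || echo "vxlan1 is DOWN"`.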
Working well, moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6585