Bug 2149012
| Summary: | NM brings down interfaces attached to a ovs bridge after "nmcli networking off/on" | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Beniamino Galvani <bgalvani> |
| Component: | NetworkManager | Assignee: | Fernando F. Mancera <ferferna> |
| Status: | VERIFIED --- | QA Contact: | Vladimir Benes <vbenes> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | ||
| Version: | 9.2 | CC: | bgalvani, blitton, lrintel, palonsor, pdiak, rkhan, rravaiol, sfaye, sukulkar, till, tkondvil, vbenes |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | NetworkManager-1.43.10-1.el9 | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Beniamino Galvani
2022-11-28 14:47:06 UTC
Initially, the vxlan is in disconnected state and is considered 'external'.

[1669644967.4654] device (vxlan1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')

The problem is that after toggling networking, the 'external' state is lost and the device becomes fully managed.

[1669644995.0894] device (vxlan1): state change: disconnected -> unmanaged (reason 'sleeping', sys-iface-state: 'external')
[1669645001.8996] device (vxlan1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
[1669645001.9245] device (vxlan1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')

At this point a "networking off" will bring the interface down.

After a certain number of repetitions, I still see LOWER_UP missing; see attachment.

> adding may_fail tag to the ovs_vxlan_networking_off_on test
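The loss of the 'external' state shown in the state-change log lines above can be spotted mechanically. The following is a minimal sketch, not part of NetworkManager or NMCI: `find_lost_external` is a hypothetical helper name, and it assumes the log lines have exactly the "device (NAME): state change: ... sys-iface-state: 'VALUE')" shape quoted in this report.

```shell
#!/bin/sh
# Sketch only: find_lost_external is a hypothetical helper, not an NM tool.
# It reads NetworkManager "state change" log lines on stdin and prints the
# name of every device whose sys-iface-state goes from external to managed.
find_lost_external() {
    awk '
        /state change:/ && /sys-iface-state:/ {
            # Device name: the text inside "device (NAME)".
            dev = $0
            sub(/.*device \(/, "", dev)
            sub(/\).*/, "", dev)

            # New sys-iface-state: the quoted word after "sys-iface-state: ".
            # The quote characters are matched with "." to keep this
            # single-quoted awk program free of embedded single quotes.
            st = $0
            sub(/.*sys-iface-state: ./, "", st)
            sub(/.\).*/, "", st)

            # Report the transition, then remember the latest state.
            if (prev[dev] == "external" && st == "managed")
                print dev
            prev[dev] = st
        }
    '
}

# Possible usage (assumes NM logs to the journal at a suitable log level):
#   journalctl -u NetworkManager | find_lost_external
```

Fed the four log lines quoted in this report, the helper prints `vxlan1` once, for the last transition.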
I couldn't reproduce the new failure with the NMCI test, but according to the logs it seems to be caused by a race condition in NM that makes the external device fully managed by NM.
By stopping and resuming NM at the right time the issue is 100% reproducible:
# Temporarily stop NetworkManager to trigger the race condition, which
# happens when NM detects the interface already attached to the OVS
# bridge and already announced by udev.
killall -STOP NetworkManager
ip link add vxlan1 type vxlan remote 172.25.12.1 id 120 dstport 0
ip link set vxlan1 up
ovs-vsctl add-br br1
ovs-vsctl add-port br1 vxlan1
sleep .4
killall -CONT NetworkManager
ovs-vsctl show
ip link show vxlan1
nmcli networking off
nmcli networking on
sleep 1
nmcli networking off
nmcli networking on
ovs-vsctl show
ip link show vxlan1
# vxlan1 is DOWN now
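The final check in the reproducer above is done by reading the `ip link show vxlan1` output by eye. A small sketch of how that check could be scripted follows; `link_has_flag` is a hypothetical helper name, and it assumes the usual `ip link` layout in which the interface flags appear between `<` and `>` on the first line.

```shell
#!/bin/sh
# Sketch only: link_has_flag is a hypothetical helper, not part of iproute2.
# It reads "ip link show DEV" output on stdin and succeeds (exit 0) if the
# given flag appears in the <...> flag list of the first line.
link_has_flag() {
    flag=$1
    # Keep only the text between "<" and ">" on the first line.
    flags=$(sed -n '1s/^[^<]*<\([^>]*\)>.*$/\1/p')
    # Wrap in commas so that "UP" does not match inside "LOWER_UP".
    case ",$flags," in
        *",$flag,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage after the second "nmcli networking on":
#   ip link show vxlan1 | link_has_flag UP || echo "vxlan1 is DOWN"
#   ip link show vxlan1 | link_has_flag LOWER_UP || echo "vxlan1 lost LOWER_UP"
```

Checking `UP` and `LOWER_UP` separately distinguishes an interface that was brought down administratively from one that is up but has no carrier.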
working well, moving to verified