Bug 2029937

Summary: [RHEL-9]: OVS configuration created via NMCLI intermittently disappears after reboot, power_cycle, etc.
Product: Red Hat Enterprise Linux 9 Reporter: Rick Alongi <ralongi>
Component: NetworkManager Assignee: Fernando F. Mancera <ferferna>
Status: CLOSED ERRATA QA Contact: Vladimir Benes <vbenes>
Severity: unspecified Docs Contact:
Priority: high    
Version: 9.0 CC: bgalvani, ctrautma, ferferna, fge, fpokryvk, lrintel, rkhan, sukulkar, thaller, till, vbenes
Target Milestone: rc Keywords: Regression, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: NetworkManager-1.36.0-0.8.el9 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-17 15:48:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description        Flags
ovs logs           none
Reproducer script  none

Description Rick Alongi 2021-12-07 15:43:37 UTC
Created attachment 1845087 [details]
ovs logs

Description of problem:
OVS configuration created via NMCLI intermittently disappears after reboot, power_cycle, etc.

Version-Release number of selected component (if applicable):
RHEL-9.0.0-20211121.7
Kernel: 5.14.0-17.el9.x86_64

[root@netqe9 ~]# rpm -qa | grep -i networkmanager
NetworkManager-libnm-1.36.0-0.1.el9.x86_64
NetworkManager-1.36.0-0.1.el9.x86_64
NetworkManager-team-1.36.0-0.1.el9.x86_64
NetworkManager-wifi-1.36.0-0.1.el9.x86_64
NetworkManager-wwan-1.36.0-0.1.el9.x86_64
NetworkManager-bluetooth-1.36.0-0.1.el9.x86_64
NetworkManager-adsl-1.36.0-0.1.el9.x86_64
NetworkManager-tui-1.36.0-0.1.el9.x86_64
NetworkManager-config-server-1.36.0-0.1.el9.noarch
NetworkManager-ovs-1.36.0-0.1.el9.x86_64

How reproducible:
Almost 100%

Steps to Reproduce:
1. Provision single system with RHEL-9.0.0-20211121.7, install OVS 2.15 or 2.16, install NetworkManager-ovs 
2. Configure OVS using nmcli
3. Reboot or power cycle system
4. When system comes back up, config created using nmcli is sometimes missing
5. Detailed steps listed below under "Additional info" section

Actual results:
OVS configuration created using nmcli is sometimes missing after reboot or power cycle.

Expected results:
OVS configuration created using nmcli is always persistent after reboot or power cycle.

Additional info:

Notes:

- Problem observed when using compose RHEL-9.0.0-20211121.7 but not with RHEL-9.0.0-20211007.7
- Problem observed using both OVS 2.15 and 2.16 with RHEL-9.0.0-20211121.7
- Problem NOT observed using either OVS 2.15 or 2.16 with RHEL-9.0.0-20211007.7
- Although this example shows ovsbr2 (with the VLAN) going missing, in other runs ovsbr1 was the configuration missing after reboot while ovsbr2 with the VLAN remained intact, so the problem is not specific to VLANs.
- A bridge created with ovs-vsctl always remains intact after reboot.
- OVS log files attached to BZ (ovs_logs.tar.gz)
- Failed beaker job example: https://beaker.engineering.redhat.com/jobs/6053486
- sos report: http://netqe-infra01.knqe.lab.eng.bos.redhat.com/sosreports/sosreport-netqe9-2021-12-07-whtihop.tar.xz

Steps:

- provision system with RHEL-9.0.0-20211121.7

- yum -y install http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch-selinux-extra-policy/1.0/30.el9fdp/noarch/openvswitch-selinux-extra-policy-1.0-30.el9fdp.noarch.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.16/2.16.0/27.el9fdp/x86_64/openvswitch2.16-2.16.0-27.el9fdp.x86_64.rpm

- systemctl enable openvswitch && systemctl start openvswitch

- yum -y install NetworkManager-ovs
- systemctl daemon-reload
- systemctl restart NetworkManager

ovsbr0="ovsbr0"
ovsbr1="ovsbr1"
ovsbr2="ovsbr2"
vm_name="g0"
vlan_id=10

JOBID=6068077
if [ -z "$JOBID" ]; then
	ipaddr=120
else
	ipaddr=$((JOBID % 100 + 20))
fi

ovsbr0_ip4addr=192.168.$((ipaddr + 0)).2
ovsbr0_ip6addr=2014:$((ipaddr + 0))::2
ovsbr1_ip4addr=192.168.$((ipaddr + 20)).2
ovsbr1_ip6addr=2014:$((ipaddr + 20))::2
ovsbr2_ip4addr=192.168.$((ipaddr + 40)).2
ovsbr2_ip6addr=2014:$((ipaddr + 40))::2 

# ovsbr0 config:
ovs-vsctl --if-exists del-br $ovsbr0
ovs-vsctl add-br $ovsbr0

# ovsbr1 config:
ovs-vsctl --if-exists del-br $ovsbr1
nmcli c add type ovs-bridge conn.interface $ovsbr1 con-name $ovsbr1
nmcli c add type ovs-port conn.interface $ovsbr1 master $ovsbr1 con-name ovs-port-$ovsbr1
nmcli c add type ovs-interface slave-type ovs-port conn.interface $ovsbr1 master ovs-port-$ovsbr1  con-name ovs-if-$ovsbr1 ipv4.method static ipv4.address $ovsbr1_ip4addr/24 ipv6.method static ipv6.address $ovsbr1_ip6addr/64
nmcli con up ovs-if-$ovsbr1
nmcli con up ovs-port-$ovsbr1
nmcli con up $ovsbr1

# ovsbr2 config:
ovs-vsctl --if-exists del-br $ovsbr2
nmcli c add type ovs-bridge conn.interface $ovsbr2 con-name $ovsbr2
nmcli c add type ovs-port conn.interface vlan$vlan_id master $ovsbr2 ovs-port.tag $vlan_id con-name ovs-port-vlan$vlan_id
nmcli c add type ovs-interface slave-type ovs-port conn.interface vlan$vlan_id master ovs-port-vlan$vlan_id con-name ovs-if-vlan$vlan_id ipv4.method static ipv4.address $ovsbr2_ip4addr/24 ipv6.method static ipv6.address $ovsbr2_ip6addr/64
nmcli con up ovs-if-vlan$vlan_id
nmcli con up ovs-port-vlan$vlan_id
nmcli con up $ovsbr2
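
For reference, the address arithmetic in the steps above can be checked in isolation. The following is only a sketch: subnet_octet is a hypothetical helper name that does not appear in the original steps, and it reproduces the JOBID-based calculation so the third octet of each bridge address can be confirmed against the output below.

```shell
# subnet_octet JOBID OFFSET
# Hypothetical helper (not part of the original steps): returns the number
# used as the third IPv4 octet and the second IPv6 group for a bridge.
subnet_octet() {
    jobid=$1
    offset=$2
    if [ -z "$jobid" ]; then
        base=120                        # fallback when no job ID is set
    else
        base=$((jobid % 100 + 20))      # last two digits of the job ID, plus 20
    fi
    echo $((base + offset))
}

# Offsets 0, 20 and 40 correspond to ovsbr0, ovsbr1 and ovsbr2.
echo "ovsbr0: 192.168.$(subnet_octet 6068077 0).2"
echo "ovsbr1: 192.168.$(subnet_octet 6068077 20).2"
echo "ovsbr2: 192.168.$(subnet_octet 6068077 40).2"
```

With JOBID=6068077 this gives 97, 117 and 137, which agrees with the 192.168.117.2 and 192.168.137.2 addresses shown in the ip output.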

[root@netqe9 ~]# rpm -qa | grep -i networkmanager
NetworkManager-libnm-1.36.0-0.1.el9.x86_64
NetworkManager-1.36.0-0.1.el9.x86_64
NetworkManager-team-1.36.0-0.1.el9.x86_64
NetworkManager-wifi-1.36.0-0.1.el9.x86_64
NetworkManager-wwan-1.36.0-0.1.el9.x86_64
NetworkManager-bluetooth-1.36.0-0.1.el9.x86_64
NetworkManager-adsl-1.36.0-0.1.el9.x86_64
NetworkManager-tui-1.36.0-0.1.el9.x86_64
NetworkManager-config-server-1.36.0-0.1.el9.noarch
NetworkManager-ovs-1.36.0-0.1.el9.x86_64

# Before reboot:

[root@netqe9 ~]# ovs-vsctl show | grep vlan
        Port vlan10
            Interface vlan10
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr0
    Bridge ovsbr0
        Port ovsbr0
            Interface ovsbr0
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr1
    Bridge ovsbr1
        Port ovsbr1
            Interface ovsbr1
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr2
    Bridge ovsbr2
[root@netqe9 ~]# ovs-vsctl show
b6153de9-af5a-4518-817e-2f21c93a57dc
    Bridge ovsbr0
        Port ovsbr0
            Interface ovsbr0
                type: internal
    Bridge ovsbr2
        Port vlan10
            tag: 10
            Interface vlan10
                type: internal
    Bridge ovsbr1
        Port ovsbr1
            Interface ovsbr1
                type: internal
    ovs_version: "2.16.2"
[root@netqe9 ~]# ip a | grep $ovsbr1_ip4addr
    inet 192.168.117.2/24 brd 192.168.117.255 scope global noprefixroute ovsbr1
[root@netqe9 ~]# ip a | grep $ovsbr1_ip6addr
    inet6 2014:117::2/64 scope global noprefixroute 
[root@netqe9 ~]# ip a | grep $ovsbr2_ip4addr
    inet 192.168.137.2/24 brd 192.168.137.255 scope global noprefixroute vlan10
[root@netqe9 ~]# ip a | grep $ovsbr2_ip6addr
    inet6 2014:137::2/64 scope global noprefixroute 
    
[root@netqe9 ~]# nmcli con show
NAME             UUID                                  TYPE           DEVICE 
eno3             32d4e235-e719-4784-aa04-91c639c98639  ethernet       eno3   
ovs-if-ovsbr1    a4d7ebff-7f81-4cb0-909f-55ba329908c3  ovs-interface  ovsbr1 
ovs-if-vlan10    613922cd-b258-4175-b814-f8a8d4fd16ad  ovs-interface  vlan10 
ovsbr1           10bb209f-e40e-480f-9ab2-a420d48db773  ovs-bridge     ovsbr1 
ovsbr2           5ad2487c-ba78-464f-8da8-22f595cf45a6  ovs-bridge     ovsbr2 
ovs-port-ovsbr1  c341153f-1ee8-427a-88c3-86f27e17c64a  ovs-port       ovsbr1 
ovs-port-vlan10  75cea22e-bdb3-4a16-8673-5ac828bfb1b4  ovs-port       vlan10 
eno1             eff16c0d-fe1b-476e-b6a2-f51de6c2852a  ethernet       --     
eno2             1ac56ee2-db10-4952-8594-d5059f3dda42  ethernet       --     
eno4             ce2bba34-24d4-457c-ae85-f74d390a5817  ethernet       --     
enp130s0f0       41caec6a-c112-4ec9-ad6c-7599094dc882  ethernet       --     
enp130s0f1       57ddc01b-c8ab-4b95-8142-366be3f440d8  ethernet       --     
enp132s0f0       68d631e8-8054-4c37-8c1f-cef9dc4bc54d  ethernet       --     
enp132s0f1       061f7e45-2421-4dfb-b0b9-5cacb3d21076  ethernet       --     
enp4s0f0         833aae82-c5c8-4776-a66b-fdcf1889cde2  ethernet       --     
enp4s0f1         c2ea27d2-1ef1-4450-93e7-00b2eeae120e  ethernet       --     
[root@netqe9 ~]# 
    
# After reboot (note that ovsbr2 config is now missing):

[root@netqe9 ~]# ovs-vsctl show | grep vlan
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr0
    Bridge ovsbr0
        Port ovsbr0
            Interface ovsbr0
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr1
    Bridge ovsbr1
        Port ovsbr1
            Interface ovsbr1
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr2
[root@netqe9 ~]# ovs-vsctl show
b6153de9-af5a-4518-817e-2f21c93a57dc
    Bridge ovsbr0
        Port ovsbr0
            Interface ovsbr0
                type: internal
    Bridge ovsbr1
        Port ovsbr1
            Interface ovsbr1
                type: internal
    ovs_version: "2.16.2"
[root@netqe9 ~]# ip a | grep $ovsbr1_ip4addr
    inet 192.168.117.2/24 brd 192.168.117.255 scope global noprefixroute ovsbr1
[root@netqe9 ~]# ip a | grep $ovsbr1_ip6addr
    inet6 2014:117::2/64 scope global noprefixroute 
[root@netqe9 ~]# ip a | grep $ovsbr2_ip4addr
[root@netqe9 ~]# ip a | grep $ovsbr2_ip6addr
[root@netqe9 ~]# 

[root@netqe9 ~]# nmcli con show
NAME             UUID                                  TYPE           DEVICE 
eno3             32d4e235-e719-4784-aa04-91c639c98639  ethernet       eno3   
ovs-if-ovsbr1    a4d7ebff-7f81-4cb0-909f-55ba329908c3  ovs-interface  ovsbr1 
ovsbr1           10bb209f-e40e-480f-9ab2-a420d48db773  ovs-bridge     ovsbr1 
ovsbr2           5ad2487c-ba78-464f-8da8-22f595cf45a6  ovs-bridge     ovsbr2 
ovs-port-ovsbr1  c341153f-1ee8-427a-88c3-86f27e17c64a  ovs-port       ovsbr1 
ovs-port-vlan10  75cea22e-bdb3-4a16-8673-5ac828bfb1b4  ovs-port       vlan10 
eno1             eff16c0d-fe1b-476e-b6a2-f51de6c2852a  ethernet       --     
eno2             1ac56ee2-db10-4952-8594-d5059f3dda42  ethernet       --     
eno4             ce2bba34-24d4-457c-ae85-f74d390a5817  ethernet       --     
enp130s0f0       41caec6a-c112-4ec9-ad6c-7599094dc882  ethernet       --     
enp130s0f1       57ddc01b-c8ab-4b95-8142-366be3f440d8  ethernet       --     
enp132s0f0       68d631e8-8054-4c37-8c1f-cef9dc4bc54d  ethernet       --     
enp132s0f1       061f7e45-2421-4dfb-b0b9-5cacb3d21076  ethernet       --     
enp4s0f0         833aae82-c5c8-4776-a66b-fdcf1889cde2  ethernet       --     
enp4s0f1         c2ea27d2-1ef1-4450-93e7-00b2eeae120e  ethernet       --     
ovs-if-vlan10    613922cd-b258-4175-b814-f8a8d4fd16ad  ovs-interface  --

# Reboot once more and problem is no longer present:

[root@netqe9 ~]# ovs-vsctl show | grep vlan
        Port vlan10
            Interface vlan10
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr0
    Bridge ovsbr0
        Port ovsbr0
            Interface ovsbr0
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr1
    Bridge ovsbr1
        Port ovsbr1
            Interface ovsbr1
[root@netqe9 ~]# ovs-vsctl show | grep ovsbr2
    Bridge ovsbr2
[root@netqe9 ~]# ovs-vsctl show
b6153de9-af5a-4518-817e-2f21c93a57dc
    Bridge ovsbr0
        Port ovsbr0
            Interface ovsbr0
                type: internal
    Bridge ovsbr2
        Port vlan10
            tag: 10
            Interface vlan10
                type: internal
    Bridge ovsbr1
        Port ovsbr1
            Interface ovsbr1
                type: internal
    ovs_version: "2.16.2"
[root@netqe9 ~]# ip a | grep $ovsbr1_ip4addr
    inet 192.168.117.2/24 brd 192.168.117.255 scope global noprefixroute ovsbr1
[root@netqe9 ~]# ip a | grep $ovsbr1_ip6addr
    inet6 2014:117::2/64 scope global noprefixroute 
[root@netqe9 ~]# ip a | grep $ovsbr2_ip4addr
    inet 192.168.137.2/24 brd 192.168.137.255 scope global noprefixroute vlan10
[root@netqe9 ~]# ip a | grep $ovsbr2_ip6addr
    inet6 2014:137::2/64 scope global noprefixroute 
[root@netqe9 ~]# nmcli con show
NAME             UUID                                  TYPE           DEVICE 
eno3             32d4e235-e719-4784-aa04-91c639c98639  ethernet       eno3   
ovs-if-ovsbr1    a4d7ebff-7f81-4cb0-909f-55ba329908c3  ovs-interface  ovsbr1 
ovs-if-vlan10    613922cd-b258-4175-b814-f8a8d4fd16ad  ovs-interface  vlan10 
ovsbr1           10bb209f-e40e-480f-9ab2-a420d48db773  ovs-bridge     ovsbr1 
ovsbr2           5ad2487c-ba78-464f-8da8-22f595cf45a6  ovs-bridge     ovsbr2 
ovs-port-ovsbr1  c341153f-1ee8-427a-88c3-86f27e17c64a  ovs-port       ovsbr1 
ovs-port-vlan10  75cea22e-bdb3-4a16-8673-5ac828bfb1b4  ovs-port       vlan10 
eno1             eff16c0d-fe1b-476e-b6a2-f51de6c2852a  ethernet       --     
eno2             1ac56ee2-db10-4952-8594-d5059f3dda42  ethernet       --     
eno4             ce2bba34-24d4-457c-ae85-f74d390a5817  ethernet       --     
enp130s0f0       41caec6a-c112-4ec9-ad6c-7599094dc882  ethernet       --     
enp130s0f1       57ddc01b-c8ab-4b95-8142-366be3f440d8  ethernet       --     
enp132s0f0       68d631e8-8054-4c37-8c1f-cef9dc4bc54d  ethernet       --     
enp132s0f1       061f7e45-2421-4dfb-b0b9-5cacb3d21076  ethernet       --     
enp4s0f0         833aae82-c5c8-4776-a66b-fdcf1889cde2  ethernet       --     
enp4s0f1         c2ea27d2-1ef1-4450-93e7-00b2eeae120e  ethernet       --     
[root@netqe9 ~]#
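
The manual before/after comparison above could be automated along these lines. This is a sketch only: missing_bridges is a hypothetical helper, and it assumes the actual bridge list is supplied one name per line, which is the format printed by ovs-vsctl list-br.

```shell
# missing_bridges EXPECTED ACTUAL
# Hypothetical helper: prints every name from the space-separated EXPECTED
# list that is absent from ACTUAL (one bridge name per line).
missing_bridges() {
    expected=$1
    actual=$2
    for br in $expected; do
        # -x matches the whole line, -F treats the name as a fixed string
        if ! printf '%s\n' "$actual" | grep -qxF -- "$br"; then
            echo "$br"
        fi
    done
}

# Example using the state captured after the failing reboot.
# On a live system, pass "$(ovs-vsctl list-br)" as the second argument.
missing_bridges "ovsbr0 ovsbr1 ovsbr2" "ovsbr0
ovsbr1"
```

For the failing reboot shown in this report, the example prints ovsbr2; an empty result means all expected bridges are present.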

Comment 1 Rick Alongi 2021-12-08 12:13:15 UTC
Please note there is a typo in comment #0 for the Notes in the Additional info section.  The second note should read "Problem observed using both OVS 2.15 and 2.16 with RHEL-9.0.0-20211121.7".

Comment 3 Beniamino Galvani 2021-12-14 08:45:33 UTC
Please enable NM trace logging by setting level=TRACE in the [logging] section of /etc/NetworkManager/NetworkManager.conf, then restart NM, reproduce the issue, and attach the output of 'journalctl -b'.
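
One way to apply the setting requested above is sketched here. It assumes a drop-in file under /etc/NetworkManager/conf.d is acceptable in place of editing NetworkManager.conf directly; the helper name write_trace_conf and the file name 95-trace-logging.conf are hypothetical.

```shell
# write_trace_conf DIR
# Hypothetical helper: writes a drop-in that sets level=TRACE in the
# [logging] section. The file name is an arbitrary choice.
write_trace_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir"
    printf '[logging]\nlevel=TRACE\n' > "$conf_dir/95-trace-logging.conf"
}

# On a real system (as root), the sequence would be roughly:
#   write_trace_conf /etc/NetworkManager/conf.d
#   systemctl restart NetworkManager
#   ... reproduce the issue ...
#   journalctl -b > journalctl.log

# Demonstration against a scratch directory so nothing on the host changes:
demo_dir=$(mktemp -d)
write_trace_conf "$demo_dir"
cat "$demo_dir/95-trace-logging.conf"
```

Removing the drop-in and restarting NetworkManager should return logging to its previous level.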

Comment 5 Rick Alongi 2021-12-14 13:48:40 UTC
Reproduced issue with NetworkManager TRACE enabled.

Compose: RHEL-9.0.0-20211213.3
Kernel: 5.14.0-29.el9.x86_64

[root@netqe9 ~]# rpm -qa | grep -i networkmanager
NetworkManager-libnm-1.36.0-0.2.el9.x86_64
NetworkManager-1.36.0-0.2.el9.x86_64
NetworkManager-team-1.36.0-0.2.el9.x86_64
NetworkManager-wifi-1.36.0-0.2.el9.x86_64
NetworkManager-wwan-1.36.0-0.2.el9.x86_64
NetworkManager-bluetooth-1.36.0-0.2.el9.x86_64
NetworkManager-adsl-1.36.0-0.2.el9.x86_64
NetworkManager-tui-1.36.0-0.2.el9.x86_64
NetworkManager-config-server-1.36.0-0.2.el9.noarch
NetworkManager-ovs-1.36.0-0.2.el9.x86_64

Output from journalctl -b attached to this BZ as journalctl.log.

Comment 6 Fernando F. Mancera 2022-01-20 17:38:29 UTC
(In reply to Rick Alongi from comment #5)
> Reproduced issue with NetworkManager TRACE enabled.
> 
> Compose: RHEL-9.0.0-20211213.3
> Kernel: 5.14.0-29.el9.x86_64
> 
> [root@netqe9 ~]# rpm -qa | grep -i networkmanager
> NetworkManager-libnm-1.36.0-0.2.el9.x86_64
> NetworkManager-1.36.0-0.2.el9.x86_64
> NetworkManager-team-1.36.0-0.2.el9.x86_64
> NetworkManager-wifi-1.36.0-0.2.el9.x86_64
> NetworkManager-wwan-1.36.0-0.2.el9.x86_64
> NetworkManager-bluetooth-1.36.0-0.2.el9.x86_64
> NetworkManager-adsl-1.36.0-0.2.el9.x86_64
> NetworkManager-tui-1.36.0-0.2.el9.x86_64
> NetworkManager-config-server-1.36.0-0.2.el9.noarch
> NetworkManager-ovs-1.36.0-0.2.el9.x86_64
> 
> Output from journalctl -b attached to this BZ as journalctl.log.

I have been able to reproduce this on RHEL 8.6 as well. It appears to be a regression present since NetworkManager-1.36.0-0.1. I am working on it. Thanks for reporting.

Comment 8 Beniamino Galvani 2022-01-31 12:53:44 UTC
Created attachment 1858054 [details]
Reproducer script

This script sets up a scenario similar to the one in the description and repeats a (simulated) reboot until a bridge is missing.

Without commit https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/3034b99c00a5662b0499bec88ad1193235629dc7 , the script fails after 1-2 iterations.

After the fix, I ran the script for 100 iterations without any failure.
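
The general shape of such an iterate-until-failure test is sketched below. This is not the attached reproducer script: run_until_failure, fake_cycle and fake_check are hypothetical names, and the way a reboot is simulated and the bridges are checked is left to the caller.

```shell
# run_until_failure MAX CYCLE_CMD CHECK_CMD
# Hypothetical harness: runs CYCLE_CMD followed by CHECK_CMD up to MAX times
# and stops at the first iteration where CHECK_CMD fails.
run_until_failure() {
    max=$1
    cycle_cmd=$2
    check_cmd=$3
    i=1
    while [ "$i" -le "$max" ]; do
        "$cycle_cmd" || return 2        # the cycle step itself failed
        if ! "$check_cmd"; then
            echo "failed at iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed $max iterations"
    return 0
}

# Stand-ins for demonstration only. A real test would replace fake_cycle with
# the simulated reboot and fake_check with a test that all bridges are present.
fake_cycle() { :; }
fake_check() { true; }

run_until_failure 3 fake_cycle fake_check
```

With the stand-ins above, the harness prints "passed 3 iterations"; a check function that fails on some iteration makes it stop there and return 1.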

Comment 14 errata-xmlrpc 2022-05-17 15:48:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: NetworkManager), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3915