Bug 1903712

Summary: OVS bridge port is deleted when creating a linux-bridge in an Openshift-SDN cluster
Product: Red Hat Enterprise Linux 8 Reporter: Yossi Segev <ysegev>
Component: nmstate Assignee: Fernando F. Mancera <ferferna>
Status: CLOSED ERRATA QA Contact: Mingyu Shi <mshi>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 8.4 CC: danken, ellorent, ferferna, fge, jiji, jishi, myakove, network-qe, phoracek, till
Target Milestone: rc Keywords: Triaged, ZStream
Target Release: 8.0 Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nmstate-1.0.0-1.el8.noarch Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1904889 Environment:
Last Closed: 2021-05-18 15:17:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1904889    
Attachments:
Description Flags
pre-tested.log none

Description Yossi Segev 2020-12-02 17:02:41 UTC
Description of problem:
When creating a linux-bridge with one of the node's physical NICs as a port, the port of the OVS bridge (in our case, vxlan_sys_4789 of the ovs-system bridge) is deleted.


Version-Release number of selected component (if applicable):
nmstate-0.3.4-15.el8_3


How reproducible:
Always


Steps to Reproduce:
1. On an Openshift-SDN cluster - enter one of the worker nodes:
[cnv-qe-jenkins@myakove-hhsbc-executor]$ ssh core.1.244
Red Hat Enterprise Linux CoreOS 47.83.202011252347-0
  Part of OpenShift 4.7, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.7/architecture/architecture-rhcos.html

---
Last login: Wed Dec  2 16:21:48 2020 from 192.168.3.62
[systemd]
Failed Units: 1
  NetworkManager-wait-online.service
[core@myakove-hhsbc-worker-0-99vpj ~]$

2. Make sure you can view the vxlan_sys_4789 interface:
[core@myakove-hhsbc-worker-0-99vpj ~]$ ip link show dev vxlan_sys_4789
9: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 02:e7:b3:88:04:6c brd ff:ff:ff:ff:ff:ff

3. Back in the cluster - apply this NNCP:
$ cat << EOF | oc apply -f -
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1test-nncp
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens9
      ipv4:
        dhcp: false
        enabled: false
      ipv6:
        enabled: false
      name: br1test
      state: up
      type: linux-bridge
  nodeSelector:
    node-role.kubernetes.io/worker: 'myakove26-mh89k-worker-0-5tskb'
EOF

* Change the port name (ens9) to a physical secondary NIC that exists on your cluster node.
* Change the value of the nodeSelector to the name of the worker you logged in to in step 1.

4. Log in to the node and look for the vxlan_sys_4789 interface again.
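
The before/after check in steps 2 and 4 can be sketched as a small script. This is only an illustration, assuming the two snapshots were captured on the node with `ip -j link show` (JSON output, one object per interface with an "ifname" key); the helper names and the trimmed sample data are hypothetical.

```python
import json


def ifnames_from_ip_json(text):
    """Return the set of interface names found in `ip -j link show` JSON output."""
    return {entry["ifname"] for entry in json.loads(text)}


def missing_interfaces(before, after):
    """Return, sorted, the interfaces that were present before but are gone afterwards."""
    return sorted(set(before) - set(after))


# Hypothetical, heavily trimmed snapshots shaped like `ip -j link show` output.
before = '[{"ifname": "ens9"}, {"ifname": "vxlan_sys_4789"}]'
after = '[{"ifname": "ens9"}, {"ifname": "br1test"}]'

print(missing_interfaces(ifnames_from_ip_json(before), ifnames_from_ip_json(after)))
# prints ['vxlan_sys_4789']
```

An empty list would mean that no interface disappeared while the policy was applied.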


Actual results:
Interface is not found anymore.


Expected results:
Interface exists.


Additional info:
For CNV, this prevents creating VMs with Multus (secondary NICs). Trying to create a VM fails with an error saying "no trace to route".

Some debug prints we added to nmstatectl show that it wrongly detects OVS interfaces as something to be deleted:
Unable to use a TTY - input is not a terminal or the right kind of file
2020-12-02 15:13:38,628 root         DEBUG    Async action: Create checkpoint started
2020-12-02 15:13:38,637 root         DEBUG    Checkpoint None created for all devices
2020-12-02 15:13:38,638 root         DEBUG    Async action: Create checkpoint finished
2020-12-02 15:13:38,649 root         DEBUG    Async action: Delete device: br0 started
2020-12-02 15:13:38,650 root         DEBUG    Async action: Delete device: tun0 started
2020-12-02 15:13:38,650 root         DEBUG    Async action: Delete device: vxlan_sys_4789 started
2020-12-02 15:13:38,661 root         DEBUG    Interface is not real anymore: iface=br0
2020-12-02 15:13:38,661 root         DEBUG    Async action: Delete device: br0 finished
2020-12-02 15:13:38,662 root         DEBUG    Interface is not real anymore: iface=tun0
2020-12-02 15:13:38,662 root         DEBUG    Async action: Delete device: tun0 finished
2020-12-02 15:13:38,731 root         DEBUG    Interface is not real anymore: iface=vxlan_sys_4789
2020-12-02 15:13:38,731 root         DEBUG    Async action: Delete device: vxlan_sys_4789 finished
2020-12-02 15:13:38,772 root         DEBUG    Async action: Destroy checkpoint /org/freedesktop/NetworkManager/Checkpoint/1 started
2020-12-02 15:13:38,776 root         DEBUG    Checkpoint /org/freedesktop/NetworkManager/Checkpoint/1 destroyed
2020-12-02 15:13:38,776 root         DEBUG    Async action: Destroy checkpoint /org/freedesktop/NetworkManager/Checkpoint/1 finished
{'name': 'br0', 'type': 'ovs-interface', 'state': 'absent', 'ipv4': {'enabled': False}, 'ipv6': {'enabled': False}, 'mtu': 1400, 'mac-address': '3A:0C:AD:DA:B9:49', 'lldp': {'enabled': False}, 'bridge': {}}
{'name': 'tun0', 'type': 'ovs-interface', 'state': 'absent', 'ipv4': {'enabled': False}, 'ipv6': {'enabled': False}, 'mtu': 1400, 'mac-address': 'FE:8B:E3:38:7A:DC', 'lldp': {'enabled': False}}
{'name': 'vxlan_sys_4789', 'type': 'vxlan', 'state': 'absent', 'ipv4': {'enabled': False}, 'ipv6': {'enabled': False}, 'mtu': 65000, 'mac-address': 'B6:19:D4:BB:DD:95', 'lldp': {'enabled': False}, 'vxlan': {'id': 0, 'base-iface': '', 'remote': '', 'destination-port': 4789}}
Desired state applied: 
---
interfaces:
- name: br1test
  type: linux-bridge
  state: absent
  bridge:
    options:
      stp:
        enabled: false
    port:
    - name: ens9
  ipv4:
    enabled: false
    dhcp: false
  ipv6:
    enabled: false
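
To spot this pattern in a longer log, the "Delete device" actions can be compared with the interfaces named in the desired state. The following is a rough sketch, not part of nmstate: the function names are made up for illustration, the log excerpt is copied from the output above, and the desired state is reduced to the fields the check needs.

```python
import re

# Matches the "started" line of each delete action in the nmstate debug log.
DELETE_STARTED = re.compile(r"Async action: Delete device: (\S+) started")


def deleted_devices(log_text):
    """Return the device names for which a delete action was started, in log order."""
    return [match.group(1) for match in DELETE_STARTED.finditer(log_text)]


def unexpected_deletions(log_text, desired_state):
    """Return deleted devices that the desired state never mentions."""
    wanted = {iface["name"] for iface in desired_state.get("interfaces", [])}
    return [name for name in deleted_devices(log_text) if name not in wanted]


LOG = """\
2020-12-02 15:13:38,649 root         DEBUG    Async action: Delete device: br0 started
2020-12-02 15:13:38,650 root         DEBUG    Async action: Delete device: tun0 started
2020-12-02 15:13:38,650 root         DEBUG    Async action: Delete device: vxlan_sys_4789 started
2020-12-02 15:13:38,661 root         DEBUG    Async action: Delete device: br0 finished
"""

# Trimmed copy of the desired state printed above.
DESIRED = {"interfaces": [{"name": "br1test", "type": "linux-bridge", "state": "absent"}]}

print(unexpected_deletions(LOG, DESIRED))
# prints ['br0', 'tun0', 'vxlan_sys_4789']
```

Every name in the result is a device that was deleted although the desired state only refers to br1test.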

Comment 1 Fernando F. Mancera 2020-12-07 00:16:03 UTC
Upstream fix: https://github.com/nmstate/nmstate/pull/1438

Reproducer:

ovs-vsctl add-br br0 -- set Bridge br0 fail-mode=secure
ovs-vsctl add-port br0 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=flow options:key=flow

Then apply any state using nmstate and it will remove unmanaged OVS interfaces.
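
The "apply any state" step could also be driven from Python. This is a sketch under assumptions: it uses the libnmstate calls show() and apply(), must run as root on a host with nmstate installed, and reuses br1test and ens9 from the description as example names. The helper functions are hypothetical.

```python
def bridge_state(name, port, state="up"):
    """Build a desired state for a linux-bridge with a single port."""
    return {
        "interfaces": [
            {
                "name": name,
                "type": "linux-bridge",
                "state": state,
                "bridge": {
                    "options": {"stp": {"enabled": False}},
                    "port": [{"name": port}],
                },
                "ipv4": {"enabled": False},
                "ipv6": {"enabled": False},
            }
        ]
    }


def interface_names(state):
    """Return the set of interface names in an nmstate state dictionary."""
    return {iface["name"] for iface in state["interfaces"]}


def run_reproducer(bridge="br1test", port="ens9"):
    """Apply the bridge state and return the interfaces that disappeared."""
    import libnmstate  # imported lazily: only available where nmstate is installed

    before = interface_names(libnmstate.show())
    libnmstate.apply(bridge_state(bridge, port))
    after = interface_names(libnmstate.show())
    return sorted(before - after)
```

Calling run_reproducer() after the two ovs-vsctl commands should return an empty list on a fixed version; on an affected version the OVS interfaces are expected to show up in the result.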

Comment 4 Mingyu Shi 2020-12-14 08:12:07 UTC
Created attachment 1738872
pre-tested.log

Tested with versions:
nmstate-1.0.0-1.el8.noarch
nispor-1.0.1-2.el8.x86_64
NetworkManager-1.30.0-0.3.el8.x86_64
DISTRO=RHEL-8.4.0-20201203.n.0
Linux hp-dl380pg8-11.rhts.eng.pek2.redhat.com 4.18.0-257.el8.x86_64 #1 SMP Wed Dec 2 02:01:12 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch
openvswitch2.13-2.13.0-39.el8fdp.x86_64
nmstate-plugin-ovsdb-1.0.0-1.el8.noarch
python3-openvswitch2.13-2.13.0-39.el8fdp.x86_64

Comment 7 Mingyu Shi 2020-12-22 04:12:02 UTC
Verified with versions:
nmstate-1.0.0-1.el8.noarch
nispor-1.0.1-2.el8.x86_64
NetworkManager-1.30.0-0.4.el8.x86_64
DISTRO=RHEL-8.4.0-20201217.d.2
Linux hpe-dl380pgen8-02-vm-13.hpe2.lab.eng.bos.redhat.com 4.18.0-262.el8.dt3.x86_64 #1 SMP Tue Dec 15 04:28:42 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
openvswitch2.11-2.11.3-74.el8fdp.x86_64
nmstate-plugin-ovsdb-1.0.0-1.el8.noarch
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch

Comment 9 errata-xmlrpc 2021-05-18 15:17:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1748