Bug 1931376

Summary: VMs disconnected from nmstate-defined bridge after CNV-2.5.4->CNV-2.6.0 upgrade
Product: Container Native Virtualization (CNV)
Reporter: Inbar Rose <irose>
Component: Networking
Assignee: Petr Horáček <phoracek>
Status: CLOSED ERRATA
QA Contact: Meni Yakove <myakove>
Severity: urgent
Priority: urgent
Docs Contact:
Version: 2.6.0
CC: cnv-qe-bugs, danken, fdeutsch, pelauter, phoracek, sgordon, vindicators
Target Milestone: ---
Keywords: Automation, Regression
Target Release: 2.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kubernetes-nmstate-handler-container-v2.6.0-23
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1936432 (view as bug list)
Environment:
Last Closed: 2021-03-10 11:23:40 UTC
Type: Bug
Bug Depends On: 1932247    
Bug Blocks: 1936432    

Comment 4 Sylvain Réault 2021-02-22 14:57:33 UTC
Hello,

We see the same results when using a "macvtap" device; I am using the updates-testing repos.

On my other servers, this problem does not appear, but I do not use the updates-testing repos there.

Sylvain

Comment 5 Petr Horáček 2021-02-22 15:37:22 UTC
Inbar,

Were those logs taken after the upgrade finished?

Both of those VMs vm-upgrade-a and vm-upgrade-b are on the same host. They report correct IP addresses. The bridge they are connected to is available on their node and seems healthy.

The one issue I see on your cluster is that VMs are not getting the MAC address they were allocated using the KubeMacPool. This is caused by the NetworkAttachmentDefinition missing the `cnv-tuning` plugin. Please adjust your NetworkAttachmentDefinitions to mimic https://docs.openshift.com/container-platform/4.6/virt/virtual_machines/vm_networking/virt-attaching-vm-multiple-networks.html#virt-creating-bridge-nad-cli_virt-attaching-multiple-networks.
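
For reference, a bridge NetworkAttachmentDefinition that includes the cnv-tuning plugin would look roughly like the sketch below (the network name and the br1test bridge are assumptions taken from this cluster; the linked documentation is the authoritative example):

    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: br1test-network
      annotations:
        k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br1test
    spec:
      config: '{
        "cniVersion": "0.3.1",
        "name": "br1test-network",
        "plugins": [
          {
            "type": "cnv-bridge",
            "bridge": "br1test"
          },
          {
            "type": "cnv-tuning"
          }
        ]
      }'

The second entry in the plugins list is the part that matters here: without cnv-tuning the VM does not receive the MAC address allocated by KubeMacPool.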

I would need access to the cluster to perform further debugging and see where the traffic gets stuck.

Comment 6 Petr Horáček 2021-02-22 15:38:22 UTC
(In reply to Sylvain Réault from comment #4)
> Hello,
> 
> We see the same results when using a "macvtap" device; I am using the
> updates-testing repos.
> 
> On my other servers, this problem does not appear, but I do not use the
> updates-testing repos there.
> 
> Sylvain

Thanks for reporting this. Unfortunately, macvtap is not supported in OpenShift Virtualization. If you have issues with it, please open an Issue on KubeVirt's GitHub https://github.com/kubevirt/kubevirt/issues.

Comment 8 Inbar Rose 2021-02-23 06:32:37 UTC
(In reply to Petr Horáček from comment #5)
> Inbar,
> 
> Were those logs taken after the upgrade finished?
> 
> Both of those VMs vm-upgrade-a and vm-upgrade-b are on the same host. They
> report correct IP addresses. The bridge they are connected to is available
> on their node and seems healthy.
> 
> The one issue I see on your cluster is that VMs are not getting the MAC
> address they were allocated using the KubeMacPool. This is caused by the
> NetworkAttachmentDefinition missing the `cnv-tuning` plugin. Please adjust
> your NetworkAttachmentDefinitions to mimic
> https://docs.openshift.com/container-platform/4.6/virt/virtual_machines/
> vm_networking/virt-attaching-vm-multiple-networks.html#virt-creating-bridge-
> nad-cli_virt-attaching-multiple-networks.
> 
> I would need access to the cluster to perform further debugging and see
> where the traffic gets stuck.

I enabled cnv-tuning. I will run the tests again and hopefully that solves the issue.

Comment 10 Meni Yakove 2021-02-27 17:36:31 UTC
nmstate-handler version is: v2.6.0-21

After the nmstate-handler pods restarted, the veth interface disappeared from the bridge.


Before the pods restart:
5: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel master br1test state UP mode DEFAULT group default qlen 1000
88: br1test: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
90: vethca31f423@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br1test state UP mode DEFAULT group default


After the pods restart:
5: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel master br1test state UP mode DEFAULT group default qlen 1000
88: br1test: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000

And no connectivity between the VMs.
 

The code from the pod:

    def _bring_slave_up_if_not_in_desire(self):
        """
        When slave been included in master, automactially set it as state UP
        if not defiend in desire state
        """
        for iface in self._ifaces.values():
            if iface.is_up and iface.is_master:
                cur_iface = self.current_ifaces.get(iface.name)
                for slave_name in iface.slaves:
                    if cur_iface and slave_name in cur_iface.slaves:
                        # Nmstate should bring up the port interface if it has
                        # been added to the state not in all transactions
                        continue
                    slave_iface = self._ifaces[slave_name]
                    if not slave_iface.is_desired and not slave_iface.is_up:
                        slave_iface.mark_as_up()
                        slave_iface.mark_as_changed()

Comment 11 Petr Horáček 2021-02-27 18:33:02 UTC
I can confirm this. The veths stay attached on a simple bridge with no NIC attached, but when the bridge is attached to the host's physical NIC, the veths get disconnected.

Working NNCP:

    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
      ipv4:
        dhcp: false
        enabled: false
      ipv6:
        enabled: false
      name: br1
      state: up
      type: linux-bridge

Failing one:

    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens9
      ipv4:
        dhcp: false
        enabled: false
      ipv6:
        enabled: false
      name: br1
      state: up
      type: linux-bridge
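
For completeness, the failing snippet above is the desiredState of a NodeNetworkConfigurationPolicy, roughly along the lines of this sketch (the policy name and node selector are made-up for illustration; check the apiVersion against the nmstate version in the cluster):

    apiVersion: nmstate.io/v1beta1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: br1-ens9            # hypothetical policy name
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: ""   # hypothetical selector
      desiredState:
        interfaces:
        - name: br1
          type: linux-bridge
          state: up
          ipv4:
            enabled: false
            dhcp: false
          ipv6:
            enabled: false
          bridge:
            options:
              stp:
                enabled: false
            port:
            - name: ens9        # host NIC attached as a bridge port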

Comment 12 Meni Yakove 2021-03-01 22:16:06 UTC
Failed to verify the latest fix.
Applying an NNCP fails.
nmstate-handler version is: v2.6.0-22


Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==0.3.4', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 67, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 267, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 289, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 71, in apply
    _apply_ifaces_state(plugins, net_state, verify_change, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 104, in _apply_ifaces_state
    plugin.apply_changes(net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/plugin.py", line 174, in apply_changes
    nm_applier.apply_changes(self.context, net_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/nm/applier.py", line 84, in apply_changes
    for iface in net_state.ifaces.all_ifaces.values():
AttributeError: 'Ifaces' object has no attribute 'all_ifaces'



Code from the nmstate-handler pod:

/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py
    def _bring_slave_up_if_not_in_desire(self):
        """
        When slave been included in master, automactially set it as state UP
        if not defiend in desire state
        """
        for iface in self._ifaces.values():
            if iface.is_desired and iface.is_up and iface.is_master:
                cur_iface = self.current_ifaces.get(iface.name)
                for slave_name in iface.slaves:
                    if cur_iface and slave_name in cur_iface.slaves:
                        # Nmstate should bring up the port interface if it has
                        # been added to the state not in all transactions
                        continue
                    slave_iface = self._ifaces[slave_name]
                    if not slave_iface.is_desired and not slave_iface.is_up:
                        slave_iface.mark_as_up()
                        slave_iface.mark_as_changed()

    def _remove_unknown_interface_type_slaves(self):
        """
        When master containing slaves with unknown interface type or down
        state, they should be removed from master slave list before verifying.
        """
        for iface in self._ifaces.values():
            if iface.is_up and iface.is_master and iface.slaves:
                for slave_name in iface.slaves:
                    slave_iface = self._ifaces[slave_name]
                    if (
                        slave_iface.type == InterfaceType.UNKNOWN
                        or slave_iface.state != InterfaceState.UP
                    ):
                        iface.remove_slave(slave_name)

Comment 15 Meni Yakove 2021-03-02 20:44:06 UTC
Verified.
nmstate-handler version is: v2.6.0-23

Comment 17 errata-xmlrpc 2021-03-10 11:23:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799