Bug 2078573 - SDN CNI -Fail to create nncp when vxlan is up
Summary: SDN CNI -Fail to create nncp when vxlan is up
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.11.0
Assignee: Christoph Stäbler
QA Contact: Aleksandra Malykhin
URL:
Whiteboard:
Duplicates: 2089326
Depends On: 2078940 2104439 2104457 2104820
Blocks:
Reported: 2022-04-25 16:42 UTC by Ruth Netser
Modified: 2022-08-10 11:08 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:08:16 UTC
Target Upstream Version:
Embargoed:
amalykhi: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ocp-build-data pull 1633 | 0 | None | Merged | Temporarily add ose repo for nmstate-handler | 2022-05-25 09:22:17 UTC
Red Hat Bugzilla | 2078940 | 1 | urgent | CLOSED | base-iface field at OVS vxlan port is empty | 2023-10-04 05:55:20 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:08:32 UTC

Internal Links: 2078940

Description Ruth Netser 2022-04-25 16:42:02 UTC
Description of problem:
Creating a linux-bridge NNCP fails with the following condition:

    - lastHearbeatTime: "2022-04-25T16:29:49Z"
      lastTransitionTime: "2022-04-25T16:29:49Z"
      message: |
        error reconciling NodeNetworkConfigurationPolicy at desired state apply: ,
        failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1'
        KeyError
          ''

This is a regression (works in CNV v4.11.0-49 with the same knmstate version)

Version-Release number of selected component (if applicable):

OCP 4.11.0-0.nightly-2022-04-24-135651
kubernetes-nmstate-operator.4.11.0-202203281806
CNV v4.11.0-53

How reproducible:
100%

Steps to Reproduce:
1. Create linux bridge nncp

Actual results:
The NNCP fails to apply.

Expected results:
The NNCP should be applied.


Additional info:
=================
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: upgrade-br-marker
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port: []
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: upg-br-mark
      state: up
      type: linux-bridge
  nodeSelector:
    kubernetes.io/hostname: cnv-qe-infra-24.cnvqe2.lab.eng.rdu2.redhat.com


=================
2022-04-25 14:57:19,313 root         DEBUG    NetworkManager version 1.32.10    
2022-04-25 14:57:19,376 root         DEBUG    Async action: Retrieve applied config: ethernet eno1 started  
2022-04-25 14:57:19,377 root         DEBUG    Async action: Retrieve applied config: ethernet eno2 started 
2022-04-25 14:57:19,378 root         DEBUG    Async action: Retrieve applied config: ethernet eno1 finished 
2022-04-25 14:57:19,379 root         DEBUG    Async action: Retrieve applied config: ethernet eno2 finished
2022-04-25 14:57:19,380 root         DEBUG    Interface ethernet.eno1 found. Merging the interface information.
2022-04-25 14:57:19,381 root         DEBUG    Interface ethernet.eno2 found. Merging the interface information.
2022-04-25 14:57:19,381 root         DEBUG    Interface ethernet.eno3 found. Merging the interface information.
2022-04-25 14:57:19,381 root         DEBUG    Interface ethernet.eno4 found. Merging the interface information.
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.2.1', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 74, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 338, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 366, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 419, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 86, in apply
    desired_state, ignored_ifnames, current_state, save_to_disk
  File "/usr/lib/python3.6/site-packages/libnmstate/net_state.py", line 51, in __init__
    gen_conf_mode,
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 164, in __init__
    self._pre_edit_validation_and_cleanup()
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 268, in _pre_edit_validation_and_cleanup
    self._validate_vlan_not_over_infiniband()
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 324, in _validate_vlan_not_over_infiniband
    self._kernel_ifaces[iface.parent].type
KeyError: ''

Comment 1 Quique Llorente 2022-04-26 05:53:41 UTC
Checking the NNS, it looks like the vxlan interface "vxlan_sys_4789" present on the node has no base-iface:

vxlan:
    base-iface: ''


So this fails when the interfaces dictionary is accessed with an empty key:

https://github.com/nmstate/nmstate/blob/nmstate-1.2/libnmstate/ifaces/ifaces.py#L324
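
The failure mode can be reproduced outside nmstate with a plain dictionary lookup. The following is a minimal sketch, not the nmstate code: the Iface class and both find_infiniband_parents functions are hypothetical names chosen for illustration, and the guarded variant is only one possible way to tolerate an empty parent, not necessarily the fix that was shipped.

```python
from dataclasses import dataclass


@dataclass
class Iface:
    # Hypothetical, simplified stand-in for an nmstate interface object.
    name: str
    type: str
    parent: str = ""


def find_infiniband_parents(kernel_ifaces):
    """Unguarded lookup: raises KeyError('') when a vxlan has no parent."""
    return [
        iface.name
        for iface in kernel_ifaces.values()
        if iface.type in ("vlan", "vxlan")
        and kernel_ifaces[iface.parent].type == "infiniband"
    ]


def find_infiniband_parents_guarded(kernel_ifaces):
    """Guarded lookup: skips interfaces whose parent is empty or unknown."""
    result = []
    for iface in kernel_ifaces.values():
        if iface.type not in ("vlan", "vxlan"):
            continue
        parent = kernel_ifaces.get(iface.parent)
        if parent is not None and parent.type == "infiniband":
            result.append(iface.name)
    return result


# State resembling the report: vxlan_sys_4789 has base-iface ''.
ifaces = {
    "eno1": Iface("eno1", "ethernet"),
    "vxlan_sys_4789": Iface("vxlan_sys_4789", "vxlan", parent=""),
}

try:
    find_infiniband_parents(ifaces)
except KeyError as err:
    print("KeyError:", repr(err.args[0]))

print(find_infiniband_parents_guarded(ifaces))
```

The unguarded function fails on the vxlan entry because the empty string is not a key in the dictionary, which matches the KeyError: '' at the end of the traceback.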

Also, the nodes and the nmstate containers are based on different RHEL versions, RHEL 8.5 (nodes) vs RHEL 8.6 (nmstate containers), so they have different NetworkManager versions:

libnm version 1.36.0 mismatches NetworkManager version 1.32.10
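
A mismatch like this can be detected by comparing the two version strings. The sketch below is illustrative only: parse_version and mismatch_warning are hypothetical helpers, and comparing only the major and minor components is an assumption, not necessarily how nmstate decides to emit the warning.

```python
def parse_version(version):
    """Turn a dotted version such as '1.36.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def mismatch_warning(libnm_version, nm_version):
    """Return a warning string when major.minor differ, otherwise None.

    Assumption: patch-level differences are treated as compatible.
    """
    if parse_version(libnm_version)[:2] != parse_version(nm_version)[:2]:
        return (
            f"libnm version {libnm_version} mismatches "
            f"NetworkManager version {nm_version}"
        )
    return None


# Versions taken from the log in this report.
print(mismatch_warning("1.36.0", "1.32.10"))
```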

Comment 3 Ben Nemec 2022-04-27 14:14:58 UTC
While we think the underlying cause here is https://bugzilla.redhat.com/show_bug.cgi?id=2078940, we want to use this bug to investigate the version mismatch between the node and the container.

Comment 6 Aleksandra Malykhin 2022-05-03 12:11:53 UTC
Reproduced on the standalone operator.

OCP version 4.11.0-0.nightly-2022-04-26-181148
knmstate version kubernetes-nmstate-operator.4.11.0-202205020057

The same warning is also logged:
WARNING  libnm version 1.36.0 mismatches NetworkManager version 1.32.10

Comment 7 Lalatendu Mohanty 2022-05-11 18:59:35 UTC
This bug is marked as a regression. Does it impact OCP 4.10.z or any previous versions?

Comment 10 Aleksandra Malykhin 2022-05-17 14:14:59 UTC
Verified on kubernetes-nmstate-operator.4.11.0-202205171127

Applied the NNCP from the description:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: upgrade-br-marker
spec:
  desiredState:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port: []
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: false
      name: upg-br-mark
      state: up
      type: linux-bridge
  nodeSelector:
    kubernetes.io/hostname: worker-0-0
[kni@provisionhost-0-0 ocp-edge-auto_ocp-edge-cluster-0]$ oc get nncp -w
NAME                STATUS        REASON
upgrade-br-marker   Progressing   ConfigurationProgressing
upgrade-br-marker   Progressing   ConfigurationProgressing
upgrade-br-marker   Available     SuccessfullyConfigured

[kni@provisionhost-0-0 ocp-edge-auto_ocp-edge-cluster-0]$ oc get nns worker-0-0 -o yaml

....
      mac-address: CA:01:82:5F:DE:11
      mtu: 1500
      name: upg-br-mark
      state: up
      type: linux-bridge
...

Comment 14 Christoph Stäbler 2022-05-25 09:06:16 UTC
*** Bug 2089326 has been marked as a duplicate of this bug. ***

Comment 15 Aleksandra Malykhin 2022-05-30 17:49:53 UTC
Verified with Kubernetes NMState Operator   4.11.0-202205250927

Comment 17 errata-xmlrpc 2022-08-10 11:08:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

