Bug 2000052 - NNCP creation failures after nmstate-handler pod deletion
Summary: NNCP creation failures after nmstate-handler pod deletion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Radim Hrazdil
QA Contact: awax
URL:
Whiteboard:
Depends On:
Blocks: 2001796 2001901 2004527
Reported: 2021-09-01 09:36 UTC by Geetika Kapoor
Modified: 2021-11-02 16:01 UTC

Fixed In Version: kubernetes-nmstate-handler-container-v4.9.0-22
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2001796
Environment:
Last Closed: 2021-11-02 16:01:09 UTC
Target Upstream Version:
Embargoed:


Links
Github: nmstate/kubernetes-nmstate pull 793 (last updated 2021-09-06 14:41:32 UTC)
Red Hat Product Errata: RHSA-2021:4104 (last updated 2021-11-02 16:01:30 UTC)

Description Geetika Kapoor 2021-09-01 09:36:50 UTC
Description of problem:

NNCP creation fails after the nmstate-handler pod is deleted.
If the nmstate-handler pod is deleted, the NNCP does not get configured, apparently because the NNS parameters have changed and the current state no longer matches the desired state.
Before killing the pods, ping works as expected.


Version-Release number of selected component (if applicable):


$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.0   OpenShift Virtualization   4.9.0     kubevirt-hyperconverged-operator.v4.8.0   Succeeded

How reproducible:

Regression. Always reproducible; works in 4.8.

Steps to Reproduce:
1. Run automation tests/network/nmstate/test_nmstate_sanity.py::TestNmstatePodDeletion
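
A manual equivalent of the automated test, as a rough sketch; the policy name, namespace, and handler pod label below are assumptions, not taken from this report:

# Sketch of a manual reproduction. The namespace (openshift-cnv), the handler
# pod label (component=kubernetes-nmstate-handler) and the policy name
# (br1test-policy) are assumptions; adjust to the environment.

# Apply a linux-bridge NNCP (see the manifest sketch under Additional info)
# and wait for it to become Available.
$ oc wait nncp br1test-policy --for=condition=Available --timeout=120s

# Delete the nmstate-handler pods so the DaemonSet recreates them.
$ oc delete pod -n openshift-cnv -l component=kubernetes-nmstate-handler

# After the handler pods are back, check the policy enactments.
$ oc get nnce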

Actual results:

The NNCP is not applied. The reported error is that the desired state does not match the current state.

Expected results:

The NNCP should be applied successfully and the node network configuration should match the desired state, even after the nmstate-handler pod is deleted and recreated.

Additional info:


desired network state {'bridge': {'options': {'stp': {'enabled': False, 'priority': 32768, 'forward-delay': 15, 'hello-time': 2, 'max-age': 20}, 'mac-ageing-time': 300, 'group-forward-mask': 0, 'multicast-snooping': True, 'hello-timer': 0, 'gc-timer': 24928, 'multicast-router': 1, 'group-addr': '01:80:C2:00:00:00', 'hash-max': 4096, 'multicast-last-member-count': 2, 'multicast-last-member-interval': 100, 'multicast-querier': False, 'multicast-querier-interval': 25500, 'multicast-query-use-ifaddr': False, 'multicast-query-interval': 12500, 'multicast-query-response-interval': 1000, 'multicast-startup-query-count': 2, 'multicast-startup-query-interval': 3125}, 'port': [{'name': 'ens10', 'stp-hairpin-mode': False, 'stp-path-cost': 100, 'stp-priority': 32, 'vlan': {'mode': 'trunk', 'trunk-tags': [{'id-range': {'min': 2, 'max': 4094}}], 'enable-native': False}}]}, 'ipv4': {'dhcp': False, 'enabled': False, 'address': []}, 'ipv6': {'enabled': False}, 'name': 'br1test', 'state': 'up', 'type': 'linux-bridge', 'lldp': {'enabled': False}, 'mac-address': 'FA:16:3E:A0:FF:B0', 'mtu': 1450}
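
For reference, a minimal NNCP that would yield roughly the desired state above might look like the sketch below. The interface and port names (br1test, ens10) come from the dump; the policy name is hypothetical, and most of the bridge options in the dump appear to be defaults filled in by nmstate.

$ cat <<'EOF' | oc apply -f -
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1test-policy        # hypothetical name
spec:
  desiredState:
    interfaces:
    - name: br1test
      type: linux-bridge
      state: up
      ipv4:
        enabled: false
      ipv6:
        enabled: false
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens10
EOF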


nns

[cnv-qe-jenkins@n-myakove-49-52wq5-executor logs]$ oc get nns n-myakove-49-52wq5-worker-0-7gljg -o yaml
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkState
metadata:
  creationTimestamp: "2021-08-31T08:54:17Z"
  generation: 1
  labels:
    app.kubernetes.io/component: network
    app.kubernetes.io/managed-by: hco-operator
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: v4.9.0
    nmstate.io/force-nns-refresh: "1630481891198597603"
  name: n-myakove-49-52wq5-worker-0-7gljg
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: n-myakove-49-52wq5-worker-0-7gljg
    uid: a1a0a0c4-98f0-4872-a152-7bf81adaf9b7
  resourceVersion: "1567848"
  uid: d295938d-eb99-4086-bc4c-be15230fd8d9
status:
  currentState:
    interfaces:
    - ipv4:
        address: []
        enabled: false
      ipv6:
        address: []
        enabled: false
      mac-address: FA:BE:A6:0E:C4:47
      mtu: 1400
      name: br0
      state: down
      type: ovs-interface
    - bridge:
        options:
          group-addr: 01:80:C2:00:00:00
          group-forward-mask: 0
          hash-max: 4096
          mac-ageing-time: 300
          multicast-last-member-count: 2
          multicast-last-member-interval: 100
          multicast-querier: false
          multicast-querier-interval: 25500
          multicast-query-interval: 12500
          multicast-query-response-interval: 1000
          multicast-query-use-ifaddr: false
          multicast-router: 1
          multicast-snooping: true
          multicast-startup-query-count: 2
          multicast-startup-query-interval: 3125
          stp:
            enabled: false
            forward-delay: 15
            hello-time: 2
            max-age: 20
            priority: 32768
        port:
        - name: ens10
          stp-hairpin-mode: false
          stp-path-cost: 100
          stp-priority: 32
      ipv4:
        address: []
        dhcp: false
        enabled: false
      ipv6:
        address: []
        autoconf: false
        dhcp: false
        enabled: false
      lldp:
        enabled: false
      mac-address: FA:16:3E:A0:FF:B0
      mtu: 1450
      name: br1test
      state: up
      type: linux-bridge
    - ipv4:
        address: []
        dhcp: false
        enabled: false
      ipv6:
        address: []
        autoconf: false
        dhcp: false
        enabled: false
      lldp:
        enabled: false
      mac-address: FA:16:3E:A0:FF:B0
      mtu: 1450
      name: ens10
      state: up
      type: ethernet
    - ipv4:
        address:
        - ip: 192.168.0.127
          prefix-length: 18
        - ip: 192.168.0.7
          prefix-length: 32
        auto-dns: true
        auto-gateway: true
        auto-route-table-id: 0
        auto-routes: true
        dhcp: true
        enabled: true
      ipv6:
        address:
        - ip: fe80::9543:79f0:befe:3367
          prefix-length: 64
        auto-dns: true
        auto-gateway: true
        auto-route-table-id: 0
        auto-routes: true
        autoconf: true
        dhcp: true
        enabled: true
      lldp:
        enabled: false
      mac-address: FA:16:3E:EE:BB:D9
      mtu: 1450
      name: ens3
      state: up
      type: ethernet
    - ipv4:
        address: []
        dhcp: false
        enabled: false
      ipv6:
        address: []
        autoconf: false
        dhcp: false
        enabled: false
      lldp:
        enabled: false
      mac-address: FA:16:3E:7E:E6:B5
      mtu: 1400
      name: ens8
      state: up
      type: ethernet
    - ipv4:
        address: []
        enabled: false
      ipv6:
        address: []
        enabled: false
      lldp:
        enabled: false
      mac-address: FA:16:3E:15:D0:56
      mtu: 1450
      name: ens9
      state: down
      type: ethernet
    - ipv4:
        address:
        - ip: 127.0.0.1
          prefix-length: 8
        enabled: true
      ipv6:
        address:
        - ip: ::1
          prefix-length: 128
        enabled: true
      mac-address: "00:00:00:00:00:00"
      mtu: 65536
      name: lo
      state: up
      type: unknown
    - ipv4:
        address: []
        enabled: false
      ipv6:
        address: []
        enabled: false
      mac-address: 6E:69:21:4A:6E:D6
      mtu: 1500
      name: ovs-system
      state: down
      type: ovs-interface
    - ipv4:
        address:
        - ip: 10.129.2.1
          prefix-length: 23
        enabled: true
      ipv6:
        address:
        - ip: fe80::e08b:7fff:fe3d:1afb
          prefix-length: 64
        enabled: true
      mac-address: E2:8B:7F:3D:1A:FB
      mtu: 1400
      name: tun0
      state: up
      type: ovs-interface
    - ipv4:
        address: []
        enabled: false
      ipv6:
        address:
        - ip: fe80::4420:d1ff:fee4:797f
          prefix-length: 64
        enabled: true
      lldp:
        enabled: false
      mac-address: 46:20:D1:E4:79:7F
      mtu: 65000
      name: vxlan_sys_4789
      state: down
      type: vxlan
      vxlan:
        base-iface: ""
        destination-port: 4789
        id: 0
        remote: ""
    routes:
      config:
      - destination: 10.128.0.0/14
        metric: 0
        next-hop-address: 0.0.0.0
        next-hop-interface: tun0
        table-id: 254
      - destination: 172.30.0.0/16
        metric: 0
        next-hop-address: 0.0.0.0
        next-hop-interface: tun0
        table-id: 254
      running:
      - destination: fe80::/64
        metric: 101
        next-hop-address: '::'
        next-hop-interface: ens3
        table-id: 254
      - destination: fe80::/64
        metric: 256
        next-hop-address: '::'
        next-hop-interface: vxlan_sys_4789
        table-id: 254
      - destination: fe80::/64
        metric: 256
        next-hop-address: '::'
        next-hop-interface: tun0
        table-id: 254
      - destination: 0.0.0.0/0
        metric: 101
        next-hop-address: 192.168.0.1
        next-hop-interface: ens3
        table-id: 254
      - destination: 10.128.0.0/14
        metric: 0
        next-hop-address: 0.0.0.0
        next-hop-interface: tun0
        table-id: 254
      - destination: 169.254.169.254/32
        metric: 101
        next-hop-address: 192.168.0.10
        next-hop-interface: ens3
        table-id: 254
      - destination: 172.30.0.0/16
        metric: 0
        next-hop-address: 0.0.0.0
        next-hop-interface: tun0
        table-id: 254
      - destination: 192.168.0.0/18
        metric: 101
        next-hop-address: 0.0.0.0
        next-hop-interface: ens3
        table-id: 254
  lastSuccessfulUpdateTime: "2021-09-01T09:05:48Z"

Comment 3 Radim Hrazdil 2021-09-02 14:48:32 UTC
Hello Geetika, I'm having trouble reproducing this. Could you please also collect the following:

Before applying the NNCP:
oc get nns <node> -o yaml

After applying the NNCP and before killing the handler pod:
oc get nns <node> -o yaml
oc get nnce <policy_name>.<node> -o yaml

After killing the handler pod:
oc get nns <node> -o yaml
oc get nnce <policy_name>.<node> -o yaml
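
If it helps, the failure message on the enactment can be pulled directly with something like the sketch below (assuming a "Failing" condition type is present on the enactment):

# Print only the failure message from the enactment's conditions (sketch).
$ oc get nnce <policy_name>.<node> -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'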

Comment 5 Radim Hrazdil 2021-09-06 10:17:01 UTC
Thank you Geetika!

I have managed to reproduce the issue upstream; the problem seems to be caused by the vlan-filtering script that configures VLAN trunking on linux-bridge ports.
It is odd that the issue has arisen only now, since kubernetes-nmstate has worked this way for a long time.

My assumption is that it is caused by a change in the nmstate verification logic. This is still being investigated.

The issue should be resolved once the d/s release contains https://github.com/nmstate/kubernetes-nmstate/pull/793
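
One way to see the mismatch in the dumps above: the desired state's br1test port entry carries a vlan trunk section, while the port entry reported by the NNS does not. A sketch for pulling the reported port list directly (node name as in the NNS dump in this report):

# Show the bridge port list currently reported for br1test (sketch).
$ oc get nns n-myakove-49-52wq5-worker-0-7gljg -o jsonpath='{.status.currentState.interfaces[?(@.name=="br1test")].bridge.port}'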

Comment 6 Radim Hrazdil 2021-09-06 14:41:33 UTC
I have verified that nmstate has changed this behaviour between 0.3.4 and 1.0.2-14.

The PR linked above should indeed fix the issue.

Comment 7 awax 2021-09-11 19:02:50 UTC
Tested with these version/release numbers:
CNV - 4.9.0
OCP - 4.9.0-rc.0
Kubernetes Version: v1.22.0-rc.0+75ee307

The bug is fixed.

Comment 10 errata-xmlrpc 2021-11-02 16:01:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104

