Bug 2130604
| Summary: | Unable to start/stop VM while rebooting the node where kubemacpool-mac-controller-manager pod is running | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Adolfo Aguirrezabal <aaguirre> |
| Component: | Networking | Assignee: | Ram Lavi <ralavi> |
| Status: | CLOSED ERRATA | QA Contact: | awax |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.11.0 | CC: | awax, blevin, ellorent, nrozen, phoracek, ycui |
| Target Milestone: | --- | | |
| Target Release: | 4.13.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | v4.13.3.rhel9-34 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-08-16 14:09:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Adolfo Aguirrezabal
2022-09-28 15:20:31 UTC
We probably don't want to go back to an active-backup architecture, but we should investigate whether there is something that could reduce the downtime. Perhaps a health check that would spawn a new KMP instance in case the old one is not responding.

What happens today if we restart the node where virt-operator is running? Do we suffer the same issue?

My mistake, I meant virt-controller, but those run on workers.

First we have to check whether the environment has multiple masters (I think that is an OpenShift requirement for a proper environment); then we can improve the situation with probes, so that the downtime becomes the pod start time instead of the node reboot time.

Note that if you reboot the node gracefully, the kubemacpool pod should not experience the downtime you mention. The scenario is relevant when you reboot the node ungracefully (for example, a node crash). For more information on graceful reboot, please see https://docs.openshift.com/container-platform/4.11/nodes/nodes/nodes-nodes-rebooting.html#nodes-nodes-rebooting-gracefully_nodes-nodes-rebooting.

In order to fix the issue for cases where the node restarts ungracefully, we need to add a toleration to the KMP-manager deployment pod. I also explored adding a liveness probe, but that doesn't work: the kubelet (the agent that probes the pod to determine whether it is alive) dies with the node, which renders the probe useless in this case. I have set it so that if the node is down for more than 1 minute, the KMP pod will be evicted to another node if one is available. PR will arrive soon.

The fix was merged on KMP. A KMP release and pinning it in CNAO are next.

Hey Anat, please provide an explanation as to why this BZ is not fixed.

Hi Ram, as you can see in the pod spec, the toleration is still set to 300:
oc get pod -n openshift-cnv kubemacpool-mac-controller-manager-649cbb596c-24qfg -oyaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    description: KubeMacPool manages MAC allocation to Pods and VMs
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.129.0.131/23"],"mac_address":"0a:58:0a:81:00:83","gateway_ips":["10.129.0.1"],"ip_address":"10.129.0.131/23","gateway_ip":"10.129.0.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.129.0.131"
          ],
          "mac": "0a:58:0a:81:00:83",
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.129.0.131"
          ],
          "mac": "0a:58:0a:81:00:83",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2023-03-07T08:48:03Z"
  generateName: kubemacpool-mac-controller-manager-649cbb596c-
...
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: tls-key-pair
    secret:
      defaultMode: 420
      secretName: kubemacpool-service
  - name: kube-api-access-2lqx4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
...
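For reference, the fix described in the comments amounts to shortening the NoExecute tolerations on the manager pod template, along these lines (a minimal sketch assuming a 60-second window to match the "1 minute" mentioned above; the exact values shipped by the KMP and CNAO PRs may differ):

  tolerations:
  # Evict the pod ~60s after the node stops reporting ready,
  # instead of the Kubernetes default of 300s.
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 60
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 60

With the default 300-second toleration, a pod on an unreachable node is evicted only after 5 minutes; lowering tolerationSeconds lets the ReplicaSet reschedule KMP onto a healthy node much sooner after an ungraceful node failure.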
The reason the Kubemacpool fix wasn't enough is that the toleration is overwritten by the CNAO operator when KMP is deployed by it.

Deferring to 4.13.2 to save capacity for verifying urgent 4.13.1 bugs.

For the CNV 4.13 stable branch: https://github.com/kubevirt/cluster-network-addons-operator/pull/1581

Verified the bug on PSI cluster net-bl-4133250 (v4.13.3-250).

Version info:

[blevin@fedora kubeconfigs]$ oc get csv -n openshift-cnv
NAME                                       DISPLAY                    VERSION   REPLACES                                   PHASE
kubevirt-hyperconverged-operator.v4.13.3   OpenShift Virtualization   4.13.3    kubevirt-hyperconverged-operator.v4.13.2   Succeeded

[blevin@fedora kubeconfigs]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.9    True        False         4h32m   Cluster version is 4.13.9

Reproduced by the steps in the description. After rebooting the node, it takes approximately 2 minutes for the user to be able to use virtctl commands on the VM.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.13.3 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:4664
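For anyone re-verifying on a fixed build, one way to inspect the tolerations that CNAO actually deploys is to read them off the deployment's pod template (a sketch; the deployment name is inferred from the pod's generateName shown above, and jsonpath prints the list compactly):

# Show the tolerations CNAO rendered into the KMP manager deployment.
oc get deployment -n openshift-cnv kubemacpool-mac-controller-manager \
  -o jsonpath='{.spec.template.spec.tolerations}'

On a fixed build, the NoExecute entries should show a tolerationSeconds well below the default 300; on the broken build captured above, they still read 300.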