Description of problem:
Mismatch between the virt-handler daemonset state and the actual pod state after making the master nodes schedulable and then unschedulable again.

0) Check the schedulable labels:

$ oc get nodes -l "kubevirt.io/schedulable"
NAME                             STATUS   ROLES    AGE     VERSION
c01-sb48a-mg575-worker-0-hst2w   Ready    worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-qxfr2   Ready    worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-wxp85   Ready    worker   4d23h   v1.21.8+ee73ea2

1) Mark the master nodes as schedulable:

$ oc patch schedulers.config.openshift.io cluster --type='merge' --patch='{"spec": {"mastersSchedulable": true}}'

2) Notice that virt-handler pods are now running on the master nodes as well:

$ oc get ds -n openshift-cnv
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
...
virt-handler   6         6         6       6            6           kubernetes.io/os=linux   4d22h

$ oc get nodes -l "kubevirt.io/schedulable=true"
NAME                             STATUS   ROLES           AGE     VERSION
c01-sb48a-mg575-master-0         Ready    master,worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-master-1         Ready    master,worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-master-2         Ready    master,worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-hst2w   Ready    worker          4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-qxfr2   Ready    worker          4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-wxp85   Ready    worker          4d23h   v1.21.8+ee73ea2

3) Mark the master nodes as unschedulable:

$ oc patch schedulers.config.openshift.io cluster --type='merge' --patch='{"spec": {"mastersSchedulable": false}}'

4) The daemonset is updated accordingly:

$ oc get ds -n openshift-cnv
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
...
virt-handler   3         3         3       3            3           kubernetes.io/os=linux   4d22h

5) Notice that virt-handler pods are STILL running on the master nodes, even though the daemonset reports a CURRENT count of "3".
$ oc get nodes -l "kubevirt.io/schedulable=true"
NAME                             STATUS   ROLES    AGE     VERSION
c01-sb48a-mg575-master-0         Ready    master   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-master-1         Ready    master   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-master-2         Ready    master   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-hst2w   Ready    worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-qxfr2   Ready    worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-wxp85   Ready    worker   4d23h   v1.21.8+ee73ea2

Version-Release number of selected component (if applicable):
CNV-4.8, 4.10

How reproducible:

Steps to Reproduce:
1. Make the master nodes schedulable, then unschedulable again.

Actual results:
The virt-handler daemonset count and the actual number of virt-handler pods do not match.

Expected results:
1) The virt-handler daemonset count and the number of virt-handler pods match.
2) The "kubevirt.io/schedulable=true" label exists only on nodes that VMs can currently be scheduled on.

Additional info:
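The reported mismatch can be checked mechanically by comparing the daemonset's DESIRED count with the number of nodes still labelled `kubevirt.io/schedulable=true`. A minimal sketch, using sample text in the format shown above (on a live cluster the two variables would instead be filled from the `oc` commands quoted in this report):

```shell
#!/bin/sh
# Sketch: detect the daemonset/label mismatch from this BZ.
# Sample data standing in for live output; on a cluster you would use e.g.:
#   ds_line=$(oc get ds virt-handler -n openshift-cnv --no-headers)
#   nodes=$(oc get nodes -l kubevirt.io/schedulable=true --no-headers)
ds_line='virt-handler   3   3   3   3   3   kubernetes.io/os=linux   4d22h'
nodes='c01-sb48a-mg575-master-0         Ready   master   4d23h
c01-sb48a-mg575-master-1         Ready   master   4d23h
c01-sb48a-mg575-master-2         Ready   master   4d23h
c01-sb48a-mg575-worker-0-hst2w   Ready   worker   4d23h
c01-sb48a-mg575-worker-0-qxfr2   Ready   worker   4d23h
c01-sb48a-mg575-worker-0-wxp85   Ready   worker   4d23h'

# DESIRED is the second column of the daemonset line.
desired=$(printf '%s\n' "$ds_line" | awk '{print $2}')
# Count nodes still carrying the schedulable=true label.
labelled=$(printf '%s\n' "$nodes" | grep -c 'Ready')

echo "desired=$desired labelled=$labelled"
[ "$desired" -eq "$labelled" ] || \
    echo "MISMATCH: daemonset wants $desired pods but $labelled nodes carry schedulable=true"
```

With the sample data above this prints the mismatch (3 desired vs. 6 labelled nodes), matching step 5 of the description.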
More logs need to be attached to this bug.
What is the impact of this defect? Is it just an incorrect state in KubeVirt, or do VMs actually run on the "unschedulable" control plane nodes? Is there a side effect of the state difference? E.g., might KubeVirt try to start more VMs than the cluster has resources for, with Kubernetes simply never scheduling them? Something else?
So, the issue here seems to be that when the "worker" role is removed from a node, we basically ignore it, i.e. virt-handler keeps running and the node remains schedulable. We probably do want to keep virt-handler around, especially if there are VMIs running on that node, since we really don't want to orphan those. However, when the worker role is removed, we should set the kubevirt.io/schedulable label to false. That would prevent any new VMI from being created on that node. Would that last part be considered a proper fix to this issue?
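The rule proposed above — keep virt-handler running, but flip the `kubevirt.io/schedulable` label to false once the worker role is gone — can be sketched as a small shell function. This is a hypothetical illustration of the labelling decision only, not the actual virt-handler code:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed labelling rule: a node should carry
# kubevirt.io/schedulable=true only while it still has the "worker" role.
expected_schedulable() {
    roles="$1"   # comma-separated roles, e.g. "master,worker"
    case ",$roles," in
        *,worker,*) echo "true" ;;   # worker role present: VMIs may land here
        *)          echo "false" ;;  # worker role removed: no new VMIs
    esac
}

expected_schedulable "master,worker"   # -> true
expected_schedulable "master"          # -> false
```

Under this rule, making masters unschedulable (which removes the worker role) would flip the label to false without evicting the virt-handler pod, so existing VMIs on the node would not be orphaned.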
Tested again with build OCP-4.12.0-rc.1, CNV-v4.12.0-741; still getting the same result as in comment 12.
I verified on CNV-v4.12.3-79:

I. First scenario: steps provided by Lubo in comment #22.

# Limited virt-handler to run only on one node:
> $ oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged kubevirt.kubevirt.io/jsonpatch='[{"op": "add", "path": "/spec/workloads", "value": {"nodePlacement": {"nodeSelector": {"kubernetes.io/hostname": "virt-den-412-k6ws2-worker-0-vbjgw"}}}}]'

# The virt-handler daemonset state shows that it runs only on one node:
> $ oc get ds -n openshift-cnv | grep virt-handler
> virt-handler   1   1   1   1   1   kubernetes.io/hostname=virt-den-412-k6ws2-worker-0-vbjgw,kubernetes.io/os=linux   4h11m

# Only one virt-handler pod is running:
> $ oc get pod -n openshift-cnv | grep virt-handler
> virt-handler-npg5t   1/1   Running   0   8m53s

# Nodes without a virt-handler pod are marked schedulable=false:
> $ for i in $(oc get node -o name); do echo $i; oc describe $i | grep kubevirt.io/schedulable; done
> node/virt-den-412-k6ws2-worker-0-hvrxl
> kubevirt.io/schedulable=false
> node/virt-den-412-k6ws2-worker-0-rc5ks
> kubevirt.io/schedulable=false
> node/virt-den-412-k6ws2-worker-0-vbjgw
> kubevirt.io/schedulable=true

II. Second scenario: based on the original steps from the BZ description.

# Mark the master nodes as schedulable:
> $ oc patch schedulers.config.openshift.io cluster --type='merge' --patch='{"spec": {"mastersSchedulable": true}}'

# virt-handler pods are now running on the master nodes as well:
> $ oc get pod -n openshift-cnv | grep virt-handler
> virt-handler-7p2qj   1/1   Running   0   95s
> virt-handler-ftpzp   1/1   Running   0   95s
> virt-handler-kf2c5   1/1   Running   0   52s
> virt-handler-q8hxp   1/1   Running   0   95s
> virt-handler-t7g4q   1/1   Running   0   103s
> virt-handler-tt4z8   1/1   Running   0   103s

# The daemonset also shows 6 pods:
> $ oc get ds -n openshift-cnv | grep virt-handler
> virt-handler   6   6   6   6   6   kubernetes.io/os=linux   4h29m

# And I'm able to create a VM on a master node (using a node selector):
> $ oc get vmi
> NAME                AGE   PHASE     IP             NODENAME                      READY
> vm-fedora-node1-1   53s   Running   10.129.0.176   virt-den-412-k6ws2-master-2   True

# Mark the master nodes as non-schedulable:
> $ oc patch schedulers.config.openshift.io cluster --type='merge' --patch='{"spec": {"mastersSchedulable": false}}'

# The daemonset shows only 3 pods:
> $ oc get ds -n openshift-cnv | grep virt-handler
> virt-handler   3   3   3   3   3   kubernetes.io/os=linux   4h32m

# However, all 6 virt-handler pods are still running (as I understand, this is *expected behavior*, because there could be VMs running on the master nodes):
> $ oc get pod -n openshift-cnv | grep virt-handler
> virt-handler-7p2qj   1/1   Running   0   6m34s
> virt-handler-ftpzp   1/1   Running   0   6m34s
> virt-handler-kf2c5   1/1   Running   0   5m51s
> virt-handler-q8hxp   1/1   Running   0   6m34s
> virt-handler-t7g4q   1/1   Running   0   6m42s
> virt-handler-tt4z8   1/1   Running   0   6m42s

# And my VM is still active on the master node:
> $ oc get vmi
> NAME                AGE     PHASE     IP             NODENAME                      READY
> vm-fedora-node1-1   5m31s   Running   10.129.0.176   virt-den-412-k6ws2-master-2   True

# When I tried to restart the VM, it got stuck in the `ErrorUnschedulable` state:
> NAME                AGE     STATUS               READY
> vm-fedora-node1-1   9m33s   ErrorUnschedulable   False
> Warning  FailedScheduling  109s  default-scheduler  0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
# ^^ this also seems expected.

# Manually removed virt-handler from one of the master nodes; after some time the `schedulable` label on that node switched to `false`:
> $ oc describe node/virt-den-412-k6ws2-master-1 | grep kubevirt.io/schedulable
> kubevirt.io/schedulable=false
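The behavior verified in the first scenario — only nodes that actually host a virt-handler pod keep `kubevirt.io/schedulable=true` — can be cross-checked with a short script. This is a sketch over sample data from the comments above; on a live cluster the two lists would come from `oc` (the awk column for the pod's node in `oc get pods -o wide` output is an assumption):

```shell
#!/bin/sh
# Sketch: every node labelled kubevirt.io/schedulable=true should also
# host a virt-handler pod. Sample data stands in for live output; on a
# cluster, roughly:
#   labelled_nodes=$(oc get nodes -l kubevirt.io/schedulable=true -o name | cut -d/ -f2)
#   handler_nodes=$(oc get pods -n openshift-cnv -o wide | awk '/virt-handler/ {print $7}')
labelled_nodes='virt-den-412-k6ws2-worker-0-vbjgw'
handler_nodes='virt-den-412-k6ws2-worker-0-vbjgw'

status=ok
for n in $labelled_nodes; do
    case " $handler_nodes " in
        *" $n "*) ;;                                  # node hosts a virt-handler pod, fine
        *) status=stale; echo "stale label on $n" ;;  # label left behind without a pod
    esac
done
echo "check: $status"
```

With the sample data this prints `check: ok`; before the fix, running it against the cluster state from the BZ description would have flagged the three master nodes as stale.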
LGTM. One more thing:

> # When I tried to restart the VM, it got stuck in the `ErrorUnschedulable` state

Here you could use (anti-)affinity and you would be able to exercise this part as well. Not required.
Thanks Lubo! Moving this BZ to Verified state
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days