Bug 2070033 - mismatch of virt-handler daemonset state and pods state after making master schedulable and unschedulable
Summary: mismatch of virt-handler daemonset state and pods state after making master schedulable and unschedulable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.14.0
Assignee: Prita Narayan
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-30 10:45 UTC by Kedar Bidarkar
Modified: 2024-03-08 04:25 UTC
CC List: 10 users

Fixed In Version: hco-bundle-registry-container-v4.12.0-736
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 14:05:03 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 8238 0 None open Patching node set kubevirt.io/schedulable to false upon exiting heartbeat 2022-08-04 08:56:14 UTC
Github kubevirt kubevirt pull 8790 0 None open [release-0.58] Patching node set kubevirt.io/schedulable to false upon exiting heartbeat 2022-11-18 08:55:34 UTC
Red Hat Issue Tracker CNV-17317 0 None None None 2022-10-27 09:48:43 UTC
Red Hat Product Errata RHSA-2023:6817 0 None None None 2023-11-08 14:05:28 UTC

Description Kedar Bidarkar 2022-03-30 10:45:33 UTC
Description of problem:

Mismatch between the virt-handler daemonset state and the actual pod state after making the master nodes schedulable and then unschedulable again.

0) Check the Schedulable labels.
oc get nodes -l "kubevirt.io/schedulable"
NAME                             STATUS   ROLES    AGE     VERSION
c01-sb48a-mg575-worker-0-hst2w   Ready    worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-qxfr2   Ready    worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-wxp85   Ready    worker   4d23h   v1.21.8+ee73ea2

1) Mark the master nodes as Schedulable == "true"

 ]$ oc patch schedulers.config.openshift.io cluster --type='merge' --patch='{"spec": {"mastersSchedulable": true}}'
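
A quick way to confirm the scheduler change took effect (a sketch, not part of the original report; it queries the same cluster-scoped scheduler config object patched above):

$ oc get schedulers.config.openshift.io cluster -o jsonpath='{.spec.mastersSchedulable}{"\n"}'

This should print "true" after the patch above.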

2) Notice that the virt-handler pods are now running on the master nodes as well.
$ oc get ds -n openshift-cnv
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
...
virt-handler                   6         6         6       6            6           kubernetes.io/os=linux          4d22h

oc get nodes -l "kubevirt.io/schedulable=true"
NAME                             STATUS   ROLES           AGE     VERSION
c01-sb48a-mg575-master-0         Ready    master,worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-master-1         Ready    master,worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-master-2         Ready    master,worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-hst2w   Ready    worker          4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-qxfr2   Ready    worker          4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-wxp85   Ready    worker          4d23h   v1.21.8+ee73ea2

3) Mark the master nodes as Schedulable == "false"
]$ oc patch schedulers.config.openshift.io cluster --type='merge' --patch='{"spec": {"mastersSchedulable": false}}'

4) The daemonset has been updated.
$ oc get ds -n openshift-cnv
NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
...
virt-handler                   3         3         3       3            3           kubernetes.io/os=linux          4d22h


5) Notice that the virt-handler pods are STILL running on the master nodes,
even though the virt-handler daemonset reports a CURRENT count of 3 (see the count comparison after the node listing below).

$ oc get nodes -l "kubevirt.io/schedulable=true"
NAME                             STATUS   ROLES    AGE     VERSION
c01-sb48a-mg575-master-0         Ready    master   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-master-1         Ready    master   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-master-2         Ready    master   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-hst2w   Ready    worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-qxfr2   Ready    worker   4d23h   v1.21.8+ee73ea2
c01-sb48a-mg575-worker-0-wxp85   Ready    worker   4d23h   v1.21.8+ee73ea2
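
One way to see the mismatch directly is to compare the daemonset status counters with the number of live pods (a sketch; it assumes the virt-handler pods carry the kubevirt.io=virt-handler label, which is not shown in the output above):

$ oc get ds virt-handler -n openshift-cnv -o jsonpath='{.status.desiredNumberScheduled}/{.status.currentNumberScheduled}/{.status.numberReady}{"\n"}'
$ oc get pods -n openshift-cnv -l kubevirt.io=virt-handler --no-headers | wc -l

With the state described above, the first command would report 3/3/3 while the second would still count 6 pods.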



Version-Release number of selected component (if applicable):
CNV 4.8, 4.10

How reproducible:


Steps to Reproduce:
1. Make the master nodes schedulable, then mark them unschedulable again.

Actual results:
The virt-handler daemonset counts and the actual number of running virt-handler pods do not match.

Expected results:
1) The virt-handler daemonset counts and the number of virt-handler pods match.
2) The "kubevirt.io/schedulable=true" label exists only on nodes that VMs can currently be scheduled on (see the check sketched below).
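
A check like the following (a sketch, not part of the original report) selects nodes that carry both the kubevirt.io/schedulable=true label and a master role label, and should therefore come back empty once the masters are unschedulable again:

$ oc get nodes -l 'kubevirt.io/schedulable=true,node-role.kubernetes.io/master'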

Additional info:

Comment 1 Kedar Bidarkar 2022-04-06 12:15:26 UTC
Need to update more logs for this bug.

Comment 5 Peter Lauterbach 2022-04-20 12:49:49 UTC
What is the impact of this defect?  Is it just an incorrect state in kubevirt, or do VMs actually run on the "unschedulable" control plane nodes?
Is there a side effect of the state differences? I.e., kubevirt may try to start more VMs than the cluster has resources for, and k8s will just not schedule them?
Something else?

Comment 6 Jed Lejosne 2022-06-09 12:31:48 UTC
So, the issue here seems to be that when the "worker" role is removed from a node, we basically ignore it, i.e. virt-handler keeps running and the node remains schedulable.
We probably do want to keep virt-handler around, especially if there are VMIs running on that node, since we really don't want to orphan those.
However, when the worker role is removed, we should set the kubevirt.io/schedulable label to false. That would prevent any new VMI from being created on that node.
Would that last part be considered a proper fix to this issue?
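
For reference, the fix tracked in PR 8238 above makes virt-handler patch the label itself when its heartbeat stops. Until then, a manual workaround would be to relabel the node by hand, with the caveat that a still-running virt-handler heartbeat may set it back to true (a sketch; <master-node> is a placeholder):

$ oc label node <master-node> kubevirt.io/schedulable=false --overwrite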

Comment 14 Kedar Bidarkar 2022-11-30 09:54:47 UTC
Tested again with build:
OCP-4.12.0-rc.1
CNV-v4.12.0-741
Still getting the same result as comment 12.

Comment 23 Denys Shchedrivyi 2023-05-29 23:28:19 UTC
I verified on CNV-v4.12.3-79:

I. First scenario: steps provided by Lubo in comment #22:

# limited the virt-handler to work only on one node
> $ oc annotate --overwrite -n openshift-cnv hco kubevirt-hyperconverged kubevirt.kubevirt.io/jsonpatch='[{"op": "add", "path": "/spec/workloads", "value": {"nodePlacement": {"nodeSelector": {"kubernetes.io/hostname": "virt-den-412-k6ws2-worker-0-vbjgw"}}}}]'

# virt-handler ds state shows that it works only on one node
> $ oc get ds -n openshift-cnv | grep virt-handler
> virt-handler                   1         1         1       1            1           kubernetes.io/hostname=virt-den-412-k6ws2-worker-0-vbjgw,kubernetes.io/os=linux   4h11m

# only one virt-handler pod is running:
> $ oc get pod -n openshift-cnv | grep virt-handler
> virt-handler-npg5t                                                1/1     Running   0               8m53s

# Nodes without a virt-handler pod are marked as schedulable=false:
> $ for i in $(oc get node -o name);do echo $i;oc describe $i | grep kubevirt.io/schedulable; done
> node/virt-den-412-k6ws2-worker-0-hvrxl
>                    kubevirt.io/schedulable=false
> node/virt-den-412-k6ws2-worker-0-rc5ks
>                    kubevirt.io/schedulable=false
> node/virt-den-412-k6ws2-worker-0-vbjgw
>                    kubevirt.io/schedulable=true
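
Note: to undo the node-placement override used at the start of this scenario, one option (assuming no other jsonpatch overrides are set on the HCO resource) would be to remove the annotation again, e.g.:

$ oc annotate -n openshift-cnv hco kubevirt-hyperconverged kubevirt.kubevirt.io/jsonpatch-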



II. Second scenario: based on original steps from the BZ description
 
# Mark the master nodes as Schedulable
> $ oc patch schedulers.config.openshift.io cluster --type='merge' --patch='{"spec": {"mastersSchedulable": true}}'

# virt-handler pods are now running on the Master nodes as well.
> $ oc get pod -n openshift-cnv | grep virt-handler
> virt-handler-7p2qj                                                1/1     Running   0               95s
> virt-handler-ftpzp                                                1/1     Running   0               95s
> virt-handler-kf2c5                                                1/1     Running   0               52s
> virt-handler-q8hxp                                                1/1     Running   0               95s
> virt-handler-t7g4q                                                1/1     Running   0               103s
> virt-handler-tt4z8                                                1/1     Running   0               103s

# daemonset also shows 6 pods:
> $ oc get ds -n openshift-cnv | grep virt-handler
> virt-handler                   6         6         6       6            6           kubernetes.io/os=linux          4h29m

# and I'm able to create a VM on the master node (using a node selector)
> $ oc get vmi
> NAME                AGE   PHASE     IP             NODENAME                      READY
> vm-fedora-node1-1   53s   Running   10.129.0.176   virt-den-412-k6ws2-master-2   True

# Mark master nodes as non-schedulable
> $ oc patch schedulers.config.openshift.io cluster --type='merge' --patch='{"spec": {"mastersSchedulable": false}}'

# daemonset shows only 3 pods:
> $ oc get ds -n openshift-cnv | grep virt-handler
> virt-handler                   3         3         3       3            3           kubernetes.io/os=linux          4h32m

# however, all 6 virt-handler pods are still running (as I understand it, this is *expected behavior* because there could be VMs running on the master nodes)
> $ oc get pod -n openshift-cnv | grep virt-handler
> virt-handler-7p2qj                                                1/1     Running   0               6m34s
> virt-handler-ftpzp                                                1/1     Running   0               6m34s
> virt-handler-kf2c5                                                1/1     Running   0               5m51s
> virt-handler-q8hxp                                                1/1     Running   0               6m34s
> virt-handler-t7g4q                                                1/1     Running   0               6m42s
> virt-handler-tt4z8                                                1/1     Running   0               6m42s
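
(A wide listing would confirm which nodes those six pods are on; this again assumes the kubevirt.io=virt-handler pod label:)

$ oc get pods -n openshift-cnv -l kubevirt.io=virt-handler -o wide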

# and my VM is still active on the master node:
> $ oc get vmi
> NAME                AGE     PHASE     IP             NODENAME                      READY
> vm-fedora-node1-1   5m31s   Running   10.129.0.176   virt-den-412-k6ws2-master-2   True

# When I tried to restart the VM, it got stuck in the `ErrorUnschedulable` state
> NAME                AGE     STATUS               READY
> vm-fedora-node1-1   9m33s   ErrorUnschedulable   False

>   Warning  FailedScheduling  109s  default-scheduler  0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
# ^^ this also seems expected

# Manually removed virt-handler from one of the master nodes (a possible command for this is sketched after this comment); after some time the `schedulable` label on this node switched to `false`
> $ oc describe node/virt-den-412-k6ws2-master-1 | grep kubevirt.io/schedulable
>                   kubevirt.io/schedulable=false
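
(The comment above does not say how the pod was removed; one hypothetical way to remove the virt-handler pod from a specific node, using the same pod-label assumption as before, would be:)

$ oc delete pod -n openshift-cnv -l kubevirt.io=virt-handler --field-selector spec.nodeName=virt-den-412-k6ws2-master-1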

Comment 25 lpivarc 2023-06-06 12:24:36 UTC
LGTM. 
One more thing:

# When I tried to restart the VM, it got stuck in the `ErrorUnschedulable` state
Here you could use (anti-)affinity and you would be able to exercise this part as well. Not required.

Comment 26 Denys Shchedrivyi 2023-06-06 14:38:52 UTC
Thanks Lubo! 
Moving this BZ to Verified state

Comment 29 errata-xmlrpc 2023-11-08 14:05:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

Comment 30 Red Hat Bugzilla 2024-03-08 04:25:07 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

