Bug 2219785 - With cluster-level evictionStrategy:LiveMigrate set in HCO CR certain VMs would fail to restart and get stuck during node drain
Summary: With cluster-level evictionStrategy:LiveMigrate set in HCO CR certain VMs wou...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 4.14.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.14.0
Assignee: Simone Tiraboschi
QA Contact: Akriti Gupta
URL:
Whiteboard:
Duplicates: 2219784 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-07-05 10:54 UTC by Akriti Gupta
Modified: 2023-11-08 14:06 UTC
CC List: 8 users

Fixed In Version: hco-bundle-registry-container-v4.14.0.rhel9-1179
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-08 14:05:53 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt hyperconverged-cluster-operator pull 2423 0 None open Add LiveMigrateIfPossible to cluster level EvictionStrategy 2023-07-06 14:27:23 UTC
Red Hat Issue Tracker CNV-30684 0 None None None 2023-07-05 10:57:43 UTC
Red Hat Product Errata RHSA-2023:6817 0 None None None 2023-11-08 14:06:02 UTC

Description Akriti Gupta 2023-07-05 10:54:06 UTC
Description of problem:
A VM with an HPP storage class (non-migratable) fails to restart on another node during a node drain and gets stuck with the error message:
error when evicting pods/"virt-launcher-vm-rhel88-source-hpp-z5lbw" -n "default" (will retry after 5s): admission webhook "virt-launcher-eviction-interceptor.kubevirt.io" denied the request: VMI vm-rhel88-source-hpp is configured with an eviction strategy but is not live-migratable
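For reference, KubeVirt also allows overriding the eviction strategy per VM, which can sidestep the cluster-level default for a known non-migratable VM. A minimal sketch, assuming KubeVirt's per-VM `evictionStrategy` field (verify the field against your KubeVirt version; everything except the VM name from this report is illustrative):

```yaml
# Sketch: per-VM override of the cluster-level eviction strategy,
# so a drain is not blocked by this VM's PDB.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-rhel88-source-hpp   # VM name taken from this report
spec:
  runStrategy: Always
  template:
    spec:
      evictionStrategy: None   # assumed field: do not block eviction for this VM
      domain:
        devices: {}
```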


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a VM with an HPP storage class
2. Drain the node

Actual results:
VM fails to restart on another node

Expected results:


Additional info:

Comment 3 Kedar Bidarkar 2023-07-05 12:24:29 UTC
*** Bug 2219784 has been marked as a duplicate of this bug. ***

Comment 4 Kedar Bidarkar 2023-07-05 13:23:41 UTC
evictionStrategy: LiveMigrateIfPossible is explained in the description of this PR: https://github.com/kubevirt/kubevirt/pull/9798

HCO should adopt evictionStrategy: LiveMigrateIfPossible as the default eviction strategy.

QE will test this first, and then it will move to HCO for updating the default value.

Also, directly setting evictionStrategy: LiveMigrateIfPossible as the default eviction strategy is currently not supported via HCO, even though the KubeVirt CR supports it.

For now, we are using the JSON Patch annotation to update the value in the KubeVirt CR.

---
1) Updating KV CR via HCO CR using JSON Patch.
]$ oc annotate --overwrite -n openshift-cnv hyperconverged kubevirt-hyperconverged kubevirt.kubevirt.io/jsonpatch='[{
      "op": "add",
      "path": "/spec/configuration/evictionStrategy",
      "value": "LiveMigrateIfPossible"
  }]'

2) KV CR
]$ oc get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o yaml  | grep evictionStrategy
    evictionStrategy: LiveMigrateIfPossible
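For clarity, the JSON Patch annotation above should land under `spec.configuration` of the KubeVirt CR; a sketch of the resulting fragment (only the `evictionStrategy` line is confirmed by the grep output above, the rest of the shape is assumed):

```yaml
# Sketch of the KubeVirt CR after the jsonpatch annotation is applied
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt-kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  configuration:
    # value added by the kubevirt.kubevirt.io/jsonpatch annotation
    evictionStrategy: LiveMigrateIfPossible
```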

Comment 5 Kedar Bidarkar 2023-07-06 09:28:40 UTC
Also, directly setting evictionStrategy: LiveMigrateIfPossible as the default eviction strategy is currently not supported via HCO, even though the KubeVirt CR supports it.

We need to update the HCO CR to support "evictionStrategy: LiveMigrateIfPossible" as the default eviction strategy.

Comment 6 Kedar Bidarkar 2023-07-06 09:59:53 UTC
QE has tested with evictionStrategy: LiveMigrateIfPossible, which addresses the problem stated in the description / comment 1:

1) Node drain was successful for a VM with the HPP StorageClass with evictionStrategy: LiveMigrateIfPossible. Because HPP is local storage, the VM stayed in the Scheduling state and got back into the Running state as soon as the node was uncordoned after the drain.
2) A GPU VM restarts on a new node with evictionStrategy: LiveMigrateIfPossible when the node is drained.
3) A normal VM live migrates to a different node with evictionStrategy: LiveMigrateIfPossible when the node is drained.
4) No alerts fire with evictionStrategy: LiveMigrateIfPossible.
5) An alert fires for a non-migratable VM with evictionStrategy: LiveMigrate.

Comment 7 Simone Tiraboschi 2023-07-06 11:15:57 UTC
We have to consider that we have exactly the same problem on two different fronts (with slightly different semantics).

In the HCO CR we have:

/spec/workloadUpdateStrategy/workloadUpdateMethods, where we can set an ordered list (in terms of preference) of methods, choosing between LiveMigrate, Evict, and none.

Then we have the cluster-level eviction strategy (/spec/evictionStrategy), where we can choose between [None; LiveMigrate; External; LiveMigrateIfPossible(*)] (* in the near future, as the next step).

/spec/evictionStrategy influences what happens on node drains (and this affects the last step of OCP upgrades, and MCO actions in general):
- If we have LiveMigrate there and the VM cannot be live migrated, the drain will be blocked by the corresponding PDB.
This means that during an OCP upgrade the final reboot of a node running a non-migratable VM will eventually be blocked, MCO will report it, and the cluster admin will eventually have to explicitly shut down the unmigratable VM in order to continue.
- With LiveMigrateIfPossible, unmigratable VMs will be automatically evicted, favouring platform processes (upgrades/drains...) over workload stability.

/spec/workloadUpdateStrategy/workloadUpdateMethods instead only influences what happens to virt-launcher pods after a CNV upgrade.

When we initially introduced it with https://github.com/kubevirt/hyperconverged-cluster-operator/pull/1348 the default value was {"LiveMigrate", "Evict"},
meaning that VMs should first be live migrated and, if that is not possible, eventually evicted.
That is exactly the behaviour of LiveMigrateIfPossible.
But then we removed it as per https://bugzilla.redhat.com/show_bug.cgi?id=2017394,
and later on, when we finally reintroduced it, we chose only {"LiveMigrate"} as the default, to privilege workload stability over platform upgrades by default.
So we introduced the OutdatedVirtualMachineInstanceWorkloads alert to track that at least one VMI is running with an outdated virt-launcher, and the kubevirt.io/outdatedLauncherImage label to let the cluster admin easily track it.

Now, setting LiveMigrateIfPossible by default on /spec/evictionStrategy and only {"LiveMigrate"} on /spec/workloadUpdateStrategy/workloadUpdateMethods would introduce two different behaviours for OCP upgrades (and node drains / MCO actions in general) versus CNV upgrades, and this can become really confusing in the cluster admin's eyes.

For the sake of consistency I see two options here:

1. Continue privileging workload stability and continuity over platform upgrades, as we did in the past; if not already there, we will probably have to introduce:
- a new alert (with its runbook) on our side, clearly stating that a non-live-migratable VM is preventing the node drain and that it should be shut down (when possible)
- a specific label on the VM to let the cluster admin easily identify the problematic objects

2. Amend /spec/workloadUpdateStrategy/workloadUpdateMethods, setting {"LiveMigrate", "Evict"} as the default there, to get consistent behaviour (but we have to properly explain this in the release notes because it is a relevant change).

Personally I still think that we should aim to guarantee (by default!) workload business continuity and simply let the cluster admin shut down unmigratable VMs only when they think it is a good moment for that.
They can still explicitly choose `{"LiveMigrate", "Evict"}` and `LiveMigrateIfPossible` if they think that platform/product upgrades are more important than workload stability.
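Put together, the two knobs discussed above sit side by side in the HCO CR. A sketch of the "platform upgrades first" combination (field paths are the ones named in this comment; the apiVersion and exact layout are assumptions to be checked against the HCO CRD):

```yaml
# Sketch: HCO CR favouring platform upgrades over workload stability
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  # node drains / OCP upgrades: evict VMs that cannot be live migrated
  evictionStrategy: LiveMigrateIfPossible
  # CNV upgrades: try live migration first, then evict
  workloadUpdateStrategy:
    workloadUpdateMethods:
    - LiveMigrate
    - Evict
```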

Comment 8 Fabian Deutsch 2023-07-07 08:46:48 UTC
Simone, thanks for the ... summary (?) :)

"""
Personally I still think that we should aim to guarantee (by default!) workload business continuity and simply let the cluster admin shut down unmigratable VMs only when they think it is a good moment for that.
"""

tl;dr Yes. And support in HCO to set LiveMigrateIfPossible

More nuanced:
I think we are abusing PDBs in KubeVirt: we don't protect the workload, but we use them as a signaling mechanism for live migrations.
PDBs do not really indicate the criticality of a workload.

Why is this relevant? Because today we have set the expectation that VMs will be protected, and for the mid-term future we must keep this behavior (hence the tl;dr Yes).
If we change it, then we need to do it in a backwards-compatible way, without risking workloads. And the mechanism for protecting workloads could be PDBs. But this means we need to remove our general reliance on PDBs for live-migration signaling.
This is covered in https://issues.redhat.com/browse/CNV-28528

Thus I see the following steps:
1. VMs can block cluster upgrades; admins need to kill VMs if required. Admins must have the ability to change the policy cluster-wide to LiveMigrateIfPossible.
2. Address https://issues.redhat.com/browse/CNV-28528 to not rely on PDBs.
3. After #2, VMs will by default no longer (permanently) block drains; we need to allow using PDBs to protect selected workloads.
4. Find a migration path from today's state (all VMs are protected) to the future state (no VM is protected by default).

Comment 9 Simone Tiraboschi 2023-07-10 08:44:55 UTC
OK,
so now we have LiveMigrateIfPossible as an allowed option while LiveMigrate is still the default.

Moving this to ON_QA.

Comment 10 Akriti Gupta 2023-07-19 09:47:33 UTC
Verified on :  v4.14.0.rhel9-1259

evictionStrategy LiveMigrateIfPossible can be set through HCO

[akriti@fedora ~]$ oc edit hco kubevirt-hyperconverged -n openshift-cnv -o yaml
[akriti@fedora ~]$ oc get hco kubevirt-hyperconverged -n openshift-cnv -o yaml | grep evictionStrategy
  evictionStrategy: LiveMigrateIfPossible

Both HPP and GPU VMs restarted on another node with evictionStrategy: LiveMigrateIfPossible set in HCO.

[akriti@fedora ~]$ oc get vm
NAME                   AGE     STATUS    READY
vm-rhel88-source-hpp   2m45s   Stopped   False
vm1-rhel88-ocs         11m     Stopped   False
[akriti@fedora ~]$ virtctl start vm-rhel88-source-hpp
VM vm-rhel88-source-hpp was scheduled to start
[akriti@fedora ~]$ virtctl start vm1-rhel88-ocs
VM vm1-rhel88-ocs was scheduled to start
[akriti@fedora ~]$ oc get vmi
NAME                   AGE   PHASE     IP             NODENAME                                         READY
vm-fedora-with-pvc     19h   Running   10.130.1.226   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
vm-rhel88-source-hpp   36s   Running   10.129.1.90    cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com   True
vm1-rhel88-ocs         28s   Running   10.129.1.89    cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm-rhel88-source-hpp
Successfully connected to vm-rhel88-source-hpp console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm-rhel88-source-hpp login: cloud-user
Password: 
[cloud-user@vm-rhel88-source-hpp ~]$ [akriti@fedora ~]$ 
[akriti@fedora ~]$ virtctl console vm1-rhel88-ocs
Successfully connected to vm1-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm1-rhel88-ocs login: cloud-user
Password: 
Last login: Wed Jul 19 05:23:21 on ttyS0
[cloud-user@vm1-rhel88-ocs ~]$ [akriti@fedora ~]$ 
[akriti@fedora ~]$ oc adm drain cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com --force=true --ignore-daemonsets=true --delete-emptydir-data=true
node/cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com cordoned
.
.
node/cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com drained
[akriti@fedora ~]$ oc adm uncordon cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com
node/cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com uncordoned
[akriti@fedora ~]$ oc get vmi
NAME                   AGE     PHASE     IP             NODENAME                                         READY
vm-fedora-with-pvc     19h     Running   10.130.1.226   cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
vm-rhel88-source-hpp   7m52s   Running   10.129.1.97    cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com   True
vm1-rhel88-ocs         7m51s   Running   10.130.0.63    cnv-qe-infra-03.cnvqe3.lab.eng.rdu2.redhat.com   True
[akriti@fedora ~]$ virtctl console vm1-rhel88-ocs
Successfully connected to vm1-rhel88-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm1-rhel88-ocs login: cloud-user
Password: 
Last login: Wed Jul 19 05:31:58 on ttyS0
[cloud-user@vm1-rhel88-ocs ~]$ [akriti@fedora ~]$ 
[akriti@fedora ~]$ virtctl console vm-rhel88-source-hpp
Successfully connected to vm-rhel88-source-hpp console. The escape sequence is ^]

Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel 4.18.0-477.17.1.el8_8.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm-rhel88-source-hpp login: cloud-user
Password: 
Last login: Wed Jul 19 05:31:18 on ttyS0
[cloud-user@vm-rhel88-source-hpp ~]$ [akriti@fedora ~]$

Comment 12 errata-xmlrpc 2023-11-08 14:05:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817

