Bug 2073880 - Cannot create VM on SNO cluster as live migration feature is not enabled
Summary: Cannot create VM on SNO cluster as live migration feature is not enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.10.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.1
Assignee: Simone Tiraboschi
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-11 03:40 UTC by Guohua Ouyang
Modified: 2023-05-05 08:19 UTC
CC List: 8 users

Fixed In Version: hco-bundle-registry-container-v4.10.1-95
Doc Type: Known Issue
Doc Text:
Known issue: When a VMI has the spec.evictionStrategy field set to LiveMigrate, the VMI cannot be migrated or evicted, and the upgrade fails.
Cause: By default, VM templates installed by the SSP operator have the spec.evictionStrategy field set to LiveMigrate. The LiveMigration feature gate is enabled even when the cluster, in SNO mode, has only 1 worker node.
Workaround: To upgrade the cluster, apply one of the following two workarounds:
1. Remove the spec.evictionStrategy field from the VM declaration.
2. Manually stop the VM before starting an OpenShift Virtualization upgrade.
Clone Of:
Environment:
Last Closed: 2022-05-18 20:27:35 UTC
Target Upstream Version:
Embargoed:
kmajcher: needinfo+


Attachments
feature gate yaml (13.07 KB, text/plain)
2022-04-11 03:40 UTC, Guohua Ouyang
error_on_wizard (127.09 KB, image/png)
2022-04-11 03:43 UTC, Guohua Ouyang


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt hyperconverged-cluster-operator pull 1908 | 0 | None | Merged | Avoid special handling for liveMigration FG on SNO | 2022-04-28 15:27:26 UTC
Github | kubevirt hyperconverged-cluster-operator pull 1909 | 0 | None | Merged | [release-1.6] Avoid special handling for liveMigration FG on SNO | 2022-04-28 15:27:27 UTC
Github | kubevirt ssp-operator pull 343 | 0 | None | closed | Remove evictionStrategy from VM in SNO | 2022-04-28 15:27:28 UTC
Red Hat Issue Tracker | CNV-17536 | 0 | None | None | None | 2022-10-27 06:36:29 UTC
Red Hat Product Errata | RHSA-2022:4668 | 0 | None | None | None | 2022-05-18 20:28:57 UTC

Description Guohua Ouyang 2022-04-11 03:40:39 UTC
Created attachment 1871696
feature gate yaml

Description of problem:
On an SNO cluster, where the live migration feature is disabled, a VM cannot be created. The following error is thrown:

Error "LiveMigration feature gate is not enabled" for field "spec.template.spec.evictionStrategy".

Version-Release number of selected component (if applicable):
CNV 4.10.1 + OCP 4.10.0-0.nightly-2022-04-09-185509

How reproducible:
100%

Steps to Reproduce:
1. Create a VM on an SNO cluster.

Actual results:
It throws error:
Error "LiveMigration feature gate is not enabled" for field "spec.template.spec.evictionStrategy".

Expected results:
VM is created successfully

Additional info:

Comment 1 Guohua Ouyang 2022-04-11 03:43:41 UTC
Created attachment 1871697
error_on_wizard

Comment 2 Guohua Ouyang 2022-04-11 04:34:41 UTC
I think the better solution for this problem is enabling live migration on SNO instead of fixing the templates.

Comment 3 Simone Tiraboschi 2022-04-11 07:26:37 UTC
HCO started explicitly ignoring the liveMigration feature gate and the live-migration eviction strategy on SNO (with a single node you cannot migrate a VM anywhere!) as a workaround for a virt-controller bug/behavior that kept trying, indefinitely, to migrate a VM on upgrades even with a single node. See: https://bugzilla.redhat.com/show_bug.cgi?id=2065308

Reverting that choice is going to reopen 2065308.

Comment 4 Guohua Ouyang 2022-04-11 09:46:28 UTC
The workloadUpdateStrategy in HCO looks like this:
```
    workloadUpdateStrategy:
      batchEvictionInterval: 1m0s
      batchEvictionSize: 10
      workloadUpdateMethods:
      - LiveMigrate
```

Comment 5 Guohua Ouyang 2022-04-11 10:00:35 UTC
The same error occurs from the CLI:
$ oc process rhel8-server-small -n openshift -p NAME=test -p DATA_SOURCE_NAME=rhel8-7b20b8e2e1c6 -p DATA_SOURCE_NAMESPACE=openshift-virtualization-os-images | oc create -f -
The request is invalid: spec.template.spec.evictionStrategy: LiveMigration feature gate is not enabled

Comment 6 Guohua Ouyang 2022-04-11 10:04:20 UTC
Moving the bug to virt, as it is also reproducible from the CLI.
CNV version: CNV-v4.10.1-60

Comment 7 sgott 2022-04-13 15:22:00 UTC
Raising the priority of this to Urgent as I concur that it is a blocker.

We've explored 3 different options:

1) Remove the evictionStrategy from templates only in SNO mode. The downside is that this requires an API change to the virt-ssp-operator and is only a temporary fix, so it is not ideal.
2) Re-enable the feature gate in HCO. This is not ideal either, because it is highly visible and might have second-order effects. See the discussion in: https://bugzilla.redhat.com/show_bug.cgi?id=2065308
3) For SNO mode only, simply ignore evictionStrategy in virt-api's webhook.

We're planning to go with the third option since it has no real drawbacks as compared to the other strategies.
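
To make option 3 concrete, here is a minimal Python sketch of the admission decision it describes. It is not KubeVirt's actual webhook code: the function name, its parameters, and the constant are hypothetical, and only the error text is taken from the CLI output in Comment 5.

```python
from typing import Optional

# Error text as reported by the CLI in this bug.
ERROR_MESSAGE = (
    "spec.template.spec.evictionStrategy: "
    "LiveMigration feature gate is not enabled"
)


def admit_eviction_strategy(
    eviction_strategy: Optional[str],
    live_migration_gate_enabled: bool,
    single_node: bool,
) -> Optional[str]:
    """Return an error message if the VM should be rejected, else None.

    Hypothetical model of option 3: on a single-node cluster the
    evictionStrategy field is ignored instead of being validated.
    """
    if eviction_strategy != "LiveMigrate":
        # Nothing to validate for other (or unset) strategies.
        return None
    if live_migration_gate_enabled:
        return None
    if single_node:
        # Option 3: ignore the field on SNO rather than rejecting the VM.
        return None
    return ERROR_MESSAGE
```

With the gate disabled, the sketch rejects a LiveMigrate VM on a multi-node cluster but admits it on SNO. It models only the creation check and does nothing about how such a VM is later handled during upgrades.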

Comment 8 sgott 2022-04-19 16:07:52 UTC
Reflecting on Comment #7, there is actually a significant downside to ignoring the evictionStrategy. If that is all that's done, it would block upgrades--which is also a big problem. While that could also be addressed, it's no longer a quick one-line fix.

Removing evictionStrategy from templates is likely the more sensible approach.

FYI Simone, Krzysztof.

Comment 9 Simone Tiraboschi 2022-04-20 08:22:16 UTC
If a new API is made available, HCO can easily communicate the SNO mode to the ssp-operator so that it can dynamically fine-tune the templates.
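
As a rough illustration of what "fine-tuning the templates" could mean, the Python sketch below strips evictionStrategy from the VirtualMachine objects of a template when the cluster is single-node. It assumes the usual OpenShift Template layout (an `objects` list holding the VirtualMachine) and the field path from the error message in this bug. The function name and the `single_node` flag are hypothetical and are not the ssp-operator API.

```python
import copy
from typing import Any, Dict


def tune_template_for_sno(
    template: Dict[str, Any], single_node: bool
) -> Dict[str, Any]:
    """Return a copy of the template, adjusted for single-node clusters.

    On SNO, evictionStrategy is dropped from every VirtualMachine object
    in the template; on other clusters the copy is returned unchanged.
    """
    tuned = copy.deepcopy(template)
    if not single_node:
        return tuned
    for obj in tuned.get("objects", []):
        if obj.get("kind") != "VirtualMachine":
            continue
        # Field path: spec.template.spec.evictionStrategy
        vmi_spec = obj.get("spec", {}).get("template", {}).get("spec", {})
        vmi_spec.pop("evictionStrategy", None)
    return tuned
```

Working on a deep copy keeps the input template untouched, so the same source template can be rendered for both SNO and multi-node clusters.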

Comment 10 Antonio Cardace 2022-04-22 12:20:08 UTC
Patch https://github.com/kubevirt/ssp-operator/pull/343 posted by Federico.

Comment 11 Kedar Bidarkar 2022-05-01 13:37:35 UTC
[kbidarka@localhost auth]$ oc get templates -n openshift windows10-desktop-medium -o yaml | grep evictionStrategy
        evictionStrategy: LiveMigrate
[kbidarka@localhost auth]$ oc get templates -n openshift rhel8-server-medium -o yaml | grep evictionStrategy
        evictionStrategy: LiveMigrate
[kbidarka@localhost auth]$ 
[kbidarka@localhost auth]$ oc get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o yaml | grep LiveMigrate 
    - LiveMigrate
[kbidarka@localhost auth]$ oc get infrastructure.config.openshift.io cluster -o json | jq ".status.infrastructureTopology"
"SingleReplica"
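
The last command above checks status.infrastructureTopology to confirm that the cluster is SNO. A small Python sketch of the same check follows. It assumes the JSON printed by `oc get infrastructure.config.openshift.io cluster -o json` is passed in as a string, and the function name is hypothetical.

```python
import json


def is_single_node(infrastructure_json: str) -> bool:
    """Return True if the Infrastructure object reports SingleReplica."""
    infra = json.loads(infrastructure_json)
    topology = infra.get("status", {}).get("infrastructureTopology")
    return topology == "SingleReplica"
```

A missing status or field is treated as "not single-node", which is an assumption of this sketch and not documented behavior.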

Comment 13 Kedar Bidarkar 2022-05-01 13:40:55 UTC
VERIFIED with v4.10.1-97

Comment 15 Kedar Bidarkar 2022-05-01 14:22:28 UTC
[kbidarka@localhost auth]$ oc get infrastructure.config.openshift.io cluster -o json | jq ".status.infrastructureTopology"
"SingleReplica"

[kbidarka@localhost manifest]$ oc -n openshift process rhel8-server-medium -p NAME=vm-rhel8-ocs -p CLOUD_USER_PASSWORD=redhat -o yaml > vm-rhel8-ocs.yaml

[kbidarka@localhost manifest]$ cat vm-rhel8-ocs.yaml | grep eviction
        evictionStrategy: LiveMigrate

[kbidarka@localhost manifest]$ oc apply -f vm-rhel8-ocs.yaml
virtualmachine.kubevirt.io/vm-rhel8-ocs created

[kbidarka@localhost manifest]$ virtctl start vm-rhel8-ocs
VM vm-rhel8-ocs was scheduled to start

[kbidarka@localhost manifest]$ oc get vmi 
NAME               AGE    PHASE     IP             NODENAME                                         READY
rhel8-loud-swift   31m    Running   11.xx.yy.zz   node-23.redhat.com   True
vm-rhel8-ocs       4m8s   Running   12.yy.zz.aa   node-23.redhat.com   True

[kbidarka@localhost manifest]$ virtctl console vm-rhel8-ocs
Successfully connected to vm-rhel8-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.5 (Ootpa)
Kernel 4.18.0-348.23.1.el8_5.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm-rhel8-ocs login: cloud-user
Password: 
Last failed login: Sun May  1 10:19:11 EDT 2022 on ttyS0
There was 1 failed login attempt since the last successful login.
[cloud-user@vm-rhel8-ocs ~]$ 
[cloud-user@vm-rhel8-ocs ~]$ 
[cloud-user@vm-rhel8-ocs ~]$ 
[cloud-user@vm-rhel8-ocs ~]$ [kbidarka@localhost manifest]$ 
[kbidarka@localhost manifest]$

Comment 16 Kedar Bidarkar 2022-05-01 14:23:14 UTC
VERIFIED with v4.10.1-97

Comment 19 ctomasko 2022-05-09 21:41:58 UTC
Adding known issue to the OpenShift Virtualization 4.10.1 release notes

Known issue:
When a VMI has the spec.evictionStrategy field set to LiveMigrate, the VMIs can't be migrated or evicted and the upgrade fails.

By default, VM Templates installed by the SSP operator have the spec.evictionStrategy field set to LiveMigrate. The LiveMigration feature gate is enabled even when the cluster, in SNO mode, has only 1 worker node. 

Workaround: 
You must implement one of the following workarounds to upgrade the cluster:

1. Remove the spec.evictionStrategy field from the VM declaration.

2. Manually stop the VM before starting an OpenShift Virtualization upgrade.
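
For workaround 1, the field can be removed with a JSON Patch. The Python sketch below builds such a patch for a VirtualMachine manifest loaded as a dict. It assumes the field lives at spec.template.spec.evictionStrategy on the VM object, as in the error message reported in this bug, and the function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def eviction_strategy_removal_patch(vm: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build a JSON Patch that removes evictionStrategy from a VM.

    Returns an empty patch when the field is absent, because a JSON Patch
    "remove" operation fails if the target path does not exist.
    """
    vmi_spec = vm.get("spec", {}).get("template", {}).get("spec", {})
    if "evictionStrategy" not in vmi_spec:
        return []
    return [{"op": "remove", "path": "/spec/template/spec/evictionStrategy"}]


if __name__ == "__main__":
    vm = {"spec": {"template": {"spec": {"evictionStrategy": "LiveMigrate"}}}}
    # Print the patch as JSON so it can be passed to a patch command.
    print(json.dumps(eviction_strategy_removal_patch(vm)))
```

The printed JSON could then be supplied to something like `oc patch vm <name> --type=json -p '<patch>'`. Treat that invocation as a sketch and verify it against a test cluster before relying on it.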

Comment 24 errata-xmlrpc 2022-05-18 20:27:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.1 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4668

