Created attachment 1871696 [details]
feature gate yaml

Description of problem:
On an SNO cluster, which has the live migration feature disabled, it is not possible to create a VM. The request fails with:
Error "LiveMigration feature gate is not enabled" for field "spec.template.spec.evictionStrategy".

Version-Release number of selected component (if applicable):
CNV 4.10.1 + OCP 4.10.0-0.nightly-2022-04-09-185509

How reproducible:
100%

Steps to Reproduce:
1. Create a VM on an SNO cluster

Actual results:
The request fails with: Error "LiveMigration feature gate is not enabled" for field "spec.template.spec.evictionStrategy".

Expected results:
The VM is created successfully.

Additional info:
Created attachment 1871697 [details] error_on_wizard
I think the better solution for this problem is enabling live migration on SNO instead of fixing the templates.
HCO started explicitly ignoring the LiveMigration feature gate and the LiveMigrate eviction strategy on SNO (with a single node you cannot migrate a VM anywhere!) as a workaround for a virt-controller bug/behavior that kept trying, indefinitely, to migrate a VM on upgrades even with a single node. See:
https://bugzilla.redhat.com/show_bug.cgi?id=2065308
Reverting that choice is going to reopen 2065308.
The workloadUpdateStrategy in HCO looks like below:
```
workloadUpdateStrategy:
  batchEvictionInterval: 1m0s
  batchEvictionSize: 10
  workloadUpdateMethods:
  - LiveMigrate
```
See the same error from CLI:

$ oc process rhel8-server-small -n openshift -p NAME=test -p DATA_SOURCE_NAME=rhel8-7b20b8e2e1c6 -p DATA_SOURCE_NAMESPACE=openshift-virtualization-os-images | oc create -f -
The request is invalid: spec.template.spec.evictionStrategy: LiveMigration feature gate is not enabled
Moving the bug to virt since it also reproduces from the CLI.
CNV version: CNV-v4.10.1-60
Raising the priority of this to Urgent as I concur that it is a blocker.

We've explored 3 different options:
1) Remove the evictionStrategy from templates only in SNO mode. This has the downside that it requires an API change to the virt-ssp-operator, and it is a temporary fix, so this is not ideal.
2) Re-enable the feature gate in HCO. This is not so nice because it's highly visible and might have second-order effects. See the discussion in: https://bugzilla.redhat.com/show_bug.cgi?id=2065308
3) For SNO mode only, simply ignore evictionStrategy in virt-api's webhook.

We're planning to go with the third option since it has no real drawbacks compared to the other two.
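For illustration only, the decision described in option 3 could be sketched roughly as below. This is not the actual virt-api webhook code (which is written in Go); the function name `validate_eviction_strategy` and its parameters are hypothetical, and the only values taken from this bug are the error message, the `LiveMigrate` strategy and the `SingleReplica` topology.

```python
from typing import Optional

# Value reported by .status.infrastructureTopology on an SNO cluster.
SINGLE_REPLICA = "SingleReplica"


def validate_eviction_strategy(
    eviction_strategy: Optional[str],
    live_migration_gate_enabled: bool,
    infrastructure_topology: str,
) -> Optional[str]:
    """Return an error message, or None when the VM spec is acceptable.

    Hypothetical sketch of option 3: on a single-node cluster the
    evictionStrategy field is tolerated even though the LiveMigration
    feature gate is disabled.
    """
    if eviction_strategy != "LiveMigrate":
        # Nothing to check when the field is unset or uses another strategy.
        return None
    if live_migration_gate_enabled:
        return None
    if infrastructure_topology == SINGLE_REPLICA:
        # SNO: ignore the field instead of rejecting the request.
        return None
    return "LiveMigration feature gate is not enabled"
```

With this shape, a template-generated VM carrying `evictionStrategy: LiveMigrate` would be accepted on SNO, while the same request on a multi-node cluster with the gate disabled would still be rejected.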
Reflecting on Comment #7, there is actually a significant downside to ignoring the evictionStrategy. If that is all that's done, it would block upgrades--which is also a big problem. While that could also be addressed, it's no longer a quick one-line fix. Removing evictionStrategy from templates is likely the more sensible approach. FYI Simone, Krzysztof.
If a new API is made available, HCO can easily communicate the SNO mode to the ssp-operator so that it can dynamically fine tune the templates.
Patch https://github.com/kubevirt/ssp-operator/pull/343 posted by Federico.
[kbidarka@localhost auth]$ oc get templates -n openshift windows10-desktop-medium -o yaml | grep evictionStrategy
evictionStrategy: LiveMigrate
[kbidarka@localhost auth]$ oc get templates -n openshift rhel8-server-medium -o yaml | grep evictionStrategy
evictionStrategy: LiveMigrate
[kbidarka@localhost auth]$
[kbidarka@localhost auth]$ oc get kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv -o yaml | grep LiveMigrate
- LiveMigrate
[kbidarka@localhost auth]$ oc get infrastructure.config.openshift.io cluster -o json | jq ".status.infrastructureTopology"
"SingleReplica"
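The SNO check done with jq above can also be expressed in a script. The following is a minimal sketch, assuming the JSON printed by `oc get infrastructure.config.openshift.io cluster -o json` has already been captured; the helper name `is_single_node` and the inline sample document are made up for the example.

```python
import json


def is_single_node(infrastructure: dict) -> bool:
    """True when .status.infrastructureTopology reports SingleReplica."""
    return infrastructure.get("status", {}).get("infrastructureTopology") == "SingleReplica"


# Trimmed, hand-written stand-in for the real `oc ... -o json` output.
sample = json.loads('{"status": {"infrastructureTopology": "SingleReplica"}}')
print(is_single_node(sample))  # prints True
```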
VERIFIED with v4.10.1-97
[kbidarka@localhost auth]$ oc get infrastructure.config.openshift.io cluster -o json | jq ".status.infrastructureTopology"
"SingleReplica"

[kbidarka@localhost manifest]$ oc -n openshift process rhel8-server-medium -p NAME=vm-rhel8-ocs -p CLOUD_USER_PASSWORD=redhat -o yaml > vm-rhel8-ocs.yaml
[kbidarka@localhost manifest]$ cat vm-rhel8-ocs.yaml | grep eviction
evictionStrategy: LiveMigrate
[kbidarka@localhost manifest]$ oc apply -f vm-rhel8-ocs.yaml
virtualmachine.kubevirt.io/vm-rhel8-ocs created
[kbidarka@localhost manifest]$ virtctl start vm-rhel8-ocs
VM vm-rhel8-ocs was scheduled to start
[kbidarka@localhost manifest]$ oc get vmi
NAME               AGE    PHASE     IP            NODENAME             READY
rhel8-loud-swift   31m    Running   11.xx.yy.zz   node-23.redhat.com   True
vm-rhel8-ocs       4m8s   Running   12.yy.zz.aa   node-23.redhat.com   True
[kbidarka@localhost manifest]$ virtctl console vm-rhel8-ocs
Successfully connected to vm-rhel8-ocs console. The escape sequence is ^]

Red Hat Enterprise Linux 8.5 (Ootpa)
Kernel 4.18.0-348.23.1.el8_5.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

vm-rhel8-ocs login: cloud-user
Password:
Last failed login: Sun May 1 10:19:11 EDT 2022 on ttyS0
There was 1 failed login attempt since the last successful login.
[cloud-user@vm-rhel8-ocs ~]$
[cloud-user@vm-rhel8-ocs ~]$
[cloud-user@vm-rhel8-ocs ~]$
[cloud-user@vm-rhel8-ocs ~]$
[kbidarka@localhost manifest]$
[kbidarka@localhost manifest]$
Adding known issue to the OpenShift Virtualization 4.10.1 release notes

Known issue:
When a VMI has the spec.evictionStrategy field set to LiveMigrate, the VMI can't be migrated or evicted and the upgrade fails. By default, VM templates installed by the SSP operator have the spec.evictionStrategy field set to LiveMigrate. The LiveMigration feature gate is enabled even when the cluster, in SNO mode, has only 1 worker node.

Workaround:
You must implement one of the following workarounds to upgrade the cluster:
1. Remove the spec.evictionStrategy field from the VM declaration.
2. Manually stop the VM before starting an OpenShift Virtualization upgrade.
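As a rough illustration of workaround 1, the field can be stripped from a VM manifest before it is applied. This is only a sketch operating on an already-parsed manifest (a plain dict); the helper name `drop_eviction_strategy` is hypothetical, and the path `spec.template.spec.evictionStrategy` is the one reported in the original error for a VirtualMachine object.

```python
import copy


def drop_eviction_strategy(vm: dict) -> dict:
    """Return a copy of a VirtualMachine manifest without evictionStrategy.

    The input manifest is left untouched. Manifests that do not carry the
    field are returned unchanged (as a copy).
    """
    result = copy.deepcopy(vm)
    template_spec = result.get("spec", {}).get("template", {}).get("spec", {})
    template_spec.pop("evictionStrategy", None)
    return result


# Hand-written, heavily trimmed manifest used only for the example.
vm = {
    "kind": "VirtualMachine",
    "spec": {"template": {"spec": {"evictionStrategy": "LiveMigrate", "domain": {}}}},
}
cleaned = drop_eviction_strategy(vm)
```

The cleaned manifest would then be serialized back to YAML and applied with `oc apply -f` as usual.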
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.1 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4668